[PATCH v2 0/8] Support Intel APX EGPR

* [PATCH v2 0/8] Support Intel APX EGPR
@ 2023-11-02 11:29 Cui, Lili
  2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
                   ` (8 more replies)
  0 siblings, 9 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-02 11:29 UTC
  To: binutils; +Cc: jbeulich, hongjiu.lu, ccoutant

This is v2 of the APX patch series.
1. Merged patch part II 1/6 into patch 1/8.
2. Created a new patch for the empty EVEX_MAP4_ sub-table.
3. The NF patch is suspended because where NF should be placed is still under discussion. Since patch part II 2/6 depends on the NF patch, it is suspended as well.
4. There are no comments yet on the APX linker patch.


Cui, Lili (4):
  Support APX GPR32 with rex2 prefix
  Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
  Support APX GPR32 with extend evex prefix
  Add tests for APX GPR32 with extend evex prefix

Hu, Lin1 (2):
  Support APX NDD optimized encoding.
  Support APX JMPABS

Mo, Zewei (1):
  Support APX Push2/Pop2

konglin1 (1):
  Support APX NDD

 gas/config/tc-i386.c                          |  480 +++++-
 gas/doc/c-i386.texi                           |    3 +-
 gas/testsuite/gas/i386/apx-jmpabs-inval.l     |    3 +
 gas/testsuite/gas/i386/apx-jmpabs-inval.s     |    6 +
 gas/testsuite/gas/i386/apx-push2pop2-inval.l  |    5 +
 gas/testsuite/gas/i386/apx-push2pop2-inval.s  |    9 +
 gas/testsuite/gas/i386/i386.exp               |    2 +
 .../i386/ilp32/x86-64-opcode-inval-intel.d    |    4 +-
 .../gas/i386/ilp32/x86-64-opcode-inval.d      |    4 +-
 .../gas/i386/x86-64-apx-egpr-inval.l          |  203 +++
 .../gas/i386/x86-64-apx-egpr-inval.s          |  210 +++
 .../gas/i386/x86-64-apx-egpr-promote-inval.l  |   16 +
 .../gas/i386/x86-64-apx-egpr-promote-inval.s  |   17 +
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d |   20 +
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s |   21 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |   37 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |   38 +
 .../gas/i386/x86-64-apx-evex-promoted-intel.d |  326 +++++
 .../gas/i386/x86-64-apx-evex-promoted.d       |  326 +++++
 .../gas/i386/x86-64-apx-evex-promoted.s       |  322 ++++
 .../gas/i386/x86-64-apx-jmpabs-intel.d        |   14 +
 .../gas/i386/x86-64-apx-jmpabs-inval.d        |   55 +
 .../gas/i386/x86-64-apx-jmpabs-inval.s        |   17 +
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d    |   14 +
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s    |   10 +
 .../gas/i386/x86-64-apx-ndd-optimize.d        |  124 ++
 .../gas/i386/x86-64-apx-ndd-optimize.s        |  117 ++
 gas/testsuite/gas/i386/x86-64-apx-ndd.d       |  161 ++
 gas/testsuite/gas/i386/x86-64-apx-ndd.s       |  154 ++
 .../gas/i386/x86-64-apx-push2pop2-intel.d     |   42 +
 .../gas/i386/x86-64-apx-push2pop2-inval.l     |   11 +
 .../gas/i386/x86-64-apx-push2pop2-inval.s     |   15 +
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d |   42 +
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s |   39 +
 gas/testsuite/gas/i386/x86-64-apx-rex2.d      |   83 ++
 gas/testsuite/gas/i386/x86-64-apx-rex2.s      |   86 ++
 gas/testsuite/gas/i386/x86-64-evex.d          |    2 +-
 gas/testsuite/gas/i386/x86-64-inval-movbe.l   |   31 +-
 gas/testsuite/gas/i386/x86-64-inval-movbe.s   |    1 +
 gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |    6 +
 gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |    4 +
 .../gas/i386/x86-64-opcode-inval-intel.d      |    4 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.d  |    4 +-
 gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |   42 +
 gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |   49 +
 gas/testsuite/gas/i386/x86-64-pseudos.d       |   63 +
 gas/testsuite/gas/i386/x86-64-pseudos.s       |   65 +
 gas/testsuite/gas/i386/x86-64.exp             |   15 +
 include/opcode/i386.h                         |    2 +
 opcodes/i386-dis-evex-len.h                   |   10 +
 opcodes/i386-dis-evex-mod.h                   |   52 +
 opcodes/i386-dis-evex-prefix.h                |   73 +
 opcodes/i386-dis-evex-reg.h                   |   77 +
 opcodes/i386-dis-evex-w.h                     |   10 +
 opcodes/i386-dis-evex-x86-64.h                |  140 ++
 opcodes/i386-dis-evex.h                       |  347 ++++-
 opcodes/i386-dis.c                            |  574 ++++++--
 opcodes/i386-gen.c                            |   52 +-
 opcodes/i386-opc.h                            |   27 +-
 opcodes/i386-opc.tbl                          | 1291 ++++++++++-------
 opcodes/i386-reg.tbl                          |   64 +
 61 files changed, 5217 insertions(+), 824 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/apx-jmpabs-inval.l
 create mode 100644 gas/testsuite/gas/i386/apx-jmpabs-inval.s
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s
 create mode 100644 opcodes/i386-dis-evex-x86-64.h

-- 
2.25.1

Thanks,
Lili.


* [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
@ 2023-11-02 11:29 ` Cui, Lili
  2023-11-02 17:05   ` Jan Beulich
                     ` (3 more replies)
  2023-11-02 11:29 ` [PATCH 2/8] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
                   ` (7 subsequent siblings)
  8 siblings, 4 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-02 11:29 UTC
  To: binutils; +Cc: jbeulich, hongjiu.lu, ccoutant

APX uses the REX2 prefix to make the extended GPRs (EGPRs, r16-r31)
available to legacy instructions in map0 and map1. The NoEgpr flag,
set in i386-gen.c, marks instructions that do not support EGPRs.
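
As a rough illustration of the prefix layout (a sketch only, not the code
in this patch): the REX2 prefix is the byte 0xd5 followed by one payload
byte laid out as M0 | R4 X4 B4 | W R3 X3 B3. The helper name rex2_payload
and the SK_* macros are hypothetical and exist only for this example; the
values checked in the self-test are the payload bytes listed in
x86-64-apx-rex2.d.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only.  The same bit values are
   used for the classic REX bits (low nibble) and, shifted left by 4,
   for the REX2 R4/X4/B4 bits.  */
#define SK_REX_W 0x8
#define SK_REX_R 0x4
#define SK_REX_X 0x2
#define SK_REX_B 0x1

/* Compute the payload byte that follows 0xd5:
     bit 7     M0 (0 = map0, 1 = map1)
     bits 6..4 R4 X4 B4
     bits 3..0 W R3 X3 B3  */
static uint8_t
rex2_payload (unsigned int map, unsigned int hi_bits, unsigned int lo_bits)
{
  return (uint8_t) (((map & 1) << 7) | ((hi_bits & 0x7) << 4)
		    | (lo_bits & 0xf));
}

/* Self-check against payload bytes from x86-64-apx-rex2.d.  */
void
rex2_payload_selfcheck (void)
{
  /* test $0x7,%r24d: r24 needs both B4 and B3.  */
  assert (rex2_payload (0, SK_REX_B, SK_REX_B) == 0x11);
  /* imul %eax,%r16d: map1, r16 in ModRM.reg needs R4 only.  */
  assert (rex2_payload (1, SK_REX_R, 0) == 0xc0);
}
```

Reusing the REX bit values for the high nibble mirrors how the patch ORs
the same rex_bit into i.rex and i.rex2 before combining them.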

gas/ChangeLog:

2023-09-27  Lingling Kong <lingling.kong@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>
	    Lili Cui <lili.cui@intel.com>
	    Lin Hu   <lin1.hu@intel.com>

	* config/tc-i386.c
	(enum i386_error): Add register_type_of_address_mismatch
	and invalid_pseudo_prefix.
	(struct _i386_insn): Add rex2 rex-byte and rex2_encoding for
	gpr32 r16-r31.
	(is_cpu): Add apx_f.
	(register_number): Handle RegRex2 for gpr32.
	(is_any_apx_rex2_encoding): New func. Test rex2 prefix encoding.
	(build_rex2_prefix): New func. Build the rex2 prefix for legacy
	insns in opcode space 0/1 that use gpr32.
	(optimize_encoding): Handle r16-r31 registers.
	(md_assemble): Handle apx encoding.
	(parse_insn): Handle Prefix_REX2.
	(check_EgprOperands): New func. Check if Egpr operands
	are valid for the instruction.
	(match_template): Handle Egpr operands check.
	(set_rex_rex2): New func. Set i.rex and i.rex2.
	(build_modrm_byte): Use set_rex_rex2.
	(output_insn): Handle rex2 2-byte prefix output.
	(check_register): Reject egprs without target apx,
	outside 64-bit mode, or with rex_prefix.
	* doc/c-i386.texi: Document .apx.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d: D5 valid
	in 64-bit mode.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-inval-pseudo.l: Add rex2 invalid testcase.
	* testsuite/gas/i386/x86-64-inval-pseudo.s: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-pseudos-bad.l: Add illegal rex2 test.
	* testsuite/gas/i386/x86-64-pseudos-bad.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos.d: Add rex2 test.
	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Run APX tests.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.l: New test.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.s: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.d: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.s: New test.

include/ChangeLog:

	* opcode/i386.h (REX2_OPCODE): New.

opcodes/ChangeLog:

	* i386-dis.c (struct instr_info): Add erex for gpr32.
	Add last_erex_prefix for rex2 prefix.
	(USED_REX2): Extend for gpr32.
	(REX2_M): Ditto.
	(PREFIX_REX2): Ditto.
	(ILLEGAL_PREFIX_REX2): Ditto.
	(ckprefix): Ditto.
	(prefix_name): Ditto.
	(print_insn): Ditto.
	(print_register): Ditto.
	(OP_E_memory): Ditto.
	(OP_REG): Ditto.
	(OP_EX): Ditto.
	* i386-gen.c (if_entry_needs_special_handle): New.
	(process_i386_opcode_modifier): Set NoEgpr for VEX and some special instructions.
	(output_i386_opcode): Handle if_entry_needs_special_handle.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Regenerated.
	* i386-opc.h (enum i386_cpu): Add CpuAPX_F.
	(Prefix_NoOptimize): Ditto.
	(Prefix_REX2): Ditto.
	(RegRex2): Ditto.
	* i386-opc.tbl: Add rex2 prefix.
	* i386-reg.tbl: Add egprs (r16-r31).
	* i386-tbl.h: Regenerated.
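
To make the register_number and set_rex_rex2 entries above concrete, here
is a minimal sketch of how a register table entry's 3-bit number and its
RegRex/RegRex2 flags combine into the architectural register index. The
struct sk_reg and the SK_REG_* flag values are hypothetical stand-ins, not
the definitions in i386-opc.h, and the sketch assumes that r24-r31 carry
both flags while r16-r23 carry only the REX2 one.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only.  */
#define SK_REG_REX  0x1	/* Needs the REX extension bit (+8).  */
#define SK_REG_REX2 0x2	/* Needs the REX2 extension bit (+16).  */

struct sk_reg
{
  unsigned int reg_num;		/* Low 3 bits, as encoded in ModRM/SIB.  */
  unsigned int reg_flags;
};

/* Mirror of the arithmetic in the register_number hunk:
   RegRex adds 8 and RegRex2 adds 16 to the 3-bit register number.  */
unsigned int
sk_register_number (const struct sk_reg *r)
{
  unsigned int nr = r->reg_num;

  if (r->reg_flags & SK_REG_REX)
    nr += 8;
  if (r->reg_flags & SK_REG_REX2)
    nr += 16;
  return nr;
}
```

Under that assumption, an entry with reg_num 0 and only SK_REG_REX2 set
maps to r16, and the same entry with both flags set maps to r24.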
---
 gas/config/tc-i386.c                          | 176 ++++++++++++++--
 gas/doc/c-i386.texi                           |   3 +-
 .../i386/ilp32/x86-64-opcode-inval-intel.d    |   4 +-
 .../gas/i386/ilp32/x86-64-opcode-inval.d      |   4 +-
 .../gas/i386/x86-64-apx-egpr-inval.l          |  23 +++
 .../gas/i386/x86-64-apx-egpr-inval.s          |  18 ++
 gas/testsuite/gas/i386/x86-64-apx-rex2.d      |  83 ++++++++
 gas/testsuite/gas/i386/x86-64-apx-rex2.s      |  86 ++++++++
 gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |   6 +
 gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |   4 +
 .../gas/i386/x86-64-opcode-inval-intel.d      |   4 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.d  |   4 +-
 gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |  42 ++++
 gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |  49 +++++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |  21 ++
 gas/testsuite/gas/i386/x86-64-pseudos.s       |  22 ++
 gas/testsuite/gas/i386/x86-64.exp             |   2 +
 include/opcode/i386.h                         |   2 +
 opcodes/i386-dis.c                            | 194 +++++++++++++-----
 opcodes/i386-gen.c                            |  48 ++++-
 opcodes/i386-opc.h                            |  12 +-
 opcodes/i386-opc.tbl                          |   2 +-
 opcodes/i386-reg.tbl                          |  64 ++++++
 23 files changed, 783 insertions(+), 90 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 714354d5116..3d917c34d15 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -234,6 +234,7 @@ enum i386_error
     operand_size_mismatch,
     operand_type_mismatch,
     register_type_mismatch,
+    register_type_of_address_mismatch,
     number_of_operands_mismatch,
     invalid_instruction_suffix,
     bad_imm4,
@@ -247,6 +248,7 @@ enum i386_error
     invalid_vector_register_set,
     invalid_tmm_register_set,
     invalid_dest_and_src_register_set,
+    invalid_pseudo_prefix,
     unsupported_vector_index_register,
     unsupported_broadcast,
     broadcast_needed,
@@ -354,6 +356,7 @@ struct _i386_insn
     modrm_byte rm;
     rex_byte rex;
     rex_byte vrex;
+    rex_byte rex2;
     sib_byte sib;
     vex_prefix vex;
 
@@ -406,6 +409,11 @@ struct _i386_insn
     /* Compressed disp8*N attribute.  */
     unsigned int memshift;
 
+    /* No CSPAZO flags update.  */
+    bool has_nf;
+
+    bool has_zero_upper;
+
     /* Prefer load or store in encoding.  */
     enum
       {
@@ -427,6 +435,9 @@ struct _i386_insn
     /* Prefer the REX byte in encoding.  */
     bool rex_encoding;
 
+    /* Prefer the REX2 prefix in encoding.  */
+    bool rex2_encoding;
+
     /* Disable instruction size optimization.  */
     bool no_optimize;
 
@@ -1164,6 +1175,7 @@ static const arch_entry cpu_arch[] =
   VECARCH (sm4, SM4, ANY_SM4, reset),
   SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
   VECARCH (avx10.1, AVX10_1, ANY_AVX512F, set),
+  SUBARCH (apx_f, APX_F, APX_F, false),
 };
 
 #undef SUBARCH
@@ -1693,6 +1705,7 @@ is_cpu (const insn_template *t, enum i386_cpu cpu)
     case CpuHLE:      return t->cpu.bitfield.cpuhle;
     case CpuAVX512F:  return t->cpu.bitfield.cpuavx512f;
     case CpuAVX512VL: return t->cpu.bitfield.cpuavx512vl;
+    case CpuAPX_F:    return t->cpu.bitfield.cpuapx_f;
     case Cpu64:       return t->cpu.bitfield.cpu64;
     case CpuNo64:     return t->cpu.bitfield.cpuno64;
     default:
@@ -2375,6 +2388,9 @@ register_number (const reg_entry *r)
   if (r->reg_flags & RegRex)
     nr += 8;
 
+  if (r->reg_flags & RegRex2)
+    nr += 16;
+
   if (r->reg_flags & RegVRex)
     nr += 16;
 
@@ -3890,6 +3906,8 @@ build_vex_prefix (const insn_template *t)
 static INLINE bool
 is_evex_encoding (const insn_template *t)
 {
+   /* When modifying this function, you also need to modify the evex judgment
+      part of process_i386_opcode_modifier in i386-gen.c.  */
   return t->opcode_modifier.evex || t->opcode_modifier.disp8memshift
 	 || t->opcode_modifier.broadcast || t->opcode_modifier.masking
 	 || t->opcode_modifier.sae;
@@ -3901,6 +3919,12 @@ is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || is_evex_encoding (t);
 }
 
+static INLINE bool
+is_any_apx_rex2_encoding (void)
+{
+  return i.rex2 || i.rex2_encoding;
+}
+
 static unsigned int
 get_broadcast_bytes (const insn_template *t, bool diag)
 {
@@ -4158,6 +4182,19 @@ build_evex_prefix (void)
     i.vex.bytes[3] |= i.mask.reg->reg_num;
 }
 
+/* Build (2 bytes) rex2 prefix.
+   | D5h |
+   | m | R4 X4 B4 | W R X B |
+*/
+static void
+build_rex2_prefix (void)
+{
+  i.vex.length = 2;
+  i.vex.bytes[0] = 0xd5;
+  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
+		    | (i.rex2 << 4) | i.rex);
+}
+
 static void
 process_immext (void)
 {
@@ -4423,12 +4460,16 @@ optimize_encoding (void)
 	  i.suffix = 0;
 	  /* Convert to byte registers.  */
 	  if (i.types[1].bitfield.word)
-	    j = 16;
-	  else if (i.types[1].bitfield.dword)
+	    /* There are 32 8-bit registers.  */
 	    j = 32;
+	  else if (i.types[1].bitfield.dword)
+	    /* 32 8-bit registers + 32 16-bit registers.  */
+	    j = 64;
 	  else
-	    j = 48;
-	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
+	    /* 32 8-bit registers + 32 16-bit registers
+	       + 32 32-bit registers.  */
+	    j = 96;
+	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
 	    j += 8;
 	  i.op[1].regs -= j;
 	}
@@ -5278,6 +5319,9 @@ md_assemble (char *line)
 	case register_type_mismatch:
 	  err_msg = _("register type mismatch");
 	  break;
+	case register_type_of_address_mismatch:
+	  err_msg = _("register type of address mismatch");
+	  break;
 	case number_of_operands_mismatch:
 	  err_msg = _("number of operands mismatch");
 	  break;
@@ -5340,6 +5384,9 @@ md_assemble (char *line)
 	case invalid_dest_and_src_register_set:
 	  err_msg = _("destination and source registers must be distinct");
 	  break;
+	case invalid_pseudo_prefix:
+	  err_msg = _("rex2 pseudo prefix cannot be used here");
+	  break;
 	case unsupported_vector_index_register:
 	  err_msg = _("unsupported vector index register");
 	  break;
@@ -5578,7 +5625,7 @@ md_assemble (char *line)
       as_warn (_("translating to `%sp'"), insn_name (&i.tm));
     }
 
-  if (is_any_vex_encoding (&i.tm))
+ if (is_any_vex_encoding (&i.tm))
     {
       if (!cpu_arch_flags.bitfield.cpui286)
 	{
@@ -5594,6 +5641,13 @@ md_assemble (char *line)
 	  return;
 	}
 
+      /* Check for explicit REX2 prefix.  */
+      if (i.rex2 || i.rex2_encoding)
+	{
+	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
+	  return;
+	}
+
       if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
@@ -5633,11 +5687,11 @@ md_assemble (char *line)
 	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
 	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
-	  && i.rex != 0))
+	  && (i.rex != 0 || i.rex2 != 0)))
     {
       int x;
-
-      i.rex |= REX_OPCODE;
+      if (!i.rex2)
+	i.rex |= REX_OPCODE;
       for (x = 0; x < 2; x++)
 	{
 	  /* Look for 8 bit operand that uses old registers.  */
@@ -5647,9 +5701,11 @@ md_assemble (char *line)
 	      gas_assert (!(i.op[x].regs->reg_flags & RegRex));
 	      /* In case it is "hi" register, give up.  */
 	      if (i.op[x].regs->reg_num > 3)
-		as_bad (_("can't encode register '%s%s' in an "
-			  "instruction requiring REX prefix."),
-			register_prefix, i.op[x].regs->reg_name);
+		{
+		  as_bad (_("can't encode register '%s%s' in an "
+			    "instruction requiring REX/REX2 prefix."),
+			  register_prefix, i.op[x].regs->reg_name);
+		}
 
 	      /* Otherwise it is equivalent to the extended register.
 		 Since the encoding doesn't change this is merely
@@ -5660,11 +5716,11 @@ md_assemble (char *line)
 	}
     }
 
-  if (i.rex == 0 && i.rex_encoding)
+  if ((i.rex == 0 && i.rex_encoding) || (i.rex2 == 0 && i.rex2_encoding))
     {
       /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
 	 that uses legacy register.  If it is "hi" register, don't add
-	 the REX_OPCODE byte.  */
+	 rex and rex2 prefix.  */
       int x;
       for (x = 0; x < 2; x++)
 	if (i.types[x].bitfield.class == Reg
@@ -5674,6 +5730,7 @@ md_assemble (char *line)
 	  {
 	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
 	    i.rex_encoding = false;
+	    i.rex2_encoding = false;
 	    break;
 	  }
 
@@ -5681,7 +5738,13 @@ md_assemble (char *line)
 	i.rex = REX_OPCODE;
     }
 
-  if (i.rex != 0)
+  if (i.rex2 != 0 || i.rex2_encoding)
+    {
+      build_rex2_prefix ();
+      /* The individual REX.RXBW bits got consumed.  */
+      i.rex &= REX_OPCODE;
+    }
+  else if (i.rex != 0)
     add_prefix (REX_OPCODE | i.rex);
 
   insert_lfence_before ();
@@ -5852,6 +5915,10 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
 		  /* {rex} */
 		  i.rex_encoding = true;
 		  break;
+		case Prefix_REX2:
+		  /* {rex2} */
+		  i.rex2_encoding = true;
+		  break;
 		case Prefix_NoOptimize:
 		  /* {nooptimize} */
 		  i.no_optimize = true;
@@ -6989,6 +7056,44 @@ VEX_check_encoding (const insn_template *t)
   return 0;
 }
 
+/* Check if Egprs operands are valid for the instruction.  */
+
+static int
+check_EgprOperands (const insn_template *t)
+{
+  if (t->opcode_modifier.noegpr)
+    {
+      for (unsigned int op = 0; op < i.operands; op++)
+	{
+	  if (i.types[op].bitfield.class != Reg
+	      /* Special case for (%dx) while doing input/output op */
+	      || i.input_output_operand)
+	    continue;
+
+	  if (i.op[op].regs->reg_flags & RegRex2)
+	    {
+	      i.error = register_type_mismatch;
+	      return 1;
+	    }
+	}
+
+      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
+	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
+	{
+	  i.error = register_type_of_address_mismatch;
+	  return 1;
+	}
+
+      /* Check that the {rex2} pseudo prefix is valid.  */
+      if (i.rex2_encoding)
+	{
+	  i.error = invalid_pseudo_prefix;
+	  return 1;
+	}
+    }
+  return 0;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
       /* Do not verify operands when there are none.  */
       if (!t->operands)
 	{
-	  if (VEX_check_encoding (t))
+	  if (VEX_check_encoding (t) || check_EgprOperands (t))
 	    {
 	      specific_error = progress (i.error);
 	      continue;
@@ -7461,6 +7566,13 @@ match_template (char mnem_suffix)
 	  continue;
 	}
 
+      /* Check if EGPR operands (r16-r31) are valid.  */
+      if (check_EgprOperands (t))
+	{
+	  specific_error = progress (i.error);
+	  continue;
+	}
+
       /* Check whether to use the shorter VEX encoding for certain insns where
 	 the EVEX enconding comes first in the table.  This requires the respective
 	 AVX-* feature to be explicitly enabled.  */
@@ -8363,6 +8475,18 @@ static INLINE void set_rex_vrex (const reg_entry *r, unsigned int rex_bit,
 
   if (r->reg_flags & RegVRex)
     i.vrex |= rex_bit;
+
+  if (r->reg_flags & RegRex2)
+    i.rex2 |= rex_bit;
+}
+
+static INLINE void
+set_rex_rex2 (const reg_entry *r, unsigned int rex_bit)
+{
+  if ((r->reg_flags & RegRex) != 0)
+    i.rex |= rex_bit;
+  if ((r->reg_flags & RegRex2) != 0)
+    i.rex2 |= rex_bit;
 }
 
 static int
@@ -8846,8 +8970,7 @@ build_modrm_byte (void)
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
 		  i.types[op] = operand_type_and_not (i.types[op], anydisp);
 		  i.types[op].bitfield.disp32 = 1;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 	    }
 	  /* RIP addressing for 64bit mode.  */
@@ -8918,8 +9041,7 @@ build_modrm_byte (void)
 
 	      if (!i.tm.opcode_modifier.sib)
 		i.rm.regmem = i.base_reg->reg_num;
-	      if ((i.base_reg->reg_flags & RegRex) != 0)
-		i.rex |= REX_B;
+	      set_rex_rex2 (i.base_reg, REX_B);
 	      i.sib.base = i.base_reg->reg_num;
 	      /* x86-64 ignores REX prefix bit here to avoid decoder
 		 complications.  */
@@ -8957,8 +9079,7 @@ build_modrm_byte (void)
 		  else
 		    i.sib.index = i.index_reg->reg_num;
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 
 	      if (i.disp_operands
@@ -10105,6 +10226,12 @@ output_insn (void)
 	  for (j = ARRAY_SIZE (i.prefix), q = i.prefix; j > 0; j--, q++)
 	    if (*q)
 	      frag_opcode_byte (*q);
+
+	  if (is_any_apx_rex2_encoding ())
+	    {
+	      frag_opcode_byte (i.vex.bytes[0]);
+	      frag_opcode_byte (i.vex.bytes[1]);
+	    }
 	}
       else
 	{
@@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry *r)
 	i.vec_encoding = vex_encoding_error;
     }
 
+  if (r->reg_flags & RegRex2)
+    {
+      if (!cpu_arch_flags.bitfield.cpuapx_f
+	  || flag_code != CODE_64BIT)
+	return false;
+    }
+
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
       && (!cpu_arch_flags.bitfield.cpu64
 	  || r->reg_type.bitfield.class != RegCR
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index b04e1b00b4b..5d79a332f53 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -216,6 +216,7 @@ accept various extension mnemonics.  For example,
 @code{avx10.1/512},
 @code{avx10.1/256},
 @code{avx10.1/128},
+@code{apx},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1662,7 +1663,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
 @item @samp{.padlock} @tab @samp{.clzero} @tab @samp{.mwaitx} @tab @samp{.rdpru}
 @item @samp{.mcommit} @tab @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb}
-@item @samp{.tlbsync}
+@item @samp{.tlbsync} @tab @samp{.apx}
 @end multitable
 
 Apart from the warning, there are only two other effects on
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
index a2b09d2e74f..605548285f2 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
@@ -11,11 +11,11 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
 0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
 0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
 0+5 <aam0>:
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
index 5a17b0b412e..c9d3f2fdbb6 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
@@ -11,11 +11,11 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
 0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
 0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
 0+5 <aam0>:
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
new file mode 100644
index 00000000000..c69d01b099a
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
@@ -0,0 +1,23 @@
+.*: Assembler messages:
+.*:4: Error: bad register name `%r17d'
+.*:7: Error: register type of address mismatch for `xsave'
+.*:8: Error: register type of address mismatch for `xsave64'
+.*:9: Error: register type of address mismatch for `xrstor'
+.*:10: Error: register type of address mismatch for `xrstor64'
+.*:11: Error: register type of address mismatch for `xsaves'
+.*:12: Error: register type of address mismatch for `xsaves64'
+.*:13: Error: register type of address mismatch for `xrstors'
+.*:14: Error: register type of address mismatch for `xrstors64'
+.*:15: Error: register type of address mismatch for `xsaveopt'
+.*:16: Error: register type of address mismatch for `xsaveopt64'
+.*:17: Error: register type of address mismatch for `xsavec'
+.*:18: Error: register type of address mismatch for `xsavec64'
+GAS LISTING .*
+#...
+[ 	]*1[ 	]+\# Check Illegal 64bit APX_F instructions
+[ 	]*2[ 	]+\.text
+[ 	]*3[ 	]+\.arch \.noapx_f
+[ 	]*4[ 	]+test    \$0x7, %r17d
+[ 	]*5[ 	]+\.arch \.apx_f
+[ 	]*6[ 	]+\?\?\?\? D510F7C1 		test    \$0x7, %r17d
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
new file mode 100644
index 00000000000..c4d2308a604
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
@@ -0,0 +1,18 @@
+# Check Illegal 64bit APX_F instructions
+	.text
+	.arch .noapx_f
+	test    $0x7, %r17d
+	.arch .apx_f
+	test    $0x7, %r17d
+	xsave (%r16, %rbx)
+	xsave64 (%r16, %r31)
+	xrstor (%r16, %rbx)
+	xrstor64 (%r16, %rbx)
+	xsaves (%rbx, %r16)
+	xsaves64 (%r16, %rbx)
+	xrstors (%rbx, %r31)
+	xrstors64 (%r16, %rbx)
+	xsaveopt (%r16, %rbx)
+	xsaveopt64 (%r16, %r31)
+	xsavec (%r16, %rbx)
+	xsavec64 (%r16, %r31)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.d b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
new file mode 100644
index 00000000000..e3cd534da11
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
@@ -0,0 +1,83 @@
+#as:
+#objdump: -dw
+#name: x86-64 APX_F use gpr32 with rex2 prefix
+#source: x86-64-apx-rex2.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f6 c0 07[	 ]+test   \$0x7,%r24b
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 19 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24
+[	 ]*[a-f0-9]+:[	 ]*66 d5 11 f7 c0 07 00[	 ]+test   \$0x7,%r24w
+[	 ]*[a-f0-9]+:[	 ]*44 0f af f8[	 ]+imul   %eax,%r15d
+[	 ]*[a-f0-9]+:[	 ]*d5 c0 af c0[	 ]+imul   %eax,%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 90 62 12[	 ]+punpckldq %mm2,\(%r18\)
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 00[	 ]+lea    \(%rax\),%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 08[	 ]+lea    \(%rax\),%r17d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 10[	 ]+lea    \(%rax\),%r18d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 18[	 ]+lea    \(%rax\),%r19d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 20[	 ]+lea    \(%rax\),%r20d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 28[	 ]+lea    \(%rax\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 30[	 ]+lea    \(%rax\),%r22d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 38[	 ]+lea    \(%rax\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 00[	 ]+lea    \(%rax\),%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 08[	 ]+lea    \(%rax\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 10[	 ]+lea    \(%rax\),%r26d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 18[	 ]+lea    \(%rax\),%r27d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 20[	 ]+lea    \(%rax\),%r28d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 28[	 ]+lea    \(%rax\),%r29d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 30[	 ]+lea    \(%rax\),%r30d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 38[	 ]+lea    \(%rax\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r16,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r17,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r18,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r19,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r21,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r22,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r23,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r24,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r25,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r26,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r27,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r28,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r29,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r30,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r31,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 00[	 ]+lea    \(%r16\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 01[	 ]+lea    \(%r17\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 02[	 ]+lea    \(%r18\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 03[	 ]+lea    \(%r19\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 04 24       	lea    \(%r20\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 45 00       	lea    0x0\(%r21\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 06[	 ]+lea    \(%r22\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 07[	 ]+lea    \(%r23\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 00[	 ]+lea    \(%r24\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 01[	 ]+lea    \(%r25\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 02[	 ]+lea    \(%r26\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 03[	 ]+lea    \(%r27\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 04 24       	lea    \(%r28\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 45 00       	lea    0x0\(%r29\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 06          	lea    \(%r30\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 07          	lea    \(%r31\),%eax
+[	 ]*[a-f0-9]+:[	 ]*4c 8d 38             	lea    \(%rax\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 48 8d 00          	lea    \(%rax\),%r16
+[	 ]*[a-f0-9]+:[	 ]*49 8d 07             	lea    \(%r15\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 18 8d 00          	lea    \(%r16\),%rax
+[	 ]*[a-f0-9]+:[	 ]*4a 8d 04 3d 00 00 00 00 	lea    0x0\(,%r15,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 28 8d 04 05 00 00 00 00 	lea    0x0\(,%r16,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 00          	add    \(%r16\),%r8
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 38          	add    \(%r16\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 0d 00 00 00 00 	mov    0x0\(,%r9,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 35 00 00 00 00 	mov    0x0\(,%r14,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 3a          	sub    \(%r10\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 7d 00       	sub    0x0\(%r13\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 30 8d 44 20 01    	lea    0x1\(%r16,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 76 8d 7c 20 01    	lea    0x1\(%r16,%r28,1\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 12 8d 84 04 81 00 00 00 	lea    0x81\(%r20,%r8,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 57 8d bc 04 81 00 00 00 	lea    0x81\(%r28,%r8,1\),%r31d
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.s b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
new file mode 100644
index 00000000000..543f0f573d4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
@@ -0,0 +1,86 @@
+# Check 64bit instructions with rex2 prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+         test	$0x7, %r24b
+         test	$0x7, %r24d
+         test	$0x7, %r24
+         test	$0x7, %r24w
+## REX2.M bit
+         imull	%eax, %r15d
+         imull	%eax, %r16d
+         punpckldq (%r18), %mm2
+## REX2.R4 bit
+         leal	(%rax), %r16d
+         leal	(%rax), %r17d
+         leal	(%rax), %r18d
+         leal	(%rax), %r19d
+         leal	(%rax), %r20d
+         leal	(%rax), %r21d
+         leal	(%rax), %r22d
+         leal	(%rax), %r23d
+         leal	(%rax), %r24d
+         leal	(%rax), %r25d
+         leal	(%rax), %r26d
+         leal	(%rax), %r27d
+         leal	(%rax), %r28d
+         leal	(%rax), %r29d
+         leal	(%rax), %r30d
+         leal	(%rax), %r31d
+## REX2.X4 bit
+         leal	(,%r16), %eax
+         leal	(,%r17), %eax
+         leal	(,%r18), %eax
+         leal	(,%r19), %eax
+         leal	(,%r20), %eax
+         leal	(,%r21), %eax
+         leal	(,%r22), %eax
+         leal	(,%r23), %eax
+         leal	(,%r24), %eax
+         leal	(,%r25), %eax
+         leal	(,%r26), %eax
+         leal	(,%r27), %eax
+         leal	(,%r28), %eax
+         leal	(,%r29), %eax
+         leal	(,%r30), %eax
+         leal	(,%r31), %eax
+## REX2.B4 bit
+         leal	(%r16), %eax
+         leal	(%r17), %eax
+         leal	(%r18), %eax
+         leal	(%r19), %eax
+         leal	(%r20), %eax
+         leal	(%r21), %eax
+         leal	(%r22), %eax
+         leal	(%r23), %eax
+         leal	(%r24), %eax
+         leal	(%r25), %eax
+         leal	(%r26), %eax
+         leal	(%r27), %eax
+         leal	(%r28), %eax
+         leal	(%r29), %eax
+         leal	(%r30), %eax
+         leal	(%r31), %eax
+## REX2.W bit
+         leaq	(%rax), %r15
+         leaq	(%rax), %r16
+         leaq	(%r15), %rax
+         leaq	(%r16), %rax
+         leaq	(,%r15), %rax
+         leaq	(,%r16), %rax
+## REX2.R3 bit
+         add    (%r16), %r8
+         add    (%r16), %r15
+## REX2.X3 bit
+         mov    (,%r9), %r16
+         mov    (,%r14), %r16
+## REX2.B3 bit
+	 sub   (%r10), %r31
+	 sub   (%r13), %r31
+
+## SIB
+         leal	1(%r16, %r20), %eax
+         leal	1(%r16, %r28), %r31d
+         leal	129(%r20, %r8), %eax
+         leal	129(%r28, %r8), %r31d
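For readers checking the byte patterns in the expected output above, here is a minimal sketch of how the REX2 payload byte maps to register numbers. It assumes the layout implied by the ckprefix hunk in opcodes/i386-dis.c (high nibble M0/R4/X4/B4, low nibble W/R3/X3/B3). The names rex2_fields, rex2_split, rex2_extend_reg and rex2_is_map1 are hypothetical and do not exist in binutils; this is an illustration, not the disassembler's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values shared by both payload nibbles (same values as REX_B..REX_W).  */
enum { BIT_B = 1, BIT_X = 2, BIT_R = 4, BIT_W = 8 };

/* In the high nibble, bit 3 is M0 (map select) rather than W.  */
enum { BIT_M0 = 8 };

/* Hypothetical holder for the byte that follows the 0xd5 prefix.  */
struct rex2_fields
{
  uint8_t hi;	/* payload >> 4: M0, R4, X4, B4.  */
  uint8_t lo;	/* payload & 0xf: W, R3, X3, B3.  */
};

static struct rex2_fields
rex2_split (uint8_t payload)
{
  struct rex2_fields f;

  f.hi = (uint8_t) (payload >> 4);
  f.lo = (uint8_t) (payload & 0xf);
  return f;
}

/* Extend a 3-bit ModRM/SIB register field to a 5-bit register number.
   BIT is one of BIT_B, BIT_X or BIT_R.  */
static unsigned int
rex2_extend_reg (struct rex2_fields f, unsigned int field, unsigned int bit)
{
  unsigned int reg = field & 7;

  if (f.lo & bit)
    reg += 8;	/* R3/X3/B3, as with legacy REX.  */
  if (f.hi & bit)
    reg += 16;	/* R4/X4/B4, new with REX2.  */
  return reg;
}

/* M0 set means the opcode byte is looked up in map 1 (the 0x0f map).  */
static bool
rex2_is_map1 (struct rex2_fields f)
{
  return (f.hi & BIT_M0) != 0;
}
```

For example, payload 0x11 with a ModRM r/m field of 0 gives 0 + 8 + 16 = 24, which agrees with the "d5 11 8d 00" / "lea (%r24),%eax" expectation in x86-64-apx-rex2.d.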
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
index 13ad0fb768f..256e1b9a370 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
@@ -1,10 +1,16 @@
 .*: Assembler messages:
 .*:2: Error: .*
 .*:3: Error: .*
+.*:6: Error: .*
+.*:7: Error: .*
 GAS LISTING .*
 
 
 [ 	]*1[ 	]+\.text
 [ 	]*2[ 	]+\{disp16\} movb \(%ebp\),%al
 [ 	]*3[ 	]+\{disp16\} movb \(%rbp\),%al
+[ 	]*4[ 	]+
+[ 	]*5[ 	]+.*
+[ 	]*6[ 	]+\{rex2\} xsave \(%r15, %rbx\)
+[ 	]*7[ 	]+\{rex2\} xsave64 \(%r15, %rbx\)
 #...
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
index c10b14c2099..ae30476e500 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
@@ -1,4 +1,8 @@
 	.text
 	{disp16} movb (%ebp),%al
 	{disp16} movb (%rbp),%al
+
+	/* Instructions that do not support APX.  */
+	{rex2} xsave (%r15, %rbx)
+	{rex2} xsave64 (%r15, %rbx)
 	.p2align 4,0
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
index 6ee5b2f95ce..03126541f24 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
@@ -11,11 +11,11 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
 0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
 0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
 0+5 <aam0>:
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
index 12f02c1766c..0200f3dfd92 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
@@ -10,11 +10,11 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
 0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
 0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
 0+5 <aam0>:
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
index 3f9f67fcf4b..63aeb739b7d 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
@@ -4,3 +4,45 @@
 .*:5: Error: .*`vmovaps'.*
 .*:6: Error: .*`vmovaps'.*
 .*:7: Error: .*`rorx'.*
+.*:8: Error: .*`xsave'.*
+.*:9: Error: .*`xsaves'.*
+.*:10: Error: .*`xsaves64'.*
+.*:11: Error: .*`xsavec'.*
+.*:12: Error: .*`xrstors'.*
+.*:13: Error: .*`xrstors64'.*
+.*:17: Error: .*`mov'.*
+.*:18: Error: .*`movabs'.*
+.*:19: Error: .*`cmps'.*
+.*:20: Error: .*`lods'.*
+.*:21: Error: .*`lods'.*
+.*:22: Error: .*`lods'.*
+.*:23: Error: .*`movs'.*
+.*:24: Error: .*`movs'.*
+.*:25: Error: .*`scas'.*
+.*:26: Error: .*`scas'.*
+.*:27: Error: .*`scas'.*
+.*:28: Error: .*`stos'.*
+.*:29: Error: .*`stos'.*
+.*:30: Error: .*`stos'.*
+.*:33: Error: .*`jo'.*
+.*:34: Error: .*`jno'.*
+.*:35: Error: .*`jb'.*
+.*:36: Error: .*`jae'.*
+.*:37: Error: .*`je'.*
+.*:38: Error: .*`jne'.*
+.*:39: Error: .*`jbe'.*
+.*:40: Error: .*`ja'.*
+.*:41: Error: .*`js'.*
+.*:42: Error: .*`jns'.*
+.*:43: Error: .*`jp'.*
+.*:44: Error: .*`jnp'.*
+.*:45: Error: .*`jl'.*
+.*:46: Error: .*`jge'.*
+.*:47: Error: .*`jle'.*
+.*:48: Error: .*`jg'.*
+.*:51: Error: .*`wrmsr'.*
+.*:52: Error: .*`rdtsc'.*
+.*:53: Error: .*`rdmsr'.*
+.*:54: Error: .*`sysenter'.*
+.*:55: Error: .*`sysexit'.*
+.*:56: Error: .*`rdpmc'.*
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
index 3b923593a6a..91630eb589b 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
@@ -5,3 +5,52 @@ pseudos:
 	{rex} vmovaps %xmm7,%xmm2
 	{rex} vmovaps %xmm17,%xmm2
 	{rex} rorx $7,%eax,%ebx
+	{rex2} xsave (%rax)
+	{rex2} xsaves (%ecx)
+	{rex2} xsaves64 (%ecx)
+	{rex2} xsavec (%ecx)
+	{rex2} xrstors (%ecx)
+	{rex2} xrstors64 (%ecx)
+
+	# All opcodes in row 0xa* are illegal when prefixed with REX2.
+	# {rex2} test (0xa8) is a special case: it is remapped to test (0xf6).
+	{rex2} mov    0x90909090,%al
+	{rex2} movabs 0x1,%al
+	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
+	{rex2} lodsb
+	{rex2} lods   %ds:(%esi),%al
+	{rex2} lodsb   (%esi)
+	{rex2} movs
+	{rex2} movs   (%esi), (%edi)
+	{rex2} scasl
+	{rex2} scas   %es:(%edi),%eax
+	{rex2} scasb   (%edi)
+	{rex2} stosb
+	{rex2} stosb   (%edi)
+	{rex2} stos   %eax,%es:(%edi)
+
+	# All opcodes in row 0x7* are illegal when prefixed with REX2.
+	{rex2} jo     .+2-0x70
+	{rex2} jno    .+2-0x70
+	{rex2} jb     .+2-0x70
+	{rex2} jae    .+2-0x70
+	{rex2} je     .+2-0x70
+	{rex2} jne    .+2-0x70
+	{rex2} jbe    .+2-0x70
+	{rex2} ja     .+2-0x70
+	{rex2} js     .+2-0x70
+	{rex2} jns    .+2-0x70
+	{rex2} jp     .+2-0x70
+	{rex2} jnp    .+2-0x70
+	{rex2} jl     .+2-0x70
+	{rex2} jge    .+2-0x70
+	{rex2} jle    .+2-0x70
+	{rex2} jg     .+2-0x70
+
+	# All opcodes in row 0x0f 0x3* are illegal when prefixed with REX2.
+	{rex2} wrmsr
+	{rex2} rdtsc
+	{rex2} rdmsr
+	{rex2} sysenter
+	{rex2} sysexitl
+	{rex2} rdpmc
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.d b/gas/testsuite/gas/i386/x86-64-pseudos.d
index 0cc75ef2457..708c22b5899 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.d
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.d
@@ -404,6 +404,18 @@ Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 d3 e0          	{rex2} shl %cl,%eax
+ +[a-f0-9]+:	d5 00 38 ca          	{rex2} cmp %cl,%dl
+ +[a-f0-9]+:	d5 00 b3 01          	{rex2} mov \$(0x)?1,%bl
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
@@ -458,6 +470,15 @@ Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.s b/gas/testsuite/gas/i386/x86-64-pseudos.s
index 08fac8381c6..29a0c3368fc 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
@@ -360,6 +360,19 @@ _start:
 	{rex} movaps (%r8),%xmm2
 	{rex} phaddw (%rcx),%mm0
 	{rex} phaddw (%r8),%mm0
+	{rex2} mov %al,%ah
+	{rex2} shl %cl, %eax
+	{rex2} cmp %cl, %dl
+	{rex2} mov $1, %bl
+	{rex2} movl %eax,%ebx
+	{rex2} movl %eax,%r14d
+	{rex2} movl %eax,(%r8)
+	{rex2} movaps %xmm7,%xmm2
+	{rex2} movaps %xmm7,%xmm12
+	{rex2} movaps (%rcx),%xmm2
+	{rex2} movaps (%r8),%xmm2
+	{rex2} pmullw %mm0,%mm6
+
 
 	movb (%rbp),%al
 	{disp8} movb (%rbp),%al
@@ -422,6 +435,15 @@ _start:
 	{rex} movaps xmm2,XMMWORD PTR [r8]
 	{rex} phaddw mm0,QWORD PTR [rcx]
 	{rex} phaddw mm0,QWORD PTR [r8]
+	{rex2} mov ah,al
+	{rex2} mov ebx,eax
+	{rex2} mov r14d,eax
+	{rex2} mov DWORD PTR [r8],eax
+	{rex2} movaps xmm2,xmm7
+	{rex2} movaps xmm12,xmm7
+	{rex2} movaps xmm2,XMMWORD PTR [rcx]
+	{rex2} movaps xmm2,XMMWORD PTR [r8]
+	{rex2} pmullw mm6,mm0
 
 	mov al, BYTE PTR [rbp]
 	{disp8} mov al, BYTE PTR [rbp]
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 52711cdcf6f..a698a467c53 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -360,6 +360,8 @@ run_dump_test "x86-64-avx512f-rcigrne-intel"
 run_dump_test "x86-64-avx512f-rcigrne"
 run_dump_test "x86-64-avx512f-rcigru-intel"
 run_dump_test "x86-64-avx512f-rcigru"
+run_list_test "x86-64-apx-egpr-inval" "-al"
+run_dump_test "x86-64-apx-rex2"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/include/opcode/i386.h b/include/opcode/i386.h
index dec7652c1cc..a6af3d54da0 100644
--- a/include/opcode/i386.h
+++ b/include/opcode/i386.h
@@ -112,6 +112,8 @@
 /* x86-64 extension prefix.  */
 #define REX_OPCODE	0x40
 
+#define REX2_OPCODE	0xd5
+
 /* Non-zero if OPCODE is the rex prefix.  */
 #define REX_PREFIX_P(opcode) (((opcode) & 0xf0) == REX_OPCODE)
 
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 87ecf0f5e23..22c450ac414 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -144,6 +144,11 @@ struct instr_info
   /* Bits of REX we've already used.  */
   uint8_t rex_used;
 
+  /* REX2 bits of the current instruction, used for GPR32 (r16-r31).  */
+  unsigned char rex2;
+  /* Bits of REX2 we've already used.  */
+  unsigned char rex2_used;
+
   bool need_modrm;
   unsigned char need_vex;
   bool has_sib;
@@ -169,6 +174,7 @@ struct instr_info
   signed char last_data_prefix;
   signed char last_addr_prefix;
   signed char last_rex_prefix;
+  signed char last_rex2_prefix;
   signed char last_seg_prefix;
   signed char fwait_prefix;
   /* The active segment register prefix.  */
@@ -269,9 +275,17 @@ struct dis_private {
       ins->rex_used |= REX_OPCODE;			\
   }
 
+#define USED_REX2(value)				\
+  {							\
+    if ((ins->rex2 & value))				\
+      ins->rex2_used |= value;				\
+  }
+
 
 #define EVEX_b_used 1
 #define EVEX_len_used 2
+/* M0 in rex2 prefix represents map0 or map1.  */
+#define REX2_M 0x8
 
 /* Flags stored in PREFIXES.  */
 #define PREFIX_REPZ 1
@@ -286,6 +300,7 @@ struct dis_private {
 #define PREFIX_DATA 0x200
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
+#define PREFIX_REX2 0x1000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -367,6 +382,7 @@ fetch_error (const instr_info *ins)
 #define PREFIX_IGNORED_DATA	(PREFIX_DATA << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_ADDR	(PREFIX_ADDR << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_LOCK	(PREFIX_LOCK << PREFIX_IGNORED_SHIFT)
+#define PREFIX_REX2_ILLEGAL	(PREFIX_REX2 << PREFIX_IGNORED_SHIFT)
 
 /* Opcode prefixes.  */
 #define PREFIX_OPCODE		(PREFIX_REPZ \
@@ -1872,23 +1888,23 @@ static const struct dis386 dis386[] = {
   { "outs{b|}",		{ indirDXr, Xb }, 0 },
   { X86_64_TABLE (X86_64_6F) },
   /* 70 */
-  { "joH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnoH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jneH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "joH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnoH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jneH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 78 */
-  { "jsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jlH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jleH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "jsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jlH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jleH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 80 */
   { REG_TABLE (REG_80) },
   { REG_TABLE (REG_81) },
@@ -1926,23 +1942,23 @@ static const struct dis386 dis386[] = {
   { "sahf",		{ XX }, 0 },
   { "lahf",		{ XX }, 0 },
   /* a0 */
-  { "mov%LB",		{ AL, Ob }, 0 },
-  { "mov%LS",		{ eAX, Ov }, 0 },
-  { "mov%LB",		{ Ob, AL }, 0 },
-  { "mov%LS",		{ Ov, eAX }, 0 },
-  { "movs{b|}",		{ Ybr, Xb }, 0 },
-  { "movs{R|}",		{ Yvr, Xv }, 0 },
-  { "cmps{b|}",		{ Xb, Yb }, 0 },
-  { "cmps{R|}",		{ Xv, Yv }, 0 },
+  { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
+  { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
+  { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "movs{R|}",		{ Yvr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "cmps{b|}",		{ Xb, Yb }, PREFIX_REX2_ILLEGAL },
+  { "cmps{R|}",		{ Xv, Yv }, PREFIX_REX2_ILLEGAL },
   /* a8 */
-  { "testB",		{ AL, Ib }, 0 },
-  { "testS",		{ eAX, Iv }, 0 },
-  { "stosB",		{ Ybr, AL }, 0 },
-  { "stosS",		{ Yvr, eAX }, 0 },
-  { "lodsB",		{ ALr, Xb }, 0 },
-  { "lodsS",		{ eAXr, Xv }, 0 },
-  { "scasB",		{ AL, Yb }, 0 },
-  { "scasS",		{ eAX, Yv }, 0 },
+  { "testB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
+  { "testS",		{ eAX, Iv }, PREFIX_REX2_ILLEGAL },
+  { "stosB",		{ Ybr, AL }, PREFIX_REX2_ILLEGAL },
+  { "stosS",		{ Yvr, eAX }, PREFIX_REX2_ILLEGAL },
+  { "lodsB",		{ ALr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "lodsS",		{ eAXr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "scasB",		{ AL, Yb }, PREFIX_REX2_ILLEGAL },
+  { "scasS",		{ eAX, Yv }, PREFIX_REX2_ILLEGAL },
   /* b0 */
   { "movB",		{ RMAL, Ib }, 0 },
   { "movB",		{ RMCL, Ib }, 0 },
@@ -2091,12 +2107,12 @@ static const struct dis386 dis386_twobyte[] = {
   { PREFIX_TABLE (PREFIX_0F2E) },
   { PREFIX_TABLE (PREFIX_0F2F) },
   /* 30 */
-  { "wrmsr",		{ XX }, 0 },
-  { "rdtsc",		{ XX }, 0 },
-  { "rdmsr",		{ XX }, 0 },
-  { "rdpmc",		{ XX }, 0 },
-  { "sysenter",		{ SEP }, 0 },
-  { "sysexit%LQ",	{ SEP }, 0 },
+  { "wrmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdtsc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdpmc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "sysenter",		{ SEP }, PREFIX_REX2_ILLEGAL },
+  { "sysexit%LQ",	{ SEP }, PREFIX_REX2_ILLEGAL },
   { Bad_Opcode },
   { "getsec",		{ XX }, 0 },
   /* 38 */
@@ -2390,22 +2406,30 @@ static const char intel_index16[][6] = {
 
 static const char att_names64[][8] = {
   "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
-  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15"
+  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
+  "%r16", "%r17", "%r18", "%r19", "%r20", "%r21", "%r22", "%r23",
+  "%r24", "%r25", "%r26", "%r27", "%r28", "%r29", "%r30", "%r31"
 };
 static const char att_names32[][8] = {
   "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi",
-  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d"
+  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d",
+  "%r16d", "%r17d", "%r18d", "%r19d", "%r20d", "%r21d", "%r22d", "%r23d",
+  "%r24d", "%r25d", "%r26d", "%r27d", "%r28d", "%r29d", "%r30d", "%r31d"
 };
 static const char att_names16[][8] = {
   "%ax", "%cx", "%dx", "%bx", "%sp", "%bp", "%si", "%di",
-  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w"
+  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w",
+  "%r16w", "%r17w", "%r18w", "%r19w", "%r20w", "%r21w", "%r22w", "%r23w",
+  "%r24w", "%r25w", "%r26w", "%r27w", "%r28w", "%r29w", "%r30w", "%r31w"
 };
 static const char att_names8[][8] = {
   "%al", "%cl", "%dl", "%bl", "%ah", "%ch", "%dh", "%bh",
 };
 static const char att_names8rex[][8] = {
   "%al", "%cl", "%dl", "%bl", "%spl", "%bpl", "%sil", "%dil",
-  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b"
+  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b",
+  "%r16b", "%r17b", "%r18b", "%r19b", "%r20b", "%r21b", "%r22b", "%r23b",
+  "%r24b", "%r25b", "%r26b", "%r27b", "%r28b", "%r29b", "%r30b", "%r31b"
 };
 static const char att_names_seg[][4] = {
   "%es", "%cs", "%ss", "%ds", "%fs", "%gs", "%?", "%?",
@@ -2794,9 +2818,9 @@ static const struct dis386 reg_table[][8] = {
     { Bad_Opcode },
     { "cmpxchg8b", { { CMPXCHG8B_Fixup, q_mode } }, 0 },
     { Bad_Opcode },
-    { "xrstors", { FXSAVE }, 0 },
-    { "xsavec", { FXSAVE }, 0 },
-    { "xsaves", { FXSAVE }, 0 },
+    { "xrstors", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsavec", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsaves", { FXSAVE }, PREFIX_REX2_ILLEGAL },
     { MOD_TABLE (MOD_0FC7_REG_6) },
     { MOD_TABLE (MOD_0FC7_REG_7) },
   },
@@ -3364,7 +3388,7 @@ static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_4_MOD_0 */
   {
-    { "xsave",	{ FXSAVE }, 0 },
+    { "xsave",	{ FXSAVE }, PREFIX_REX2_ILLEGAL },
     { "ptwrite{%LQ|}", { Edq }, 0 },
   },
 
@@ -3382,7 +3406,7 @@ static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_6_MOD_0 */
   {
-    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE },
+    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { "clrssbsy",	{ Mq }, PREFIX_OPCODE },
     { "clwb",	{ Mb }, PREFIX_OPCODE },
   },
@@ -8125,7 +8149,7 @@ static const struct dis386 mod_table[][2] = {
   },
   {
     /* MOD_0FAE_REG_5 */
-    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE },
+    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { PREFIX_TABLE (PREFIX_0FAE_REG_5_MOD_3) },
   },
   {
@@ -8289,6 +8313,7 @@ ckprefix (instr_info *ins)
 {
   int i, length;
   uint8_t newrex;
+  unsigned char rex2_payload;
 
   i = 0;
   length = 0;
@@ -8323,6 +8348,24 @@ ckprefix (instr_info *ins)
 	    return ckp_okay;
 	  ins->last_rex_prefix = i;
 	  break;
+	/* REX2 must be the last prefix. */
+	case 0xd5:
+	  if (ins->address_mode == mode_64bit)
+	    {
+	      if (ins->last_rex_prefix >= 0)
+		return ckp_bogus;
+
+	      ins->codep++;
+	      if (!fetch_code (ins->info, ins->codep + 1))
+		return ckp_fetch_error;
+	      rex2_payload = *ins->codep;
+	      ins->rex2 = rex2_payload >> 4;
+	      ins->rex = (rex2_payload & 0xf) | REX_OPCODE;
+	      ins->codep++;
+	      ins->last_rex2_prefix = i;
+	      ins->all_prefixes[i] = REX2_OPCODE;
+	    }
+	  return ckp_okay;
 	case 0xf3:
 	  ins->prefixes |= PREFIX_REPZ;
 	  ins->last_repz_prefix = i;
@@ -8490,6 +8533,8 @@ prefix_name (enum address_mode mode, uint8_t pref, int sizeflag)
       return "bnd";
     case NOTRACK_PREFIX:
       return "notrack";
+    case REX2_OPCODE:
+      return "rex2";
     default:
       return NULL;
     }
@@ -9128,6 +9173,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
     .last_data_prefix = -1,
     .last_addr_prefix = -1,
     .last_rex_prefix = -1,
+    .last_rex2_prefix = -1,
     .last_seg_prefix = -1,
     .fwait_prefix = -1,
   };
@@ -9292,13 +9338,17 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
-  if (*ins.codep == 0x0f)
+  /* M0 in rex2 prefix represents map0 or map1.  */
+  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
     {
       unsigned char threebyte;
 
-      ins.codep++;
-      if (!fetch_code (info, ins.codep + 1))
-	goto fetch_error_out;
+      if (!ins.rex2)
+	{
+	  ins.codep++;
+	  if (!fetch_code (info, ins.codep + 1))
+	    goto fetch_error_out;
+	}
       threebyte = *ins.codep;
       dp = &dis386_twobyte[threebyte];
       ins.need_modrm = twobyte_has_modrm[threebyte];
@@ -9454,6 +9504,14 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
+  if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
+      && ins.last_rex2_prefix >= 0)
+    {
+      i386_dis_printf (info, dis_style_text, "(bad)");
+      ret = ins.end_codep - priv.the_buffer;
+      goto out;
+    }
+
   switch (dp->prefix_requirement)
     {
     case PREFIX_DATA:
@@ -9468,6 +9526,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       ins.used_prefixes |= PREFIX_DATA;
       /* Fall through.  */
     case PREFIX_OPCODE:
+    case PREFIX_OPCODE | PREFIX_REX2_ILLEGAL:
       /* If the mandatory PREFIX_REPZ/PREFIX_REPNZ/PREFIX_DATA prefix is
 	 unused, opcode is invalid.  Since the PREFIX_DATA prefix may be
 	 used by putop and MMX/SSE operand and may be overridden by the
@@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       && !ins.need_vex && ins.last_rex_prefix >= 0)
     ins.all_prefixes[ins.last_rex_prefix] = 0;
 
+  /* Check if the REX2 prefix is used.  */
+  if (ins.last_rex2_prefix >= 0
+      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
+	   && (ins.rex2 & 0x7))
+	  || dp == &bad_opcode))
+    ins.all_prefixes[ins.last_rex2_prefix] = 0;
+
   /* Check if the SEG prefix is used.  */
   if ((ins.prefixes & (PREFIX_CS | PREFIX_SS | PREFIX_DS | PREFIX_ES
 		       | PREFIX_FS | PREFIX_GS)) != 0
@@ -9541,7 +9607,10 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
+	if (ins.all_prefixes[i] == REX2_OPCODE)
+	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);
+	else
+	  i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
@@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     ins->illegal_masking = true;
 
   USED_REX (rexmask);
+  USED_REX2 (rexmask);
   if (ins->rex & rexmask)
     reg += 8;
+  if (ins->rex2 & rexmask)
+    reg += 16;
 
   switch (bytemode)
     {
@@ -11310,6 +11382,8 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
   int riprel = 0;
   int shift;
 
+  add += (ins->rex2 & REX_B) ? 16 : 0;
+
   if (ins->vex.evex)
     {
 
@@ -11414,6 +11488,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
     shift = 0;
 
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->intel_syntax)
     intel_operand_size (ins, bytemode, sizeflag);
   append_seg (ins);
@@ -11444,8 +11519,11 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	{
 	  vindex = ins->sib.index;
 	  USED_REX (REX_X);
+	  USED_REX2 (REX_X);
 	  if (ins->rex & REX_X)
 	    vindex += 8;
+	  if (ins->rex2 & REX_X)
+	    vindex += 16;
 	  switch (bytemode)
 	    {
 	    case vex_vsib_d_w_dq_mode:
@@ -11866,7 +11944,7 @@ static bool
 OP_REG (instr_info *ins, int code, int sizeflag)
 {
   const char *s;
-  int add;
+  int add = 0;
 
   switch (code)
     {
@@ -11877,10 +11955,11 @@ OP_REG (instr_info *ins, int code, int sizeflag)
     }
 
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->rex & REX_B)
     add = 8;
-  else
-    add = 0;
+  if (ins->rex2 & REX_B)
+    add += 16;
 
   switch (code)
     {
@@ -12590,8 +12669,11 @@ OP_EX (instr_info *ins, int bytemode, int sizeflag)
 
   reg = ins->modrm.rm;
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->rex & REX_B)
     reg += 8;
+  if (ins->rex2 & REX_B)
+    reg += 16;
   if (ins->vex.evex)
     {
       USED_REX (REX_X);
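The PREFIX_REX2_ILLEGAL markings above follow opcode rows. A minimal sketch of that row test, assuming single-byte opcodes and the rows named in the i386-gen.c comment of this patch (map 0 rows 0x4*, 0x7*, 0xa*; map 1 rows 0x3*, 0x8*). The enum opcode_map and the function rex2_reserved_row are hypothetical names chosen for illustration and are not part of binutils.

```c
#include <stdbool.h>

/* Hypothetical map identifiers: MAP0 is the one-byte opcode map,
   MAP1 is the two-byte (0x0f) map.  */
enum opcode_map { MAP0 = 0, MAP1 = 1 };

/* Return true if OPCODE, a single opcode byte in MAP, lies in a row
   that is reserved when a REX2 prefix is present.  */
static bool
rex2_reserved_row (enum opcode_map map, unsigned int opcode)
{
  unsigned int row = (opcode >> 4) & 0xf;

  if (map == MAP0)
    return row == 0x4 || row == 0x7 || row == 0xa;
  if (map == MAP1)
    return row == 0x3 || row == 0x8;
  return false;
}
```

Under these assumptions, jo (map 0, 0x70) and wrmsr (map 1, 0x30) fall in reserved rows, while lea (map 0, 0x8d) and movaps (map 1, 0x28) do not.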
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index cfc5a7a6172..589f9682699 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -259,6 +259,8 @@ static const dependency isa_dependencies[] =
     "SSE2" },
   { "WIDEKL",
     "KL" },
+  { "APX_F",
+    "XSAVE" },
 };
 
 /* This array is populated as process_i386_initializers() walks cpu_flags[].  */
@@ -380,6 +382,7 @@ static bitfield cpu_flags[] =
   BITFIELD (RAO_INT),
   BITFIELD (FRED),
   BITFIELD (LKGS),
+  BITFIELD (APX_F),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
@@ -469,6 +472,7 @@ static bitfield opcode_modifiers[] =
   BITFIELD (ATTSyntax),
   BITFIELD (IntelSyntax),
   BITFIELD (ISA64),
+  BITFIELD (NoEgpr),
 };
 
 #define CLASS(n) #n, n
@@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
   return elem_size;
 }
 
+static bool
+if_entry_needs_special_handle (const unsigned long long opcode, unsigned int space,
+			       const char *cpu_flags)
+{
+  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
+  if (!strcmp (cpu_flags, "XSAVES")
+      || !strcmp (cpu_flags, "XSAVEC")
+      || !strcmp (cpu_flags, "Xsave")
+      || !strcmp (cpu_flags, "Xsaveopt")
+      || !strcmp (cpu_flags, "3dnow")
+      || !strcmp (cpu_flags, "3dnowA"))
+    return true;
+
+  /* All opcodes in map0 rows 0x4*, 0x7*, 0xa* and map1 rows 0x3*, 0x8*
+     are reserved under REX2 and trigger #UD when prefixed with REX2.  */
+  if ((space == 0 && (opcode >> 4 == 0x4
+		      || opcode >> 4 == 0x7
+		      || opcode >> 4 == 0xA))
+      || (space == SPACE_0F && (opcode >> 4 == 0x3
+				|| opcode >> 4 == 0x8)))
+    return true;
+
+  return false;
+}
+
 static void
 process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 			      unsigned int prefix, const char *extension_opcode,
-			      char **opnd, int lineno)
+			      char **opnd, int lineno, bool has_special_handle)
 {
   char *str, *next, *last;
   bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
@@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 	fprintf (stderr,
 		 "%s: %d: W modifier without Word/Dword/Qword operand(s)\n",
 		 filename, lineno);
+
+      /* The part about judging EVEX encoding should be synchronized with
+	 is_evex_encoding.  */
+      if (modifiers[Vex].value
+	  || ((space > SPACE_0F || has_special_handle)
+	      && !modifiers[EVex].value
+	      && !modifiers[Disp8MemShift].value
+	      && !modifiers[Broadcast].value
+	      && !modifiers[Masking].value
+	      && !modifiers[SAE].value))
+	modifiers[NoEgpr].value = 1;
+
     }
 
   if (space >= ARRAY_SIZE (spaces) || !spaces[space])
@@ -1350,8 +1391,11 @@ output_i386_opcode (FILE *table, const char *name, char *str,
 	   ident, 2 * (int)length, opcode, end, i);
   free (ident);
 
+  /* Check whether the current entry needs special handling.  */
+  bool has_special_handle = if_entry_needs_special_handle (opcode, space, cpu_flags);
   process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
-				extension_opcode, operand_types, lineno);
+				extension_opcode, operand_types, lineno,
+				has_special_handle);
 
   process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
 
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 149ae0e950c..c8082971f81 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -317,6 +317,8 @@ enum i386_cpu
   CpuAVX512F,
   /* Intel AVX-512 VL Instructions support required.  */
   CpuAVX512VL,
+  /* Intel APX_F Instructions support required.  */
+  CpuAPX_F,
   /* Not supported in the 64bit mode  */
   CpuNo64,
 
@@ -352,6 +354,7 @@ enum i386_cpu
 		   cpuhle:1, \
 		   cpuavx512f:1, \
 		   cpuavx512vl:1, \
+		   cpuapx_f:1, \
       /* NOTE: This field needs to remain last. */ \
 		   cpuno64:1
 
@@ -742,6 +745,10 @@ enum
 #define INTEL64		2
 #define INTEL64ONLY	3
   ISA64,
+
+  /* EGPRs (r16-r31) are not allowed on this instruction.  */
+  NoEgpr,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -789,6 +796,7 @@ typedef struct i386_opcode_modifier
   unsigned int attsyntax:1;
   unsigned int intelsyntax:1;
   unsigned int isa64:2;
+  unsigned int noegpr:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
@@ -1001,7 +1009,8 @@ typedef struct insn_template
 #define Prefix_VEX3		6	/* {vex3} */
 #define Prefix_EVEX		7	/* {evex} */
 #define Prefix_REX		8	/* {rex} */
-#define Prefix_NoOptimize	9	/* {nooptimize} */
+#define Prefix_REX2		9	/* {rex2} */
+#define Prefix_NoOptimize	10	/* {nooptimize} */
 
   /* the bits in opcode_modifier are used to generate the final opcode from
      the base_opcode.  These bits also are used to detect alternate forms of
@@ -1028,6 +1037,7 @@ typedef struct
 #define RegRex	    0x1  /* Extended register.  */
 #define RegRex64    0x2  /* Extended 8 bit register.  */
 #define RegVRex	    0x4  /* Extended vector register.  */
+#define RegRex2	    0x8  /* Extended GPRs R16-R31.  */
   unsigned char reg_num;
 #define RegIP	((unsigned char ) ~0)
 /* EIZ and RIZ are fake index registers.  */
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index e60184ba154..17be21fdf0e 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -891,7 +891,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
 <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
                       load:Load:0, store:Store:0, +
                       vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
-                      rex:REX:x64, nooptimize:NoOptimize:0>
+                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
 
 {<pseudopfx>}, PSEUDO_PREFIX/Prefix_<pseudopfx:ident>, <pseudopfx:cpu>, NoSuf|IsPrefix, {}
 
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index 2ac56e3fd0b..8fead35e320 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -43,6 +43,22 @@ r12b, Class=Reg|Byte, RegRex|RegRex64, 4, Dw2Inval, Dw2Inval
 r13b, Class=Reg|Byte, RegRex|RegRex64, 5, Dw2Inval, Dw2Inval
 r14b, Class=Reg|Byte, RegRex|RegRex64, 6, Dw2Inval, Dw2Inval
 r15b, Class=Reg|Byte, RegRex|RegRex64, 7, Dw2Inval, Dw2Inval
+r16b, Class=Reg|Byte, RegRex2|RegRex64, 0, Dw2Inval, Dw2Inval
+r17b, Class=Reg|Byte, RegRex2|RegRex64, 1, Dw2Inval, Dw2Inval
+r18b, Class=Reg|Byte, RegRex2|RegRex64, 2, Dw2Inval, Dw2Inval
+r19b, Class=Reg|Byte, RegRex2|RegRex64, 3, Dw2Inval, Dw2Inval
+r20b, Class=Reg|Byte, RegRex2|RegRex64, 4, Dw2Inval, Dw2Inval
+r21b, Class=Reg|Byte, RegRex2|RegRex64, 5, Dw2Inval, Dw2Inval
+r22b, Class=Reg|Byte, RegRex2|RegRex64, 6, Dw2Inval, Dw2Inval
+r23b, Class=Reg|Byte, RegRex2|RegRex64, 7, Dw2Inval, Dw2Inval
+r24b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 0, Dw2Inval, Dw2Inval
+r25b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 1, Dw2Inval, Dw2Inval
+r26b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 2, Dw2Inval, Dw2Inval
+r27b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 3, Dw2Inval, Dw2Inval
+r28b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 4, Dw2Inval, Dw2Inval
+r29b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 5, Dw2Inval, Dw2Inval
+r30b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 6, Dw2Inval, Dw2Inval
+r31b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 7, Dw2Inval, Dw2Inval
 // 16 bit regs
 ax, Class=Reg|Instance=Accum|Word, 0, 0, Dw2Inval, Dw2Inval
 cx, Class=Reg|Word, 0, 1, Dw2Inval, Dw2Inval
@@ -60,6 +76,22 @@ r12w, Class=Reg|Word, RegRex, 4, Dw2Inval, Dw2Inval
 r13w, Class=Reg|Word, RegRex, 5, Dw2Inval, Dw2Inval
 r14w, Class=Reg|Word, RegRex, 6, Dw2Inval, Dw2Inval
 r15w, Class=Reg|Word, RegRex, 7, Dw2Inval, Dw2Inval
+r16w, Class=Reg|Word, RegRex2, 0, Dw2Inval, Dw2Inval
+r17w, Class=Reg|Word, RegRex2, 1, Dw2Inval, Dw2Inval
+r18w, Class=Reg|Word, RegRex2, 2, Dw2Inval, Dw2Inval
+r19w, Class=Reg|Word, RegRex2, 3, Dw2Inval, Dw2Inval
+r20w, Class=Reg|Word, RegRex2, 4, Dw2Inval, Dw2Inval
+r21w, Class=Reg|Word, RegRex2, 5, Dw2Inval, Dw2Inval
+r22w, Class=Reg|Word, RegRex2, 6, Dw2Inval, Dw2Inval
+r23w, Class=Reg|Word, RegRex2, 7, Dw2Inval, Dw2Inval
+r24w, Class=Reg|Word, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25w, Class=Reg|Word, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26w, Class=Reg|Word, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27w, Class=Reg|Word, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28w, Class=Reg|Word, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29w, Class=Reg|Word, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30w, Class=Reg|Word, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31w, Class=Reg|Word, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 // 32 bit regs
 eax, Class=Reg|Instance=Accum|Dword|BaseIndex, 0, 0, 0, Dw2Inval
 ecx, Class=Reg|Instance=RegC|Dword|BaseIndex, 0, 1, 1, Dw2Inval
@@ -77,6 +109,22 @@ r12d, Class=Reg|Dword|BaseIndex, RegRex, 4, Dw2Inval, Dw2Inval
 r13d, Class=Reg|Dword|BaseIndex, RegRex, 5, Dw2Inval, Dw2Inval
 r14d, Class=Reg|Dword|BaseIndex, RegRex, 6, Dw2Inval, Dw2Inval
 r15d, Class=Reg|Dword|BaseIndex, RegRex, 7, Dw2Inval, Dw2Inval
+r16d, Class=Reg|Dword|BaseIndex, RegRex2, 0, Dw2Inval, Dw2Inval
+r17d, Class=Reg|Dword|BaseIndex, RegRex2, 1, Dw2Inval, Dw2Inval
+r18d, Class=Reg|Dword|BaseIndex, RegRex2, 2, Dw2Inval, Dw2Inval
+r19d, Class=Reg|Dword|BaseIndex, RegRex2, 3, Dw2Inval, Dw2Inval
+r20d, Class=Reg|Dword|BaseIndex, RegRex2, 4, Dw2Inval, Dw2Inval
+r21d, Class=Reg|Dword|BaseIndex, RegRex2, 5, Dw2Inval, Dw2Inval
+r22d, Class=Reg|Dword|BaseIndex, RegRex2, 6, Dw2Inval, Dw2Inval
+r23d, Class=Reg|Dword|BaseIndex, RegRex2, 7, Dw2Inval, Dw2Inval
+r24d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 rax, Class=Reg|Instance=Accum|Qword|BaseIndex, 0, 0, Dw2Inval, 0
 rcx, Class=Reg|Instance=RegC|Qword|BaseIndex, 0, 1, Dw2Inval, 2
 rdx, Class=Reg|Instance=RegD|Qword|BaseIndex, 0, 2, Dw2Inval, 1
@@ -93,6 +141,22 @@ r12, Class=Reg|Qword|BaseIndex, RegRex, 4, Dw2Inval, 12
 r13, Class=Reg|Qword|BaseIndex, RegRex, 5, Dw2Inval, 13
 r14, Class=Reg|Qword|BaseIndex, RegRex, 6, Dw2Inval, 14
 r15, Class=Reg|Qword|BaseIndex, RegRex, 7, Dw2Inval, 15
+r16, Class=Reg|Qword|BaseIndex, RegRex2, 0, Dw2Inval, 130
+r17, Class=Reg|Qword|BaseIndex, RegRex2, 1, Dw2Inval, 131
+r18, Class=Reg|Qword|BaseIndex, RegRex2, 2, Dw2Inval, 132
+r19, Class=Reg|Qword|BaseIndex, RegRex2, 3, Dw2Inval, 133
+r20, Class=Reg|Qword|BaseIndex, RegRex2, 4, Dw2Inval, 134
+r21, Class=Reg|Qword|BaseIndex, RegRex2, 5, Dw2Inval, 135
+r22, Class=Reg|Qword|BaseIndex, RegRex2, 6, Dw2Inval, 136
+r23, Class=Reg|Qword|BaseIndex, RegRex2, 7, Dw2Inval, 137
+r24, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, 138
+r25, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, 139
+r26, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, 140
+r27, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, 141
+r28, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, 142
+r29, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, 143
+r30, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, 144
+r31, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, 145
 // Vector mask registers.
 k0, Class=RegMask, 0, 0, 93, 118
 k1, Class=RegMask, 0, 1, 94, 119
-- 
2.25.1
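
Note on the i386-reg.tbl entries above: each new register carries a 3-bit
reg_num plus the RegRex and RegRex2 flags, so the architectural register
number can be rebuilt from those three pieces. The sketch below only
illustrates that decomposition. The helper name egpr_number is hypothetical
and not part of the patch; the two flag values are the ones defined in
i386-opc.h above.

```c
#include <assert.h>

/* Flag values as defined in opcodes/i386-opc.h by this patch.  */
#define RegRex   0x1  /* Extended register.  */
#define RegRex2  0x8  /* Extended GPRs R16-R31.  */

/* Hypothetical helper, not part of the patch: rebuild the architectural
   register number from a table entry's flags and its 3-bit reg_num.  */
static unsigned int
egpr_number (unsigned int reg_flags, unsigned int reg_num)
{
  unsigned int nr = reg_num;

  assert (reg_num < 8);
  if (reg_flags & RegRex)
    nr += 8;	/* Second group of eight within each half.  */
  if (reg_flags & RegRex2)
    nr += 16;	/* Upper half, r16-r31.  */
  return nr;
}
```

With the table rows above, the r24 entry (RegRex2|RegRex, 0) maps to 24 and
the r31 entry (RegRex2|RegRex, 7) maps to 31.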



* [PATCH 2/8] Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
  2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
  2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
@ 2023-11-02 11:29 ` Cui, Lili
  2023-11-02 11:29 ` [PATCH 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-02 11:29 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, ccoutant

opcodes/ChangeLog:

	* i386-dis-evex.h: Add an empty EVEX_MAP4_ sub-table for legacy
	insns promoted to EVEX.
	* i386-dis.c: Add EVEX_MAP4.
---
 opcodes/i386-dis-evex.h | 291 ++++++++++++++++++++++++++++++++++++++++
 opcodes/i386-dis.c      |   1 +
 2 files changed, 292 insertions(+)
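
Note: evex_table is initialized positionally, so the new sub-table has to
sit at the same position as EVEX_MAP4 in the enum, between the 0F3A and
MAP5 rows. The following toy model is only a sketch of that coupling. The
names toy_evex_table and toy_lookup, the two-opcode rows and the string
entries are all made up for illustration; the real rows hold 256
struct dis386 entries.

```c
#include <assert.h>
#include <stddef.h>

/* Map indices, in the order used by the enum in opcodes/i386-dis.c.  */
enum { EVEX_0F = 0, EVEX_0F38, EVEX_0F3A, EVEX_MAP4, EVEX_MAP5, EVEX_MAP6 };

#define TOY_OPCODES 2	/* The real table has 256 entries per map.  */

/* Hypothetical stand-in for evex_table: rows are positional, so their
   order must follow the enum above.  NULL plays the role of Bad_Opcode.  */
static const char *const toy_evex_table[][TOY_OPCODES] = {
  /* EVEX_0F */    { "0f-00", "0f-01" },
  /* EVEX_0F38 */  { "0f38-00", "0f38-01" },
  /* EVEX_0F3A */  { "0f3a-00", "0f3a-01" },
  /* EVEX_MAP4_ */ { NULL, NULL },
  /* EVEX_MAP5_ */ { "map5-00", "map5-01" },
  /* EVEX_MAP6_ */ { "map6-00", "map6-01" },
};

/* Look up an entry by map index and opcode byte.  */
static const char *
toy_lookup (int map, unsigned int opcode)
{
  assert (map >= EVEX_0F && map <= EVEX_MAP6 && opcode < TOY_OPCODES);
  return toy_evex_table[map][opcode];
}
```

Dropping the EVEX_MAP4_ row from this model would make every EVEX_MAP5
lookup land in the MAP6 row, which is why the enum value and the empty
sub-table are added together.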

diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index e6295119d2b..7ad1edbe72d 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -872,6 +872,297 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
   },
+  /* EVEX_MAP4_ */
+  {
+    /* 00 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 08 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 10 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 18 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 20 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 28 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 30 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 38 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 40 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 48 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 50 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 58 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 60 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 68 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 70 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 78 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 80 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 88 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 90 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 98 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* A0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* A8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* B0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* B8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* C0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* C8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* D0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* D8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* E0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* E8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* F0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* F8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+  },
   /* EVEX_MAP5_ */
   {
     /* 00 */
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 22c450ac414..0754b4c22dd 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -1283,6 +1283,7 @@ enum
   EVEX_0F = 0,
   EVEX_0F38,
   EVEX_0F3A,
+  EVEX_MAP4,
   EVEX_MAP5,
   EVEX_MAP6,
 };
-- 
2.25.1



* [PATCH 3/8] Support APX GPR32 with extend evex prefix
  2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
  2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
  2023-11-02 11:29 ` [PATCH 2/8] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
@ 2023-11-02 11:29 ` Cui, Lili
  2023-11-02 11:29 ` [PATCH 4/8] Add tests for " Cui, Lili
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-02 11:29 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, ccoutant

This patch adds the non-ND, non-NF forms of the EVEX-promoted insns.

EVEX extension of legacy instructions:
  All promoted legacy instructions are placed in EVEX map 4, which is
  currently reserved.
EVEX extension of EVEX instructions:
  All existing EVEX instructions are extended by APX using the extended
  EVEX prefix, so that they can access all 32 GPRs.
EVEX extension of VEX instructions:
  Promoting a VEX instruction into the EVEX space does not change the map
  id, the opcode, or the operand encoding of the VEX instruction.

Note: The promoted versions of MOVBE will be extended to include the
  "MOVBE reg1, reg2" form.

gas/ChangeLog:

	* config/tc-i386.c (cpu_flags_not_or_check): Add a new
	function for APX cpu flag checking.
	(cpu_flags_match): Handle cpu_flags_not_or_check.
	(install_template): Handle the AMX_TILE and APX combination.
	(is_any_apx_evex_encoding): Test apx evex encoding.
	(build_apx_evex_prefix): Enable APX evex prefix.
	(md_assemble): Handle apx with evex encoding.
	(check_EgprOperands): Add noegpr check for apx.
	(process_suffix): Handle apx map4 prefix.
	(check_register): Assign i.vec_encoding for APX evex instructions.
	* testsuite/gas/i386/x86-64-evex.d: Adjust test cases.
	* testsuite/gas/i386/x86-64-inval-movbe.s: Ditto.
	* testsuite/gas/i386/x86-64-inval-movbe.l: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-len.h: Handle EVEX_LEN_0F38F2, EVEX_LEN_0F38F3.
	* i386-dis-evex-mod.h: Handle MOD_EVEX_MAP4_65,
	MOD_EVEX_MAP4_66_PREFIX_0, MOD_EVEX_MAP4_8A_W_0,
	MOD_EVEX_MAP4_DA_PREFIX_1, MOD_EVEX_MAP4_DB_PREFIX_1,
	MOD_EVEX_MAP4_DC_PREFIX_1, MOD_EVEX_MAP4_DD_PREFIX_1,
	MOD_EVEX_MAP4_DE_PREFIX_1, MOD_EVEX_MAP4_DF_PREFIX_1,
	MOD_EVEX_MAP4_F8_PREFIX_1, MOD_EVEX_MAP4_F8_PREFIX_2,
	MOD_EVEX_MAP4_F8_PREFIX_3, MOD_EVEX_MAP4_F9,
	MOD_EVEX_MAP4_8B.
	* i386-dis-evex-prefix.h: Handle PREFIX_EVEX_MAP4_60,
	PREFIX_EVEX_MAP4_61, PREFIX_EVEX_MAP4_66,
	PREFIX_EVEX_MAP4_8B_M_0, PREFIX_EVEX_MAP4_D8,
	PREFIX_EVEX_MAP4_DA, PREFIX_EVEX_MAP4_DB,
	PREFIX_EVEX_MAP4_DC, PREFIX_EVEX_MAP4_DD,
	PREFIX_EVEX_MAP4_DE, PREFIX_EVEX_MAP4_DF,
	PREFIX_EVEX_MAP4_F0, PREFIX_EVEX_MAP4_F1,
	PREFIX_EVEX_MAP4_F2, PREFIX_EVEX_MAP4_F8,
	PREFIX_EVEX_MAP4_FC.
	* i386-dis-evex-reg.h: Handle REG_EVEX_MAP4_D8_PREFIX_1,
	REG_EVEX_0F38F3_L_0.
	* i386-dis-evex.h: Add EVEX_MAP4_ for legacy insns promoted
	to apx to use gpr32.
	* i386-dis.c (REG enum): Add REG_EVEX_MAP4_D8_PREFIX_1.
	(MOD enum): Add MOD_EVEX_MAP4_65, MOD_EVEX_MAP4_66_PREFIX_0,
	MOD_EVEX_MAP4_8A_W_0, MOD_EVEX_MAP4_8B,
	MOD_EVEX_MAP4_DA_PREFIX_1, MOD_EVEX_MAP4_DB_PREFIX_1,
	MOD_EVEX_MAP4_DC_PREFIX_1, MOD_EVEX_MAP4_DD_PREFIX_1,
	MOD_EVEX_MAP4_DE_PREFIX_1, MOD_EVEX_MAP4_DF_PREFIX_1,
	MOD_EVEX_MAP4_F8_PREFIX_1, MOD_EVEX_MAP4_F8_PREFIX_2,
	MOD_EVEX_MAP4_F8_PREFIX_3, MOD_EVEX_MAP4_F9,
	REG_EVEX_0F38F3_L_0.
	(PREFIX enum): Add PREFIX_EVEX_MAP4_60, PREFIX_EVEX_MAP4_61,
	PREFIX_EVEX_MAP4_66, PREFIX_EVEX_MAP4_8B_M_0,
	PREFIX_EVEX_MAP4_D8, PREFIX_EVEX_MAP4_DA,
	PREFIX_EVEX_MAP4_DB, PREFIX_EVEX_MAP4_DC,
	PREFIX_EVEX_MAP4_DD, PREFIX_EVEX_MAP4_DE,
	PREFIX_EVEX_MAP4_DF, PREFIX_EVEX_MAP4_F0,
	PREFIX_EVEX_MAP4_F1, PREFIX_EVEX_MAP4_F2,
	PREFIX_EVEX_MAP4_F8, PREFIX_EVEX_MAP4_FC.
	(EVEX_LEN_enum): Add EVEX_LEN_0F38F2, EVEX_LEN_0F38F3.
	(EVEX_X86_enum): Add X86_64_EVEX_0F90, X86_64_EVEX_0F91,
	X86_64_EVEX_0F92, X86_64_EVEX_0F93, X86_64_EVEX_0F3849,
	X86_64_EVEX_0F384B, X86_64_EVEX_0F38E0, X86_64_EVEX_0F38E1,
	X86_64_EVEX_0F38E2, X86_64_EVEX_0F38E3, X86_64_EVEX_0F38E4,
	X86_64_EVEX_0F38E5, X86_64_EVEX_0F38E6, X86_64_EVEX_0F38E7,
	X86_64_EVEX_0F38E8, X86_64_EVEX_0F38E9, X86_64_EVEX_0F38EA,
	X86_64_EVEX_0F38EB, X86_64_EVEX_0F38EC, X86_64_EVEX_0F38ED,
	X86_64_EVEX_0F38EE, X86_64_EVEX_0F38EF, X86_64_EVEX_0F38F2,
	X86_64_EVEX_0F38F3, X86_64_EVEX_0F38F5, X86_64_EVEX_0F38F6,
	X86_64_EVEX_0F38F7, X86_64_EVEX_0F3AF0.
	(struct instr_info): Deleted bool r.
	(putop): Ditto.
	(PREFIX_DATA_AND_NP_ONLY): New define.
	(X86_64_EVEX_FROM_VEX_TABLE): Ditto.
	(get_valid_dis386): Decode insn erex in extended evex prefix.
	Handle EVEX_MAP4.
	(print_insn): Handle PREFIX_DATA_AND_NP_ONLY.
	(print_register): Handle apx instructions decode.
	(OP_E_memory): Ditto.
	(OP_G): Ditto.
	(OP_XMM): Ditto.
	(DistinctDest_Fixup): Ditto.
	* i386-gen.c (process_i386_opcode_modifier):
	* i386-opc.h (SPACE_EVEXMAP4): Add for legacy insns promoted
	to evex.
	* i386-opc.tbl: Mark the legacy and vex insns that don't
	support gpr32, and add some legacy insns (map2 / map3)
	promoted to evex.
---
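
Note on cpu_flags_not_or_check: the per-word test (~x | y) == 0xffffffff
holds exactly when every bit set in x is also set in y, i.e. x is a subset
of y. Below is a minimal stand-alone sketch of the same test written as a
loop. The name flags_subset and the plain uint32_t arrays are assumptions
made for illustration; the patch itself works on union i386_cpu_flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: return true when every bit
   set in x[0..n-1] is also set in the corresponding word of y.  */
static bool
flags_subset (const uint32_t *x, const uint32_t *y, size_t n)
{
  assert (n == 0 || (x != NULL && y != NULL));
  for (size_t k = 0; k < n; k++)
    /* A bit that is set in x but clear in y leaves a zero here.  */
    if ((uint32_t) (~x[k] | y[k]) != UINT32_MAX)
      return false;
  return true;
}
```

In cpu_flags_match the first argument corresponds to the template's flags
and the second to cpu_arch_flags, so the match succeeds only if all CPU
features required by the template are enabled.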
 gas/config/tc-i386.c                        | 127 +++++++++++++--
 gas/testsuite/gas/i386/x86-64-evex.d        |   2 +-
 gas/testsuite/gas/i386/x86-64-inval-movbe.l |  31 ++--
 gas/testsuite/gas/i386/x86-64-inval-movbe.s |   1 +
 opcodes/i386-dis-evex-len.h                 |  10 ++
 opcodes/i386-dis-evex-mod.h                 |  42 +++++
 opcodes/i386-dis-evex-prefix.h              |  69 +++++++++
 opcodes/i386-dis-evex-reg.h                 |  14 ++
 opcodes/i386-dis-evex-x86-64.h              | 140 +++++++++++++++++
 opcodes/i386-dis-evex.h                     |  94 +++++------
 opcodes/i386-dis.c                          | 163 ++++++++++++++++++--
 opcodes/i386-gen.c                          |   2 +
 opcodes/i386-opc.h                          |   2 +
 opcodes/i386-opc.tbl                        |  69 ++++++++-
 14 files changed, 664 insertions(+), 102 deletions(-)
 create mode 100644 opcodes/i386-dis-evex-x86-64.h
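
Note on build_apx_evex_prefix: after the regular EVEX prefix has been
built, the function only flips a handful of bits for the extended
registers. The sketch below mirrors those adjustments on a plain 4-byte
array so the masks can be checked in isolation. The name apply_egpr_bits
and the vvvv_is_egpr parameter are hypothetical; the REX_* values are
assumed to match gas's usual definitions (B = 1, X = 2, R = 4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* REX bit values, assumed to match the definitions gas uses.  */
#define REX_B 1
#define REX_X 2
#define REX_R 4

/* Hypothetical stand-alone version of the adjustments: bytes[] holds a
   4-byte EVEX prefix that the regular EVEX builder already produced.  */
static void
apply_egpr_bits (uint8_t bytes[4], unsigned int rex2, bool vvvv_is_egpr)
{
  assert (bytes[0] == 0x62);	/* Every EVEX prefix starts with 62h.  */

  if (rex2 & REX_R)
    bytes[1] &= 0xef;		/* Clear bit 4 of the second byte.  */
  if (vvvv_is_egpr)
    bytes[3] &= 0xf7;		/* Clear bit 3 of the fourth byte.  */
  if (rex2 & REX_B)
    bytes[1] |= 0x08;		/* Set bit 3 of the second byte.  */
  if (rex2 & REX_X)
    bytes[2] &= 0xfb;		/* Clear bit 2 of the third byte.  */
}
```

Three of the four adjustments clear a bit and one sets a bit, matching the
masks used in the function below.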

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 3d917c34d15..398909a6a30 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1796,6 +1796,36 @@ cpu_flags_equal (const union i386_cpu_flags *x,
     }
 }
 
+static INLINE int
+cpu_flags_not_or_check (const union i386_cpu_flags *x,
+			const union i386_cpu_flags *y)
+{
+  switch (ARRAY_SIZE(x->array))
+    {
+    case 5:
+      if ((~x->array[4] | y->array[4]) != 0xffffffff)
+	return 0;
+      /* Fall through.  */
+    case 4:
+      if ((~x->array[3] | y->array[3]) != 0xffffffff)
+	return 0;
+      /* Fall through.  */
+    case 3:
+      if ((~x->array[2] | y->array[2]) != 0xffffffff)
+	return 0;
+      /* Fall through.  */
+    case 2:
+      if ((~x->array[1] | y->array[1]) != 0xffffffff)
+	return 0;
+      /* Fall through.  */
+    case 1:
+      return ((~x->array[0] | y->array[0]) == 0xffffffff);
+      break;
+    default:
+      abort ();
+    }
+}
+
 static INLINE int
 cpu_flags_check_cpu64 (const insn_template *t)
 {
@@ -1989,6 +2019,12 @@ cpu_flags_match (const insn_template *t)
 		  && (!x.bitfield.cpugfni || cpu.bitfield.cpugfni))
 		match |= CPU_FLAGS_ARCH_MATCH;
 	    }
+	  else if (x.bitfield.cpuapx_f)
+	    {
+	      /* All cpu flags in x need to be enabled in cpu_arch_flags.  */
+	      if (cpu_flags_not_or_check (&x, &cpu_arch_flags))
+		match |= CPU_FLAGS_ARCH_MATCH;
+	    }
 	  else
 	    match |= CPU_FLAGS_ARCH_MATCH;
 	}
@@ -3712,16 +3748,16 @@ install_template (const insn_template *t)
 
   /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
-  {
-      if ((is_cpu (t, CpuAVX) || is_cpu (t, CpuAVX2))
-	  && is_cpu (t, CpuAVX512F))
+    {
+      if ((is_cpu (t, CpuAVX) || is_cpu (t, CpuAVX2) || is_cpu (t, CpuAMX_TILE))
+	  && (is_cpu (t, CpuAVX512F) || is_cpu (t, CpuAPX_F)))
 	{
 	  if (need_evex_encoding ())
 	    {
 	      i.tm.opcode_modifier.vex = 0;
 	      i.tm.cpu.bitfield.cpuavx = 0;
 	      if (is_cpu (&i.tm, CpuAVX2))
-	        i.tm.cpu.bitfield.isa = 0;
+		i.tm.cpu.bitfield.isa = 0;
 	    }
 	  else
 	    {
@@ -3729,7 +3765,7 @@ install_template (const insn_template *t)
 	      i.tm.cpu.bitfield.cpuavx512f = 0;
 	    }
 	}
-  }
+    }
 
   /* Note that for pseudo prefixes this produces a length of 1. But for them
      the length isn't interesting at all.  */
@@ -3919,6 +3955,14 @@ is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || is_evex_encoding (t);
 }
 
+static INLINE bool
+is_any_apx_evex_encoding (void)
+{
+  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
+    || (i.vex.register_specifier
+	&& i.vex.register_specifier->reg_flags & RegRex2);
+}
+
 static INLINE bool
 is_any_apx_rex2_encoding (void)
 {
@@ -4195,6 +4239,27 @@ build_rex2_prefix (void)
 		    | (i.rex2 << 4) | i.rex);
 }
 
+/* Build the EVEX prefix (4-byte) for evex insn
+   | 62h |
+   | `R`X`B`R' | B'mmm |
+   | W | v`v`v`v | `x' | pp |
+   | z| L'L | b | `v | aaa |
+*/
+static void
+build_apx_evex_prefix (void)
+{
+  build_evex_prefix ();
+  if (i.rex2 & REX_R)
+    i.vex.bytes[1] &= 0xef;
+  if (i.vex.register_specifier
+      && register_number (i.vex.register_specifier) > 0xf)
+    i.vex.bytes[3] &= 0xf7;
+  if (i.rex2 & REX_B)
+    i.vex.bytes[1] |= 0x08;
+  if (i.rex2 & REX_X)
+    i.vex.bytes[2] &= 0xfb;
+}
+
 static void
 process_immext (void)
 {
@@ -5642,19 +5707,42 @@ md_assemble (char *line)
 	}
 
       /* Check for explicit REX2 prefix.  */
-      if (i.rex2 || i.rex2_encoding)
+      if (i.rex2_encoding)
 	{
 	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
 	  return;
 	}
 
-      if (i.tm.opcode_modifier.vex)
+      if (is_any_apx_evex_encoding ())
+	{
+	  if (i.tm.opcode_space == SPACE_EVEXMAP4 && (i.prefix[DATA_PREFIX] != 0))
+	    {
+	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
+	      i.prefix[DATA_PREFIX] = 0;
+	    }
+
+	  build_apx_evex_prefix ();
+
+	  /* Encode the NDD bit of the instruction promoted from the legacy
+	     space.  */
+	  if (i.vex.register_specifier && i.tm.opcode_space == SPACE_EVEXMAP4)
+	    i.vex.bytes[3] |= 0x10;
+
+	  /* Encode the NF bit of the instruction promoted from legacy and vex
+	     space.  */
+	  if (i.has_nf)
+	    i.vex.bytes[3] |= 0x04;
+	}
+      else if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
 	build_evex_prefix ();
 
       /* The individual REX.RXBW bits got consumed.  */
       i.rex &= REX_OPCODE;
+
+      /* The rex2 bits got consumed.  */
+      i.rex2 = 0;
     }
 
   /* Handle conversion of 'int $3' --> special int3 insn.  */
@@ -5681,16 +5769,17 @@ md_assemble (char *line)
      instruction already has a prefix, we need to convert old
      registers to new ones.  */
 
-  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
-       && (i.op[0].regs->reg_flags & RegRex64) != 0)
-      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
-	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
-      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
-	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
-	  && (i.rex != 0 || i.rex2 != 0)))
+  if (!is_any_vex_encoding (&i.tm)
+      && ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
+	   && (i.op[0].regs->reg_flags & RegRex64) != 0)
+	  || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
+	      && (i.op[1].regs->reg_flags & RegRex64) != 0)
+	  || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
+	       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
+	      && (i.rex != 0 || i.rex2 != 0))))
     {
       int x;
-      if (!i.rex2)
+      if (!is_any_apx_rex2_encoding ())
 	i.rex |= REX_OPCODE;
       for (x = 0; x < 2; x++)
 	{
@@ -7061,7 +7150,7 @@ VEX_check_encoding (const insn_template *t)
 static int
 check_EgprOperands (const insn_template *t)
 {
-  if (t->opcode_modifier.noegpr)
+  if (t->opcode_modifier.noegpr && !need_evex_encoding ())
     {
       for (unsigned int op = 0; op < i.operands; op++)
 	{
@@ -8049,7 +8138,8 @@ process_suffix (void)
       if (i.suffix != QWORD_MNEM_SUFFIX
 	  && i.tm.opcode_modifier.mnemonicsize != IGNORESIZE
 	  && !i.tm.opcode_modifier.floatmf
-	  && !is_any_vex_encoding (&i.tm)
+	  && (!is_any_vex_encoding (&i.tm)
+	      || i.tm.opcode_space == SPACE_EVEXMAP4)
 	  && ((i.suffix == LONG_MNEM_SUFFIX) == (flag_code == CODE_16BIT)
 	      || (flag_code == CODE_64BIT
 		  && i.tm.opcode_modifier.jump == JUMP_BYTE)))
@@ -14260,6 +14350,9 @@ static bool check_register (const reg_entry *r)
 
   if (r->reg_flags & RegRex2)
     {
+      if (is_evex_encoding (current_templates->start))
+	i.vec_encoding = vex_encoding_evex;
+
       if (!cpu_arch_flags.bitfield.cpuapx_f
 	  || flag_code != CODE_64BIT)
 	return false;
diff --git a/gas/testsuite/gas/i386/x86-64-evex.d b/gas/testsuite/gas/i386/x86-64-evex.d
index 041747db892..5d974c312da 100644
--- a/gas/testsuite/gas/i386/x86-64-evex.d
+++ b/gas/testsuite/gas/i386/x86-64-evex.d
@@ -17,6 +17,6 @@ Disassembly of section .text:
  +[a-f0-9]+:	62 f1 d6 38 7b f0    	vcvtusi2ss %rax,\{rd-sae\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 57 38 7b f0    	vcvtusi2sd %eax,\{rd-bad\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 d7 38 7b f0    	vcvtusi2sd %rax,\{rd-sae\},%xmm5,%xmm6
- +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,\(bad\)
+ +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,%r16d
  +[a-f0-9]+:	62 e1 7c 08 c2 c0 00 	vcmpeqps %xmm0,%xmm0,\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-inval-movbe.l b/gas/testsuite/gas/i386/x86-64-inval-movbe.l
index 1c8ceb55c11..44ddfe4f034 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-movbe.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-movbe.l
@@ -1,29 +1,30 @@
 .*: Assembler messages:
-.*:4: Error: .*
 .*:5: Error: .*
 .*:6: Error: .*
 .*:7: Error: .*
 .*:8: Error: .*
-.*:11: Error: .*
+.*:9: Error: .*
 .*:12: Error: .*
 .*:13: Error: .*
 .*:14: Error: .*
 .*:15: Error: .*
+.*:16: Error: .*
 GAS LISTING .*
 
 
 [ 	]*1[ 	]+\# Check illegal movbe in 64bit mode\.
 [ 	]*2[ 	]+\.text
-[ 	]*3[ 	]+foo:
-[ 	]*4[ 	]+movbe	\(%rcx\),%bl
-[ 	]*5[ 	]+movbe	%ecx,%ebx
-[ 	]*6[ 	]+movbe	%bx,%rcx
-[ 	]*7[ 	]+movbe	%rbx,%rcx
-[ 	]*8[ 	]+movbe	%bl,\(%rcx\)
-[ 	]*9[ 	]+
-[ 	]*10[ 	]+\.intel_syntax noprefix
-[ 	]*11[ 	]+movbe bl, byte ptr \[rcx\]
-[ 	]*12[ 	]+movbe ebx, ecx
-[ 	]*13[ 	]+movbe rcx, bx
-[ 	]*14[ 	]+movbe rcx, rbx
-[ 	]*15[ 	]+movbe byte ptr \[rcx\], bl
+[ 	]*3[ 	]+\.arch \.noapx_f
+[ 	]*4[ 	]+foo:
+[ 	]*5[ 	]+movbe	\(%rcx\),%bl
+[ 	]*6[ 	]+movbe	%ecx,%ebx
+[ 	]*7[ 	]+movbe	%bx,%rcx
+[ 	]*8[ 	]+movbe	%rbx,%rcx
+[ 	]*9[ 	]+movbe	%bl,\(%rcx\)
+[ 	]*10[ 	]+
+[ 	]*11[ 	]+\.intel_syntax noprefix
+[ 	]*12[ 	]+movbe bl, byte ptr \[rcx\]
+[ 	]*13[ 	]+movbe ebx, ecx
+[ 	]*14[ 	]+movbe rcx, bx
+[ 	]*15[ 	]+movbe rcx, rbx
+[ 	]*16[ 	]+movbe byte ptr \[rcx\], bl
diff --git a/gas/testsuite/gas/i386/x86-64-inval-movbe.s b/gas/testsuite/gas/i386/x86-64-inval-movbe.s
index 38f09b14d64..380a9191b6a 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-movbe.s
+++ b/gas/testsuite/gas/i386/x86-64-inval-movbe.s
@@ -1,5 +1,6 @@
 # Check illegal movbe in 64bit mode.
 	.text
+	.arch .noapx_f
 foo:
 	movbe	(%rcx),%bl
 	movbe	%ecx,%ebx
diff --git a/opcodes/i386-dis-evex-len.h b/opcodes/i386-dis-evex-len.h
index a02609c50f2..1933a045822 100644
--- a/opcodes/i386-dis-evex-len.h
+++ b/opcodes/i386-dis-evex-len.h
@@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
     { REG_TABLE (REG_EVEX_0F38C7_L_2) },
   },
 
+  /* EVEX_LEN_0F38F2 */
+  {
+    { "andnS",		{ Gdq, VexGdq, Edq }, 0 },
+  },
+
+  /* EVEX_LEN_0F38F3 */
+  {
+    { REG_TABLE (REG_EVEX_0F38F3_L_0) },
+  },
+
   /* EVEX_LEN_0F3A00 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-mod.h b/opcodes/i386-dis-evex-mod.h
index f9f912c5094..a60c19add3c 100644
--- a/opcodes/i386-dis-evex-mod.h
+++ b/opcodes/i386-dis-evex-mod.h
@@ -1 +1,43 @@
 /* Nothing at present.  */
+  /* MOD_EVEX_MAP4_DA_PREFIX_1 */
+  {
+    { Bad_Opcode },
+    { "encodekey128", { Gd, Ed }, 0 },
+  },
+  /* MOD_EVEX_MAP4_DB_PREFIX_1 */
+  {
+    { Bad_Opcode },
+    { "encodekey256", { Gd, Ed }, 0 },
+  },
+  /* MOD_EVEX_MAP4_DC_PREFIX_1 */
+  {
+    { "aesenc128kl",    { XM, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_DD_PREFIX_1 */
+  {
+    { "aesdec128kl",    { XM, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_DE_PREFIX_1 */
+  {
+    { "aesenc256kl",    { XM, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_DF_PREFIX_1 */
+  {
+    { "aesdec256kl",    { XM, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_F8_PREFIX_1 */
+  {
+    { "enqcmds",	{ Gva, M },  0 },
+  },
+  /* MOD_EVEX_MAP4_F8_PREFIX_2 */
+  {
+    { "movdir64b",	{ Gva, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_F8_PREFIX_3 */
+  {
+    { "enqcmd",		{ Gva, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_F9 */
+  {
+    { "movdiri",	{ Edq, Gdq }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index 28da54922c7..e8f32324ade 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -338,6 +338,75 @@
     { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
     { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
   },
+  /* PREFIX_EVEX_MAP4_66 */
+  {
+    { "wrssK",	{ M, Gdq }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_D8 */
+  {
+    { "sha1nexte", { XM, EXxmm }, 0 },
+    { REG_TABLE (REG_EVEX_MAP4_D8_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DA */
+  {
+    { "sha1msg2", { XM, EXxmm }, 0 },
+    { MOD_TABLE (MOD_EVEX_MAP4_DA_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DB */
+  {
+    { "sha256rnds2", { XM, EXxmm, XMM0 }, 0 },
+    { MOD_TABLE (MOD_EVEX_MAP4_DB_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DC */
+  {
+    { "sha256msg1", { XM, EXxmm }, 0 },
+    { MOD_TABLE (MOD_EVEX_MAP4_DC_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DD */
+  {
+    { "sha256msg2", { XM, EXxmm }, 0 },
+    { MOD_TABLE (MOD_EVEX_MAP4_DD_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DE */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_EVEX_MAP4_DE_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DF */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_EVEX_MAP4_DF_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_F0 */
+  {
+    { "crc32A",	{ Gdq, Eb }, 0 },
+    { "invept",	{ Gm, Mo }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F1 */
+  {
+    { "crc32Q",	{ Gdq, Ev }, 0 },
+    { "invvpid", { Gm, Mo }, 0 },
+    { "crc32Q",	{ Gdq, Ev }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F2 */
+  {
+    { Bad_Opcode },
+    { "invpcid", { Gm, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F8 */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_1) },
+    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_2) },
+    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_3) },
+  },
+  /* PREFIX_EVEX_MAP4_FC */
+  {
+    { "aadd",	{ Mdq, Gdq }, 0 },
+    { "axor",	{ Mdq, Gdq }, 0 },
+    { "aand",	{ Mdq, Gdq }, 0 },
+    { "aor",	{ Mdq, Gdq }, 0 },
+  },
   /* PREFIX_EVEX_MAP5_10 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index 2885063628b..c3b4f083346 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -49,3 +49,17 @@
     { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
     { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
   },
+  /* REG_EVEX_0F38F3_L_0 */
+  {
+    { Bad_Opcode },
+    { "blsrS",		{ VexGdq, Edq }, 0 },
+    { "blsmskS",	{ VexGdq, Edq }, 0 },
+    { "blsiS",		{ VexGdq, Edq }, 0 },
+  },
+  /* REG_EVEX_MAP4_D8_PREFIX_1 */
+  {
+    { "aesencwide128kl",	{ M }, 0 },
+    { "aesdecwide128kl",	{ M }, 0 },
+    { "aesencwide256kl",	{ M }, 0 },
+    { "aesdecwide256kl",	{ M }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex-x86-64.h b/opcodes/i386-dis-evex-x86-64.h
new file mode 100644
index 00000000000..1121223d877
--- /dev/null
+++ b/opcodes/i386-dis-evex-x86-64.h
@@ -0,0 +1,140 @@
+  /* X86_64_EVEX_0F90 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F90) },
+  },
+  /* X86_64_EVEX_0F91 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F91) },
+  },
+  /* X86_64_EVEX_0F92 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F92) },
+  },
+  /* X86_64_EVEX_0F93 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F93) },
+  },
+  /* X86_64_EVEX_0F3849 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },
+  },
+  /* X86_64_EVEX_0F384B */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },
+  },
+  /* X86_64_EVEX_0F38E0 */
+  {
+    { Bad_Opcode },
+    { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E1 */
+  {
+    { Bad_Opcode },
+    { "cmpnoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E2 */
+  {
+    { Bad_Opcode },
+    { "cmpbxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E3 */
+  {
+    { Bad_Opcode },
+    { "cmpnbxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E4 */
+  {
+    { Bad_Opcode },
+    { "cmpzxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E5 */
+  {
+    { Bad_Opcode },
+    { "cmpnzxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E6 */
+  {
+    { Bad_Opcode },
+    { "cmpbexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E7 */
+  {
+    { Bad_Opcode },
+    { "cmpnbexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E8 */
+  {
+    { Bad_Opcode },
+    { "cmpsxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E9 */
+  {
+    { Bad_Opcode },
+    { "cmpnsxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38EA */
+  {
+    { Bad_Opcode },
+    { "cmppxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38EB */
+  {
+    { Bad_Opcode },
+    { "cmpnpxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38EC */
+  {
+    { Bad_Opcode },
+    { "cmplxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38ED */
+  {
+    { Bad_Opcode },
+    { "cmpnlxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38EE */
+  {
+    { Bad_Opcode },
+    { "cmplexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38EF */
+  {
+    { Bad_Opcode },
+    { "cmpnlexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38F2 */
+  {
+    { Bad_Opcode },
+    { EVEX_LEN_TABLE (EVEX_LEN_0F38F2) },
+  },
+  /* X86_64_EVEX_0F38F3 */
+  {
+    { Bad_Opcode },
+    { EVEX_LEN_TABLE (EVEX_LEN_0F38F3) },
+  },
+  /* X86_64_EVEX_0F38F5 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F5) },
+  },
+  /* X86_64_EVEX_0F38F6 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F6) },
+  },
+  /* X86_64_EVEX_0F38F7 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F7) },
+  },
+  /* X86_64_EVEX_0F3AF0 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3AF0) },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 7ad1edbe72d..65a2cbeaeb2 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -164,10 +164,10 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 90 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F90) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F91) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F92) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F93) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -375,9 +375,9 @@ static const struct dis386 evex_table[][256] = {
     { "vpsllv%DQ",	{ XM, Vex, EXx }, PREFIX_DATA },
     /* 48 */
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F384B) },
     { "vrcp14p%XW",	{ XM, EXx }, PREFIX_DATA },
     { "vrcp14s%XW",	{ XMScalar, VexScalar, EXdq }, PREFIX_DATA },
     { "vrsqrt14p%XW",	{ XM, EXx }, 0 },
@@ -545,32 +545,32 @@ static const struct dis386 evex_table[][256] = {
     { "%XEvaesdecY",	{ XM, Vex, EXx }, PREFIX_DATA },
     { "%XEvaesdeclastY", { XM, Vex, EXx }, PREFIX_DATA },
     /* E0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E0) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E1) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E3) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E4) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E7) },
     /* E8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E8) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E9) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38EA) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38EB) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38EC) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38ED) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38EE) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38EF) },
     /* F0 */
     { Bad_Opcode },
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F3) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F7) },
     /* F8 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -854,7 +854,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F3AF0) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -983,13 +983,13 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 60 */
+    { "movbeS",	{ Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "movbeS",	{ Ev, Gv }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "wrussK",	{ M, Gdq }, PREFIX_DATA },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_66) },
     { Bad_Opcode },
     /* 68 */
     { Bad_Opcode },
@@ -1113,19 +1113,19 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { "sha1rnds4", { XM, EXxmm, Ib }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* D8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_D8) },
+    { "sha1msg1", { XM, EXxmm }, 0 },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DA) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DB) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DC) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DD) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DE) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DF) },
     /* E0 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1145,20 +1145,20 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F0) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F1) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F2) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* F8 */
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F8) },
+    { MOD_TABLE (MOD_EVEX_MAP4_F9) },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_FC) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 0754b4c22dd..ef431087ba5 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -132,6 +132,13 @@ enum x86_64_isa
   intel64
 };
 
+enum evex_type
+{
+  evex_default = 0,
+  evex_from_legacy,
+  evex_from_vex,
+};
+
 struct instr_info
 {
   enum address_mode address_mode;
@@ -211,7 +218,6 @@ struct instr_info
     int ll;
     bool w;
     bool evex;
-    bool r;
     bool v;
     bool zeroing;
     bool b;
@@ -219,6 +225,8 @@ struct instr_info
   }
   vex;
 
+  enum evex_type evex_type;
+
   /* Remember if the current op is a jump instruction.  */
   bool op_is_jump;
 
@@ -301,6 +309,7 @@ struct dis_private {
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
 #define PREFIX_REX2 0x1000
+#define PREFIX_NP_OR_DATA 0x2000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -794,6 +803,7 @@ enum
   USE_RM_TABLE,
   USE_PREFIX_TABLE,
   USE_X86_64_TABLE,
+  USE_X86_64_EVEX_FROM_VEX_TABLE,
   USE_3BYTE_TABLE,
   USE_XOP_8F_TABLE,
   USE_VEX_C4_TABLE,
@@ -812,6 +822,8 @@ enum
 #define RM_TABLE(I)		DIS386 (USE_RM_TABLE, (I))
 #define PREFIX_TABLE(I)		DIS386 (USE_PREFIX_TABLE, (I))
 #define X86_64_TABLE(I)		DIS386 (USE_X86_64_TABLE, (I))
+#define X86_64_EVEX_FROM_VEX_TABLE(I) \
+  DIS386 (USE_X86_64_EVEX_FROM_VEX_TABLE, (I))
 #define THREE_BYTE_TABLE(I)	DIS386 (USE_3BYTE_TABLE, (I))
 #define XOP_8F_TABLE()		DIS386 (USE_XOP_8F_TABLE, 0)
 #define VEX_C4_TABLE()		DIS386 (USE_VEX_C4_TABLE, 0)
@@ -871,7 +883,9 @@ enum
   REG_EVEX_0F72,
   REG_EVEX_0F73,
   REG_EVEX_0F38C6_L_2,
-  REG_EVEX_0F38C7_L_2
+  REG_EVEX_0F38C7_L_2,
+  REG_EVEX_0F38F3_L_0,
+  REG_EVEX_MAP4_D8_PREFIX_1
 };
 
 enum
@@ -911,6 +925,17 @@ enum
   MOD_0F38DC_PREFIX_1,
 
   MOD_VEX_0F3849_X86_64_L_0_W_0,
+
+  MOD_EVEX_MAP4_DA_PREFIX_1,
+  MOD_EVEX_MAP4_DB_PREFIX_1,
+  MOD_EVEX_MAP4_DC_PREFIX_1,
+  MOD_EVEX_MAP4_DD_PREFIX_1,
+  MOD_EVEX_MAP4_DE_PREFIX_1,
+  MOD_EVEX_MAP4_DF_PREFIX_1,
+  MOD_EVEX_MAP4_F8_PREFIX_1,
+  MOD_EVEX_MAP4_F8_PREFIX_2,
+  MOD_EVEX_MAP4_F8_PREFIX_3,
+  MOD_EVEX_MAP4_F9,
 };
 
 enum
@@ -1146,6 +1171,20 @@ enum
   PREFIX_EVEX_0F3A67,
   PREFIX_EVEX_0F3AC2,
 
+  PREFIX_EVEX_MAP4_66,
+  PREFIX_EVEX_MAP4_D8,
+  PREFIX_EVEX_MAP4_DA,
+  PREFIX_EVEX_MAP4_DB,
+  PREFIX_EVEX_MAP4_DC,
+  PREFIX_EVEX_MAP4_DD,
+  PREFIX_EVEX_MAP4_DE,
+  PREFIX_EVEX_MAP4_DF,
+  PREFIX_EVEX_MAP4_F0,
+  PREFIX_EVEX_MAP4_F1,
+  PREFIX_EVEX_MAP4_F2,
+  PREFIX_EVEX_MAP4_F8,
+  PREFIX_EVEX_MAP4_FC,
+
   PREFIX_EVEX_MAP5_10,
   PREFIX_EVEX_MAP5_11,
   PREFIX_EVEX_MAP5_1D,
@@ -1256,6 +1295,35 @@ enum
   X86_64_VEX_0F38ED,
   X86_64_VEX_0F38EE,
   X86_64_VEX_0F38EF,
+
+  X86_64_EVEX_0F90,
+  X86_64_EVEX_0F91,
+  X86_64_EVEX_0F92,
+  X86_64_EVEX_0F93,
+  X86_64_EVEX_0F3849,
+  X86_64_EVEX_0F384B,
+  X86_64_EVEX_0F38E0,
+  X86_64_EVEX_0F38E1,
+  X86_64_EVEX_0F38E2,
+  X86_64_EVEX_0F38E3,
+  X86_64_EVEX_0F38E4,
+  X86_64_EVEX_0F38E5,
+  X86_64_EVEX_0F38E6,
+  X86_64_EVEX_0F38E7,
+  X86_64_EVEX_0F38E8,
+  X86_64_EVEX_0F38E9,
+  X86_64_EVEX_0F38EA,
+  X86_64_EVEX_0F38EB,
+  X86_64_EVEX_0F38EC,
+  X86_64_EVEX_0F38ED,
+  X86_64_EVEX_0F38EE,
+  X86_64_EVEX_0F38EF,
+  X86_64_EVEX_0F38F2,
+  X86_64_EVEX_0F38F3,
+  X86_64_EVEX_0F38F5,
+  X86_64_EVEX_0F38F6,
+  X86_64_EVEX_0F38F7,
+  X86_64_EVEX_0F3AF0,
 };
 
 enum
@@ -1286,6 +1354,7 @@ enum
   EVEX_MAP4,
   EVEX_MAP5,
   EVEX_MAP6,
+  EVEX_MAP7,
 };
 
 enum
@@ -1438,6 +1507,8 @@ enum
   EVEX_LEN_0F385B,
   EVEX_LEN_0F38C6,
   EVEX_LEN_0F38C7,
+  EVEX_LEN_0F38F2,
+  EVEX_LEN_0F38F3,
   EVEX_LEN_0F3A00,
   EVEX_LEN_0F3A01,
   EVEX_LEN_0F3A18,
@@ -4478,6 +4549,8 @@ static const struct dis386 x86_64_table[][2] = {
     { Bad_Opcode },
     { "cmpnlexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
   },
+
+#include "i386-dis-evex-x86-64.h"
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -8668,6 +8741,9 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       dp = &prefix_table[dp->op[1].bytemode][vindex];
       break;
 
+    case USE_X86_64_EVEX_FROM_VEX_TABLE:
+      ins->evex_type = evex_from_vex;
+      /* Fall through.  */
     case USE_X86_64_TABLE:
       vindex = ins->address_mode == mode_64bit ? 1 : 0;
       dp = &x86_64_table[dp->op[1].bytemode][vindex];
@@ -8905,9 +8981,13 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       if (!fetch_code (ins->info, ins->codep + 4))
 	return &err_opcode;
       /* The first byte after 0x62.  */
+      if (*ins->codep & 0x8)
+	ins->rex2 |= REX_B;
+      if (!(*ins->codep & 0x10))
+	ins->rex2 |= REX_R;
+
       ins->rex = ~(*ins->codep >> 5) & 0x7;
-      ins->vex.r = *ins->codep & 0x10;
-      switch ((*ins->codep & 0xf))
+      switch ((*ins->codep & 0x7))
 	{
 	default:
 	  return &bad_opcode;
@@ -8920,12 +9000,19 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	case 0x3:
 	  vex_table_index = EVEX_0F3A;
 	  break;
+	case 0x4:
+	  vex_table_index = EVEX_MAP4;
+	  ins->evex_type = evex_from_legacy;
+	  break;
 	case 0x5:
 	  vex_table_index = EVEX_MAP5;
 	  break;
 	case 0x6:
 	  vex_table_index = EVEX_MAP6;
 	  break;
+	case 0x7:
+	  vex_table_index = EVEX_MAP7;
+	  break;
 	}
 
       /* The second byte after 0x62.  */
@@ -8936,9 +9023,8 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       ins->vex.register_specifier = (~(*ins->codep >> 3)) & 0xf;
 
-      /* The U bit.  */
       if (!(*ins->codep & 0x4))
-	return &bad_opcode;
+	ins->rex2 |= REX_X;
 
       switch ((*ins->codep & 0x3))
 	{
@@ -8968,12 +9054,31 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       if (ins->address_mode != mode_64bit)
 	{
+	  if (ins->evex_type != evex_default
+	      || (ins->rex2 & (REX_B | REX_X)))
+	    return &bad_opcode;
 	  /* In 16/32-bit mode silently ignore following bits.  */
 	  ins->rex &= ~REX_B;
-	  ins->vex.r = true;
+	  ins->rex2 &= ~REX_R;
 	}
 
       ins->need_vex = 4;
+
+      /* EVEX-promoted legacy instructions require EVEX.L'L, EVEX.z and the
+	 lower 2 bits of EVEX.aaa to be 0.
+	 EVEX-promoted VEX instructions require EVEX.L'L and the lower 2 bits
+	 of EVEX.aaa to be 0.  */
+      if (ins->evex_type == evex_from_legacy || ins->evex_type == evex_from_vex)
+	{
+	  if ((*ins->codep & 0x3) != 0
+	      || (*ins->codep >> 6 & 0x3) != 0
+	      || (ins->evex_type == evex_from_legacy
+		  && (*ins->codep >> 5 & 0x1) != 0)
+	      || (ins->evex_type == evex_from_vex
+		  && !ins->vex.b))
+	    return &bad_opcode;
+	}
+
       ins->codep++;
       vindex = *ins->codep++;
       dp = &evex_table[vex_table_index][vindex];
@@ -9386,6 +9491,13 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       dp = get_valid_dis386 (dp, &ins);
       if (dp == &err_opcode)
 	goto fetch_error_out;
+
+      /* For APX instructions promoted from legacy maps 0/1, prefix
+	 0x66 is interpreted as the operand size override.  */
+      if (ins.evex_type == evex_from_legacy
+	  && ins.vex.prefix == DATA_PREFIX_OPCODE)
+	sizeflag ^= DFLAG;
+
       if (dp != NULL && putop (&ins, dp->name, sizeflag) == 0)
 	{
 	  if (!get_sib (&ins, sizeflag))
@@ -9566,6 +9678,19 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       if (ins.last_repnz_prefix >= 0)
 	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
       break;
+
+    case PREFIX_NP_OR_DATA:
+      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)
+	{
+	  i386_dis_printf (info, dis_style_text, "(bad)");
+	  ret = ins.end_codep - priv.the_buffer;
+	  goto out;
+	}
+      break;
+
+    default:
+      break;
+
     }
 
   /* Check if the REX prefix is used.  */
@@ -10274,7 +10399,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
 		{
 		case 'X':
 		  if (!ins->vex.evex || ins->vex.b || ins->vex.ll >= 2
-		      || !ins->vex.r
+		      || (ins->rex2 & REX_R)
 		      || (ins->modrm.mod == 3 && (ins->rex & REX_X))
 		      || !ins->vex.v || ins->vex.mask_register_specifier)
 		    break;
@@ -11168,7 +11293,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     case b_swap_mode:
       if (reg & 4)
 	USED_REX (0);
-      if (ins->rex)
+      if (ins->rex || ins->rex2)
 	names = att_names8rex;
       else
 	names = att_names8;
@@ -11385,7 +11510,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
   add += (ins->rex2 & REX_B) ? 16 : 0;
 
-  if (ins->vex.evex)
+  if (ins->vex.evex && ins->evex_type == evex_default)
     {
 
       /* Zeroing-masking is invalid for memory destinations. Set the flag
@@ -11533,6 +11658,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		abort ();
 	      if (ins->vex.evex)
 		{
+		  /* S/G EVEX insns require REX_X not to be set.  */
+		  if (ins->rex2 & REX_X)
+		    {
+		      oappend (ins, "(bad)");
+		      return true;
+		    }
+
 		  if (!ins->vex.v)
 		    vindex += 16;
 		  check_gather = ins->obufp == ins->op_out[1];
@@ -11732,7 +11864,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
 	      if (ins->rex & REX_R)
 	        modrm_reg += 8;
-	      if (!ins->vex.r)
+	      if (ins->rex2 & REX_R)
 	        modrm_reg += 16;
 	      if (vindex == modrm_reg)
 		oappend (ins, "/(bad)");
@@ -11934,10 +12066,7 @@ OP_indirE (instr_info *ins, int bytemode, int sizeflag)
 static bool
 OP_G (instr_info *ins, int bytemode, int sizeflag)
 {
-  if (ins->vex.evex && !ins->vex.r && ins->address_mode == mode_64bit)
-    oappend (ins, "(bad)");
-  else
-    print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
+  print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
   return true;
 }
 
@@ -12567,7 +12696,7 @@ OP_XMM (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
     reg += 8;
   if (ins->vex.evex)
     {
-      if (!ins->vex.r)
+      if (ins->rex2 & REX_R)
 	reg += 16;
     }
 
@@ -13574,7 +13703,7 @@ DistinctDest_Fixup (instr_info *ins, int bytemode, int sizeflag)
   /* Calc destination register number.  */
   if (ins->rex & REX_R)
     modrm_reg += 8;
-  if (!ins->vex.r)
+  if (ins->rex2 & REX_R)
     modrm_reg += 16;
 
   /* Calc src1 register number.  */
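
The first EVEX payload byte handling changed in get_valid_dis386 above can be
illustrated in isolation. The following is only a minimal sketch that mirrors
the bit tests in this patch; the sk_* names and the struct are hypothetical
and are not binutils identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch (same bit layout as a REX
   prefix: B = 1, X = 2, R = 4).  */
enum { SK_REX_B = 1, SK_REX_X = 2, SK_REX_R = 4 };

/* Fields taken from the first byte after 0x62.  */
struct sk_evex_p0
{
  unsigned int map;   /* Opcode map number, bits 2:0.  */
  unsigned int rex;   /* R/X/B, stored inverted in bits 7:5.  */
  unsigned int rex2;  /* Extended B (bit 3) and R (bit 4, inverted).  */
};

static struct sk_evex_p0
sk_decode_p0 (uint8_t p0)
{
  struct sk_evex_p0 f = { 0, 0, 0 };

  /* Bit 3 is stored as-is; a set bit selects the upper register bank.  */
  if (p0 & 0x8)
    f.rex2 |= SK_REX_B;
  /* Bit 4 is stored inverted; a clear bit selects the upper bank.  */
  if (!(p0 & 0x10))
    f.rex2 |= SK_REX_R;
  /* Bits 7:5 hold the inverted R, X and B bits.  */
  f.rex = ~(p0 >> 5) & 0x7;
  /* Only the low three bits select the opcode map after this patch.  */
  f.map = p0 & 0x7;
  return f;
}

/* Small self-check with two hand-computed payload bytes.  */
void
sk_self_check (void)
{
  struct sk_evex_p0 f = sk_decode_p0 (0xf4);
  assert (f.map == 4 && f.rex == 0 && f.rex2 == 0);

  f = sk_decode_p0 (0x0c);
  assert (f.map == 4 && f.rex == 7 && f.rex2 == (SK_REX_B | SK_REX_R));
}
```

In this sketch a payload byte of 0xf4 selects map 4 with no register
extension bits set, while 0x0c selects map 4 with all of R, X, B and the two
extended bits set.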
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 589f9682699..3ab2362a3cc 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -1050,6 +1050,7 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
     SPACE(0F),
     SPACE(0F38),
     SPACE(0F3A),
+    SPACE(EVEXMAP4),
     SPACE(EVEXMAP5),
     SPACE(EVEXMAP6),
     SPACE(XOP08),
@@ -1153,6 +1154,7 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 	 is_evex_encoding.  */
       if (modifiers[Vex].value
 	  || ((space > SPACE_0F || has_special_handle)
+	      && space != SPACE_EVEXMAP4
 	      && !modifiers[EVex].value
 	      && !modifiers[Disp8MemShift].value
 	      && !modifiers[Broadcast].value
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index c8082971f81..d7d28bf3d93 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -972,6 +972,7 @@ typedef struct insn_template
      1: 0F opcode prefix / space.
      2: 0F38 opcode prefix / space.
      3: 0F3A opcode prefix / space.
+     4: EVEXMAP4 opcode prefix / space.
      5: EVEXMAP5 opcode prefix / space.
      6: EVEXMAP6 opcode prefix / space.
      8: XOP 08 opcode space.
@@ -982,6 +983,7 @@ typedef struct insn_template
 #define SPACE_0F	1
 #define SPACE_0F38	2
 #define SPACE_0F3A	3
+#define SPACE_EVEXMAP4	4
 #define SPACE_EVEXMAP5	5
 #define SPACE_EVEXMAP6	6
 #define SPACE_XOP08	8
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 17be21fdf0e..bb42270483b 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -109,6 +109,7 @@
 #define SpaceXOP09 OpcodeSpace=SPACE_XOP09
 #define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
 
+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4
 #define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
 #define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
 
@@ -136,6 +137,8 @@
 #define Vsz256 Vsz=VSZ256
 #define Vsz512 Vsz=VSZ512
 
+#define APX_F APX_F|x64
+
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
@@ -189,6 +192,7 @@ mov, 0xf24, i386|No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Te
 
 // Move after swapping the bytes
 movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movbe, 0x60, Movbe|APX_F, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Move with sign extend.
 movsb, 0xfbe, i386, Modrm|No_bSuf|No_sSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
@@ -336,6 +340,7 @@ adc, 0x14, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+
 not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 aaa, 0x37, No64, NoSuf, {}
@@ -1314,13 +1319,16 @@ getsec, 0xf37, SMX, NoSuf, {}
 
 invept, 0x660f3880, EPT|No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invept, 0x660f3880, EPT|x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invept, 0xf3f0, APX_F|EPT, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 invvpid, 0x660f3881, EPT|No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invvpid, 0x660f3881, EPT|x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invvpid, 0xf3f1, APX_F|EPT, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // INVPCID instruction
 
 invpcid, 0x660f3882, INVPCID|No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invpcid, 0x660f3882, INVPCID|x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invpcid, 0xf3f2, APX_F|INVPCID, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // SSSE3 instructions.
 
@@ -1420,7 +1428,9 @@ pcmpestrm, 0x660f3a60, SSE4_2|x64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { I
 pcmpistri<sse42>, 0x660f3a63, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pcmpistrm<sse42>, 0x660f3a62, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 crc32, 0xf20f38f0, SSE4_2, W|Modrm|No_sSuf|No_qSuf, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
+crc32, 0xf0, APX_F, W|Modrm|No_sSuf|No_qSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
 crc32, 0xf20f38f0, SSE4_2|x64, W|Modrm|No_wSuf|No_lSuf|No_sSuf, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
+crc32, 0xf0, APX_F, W|Modrm|No_wSuf|No_lSuf|No_sSuf|EVex128|EVexMap4, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
 
 // xsave/xrstor New Instructions.
 
@@ -1832,13 +1842,21 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
 // BMI2 instructions.
 
 bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+bzhi, 0xf5, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+mulx, 0xf2f6, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pdep, 0xf2f5, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pext, 0xf3f5, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
+rorx, 0xf2f0, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
 sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+sarx, 0xf3f7, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shlx, 0x66f7, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shrx, 0xf2f7, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // FMA4 instructions
 
@@ -1909,10 +1927,15 @@ lwpins, 0x12/0, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV|Vex, { Imm32|Imm32S, Reg32|U
 // BMI instructions
 
 andn, 0xf2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+andn, 0xf2, BMI|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+bextr, 0xf7, BMI|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 blsi, 0xf3/3, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsi, 0xf3/3, BMI|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 blsmsk, 0xf3/2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsmsk, 0xf3/2, BMI|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 blsr, 0xf3/1, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsr, 0xf3/1, BMI|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 tzcnt, 0xf30fbc, BMI, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // TBM instructions
@@ -2041,13 +2064,20 @@ bndldx, 0x0f1a, MPX, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex, RegBND }
 
 // SHA instructions.
 sha1rnds4, 0xf3acc, SHA, Modrm|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1rnds4, 0xd4, SHA|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1nexte, 0xf38c8, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1nexte, 0xd8, SHA|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg1, 0xf38c9, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg1, 0xd9, SHA|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg2, 0xf38ca, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg2, 0xda, SHA|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256rnds2, 0xdb, SHA|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg1, 0xf38cc, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg1, 0xdc, SHA|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg2, 0xf38cd, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg2, 0xdd, SHA|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
 // SHA512 instructions.
 
@@ -2107,8 +2137,11 @@ kxnor<bw>, 0x<bw:kpfx>46, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, {
 kxor<bw>, 0x<bw:kpfx>47, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 
 kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
+kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>|APX_F, Modrm|EVex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
 kmov<bw>, 0x<bw:kpfx>91, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
+kmov<bw>, 0x<bw:kpfx>91, <bw:kcpu>|APX_F, Modrm|EVex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
 kmov<bw>, 0x<bw:kpfx>92, <bw:kcpu>, D|Modrm|Vex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
+kmov<bw>, 0x<bw:kpfx>92, <bw:kcpu>|APX_F, D|Modrm|EVex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
 
 knot<bw>, 0x<bw:kpfx>44, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
 kortest<bw>, 0x<bw:kpfx>98, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
@@ -2584,8 +2617,11 @@ kadd<dq>, 0x<dq:kpfx>4a, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|
 kand<dq>, 0x<dq:kpfx>41, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kandn<dq>, 0x<dq:kpfx>42, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf|Optimize, { RegMask, RegMask, RegMask }
 kmov<dq>, 0x<dq:kpfx>90, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
+kmov<dq>, 0x<dq:kpfx>90, AVX512BW|APX_F, Modrm|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
 kmov<dq>, 0x<dq:kpfx>91, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
+kmov<dq>, 0x<dq:kpfx>91, AVX512BW|APX_F, Modrm|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
 kmov<dq>, 0xf292, AVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
+kmov<dq>, 0xf292, AVX512BW|APX_F, D|Modrm|EVex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
 knot<dq>, 0x<dq:kpfx>44, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
 kor<dq>, 0x<dq:kpfx>45, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kortest<dq>, 0x<dq:kpfx>98, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
@@ -2984,9 +3020,13 @@ rdsspq, 0xf30f1e/1, SHSTK|x64, Modrm|NoSuf, { Reg64 }
 saveprevssp, 0xf30f01ea, SHSTK, NoSuf, {}
 rstorssp, 0xf30f01/5, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 wrssd, 0x0f38f6, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrssd, 0x66, SHSTK|APX_F, Modrm|IgnoreSize|NoSuf|EVex128|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrssq, 0x0f38f6, SHSTK|x64, Modrm|NoSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrssq, 0x66, APX_F|SHSTK, Modrm|NoSuf|Size64|EVex128|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 wrussd, 0x660f38f5, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrussd, 0x6665, SHSTK|APX_F, Modrm|IgnoreSize|NoSuf|EVex128|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrussq, 0x660f38f5, SHSTK|x64, Modrm|NoSuf, { Reg64, Qword|Unspecified|BaseIndex }
+wrussq, 0x6665, SHSTK|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 setssbsy, 0xf30f01e8, SHSTK, NoSuf, {}
 clrssbsy, 0xf30fae/6, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 endbr64, 0xf30f1efa, IBT, NoSuf, {}
@@ -3034,7 +3074,9 @@ cldemote, 0x0f1c/0, CLDEMOTE, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 // MOVDIR[I,64B] instructions.
 
 movdiri, 0xf38f9, MOVDIRI, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+movdiri, 0xf9, MOVDIRI|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 movdir64b, 0x660f38f8, MOVDIR64B, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movdir64b, 0x66f8, MOVDIR64B|APX_F, Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // MOVEDIR instructions end.
 
@@ -3063,7 +3105,9 @@ vcvtneps2bf16<Vxy>, 0xf372, AVX_NE_CONVERT, Modrm|<Vxy:vex>|Space0F38|VexW0|NoSu
 // ENQCMD instructions.
 
 enqcmd, 0xf20f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmd, 0xf2f8, ENQCMD|APX_F, Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 enqcmds, 0xf30f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmds, 0xf3f8, ENQCMD|APX_F, Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // ENQCMD instructions end.
 
@@ -3124,8 +3168,8 @@ xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf, {}
 
 // AMX instructions.
 
-ldtilecfg, 0x49/0, AMX_TILE|x64, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
-sttilecfg, 0x6649/0, AMX_TILE|x64, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+ldtilecfg, 0x49/0, AMX_TILE|APX_F, Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+sttilecfg, 0x6649/0, AMX_TILE|APX_F, Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
 
 tcmmimfp16ps, 0x666c, AMX_COMPLEX|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tcmmrlfp16ps, 0x6c, AMX_COMPLEX|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
@@ -3137,9 +3181,9 @@ tdpbuud, 0x5e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|No
 tdpbusd, 0x665e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tdpbsud, 0xf35e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 
-tileloadd, 0xf24b, AMX_TILE|x64, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tileloaddt1, 0x664b, AMX_TILE|x64, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tilestored, 0xf34b, AMX_TILE|x64, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
+tileloadd, 0xf24b, AMX_TILE|APX_F, Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 0x664b, AMX_TILE|APX_F, Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 0xf34b, AMX_TILE|APX_F, Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
 
 tilerelease, 0x49c0, AMX_TILE|x64, Vex128|Space0F38|VexW0|NoSuf, {}
 
@@ -3151,15 +3195,25 @@ tilezero, 0xf249, AMX_TILE|x64, Modrm|Vex128|Space0F38|VexW0|NoSuf, { RegTMM }
 
 loadiwkey, 0xf30f38dc, KL, Load|Modrm|NoSuf, { RegXMM, RegXMM }
 encodekey128, 0xf30f38fa, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey128, 0xf3da, KL|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Reg32, Reg32 }
 encodekey256, 0xf30f38fb, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey256, 0xf3db, KL|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Reg32, Reg32 }
 aesenc128kl, 0xf30f38dc, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc128kl, 0xf3dc, KL|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec128kl, 0xf30f38dd, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec128kl, 0xf3dd, KL|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesenc256kl, 0xf30f38de, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc256kl, 0xf3de, KL|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec256kl, 0xf30f38df, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec256kl, 0xf3df, KL|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesencwide128kl, 0xf30f38d8/0, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide128kl, 0xf3d8/0, WideKL|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide128kl, 0xf30f38d8/1, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide128kl, 0xf3d8/1, WideKL|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesencwide256kl, 0xf30f38d8/2, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide256kl, 0xf3d8/2, WideKL|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide256kl, 0xf30f38d8/3, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide256kl, 0xf3d8/3, WideKL|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 
 // KEYLOCKER instructions end.
 
@@ -3308,6 +3362,7 @@ prefetchit1, 0xf18/6, PREFETCHI|x64, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex
 // CMPCCXADD instructions.
 
 cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD|x64, Modrm|Vex|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD|x64|APX_F, Modrm|EVex128|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // CMPCCXADD instructions end.
 
@@ -3327,9 +3382,13 @@ wrmsrlist, 0xf30f01c6, MSRLIST|x64, NoSuf, {}
 // RAO-INT instructions.
 
 aadd, 0xf38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aadd, 0xfc, RAO_INT|APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aand, 0x660f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aand, 0x66fc, RAO_INT|APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aor, 0xf20f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aor, 0xf2fc, RAO_INT|APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 axor, 0xf30f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+axor, 0xf3fc, RAO_INT|APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // RAO-INT instructions end.
 
-- 
2.25.1



* [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
  2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
                   ` (2 preceding siblings ...)
  2023-11-02 11:29 ` [PATCH 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
@ 2023-11-02 11:29 ` Cui, Lili
  2023-11-08  9:11   ` Jan Beulich
  2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-02 11:29 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, ccoutant

gas/ChangeLog:

	* testsuite/gas/i386/x86-64-apx-egpr-inval.l: Add some insns that don't
	support gpr32.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-inval-movbe.l: Add .noapx_f for movbe
	reg to reg.
	* testsuite/gas/i386/x86-64-inval-movbe.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Add new test.
	* testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l: New test.
	* testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s: New test.
	* testsuite/gas/i386/x86-64-apx-evex-egpr.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-egpr.s: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted.s: New test.
---
 .../gas/i386/x86-64-apx-egpr-inval.l          | 196 ++++++++++-
 .../gas/i386/x86-64-apx-egpr-inval.s          | 194 ++++++++++-
 .../gas/i386/x86-64-apx-egpr-promote-inval.l  |  16 +
 .../gas/i386/x86-64-apx-egpr-promote-inval.s  |  17 +
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d |  20 ++
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s |  21 ++
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |  31 ++
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |  29 ++
 .../gas/i386/x86-64-apx-evex-promoted-intel.d | 326 ++++++++++++++++++
 .../gas/i386/x86-64-apx-evex-promoted.d       | 326 ++++++++++++++++++
 .../gas/i386/x86-64-apx-evex-promoted.s       | 322 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |   7 +-
 12 files changed, 1495 insertions(+), 10 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s

diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
index c69d01b099a..b03a5eb60f7 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
@@ -12,12 +12,192 @@
 .*:16: Error: register type of address mismatch for `xsaveopt64'
 .*:17: Error: register type of address mismatch for `xsavec'
 .*:18: Error: register type of address mismatch for `xsavec64'
-GAS LISTING .*
-#...
-[ 	]*1[ 	]+\# Check Illegal 64bit APX_F instructions
-[ 	]*2[ 	]+\.text
-[ 	]*3[ 	]+\.arch \.noapx_f
-[ 	]*4[ 	]+test    \$0x7, %r17d
-[ 	]*5[ 	]+\.arch \.apx_f
-[ 	]*6[ 	]+\?\?\?\? D510F7C1 		test    \$0x7, %r17d
+.*:20: Error: register type of address mismatch for `phaddw'
+.*:21: Error: register type of address mismatch for `phaddd'
+.*:22: Error: register type of address mismatch for `phaddsw'
+.*:23: Error: register type of address mismatch for `phsubw'
+.*:24: Error: register type of address mismatch for `pmaddubsw'
+.*:25: Error: register type of address mismatch for `pmulhrsw'
+.*:26: Error: register type of address mismatch for `pshufb'
+.*:27: Error: register type of address mismatch for `psignb'
+.*:28: Error: register type of address mismatch for `psignw'
+.*:29: Error: register type of address mismatch for `psignd'
+.*:30: Error: register type of address mismatch for `palignr'
+.*:31: Error: register type of address mismatch for `pabsb'
+.*:32: Error: register type of address mismatch for `pabsw'
+.*:33: Error: register type of address mismatch for `pabsd'
+.*:34: Error: register type of address mismatch for `blendpd'
+.*:35: Error: register type of address mismatch for `blendps'
+.*:36: Error: register type of address mismatch for `blendvpd'
+.*:37: Error: register type of address mismatch for `blendvps'
+.*:38: Error: register type of address mismatch for `blendvpd'
+.*:39: Error: register type of address mismatch for `blendvps'
+.*:40: Error: register type of address mismatch for `dppd'
+.*:41: Error: register type of address mismatch for `dpps'
+.*:42: Error: register type of address mismatch for `extractps'
+.*:43: Error: register type mismatch for `extractps'
+.*:44: Error: register type of address mismatch for `insertps'
+.*:45: Error: register type of address mismatch for `movntdqa'
+.*:46: Error: register type of address mismatch for `mpsadbw'
+.*:47: Error: register type of address mismatch for `packusdw'
+.*:48: Error: register type of address mismatch for `pblendvb'
+.*:49: Error: register type of address mismatch for `pblendvb'
+.*:50: Error: register type of address mismatch for `pblendw'
+.*:51: Error: register type of address mismatch for `pcmpeqq'
+.*:52: Error: register type of address mismatch for `pextrb'
+.*:53: Error: register type mismatch for `pextrb'
+.*:54: Error: register type of address mismatch for `pextrw'
+.*:55: Error: register type of address mismatch for `pextrd'
+.*:56: Error: register type of address mismatch for `pextrq'
+.*:57: Error: register type of address mismatch for `phminposuw'
+.*:58: Error: register type mismatch for `pinsrb'
+.*:59: Error: register type of address mismatch for `pinsrb'
+.*:60: Error: register type mismatch for `pinsrd'
+.*:61: Error: register type of address mismatch for `pinsrd'
+.*:62: Error: register type mismatch for `pinsrq'
+.*:63: Error: register type of address mismatch for `pinsrq'
+.*:64: Error: register type of address mismatch for `pmaxsb'
+.*:65: Error: register type of address mismatch for `pmaxsd'
+.*:66: Error: register type of address mismatch for `pmaxud'
+.*:67: Error: register type of address mismatch for `pmaxuw'
+.*:68: Error: register type of address mismatch for `pminsb'
+.*:69: Error: register type of address mismatch for `pminsd'
+.*:70: Error: register type of address mismatch for `pminud'
+.*:71: Error: register type of address mismatch for `pminuw'
+.*:72: Error: register type of address mismatch for `pmovsxbw'
+.*:73: Error: register type of address mismatch for `pmovsxbd'
+.*:74: Error: register type of address mismatch for `pmovsxbq'
+.*:75: Error: register type of address mismatch for `pmovsxwd'
+.*:76: Error: register type of address mismatch for `pmovsxwq'
+.*:77: Error: register type of address mismatch for `pmovsxdq'
+.*:78: Error: register type of address mismatch for `pmovsxbw'
+.*:79: Error: register type of address mismatch for `pmovzxbd'
+.*:80: Error: register type of address mismatch for `pmovzxbq'
+.*:81: Error: register type of address mismatch for `pmovzxwd'
+.*:82: Error: register type of address mismatch for `pmovzxwq'
+.*:83: Error: register type of address mismatch for `pmovzxdq'
+.*:84: Error: register type of address mismatch for `pmuldq'
+.*:85: Error: register type of address mismatch for `pmulld'
+.*:86: Error: register type of address mismatch for `roundpd'
+.*:87: Error: register type of address mismatch for `roundps'
+.*:88: Error: register type of address mismatch for `roundsd'
+.*:89: Error: register type of address mismatch for `roundss'
+.*:90: Error: register type of address mismatch for `pcmpestri'
+.*:91: Error: register type of address mismatch for `pcmpestrm'
+.*:92: Error: register type of address mismatch for `pcmpgtq'
+.*:93: Error: register type of address mismatch for `pcmpistri'
+.*:94: Error: register type of address mismatch for `pcmpistrm'
+.*:96: Error: register type of address mismatch for `aesdec'
+.*:97: Error: register type of address mismatch for `aesdeclast'
+.*:98: Error: register type of address mismatch for `aesenc'
+.*:99: Error: register type of address mismatch for `aesenclast'
+.*:100: Error: register type of address mismatch for `aesimc'
+.*:101: Error: register type of address mismatch for `aeskeygenassist'
+.*:102: Error: register type of address mismatch for `pclmulqdq'
+.*:103: Error: register type of address mismatch for `pclmullqlqdq'
+.*:104: Error: register type of address mismatch for `pclmulhqlqdq'
+.*:105: Error: register type of address mismatch for `pclmullqhqdq'
+.*:106: Error: register type of address mismatch for `pclmulhqhqdq'
+.*:108: Error: register type of address mismatch for `gf2p8affineqb'
+.*:109: Error: register type of address mismatch for `gf2p8affineinvqb'
+.*:110: Error: register type of address mismatch for `gf2p8mulb'
+.*:112: Error: register type of address mismatch for `vblendpd'
+.*:113: Error: register type of address mismatch for `vblendpd'
+.*:114: Error: register type of address mismatch for `vblendps'
+.*:115: Error: register type of address mismatch for `vblendps'
+.*:116: Error: register type of address mismatch for `vblendvpd'
+.*:117: Error: register type of address mismatch for `vblendvpd'
+.*:118: Error: register type of address mismatch for `vblendvps'
+.*:119: Error: register type of address mismatch for `vblendvps'
+.*:120: Error: register type of address mismatch for `vdppd'
+.*:121: Error: register type of address mismatch for `vdpps'
+.*:122: Error: register type of address mismatch for `vdpps'
+.*:123: Error: register type of address mismatch for `vhaddpd'
+.*:124: Error: register type of address mismatch for `vhaddpd'
+.*:125: Error: register type of address mismatch for `vhsubps'
+.*:126: Error: register type of address mismatch for `vhsubps'
+.*:127: Error: register type of address mismatch for `vlddqu'
+.*:128: Error: register type of address mismatch for `vlddqu'
+.*:129: Error: register type of address mismatch for `vldmxcsr'
+.*:130: Error: register type of address mismatch for `vmaskmovpd'
+.*:131: Error: register type of address mismatch for `vmaskmovpd'
+.*:132: Error: register type of address mismatch for `vmaskmovps'
+.*:133: Error: register type of address mismatch for `vmaskmovps'
+.*:134: Error: register type of address mismatch for `vmaskmovpd'
+.*:135: Error: register type of address mismatch for `vmaskmovpd'
+.*:136: Error: register type of address mismatch for `vmaskmovps'
+.*:137: Error: register type of address mismatch for `vmaskmovps'
+.*:138: Error: register type mismatch for `vmovmskpd'
+.*:139: Error: register type mismatch for `vmovmskpd'
+.*:140: Error: register type mismatch for `vmovmskps'
+.*:141: Error: register type mismatch for `vmovmskps'
+.*:142: Error: register type of address mismatch for `vpblendvb'
+.*:143: Error: register type of address mismatch for `vpblendvb'
+.*:144: Error: register type of address mismatch for `vpblendw'
+.*:145: Error: register type of address mismatch for `vpblendw'
+.*:146: Error: register type of address mismatch for `vpcmpestri'
+.*:147: Error: register type of address mismatch for `vpcmpestrm'
+.*:148: Error: register type of address mismatch for `vperm2f128'
+.*:149: Error: register type of address mismatch for `vphaddd'
+.*:150: Error: register type of address mismatch for `vphaddsw'
+.*:151: Error: register type of address mismatch for `vphaddw'
+.*:152: Error: register type of address mismatch for `vphsubd'
+.*:153: Error: register type of address mismatch for `vphsubsw'
+.*:154: Error: register type of address mismatch for `vphsubw'
+.*:155: Error: register type of address mismatch for `vphaddd'
+.*:156: Error: register type of address mismatch for `vphaddsw'
+.*:157: Error: register type of address mismatch for `vphaddw'
+.*:158: Error: register type of address mismatch for `vphsubd'
+.*:159: Error: register type of address mismatch for `vphsubsw'
+.*:160: Error: register type of address mismatch for `vphsubw'
+.*:161: Error: register type of address mismatch for `vphminposuw'
+.*:162: Error: register type mismatch for `vpmovmskb'
+.*:163: Error: register type mismatch for `vpmovmskb'
+.*:164: Error: register type of address mismatch for `vpsignb'
+.*:165: Error: register type of address mismatch for `vpsignw'
+.*:166: Error: register type of address mismatch for `vpsignd'
+.*:167: Error: register type of address mismatch for `vpsignb'
+.*:168: Error: register type of address mismatch for `vpsignw'
+.*:169: Error: register type of address mismatch for `vpsignd'
+.*:170: Error: register type of address mismatch for `vptest'
+.*:171: Error: register type of address mismatch for `vptest'
+.*:172: Error: register type of address mismatch for `vrcpps'
+.*:173: Error: register type of address mismatch for `vrcpps'
+.*:174: Error: register type of address mismatch for `vrcpss'
+.*:175: Error: register type of address mismatch for `vrsqrtps'
+.*:176: Error: register type of address mismatch for `vrsqrtps'
+.*:177: Error: register type of address mismatch for `vrsqrtss'
+.*:178: Error: register type of address mismatch for `vstmxcsr'
+.*:179: Error: register type of address mismatch for `vtestps'
+.*:180: Error: register type of address mismatch for `vtestps'
+.*:181: Error: register type of address mismatch for `vtestpd'
+.*:182: Error: register type of address mismatch for `vtestps'
+.*:183: Error: register type of address mismatch for `vtestpd'
+.*:184: Error: register type of address mismatch for `vpblendd'
+.*:185: Error: register type of address mismatch for `vpblendd'
+.*:186: Error: register type of address mismatch for `vperm2i128'
+.*:187: Error: register type of address mismatch for `vpmaskmovd'
+.*:188: Error: register type of address mismatch for `vpmaskmovd'
+.*:189: Error: register type of address mismatch for `vpmaskmovq'
+.*:190: Error: register type of address mismatch for `vpmaskmovq'
+.*:191: Error: register type of address mismatch for `vpmaskmovd'
+.*:192: Error: register type of address mismatch for `vpmaskmovd'
+.*:193: Error: register type of address mismatch for `vpmaskmovq'
+.*:194: Error: register type of address mismatch for `vpmaskmovq'
+.*:195: Error: register type of address mismatch for `vaesimc'
+.*:196: Error: register type of address mismatch for `vaeskeygenassist'
+.*:197: Error: register type of address mismatch for `vroundpd'
+.*:198: Error: register type of address mismatch for `vroundps'
+.*:199: Error: register type of address mismatch for `vroundsd'
+.*:200: Error: register type of address mismatch for `vroundss'
+.*:201: Error: register type of address mismatch for `vpcmpistri'
+.*:202: Error: register type of address mismatch for `vpcmpistrm'
+.*:203: Error: register type of address mismatch for `vpcmpeqb'
+.*:204: Error: register type of address mismatch for `vpcmpeqw'
+.*:205: Error: register type of address mismatch for `vpcmpeqd'
+.*:206: Error: register type of address mismatch for `vpcmpeqq'
+.*:207: Error: register type of address mismatch for `vpcmpgtb'
+.*:208: Error: register type of address mismatch for `vpcmpgtw'
+.*:209: Error: register type of address mismatch for `vpcmpgtd'
+.*:210: Error: register type of address mismatch for `vpcmpgtq'
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
index c4d2308a604..71fcb91ce89 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
@@ -1,4 +1,4 @@
-# Check Illegal 64bit APX_F instructions
+# Check illegal 64bit APX_F instructions
 	.text
 	.arch .noapx_f
 	test    $0x7, %r17d
@@ -16,3 +16,195 @@
 	xsaveopt64 (%r16, %r31)
 	xsavec (%r16, %rbx)
 	xsavec64 (%r16, %r31)
+#SSE
+	phaddw          (%r17),%xmm0
+	phaddd          (%r17),%xmm0
+	phaddsw         (%r17),%xmm0
+	phsubw          (%r17),%xmm0
+	pmaddubsw       (%r17),%xmm0
+	pmulhrsw        (%r17),%xmm0
+	pshufb          (%r17),%xmm0
+	psignb          (%r17),%xmm0
+	psignw          (%r17),%xmm0
+	psignd          (%r17),%xmm0
+	palignr $100,(%r17),%xmm6
+	pabsb          (%r17),%xmm0
+	pabsw          (%r17),%xmm0
+	pabsd          (%r17),%xmm0
+	blendpd $100,(%r18),%xmm6
+	blendps $100,(%r18),%xmm6
+	blendvpd %xmm0,(%r19),%xmm6
+	blendvps %xmm0,(%r19),%xmm6
+	blendvpd (%r19),%xmm6
+	blendvps (%r19),%xmm6
+	dppd $100,(%r20),%xmm6
+	dpps $100,(%r20),%xmm6
+	extractps $100,%xmm4,(%r21)
+	extractps $100,%xmm4,%r21
+	insertps $100,(%r21),%xmm6
+	movntdqa (%r21),%xmm4
+	mpsadbw $100,(%r21),%xmm6
+	packusdw (%r21),%xmm6
+	pblendvb %xmm0,(%r22),%xmm6
+	pblendvb (%r22),%xmm6
+	pblendw $100,(%r22),%xmm6
+	pcmpeqq (%r22),%xmm6
+	pextrb $100,%xmm4,(%r22)
+	pextrb $100,%xmm4,%r22
+	pextrw $100,%xmm4,(%r22)
+	pextrd $100,%xmm4,(%r22)
+	pextrq $100,%xmm4,(%r22)
+	phminposuw (%r23),%xmm4
+	pinsrb $100,%r23,%xmm4
+	pinsrb $100,(%r23),%xmm4
+	pinsrd $100, %r23d, %xmm4
+	pinsrd $100,(%r23),%xmm4
+	pinsrq $100, %r24, %xmm4
+	pinsrq $100,(%r24),%xmm4
+	pmaxsb (%r24),%xmm6
+	pmaxsd (%r24),%xmm6
+	pmaxud (%r24),%xmm6
+	pmaxuw (%r24),%xmm6
+	pminsb (%r24),%xmm6
+	pminsd (%r24),%xmm6
+	pminud (%r24),%xmm6
+	pminuw (%r24),%xmm6
+	pmovsxbw (%r24),%xmm4
+	pmovsxbd (%r24),%xmm4
+	pmovsxbq (%r24),%xmm4
+	pmovsxwd (%r24),%xmm4
+	pmovsxwq (%r24),%xmm4
+	pmovsxdq (%r24),%xmm4
+	pmovsxbw (%r24),%xmm4
+	pmovzxbd (%r24),%xmm4
+	pmovzxbq (%r24),%xmm4
+	pmovzxwd (%r24),%xmm4
+	pmovzxwq (%r24),%xmm4
+	pmovzxdq (%r24),%xmm4
+	pmuldq (%r24),%xmm4
+	pmulld (%r24),%xmm4
+	roundpd $100,(%r24),%xmm6
+	roundps $100,(%r24),%xmm6
+	roundsd $100,(%r24),%xmm6
+	roundss $100,(%r24),%xmm6
+	pcmpestri $100,(%r25),%xmm6
+	pcmpestrm $100,(%r25),%xmm6
+	pcmpgtq (%r25),%xmm4
+	pcmpistri $100,(%r25),%xmm6
+	pcmpistrm $100,(%r25),%xmm6
+#AES
+	aesdec (%r26),%xmm6
+	aesdeclast (%r26),%xmm6
+	aesenc (%r26),%xmm6
+	aesenclast (%r26),%xmm6
+	aesimc (%r26),%xmm6
+	aeskeygenassist $100,(%r26),%xmm6
+	pclmulqdq $100,(%r26),%xmm6
+	pclmullqlqdq (%r26),%xmm6
+	pclmulhqlqdq (%r26),%xmm6
+	pclmullqhqdq (%r26),%xmm6
+	pclmulhqhqdq (%r26),%xmm6
+#GFNI
+	gf2p8affineqb $100,(%r26),%xmm6
+	gf2p8affineinvqb $100,(%r26),%xmm6
+	gf2p8mulb (%r26),%xmm6
+#VEX without evex
+	vblendpd $7,(%r27),%xmm6,%xmm2
+	vblendpd $7,(%r27),%ymm6,%ymm2
+	vblendps $7,(%r27),%xmm6,%xmm2
+	vblendps $7,(%r27),%ymm6,%ymm2
+	vblendvpd %xmm4,(%r27),%xmm2,%xmm7
+	vblendvpd %ymm4,(%r27),%ymm2,%ymm7
+	vblendvps %xmm4,(%r27),%xmm2,%xmm7
+	vblendvps %ymm4,(%r27),%ymm2,%ymm7
+	vdppd $7,(%r27),%xmm6,%xmm2
+	vdpps $7,(%r27),%xmm6,%xmm2
+	vdpps $7,(%r27),%ymm6,%ymm2
+	vhaddpd (%r27),%xmm6,%xmm5
+	vhaddpd (%r27),%ymm6,%ymm5
+	vhsubps (%r27),%xmm6,%xmm5
+	vhsubps (%r27),%ymm6,%ymm5
+	vlddqu (%r27),%xmm4
+	vlddqu (%r27),%ymm4
+	vldmxcsr (%r27)
+	vmaskmovpd (%r27),%xmm4,%xmm6
+	vmaskmovpd %xmm4,%xmm6,(%r27)
+	vmaskmovps (%r27),%xmm4,%xmm6
+	vmaskmovps %xmm4,%xmm6,(%r27)
+	vmaskmovpd (%r27),%ymm4,%ymm6
+	vmaskmovpd %ymm4,%ymm6,(%r27)
+	vmaskmovps (%r27),%ymm4,%ymm6
+	vmaskmovps %ymm4,%ymm6,(%r27)
+	vmovmskpd %xmm4,%r27d
+	vmovmskpd %xmm8,%r27d
+	vmovmskps %xmm4,%r27d
+	vmovmskps %ymm8,%r27d
+	vpblendvb %xmm4,(%r27),%xmm2,%xmm7
+	vpblendvb %ymm4,(%r27),%ymm2,%ymm7
+	vpblendw $7,(%r27),%xmm6,%xmm2
+	vpblendw $7,(%r27),%ymm6,%ymm2
+	vpcmpestri $7,(%r27),%xmm6
+	vpcmpestrm $7,(%r27),%xmm6
+	vperm2f128 $7,(%r27),%ymm6,%ymm2
+	vphaddd (%r27),%xmm6,%xmm7
+	vphaddsw (%r27),%xmm6,%xmm7
+	vphaddw (%r27),%xmm6,%xmm7
+	vphsubd (%r27),%xmm6,%xmm7
+	vphsubsw (%r27),%xmm6,%xmm7
+	vphsubw (%r27),%xmm6,%xmm7
+	vphaddd (%r27),%ymm6,%ymm7
+	vphaddsw (%r27),%ymm6,%ymm7
+	vphaddw (%r27),%ymm6,%ymm7
+	vphsubd (%r27),%ymm6,%ymm7
+	vphsubsw (%r27),%ymm6,%ymm7
+	vphsubw (%r27),%ymm6,%ymm7
+	vphminposuw (%r27),%xmm6
+	vpmovmskb %xmm4,%r27
+	vpmovmskb %ymm4,%r27d
+	vpsignb (%r27),%xmm6,%xmm7
+	vpsignw (%r27),%xmm6,%xmm7
+	vpsignd (%r27),%xmm6,%xmm7
+	vpsignb (%r27),%xmm6,%xmm7
+	vpsignw (%r27),%xmm6,%xmm7
+	vpsignd (%r27),%xmm6,%xmm7
+	vptest (%r27),%xmm6
+	vptest (%r27),%ymm6
+	vrcpps (%r27),%xmm6
+	vrcpps (%r27),%ymm6
+	vrcpss (%r27),%xmm6,%xmm6
+	vrsqrtps (%r27),%xmm6
+	vrsqrtps (%r27),%ymm6
+	vrsqrtss (%r27),%xmm6,%xmm6
+	vstmxcsr (%r27)
+	vtestps (%r27),%xmm6
+	vtestps (%r27),%ymm6
+	vtestpd (%r27),%xmm6
+	vtestps (%r27),%ymm6
+	vtestpd (%r27),%ymm6
+	vpblendd $7,(%r27),%xmm6,%xmm2
+	vpblendd $7,(%r27),%ymm6,%ymm2
+	vperm2i128 $7,(%r27),%ymm6,%ymm2
+	vpmaskmovd (%r27),%xmm4,%xmm6
+	vpmaskmovd %xmm4,%xmm6,(%r27)
+	vpmaskmovq (%r27),%xmm4,%xmm6
+	vpmaskmovq %xmm4,%xmm6,(%r27)
+	vpmaskmovd (%r27),%ymm4,%ymm6
+	vpmaskmovd %ymm4,%ymm6,(%r27)
+	vpmaskmovq (%r27),%ymm4,%ymm6
+	vpmaskmovq %ymm4,%ymm6,(%r27)
+	vaesimc (%r27), %xmm3
+	vaeskeygenassist $7,(%r27),%xmm3
+	vroundpd $1,(%r24),%xmm6
+	vroundps $2,(%r24),%xmm6
+	vroundsd $3,(%r24),%xmm6,%xmm3
+	vroundss $4,(%r24),%xmm6,%xmm3
+	vpcmpistri $100,(%r25),%xmm6
+	vpcmpistrm $100,(%r25),%xmm6
+	vpcmpeqb (%r26),%ymm6,%ymm2
+	vpcmpeqw (%r16),%ymm6,%ymm2
+	vpcmpeqd (%r26),%ymm6,%ymm2
+	vpcmpeqq (%r16),%ymm6,%ymm2
+	vpcmpgtb (%r26),%ymm6,%ymm2
+	vpcmpgtw (%r16),%ymm6,%ymm2
+	vpcmpgtd (%r26),%ymm6,%ymm2
+	vpcmpgtq (%r16),%ymm6,%ymm2
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
new file mode 100644
index 00000000000..5c73eea0465
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
@@ -0,0 +1,16 @@
+.*: Assembler messages:
+.*:4: Error: `movbe' is not supported on `x86_64.nomovbe'
+.*:5: Error: `movbe' is not supported on `x86_64.nomovbe'
+.*:7: Error: `invept' is not supported on `x86_64.nomovbe.noept'
+.*:8: Error: `invept' is not supported on `x86_64.nomovbe.noept'
+.*:10: Error: `kmovq' is not supported on `x86_64.nomovbe.noept.noavx512bw'
+.*:11: Error: `kmovq' is not supported on `x86_64.nomovbe.noept.noavx512bw'
+.*:13: Error: `kmovb' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq'
+.*:14: Error: `kmovb' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq'
+.*:16: Error: `kmovw' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq.noavx512f'
+.*:17: Error: `kmovw' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq.noavx512f'
+GAS LISTING .*
+#...
+[ 	]*1[ 	]+\# Check illegal 64bit APX EVEX promoted instructions
+[ 	]*2[ 	]+\.text
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
new file mode 100644
index 00000000000..c3914ee7437
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
@@ -0,0 +1,17 @@
+# Check illegal 64bit APX EVEX promoted instructions
+	.text
+	.arch .nomovbe
+	movbe (%r16), %r17
+	movbe (%rax), %rcx
+	.arch .noept
+	invept (%r16), %r17
+	invept (%rax), %rcx
+	.arch .noavx512bw
+	kmovq %k1, (%r16)
+	kmovq %k1, (%r8)
+	.arch .noavx512dq
+	kmovb %k1, %r16d
+	kmovb %k1, %r8d
+	.arch .noavx512f
+	kmovw %k1, %r16d
+	kmovw %k1, %r8d
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
new file mode 100644
index 00000000000..c3c578675c0
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
@@ -0,0 +1,20 @@
+#as:
+#objdump: -dw
+#name: x86-64 APX pre-existing EVEX insns using GPR32 with extended EVEX prefix
+#source: x86-64-apx-evex-egpr.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 fb 79 48 19 04 08 01[	 ]+vextractf32x4 \$0x1,%zmm0,\(%r16,%r17,1\)
+\s*[a-f0-9]+:\s*62 fa 79 48 5a 04 1a[	 ]+vbroadcasti32x4 \(%r18,%r19,1\),%zmm0
+\s*[a-f0-9]+:\s*62 eb 7d 08 17 c4 01[	 ]+vextractps \$0x1,%xmm16,%r20d
+\s*[a-f0-9]+:\s*62 69 97 00 2a f5[	 ]+vcvtsi2sd %r21,%xmm29,%xmm30
+\s*[a-f0-9]+:\s*67 62 fe 55 58 96 36[	 ]+vfmaddsub132ph \(%r22d\)\{1to32\},%zmm5,%zmm6
+\s*[a-f0-9]+:\s*62 81 fe 18 78 fe[	 ]+vcvttss2usi \{sae\},%xmm30,%r23
+\s*[a-f0-9]+:\s*62 25 10 47 58 b4 c5 00 00 00 10[	 ]+vaddph 0x10000000\(%rbp,%r24,8\),%zmm29,%zmm30\{%k7\}
+\s*[a-f0-9]+:\s*62 4d 7c 08 2f 71 7f[	 ]+vcomish 0xfe\(%r25\),%xmm30
+#pass
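Note for readers (not part of the patch): the expected bytes above can be
cross-checked by hand. The Python sketch below recovers the base and index
register numbers from an EVEX prefix followed by a ModRM and SIB byte. It
assumes the APX payload layout in which B4 is P[3] (stored as-is) and X4 is
P[10] (stored inverted), with B3 and X3 in their usual inverted EVEX
positions. `egpr_base_index` is a hypothetical helper, not a binutils
function, and it only handles encodings with no legacy prefix and a
one-byte opcode.

```python
def egpr_base_index(insn: bytes):
    """Return (base, index) register numbers for an EVEX insn with a SIB byte.

    Assumed layout: 0x62, P0, P1, P2, opcode, ModRM, SIB, ...
    The mod == 0 / base == 5 "no base" special case is not handled.
    """
    assert insn[0] == 0x62, "not an EVEX prefix"
    p0, p1 = insn[1], insn[2]
    modrm, sib = insn[5], insn[6]
    assert (modrm & 7) == 4 and (modrm >> 6) != 3, "no SIB byte present"

    b3 = ((p0 >> 5) & 1) ^ 1   # P[5], stored inverted
    x3 = ((p0 >> 6) & 1) ^ 1   # P[6], stored inverted
    b4 = (p0 >> 3) & 1         # P[3], stored as-is (assumed APX layout)
    x4 = ((p1 >> 2) & 1) ^ 1   # P[10], stored inverted (assumed APX layout)

    base = (b4 << 4) | (b3 << 3) | (sib & 7)
    index = (x4 << 4) | (x3 << 3) | ((sib >> 3) & 7)
    return base, index


# Bytes copied from the expected disassembly above.
vextract = bytes.fromhex("62fb794819040801")      # (%r16,%r17,1)
vbroadcast = bytes.fromhex("62fa79485a041a")      # (%r18,%r19,1)
vaddph = bytes.fromhex("6225104758b4c500000010")  # (%rbp,%r24,8)
print(egpr_base_index(vextract), egpr_base_index(vbroadcast),
      egpr_base_index(vaddph))
```

Under these assumptions the three encodings decode to the register pairs
shown in the disassembly: r16/r17, r18/r19 and rbp (5)/r24.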
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
new file mode 100644
index 00000000000..7d1c5de2b6d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
@@ -0,0 +1,21 @@
+# Check that 64-bit pre-existing EVEX instructions can use GPR32 with the EVEX prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+## DestMem
+	 vextractf32x4	$1, %zmm0, (%r16,%r17)
+## SrcMem
+	 vbroadcasti32x4	(%r18,%r19), %zmm0
+## DestReg
+	 vextractps	$1, %xmm16, %r20d
+## SrcReg
+	 vcvtsi2sdq      %r21, %xmm29, %xmm30
+## Broadcast
+	 vfmaddsub132ph  (%r22d){1to32}, %zmm5, %zmm6
+## SAE
+	 vcvttss2usi     {sae}, %xmm30, %r23
+## Masking
+	 vaddph  0x10000000(%rbp, %r24, 8), %zmm29, %zmm30{%k7}
+## Disp8memshift
+	 vcomish 254(%r25), %xmm30
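Note for readers (not part of the patch): the Disp8memshift case above relies
on EVEX compressed displacement (disp8*N). The Python sketch below shows the
arithmetic. The scale N = 2 for vcomish is inferred from the expected
encoding (`71 7f` for a displacement of 0xfe), and `compress_disp8` is a
hypothetical helper name, not the assembler's own routine.

```python
def compress_disp8(disp: int, n: int):
    """Return the disp8 byte for displacement `disp` under disp8*N scaling.

    Returns None when the displacement is not a multiple of N or the scaled
    value does not fit in a signed byte (a 32-bit displacement is then
    needed instead).
    """
    if disp % n != 0:
        return None
    scaled = disp // n
    if -128 <= scaled <= 127:
        return scaled & 0xFF   # two's-complement byte as stored in the insn
    return None


# 254 = 127 * 2, so the displacement fits in one byte.
print(hex(compress_disp8(254, 2)))
```

With N = 2, a displacement of 254 compresses to the byte 0x7f, while 255
(not a multiple of 2) and 256 (scaled value 128) would both fall back to a
full displacement.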
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
new file mode 100644
index 00000000000..ad5b2e3cb5c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
@@ -0,0 +1,31 @@
+#objdump: -dw
+#name: x86-64 EVEX-promoted bad
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7e 08 60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c2 ff ff[ 	]+ret    \$0xffff
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7f 08 60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c2 ff ff[ 	]+ret    \$0xffff
+[ 	]*[a-f0-9]+:[ 	]+62 e2 f9 41 91 84[ 	]+vpgatherqq \(bad\),%zmm16\{%k1\}
+[ 	]*[a-f0-9]+:[ 	]+cd 7b[ 	]+int    \$0x7b
+[ 	]*[a-f0-9]+:[ 	]+00 00[ 	]+add    %al,\(%rax\)
+[ 	]*[a-f0-9]+:[ 	]+00 ff[ 	]+add    %bh,%bh
+[ 	]*[a-f0-9]+:[ 	]+62 fd 7d 08 60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c2 ff ff[ 	]+ret    \$0xffff
+[ 	]*[a-f0-9]+:[ 	]+62 fd 7d 09 60[ 	]+\(bad\)  \{%k1\}
+[ 	]*[a-f0-9]+:[ 	]+c2 ff ff[ 	]+ret    \$0xffff
+[ 	]*[a-f0-9]+:[ 	]+62 fd 7d 28 60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c2 ff ff[ 	]+ret    \$0xffff
+[ 	]*[a-f0-9]+:[ 	]+62 4c 7f[ 	]+\(bad\)  \{%k1\}
+[ 	]*[a-f0-9]+:[ 	]+09 f8[ 	]+or     %edi,%eax
+[ 	]*[a-f0-9]+:[ 	]+bc 87 23 01 00[ 	]+mov    \$0x12387,%esp
+[ 	]*[a-f0-9]+:[ 	]+00 ff[ 	]+add    %bh,%bh
+[ 	]*[a-f0-9]+:[ 	]+62 4c 7f[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+28 f8[ 	]+sub    %bh,%al
+[ 	]*[a-f0-9]+:[ 	]+bc 87 23 01 00[ 	]+mov    \$0x12387,%esp
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
new file mode 100644
index 00000000000..9bb06d9f494
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
@@ -0,0 +1,29 @@
+# Check illegal prefix field values for 64-bit EVEX-promoted instructions
+
+        .allow_index_reg
+        .text
+_start:
+        #movbe %r18w,%ax with EVEX.pp == 0b10 (F3, illegal value).
+        .byte 0x62, 0xfc, 0x7e, 0x08, 0x60, 0xc2
+        .byte 0xff, 0xff
+        #movbe %r18w,%ax with EVEX.pp == 0b11 (F2, illegal value).
+        .byte 0x62, 0xfc, 0x7f, 0x08, 0x60, 0xc2
+        .byte 0xff, 0xff
+        #VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} with EVEX.P[10] == 0
+        #(illegal value).
+        .byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
+        .byte 0xff
+        #EVEX_MAP4 movbe %r18w,%ax with EVEX.mm == 0b01 (illegal value).
+        .byte 0x62, 0xfd, 0x7d, 0x08, 0x60, 0xc2
+        .byte 0xff, 0xff
+        #EVEX_MAP4 movbe %r18w,%ax with EVEX.aa (P[17:16]) == 0b01 (illegal value).
+        .byte 0x62, 0xfd, 0x7d, 0x09, 0x60, 0xc2
+        .byte 0xff, 0xff
+        #EVEX_MAP4 movbe %r18w,%ax with EVEX.zL'L == 0b001 (illegal value).
+        .byte 0x62, 0xfd, 0x7d, 0x28, 0x60, 0xc2
+        .byte 0xff, 0xff
+        #EVEX from VEX enqcmd 0x123(%r31,%rax,4),%r31 EVEX.P[17:16] == 1 (illegal value).
+        .byte 0x62, 0x4c, 0x7f, 0x09, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00
+        .byte 0xff
+        #EVEX from VEX enqcmd 0x123(%r31,%rax,4),%r31 EVEX.P[22:21] == 1 (illegal value).
+        .byte 0x62, 0x4c, 0x7f, 0x28, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00
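Note for readers (not part of the patch): the comments in the test above name
individual EVEX payload fields. The Python sketch below pulls those fields
out of the first four bytes so each bad encoding can be compared with the
valid `movbe %r18w,%ax` encoding (62 fc 7d 08 60 c2, taken from the
EVEX-promoted test). The bit positions follow the usual EVEX numbering
(P0 = P[7:0], P1 = P[15:8], P2 = P[23:16]); `evex_fields` and the key names
are hypothetical and exist only for illustration.

```python
PP_NAMES = {0: "none", 1: "66", 2: "F3", 3: "F2"}


def evex_fields(insn: bytes) -> dict:
    """Split the three EVEX payload bytes into the fields the test refers to."""
    assert insn[0] == 0x62, "not an EVEX prefix"
    p0, p1, p2 = insn[1], insn[2], insn[3]
    return {
        "mmm": p0 & 0x7,           # opcode map select, P[2:0]
        "pp": PP_NAMES[p1 & 0x3],  # implied legacy prefix, P[9:8]
        "p10": (p1 >> 2) & 1,      # P[10]
        "aaa": p2 & 0x7,           # opmask register, P[18:16]
        "ll": (p2 >> 5) & 0x3,     # vector length L'L, P[22:21]
        "z": p2 >> 7,              # zeroing bit, P[23]
    }


good = evex_fields(bytes.fromhex("62fc7d0860c2"))
bad_pp = evex_fields(bytes.fromhex("62fc7e0860c2"))
bad_map = evex_fields(bytes.fromhex("62fd7d0860c2"))
print(good, bad_pp["pp"], bad_map["mmm"])
```

For the valid encoding this gives map 4, pp = 66, P[10] = 1 and zero aaa and
L'L. Each bad encoding in the test differs from that in exactly the field
its comment names.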
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
new file mode 100644
index 00000000000..0f8f94b800e
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
@@ -0,0 +1,326 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 APX_F EVEX-Promoted insns (Intel disassembly)
+#source: x86-64-apx-evex-promoted.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32[	 ]+r22,r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32[	 ]+r22,QWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32[	 ]+r17,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32[	 ]+r21d,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32[	 ]+ebx,BYTE PTR \[r19\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32[	 ]+r23d,r31d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32[	 ]+r23d,DWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32[	 ]+r21d,r31w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32[	 ]+r21d,WORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32[	 ]+r18,rax
+[	 ]*[a-f0-9]+:[	 ]*c5 f9 90 eb[	 ]+kmovb[	 ]+k5,k3
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+BYTE PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+k5,BYTE PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*c4 e1 f9 90 eb[	 ]+kmovd[	 ]+k5,k3
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+k5,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*c4 e1 f8 90 eb[	 ]+kmovq[	 ]+k5,k3
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+r31,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+k5,r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+k5,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*c5 f8 90 eb[	 ]+kmovw[	 ]+k5,k3
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+k5,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+ax,r18w
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r16\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+DWORD PTR \[r16\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r16\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+r31,QWORD PTR \[r16\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+r18w,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4 xmm22,xmm23,0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\],0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2 xmm12,XMMWORD PTR \[r31\+rax\*4\+0x123\],xmm0
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1 tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+\[r31\+rax\*4\+0x123\],tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+\[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+\[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32[	 ]+r22,r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32[	 ]+r22,QWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32[	 ]+r17,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32[	 ]+r21d,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32[	 ]+ebx,BYTE PTR \[r19\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32[	 ]+r23d,r31d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32[	 ]+r23d,DWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32[	 ]+r21d,r31w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32[	 ]+r21d,WORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32[	 ]+r18,rax
+[	 ]*[a-f0-9]+:[	 ]*c5 f9 90 eb[	 ]+kmovb[	 ]+k5,k3
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+BYTE PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+k5,BYTE PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*c4 e1 f9 90 eb[	 ]+kmovd[	 ]+k5,k3
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+k5,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*c4 e1 f8 90 eb[	 ]+kmovq[	 ]+k5,k3
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+r31,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+k5,r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+k5,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*c5 f8 90 eb[	 ]+kmovw[	 ]+k5,k3
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+k5,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+ax,r18w
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r16\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+DWORD PTR \[r16\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r16\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+r31,QWORD PTR \[r16\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+r18w,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4 xmm22,xmm23,0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\],0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2 xmm12,XMMWORD PTR \[r31\+rax\*4\+0x123\],xmm0
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1 tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+\[r31\+rax\*4\+0x123\],tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+\[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+\[r31\+rax\*4\+0x123\],r31
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
new file mode 100644
index 00000000000..3e71e1afe9a
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
@@ -0,0 +1,326 @@
+#as:
+#objdump: -dw
+#name: x86_64 APX_F EVEX-Promoted insns
+#source: x86-64-apx-evex-promoted.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32  %r31,%r22
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32q \(%r31\),%r22
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32  %r19b,%r17
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32  %r19b,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32b \(%r19\),%ebx
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32  %r31d,%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32l \(%r31\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32  %r31w,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32w \(%r31\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32  %rax,%r18
+[	 ]*[a-f0-9]+:[	 ]*c5 f9 90 eb[	 ]+kmovb[	 ]+%k3,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*c4 e1 f9 90 eb[	 ]+kmovd[	 ]+%k3,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*c4 e1 f8 90 eb[	 ]+kmovq[	 ]+%k3,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+%k5,%r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+%r31,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*c5 f8 90 eb[	 ]+kmovw[	 ]+%k3,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+%r18w,%ax
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+%r25d,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r16,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r18w
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4[	 ]+\$0x7b,%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4[	 ]+\$0x7b,0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2[	 ]+%xmm0,0x123\(%r31,%rax,4\),%xmm12
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32  %r31,%r22
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32q \(%r31\),%r22
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32  %r19b,%r17
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32  %r19b,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32b \(%r19\),%ebx
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32  %r31d,%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32l \(%r31\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32  %r31w,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32w \(%r31\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32  %rax,%r18
+[	 ]*[a-f0-9]+:[	 ]*c5 f9 90 eb[	 ]+kmovb[	 ]+%k3,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*c4 e1 f9 90 eb[	 ]+kmovd[	 ]+%k3,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*c4 e1 f8 90 eb[	 ]+kmovq[	 ]+%k3,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+%k5,%r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+%r31,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*c5 f8 90 eb[	 ]+kmovw[	 ]+%k3,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+%r18w,%ax
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+%r25d,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r16,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r18w
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4[	 ]+\$0x7b,%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4[	 ]+\$0x7b,0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2[	 ]+%xmm0,0x123\(%r31,%rax,4\),%xmm12
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+%r31,0x123\(%r31,%rax,4\)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
new file mode 100644
index 00000000000..a89cea92eb9
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
@@ -0,0 +1,322 @@
+# Check 64bit APX_F EVEX-Promoted instructions.
+
+	.text
+_start:
+	aadd	%r25d,0x123(%r31,%rax,4)
+	aadd	%r31,0x123(%r31,%rax,4)
+	aand	%r25d,0x123(%r31,%rax,4)
+	aand	%r31,0x123(%r31,%rax,4)
+	aesdec128kl	0x123(%r31,%rax,4),%xmm22
+	aesdec256kl	0x123(%r31,%rax,4),%xmm22
+	aesdecwide128kl	0x123(%r31,%rax,4)
+	aesdecwide256kl	0x123(%r31,%rax,4)
+	aesenc128kl	0x123(%r31,%rax,4),%xmm22
+	aesenc256kl	0x123(%r31,%rax,4),%xmm22
+	aesencwide128kl	0x123(%r31,%rax,4)
+	aesencwide256kl	0x123(%r31,%rax,4)
+	aor	%r25d,0x123(%r31,%rax,4)
+	aor	%r31,0x123(%r31,%rax,4)
+	axor	%r25d,0x123(%r31,%rax,4)
+	axor	%r31,0x123(%r31,%rax,4)
+	bextr	%r25d,%edx,%r10d
+	bextr	%r25d,0x123(%r31,%rax,4),%edx
+	bextr	%r31,%r15,%r11
+	bextr	%r31,0x123(%r31,%rax,4),%r15
+	blsi	%r25d,%edx
+	blsi	%r31,%r15
+	blsi	0x123(%r31,%rax,4),%r25d
+	blsi	0x123(%r31,%rax,4),%r31
+	blsmsk	%r25d,%edx
+	blsmsk	%r31,%r15
+	blsmsk	0x123(%r31,%rax,4),%r25d
+	blsmsk	0x123(%r31,%rax,4),%r31
+	blsr	%r25d,%edx
+	blsr	%r31,%r15
+	blsr	0x123(%r31,%rax,4),%r25d
+	blsr	0x123(%r31,%rax,4),%r31
+	bzhi	%r25d,%edx,%r10d
+	bzhi	%r25d,0x123(%r31,%rax,4),%edx
+	bzhi	%r31,%r15,%r11
+	bzhi	%r31,0x123(%r31,%rax,4),%r15
+	cmpbexadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpbexadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpbxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpbxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmplxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmplxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnbexadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnbexadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnbxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnbxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnlexadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnlexadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnlxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnlxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnoxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnoxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnpxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnpxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnsxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnsxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnzxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnzxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpoxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpoxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmppxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmppxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpsxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpsxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpzxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpzxadd	%r31,%r15,0x123(%r31,%rax,4)
+	encodekey128	%r25d,%edx
+	encodekey256	%r25d,%edx
+	enqcmd	0x123(%r31d,%eax,4),%r25d
+	enqcmd	0x123(%r31,%rax,4),%r31
+	enqcmds	0x123(%r31d,%eax,4),%r25d
+	enqcmds	0x123(%r31,%rax,4),%r31
+	invept	0x123(%r31,%rax,4),%r31
+	invpcid	0x123(%r31,%rax,4),%r31
+	invvpid	0x123(%r31,%rax,4),%r31
+	crc32q	%r31, %r22
+	crc32q	(%r31), %r22
+	crc32b	%r19b, %r17
+	crc32b	%r19b, %r21d
+	crc32b	(%r19),%ebx
+	crc32l	%r31d, %r23d
+	crc32l	(%r31), %r23d
+	crc32w	%r31w, %r21d
+	crc32w	(%r31),%r21d
+	crc32	%rax, %r18
+	kmovb	%k3,%k5
+	kmovb	%k5,%r25d
+	kmovb	%k5,0x123(%r31,%rax,4)
+	kmovb	%r25d,%k5
+	kmovb	0x123(%r31,%rax,4),%k5
+	kmovd	%k3,%k5
+	kmovd	%k5,%r25d
+	kmovd	%k5,0x123(%r31,%rax,4)
+	kmovd	%r25d,%k5
+	kmovd	0x123(%r31,%rax,4),%k5
+	kmovq	%k3,%k5
+	kmovq	%k5,%r31
+	kmovq	%k5,0x123(%r31,%rax,4)
+	kmovq	%r31,%k5
+	kmovq	0x123(%r31,%rax,4),%k5
+	kmovw	%k3,%k5
+	kmovw	%k5,%r25d
+	kmovw	%k5,0x123(%r31,%rax,4)
+	kmovw	%r25d,%k5
+	kmovw	0x123(%r31,%rax,4),%k5
+	ldtilecfg	0x123(%r31,%rax,4)
+	movbe	%r18w,%ax
+	movbe	%r18w,0x123(%r16,%rax,4)
+	movbe	%r18w,0x123(%r31,%rax,4)
+	movbe	%r25d,%edx
+	movbe	%r25d,0x123(%r16,%rax,4)
+	movbe	%r31,%r15
+	movbe	%r31,0x123(%r16,%rax,4)
+	movbe	%r31,0x123(%r31,%rax,4)
+	movbe	0x123(%r16,%rax,4),%r31
+	movbe	0x123(%r31,%rax,4),%r18w
+	movbe	0x123(%r31,%rax,4),%r25d
+	movdir64b	0x123(%r31d,%eax,4),%r25d
+	movdir64b	0x123(%r31,%rax,4),%r31
+	movdiri	%r25d,0x123(%r31,%rax,4)
+	movdiri	%r31,0x123(%r31,%rax,4)
+	pdep	%r25d,%edx,%r10d
+	pdep	%r31,%r15,%r11
+	pdep	0x123(%r31,%rax,4),%r25d,%edx
+	pdep	0x123(%r31,%rax,4),%r31,%r15
+	pext	%r25d,%edx,%r10d
+	pext	%r31,%r15,%r11
+	pext	0x123(%r31,%rax,4),%r25d,%edx
+	pext	0x123(%r31,%rax,4),%r31,%r15
+	sha1msg1	%xmm23,%xmm22
+	sha1msg1	0x123(%r31,%rax,4),%xmm22
+	sha1msg2	%xmm23,%xmm22
+	sha1msg2	0x123(%r31,%rax,4),%xmm22
+	sha1nexte	%xmm23,%xmm22
+	sha1nexte	0x123(%r31,%rax,4),%xmm22
+	sha1rnds4	$0x7b,%xmm23,%xmm22
+	sha1rnds4	$0x7b,0x123(%r31,%rax,4),%xmm22
+	sha256msg1	%xmm23,%xmm22
+	sha256msg1	0x123(%r31,%rax,4),%xmm22
+	sha256msg2	%xmm23,%xmm22
+	sha256msg2	0x123(%r31,%rax,4),%xmm22
+	sha256rnds2	0x123(%r31,%rax,4),%xmm12
+	shlx	%r25d,%edx,%r10d
+	shlx	%r25d,0x123(%r31,%rax,4),%edx
+	shlx	%r31,%r15,%r11
+	shlx	%r31,0x123(%r31,%rax,4),%r15
+	shrx	%r25d,%edx,%r10d
+	shrx	%r25d,0x123(%r31,%rax,4),%edx
+	shrx	%r31,%r15,%r11
+	shrx	%r31,0x123(%r31,%rax,4),%r15
+	sttilecfg	0x123(%r31,%rax,4)
+	tileloadd	0x123(%r31,%rax,4),%tmm6
+	tileloaddt1	0x123(%r31,%rax,4),%tmm6
+	tilestored	%tmm6,0x123(%r31,%rax,4)
+	wrssd	%r25d,0x123(%r31,%rax,4)
+	wrssq	%r31,0x123(%r31,%rax,4)
+	wrussd	%r25d,0x123(%r31,%rax,4)
+	wrussq	%r31,0x123(%r31,%rax,4)
+
+.intel_syntax noprefix
+	aadd	DWORD PTR [r31+rax*4+0x123],r25d
+	aadd	QWORD PTR [r31+rax*4+0x123],r31
+	aand	DWORD PTR [r31+rax*4+0x123],r25d
+	aand	QWORD PTR [r31+rax*4+0x123],r31
+	aesdec128kl	xmm22,[r31+rax*4+0x123]
+	aesdec256kl	xmm22,[r31+rax*4+0x123]
+	aesdecwide128kl	[r31+rax*4+0x123]
+	aesdecwide256kl	[r31+rax*4+0x123]
+	aesenc128kl	xmm22,[r31+rax*4+0x123]
+	aesenc256kl	xmm22,[r31+rax*4+0x123]
+	aesencwide128kl	[r31+rax*4+0x123]
+	aesencwide256kl	[r31+rax*4+0x123]
+	aor	DWORD PTR [r31+rax*4+0x123],r25d
+	aor	QWORD PTR [r31+rax*4+0x123],r31
+	axor	DWORD PTR [r31+rax*4+0x123],r25d
+	axor	QWORD PTR [r31+rax*4+0x123],r31
+	bextr	r10d,edx,r25d
+	bextr	edx,DWORD PTR [r31+rax*4+0x123],r25d
+	bextr	r11,r15,r31
+	bextr	r15,QWORD PTR [r31+rax*4+0x123],r31
+	blsi	edx,r25d
+	blsi	r15,r31
+	blsi	r25d,DWORD PTR [r31+rax*4+0x123]
+	blsi	r31,QWORD PTR [r31+rax*4+0x123]
+	blsmsk	edx,r25d
+	blsmsk	r15,r31
+	blsmsk	r25d,DWORD PTR [r31+rax*4+0x123]
+	blsmsk	r31,QWORD PTR [r31+rax*4+0x123]
+	blsr	edx,r25d
+	blsr	r15,r31
+	blsr	r25d,DWORD PTR [r31+rax*4+0x123]
+	blsr	r31,QWORD PTR [r31+rax*4+0x123]
+	bzhi	r10d,edx,r25d
+	bzhi	edx,DWORD PTR [r31+rax*4+0x123],r25d
+	bzhi	r11,r15,r31
+	bzhi	r15,QWORD PTR [r31+rax*4+0x123],r31
+	cmpbexadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpbexadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpbxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpbxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmplxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmplxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpnbexadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpnbexadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpnbxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpnbxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpnlexadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpnlexadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpnlxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpnlxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpnoxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpnoxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpnpxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpnpxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpnsxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpnsxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpnzxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpnzxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpoxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpoxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmppxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmppxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpsxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpsxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	cmpzxadd	DWORD PTR [r31+rax*4+0x123],edx,r25d
+	cmpzxadd	QWORD PTR [r31+rax*4+0x123],r15,r31
+	encodekey128	edx,r25d
+	encodekey256	edx,r25d
+	enqcmd	r25d,[r31d+eax*4+0x123]
+	enqcmd	r31,[r31+rax*4+0x123]
+	enqcmds	r25d,[r31d+eax*4+0x123]
+	enqcmds	r31,[r31+rax*4+0x123]
+	invept	r31,OWORD PTR [r31+rax*4+0x123]
+	invpcid	r31,[r31+rax*4+0x123]
+	invvpid	r31,OWORD PTR [r31+rax*4+0x123]
+	crc32	r22,r31
+	crc32	r22,QWORD PTR [r31]
+	crc32	r17,r19b
+	crc32	r21d,r19b
+	crc32	ebx,BYTE PTR [r19]
+	crc32	r23d,r31d
+	crc32	r23d,DWORD PTR [r31]
+	crc32	r21d,r31w
+	crc32	r21d,WORD PTR [r31]
+	crc32	r18,rax
+	kmovb	k5,k3
+	kmovb	r25d,k5
+	kmovb	BYTE PTR [r31+rax*4+0x123],k5
+	kmovb	k5,r25d
+	kmovb	k5,BYTE PTR [r31+rax*4+0x123]
+	kmovd	k5,k3
+	kmovd	r25d,k5
+	kmovd	DWORD PTR [r31+rax*4+0x123],k5
+	kmovd	k5,r25d
+	kmovd	k5,DWORD PTR [r31+rax*4+0x123]
+	kmovq	k5,k3
+	kmovq	r31,k5
+	kmovq	QWORD PTR [r31+rax*4+0x123],k5
+	kmovq	k5,r31
+	kmovq	k5,QWORD PTR [r31+rax*4+0x123]
+	kmovw	k5,k3
+	kmovw	r25d,k5
+	kmovw	WORD PTR [r31+rax*4+0x123],k5
+	kmovw	k5,r25d
+	kmovw	k5,WORD PTR [r31+rax*4+0x123]
+	ldtilecfg	[r31+rax*4+0x123]
+	movbe	ax,r18w
+	movbe	WORD PTR [r16+rax*4+0x123],r18w
+	movbe	WORD PTR [r31+rax*4+0x123],r18w
+	movbe	edx,r25d
+	movbe	DWORD PTR [r16+rax*4+0x123],r25d
+	movbe	r15,r31
+	movbe	QWORD PTR [r16+rax*4+0x123],r31
+	movbe	QWORD PTR [r31+rax*4+0x123],r31
+	movbe	r31,QWORD PTR [r16+rax*4+0x123]
+	movbe	r18w,WORD PTR [r31+rax*4+0x123]
+	movbe	r25d,DWORD PTR [r31+rax*4+0x123]
+	movdir64b	r25d,[r31d+eax*4+0x123]
+	movdir64b	r31,[r31+rax*4+0x123]
+	movdiri	DWORD PTR [r31+rax*4+0x123],r25d
+	movdiri	QWORD PTR [r31+rax*4+0x123],r31
+	pdep	r10d,edx,r25d
+	pdep	r11,r15,r31
+	pdep	edx,r25d,DWORD PTR [r31+rax*4+0x123]
+	pdep	r15,r31,QWORD PTR [r31+rax*4+0x123]
+	pext	r10d,edx,r25d
+	pext	r11,r15,r31
+	pext	edx,r25d,DWORD PTR [r31+rax*4+0x123]
+	pext	r15,r31,QWORD PTR [r31+rax*4+0x123]
+	sha1msg1	xmm22,xmm23
+	sha1msg1	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha1msg2	xmm22,xmm23
+	sha1msg2	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha1nexte	xmm22,xmm23
+	sha1nexte	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha1rnds4	xmm22,xmm23,0x7b
+	sha1rnds4	xmm22,XMMWORD PTR [r31+rax*4+0x123],0x7b
+	sha256msg1	xmm22,xmm23
+	sha256msg1	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha256msg2	xmm22,xmm23
+	sha256msg2	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha256rnds2	xmm12,XMMWORD PTR [r31+rax*4+0x123]
+	shlx	r10d,edx,r25d
+	shlx	edx,DWORD PTR [r31+rax*4+0x123],r25d
+	shlx	r11,r15,r31
+	shlx	r15,QWORD PTR [r31+rax*4+0x123],r31
+	shrx	r10d,edx,r25d
+	shrx	edx,DWORD PTR [r31+rax*4+0x123],r25d
+	shrx	r11,r15,r31
+	shrx	r15,QWORD PTR [r31+rax*4+0x123],r31
+	sttilecfg	[r31+rax*4+0x123]
+	tileloadd	tmm6,[r31+rax*4+0x123]
+	tileloaddt1	tmm6,[r31+rax*4+0x123]
+	tilestored	[r31+rax*4+0x123],tmm6
+	wrssd	DWORD PTR [r31+rax*4+0x123],r25d
+	wrssq	QWORD PTR [r31+rax*4+0x123],r31
+	wrussd	DWORD PTR [r31+rax*4+0x123],r25d
+	wrussq	QWORD PTR [r31+rax*4+0x123],r31
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index a698a467c53..dc1fa8dddb9 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -360,8 +360,13 @@ run_dump_test "x86-64-avx512f-rcigrne-intel"
 run_dump_test "x86-64-avx512f-rcigrne"
 run_dump_test "x86-64-avx512f-rcigru-intel"
 run_dump_test "x86-64-avx512f-rcigru"
-run_list_test "x86-64-apx-egpr-inval" "-al"
+run_list_test "x86-64-apx-egpr-inval"
+run_dump_test "x86-64-apx-evex-promoted-bad"
+run_list_test "x86-64-apx-egpr-promote-inval" "-al"
 run_dump_test "x86-64-apx-rex2"
+run_dump_test "x86-64-apx-evex-promoted"
+run_dump_test "x86-64-apx-evex-promoted-intel"
+run_dump_test "x86-64-apx-evex-egpr"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
-- 
2.25.1



* [PATCH 5/8] Support APX NDD
  2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
                   ` (3 preceding siblings ...)
  2023-11-02 11:29 ` [PATCH 4/8] Add tests for " Cui, Lili
@ 2023-11-02 11:29 ` Cui, Lili
  2023-11-08 10:39   ` Jan Beulich
                     ` (2 more replies)
  2023-11-02 11:29 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-02 11:29 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, ccoutant, konglin1

From: konglin1 <lingling.kong@intel.com>

opcodes/ChangeLog:

	* opcodes/i386-dis-evex-prefix.h: Add NDD decode for adox/adcx.
	* opcodes/i386-dis-evex-reg.h: Handle REG_EVEX_MAP4_80,
	REG_EVEX_MAP4_81, REG_EVEX_MAP4_83, REG_EVEX_MAP4_F6,
	REG_EVEX_MAP4_F7, REG_EVEX_MAP4_FE, REG_EVEX_MAP4_FF.
	* opcodes/i386-dis-evex.h: Add NDD insns.
	* opcodes/i386-dis.c (VexGb): Add new define.
	(VexGv): Ditto.
	(get_valid_dis386): Adjust for NDD decoding.
	(print_insn): Ditto.
	(print_register): Ditto.
	(intel_operand_size): Ditto.
	(OP_E_memory): Ditto.
	(OP_VEX): Ditto.
	* opcodes/i386-opc.h (VexVVVV_SRC): New.
	(VexVVVV_DST): Ditto.
	* opcodes/i386-opc.tbl: Add APX NDD instructions and adjust VexVVVV.
	* opcodes/i386-tbl.h: Regenerated.

gas/ChangeLog:

	* gas/config/tc-i386.c (is_any_apx_evex_encoding): Add legacy
	insns promoted to SPACE_EVEXMAP4.
	(md_assemble): Adjust for NDD encoding.
	(process_operands): Ditto.
	(build_modrm_byte): Ditto.
	(operand_size_match): Support APX NDD insns with 3 operands.
	(match_template): Support swapping the first two operands of
	APX NDD insns.
	reg_table
	* testsuite/gas/i386/x86-64.exp: Add x86-64-apx-ndd.
	* testsuite/gas/i386/x86-64-apx-ndd.d: New test.
	* testsuite/gas/i386/x86-64-apx-ndd.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos.d: Add test.
	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.
---
 gas/config/tc-i386.c                          |  111 +-
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |    4 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |    5 +-
 gas/testsuite/gas/i386/x86-64-apx-ndd.d       |  161 +++
 gas/testsuite/gas/i386/x86-64-apx-ndd.s       |  154 +++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |   42 +
 gas/testsuite/gas/i386/x86-64-pseudos.s       |   43 +
 gas/testsuite/gas/i386/x86-64.exp             |    1 +
 opcodes/i386-dis-evex-prefix.h                |    4 -
 opcodes/i386-dis-evex-reg.h                   |   54 +
 opcodes/i386-dis-evex.h                       |  126 +-
 opcodes/i386-dis.c                            |  145 +-
 opcodes/i386-gen.c                            |    1 +
 opcodes/i386-opc.h                            |    9 +-
 opcodes/i386-opc.tbl                          | 1231 +++++++++--------
 15 files changed, 1354 insertions(+), 737 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 398909a6a30..5b925505435 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -2317,8 +2317,10 @@ operand_size_match (const insn_template *t)
       unsigned int given = i.operands - j - 1;
 
       /* For FMA4 and XOP insns VEX.W controls just the first two
-	 register operands.  */
-      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP))
+	 register operands.  APX_F insns likewise swap just the two source
+	 operands, with the 3rd one being the destination.  */
+      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP)
+	  || is_cpu (t, CpuAPX_F))
 	given = j < 2 ? 1 - j : j;
 
       if (t->operand_types[j].bitfield.class == Reg
@@ -3959,6 +3961,7 @@ static INLINE bool
 is_any_apx_evex_encoding (void)
 {
   return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4 
+    || i.rex2_encoding
     || (i.vex.register_specifier
 	&& i.vex.register_specifier->reg_flags & RegRex2);
 }
@@ -7481,26 +7484,33 @@ match_template (char mnem_suffix)
 	  overlap1 = operand_type_and (operand_types[0], operand_types[1]);
 	  if (t->opcode_modifier.d && i.reg_operands == i.operands
 	      && !operand_type_all_zero (&overlap1))
-	    switch (i.dir_encoding)
-	      {
-	      case dir_encoding_load:
-		if (operand_type_check (operand_types[i.operands - 1], anymem)
-		    || t->opcode_modifier.regmem)
-		  goto check_reverse;
-		break;
+	    {
 
-	      case dir_encoding_store:
-		if (!operand_type_check (operand_types[i.operands - 1], anymem)
-		    && !t->opcode_modifier.regmem)
-		  goto check_reverse;
-		break;
+	      int MemOperand = i.operands - 1 -
+		(t->opcode_space == SPACE_EVEXMAP4
+		 && t->opcode_modifier.vexvvvv);
+
+	      switch (i.dir_encoding)
+		{
+		case dir_encoding_load:
+		  if (operand_type_check (operand_types[MemOperand], anymem)
+		      || t->opcode_modifier.regmem)
+		    goto check_reverse;
+		  break;
 
-	      case dir_encoding_swap:
-		goto check_reverse;
+		case dir_encoding_store:
+		  if (!operand_type_check (operand_types[MemOperand], anymem)
+		      && !t->opcode_modifier.regmem)
+		    goto check_reverse;
+		  break;
 
-	      case dir_encoding_default:
-		break;
-	      }
+		case dir_encoding_swap:
+		  goto check_reverse;
+
+		case dir_encoding_default:
+		  break;
+		}
+	    }
 	  /* If we want store form, we skip the current load.  */
 	  if ((i.dir_encoding == dir_encoding_store
 	       || i.dir_encoding == dir_encoding_swap)
@@ -7530,11 +7540,13 @@ match_template (char mnem_suffix)
 		continue;
 	      /* Try reversing direction of operands.  */
 	      j = is_cpu (t, CpuFMA4)
-		  || is_cpu (t, CpuXOP) ? 1 : i.operands - 1;
+		  || is_cpu (t, CpuXOP)
+		  || is_cpu (t, CpuAPX_F) ? 1 : i.operands - 1;
 	      overlap0 = operand_type_and (i.types[0], operand_types[j]);
 	      overlap1 = operand_type_and (i.types[j], operand_types[0]);
 	      overlap2 = operand_type_and (i.types[1], operand_types[1]);
-	      gas_assert (t->operands != 3 || !check_register);
+	      gas_assert (t->operands != 3 || !check_register
+			  || is_cpu (t, CpuAPX_F));
 	      if (!operand_type_match (overlap0, i.types[0])
 		  || !operand_type_match (overlap1, i.types[j])
 		  || (t->operands == 3
@@ -7569,6 +7581,11 @@ match_template (char mnem_suffix)
 		  found_reverse_match = Opcode_VexW;
 		  goto check_operands_345;
 		}
+	      else if (is_cpu (t, CpuAPX_F) && i.operands == 3)
+		{
+		  found_reverse_match = Opcode_D;
+		  goto check_operands_345;
+		}
 	      else if (t->opcode_space != SPACE_BASE
 		       && (t->opcode_space != SPACE_0F
 			   /* MOV to/from CR/DR/TR, as an exception, follow
@@ -7742,6 +7759,9 @@ match_template (char mnem_suffix)
 
       i.tm.base_opcode ^= found_reverse_match;
 
+      if (i.tm.opcode_space == SPACE_EVEXMAP4)
+	goto swap_first_2;
+
       /* Certain SIMD insns have their load forms specified in the opcode
 	 table, and hence we need to _set_ RegMem instead of clearing it.
 	 We need to avoid setting the bit though on insns like KMOVW.  */
@@ -7761,6 +7781,7 @@ match_template (char mnem_suffix)
 	 flipping VEX.W.  */
       i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
 
+    swap_first_2:
       j = i.tm.operand_types[0].bitfield.imm8;
       i.tm.operand_types[j] = operand_types[j + 1];
       i.tm.operand_types[j + 1] = operand_types[j];
@@ -8588,11 +8609,10 @@ process_operands (void)
   const reg_entry *default_seg = NULL;
 
   /* We only need to check those implicit registers for instructions
-     with 3 operands or less.  */
-  if (i.operands <= 3)
-    for (unsigned int j = 0; j < i.operands; j++)
-      if (i.types[j].bitfield.instance != InstanceNone)
-	i.reg_operands--;
+     with 4 operands or less.  */
+  for (unsigned int j = 0; j < i.operands; j++)
+    if (i.types[j].bitfield.instance != InstanceNone)
+      i.reg_operands--;
 
   if (i.tm.opcode_modifier.sse2avx)
     {
@@ -8946,26 +8966,35 @@ build_modrm_byte (void)
 				     || i.vec_encoding == vex_encoding_evex));
     }
 
-  for (v = source + 1; v < dest; ++v)
-    if (v != reg_slot)
-      break;
-  if (v >= dest)
-    v = ~0;
-  if (i.tm.extension_opcode != None)
+  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
     {
-      if (dest != source)
-	v = dest;
-      dest = ~0;
+      v = dest;
+      dest--;
     }
-  gas_assert (source < dest);
-  if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
-      && source != op)
+  else if (i.tm.opcode_modifier.vexvvvv == VexVVVV_SRC)
     {
-      unsigned int tmp = source;
+      v = source + 1;
+      for (v = source + 1; v < dest; ++v)
+	if (v != reg_slot)
+	  break;
+      if (i.tm.extension_opcode != None)
+	{
+	  if (dest != source)
+	    v = dest;
+	  dest = ~0;
+	}
+      gas_assert (source < dest);
+      if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
+	  && source != op)
+	{
+	  unsigned int tmp = source;
 
-      source = v;
-      v = tmp;
+	  source = v;
+	  v = tmp;
+	}
     }
+  else
+    v = ~0;
 
   if (v < MAX_OPERANDS)
     {
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
index ad5b2e3cb5c..9060b697c0d 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
@@ -28,4 +28,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:[ 	]+62 4c 7f[ 	]+\(bad\)
 [ 	]*[a-f0-9]+:[ 	]+28 f8[ 	]+sub    %bh,%al
 [ 	]*[a-f0-9]+:[ 	]+bc 87 23 01 00[ 	]+mov    \$0x12387,%esp
+[ 	]*[a-f0-9]+:[ 	]+00 ff[ 	]+add    %bh,%bh
+[ 	]*[a-f0-9]+:[ 	]+62 f4 ec[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+08 ff[ 	]+or     %bh,%bh
+[ 	]*[a-f0-9]+:[ 	]+c0[ 	]+.byte 0xc0
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
index 9bb06d9f494..d4f4cb72e6e 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
@@ -10,7 +10,7 @@ _start:
         .byte 0x62, 0xfc, 0x7f, 0x08, 0x60, 0xc2
         .byte 0xff, 0xff
         #VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
-	#(illegal value).
+        #(illegal value).
         .byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
         .byte 0xff
         #EVEX_MAP4 movbe %r18w,%ax set EVEX.mm == b01 (illegal value).
@@ -27,3 +27,6 @@ _start:
         .byte 0xff
         #EVEX from VEX enqcmd 0x123(%r31,%rax,4),%r31 EVEX.P[23:22] == 1 (illegal value).
         .byte 0x62, 0x4c, 0x7f, 0x28, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00
+        .byte 0xff
+        #{evex} inc %rax EVEX.vvvv' > 0 (illegal value).
+        .byte 0x62, 0xf4, 0xec, 0x08, 0xff, 0xc0
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd.d b/gas/testsuite/gas/i386/x86-64-apx-ndd.d
new file mode 100644
index 00000000000..c2cf0825e11
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.d
@@ -0,0 +1,161 @@
+#as:
+#objdump: -dw
+#name: x86-64 APX NDD instructions with evex prefix encoding
+#source: x86-64-apx-ndd.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 f4 e4 18 ff c0\s+inc    %rax,%rbx
+\s*[a-f0-9]+:\s*62 dc bc 18 ff c7\s+inc    %r31,%r8
+\s*[a-f0-9]+:\s*62 dc fc 10 ff c7\s+inc    %r31,%r16
+\s*[a-f0-9]+:\s*62 44 7c 10 00 f8\s+add    %r31b,%r8b,%r16b
+\s*[a-f0-9]+:\s*62 44 7c 10 00 f8\s+add    %r31b,%r8b,%r16b
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8\s+add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8\s+add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 44 7c 10 01 f8\s+add    %r31d,%r8d,%r16d
+\s*[a-f0-9]+:\s*62 44 7c 10 01 f8\s+add    %r31d,%r8d,%r16d
+\s*[a-f0-9]+:\s*62 44 7d 10 01 f8\s+add    %r31w,%r8w,%r16w
+\s*[a-f0-9]+:\s*62 44 7d 10 01 f8\s+add    %r31w,%r8w,%r16w
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8\s+add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 5c fc 10 03 c7\s+add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 44 fc 10 01 38\s+add    %r31,\(%r8\),%r16
+\s*[a-f0-9]+:\s*62 5c fc 10 03 07\s+add    \(%r31\),%r8,%r16
+\s*[a-f0-9]+:\s*62 5c f8 10 03 84 07 90 90 00 00\s+add\s+0x9090\(%r31,%r16,1\),%r8,%r16
+\s*[a-f0-9]+:\s*62 44 f8 10 01 3c c0\s+add    %r31,\(%r8,%r16,8\),%r16
+\s*[a-f0-9]+:\s*62 d4 74 10 80 c5 34\s+add    \$0x34,%r13b,%r17b
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 04 83 11\s+addl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 c0 34 12\s+add    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 d4 fc 10 81 c7 33 44 34 12\s+add    \$0x12344433,%r15,%r16
+\s*[a-f0-9]+:\s*62 d4 fc 10 81 04 8f 33 44 34 12\s+addq   \$0x12344433,\(%r15,%rcx,4\),%r16
+\s*[a-f0-9]+:\s*62 f4 bc 18 81 c0 11 22 33 f4\s+add    \$0xfffffffff4332211,%rax,%r8
+\s*[a-f0-9]+:\s*62 f4 f4 10 ff c8    	dec    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 fe 0c 27 	decb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 f4 f4 10 f7 d0    	not    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 f6 14 27 	notb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 f4 f4 10 f7 d8    	neg    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 f6 1c 27 	negb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 7c 6c 10 28 f9    	sub    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 2a 04 07 	sub    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 2b 04 07 	sub    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 2c 83 11 	subl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 e8 34 12 	sub    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 18 f9    	sbb    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 1a 04 07 	sbb    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 1b 04 07 	sbb    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 1c 83 11 	sbbl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 d8 34 12 	sbb    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 10 f9    	adc    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 12 04 07 	adc    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 13 04 07 	adc    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 14 83 11 	adcl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 d0 34 12 	adc    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 08 f9    	or     %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 0a 04 07 	or     \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 0b 04 07 	or     \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 0c 83 11 	orl    \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 c8 34 12 	or     \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 30 f9    	xor    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 32 04 07 	xor    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 33 04 07 	xor    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 34 83 11 	xorl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 f0 34 12 	xor    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 20 f9    	and    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 22 04 07 	and    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 23 04 07 	and    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 24 83 11 	andl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 e0 34 12 	and    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 08    	rorb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 cc 02 	ror    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 08 02 	rorl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 08    	rorw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 c8    	ror    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 0c 83 	rorw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 00    	rolb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 c4 02 	rol    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 00 02 	roll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 00    	rolw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 c0    	rol    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 04 83 	rolw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 18    	rcrb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 dc 02 	rcr    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 18 02 	rcrl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 18    	rcrw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 d8    	rcr    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 1c 83 	rcrw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 10    	rclb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 d4 02 	rcl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 10 02 	rcll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 10    	rclw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 d0    	rcl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 14 83 	rclw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 20    	shlb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 e4 02 	shl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 20 02 	shll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 20    	shlw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e0    	shl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 24 83 	shlw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 38    	sarb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 fc 02 	sar    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 38 02 	sarl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 38    	sarw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 f8    	sar    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 3c 83 	sarw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 20    	shlb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 e4 02 	shl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 20 02 	shll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 20    	shlw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e0    	shl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 24 83 	shlw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 28    	shrb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 ec 02 	shr    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 28 02 	shrl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 28    	shrw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e8    	shr    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 2c 83 	shrw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 74 84 10 24 20 01 	shld   \$0x1,%r12,\(%rax\),%r31
+\s*[a-f0-9]+:\s*62 54 05 10 24 c4 02 	shld   \$0x2,%r8w,%r12w,%r31w
+\s*[a-f0-9]+:\s*62 74 04 10 24 38 02 	shld   \$0x2,%r15d,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 74 05 10 a5 08    	shld   %cl,%r9w,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 7c bc 18 a5 e0\s+shld   %cl,%r12,%r16,%r8
+\s*[a-f0-9]+:\s*62 7c 05 10 a5 2c 83\s+shld   %cl,%r13w,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 74 84 10 2c 20 01 	shrd   \$0x1,%r12,\(%rax\),%r31
+\s*[a-f0-9]+:\s*62 54 05 10 2c c4 02 	shrd   \$0x2,%r8w,%r12w,%r31w
+\s*[a-f0-9]+:\s*62 74 04 10 2c 38 02 	shrd   \$0x2,%r15d,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 74 05 10 ad 08\s+shrd   %cl,%r9w,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 7c bc 18 ad e0\s+shrd   %cl,%r12,%r16,%r8
+\s*[a-f0-9]+:\s*62 7c 05 10 ad 2c 83\s+shrd   %cl,%r13w,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 54 6d 10 66 c7    	adcx   %r15d,%r8d,%r18d
+\s*[a-f0-9]+:\s*62 14 69 10 66 04 3f 	adcx   \(%r15,%r31,1\),%r8d,%r18d
+\s*[a-f0-9]+:\s*62 14 f9 08 66 04 3f 	adcx   \(%r15,%r31,1\),%r8
+\s*[a-f0-9]+:\s*62 54 6e 10 66 c7    	adox   %r15d,%r8d,%r18d
+\s*[a-f0-9]+:\s*62 14 6a 10 66 04 3f 	adox   \(%r15,%r31,1\),%r8d,%r18d
+\s*[a-f0-9]+:\s*62 14 fa 08 66 04 3f 	adox   \(%r15,%r31,1\),%r8
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 40 90 90 90 90 90 	cmovo  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 41 90 90 90 90 90 	cmovno -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 42 90 90 90 90 90 	cmovb  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 43 90 90 90 90 90 	cmovae -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 44 90 90 90 90 90 	cmove  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 45 90 90 90 90 90 	cmovne -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 46 90 90 90 90 90 	cmovbe -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 47 90 90 90 90 90 	cmova  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 48 90 90 90 90 90 	cmovs  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 49 90 90 90 90 90 	cmovns -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4c 90 90 90 90 90 	cmovl  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4d 90 90 90 90 90 	cmovge -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4e 90 90 90 90 90 	cmovle -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4f 90 90 90 90 90 	cmovg  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 af 90 09 09 09 00 	imul   0x90909\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*62 b4 b0 10 af 94 f8 09 09 00 00 	imul   0x909\(%rax,%r31,8\),%rdx,%r25
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd.s b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
new file mode 100644
index 00000000000..ee04d2f140a
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
@@ -0,0 +1,154 @@
+# Check 64-bit APX NDD instructions with EVEX prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+inc    %rax,%rbx
+inc    %r31,%r8
+inc    %r31,%r16
+add    %r31b,%r8b,%r16b
+addb    %r31b,%r8b,%r16b
+add    %r31,%r8,%r16
+addq    %r31,%r8,%r16
+add    %r31d,%r8d,%r16d
+addl    %r31d,%r8d,%r16d
+add    %r31w,%r8w,%r16w
+addw    %r31w,%r8w,%r16w
+{store} add    %r31,%r8,%r16
+{load}  add    %r31,%r8,%r16
+add    %r31,(%r8),%r16
+add    (%r31),%r8,%r16
+add    0x9090(%r31,%r16,1),%r8,%r16
+add    %r31,(%r8,%r16,8),%r16
+add    $0x34,%r13b,%r17b
+addl   $0x11,(%r19,%rax,4),%r20d
+add    $0x1234,%ax,%r30w
+add    $0x12344433,%r15,%r16
+addq   $0x12344433,(%r15,%rcx,4),%r16
+add    $0xfffffffff4332211,%rax,%r8
+dec    %rax,%r17
+decb   (%r31,%r12,1),%r8b
+not    %rax,%r17
+notb   (%r31,%r12,1),%r8b
+neg    %rax,%r17
+negb   (%r31,%r12,1),%r8b
+sub    %r15b,%r17b,%r18b
+sub    %r15d,(%r8),%r18d
+sub    (%r15,%rax,1),%r16b,%r8b
+sub    (%r15,%rax,1),%r16w,%r8w
+subl   $0x11,(%r19,%rax,4),%r20d
+sub    $0x1234,%ax,%r30w
+sbb    %r15b,%r17b,%r18b
+sbb    %r15d,(%r8),%r18d
+sbb    (%r15,%rax,1),%r16b,%r8b
+sbb    (%r15,%rax,1),%r16w,%r8w
+sbbl   $0x11,(%r19,%rax,4),%r20d
+sbb    $0x1234,%ax,%r30w
+adc    %r15b,%r17b,%r18b
+adc    %r15d,(%r8),%r18d
+adc    (%r15,%rax,1),%r16b,%r8b
+adc    (%r15,%rax,1),%r16w,%r8w
+adcl   $0x11,(%r19,%rax,4),%r20d
+adc    $0x1234,%ax,%r30w
+or     %r15b,%r17b,%r18b
+or     %r15d,(%r8),%r18d
+or     (%r15,%rax,1),%r16b,%r8b
+or     (%r15,%rax,1),%r16w,%r8w
+orl    $0x11,(%r19,%rax,4),%r20d
+or     $0x1234,%ax,%r30w
+xor    %r15b,%r17b,%r18b
+xor    %r15d,(%r8),%r18d
+xor    (%r15,%rax,1),%r16b,%r8b
+xor    (%r15,%rax,1),%r16w,%r8w
+xorl   $0x11,(%r19,%rax,4),%r20d
+xor    $0x1234,%ax,%r30w
+and    %r15b,%r17b,%r18b
+and    %r15d,(%r8),%r18d
+and    (%r15,%rax,1),%r16b,%r8b
+and    (%r15,%rax,1),%r16w,%r8w
+andl   $0x11,(%r19,%rax,4),%r20d
+and    $0x1234,%ax,%r30w
+rorb   (%rax),%r31b
+ror    $0x2,%r12b,%r31b
+rorl   $0x2,(%rax),%r31d
+rorw   (%rax),%r31w
+ror    %cl,%r16b,%r8b
+rorw   %cl,(%r19,%rax,4),%r31w
+rolb   (%rax),%r31b
+rol    $0x2,%r12b,%r31b
+roll   $0x2,(%rax),%r31d
+rolw   (%rax),%r31w
+rol    %cl,%r16b,%r8b
+rolw   %cl,(%r19,%rax,4),%r31w
+rcrb   (%rax),%r31b
+rcr    $0x2,%r12b,%r31b
+rcrl   $0x2,(%rax),%r31d
+rcrw   (%rax),%r31w
+rcr    %cl,%r16b,%r8b
+rcrw   %cl,(%r19,%rax,4),%r31w
+rclb   (%rax),%r31b
+rcl    $0x2,%r12b,%r31b
+rcll   $0x2,(%rax),%r31d
+rclw   (%rax),%r31w
+rcl    %cl,%r16b,%r8b
+rclw   %cl,(%r19,%rax,4),%r31w
+shlb   (%rax),%r31b
+shl    $0x2,%r12b,%r31b
+shll   $0x2,(%rax),%r31d
+shlw   (%rax),%r31w
+shl    %cl,%r16b,%r8b
+shlw   %cl,(%r19,%rax,4),%r31w
+sarb   (%rax),%r31b
+sar    $0x2,%r12b,%r31b
+sarl   $0x2,(%rax),%r31d
+sarw   (%rax),%r31w
+sar    %cl,%r16b,%r8b
+sarw   %cl,(%r19,%rax,4),%r31w
+shlb   (%rax),%r31b
+shl    $0x2,%r12b,%r31b
+shll   $0x2,(%rax),%r31d
+shlw   (%rax),%r31w
+shl    %cl,%r16b,%r8b
+shlw   %cl,(%r19,%rax,4),%r31w
+shrb   (%rax),%r31b
+shr    $0x2,%r12b,%r31b
+shrl   $0x2,(%rax),%r31d
+shrw   (%rax),%r31w
+shr    %cl,%r16b,%r8b
+shrw   %cl,(%r19,%rax,4),%r31w
+shld   $0x1,%r12,(%rax),%r31
+shld   $0x2,%r8w,%r12w,%r31w
+shld   $0x2,%r15d,(%rax),%r31d
+shld   %cl,%r9w,(%rax),%r31w
+shld   %cl,%r12,%r16,%r8
+shld   %cl,%r13w,(%r19,%rax,4),%r31w
+shrd   $0x1,%r12,(%rax),%r31
+shrd   $0x2,%r8w,%r12w,%r31w
+shrd   $0x2,%r15d,(%rax),%r31d
+shrd   %cl,%r9w,(%rax),%r31w
+shrd   %cl,%r12,%r16,%r8
+shrd   %cl,%r13w,(%r19,%rax,4),%r31w
+adcx   %r15d,%r8d,%r18d
+adcx   (%r15,%r31,1),%r8d,%r18d
+adcx   (%r15,%r31,1),%r8
+adox   %r15d,%r8d,%r18d
+adox   (%r15,%r31,1),%r8d,%r18d
+adox   (%r15,%r31,1),%r8
+cmovo  0x90909090(%eax),%edx,%r8d
+cmovno 0x90909090(%eax),%edx,%r8d
+cmovb  0x90909090(%eax),%edx,%r8d
+cmovae 0x90909090(%eax),%edx,%r8d
+cmove  0x90909090(%eax),%edx,%r8d
+cmovne 0x90909090(%eax),%edx,%r8d
+cmovbe 0x90909090(%eax),%edx,%r8d
+cmova  0x90909090(%eax),%edx,%r8d
+cmovs  0x90909090(%eax),%edx,%r8d
+cmovns 0x90909090(%eax),%edx,%r8d
+cmovp  0x90909090(%eax),%edx,%r8d
+cmovnp 0x90909090(%eax),%edx,%r8d
+cmovl  0x90909090(%eax),%edx,%r8d
+cmovge 0x90909090(%eax),%edx,%r8d
+cmovle 0x90909090(%eax),%edx,%r8d
+cmovg  0x90909090(%eax),%edx,%r8d
+imul   0x90909(%eax),%edx,%r8d
+imul   0x909(%rax,%r31,8),%rdx,%r25
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.d b/gas/testsuite/gas/i386/x86-64-pseudos.d
index 708c22b5899..1d399ffa949 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.d
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.d
@@ -137,6 +137,48 @@ Disassembly of section .text:
  +[a-f0-9]+:	33 07                	xor    \(%rdi\),%eax
  +[a-f0-9]+:	31 07                	xor    %eax,\(%rdi\)
  +[a-f0-9]+:	33 07                	xor    \(%rdi\),%eax
+ +[a-f0-9]+:	62 44 fc 10 01 38    	add    %r31,\(%r8\),%r16
+ +[a-f0-9]+:	62 44 fc 10 03 38    	add    \(%r8\),%r31,%r16
+ +[a-f0-9]+:	62 44 fc 10 01 38    	add    %r31,\(%r8\),%r16
+ +[a-f0-9]+:	62 44 fc 10 03 38    	add    \(%r8\),%r31,%r16
+ +[a-f0-9]+:	62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 2b 38    	sub    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 2b 38    	sub    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 1b 38    	sbb    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 1b 38    	sbb    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 23 38    	and    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 23 38    	and    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 0b 38    	or     \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 0b 38    	or     \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 33 38    	xor    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 33 38    	xor    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 13 38    	adc    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 13 38    	adc    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 44 fc 10 01 f8    	add    %r31,%r8,%r16
+ +[a-f0-9]+:	62 5c fc 10 03 c7    	add    %r31,%r8,%r16
+ +[a-f0-9]+:	62 7c 6c 10 28 f9    	sub    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 2a cf    	sub    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 18 f9    	sbb    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 1a cf    	sbb    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 20 f9    	and    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 22 cf    	and    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 08 f9    	or     %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 0a cf    	or     %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 30 f9    	xor    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 32 cf    	xor    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 10 f9    	adc    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 12 cf    	adc    %r15b,%r17b,%r18b
  +[a-f0-9]+:	b0 12                	mov    \$0x12,%al
  +[a-f0-9]+:	b8 45 03 00 00       	mov    \$0x345,%eax
  +[a-f0-9]+:	b0 12                	mov    \$0x12,%al
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.s b/gas/testsuite/gas/i386/x86-64-pseudos.s
index 29a0c3368fc..e5b3a0d625d 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
@@ -134,6 +134,49 @@ _start:
 	{load} xor (%rdi), %eax
 	{store} xor %eax, (%rdi)
 	{store} xor (%rdi), %eax
+	{load}  add    %r31,(%r8),%r16
+	{load}	add    (%r8),%r31,%r16
+	{store} add    %r31,(%r8),%r16
+	{store}	add    (%r8),%r31,%r16
+	{load} 	sub    %r15d,(%r8),%r18d
+	{load}	sub    (%r8),%r15d,%r18d
+	{store} sub    %r15d,(%r8),%r18d
+	{store} sub    (%r8),%r15d,%r18d
+	{load} 	sbb    %r15d,(%r8),%r18d
+	{load}	sbb    (%r8),%r15d,%r18d
+	{store} sbb    %r15d,(%r8),%r18d
+	{store} sbb    (%r8),%r15d,%r18d
+	{load} 	and    %r15d,(%r8),%r18d
+	{load}	and    (%r8),%r15d,%r18d
+	{store} and    %r15d,(%r8),%r18d
+	{store} and    (%r8),%r15d,%r18d
+	{load} 	or     %r15d,(%r8),%r18d
+	{load}	or     (%r8),%r15d,%r18d
+	{store} or     %r15d,(%r8),%r18d
+	{store} or     (%r8),%r15d,%r18d
+	{load} 	xor    %r15d,(%r8),%r18d
+	{load}	xor    (%r8),%r15d,%r18d
+	{store} xor    %r15d,(%r8),%r18d
+	{store} xor    (%r8),%r15d,%r18d
+	{load} 	adc    %r15d,(%r8),%r18d
+	{load}	adc    (%r8),%r15d,%r18d
+	{store} adc    %r15d,(%r8),%r18d
+	{store} adc    (%r8),%r15d,%r18d
+
+	{store} add    %r31,%r8,%r16
+	{load}  add    %r31,%r8,%r16
+	{store} sub    %r15b,%r17b,%r18b
+	{load}	sub    %r15b,%r17b,%r18b
+	{store}	sbb    %r15b,%r17b,%r18b
+	{load}	sbb    %r15b,%r17b,%r18b
+	{store}	and    %r15b,%r17b,%r18b
+	{load}	and    %r15b,%r17b,%r18b
+	{store}	or     %r15b,%r17b,%r18b
+	{load}	or     %r15b,%r17b,%r18b
+	{store}	xor    %r15b,%r17b,%r18b
+	{load}	xor    %r15b,%r17b,%r18b
+	{store}	adc    %r15b,%r17b,%r18b
+	{load}	adc    %r15b,%r17b,%r18b
 
 	.irp m, mov, adc, add, and, cmp, or, sbb, sub, test, xor
 	\m	$0x12, %al
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index dc1fa8dddb9..07cb716d2a5 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -367,6 +367,7 @@ run_dump_test "x86-64-apx-rex2"
 run_dump_test "x86-64-apx-evex-promoted"
 run_dump_test "x86-64-apx-evex-promoted-intel"
 run_dump_test "x86-64-apx-evex-egpr"
+run_dump_test "x86-64-apx-ndd"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index e8f32324ade..09d8c10bdfd 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -338,10 +338,6 @@
     { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
     { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
   },
-  /* PREFIX_EVEX_MAP4_66 */
-  {
-    { "wrssK",	{ M, Gdq }, 0 },
-  },
   /* PREFIX_EVEX_MAP4_D8 */
   {
     { "sha1nexte", { XM, EXxmm }, 0 },
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index c3b4f083346..b75558c40ca 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -56,6 +56,36 @@
     { "blsmskS",	{ VexGdq, Edq }, 0 },
     { "blsiS",		{ VexGdq, Edq }, 0 },
   },
+  /* REG_EVEX_MAP4_80 */
+  {
+    { "addA",	{ VexGb, Eb, Ib }, 0 },
+    { "orA",	{ VexGb, Eb, Ib }, 0 },
+    { "adcA",	{ VexGb, Eb, Ib }, 0 },
+    { "sbbA",	{ VexGb, Eb, Ib }, 0 },
+    { "andA",	{ VexGb, Eb, Ib }, 0 },
+    { "subA",	{ VexGb, Eb, Ib }, 0 },
+    { "xorA",	{ VexGb, Eb, Ib }, 0 },
+  },
+  /* REG_EVEX_MAP4_81 */
+  {
+    { "addQ",	{ VexGv, Ev, Iv }, 0 },
+    { "orQ",	{ VexGv, Ev, Iv }, 0 },
+    { "adcQ",	{ VexGv, Ev, Iv }, 0 },
+    { "sbbQ",	{ VexGv, Ev, Iv }, 0 },
+    { "andQ",	{ VexGv, Ev, Iv }, 0 },
+    { "subQ",	{ VexGv, Ev, Iv }, 0 },
+    { "xorQ",	{ VexGv, Ev, Iv }, 0 },
+  },
+  /* REG_EVEX_MAP4_83 */
+  {
+    { "addQ",	{ VexGv, Ev, sIb }, 0 },
+    { "orQ",	{ VexGv, Ev, sIb }, 0 },
+    { "adcQ",	{ VexGv, Ev, sIb }, 0 },
+    { "sbbQ",	{ VexGv, Ev, sIb }, 0 },
+    { "andQ",	{ VexGv, Ev, sIb }, 0 },
+    { "subQ",	{ VexGv, Ev, sIb }, 0 },
+    { "xorQ",	{ VexGv, Ev, sIb }, 0 },
+  },
   /* REG_EVEX_MAP4_D8_PREFIX_1 */
   {
     { "aesencwide128kl",	{ M }, 0 },
@@ -63,3 +93,27 @@
     { "aesencwide256kl",	{ M }, 0 },
     { "aesdecwide256kl",	{ M }, 0 },
   },
+  /* REG_EVEX_MAP4_F6 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "notA",	{ VexGb, Eb }, 0 },
+    { "negA",	{ VexGb, Eb }, 0 },
+  },
+  /* REG_EVEX_MAP4_F7 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "notQ",	{ VexGv, Ev }, 0 },
+    { "negQ",	{ VexGv, Ev }, 0 },
+  },
+  /* REG_EVEX_MAP4_FE */
+  {
+    { "incA",	{ VexGb, Eb }, 0 },
+    { "decA",	{ VexGb, Eb }, 0 },
+  },
+  /* REG_EVEX_MAP4_FF */
+  {
+    { "incQ",	{ VexGv, Ev }, 0 },
+    { "decQ",	{ VexGv, Ev }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 65a2cbeaeb2..ef752f417c5 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -875,64 +875,64 @@ static const struct dis386 evex_table[][256] = {
   /* EVEX_MAP4_ */
   {
     /* 00 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "addB",		{ VexGb, Eb, Gb }, 0 },
+    { "addS",		{ VexGv, Ev, Gv }, 0 },
+    { "addB",		{ VexGb, Gb, EbS }, 0 },
+    { "addS",		{ VexGv, Gv, EvS }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 08 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "orB",		{ VexGb, Eb, Gb }, 0 },
+    { "orS",		{ VexGv, Ev, Gv }, 0 },
+    { "orB",		{ VexGb, Gb, EbS }, 0 },
+    { "orS",		{ VexGv, Gv, EvS }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 10 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "adcB",		{ VexGb, Eb, Gb }, 0 },
+    { "adcS",		{ VexGv, Ev, Gv }, 0 },
+    { "adcB",		{ VexGb, Gb, EbS }, 0 },
+    { "adcS",		{ VexGv, Gv, EvS }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 18 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "sbbB",		{ VexGb, Eb, Gb }, 0 },
+    { "sbbS",		{ VexGv, Ev, Gv }, 0 },
+    { "sbbB",		{ VexGb, Gb, EbS }, 0 },
+    { "sbbS",		{ VexGv, Gv, EvS }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 20 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "andB",		{ VexGb, Eb, Gb }, 0 },
+    { "andS",		{ VexGv, Ev, Gv }, 0 },
+    { "andB",		{ VexGb, Gb, EbS }, 0 },
+    { "andS",		{ VexGv, Gv, EvS }, 0 },
+    { "shldS",		{ VexGv, Ev, Gv, Ib }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 28 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "subB",		{ VexGb, Eb, Gb }, 0 },
+    { "subS",		{ VexGv, Ev, Gv }, 0 },
+    { "subB",		{ VexGb, Gb, EbS }, 0 },
+    { "subS",		{ VexGv, Gv, EvS }, 0 },
+    { "shrdS",		{ VexGv, Ev, Gv, Ib }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 30 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "xorB",		{ VexGb, Eb, Gb }, 0 },
+    { "xorS",		{ VexGv, Ev, Gv }, 0 },
+    { "xorB",		{ VexGb, Gb, EbS }, 0 },
+    { "xorS",		{ VexGv, Gv, EvS }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -947,23 +947,23 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 40 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "cmovoS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovnoS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovbS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovaeS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmoveS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovneS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovbeS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovaS",		{ VexGv, Gv, Ev }, 0 },
     /* 48 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "cmovsS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovnsS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovpS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovnpS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovlS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovgeS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovleS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovgS",		{ VexGv, Gv, Ev }, 0 },
     /* 50 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -989,7 +989,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { "wrussK",	{ M, Gdq }, PREFIX_DATA },
-    { PREFIX_TABLE (PREFIX_EVEX_MAP4_66) },
+    { PREFIX_TABLE (PREFIX_0F38F6) },
     { Bad_Opcode },
     /* 68 */
     { Bad_Opcode },
@@ -1019,10 +1019,10 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 80 */
+    { REG_TABLE (REG_EVEX_MAP4_80) },
+    { REG_TABLE (REG_EVEX_MAP4_81) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_83) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1060,7 +1060,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { "shldS",		{ VexGv, Ev, Gv, CL }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     /* A8 */
@@ -1069,9 +1069,9 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
+    { "shrdS",		{ VexGv, Ev, Gv, CL }, 0 },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "imulS",		{ VexGv, Gv, Ev }, 0 },
     /* B0 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1091,8 +1091,8 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* C0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_C0) },
+    { REG_TABLE (REG_C1) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1109,10 +1109,10 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* D0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_D0) },
+    { REG_TABLE (REG_D1) },
+    { REG_TABLE (REG_D2) },
+    { REG_TABLE (REG_D3) },
     { "sha1rnds4", { XM, EXxmm, Ib }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1151,8 +1151,8 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_F6) },
+    { REG_TABLE (REG_EVEX_MAP4_F7) },
     /* F8 */
     { PREFIX_TABLE (PREFIX_EVEX_MAP4_F8) },
     { MOD_TABLE (MOD_EVEX_MAP4_F9) },
@@ -1160,8 +1160,8 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { PREFIX_TABLE (PREFIX_EVEX_MAP4_FC) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_FE) },
+    { REG_TABLE (REG_EVEX_MAP4_FF) },
   },
   /* EVEX_MAP5_ */
   {
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index ef431087ba5..0de3959cf80 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -573,6 +573,8 @@ fetch_error (const instr_info *ins)
 #define VexGatherD { OP_VEX, vex_vsib_d_w_dq_mode }
 #define VexGatherQ { OP_VEX, vex_vsib_q_w_dq_mode }
 #define VexGdq { OP_VEX, dq_mode }
+#define VexGb { OP_VEX, b_mode }
+#define VexGv { OP_VEX, v_mode }
 #define VexTmm { OP_VEX, tmm_mode }
 #define XMVexI4 { OP_REG_VexI4, x_mode }
 #define XMVexScalarI4 { OP_REG_VexI4, scalar_mode }
@@ -885,7 +887,14 @@ enum
   REG_EVEX_0F38C6_L_2,
   REG_EVEX_0F38C7_L_2,
   REG_EVEX_0F38F3_L_0,
-  REG_EVEX_MAP4_D8_PREFIX_1
+  REG_EVEX_MAP4_80,
+  REG_EVEX_MAP4_81,
+  REG_EVEX_MAP4_83,
+  REG_EVEX_MAP4_D8_PREFIX_1,
+  REG_EVEX_MAP4_F6,
+  REG_EVEX_MAP4_F7,
+  REG_EVEX_MAP4_FE,
+  REG_EVEX_MAP4_FF,
 };
 
 enum
@@ -1171,7 +1180,6 @@ enum
   PREFIX_EVEX_0F3A67,
   PREFIX_EVEX_0F3AC2,
 
-  PREFIX_EVEX_MAP4_66,
   PREFIX_EVEX_MAP4_D8,
   PREFIX_EVEX_MAP4_DA,
   PREFIX_EVEX_MAP4_DB,
@@ -2616,25 +2624,25 @@ static const struct dis386 reg_table[][8] = {
   },
   /* REG_C0 */
   {
-    { "rolA",	{ Eb, Ib }, 0 },
-    { "rorA",	{ Eb, Ib }, 0 },
-    { "rclA",	{ Eb, Ib }, 0 },
-    { "rcrA",	{ Eb, Ib }, 0 },
-    { "shlA",	{ Eb, Ib }, 0 },
-    { "shrA",	{ Eb, Ib }, 0 },
-    { "shlA",	{ Eb, Ib }, 0 },
-    { "sarA",	{ Eb, Ib }, 0 },
+    { "rolA",	{ VexGb, Eb, Ib }, 0 },
+    { "rorA",	{ VexGb, Eb, Ib }, 0 },
+    { "rclA",	{ VexGb, Eb, Ib }, 0 },
+    { "rcrA",	{ VexGb, Eb, Ib }, 0 },
+    { "shlA",	{ VexGb, Eb, Ib }, 0 },
+    { "shrA",	{ VexGb, Eb, Ib }, 0 },
+    { "shlA",	{ VexGb, Eb, Ib }, 0 },
+    { "sarA",	{ VexGb, Eb, Ib }, 0 },
   },
   /* REG_C1 */
   {
-    { "rolQ",	{ Ev, Ib }, 0 },
-    { "rorQ",	{ Ev, Ib }, 0 },
-    { "rclQ",	{ Ev, Ib }, 0 },
-    { "rcrQ",	{ Ev, Ib }, 0 },
-    { "shlQ",	{ Ev, Ib }, 0 },
-    { "shrQ",	{ Ev, Ib }, 0 },
-    { "shlQ",	{ Ev, Ib }, 0 },
-    { "sarQ",	{ Ev, Ib }, 0 },
+    { "rolQ",	{ VexGv, Ev, Ib }, 0 },
+    { "rorQ",	{ VexGv, Ev, Ib }, 0 },
+    { "rclQ",	{ VexGv, Ev, Ib }, 0 },
+    { "rcrQ",	{ VexGv, Ev, Ib }, 0 },
+    { "shlQ",	{ VexGv, Ev, Ib }, 0 },
+    { "shrQ",	{ VexGv, Ev, Ib }, 0 },
+    { "shlQ",	{ VexGv, Ev, Ib }, 0 },
+    { "sarQ",	{ VexGv, Ev, Ib }, 0 },
   },
   /* REG_C6 */
   {
@@ -2660,47 +2668,47 @@ static const struct dis386 reg_table[][8] = {
   },
   /* REG_D0 */
   {
-    { "rolA",	{ Eb, I1 }, 0 },
-    { "rorA",	{ Eb, I1 }, 0 },
-    { "rclA",	{ Eb, I1 }, 0 },
-    { "rcrA",	{ Eb, I1 }, 0 },
-    { "shlA",	{ Eb, I1 }, 0 },
-    { "shrA",	{ Eb, I1 }, 0 },
-    { "shlA",	{ Eb, I1 }, 0 },
-    { "sarA",	{ Eb, I1 }, 0 },
+    { "rolA",	{ VexGb, Eb, I1 }, 0 },
+    { "rorA",	{ VexGb, Eb, I1 }, 0 },
+    { "rclA",	{ VexGb, Eb, I1 }, 0 },
+    { "rcrA",	{ VexGb, Eb, I1 }, 0 },
+    { "shlA",	{ VexGb, Eb, I1 }, 0 },
+    { "shrA",	{ VexGb, Eb, I1 }, 0 },
+    { "shlA",	{ VexGb, Eb, I1 }, 0 },
+    { "sarA",	{ VexGb, Eb, I1 }, 0 },
   },
   /* REG_D1 */
   {
-    { "rolQ",	{ Ev, I1 }, 0 },
-    { "rorQ",	{ Ev, I1 }, 0 },
-    { "rclQ",	{ Ev, I1 }, 0 },
-    { "rcrQ",	{ Ev, I1 }, 0 },
-    { "shlQ",	{ Ev, I1 }, 0 },
-    { "shrQ",	{ Ev, I1 }, 0 },
-    { "shlQ",	{ Ev, I1 }, 0 },
-    { "sarQ",	{ Ev, I1 }, 0 },
+    { "rolQ",	{ VexGv, Ev, I1 }, 0 },
+    { "rorQ",	{ VexGv, Ev, I1 }, 0 },
+    { "rclQ",	{ VexGv, Ev, I1 }, 0 },
+    { "rcrQ",	{ VexGv, Ev, I1 }, 0 },
+    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
+    { "shrQ",	{ VexGv, Ev, I1 }, 0 },
+    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
+    { "sarQ",	{ VexGv, Ev, I1 }, 0 },
   },
   /* REG_D2 */
   {
-    { "rolA",	{ Eb, CL }, 0 },
-    { "rorA",	{ Eb, CL }, 0 },
-    { "rclA",	{ Eb, CL }, 0 },
-    { "rcrA",	{ Eb, CL }, 0 },
-    { "shlA",	{ Eb, CL }, 0 },
-    { "shrA",	{ Eb, CL }, 0 },
-    { "shlA",	{ Eb, CL }, 0 },
-    { "sarA",	{ Eb, CL }, 0 },
+    { "rolA",	{ VexGb, Eb, CL }, 0 },
+    { "rorA",	{ VexGb, Eb, CL }, 0 },
+    { "rclA",	{ VexGb, Eb, CL }, 0 },
+    { "rcrA",	{ VexGb, Eb, CL }, 0 },
+    { "shlA",	{ VexGb, Eb, CL }, 0 },
+    { "shrA",	{ VexGb, Eb, CL }, 0 },
+    { "shlA",	{ VexGb, Eb, CL }, 0 },
+    { "sarA",	{ VexGb, Eb, CL }, 0 },
   },
   /* REG_D3 */
   {
-    { "rolQ",	{ Ev, CL }, 0 },
-    { "rorQ",	{ Ev, CL }, 0 },
-    { "rclQ",	{ Ev, CL }, 0 },
-    { "rcrQ",	{ Ev, CL }, 0 },
-    { "shlQ",	{ Ev, CL }, 0 },
-    { "shrQ",	{ Ev, CL }, 0 },
-    { "shlQ",	{ Ev, CL }, 0 },
-    { "sarQ",	{ Ev, CL }, 0 },
+    { "rolQ",	{ VexGv, Ev, CL }, 0 },
+    { "rorQ",	{ VexGv, Ev, CL }, 0 },
+    { "rclQ",	{ VexGv, Ev, CL }, 0 },
+    { "rcrQ",	{ VexGv, Ev, CL }, 0 },
+    { "shlQ",	{ VexGv, Ev, CL }, 0 },
+    { "shrQ",	{ VexGv, Ev, CL }, 0 },
+    { "shlQ",	{ VexGv, Ev, CL }, 0 },
+    { "sarQ",	{ VexGv, Ev, CL }, 0 },
   },
   /* REG_F6 */
   {
@@ -3646,8 +3654,8 @@ static const struct dis386 prefix_table[][4] = {
   /* PREFIX_0F38F6 */
   {
     { "wrssK",	{ M, Gdq }, 0 },
-    { "adoxS",	{ Gdq, Edq}, 0 },
-    { "adcxS",	{ Gdq, Edq}, 0 },
+    { "adoxS",	{ VexGdq, Gdq, Edq }, 0 },
+    { "adcxS",	{ VexGdq, Gdq, Edq }, 0 },
     { Bad_Opcode },
   },
 
@@ -9061,6 +9069,15 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	  ins->rex &= ~REX_B;
 	  ins->rex2 &= ~REX_R;
 	}
+      if (ins->evex_type == evex_from_legacy)
+	{
+	  /* For EVEX-encoded legacy instructions, when the EVEX.ND bit is 0,
+	     all bits of EVEX.vvvv and EVEX.V' must be 1.  */
+	  if (!ins->vex.b
+	      && (ins->vex.register_specifier || !ins->vex.v))
+	    return &bad_opcode;
+	  ins->rex |= REX_OPCODE;
+	}
 
       ins->need_vex = 4;
 
@@ -9087,7 +9104,7 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	return &err_opcode;
 
       /* Set vector length.  */
-      if (ins->modrm.mod == 3 && ins->vex.b)
+      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type == evex_default)
 	ins->vex.length = 512;
       else
 	{
@@ -9530,6 +9547,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 		    {
 		      oappend (&ins, "{bad}");
 		      continue;
+
 		    }
 
 		  /* Instructions with a mask register destination allow for
@@ -9553,7 +9571,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 
 	  /* Check whether rounding control was enabled for an insn not
 	     supporting it.  */
-	  if (ins.modrm.mod == 3 && ins.vex.b
+	  if (ins.modrm.mod == 3 && ins.vex.b && ins.evex_type == evex_default
 	      && !(ins.evex_used & EVEX_b_used))
 	    {
 	      for (i = 0; i < MAX_OPERANDS; ++i)
@@ -11013,7 +11031,7 @@ print_displacement (instr_info *ins, bfd_signed_vma val)
 static void
 intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
 {
-  if (ins->vex.b)
+  if (ins->vex.b && ins->evex_type == evex_default)
     {
       if (!ins->vex.no_broadcast)
 	switch (bytemode)
@@ -11946,7 +11964,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	  print_operand_value (ins, disp & 0xffff, dis_style_text);
 	}
     }
-  if (ins->vex.b)
+  if (ins->vex.b && ins->evex_type == evex_default)
     {
       ins->evex_used |= EVEX_b_used;
 
@@ -13307,6 +13325,14 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   if (!ins->need_vex)
     return true;
 
+  if (ins->evex_type == evex_from_legacy)
+    {
+      ins->evex_used |= EVEX_b_used;
+      /* Here vex.b is treated as "EVEX.ND".  */
+      if (!ins->vex.b)
+	return true;
+    }
+
   reg = ins->vex.register_specifier;
   ins->vex.register_specifier = 0;
   if (ins->address_mode != mode_64bit)
@@ -13398,12 +13424,19 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  names = att_names_xmm;
 	  ins->evex_used |= EVEX_len_used;
 	  break;
+	case v_mode:
 	case dq_mode:
 	  if (ins->rex & REX_W)
 	    names = att_names64;
+	  else if (bytemode == v_mode
+		   && !(sizeflag & DFLAG))
+	    names = att_names16;
 	  else
 	    names = att_names32;
 	  break;
+	case b_mode:
+	  names = att_names8rex;
+	  break;
 	case mask_bd_mode:
 	case mask_mode:
 	  if (reg > 0x7)
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 3ab2362a3cc..2e6ae807bbe 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -473,6 +473,7 @@ static bitfield opcode_modifiers[] =
   BITFIELD (IntelSyntax),
   BITFIELD (ISA64),
   BITFIELD (NoEgpr),
+  BITFIELD (NF),
 };
 
 #define CLASS(n) #n, n
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index d7d28bf3d93..bb826bbdb34 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -636,7 +636,10 @@ enum
   /* How to encode VEX.vvvv:
      0: VEX.vvvv must be 1111b.
      1: VEX.vvvv encodes one of the register operands.
+     2: VEX.vvvv encodes the destination register operand.
    */
+#define VexVVVV_SRC   1
+#define VexVVVV_DST   2
   VexVVVV,
   /* How the VEX.W bit is used:
      0: Set by the REX.W bit.
@@ -749,6 +752,9 @@ enum
   /* egprs (r16-r31) on instruction illegal.  */
   NoEgpr,
 
+  /* NF (no flags): the CF, SF, PF, AF, ZF and OF flags are not updated.  */
+  NF,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -779,7 +785,7 @@ typedef struct i386_opcode_modifier
   unsigned int immext:1;
   unsigned int norex64:1;
   unsigned int vex:2;
-  unsigned int vexvvvv:1;
+  unsigned int vexvvvv:2;
   unsigned int vexw:2;
   unsigned int opcodeprefix:2;
   unsigned int sib:3;
@@ -797,6 +803,7 @@ typedef struct i386_opcode_modifier
   unsigned int intelsyntax:1;
   unsigned int isa64:2;
   unsigned int noegpr:1;
+  unsigned int nf:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index bb42270483b..b1f7491e7d7 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -138,6 +138,9 @@
 #define Vsz512 Vsz=VSZ512
 
 #define APX_F APX_F|x64
+#define VexVVVVSrc  VexVVVV=VexVVVV_SRC
+#define VexVVVVDest VexVVVV=VexVVVV_DST
+
 
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
@@ -190,6 +193,8 @@ mov, 0xf21, i386|No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { De
 mov, 0xf21, x64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoRex64, { Debug, Reg64 }
 mov, 0xf24, i386|No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Test, Reg32 }
 
+// Move after swapping the bytes
+movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 // Move after swapping the bytes
 movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movbe, 0x60, Movbe|APX_F, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
@@ -290,22 +295,36 @@ add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg3
 add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+add, 0x0, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
 inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+inc, 0xfe/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 sub, 0x2c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sub, 0x83/5, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+sub, 0x80/5, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
 dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+dec, 0xfe/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 sbb, 0x1c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x80/3, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sbb, 0x83/3, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+sbb, 0x80/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 cmp, 0x38, 0, D|W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 cmp, 0x83/7, 0, Modrm|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
@@ -320,16 +339,25 @@ and, 0x20, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|
 and, 0x83/4, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 and, 0x24, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 and, 0x80/4, 0, W|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+and, 0x20, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+and, 0x83/4, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+and, 0x80/4, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 or, 0x8, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 or, 0x83/1, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 or, 0xc, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 or, 0x80/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+or, 0x8, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+or, 0x83/1, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+or, 0x80/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+xor, 0x30, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+xor, 0x80/6, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 // clr with 1 operand is really xor with 2 operands.
 clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
@@ -338,10 +366,19 @@ adc, 0x10, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg
 adc, 0x83/2, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 adc, 0x14, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+adc, 0x83/2, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+adc, 0x80/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+neg, 0xf6/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+not, 0xf6/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+not, 0xf6/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 aaa, 0x37, No64, NoSuf, {}
 aas, 0x3f, No64, NoSuf, {}
@@ -375,6 +412,7 @@ cqto, 0x99, x64, Size64|NoSuf, {}
 mul, 0xf6/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 imul, 0xf6/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 imul, 0xfaf, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64 }
+imul, 0xaf, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 // imul with 2 operands mimics imul with 3 by putting the register in
@@ -392,49 +430,95 @@ rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|
 rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|NF|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|NF|EVexMap4, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|NF|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|NF|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ror, 0xc0/1, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ror, 0xd2/1, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+ror, 0xd0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+ror, 0xc0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+ror, 0xd2/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+ror, 0xd0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 rcl, 0xd0/2, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcl, 0xc0/2, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcl, 0xd2/2, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcl, 0xd0/2, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xc0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd2/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcl, 0xc0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcl, 0xd2/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 rcr, 0xd0/3, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcr, 0xc0/3, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcr, 0xd2/3, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcr, 0xd0/3, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xc0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd2/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcr, 0xc0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcr, 0xd2/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sal, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sal, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sal, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sal, 0xc0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sal, 0xd2/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sal, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shl, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shl, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+shl, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shl, 0xc0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shl, 0xd2/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shl, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shr, 0xc0/5, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shr, 0xd2/5, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+shr, 0xd0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shr, 0xc0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shr, 0xd2/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shr, 0xd0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sar, 0xc0/7, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sar, 0xd2/7, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sar, 0xd0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sar, 0xc0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sar, 0xd2/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sar, 0xd0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 shld, 0xfa4, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+shld, 0x24, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shld, 0xa5, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shld, 0xa5, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 shrd, 0xfac, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+shrd, 0x2c, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shrd, 0xad, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shrd, 0xad, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Control transfer instructions.
 call, 0xe8, No64, JumpDword|DefaultSize|No_bSuf|No_sSuf|No_qSuf|BNDPrefixOk, { Disp16|Disp32 }
@@ -940,6 +1024,7 @@ ud2b, 0xfb9, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|U
 ud0, 0xfff, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 cmov<cc>, 0xf4<cc:opc>, CMOV, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+cmov<cc>, 0x4<cc:opc>, CMOV|APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 
 fcmovb, 0xda/0, i687, Modrm|NoSuf, { FloatReg, FloatAcc }
 fcmovnae, 0xda/0, i687, Modrm|NoSuf, { FloatReg, FloatAcc }
@@ -985,12 +1070,12 @@ pause, 0xf390, i186, NoSuf, {}
 // MMX/SSE2 instructions.
 
 <mmx:cpu:pfx:attr:reg:mem, +
-    $avx:AVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:RegXMM:Xmmword, +
+    $avx:AVX:66:Vex128|VexVVVVSrc|VexW0|SSE2AVX:RegXMM:Xmmword, +
     $sse:SSE2:66::RegXMM:Xmmword, +
     $mmx:MMX:::RegMMX:Qword>
 
 <sse2:cpu:attr:scal:vvvv, +
-    $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, +
+    $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVVSrc, +
     $sse:SSE2:::>
 
 <bw:opc:vexw:elem:kcpu:kpfx:cpubmi, +
@@ -1073,7 +1158,7 @@ pxor<mmx>, 0x<mmx:pfx>0fef, <mmx:cpu>, Modrm|<mmx:attr>|C|NoSuf, { <mmx:reg>|<mm
 // SSE instructions.
 
 <sse:cpu:attr:scal:vvvv, +
-    $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, +
+    $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVVSrc, +
     $sse:SSE:::>
 <frel:imm:comm, eq:0:C, lt:1:, le:2:, unord:3:C, neq:4:C, nlt:5:, nle:6:, ord:7:C>
 
@@ -1089,8 +1174,8 @@ comiss<sse>, 0x0f2f, <sse:cpu>, Modrm|<sse:scal>|NoSuf, { Dword|Unspecified|Base
 cvtpi2ps, 0xf2a, SSE, Modrm|NoSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegXMM }
 cvtps2pi, 0xf2d, SSE, Modrm|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
 cvtsi2ss<sse>, 0xf30f2a, <sse:cpu>|No64, Modrm|<sse:scal>|<sse:vvvv>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
-cvtsi2ss, 0xf32a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
-cvtsi2ss, 0xf32a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
+cvtsi2ss, 0xf32a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVVSrc|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
+cvtsi2ss, 0xf32a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf30f2a, SSE|x64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf30f2a, SSE|x64, Modrm|No_bSuf|No_wSuf|No_sSuf|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtss2si, 0xf32d, AVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
@@ -1108,11 +1193,11 @@ minps<sse>, 0x0f5d, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|NoSuf, { RegXMM|Unspe
 minss<sse>, 0xf30f5d, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movaps<sse>, 0x0f28, <sse:cpu>, D|Modrm|<sse:attr>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movhlps<sse>, 0x0f12, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|NoSuf, { RegXMM, RegXMM }
-movhps, 0x16, AVX, Modrm|Vex|Space0F|VexVVVV|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movhps, 0x16, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
 movhps, 0x17, AVX, Modrm|Vex|Space0F|VexW0|NoSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
 movhps, 0xf16, SSE, D|Modrm|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movlhps<sse>, 0x0f16, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|NoSuf, { RegXMM, RegXMM }
-movlps, 0x12, AVX, Modrm|Vex|Space0F|VexVVVV|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movlps, 0x12, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
 movlps, 0x13, AVX, Modrm|Vex|Space0F|VexW0|NoSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
 movlps, 0xf12, SSE, D|Modrm|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movmskps<sse>, 0x0f50, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { RegXMM, Reg32|Reg64 }
@@ -1120,7 +1205,7 @@ movntps<sse>, 0x0f2b, <sse:cpu>, Modrm|<sse:attr>|NoSuf, { RegXMM, Xmmword|Unspe
 movntq, 0xfe7, SSE|3dnowA, Modrm|NoSuf, { RegMMX, Qword|Unspecified|BaseIndex }
 movntdq<sse2>, 0x660fe7, <sse2:cpu>, Modrm|<sse2:attr>|NoSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
 movss, 0xf310, AVX, D|Modrm|VexLIG|Space0F|VexW0|NoSuf|SSE2AVX, { Dword|Unspecified|BaseIndex, RegXMM }
-movss, 0xf310, AVX, D|Modrm|VexLIG|Space0F|VexVVVV|VexW0|NoSuf|SSE2AVX, { RegXMM, RegXMM }
+movss, 0xf310, AVX, D|Modrm|VexLIG|Space0F|VexVVVVSrc|VexW0|NoSuf|SSE2AVX, { RegXMM, RegXMM }
 movss, 0xf30f10, SSE, D|Modrm|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movups<sse>, 0x0f10, <sse:cpu>, D|Modrm|<sse:attr>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulps<sse>, 0x0f59, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1184,8 +1269,8 @@ cvtpi2pd, 0x660f2a, SSE2, Modrm|NoSuf, { RegMMX, RegXMM }
 cvtpi2pd, 0xf3e6, AVX, Modrm|Vex|Space0F|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
 cvtpi2pd, 0x660f2a, SSE2, Modrm|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd<sse2>, 0xf20f2a, <sse2:cpu>|No64, Modrm|IgnoreSize|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
-cvtsi2sd, 0xf22a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
-cvtsi2sd, 0xf22a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
+cvtsi2sd, 0xf22a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVVSrc|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
+cvtsi2sd, 0xf22a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd, 0xf20f2a, SSE2|x64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd, 0xf20f2a, SSE2|x64, Modrm|No_bSuf|No_wSuf|No_sSuf|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 divpd<sse2>, 0x660f5e, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1195,16 +1280,16 @@ maxsd<sse2>, 0xf20f5f, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|NoSuf, { Qword|
 minpd<sse2>, 0x660f5d, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 minsd<sse2>, 0xf20f5d, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movapd<sse2>, 0x660f28, <sse2:cpu>, D|Modrm|<sse2:attr>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-movhpd, 0x6616, AVX, Modrm|Vex|Space0F|VexVVVV|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movhpd, 0x6616, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
 movhpd, 0x6617, AVX, Modrm|Vex|Space0F|VexW0|NoSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
 movhpd, 0x660f16, SSE2, D|Modrm|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM }
-movlpd, 0x6612, AVX, Modrm|Vex|Space0F|VexVVVV|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movlpd, 0x6612, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
 movlpd, 0x6613, AVX, Modrm|Vex|Space0F|VexW0|NoSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
 movlpd, 0x660f12, SSE2, D|Modrm|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movmskpd<sse2>, 0x660f50, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 movntpd<sse2>, 0x660f2b, <sse2:cpu>, Modrm|<sse2:attr>|NoSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
 movsd, 0xf210, AVX, D|Modrm|VexLIG|Space0F|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movsd, 0xf210, AVX, D|Modrm|VexLIG|Space0F|VexVVVV|VexW0|NoSuf|SSE2AVX, { RegXMM, RegXMM }
+movsd, 0xf210, AVX, D|Modrm|VexLIG|Space0F|VexVVVVSrc|VexW0|NoSuf|SSE2AVX, { RegXMM, RegXMM }
 movsd, 0xf20f10, SSE2, D|Modrm|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movupd<sse2>, 0x660f10, <sse2:cpu>, D|Modrm|<sse2:attr>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulpd<sse2>, 0x660f59, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1255,7 +1340,7 @@ punpcklqdq<sse2>, 0x660f6c, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { R
 
 // SSE3 instructions.
 
-<sse3:cpu:attr:vvvv, $avx:AVX:Vex128|VexW0|SSE2AVX:VexVVVV, $sse:SSE3::>
+<sse3:cpu:attr:vvvv, $avx:AVX:Vex128|VexW0|SSE2AVX:VexVVVVSrc, $sse:SSE3::>
 
 addsubpd<sse3>, 0x660fd0, <sse3:cpu>, Modrm|<sse3:attr>|<sse3:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 addsubps<sse3>, 0xf20fd0, <sse3:cpu>, Modrm|<sse3:attr>|<sse3:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1333,7 +1418,7 @@ invpcid, 0xf3f2, APX_F|INVPCID, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecifie
 // SSSE3 instructions.
 
 <ssse3:cpu:pfx:attr:vvvv:reg:mem, +
-    $avx:AVX:66:Vex128|VexW0|SSE2AVX:VexVVVV:RegXMM:Xmmword, +
+    $avx:AVX:66:Vex128|VexW0|SSE2AVX:VexVVVVSrc:RegXMM:Xmmword, +
     $sse:SSSE3:66:::RegXMM:Xmmword, +
     $mmx:SSSE3::::RegMMX:Qword>
 
@@ -1354,12 +1439,12 @@ pabsd<ssse3>, 0x<ssse3:pfx>0f381e, <ssse3:cpu>, Modrm|<ssse3:attr>|NoSuf, { <sss
 
 // SSE4.1 instructions.
 
-<sse41:cpu:attr:scal:vvvv, $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, $sse:SSE4_1:::>
+<sse41:cpu:attr:scal:vvvv, $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVVSrc, $sse:SSE4_1:::>
 <sd:ppfx:spfx:opc:vexw:elem, s::f3:0:VexW0:Dword, d:66:f2:1:VexW1:Qword>
 
 blendp<sd><sse41>, 0x660f3a0c | <sd:opc>, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
-blendvp<sd>, 0x664a | <sd:opc>, AVX, Modrm|Vex128|Space0F3A|VexVVVV|VexW0|NoSuf|SSE2AVX, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
-blendvp<sd>, 0x664a | <sd:opc>, AVX, Modrm|Vex128|Space0F3A|VexVVVV|VexW0|NoSuf|Implicit1stXmm0|SSE2AVX, { RegXMM|Unspecified|BaseIndex, RegXMM }
+blendvp<sd>, 0x664a | <sd:opc>, AVX, Modrm|Vex128|Space0F3A|VexVVVVSrc|VexW0|NoSuf|SSE2AVX, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
+blendvp<sd>, 0x664a | <sd:opc>, AVX, Modrm|Vex128|Space0F3A|VexVVVVSrc|VexW0|NoSuf|Implicit1stXmm0|SSE2AVX, { RegXMM|Unspecified|BaseIndex, RegXMM }
 blendvp<sd>, 0x660f3814 | <sd:opc>, SSE4_1, Modrm|NoSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 blendvp<sd>, 0x660f3814 | <sd:opc>, SSE4_1, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 dpp<sd><sse41>, 0x660f3a40 | <sd:opc>, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1371,8 +1456,8 @@ insertps<sse41>, 0x660f3a21, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf,
 movntdqa<sse41>, 0x660f382a, <sse41:cpu>, Modrm|<sse41:attr>|NoSuf, { Xmmword|Unspecified|BaseIndex, RegXMM }
 mpsadbw<sse41>, 0x660f3a42, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
 packusdw<sse41>, 0x660f382b, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pblendvb, 0x664c, AVX, Modrm|Vex128|Space0F3A|VexVVVV|VexW0|NoSuf|SSE2AVX, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
-pblendvb, 0x664c, AVX, Modrm|Vex128|Space0F3A|VexVVVV|VexW0|NoSuf|Implicit1stXmm0|SSE2AVX, { RegXMM|Unspecified|BaseIndex, RegXMM }
+pblendvb, 0x664c, AVX, Modrm|Vex128|Space0F3A|VexVVVVSrc|VexW0|NoSuf|SSE2AVX, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
+pblendvb, 0x664c, AVX, Modrm|Vex128|Space0F3A|VexVVVVSrc|VexW0|NoSuf|Implicit1stXmm0|SSE2AVX, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pblendvb, 0x660f3810, SSE4_1, Modrm|NoSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 pblendvb, 0x660f3810, SSE4_1, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pblendw<sse41>, 0x660f3a0e, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1386,7 +1471,7 @@ phminposuw<sse41>, 0x660f3841, <sse41:cpu>, Modrm|<sse41:attr>|NoSuf, { RegXMM|U
 pinsrb<sse41>, 0x660f3a20, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf|IgnoreSize|NoRex64, { Imm8, Reg32|Reg64, RegXMM }
 pinsrb<sse41>, 0x660f3a20, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { Imm8, Byte|Unspecified|BaseIndex, RegXMM }
 pinsrd<sse41>, 0x660f3a22, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf|IgnoreSize, { Imm8, Reg32|Unspecified|BaseIndex, RegXMM }
-pinsrq, 0x6622, AVX|x64, Modrm|Vex|Space0F3A|VexVVVV|VexW1|NoSuf|SSE2AVX, { Imm8, Reg64|Unspecified|BaseIndex, RegXMM }
+pinsrq, 0x6622, AVX|x64, Modrm|Vex|Space0F3A|VexVVVVSrc|VexW1|NoSuf|SSE2AVX, { Imm8, Reg64|Unspecified|BaseIndex, RegXMM }
 pinsrq, 0x660f3a22, SSE4_1|x64, Modrm|Size64|NoSuf, { Imm8, Reg64|Unspecified|BaseIndex, RegXMM }
 pmaxsb<sse41>, 0x660f383c, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pmaxsd<sse41>, 0x660f383d, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1416,7 +1501,7 @@ rounds<sd><sse41>, 0x660f3a0a | <sd:opc>, <sse41:cpu>, Modrm|<sse41:scal>|<sse41
 
 // SSE4.2 instructions.
 
-<sse42:cpu:attr:vvvv, $avx:AVX:Vex128|VexW0|SSE2AVX:VexVVVV, $sse:SSE4_2::>
+<sse42:cpu:attr:vvvv, $avx:AVX:Vex128|VexW0|SSE2AVX:VexVVVVSrc, $sse:SSE4_2::>
 
 pcmpgtq<sse42>, 0x660f3837, <sse42:cpu>, Modrm|<sse42:attr>|<sse42:vvvv>|NoSuf|Optimize, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pcmpestri<sse42>, 0x660f3a61, <sse42:cpu>|No64, Modrm|<sse42:attr>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1447,7 +1532,7 @@ xsaveopt64, 0xfae/6, Xsaveopt|x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
 
 // AES instructions.
 
-<aes:cpu:attr:vvvv, $avx:AVX|:Vex128|VexW0|SSE2AVX:VexVVVV, $sse:::>
+<aes:cpu:attr:vvvv, $avx:AVX|:Vex128|VexW0|SSE2AVX:VexVVVVSrc, $sse:::>
 
 aesdec<aes>, 0x660f38de, <aes:cpu>AES, Modrm|<aes:attr>|<aes:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 aesdeclast<aes>, 0x660f38df, <aes:cpu>AES, Modrm|<aes:attr>|<aes:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1458,7 +1543,7 @@ aeskeygenassist<aes>, 0x660f3adf, <aes:cpu>AES, Modrm|<aes:attr>|NoSuf, { Imm8,
 
 // PCLMULQDQ
 
-<pclmul:cpu:attr, $avx:AVX|:Vex128|VexW0|SSE2AVX|VexVVVV, $sse::>
+<pclmul:cpu:attr, $avx:AVX|:Vex128|VexW0|SSE2AVX|VexVVVVSrc, $sse::>
 
 pclmulqdq<pclmul>, 0x660f3a44, <pclmul:cpu>PCLMULQDQ, Modrm|<pclmul:attr>|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
 pclmullqlqdq<pclmul>, 0x660f3a44/0x00, <pclmul:cpu>PCLMULQDQ, Modrm|<pclmul:attr>|NoSuf|ImmExt, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1468,7 +1553,7 @@ pclmulhqhqdq<pclmul>, 0x660f3a44/0x11, <pclmul:cpu>PCLMULQDQ, Modrm|<pclmul:attr
 
 // GFNI
 
-<gfni:cpu:w0:w1, $avx:AVX|:Vex128|VexW0|SSE2AVX|VexVVVV:Vex128|VexW1|SSE2AVX|VexVVVV, $sse:::>
+<gfni:cpu:w0:w1, $avx:AVX|:Vex128|VexW0|SSE2AVX|VexVVVVSrc:Vex128|VexW1|SSE2AVX|VexVVVVSrc, $sse:::>
 
 gf2p8affineqb<gfni>, 0x660f3ace, <gfni:cpu>GFNI, Modrm|<gfni:w1>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 gf2p8affineinvqb<gfni>, 0x660f3acf, <gfni:cpu>GFNI, Modrm|<gfni:w1>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1493,21 +1578,21 @@ gf2p8mulb<gfni>, 0x660f38cf, <gfni:cpu>GFNI, Modrm|<gfni:w0>|NoSuf, { RegXMM|Uns
     x:Vex128:ATTSyntax:RegXMM|Unspecified|BaseIndex, +
     y:Vex256:ATTSyntax:RegYMM|Unspecified|BaseIndex>
 
-vaddp<sd>, 0x<sd:ppfx>58, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vadds<sd>, 0x<sd:spfx>58, AVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vaddsubpd, 0x66d0, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vaddsubps, 0xf2d0, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vandnp<sd>, 0x<sd:ppfx>55, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vandp<sd>, 0x<sd:ppfx>54, AVX, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vblendp<sd>, 0x660c | <sd:opc>, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vblendvp<sd>, 0x664a | <sd:opc>, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vaddp<sd>, 0x<sd:ppfx>58, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vadds<sd>, 0x<sd:spfx>58, AVX, Modrm|VexLIG|Space0F|VexVVVVSrc|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vaddsubpd, 0x66d0, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vaddsubps, 0xf2d0, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vandnp<sd>, 0x<sd:ppfx>55, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vandp<sd>, 0x<sd:ppfx>54, AVX, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vblendp<sd>, 0x660c | <sd:opc>, AVX, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vblendvp<sd>, 0x664a | <sd:opc>, AVX, Modrm|Vex|Space0F3A|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vbroadcastf128, 0x661a, AVX, Modrm|Vex=2|Space0F38|VexW=1|NoSuf, { Xmmword|Unspecified|BaseIndex, RegYMM }
 vbroadcastsd, 0x6619, AVX, Modrm|Vex256|Space0F38|VexW0|NoSuf, { Qword|Unspecified|BaseIndex, RegYMM }
 vbroadcastss, 0x6618, AVX, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Dword|Unspecified|BaseIndex, RegXMM|RegYMM }
-vcmp<frel>p<sd>, 0x<sd:ppfx>c2/0x<frel:imm>, AVX, Modrm|<frel:comm>|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vcmp<frel>s<sd>, 0x<sd:spfx>c2/0x<frel:imm>, AVX, Modrm|<frel:comm>|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf|ImmExt, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcmpp<sd>, 0x<sd:ppfx>c2, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vcmps<sd>, 0x<sd:spfx>c2, AVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vcmp<frel>p<sd>, 0x<sd:ppfx>c2/0x<frel:imm>, AVX, Modrm|<frel:comm>|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vcmp<frel>s<sd>, 0x<sd:spfx>c2/0x<frel:imm>, AVX, Modrm|<frel:comm>|VexLIG|Space0F|VexVVVVSrc|VexWIG|NoSuf|ImmExt, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcmpp<sd>, 0x<sd:ppfx>c2, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vcmps<sd>, 0x<sd:spfx>c2, AVX, Modrm|VexLIG|Space0F|VexVVVVSrc|VexWIG|NoSuf, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vcomis<sd>, 0x<sd:ppfx>2f, AVX, Modrm|VexLIG|Space0F|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM }
 vcvtdq2pd, 0xf3e6, AVX, Modrm|Vex128|Space0F|VexWIG|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
 vcvtdq2pd, 0xf3e6, AVX, Modrm|Vex256|Space0F|VexWIG|NoSuf, { RegXMM|Unspecified|BaseIndex, RegYMM }
@@ -1518,35 +1603,35 @@ vcvtps2dq, 0x665b, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspe
 vcvtps2pd, 0x5a, AVX, Modrm|Vex128|Space0F|VexWIG|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
 vcvtps2pd, 0x5a, AVX, Modrm|Vex256|Space0F|VexWIG|NoSuf, { RegXMM|Unspecified|BaseIndex, RegYMM }
 vcvts<sd>2si, 0x<sd:spfx>2d, AVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-vcvtsd2ss, 0xf25a, AVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vcvtsi2s<sd>, 0x<sd:spfx>2a, AVX, Modrm|VexLIG|Space0F|VexVVVV|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtsi2s<sd>, 0x<sd:spfx>2a, AVX, Modrm|VexLIG|Space0F|VexVVVV|No_bSuf|No_wSuf|No_sSuf|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtss2sd, 0xf35a, AVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vcvtsd2ss, 0xf25a, AVX, Modrm|Vex=3|Space0F|VexVVVVSrc|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vcvtsi2s<sd>, 0x<sd:spfx>2a, AVX, Modrm|VexLIG|Space0F|VexVVVVSrc|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2s<sd>, 0x<sd:spfx>2a, AVX, Modrm|VexLIG|Space0F|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtss2sd, 0xf35a, AVX, Modrm|Vex=3|Space0F|VexVVVVSrc|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vcvttpd2dq<Vxy>, 0x66e6, AVX, Modrm|<Vxy:vex>|Space0F|VexWIG|NoSuf|<Vxy:syntax>, { <Vxy:src>, RegXMM }
 vcvttps2dq, 0xf35b, AVX, Modrm|Vex|Space0F|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vcvtts<sd>2si, 0x<sd:spfx>2c, AVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-vdivp<sd>, 0x<sd:ppfx>5e, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vdivs<sd>, 0x<sd:spfx>5e, AVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vdppd, 0x6641, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vdpps, 0x6640, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vdivp<sd>, 0x<sd:ppfx>5e, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vdivs<sd>, 0x<sd:spfx>5e, AVX, Modrm|VexLIG|Space0F|VexVVVVSrc|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vdppd, 0x6641, AVX, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vdpps, 0x6640, AVX, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vextractf128, 0x6619, AVX, Modrm|Vex=2|Space0F3A|VexW=1|NoSuf, { Imm8, RegYMM, Unspecified|BaseIndex|RegXMM }
 vextractps, 0x6617, AVX|AVX512F, Modrm|Vex128|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|NoSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 vextractps, 0x6617, AVX|AVX512F|x64, RegMem|Vex128|EVex128|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg64 }
-vhaddpd, 0x667c, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vhaddps, 0xf27c, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vhsubpd, 0x667d, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vhsubps, 0xf27d, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vinsertf128, 0x6618, AVX, Modrm|Vex256|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
-vinsertps, 0x6621, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vhaddpd, 0x667c, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vhaddps, 0xf27c, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vhsubpd, 0x667d, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vhsubps, 0xf27d, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vinsertf128, 0x6618, AVX, Modrm|Vex256|Space0F3A|VexVVVVSrc|VexW0|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
+vinsertps, 0x6621, AVX, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|NoSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vlddqu, 0xf2f0, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM }
 vldmxcsr, 0xae/2, AVX, Modrm|Vex128|Space0F|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex }
 vmaskmovdqu, 0x66f7, AVX, Modrm|Vex|Space0F|VexWIG|NoSuf, { RegXMM, RegXMM }
-vmaskmovp<sd>, 0x662e | <sd:opc>, AVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM, RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
-vmaskmovp<sd>, 0x662c | <sd:opc>, AVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vmaxp<sd>, 0x<sd:ppfx>5f, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vmaxs<sd>, 0x<sd:spfx>5f, AVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vminp<sd>, 0x<sd:ppfx>5d, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vmins<sd>, 0x<sd:spfx>5d, AVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vmaskmovp<sd>, 0x662e | <sd:opc>, AVX, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM, RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
+vmaskmovp<sd>, 0x662c | <sd:opc>, AVX, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vmaxp<sd>, 0x<sd:ppfx>5f, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vmaxs<sd>, 0x<sd:spfx>5f, AVX, Modrm|VexLIG|Space0F|VexVVVVSrc|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vminp<sd>, 0x<sd:ppfx>5d, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vmins<sd>, 0x<sd:spfx>5d, AVX, Modrm|VexLIG|Space0F|VexVVVVSrc|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vmovap<sd>, 0x<sd:ppfx>28, AVX, D|Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 // vmovd really shouldn't allow for 64bit operand (vmovq is the right
 // mnemonic for copying between Reg64/Mem64 and RegXMM, as is mandated
@@ -1559,11 +1644,11 @@ vmovddup, 0xf212, AVX, Modrm|Vex|Space0F|VexWIG|NoSuf, { Qword|Unspecified|BaseI
 vmovddup, 0xf212, AVX, Modrm|Vex=2|Space0F|VexWIG|NoSuf, { Unspecified|BaseIndex|RegYMM, RegYMM }
 vmovdqa, 0x666f, AVX, D|Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovdqu, 0xf36f, AVX, D|Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vmovhlps, 0x12, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM }
-vmovhp<sd>, 0x<sd:ppfx>16, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovhlps, 0x12, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM }
+vmovhp<sd>, 0x<sd:ppfx>16, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 vmovhp<sd>, 0x<sd:ppfx>17, AVX, Modrm|Vex|Space0F|VexWIG|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex }
-vmovlhps, 0x16, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM }
-vmovlp<sd>, 0x<sd:ppfx>12, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovlhps, 0x16, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM }
+vmovlp<sd>, 0x<sd:ppfx>12, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 vmovlp<sd>, 0x<sd:ppfx>13, AVX, Modrm|Vex|Space0F|VexWIG|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 vmovmskp<sd>, 0x<sd:ppfx>50, AVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_sSuf, { RegXMM|RegYMM, Reg32|Reg64 }
 vmovntdq, 0x66e7, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
@@ -1573,78 +1658,78 @@ vmovq, 0xf37e, AVX, Load|Modrm|Vex=1|Space0F|VexWIG|NoSuf, { Qword|Unspecified|B
 vmovq, 0x66d6, AVX, Modrm|Vex=1|Space0F|VexWIG|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
 vmovq, 0x666e, AVX|AVX512F|x64, D|Modrm|Vex128|EVex128|Space0F|VexW1|Disp8MemShift=3|NoSuf, { Reg64|Unspecified|BaseIndex, RegXMM }
 vmovs<sd>, 0x<sd:spfx>10, AVX, D|Modrm|VexLIG|Space0F|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex, RegXMM }
-vmovs<sd>, 0x<sd:spfx>10, AVX, D|Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM }
+vmovs<sd>, 0x<sd:spfx>10, AVX, D|Modrm|VexLIG|Space0F|VexVVVVSrc|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM }
 vmovshdup, 0xf316, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovsldup, 0xf312, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovup<sd>, 0x<sd:ppfx>10, AVX, D|Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vmpsadbw, 0x6642, AVX|AVX2, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vmulp<sd>, 0x<sd:ppfx>59, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vmuls<sd>, 0x<sd:spfx>59, AVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vorp<sd>, 0x<sd:ppfx>56, AVX, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vmpsadbw, 0x6642, AVX|AVX2, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vmulp<sd>, 0x<sd:ppfx>59, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vmuls<sd>, 0x<sd:spfx>59, AVX, Modrm|VexLIG|Space0F|VexVVVVSrc|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vorp<sd>, 0x<sd:ppfx>56, AVX, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpabs<bw>, 0x661c | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F38|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vpabsd, 0x661e, AVX|AVX2, Modrm|Vex|Space0F38|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vpackssdw, 0x666b, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpacksswb, 0x6663, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpackusdw, 0x662b, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpackuswb, 0x6667, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpadds<bw>, 0x66ec | <bw:opc>, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpadd<bw>, 0x66fc | <bw:opc>, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpaddd, 0x66fe, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpaddq, 0x66d4, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpaddus<bw>, 0x66dc | <bw:opc>, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpalignr, 0x660f, AVX|AVX2, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpand, 0x66db, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpandn, 0x66df, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpavg<bw>, 0x66e0 | (3 * <bw:opc>), AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpblendvb, 0x664c, AVX|AVX2, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpblendw, 0x660e, AVX|AVX2, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpcmpeq<bw>, 0x6674 | <bw:opc>, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpcmpeqd, 0x6676, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpcmpeqq, 0x6629, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpackssdw, 0x666b, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpacksswb, 0x6663, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpackusdw, 0x662b, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpackuswb, 0x6667, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpadds<bw>, 0x66ec | <bw:opc>, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpadd<bw>, 0x66fc | <bw:opc>, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpaddd, 0x66fe, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpaddq, 0x66d4, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpaddus<bw>, 0x66dc | <bw:opc>, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpalignr, 0x660f, AVX|AVX2, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpand, 0x66db, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpandn, 0x66df, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpavg<bw>, 0x66e0 | (3 * <bw:opc>), AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpblendvb, 0x664c, AVX|AVX2, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpblendw, 0x660e, AVX|AVX2, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpcmpeq<bw>, 0x6674 | <bw:opc>, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpcmpeqd, 0x6676, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpcmpeqq, 0x6629, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpcmpestri, 0x6661, AVX|No64, Modrm|Vex|Space0F3A|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM }
 vpcmpestri, 0x6661, AVX|x64, Modrm|Vex|Space0F3A|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Imm8, Xmmword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vpcmpestrm, 0x6660, AVX|No64, Modrm|Vex|Space0F3A|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM }
 vpcmpestrm, 0x6660, AVX|x64, Modrm|Vex|Space0F3A|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Imm8, Xmmword|Unspecified|BaseIndex|RegXMM, RegXMM }
-vpcmpgt<bw>, 0x6664 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpcmpgtd, 0x6666, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpcmpgtq, 0x6637, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpcmpgt<bw>, 0x6664 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpcmpgtd, 0x6666, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpcmpgtq, 0x6637, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpcmpistri, 0x6663, AVX, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM }
 vpcmpistrm, 0x6662, AVX, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM }
-vperm2f128, 0x6606, AVX, Modrm|Vex256|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-vpermilps, 0x660c, AVX|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vperm2f128, 0x6606, AVX, Modrm|Vex256|Space0F3A|VexVVVVSrc|VexW0|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
+vpermilps, 0x660c, AVX|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpermilps, 0x6604, AVX|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F3A|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpermilpd, 0x660d, AVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpermilpd, 0x660d, AVX, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpermilpd, 0x6605, AVX, Modrm|Vex|Space0F3A|VexW0|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vpextr<dq>, 0x6616, AVX|<dq:cpu64>, Modrm|Vex|Space0F3A|<dq:vexw64>|NoSuf, { Imm8, RegXMM, <dq:gpr>|Unspecified|BaseIndex }
 vpextrw, 0x66c5, AVX, Load|Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_sSuf, { Imm8, RegXMM, Reg32|Reg64 }
 vpextr<bw>, 0x6614 | <bw:opc>, AVX, RegMem|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg32|Reg64 }
 vpextr<bw>, 0x6614 | <bw:opc>, AVX, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
-vphaddd, 0x6602, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vphaddsw, 0x6603, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vphaddw, 0x6601, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vphaddd, 0x6602, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vphaddsw, 0x6603, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vphaddw, 0x6601, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vphminposuw, 0x6641, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM }
-vphsubd, 0x6606, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vphsubsw, 0x6607, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vphsubw, 0x6605, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpinsrb, 0x6620, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
-vpinsrb, 0x6620, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8, Byte|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpinsr<dq>, 0x6622, AVX|<dq:cpu64>, Modrm|Vex|Space0F3A|VexVVVV|<dq:vexw64>|NoSuf, { Imm8, <dq:gpr>|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpinsrw, 0x66c4, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_sSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
-vpinsrw, 0x66c4, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|NoSuf, { Imm8, Word|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmaddubsw, 0x6604, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmaddwd, 0x66f5, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmaxsb, 0x663c, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmaxsd, 0x663d, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmaxsw, 0x66ee, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmaxub, 0x66de, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmaxud, 0x663f, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmaxuw, 0x663e, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpminsb, 0x6638, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpminsd, 0x6639, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpminsw, 0x66ea, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpminub, 0x66da, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpminud, 0x663b, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpminuw, 0x663a, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vphsubd, 0x6606, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vphsubsw, 0x6607, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vphsubw, 0x6605, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpinsrb, 0x6620, AVX, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
+vpinsrb, 0x6620, AVX, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|NoSuf, { Imm8, Byte|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpinsr<dq>, 0x6622, AVX|<dq:cpu64>, Modrm|Vex|Space0F3A|VexVVVVSrc|<dq:vexw64>|NoSuf, { Imm8, <dq:gpr>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpinsrw, 0x66c4, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|No_bSuf|No_wSuf|No_sSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
+vpinsrw, 0x66c4, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|NoSuf, { Imm8, Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmaddubsw, 0x6604, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmaddwd, 0x66f5, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmaxsb, 0x663c, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmaxsd, 0x663d, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmaxsw, 0x66ee, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmaxub, 0x66de, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmaxud, 0x663f, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmaxuw, 0x663e, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpminsb, 0x6638, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpminsd, 0x6639, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpminsw, 0x66ea, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpminub, 0x66da, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpminud, 0x663b, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpminuw, 0x663a, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpmovmskb, 0x66d7, AVX|AVX2, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_sSuf, { RegXMM|RegYMM, Reg32|Reg64 }
 vpmovsxbd, 0x6621, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM }
 vpmovsxbq, 0x6622, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=1|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM }
@@ -1658,66 +1743,66 @@ vpmovzxbw, 0x6630, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|Ba
 vpmovzxdq, 0x6635, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vpmovzxwd, 0x6633, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
 vpmovzxwq, 0x6634, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM }
-vpmuldq, 0x6628, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmulhrsw, 0x660b, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmulhuw, 0x66e4, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmulhw, 0x66e5, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmulld, 0x6640, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmullw, 0x66d5, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmuludq, 0x66f4, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpor, 0x66eb, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsadbw, 0x66f6, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|C|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpshufb, 0x6600, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmuldq, 0x6628, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmulhrsw, 0x660b, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmulhuw, 0x66e4, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmulhw, 0x66e5, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmulld, 0x6640, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmullw, 0x66d5, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmuludq, 0x66f4, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpor, 0x66eb, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsadbw, 0x66f6, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|C|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpshufb, 0x6600, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpshufd, 0x6670, AVX|AVX2, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vpshufhw, 0xf370, AVX|AVX2, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vpshuflw, 0xf270, AVX|AVX2, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vpsign<bw>, 0x6608 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsignd, 0x660a, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsll<dq>, 0x6672 | <dq:opc>/6, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsll<dq>, 0x66f2 | <dq:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpslldq, 0x6673/7, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsllw, 0x6671/6, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsllw, 0x66f1, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrad, 0x6672/4, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrad, 0x66e2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsraw, 0x6671/4, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsraw, 0x66e1, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrl<dq>, 0x6672 | <dq:opc>/2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrl<dq>, 0x66d2 | <dq:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrldq, 0x6673/3, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrlw, 0x6671/2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrlw, 0x66d1, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsub<bw>, 0x66f8 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsub<dq>, 0x66fa | <dq:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsubs<bw>, 0x66e8 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsubus<bw>, 0x66d8 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsign<bw>, 0x6608 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsignd, 0x660a, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsll<dq>, 0x6672 | <dq:opc>/6, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsll<dq>, 0x66f2 | <dq:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpslldq, 0x6673/7, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsllw, 0x6671/6, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsllw, 0x66f1, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrad, 0x6672/4, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrad, 0x66e2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsraw, 0x6671/4, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsraw, 0x66e1, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrl<dq>, 0x6672 | <dq:opc>/2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrl<dq>, 0x66d2 | <dq:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrldq, 0x6673/3, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrlw, 0x6671/2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrlw, 0x66d1, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsub<bw>, 0x66f8 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsub<dq>, 0x66fa | <dq:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsubs<bw>, 0x66e8 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsubus<bw>, 0x66d8 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vptest, 0x6617, AVX, Modrm|Vex|Space0F38|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vpunpckhbw, 0x6668, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpunpckhdq, 0x666a, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpunpckhqdq, 0x666d, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpunpckhwd, 0x6669, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpunpcklbw, 0x6660, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpunpckldq, 0x6662, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpunpcklqdq, 0x666c, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpunpcklwd, 0x6661, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpxor, 0x66ef, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpunpckhbw, 0x6668, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpunpckhdq, 0x666a, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpunpckhqdq, 0x666d, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpunpckhwd, 0x6669, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpunpcklbw, 0x6660, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpunpckldq, 0x6662, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpunpcklqdq, 0x666c, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpunpcklwd, 0x6661, AVX|AVX2, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpxor, 0x66ef, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vrcpps, 0x53, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vrcpss, 0xf353, AVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vrcpss, 0xf353, AVX, Modrm|Vex=3|Space0F|VexVVVVSrc|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vroundp<sd>, 0x6608 | <sd:opc>, AVX, Modrm|Vex|Space0F3A|VexWIG|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vrounds<sd>, 0x660a | <sd:opc>, AVX, Modrm|VexLIG|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vrounds<sd>, 0x660a | <sd:opc>, AVX, Modrm|VexLIG|Space0F3A|VexVVVVSrc|VexWIG|NoSuf, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vrsqrtps, 0x52, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vrsqrtss, 0xf352, AVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vshufp<sd>, 0x<sd:ppfx>c6, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vrsqrtss, 0xf352, AVX, Modrm|Vex=3|Space0F|VexVVVVSrc|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vshufp<sd>, 0x<sd:ppfx>c6, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vsqrtp<sd>, 0x<sd:ppfx>51, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vsqrts<sd>, 0x<sd:spfx>51, AVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vsqrts<sd>, 0x<sd:spfx>51, AVX, Modrm|VexLIG|Space0F|VexVVVVSrc|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vstmxcsr, 0xae/3, AVX, Modrm|Vex128|Space0F|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex }
-vsubp<sd>, 0x<sd:ppfx>5c, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vsubs<sd>, 0x<sd:spfx>5c, AVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vsubp<sd>, 0x<sd:ppfx>5c, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vsubs<sd>, 0x<sd:spfx>5c, AVX, Modrm|VexLIG|Space0F|VexVVVVSrc|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vtestp<sd>, 0x660e | <sd:opc>, AVX, Modrm|Vex|Space0F38|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vucomis<sd>, 0x<sd:ppfx>2e, AVX, Modrm|VexLIG|Space0F|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM }
-vunpckhp<sd>, 0x<sd:ppfx>15, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vunpcklp<sd>, 0x<sd:ppfx>14, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vxorp<sd>, 0x<sd:ppfx>57, AVX, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vunpckhp<sd>, 0x<sd:ppfx>15, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vunpcklp<sd>, 0x<sd:ppfx>14, AVX, Modrm|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vxorp<sd>, 0x<sd:ppfx>57, AVX, Modrm|C|Vex|Space0F|VexVVVVSrc|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vzeroall, 0x77, AVX, Vex=2|Space0F|VexWIG|NoSuf, {}
 vzeroupper, 0x77, AVX, Vex|Space0F|VexWIG|NoSuf, {}
 
@@ -1742,59 +1827,59 @@ vpmovzxwq, 0x6634, AVX2|AVX512F|AVX512VL, Modrm|Vex256|EVex256|Masking|Space0F38
 vbroadcasti128, 0x665A, AVX2, Modrm|Vex=2|Space0F38|VexW=1|NoSuf, { Xmmword|Unspecified|BaseIndex, RegYMM }
 vbroadcastsd, 0x6619, AVX2, Modrm|Vex=2|Space0F38|VexW=1|NoSuf, { RegXMM, RegYMM }
 vbroadcastss, 0x6618, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexW0|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpblendd, 0x6602, AVX2, Modrm|Vex|Space0F3A|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpblendd, 0x6602, AVX2, Modrm|Vex|Space0F3A|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpbroadcast<bw>, 0x6678 | <bw:opc>, AVX2, Modrm|Vex|Space0F38|VexW0|NoSuf, { <bw:elem>|Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM }
 vpbroadcastd, 0x6658, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexW0|Disp8MemShift|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpbroadcastq, 0x6659, AVX2, Modrm|Vex|Space0F38|VexW0|NoSuf|Optimize, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM }
-vperm2i128, 0x6646, AVX2, Modrm|Vex=2|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-vpermd, 0x6636, AVX2|AVX512F, Modrm|Vex256|EVexDYN|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vperm2i128, 0x6646, AVX2, Modrm|Vex=2|Space0F3A|VexVVVVSrc|VexW0|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
+vpermd, 0x6636, AVX2|AVX512F, Modrm|Vex256|EVexDYN|Masking|Space0F38|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 vpermpd, 0x6601, AVX2|AVX512F, Modrm|Vex256|EVexDYN|Masking|Space0F3A|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM }
-vpermps, 0x6616, AVX2|AVX512F, Modrm|Vex256|EVexDYN|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vpermps, 0x6616, AVX2|AVX512F, Modrm|Vex256|EVexDYN|Masking|Space0F38|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 vpermq, 0x6600, AVX2|AVX512F, Modrm|Vex256|EVexDYN|Masking|Space0F3A|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM }
 vextracti128, 0x6639, AVX2, Modrm|Vex=2|Space0F3A|VexW=1|NoSuf, { Imm8, RegYMM, Unspecified|BaseIndex|RegXMM }
-vinserti128, 0x6638, AVX2, Modrm|Vex256|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
-vpmaskmov<dq>, 0x668e, AVX2, Modrm|Vex|Space0F38|VexVVVV|<dq:vexw>|CheckOperandSize|NoSuf, { RegXMM|RegYMM, RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
-vpmaskmov<dq>, 0x668c, AVX2, Modrm|Vex|Space0F38|VexVVVV|<dq:vexw>|CheckOperandSize|NoSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsllv<dq>, 0x6647, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsravd, 0x6646, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsrlv<dq>, 0x6645, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vinserti128, 0x6638, AVX2, Modrm|Vex256|Space0F3A|VexVVVVSrc|VexW0|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
+vpmaskmov<dq>, 0x668e, AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|<dq:vexw>|CheckOperandSize|NoSuf, { RegXMM|RegYMM, RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
+vpmaskmov<dq>, 0x668c, AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|<dq:vexw>|CheckOperandSize|NoSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsllv<dq>, 0x6647, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsravd, 0x6646, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsrlv<dq>, 0x6645, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX gather instructions
-vgatherdpd, 0x6692, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW1|SwapSources|CheckOperandSize|NoSuf|VecSIB128, { RegXMM|RegYMM, Qword|Unspecified|BaseIndex, RegXMM|RegYMM }
-vgatherdps, 0x6692, AVX2, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB128, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
-vgatherdps, 0x6692, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB256, { RegYMM, Dword|Unspecified|BaseIndex, RegYMM }
-vgatherqp<sd>, 0x6693, AVX2, Modrm|Vex|Space0F38|VexVVVV|<sd:vexw>|SwapSources|NoSuf|VecSIB128, { RegXMM, <sd:elem>|Unspecified|BaseIndex, RegXMM }
-vgatherqpd, 0x6693, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW1|SwapSources|NoSuf|VecSIB256, { RegYMM, Qword|Unspecified|BaseIndex, RegYMM }
-vgatherqps, 0x6693, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB256, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
-vpgatherdd, 0x6690, AVX2, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB128, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
-vpgatherdd, 0x6690, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB256, { RegYMM, Dword|Unspecified|BaseIndex, RegYMM }
-vpgatherdq, 0x6690, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW1|SwapSources|CheckOperandSize|NoSuf|VecSIB128, { RegXMM|RegYMM, Qword|Unspecified|BaseIndex, RegXMM|RegYMM }
-vpgatherq<dq>, 0x6691, AVX2, Modrm|Vex128|Space0F38|VexVVVV|<dq:vexw>|SwapSources|NoSuf|VecSIB128, { RegXMM, <dq:elem>|Unspecified|BaseIndex, RegXMM }
-vpgatherqd, 0x6691, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB256, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
-vpgatherqq, 0x6691, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW1|SwapSources|NoSuf|VecSIB256, { RegYMM, Qword|Unspecified|BaseIndex, RegYMM }
+vgatherdpd, 0x6692, AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexW1|SwapSources|CheckOperandSize|NoSuf|VecSIB128, { RegXMM|RegYMM, Qword|Unspecified|BaseIndex, RegXMM|RegYMM }
+vgatherdps, 0x6692, AVX2, Modrm|Vex128|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf|VecSIB128, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
+vgatherdps, 0x6692, AVX2, Modrm|Vex256|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf|VecSIB256, { RegYMM, Dword|Unspecified|BaseIndex, RegYMM }
+vgatherqp<sd>, 0x6693, AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|<sd:vexw>|SwapSources|NoSuf|VecSIB128, { RegXMM, <sd:elem>|Unspecified|BaseIndex, RegXMM }
+vgatherqpd, 0x6693, AVX2, Modrm|Vex256|Space0F38|VexVVVVSrc|VexW1|SwapSources|NoSuf|VecSIB256, { RegYMM, Qword|Unspecified|BaseIndex, RegYMM }
+vgatherqps, 0x6693, AVX2, Modrm|Vex256|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf|VecSIB256, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
+vpgatherdd, 0x6690, AVX2, Modrm|Vex128|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf|VecSIB128, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
+vpgatherdd, 0x6690, AVX2, Modrm|Vex256|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf|VecSIB256, { RegYMM, Dword|Unspecified|BaseIndex, RegYMM }
+vpgatherdq, 0x6690, AVX2, Modrm|Vex|Space0F38|VexVVVVSrc|VexW1|SwapSources|CheckOperandSize|NoSuf|VecSIB128, { RegXMM|RegYMM, Qword|Unspecified|BaseIndex, RegXMM|RegYMM }
+vpgatherq<dq>, 0x6691, AVX2, Modrm|Vex128|Space0F38|VexVVVVSrc|<dq:vexw>|SwapSources|NoSuf|VecSIB128, { RegXMM, <dq:elem>|Unspecified|BaseIndex, RegXMM }
+vpgatherqd, 0x6691, AVX2, Modrm|Vex256|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf|VecSIB256, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
+vpgatherqq, 0x6691, AVX2, Modrm|Vex256|Space0F38|VexVVVVSrc|VexW1|SwapSources|NoSuf|VecSIB256, { RegYMM, Qword|Unspecified|BaseIndex, RegYMM }
 
 // AES + AVX
 
-vaesdec, 0x66de, AVX|AES, Modrm|Vex|Space0F38|VexVVVV|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vaesdeclast, 0x66df, AVX|AES, Modrm|Vex|Space0F38|VexVVVV|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vaesenc, 0x66dc, AVX|AES, Modrm|Vex|Space0F38|VexVVVV|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vaesenclast, 0x66dd, AVX|AES, Modrm|Vex|Space0F38|VexVVVV|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vaesdec, 0x66de, AVX|AES, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vaesdeclast, 0x66df, AVX|AES, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vaesenc, 0x66dc, AVX|AES, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vaesenclast, 0x66dd, AVX|AES, Modrm|Vex|Space0F38|VexVVVVSrc|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vaesimc, 0x66db, AVX|AES, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM }
 vaeskeygenassist, 0x66df, AVX|AES, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM }
 
 // PCLMULQDQ + AVX
 
-vpclmulqdq, 0x6644, AVX|PCLMULQDQ, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vpclmullqlqdq, 0x6644/0x00, AVX|PCLMULQDQ, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vpclmulhqlqdq, 0x6644/0x01, AVX|PCLMULQDQ, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vpclmullqhqdq, 0x6644/0x10, AVX|PCLMULQDQ, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vpclmulhqhqdq, 0x6644/0x11, AVX|PCLMULQDQ, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vpclmulqdq, 0x6644, AVX|PCLMULQDQ, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vpclmullqlqdq, 0x6644/0x00, AVX|PCLMULQDQ, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vpclmulhqlqdq, 0x6644/0x01, AVX|PCLMULQDQ, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vpclmullqhqdq, 0x6644/0x10, AVX|PCLMULQDQ, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vpclmulhqhqdq, 0x6644/0x11, AVX|PCLMULQDQ, Modrm|Vex|Space0F3A|VexVVVVSrc|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 
 // GFNI + AVX
 
-vgf2p8affineinvqb, 0x66cf, AVX|GFNI, Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vgf2p8affineqb, 0x66ce, AVX|GFNI, Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vgf2p8mulb, 0x66cf, GFNI|AVX|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vgf2p8affineinvqb, 0x66cf, AVX|GFNI, Modrm|Vex|Space0F3A|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vgf2p8affineqb, 0x66ce, AVX|GFNI, Modrm|Vex|Space0F3A|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vgf2p8mulb, 0x66cf, GFNI|AVX|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVVSrc|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // FSGSBASE, RDRND and F16C
 
@@ -1817,16 +1902,16 @@ vcvtps2ph, 0x661d, F16C, Modrm|Vex=2|Space0F3A|VexW=1|NoSuf, { Imm8, RegYMM, Uns
     d:AVX512F:AVX512DQ:FMA|AVX|AVX512F:66:f2:66:Space0F:Space0F38:1:Vex|EVexDYN:VexLIG|EVexLIG:VexW1:Qword, +
     h:AVX512_FP16:AVX512_FP16:AVX512_FP16::f3::EVexMap5:EVexMap6:0::EVexLIG:VexW0:Word>
 
-vfmadd<fma>p<sdh>, 0x6688 | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfmadd<fma>s<sdh>, 0x6689 | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vexlig>|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
-vfmaddsub<fma>p<sdh>, 0x6686 | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfmsub<fma>p<sdh>, 0x668a | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfmsub<fma>s<sdh>, 0x668b | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vexlig>|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
-vfmsubadd<fma>p<sdh>, 0x6687 | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfnmadd<fma>p<sdh>, 0x668c | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfnmadd<fma>s<sdh>, 0x668d | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vexlig>|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
-vfnmsub<fma>p<sdh>, 0x668e | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfnmsub<fma>s<sdh>, 0x668f | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vexlig>|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmadd<fma>p<sdh>, 0x6688 | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmadd<fma>s<sdh>, 0x6689 | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vexlig>|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmaddsub<fma>p<sdh>, 0x6686 | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmsub<fma>p<sdh>, 0x668a | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmsub<fma>s<sdh>, 0x668b | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vexlig>|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmsubadd<fma>p<sdh>, 0x6687 | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfnmadd<fma>p<sdh>, 0x668c | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfnmadd<fma>s<sdh>, 0x668d | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vexlig>|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfnmsub<fma>p<sdh>, 0x668e | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vex>|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfnmsub<fma>s<sdh>, 0x668f | 0x<fma:opc>, <sdh:fma>, Modrm|<sdh:vexlig>|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 // HLE prefixes
 
@@ -1841,35 +1926,35 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
 
 // BMI2 instructions.
 
-bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-bzhi, 0xf5, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-mulx, 0xf2f6, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pdep, 0xf2f5, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pext, 0xf3f5, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+bzhi, 0xf5, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|NF|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+mulx, 0xf2f6, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVVDest|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pdep, 0xf2f5, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pext, 0xf3f5, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
 rorx, 0xf2f0, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-sarx, 0xf3f7, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shlx, 0x66f7, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shrx, 0xf2f7, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+sarx, 0xf3f7, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVVDest|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shlx, 0x66f7, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shrx, 0xf2f7, BMI2|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // FMA4 instructions
 
-vfmaddp<sd>, 0x6668 | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vfmadds<sd>, 0x666a | <sd:opc>, FMA4, D|Modrm|VexLIG|Space0F3A|VexVVVV|VexW1|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM, RegXMM }
-vfmaddsubp<sd>, 0x665c | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vfmsubaddp<sd>, 0x665e | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vfmsubp<sd>, 0x666c | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vfmsubs<sd>, 0x666e | <sd:opc>, FMA4, D|Modrm|VexLIG|Space0F3A|VexVVVV|VexW1|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM, RegXMM }
-vfnmaddp<sd>, 0x6678 | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vfnmadds<sd>, 0x667a | <sd:opc>, FMA4, D|Modrm|VexLIG|Space0F3A|VexVVVV|VexW1|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM, RegXMM }
-vfnmsubp<sd>, 0x667c | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vfnmsubs<sd>, 0x667e | <sd:opc>, FMA4, D|Modrm|VexLIG|Space0F3A|VexVVVV|VexW1|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM, RegXMM }
+vfmaddp<sd>, 0x6668 | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vfmadds<sd>, 0x666a | <sd:opc>, FMA4, D|Modrm|VexLIG|Space0F3A|VexVVVVSrc|VexW1|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM, RegXMM }
+vfmaddsubp<sd>, 0x665c | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vfmsubaddp<sd>, 0x665e | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vfmsubp<sd>, 0x666c | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vfmsubs<sd>, 0x666e | <sd:opc>, FMA4, D|Modrm|VexLIG|Space0F3A|VexVVVVSrc|VexW1|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM, RegXMM }
+vfnmaddp<sd>, 0x6678 | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vfnmadds<sd>, 0x667a | <sd:opc>, FMA4, D|Modrm|VexLIG|Space0F3A|VexVVVVSrc|VexW1|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM, RegXMM }
+vfnmsubp<sd>, 0x667c | <sd:opc>, FMA4, D|Modrm|Vex|Space0F3A|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vfnmsubs<sd>, 0x667e | <sd:opc>, FMA4, D|Modrm|VexLIG|Space0F3A|VexVVVVSrc|VexW1|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM, RegXMM }
 
 // XOP instructions
 
@@ -1879,11 +1964,11 @@ vfnmsubs<sd>, 0x667e | <sd:opc>, FMA4, D|Modrm|VexLIG|Space0F3A|VexVVVV|VexW1|No
 
 vfrczp<sd>, 0x80 | <sd:opc>, XOP, Modrm|SpaceXOP09|VexW0|CheckOperandSize|NoSuf|Vex, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vfrczs<sd>, 0x82 | <sd:opc>, XOP, Modrm|SpaceXOP09|VexW0|NoSuf|Vex, { <sd:elem>|RegXMM|Unspecified|BaseIndex, RegXMM }
-vpcmov, 0xa2, XOP, D|Modrm|Vex|SpaceXOP08|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpcom<sign><xop>, 0xcc | 0x<sign:opc> | <xop:opc>, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpcom<irel><sign><xop>, 0xcc | 0x<sign:opc> | <xop:opc>/<irel:imm>, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf|ImmExt, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpermil2p<sd>, 0x6648 | <sd:opc>, XOP, Modrm|Vex|Space0F3A|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpermil2p<sd>, 0x6648 | <sd:opc>, XOP, Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpcmov, 0xa2, XOP, D|Modrm|Vex|SpaceXOP08|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpcom<sign><xop>, 0xcc | 0x<sign:opc> | <xop:opc>, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpcom<irel><sign><xop>, 0xcc | 0x<sign:opc> | <xop:opc>/<irel:imm>, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf|ImmExt, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpermil2p<sd>, 0x6648 | <sd:opc>, XOP, Modrm|Vex|Space0F3A|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpermil2p<sd>, 0x6648 | <sd:opc>, XOP, Modrm|Vex|Space0F3A|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vphaddb<dq>, 0xc2 | <dq:opc>, XOP, Modrm|SpaceXOP09|VexW0|NoSuf|Vex, { RegXMM|Unspecified|BaseIndex, RegXMM }
 vphaddbw, 0xc1, XOP, Modrm|SpaceXOP09|VexW0|NoSuf|Vex, { RegXMM|Unspecified|BaseIndex, RegXMM }
 vphadddq, 0xcb, XOP, Modrm|SpaceXOP09|VexW0|NoSuf|Vex, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1895,23 +1980,23 @@ vphaddw<dq>, 0xc6 | <dq:opc>, XOP, Modrm|SpaceXOP09|VexW0|NoSuf|Vex, { RegXMM|Un
 vphsubbw, 0xe1, XOP, Modrm|SpaceXOP09|VexW0|NoSuf|Vex, { RegXMM|Unspecified|BaseIndex, RegXMM }
 vphsubdq, 0xe3, XOP, Modrm|SpaceXOP09|VexW0|NoSuf|Vex, { RegXMM|Unspecified|BaseIndex, RegXMM }
 vphsubwd, 0xe2, XOP, Modrm|SpaceXOP09|VexW0|NoSuf|Vex, { RegXMM|Unspecified|BaseIndex, RegXMM }
-vpmacsdd, 0x9e, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmacsdqh, 0x9f, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmacsdql, 0x97, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmacssdd, 0x8e, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmacssdqh, 0x8f, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmacssdql, 0x87, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmacsswd, 0x86, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmacssww, 0x85, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmacswd, 0x96, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmacsww, 0x95, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmadcsswd, 0xa6, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpmadcswd, 0xb6, XOP, Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpperm, 0xa3, XOP, D|Modrm|Vex128|SpaceXOP08|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vprot<xop>, 0x90 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|VexVVVV|SwapSources|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
+vpmacsdd, 0x9e, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmacsdqh, 0x9f, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmacsdql, 0x97, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmacssdd, 0x8e, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmacssdqh, 0x8f, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmacssdql, 0x87, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmacsswd, 0x86, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmacssww, 0x85, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmacswd, 0x96, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmacsww, 0x95, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmadcsswd, 0xa6, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpmadcswd, 0xb6, XOP, Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpperm, 0xa3, XOP, D|Modrm|Vex128|SpaceXOP08|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vprot<xop>, 0x90 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|VexVVVVSrc|SwapSources|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
 vprot<xop>, 0xc0 | <xop:opc>, XOP, Modrm|Vex128|SpaceXOP08|VexW0|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
-vpsha<xop>, 0x98 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|VexVVVV|SwapSources|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
-vpshl<xop>, 0x94 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|VexVVVV|SwapSources|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
+vpsha<xop>, 0x98 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|VexVVVVSrc|SwapSources|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
+vpshl<xop>, 0x94 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|VexVVVVSrc|SwapSources|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
 
 <xop>
 <irel>
@@ -1921,35 +2006,35 @@ vpshl<xop>, 0x94 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|VexVVVV|SwapSources
 
 llwpcb, 0x12/0, LWP, Modrm|SpaceXOP09|NoSuf|Vex, { Reg32|Reg64 }
 slwpcb, 0x12/1, LWP, Modrm|SpaceXOP09|NoSuf|Vex, { Reg32|Reg64 }
-lwpval, 0x12/1, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
-lwpins, 0x12/0, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
+lwpval, 0x12/1, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVVSrc|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
+lwpins, 0x12/0, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVVSrc|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // BMI instructions
 
-andn, 0xf2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-andn, 0xf2, BMI|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-bextr, 0xf7, BMI|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsi, 0xf3/3, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsi, 0xf3/3, BMI|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsmsk, 0xf3/2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsmsk, 0xf3/2, BMI|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsr, 0xf3/1, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsr, 0xf3/1, BMI|APX_F, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+andn, 0xf2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+andn, 0xf2, BMI|APX_F, Modrm|CheckOperandSize|EVex128|NF|Space0F38|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+bextr, 0xf7, BMI|APX_F, Modrm|CheckOperandSize|EVex128|NF|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsi, 0xf3/3, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsi, 0xf3/3, BMI|APX_F, Modrm|CheckOperandSize|EVex128|NF|Space0F38|VexVVVVDest|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsmsk, 0xf3/2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsmsk, 0xf3/2, BMI|APX_F, Modrm|CheckOperandSize|EVex128|NF|Space0F38|VexVVVVDest|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsr, 0xf3/1, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsr, 0xf3/1, BMI|APX_F, Modrm|CheckOperandSize|EVex128|NF|Space0F38|VexVVVVDest|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 tzcnt, 0xf30fbc, BMI, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // TBM instructions
 
 bextr, 0x10, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP0A|No_bSuf|No_wSuf|No_sSuf, { Imm32|Imm32S, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blcfill, 0x01/1, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blci, 0x02/6, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blcic, 0x01/5, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blcmsk, 0x02/1, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blcs, 0x01/3, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsfill, 0x01/2, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsic, 0x01/6, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-t1mskc, 0x01/7, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-tzmsk, 0x01/4, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blcfill, 0x01/1, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blci, 0x02/6, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blcic, 0x01/5, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blcmsk, 0x02/1, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blcs, 0x01/3, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsfill, 0x01/2, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsic, 0x01/6, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+t1mskc, 0x01/7, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+tzmsk, 0x01/4, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVVSrc|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // AMD 3DNow! instructions.
 
@@ -2040,7 +2125,11 @@ xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
 
 // Multy-precision Add Carry, rdseed instructions.
 adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 rdseed, 0xfc7/7, RdSeed, Modrm|NoSuf, { Reg16|Reg32|Reg64 }
 
 // SMAP instructions.
@@ -2081,42 +2170,42 @@ sha256msg2, 0xdd, SHA|APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|
 
 // SHA512 instructions.
 
-vsha512rnds2, 0xf2cb, SHA512, Modrm|Vex256|Space0F38|VexVVVV|VexW0|NoSuf, { RegXMM, RegYMM, RegYMM }
+vsha512rnds2, 0xf2cb, SHA512, Modrm|Vex256|Space0F38|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegYMM, RegYMM }
 vsha512msg1, 0xf2cc, SHA512, Modrm|Vex256|Space0F38|VexW0|NoSuf, { RegXMM, RegYMM }
 vsha512msg2, 0xf2cd, SHA512, Modrm|Vex256|Space0F38|VexW0|NoSuf, { RegYMM, RegYMM }
 
 // SHA512 instructions end.
 
 // SM3 instructions.
-vsm3rnds2, 0x66de, SM3, Modrm|Space0F3A|Vex128|VexVVVV|VexW0|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vsm3msg1, 0xda, SM3, Modrm|Space0F38|Vex128|VexVVVV|VexW0|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vsm3msg2, 0x66da, SM3, Modrm|Space0F38|Vex128|VexVVVV|VexW0|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsm3rnds2, 0x66de, SM3, Modrm|Space0F3A|Vex128|VexVVVVSrc|VexW0|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsm3msg1, 0xda, SM3, Modrm|Space0F38|Vex128|VexVVVVSrc|VexW0|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsm3msg2, 0x66da, SM3, Modrm|Space0F38|Vex128|VexVVVVSrc|VexW0|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 // SM3 instructions end.
 
 // SM4 instructions.
 
-vsm4key4, 0xf3da, SM4, Modrm|Space0F38|Vex|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vsm4rnds4, 0xf2da, SM4, Modrm|Space0F38|Vex|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vsm4key4, 0xf3da, SM4, Modrm|Space0F38|Vex|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vsm4rnds4, 0xf2da, SM4, Modrm|Space0F38|Vex|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // SM4 instructions end.
 
 // VAES
 
-vaesdec, 0x66de, VAES|AVX|AVX512F, Modrm|Vex|EVexDYN|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vaesdeclast, 0x66df, VAES|AVX|AVX512F, Modrm|Vex|EVexDYN|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vaesenc, 0x66dc, VAES|AVX|AVX512F, Modrm|Vex|EVexDYN|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vaesenclast, 0x66dd, VAES|AVX|AVX512F, Modrm|Vex|EVexDYN|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vaesdec, 0x66de, VAES|AVX|AVX512F, Modrm|Vex|EVexDYN|Space0F38|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vaesdeclast, 0x66df, VAES|AVX|AVX512F, Modrm|Vex|EVexDYN|Space0F38|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vaesenc, 0x66dc, VAES|AVX|AVX512F, Modrm|Vex|EVexDYN|Space0F38|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vaesenclast, 0x66dd, VAES|AVX|AVX512F, Modrm|Vex|EVexDYN|Space0F38|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // VAES instructions end
 
 // VPCLMULQDQ instructions
 
-vpclmulqdq, 0x6644, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpclmullqlqdq, 0x6644/0x00, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpclmulhqlqdq, 0x6644/0x01, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpclmullqhqdq, 0x6644/0x10, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpclmulhqhqdq, 0x6644/0x11, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpclmulqdq, 0x6644, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpclmullqlqdq, 0x6644/0x00, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpclmulhqlqdq, 0x6644/0x01, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpclmullqhqdq, 0x6644/0x10, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpclmulhqhqdq, 0x6644/0x11, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // VPCLMULQDQ instructions end
 
@@ -2130,11 +2219,11 @@ vpclmulhqhqdq, 0x6644/0x11, VPCLMULQDQ|AVX|AVX512F, Modrm|Space0F3A|Vex|EVexDYN|
     x:AVX512VL:EVex128|Disp8MemShift=4|ATTSyntax:::RegXMM|Unspecified|BaseIndex:RegXMM, +
     y:AVX512VL:EVex256|Disp8MemShift=5|ATTSyntax:::RegYMM|Unspecified|BaseIndex:RegXMM>
 
-kand<bw>, 0x<bw:kpfx>41, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
-kandn<bw>, 0x<bw:kpfx>42, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
-kor<bw>, 0x<bw:kpfx>45, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
-kxnor<bw>, 0x<bw:kpfx>46, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
-kxor<bw>, 0x<bw:kpfx>47, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
+kand<bw>, 0x<bw:kpfx>41, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVVSrc|VexW0|NoSuf, { RegMask, RegMask, RegMask }
+kandn<bw>, 0x<bw:kpfx>42, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVVSrc|VexW0|NoSuf, { RegMask, RegMask, RegMask }
+kor<bw>, 0x<bw:kpfx>45, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVVSrc|VexW0|NoSuf, { RegMask, RegMask, RegMask }
+kxnor<bw>, 0x<bw:kpfx>46, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVVSrc|VexW0|NoSuf, { RegMask, RegMask, RegMask }
+kxor<bw>, 0x<bw:kpfx>47, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVVSrc|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 
 kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
 kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>|APX_F, Modrm|EVex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
@@ -2149,37 +2238,37 @@ kortest<bw>, 0x<bw:kpfx>98, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMa
 kshiftl<bw>, 0x6632, <bw:kcpu>, Modrm|Vex128|Space0F3A|<bw:vexw>|NoSuf, { Imm8, RegMask, RegMask }
 kshiftr<bw>, 0x6630, <bw:kcpu>, Modrm|Vex128|Space0F3A|<bw:vexw>|NoSuf, { Imm8, RegMask, RegMask }
 
-kunpckbw, 0x664B, AVX512F, Modrm|Vex=2|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
+kunpckbw, 0x664B, AVX512F, Modrm|Vex=2|Space0F|VexVVVVSrc|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 
-vaddp<sdh>, 0x<sdh:ppfx>58, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vdivp<sdh>, 0x<sdh:ppfx>5e, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vmulp<sdh>, 0x<sdh:ppfx>59, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vaddp<sdh>, 0x<sdh:ppfx>58, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vdivp<sdh>, 0x<sdh:ppfx>5e, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vmulp<sdh>, 0x<sdh:ppfx>59, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vsqrtp<sdh>, 0x<sdh:ppfx>51, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vsubp<sdh>, 0x<sdh:ppfx>5c, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-
-vadds<sdh>, 0x<sdh:spfx>58, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
-vdivs<sdh>, 0x<sdh:spfx>5e, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmuls<sdh>, 0x<sdh:spfx>59, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
-vsqrts<sdh>, 0x<sdh:spfx>51, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
-vsubs<sdh>, 0x<sdh:spfx>5C, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
-
-valign<dq>, 0x6603, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vblendmp<sd>, 0x6665, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpblendm<dq>, 0x6664, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpermi2<dq>, 0x6676, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpermi2p<sd>, 0x6677, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpermt2<dq>, 0x667E, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpermt2p<sd>, 0x667F, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmaxs<dq>, 0x663D, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmaxu<dq>, 0x663F, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmins<dq>, 0x6639, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpminu<dq>, 0x663B, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmuldq, 0x6628, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmulld, 0x6640, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vprolv<dq>, 0x6615, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vprorv<dq>, 0x6614, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsravq, 0x6646, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpternlog<dq>, 0x6625, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vsubp<sdh>, 0x<sdh:ppfx>5c, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+
+vadds<sdh>, 0x<sdh:spfx>58, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vdivs<sdh>, 0x<sdh:spfx>5e, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmuls<sdh>, 0x<sdh:spfx>59, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsqrts<sdh>, 0x<sdh:spfx>51, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsubs<sdh>, 0x<sdh:spfx>5C, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+
+valign<dq>, 0x6603, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vblendmp<sd>, 0x6665, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpblendm<dq>, 0x6664, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpermi2<dq>, 0x6676, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpermi2p<sd>, 0x6677, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpermt2<dq>, 0x667E, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpermt2p<sd>, 0x667F, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaxs<dq>, 0x663D, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaxu<dq>, 0x663F, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmins<dq>, 0x6639, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpminu<dq>, 0x663B, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmuldq, 0x6628, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmulld, 0x6640, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vprolv<dq>, 0x6615, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vprorv<dq>, 0x6614, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsravq, 0x6646, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpternlog<dq>, 0x6625, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vbroadcastf32x4, 0x661A, AVX512F, Modrm|Masking|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { XMMword|Unspecified|BaseIndex, RegYMM|RegZMM }
 vbroadcasti32x4, 0x665A, AVX512F, Modrm|Masking|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { XMMword|Unspecified|BaseIndex, RegYMM|RegZMM }
@@ -2192,11 +2281,11 @@ vbroadcastsd, 0x6619, AVX512F, Modrm|Masking|Space0F38|VexW1|Disp8MemShift=3|NoS
 vpbroadcastq, 0x6659, AVX512F, Modrm|Masking|Space0F38|VexW1|Disp8MemShift|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpbroadcast<dq>, 0x667c, AVX512F, Modrm|Masking|Space0F38|<dq:vexw64>|NoSuf, { <dq:gpr>, RegXMM|RegYMM|RegZMM }
 
-vcmp<frel>p<sd>, 0x<sd:ppfx>C2/0x<frel:imm>, AVX512F, Modrm|Masking|Space0F|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt|SAE, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vcmpp<sd>, 0x<sd:ppfx>C2, AVX512F, Modrm|Masking|Space0F|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vcmp<frel>p<sd>, 0x<sd:ppfx>C2/0x<frel:imm>, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt|SAE, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vcmpp<sd>, 0x<sd:ppfx>C2, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
-vcmp<frel>s<sd>, 0x<sd:spfx>C2/0x<frel:imm>, AVX512F, Modrm|EVexLIG|Masking|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf|SAE|ImmExt, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegMask }
-vcmps<sd>, 0x<sd:spfx>C2, AVX512F, Modrm|EVexLIG|Masking|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegMask }
+vcmp<frel>s<sd>, 0x<sd:spfx>C2/0x<frel:imm>, AVX512F, Modrm|EVexLIG|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Disp8MemShift|NoSuf|SAE|ImmExt, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegMask }
+vcmps<sd>, 0x<sd:spfx>C2, AVX512F, Modrm|EVexLIG|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegMask }
 
 vcomis<sdh>, 0x<sdh:ppfx>2f, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM }
 vucomis<sdh>, 0x<sdh:ppfx>2e, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM }
@@ -2238,23 +2327,23 @@ vcvtps2ph, 0x661D, AVX512F, Modrm|EVex512|Masking|Space0F3A|VexW0|Disp8MemShift=
 vcvts<sd>2si, 0x<sd:spfx>2d, AVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE, { RegXMM|<sd:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
 vcvts<sdh>2usi, 0x<sdh:spfx>79, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
 
-vcvtsd2ss, 0xF25A, AVX512F, Modrm|EVexLIG|Masking|Space0F|VexVVVV|VexW1|Disp8MemShift=3|NoSuf|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsd2ss, 0xF25A, AVX512F, Modrm|EVexLIG|Masking|Space0F|VexVVVVSrc|VexW1|Disp8MemShift=3|NoSuf|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtsi2sd, 0xF22A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|ATTSyntax, { Reg32|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtsi2sd, 0xF22A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtsi2sd, 0xF22A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|IntelSyntax, { Reg32|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtsi2sd, 0xF22A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtusi2sd, 0xF27B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|ATTSyntax, { Reg32|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtusi2sd, 0xF27B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtusi2sd, 0xF27B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|IntelSyntax, { Reg32|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtusi2sd, 0xF27B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2sd, 0xF22A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|ATTSyntax, { Reg32|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2sd, 0xF22A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2sd, 0xF22A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|IntelSyntax, { Reg32|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2sd, 0xF22A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtusi2sd, 0xF27B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|ATTSyntax, { Reg32|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtusi2sd, 0xF27B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtusi2sd, 0xF27B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|IntelSyntax, { Reg32|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtusi2sd, 0xF27B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtsi2ss, 0xF32A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtsi2ss, 0xF32A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtusi2ss, 0xF37B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtusi2ss, 0xF37B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2ss, 0xF32A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2ss, 0xF32A, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtusi2ss, 0xF37B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtusi2ss, 0xF37B, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtss2sd, 0xF35A, AVX512F, Modrm|EVexLIG|Masking|Space0F|VexVVVV|VexW0|Disp8MemShift=2|NoSuf|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtss2sd, 0xF35A, AVX512F, Modrm|EVexLIG|Masking|Space0F|VexVVVVSrc|VexW0|Disp8MemShift=2|NoSuf|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vcvttpd2dq<Exy>, 0x66e6, AVX512F|<Exy:vl>, Modrm|<Exy:attr>|Masking|Space0F|VexW1|Broadcast|NoSuf|<Exy:sae>, { <Exy:src>|Qword, <Exy:dst> }
 vcvttpd2udq<Exy>, 0x78, AVX512F|<Exy:vl>, Modrm|<Exy:attr>|Masking|Space0F|VexW1|Broadcast|NoSuf|<Exy:sae>, { <Exy:src>|Qword, <Exy:dst> }
@@ -2279,17 +2368,17 @@ vextracti32x4, 0x6639, AVX512F, Modrm|Masking|Space0F3A|VexW=1|Disp8MemShift=4|N
 vextractf64x4, 0x661B, AVX512F, Modrm|EVex=1|Masking|Space0F3A|VexW=2|Disp8MemShift=5|NoSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 vextracti64x4, 0x663B, AVX512F, Modrm|EVex=1|Masking|Space0F3A|VexW=2|Disp8MemShift=5|NoSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 
-vfixupimmp<sd>, 0x6654, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfixupimms<sd>, 0x6655, AVX512F, Modrm|EVexLIG|Masking|Space0F3A|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8|Imm8S, RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfixupimmp<sd>, 0x6654, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfixupimms<sd>, 0x6655, AVX512F, Modrm|EVexLIG|Masking|Space0F3A|VexVVVVSrc|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8|Imm8S, RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vgetmantp<sdh>, 0x<sdh:pfx>26, <sdh:cpu>, Modrm|Masking|Space0F3A|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vgetmants<sdh>, 0x<sdh:pfx>27, <sdh:cpu>, Modrm|EVexLIG|Masking|Space0F3A|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vgetmants<sdh>, 0x<sdh:pfx>27, <sdh:cpu>, Modrm|EVexLIG|Masking|Space0F3A|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vrndscalep<sdh>, 0x<sdh:pfx>08 | <sdh:opc>, <sdh:cpu>, Modrm|Masking|Space0F3A|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vrndscales<sdh>, 0x<sdh:pfx>0a | <sdh:opc>, <sdh:cpu>, Modrm|EVexLIG|Masking|Space0F3A|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vrndscales<sdh>, 0x<sdh:pfx>0a | <sdh:opc>, <sdh:cpu>, Modrm|EVexLIG|Masking|Space0F3A|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vscalefp<sdh>, 0x662c, <sdh:cpu>, Modrm|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vscalefs<sdh>, 0x662d, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vscalefp<sdh>, 0x662c, <sdh:cpu>, Modrm|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vscalefs<sdh>, 0x662d, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vgatherdpd, 0x6692, AVX512F, Modrm|EVex=1|Masking|NoDefMask|Space0F38|VexW1|Disp8MemShift=3|VecSIB256|NoSuf, { Qword|Unspecified|BaseIndex, RegZMM }
 vgatherdps, 0x6692, AVX512F, Modrm|EVex512|Masking|NoDefMask|Space0F38|VexW0|Disp8MemShift=2|VecSIB512|NoSuf, { Dword|Unspecified|BaseIndex, RegZMM }
@@ -2303,21 +2392,21 @@ vpgatherqq, 0x6691, AVX512F, Modrm|EVex=1|Masking|NoDefMask|Space0F38|VexW1|Disp
 vmovntdqa, 0x662A, AVX512F, Modrm|Space0F38|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { XMMword|YMMword|ZMMword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vgetexpp<sdh>, 0x6642, <sdh:cpu>, Modrm|Masking|<sdh:spc2>|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vgetexps<sdh>, 0x6643, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc2>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vgetexps<sdh>, 0x6643, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc2>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vinsertf32x4, 0x6618, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|VexW0|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
-vinserti32x4, 0x6638, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|VexW0|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vinsertf32x4, 0x6618, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW0|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vinserti32x4, 0x6638, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW0|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
-vinsertf64x4, 0x661A, AVX512F, Modrm|EVex=1|Masking|Space0F3A|VexVVVV|VexW1|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
-vinserti64x4, 0x663A, AVX512F, Modrm|EVex=1|Masking|Space0F3A|VexVVVV|VexW1|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
+vinsertf64x4, 0x661A, AVX512F, Modrm|EVex=1|Masking|Space0F3A|VexVVVVSrc|VexW1|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
+vinserti64x4, 0x663A, AVX512F, Modrm|EVex=1|Masking|Space0F3A|VexVVVVSrc|VexW1|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
 
-vinsertps, 0x6621, AVX512F, Modrm|EVex128|Space0F3A|VexVVVV|VexW0|Disp8MemShift=2|NoSuf, { Imm8, RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vinsertps, 0x6621, AVX512F, Modrm|EVex128|Space0F3A|VexVVVVSrc|VexW0|Disp8MemShift=2|NoSuf, { Imm8, RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vmaxp<sdh>, 0x<sdh:ppfx>5f, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vmaxs<sdh>, 0x<sdh:spfx>5f, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmaxp<sdh>, 0x<sdh:ppfx>5f, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vmaxs<sdh>, 0x<sdh:spfx>5f, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vminp<sdh>, 0x<sdh:ppfx>5d, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vmins<sdh>, 0x<sdh:spfx>5d, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vminp<sdh>, 0x<sdh:ppfx>5d, <sdh:cpu>, Modrm|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vmins<sdh>, 0x<sdh:spfx>5d, <sdh:cpu>, Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vmovap<sd>, 0x<sd:ppfx>28, AVX512F, D|Modrm|Masking|Space0F|<sd:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovntp<sd>, 0x<sd:ppfx>2B, AVX512F, Modrm|Space0F|<sd:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex }
@@ -2331,56 +2420,56 @@ vmovntdq, 0x66E7, AVX512F, Modrm|Space0F|VexW=1|Disp8ShiftVL|CheckOperandSize|No
 vmovdqu32, 0xF36F, AVX512F, D|Modrm|Masking|Space0F|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovdqu64, 0xF36F, AVX512F, D|Modrm|Masking|Space0F|VexW=2|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vmovhlps, 0x12, AVX512F, Modrm|EVex=4|Space0F|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM, RegXMM }
-vmovlhps, 0x16, AVX512F, Modrm|EVex=4|Space0F|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM, RegXMM }
+vmovhlps, 0x12, AVX512F, Modrm|EVex=4|Space0F|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM, RegXMM }
+vmovlhps, 0x16, AVX512F, Modrm|EVex=4|Space0F|VexVVVVSrc|VexW0|NoSuf, { RegXMM, RegXMM, RegXMM }
 
-vmovhp<sd>, 0x<sd:ppfx>16, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovhp<sd>, 0x<sd:ppfx>16, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|<sd:vexw>|Disp8MemShift=3|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 vmovhp<sd>, 0x<sd:ppfx>17, AVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex }
-vmovlp<sd>, 0x<sd:ppfx>12, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovlp<sd>, 0x<sd:ppfx>12, AVX512F, Modrm|EVexLIG|Space0F|VexVVVVSrc|<sd:vexw>|Disp8MemShift=3|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 vmovlp<sd>, 0x<sd:ppfx>13, AVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 
 vmovq, 0xF37E, AVX512F, Load|Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovq, 0x66D6, AVX512F, Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
 
 vmovs<sdh>, 0x<sdh:spfx>10, <sdh:cpu>, D|Modrm|EVexLIG|Masking|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|NoSuf, { <sdh:elem>|Unspecified|BaseIndex, RegXMM }
-vmovs<sdh>, 0x<sdh:spfx>10, <sdh:cpu>, D|Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVV|<sdh:vexw>|NoSuf, { RegXMM, RegXMM, RegXMM }
+vmovs<sdh>, 0x<sdh:spfx>10, <sdh:cpu>, D|Modrm|EVexLIG|Masking|<sdh:spc1>|VexVVVVSrc|<sdh:vexw>|NoSuf, { RegXMM, RegXMM, RegXMM }
 
 vmovshdup, 0xF316, AVX512F, Modrm|Masking|Space0F|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovsldup, 0xF312, AVX512F, Modrm|Masking|Space0F|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vpabs<dq>, 0x661e | <dq:opc>, AVX512F, Modrm|Masking|Space0F38|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpaddd, 0x66FE, AVX512F, Modrm|Masking|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpaddq, 0x66d4, AVX512F, Modrm|Masking|Space0F|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpand<dq>, 0x66db, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpandn<dq>, 0x66df, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmuludq, 0x66f4, AVX512F, Modrm|Masking|Space0F|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpor<dq>, 0x66eb, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsub<dq>, 0x66fa | <dq:opc>, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpckhdq, 0x666A, AVX512F, Modrm|Masking|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpckhqdq, 0x666d, AVX512F, Modrm|Masking|Space0F|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpckldq, 0x6662, AVX512F, Modrm|Masking|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpcklqdq, 0x666c, AVX512F, Modrm|Masking|Space0F|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpxor<dq>, 0x66ef, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpaddd, 0x66FE, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpaddq, 0x66d4, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpand<dq>, 0x66db, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpandn<dq>, 0x66df, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmuludq, 0x66f4, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpor<dq>, 0x66eb, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsub<dq>, 0x66fa | <dq:opc>, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpckhdq, 0x666A, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpckhqdq, 0x666d, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpckldq, 0x6662, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpcklqdq, 0x666c, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpxor<dq>, 0x66ef, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 <irel:imm, eq:0, lt:1, le:2, neq:4, nlt:5, nle:6>
 
-vpcmpeqd, 0x6676, AVX512F, Modrm|Masking|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmpeqq, 0x6629, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmpgtd, 0x6666, AVX512F, Modrm|Masking|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmpgtq, 0x6637, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmp<dq>, 0x661f, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmpu<dq>, 0x661e, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmp<irel><dq>, 0x661f/<irel:imm>, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmp<irel>u<dq>, 0x661e/<irel:imm>, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmpeqd, 0x6676, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmpeqq, 0x6629, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmpgtd, 0x6666, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmpgtq, 0x6637, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmp<dq>, 0x661f, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmpu<dq>, 0x661e, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmp<irel><dq>, 0x661f/<irel:imm>, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmp<irel>u<dq>, 0x661e/<irel:imm>, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
-vptestm<dq>, 0x6627, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vptestnm<dq>, 0xf327, AVX512F, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vptestm<dq>, 0x6627, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vptestnm<dq>, 0xf327, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
 vpermilpd, 0x6605, AVX512F, Modrm|Masking|Space0F3A|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpermilpd, 0x660d, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpermilpd, 0x660d, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vpermpd, 0x6616, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
-vpermq, 0x6636, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vpermpd, 0x6616, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vpermq, 0x6636, AVX512F, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
 vpmovdb, 0xF331, AVX512F, Modrm|EVex=1|Masking|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { RegZMM, RegXMM|Unspecified|BaseIndex }
 vpmovsdb, 0xF321, AVX512F, Modrm|EVex=1|Masking|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { RegZMM, RegXMM|Unspecified|BaseIndex }
@@ -2417,34 +2506,34 @@ vpmovzxwd, 0x6633, AVX512F, Modrm|EVex=1|Masking|Space0F38|VexWIG|Disp8MemShift=
 vpmovsxwq, 0x6624, AVX512F, Modrm|EVex=1|Masking|Space0F38|VexWIG|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegZMM }
 vpmovzxwq, 0x6634, AVX512F, Modrm|EVex=1|Masking|Space0F38|VexWIG|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegZMM }
 
-vprol<dq>, 0x6672/1, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpror<dq>, 0x6672/0, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vprol<dq>, 0x6672/1, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpror<dq>, 0x6672/0, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vpshufd, 0x6670, AVX512F, Modrm|Masking|Space0F|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vpsll<dq>, 0x66f2 | <dq:opc>, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsll<dq>, 0x6672 | <dq:opc>/6, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpsra<dq>, 0x66e2, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsra<dq>, 0x6672/4, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpsrl<dq>, 0x66d2 | <dq:opc>, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsrl<dq>, 0x6672 | <dq:opc>/2, AVX512F, Modrm|Masking|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsll<dq>, 0x66f2 | <dq:opc>, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsll<dq>, 0x6672 | <dq:opc>/6, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsra<dq>, 0x66e2, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsra<dq>, 0x6672/4, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsrl<dq>, 0x66d2 | <dq:opc>, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsrl<dq>, 0x6672 | <dq:opc>/2, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vrcp14p<sd>, 0x664C, AVX512F, Modrm|Masking|Space0F38|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vrcp14s<sd>, 0x664D, AVX512F, Modrm|EVexLIG|Masking|Space0F38|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vrcp14s<sd>, 0x664D, AVX512F, Modrm|EVexLIG|Masking|Space0F38|VexVVVVSrc|<sd:vexw>|Disp8MemShift|NoSuf, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vrsqrt14p<sd>, 0x664E, AVX512F, Modrm|Masking|Space0F38|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vrsqrt14s<sd>, 0x664F, AVX512F, Modrm|EVexLIG|Masking|Space0F38|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vrsqrt14s<sd>, 0x664F, AVX512F, Modrm|EVexLIG|Masking|Space0F38|VexVVVVSrc|<sd:vexw>|Disp8MemShift|NoSuf, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vshuff32x4, 0x6623, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
-vshufi32x4, 0x6643, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vshuff32x4, 0x6623, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vshufi32x4, 0x6643, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
-vshuff64x2, 0x6623, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
-vshufi64x2, 0x6643, AVX512F, Modrm|Masking|Space0F3A|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vshuff64x2, 0x6623, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vshufi64x2, 0x6643, AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
-vshufp<sd>, 0x<sd:ppfx>C6, AVX512F, Modrm|Masking|Space0F|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vshufp<sd>, 0x<sd:ppfx>C6, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vunpckhp<sd>, 0x<sd:ppfx>15, AVX512F, Modrm|Masking|Space0F|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vunpcklp<sd>, 0x<sd:ppfx>14, AVX512F, Modrm|Masking|Space0F|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vunpckhp<sd>, 0x<sd:ppfx>15, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vunpcklp<sd>, 0x<sd:ppfx>14, AVX512F, Modrm|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512F instructions end.
 
@@ -2464,10 +2553,10 @@ vplzcnt<dq>, 0x6644, AVX512CD, Modrm|Masking|Space0F38|<dq:vexw>|Broadcast|Disp8
 vexp2p<sd>, 0x66C8, AVX512ER, Modrm|EVex512|Masking|Space0F38|<sd:vexw>|Broadcast|Disp8MemShift=6|NoSuf|SAE, { RegZMM|<sd:elem>|Unspecified|BaseIndex, RegZMM }
 
 vrcp28p<sd>, 0x66CA, AVX512ER, Modrm|EVex512|Masking|Space0F38|<sd:vexw>|Broadcast|Disp8MemShift=6|NoSuf|SAE, { RegZMM|<sd:elem>|Unspecified|BaseIndex, RegZMM }
-vrcp28s<sd>, 0x66CB, AVX512ER, Modrm|EVexLIG|Masking|Space0F38|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vrcp28s<sd>, 0x66CB, AVX512ER, Modrm|EVexLIG|Masking|Space0F38|VexVVVVSrc|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vrsqrt28p<sd>, 0x66CC, AVX512ER, Modrm|EVex512|Masking|Space0F38|<sd:vexw>|Broadcast|Disp8MemShift=6|NoSuf|SAE, { RegZMM|<sd:elem>|Unspecified|BaseIndex, RegZMM }
-vrsqrt28s<sd>, 0x66CD, AVX512ER, Modrm|EVexLIG|Masking|Space0F38|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vrsqrt28s<sd>, 0x66CD, AVX512ER, Modrm|EVexLIG|Masking|Space0F38|VexVVVVSrc|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 // AVX512ER instructions end.
 
@@ -2613,9 +2702,9 @@ vpmovzxdq, 0x6635, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexW=1|Disp8
 
 // AVX512BW instructions.
 
-kadd<dq>, 0x<dq:kpfx>4a, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
-kand<dq>, 0x<dq:kpfx>41, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
-kandn<dq>, 0x<dq:kpfx>42, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf|Optimize, { RegMask, RegMask, RegMask }
+kadd<dq>, 0x<dq:kpfx>4a, AVX512BW, Modrm|Vex256|Space0F|VexVVVVSrc|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
+kand<dq>, 0x<dq:kpfx>41, AVX512BW, Modrm|Vex256|Space0F|VexVVVVSrc|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
+kandn<dq>, 0x<dq:kpfx>42, AVX512BW, Modrm|Vex256|Space0F|VexVVVVSrc|VexW1|<dq:kvsz>|NoSuf|Optimize, { RegMask, RegMask, RegMask }
 kmov<dq>, 0x<dq:kpfx>90, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
 kmov<dq>, 0x<dq:kpfx>90, AVX512BW|APX_F, Modrm|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
 kmov<dq>, 0x<dq:kpfx>91, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
@@ -2623,95 +2712,95 @@ kmov<dq>, 0x<dq:kpfx>91, AVX512BW|APX_F, Modrm|EVex128|Space0F|VexW1|<dq:kvsz>|N
 kmov<dq>, 0xf292, AVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
 kmov<dq>, 0xf292, AVX512BW|APX_F, D|Modrm|EVex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
 knot<dq>, 0x<dq:kpfx>44, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
-kor<dq>, 0x<dq:kpfx>45, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
+kor<dq>, 0x<dq:kpfx>45, AVX512BW, Modrm|Vex256|Space0F|VexVVVVSrc|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kortest<dq>, 0x<dq:kpfx>98, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
 ktest<dq>, 0x<dq:kpfx>99, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
-kxnor<dq>, 0x<dq:kpfx>46, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
-kxor<dq>, 0x<dq:kpfx>47, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf|Optimize, { RegMask, RegMask, RegMask }
-kunpckdq, 0x4B, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|Vsz512|NoSuf, { RegMask, RegMask, RegMask }
-kunpckwd, 0x4B, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW0|Vsz256|NoSuf, { RegMask, RegMask, RegMask }
+kxnor<dq>, 0x<dq:kpfx>46, AVX512BW, Modrm|Vex256|Space0F|VexVVVVSrc|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
+kxor<dq>, 0x<dq:kpfx>47, AVX512BW, Modrm|Vex256|Space0F|VexVVVVSrc|VexW1|<dq:kvsz>|NoSuf|Optimize, { RegMask, RegMask, RegMask }
+kunpckdq, 0x4B, AVX512BW, Modrm|Vex256|Space0F|VexVVVVSrc|VexW1|Vsz512|NoSuf, { RegMask, RegMask, RegMask }
+kunpckwd, 0x4B, AVX512BW, Modrm|Vex256|Space0F|VexVVVVSrc|VexW0|Vsz256|NoSuf, { RegMask, RegMask, RegMask }
 kshiftl<dq>, 0x6633, AVX512BW, Modrm|Vex128|Space0F3A|<dq:vexw>|<dq:kvsz>|NoSuf, { Imm8, RegMask, RegMask }
 kshiftr<dq>, 0x6631, AVX512BW, Modrm|Vex128|Space0F3A|<dq:vexw>|<dq:kvsz>|NoSuf, { Imm8, RegMask, RegMask }
 
-vdbpsadbw, 0x6642, AVX512BW, Modrm|Masking|Space0F3A|VexVVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vdbpsadbw, 0x6642, AVX512BW, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vmovdqu8, 0xF26F, AVX512BW, D|Modrm|Masking|Space0F|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovdqu16, 0xF26F, AVX512BW, D|Modrm|Masking|Space0F|VexW=2|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vpabs<bw>, 0x661c | <bw:opc>, AVX512BW, Modrm|Masking|Space0F38|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpmaxsb, 0x663C, AVX512BW, Modrm|Masking|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpminsb, 0x6638, AVX512BW, Modrm|Masking|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpshufb, 0x6600, AVX512BW, Modrm|Masking|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-
-vpmaddubsw, 0x6604, AVX512BW, Modrm|Masking|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmaxuw, 0x663E, AVX512BW, Modrm|Masking|VexWIG|Space0F38|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpminuw, 0x663A, AVX512BW, Modrm|Masking|VexWIG|Space0F38|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmulhrsw, 0x660B, AVX512BW, Modrm|Masking|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-
-vpackssdw, 0x666B, AVX512BW, Modrm|Masking|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpacksswb, 0x6663, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpackuswb, 0x6667, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpackusdw, 0x662B, AVX512BW, Modrm|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-
-vpadd<bw>, 0x66fc | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpadds<bw>, 0x66ec | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpaddus<bw>, 0x66dc | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpavg<bw>, 0x66e0 | (3 * <bw:opc>), AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmaxub, 0x66DE, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpminub, 0x66DA, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsub<bw>, 0x66f8 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsubs<bw>, 0x66e8 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsubus<bw>, 0x66d8 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpckhbw, 0x6668, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpcklbw, 0x6660, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-
-vpmaxsw, 0x66EE, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpminsw, 0x66EA, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmulhuw, 0x66E4, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmulhw, 0x66E5, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmullw, 0x66D5, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsllw, 0x6671/6, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpsllw, 0x66F1, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsraw, 0x6671/4, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpsraw, 0x66E1, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsrlw, 0x6671/2, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpsrlw, 0x66D1, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpckhwd, 0x6669, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpcklwd, 0x6661, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-
-vpalignr, 0x660F, AVX512BW, Modrm|Masking|Space0F3A|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-
-vpblendm<bw>, 0x6666, AVX512BW, Modrm|Masking|Space0F38|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaxsb, 0x663C, AVX512BW, Modrm|Masking|Space0F38|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpminsb, 0x6638, AVX512BW, Modrm|Masking|Space0F38|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpshufb, 0x6600, AVX512BW, Modrm|Masking|Space0F38|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+
+vpmaddubsw, 0x6604, AVX512BW, Modrm|Masking|Space0F38|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaxuw, 0x663E, AVX512BW, Modrm|Masking|VexWIG|Space0F38|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpminuw, 0x663A, AVX512BW, Modrm|Masking|VexWIG|Space0F38|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmulhrsw, 0x660B, AVX512BW, Modrm|Masking|Space0F38|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+
+vpackssdw, 0x666B, AVX512BW, Modrm|Masking|Space0F|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpacksswb, 0x6663, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpackuswb, 0x6667, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpackusdw, 0x662B, AVX512BW, Modrm|Masking|Space0F38|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+
+vpadd<bw>, 0x66fc | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpadds<bw>, 0x66ec | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpaddus<bw>, 0x66dc | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpavg<bw>, 0x66e0 | (3 * <bw:opc>), AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaxub, 0x66DE, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpminub, 0x66DA, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsub<bw>, 0x66f8 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsubs<bw>, 0x66e8 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsubus<bw>, 0x66d8 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpckhbw, 0x6668, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpcklbw, 0x6660, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+
+vpmaxsw, 0x66EE, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpminsw, 0x66EA, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmulhuw, 0x66E4, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmulhw, 0x66E5, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmullw, 0x66D5, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsllw, 0x6671/6, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsllw, 0x66F1, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsraw, 0x6671/4, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsraw, 0x66E1, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsrlw, 0x6671/2, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsrlw, 0x66D1, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpckhwd, 0x6669, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpcklwd, 0x6661, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+
+vpalignr, 0x660F, AVX512BW, Modrm|Masking|Space0F3A|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+
+vpblendm<bw>, 0x6666, AVX512BW, Modrm|Masking|Space0F38|VexVVVVSrc|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpbroadcast<bw>, 0x6678 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F38|VexW0|Disp8MemShift|NoSuf, { RegXMM|<bw:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpbroadcast<bw>, 0x667a | <bw:opc>, AVX512BW, Modrm|Masking|Space0F38|VexW0|NoSuf, { Reg32, RegXMM|RegYMM|RegZMM }
 
-vpermi2<bw>, 0x6675, <bw:cpubmi>, Modrm|Masking|Space0F38|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpermt2<bw>, 0x667d, <bw:cpubmi>, Modrm|Masking|Space0F38|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vperm<bw>, 0x668d, <bw:cpubmi>, Modrm|Masking|Space0F38|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsllvw, 0x6612, AVX512BW, Modrm|Masking|Space0F38|VexVVVV|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsravw, 0x6611, AVX512BW, Modrm|Masking|Space0F38|VexVVVV|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsrlvw, 0x6610, AVX512BW, Modrm|Masking|Space0F38|VexVVVV|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpermi2<bw>, 0x6675, <bw:cpubmi>, Modrm|Masking|Space0F38|VexVVVVSrc|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpermt2<bw>, 0x667d, <bw:cpubmi>, Modrm|Masking|Space0F38|VexVVVVSrc|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vperm<bw>, 0x668d, <bw:cpubmi>, Modrm|Masking|Space0F38|VexVVVVSrc|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsllvw, 0x6612, AVX512BW, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsravw, 0x6611, AVX512BW, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsrlvw, 0x6610, AVX512BW, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vpcmpeq<bw>, 0x6674 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmpgt<bw>, 0x6664 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmp<bw>, 0x663f, AVX512BW, Modrm|Masking|Space0F3A|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmpu<bw>, 0x663e, AVX512BW, Modrm|Masking|Space0F3A|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmp<irel><bw>, 0x663f/<irel:imm>, AVX512BW, Modrm|Masking|Space0F3A|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmp<irel>u<bw>, 0x663e/<irel:imm>, AVX512BW, Modrm|Masking|Space0F3A|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmpeq<bw>, 0x6674 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmpgt<bw>, 0x6664 | <bw:opc>, AVX512BW, Modrm|Masking|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmp<bw>, 0x663f, AVX512BW, Modrm|Masking|Space0F3A|VexVVVVSrc|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmpu<bw>, 0x663e, AVX512BW, Modrm|Masking|Space0F3A|VexVVVVSrc|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmp<irel><bw>, 0x663f/<irel:imm>, AVX512BW, Modrm|Masking|Space0F3A|VexVVVVSrc|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmp<irel>u<bw>, 0x663e/<irel:imm>, AVX512BW, Modrm|Masking|Space0F3A|VexVVVVSrc|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
-vpslldq, 0x6673/7, AVX512BW, Modrm|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpsrldq, 0x6673/3, AVX512BW, Modrm|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpslldq, 0x6673/7, AVX512BW, Modrm|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsrldq, 0x6673/3, AVX512BW, Modrm|Space0F|VexWIG|VexVVVVSrc|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vpextrw, 0x66C5, AVX512BW, Load|Modrm|EVex128|Space0F|VexWIG|NoSuf, { Imm8, RegXMM, Reg32|Reg64 }
 vpextr<bw>, 0x6614 | <bw:opc>, AVX512BW, RegMem|EVex128|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg32|Reg64 }
 vpextr<bw>, 0x6614 | <bw:opc>, AVX512BW, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift|NoSuf, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
 
-vpinsrw, 0x66C4, AVX512BW, Modrm|EVex128|Space0F|VexWIG|VexVVVV|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
-vpinsrw, 0x66C4, AVX512BW, Modrm|EVex128|Space0F|VexWIG|VexVVVV|Disp8MemShift=1|NoSuf, { Imm8, Word|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpinsrb, 0x6620, AVX512BW, Modrm|EVex128|Space0F3A|VexWIG|VexVVVV|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
-vpinsrb, 0x6620, AVX512BW, Modrm|EVex128|Space0F3A|VexWIG|VexVVVV|NoSuf, { Imm8, Byte|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpinsrw, 0x66C4, AVX512BW, Modrm|EVex128|Space0F|VexWIG|VexVVVVSrc|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
+vpinsrw, 0x66C4, AVX512BW, Modrm|EVex128|Space0F|VexWIG|VexVVVVSrc|Disp8MemShift=1|NoSuf, { Imm8, Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpinsrb, 0x6620, AVX512BW, Modrm|EVex128|Space0F3A|VexWIG|VexVVVVSrc|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
+vpinsrb, 0x6620, AVX512BW, Modrm|EVex128|Space0F3A|VexWIG|VexVVVVSrc|NoSuf, { Imm8, Byte|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vpmaddwd, 0x66F5, AVX512BW, Modrm|Masking|Space0F|VexVVVV|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaddwd, 0x66F5, AVX512BW, Modrm|Masking|Space0F|VexVVVVSrc|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vpmov<bw>2m, 0xf329, AVX512BW, Modrm|EVexDYN|Space0F38|<bw:vexw>|NoSuf, { RegXMM|RegYMM|RegZMM, RegMask }
 vpmovm2<bw>, 0xf328, AVX512BW, Modrm|EVexDYN|Space0F38|<bw:vexw>|NoSuf, { RegMask, RegXMM|RegYMM|RegZMM }
@@ -2735,13 +2824,13 @@ vpmovzxbw, 0x6630, AVX512BW, Modrm|EVex=1|Masking|Space0F38|VexWIG|Disp8MemShift
 vpmovzxbw, 0x6630, AVX512BW|AVX512VL, Modrm|EVex=2|Masking|VexWIG|Space0F38|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
 vpmovzxbw, 0x6630, AVX512BW|AVX512VL, Modrm|EVex=3|Masking|VexWIG|Space0F38|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegYMM }
 
-vpsadbw, 0x66F6, AVX512BW, Modrm|Space0F|VexVVVV|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsadbw, 0x66F6, AVX512BW, Modrm|Space0F|VexVVVVSrc|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vpshufhw, 0xF370, AVX512BW, Modrm|Masking|Space0F|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpshuflw, 0xF270, AVX512BW, Modrm|Masking|Space0F|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vptestm<bw>, 0x6626, AVX512BW, Modrm|Masking|Space0F38|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vptestnm<bw>, 0xf326, AVX512BW, Modrm|Masking|Space0F38|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vptestm<bw>, 0x6626, AVX512BW, Modrm|Masking|Space0F38|VexVVVVSrc|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vptestnm<bw>, 0xf326, AVX512BW, Modrm|Masking|Space0F38|VexVVVVSrc|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
 // AVX512BW instructions end.
 
@@ -2754,13 +2843,13 @@ vptestnm<bw>, 0xf326, AVX512BW, Modrm|Masking|Space0F38|VexVVVV|<bw:vexw>|Disp8S
     x:AVX512VL:EVex128|Disp8MemShift=4::ATTSyntax:RegXMM|Unspecified|BaseIndex, +
     y:AVX512VL:EVex256|Disp8MemShift=5::ATTSyntax:RegYMM|Unspecified|BaseIndex>
 
-kadd<bw>, 0x<bw:kpfx>4A, AVX512DQ, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
+kadd<bw>, 0x<bw:kpfx>4A, AVX512DQ, Modrm|Vex256|Space0F|VexVVVVSrc|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 ktest<bw>, 0x<bw:kpfx>99, AVX512DQ, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
 
-vandnp<sd>, 0x<sd:ppfx>55, AVX512DQ, Modrm|Masking|Space0F|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vandp<sd>, 0x<sd:ppfx>54, AVX512DQ, Modrm|Masking|Space0F|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vorp<sd>, 0x<sd:ppfx>56, AVX512DQ, Modrm|Masking|Space0F|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vxorp<sd>, 0x<sd:ppfx>57, AVX512DQ, Modrm|Masking|Space0F|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vandnp<sd>, 0x<sd:ppfx>55, AVX512DQ, Modrm|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vandp<sd>, 0x<sd:ppfx>54, AVX512DQ, Modrm|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vorp<sd>, 0x<sd:ppfx>56, AVX512DQ, Modrm|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vxorp<sd>, 0x<sd:ppfx>57, AVX512DQ, Modrm|Masking|Space0F|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vbroadcastf32x2, 0x6619, AVX512DQ, Modrm|Masking|Space0F38|VexW0|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM }
 vbroadcastf32x8, 0x661B, AVX512DQ, Modrm|EVex=1|Masking|Space0F38|VexW=1|Disp8MemShift=5|NoSuf, { YMMword|Unspecified|BaseIndex, RegZMM }
@@ -2799,16 +2888,16 @@ vcvtuqq2ps<Exy>, 0xf27a, AVX512DQ|<Exy:vl>, Modrm|<Exy:attr>|Masking|Space0F|Vex
 
 vextractf32x8, 0x661B, AVX512DQ, Modrm|EVex=1|Masking|Space0F3A|VexW=1|Disp8MemShift=5|NoSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 vextracti32x8, 0x663B, AVX512DQ, Modrm|EVex=1|Masking|Space0F3A|VexW=1|Disp8MemShift=5|NoSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
-vinsertf32x8, 0x661A, AVX512DQ, Modrm|EVex512|Masking|Space0F3A|VexVVVV|VexW0|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
-vinserti32x8, 0x663A, AVX512DQ, Modrm|EVex512|Masking|Space0F3A|VexVVVV|VexW0|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
+vinsertf32x8, 0x661A, AVX512DQ, Modrm|EVex512|Masking|Space0F3A|VexVVVVSrc|VexW0|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
+vinserti32x8, 0x663A, AVX512DQ, Modrm|EVex512|Masking|Space0F3A|VexVVVVSrc|VexW0|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
 
 vpextr<dq>, 0x6616, AVX512DQ|<dq:cpu64>, Modrm|EVex128|Space0F3A|<dq:vexw64>|Disp8MemShift|NoSuf, { Imm8, RegXMM, <dq:gpr>|Unspecified|BaseIndex }
-vpinsr<dq>, 0x6622, AVX512DQ|<dq:cpu64>, Modrm|EVex128|Space0F3A|VexVVVV|<dq:vexw64>|Disp8MemShift|NoSuf, { Imm8, <dq:gpr>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpinsr<dq>, 0x6622, AVX512DQ|<dq:cpu64>, Modrm|EVex128|Space0F3A|VexVVVVSrc|<dq:vexw64>|Disp8MemShift|NoSuf, { Imm8, <dq:gpr>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vextractf64x2, 0x6619, AVX512DQ, Modrm|Masking|Space0F3A|VexW=2|Disp8MemShift=4|NoSuf, { Imm8, RegYMM|RegZMM, RegXMM|Unspecified|BaseIndex }
 vextracti64x2, 0x6639, AVX512DQ, Modrm|Masking|Space0F3A|VexW=2|Disp8MemShift=4|NoSuf, { Imm8, RegYMM|RegZMM, RegXMM|Unspecified|BaseIndex }
-vinsertf64x2, 0x6618, AVX512DQ, Modrm|Masking|Space0F3A|VexVVVV|VexW1|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
-vinserti64x2, 0x6638, AVX512DQ, Modrm|Masking|Space0F3A|VexVVVV|VexW1|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vinsertf64x2, 0x6618, AVX512DQ, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW1|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vinserti64x2, 0x6638, AVX512DQ, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW1|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
 vfpclassp<sd>, 0x6666, AVX512DQ, Modrm|Masking|Space0F3A|<sd:vexw>|Broadcast|Disp8ShiftVL|NoSuf|IntelSyntax, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegMask }
 vfpclassp<sd>, 0x6666, AVX512DQ, Modrm|Masking|Space0F3A|<sd:vexw>|Broadcast|Disp8ShiftVL|NoSuf|ATTSyntax, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<sd:elem>|BaseIndex, RegMask }
@@ -2820,13 +2909,13 @@ vfpclasss<sdh>, 0x<sdh:pfx>67, <sdh:cpudq>, Modrm|EVexLIG|Masking|Space0F3A|<sdh
 vpmov<dq>2m, 0xf339, AVX512DQ, Modrm|EVexDYN|Space0F38|<dq:vexw>|NoSuf, { RegXMM|RegYMM|RegZMM, RegMask }
 vpmovm2<dq>, 0xf338, AVX512DQ, Modrm|EVexDYN|Space0F38|<dq:vexw>|NoSuf, { RegMask, RegXMM|RegYMM|RegZMM }
 
-vpmullq, 0x6640, AVX512DQ, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmullq, 0x6640, AVX512DQ, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vrangep<sd>, 0x6650, AVX512DQ, Modrm|Masking|Space0F3A|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vranges<sd>, 0x6651, AVX512DQ, Modrm|EVexLIG|Masking|Space0F3A|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vrangep<sd>, 0x6650, AVX512DQ, Modrm|Masking|Space0F3A|VexVVVVSrc|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vranges<sd>, 0x6651, AVX512DQ, Modrm|EVexLIG|Masking|Space0F3A|VexVVVVSrc|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vreducep<sdh>, 0x<sdh:pfx>56, <sdh:cpudq>, Modrm|Masking|Space0F3A|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vreduces<sdh>, 0x<sdh:pfx>57, <sdh:cpudq>, Modrm|EVexLIG|Masking|Space0F3A|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
+vreduces<sdh>, 0x<sdh:pfx>57, <sdh:cpudq>, Modrm|EVexLIG|Masking|Space0F3A|VexVVVVSrc|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 // AVX512DQ instructions end.
 
@@ -2838,37 +2927,37 @@ clwb, 0x660fae/6, CLWB, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 
 // AVX512IFMA instructions
 
-vpmadd52huq, 0x66B5, AVX512IFMA, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmadd52luq, 0x66B4, AVX512IFMA, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmadd52huq, 0x66B5, AVX512IFMA, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmadd52luq, 0x66B4, AVX512IFMA, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512IFMA instructions end
 
 // AVX-IFMA instructions.
 
-vpmadd52huq, 0x66B5, AVX_IFMA, Modrm|Vex|Space0F38|VexVVVV|VexW1|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpmadd52luq, 0x66B4, AVX_IFMA, Modrm|Vex|Space0F38|VexVVVV|VexW1|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmadd52huq, 0x66B5, AVX_IFMA, Modrm|Vex|Space0F38|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpmadd52luq, 0x66B4, AVX_IFMA, Modrm|Vex|Space0F38|VexVVVVSrc|VexW1|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // AVX-IFMA instructions end.
 
 // AVX512VBMI instructions
 
-vpmultishiftqb, 0x6683, AVX512VBMI, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmultishiftqb, 0x6683, AVX512VBMI, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512VBMI instructions end
 
 // AVX512_4FMAPS instructions
 
-v4fmaddps, 0xf29a, AVX512_4FMAPS, Modrm|EVex=1|Masking|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
-v4fnmaddps, 0xf2aa, AVX512_4FMAPS, Modrm|EVex=1|Masking|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
-v4fmaddss, 0xf29b, AVX512_4FMAPS, Modrm|EVex=4|Masking|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegXMM, RegXMM }
-v4fnmaddss, 0xf2ab, AVX512_4FMAPS, Modrm|EVex=4|Masking|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegXMM, RegXMM }
+v4fmaddps, 0xf29a, AVX512_4FMAPS, Modrm|EVex=1|Masking|Space0F38|VexVVVVSrc|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
+v4fnmaddps, 0xf2aa, AVX512_4FMAPS, Modrm|EVex=1|Masking|Space0F38|VexVVVVSrc|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
+v4fmaddss, 0xf29b, AVX512_4FMAPS, Modrm|EVex=4|Masking|Space0F38|VexVVVVSrc|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegXMM, RegXMM }
+v4fnmaddss, 0xf2ab, AVX512_4FMAPS, Modrm|EVex=4|Masking|Space0F38|VexVVVVSrc|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 // AVX512_4FMAPS instructions end
 
 // AVX512_4VNNIW instructions
 
-vp4dpwssd, 0xf252, AVX512_4VNNIW, Modrm|EVex=1|Masking|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
-vp4dpwssds, 0xf253, AVX512_4VNNIW, Modrm|EVex=1|Masking|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
+vp4dpwssd, 0xf252, AVX512_4VNNIW, Modrm|EVex=1|Masking|Space0F38|VexVVVVSrc|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
+vp4dpwssds, 0xf253, AVX512_4VNNIW, Modrm|EVex=1|Masking|Space0F38|VexVVVVSrc|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
 
 // AVX512_4VNNIW instructions end
 
@@ -2886,59 +2975,59 @@ vpcompressw, 0x6663, AVX512_VBMI2, Modrm|Masking|Space0F38|VexW=2|Disp8MemShift=
 vpexpandb, 0x6662, AVX512_VBMI2, Modrm|Masking|Space0F38|VexW=1|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpexpandw, 0x6662, AVX512_VBMI2, Modrm|Masking|Space0F38|VexW=2|Disp8MemShift=1|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vpshldv<dq>, 0x6671, AVX512_VBMI2, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpshldvw, 0x6670, AVX512_VBMI2, Modrm|Masking|Space0F38|VexVVVV|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpshldv<dq>, 0x6671, AVX512_VBMI2, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpshldvw, 0x6670, AVX512_VBMI2, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vpshrdv<dq>, 0x6673, AVX512_VBMI2, Modrm|Masking|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpshrdvw, 0x6672, AVX512_VBMI2, Modrm|Masking|Space0F38|VexVVVV|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpshrdv<dq>, 0x6673, AVX512_VBMI2, Modrm|Masking|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpshrdvw, 0x6672, AVX512_VBMI2, Modrm|Masking|Space0F38|VexVVVVSrc|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vpshld<dq>, 0x6671, AVX512_VBMI2, Modrm|Masking|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpshldw, 0x6670, AVX512_VBMI2, Modrm|Masking|Space0F3A|VexVVVV|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpshld<dq>, 0x6671, AVX512_VBMI2, Modrm|Masking|Space0F3A|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpshldw, 0x6670, AVX512_VBMI2, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vpshrd<dq>, 0x6673, AVX512_VBMI2, Modrm|Masking|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpshrdw, 0x6672, AVX512_VBMI2, Modrm|Masking|Space0F3A|VexVVVV|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpshrd<dq>, 0x6673, AVX512_VBMI2, Modrm|Masking|Space0F3A|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpshrdw, 0x6672, AVX512_VBMI2, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512_VBMI2 instructions end
 
 // AVX512_VNNI instructions
 
-vpdpbusd, 0x6650, AVX512_VNNI, Modrm|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpdpwssd, 0x6652, AVX512_VNNI, Modrm|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpdpbusd, 0x6650, AVX512_VNNI, Modrm|Masking|Space0F38|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpdpwssd, 0x6652, AVX512_VNNI, Modrm|Masking|Space0F38|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vpdpbusds, 0x6651, AVX512_VNNI, Modrm|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpdpwssds, 0x6653, AVX512_VNNI, Modrm|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpdpbusds, 0x6651, AVX512_VNNI, Modrm|Masking|Space0F38|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpdpwssds, 0x6653, AVX512_VNNI, Modrm|Masking|Space0F38|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512_VNNI instructions end
 
 // AVX_VNNI instructions
 
-vpdpbusd, 0x6650, AVX_VNNI, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpwssd, 0x6652, AVX_VNNI, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpbusd, 0x6650, AVX_VNNI, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwssd, 0x6652, AVX_VNNI, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 
-vpdpbusds, 0x6651, AVX_VNNI, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpwssds, 0x6653, AVX_VNNI, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpbusds, 0x6651, AVX_VNNI, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwssds, 0x6653, AVX_VNNI, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // AVX_VNNI instructions end
 
 // AVX-VNNI-INT8 instructions.
 
-vpdpbuud, 0x50, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpbuuds, 0x51, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpbssd, 0xf250, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpbssds, 0xf251, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpbsud, 0xf350, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpbsuds, 0xf351, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpbuud, 0x50, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpbuuds, 0x51, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpbssd, 0xf250, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpbssds, 0xf251, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpbsud, 0xf350, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpbsuds, 0xf351, AVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // AVX-VNNI-INT8 instructions end.
 
 // AVX-VNNI-INT16 instructions.
 
-vpdpwuud, 0xd2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpwuuds, 0xd3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpwusd, 0x66d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpwusds, 0x66d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpwsud, 0xf3d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
-vpdpwsuds, 0xf3d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwuud, 0xd2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwuuds, 0xd3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwusd, 0x66d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwusds, 0x66d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwsud, 0xf3d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwsuds, 0xf3d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVVSrc|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // AVX-VNNI-INT16 instructions end.
 
@@ -2946,14 +3035,14 @@ vpdpwsuds, 0xf3d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperan
 
 vpopcnt<bw>, 0x6654, AVX512_BITALG, Modrm|Masking|Space0F38|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vpshufbitqmb, 0x668f, AVX512_BITALG, Modrm|Masking|Space0F38|VexVVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpshufbitqmb, 0x668f, AVX512_BITALG, Modrm|Masking|Space0F38|VexVVVVSrc|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
 // AVX512_BITALG instructions end
 
 // AVX512 + GFNI instructions
 
-vgf2p8affineinvqb, 0x66cf, GFNI|AVX512F, Modrm|Masking|Space0F3A|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vgf2p8affineqb, 0x66ce, GFNI|AVX512F, Modrm|Masking|Space0F3A|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vgf2p8affineinvqb, 0x66cf, GFNI|AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vgf2p8affineqb, 0x66ce, GFNI|AVX512F, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512 + GFNI instructions end
 
@@ -3082,11 +3171,11 @@ movdir64b, 0x66f8, MOVDIR64B|APX_F, Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4
 
 // AVX512_BF16 instructions.
 
-vcvtne2ps2bf16, 0xf272, AVX512_BF16, Modrm|Space0F38|VexVVVV|Masking|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vcvtne2ps2bf16, 0xf272, AVX512_BF16, Modrm|Space0F38|VexVVVVSrc|Masking|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vcvtneps2bf16<Exy>, 0xf372, AVX512_BF16|<Exy:vl>, Modrm|Space0F38|<Exy:attr>|Masking|VexW0|Broadcast|NoSuf, { <Exy:src>|Dword, <Exy:dst> }
 
-vdpbf16ps, 0xf352, AVX512_BF16, Modrm|Space0F38|VexVVVV|Masking|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vdpbf16ps, 0xf352, AVX512_BF16, Modrm|Space0F38|VexVVVVSrc|Masking|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512_BF16 instructions end.
 
@@ -3113,7 +3202,7 @@ enqcmds, 0xf3f8, ENQCMD|APX_F, Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { U
 
 // VP2INTERSECT instructions.
 
-vp2intersect<dq>, 0xf268, AVX512_VP2INTERSECT, Modrm|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vp2intersect<dq>, 0xf268, AVX512_VP2INTERSECT, Modrm|Space0F38|VexVVVVSrc|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
 // VP2INTERSECT instructions end.
 
@@ -3171,15 +3260,15 @@ xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf, {}
 ldtilecfg, 0x49/0, AMX_TILE|APX_F, Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
 sttilecfg, 0x6649/0, AMX_TILE|APX_F, Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
 
-tcmmimfp16ps, 0x666c, AMX_COMPLEX|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
-tcmmrlfp16ps, 0x6c, AMX_COMPLEX|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tcmmimfp16ps, 0x666c, AMX_COMPLEX|x64, Modrm|Vex128|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tcmmrlfp16ps, 0x6c, AMX_COMPLEX|x64, Modrm|Vex128|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 
-tdpbf16ps, 0xf35c, AMX_BF16|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
-tdpfp16ps, 0xf25c, AMX_FP16|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
-tdpbssd, 0xf25e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
-tdpbuud, 0x5e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
-tdpbusd, 0x665e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
-tdpbsud, 0xf35e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpbf16ps, 0xf35c, AMX_BF16|x64, Modrm|Vex128|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpfp16ps, 0xf25c, AMX_FP16|x64, Modrm|Vex128|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpbssd, 0xf25e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpbuud, 0x5e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpbusd, 0x665e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpbsud, 0xf35e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVVSrc|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 
 tileloadd, 0xf24b, AMX_TILE|APX_F, Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
 tileloaddt1, 0x664b, AMX_TILE|APX_F, Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
@@ -3244,23 +3333,23 @@ hreset, 0xf30f3af0c0, HRESET, NoSuf, { Imm8 }
 
 // FP16 (HFNI) instructions.
 
-vfcmaddcph, 0xf256, AVX512_FP16, Modrm|VexVVVV|Masking|EVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfcmaddcsh, 0xf257, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVV|VexW0|Disp8MemShift=2|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfcmaddcph, 0xf256, AVX512_FP16, Modrm|VexVVVVSrc|Masking|EVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfcmaddcsh, 0xf257, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVVSrc|VexW0|Disp8MemShift=2|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vfmaddcph, 0xf356, AVX512_FP16, Modrm|VexVVVV|Masking|EVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfmaddcsh, 0xf357, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVV|VexW0|Disp8MemShift=2|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmaddcph, 0xf356, AVX512_FP16, Modrm|VexVVVVSrc|Masking|EVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmaddcsh, 0xf357, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVVSrc|VexW0|Disp8MemShift=2|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vfcmulcph, 0xf2d6, AVX512_FP16, Modrm|VexVVVV|Masking|EVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfcmulcsh, 0xf2d7, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVV|VexW0|Disp8MemShift=2|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfcmulcph, 0xf2d6, AVX512_FP16, Modrm|VexVVVVSrc|Masking|EVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfcmulcsh, 0xf2d7, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVVSrc|VexW0|Disp8MemShift=2|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vfmulcph, 0xf3d6, AVX512_FP16, Modrm|VexVVVV|Masking|EVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vfmulcsh, 0xf3d7, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVV|VexW0|Disp8MemShift=2|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmulcph, 0xf3d6, AVX512_FP16, Modrm|VexVVVVSrc|Masking|EVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmulcsh, 0xf3d7, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVVSrc|VexW0|Disp8MemShift=2|DistinctDest|NoSuf|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcmp<frel>ph, 0xc2/0x<frel:imm>, AVX512_FP16, Modrm|Masking|Space0F3A|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vcmpph, 0xc2, AVX512_FP16, Modrm|Masking|Space0F3A|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vcmp<frel>ph, 0xc2/0x<frel:imm>, AVX512_FP16, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vcmpph, 0xc2, AVX512_FP16, Modrm|Masking|Space0F3A|VexVVVVSrc|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
-vcmp<frel>sh, 0xf3c2/0x<frel:imm>, AVX512_FP16, Modrm|EVexLIG|Masking|Space0F3A|VexVVVV|VexW0|Disp8MemShift=1|NoSuf|ImmExt|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegMask }
-vcmpsh, 0xf3c2, AVX512_FP16, Modrm|EVexLIG|Masking|Space0F3A|VexVVVV|VexW0|Disp8MemShift=1|NoSuf|SAE, { Imm8, RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegMask }
+vcmp<frel>sh, 0xf3c2/0x<frel:imm>, AVX512_FP16, Modrm|EVexLIG|Masking|Space0F3A|VexVVVVSrc|VexW0|Disp8MemShift=1|NoSuf|ImmExt|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegMask }
+vcmpsh, 0xf3c2, AVX512_FP16, Modrm|EVexLIG|Masking|Space0F3A|VexVVVVSrc|VexW0|Disp8MemShift=1|NoSuf|SAE, { Imm8, RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegMask }
 
 vcvtdq2ph<Exy>, 0x5b, AVX512_FP16|<Exy:vl>, Modrm|<Exy:attr>|Masking|EVexMap5|VexW0|Broadcast|NoSuf|<Exy:sr>, { <Exy:src>|Dword, <Exy:dst> }
 vcvtudq2ph<Exy>, 0xf27a, AVX512_FP16|<Exy:vl>, Modrm|<Exy:attr>|Masking|EVexMap5|VexW0|Broadcast|NoSuf|<Exy:sr>, { <Exy:src>|Dword, <Exy:dst> }
@@ -3298,17 +3387,17 @@ vcvtph2pd, 0x5a, AVX512_FP16, Modrm|EVex512|Masking|EVexMap5|VexW0|Broadcast|Dis
 vcvtph2w, 0x667d, AVX512_FP16, Modrm|Masking|EVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vcvtph2uw, 0x7d, AVX512_FP16, Modrm|Masking|EVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vcvtsd2sh, 0xf25a, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap5|VexVVVV|VexW1|Disp8MemShift=3|NoSuf|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtss2sh, 0x1d, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap5|VexVVVV|VexW0|Disp8MemShift=2|NoSuf|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsd2sh, 0xf25a, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap5|VexVVVVSrc|VexW1|Disp8MemShift=3|NoSuf|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtss2sh, 0x1d, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap5|VexVVVVSrc|VexW0|Disp8MemShift=2|NoSuf|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtsi2sh, 0xf32a, AVX512_FP16, Modrm|EVexLIG|EVexMap5|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtsi2sh, 0xf32a, AVX512_FP16, Modrm|EVexLIG|EVexMap5|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2sh, 0xf32a, AVX512_FP16, Modrm|EVexLIG|EVexMap5|VexVVVVSrc|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2sh, 0xf32a, AVX512_FP16, Modrm|EVexLIG|EVexMap5|VexVVVVSrc|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtusi2sh, 0xf37b, AVX512_FP16, Modrm|EVexLIG|EVexMap5|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtusi2sh, 0xf37b, AVX512_FP16, Modrm|EVexLIG|EVexMap5|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtusi2sh, 0xf37b, AVX512_FP16, Modrm|EVexLIG|EVexMap5|VexVVVVSrc|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtusi2sh, 0xf37b, AVX512_FP16, Modrm|EVexLIG|EVexMap5|VexVVVVSrc|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|StaticRounding|SAE|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtsh2sd, 0xf35a, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap5|VexVVVV|VexW0|Disp8MemShift=1|NoSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtsh2ss, 0x13, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVV|VexW0|Disp8MemShift=1|NoSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsh2sd, 0xf35a, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap5|VexVVVVSrc|VexW0|Disp8MemShift=1|NoSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsh2ss, 0x13, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVVSrc|VexW0|Disp8MemShift=1|NoSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vcvtsh2si, 0xf32d, AVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|NoSuf|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
 
@@ -3344,11 +3433,11 @@ vmovw, 0x667e, AVX512_FP16, D|RegMem|EVex128|VexWIG|EVexMap5|NoSuf, { RegXMM, Re
 
 vrcpph, 0x664c, AVX512_FP16, Modrm|Masking|EVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vrcpsh, 0x664d, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVV|VexW0|Disp8MemShift=1|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vrcpsh, 0x664d, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVVSrc|VexW0|Disp8MemShift=1|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vrsqrtph, 0x664e, AVX512_FP16, Modrm|Masking|EVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vrsqrtsh, 0x664f, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVV|VexW0|Disp8MemShift=1|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vrsqrtsh, 0x664f, AVX512_FP16, Modrm|EVexLIG|Masking|EVexMap6|VexVVVVSrc|VexW0|Disp8MemShift=1|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 // FP16 (HFNI) instructions end.
 
@@ -3361,8 +3450,8 @@ prefetchit1, 0xf18/6, PREFETCHI|x64, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex
 
 // CMPCCXADD instructions.
 
-cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD|x64, Modrm|Vex|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
-cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD|x64|APX_F, Modrm|EVex128|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD|x64, Modrm|Vex|Space0F38|VexVVVVSrc|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD|x64|APX_F, Modrm|EVex128|Space0F38|VexVVVVSrc|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // CMPCCXADD instructions end.
 
-- 
2.25.1



* [PATCH 6/8] Support APX Push2/Pop2
  2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
                   ` (4 preceding siblings ...)
  2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
@ 2023-11-02 11:29 ` Cui, Lili
  2023-11-08 11:44   ` Jan Beulich
  2023-11-09  9:57   ` Jan Beulich
  2023-11-02 11:29 ` [PATCH 7/8] Support APX NDD optimized encoding Cui, Lili
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-02 11:29 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, ccoutant, Mo, Zewei

From: "Mo, Zewei" <zewei.mo@intel.com>

PPX functionality for PUSH/POP is not implemented in this patch
and will be implemented separately.

gas/ChangeLog:

	* config/tc-i386.c (enum i386_error): New
	unsupported_rsp_register.
	(md_assemble): Add handler for unsupported_rsp_register.
	(check_VecOperands): Add invalid check for push2/pop2.
	* testsuite/gas/i386/i386.exp: Add apx-push2pop2 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2.d: New test.
	* testsuite/gas/i386/x86-64-apx-push2pop2.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2-inval.s: Ditto.
	* testsuite/gas/i386/apx-push2pop2-inval.s: Ditto.
	* testsuite/gas/i386/apx-push2pop2-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Added bad
	testcases for POP.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-mod.h: Add MOD_EVEX_MAP4_8F_R_0
	and MOD_EVEX_MAP4_FF_R_6.
	* i386-dis-evex-prefix.h: Add PREFIX_EVEX_MAP4_8F_R_0_M_1
	and PREFIX_EVEX_MAP4_FF_R_6_M_1.
	* i386-dis-evex-reg.h: Add REG_EVEX_MAP4_8F.
	* i386-dis-evex-w.h: Add EVEX_W_MAP4_8F_R_0_M_1_P_0
	and EVEX_W_MAP4_FF_R_6_M_1_P_0.
	* i386-dis-evex.h: Add REG_EVEX_MAP4_8F.
	* i386-dis.c (PUSH2_POP2_Fixup): Add special handling for PUSH2/POP2.
	(get_valid_dis386): Add handler for vector length and address_mode for
	APX-Push2/Pop2 insn.
	(OP_VEX): Add handler of 64-bit vvvv register for APX-Push2/Pop2 insn.
	* i386-gen.c: Add Push2Pop2 bitfield.
	* i386-opc.h: Add Push2Pop2.
	* i386-opc.tbl: Add APX Push2/Pop2 instructions.
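
For orientation, the register-form decoding that the disassembler changes
above deal with can be sketched as standalone C.  This is only a simplified
illustration, not the binutils code: the struct and function names are
hypothetical, and the bit positions are an assumption read off the byte
sequences in the x86-64-apx-push2pop2 tests of this patch.  It mirrors the
checks the patch describes (map 4, vector length bits zero, EVEX.ND set, no
RSP, POP2 not naming the same register twice).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a binutils structure.  */
struct push2pop2_insn
{
  bool is_pop;        /* true for pop2/pop2p, false for push2/push2p */
  bool ppx;           /* EVEX.W set: the "p" variant */
  unsigned vvvv_reg;  /* register encoded in EVEX.vvvv and EVEX.V' */
  unsigned rm_reg;    /* register encoded in ModRM.rm plus the B bits */
};

/* Decode one 6-byte register-form PUSH2/POP2 encoding.  Returns false
   for anything this sketch does not accept.  */
static bool
decode_push2pop2 (const uint8_t b[6], struct push2pop2_insn *out)
{
  assert (b != NULL && out != NULL);

  uint8_t p0 = b[1], p1 = b[2], p2 = b[3], opcode = b[4], modrm = b[5];

  if (b[0] != 0x62 || (p0 & 0x7) != 4)  /* EVEX prefix, opcode map 4 */
    return false;
  if ((p1 & 0x3) != 0)                  /* no embedded 66/F3/F2 prefix */
    return false;
  if ((p2 & 0x60) != 0)                 /* vector length bits must be 0 */
    return false;
  if ((p2 & 0x10) == 0)                 /* EVEX.ND must be 1 */
    return false;
  if ((modrm >> 6) != 3)                /* register form only */
    return false;

  unsigned reg_field = (modrm >> 3) & 0x7;
  if (opcode == 0xff && reg_field == 6)
    out->is_pop = false;
  else if (opcode == 0x8f && reg_field == 0)
    out->is_pop = true;
  else
    return false;

  out->ppx = (p1 & 0x80) != 0;

  /* vvvv and V' are stored inverted.  */
  out->vvvv_reg = (((p1 >> 3) & 0xf) ^ 0xf) | ((p2 & 0x08) ? 0 : 16);

  /* The low B bit is stored inverted, the high B bit as-is.  */
  out->rm_reg = (modrm & 0x7)
		| ((p0 & 0x20) ? 0 : 8)
		| ((p0 & 0x08) ? 16 : 0);

  /* Push2/Pop2 cannot use RSP; Pop2 cannot name one register twice.  */
  if (out->vvvv_reg == 4 || out->rm_reg == 4)
    return false;
  if (out->is_pop && out->vvvv_reg == out->rm_reg)
    return false;
  return true;
}
```

Feeding it the bytes 62 f4 7c 18 ff f3 from the new test yields vvvv register
0 (%rax) and rm register 3 (%rbx), which corresponds to the expected
"push2 %rbx,%rax" line in the AT&T dump.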
---
 gas/config/tc-i386.c                          | 22 ++++++++++
 gas/testsuite/gas/i386/apx-push2pop2-inval.l  |  5 +++
 gas/testsuite/gas/i386/apx-push2pop2-inval.s  |  9 ++++
 gas/testsuite/gas/i386/i386.exp               |  1 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |  6 ++-
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |  6 +++
 .../gas/i386/x86-64-apx-push2pop2-intel.d     | 42 +++++++++++++++++++
 .../gas/i386/x86-64-apx-push2pop2-inval.l     | 11 +++++
 .../gas/i386/x86-64-apx-push2pop2-inval.s     | 15 +++++++
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d | 42 +++++++++++++++++++
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s | 39 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |  3 ++
 opcodes/i386-dis-evex-mod.h                   | 10 +++++
 opcodes/i386-dis-evex-prefix.h                |  8 ++++
 opcodes/i386-dis-evex-reg.h                   |  9 ++++
 opcodes/i386-dis-evex-w.h                     | 10 +++++
 opcodes/i386-dis-evex.h                       |  2 +-
 opcodes/i386-dis.c                            | 40 +++++++++++++++++-
 opcodes/i386-gen.c                            |  1 +
 opcodes/i386-opc.h                            |  4 ++
 opcodes/i386-opc.tbl                          |  7 ++++
 21 files changed, 287 insertions(+), 5 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 5b925505435..7a86aff1828 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -256,6 +256,7 @@ enum i386_error
     mask_not_on_destination,
     no_default_mask,
     unsupported_rc_sae,
+    unsupported_rsp_register,
     invalid_register_operand,
     internal_error,
   };
@@ -5476,6 +5477,9 @@ md_assemble (char *line)
 	case unsupported_rc_sae:
 	  err_msg = _("unsupported static rounding/sae");
 	  break;
+	case unsupported_rsp_register:
+	  err_msg = _("unsupported rsp register");
+	  break;
 	case invalid_register_operand:
 	  err_msg = _("invalid register operand");
 	  break;
@@ -6854,6 +6858,24 @@ check_VecOperands (const insn_template *t)
 	}
     }
 
+  /* Push2/Pop2 cannot use RSP and Pop2 cannot pop the same register twice.  */
+  if (t->opcode_modifier.push2pop2)
+    {
+      unsigned int reg1 = register_number (i.op[0].regs);
+      unsigned int reg2 = register_number (i.op[1].regs);
+
+      if (reg1 == 0x4 || reg2 == 0x4)
+	{
+	  i.error = unsupported_rsp_register;
+	  return 1;
+	}
+      if (t->base_opcode == 0x8f && reg1 == reg2)
+	{
+	  i.error = invalid_dest_and_src_register_set;
+	  return 1;
+	}
+    }
+
   /* Check if broadcast is supported by the instruction and is applied
      to the memory operand.  */
   if (i.broadcast.type || i.broadcast.bytes)
diff --git a/gas/testsuite/gas/i386/apx-push2pop2-inval.l b/gas/testsuite/gas/i386/apx-push2pop2-inval.l
new file mode 100644
index 00000000000..a55a71520c8
--- /dev/null
+++ b/gas/testsuite/gas/i386/apx-push2pop2-inval.l
@@ -0,0 +1,5 @@
+.* Assembler messages:
+.*:6: Error: `push2' is only supported in 64-bit mode
+.*:7: Error: `push2p' is only supported in 64-bit mode
+.*:8: Error: `pop2' is only supported in 64-bit mode
+.*:9: Error: `pop2p' is only supported in 64-bit mode
diff --git a/gas/testsuite/gas/i386/apx-push2pop2-inval.s b/gas/testsuite/gas/i386/apx-push2pop2-inval.s
new file mode 100644
index 00000000000..77166327ed1
--- /dev/null
+++ b/gas/testsuite/gas/i386/apx-push2pop2-inval.s
@@ -0,0 +1,9 @@
+# Check 32bit APX-PUSH2/POP2 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	push2 %rax, %rbx
+	push2p %rax, %rbx
+	pop2 %rax, %rbx
+	pop2p %rax, %rbx
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index ee74bcd4615..75e1a4ca369 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -509,6 +509,7 @@ if [gas_32_check] then {
     run_dump_test "sm4"
     run_dump_test "sm4-intel"
     run_list_test "pbndkb-inval"
+    run_list_test "apx-push2pop2-inval"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
index 9060b697c0d..fe652977a54 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
@@ -31,5 +31,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:[ 	]+00 ff[ 	]+add    %bh,%bh
 [ 	]*[a-f0-9]+:[ 	]+62 f4 ec[ 	]+\(bad\)
 [ 	]*[a-f0-9]+:[ 	]+08 ff[ 	]+or     %bh,%bh
-[ 	]*[a-f0-9]+:[ 	]+c0[ 	]+.byte 0xc0
-#pass
+[ 	]*[a-f0-9]+:[ 	]+c0 ff ff[ 	]+sar    \$0xff,%bh
+[ 	]*[a-f0-9]+:[ 	]+62 f4 64[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+08 8f c0 ff ff ff[ 	]+or     %cl,-0x40\(%rdi\)
+[ 	]*[a-f0-9]+:[ 	]+62 f4 7c 18 8f c0[ 	]+pop2   %rax,\(bad\)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
index d4f4cb72e6e..e6f61a229f0 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
@@ -30,3 +30,9 @@ _start:
         .byte 0xff
         #{evex} inc %rax EVEX.vvvv' > 0 (illegal value).
         .byte 0x62, 0xf4, 0xec, 0x08, 0xff, 0xc0
+        .byte 0xff, 0xff
+	# pop2 %rax, %rbx set EVEX.ND=0.
+        .byte 0x62,0xf4,0x64,0x08,0x8f,0xc0
+        .byte 0xff, 0xff, 0xff
+	# pop2 %rax, %rsp set EVEX.VVVV=0xf.
+        .byte 0x62,0xf4,0x7c,0x18,0x8f,0xc0
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
new file mode 100644
index 00000000000..46b21219582
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
@@ -0,0 +1,42 @@
+#as: --64
+#objdump: -dw -Mintel
+#name: i386 APX-push2pop2 insns (Intel disassembly)
+#source: x86-64-apx-push2pop2.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+r31,r24
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+r31,r24
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+r31,r24
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+r31,r24
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
new file mode 100644
index 00000000000..5eea811a047
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
@@ -0,0 +1,11 @@
+.* Assembler messages:
+.*:6: Error: operand size mismatch for `push2'
+.*:7: Error: unsupported rsp register for `push2'
+.*:8: Error: unsupported rsp register for `push2'
+.*:9: Error: operand size mismatch for `push2p'
+.*:10: Error: unsupported rsp register for `push2p'
+.*:11: Error: unsupported rsp register for `pop2'
+.*:12: Error: unsupported rsp register for `pop2'
+.*:13: Error: destination and source registers must be distinct for `pop2'
+.*:14: Error: unsupported rsp register for `pop2p'
+.*:15: Error: destination and source registers must be distinct for `pop2p'
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
new file mode 100644
index 00000000000..c0cd9c3ce89
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
@@ -0,0 +1,15 @@
+# Check illegal APX-Push2Pop2 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	push2  %eax, %ebx
+	push2  %rsp, %r17
+	push2  %r17, %rsp
+	push2p %eax, %ebx
+	push2p %rsp, %r17
+	pop2   %rax, %rsp
+	pop2   %rsp, %rax
+	pop2   %r12, %r12
+	pop2p  %rax, %rsp
+	pop2p  %r12, %r12
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2.d b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
new file mode 100644
index 00000000000..54f22a7f94e
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
@@ -0,0 +1,42 @@
+#as: --64
+#objdump: -dw
+#name: x86_64 APX-push2pop2 insns
+#source: x86-64-apx-push2pop2.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+%r24,%r31
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+%r24,%r31
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+%r24,%r31
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+%r24,%r31
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
new file mode 100644
index 00000000000..4cfc0a2185f
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
@@ -0,0 +1,39 @@
+# Check 64bit APX-Push2Pop2 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	push2 %rbx, %rax
+	push2 %r17, %r8
+	push2 %r9, %r31
+	push2 %r31, %r24
+	push2p %rbx, %rax
+	push2p %r17, %r8
+	push2p %r9, %r31
+	push2p %r31, %r24
+	pop2 %rax, %rbx
+	pop2 %r8, %r17
+	pop2 %r31, %r9
+	pop2 %r24, %r31
+	pop2p %rax, %rbx
+	pop2p %r8, %r17
+	pop2p %r31, %r9
+	pop2p %r24, %r31
+
+.intel_syntax noprefix
+	push2 rax, rbx
+	push2 r8, r17
+	push2 r31, r9
+	push2 r24, r31
+	push2p rax, rbx
+	push2p r8, r17
+	push2p r31, r9
+	push2p r24, r31
+	pop2 rbx, rax
+	pop2 r17, r8
+	pop2 r9, r31
+	pop2 r31, r24
+	pop2p rbx, rax
+	pop2p r17, r8
+	pop2p r9, r31
+	pop2p r31, r24
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 07cb716d2a5..668b366a212 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -342,6 +342,9 @@ run_dump_test "x86-64-avx512dq-rcigrd-intel"
 run_dump_test "x86-64-avx512dq-rcigrd"
 run_dump_test "x86-64-avx512dq-rcigrne-intel"
 run_dump_test "x86-64-avx512dq-rcigrne"
+run_dump_test "x86-64-apx-push2pop2"
+run_dump_test "x86-64-apx-push2pop2-intel"
+run_list_test "x86-64-apx-push2pop2-inval"
 run_dump_test "x86-64-avx512dq-rcigru-intel"
 run_dump_test "x86-64-avx512dq-rcigru"
 run_dump_test "x86-64-avx512dq-rcigrz-intel"
diff --git a/opcodes/i386-dis-evex-mod.h b/opcodes/i386-dis-evex-mod.h
index a60c19add3c..515039b431c 100644
--- a/opcodes/i386-dis-evex-mod.h
+++ b/opcodes/i386-dis-evex-mod.h
@@ -1,4 +1,9 @@
 /* Nothing at present.  */
+  /* MOD_EVEX_MAP4_8F_R_0 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_8F_R_0_M_1) },
+  },
   /* MOD_EVEX_MAP4_DA_PREFIX_1 */
   {
     { Bad_Opcode },
@@ -41,3 +46,8 @@
   {
     { "movdiri",	{ Edq, Gdq }, 0 },
   },
+  /* MOD_EVEX_MAP4_FF_R_6 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_FF_R_6_M_1) },
+  },
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index 09d8c10bdfd..f8cc5b492c0 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -338,6 +338,10 @@
     { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
     { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
   },
+  /* PREFIX_EVEX_MAP4_8F_R_0_M_1 */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP4_8F_R_0_M_1_P_0) }
+  },
   /* PREFIX_EVEX_MAP4_D8 */
   {
     { "sha1nexte", { XM, EXxmm }, 0 },
@@ -403,6 +407,10 @@
     { "aand",	{ Mdq, Gdq }, 0 },
     { "aor",	{ Mdq, Gdq }, 0 },
   },
+  /* PREFIX_EVEX_MAP4_FF_R_6_M_1 */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP4_FF_R_6_M_1_P_0) },
+  },
   /* PREFIX_EVEX_MAP5_10 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index b75558c40ca..6bc0c26116f 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -86,6 +86,10 @@
     { "subQ",	{ VexGv, Ev, sIb }, 0 },
     { "xorQ",	{ VexGv, Ev, sIb }, 0 },
   },
+  /* REG_EVEX_MAP4_8F */
+  {
+    { MOD_TABLE (MOD_EVEX_MAP4_8F_R_0) },
+  },
   /* REG_EVEX_MAP4_D8_PREFIX_1 */
   {
     { "aesencwide128kl",	{ M }, 0 },
@@ -116,4 +120,9 @@
   {
     { "incQ",   { VexGv ,Ev }, 0 },
     { "decQ",   { VexGv ,Ev }, 0 },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_EVEX_MAP4_FF_R_6) },
   },
diff --git a/opcodes/i386-dis-evex-w.h b/opcodes/i386-dis-evex-w.h
index b828277d413..ad3db92888c 100644
--- a/opcodes/i386-dis-evex-w.h
+++ b/opcodes/i386-dis-evex-w.h
@@ -442,6 +442,16 @@
     { Bad_Opcode },
     { "vpshrdw",   { XM, Vex, EXx, Ib }, 0 },
   },
+  /* EVEX_W_MAP4_8F_R_0_M_1_P_0 */
+  {
+    { "pop2", { { PUSH2_POP2_Fixup, q_mode}, Eq }, 0 },
+    { "pop2p", { { PUSH2_POP2_Fixup, q_mode}, Eq }, 0 },
+  },
+  /* EVEX_W_MAP4_FF_R_6_M_1_P_0 */
+  {
+    { "push2", { { PUSH2_POP2_Fixup, q_mode}, Eq }, 0 },
+    { "push2p", { { PUSH2_POP2_Fixup, q_mode}, Eq }, 0 },
+  },
   /* EVEX_W_MAP5_5B_P_0 */
   {
     { "vcvtdq2ph%XY",	{ XMxmmq, EXx, EXxEVexR }, 0 },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index ef752f417c5..7ad7b5b2cf8 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -1035,7 +1035,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_8F) },
     /* 90 */
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 0de3959cf80..825b14ad0dd 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -105,6 +105,7 @@ static bool FXSAVE_Fixup (instr_info *, int, int);
 static bool MOVSXD_Fixup (instr_info *, int, int);
 static bool DistinctDest_Fixup (instr_info *, int, int);
 static bool PREFETCHI_Fixup (instr_info *, int, int);
+static bool PUSH2_POP2_Fixup (instr_info *, int, int);
 
 static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
 						enum disassembler_style,
@@ -890,6 +891,7 @@ enum
   REG_EVEX_MAP4_80,
   REG_EVEX_MAP4_81,
   REG_EVEX_MAP4_83,
+  REG_EVEX_MAP4_8F,
   REG_EVEX_MAP4_D8_PREFIX_1,
   REG_EVEX_MAP4_F6,
   REG_EVEX_MAP4_F7,
@@ -935,6 +937,7 @@ enum
 
   MOD_VEX_0F3849_X86_64_L_0_W_0,
 
+  MOD_EVEX_MAP4_8F_R_0,
   MOD_EVEX_MAP4_DA_PREFIX_1,
   MOD_EVEX_MAP4_DB_PREFIX_1,
   MOD_EVEX_MAP4_DC_PREFIX_1,
@@ -945,6 +948,7 @@ enum
   MOD_EVEX_MAP4_F8_PREFIX_2,
   MOD_EVEX_MAP4_F8_PREFIX_3,
   MOD_EVEX_MAP4_F9,
+  MOD_EVEX_MAP4_FF_R_6,
 };
 
 enum
@@ -1180,6 +1184,7 @@ enum
   PREFIX_EVEX_0F3A67,
   PREFIX_EVEX_0F3AC2,
 
+  PREFIX_EVEX_MAP4_8F_R_0_M_1,
   PREFIX_EVEX_MAP4_D8,
   PREFIX_EVEX_MAP4_DA,
   PREFIX_EVEX_MAP4_DB,
@@ -1192,6 +1197,7 @@ enum
   PREFIX_EVEX_MAP4_F2,
   PREFIX_EVEX_MAP4_F8,
   PREFIX_EVEX_MAP4_FC,
+  PREFIX_EVEX_MAP4_FF_R_6_M_1,
 
   PREFIX_EVEX_MAP5_10,
   PREFIX_EVEX_MAP5_11,
@@ -1752,6 +1758,9 @@ enum
   EVEX_W_0F3A70,
   EVEX_W_0F3A72,
 
+  EVEX_W_MAP4_8F_R_0_M_1_P_0,
+  EVEX_W_MAP4_FF_R_6_M_1_P_0,
+
   EVEX_W_MAP5_5B_P_0,
   EVEX_W_MAP5_7A_P_3,
 };
@@ -9011,6 +9020,8 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	case 0x4:
 	  vex_table_index = EVEX_MAP4;
 	  ins->evex_type = evex_from_legacy;
+	  if (ins->address_mode != mode_64bit)
+	    return &bad_opcode;
 	  break;
 	case 0x5:
 	  vex_table_index = EVEX_MAP5;
@@ -9073,8 +9084,9 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	{
 	  /* EVEX from legacy instructions, when the EVEX.ND bit is 0,
 	     all bits of EVEX.vvvv and EVEX.V' must be 1.  */
-	  if (!ins->vex.b && (ins->vex.register_specifier
-				  || !ins->vex.v))
+	  if (ins->vex.ll || (!ins->vex.b
+			      && (ins->vex.register_specifier
+				  || !ins->vex.v)))
 	    return &bad_opcode;
 	  ins->rex |= REX_OPCODE;
 	}
@@ -13437,6 +13449,9 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	case b_mode:
 	  names = att_names8rex;
 	  break;
+	case q_mode:
+	  names = att_names64;
+	  break;
 	case mask_bd_mode:
 	case mask_mode:
 	  if (reg > 0x7)
@@ -13821,3 +13836,24 @@ PREFETCHI_Fixup (instr_info *ins, int bytemode, int sizeflag)
 
   return OP_M (ins, bytemode, sizeflag);
 }
+
+static bool
+PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
+{
+  unsigned int vvvv_reg = ins->vex.register_specifier
+    | !ins->vex.v << 4;
+  unsigned int rm_reg = ins->modrm.rm + (ins->rex & REX_B ? 8 : 0)
+    + (ins->rex2 & REX_B ? 16 : 0);
+
+  /* Here vex.b is treated as EVEX.ND.  */
+  /* Push2/Pop2 cannot use RSP and Pop2 cannot pop the same register twice.  */
+  if (!ins->vex.b || vvvv_reg == 0x4 || rm_reg == 0x4
+      || (!ins->modrm.reg
+	  && vvvv_reg == rm_reg))
+    {
+      oappend (ins, "(bad)");
+      return true;
+    }
+
+  return OP_VEX (ins, bytemode, sizeflag);
+}
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 2e6ae807bbe..144ec129a32 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -474,6 +474,7 @@ static bitfield opcode_modifiers[] =
   BITFIELD (ISA64),
   BITFIELD (NoEgpr),
   BITFIELD (NF),
+  BITFIELD (Push2Pop2),
 };
 
 #define CLASS(n) #n, n
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index bb826bbdb34..4b6d52d29cb 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -755,6 +755,9 @@ enum
   /* No CSPAZO flags update indication.  */
   NF,
 
+  /* APX Push2Pop2 bit.  */
+  Push2Pop2,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -804,6 +807,7 @@ typedef struct i386_opcode_modifier
   unsigned int isa64:2;
   unsigned int noegpr:1;
   unsigned int nf:1;
+  unsigned int push2pop2:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index b1f7491e7d7..03ebef028f9 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3494,3 +3494,10 @@ erets, 0xf20f01ca, FRED|x64, NoSuf, {}
 eretu, 0xf30f01ca, FRED|x64, NoSuf, {}
 
 // FRED instructions end.
+
+// APX Push2/Pop2 instructions.
+
+push2, 0xff/6, APX_F, Modrm|VexW0|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+push2p, 0xff/6, APX_F, Modrm|VexW1|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+pop2, 0x8f/0, APX_F, Modrm|VexW0|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+pop2p, 0x8f/0, APX_F, Modrm|VexW1|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
-- 
2.25.1



* [PATCH 7/8] Support APX NDD optimized encoding.
  2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
                   ` (5 preceding siblings ...)
  2023-11-02 11:29 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
@ 2023-11-02 11:29 ` Cui, Lili
  2023-11-09 10:36   ` Jan Beulich
  2023-11-02 11:29 ` [PATCH 8/8] Support APX JMPABS Cui, Lili
  2023-11-02 13:22 ` [PATCH v2 0/8] Support Intel APX EGPR Jan Beulich
  8 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-02 11:29 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, ccoutant, Hu, Lin1

From: "Hu, Lin1" <lin1.hu@intel.com>

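This patch optimizes APX NDD insns whose destination register is also
one of the sources into the equivalent non-NDD form, e.g.: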

add %r16, %r15, %r15 -> add %r16, %r15
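
The folding rule can be sketched as standalone C.  This is a simplified
model under stated assumptions, not the code in tc-i386.c: it covers only
the three-register case, the names ndd_regs and fold_ndd are hypothetical,
and the commutative flag stands in for the C attribute in i386-opc.tbl.
The patch itself also has to cope with other operand kinds and with the NF
bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a three-register NDD insn in AT&T operand
   order: "op src2, src1, dest".  Registers are plain numbers here.  */
struct ndd_regs
{
  unsigned src2;
  unsigned src1;
  unsigned dest;
  bool commutative;  /* stands in for the C attribute in i386-opc.tbl */
};

/* If the destination repeats a source, report the two operands of the
   equivalent non-NDD insn "op src, dst" and return true.  */
static bool
fold_ndd (const struct ndd_regs *in, unsigned *src, unsigned *dst)
{
  assert (in != NULL && src != NULL && dst != NULL);

  if (in->src1 == in->dest)
    /* op src2, dest, dest -> op src2, dest  */
    *src = in->src2;
  else if (in->commutative && in->src2 == in->dest)
    /* op dest, src1, dest -> op src1, dest; the sources are swapped,
       so this is only valid for commutative operations.  */
    *src = in->src1;
  else
    return false;

  *dst = in->dest;
  return true;
}
```

With registers 16, 15, 15 the sketch reports the pair 16, 15, matching the
example above.  With 15, 16, 15 it folds only when the operation is marked
commutative.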

gas/ChangeLog:

	* config/tc-i386.c (optimize_NDD_to_nonNDD): New function.
	(match_template): If an APX NDD insn can be optimized, rematch
	the template.
	* testsuite/gas/i386/x86-64.exp: Add test.
	* testsuite/gas/i386/x86-64-apx-ndd-optimize.d: New test.
	* testsuite/gas/i386/x86-64-apx-ndd-optimize.s: Ditto.

opcodes/ChangeLog:

	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-tbl.h: Ditto.
	* i386-opc.tbl: Add C to some instructions to support the
	optimization.
---
 gas/config/tc-i386.c                          |  46 +++++++
 .../gas/i386/x86-64-apx-ndd-optimize.d        | 124 ++++++++++++++++++
 .../gas/i386/x86-64-apx-ndd-optimize.s        | 117 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |   1 +
 opcodes/i386-opc.tbl                          |  22 ++--
 5 files changed, 302 insertions(+), 8 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 7a86aff1828..787108cedc8 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -7208,6 +7208,44 @@ check_EgprOperands (const insn_template *t)
   return 0;
 }
 
+/* Optimize APX NDD insns to non-NDD insns.  */
+
+static bool
+optimize_NDD_to_nonNDD (const insn_template *t)
+{
+  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
+      && t->opcode_space == SPACE_EVEXMAP4
+      && !i.has_nf
+      && i.reg_operands >= 2
+      && i.types[i.operands - 1].bitfield.class == Reg)
+    {
+      unsigned int readonly_var = ~0;
+      unsigned int dest = i.operands - 1;
+      unsigned int src1 = (i.operands > 2) ? i.operands - 2 : 0;
+      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+      if (i.types[src1].bitfield.class == Reg
+	  && i.op[src1].regs == i.op[dest].regs)
+	readonly_var = src2;
+      /* adcx, adox and imul don't have D bit.  */
+      else if (i.types[src2].bitfield.class == Reg
+	       && i.op[src2].regs == i.op[dest].regs
+	       && t->opcode_modifier.commutative)
+	readonly_var = src1;
+      if (readonly_var != (unsigned int) ~0)
+	{
+	  --i.operands;
+	  --i.reg_operands;
+	  --i.tm.operands;
+
+	  if (readonly_var != src2)
+	    swap_2_operands (readonly_var, src2);
+	  return true;
+	}
+    }
+  return false;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7728,6 +7766,14 @@ match_template (char mnem_suffix)
 	  i.memshift = memshift;
 	}
 
+      /* If we can optimize an NDD insn to a non-NDD insn, like
+	 add %r16, %r8, %r8 -> add %r16, %r8, then rematch the template.  */
+      if (optimize == 1 && optimize_NDD_to_nonNDD (t))
+	{
+	  t = current_templates->start - 1;
+	  continue;
+	}
+
       /* We've found a match; break out of loop.  */
       break;
     }
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
new file mode 100644
index 00000000000..f23b2b127b6
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
@@ -0,0 +1,124 @@
+#as: -O1
+#objdump: -drw
+#name: x86-64 APX NDD optimized encoding
+#source: x86-64-apx-ndd-optimize.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 19 ff c7          	inc    %r31
+\s*[a-f0-9]+:\s*d5 11 fe c7          	inc    %r31b
+\s*[a-f0-9]+:\s*d5 4d 01 f8          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 45 00 f8          	add    %r31b,%r8b
+\s*[a-f0-9]+:\s*d5 4d 01 f8          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 1d 03 c7          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 4d 03 38          	add    \(%r8\),%r31
+\s*[a-f0-9]+:\s*d5 1d 03 07          	add    \(%r31\),%r8
+\s*[a-f0-9]+:\s*49 81 c7 33 44 34 12 	add    \$0x12344433,%r15
+\s*[a-f0-9]+:\s*49 81 c0 11 22 33 f4 	add    \$0xfffffffff4332211,%r8
+\s*[a-f0-9]+:\s*d5 18 ff c9          	dec    %r17
+\s*[a-f0-9]+:\s*d5 10 fe c9          	dec    %r17b
+\s*[a-f0-9]+:\s*d5 18 f7 d1          	not    %r17
+\s*[a-f0-9]+:\s*d5 10 f6 d1          	not    %r17b
+\s*[a-f0-9]+:\s*d5 18 f7 d9          	neg    %r17
+\s*[a-f0-9]+:\s*d5 10 f6 d9          	neg    %r17b
+\s*[a-f0-9]+:\s*d5 1c 29 f9          	sub    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 28 f9          	sub    %r15b,%r17b
+\s*[a-f0-9]+:\s*62 54 84 18 29 38    	sub    %r15,\(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 2b 04 07       	sub    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 ee 34 12 00 00 	sub    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 19 f9          	sbb    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 18 f9          	sbb    %r15b,%r17b
+\s*[a-f0-9]+:\s*62 54 84 18 19 38    	sbb    %r15,\(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 1b 04 07       	sbb    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 de 34 12 00 00 	sbb    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 11 f9          	adc    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 10 f9          	adc    %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 13 38             	adc    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 13 04 07       	adc    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 d6 34 12 00 00 	adc    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 09 f9          	or     %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 08 f9          	or     %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 0b 38             	or     \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 0b 04 07       	or     \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 ce 34 12 00 00 	or     \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 31 f9          	xor    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 30 f9          	xor    %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 33 38             	xor    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 33 04 07       	xor    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 f6 34 12 00 00 	xor    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 21 f9          	and    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 20 f9          	and    %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 23 38             	and    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 23 04 07       	and    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 11 81 e6 34 12 00 00 	and    \$0x1234,%r30d
+\s*[a-f0-9]+:\s*d5 19 d1 cf          	ror    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 cf          	ror    %r31b
+\s*[a-f0-9]+:\s*49 c1 cc 02          	ror    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 cc 02          	ror    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 c7          	rol    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 c7          	rol    %r31b
+\s*[a-f0-9]+:\s*49 c1 c4 02          	rol    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 c4 02          	rol    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 df          	rcr    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 df          	rcr    %r31b
+\s*[a-f0-9]+:\s*49 c1 dc 02          	rcr    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 dc 02          	rcr    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 d7          	rcl    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 d7          	rcl    %r31b
+\s*[a-f0-9]+:\s*49 c1 d4 02          	rcl    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 d4 02          	rcl    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 e7          	shl    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 e7          	shl    %r31b
+\s*[a-f0-9]+:\s*49 c1 e4 02          	shl    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 e4 02          	shl    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 ff          	sar    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 ff          	sar    %r31b
+\s*[a-f0-9]+:\s*49 c1 fc 02          	sar    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 fc 02          	sar    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 e7          	shl    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 e7          	shl    %r31b
+\s*[a-f0-9]+:\s*49 c1 e4 02          	shl    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 e4 02          	shl    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 ef          	shr    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 ef          	shr    %r31b
+\s*[a-f0-9]+:\s*49 c1 ec 02          	shr    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 ec 02          	shr    \$0x2,%r12b
+\s*[a-f0-9]+:\s*62 74 9c 18 24 20 01 	shld   \$0x1,%r12,\(%rax\),%r12
+\s*[a-f0-9]+:\s*4d 0f a4 c4 02       	shld   \$0x2,%r8,%r12
+\s*[a-f0-9]+:\s*62 74 b4 18 a5 08    	shld   %cl,%r9,\(%rax\),%r9
+\s*[a-f0-9]+:\s*d5 9c a5 e0          	shld   %cl,%r12,%r16
+\s*[a-f0-9]+:\s*62 7c 94 18 a5 2c 83 	shld   %cl,%r13,\(%r19,%rax,4\),%r13
+\s*[a-f0-9]+:\s*62 74 9c 18 2c 20 01 	shrd   \$0x1,%r12,\(%rax\),%r12
+\s*[a-f0-9]+:\s*4d 0f ac ec 01       	shrd   \$0x1,%r13,%r12
+\s*[a-f0-9]+:\s*62 74 b4 18 ad 08    	shrd   %cl,%r9,\(%rax\),%r9
+\s*[a-f0-9]+:\s*d5 9c ad e0          	shrd   %cl,%r12,%r16
+\s*[a-f0-9]+:\s*62 7c 94 18 ad 2c 83 	shrd   %cl,%r13,\(%r19,%rax,4\),%r13
+\s*[a-f0-9]+:\s*66 4d 0f 38 f6 c7    	adcx   %r15,%r8
+\s*[a-f0-9]+:\s*62 14 f9 08 66 04 3f 	adcx   \(%r15,%r31,1\),%r8
+\s*[a-f0-9]+:\s*66 4d 0f 38 f6 c1    	adcx   %r9,%r8
+\s*[a-f0-9]+:\s*f3 4d 0f 38 f6 c7    	adox   %r15,%r8
+\s*[a-f0-9]+:\s*62 14 fa 08 66 04 3f 	adox   \(%r15,%r31,1\),%r8
+\s*[a-f0-9]+:\s*f3 4d 0f 38 f6 c1    	adox   %r9,%r8
+\s*[a-f0-9]+:\s*67 0f 40 90 90 90 90 90 	cmovo  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 41 90 90 90 90 90 	cmovno -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 42 90 90 90 90 90 	cmovb  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 43 90 90 90 90 90 	cmovae -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 44 90 90 90 90 90 	cmove  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 45 90 90 90 90 90 	cmovne -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 46 90 90 90 90 90 	cmovbe -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 47 90 90 90 90 90 	cmova  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 48 90 90 90 90 90 	cmovs  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 49 90 90 90 90 90 	cmovns -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4c 90 90 90 90 90 	cmovl  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4d 90 90 90 90 90 	cmovge -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4e 90 90 90 90 90 	cmovle -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4f 90 90 90 90 90 	cmovg  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f af 90 09 09 09 00 	imul   0x90909\(%eax\),%edx
+\s*[a-f0-9]+:\s*d5 aa af 94 f8 09 09 00 00 	imul   0x909\(%rax,%r31,8\),%rdx
+\s*[a-f0-9]+:\s*48 0f af d0          	imul   %rax,%rdx
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
new file mode 100644
index 00000000000..1b5cc94757d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
@@ -0,0 +1,117 @@
+# Check 64bit APX NDD instructions with optimized encoding
+
+	.text
+_start:
+inc    %r31,%r31
+incb   %r31b,%r31b
+add    %r31,%r8,%r8
+addb   %r31b,%r8b,%r8b
+{store} add    %r31,%r8,%r8
+{load}  add    %r31,%r8,%r8
+add    %r31,(%r8),%r31
+add    (%r31),%r8,%r8
+add    $0x12344433,%r15,%r15
+add    $0xfffffffff4332211,%r8,%r8
+dec    %r17,%r17
+decb   %r17b,%r17b
+not    %r17,%r17
+notb   %r17b,%r17b
+neg    %r17,%r17
+negb   %r17b,%r17b
+sub    %r15,%r17,%r17
+subb   %r15b,%r17b,%r17b
+sub    %r15,(%r8),%r15
+sub    (%r15,%rax,1),%r16,%r16
+sub    $0x1234,%r30,%r30
+sbb    %r15,%r17,%r17
+sbbb   %r15b,%r17b,%r17b
+sbb    %r15,(%r8),%r15
+sbb    (%r15,%rax,1),%r16,%r16
+sbb    $0x1234,%r30,%r30
+adc    %r15,%r17,%r17
+adcb   %r15b,%r17b,%r17b
+adc    %r15,(%r8),%r15
+adc    (%r15,%rax,1),%r16,%r16
+adc    $0x1234,%r30,%r30
+or     %r15,%r17,%r17
+orb    %r15b,%r17b,%r17b
+or     %r15,(%r8),%r15
+or     (%r15,%rax,1),%r16,%r16
+or     $0x1234,%r30,%r30
+xor    %r15,%r17,%r17
+xorb   %r15b,%r17b,%r17b
+xor    %r15,(%r8),%r15
+xor    (%r15,%rax,1),%r16,%r16
+xor    $0x1234,%r30,%r30
+and    %r15,%r17,%r17
+andb   %r15b,%r17b,%r17b
+and    %r15,(%r8),%r15
+and    (%r15,%rax,1),%r16,%r16
+and    $0x1234,%r30,%r30
+ror    %r31,%r31
+rorb   %r31b,%r31b
+ror    $0x2,%r12,%r12
+rorb   $0x2,%r12b,%r12b
+rol    %r31,%r31
+rolb   %r31b,%r31b
+rol    $0x2,%r12,%r12
+rolb   $0x2,%r12b,%r12b
+rcr    %r31,%r31
+rcrb   %r31b,%r31b
+rcr    $0x2,%r12,%r12
+rcrb   $0x2,%r12b,%r12b
+rcl    %r31,%r31
+rclb   %r31b,%r31b
+rcl    $0x2,%r12,%r12
+rclb   $0x2,%r12b,%r12b
+shl    %r31,%r31
+shlb   %r31b,%r31b
+shl    $0x2,%r12,%r12
+shlb   $0x2,%r12b,%r12b
+sar    %r31,%r31
+sarb   %r31b,%r31b
+sar    $0x2,%r12,%r12
+sarb   $0x2,%r12b,%r12b
+shl    %r31,%r31
+shlb   %r31b,%r31b
+shl    $0x2,%r12,%r12
+shlb   $0x2,%r12b,%r12b
+shr    %r31,%r31
+shrb   %r31b,%r31b
+shr    $0x2,%r12,%r12
+shrb   $0x2,%r12b,%r12b
+shld   $0x1,%r12,(%rax),%r12
+shld   $0x2,%r8,%r12,%r12
+shld   %cl,%r9,(%rax),%r9
+shld   %cl,%r12,%r16,%r16
+shld   %cl,%r13,(%r19,%rax,4),%r13
+shrd   $0x1,%r12,(%rax),%r12
+shrd   $0x1,%r13,%r12,%r12
+shrd   %cl,%r9,(%rax),%r9
+shrd   %cl,%r12,%r16,%r16
+shrd   %cl,%r13,(%r19,%rax,4),%r13
+adcx   %r15,%r8,%r8
+adcx   (%r15,%r31,1),%r8,%r8
+adcx   %r8,%r9,%r8
+adox   %r15,%r8,%r8
+adox   (%r15,%r31,1),%r8,%r8
+adox   %r8,%r9,%r8
+cmovo  0x90909090(%eax),%edx,%edx
+cmovno 0x90909090(%eax),%edx,%edx
+cmovb  0x90909090(%eax),%edx,%edx
+cmovae 0x90909090(%eax),%edx,%edx
+cmove  0x90909090(%eax),%edx,%edx
+cmovne 0x90909090(%eax),%edx,%edx
+cmovbe 0x90909090(%eax),%edx,%edx
+cmova  0x90909090(%eax),%edx,%edx
+cmovs  0x90909090(%eax),%edx,%edx
+cmovns 0x90909090(%eax),%edx,%edx
+cmovp  0x90909090(%eax),%edx,%edx
+cmovnp 0x90909090(%eax),%edx,%edx
+cmovl  0x90909090(%eax),%edx,%edx
+cmovge 0x90909090(%eax),%edx,%edx
+cmovle 0x90909090(%eax),%edx,%edx
+cmovg  0x90909090(%eax),%edx,%edx
+imul   0x90909(%eax),%edx,%edx
+imul   0x909(%rax,%r31,8),%rdx,%rdx
+imul   %rdx,%rax,%rdx
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 668b366a212..eab99f9e52b 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -552,6 +552,7 @@ run_dump_test "x86-64-optimize-6"
 run_list_test "x86-64-optimize-7a" "-I${srcdir}/$subdir -march=+noavx -al"
 run_dump_test "x86-64-optimize-7b"
 run_list_test "x86-64-optimize-8" "-I${srcdir}/$subdir -march=+noavx2 -al"
+run_dump_test "x86-64-apx-ndd-optimize"
 run_dump_test "x86-64-align-branch-1a"
 run_dump_test "x86-64-align-branch-1b"
 run_dump_test "x86-64-align-branch-1c"
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 03ebef028f9..5e36f6f67eb 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -145,6 +145,8 @@
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
+// Also re-use the bit to mark NDD insns where swapping the source operands
+// may allow switching from 3 operands to 2 operands.
 #define C StaticRounding
 
 #define FP 387|287|8087
@@ -166,6 +168,10 @@
 
 ### MARKER ###
 
+// Please don't add an NDD insn which may be optimized to a REX2 insn before
+// the mov templates. Otherwise a UB checker may object to the
+// "current_templates->start - 1" computation near the end of match_template.
+
 // Move instructions.
 mov, 0xa0, No64, D|W|CheckOperandSize|No_sSuf|No_qSuf, { Disp16|Disp32|Unspecified|Byte|Word|Dword, Acc|Byte|Word|Dword }
 mov, 0xa0, x64, D|W|CheckOperandSize|No_sSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
@@ -295,7 +301,7 @@ add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg3
 add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-add, 0x0, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+add, 0x0, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}
 
@@ -339,7 +345,7 @@ and, 0x20, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|
 and, 0x83/4, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 and, 0x24, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 and, 0x80/4, 0, W|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-and, 0x20, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+and, 0x20, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 and, 0x83/4, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 and, 0x80/4, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
@@ -347,7 +353,7 @@ or, 0x8, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Re
 or, 0x83/1, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 or, 0xc, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 or, 0x80/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-or, 0x8, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+or, 0x8, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 or, 0x83/1, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 or, 0x80/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
@@ -355,7 +361,7 @@ xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|
 xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-xor, 0x30, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+xor, 0x30, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 xor, 0x80/6, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
@@ -369,7 +375,7 @@ adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|R
 adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 adc, 0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+adc, 0x10, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 adc, 0x83/2, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 adc, 0x80/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
@@ -412,7 +418,7 @@ cqto, 0x99, x64, Size64|NoSuf, {}
 mul, 0xf6/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 imul, 0xf6/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 imul, 0xfaf, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64 }
-imul, 0xaf, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
+imul, 0xaf, APX_F, C|Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 // imul with 2 operands mimics imul with 3 by putting the register in
@@ -2126,10 +2132,10 @@ xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
 // Multy-precision Add Carry, rdseed instructions.
 adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+adcx, 0x6666, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+adox, 0xf366, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 rdseed, 0xfc7/7, RdSeed, Modrm|NoSuf, { Reg16|Reg32|Reg64 }
 
 // SMAP instructions.
-- 
2.25.1


* [PATCH 8/8] Support APX JMPABS
  2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
                   ` (6 preceding siblings ...)
  2023-11-02 11:29 ` [PATCH 7/8] Support APX NDD optimized encoding Cui, Lili
@ 2023-11-02 11:29 ` Cui, Lili
  2023-11-09 12:59   ` Jan Beulich
  2023-11-02 13:22 ` [PATCH v2 0/8] Support Intel APX EGPR Jan Beulich
  8 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-02 11:29 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, ccoutant, Hu, Lin1

From: "Hu, Lin1" <lin1.hu@intel.com>

gas/ChangeLog:

	* config/tc-i386.c (is_any_apx_encoding): Add jmpabs.
	(is_any_apx_rex2_encoding): Ditto.
	* testsuite/gas/i386/i386.exp: Add tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/apx-jmpabs-inval.l: New test.
	* testsuite/gas/i386/apx-jmpabs-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (JMPABS_Fixup): New Fixup function to disassemble jmpabs.
	(print_insn): Add #UD exception for jmpabs.
	(dis386): Modify the a1 entry to support jmpabs.
	* i386-mnem.h: Regenerated.
	* i386-opc.tbl: New insns.
	* i386-tbl.h: Regenerated.
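
As a reading aid for the new tests, here is a minimal C sketch of the byte
layout that x86-64-apx-jmpabs.d expects: a REX2 prefix (0xd5) with an
all-zero payload byte, opcode 0xa1, then the 64-bit target in little-endian
order. The function name encode_jmpabs and the JMPABS_LEN constant are
hypothetical; this is not how gas emits the instruction, only an
illustration of the resulting bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { JMPABS_LEN = 11 };  /* 2 prefix bytes + 1 opcode byte + 8 immediate bytes */

/* Write the jmpabs encoding for TARGET into OUT and return its length.  */
static size_t
encode_jmpabs (uint64_t target, uint8_t out[JMPABS_LEN])
{
  assert (out != NULL);

  out[0] = 0xd5;  /* REX2 prefix */
  out[1] = 0x00;  /* REX2 payload: all bits clear (M0 = 0, W = 0) */
  out[2] = 0xa1;  /* opcode shared with the legacy moffs "mov" */

  /* 64-bit absolute target, least significant byte first.  */
  for (unsigned int i = 0; i < 8; i++)
    out[3 + i] = (uint8_t) (target >> (8 * i));

  return JMPABS_LEN;
}
```

For target 0x2 this yields d5 00 a1 02 00 00 00 00 00 00 00, matching the
second line of the expected disassembly. The -inval test covers the cases
the disassembler rejects: a 66, 67, F2 or F3 prefix, or REX2.W set.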
---
 gas/config/tc-i386.c                          |  6 +-
 gas/testsuite/gas/i386/apx-jmpabs-inval.l     |  3 +
 gas/testsuite/gas/i386/apx-jmpabs-inval.s     |  6 ++
 gas/testsuite/gas/i386/i386.exp               |  1 +
 .../gas/i386/x86-64-apx-jmpabs-intel.d        | 14 +++++
 .../gas/i386/x86-64-apx-jmpabs-inval.d        | 55 +++++++++++++++++++
 .../gas/i386/x86-64-apx-jmpabs-inval.s        | 17 ++++++
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d    | 14 +++++
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s    | 10 ++++
 gas/testsuite/gas/i386/x86-64.exp             |  3 +
 opcodes/i386-dis.c                            | 43 ++++++++++++++-
 opcodes/i386-opc.tbl                          |  2 +
 12 files changed, 171 insertions(+), 3 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/apx-jmpabs-inval.l
 create mode 100644 gas/testsuite/gas/i386/apx-jmpabs-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 787108cedc8..42019c61a33 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -7790,7 +7790,8 @@ match_template (char mnem_suffix)
   if (!quiet_warnings)
     {
       if (!intel_syntax
-	  && (i.jumpabsolute != (t->opcode_modifier.jump == JUMP_ABSOLUTE)))
+	  && (i.jumpabsolute != (t->opcode_modifier.jump == JUMP_ABSOLUTE))
+	  && t->mnem_off != MN_jmpabs)
 	as_warn (_("indirect %s without `*'"), insn_name (t));
 
       if (t->opcode_modifier.isprefix
@@ -8939,6 +8940,9 @@ process_operands (void)
 	}
     }
 
+  if (i.tm.mnem_off == MN_jmpabs)
+    i.rex2_encoding = true;
+
   /* If a segment was explicitly specified, and the specified segment
      is neither the default nor the one already recorded from a prefix,
      use an opcode prefix to select it.  If we never figured out what
diff --git a/gas/testsuite/gas/i386/apx-jmpabs-inval.l b/gas/testsuite/gas/i386/apx-jmpabs-inval.l
new file mode 100644
index 00000000000..87e7a800f1a
--- /dev/null
+++ b/gas/testsuite/gas/i386/apx-jmpabs-inval.l
@@ -0,0 +1,3 @@
+.* Assembler messages:
+.*:5: Error: `jmpabs' is only supported in 64-bit mode
+.*:6: Error: `jmpabs' is only supported in 64-bit mode
diff --git a/gas/testsuite/gas/i386/apx-jmpabs-inval.s b/gas/testsuite/gas/i386/apx-jmpabs-inval.s
new file mode 100644
index 00000000000..1f9f1f80b72
--- /dev/null
+++ b/gas/testsuite/gas/i386/apx-jmpabs-inval.s
@@ -0,0 +1,6 @@
+# Check 32bit illegal APX_F JMPABS instructions
+
+	.text
+ _start:
+	jmpabs	      $0x0202020202020202
+	jmpabs	      $0x2
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 75e1a4ca369..9280785d41d 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -510,6 +510,7 @@ if [gas_32_check] then {
     run_dump_test "sm4-intel"
     run_list_test "pbndkb-inval"
     run_list_test "apx-push2pop2-inval"
+    run_list_test "apx-jmpabs-inval"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
new file mode 100644
index 00000000000..3b51aead651
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
@@ -0,0 +1,14 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 APX_F JMPABS insns (Intel disassembly)
+#source: x86-64-apx-jmpabs.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 00 a1 02 02 02 02 02 02 02 02[	 ]+jmpabs 0x202020202020202
+\s*[a-f0-9]+:\s*d5 00 a1 02 00 00 00 00 00 00 00[	 ]+jmpabs 0x2
+\s*[a-f0-9]+:\s*d5 00 a1 02 02 02 02 02 02 02 02[	 ]+jmpabs 0x202020202020202
+\s*[a-f0-9]+:\s*d5 00 a1 02 00 00 00 00 00 00 00[	 ]+jmpabs 0x2
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
new file mode 100644
index 00000000000..ef3c1fa55e2
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
@@ -0,0 +1,55 @@
+#as: --64
+#objdump: -dw
+#name: illegal decoding of APX_F jmpabs insns
+#source: x86-64-apx-jmpabs-inval.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <.text>:
+\s*[a-f0-9]+:	66 64 d5 00 a1[ 	 ]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	66 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	67 64 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	67 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	f2 64 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	f2 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	f3 64 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	f3 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	d5 08 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*...
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
new file mode 100644
index 00000000000..e41240972d7
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
@@ -0,0 +1,17 @@
+# Check disassembly of APX_F jmpabs instructions with illegal encodings.
+
+	.text
+# With 66 prefix
+	.byte 0x66,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+	.byte 0x66,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With 67 prefix
+	.byte 0x67,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+	.byte 0x67,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With F2 prefix
+	.byte 0xf2,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+	.byte 0xf2,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With F3 prefix
+	.byte 0xf3,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+	.byte 0xf3,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# REX2.M0 = 0 REX2.W = 1
+	.byte 0xd5,0x08,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs.d b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
new file mode 100644
index 00000000000..0c1875230c6
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
@@ -0,0 +1,14 @@
+#as:
+#objdump: -dw
+#name: x86_64 APX_F JMPABS insns
+#source: x86-64-apx-jmpabs.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 00 a1 02 02 02 02 02 02 02 02[ 	 ]+jmpabs \$0x202020202020202
+\s*[a-f0-9]+:\s*d5 00 a1 02 00 00 00 00 00 00 00[	 ]+jmpabs \$0x2
+\s*[a-f0-9]+:\s*d5 00 a1 02 02 02 02 02 02 02 02[	 ]+jmpabs \$0x202020202020202
+\s*[a-f0-9]+:\s*d5 00 a1 02 00 00 00 00 00 00 00[	 ]+jmpabs \$0x2
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
new file mode 100644
index 00000000000..beb722421bd
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
@@ -0,0 +1,10 @@
+# Check 64bit APX_F JMPABS instructions
+
+	.text
+ _start:
+	jmpabs	      $0x0202020202020202
+	jmpabs	      $0x2
+
+.intel_syntax noprefix
+	jmpabs	      0x0202020202020202
+	jmpabs	      0x2
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index eab99f9e52b..ad6f7be9c4f 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -371,6 +371,9 @@ run_dump_test "x86-64-apx-evex-promoted"
 run_dump_test "x86-64-apx-evex-promoted-intel"
 run_dump_test "x86-64-apx-evex-egpr"
 run_dump_test "x86-64-apx-ndd"
+run_dump_test "x86-64-apx-jmpabs"
+run_dump_test "x86-64-apx-jmpabs-intel"
+run_dump_test "x86-64-apx-jmpabs-inval"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 825b14ad0dd..d767090aa65 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -106,6 +106,7 @@ static bool MOVSXD_Fixup (instr_info *, int, int);
 static bool DistinctDest_Fixup (instr_info *, int, int);
 static bool PREFETCHI_Fixup (instr_info *, int, int);
 static bool PUSH2_POP2_Fixup (instr_info *, int, int);
+static bool JMPABS_Fixup (instr_info *, int, int);
 
 static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
 						enum disassembler_style,
@@ -258,6 +259,9 @@ struct instr_info
   char scale_char;
 
   enum x86_64_isa isa64;
+
+  /* Remember if the current op is jmpabs.  */
+  bool is_jmpabs;
 };
 
 struct dis_private {
@@ -2032,7 +2036,7 @@ static const struct dis386 dis386[] = {
   { "lahf",		{ XX }, 0 },
   /* a0 */
   { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
-  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ { JMPABS_Fixup, eAX_reg }, { JMPABS_Fixup, v_mode } }, PREFIX_REX2_ILLEGAL },
   { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
   { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
   { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
@@ -9648,7 +9652,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
     }
 
   if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
-      && ins.last_rex2_prefix >= 0)
+      && ins.last_rex2_prefix >= 0 && !ins.is_jmpabs)
     {
       i386_dis_printf (info, dis_style_text, "(bad)");
       ret = ins.end_codep - priv.the_buffer;
@@ -13857,3 +13861,38 @@ PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
 
   return OP_VEX (ins, bytemode, sizeflag);
 }
+
+static bool
+JMPABS_Fixup (instr_info *ins, int bytemode, int sizeflag)
+{
+  if (ins->address_mode == mode_64bit
+      && ins->last_rex2_prefix >= 0
+      && (ins->rex2 & 0x80) == 0x0)
+    {
+      uint64_t op;
+
+      if (bytemode == eAX_reg)
+	return true;
+
+      if (!get64 (ins, &op))
+	return false;
+
+      if ((ins->prefixes & (PREFIX_OPCODE | PREFIX_ADDR)) != 0x0
+	  || (ins->rex & REX_W) != 0x0)
+	{
+	  oappend (ins, "(bad)");
+	  return true;
+	}
+
+      ins->mnemonicendp = stpcpy (ins->obuf, "jmpabs");
+      ins->all_prefixes[ins->last_rex2_prefix] = 0;
+      ins->is_jmpabs = true;
+      oappend_immediate (ins, op);
+
+      return true;
+    }
+
+  if (bytemode == eAX_reg)
+    return OP_IMREG (ins, bytemode, sizeflag);
+  return OP_OFF64 (ins, v_mode, sizeflag);
+}
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 5e36f6f67eb..76f670c0a9d 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -554,6 +554,8 @@ ljmp, 0xea, No64, JumpInterSegment|No_bSuf|No_sSuf|No_qSuf, { Imm16, Imm16|Imm32
 ljmp, 0xff/5, 0, Amd64|Modrm|JumpAbsolute|No_bSuf|No_sSuf|No_qSuf, { Unspecified|BaseIndex }
 ljmp, 0xff/5, x64, Intel64|Modrm|JumpAbsolute|No_bSuf|No_sSuf, { Unspecified|BaseIndex }
 
+jmpabs, 0xa1, APX_F|x64, JumpAbsolute|NoSuf, { Imm64 }
+
 ret, 0xc3, No64, DefaultSize|No_bSuf|No_sSuf|No_qSuf|RepPrefixOk|BNDPrefixOk, {}
 ret, 0xc2, No64, DefaultSize|No_bSuf|No_sSuf|No_qSuf|RepPrefixOk|BNDPrefixOk, { Imm16 }
 ret, 0xc3, x64, Amd64|DefaultSize|No_bSuf|No_lSuf|No_sSuf|NoRex64|RepPrefixOk|BNDPrefixOk, {}
-- 
2.25.1



* Re: [PATCH v2 0/8] Support Intel APX EGPR
  2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
                   ` (7 preceding siblings ...)
  2023-11-02 11:29 ` [PATCH 8/8] Support APX JMPABS Cui, Lili
@ 2023-11-02 13:22 ` Jan Beulich
  2023-11-03 16:42   ` Cui, Lili
  8 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-02 13:22 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> This is V2 of all APX patch.
> 1. Merged patch part II 1/6 into patch 1/8.
> 2. Created a new patch for empty EVEX_MAP4_ sub-table.
> 3. The NF patch needs to be suspended, Where NF should be placed is under discussion. Since the patch part II 2/6 depends on the NF patch, it is also suspended.
> 4. There are no comments yet for APX linker patch.
> 
> 
> Cui, Lili (4):
>   Support APX GPR32 with rex2 prefix
>   Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
>   Support APX GPR32 with extend evex prefix
>   Add tests for APX GPR32 with extend evex prefix
> 
> Hu, Lin1 (2):
>   Support APX NDD optimized encoding.
>   Support APX JMPABS
> 
> Mo, Zewei (1):
>   Support APX Push2/Pop2
> 
> konglin1 (1):
>   Support APX NDD

Mind me asking whether this work is now based on my "x86: split insn
templates' CPU field"? You don't say so here, so my initial assumption
would be that it isn't. That's also supported by me peeking at patch 3.
Yet that patch was specifically created as a prereq for the APX work to
base on top (and it may require further refinement, the need for which
I could only know once you're actually using that patch as a prereq).

Jan


* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
@ 2023-11-02 17:05   ` Jan Beulich
  2023-11-03  6:20     ` Cui, Lili
  2023-11-03 13:05     ` Jan Beulich
  2023-11-03 14:19   ` Jan Beulich
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-02 17:05 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, ccoutant, binutils

(for now only comments on i386-gen.c changes)

On 02.11.2023 12:29, Cui, Lili wrote:
> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>    return elem_size;
>  }
>  
> +static bool
> +if_entry_needs_special_handle (const unsigned long long opcode, unsigned int space,
> +			       const char *cpu_flags)

This function wants to be named after its purpose, e.g. rex2_disallowed()
with its current return value arrangement. "needs special handling" is a
term that might be okay now, but what if you guys come up with REX3 in a
few years' time which then again needs (a different kind of) special
handling?

> +{
> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> +      || strcmp (cpu_flags, "Xsave") >= 0
> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> +      || !strcmp (cpu_flags, "3dnow")
> +      || !strcmp (cpu_flags, "3dnowA"))
> +    return true;
> +
> +  /* All opcodes listed map0 0x4*, 0x7*, 0xa* and map0 0x3*, 0x8*
> +     are reserved under REX2 and triggers #UD when prefixed with REX2 */
> +  if ((space == 0 && (opcode >> 4 == 0x4
> +		      || opcode >> 4 == 0x7
> +		      || opcode >> 4 == 0xA))

What about row 0xE? Plus in the comment the latter is map1, not (again) map0.

This also may be easier to express using 0x4490 and

> +      || (space == SPACE_0F && (opcode >> 4 == 0x3
> +				|| opcode >> 4 == 0x8)))

... 0x0108 as constants. Else I'd like to ask that switch() be used to
keep this halfway readable.
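
For illustration, a minimal sketch of the constant-based form suggested
above, where bit N of each constant stands for opcode row N (the high
nibble of a one-byte opcode). The function name, the SPACE_BASE
enumerator and the guard for opcodes wider than one byte are assumptions
made for this sketch, not code from the patch:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the opcode-space values used by i386-gen.c;
   only their distinctness matters here.  */
enum { SPACE_BASE = 0, SPACE_0F = 1 };

/* Bit N set means opcode row N is reserved under REX2.  */
#define REX2_RESERVED_ROWS_MAP0 0x4490u	/* rows 0x4, 0x7, 0xA, 0xE */
#define REX2_RESERVED_ROWS_MAP1 0x0108u	/* rows 0x3, 0x8 */

static bool
rex2_row_reserved (unsigned long long opcode, unsigned int space)
{
  unsigned long long row = opcode >> 4;

  /* Assumption: only one-byte opcodes take part in the row check.  */
  if (row > 0xf)
    return false;

  if (space == SPACE_BASE)
    return (REX2_RESERVED_ROWS_MAP0 >> row) & 1;
  if (space == SPACE_0F)
    return (REX2_RESERVED_ROWS_MAP1 >> row) & 1;

  return false;
}
```

Compared with a pair of switch() statements, this keeps the set of
reserved rows in one place per map, at the cost of needing the comment
to decode the constants.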

> @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
>  	fprintf (stderr,
>  		 "%s: %d: W modifier without Word/Dword/Qword operand(s)\n",
>  		 filename, lineno);
> +
> +      /* The part about judging EVEX encoding should be synchronized with
> +	 is_evex_encoding.  */
> +      if (modifiers[Vex].value
> +	  || ((space > SPACE_0F || has_special_handle)
> +	      && !modifiers[EVex].value
> +	      && !modifiers[Disp8MemShift].value
> +	      && !modifiers[Broadcast].value
> +	      && !modifiers[Masking].value
> +	      && !modifiers[SAE].value))
> +	modifiers[NoEgpr].value = 1;
> +
>      }

The comment is one half of what's needed here. First, however, you want
to say a word on what this is about.

Jan


* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-02 17:05   ` Jan Beulich
@ 2023-11-03  6:20     ` Cui, Lili
  2023-11-03 13:05     ` Jan Beulich
  1 sibling, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-03  6:20 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> (for now only comments on i386-gen.c changes)
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
> >    return elem_size;
> >  }
> >
> > +static bool
> > +if_entry_needs_special_handle (const unsigned long long opcode, unsigned
> int space,
> > +			       const char *cpu_flags)
> 
> This function wants to be named after its purpose, e.g. rex2_disallowed() with
> its current return value arrangement. "needs special handling" is a term that
> might be okay now, but what if you guys come up with REX3 in a few years' time
> which then again needs (a different kind of) special handling?
> 
Done.

> > +{
> > +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> > +#UD.  */
> > +  if (strcmp (cpu_flags, "XSAVES") >= 0
> > +      || strcmp (cpu_flags, "XSAVEC") >= 0
> > +      || strcmp (cpu_flags, "Xsave") >= 0
> > +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> > +      || !strcmp (cpu_flags, "3dnow")
> > +      || !strcmp (cpu_flags, "3dnowA"))
> > +    return true;
> > +
> > +  /* All opcodes listed map0 0x4*, 0x7*, 0xa* and map0 0x3*, 0x8*
> > +     are reserved under REX2 and triggers #UD when prefixed with REX2
> > +*/
> > +  if ((space == 0 && (opcode >> 4 == 0x4
> > +		      || opcode >> 4 == 0x7
> > +		      || opcode >> 4 == 0xA))
> 
> What about row 0xE? Plus in the comment the latter is map1, not (again)
> map0.
> 

Done, thanks.

> This also may be easier to express using 0x4490 and
> 
> > +      || (space == SPACE_0F && (opcode >> 4 == 0x3
> > +				|| opcode >> 4 == 0x8)))
> 
> ... 0x0108 as constants. Else I'd like to ask that switch() be used to keep this
> halfway readable.
> 

Changed it to

+  if (space == 0)
+    switch (opcode >> 4)
+      {
+      case 0x4:
+      case 0x7:
+      case 0xA:
+      case 0xE:
+       return true;
+      default:
+       return false;
+      }
+
+  if (space == SPACE_0F)
+    switch (opcode >> 4)
+      {
+      case 0x3:
+      case 0x8:
+       return true;
+      default:
+       return false;
+      }
+
> > @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char
> *mod, unsigned int space,
> >  	fprintf (stderr,
> >  		 "%s: %d: W modifier without Word/Dword/Qword
> operand(s)\n",
> >  		 filename, lineno);
> > +
> > +      /* The part about judging EVEX encoding should be synchronized with
> > +	 is_evex_encoding.  */
> > +      if (modifiers[Vex].value
> > +	  || ((space > SPACE_0F || has_special_handle)
> > +	      && !modifiers[EVex].value
> > +	      && !modifiers[Disp8MemShift].value
> > +	      && !modifiers[Broadcast].value
> > +	      && !modifiers[Masking].value
> > +	      && !modifiers[SAE].value))
> > +	modifiers[NoEgpr].value = 1;
> > +
> >      }
> 
> The comment is one half of what's needed here. First, however, you want to
> say a word on what this is about.
> 
Added. 

Thanks,
Lili


* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-02 17:05   ` Jan Beulich
  2023-11-03  6:20     ` Cui, Lili
@ 2023-11-03 13:05     ` Jan Beulich
  1 sibling, 0 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-03 13:05 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 18:05, Jan Beulich wrote:
> (for now only comments on i386-gen.c changes)
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>>    return elem_size;
>>  }
>>  
>> +static bool
>> +if_entry_needs_special_handle (const unsigned long long opcode, unsigned int space,
>> +			       const char *cpu_flags)
> 
> This function wants to be named after its purpose, e.g. rex2_disallowed()
> with its current return value arrangement. "needs special handling" is a
> term that might be okay now, but what if you guys come up with REX3 in a
> few years' time which then again needs (a different kind of) special
> handling?

Actually, depending on its significance for later changes, egpr_disallowed()
might be a better name in the long term.

Jan


* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
  2023-11-02 17:05   ` Jan Beulich
@ 2023-11-03 14:19   ` Jan Beulich
  2023-11-06 15:20     ` Cui, Lili
  2023-11-06 15:02   ` Jan Beulich
  2023-11-06 15:39   ` Jan Beulich
  3 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-03 14:19 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> @@ -406,6 +409,11 @@ struct _i386_insn
>      /* Compressed disp8*N attribute.  */
>      unsigned int memshift;
>  
> +    /* No CSPAZO flags update.*/
> +    bool has_nf;
> +
> +    bool has_zero_upper;
> +

Can both please be introduced when they're needed, not randomly ahead
of time?

> @@ -2375,6 +2388,9 @@ register_number (const reg_entry *r)
>    if (r->reg_flags & RegRex)
>      nr += 8;
>  
> +  if (r->reg_flags & RegRex2)
> +    nr += 16;
> +
>    if (r->reg_flags & RegVRex)
>      nr += 16;

Perhaps fold to

    if (r->reg_flags & (RegVRex | RegRex2))
      nr += 16;

? Irrespective of that, an assertion that both flags aren't set at the
same time may be worthwhile.
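
A minimal sketch of the folded form with such an assertion follows. The
flag values and the cut-down reg_entry are hypothetical stand-ins for the
real definitions, and plain assert() is used here where gas itself would
use gas_assert():

```c
#include <assert.h>

/* Hypothetical flag values, chosen only to be distinct bits.  */
#define RegRex   0x1
#define RegVRex  0x4
#define RegRex2  0x8

/* Cut-down register table entry, for illustration only.  */
typedef struct
{
  unsigned int reg_flags;
  unsigned int reg_num;
} reg_entry;

static unsigned int
register_number (const reg_entry *r)
{
  unsigned int nr = r->reg_num;

  /* RegVRex (upper vector registers) and RegRex2 (extended GPRs) are
     expected to be mutually exclusive, so each adds 16 at most once.  */
  assert ((r->reg_flags & (RegVRex | RegRex2)) != (RegVRex | RegRex2));

  if (r->reg_flags & RegRex)
    nr += 8;

  if (r->reg_flags & (RegVRex | RegRex2))
    nr += 16;

  return nr;
}
```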

> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
>      i.vex.bytes[3] |= i.mask.reg->reg_num;
>  }
>  
> +/* Build (2 bytes) rex2 prefix.
> +   | D5h |
> +   | m | R4 X4 B4 | W R X B |
> +*/
> +static void
> +build_rex2_prefix (void)
> +{
> +  i.vex.length = 2;
> +  i.vex.bytes[0] = 0xd5;
> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> +		    | (i.rex2 << 4) | i.rex);
> +}

I may have asked on v1 already: For emitting REX we don't resort to
(ab)using i.vex. Is that really necessary? (If so, a comment next to
the field declaration may be warranted.)
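
For reference, the byte layout described in the comment above can be
sketched standalone as below. encode_rex2() and its parameters are
hypothetical names for this illustration; it assumes, as the quoted code
does, that the R4/X4/B4 bits are kept in the low three bits of a separate
value and that the REX_* constants have their usual legacy-REX values:

```c
#include <stdint.h>

/* Bit positions within the legacy REX nibble (W R X B).  */
#define REX_W 8
#define REX_R 4
#define REX_X 2
#define REX_B 1

/* Fill OUT with the 2-byte REX2 prefix:
     byte 0: 0xd5
     byte 1: M0 | R4 X4 B4 | W R X B
   M0 selects the opcode map (0 or 1), REX2 carries R4/X4/B4 in bits
   2..0, REX carries W/R/X/B in bits 3..0.  */
static void
encode_rex2 (unsigned int m0, unsigned int rex2, unsigned int rex,
	     uint8_t out[2])
{
  out[0] = 0xd5;
  out[1] = (uint8_t) (((m0 & 1) << 7) | ((rex2 & 7) << 4) | (rex & 0xf));
}
```

With m0 = 0, rex2 = 0 and rex = REX_W this yields d5 08, the byte pair
used by the "REX2.M0 = 0 REX2.W = 1" case in the jmpabs-inval test.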

Speaking of v1: Can you please make sure you have correct version tags
on submissions of updated patch versions?

> @@ -4423,12 +4460,16 @@ optimize_encoding (void)
>  	  i.suffix = 0;
>  	  /* Convert to byte registers.  */
>  	  if (i.types[1].bitfield.word)
> -	    j = 16;
> -	  else if (i.types[1].bitfield.dword)
> +	    /* There are 32 8-bit registers.  */

Please make sure comments are actually correct. With your additions
there are 40 8-bit registers; prior to that there were 24. The
j += 8 further down deals with that difference, and the comment here
(if one is to be added) wants to tell the full truth.

> @@ -5278,6 +5319,9 @@ md_assemble (char *line)
>  	case register_type_mismatch:
>  	  err_msg = _("register type mismatch");
>  	  break;
> +	case register_type_of_address_mismatch:
> +	  err_msg = _("register type of address mismatch");
> +	  break;

I have a concern with wording / naming here: If I saw this in an error
message, I wouldn't know what is meant. Maybe something along the lines
of "cannot use an extended GPR for addressing"? And then the enumerator
suitably renamed as well?

> @@ -5578,7 +5625,7 @@ md_assemble (char *line)
>        as_warn (_("translating to `%sp'"), insn_name (&i.tm));
>      }
>  
> -  if (is_any_vex_encoding (&i.tm))
> + if (is_any_vex_encoding (&i.tm))
>      {

Stray change, breaking indentation?

> @@ -5594,6 +5641,13 @@ md_assemble (char *line)
>  	  return;
>  	}
>  
> +      /* Check for explicit REX2 prefix.  */
> +      if (i.rex2 || i.rex2_encoding)

This open-codes is_any_apx_rex2_encoding(). But read on.

> +	{
> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));

There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is
what case the i.rex2 check above is intended to cover. Error message,
comment, and condition want to reflect that.

> @@ -5633,11 +5687,11 @@ md_assemble (char *line)
>  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
>        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
>  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && i.rex != 0))
> +	  && (i.rex != 0 || i.rex2 != 0)))
>      {
>        int x;
> -
> -      i.rex |= REX_OPCODE;

Please don't remove blank lines like this.

> @@ -5647,9 +5701,11 @@ md_assemble (char *line)
>  	      gas_assert (!(i.op[x].regs->reg_flags & RegRex));
>  	      /* In case it is "hi" register, give up.  */
>  	      if (i.op[x].regs->reg_num > 3)
> -		as_bad (_("can't encode register '%s%s' in an "
> -			  "instruction requiring REX prefix."),
> -			register_prefix, i.op[x].regs->reg_name);
> +		{
> +		  as_bad (_("can't encode register '%s%s' in an "
> +			    "instruction requiring REX/REX2 prefix."),
> +			  register_prefix, i.op[x].regs->reg_name);
> +		}

There's no need to introduce braces here. Without doing so this will 
also be less of a change.

> @@ -6989,6 +7056,44 @@ VEX_check_encoding (const insn_template *t)
>    return 0;
>  }
>  
> +/* Check if Egprs operands are valid for the instruction.  */
> +
> +static int
> +check_EgprOperands (const insn_template *t)
> +{
> +  if (t->opcode_modifier.noegpr)
> +    {

This scope effectively covers the entire function. Did you consider

  if (!t->opcode_modifier.noegpr)
    return 0;

to aid readability?

> +      for (unsigned int op = 0; op < i.operands; op++)
> +	{
> +	  if (i.types[op].bitfield.class != Reg
> +	      /* Special case for (%dx) while doing input/output op */
> +	      || i.input_output_operand)

Why is this needed? The register table entry for %dx ...

> +	    continue;
> +
> +	  if (i.op[op].regs->reg_flags & RegRex2)

... doesn't have this bit set anyway.

> +	    {
> +	      i.error = register_type_mismatch;
> +	      return 1;
> +	    }
> +	}
> +
> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> +	{
> +	  i.error = register_type_of_address_mismatch;
> +	  return 1;
> +	}
> +
> +      /* Check pseudo prefix {rex2} are valid.  */
> +      if (i.rex2_encoding)
> +	{
> +	  i.error = invalid_pseudo_prefix;
> +	  return 1;
> +	}

Further up in md_assemble() {rex} or {rex2} is simply ignored when
wrong to apply. Why would an inapplicable {rex2} be treated as an
error here? This would then also ...

> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
>        /* Do not verify operands when there are none.  */
>        if (!t->operands)
>  	{
> -	  if (VEX_check_encoding (t))
> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
>  	    {
>  	      specific_error = progress (i.error);
>  	      continue;

... eliminate the need for this change, which is kind of bogus anyway:
There are no operands here, so calling a function of the given name is
at least suspicious.

> @@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry *r)
>  	i.vec_encoding = vex_encoding_error;
>      }
>  
> +  if (r->reg_flags & RegRex2)
> +    {
> +      if (!cpu_arch_flags.bitfield.cpuapx_f
> +	  || flag_code != CODE_64BIT)
> +	return false;
> +    }

Please fold the two if()s into one (unless of course you know that the
outer one is going to be extended in a subsequent patch).
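
A sketch of the folded condition, pulled out into a standalone helper so
it can be compiled on its own. egpr_allowed(), its parameters and the
RegRex2 value are assumptions for this illustration; in the patch the
check sits inline in check_register() and reads cpu_arch_flags and
flag_code directly:

```c
#include <stdbool.h>

/* Hypothetical flag value, for illustration only.  */
#define RegRex2 0x8

enum flag_code { CODE_32BIT, CODE_16BIT, CODE_64BIT };

/* Return false if the register is an extended GPR but APX_F is not
   enabled or the assembler is not in 64-bit mode.  */
static bool
egpr_allowed (unsigned int reg_flags, bool apx_f_enabled,
	      enum flag_code code)
{
  if ((reg_flags & RegRex2)
      && (!apx_f_enabled || code != CODE_64BIT))
    return false;

  return true;
}
```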

> --- a/gas/doc/c-i386.texi
> +++ b/gas/doc/c-i386.texi
> @@ -216,6 +216,7 @@ accept various extension mnemonics.  For example,
>  @code{avx10.1/512},
>  @code{avx10.1/256},
>  @code{avx10.1/128},
> +@code{apx},
>  @code{amx_int8},
>  @code{amx_bf16},
>  @code{amx_fp16},
> @@ -1662,7 +1663,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
>  @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
>  @item @samp{.padlock} @tab @samp{.clzero} @tab @samp{.mwaitx} @tab @samp{.rdpru}
>  @item @samp{.mcommit} @tab @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb}
> -@item @samp{.tlbsync}
> +@item @samp{.tlbsync} @tab @samp{.apx}
>  @end multitable

DYM apx_f in both cases?

Also don't you need to also mention {rex2} somewhere in this file?

> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> @@ -11,11 +11,11 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
>  
>  0+1 <aad0>:
> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> +[ 	]*[a-f0-9]+:	d5                   	rex2
>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
>  
>  0+3 <aad1>:
> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> +[ 	]*[a-f0-9]+:	d5                   	rex2
>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
>  
>  0+5 <aam0>:
> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> @@ -11,11 +11,11 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
>  
>  0+1 <aad0>:
> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> +[ 	]*[a-f0-9]+:	d5                   	rex2
>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
>  
>  0+3 <aad1>:
> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> +[ 	]*[a-f0-9]+:	d5                   	rex2
>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
>  
>  0+5 <aam0>:

These expectations match the ones of the same test in the parent directory.
Hence instead of adjusting each in both places, please have the ones here
reference the parent directory files.

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c

As before I'll look at the disassembler changes separately. This patch is
simply too big.

> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>    return elem_size;
>  }
>  
> +static bool
> +if_entry_needs_special_handle (const unsigned long long opcode, unsigned int space,
> +			       const char *cpu_flags)
> +{
> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> +      || strcmp (cpu_flags, "Xsave") >= 0
> +      || strcmp (cpu_flags, "Xsaveopt") >= 0

Upon further thought for these (and maybe even ...

> +      || !strcmp (cpu_flags, "3dnow")
> +      || !strcmp (cpu_flags, "3dnowA"))

... for these, but see also below) it might be better to add the attribute
right in the opcode table.

As to the 3dnow insns - I think I'd like to revise my earlier suggestion to
also tag those. Like e.g. FPU insns they're pretty normal GPR-wise, so
allowing them to be used like that would appear only consistent. Otherwise,
if we were concerned about AMD extensions in general, SSE4a insns (and maybe
further ones) would also need excluding. (Additionally recall that there's
an overlap between 3dnowa and SSE, which would result in another [apparent]
inconsistency when excluding 3dnow insns here.)

Jan


* RE: [PATCH v2 0/8] Support Intel APX EGPR
  2023-11-02 13:22 ` [PATCH v2 0/8] Support Intel APX EGPR Jan Beulich
@ 2023-11-03 16:42   ` Cui, Lili
  2023-11-06  7:30     ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-03 16:42 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> Subject: Re: [PATCH v2 0/8] Support Intel APX EGPR
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > This is V2 of all APX patch.
> > 1. Merged patch part II 1/6 into patch 1/8.
> > 2. Created a new patch for empty EVEX_MAP4_ sub-table.
> > 3. The NF patch needs to be suspended, Where NF should be placed is
> under discussion. Since the patch part II 2/6 depends on the NF patch, it is
> also suspended.
> > 4. There are no comments yet for APX linker patch.
> >
> >
> > Cui, Lili (4):
> >   Support APX GPR32 with rex2 prefix
> >   Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
> >   Support APX GPR32 with extend evex prefix
> >   Add tests for APX GPR32 with extend evex prefix
> >
> > Hu, Lin1 (2):
> >   Support APX NDD optimized encoding.
> >   Support APX JMPABS
> >
> > Mo, Zewei (1):
> >   Support APX Push2/Pop2
> >
> > konglin1 (1):
> >   Support APX NDD
> 
> Mind me asking whether this work is now based on my "x86: split insn
> templates' CPU field"? You don't say so here, so my initial assumption would
> be that it isn't. That's also supported by me peeking at patch 3.
> Yet that patch was specifically created as a prereq for the APX work to base on
> top (and it may require further refinement, the need for which I could only
> know once you're actually using that patch as a prereq).
> 

Sorry for missing this patch. I have rebased patch 3 on it, and with it in place my old code below is no longer needed. I will send out a new patch 3.

+//       else if (x.bitfield.cpuapx_f)
+//         {
+//           /* All cpu in x need to be enabled in cpu_arch_flags.  */
+//           if (cpu_flags_not_or_check (&x, &cpu_arch_flags))
+//             match |= CPU_FLAGS_ARCH_MATCH;
+//         }


AMX can work with the following change.
--------------------------------------------------------
opcodes/i386-opc.tbl:

#define APX_F_64 APX_F&x64
ldtilecfg, 0x49/0, AMX_TILE&x64&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }

gas/config/tc-i386.c:

   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
   {
-      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
-          || maybe_cpu (t, CpuFMA))
-         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
+    if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
+        || maybe_cpu (t, CpuFMA) ||  maybe_cpu (t, CpuAMX_TILE))
+       && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)
+           || maybe_cpu (t, CpuAPX_F)))
        {
          if (need_evex_encoding ())
            {
@@ -3725,7 +3726,7 @@ install_template (const insn_template *t)
                i.tm.cpu.bitfield.cpuavx = 1;
              else
                {
-                 gas_assert (!i.tm.cpu.bitfield.isa);
+//               gas_assert (!i.tm.cpu.bitfield.isa);
                  i.tm.cpu.bitfield.isa = i.tm.cpu_any.bitfield.isa;
                }
            }
-----------------------------------------------------------------

But if we want to merge bextr's vex and evex formats, we need to support BMI&(BMI |( APX_F&x64))
....
bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
bextr, 0xf7, BMI&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_b
...

Thanks,
Lili.


* Re: [PATCH v2 0/8] Support Intel APX EGPR
  2023-11-03 16:42   ` Cui, Lili
@ 2023-11-06  7:30     ` Jan Beulich
  2023-11-06 14:20       ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-06  7:30 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 03.11.2023 17:42, Cui, Lili wrote:
> But if we want to merge bextr's vex and evex formats, we need to support BMI&(BMI |( APX_F&x64))

Maybe more like BMI&(<tbd>|APX_F), with further work (which I was considering
anyway) towards x64 becoming a prereq to the increasing number of 64-bit-
only features? (The <tbd> may well be BMI as you suggest, even if that reads
a little odd.)

Jan


* RE: [PATCH v2 0/8] Support Intel APX EGPR
  2023-11-06  7:30     ` Jan Beulich
@ 2023-11-06 14:20       ` Cui, Lili
  2023-11-06 14:44         ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-06 14:20 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, November 6, 2023 3:30 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH v2 0/8] Support Intel APX EGPR
> 
> On 03.11.2023 17:42, Cui, Lili wrote:
> > But if we want to merge bextr's vex and evex formats, we need to
> > support BMI&(BMI |( APX_F&x64))
> 
> Maybe more like BMI&(<tbd>|APX_F), with further work (which I was
> considering
> anyway) towards x64 becoming a prereq to the increasing number of 64-bit-
> only features? (The <tbd> may well be BMI as you suggest, even if that reads a
> little odd.)
> 

Yes, most VEX instructions don't require x64, but apx_f is x64 based. If the format "BMI&(BMI |( APX_F&x64))"  is complicated to implement or looks ugly, maybe we can handle x64 uniformly for apx_f in tc-i386.c.

Lili.


* Re: [PATCH v2 0/8] Support Intel APX EGPR
  2023-11-06 14:20       ` Cui, Lili
@ 2023-11-06 14:44         ` Jan Beulich
  2023-11-06 16:03           ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-06 14:44 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 06.11.2023 15:20, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, November 6, 2023 3:30 PM
>>
>> On 03.11.2023 17:42, Cui, Lili wrote:
>>> But if we want to merge bextr's vex and evex formats, we need to
>>> support BMI&(BMI |( APX_F&x64))
>>
>> Maybe more like BMI&(<tbd>|APX_F), with further work (which I was
>> considering
>> anyway) towards x64 becoming a prereq to the increasing number of 64-bit-
>> only features? (The <tbd> may well be BMI as you suggest, even if that reads a
>> little odd.)
>>
> 
> Yes, most VEX instructions don't require x64, but apx_f is x64 based. If the format "BMI&(BMI |( APX_F&x64))"  is complicated to implement or looks ugly, maybe we can handle x64 uniformly for apx_f in tc-i386.c.

Well, some adjustment is needed there anyway, at the very least for the
equivalent of e.g. the present handling of AVX|AVX512F or FMA|AVX512F.
The goal wants to be to balance the amount of special casing code against
complications in representing data in the opcode table. One question I
have is: To what extent is it necessary to actually represent APX_F in the
BMI templates? There are two things triggering use of the EVEX encoding,
iirc: Use of an extended register or NF. Use of an extended register is
itself already dependent upon APX_F, and whatever the representation of
NF is going to be, its parsing could be made dependent upon APX_F, too.
No (strong) need then for the template to enforce APX_F yet another time,
hopefully.

Jan


* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
  2023-11-02 17:05   ` Jan Beulich
  2023-11-03 14:19   ` Jan Beulich
@ 2023-11-06 15:02   ` Jan Beulich
  2023-11-07  8:06     ` Cui, Lili
  2023-11-06 15:39   ` Jan Beulich
  3 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-06 15:02 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
>  	fprintf (stderr,
>  		 "%s: %d: W modifier without Word/Dword/Qword operand(s)\n",
>  		 filename, lineno);
> +
> +      /* The part about judging EVEX encoding should be synchronized with
> +	 is_evex_encoding.  */
> +      if (modifiers[Vex].value
> +	  || ((space > SPACE_0F || has_special_handle)
> +	      && !modifiers[EVex].value
> +	      && !modifiers[Disp8MemShift].value
> +	      && !modifiers[Broadcast].value
> +	      && !modifiers[Masking].value
> +	      && !modifiers[SAE].value))
> +	modifiers[NoEgpr].value = 1;

While this is just i386-gen (and hence being somewhat inefficient isn't the
end of the world), I still wonder: Do we really need all the constituents of
this EVEX-related check? Wouldn't it
also help is_evex_encoding() if we switched to uniformly having EVex attributes
on all EVEX templates? A presently missing EVex attribute, after all, merely is
another way of saying EVexDYN, if I'm not mistaken. (Such an adjustment, if
deemed to help, would of course want to come as a separate, prereq patch.)

Furthermore, is this correct at all for mixed VEX/EVEX templates?

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -891,7 +891,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
>  <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
>                        load:Load:0, store:Store:0, +
>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> -                      rex:REX:x64, nooptimize:NoOptimize:0>
> +                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>

Seeing this I realized that there's something missing here (an APX_F dependency),
which then again would not have had an effect without the patch [1] sent earlier
today.

Jan

[1] https://sourceware.org/pipermail/binutils/2023-November/130345.html

* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-03 14:19   ` Jan Beulich
@ 2023-11-06 15:20     ` Cui, Lili
  2023-11-06 16:08       ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-06 15:20 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -406,6 +409,11 @@ struct _i386_insn
> >      /* Compressed disp8*N attribute.  */
> >      unsigned int memshift;
> >
> > +    /* No CSPAZO flags update.*/
> > +    bool has_nf;
> > +
> > +    bool has_zero_upper;
> > +
> 
> Can both please be introduced when they're needed, not randomly ahead of
> time?
 
Moved has_nf to patch 2/8 and deleted has_zero_upper.

> > @@ -2375,6 +2388,9 @@ register_number (const reg_entry *r)
> >    if (r->reg_flags & RegRex)
> >      nr += 8;
> >
> > +  if (r->reg_flags & RegRex2)
> > +    nr += 16;
> > +
> >    if (r->reg_flags & RegVRex)
> >      nr += 16;
> 
> Perhaps fold to
> 
>     if (r->reg_flags & (RegVRex | RegRex2))
>       nr += 16;
> 
> ? Irrespective an assertion may be worthwhile that both flags aren't set at the
> same time?

Done.
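
For reference, the folded form with the suggested assertion might look roughly like the sketch below. It is only an illustration: the flag values and the reg_entry_sketch type are hypothetical stand-ins, not the definitions from opcodes/i386-opc.h.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only; the real ones are
   defined in opcodes/i386-opc.h.  */
#define RegRex   0x1
#define RegVRex  0x2
#define RegRex2  0x4

/* Simplified stand-in for gas's reg_entry.  */
struct reg_entry_sketch
{
  unsigned int reg_num;
  unsigned int reg_flags;
};

static unsigned int
register_number_sketch (const struct reg_entry_sketch *r)
{
  unsigned int nr = r->reg_num;

  if (r->reg_flags & RegRex)
    nr += 8;

  /* Both flags add 16, so they must never be set at the same time.  */
  assert ((r->reg_flags & (RegVRex | RegRex2)) != (RegVRex | RegRex2));

  if (r->reg_flags & (RegVRex | RegRex2))
    nr += 16;

  return nr;
}
```

With these stand-in values, an entry carrying only RegRex2 maps to numbers 16..23, and one carrying RegRex | RegRex2 maps to 24..31.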

> 
> > @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
> >      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >
> > +/* Build (2 bytes) rex2 prefix.
> > +   | D5h |
> > +   | m | R4 X4 B4 | W R X B |
> > +*/
> > +static void
> > +build_rex2_prefix (void)
> > +{
> > +  i.vex.length = 2;
> > +  i.vex.bytes[0] = 0xd5;
> > +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> > +		    | (i.rex2 << 4) | i.rex);
> > +}
> 
> I may have asked on v1 already: For emitting REX we don't resort to (ab)using
> i.vex. Is that really necessary? (If so, a comment next to the field declaration
> may be warranted.)
> 
Added comment for it.

  /* For the W R X B bits, the variables of rex prefix will be reused.  */
  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
                    | (i.rex2 << 4) | i.rex);
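
To make the payload layout from the function comment (| m | R4 X4 B4 | W R X B |) easier to check, here is a stand-alone sketch. The helper name and its parameters are hypothetical; it takes the three fields as plain integers rather than reading i.tm.opcode_space, i.rex2 and i.rex.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the second REX2 byte following the layout in the patch comment:
   bit 7 = M, bits 6..4 = R4 X4 B4, bits 3..0 = W R X B.
   Hypothetical helper, not part of gas.  */
static uint8_t
rex2_payload_sketch (unsigned int m, unsigned int r4x4b4, unsigned int wrxb)
{
  /* Each field must fit into its slot, or it would spill into a neighbour.  */
  assert (m <= 1 && r4x4b4 <= 7 && wrxb <= 0xf);
  return (uint8_t) ((m << 7) | (r4x4b4 << 4) | wrxb);
}
```

The range check in the sketch matters for the real code as well: the shift-and-or expression only yields this layout if each input is confined to its field width.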

> Speaking of v1: Can you please make sure you have correct version tags on
> submissions of updated patch versions?
> 
I used git to send all the patches at once (git send-email --cover-letter --annotate --to="..." -8), which only gives the opportunity to change the version of the cover letter. To change the version of each patch, I can send them one by one next time. By the way, do you have a better way? Or how did you modify them? Thanks.

> > @@ -4423,12 +4460,16 @@ optimize_encoding (void)
> >  	  i.suffix = 0;
> >  	  /* Convert to byte registers.  */
> >  	  if (i.types[1].bitfield.word)
> > -	    j = 16;
> > -	  else if (i.types[1].bitfield.dword)
> > +	    /* There are 32 8-bit registers.  */
> 
> Please make sure comments are actually correct. With your additions there
> are 40 8-bit registers; prior to that there were 24. The j += 8 further down deal
> with that difference, and the comment here (if one is to be added) wants to
> tell the full truth.
> 

Done.

> > @@ -5278,6 +5319,9 @@ md_assemble (char *line)
> >  	case register_type_mismatch:
> >  	  err_msg = _("register type mismatch");
> >  	  break;
> > +	case register_type_of_address_mismatch:
> > +	  err_msg = _("register type of address mismatch");
> > +	  break;
> 
> I have a concern with wording / naming here: If I saw this in an error message,
> I wouldn't know what is meant. Maybe something along the lines of "cannot
> use an extended GPR for addressing"? And then the enumerator suitably
> renamed as well?
> 
Changed to:

+       case unsupported_EGPR_for_addressing:
+         err_msg = _("unsupported EGPR for addressing");
+         break;

> > @@ -5578,7 +5625,7 @@ md_assemble (char *line)
> >        as_warn (_("translating to `%sp'"), insn_name (&i.tm));
> >      }
> >
> > -  if (is_any_vex_encoding (&i.tm))
> > + if (is_any_vex_encoding (&i.tm))
> >      {
> 
> Stray change, breaking indentation?
> 

Done.

> > @@ -5594,6 +5641,13 @@ md_assemble (char *line)
> >  	  return;
> >  	}
> >
> > +      /* Check for explicit REX2 prefix.  */
> > +      if (i.rex2 || i.rex2_encoding)
> 
> This open-codes is_any_apx_rex2_encoding(). But read on.
> 
> > +	{
> > +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> 
> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is what case
> the i.rex2 check above is intended to cover. Error message comment, and
> condition want to reflect that.
> 

Removed i.rex2 and kept i.rex2_encoding here. Added one invalid testcase for it.

        {rex} vmovaps %xmm7,%xmm2
        {rex} vmovaps %xmm17,%xmm2
        {rex} rorx $7,%eax,%ebx
+       {rex2} vmovaps %xmm7,%xmm2

> > @@ -5633,11 +5687,11 @@ md_assemble (char *line)
> >  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> >        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> >  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && i.rex != 0))
> > +	  && (i.rex != 0 || i.rex2 != 0)))
> >      {
> >        int x;
> > -
> > -      i.rex |= REX_OPCODE;
> 
> Please don't remove blank lines like this.
> 

Done.

> > @@ -5647,9 +5701,11 @@ md_assemble (char *line)
> >  	      gas_assert (!(i.op[x].regs->reg_flags & RegRex));
> >  	      /* In case it is "hi" register, give up.  */
> >  	      if (i.op[x].regs->reg_num > 3)
> > -		as_bad (_("can't encode register '%s%s' in an "
> > -			  "instruction requiring REX prefix."),
> > -			register_prefix, i.op[x].regs->reg_name);
> > +		{
> > +		  as_bad (_("can't encode register '%s%s' in an "
> > +			    "instruction requiring REX/REX2 prefix."),
> > +			  register_prefix, i.op[x].regs->reg_name);
> > +		}
> 
> There's no need to introduce braces here. Without doing so this will also be
> less of a change.
> 

Done.

> > @@ -6989,6 +7056,44 @@ VEX_check_encoding (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Check if Egprs operands are valid for the instruction.  */
> > +
> > +static int
> > +check_EgprOperands (const insn_template *t) {
> > +  if (t->opcode_modifier.noegpr)
> > +    {
> 
> This scope effectively covers the entire function. Did you consider
> 
>   if (!t->opcode_modifier.noegpr)
>     return 0;
> 
> to aid readability?
> 

Done.

> > +      for (unsigned int op = 0; op < i.operands; op++)
> > +	{
> > +	  if (i.types[op].bitfield.class != Reg
> > +	      /* Special case for (%dx) while doing input/output op */
> > +	      || i.input_output_operand)
> 
> Why is this needed? The register table entry for %dx ...
> 
> > +	    continue;
> > +
> > +	  if (i.op[op].regs->reg_flags & RegRex2)
> 
> ... doesn't have this bit set anyway.
> 

In this special case i.op is empty, so we need to continue; otherwise accessing i.op[op].regs->reg_flags will cause a segmentation fault.

> > +	    {
> > +	      i.error = register_type_mismatch;
> > +	      return 1;
> > +	    }
> > +	}
> > +
> > +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> > +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> > +	{
> > +	  i.error = register_type_of_address_mismatch;
> > +	  return 1;
> > +	}
> > +
> > +      /* Check pseudo prefix {rex2} are valid.  */
> > +      if (i.rex2_encoding)
> > +	{
> > +	  i.error = invalid_pseudo_prefix;
> > +	  return 1;
> > +	}
> 
> Further up in md_assemble() {rex} or {rex2} is simply ignored when wrong to
> apply. Why would an inapplicable {rex2} be treated as an error here? This
> would then also ...
> 
> > @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
> >        /* Do not verify operands when there are none.  */
> >        if (!t->operands)
> >  	{
> > -	  if (VEX_check_encoding (t))
> > +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
> >  	    {
> >  	      specific_error = progress (i.error);
> >  	      continue;
> 
> ... eliminate the need for this change, which is kind of bogus anyway:
> There are no operands here, so calling a function of the given name is at least
> suspicious.
> 

We have these tests and I'm confused whether to remove them or not.

+       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
+       {rex2} wrmsr
+       {rex2} rdtsc
+       {rex2} rdmsr
+       {rex2} sysenter
+       {rex2} sysexitl
+       {rex2} rdpmc

> > @@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry *r)
> >  	i.vec_encoding = vex_encoding_error;
> >      }
> >
> > +  if (r->reg_flags & RegRex2)
> > +    {
> > +      if (!cpu_arch_flags.bitfield.cpuapx_f
> > +	  || flag_code != CODE_64BIT)
> > +	return false;
> > +    }
> 
> Please fold the two if()s into one (unless of course you know that the outer
> one is going to be extended in a subsequent patch).
> 

Yes, other code will be added in the outer if with patch2/8.

> > --- a/gas/doc/c-i386.texi
> > +++ b/gas/doc/c-i386.texi
> > @@ -216,6 +216,7 @@ accept various extension mnemonics.  For example,
> > @code{avx10.1/512},  @code{avx10.1/256},  @code{avx10.1/128},
> > +@code{apx},
> >  @code{amx_int8},
> >  @code{amx_bf16},
> >  @code{amx_fp16},
> > @@ -1662,7 +1663,7 @@ supported on the CPU specified.  The choices for
> @var{cpu_type} are:
> >  @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab
> > @samp{.cx16}  @item @samp{.padlock} @tab @samp{.clzero} @tab
> > @samp{.mwaitx} @tab @samp{.rdpru}  @item @samp{.mcommit} @tab
> > @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb} -@item
> > @samp{.tlbsync}
> > +@item @samp{.tlbsync} @tab @samp{.apx}
> >  @end multitable
> 
> DYM apx_f in both cases?
> 
> Also don't you need to also mention {rex2} somewhere in this file?
> 

Done.

> > --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> > +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> > @@ -11,11 +11,11 @@ Disassembly of section .text:
> >  [ 	]*[a-f0-9]+:	37                   	\(bad\)
> >
> >  0+1 <aad0>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
> >
> >  0+3 <aad1>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
> >
> >  0+5 <aam0>:
> > --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> > +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> > @@ -11,11 +11,11 @@ Disassembly of section .text:
> >  [ 	]*[a-f0-9]+:	37                   	\(bad\)
> >
> >  0+1 <aad0>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
> >
> >  0+3 <aad1>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
> >
> >  0+5 <aam0>:
> 
> These expectations match the ones of the same test in the parent directory.
> Hence instead of adjusting each in both places, please have the ones here
> reference the parent directory files.
> 

They are used to test illegal opcodes for x86-64. Since D5 is now meaningful there (it is the REX2 prefix byte), these two test cases were removed.

# All the followings are illegal opcodes for x86-64.
aad0:
        aad
aad1:
        aad $2

> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> 
> As before I'll look at the disassembler changes separately. This patch is simply
> too big.
> 

Ok

> > @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
> >    return elem_size;
> >  }
> >
> > +static bool
> > +if_entry_needs_special_handle (const unsigned long long opcode, unsigned
> int space,
> > +			       const char *cpu_flags)
> > +{
> > +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> > +#UD.  */
> > +  if (strcmp (cpu_flags, "XSAVES") >= 0
> > +      || strcmp (cpu_flags, "XSAVEC") >= 0
> > +      || strcmp (cpu_flags, "Xsave") >= 0
> > +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> 
> Upon further thought for these (and maybe even ...
> 
> > +      || !strcmp (cpu_flags, "3dnow")
> > +      || !strcmp (cpu_flags, "3dnowA"))
> 
> ... for these, but see also below) it might be better to add the attribute right in
> the opcode table.
> 
> As to the 3dnow insns - I think I'd like to revise my earlier suggestion to also
> tag those. Like e.g. FPU insns they're pretty normal GPR-wise, so allowing them
> to be used like that would appear only consistent. Otherwise, if we were
> concerned of AMD extensions in general, SSE4a insns (and maybe further
> ones) would also need excluding. (Additionally recall that there's an overlap
> between 3dnowa and SSE, which would result in another [apparent]
> inconsistency when excluding 3dnow insns here.)
> 

I see. For example, I think I need to split this template into two, one for SSE and one for 3dnowA, and then add NoEgpr to the SSE one, right?
pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }

Thanks,
Lili.

* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
                     ` (2 preceding siblings ...)
  2023-11-06 15:02   ` Jan Beulich
@ 2023-11-06 15:39   ` Jan Beulich
  2023-11-09  8:02     ` Cui, Lili
  3 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-06 15:39 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> @@ -269,9 +275,17 @@ struct dis_private {
>        ins->rex_used |= REX_OPCODE;			\
>    }
>  
> +#define USED_REX2(value)				\
> +  {							\
> +    if ((ins->rex2 & value))				\
> +      ins->rex2_used |= value;				\
> +  }
> +
>  
>  #define EVEX_b_used 1

Nit: Please avoid (re)introducing double blank lines. Instead ...

>  #define EVEX_len_used 2
> +/* M0 in rex2 prefix represents map0 or map1.  */
> +#define REX2_M 0x8

... a blank line ahead of this insertion would be helpful.

> @@ -1872,23 +1888,23 @@ static const struct dis386 dis386[] = {
>    { "outs{b|}",		{ indirDXr, Xb }, 0 },
>    { X86_64_TABLE (X86_64_6F) },
>    /* 70 */
> -  { "joH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jnoH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jbH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jaeH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jeH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jneH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jbeH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jaH",		{ Jb, BND, cond_jump_flag }, 0 },
> +  { "joH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jnoH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jbH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jaeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jneH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jbeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jaH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
>    /* 78 */
> -  { "jsH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jnsH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jpH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jnpH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jlH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jgeH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jleH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jgH",		{ Jb, BND, cond_jump_flag }, 0 },
> +  { "jsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jnsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jnpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jlH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jgeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jleH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jgH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
>    /* 80 */
>    { REG_TABLE (REG_80) },
>    { REG_TABLE (REG_81) },
> @@ -1926,23 +1942,23 @@ static const struct dis386 dis386[] = {
>    { "sahf",		{ XX }, 0 },
>    { "lahf",		{ XX }, 0 },
>    /* a0 */
> -  { "mov%LB",		{ AL, Ob }, 0 },
> -  { "mov%LS",		{ eAX, Ov }, 0 },
> -  { "mov%LB",		{ Ob, AL }, 0 },
> -  { "mov%LS",		{ Ov, eAX }, 0 },
> -  { "movs{b|}",		{ Ybr, Xb }, 0 },
> -  { "movs{R|}",		{ Yvr, Xv }, 0 },
> -  { "cmps{b|}",		{ Xb, Yb }, 0 },
> -  { "cmps{R|}",		{ Xv, Yv }, 0 },
> +  { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
> +  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
> +  { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
> +  { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
> +  { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
> +  { "movs{R|}",		{ Yvr, Xv }, PREFIX_REX2_ILLEGAL },
> +  { "cmps{b|}",		{ Xb, Yb }, PREFIX_REX2_ILLEGAL },
> +  { "cmps{R|}",		{ Xv, Yv }, PREFIX_REX2_ILLEGAL },
>    /* a8 */
> -  { "testB",		{ AL, Ib }, 0 },
> -  { "testS",		{ eAX, Iv }, 0 },
> -  { "stosB",		{ Ybr, AL }, 0 },
> -  { "stosS",		{ Yvr, eAX }, 0 },
> -  { "lodsB",		{ ALr, Xb }, 0 },
> -  { "lodsS",		{ eAXr, Xv }, 0 },
> -  { "scasB",		{ AL, Yb }, 0 },
> -  { "scasS",		{ eAX, Yv }, 0 },
> +  { "testB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
> +  { "testS",		{ eAX, Iv }, PREFIX_REX2_ILLEGAL },
> +  { "stosB",		{ Ybr, AL }, PREFIX_REX2_ILLEGAL },
> +  { "stosS",		{ Yvr, eAX }, PREFIX_REX2_ILLEGAL },
> +  { "lodsB",		{ ALr, Xb }, PREFIX_REX2_ILLEGAL },
> +  { "lodsS",		{ eAXr, Xv }, PREFIX_REX2_ILLEGAL },
> +  { "scasB",		{ AL, Yb }, PREFIX_REX2_ILLEGAL },
> +  { "scasS",		{ eAX, Yv }, PREFIX_REX2_ILLEGAL },
>    /* b0 */
>    { "movB",		{ RMAL, Ib }, 0 },
>    { "movB",		{ RMCL, Ib }, 0 },

As in i386-gen.c, adjustments for row E look to be missing here, too.

> @@ -2091,12 +2107,12 @@ static const struct dis386 dis386_twobyte[] = {
>    { PREFIX_TABLE (PREFIX_0F2E) },
>    { PREFIX_TABLE (PREFIX_0F2F) },
>    /* 30 */
> -  { "wrmsr",		{ XX }, 0 },
> -  { "rdtsc",		{ XX }, 0 },
> -  { "rdmsr",		{ XX }, 0 },
> -  { "rdpmc",		{ XX }, 0 },
> -  { "sysenter",		{ SEP }, 0 },
> -  { "sysexit%LQ",	{ SEP }, 0 },
> +  { "wrmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
> +  { "rdtsc",		{ XX }, PREFIX_REX2_ILLEGAL },
> +  { "rdmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
> +  { "rdpmc",		{ XX }, PREFIX_REX2_ILLEGAL },
> +  { "sysenter",		{ SEP }, PREFIX_REX2_ILLEGAL },
> +  { "sysexit%LQ",	{ SEP }, PREFIX_REX2_ILLEGAL },
>    { Bad_Opcode },
>    { "getsec",		{ XX }, 0 },
>    /* 38 */

Down from here row 8 also wants adjustment afaict.

> @@ -8289,6 +8313,7 @@ ckprefix (instr_info *ins)
>  {
>    int i, length;
>    uint8_t newrex;
> +  unsigned char rex2_payload;

Please can this be restricted to the inner scope where it's used?

> @@ -9292,13 +9338,17 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        goto out;
>      }
>  
> -  if (*ins.codep == 0x0f)
> +  /* M0 in rex2 prefix represents map0 or map1.  */
> +  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))

I'm struggling with the M0 in the comment. DYM just M, or maybe REX2.M?

Also is this, ...

>      {
>        unsigned char threebyte;
>  
> -      ins.codep++;
> -      if (!fetch_code (info, ins.codep + 1))
> -	goto fetch_error_out;
> +      if (!ins.rex2)
> +	{
> +	  ins.codep++;
> +	  if (!fetch_code (info, ins.codep + 1))
> +	    goto fetch_error_out;
> +	}
>        threebyte = *ins.codep;
>        dp = &dis386_twobyte[threebyte];
>        ins.need_modrm = twobyte_has_modrm[threebyte];

... all the way to here, really correct for d5 00 0f?

> @@ -9454,6 +9504,14 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        goto out;
>      }
>  
> +  if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
> +      && ins.last_rex2_prefix >= 0)
> +    {
> +      i386_dis_printf (info, dis_style_text, "(bad)");
> +      ret = ins.end_codep - priv.the_buffer;
> +      goto out;
> +    }
> +
>    switch (dp->prefix_requirement)
>      {
>      case PREFIX_DATA:
> @@ -9468,6 +9526,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        ins.used_prefixes |= PREFIX_DATA;
>        /* Fall through.  */
>      case PREFIX_OPCODE:
> +    case PREFIX_OPCODE | PREFIX_REX2_ILLEGAL:

Maybe more robust to mask off PREFIX_REX2_ILLEGAL in the control expression
of the switch()? Or else why don't you move the if() immediately ahead of
the switch() into here, as a new case block?
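
A minimal sketch of the first alternative (masking the flag off in the switch() control expression) could look as follows. The enumerator values and the function are hypothetical stand-ins for what i386-dis.c uses; only the shape of the dispatch is the point.

```c
#include <assert.h>

/* Hypothetical values; the real ones are defined in opcodes/i386-dis.c.  */
enum
{
  PREFIX_OPCODE_SKETCH = 1,
  PREFIX_DATA_SKETCH = 2,
  PREFIX_REX2_ILLEGAL_SKETCH = 0x80
};

/* Returns -1 where the disassembler would print "(bad)", otherwise the
   base prefix requirement that the switch dispatched on.  */
static int
classify_sketch (unsigned int prefix_requirement, int have_rex2)
{
  if ((prefix_requirement & PREFIX_REX2_ILLEGAL_SKETCH) && have_rex2)
    return -1;

  /* With the flag masked off, no combined case labels are needed.  */
  switch (prefix_requirement & ~(unsigned int) PREFIX_REX2_ILLEGAL_SKETCH)
    {
    case PREFIX_DATA_SKETCH:
      return PREFIX_DATA_SKETCH;
    case PREFIX_OPCODE_SKETCH:
      return PREFIX_OPCODE_SKETCH;
    default:
      return 0;
    }
}
```

Written this way, a table entry that later combines PREFIX_DATA with PREFIX_REX2_ILLEGAL would not need another case label.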

> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        && !ins.need_vex && ins.last_rex_prefix >= 0)
>      ins.all_prefixes[ins.last_rex_prefix] = 0;
>  
> +  /* Check if the REX2 prefix is used.  */
> +  if (ins.last_rex2_prefix >= 0
> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> +	   && (ins.rex2 & 0x7))

DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0

> +	  || dp == &bad_opcode))

What is this last part of the condition about? Other prefix zapping
code doesn't have such.
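
To help compare the condition in the patch with the one asked about above, the stand-alone sketch below evaluates both over all 3-bit values. The function names are made up for illustration, and the sketch does not say which of the two is intended; it only shows that they are not interchangeable.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as written in the patch: at least one of the low three bits
   is set, and exactly the set bits were consumed.  */
static bool
cond_patch (unsigned int rex2, unsigned int used)
{
  return ((rex2 & 7) ^ (used & 7)) == 0 && (rex2 & 7) != 0;
}

/* Condition as asked about in the review: some set bit was not consumed.  */
static bool
cond_review (unsigned int rex2, unsigned int used)
{
  return ((rex2 & 7) & ~(used & 7)) != 0;
}

/* Count the (rex2, used) pairs, over all 3-bit values, where both hold.  */
static int
count_both_true (void)
{
  int n = 0;

  for (unsigned int r = 0; r < 8; r++)
    for (unsigned int u = 0; u < 8; u++)
      if (cond_patch (r, u) && cond_review (r, u))
	n++;

  return n;
}
```

The two predicates never hold together: the first is true when all set bits were used, the second when at least one was not.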

> +    ins.all_prefixes[ins.last_rex2_prefix] = 0;
> +
>    /* Check if the SEG prefix is used.  */
>    if ((ins.prefixes & (PREFIX_CS | PREFIX_SS | PREFIX_DS | PREFIX_ES
>  		       | PREFIX_FS | PREFIX_GS)) != 0
> @@ -9541,7 +9607,10 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>  	if (name == NULL)
>  	  abort ();
>  	prefix_length += strlen (name) + 1;
> -	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
> +	if (ins.all_prefixes[i] == REX2_OPCODE)
> +	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);

Do braces really count as part of the mnemonic?

> +	else
> +	  i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
>        }

Aren't you at risk of wrongly printing a REX prefix here if the high 4 bits
of the REX2 payload were all zero, but some of the low 4 bits turned out
unused?

> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
>      ins->illegal_masking = true;
>  
>    USED_REX (rexmask);
> +  USED_REX2 (rexmask);

Do both really need tracking separately? Whatever consumes REX.B will also
consume REX2.B4, and so on.

Jan

* RE: [PATCH v2 0/8] Support Intel APX EGPR
  2023-11-06 14:44         ` Jan Beulich
@ 2023-11-06 16:03           ` Cui, Lili
  2023-11-06 16:10             ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-06 16:03 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, November 6, 2023 10:45 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH v2 0/8] Support Intel APX EGPR
> 
> On 06.11.2023 15:20, Cui, Lili wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Monday, November 6, 2023 3:30 PM
> >>
> >> On 03.11.2023 17:42, Cui, Lili wrote:
> >>> But if we want to merge bextr's vex and evex formats, we need to
> >>> support BMI&(BMI |( APX_F&x64))
> >>
> >> Maybe more like BMI&(<tbd>|APX_F), with further work (which I was
> >> considering
> >> anyway) towards x64 becoming a prereq to the increasing number of
> >> 64-bit- only features? (The <tbd> may well be BMI as you suggest,
> >> even if that reads a little odd.
> >>
> >
> > Yes, most VEX instructions don't require x64, but apx_f is x64 based. If the
> format "BMI&(BMI |( APX_F&x64))"  is complicated to implement or looks ugly,
> maybe we can handle x64 uniformly for apx_f in tc-i386.c.
> 
> Well, some adjustment is needed there anyway, at the very least for the
> equivalent of e.g. the present handling of AVX|AVX512F or FMA|AVX512F.
> The goal wants to be to balance the amount of special casing code against
> complications in representing data in the opcode table. One question I have is:
> In how far is it necessary to actually represent APX_F in the BMI templates?
> There are two things triggering use of the EVEX encoding,
> iirc: Use of an extended register or NF. Use of an extended register is itself
> already dependent upon APX_F, and whatever the representation of NF is
> going to be, its parsing could be made dependent upon APX_F, too.
> No (strong) need then for the template to enforce APX_F yet another time,
> hopefully.
> 

In [patch 2/8] "Support APX GPR32 with extend evex prefix", I only merged AMX's VEX and EVEX formats (both require x64). Because of the x64 dependency, BMI and the other VEX instructions listed in 3.1.5 are not merged yet.
NDD also triggers EVEX encoding. There are some VEX instructions that support NF; their VEX and EVEX forms cannot be merged.

Lili.


* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-06 15:20     ` Cui, Lili
@ 2023-11-06 16:08       ` Jan Beulich
  2023-11-07  8:16         ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-06 16:08 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 06.11.2023 16:20, Cui, Lili wrote:
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 02.11.2023 12:29, Cui, Lili wrote:
>>> @@ -406,6 +409,11 @@ struct _i386_insn
>>>      /* Compressed disp8*N attribute.  */
>>>      unsigned int memshift;
>>>
>>> +    /* No CSPAZO flags update.*/
>>> +    bool has_nf;
>>> +
>>> +    bool has_zero_upper;
>>> +
>>
>> Can both please be introduced when they're needed, not randomly ahead of
>> time?
>  
> Moved has_nf to patch 2/8 and deleted has_zero_upper.

Patch 2/8? Not in this series then, I suppose?

>>> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
>>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
>>>
>>> +/* Build (2 bytes) rex2 prefix.
>>> +   | D5h |
>>> +   | m | R4 X4 B4 | W R X B |
>>> +*/
>>> +static void
>>> +build_rex2_prefix (void)
>>> +{
>>> +  i.vex.length = 2;
>>> +  i.vex.bytes[0] = 0xd5;
>>> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
>>> +		    | (i.rex2 << 4) | i.rex);
>>> +}
>>
>> I may have asked on v1 already: For emitting REX we don't resort to (ab)using
>> i.vex. Is that really necessary? (If so, a comment next to the field declaration
>> may be warranted.)
>>
> Added comment for it.
> 
>   /* For the W R X B bits, the variables of rex prefix will be reused.  */
>   i.vex.bytes[1] = ((i.tm.opcode_space << 7)
>                     | (i.rex2 << 4) | i.rex);

How does the comment relate to the (ab)use of i.vex?

>> Speaking of v1: Can you please make sure you have correct version tags on
>> submissions of updated patch versions?
>>
> I used git to send all the patches at once( git send-email  --cover-letter --annotate  --to="..." -8), which only has the opportunity to change the version of the cover letter patch. To change the version of each patch, I can send them one by one next time. By the way, do you have a better way? Or how did you modify them? Thanks.

Well, personally I don't use git to send patches. But I know people send
series with proper version tags throughout, all the time.

>>> @@ -5278,6 +5319,9 @@ md_assemble (char *line)
>>>  	case register_type_mismatch:
>>>  	  err_msg = _("register type mismatch");
>>>  	  break;
>>> +	case register_type_of_address_mismatch:
>>> +	  err_msg = _("register type of address mismatch");
>>> +	  break;
>>
>> I have a concern with wording / naming here: If I saw this in an error message,
>> I wouldn't know what is meant. Maybe something along the lines of "cannot
>> use an extended GPR for addressing"? And then the enumerator suitably
>> renamed as well?
>>
>  Changed to  
> 
> +       case unsupported_EGPR_for_addressing:
> +         err_msg = _("unsupported EGPR for addressing");
> +         break;

May I suggest "extended GPR" in the message text (the enumerator is fine
to have EGPR)?

>>> @@ -5594,6 +5641,13 @@ md_assemble (char *line)
>>>  	  return;
>>>  	}
>>>
>>> +      /* Check for explicit REX2 prefix.  */
>>> +      if (i.rex2 || i.rex2_encoding)
>>
>> This open-codes is_any_apx_rex2_encoding(). But read on.
>>
>>> +	{
>>> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
>>
>> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is what case
>> the i.rex2 check above is intended to cover. Error message comment, and
>> condition want to reflect that.
>>
> 
> Removed i.rex2 and keep i.rex2_encoding here. Added one invalid testcase for it.
> 
>         {rex} vmovaps %xmm7,%xmm2
>         {rex} vmovaps %xmm17,%xmm2
>         {rex} rorx $7,%eax,%ebx
> +       {rex2} vmovaps %xmm7,%xmm2

Right, but please see my "optional vs required" comment in the pseudo-
prefix related patch I did send earlier today. I question the correctness
of the {rex} related check here, which would then extend to the {rex2}
one as well.

>>> +      for (unsigned int op = 0; op < i.operands; op++)
>>> +	{
>>> +	  if (i.types[op].bitfield.class != Reg
>>> +	      /* Special case for (%dx) while doing input/output op */
>>> +	      || i.input_output_operand)
>>
>> Why is this needed? The register table entry for %dx ...
>>
>>> +	    continue;
>>> +
>>> +	  if (i.op[op].regs->reg_flags & RegRex2)
>>
>> ... doesn't have this bit set anyway.
>>
> 
> For this special case i.op is empty, we need continue, otherwise r i.op[op].regs->reg_flags  will cause segment fault.

I vaguely recall commenting on this anomaly to H.J. - perhaps time to fix
that properly (in a separate patch), to not leave this trap open any longer?
(Otherwise at least a comment is needed here.)

>>> +	    {
>>> +	      i.error = register_type_mismatch;
>>> +	      return 1;
>>> +	    }
>>> +	}
>>> +
>>> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
>>> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
>>> +	{
>>> +	  i.error = register_type_of_address_mismatch;
>>> +	  return 1;
>>> +	}
>>> +
>>> +      /* Check pseudo prefix {rex2} are valid.  */
>>> +      if (i.rex2_encoding)
>>> +	{
>>> +	  i.error = invalid_pseudo_prefix;
>>> +	  return 1;
>>> +	}
>>
>> Further up in md_assemble() {rex} or {rex2} is simply ignored when wrong to
>> apply. Why would an inapplicable {rex2} be treated as an error here? This
>> would then also ...
>>
>>> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
>>>        /* Do not verify operands when there are none.  */
>>>        if (!t->operands)
>>>  	{
>>> -	  if (VEX_check_encoding (t))
>>> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
>>>  	    {
>>>  	      specific_error = progress (i.error);
>>>  	      continue;
>>
>> ... eliminate the need for this change, which is kind of bogus anyway:
>> There are no operands here, so calling a function of the given name is at least
>> suspicious.
>>
> 
> We have these tests and I'm confused whether to remove them or not.
> 
> +       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
> +       {rex2} wrmsr
> +       {rex2} rdtsc
> +       {rex2} rdmsr
> +       {rex2} sysenter
> +       {rex2} sysexitl
> +       {rex2} rdpmc

They should all stay. But as to my comment: There's no use of any eGPR
here. If you want to abuse that function and if there's no better
descriptive name for it, then once again at least a comment is needed.
(Considering this, the attribute's name NoEgpr is probably also
misleading in the cases here, i.e. when there are no operands. Hence,
if not to be renamed, requires yet another comment in i386-opc.h.)

>>> @@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry *r)
>>>  	i.vec_encoding = vex_encoding_error;
>>>      }
>>>
>>> +  if (r->reg_flags & RegRex2)
>>> +    {
>>> +      if (!cpu_arch_flags.bitfield.cpuapx_f
>>> +	  || flag_code != CODE_64BIT)
>>> +	return false;
>>> +    }
>>
>> Please fold the two if()s into one (unless of course you know that the outer
>> one is going to be extended in a subsequent patch).
>>
> 
> Yes, other code will be added in the outer if with patch2/8.

Hmm, you again say patch 2/8, yet that patch in this series clearly
doesn't do anything like that.

>>> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
>>> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
>>> @@ -11,11 +11,11 @@ Disassembly of section .text:
>>>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
>>>
>>>  0+1 <aad0>:
>>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
>>> +[ 	]*[a-f0-9]+:	d5                   	rex2
>>>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
>>>
>>>  0+3 <aad1>:
>>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
>>> +[ 	]*[a-f0-9]+:	d5                   	rex2
>>>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
>>>
>>>  0+5 <aam0>:
>>> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
>>> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
>>> @@ -11,11 +11,11 @@ Disassembly of section .text:
>>>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
>>>
>>>  0+1 <aad0>:
>>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
>>> +[ 	]*[a-f0-9]+:	d5                   	rex2
>>>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
>>>
>>>  0+3 <aad1>:
>>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
>>> +[ 	]*[a-f0-9]+:	d5                   	rex2
>>>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
>>>
>>>  0+5 <aam0>:
>>
>> These expectations match the ones of the same test in the parent directory.
>> Hence instead of adjusting each in both places, please have the ones here
>> reference the parent directory files.
>>
> 
> They are used to test illegal opcodes for x86-64. Since D5 now makes sense, these two test cases were removed.
> 
> # All the followings are illegal opcodes for x86-64.
> aad0:
>         aad
> aad1:
>         aad $2

Right, but how does this relate to my request to simply fold the
expectations here with that of the same test in the parent directory?
(You'll find various examples under ilp32/ where I've done such
folding already, whenever I had to touch both instances anyway.)

>>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>>>    return elem_size;
>>>  }
>>>
>>> +static bool
>>> +if_entry_needs_special_handle (const unsigned long long opcode, unsigned
>> int space,
>>> +			       const char *cpu_flags)
>>> +{
>>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
>>> +#UD.  */
>>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
>>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
>>> +      || strcmp (cpu_flags, "Xsave") >= 0
>>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
>>
>> Upon further thought for these (and maybe even ...
>>
>>> +      || !strcmp (cpu_flags, "3dnow")
>>> +      || !strcmp (cpu_flags, "3dnowA"))
>>
>> ... for these, but see also below) it might be better to add the attribute right in
>> the opcode table.
>>
>> As to the 3dnow insns - I think I'd like to revise my earlier suggestion to also
>> tag those. Like e.g. FPU insns they're pretty normal GPR-wise, so allowing them
>> to be used like that would appear only consistent. Otherwise, if we were
>> concerned of AMD extensions in general, SSE4a insns (and maybe further
>> ones) would also need excluding. (Additionally recall that there's an overlap
>> between 3dnowa and SSE, which would result in another [apparent]
>> inconsistency when excluding 3dnow insns here.)
>>
> 
> I see, for example  I think I need to split this table into two parts, one is for SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
> pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }

I'm afraid I don't understand the question. All I've asked for is that
the special treatment of 3dnow insns be removed again. Unless you want
to special-case further insns; it's not really clear to me what's best,
as both approaches have noticeable downsides (either we allow to encode
something which may never become valid, or we disallow something which
may become valid).

In any event adding NoEgpr to any SSE insn sounds wrong to me - aiui
they can all be encoded with REX2.

Jan


* Re: [PATCH v2 0/8] Support Intel APX EGPR
  2023-11-06 16:03           ` Cui, Lili
@ 2023-11-06 16:10             ` Jan Beulich
  2023-11-07  1:53               ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-06 16:10 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 06.11.2023 17:03, Cui, Lili wrote:
> 
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, November 6, 2023 10:45 PM
>> To: Cui, Lili <lili.cui@intel.com>
>> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
>> binutils@sourceware.org
>> Subject: Re: [PATCH v2 0/8] Support Intel APX EGPR
>>
>> On 06.11.2023 15:20, Cui, Lili wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Monday, November 6, 2023 3:30 PM
>>>>
>>>> On 03.11.2023 17:42, Cui, Lili wrote:
>>>>> But if we want to merge bextr's vex and evex formats, we need to
>>>>> support BMI&(BMI |( APX_F&x64))
>>>>
>>>> Maybe more like BMI&(<tbd>|APX_F), with further work (which I was
>>>> considering
>>>> anyway) towards x64 becoming a prereq to the increasing number of
>>>> 64-bit- only features? (The <tbd> may well be BMI as you suggest,
>>>> even if that reads a little odd.)
>>>>
>>>
>>> Yes, most VEX instructions don't require x64, but apx_f is x64 based. If the
>> format "BMI&(BMI |( APX_F&x64))"  is complicated to implement or looks ugly,
>> maybe we can handle x64 uniformly for apx_f in tc-i386.c.
>>
>> Well, some adjustment is needed there anyway, at the very least for the
>> equivalent of e.g. the present handling of AVX|AVX512F or FMA|AVX512F.
>> The goal wants to be to balance the amount of special casing code against
>> complications in representing data in the opcode table. One question I have is:
>> In how far is it necessary to actually represent APX_F in the BMI templates?
>> There are two things triggering use of the EVEX encoding,
>> iirc: Use of an extended register or NF. Use of an extended register is itself
>> already dependent upon APX_F, and whatever the representation of NF is
>> going to be, its parsing could be made dependent upon APX_F, too.
>> No (strong) need then for the template to enforce APX_F yet another time,
>> hopefully.
>>
> 
> In [patch 2/8] Support APX GPR32 with extend evex prefix,

Yet again patch 2/8, but this time the title reference at least clarifies
you mean 3/8.

> I only merged AMX's vex and evex formats (both vex and evex require x64), due to x64 reasons, BMI and other VEX instructions listed in 3.1.5 are not merged yet.

I see. Looking at patch 3 is next.

> NDD also triggers EVEX encoding.

But not for insns which were previously VEX-encoded?

> There are some VEX instructions that support NF, their vex and evex cannot be merged.

Why not?

Jan


* RE: [PATCH v2 0/8] Support Intel APX EGPR
  2023-11-06 16:10             ` Jan Beulich
@ 2023-11-07  1:53               ` Cui, Lili
  2023-11-07 10:11                 ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-07  1:53 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> >> Subject: Re: [PATCH v2 0/8] Support Intel APX EGPR
> >>
> >> On 06.11.2023 15:20, Cui, Lili wrote:
> >>>> -----Original Message-----
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: Monday, November 6, 2023 3:30 PM
> >>>>
> >>>> On 03.11.2023 17:42, Cui, Lili wrote:
> >>>>> But if we want to merge bextr's vex and evex formats, we need to
> >>>>> support BMI&(BMI |( APX_F&x64))
> >>>>
> >>>> Maybe more like BMI&(<tbd>|APX_F), with further work (which I was
> >>>> considering
> >>>> anyway) towards x64 becoming a prereq to the increasing number of
> >>>> 64-bit- only features? (The <tbd> may well be BMI as you suggest,
> >>>> even if that reads a little odd.)
> >>>>
> >>>
> >>> Yes, most VEX instructions don't require x64, but apx_f is x64
> >>> based. If the
> >> format "BMI&(BMI |( APX_F&x64))"  is complicated to implement or
> >> looks ugly, maybe we can handle x64 uniformly for apx_f in tc-i386.c.
> >>
> >> Well, some adjustment is needed there anyway, at the very least for
> >> the equivalent of e.g. the present handling of AVX|AVX512F or
> FMA|AVX512F.
> >> The goal wants to be to balance the amount of special casing code
> >> against complications in representing data in the opcode table. One
> question I have is:
> >> In how far is it necessary to actually represent APX_F in the BMI templates?
> >> There are two things triggering use of the EVEX encoding,
> >> iirc: Use of an extended register or NF. Use of an extended register
> >> is itself already dependent upon APX_F, and whatever the
> >> representation of NF is going to be, its parsing could be made dependent
> upon APX_F, too.
> >> No (strong) need then for the template to enforce APX_F yet another
> >> time, hopefully.
> >>
> >
> > In [patch 2/8] Support APX GPR32 with extend evex prefix,
> 
> Yet again patch 2/8, but this time the title reference at least clarifies you mean
> 3/8.
> 

I was all mixed up last night.

> > I only merged AMX's vex and evex formats (both vex and evex require x64),
> due to x64 reasons, BMI and other VEX instructions listed in 3.1.5 are not
> merged yet.
> 
> I see. Looking at patch 3 is next.
> 
> > NDD also triggers EVEX encoding.
> 
> But not for insns which were previously VEX-encoded?
> 
EVEX instructions promoted from VEX have no ND bit. You are right, there are only two factors: an extended register and NF.

> > There are some VEX instructions that support NF, their vex and evex cannot
> be merged.
> 
> Why not?
> 
Take bextr, for example: the new EVEX template has the NF tag, so I think we cannot merge them.

bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
bextr, 0xf7, BMI|APX_F, Modrm|CheckOperandSize|EVex128|NF|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }

Lili.


* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-06 15:02   ` Jan Beulich
@ 2023-11-07  8:06     ` Cui, Lili
  2023-11-07 10:20       ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-07  8:06 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, November 6, 2023 11:03 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char
> *mod, unsigned int space,
> >  	fprintf (stderr,
> >  		 "%s: %d: W modifier without Word/Dword/Qword
> operand(s)\n",
> >  		 filename, lineno);
> > +
> > +      /* The part about judging EVEX encoding should be synchronized with
> > +	 is_evex_encoding.  */
> > +      if (modifiers[Vex].value
> > +	  || ((space > SPACE_0F || has_special_handle)
> > +	      && !modifiers[EVex].value
> > +	      && !modifiers[Disp8MemShift].value
> > +	      && !modifiers[Broadcast].value
> > +	      && !modifiers[Masking].value
> > +	      && !modifiers[SAE].value))
> > +	modifiers[NoEgpr].value = 1;
> 
> While this is just i386-gen (and hence being somewhat inefficient isn't the end
> of the world) I still wonder whether we need all the parts of this condition:
> Do we really need all the constituents of this EVEX related checks? Wouldn't it
> also help is_evex_encoding() if we switched to uniformly having EVex
> attributes on all EVEX templates? A presently missing EVex attribute, after all,
> merely is another way of saying EVexDYN, if I'm not mistaken. (Such an
> adjustment, if deemed to help, would of course want to come as a separate,
> prereq patch.)
> 

Yes, a missing EVex attribute is another way of saying EVexDYN. It should appear in every EVEX template, but when we merged EVex128, EVex256 and EVex512 into one template we omitted EVexDYN, so some EVEX templates don’t have this tag. If we want to re-add it, we need new values.

Such as:
vcvttps2dq, 0xF35B, AVX512F, Modrm|Masking|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }

> Furthermore, is this correct at all for mixed VEX/EVEX templates?
> 
After merging the templates we only have one entry, so I prefer to set [NoEgpr].value to 1 and to skip the NoEgpr check in the check_EgprOperands function whenever EVEX encoding is needed.

check_EgprOperands (const insn_template *t)
 {
-  if (t->opcode_modifier.noegpr)
+  if (t->opcode_modifier.noegpr && !need_evex_encoding())
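
As a rough, standalone restatement of the two checks discussed above (a sketch only; the struct, the enum values and both function names are hypothetical and not taken from i386-gen.c or tc-i386.c), the generator-side condition and the proposed assembler-side guard could be modelled like this:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the opcode-space constants.  */
enum { SPACE_BASE = 0, SPACE_0F = 1, SPACE_0F38 = 2 };

/* Only the modifiers named in the quoted hunk are modelled.  */
struct tmpl_mods
{
  bool vex, evex, disp8_memshift, broadcast, masking, sae;
};

/* Same logic as the quoted condition: VEX templates always get NoEgpr;
   templates beyond the 0F space (or specially handled ones) get it
   unless some attribute marks them as EVEX.  */
static bool
derive_noegpr (const struct tmpl_mods *m, unsigned int space,
	       bool has_special_handle)
{
  bool looks_evex = (m->evex || m->disp8_memshift || m->broadcast
		     || m->masking || m->sae);

  return m->vex
	 || ((space > SPACE_0F || has_special_handle) && !looks_evex);
}

/* Proposed guard: NoEgpr is only enforced when EVEX encoding is not
   needed for the insn being assembled.  */
static bool
noegpr_applies (bool noegpr, bool need_evex)
{
  return noegpr && !need_evex;
}
```

If every EVEX template carried an explicit EVex attribute, looks_evex would presumably reduce to a single test of m->evex.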


> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -891,7 +891,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
> > <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
> >                        load:Load:0, store:Store:0, +
> >                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> > -                      rex:REX:x64, nooptimize:NoOptimize:0>
> > +                      rex:REX:x64, rex2:REX2:x64,
> > + nooptimize:NoOptimize:0>
> 
> Seeing this I realized that there's something missing here (an APX_F
> dependency), which then again would not have had an effect without the
> patch [1] sent earlier today.
> 
> Jan
> 
> [1] https://sourceware.org/pipermail/binutils/2023-November/130345.html

Changed to 

+#define APX_F_64 APX_F|x64
+
<pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
                       load:Load:0, store:Store:0, +
                       vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
-                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
+                      rex:REX:x64, rex2:REX2:APX_F_64, nooptimize:NoOptimize:0>

When "x86: split insn templates' CPU field" is in trunk, I will change it to #define APX_F_64 APX_F&x64.

Thanks,
Lili.


* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-06 16:08       ` Jan Beulich
@ 2023-11-07  8:16         ` Cui, Lili
  2023-11-07 10:43           ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-07  8:16 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> >> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> >>
> >> On 02.11.2023 12:29, Cui, Lili wrote:
> >>> @@ -406,6 +409,11 @@ struct _i386_insn
> >>>      /* Compressed disp8*N attribute.  */
> >>>      unsigned int memshift;
> >>>
> >>> +    /* No CSPAZO flags update.*/
> >>> +    bool has_nf;
> >>> +
> >>> +    bool has_zero_upper;
> >>> +
> >>
> >> Can both please be introduced when they're needed, not randomly ahead
> >> of time?
> >
> > Moved has_nf to patch 2/8 and deleted has_zero_upper.
> 
> Patch 2/8? Not in this series then, I suppose?
> 
It should be patch 3/8. 😊

> >>> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
> >>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >>>
> >>> +/* Build (2 bytes) rex2 prefix.
> >>> +   | D5h |
> >>> +   | m | R4 X4 B4 | W R X B |
> >>> +*/
> >>> +static void
> >>> +build_rex2_prefix (void)
> >>> +{
> >>> +  i.vex.length = 2;
> >>> +  i.vex.bytes[0] = 0xd5;
> >>> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> >>> +		    | (i.rex2 << 4) | i.rex);
> >>> +}
> >>
> >> I may have asked on v1 already: For emitting REX we don't resort to
> >> (ab)using i.vex. Is that really necessary? (If so, a comment next to
> >> the field declaration may be warranted.)
> >>
> > Added comment for it.
> >
> >   /* For the W R X B bits, the variables of rex prefix will be reused.  */
> >   i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> >                     | (i.rex2 << 4) | i.rex);
> 
> How does the comment relate to the (ab)use of i.vex?
>
Ah ha, it's i.vex, not i.rex. At first I thought rex2 should have its own variable, but in the output_insn function they have the same special handling of i.tm.opcode_space as VEX. Reusing i.vex can reduce some ugly code. 
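
For reference, the byte layout from the comment in the quoted hunk (D5h, then M | R4 X4 B4 | W R X B) can be written as a small standalone helper. This is only a sketch: rex2_bytes and the BIT_* names are hypothetical, and the bit values are assumed to match the way i.rex and i.rex2 are combined in the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values; assumed to mirror how the W/R/X/B bits are
   kept in i.rex (and R4/X4/B4 in i.rex2).  */
enum { BIT_B = 1, BIT_X = 2, BIT_R = 4, BIT_W = 8 };

/* Hypothetical helper, not part of the patch: fill OUT with the two
   REX2 bytes.  MAP_0F is the M bit (nonzero when the opcode lives in
   the 0F space), HIGH holds R4/X4/B4 and LOW holds W/R/X/B.  */
static void
rex2_bytes (unsigned int map_0f, unsigned int high, unsigned int low,
	    uint8_t out[2])
{
  assert (high <= 7 && low <= 15);
  out[0] = 0xd5;
  out[1] = (uint8_t) (((map_0f ? 1u : 0u) << 7) | (high << 4) | low);
}
```

With these assumptions, rex2_bytes (0, BIT_B, BIT_W, buf) leaves d5 18 in buf.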

> >> Speaking of v1: Can you please make sure you have correct version
> >> tags on submissions of updated patch versions?
> >>
> > I used git to send all the patches at once( git send-email  --cover-letter --
> annotate  --to="..." -8), which only has the opportunity to change the version of
> the cover letter patch. To change the version of each patch, I can send them
> one by one next time. By the way, do you have a better way? Or how did you
> modify them? Thanks.
> 
> Well, personally I don't use git to send patches. But I know people send series
> with proper version tags throughout, all the time.
> 
OK, I will pay attention to it.

> >>> @@ -5278,6 +5319,9 @@ md_assemble (char *line)
> >>>  	case register_type_mismatch:
> >>>  	  err_msg = _("register type mismatch");
> >>>  	  break;
> >>> +	case register_type_of_address_mismatch:
> >>> +	  err_msg = _("register type of address mismatch");
> >>> +	  break;
> >>
> >> I have a concern with wording / naming here: If I saw this in an
> >> error message, I wouldn't know what is meant. Maybe something along
> >> the lines of "cannot use an extended GPR for addressing"? And then
> >> the enumerator suitably renamed as well?
> >>
> >  Changed to
> >
> > +       case unsupported_EGPR_for_addressing:
> > +         err_msg = _("unsupported EGPR for addressing");
> > +         break;
> 
> May I suggest "extended GPR" in the message text (the enumerator is fine to
> have EGPR)?
> 

Sure.

> >>> @@ -5594,6 +5641,13 @@ md_assemble (char *line)
> >>>  	  return;
> >>>  	}
> >>>
> >>> +      /* Check for explicit REX2 prefix.  */
> >>> +      if (i.rex2 || i.rex2_encoding)
> >>
> >> This open-codes is_any_apx_rex2_encoding(). But read on.
> >>
> >>> +	{
> >>> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> >>
> >> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is
> >> what case the i.rex2 check above is intended to cover. Error message
> >> comment, and condition want to reflect that.
> >>
> >
> > Removed i.rex2 and kept i.rex2_encoding here. Added one invalid testcase
> for it.
> >
> >         {rex} vmovaps %xmm7,%xmm2
> >         {rex} vmovaps %xmm17,%xmm2
> >         {rex} rorx $7,%eax,%ebx
> > +       {rex2} vmovaps %xmm7,%xmm2
> 
> Right, but please see my "optional vs required" comment in the pseudo- prefix
> related patch I did send earlier today. I question the correctness of the {rex}
> related check here, which would then extend to the {rex2} one as well.
> 

A REX byte that is immediately followed by a legacy prefix byte (LOCK, REPE, REPNE, OSIZE override, ASIZE override, or segment overrides) or by another REX byte is ignored and behaves as if it did not exist (except for contributing to the instruction length).
But in this case I think the check is correct.

> >>> +      for (unsigned int op = 0; op < i.operands; op++)
> >>> +	{
> >>> +	  if (i.types[op].bitfield.class != Reg
> >>> +	      /* Special case for (%dx) while doing input/output op */
> >>> +	      || i.input_output_operand)
> >>
> >> Why is this needed? The register table entry for %dx ...
> >>
> >>> +	    continue;
> >>> +
> >>> +	  if (i.op[op].regs->reg_flags & RegRex2)
> >>
> >> ... doesn't have this bit set anyway.
> >>
> >
> > For this special case i.op is empty, so we need to continue; otherwise
> i.op[op].regs->reg_flags will cause a segmentation fault.
> 
> I vaguely recall commenting on this anomaly to H.J. - perhaps time to fix that
> properly (in a separate patch), to not leave this trap open any longer?
> (Otherwise at least a comment is needed here.)
>

H.J is on vacation and I will try to fix this.
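
Until that is fixed, the hazard can be illustrated with a simplified model of the operand loop. This is a sketch under the assumption that the unset register slot for the (%dx) case is a null pointer; reg_entry_sketch, REG_REX2 and any_operand_is_egpr are hypothetical names, not the real gas types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value standing in for RegRex2.  */
#define REG_REX2 0x1u

/* Simplified stand-in for gas's reg_entry.  */
struct reg_entry_sketch
{
  unsigned int reg_flags;
};

/* Return true if any register operand is an extended GPR.  Slots with
   no register entry (assumed to be NULL here, as in the (%dx) case)
   are skipped rather than dereferenced.  */
static bool
any_operand_is_egpr (const struct reg_entry_sketch *const *regs, size_t n)
{
  for (size_t op = 0; op < n; op++)
    {
      if (regs[op] == NULL)
	continue;
      if (regs[op]->reg_flags & REG_REX2)
	return true;
    }
  return false;
}
```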

> >>> +	    {
> >>> +	      i.error = register_type_mismatch;
> >>> +	      return 1;
> >>> +	    }
> >>> +	}
> >>> +
> >>> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> >>> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> >>> +	{
> >>> +	  i.error = register_type_of_address_mismatch;
> >>> +	  return 1;
> >>> +	}
> >>> +
> >>> +      /* Check pseudo prefix {rex2} are valid.  */
> >>> +      if (i.rex2_encoding)
> >>> +	{
> >>> +	  i.error = invalid_pseudo_prefix;
> >>> +	  return 1;
> >>> +	}
> >>
> >> Further up in md_assemble() {rex} or {rex2} is simply ignored when
> >> wrong to apply. Why would an inapplicable {rex2} be treated as an
> >> error here? This would then also ...
> >>
> >>> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
> >>>        /* Do not verify operands when there are none.  */
> >>>        if (!t->operands)
> >>>  	{
> >>> -	  if (VEX_check_encoding (t))
> >>> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
> >>>  	    {
> >>>  	      specific_error = progress (i.error);
> >>>  	      continue;
> >>
> >> ... eliminate the need for this change, which is kind of bogus anyway:
> >> There are no operands here, so calling a function of the given name
> >> is at least suspicious.
> >>
> >
> > We have these tests and I'm confused whether to remove them or not.
> >
> > +       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
> > +       {rex2} wrmsr
> > +       {rex2} rdtsc
> > +       {rex2} rdmsr
> > +       {rex2} sysenter
> > +       {rex2} sysexitl
> > +       {rex2} rdpmc
> 
> They should all stay. But as to my comment: There's no use of any eGPR here. If
> you want to abuse that function and if there's no better descriptive name for it,
> then once again at least a comment is needed.
> (Considering this, the attribute's name NoEgpr is probably also misleading in
> the cases here, i.e. when there are no operands. Hence, if not to be renamed,
> requires yet another comment in i386-opc.h.)
> 
This question confused me as well. Some instructions only support the Acc register, yet we need to add NoEgpr for them, which seems a bit strange. If we used NoRex2 instead, it would not fit the VEX and EVEX instructions either. So I will add comments for now.

+         /* When there are no operands, we still need to use the
+            check_EgprOperands function to check whether {rex2} is valid.  */
          if (VEX_check_encoding (t) || check_EgprOperands (t))

-  /* egprs (r16-r31) on instruction illegal.  */
+  /* egprs (r16-r31) on instruction illegal. We also use it to judge
+     whether the instruction supports pseudo-prefix {rex2}.  */
   NoEgpr,

> >>> @@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry
> *r)
> >>>  	i.vec_encoding = vex_encoding_error;
> >>>      }
> >>>
> >>> +  if (r->reg_flags & RegRex2)
> >>> +    {
> >>> +      if (!cpu_arch_flags.bitfield.cpuapx_f
> >>> +	  || flag_code != CODE_64BIT)
> >>> +	return false;
> >>> +    }
> >>
> >> Please fold the two if()s into one (unless of course you know that
> >> the outer one is going to be extended in a subsequent patch).
> >>
> >
> > Yes, other code will be added in the outer if with patch2/8.
> 
> Hmm, you again say patch 2/8, yet that patch in this series clearly doesn't do
> anything like that.
> 

It should be patch 3/8.

> >>> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> >>> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> >>> @@ -11,11 +11,11 @@ Disassembly of section .text:
> >>>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
> >>>
> >>>  0+1 <aad0>:
> >>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> >>> +[ 	]*[a-f0-9]+:	d5                   	rex2
> >>>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
> >>>
> >>>  0+3 <aad1>:
> >>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> >>> +[ 	]*[a-f0-9]+:	d5                   	rex2
> >>>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
> >>>
> >>>  0+5 <aam0>:
> >>> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> >>> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> >>> @@ -11,11 +11,11 @@ Disassembly of section .text:
> >>>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
> >>>
> >>>  0+1 <aad0>:
> >>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> >>> +[ 	]*[a-f0-9]+:	d5                   	rex2
> >>>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
> >>>
> >>>  0+3 <aad1>:
> >>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> >>> +[ 	]*[a-f0-9]+:	d5                   	rex2
> >>>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
> >>>
> >>>  0+5 <aam0>:
> >>
> >> These expectations match the ones of the same test in the parent directory.
> >> Hence instead of adjusting each in both places, please have the ones
> >> here reference the parent directory files.
> >>
> >
> > They are used to test illegal opcodes for x86-64. Since D5 now makes sense,
> these two test cases were removed.
> >
> > # All the followings are illegal opcodes for x86-64.
> > aad0:
> >         aad
> > aad1:
> >         aad $2
> 
> Right, but how does this relate to my request to simply fold the expectations
> here with that of the same test in the parent directory?
> (You'll find various examples under ilp32/ where I've done such folding already,
> whenever I had to touch both instances anyway.)
> 

Oh, got it.

> >>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
> >>>    return elem_size;
> >>>  }
> >>>
> >>> +static bool
> >>> +if_entry_needs_special_handle (const unsigned long long opcode,
> >>> +unsigned
> >> int space,
> >>> +			       const char *cpu_flags)
> >>> +{
> >>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> >>> +#UD.  */
> >>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> >>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> >>> +      || strcmp (cpu_flags, "Xsave") >= 0
> >>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> >>
> >> Upon further thought for these (and maybe even ...
> >>
> >>> +      || !strcmp (cpu_flags, "3dnow")
> >>> +      || !strcmp (cpu_flags, "3dnowA"))
> >>
> >> ... for these, but see also below) it might be better to add the
> >> attribute right in the opcode table.
> >>
> >> As to the 3dnow insns - I think I'd like to revise my earlier
> >> suggestion to also tag those. Like e.g. FPU insns they're pretty
> >> normal GPR-wise, so allowing them to be used like that would appear
> >> only consistent. Otherwise, if we were concerned of AMD extensions in
> >> general, SSE4a insns (and maybe further
> >> ones) would also need excluding. (Additionally recall that there's an
> >> overlap between 3dnowa and SSE, which would result in another
> >> [apparent] inconsistency when excluding 3dnow insns here.)
> >>
> >
> > I see, for example  I think I need to split this table into two parts, one is for
> SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
> > pextrw, 0xfc5, SSE|3dnowA,
> > Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8,
> RegMMX,
> > Reg32|Reg64 }
> 
> I'm afraid I don't understand the question. All I've asked for is that the special
> treatment of 3dnow insns be removed again. Unless you want to special-case
> further insns; it's not really clear to me what's best, as both approaches have
> noticeable downsides (either we allow to encode something which may never
> become valid, or we disallow something which may become valid).
> 
> In any event adding NoEgpr to any SSE insn sounds wrong to me - aiui they can
> all be encoded with REX2.
> 
I need to correct that: some templates in the instruction table cover both SSE and AMD instructions. I would need to split them first and then add NoEgpr to the AMD instructions.
Another point is that we have not split the instructions common to AMD and Intel, so just adding NoEgpr to 3dnowA and 3dnow does not seem to make much sense.

Do you also want me to remove this part and add NoEgpr in the insn table?
> >>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> >>> +#UD.  */
> >>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> >>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> >>> +      || strcmp (cpu_flags, "Xsave") >= 0
> >>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0

Thanks,
Lili.


* Re: [PATCH v2 0/8] Support Intel APX EGPR
  2023-11-07  1:53               ` Cui, Lili
@ 2023-11-07 10:11                 ` Jan Beulich
  0 siblings, 0 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-07 10:11 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 07.11.2023 02:53, Cui, Lili wrote:
>>>> Subject: Re: [PATCH v2 0/8] Support Intel APX EGPR
>>>>
>>>> On 06.11.2023 15:20, Cui, Lili wrote:
>>> There are some VEX instructions that support NF, their vex and evex cannot
>> be merged.
>>
>> Why not?
>>
> Like bextr, new EVEX table has NF tag, I think we cannot merge them.
> 
> bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> bextr, 0xf7, BMI|APX_F, Modrm|CheckOperandSize|EVex128|NF|Space0F38|VexVVVVSrc|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }

NF is no different than, say, SAE (which didn't get in the way of folding VEX
and EVEX templates). It's a(nother) reliable indication that EVEX encoding is
going to be needed.
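
To illustrate the point for a folded VEX/EVEX template, the selection could be as simple as the following sketch (names are hypothetical; the real decision in tc-i386.c involves more state than these two flags):

```c
#include <stdbool.h>

enum encoding_sketch { ENC_VEX, ENC_EVEX };

/* Hypothetical chooser for a merged template of a formerly VEX-only
   insn: either an extended GPR or NF forces the EVEX form; otherwise
   the VEX form is kept.  */
static enum encoding_sketch
choose_encoding (bool uses_egpr, bool has_nf)
{
  return (uses_egpr || has_nf) ? ENC_EVEX : ENC_VEX;
}
```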

Jan



* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-07  8:06     ` Cui, Lili
@ 2023-11-07 10:20       ` Jan Beulich
  2023-11-07 14:32         ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-07 10:20 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 07.11.2023 09:06, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, November 6, 2023 11:03 PM
>>
>> On 02.11.2023 12:29, Cui, Lili wrote:
>>> @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char
>> *mod, unsigned int space,
>>>  	fprintf (stderr,
>>>  		 "%s: %d: W modifier without Word/Dword/Qword
>> operand(s)\n",
>>>  		 filename, lineno);
>>> +
>>> +      /* The part about judging EVEX encoding should be synchronized with
>>> +	 is_evex_encoding.  */
>>> +      if (modifiers[Vex].value
>>> +	  || ((space > SPACE_0F || has_special_handle)
>>> +	      && !modifiers[EVex].value
>>> +	      && !modifiers[Disp8MemShift].value
>>> +	      && !modifiers[Broadcast].value
>>> +	      && !modifiers[Masking].value
>>> +	      && !modifiers[SAE].value))
>>> +	modifiers[NoEgpr].value = 1;
>>
>> While this is just i386-gen (and hence being somewhat inefficient isn't the end
>> of the world) I still wonder whether we need all the parts of this condition:
>> Do we really need all the constituents of this EVEX related checks? Wouldn't it
>> also help is_evex_encoding() if we switched to uniformly having EVex
>> attributes on all EVEX templates? A presently missing EVex attribute, after all,
>> merely is another way of saying EVexDYN, if I'm not mistaken. (Such an
>> adjustment, if deemed to help, would of course want to come as a separate,
>> prereq patch.)
>>
> 
> Yes, EVex is another way of saying EVexDYN, it should be appear in every EVEX template, when we merge EVex128, EVex256 and EVex512 into one template we omitted the expression of EVexDYN. So some EVEX templates don’t have this tag. If we want to re-add it, we need new values.

I don't understand. When there's (e.g.) EVex128, no EVexDYN should appear at
the same time. Otoh ...

> Such as:
> vcvttps2dq, 0xF35B, AVX512F, Modrm|Masking|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }

... aiui this one could have EVexDYN added without change in behavior, but
would then allow being recognized as needing EVEX encoding by just checking
the .evex field, not any of the other fields is_evex_encoding() presently
needs to check.
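
As a rough sketch of the difference (the struct is a reduced, hypothetical view
of the opcode modifiers, and SKETCH_EVEXDYN is a placeholder rather than the
real constant from i386-opc.h), the five-field test that the quoted i386-gen
condition mirrors would collapse to a single-field test once every EVEX
template carries an EVex attribute:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of a template's opcode modifiers.  */
struct mods
{
  unsigned int evex;  /* 0 means no EVex attribute on the template */
  bool disp8memshift, broadcast, masking, sae;
};

/* Placeholder value; the real EVexDYN constant lives in i386-opc.h.  */
enum { SKETCH_EVEXDYN = 5 };

/* The check in its present shape: any one of five fields.  */
static bool
needs_evex_today (const struct mods *m)
{
  return m->evex || m->disp8memshift || m->broadcast || m->masking || m->sae;
}

/* The check once every EVEX template carries an EVex attribute.  */
static bool
needs_evex_uniform (const struct mods *m)
{
  return m->evex != 0;
}
```

For a template like the vcvttps2dq one quoted above, only the first function
recognizes it today; after adding EVexDYN both agree.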

>> Furthermore, is this correct at all for mixed VEX/EVEX templates?
>>
> After merging the templates we only have one entry and I prefer to set [NoEgpr].value to 1. Don't check NoEgpr for all EVEX instruction in check_EgprOperands function. 
> 
> check_EgprOperands (const insn_template *t)
>  {
> -  if (t->opcode_modifier.noegpr)
> +  if (t->opcode_modifier.noegpr && !need_evex_encoding())

So why would you add an attribute just to then ignore it by adding extra
code?

>>> --- a/opcodes/i386-opc.tbl
>>> +++ b/opcodes/i386-opc.tbl
>>> @@ -891,7 +891,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
>>> <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
>>>                        load:Load:0, store:Store:0, +
>>>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
>>> -                      rex:REX:x64, nooptimize:NoOptimize:0>
>>> +                      rex:REX:x64, rex2:REX2:x64,
>>> + nooptimize:NoOptimize:0>
>>
>> Seeing this I realized that there's something missing here (an APX_F
>> dependency), which then again would not have had an effect without the
>> patch [1] sent earlier today.
>>
>> Jan
>>
>> [1] https://sourceware.org/pipermail/binutils/2023-November/130345.html
> 
> Changed to 
> 
> +#define APX_F_64 APX_F|x64
> +
> <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
>                        load:Load:0, store:Store:0, +
>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> -                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> +                      rex:REX:x64, rex2:REX2:APX_F_64, nooptimize:NoOptimize:0>
> 
> When we have" x86: split insn templates' CPU field" in trunk, I will change it to #define APX_F_64 APX_F&x64.

I've meanwhile put together the Cpu64 patch I was thinking of. No "&x64"
should then be needed anymore for any of the APX templates. Before sending
that one out, I will want to first see whether I can re-order it with the
patch sent earlier, as this would allow that other patch to shrink in
size (fewer "|x64" to convert to "&x64").

Jan


* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-07  8:16         ` Cui, Lili
@ 2023-11-07 10:43           ` Jan Beulich
  2023-11-07 15:31             ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-07 10:43 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 07.11.2023 09:16, Cui, Lili wrote:
>>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>>>
>>>> On 02.11.2023 12:29, Cui, Lili wrote:
>>>>> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
>>>>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
>>>>>
>>>>> +/* Build (2 bytes) rex2 prefix.
>>>>> +   | D5h |
>>>>> +   | m | R4 X4 B4 | W R X B |
>>>>> +*/
>>>>> +static void
>>>>> +build_rex2_prefix (void)
>>>>> +{
>>>>> +  i.vex.length = 2;
>>>>> +  i.vex.bytes[0] = 0xd5;
>>>>> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
>>>>> +		    | (i.rex2 << 4) | i.rex);
>>>>> +}
>>>>
>>>> I may have asked on v1 already: For emitting REX we don't resort to
>>>> (ab)using i.vex. Is that really necessary? (If so, a comment next to
>>>> the field declaration may be warranted.)
>>>>
>>> Added comment for it.
>>>
>>>   /* For the W R X B bits, the variables of rex prefix will be reused.  */
>>>   i.vex.bytes[1] = ((i.tm.opcode_space << 7)
>>>                     | (i.rex2 << 4) | i.rex);
>>
>> How does the comment relate to the (ab)use of i.vex?
>>
> Ah ha, it's i.vex, not i.rex. At first I thought rex2 should have its own variable, but in the output_insn function they have the same special handling of i.tm.opcode_space as VEX. Reusing i.vex can reduce some ugly code. 

Things like this are very helpful to explain in the patch description.
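
For reference, the byte layout from the quoted comment can be sketched on its
own. This is a standalone illustration, not the gas implementation: the struct
and function names are hypothetical, the opcode space is assumed to be 0 or 1,
and the rex2 field is assumed to hold R4/X4/B4 in its low three bits, as the
shifts in the quoted build_rex2_prefix() suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the assembler state used by the quoted
   build_rex2_prefix(); they are not the real gas structures.  */
struct rex2_fields
{
  unsigned int map;   /* opcode space, assumed to be 0 or 1 */
  unsigned int rex2;  /* R4 X4 B4 in bits 2..0 */
  unsigned int rex;   /* W R X B in bits 3..0 */
};

/* Fill OUT[0..1] with the two REX2 prefix bytes:
   | D5h | followed by | m | R4 X4 B4 | W R X B |.  */
static void
encode_rex2 (const struct rex2_fields *f, uint8_t out[2])
{
  assert (f->map <= 1 && f->rex2 <= 7 && f->rex <= 15);
  out[0] = 0xd5;
  out[1] = (uint8_t) ((f->map << 7) | (f->rex2 << 4) | f->rex);
}
```

For example, map 1 with R4 and B4 set and REX.W set gives the payload byte
0x80 | 0x50 | 0x08 = 0xd8.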

>>>>> @@ -5594,6 +5641,13 @@ md_assemble (char *line)
>>>>>  	  return;
>>>>>  	}
>>>>>
>>>>> +      /* Check for explicit REX2 prefix.  */
>>>>> +      if (i.rex2 || i.rex2_encoding)
>>>>
>>>> This open-codes is_any_apx_rex2_encoding(). But read on.
>>>>
>>>>> +	{
>>>>> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
>>>>
>>>> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is
>>>> what case the i.rex2 check above is intended to cover. Error message
>>>> comment, and condition want to reflect that.
>>>>
>>>
>>> Removed i.rex2 and keep i.rex2_encoding here. Added one invalid testcase
>> for it.
>>>
>>>         {rex} vmovaps %xmm7,%xmm2
>>>         {rex} vmovaps %xmm17,%xmm2
>>>         {rex} rorx $7,%eax,%ebx
>>> +       {rex2} vmovaps %xmm7,%xmm2
>>
>> Right, but please see my "optional vs required" comment in the pseudo- prefix
>> related patch I did send earlier today. I question the correctness of the {rex}
>> related check here, which would then extend to the {rex2} one as well.
>>
> 
> A REX byte that is immediately followed by a legacy prefix byte (LOCK, REPE, REPNE, OSIZE override, ASIZE override, or segment overrides) or another REX byte is ignored and behaves as if it does not exist (except for contributing to the instruction length)
> but in this case I think it's correct.

I'm afraid I can't relate this to the aspect I raised above. Perhaps better to
discuss in the context of the patch that I sent (and that I mentioned above;
"x86: CPU-qualify {disp16} / {disp32}"). You did reply to the patch, but you
didn't reply to the more detailed description of the issue (which I did refer
to above).

>>>>> +	    {
>>>>> +	      i.error = register_type_mismatch;
>>>>> +	      return 1;
>>>>> +	    }
>>>>> +	}
>>>>> +
>>>>> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
>>>>> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
>>>>> +	{
>>>>> +	  i.error = register_type_of_address_mismatch;
>>>>> +	  return 1;
>>>>> +	}
>>>>> +
>>>>> +      /* Check pseudo prefix {rex2} are valid.  */
>>>>> +      if (i.rex2_encoding)
>>>>> +	{
>>>>> +	  i.error = invalid_pseudo_prefix;
>>>>> +	  return 1;
>>>>> +	}
>>>>
>>>> Further up in md_assemble() {rex} or {rex2} is simply ignored when
>>>> wrong to apply. Why would an inapplicable {rex2} be treated as an
>>>> error here? This would then also ...
>>>>
>>>>> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
>>>>>        /* Do not verify operands when there are none.  */
>>>>>        if (!t->operands)
>>>>>  	{
>>>>> -	  if (VEX_check_encoding (t))
>>>>> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
>>>>>  	    {
>>>>>  	      specific_error = progress (i.error);
>>>>>  	      continue;
>>>>
>>>> ... eliminate the need for this change, which is kind of bogus anyway:
>>>> There are no operands here, so calling a function of the given name
>>>> is at least suspicious.
>>>>
>>>
>>> We have these tests and I'm confused whether to remove them or not.
>>>
>>> +       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
>>> +       {rex2} wrmsr
>>> +       {rex2} rdtsc
>>> +       {rex2} rdmsr
>>> +       {rex2} sysenter
>>> +       {rex2} sysexitl
>>> +       {rex2} rdpmc
>>
>> They should all stay. But as to my comment: There's no use of any eGPR here. If
>> you want to abuse that function and if there's no better descriptive name for it,
>> then once again at least a comment is needed.
>> (Considering this, the attribute's name NoEgpr is probably also misleading in
>> the cases here, i.e. when there are no operands. Hence, if not to be renamed,
>> requires yet another comment in i386-opc.h.)
>>
> This question also confused me , some instructions only support Acc register, but we need to add NoEgpr for them, this seems a bit strange. if we use NoRex2 , it doesn't fit the vex and evex instructions either. So I will add comments to it for now.
> 
> +         /* When there are no operands, we still need to use the
> +            check_EgprOperands function to check whether {rex2} is valid.  */
>           if (VEX_check_encoding (t) || check_EgprOperands (t))
> 
> -  /* egprs (r16-r31) on instruction illegal.  */
> +  /* egprs (r16-r31) on instruction illegal. We also use it to judge
> +     whether the instruction supports pseudo-prefix {rex2}.  */
>    NoEgpr,

This looks okay commentary-wise, but as per above we first need to settle on
whether an inapplicable {rex2} shouldn't simply be ignored.
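
The behavior under discussion can be sketched as follows. This is a simplified
model of the quoted check, not the patch itself: the struct and function names
are hypothetical, and the first test assumes the elided loop in the quoted hunk
flags register operands in r16-r31.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_error
{
  no_error,
  register_type_mismatch,
  register_type_of_address_mismatch,
  invalid_pseudo_prefix
};

/* Hypothetical summary of the parsed instruction.  */
struct parsed_insn
{
  bool reg_operand_is_egpr;  /* some register operand is r16-r31 */
  bool address_uses_egpr;    /* base or index register is r16-r31 */
  bool rex2_pseudo_prefix;   /* {rex2} was written */
};

static enum sketch_error
check_noegpr (bool template_has_noegpr, const struct parsed_insn *p)
{
  if (!template_has_noegpr)
    return no_error;
  if (p->reg_operand_is_egpr)
    return register_type_mismatch;
  if (p->address_uses_egpr)
    return register_type_of_address_mismatch;
  /* The open question: reject {rex2} here, or silently ignore it?  */
  if (p->rex2_pseudo_prefix)
    return invalid_pseudo_prefix;
  return no_error;
}
```

An operand-less insn such as "{rex2} wrmsr" only ever reaches the last test,
which is why the function name and the NoEgpr attribute read oddly there.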

>>>>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>>>>>    return elem_size;
>>>>>  }
>>>>>
>>>>> +static bool
>>>>> +if_entry_needs_special_handle (const unsigned long long opcode,
>>>>> +unsigned
>>>> int space,
>>>>> +			       const char *cpu_flags)
>>>>> +{
>>>>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
>>>>> +#UD.  */
>>>>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
>>>>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
>>>>> +      || strcmp (cpu_flags, "Xsave") >= 0
>>>>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
>>>>
>>>> Upon further thought for these (and maybe even ...
>>>>
>>>>> +      || !strcmp (cpu_flags, "3dnow")
>>>>> +      || !strcmp (cpu_flags, "3dnowA"))
>>>>
>>>> ... for these, but see also below) it might be better to add the
>>>> attribute right in the opcode table.
>>>>
>>>> As to the 3dnow insns - I think I'd like to revise my earlier
>>>> suggestion to also tag those. Like e.g. FPU insns they're pretty
>>>> normal GPR-wise, so allowing them to be used like that would appear
>>>> only consistent. Otherwise, if we were concerned of AMD extensions in
>>>> general, SSE4a insns (and maybe further
>>>> ones) would also need excluding. (Additionally recall that there's an
>>>> overlap between 3dnowa and SSE, which would result in another
>>>> [apparent] inconsistency when excluding 3dnow insns here.)
>>>>
>>>
>>> I see, for example  I think I need to split this table into two parts, one is for
>> SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
>>> pextrw, 0xfc5, SSE|3dnowA,
>>> Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8,
>> RegMMX,
>>> Reg32|Reg64 }
>>
>> I'm afraid I don't understand the question. All I've asked for is that the special
>> treatment of 3dnow insns be removed again. Unless you want to special-case
>> further insns; it's not really clear to me what's best, as both approaches have
>> noticable downsides (either we allow to encode something which may never
>> become valid, or we disallow something which may become valid).
>>
>> In any event adding NoEgpr to any SSE insn sounds wrong to me - aiui they can
>> all be encoded with REX2.
>>
> I need to correct it:  There are some instructions table present both SSE and AMD instructions. I need to split them first and then add NoEgpr to AMD instructions.
> Another point is that we have not split the common instructions of AMD and Intel, so just adding NoEgpr to 3dnowA and 3dnow does not seem to make much sense.
> 
> Do you want me also to remove this part  and add  NoEgpr in insn table?

First we need to settle on what to do with 3DNow!, SSE4a, and maybe further
AMD-only insns (beyond e.g. XOP and TBM ones, which aiui are covered by
virtue of being VEX[-like], and hence never eligible for eGPR use). Then we
can sort out how to best express what we have decided to enforce.

I'm not convinced at all that templates like that for MASKMOVQ would need
splitting: The difference would be noticeable only if someone disabled SSE,
but kept 3DNow! and APX_F enabled. We could easily document the resulting
pitfall instead.

Jan


* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-07 10:20       ` Jan Beulich
@ 2023-11-07 14:32         ` Cui, Lili
  2023-11-07 15:08           ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-07 14:32 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 07.11.2023 09:06, Cui, Lili wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Monday, November 6, 2023 11:03 PM
> >>
> >> On 02.11.2023 12:29, Cui, Lili wrote:
> >>> @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table,
> >>> char
> >> *mod, unsigned int space,
> >>>  	fprintf (stderr,
> >>>  		 "%s: %d: W modifier without Word/Dword/Qword
> >> operand(s)\n",
> >>>  		 filename, lineno);
> >>> +
> >>> +      /* The part about judging EVEX encoding should be synchronized with
> >>> +	 is_evex_encoding.  */
> >>> +      if (modifiers[Vex].value
> >>> +	  || ((space > SPACE_0F || has_special_handle)
> >>> +	      && !modifiers[EVex].value
> >>> +	      && !modifiers[Disp8MemShift].value
> >>> +	      && !modifiers[Broadcast].value
> >>> +	      && !modifiers[Masking].value
> >>> +	      && !modifiers[SAE].value))
> >>> +	modifiers[NoEgpr].value = 1;
> >>
> >> While this is just i386-gen (and hence being somewhat inefficient
> >> isn't the end of the world) I still wonder whether we need all the parts of
> this condition:
> >> Do we really need all the constituents of this EVEX related checks?
> >> Wouldn't it also help is_evex_encoding() if we switched to uniformly
> >> having EVex attributes on all EVEX templates? A presently missing
> >> EVex attribute, after all, merely is another way of saying EVexDYN,
> >> if I'm not mistaken. (Such an adjustment, if deemed to help, would of
> >> course want to come as a separate, prereq patch.)
> >>
> >
> > Yes, EVex is another way of saying EVexDYN, it should be appear in every
> EVEX template, when we merge EVex128, EVex256 and EVex512 into one
> template we omitted the expression of EVexDYN. So some EVEX templates
> don’t have this tag. If we want to re-add it, we need new values.
> 
> I don't understand. When there's (e.g.) EVex128, not EVexDYN should appear at
> the same time. Otoh ...
> 
> > Such as:
> > vcvttps2dq, 0xF35B, AVX512F,
> >
> Modrm|Masking|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize
> |No
> > Suf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex,
> > RegXMM|RegYMM|RegZMM }
> 
> ... aiui this one could have EVexDYN added without change in behavior, but
> would then allow being recognized as needed EVEX-encoding by just checking
> the .evex field, not any of the other fields is_evex_encoding() presently needs
> to check.
>

We have the same idea; that's what I meant too. I'd be happy to write a follow-up patch to implement it, though it may need to wait until these patches are committed to trunk.

> >> Furthermore, is this correct at all for mixed VEX/EVEX templates?
> >>
> > After merging the templates we only have one entry and I prefer to set
> [NoEgpr].value to 1. Don't check NoEgpr for all EVEX instruction in
> check_EgprOperands function.
> >
> > check_EgprOperands (const insn_template *t)  {
> > -  if (t->opcode_modifier.noegpr)
> > +  if (t->opcode_modifier.noegpr && !need_evex_encoding())
> 
> So why would you add an attribute just to then ignore it by adding extra code?
> 
Since all EVEX encodings support eGPRs, the NoEgpr attribute has little meaning for EVEX, so when VEX and EVEX share an entry, we use NoEgpr to indicate the VEX format.

> >>> --- a/opcodes/i386-opc.tbl
> >>> +++ b/opcodes/i386-opc.tbl
> >>> @@ -891,7 +891,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
> >>> <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
> >>>                        load:Load:0, store:Store:0, +
> >>>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> >>> -                      rex:REX:x64, nooptimize:NoOptimize:0>
> >>> +                      rex:REX:x64, rex2:REX2:x64,
> >>> + nooptimize:NoOptimize:0>
> >>
> >> Seeing this I realized that there's something missing here (an APX_F
> >> dependency), which then again would not have had an effect without
> >> the patch [1] sent earlier today.
> >>
> >> Jan
> >>
> >> [1]
> >> https://sourceware.org/pipermail/binutils/2023-November/130345.html
> >
> > Changed to
> >
> > +#define APX_F_64 APX_F|x64
> > +
> > <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
> >                        load:Load:0, store:Store:0, +
> >                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> > -                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> > +                      rex:REX:x64, rex2:REX2:APX_F_64,
> > + nooptimize:NoOptimize:0>
> >
> > When we have" x86: split insn templates' CPU field" in trunk, I will change it
> to #define APX_F_64 APX_F&x64.
> 
> I've meanwhile put together the Cpu64 patch I was thinking of. No "&x64"
> should then be needed anymore for any of the APX templates. Before sending
> that one out, I will want to first see whether I can re-order it with the patch
> sent earlier, as this would allow that other patch to shrink in size (fewer "|x64"
> to convert to "&x64").
> 
Ok, looking forward to your patch.

Lili.



* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-07 14:32         ` Cui, Lili
@ 2023-11-07 15:08           ` Jan Beulich
  0 siblings, 0 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-07 15:08 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 07.11.2023 15:32, Cui, Lili wrote:
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 07.11.2023 09:06, Cui, Lili wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Monday, November 6, 2023 11:03 PM
>>>>
>>>> On 02.11.2023 12:29, Cui, Lili wrote:
>>>>> @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table,
>>>>> char
>>>> *mod, unsigned int space,
>>>>>  	fprintf (stderr,
>>>>>  		 "%s: %d: W modifier without Word/Dword/Qword
>>>> operand(s)\n",
>>>>>  		 filename, lineno);
>>>>> +
>>>>> +      /* The part about judging EVEX encoding should be synchronized with
>>>>> +	 is_evex_encoding.  */
>>>>> +      if (modifiers[Vex].value
>>>>> +	  || ((space > SPACE_0F || has_special_handle)
>>>>> +	      && !modifiers[EVex].value
>>>>> +	      && !modifiers[Disp8MemShift].value
>>>>> +	      && !modifiers[Broadcast].value
>>>>> +	      && !modifiers[Masking].value
>>>>> +	      && !modifiers[SAE].value))
>>>>> +	modifiers[NoEgpr].value = 1;
>>>>
>>>> While this is just i386-gen (and hence being somewhat inefficient
>>>> isn't the end of the world) I still wonder whether we need all the parts of
>> this condition:
>>>> Do we really need all the constituents of this EVEX related checks?
>>>> Wouldn't it also help is_evex_encoding() if we switched to uniformly
>>>> having EVex attributes on all EVEX templates? A presently missing
>>>> EVex attribute, after all, merely is another way of saying EVexDYN,
>>>> if I'm not mistaken. (Such an adjustment, if deemed to help, would of
>>>> course want to come as a separate, prereq patch.)
>>>>
>>>
>>> Yes, EVex is another way of saying EVexDYN, it should be appear in every
>> EVEX template, when we merge EVex128, EVex256 and EVex512 into one
>> template we omitted the expression of EVexDYN. So some EVEX templates
>> don’t have this tag. If we want to re-add it, we need new values.
>>
>> I don't understand. When there's (e.g.) EVex128, not EVexDYN should appear at
>> the same time. Otoh ...
>>
>>> Such as:
>>> vcvttps2dq, 0xF35B, AVX512F,
>>>
>> Modrm|Masking|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize
>> |No
>>> Suf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex,
>>> RegXMM|RegYMM|RegZMM }
>>
>> ... aiui this one could have EVexDYN added without change in behavior, but
>> would then allow being recognized as needed EVEX-encoding by just checking
>> the .evex field, not any of the other fields is_evex_encoding() presently needs
>> to check.
>>
> 
> We have the same idea, that's what I mean too. I'd be happy to rewrite a patch later to implement it. Maybe it needs to be done after these patches are committed to the trunk.

Well, you mirror is_evex_encoding() logic into i386-gen, and if that logic
is to be simplified anyway, it would likely be better to do so up front.
That'll reduce two patches then - the one here fiddling with i386-gen and
the one doing the conversion (which then won't need to also touch
i386-gen.c, at least not in this regard). Since it looks like I'm collecting
a set of prereq patches anyway, maybe I should take the time right away ...

>>>> Furthermore, is this correct at all for mixed VEX/EVEX templates?
>>>>
>>> After merging the templates we only have one entry and I prefer to set
>> [NoEgpr].value to 1. Don't check NoEgpr for all EVEX instruction in
>> check_EgprOperands function.
>>>
>>> check_EgprOperands (const insn_template *t)  {
>>> -  if (t->opcode_modifier.noegpr)
>>> +  if (t->opcode_modifier.noegpr && !need_evex_encoding())
>>
>> So why would you add an attribute just to then ignore it by adding extra code?
>>
> Since all EVEX encodings support EGPR, the attribute noegpr has little meaning for evex, so when vex and evex share an entry, we use noegpr to indicate the vex format.

I still don't follow. From VEX alone you know that eGPR isn't possible. A
mixed template very definitely allows for eGPR, otoh (requiring its EVEX
sub-variant to be chosen).
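
Spelled out as a small sketch (the enum and helpers are hypothetical and only
model which encodings a template offers), a template-wide NoEgpr bit cannot
describe the mixed case:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the encodings a template offers.  */
enum enc_set { ENC_VEX_ONLY, ENC_VEX_OR_EVEX, ENC_EVEX_ONLY };

/* True if an eGPR operand can never be encoded by this template.  */
static bool
egpr_impossible (enum enc_set avail)
{
  return avail == ENC_VEX_ONLY;
}

/* True if using an eGPR narrows the template to its EVEX sub-variant.  */
static bool
egpr_forces_evex (enum enc_set avail)
{
  return avail == ENC_VEX_OR_EVEX;
}
```

Only the VEX-only case is a rejection. For the mixed case the eGPR is legal
and merely decides which sub-variant gets emitted.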

Jan


* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-07 10:43           ` Jan Beulich
@ 2023-11-07 15:31             ` Cui, Lili
  2023-11-07 15:43               ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-07 15:31 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 07.11.2023 09:16, Cui, Lili wrote:
> >>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> >>>>
> >>>> On 02.11.2023 12:29, Cui, Lili wrote:
> >>>>> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
> >>>>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >>>>>
> >>>>> +/* Build (2 bytes) rex2 prefix.
> >>>>> +   | D5h |
> >>>>> +   | m | R4 X4 B4 | W R X B |
> >>>>> +*/
> >>>>> +static void
> >>>>> +build_rex2_prefix (void)
> >>>>> +{
> >>>>> +  i.vex.length = 2;
> >>>>> +  i.vex.bytes[0] = 0xd5;
> >>>>> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> >>>>> +		    | (i.rex2 << 4) | i.rex);
> >>>>> +}
> >>>>
> >>>> I may have asked on v1 already: For emitting REX we don't resort to
> >>>> (ab)using i.vex. Is that really necessary? (If so, a comment next
> >>>> to the field declaration may be warranted.)
> >>>>
> >>> Added comment for it.
> >>>
> >>>   /* For the W R X B bits, the variables of rex prefix will be reused.  */
> >>>   i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> >>>                     | (i.rex2 << 4) | i.rex);
> >>
> >> How does the comment relate to the (ab)use of i.vex?
> >>
> > Ah ha, it's i.vex, not i.rex. At first I thought rex2 should have its own variable,
> but in the output_insn function they have the same special handling of
> i.tm.opcode_space as VEX. Reusing i.vex can reduce some ugly code.
> 
> Things like this are very helpful to explain in the patch description.
> 

Done.

> >>>>> +	    {
> >>>>> +	      i.error = register_type_mismatch;
> >>>>> +	      return 1;
> >>>>> +	    }
> >>>>> +	}
> >>>>> +
> >>>>> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> >>>>> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> >>>>> +	{
> >>>>> +	  i.error = register_type_of_address_mismatch;
> >>>>> +	  return 1;
> >>>>> +	}
> >>>>> +
> >>>>> +      /* Check pseudo prefix {rex2} are valid.  */
> >>>>> +      if (i.rex2_encoding)
> >>>>> +	{
> >>>>> +	  i.error = invalid_pseudo_prefix;
> >>>>> +	  return 1;
> >>>>> +	}
> >>>>
> >>>> Further up in md_assemble() {rex} or {rex2} is simply ignored when
> >>>> wrong to apply. Why would an inapplicable {rex2} be treated as an
> >>>> error here? This would then also ...
> >>>>
> >>>>> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
> >>>>>        /* Do not verify operands when there are none.  */
> >>>>>        if (!t->operands)
> >>>>>  	{
> >>>>> -	  if (VEX_check_encoding (t))
> >>>>> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
> >>>>>  	    {
> >>>>>  	      specific_error = progress (i.error);
> >>>>>  	      continue;
> >>>>
> >>>> ... eliminate the need for this change, which is kind of bogus anyway:
> >>>> There are no operands here, so calling a function of the given name
> >>>> is at least suspicious.
> >>>>
> >>>
> >>> We have these tests and I'm confused whether to remove them or not.
> >>>
> >>> +       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
> >>> +       {rex2} wrmsr
> >>> +       {rex2} rdtsc
> >>> +       {rex2} rdmsr
> >>> +       {rex2} sysenter
> >>> +       {rex2} sysexitl
> >>> +       {rex2} rdpmc
> >>
> >> They should all stay. But as to my comment: There's no use of any
> >> eGPR here. If you want to abuse that function and if there's no
> >> better descriptive name for it, then once again at least a comment is
> needed.
> >> (Considering this, the attribute's name NoEgpr is probably also
> >> misleading in the cases here, i.e. when there are no operands. Hence,
> >> if not to be renamed, requires yet another comment in i386-opc.h.)
> >>
> > This question also confused me , some instructions only support Acc register,
> but we need to add NoEgpr for them, this seems a bit strange. if we use
> NoRex2 , it doesn't fit the vex and evex instructions either. So I will add
> comments to it for now.
> >
> > +         /* When there are no operands, we still need to use the
> > +            check_EgprOperands function to check whether {rex2} is
> > + valid.  */
> >           if (VEX_check_encoding (t) || check_EgprOperands (t))
> >
> > -  /* egprs (r16-r31) on instruction illegal.  */
> > +  /* egprs (r16-r31) on instruction illegal. We also use it to judge
> > +     whether the instruction supports pseudo-prefix {rex2}.  */
> >    NoEgpr,
> 
> This looks okay commentary-wise, but as per above we first need to settle on
> whether an inapplicable {rex2} shouldn't simply be ignored.
> 
> >>>>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
> >>>>>    return elem_size;
> >>>>>  }
> >>>>>
> >>>>> +static bool
> >>>>> +if_entry_needs_special_handle (const unsigned long long opcode,
> >>>>> +unsigned
> >>>> int space,
> >>>>> +			       const char *cpu_flags)
> >>>>> +{
> >>>>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> >>>>> +#UD.  */
> >>>>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> >>>>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> >>>>> +      || strcmp (cpu_flags, "Xsave") >= 0
> >>>>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> >>>>
> >>>> Upon further thought for these (and maybe even ...
> >>>>
> >>>>> +      || !strcmp (cpu_flags, "3dnow")
> >>>>> +      || !strcmp (cpu_flags, "3dnowA"))
> >>>>
> >>>> ... for these, but see also below) it might be better to add the
> >>>> attribute right in the opcode table.
> >>>>
> >>>> As to the 3dnow insns - I think I'd like to revise my earlier
> >>>> suggestion to also tag those. Like e.g. FPU insns they're pretty
> >>>> normal GPR-wise, so allowing them to be used like that would appear
> >>>> only consistent. Otherwise, if we were concerned of AMD extensions
> >>>> in general, SSE4a insns (and maybe further
> >>>> ones) would also need excluding. (Additionally recall that there's
> >>>> an overlap between 3dnowa and SSE, which would result in another
> >>>> [apparent] inconsistency when excluding 3dnow insns here.)
> >>>>
> >>>
> >>> I see, for example  I think I need to split this table into two
> >>> parts, one is for
> >> SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
> >>> pextrw, 0xfc5, SSE|3dnowA,
> >>> Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8,
> >> RegMMX,
> >>> Reg32|Reg64 }
> >>
> >> I'm afraid I don't understand the question. All I've asked for is
> >> that the special treatment of 3dnow insns be removed again. Unless
> >> you want to special-case further insns; it's not really clear to me
> >> what's best, as both approaches have noticable downsides (either we
> >> allow to encode something which may never become valid, or we disallow
> something which may become valid).
> >>
> >> In any event adding NoEgpr to any SSE insn sounds wrong to me - aiui
> >> they can all be encoded with REX2.
> >>
> > I need to correct it:  There are some instructions table present both SSE and
> AMD instructions. I need to split them first and then add NoEgpr to AMD
> instructions.
> > Another point is that we have not split the common instructions of AMD and
> Intel, so just adding NoEgpr to 3dnowA and 3dnow does not seem to make
> much sense.
> >
> > Do you want me also to remove this part  and add  NoEgpr in insn table?
> 
> First we need to settle on what to do with 3DNow!, SSE4a, and maybe further
> AMD-only insns (beyond e.g. XOP and TBM ones, which aiui are covered by
> virtue of being VEX[-like], and hence never eligible for eGPR use). Then we can
> sort out how to best express what we have decided to enforce.
> 
> I'm not convinced at all that templates like that for MASKMOVQ would need
> splitting: The difference would be noticeable only if someone disabled SSE, but
> kept 3DNow! and APX_F enabled. We could easily document the resulting
> pitfall instead.
> 

Do you mean we won't add NoEgpr to entries like this? I will try to find a list of AMD-only insns.

pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }


Lili.



* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-07 15:31             ` Cui, Lili
@ 2023-11-07 15:43               ` Jan Beulich
  2023-11-07 15:53                 ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-07 15:43 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 07.11.2023 16:31, Cui, Lili wrote:
>  Do you mean we won't add NoEgpr to the entries like this ? I will try to find a list for AMD-only insns.
> 
> pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }

I mean we need to carefully consider whether to add NoEgpr to AMD-only
insns. As mentioned before there are downsides to either approach.

Jan


* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-07 15:43               ` Jan Beulich
@ 2023-11-07 15:53                 ` Cui, Lili
  0 siblings, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-07 15:53 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 07.11.2023 16:31, Cui, Lili wrote:
> >  Do you mean we won't add NoEgpr to the entries like this ? I will try to find
> a list for AMD-only insns.
> >
> > pextrw, 0xfc5, SSE|3dnowA,
> > Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8,
> RegMMX,
> > Reg32|Reg64 }
> 
> I mean we need to carefully consider whether to add NoEgpr to AMD-only
> insns. As mentioned before there are downsides to either approach.
> 
Yes, I would prefer not to add it.

Lili.


* Re: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
  2023-11-02 11:29 ` [PATCH 4/8] Add tests for " Cui, Lili
@ 2023-11-08  9:11   ` Jan Beulich
  2023-11-15 14:56     ` Cui, Lili
  2023-11-16 15:34     ` Cui, Lili
  0 siblings, 2 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-08  9:11 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> --- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> @@ -1,4 +1,4 @@
> -# Check Illegal 64bit APX_F instructions
> +# Check illegal 64bit APX_F instructions
>  	.text
>  	.arch .noapx_f
>  	test    $0x7, %r17d
> @@ -16,3 +16,195 @@
>  	xsaveopt64 (%r16, %r31)
>  	xsavec (%r16, %rbx)
>  	xsavec64 (%r16, %r31)
> +#SSE
> +	phaddw          (%r17),%xmm0
> +	phaddd          (%r17),%xmm0
> +	phaddsw         (%r17),%xmm0
> +	phsubw          (%r17),%xmm0
> +	pmaddubsw       (%r17),%xmm0
> +	pmulhrsw        (%r17),%xmm0
> +	pshufb          (%r17),%xmm0
> +	psignb          (%r17),%xmm0
> +	psignw          (%r17),%xmm0
> +	psignd          (%r17),%xmm0
> +	palignr $100,(%r17),%xmm6
> +	pabsb          (%r17),%xmm0
> +	pabsw          (%r17),%xmm0
> +	pabsd          (%r17),%xmm0
> +	blendpd $100,(%r18),%xmm6
> +	blendps $100,(%r18),%xmm6
> +	blendvpd %xmm0,(%r19),%xmm6
> +	blendvps %xmm0,(%r19),%xmm6
> +	blendvpd (%r19),%xmm6
> +	blendvps (%r19),%xmm6
> +	dppd $100,(%r20),%xmm6
> +	dpps $100,(%r20),%xmm6
> +	extractps $100,%xmm4,(%r21)
> +	extractps $100,%xmm4,%r21
> +	insertps $100,(%r21),%xmm6
> +	movntdqa (%r21),%xmm4
> +	mpsadbw $100,(%r21),%xmm6
> +	packusdw (%r21),%xmm6
> +	pblendvb %xmm0,(%r22),%xmm6
> +	pblendvb (%r22),%xmm6
> +	pblendw $100,(%r22),%xmm6
> +	pcmpeqq (%r22),%xmm6
> +	pextrb $100,%xmm4,(%r22)
> +	pextrb $100,%xmm4,%r22
> +	pextrw $100,%xmm4,(%r22)
> +	pextrd $100,%xmm4,(%r22)
> +        pextrq $100,%xmm4,(%r22)

Nit: Indentation inconsistency.

> +	phminposuw (%r23),%xmm4
> +	pinsrb $100,%r23,%xmm4
> +	pinsrb $100,(%r23),%xmm4
> +	pinsrd $100, %r23d, %xmm4
> +	pinsrd $100,(%r23),%xmm4
> +	pinsrq $100, %r24, %xmm4
> +	pinsrq $100,(%r24),%xmm4
> +	pmaxsb (%r24),%xmm6
> +	pmaxsd (%r24),%xmm6
> +	pmaxud (%r24),%xmm6
> +	pmaxuw (%r24),%xmm6
> +	pminsb (%r24),%xmm6
> +	pminsd (%r24),%xmm6
> +	pminud (%r24),%xmm6
> +	pminuw (%r24),%xmm6
> +	pmovsxbw (%r24),%xmm4
> +	pmovsxbd (%r24),%xmm4
> +	pmovsxbq (%r24),%xmm4
> +	pmovsxwd (%r24),%xmm4
> +	pmovsxwq (%r24),%xmm4
> +	pmovsxdq (%r24),%xmm4
> +	pmovsxbw (%r24),%xmm4
> +	pmovzxbd (%r24),%xmm4
> +	pmovzxbq (%r24),%xmm4
> +	pmovzxwd (%r24),%xmm4
> +	pmovzxwq (%r24),%xmm4
> +	pmovzxdq (%r24),%xmm4
> +	pmuldq (%r24),%xmm4
> +	pmulld (%r24),%xmm4
> +	roundpd $100,(%r24),%xmm6
> +	roundps $100,(%r24),%xmm6
> +	roundsd $100,(%r24),%xmm6
> +	roundss $100,(%r24),%xmm6
> +	pcmpestri $100,(%r25),%xmm6
> +	pcmpestrm $100,(%r25),%xmm6
> +	pcmpgtq (%r25),%xmm4
> +	pcmpistri $100,(%r25),%xmm6
> +	pcmpistrm $100,(%r25),%xmm6
> +#AES
> +	aesdec (%r26),%xmm6
> +	aesdeclast (%r26),%xmm6
> +	aesenc (%r26),%xmm6
> +	aesenclast (%r26),%xmm6
> +	aesimc (%r26),%xmm6
> +	aeskeygenassist $100,(%r26),%xmm6
> +	pclmulqdq $100,(%r26),%xmm6
> +	pclmullqlqdq (%r26),%xmm6
> +	pclmulhqlqdq (%r26),%xmm6
> +	pclmullqhqdq (%r26),%xmm6
> +	pclmulhqhqdq (%r26),%xmm6
> +#GFNI
> +	gf2p8affineqb $100,(%r26),%xmm6
> +	gf2p8affineinvqb $100,(%r26),%xmm6
> +	gf2p8mulb (%r26),%xmm6
> +#VEX without evex
> +	vblendpd $7,(%r27),%xmm6,%xmm2
> +	vblendpd $7,(%r27),%ymm6,%ymm2
> +	vblendps $7,(%r27),%xmm6,%xmm2
> +	vblendps $7,(%r27),%ymm6,%ymm2
> +	vblendvpd %xmm4,(%r27),%xmm2,%xmm7
> +	vblendvpd %ymm4,(%r27),%ymm2,%ymm7
> +	vblendvps %xmm4,(%r27),%xmm2,%xmm7
> +	vblendvps %ymm4,(%r27),%ymm2,%ymm7
> +	vdppd $7,(%r27),%xmm6,%xmm2
> +	vdpps $7,(%r27),%xmm6,%xmm2
> +	vdpps $7,(%r27),%ymm6,%ymm2
> +	vhaddpd (%r27),%xmm6,%xmm5
> +	vhaddpd (%r27),%ymm6,%ymm5
> +	vhsubps (%r27),%xmm6,%xmm5
> +	vhsubps (%r27),%ymm6,%ymm5
> +	vlddqu (%r27),%xmm4
> +	vlddqu (%r27),%ymm4
> +	vldmxcsr (%r27)

As mentioned before, for this, ...

> +	vmaskmovpd (%r27),%xmm4,%xmm6
> +	vmaskmovpd %xmm4,%xmm6,(%r27)
> +	vmaskmovps (%r27),%xmm4,%xmm6
> +	vmaskmovps %xmm4,%xmm6,(%r27)
> +	vmaskmovpd (%r27),%ymm4,%ymm6
> +	vmaskmovpd %ymm4,%ymm6,(%r27)
> +	vmaskmovps (%r27),%ymm4,%ymm6
> +	vmaskmovps %ymm4,%ymm6,(%r27)	
> +	vmovmskpd %xmm4,%r27d
> +	vmovmskpd %xmm8,%r27d
> +	vmovmskps %xmm4,%r27d
> +	vmovmskps %ymm8,%r27d
> +	vpblendvb %xmm4,(%r27),%xmm2,%xmm7
> +	vpblendvb %ymm4,(%r27),%ymm2,%ymm7
> +	vpblendw $7,(%r27),%xmm6,%xmm2
> +	vpblendw $7,(%r27),%ymm6,%ymm2
> +	vpcmpestri $7,(%r27),%xmm6
> +	vpcmpestrm $7,(%r27),%xmm6
> +	vperm2f128 $7,(%r27),%ymm6,%ymm2
> +	vphaddd (%r27),%xmm6,%xmm7
> +	vphaddsw (%r27),%xmm6,%xmm7
> +	vphaddw (%r27),%xmm6,%xmm7
> +	vphsubd (%r27),%xmm6,%xmm7
> +	vphsubsw (%r27),%xmm6,%xmm7
> +	vphsubw (%r27),%xmm6,%xmm7
> +	vphaddd (%r27),%ymm6,%ymm7
> +	vphaddsw (%r27),%ymm6,%ymm7
> +	vphaddw (%r27),%ymm6,%ymm7
> +	vphsubd (%r27),%ymm6,%ymm7
> +	vphsubsw (%r27),%ymm6,%ymm7
> +	vphsubw (%r27),%ymm6,%ymm7
> +	vphminposuw (%r27),%xmm6
> +	vpmovmskb %xmm4,%r27
> +	vpmovmskb %ymm4,%r27d
> +	vpsignb (%r27),%xmm6,%xmm7
> +	vpsignw (%r27),%xmm6,%xmm7
> +	vpsignd (%r27),%xmm6,%xmm7
> +	vpsignb (%r27),%xmm6,%xmm7
> +	vpsignw (%r27),%xmm6,%xmm7
> +	vpsignd (%r27),%xmm6,%xmm7
> +	vptest (%r27),%xmm6
> +	vptest (%r27),%ymm6
> +	vrcpps (%r27),%xmm6
> +	vrcpps (%r27),%ymm6
> +	vrcpss (%r27),%xmm6,%xmm6
> +	vrsqrtps (%r27),%xmm6
> +	vrsqrtps (%r27),%ymm6
> +	vrsqrtss (%r27),%xmm6,%xmm6
> +	vstmxcsr (%r27)

... this, and ...

> +	vtestps (%r27),%xmm6
> +	vtestps (%r27),%ymm6
> +	vtestpd (%r27),%xmm6
> +	vtestps (%r27),%ymm6
> +	vtestpd (%r27),%ymm6
> +	vpblendd $7,(%r27),%xmm6,%xmm2
> +	vpblendd $7,(%r27),%ymm6,%ymm2
> +	vperm2i128 $7,(%r27),%ymm6,%ymm2
> +	vpmaskmovd (%r27),%xmm4,%xmm6
> +	vpmaskmovd %xmm4,%xmm6,(%r27)
> +	vpmaskmovq (%r27),%xmm4,%xmm6
> +	vpmaskmovq %xmm4,%xmm6,(%r27)
> +	vpmaskmovd (%r27),%ymm4,%ymm6
> +	vpmaskmovd %ymm4,%ymm6,(%r27)
> +	vpmaskmovq (%r27),%ymm4,%ymm6
> +	vpmaskmovq %ymm4,%ymm6,(%r27)
> +	vaesimc (%r27), %xmm3
> +	vaeskeygenassist $7,(%r27),%xmm3
> +	vroundpd $1,(%r24),%xmm6
> +	vroundps $2,(%r24),%xmm6
> +	vroundsd $3,(%r24),%xmm6,%xmm3
> +	vroundss $4,(%r24),%xmm6,%xmm3

... and these four I wonder whether the documentation shouldn't at least
allow room for translating them, for there being functionally equivalent
encodings.

> +	vpcmpistri $100,(%r25),%xmm6
> +	vpcmpistrm $100,(%r25),%xmm6
> +	vpcmpeqb (%r26),%ymm6,%ymm2
> +	vpcmpeqw (%r16),%ymm6,%ymm2
> +	vpcmpeqd (%r26),%ymm6,%ymm2
> +	vpcmpeqq (%r16),%ymm6,%ymm2
> +	vpcmpgtb (%r26),%ymm6,%ymm2
> +	vpcmpgtw (%r16),%ymm6,%ymm2
> +	vpcmpgtd (%r26),%ymm6,%ymm2
> +	vpcmpgtq (%r16),%ymm6,%ymm2

As an overall remark on this (and perhaps similar) test(s): It would be
nice if some consistent sorting criterion was applied throughout the
test as a whole or (here) within the sub-sections (validly grouped by
category). Without that it's needlessly hard to spot any omissions.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
> @@ -0,0 +1,16 @@
> +.*: Assembler messages:
> +.*:4: Error: `movbe' is not supported on `x86_64.nomovbe'
> +.*:5: Error: `movbe' is not supported on `x86_64.nomovbe'
> +.*:7: Error: `invept' is not supported on `x86_64.nomovbe.noept'
> +.*:8: Error: `invept' is not supported on `x86_64.nomovbe.noept'
> +.*:10: Error: `kmovq' is not supported on `x86_64.nomovbe.noept.noavx512bw'
> +.*:11: Error: `kmovq' is not supported on `x86_64.nomovbe.noept.noavx512bw'
> +.*:13: Error: `kmovb' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq'
> +.*:14: Error: `kmovb' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq'
> +.*:16: Error: `kmovw' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq.noavx512f'
> +.*:17: Error: `kmovw' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq.noavx512f'

Can the irrelevant middle parts of these .no* expectations please be omitted?
The construction of these strings is in need of improvement, and it would be
nice if testcases where the precise string doesn't matter would then not
need touching. (This is a more general principle: Testcase expectations
would better be only as specific as needed for what is under test. Certainly
multiple aspects may be tested in one go, but quite commonly expectations are
needlessly strict, and hence needlessly prone to breaking when unrelated
changes are made somewhere in the code.)
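
To illustrate the principle, here is a minimal sketch in C. It assumes
that each expected line is matched as a regular expression (as the
leading ".*" in the expectations suggests) and models that with POSIX
regcomp(); line_matches(), strict_pat and loose_pat are hypothetical
names for this example only, not part of the testsuite.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE matches the extended regular expression PATTERN.
   This only models the idea of a per-line pattern match; it is not the
   testsuite's actual matcher.  */
bool
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  bool ok;

  assert (pattern != NULL && line != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}

/* Needlessly strict: spells out every accumulated .no* component, so it
   breaks as soon as the construction of that string changes.  */
const char strict_pat[] =
  "^.*:10: Error: `kmovq' is not supported on "
  "`x86_64\\.nomovbe\\.noept\\.noavx512bw'$";

/* Only as specific as the test needs: the component under test is the
   trailing .noavx512bw; whatever precedes it is left open.  */
const char loose_pat[] =
  "^.*:10: Error: `kmovq' is not supported on `x86_64.*\\.noavx512bw'$";
```

With the loose form, a later change to how the architecture string is
put together would not require touching this expectation, as long as
the component actually being tested still shows up.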

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
> @@ -0,0 +1,17 @@
> +# Check illegal 64bit APX EVEX promoted instructions
> +	.text
> +	.arch .nomovbe
> +	movbe (%r16), %r17
> +	movbe (%rax), %rcx
> +	.arch .noept
> +	invept (%r16), %r17
> +	invept (%rax), %rcx
> +	.arch .noavx512bw
> +	kmovq %k1, (%r16)
> +	kmovq %k1, (%r8)
> +	.arch .noavx512dq
> +	kmovb %k1, %r16d
> +	kmovb %k1, %r8d
> +	.arch .noavx512f
> +	kmovw %k1, %r16d
> +	kmovw %k1, %r8d

What about BMI/BMI2 insns? Or AMX ones? (I surely missed further groups.)

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> @@ -0,0 +1,29 @@
> +# Check Illegal prefix for 64bit EVEX-promoted instructions
> +
> +        .allow_index_reg
> +        .text
> +_start:
> +        #movbe %r18w,%ax set EVEX.pp = f3 (illegal value).
> +        .byte 0x62, 0xfc, 0x7e, 0x08, 0x60, 0xc2
> +        .byte 0xff, 0xff
> +        #movbe %r18w,%ax set EVEX.pp = f2 (illegal value).
> +        .byte 0x62, 0xfc, 0x7f, 0x08, 0x60, 0xc2
> +        .byte 0xff, 0xff
> +        #VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
> +	#(illegal value).
> +        .byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
> +        .byte 0xff
> +        #EVEX_MAP4 movbe %r18w,%ax set EVEX.mm == b01 (illegal value).
> +	.byte 0x62, 0xfd, 0x7d, 0x08, 0x60, 0xc2
> +        .byte 0xff, 0xff
> +        #EVEX_MAP4 movbe %r18w,%ax set EVEX.aa(P[17:16]) == b01 (illegal value).
> +	.byte 0x62, 0xfd, 0x7d, 0x09, 0x60, 0xc2
> +        .byte 0xff, 0xff
> +        #EVEX_MAP4 movbe %r18w,%ax set EVEX.zL'L == b001 (illegal value).
> +	.byte 0x62, 0xfd, 0x7d, 0x28, 0x60, 0xc2
> +        .byte 0xff, 0xff
> +        #EVEX from VEX enqcmd 0x123(%r31,%rax,4),%r31 EVEX.P[17:16] == 1 (illegal value).
> +        .byte 0x62, 0x4c, 0x7f, 0x09, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00
> +        .byte 0xff
> +        #EVEX from VEX enqcmd 0x123(%r31,%rax,4),%r31 EVEX.P[23:22] == 1 (illegal value).
> +        .byte 0x62, 0x4c, 0x7f, 0x28, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00

I suspect at least some of these can be expressed via .insn, which would
greatly help readability (i.e. recognizing what is actually being done,
and what's expected-wrong about it).

Also - nit - there are again indentation inconsistencies here.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
> @@ -0,0 +1,322 @@
> +# Check 64bit APX_F EVEX-Promoted instructions.
> +
> +	.text
> +_start:
>[...]
> +.intel_syntax noprefix

Didn't you say you corrected directive indentation throughout the series?

> +	aadd	DWORD PTR [r31+rax*4+0x123],r25d
> +	aadd	QWORD PTR [r31+rax*4+0x123],r31
> +	aand	DWORD PTR [r31+rax*4+0x123],r25d
> +	aand	QWORD PTR [r31+rax*4+0x123],r31
> +	aesdec128kl	xmm22,[r31+rax*4+0x123]
> +	aesdec256kl	xmm22,[r31+rax*4+0x123]
> +	aesdecwide128kl	[r31+rax*4+0x123]
> +	aesdecwide256kl	[r31+rax*4+0x123]
> +	aesenc128kl	xmm22,[r31+rax*4+0x123]
> +	aesenc256kl	xmm22,[r31+rax*4+0x123]
> +	aesencwide128kl	[r31+rax*4+0x123]
> +	aesencwide256kl	[r31+rax*4+0x123]
> +	aor	DWORD PTR [r31+rax*4+0x123],r25d
> +	aor	QWORD PTR [r31+rax*4+0x123],r31
> +	axor	DWORD PTR [r31+rax*4+0x123],r25d
> +	axor	QWORD PTR [r31+rax*4+0x123],r31
> +	bextr	r10d,edx,r25d
> +	bextr	edx,DWORD PTR [r31+rax*4+0x123],r25d
> +	bextr	r11,r15,r31
> +	bextr	r15,QWORD PTR [r31+rax*4+0x123],r31

Going just down to here (it extends throughout the Intel syntax part):
Can there please also be cases where the xxx PTR is omitted from the
memory operands? That doesn't mean there always need to be both forms,
but there should be a fair mix. (I notice you have one such example
with INVPCID below.)

>[...]
> +	crc32	r22,r31
> +	crc32	r22,QWORD PTR [r31]
> +	crc32	r17,r19b
> +	crc32	r21d,r19b
> +	crc32	ebx,BYTE PTR [r19]
> +	crc32	r23d,r31d
> +	crc32	r23d,DWORD PTR [r31]
> +	crc32	r21d,r31w
> +	crc32	r21d,WORD PTR [r31]
> +	crc32	r18,rax

These could do with moving up, since otherwise things look to be sorted
alphabetically here. But seeing these also reminds me that the noreg64
test also needs extending, to cover these new forms (handled by separate
templates).

> +	kmovb	k5,k3

This (and its siblings) doesn't belong here, does it? It continues to
be VEX-encoded.

> --- a/gas/testsuite/gas/i386/x86-64.exp
> +++ b/gas/testsuite/gas/i386/x86-64.exp
> @@ -360,8 +360,13 @@ run_dump_test "x86-64-avx512f-rcigrne-intel"
>  run_dump_test "x86-64-avx512f-rcigrne"
>  run_dump_test "x86-64-avx512f-rcigru-intel"
>  run_dump_test "x86-64-avx512f-rcigru"
> -run_list_test "x86-64-apx-egpr-inval" "-al"
> +run_list_test "x86-64-apx-egpr-inval"

This should be put in its final shape right in patch 1; no need to touch
it here again. (Else you'd need to mention the change in the ChangeLog
entry.)

Jan


* Re: [PATCH 5/8] Support APX NDD
  2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
@ 2023-11-08 10:39   ` Jan Beulich
  2023-11-20  1:19     ` Cui, Lili
  2023-11-08 11:13   ` Jan Beulich
  2023-11-09  9:37   ` [PATCH 5/8] Support APX NDD Jan Beulich
  2 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-08 10:39 UTC (permalink / raw)
  To: Cui, Lili, konglin1; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> From: konglin1 <lingling.kong@intel.com>
> 
> opcodes/ChangeLog:
> 
> 	* opcodes/i386-dis-evex-prefix.h: Add NDD decode for adox/adcx.
> 	* opcodes/i386-dis-evex-reg.h: Handle for REG_EVEX_MAP4_80,
> 	REG_EVEX_MAP4_81, REG_EVEX_MAP4_83,  REG_EVEX_MAP4_F6,
> 	REG_EVEX_MAP4_F7, REG_EVEX_MAP4_FE, REG_EVEX_MAP4_FF.
> 	* opcodes/i386-dis-evex.h: Add NDD insn.
> 	* opcodes/i386-dis.c (VexGb): Add new define.
> 	(VexGv): Ditto.
> 	(get_valid_dis386): Change for NDD decode.
> 	(print_insn): Ditto.
> 	(print_register): Ditto.
> 	(intel_operand_size): Ditto.
> 	(OP_E_memory): Ditto.
> 	(OP_VEX): Ditto.
> 	* opcodes/i386-opc.h (VexVVVV_SRC): New.
> 	VexVVVV_DST):  Ditto.
> 	* opcodes/i386-opc.tbl: Add APX NDD instructions and adjust VexVVVV.
> 	* opcodes/i386-tbl.h: Regenerated.
> 
> gas/ChangeLog:
> 
> 	* gas/config/tc-i386.c (is_any_apx_evex_encoding): Add legacy insn
> 	promote to SPACE_EVEXMAP4.
> 	(md_assemble): Change for ndd encode.
> 	(process_operands): Ditto.
> 	(build_modrm_byte): Ditto.
> 	(operand_size_match):
> 	Support APX NDD that the number of operands is 3.
> 	(match_template): Support swap the first two operands for
> 	APX NDD.
> 	reg_table
> 	* testsuite/gas/i386/x86-64.exp: Add x86-64-apx-ndd.
> 	* testsuite/gas/i386/x86-64-apx-ndd.d: New test.
> 	* testsuite/gas/i386/x86-64-apx-ndd.s: Ditto.
> 	* testsuite/gas/i386/x86-64-pseudos.d: Add test.
> 	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
> 	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d : Ditto.
> 	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s : Ditto.
> ---
>  gas/config/tc-i386.c                          |  111 +-
>  .../gas/i386/x86-64-apx-evex-promoted-bad.d   |    4 +
>  .../gas/i386/x86-64-apx-evex-promoted-bad.s   |    5 +-
>  gas/testsuite/gas/i386/x86-64-apx-ndd.d       |  161 +++
>  gas/testsuite/gas/i386/x86-64-apx-ndd.s       |  154 +++
>  gas/testsuite/gas/i386/x86-64-pseudos.d       |   42 +
>  gas/testsuite/gas/i386/x86-64-pseudos.s       |   43 +
>  gas/testsuite/gas/i386/x86-64.exp             |    1 +
>  opcodes/i386-dis-evex-prefix.h                |    4 -
>  opcodes/i386-dis-evex-reg.h                   |   54 +
>  opcodes/i386-dis-evex.h                       |  126 +-
>  opcodes/i386-dis.c                            |  145 +-
>  opcodes/i386-gen.c                            |    1 +
>  opcodes/i386-opc.h                            |    9 +-
>  opcodes/i386-opc.tbl                          | 1231 +++++++++--------
>  15 files changed, 1354 insertions(+), 737 deletions(-)
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
> 
> diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
> index 398909a6a30..5b925505435 100644
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -2317,8 +2317,10 @@ operand_size_match (const insn_template *t)
>        unsigned int given = i.operands - j - 1;
>  
>        /* For FMA4 and XOP insns VEX.W controls just the first two
> -	 register operands.  */
> -      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP))
> +	 register operands. And APX_F insns just swap the two source operands,
> +	 with the 3rd one being the destination.  */
> +      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP)
> +	  || is_cpu (t,CpuAPX_F))
>  	given = j < 2 ? 1 - j : j;

Nit: Please retain consistency wrt style (here: missing blank after comma
in the addition). Feels like I said so before.
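
For reference, the index mapping which this hunk extends to APX_F
templates can be modelled as below. This is a simplified sketch:
reversed_operand() and its parameters are hypothetical stand-ins for
the logic in operand_size_match(), not the actual gas code.

```c
#include <assert.h>
#include <stdbool.h>

/* When matching a template in reversed direction, return the index of
   the given operand which corresponds to template operand J.

   Ordinarily the whole operand list is reversed.  For templates where
   only the first two (source) operands may be swapped, any further
   operand (e.g. the destination in 3rd position) keeps its place.  */
unsigned int
reversed_operand (unsigned int j, unsigned int operands,
                  bool swap_first_two)
{
  assert (j < operands);
  if (swap_first_two)
    {
      assert (operands >= 2);
      return j < 2 ? 1 - j : j;
    }
  return operands - j - 1;
}
```

For a three-operand template with swap_first_two set, operands 0 and 1
trade places while operand 2 maps to itself.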

> @@ -3959,6 +3961,7 @@ static INLINE bool
>  is_any_apx_evex_encoding (void)
>  {
>    return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4 
> +    || i.rex2_encoding
>      || (i.vex.register_specifier
>  	&& i.vex.register_specifier->reg_flags & RegRex2);
>  }

See my comment on an earlier patch regarding the use of i.rex2 here. But
I doubt this is correct in the first place: Why would {rex2} cause EVEX
encoding to be picked?

> @@ -7481,26 +7484,33 @@ match_template (char mnem_suffix)
>  	  overlap1 = operand_type_and (operand_types[0], operand_types[1]);
>  	  if (t->opcode_modifier.d && i.reg_operands == i.operands
>  	      && !operand_type_all_zero (&overlap1))
> -	    switch (i.dir_encoding)
> -	      {
> -	      case dir_encoding_load:
> -		if (operand_type_check (operand_types[i.operands - 1], anymem)
> -		    || t->opcode_modifier.regmem)
> -		  goto check_reverse;
> -		break;
> +	    {
>  
> -	      case dir_encoding_store:
> -		if (!operand_type_check (operand_types[i.operands - 1], anymem)
> -		    && !t->opcode_modifier.regmem)
> -		  goto check_reverse;
> -		break;
> +	      int MemOperand = i.operands - 1 -
> +		(t->opcode_space == SPACE_EVEXMAP4
> +		 && t->opcode_modifier.vexvvvv);

Nit: I don't think local variables should start with a capital letter. I
wonder anyway - can't you just re-use e.g. j here? That would then also
avoid the misleading name: Right here you don't know yet whether that's
the "memory" operand. Finally, if you set the variable ahead of the
enclosing if(), you could (I think) avoid all this re-indentation,
making the change quite a bit easier to review (i.e. to see what really
changes).
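
A minimal sketch of the shape this could take, with the index computed
once up front under a neutral, lower-case name. The struct and enum
below are heavily simplified, hypothetical stand-ins for the template
fields, and last_modrm_operand() is a made-up helper; in the real
function the value would simply be assigned to the existing j ahead of
the if().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the template fields used.  */
enum opcode_space { SPACE_BASE, SPACE_EVEXMAP4 };

struct tmpl
{
  enum opcode_space opcode_space;
  bool vexvvvv;  /* True if an operand is encoded in (E)VEX.vvvv.  */
};

/* Index of the last operand to look at when deciding about direction:
   normally the last operand, but one earlier for EVEX map 4 templates
   which carry an extra operand encoded in EVEX.vvvv.  Whether that
   operand is a memory one is not known at this point, hence no "mem"
   in the name.  */
unsigned int
last_modrm_operand (const struct tmpl *t, unsigned int operands)
{
  unsigned int j;

  assert (t != NULL && operands >= 2);
  j = operands - 1 - (t->opcode_space == SPACE_EVEXMAP4 && t->vexvvvv);
  return j;
}
```

The switch() following the if() could then keep its current
indentation and merely use j in place of "i.operands - 1".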

> @@ -7530,11 +7540,13 @@ match_template (char mnem_suffix)
>  		continue;
>  	      /* Try reversing direction of operands.  */
>  	      j = is_cpu (t, CpuFMA4)
> -		  || is_cpu (t, CpuXOP) ? 1 : i.operands - 1;
> +		  || is_cpu (t, CpuXOP)
> +		  || is_cpu (t, CpuAPX_F) ? 1 : i.operands - 1;
>  	      overlap0 = operand_type_and (i.types[0], operand_types[j]);
>  	      overlap1 = operand_type_and (i.types[j], operand_types[0]);
>  	      overlap2 = operand_type_and (i.types[1], operand_types[1]);
> -	      gas_assert (t->operands != 3 || !check_register);
> +	      gas_assert (t->operands != 3 || !check_register
> +			  || is_cpu (t,CpuAPX_F));

Nit: Missing blank again. And again in the next hunk. I won't comment
on such any further, expecting you to go through globally.

> @@ -8588,11 +8609,10 @@ process_operands (void)
>    const reg_entry *default_seg = NULL;
>  
>    /* We only need to check those implicit registers for instructions
> -     with 3 operands or less.  */
> -  if (i.operands <= 3)
> -    for (unsigned int j = 0; j < i.operands; j++)
> -      if (i.types[j].bitfield.instance != InstanceNone)
> -	i.reg_operands--;
> +     with 4 operands or less.  */
> +  for (unsigned int j = 0; j < i.operands; j++)
> +    if (i.types[j].bitfield.instance != InstanceNone)
> +      i.reg_operands--;

While you made the requested code adjustment, adjusting the comment
renders it stale now. It needs dropping instead, as it did only
explain the if(), not the for().

> @@ -8946,26 +8966,35 @@ build_modrm_byte (void)
>  				     || i.vec_encoding == vex_encoding_evex));
>      }
>  
> -  for (v = source + 1; v < dest; ++v)
> -    if (v != reg_slot)
> -      break;
> -  if (v >= dest)
> -    v = ~0;
> -  if (i.tm.extension_opcode != None)
> +  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
>      {
> -      if (dest != source)
> -	v = dest;
> -      dest = ~0;
> +      v = dest;
> +      dest-- ;
>      }
> -  gas_assert (source < dest);
> -  if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
> -      && source != op)
> +  else if (i.tm.opcode_modifier.vexvvvv == VexVVVV_SRC)
>      {
> -      unsigned int tmp = source;
> +      v = source + 1;
> +      for (v = source + 1; v < dest; ++v)
> +	if (v != reg_slot)
> +	  break;
> +      if (i.tm.extension_opcode != None)
> +	{
> +	  if (dest != source)
> +	    v = dest;
> +	  dest = ~0;
> +	}
> +      gas_assert (source < dest);
> +      if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
> +	  && source != op)
> +	{
> +	  unsigned int tmp = source;
>  
> -      source = v;
> -      v = tmp;
> +	  source = v;
> +	  v = tmp;
> +	}
>      }
> +  else
> +    v = ~0;
>  
>    if (v < MAX_OPERANDS)
>      {

I'm having trouble following this change. The VexVVVV-is-source case
shouldn't change at all, I'd expect. This would ideally be easily
visible from the change done. Yet if looking a little more closely I
can e.g. spot a stray "v = source + 1;" which wasn't there before. And
there also look to be things being dropped. Such a change (again) wants
doing such that it is easy to see what changes. If need be by making a
mechanical prereq change doing just re-indentation, but nothing else.
It feels though as if there is too much changing here anyway: The
difference for VexVVVV-is-dest is that you need to consume the
destination early. If you do so, then the rest should be possible to
keep as is: You'll subsequently deal with just the normal ModR/M, as if
without a VexVVVV-encoded operand.
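
One way to read "consume the destination early" is sketched below on a
simplified model. All names here (assign_slots(), struct slots, enum
vvvv_kind, NO_OPERAND) are hypothetical; immediates and the
SWAP_SOURCES handling are left out, and this is not meant as the
actual build_modrm_byte() change.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OPERAND (~0u)

enum vvvv_kind { VVVV_NONE, VVVV_SRC, VVVV_DST };

/* Operand indexes feeding ModR/M (source, dest) and (E)VEX.vvvv.  */
struct slots
{
  unsigned int source;
  unsigned int dest;
  unsigned int vvvv;
};

struct slots
assign_slots (unsigned int source, unsigned int dest,
              unsigned int reg_slot, enum vvvv_kind kind,
              bool has_extension_opcode)
{
  struct slots s = { source, dest, NO_OPERAND };

  assert (source <= dest);

  if (kind == VVVV_DST)
    {
      /* Consume the destination early; what remains looks like an
         insn without a vvvv-encoded operand.  */
      assert (dest > 0);
      s.vvvv = s.dest--;
    }
  else
    {
      /* Unchanged search: the first operand strictly between source
         and dest which isn't the register-in-immediate slot.  */
      unsigned int v;

      for (v = s.source + 1; v < s.dest; ++v)
        if (v != reg_slot)
          break;
      if (v < s.dest)
        s.vvvv = v;
    }

  /* Unchanged: with an opcode extension in ModR/M.reg there is no
     "dest" for ModR/M; a distinct last operand moves to vvvv.  */
  if (has_extension_opcode)
    {
      if (s.dest != s.source)
        s.vvvv = s.dest;
      s.dest = NO_OPERAND;
    }

  return s;
}
```

In this model a three-operand NDD form ends up with operand 2 in vvvv
and operands 0 and 1 in ModR/M, while the vvvv-is-source path runs
exactly the code it ran before.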

> --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> @@ -10,7 +10,7 @@ _start:
>          .byte 0x62, 0xfc, 0x7f, 0x08, 0x60, 0xc2
>          .byte 0xff, 0xff
>          #VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
> -	#(illegal value).
> +        #(illegal value).
>          .byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
>          .byte 0xff
>          #EVEX_MAP4 movbe %r18w,%ax set EVEX.mm == b01 (illegal value).

???

> @@ -27,3 +27,6 @@ _start:
>          .byte 0xff
>          #EVEX from VEX enqcmd 0x123(%r31,%rax,4),%r31 EVEX.P[23:22] == 1 (illegal value).
>          .byte 0x62, 0x4c, 0x7f, 0x28, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00
> +        .byte 0xff

This then similarly looks to belong into the earlier patch, suggesting
that it had #pass too early.

> +        #{evex} inc %rax EVEX.vvvv' > 0 (illegal value).
> +        .byte 0x62, 0xf4, 0xec, 0x08, 0xff, 0xc0

Again I suspect this can be expressed via .insn, thus ending up more clear.
I can only stress again that one of the reasons of introducing .insn was to
reduce the number of such entirely unreadable byte sequences.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> @@ -0,0 +1,154 @@
> +# Check 64bit APX NDD instructions with evex prefix encoding
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +inc    %rax,%rbx

Please can instructions be indented by a tab?

> +inc    %r31,%r8
> +inc    %r31,%r16
> +add    %r31b,%r8b,%r16b
> +addb    %r31b,%r8b,%r16b
> +add    %r31,%r8,%r16
> +addq    %r31,%r8,%r16
> +add    %r31d,%r8d,%r16d
> +addl    %r31d,%r8d,%r16d
> +add    %r31w,%r8w,%r16w
> +addw    %r31w,%r8w,%r16w
> +{store} add    %r31,%r8,%r16
> +{load}  add    %r31,%r8,%r16
> +add    %r31,(%r8),%r16
> +add    (%r31),%r8,%r16
> +add    0x9090(%r31,%r16,1),%r8,%r16
> +add    %r31,(%r8,%r16,8),%r16
> +add    $0x34,%r13b,%r17b
> +addl   $0x11,(%r19,%rax,4),%r20d
> +add    $0x1234,%ax,%r30w
> +add    $0x12344433,%r15,%r16
> +addq   $0x12344433,(%r15,%rcx,4),%r16
> +add    $0xfffffffff4332211,%rax,%r8
> +dec    %rax,%r17
> +decb   (%r31,%r12,1),%r8b
> +not    %rax,%r17
> +notb   (%r31,%r12,1),%r8b
> +neg    %rax,%r17
> +negb   (%r31,%r12,1),%r8b
> +sub    %r15b,%r17b,%r18b
> +sub    %r15d,(%r8),%r18d
> +sub    (%r15,%rax,1),%r16b,%r8b
> +sub    (%r15,%rax,1),%r16w,%r8w
> +subl   $0x11,(%r19,%rax,4),%r20d
> +sub    $0x1234,%ax,%r30w
> +sbb    %r15b,%r17b,%r18b
> +sbb    %r15d,(%r8),%r18d
> +sbb    (%r15,%rax,1),%r16b,%r8b
> +sbb    (%r15,%rax,1),%r16w,%r8w
> +sbbl   $0x11,(%r19,%rax,4),%r20d
> +sbb    $0x1234,%ax,%r30w
> +adc    %r15b,%r17b,%r18b
> +adc    %r15d,(%r8),%r18d
> +adc    (%r15,%rax,1),%r16b,%r8b
> +adc    (%r15,%rax,1),%r16w,%r8w
> +adcl   $0x11,(%r19,%rax,4),%r20d
> +adc    $0x1234,%ax,%r30w
> +or     %r15b,%r17b,%r18b
> +or     %r15d,(%r8),%r18d
> +or     (%r15,%rax,1),%r16b,%r8b
> +or     (%r15,%rax,1),%r16w,%r8w
> +orl    $0x11,(%r19,%rax,4),%r20d
> +or     $0x1234,%ax,%r30w
> +xor    %r15b,%r17b,%r18b
> +xor    %r15d,(%r8),%r18d
> +xor    (%r15,%rax,1),%r16b,%r8b
> +xor    (%r15,%rax,1),%r16w,%r8w
> +xorl   $0x11,(%r19,%rax,4),%r20d
> +xor    $0x1234,%ax,%r30w
> +and    %r15b,%r17b,%r18b
> +and    %r15d,(%r8),%r18d
> +and    (%r15,%rax,1),%r16b,%r8b
> +and    (%r15,%rax,1),%r16w,%r8w
> +andl   $0x11,(%r19,%rax,4),%r20d
> +and    $0x1234,%ax,%r30w
> +rorb   (%rax),%r31b

While there's a doc problem here as well, the question here and there is
the same: Which form of ROR does this represent, when the shift count
isn't specified explicitly? It could be %cl or $1. I would strongly
advise against introducing further ambiguous instruction patterns like
this, and instead demand that both %cl and $1 be always named explicitly
as operands.

> +ror    $0x2,%r12b,%r31b
> +rorl   $0x2,(%rax),%r31d
> +rorw   (%rax),%r31w
> +ror    %cl,%r16b,%r8b
> +rorw   %cl,(%r19,%rax,4),%r31w
> +rolb   (%rax),%r31b
> +rol    $0x2,%r12b,%r31b
> +roll   $0x2,(%rax),%r31d
> +rolw   (%rax),%r31w
> +rol    %cl,%r16b,%r8b
> +rolw   %cl,(%r19,%rax,4),%r31w
> +rcrb   (%rax),%r31b
> +rcr    $0x2,%r12b,%r31b
> +rcrl   $0x2,(%rax),%r31d
> +rcrw   (%rax),%r31w
> +rcr    %cl,%r16b,%r8b
> +rcrw   %cl,(%r19,%rax,4),%r31w
> +rclb   (%rax),%r31b
> +rcl    $0x2,%r12b,%r31b
> +rcll   $0x2,(%rax),%r31d
> +rclw   (%rax),%r31w
> +rcl    %cl,%r16b,%r8b
> +rclw   %cl,(%r19,%rax,4),%r31w
> +shlb   (%rax),%r31b
> +shl    $0x2,%r12b,%r31b
> +shll   $0x2,(%rax),%r31d
> +shlw   (%rax),%r31w
> +shl    %cl,%r16b,%r8b
> +shlw   %cl,(%r19,%rax,4),%r31w
> +sarb   (%rax),%r31b
> +sar    $0x2,%r12b,%r31b
> +sarl   $0x2,(%rax),%r31d
> +sarw   (%rax),%r31w
> +sar    %cl,%r16b,%r8b
> +sarw   %cl,(%r19,%rax,4),%r31w
> +shlb   (%rax),%r31b
> +shl    $0x2,%r12b,%r31b
> +shll   $0x2,(%rax),%r31d
> +shlw   (%rax),%r31w
> +shl    %cl,%r16b,%r8b
> +shlw   %cl,(%r19,%rax,4),%r31w
> +shrb   (%rax),%r31b
> +shr    $0x2,%r12b,%r31b
> +shrl   $0x2,(%rax),%r31d
> +shrw   (%rax),%r31w
> +shr    %cl,%r16b,%r8b
> +shrw   %cl,(%r19,%rax,4),%r31w
> +shld   $0x1,%r12,(%rax),%r31
> +shld   $0x2,%r8w,%r12w,%r31w
> +shld   $0x2,%r15d,(%rax),%r31d
> +shld   %cl,%r9w,(%rax),%r31w
> +shld   %cl,%r12,%r16,%r8
> +shld   %cl,%r13w,(%r19,%rax,4),%r31w
> +shrd   $0x1,%r12,(%rax),%r31
> +shrd   $0x2,%r8w,%r12w,%r31w
> +shrd   $0x2,%r15d,(%rax),%r31d
> +shrd   %cl,%r9w,(%rax),%r31w
> +shrd   %cl,%r12,%r16,%r8
> +shrd   %cl,%r13w,(%r19,%rax,4),%r31w
> +adcx   %r15d,%r8d,%r18d
> +adcx   (%r15,%r31,1),%r8d,%r18d
> +adcx   (%r15,%r31,1),%r8
> +adox   %r15d,%r8d,%r18d
> +adox   (%r15,%r31,1),%r8d,%r18d
> +adox   (%r15,%r31,1),%r8
> +cmovo  0x90909090(%eax),%edx,%r8d
> +cmovno 0x90909090(%eax),%edx,%r8d
> +cmovb  0x90909090(%eax),%edx,%r8d
> +cmovae 0x90909090(%eax),%edx,%r8d
> +cmove  0x90909090(%eax),%edx,%r8d
> +cmovne 0x90909090(%eax),%edx,%r8d
> +cmovbe 0x90909090(%eax),%edx,%r8d
> +cmova  0x90909090(%eax),%edx,%r8d
> +cmovs  0x90909090(%eax),%edx,%r8d
> +cmovns 0x90909090(%eax),%edx,%r8d
> +cmovp  0x90909090(%eax),%edx,%r8d
> +cmovnp 0x90909090(%eax),%edx,%r8d
> +cmovl  0x90909090(%eax),%edx,%r8d
> +cmovge 0x90909090(%eax),%edx,%r8d
> +cmovle 0x90909090(%eax),%edx,%r8d
> +cmovg  0x90909090(%eax),%edx,%r8d
> +imul   0x90909(%eax),%edx,%r8d
> +imul   0x909(%rax,%r31,8),%rdx,%r25

Overall there's also again the sorting criteria question: Without any
sorting, how is one to (easily) check full coverage?

> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -473,6 +473,7 @@ static bitfield opcode_modifiers[] =
>    BITFIELD (IntelSyntax),
>    BITFIELD (ISA64),
>    BITFIELD (NoEgpr),
> +  BITFIELD (NF),
>  };

This wants introducing earlier, when the BMI / BMI2 templates are touched
anyway. As said before, it would be very nice if within such a series the
same places wouldn't needlessly be touched more than once. You don't
implement parsing of NF here anyway, so how much in advance the attribute
is added to the opcode table is really irrelevant, and should hence be
done as conveniently as possible.

> --- a/opcodes/i386-opc.h
> +++ b/opcodes/i386-opc.h
> @@ -636,7 +636,10 @@ enum
>    /* How to encode VEX.vvvv:
>       0: VEX.vvvv must be 1111b.
>       1: VEX.vvvv encodes one of the register operands.
> +     2: VEX.vvvv encodes as the dest register operands.
>     */
> +#define VexVVVV_SRC   1
> +#define VexVVVV_DST   2
>    VexVVVV,

Nit: Singular in the new comment line, please (plus perhaps drop "as").
And "source" wants adding to the earlier comment line.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -138,6 +138,9 @@
>  #define Vsz512 Vsz=VSZ512
>  
>  #define APX_F APX_F|x64
> +#define VexVVVVSrc  VexVVVV=VexVVVV_SRC
> +#define VexVVVVDest VexVVVV=VexVVVV_DST

I don't think we need the former. Continuing to use just VexVVVV there
is going to be quite fine, and easier to read. Plus, I'm sorry to say
it this bluntly, it is entirely inappropriate to bloat an already
large patch by mechanically replacing VexVVVV by this new alias. If
such a change was really wanted, it would need separating out as an
entirely mechanical one.

For the latter, to help readability, how about DstVVVV?

> +
>  
>  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
>  // the bit to mark commutative VEX encodings where swapping the source

Please don't add stray (especially double) blank lines. Instead a
blank line would be wanted _ahead_ of the addition.

> @@ -190,6 +193,8 @@ mov, 0xf21, i386|No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { De
>  mov, 0xf21, x64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoRex64, { Debug, Reg64 }
>  mov, 0xf24, i386|No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Test, Reg32 }
>  
> +// Move after swapping the bytes
> +movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  // Move after swapping the bytes
>  movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  movbe, 0x60, Movbe|APX_F, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }

What is this addition about? Please can you look at your own patches before
sending them out?

> @@ -290,22 +295,36 @@ add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg3
>  add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>  add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
>  add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +add, 0x0, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> +add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> +add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}

Earlier review comments were not addressed. I'll stop here, as far as this
file is concerned, expecting a new version to be sent addressing earlier
comments and having been sanity checked and having the bogus VexVVVV
renaming dropped.

Jan


* Re: [PATCH 5/8] Support APX NDD
  2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
  2023-11-08 10:39   ` Jan Beulich
@ 2023-11-08 11:13   ` Jan Beulich
  2023-11-20 12:36     ` Cui, Lili
  2023-11-09  9:37   ` [PATCH 5/8] Support APX NDD Jan Beulich
  2 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-08 11:13 UTC (permalink / raw)
  To: Cui, Lili, konglin1; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> --- a/opcodes/i386-dis-evex-prefix.h
> +++ b/opcodes/i386-dis-evex-prefix.h
> @@ -338,10 +338,6 @@
>      { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
>      { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
>    },
> -  /* PREFIX_EVEX_MAP4_66 */
> -  {
> -    { "wrssK",	{ M, Gdq }, 0 },
> -  },
>    /* PREFIX_EVEX_MAP4_D8 */
>    {
>      { "sha1nexte", { XM, EXxmm }, 0 },

What's going on here?

> --- a/opcodes/i386-dis-evex-reg.h
> +++ b/opcodes/i386-dis-evex-reg.h
> @@ -56,6 +56,36 @@
>      { "blsmskS",	{ VexGdq, Edq }, 0 },
>      { "blsiS",		{ VexGdq, Edq }, 0 },
>    },
> +  /* REG_EVEX_MAP4_80 */
> +  {
> +    { "addA",	{ VexGb, Eb, Ib }, 0 },
> +    { "orA",	{ VexGb, Eb, Ib }, 0 },
> +    { "adcA",	{ VexGb, Eb, Ib }, 0 },
> +    { "sbbA",	{ VexGb, Eb, Ib }, 0 },
> +    { "andA",	{ VexGb, Eb, Ib }, 0 },
> +    { "subA",	{ VexGb, Eb, Ib }, 0 },
> +    { "xorA",	{ VexGb, Eb, Ib }, 0 },
> +  },
> +  /* REG_EVEX_MAP4_81 */
> +  {
> +    { "addQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "orQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "adcQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "sbbQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "andQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "subQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "xorQ",	{ VexGv, Ev, Iv }, 0 },
> +  },
> +  /* REG_EVEX_MAP4_83 */
> +  {
> +    { "addQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "orQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "adcQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "sbbQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "andQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "subQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "xorQ",	{ VexGv, Ev, sIb }, 0 },
> +  },

No sign of prefix decoding, and also no PREFIX_OPCODE present?

> @@ -63,3 +93,27 @@
>      { "aesencwide256kl",	{ M }, 0 },
>      { "aesdecwide256kl",	{ M }, 0 },
>    },
> +  /* REG_EVEX_MAP4_F6 */
> +  {
> +    { Bad_Opcode },
> +    { Bad_Opcode },
> +    { "notA",	{ VexGb, Eb }, 0 },
> +    { "negA",	{ VexGb, Eb }, 0 },
> +  },
> +  /* REG_EVEX_MAP4_F7 */
> +  {
> +    { Bad_Opcode },
> +    { Bad_Opcode },
> +    { "notQ",	{ VexGv, Ev }, 0 },
> +    { "negQ",	{ VexGv, Ev }, 0 },
> +  },
> +  /* REG_EVEX_MAP4_FE */
> +  {
> +    { "incA",   { VexGb ,Eb }, 0 },
> +    { "decA",   { VexGb ,Eb }, 0 },
> +  },
> +  /* REG_EVEX_MAP4_FF */
> +  {
> +    { "incQ",   { VexGv ,Ev }, 0 },
> +    { "decQ",   { VexGv ,Ev }, 0 },
> +  },

Same here, plus for the inc/dec some commas are misplaced. Padding also
looks to be inconsistent (tab vs blanks).

> --- a/opcodes/i386-dis-evex.h
> +++ b/opcodes/i386-dis-evex.h
>[...]
> @@ -947,23 +947,23 @@ static const struct dis386 evex_table[][256] = {
>      { Bad_Opcode },
>      { Bad_Opcode },
>      /* 40 */
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> +    { "cmovoS",		{ VexGv, Gv, Ev }, 0 },
> +    { "cmovnoS",	{ VexGv, Gv, Ev }, 0 },
> +    { "cmovbS",		{ VexGv, Gv, Ev }, 0 },
> +    { "cmovaeS",	{ VexGv, Gv, Ev }, 0 },
> +    { "cmoveS",		{ VexGv, Gv, Ev }, 0 },
> +    { "cmovneS",	{ VexGv, Gv, Ev }, 0 },
> +    { "cmovbeS",	{ VexGv, Gv, Ev }, 0 },
> +    { "cmovaS",		{ VexGv, Gv, Ev }, 0 },
>      /* 48 */
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> +    { "cmovsS",		{ VexGv, Gv, Ev }, 0 },
> +    { "cmovnsS",	{ VexGv, Gv, Ev }, 0 },
> +    { "cmovpS",		{ VexGv, Gv, Ev }, 0 },
> +    { "cmovnpS",	{ VexGv, Gv, Ev }, 0 },
> +    { "cmovlS",		{ VexGv, Gv, Ev }, 0 },
> +    { "cmovgeS",	{ VexGv, Gv, Ev }, 0 },
> +    { "cmovleS",	{ VexGv, Gv, Ev }, 0 },
> +    { "cmovgS",		{ VexGv, Gv, Ev }, 0 },

Considering CFCMOVcc which sits at the same opcode, doing things like
this sets us up for needing to touch all of these again. Maybe that's
the best that can be done, but I still wonder whether this couldn't
be taken care of right away when introducing these entries.

> @@ -989,7 +989,7 @@ static const struct dis386 evex_table[][256] = {
>      { Bad_Opcode },
>      { Bad_Opcode },
>      { "wrussK",	{ M, Gdq }, PREFIX_DATA },
> -    { PREFIX_TABLE (PREFIX_EVEX_MAP4_66) },
> +    { PREFIX_TABLE (PREFIX_0F38F6) },

Perhaps related to the earlier question: What's going on here?

> @@ -1060,7 +1060,7 @@ static const struct dis386 evex_table[][256] = {
>      { Bad_Opcode },
>      { Bad_Opcode },
>      { Bad_Opcode },
> -    { Bad_Opcode },
> +    { "shldS",		{ VexGv, Ev, Gv, CL }, 0 },
>      { Bad_Opcode },
>      { Bad_Opcode },
>      /* A8 */
> @@ -1069,9 +1069,9 @@ static const struct dis386 evex_table[][256] = {
>      { Bad_Opcode },
>      { Bad_Opcode },
>      { Bad_Opcode },
> +    { "shrdS",		{ VexGv, Ev, Gv, CL }, 0 },
>      { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> +    { "imulS",		{ VexGv, Gv, Ev }, 0 },

PREFIX_OPCODE (or prefix decoding) again missing in all of these?

> @@ -1091,8 +1091,8 @@ static const struct dis386 evex_table[][256] = {
>      { Bad_Opcode },
>      { Bad_Opcode },
>      /* C0 */
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> +    { REG_TABLE (REG_C0) },
> +    { REG_TABLE (REG_C1) },
>      { Bad_Opcode },
>      { Bad_Opcode },
>      { Bad_Opcode },
> @@ -1109,10 +1109,10 @@ static const struct dis386 evex_table[][256] = {
>      { Bad_Opcode },
>      { Bad_Opcode },
>      /* D0 */
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> +    { REG_TABLE (REG_D0) },
> +    { REG_TABLE (REG_D1) },
> +    { REG_TABLE (REG_D2) },
> +    { REG_TABLE (REG_D3) },

Some form of prefix decoding is going to be needed for these too, despite
the goal of wanting to re-use the legacy table entries. Perhaps adding
PREFIX_OPCODE there would be benign to the legacy insns?

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
>[...]
> @@ -2660,47 +2668,47 @@ static const struct dis386 reg_table[][8] = {
>    },
>    /* REG_D0 */
>    {
> -    { "rolA",	{ Eb, I1 }, 0 },
> -    { "rorA",	{ Eb, I1 }, 0 },
> -    { "rclA",	{ Eb, I1 }, 0 },
> -    { "rcrA",	{ Eb, I1 }, 0 },
> -    { "shlA",	{ Eb, I1 }, 0 },
> -    { "shrA",	{ Eb, I1 }, 0 },
> -    { "shlA",	{ Eb, I1 }, 0 },
> -    { "sarA",	{ Eb, I1 }, 0 },
> +    { "rolA",	{ VexGb, Eb, I1 }, 0 },
> +    { "rorA",	{ VexGb, Eb, I1 }, 0 },
> +    { "rclA",	{ VexGb, Eb, I1 }, 0 },
> +    { "rcrA",	{ VexGb, Eb, I1 }, 0 },
> +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
> +    { "shrA",	{ VexGb, Eb, I1 }, 0 },
> +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
> +    { "sarA",	{ VexGb, Eb, I1 }, 0 },
>    },
>    /* REG_D1 */
>    {
> -    { "rolQ",	{ Ev, I1 }, 0 },
> -    { "rorQ",	{ Ev, I1 }, 0 },
> -    { "rclQ",	{ Ev, I1 }, 0 },
> -    { "rcrQ",	{ Ev, I1 }, 0 },
> -    { "shlQ",	{ Ev, I1 }, 0 },
> -    { "shrQ",	{ Ev, I1 }, 0 },
> -    { "shlQ",	{ Ev, I1 }, 0 },
> -    { "sarQ",	{ Ev, I1 }, 0 },
> +    { "rolQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "rorQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "rclQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "rcrQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "shrQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "sarQ",	{ VexGv, Ev, I1 }, 0 },
>    },

As mentioned on the assembler side already, I think we would be better
off making const_1_mode print $1 in AT&T syntax at least for these new
insn forms, to eliminate the ambiguity.

> @@ -9061,6 +9069,15 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  	  ins->rex &= ~REX_B;
>  	  ins->rex2 &= ~REX_R;
>  	}
> +      if (ins->evex_type == evex_from_legacy)
> +	{
> +	  /* EVEX from legacy instructions, when the EVEX.ND bit is 0,
> +	     all bits of EVEX.vvvv and EVEX.V' must be 1.  */
> +	  if (!ins->vex.b && (ins->vex.register_specifier
> +				  || !ins->vex.v))
> +	    return &bad_opcode;
> +	  ins->rex |= REX_OPCODE;

What is this line about?

> @@ -9087,7 +9104,7 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  	return &err_opcode;
>  
>        /* Set vector length.  */
> -      if (ins->modrm.mod == 3 && ins->vex.b)
> +      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type == evex_default)
>  	ins->vex.length = 512;
>        else
>  	{

Is this change really needed for anything?

> @@ -9530,6 +9547,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>  		    {
>  		      oappend (&ins, "{bad}");
>  		      continue;
> +
>  		    }
>  
>  		  /* Instructions with a mask register destination allow for

Stray and bogus change.

> @@ -9553,7 +9571,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>  
>  	  /* Check whether rounding control was enabled for an insn not
>  	     supporting it.  */
> -	  if (ins.modrm.mod == 3 && ins.vex.b
> +	  if (ins.modrm.mod == 3 && ins.vex.b && ins.evex_type == evex_default
>  	      && !(ins.evex_used & EVEX_b_used))
>  	    {

This could do with extending the comment, mentioning the aliasing of
EVEX.brs and EVEX.nd.

> @@ -11013,7 +11031,7 @@ print_displacement (instr_info *ins, bfd_signed_vma val)
>  static void
>  intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>  {
> -  if (ins->vex.b)
> +  if (ins->vex.b && ins->evex_type == evex_default)
>      {
>        if (!ins->vex.no_broadcast)
>  	switch (bytemode)

This aliasing would also be worthwhile mentioning here and ...

> @@ -11946,7 +11964,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>  	  print_operand_value (ins, disp & 0xffff, dis_style_text);
>  	}
>      }
> -  if (ins->vex.b)
> +  if (ins->vex.b && ins->evex_type == evex_default)
>      {
>        ins->evex_used |= EVEX_b_used;

... here.

> @@ -13307,6 +13325,14 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>    if (!ins->need_vex)
>      return true;
>  
> +  if (ins->evex_type == evex_from_legacy)
> +    {
> +      ins->evex_used |= EVEX_b_used;
> +      /* Here vex.b is treated as "EVEX.ND.  */

Okay, here you have such a helpful comment, just that - nit - there's
an unbalanced double-quote. (The comment would also more logically
come first in this block.)

Jan


* Re: [PATCH 6/8] Support APX Push2/Pop2
  2023-11-02 11:29 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
@ 2023-11-08 11:44   ` Jan Beulich
  2023-11-08 12:52     ` Jan Beulich
  2023-11-22  5:48     ` Cui, Lili
  2023-11-09  9:57   ` Jan Beulich
  1 sibling, 2 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-08 11:44 UTC (permalink / raw)
  To: Cui, Lili, Mo, Zewei; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -256,6 +256,7 @@ enum i386_error
>      mask_not_on_destination,
>      no_default_mask,
>      unsupported_rc_sae,
> +    unsupported_rsp_register,
>      invalid_register_operand,
>      internal_error,
>    };
> @@ -5476,6 +5477,9 @@ md_assemble (char *line)
>  	case unsupported_rc_sae:
>  	  err_msg = _("unsupported static rounding/sae");
>  	  break;
> +	case unsupported_rsp_register:
> +	  err_msg = _("unsupported rsp register");
> +	  break;

Perhaps you mean "cannot be used with" or some such? Also the register
name needs conditionally prefixing with % in diagnostics.
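To illustrate the point about conditionally prefixing the register name, here is a minimal sketch. The function name, the `naked_reg` parameter and the message wording are assumptions made for this example only; they are not taken from the patch or from gas.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build a diagnostic that names the register with
   or without a leading '%', depending on whether registers are written
   "naked" (no prefix) in the current syntax.  The wording is invented
   for this sketch.  */
static void
format_rsp_diag (char *buf, size_t len, int naked_reg)
{
  const char *prefix = naked_reg ? "" : "%";

  snprintf (buf, len, "`%srsp' cannot be used with this instruction",
	    prefix);
}
```

With `naked_reg` zero the message names `%rsp`; with it non-zero the message names `rsp`.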

> @@ -6854,6 +6858,24 @@ check_VecOperands (const insn_template *t)
>  	}
>      }
>  
> +  /* Push2/Pop2 cannot use RSP and Pop2 cannot pop two same registers.  */
> +  if (t->opcode_modifier.push2pop2)

I question this way of recognizing these two insns: You introduce a
whole new table column here just to have two entries set this bit.
This is cheaper by comparing the mnemonic offsets, as we do elsewhere
in various cases.

I also disagree with putting the check in check_VecOperands():
There's nothing vector-ish here. Either you put it straight in the
caller, or you introduce a new check_APX_operands().
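As a rough sketch of what a separate, non-vector operand check could look like, the fragment below restates the two rules from the patch comment (no RSP operand; POP2 must not name the same register twice). The enum, the function name and the `is_pop2` flag are hypothetical; the real code would work on gas's own operand and template structures.

```c
#include <stdbool.h>

/* Hypothetical result codes, for illustration only.  */
enum apx_operand_error
{
  APX_OK,
  APX_ERR_RSP,		/* Register number 4 (RSP) used as an operand.  */
  APX_ERR_SAME_REG	/* POP2 with two identical registers.  */
};

/* Sketch of a dedicated PUSH2/POP2 operand check, taking plain register
   numbers instead of assembler state.  */
static enum apx_operand_error
check_push2pop2_operands (unsigned int reg1, unsigned int reg2,
			  bool is_pop2)
{
  if (reg1 == 4 || reg2 == 4)
    return APX_ERR_RSP;
  if (is_pop2 && reg1 == reg2)
    return APX_ERR_SAME_REG;
  return APX_OK;
}
```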

> +    {
> +      unsigned int reg1 = register_number (i.op[0].regs);
> +      unsigned int reg2 = register_number (i.op[1].regs);
> +
> +      if (reg1 == 0x4 || reg2 == 0x4)
> +	{
> +	  i.error = unsupported_rsp_register;
> +	  return 1;
> +	}
> +      if (t->base_opcode == 0x8f && reg1 == reg2)
> +	{
> +	  i.error = invalid_dest_and_src_register_set;

This enumerator's diagnostic talks about source and destination
register, which isn't applicable here.

> --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> @@ -30,3 +30,9 @@ _start:
>          .byte 0xff
>          #{evex} inc %rax EVEX.vvvv' > 0 (illegal value).
>          .byte 0x62, 0xf4, 0xec, 0x08, 0xff, 0xc0
> +        .byte 0xff, 0xff
> +	# pop2 %rax, %rbx set EVEX.ND=0.
> +        .byte 0x62,0xf4,0x64,0x08,0x8f,0xc0
> +        .byte 0xff, 0xff, 0xff
> +	# pop2 %rax, %rsp set EVEX.VVVV=0xf.
> +        .byte 0x62,0xf4,0x7c,0x18,0x8f,0xc0

This 2nd comment looks bogus. What is it that's being tested here?

Also again note indentation inconsistencies.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
> @@ -0,0 +1,15 @@
> +# Check illegal APX-Push2Pop2 instructions
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +	push2  %eax, %ebx

It's okay to test 32-bit operands, but more important is to test
16-bit ones, as only those could (also) be used with PUSH/POP.

> --- a/opcodes/i386-dis-evex-mod.h
> +++ b/opcodes/i386-dis-evex-mod.h
> @@ -1,4 +1,9 @@
>  /* Nothing at present.  */
> +  /* MOD_EVEX_MAP4_8F_R_0 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_EVEX_MAP4_8F_R_0_M_1) },
> +  },
>    /* MOD_EVEX_MAP4_DA_PREFIX_1 */
>    {
>      { Bad_Opcode },
> @@ -41,3 +46,8 @@
>    {
>      { "movdiri",	{ Edq, Gdq }, 0 },
>    },
> +  /* MOD_EVEX_MAP4_FF_R_6 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_EVEX_MAP4_FF_R_6_M_1) },
> +  },

Same comment as before regarding additions to this file.

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
>[...]
> @@ -9011,6 +9020,8 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  	case 0x4:
>  	  vex_table_index = EVEX_MAP4;
>  	  ins->evex_type = evex_from_legacy;
> +	  if (ins->address_mode != mode_64bit)
> +	    return &bad_opcode;
>  	  break;

This looks to belong into an earlier patch.

> @@ -9073,8 +9084,9 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  	{
>  	  /* EVEX from legacy instructions, when the EVEX.ND bit is 0,
>  	     all bits of EVEX.vvvv and EVEX.V' must be 1.  */
> -	  if (!ins->vex.b && (ins->vex.register_specifier
> -				  || !ins->vex.v))
> +	  if (ins->vex.ll || (!ins->vex.b
> +			      && (ins->vex.register_specifier
> +				  || !ins->vex.v)))
>  	    return &bad_opcode;

This as well.
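For reference, the condition in the hunk above can be restated as a standalone predicate. This is only a sketch: the function and parameter names are made up, and it assumes (as the quoted code suggests) that the stored vvvv specifier is zero and the stored V' bit is one when the encoded fields are all ones.

```c
#include <stdbool.h>

/* Sketch of the validity rule for EVEX encodings promoted from legacy
   insns, as expressed by the quoted condition:
   - the vector length bits must be zero;
   - when ND is clear, the vvvv specifier must be zero and the V' bit
     must be set.  */
static bool
evex_from_legacy_fields_valid (unsigned int ll, unsigned int nd,
			       unsigned int vvvv_specifier,
			       unsigned int v)
{
  if (ll)
    return false;
  if (!nd && (vvvv_specifier || !v))
    return false;
  return true;
}
```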

> @@ -13821,3 +13836,24 @@ PREFETCHI_Fixup (instr_info *ins, int bytemode, int sizeflag)
>  
>    return OP_M (ins, bytemode, sizeflag);
>  }
> +
> +static bool
> +PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
> +{
> +  unsigned int vvvv_reg = ins->vex.register_specifier
> +    | !ins->vex.v << 4;

Nit: Please parenthesize the shift.
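A small sketch of why this is only a readability nit: in C, `<<` binds tighter than `|`, so both spellings below compute the same value. The function names are invented for the example.

```c
/* As written in the patch: relies on '<<' binding tighter than '|'.  */
static unsigned int
vvvv_reg_unparenthesized (unsigned int vvvv, unsigned int v)
{
  return vvvv | !v << 4;
}

/* With the shift parenthesized: same value, intent visible at a
   glance.  */
static unsigned int
vvvv_reg_parenthesized (unsigned int vvvv, unsigned int v)
{
  return vvvv | (!v << 4);
}
```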

> +  unsigned int rm_reg = ins->modrm.rm + (ins->rex & REX_B ? 8 : 0)
> +    + (ins->rex2 & REX_B ? 16 : 0);
> +
> +  /* Here vex.b is treated as "EVEX.ND.  */
> +  /* Push2/Pop2 cannot use RSP and Pop2 cannot pop two same registers.  */

The two comments want folding. As to the former, though: How about having

#define nd b

in the EVEX struct declaration (provided we don't have any variables named
"nd" right now), ...

> +  if (!ins->vex.b || vvvv_reg == 0x4 || rm_reg == 0x4

... allowing to use ins->vex.nd here (at which point that comment is
unnecessary)?
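A minimal sketch of the aliasing idea, using a stand-in struct rather than the real one from i386-dis.c (the struct and function names here are assumptions). Note that such a macro rewrites every identifier spelled `nd` after its definition, which is why the suggestion is conditional on no such variable existing.

```c
/* Stand-in for the relevant part of the EVEX state; only the aliased
   field is modelled.  */
struct evex_bits_sketch
{
  unsigned int b;
#define nd b	/* EVEX.ND shares the bit position of EVEX.b.  */
};

/* Reads the shared bit under its APX name; expands to e->b.  */
static int
nd_is_set (const struct evex_bits_sketch *e)
{
  return e->nd != 0;
}
```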

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -3494,3 +3494,10 @@ erets, 0xf20f01ca, FRED|x64, NoSuf, {}
>  eretu, 0xf30f01ca, FRED|x64, NoSuf, {}
>  
>  // FRED instructions end.
> +
> +// APX Push2/Pop2 instruction.
> +
> +push2, 0xff/6, APX_F, Modrm|VexW0|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
> +push2p, 0xff/6, APX_F, Modrm|VexW1|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
> +pop2, 0x8f/0, APX_F, Modrm|VexW0|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
> +pop2p, 0x8f/0, APX_F, Modrm|VexW1|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }

Like other extensions have it, there also wants to be an "end" comment.

Jan


* Re: [PATCH 6/8] Support APX Push2/Pop2
  2023-11-08 11:44   ` Jan Beulich
@ 2023-11-08 12:52     ` Jan Beulich
  2023-11-22  5:48     ` Cui, Lili
  1 sibling, 0 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-08 12:52 UTC (permalink / raw)
  To: Cui, Lili, Mo, Zewei; +Cc: hongjiu.lu, ccoutant, binutils

On 08.11.2023 12:44, Jan Beulich wrote:
> On 02.11.2023 12:29, Cui, Lili wrote:
>> @@ -6854,6 +6858,24 @@ check_VecOperands (const insn_template *t)
>>  	}
>>      }
>>  
>> +  /* Push2/Pop2 cannot use RSP and Pop2 cannot pop two same registers.  */
>> +  if (t->opcode_modifier.push2pop2)
> 
> I question this way of recognizing these two insns: You introduce a
> whole new table column here just to have two entries set this bit.
> This is cheaper by comparing the mnemonic offsets, as we do elsewhere
> in various cases.

Well, it's 4 rows, not 2, so the mnemonic offset suggestion isn't quite
nice. But there still is no need for a whole new attribute. Just another
OperandConstraint value ought to do.

Jan


* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-06 15:39   ` Jan Beulich
@ 2023-11-09  8:02     ` Cui, Lili
  2023-11-09 10:52       ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-09  8:02 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -269,9 +275,17 @@ struct dis_private {
> >        ins->rex_used |= REX_OPCODE;			\
> >    }
> >
> > +#define USED_REX2(value)				\
> > +  {							\
> > +    if ((ins->rex2 & value))				\
> > +      ins->rex2_used |= value;				\
> > +  }
> > +
> >
> >  #define EVEX_b_used 1
> 
> Nit: Please avoid (re)introducing double blank lines. Instead ...
> 

Done.

> >  #define EVEX_len_used 2
> > +/* M0 in rex2 prefix represents map0 or map1.  */
> > +#define REX2_M 0x8
> 
> ... a blank line ahead of this insertion would be helpful.
> 

Done.

> > @@ -1872,23 +1888,23 @@ static const struct dis386 dis386[] = {
> > +  { "scasB",		{ AL, Yb }, PREFIX_REX2_ILLEGAL },
> > +  { "scasS",		{ eAX, Yv }, PREFIX_REX2_ILLEGAL },
> >    /* b0 */
> >    { "movB",		{ RMAL, Ib }, 0 },
> >    { "movB",		{ RMCL, Ib }, 0 },
> 
> Like in the i386-gen.c adjustments for row E look to be missing here, too.
> 

Added them, and also added a bad testcase for row E.

> > @@ -2091,12 +2107,12 @@ static const struct dis386 dis386_twobyte[] = {
> >    { PREFIX_TABLE (PREFIX_0F2E) },
> >    { PREFIX_TABLE (PREFIX_0F2F) },
> >    /* 30 */
> > -  { "wrmsr",		{ XX }, 0 },
> > -  { "rdtsc",		{ XX }, 0 },
> > -  { "rdmsr",		{ XX }, 0 },
> > -  { "rdpmc",		{ XX }, 0 },
> > -  { "sysenter",		{ SEP }, 0 },
> > -  { "sysexit%LQ",	{ SEP }, 0 },
> > +  { "wrmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
> > +  { "rdtsc",		{ XX }, PREFIX_REX2_ILLEGAL },
> > +  { "rdmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
> > +  { "rdpmc",		{ XX }, PREFIX_REX2_ILLEGAL },
> > +  { "sysenter",		{ SEP }, PREFIX_REX2_ILLEGAL },
> > +  { "sysexit%LQ",	{ SEP }, PREFIX_REX2_ILLEGAL },
> >    { Bad_Opcode },
> >    { "getsec",		{ XX }, 0 },
> >    /* 38 */
> 
> Down from here row 8 also wants adjustment afaict.
> 

Done.

> > @@ -8289,6 +8313,7 @@ ckprefix (instr_info *ins)  {
> >    int i, length;
> >    uint8_t newrex;
> > +  unsigned char rex2_payload;
> 
> Please can this be restricted to the inner scope where it's used?
> 

Done.

> > @@ -9292,13 +9338,17 @@ print_insn (bfd_vma pc, disassemble_info
> *info, int intel_syntax)
> >        goto out;
> >      }
> >
> > -  if (*ins.codep == 0x0f)
> > +  /* M0 in rex2 prefix represents map0 or map1.  */
> > +  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> 
> I'm struggling with the M0 in the comment. DYM just M, or maybe REX2.M?
> 

Changed to REX2.M.

> Also is this, ...
> 
> >      {
> >        unsigned char threebyte;
> >
> > -      ins.codep++;
> > -      if (!fetch_code (info, ins.codep + 1))
> > -	goto fetch_error_out;
> > +      if (!ins.rex2)
> > +	{
> > +	  ins.codep++;
> > +	  if (!fetch_code (info, ins.codep + 1))
> > +	    goto fetch_error_out;
> > +	}
> >        threebyte = *ins.codep;
> >        dp = &dis386_twobyte[threebyte];
> >        ins.need_modrm = twobyte_has_modrm[threebyte];
> 
> ... all the way to here, really correct for d5 00 0f?
> 

I think the 0f here must indicate that it is the first byte of the legacy map1 instruction, meaning legacy map0 does not have 0f opcode. If this instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80. If a bad binary does appear, our original code also has the same issue.

static const struct dis386 dis386[] = {
...
/* 0f */
{ Bad_Opcode },       /* 0x0f extended opcode escape */

> > @@ -9454,6 +9504,14 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        goto out;
> >      }
> >
> > +  if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
> > +      && ins.last_rex2_prefix >= 0)
> > +    {
> > +      i386_dis_printf (info, dis_style_text, "(bad)");
> > +      ret = ins.end_codep - priv.the_buffer;
> > +      goto out;
> > +    }
> > +
> >    switch (dp->prefix_requirement)
> >      {
> >      case PREFIX_DATA:
> > @@ -9468,6 +9526,7 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        ins.used_prefixes |= PREFIX_DATA;
> >        /* Fall through.  */
> >      case PREFIX_OPCODE:
> > +    case PREFIX_OPCODE | PREFIX_REX2_ILLEGAL:
> 
> May more robust to mask off PREFIX_REX2_ILLEGAL in the control expression
> of the switch()? Or else why don't you move the if() immediately ahead of the
> switch() into here, as a new case block?
> 

Changed it to

+  switch (dp->prefix_requirement & ~PREFIX_REX2_ILLEGAL)
     {
     case PREFIX_DATA:
       /* If only the data prefix is marked as mandatory, its absence renders
@@ -9600,7 +9599,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       ins.used_prefixes |= PREFIX_DATA;
       /* Fall through.  */
     case PREFIX_OPCODE:
-    case PREFIX_OPCODE | PREFIX_REX2_ILLEGAL:
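The effect of masking the flag out of the control expression can be sketched as follows. The constant values and the function are invented for this illustration and are not those of i386-dis.c.

```c
/* Made-up values; only the shape matters for this sketch.  */
enum
{
  SK_PREFIX_DATA = 1,
  SK_PREFIX_OPCODE = 2,
  SK_PREFIX_REX2_ILLEGAL = 0x10	/* Orthogonal flag bit.  */
};

/* Classify a prefix requirement.  Masking off the flag bit first means
   no "case X | SK_PREFIX_REX2_ILLEGAL" labels are needed.  */
static int
classify_prefix_requirement (unsigned int req)
{
  switch (req & ~SK_PREFIX_REX2_ILLEGAL)
    {
    case SK_PREFIX_DATA:
      return 1;
    case SK_PREFIX_OPCODE:
      return 2;
    default:
      return 0;
    }
}
```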


> > @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        && !ins.need_vex && ins.last_rex_prefix >= 0)
> >      ins.all_prefixes[ins.last_rex_prefix] = 0;
> >
> > +  /* Check if the REX2 prefix is used.  */
> > +  if (ins.last_rex2_prefix >= 0
> > +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> > +	   && (ins.rex2 & 0x7))
> 
> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
>

Here's an example of a negative scenario: when ins.rex2 == 1 and ins.rex2_used == 1, we want to clear last_rex2_prefix, because the insn uses an EGPR and we don't want to add {rex2} to it.
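The two conditions under discussion can be compared side by side in a small sketch. The function names are invented; the expressions are the ones quoted above, applied to plain integers instead of the disassembler state.

```c
#include <stdbool.h>

/* Condition as written in the patch: at least one of the low three
   payload bits is set, and the used bits match the set bits exactly.  */
static bool
rex2_low_bits_all_used (unsigned int rex2, unsigned int rex2_used)
{
  return ((rex2 & 0x7) ^ (rex2_used & 0x7)) == 0 && (rex2 & 0x7) != 0;
}

/* Condition from the review question: some set low bit went unused.  */
static bool
rex2_low_bits_some_unused (unsigned int rex2, unsigned int rex2_used)
{
  return ((rex2 & 7) & ~(rex2_used & 7)) != 0;
}
```

For rex2 == 1 and rex2_used == 1, the first predicate is true and the second is false, which is the case described in the reply.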

> > +	  || dp == &bad_opcode))
> 
> What is this last part of the condition about? Other prefix zapping code
> doesn't have such.

Deleted it; there is no impact on the current testcases. I don't know what the author's intention was at that time.

> 
> > +    ins.all_prefixes[ins.last_rex2_prefix] = 0;
> > +
> >    /* Check if the SEG prefix is used.  */
> >    if ((ins.prefixes & (PREFIX_CS | PREFIX_SS | PREFIX_DS | PREFIX_ES
> >  		       | PREFIX_FS | PREFIX_GS)) != 0 @@ -9541,7 +9607,10
> @@
> > print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
> >  	if (name == NULL)
> >  	  abort ();
> >  	prefix_length += strlen (name) + 1;
> > -	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
> > +	if (ins.all_prefixes[i] == REX2_OPCODE)
> > +	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);
> 
> Do braces really count as part of the mnemonic?
> 

Yes, the rex2 prefix is printed as the mnemonic {rex2}, unlike the rex prefix, which is printed as rex, rex.B, ...

> > +	else
> > +	  i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
> >        }
> 
> Aren't you at risk of wrongly printing a REX prefix here if the high 4 bits of the
> REX2 payload were all zero, but some of the low 4 bits turned out unused?
> 
For d500, I think we should print rex2 for it.

> > @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned int
> reg, unsigned int rexmask,
> >      ins->illegal_masking = true;
> >
> >    USED_REX (rexmask);
> > +  USED_REX2 (rexmask);
> 
> Do both really need tracking separately? Whatever consumes REX.B will also
> consume REX2.B4, an so on.
> 
I was confused here. I think we only need to print {rex2} when the upper 4 bits are *000, which means no egpr is used and we need {rex2} to distinguish the encoding from the legacy one. Maybe we need neither ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0 nor USED_REX2 (rexmask). I intend to delete them.

+  /* Check if the REX2 prefix is used.  */
+  if (ins.last_rex2_prefix >= 0
+      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
+	   && (ins.rex2 & 0x7))

Thanks,
Lili


* Re: [PATCH 5/8] Support APX NDD
  2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
  2023-11-08 10:39   ` Jan Beulich
  2023-11-08 11:13   ` Jan Beulich
@ 2023-11-09  9:37   ` Jan Beulich
  2023-11-20  1:33     ` Cui, Lili
  2 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-09  9:37 UTC (permalink / raw)
  To: Cui, Lili, konglin1; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> @@ -190,6 +193,8 @@ mov, 0xf21, i386|No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { De
>  mov, 0xf21, x64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoRex64, { Debug, Reg64 }
>  mov, 0xf24, i386|No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Test, Reg32 }
>  
> +// Move after swapping the bytes
> +movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  // Move after swapping the bytes
>  movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  movbe, 0x60, Movbe|APX_F, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> @@ -290,22 +295,36 @@ add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg3
>  add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>  add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
>  add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +add, 0x0, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> +add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> +add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}

Why are there (just as an example) only 3 new forms of ADD, but ...

> @@ -338,10 +366,19 @@ adc, 0x10, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg
>  adc, 0x83/2, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>  adc, 0x14, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
>  adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> +adc, 0x83/2, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> +adc, 0x80/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }

... 6 new forms of ADC? My guess is that's NF-related, but doesn't that
mean that until NF support is added e.g. "{evex} add %eax, %eax" won't
assemble as intended? IOW in line with NF as an attribute being added right
here, those other templates also will want introducing right here.

While looking at patch 7 I'm also wondering whether same-base-opcode forms
wouldn't better be kept together. I haven't yet finished looking at what's
needed there, but even if it doesn't turn out strictly necessary there it
may still be a good idea anyway (unless of course that would get in the
way of anything).

Jan

* Re: [PATCH 6/8] Support APX Push2/Pop2
  2023-11-02 11:29 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
  2023-11-08 11:44   ` Jan Beulich
@ 2023-11-09  9:57   ` Jan Beulich
  1 sibling, 0 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-09  9:57 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -3494,3 +3494,10 @@ erets, 0xf20f01ca, FRED|x64, NoSuf, {}
>  eretu, 0xf30f01ca, FRED|x64, NoSuf, {}
>  
>  // FRED instructions end.
> +
> +// APX Push2/Pop2 instruction.
> +
> +push2, 0xff/6, APX_F, Modrm|VexW0|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
> +push2p, 0xff/6, APX_F, Modrm|VexW1|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
> +pop2, 0x8f/0, APX_F, Modrm|VexW0|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
> +pop2p, 0x8f/0, APX_F, Modrm|VexW1|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }

I think the use of the VexVVVV attribute here wants to be consistent with
that in the earlier patch used for e.g. NEG: Nominally (i.e. notation-wise)
the last operand is the destination operand in both cases (even if in
practice it's a source one here). The resulting encoding being the same is
merely an effect resulting from ModR/M.reg being used for an opcode
extension in both cases (which takes priority over register assignment in
build_modrm_byte()). But: I came to spot the inconsistency when looking at
patch 7, which keys off of VexVVVVDest. Obviously when making things
consistent here, there would then need to be measures to avoid the
optimization kicking in on the insns here. Hence an alternative to using
VexVVVVDest here would be a clarifying / justifying comment.

Jan

* Re: [PATCH 7/8] Support APX NDD optimized encoding.
  2023-11-02 11:29 ` [PATCH 7/8] Support APX NDD optimized encoding Cui, Lili
@ 2023-11-09 10:36   ` Jan Beulich
  2023-11-10  5:43     ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-09 10:36 UTC (permalink / raw)
  To: Cui, Lili, Hu, Lin1; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -7208,6 +7208,44 @@ check_EgprOperands (const insn_template *t)
>    return 0;
>  }
>  
> +/* Optimize APX NDD insns to non-NDD insns.  */
> +
> +static bool
> +optimize_NDD_to_nonNDD (const insn_template *t)
> +{
> +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> +      && t->opcode_space == SPACE_EVEXMAP4
> +      && !i.has_nf

As mentioned before, I'd still prefer the optimization to only be
added after NF handling was put in place. But I'm not meaning to make
this a strict requirement: Introducing the has_nf field here (with
it only being read, never [explicitly] written) is certainly okay-ish.
(But see also below.)

Similarly I'm concerned about the ND form of CFCMOVcc, which isn't there
yet in the patches, but which will also need excluding from this
optimization. Obviously this concern then extends to any future ND-
encoded insns, which (likely) won't have legacy-encoded (and hence
REX2-encodable) counterparts. Are there any plans how to deal with
such? (There's a possible approach mentioned further down.)

> +      && i.reg_operands >= 2
> +      && i.types[i.operands - 1].bitfield.class == Reg)

Isn't this implicit from the VexVVVV check further up?

> +    {
> +      unsigned int readonly_var = ~0;
> +      unsigned int dest = i.operands - 1;
> +      unsigned int src1 = (i.operands > 2) ? i.operands - 2 : 0;

Since we already know i.operands >= 2 from the earlier check of
i.reg_operands, can't this simply be

      unsigned int src1 = i.operands - 2;

?
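
To spell out why the two forms agree, here is a small standalone sketch
(not code from tc-i386.c; the function names are invented for
illustration, only the two expressions are taken from the patch and from
the suggestion above). It assumes the operand count is at least 2, which
is what the earlier i.reg_operands check establishes:

```c
#include <assert.h>
#include <stdbool.h>

/* Expression as written in the patch.  */
unsigned int
src1_as_posted (unsigned int operands)
{
  return (operands > 2) ? operands - 2 : 0;
}

/* Simplified expression; only meaningful when operands >= 2.  */
unsigned int
src1_simplified (unsigned int operands)
{
  assert (operands >= 2);
  return operands - 2;
}

/* True if both forms agree for every operand count in [2, max].  */
bool
src1_forms_agree (unsigned int max)
{
  for (unsigned int n = 2; n <= max; n++)
    if (src1_as_posted (n) != src1_simplified (n))
      return false;
  return true;
}
```

For a count of exactly 2 both forms yield 0, so the conditional only ever
matters for counts below 2, which are excluded here.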

> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> +
> +      if (i.types[src1].bitfield.class == Reg
> +	  && i.op[src1].regs == i.op[dest].regs)
> +	readonly_var = src2;

As can be seen in the testcase, this also results in ADCX/ADOX being
converted to non-ND EVEX forms, i.e. even when that's not a win at all.
We shouldn't change what the user has written when the encoding doesn't
actually improve. (Or else, but I'd be hesitant to accept that, at the
very least the effect would need pointing out in the description or
even a code comment, so that later on it is possible to figure out
whether that was intentional or an oversight.)
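
A rough way to picture the "not a win" case, as a simplified sketch only:
it compares prefix sizes and nothing else, and the toy_* names are made
up here rather than taken from gas. REX2 is 0xd5 plus one payload byte,
while EVEX is 0x62 plus three payload bytes, so dropping the ND operand
helps only if the two-operand form can leave EVEX behind:

```c
#include <assert.h>
#include <stdbool.h>

enum toy_prefix { TOY_PFX_REX2, TOY_PFX_EVEX };

/* Prefix sizes in bytes: REX2 is 0xd5 plus one payload byte, EVEX is
   0x62 plus three payload bytes.  */
unsigned int
toy_prefix_len (enum toy_prefix p)
{
  assert (p == TOY_PFX_REX2 || p == TOY_PFX_EVEX);
  return p == TOY_PFX_REX2 ? 2 : 4;
}

/* Dropping the ND operand is only worthwhile if the two-operand form
   gets by with a shorter prefix than the ND form.  */
bool
toy_conversion_wins (enum toy_prefix nd_form, enum toy_prefix two_op_form)
{
  return toy_prefix_len (two_op_form) < toy_prefix_len (nd_form);
}
```

With both forms needing EVEX, as for ADCX/ADOX in the testcase, the
predicate is false and the insn would be left as written.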

This is where my template ordering remark in reply to patch 5 comes
into play: Whether invoking re-parse is okay would further need to
depend on whether an alternative (earlier) template actually allows
REX2 encoding (same base-opcode could be one of the criteria for how
far to look back through earlier templates; an option might also be to
put the 3-operand templates first, so that looking backwards wouldn't
be necessary in the first place). This would then likely also address
one of the forward looking concerns I've raised above.

> +      /* adcx, adox and imul don't have D bit.  */
> +      else if (i.types[src2].bitfield.class == Reg
> +	       && i.op[src2].regs == i.op[dest].regs
> +	       && t->opcode_modifier.commutative)

There's a disconnect between comment and code here: You don't use the
D attribute, so why is it being mentioned?

> +	readonly_var = src1;
> +      if (readonly_var != (unsigned int) ~0)
> +	{
> +	  --i.operands;
> +	  --i.reg_operands;
> +	  --i.tm.operands;
> +
> +	  if (readonly_var != src2)
> +	    swap_2_operands (readonly_var, src2);

May I suggest that just out of precaution the swapping be done before
operand counts are decremented? In principle swap_2_operands() could
do with having assertions added as to it actually dealing with valid
operands. (You'll note that elsewhere, when we add a new operand, we
increment first and then swap.)
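
The ordering can be sketched on a toy structure (purely illustrative; the
struct and toy_* functions below are not the real i386_insn or
swap_2_operands(), and the assertion is the kind of check meant above,
not something that exists today):

```c
#include <assert.h>

#define TOY_MAX_OPS 5

struct toy_insn
{
  unsigned int operands;	/* Number of valid entries in op[].  */
  int op[TOY_MAX_OPS];
};

/* Swap two operands, insisting that both are still valid.  */
void
toy_swap_2_operands (struct toy_insn *insn, unsigned int a, unsigned int b)
{
  assert (a < insn->operands && b < insn->operands);
  int tmp = insn->op[a];
  insn->op[a] = insn->op[b];
  insn->op[b] = tmp;
}

/* Drop the last (destination) operand of an insn with at least two
   operands, moving operand KEEP into slot SLOT first.  */
void
toy_drop_dest (struct toy_insn *insn, unsigned int keep, unsigned int slot)
{
  assert (insn->operands >= 2);
  if (keep != slot)
    toy_swap_2_operands (insn, keep, slot);	/* Swap first ...  */
  --insn->operands;				/* ... then shrink.  */
}
```

Swapping while the count is still at its original value means such an
assertion can never trip merely because of the order of the two steps.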

> +	  return 1;
> +	}
> +    }
> +  return 0;
> +}

Nit: A function returning bool would preferably return true/false, not
0/1.

> @@ -7728,6 +7766,14 @@ match_template (char mnem_suffix)
>  	  i.memshift = memshift;
>  	}
>  
> +      /* If we can optimize a NDD insn to non-NDD insn, like
> +	 add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
> +      if (optimize == 1 && optimize_NDD_to_nonNDD (t))

So you do this optimization at -O1, but not at -O2? Imo the "== 1"
simply needs dropping. Furthermore the {nooptimize} and {evex} pseudo
prefixes need respecting. Quite likely respecting {evex} would
eliminate the need for the explicit .has_nf check in the helper
function, as I expect .vec_encoding to be set alongside that bit
anyway. Further quite likely respecting {evex} here will mean that in
patch 3 you need to introduce a new enumerator (e.g.
vex_encoding_egpr, vaguely similar to vex_encoding_evex512), to avoid
setting .vec_encoding to vex_encoding_evex when an eGPR is parsed.

As to optimization level: In build_vex_prefix() we leverage C only
at -O2 or higher (including -Os). We may want to be consistent in this
regard here (i.e. by an extra check in the helper function).
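
Put together, the conditions asked for here might look roughly like the
following standalone sketch. Everything in it is a stand-in: the struct,
the enum and the toy_* functions are invented for illustration and do not
mirror the actual gas variables or the .vec_encoding enumerators:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for assembler state; names are illustrative.  */
enum toy_encoding { TOY_ENC_DEFAULT, TOY_ENC_EVEX };

struct toy_state
{
  int optimize;			/* 0, 1, 2, ... as selected by -O<n>.  */
  bool optimize_for_space;	/* -Os.  */
  bool no_optimize;		/* {nooptimize} seen.  */
  enum toy_encoding encoding;	/* {evex} seen -> TOY_ENC_EVEX.  */
};

/* May an ND insn be converted to its two-operand form at all?  */
bool
toy_may_convert (const struct toy_state *s)
{
  assert (s->optimize >= 0);
  if (!s->optimize && !s->optimize_for_space)
    return false;		/* Not optimizing.  */
  if (s->no_optimize || s->encoding == TOY_ENC_EVEX)
    return false;		/* User asked to keep things as written.  */
  return true;
}

/* Operand swapping via the C attribute is held to the stricter bar of
   -O2 or higher (or -Os).  */
bool
toy_may_swap_commutative (const struct toy_state *s)
{
  return toy_may_convert (s) && (s->optimize > 1 || s->optimize_for_space);
}
```

The point of splitting the two predicates is that the plain conversion
applies at any optimization level, while the commutative swap follows the
stricter threshold.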

> +	{
> +	  t = current_templates->start - 1;

As per a remark further up, this adjustment could be avoided if the ND
templates came ahead of the legacy ones. They can't be wrongly used in
place of the legacy ones, due to the extra operand they require. Then
a comment here would merely point out this ordering aspect. But of
course care will then need to be taken to not go past i386_optab[]'s
bounds (by having suitably ordered conditionals when looking for
whether there is an alternative template in the first place; again see
the respective remark further up).
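
As a sketch of what "suitably ordered conditionals" could mean with the
ND templates placed first: the bound is tested before the candidate is
dereferenced. The template struct and toy_find_legacy_form() are
hypothetical, and using the base opcode as the stop criterion is just one
of the options mentioned above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_template
{
  unsigned int base_opcode;
  unsigned int operands;
  bool rex2_encodable;
};

/* Starting after ND template T, look for a following template with the
   same base opcode, one operand fewer, and a REX2-capable encoding.
   END points one past the last table entry.  */
const struct toy_template *
toy_find_legacy_form (const struct toy_template *t,
		      const struct toy_template *end)
{
  assert (t < end);
  for (const struct toy_template *alt = t + 1;
       alt < end && alt->base_opcode == t->base_opcode;	/* Bound first.  */
       ++alt)
    if (alt->operands + 1 == t->operands && alt->rex2_encodable)
      return alt;
  return NULL;
}
```

A NULL result would then mean "leave the insn alone", which also covers
ND insns that have no legacy counterpart.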

> +	  continue;
> +	}

Btw, considering this re-matching, I wonder whether "convert" wouldn't
be better in the function name compared to "optimize".

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
> @@ -0,0 +1,117 @@
> +# Check 64bit APX NDD instructions with optimized encoding
> +
> +	.text
> +_start:
> +inc    %r31,%r31
> +incb   %r31b,%r31b
> +add    %r31,%r8,%r8
> +addb   %r31b,%r8b,%r8b
> +{store} add    %r31,%r8,%r8
> +{load}  add    %r31,%r8,%r8
> +add    %r31,(%r8),%r31
> +add    (%r31),%r8,%r8
> +add    $0x12344433,%r15,%r15
> +add    $0xfffffffff4332211,%r8,%r8
> +dec    %r17,%r17
> +decb   %r17b,%r17b
> +not    %r17,%r17
> +notb   %r17b,%r17b
> +neg    %r17,%r17
> +negb   %r17b,%r17b
> +sub    %r15,%r17,%r17
> +subb   %r15b,%r17b,%r17b
> +sub    %r15,(%r8),%r15
> +sub    (%r15,%rax,1),%r16,%r16
> +sub    $0x1234,%r30,%r30
> +sbb    %r15,%r17,%r17
> +sbbb   %r15b,%r17b,%r17b
> +sbb    %r15,(%r8),%r15
> +sbb    (%r15,%rax,1),%r16,%r16
> +sbb    $0x1234,%r30,%r30
> +adc    %r15,%r17,%r17
> +adcb   %r15b,%r17b,%r17b
> +adc    %r15,(%r8),%r15
> +adc    (%r15,%rax,1),%r16,%r16
> +adc    $0x1234,%r30,%r30
> +or     %r15,%r17,%r17
> +orb    %r15b,%r17b,%r17b
> +or     %r15,(%r8),%r15
> +or     (%r15,%rax,1),%r16,%r16
> +or     $0x1234,%r30,%r30
> +xor    %r15,%r17,%r17
> +xorb   %r15b,%r17b,%r17b
> +xor    %r15,(%r8),%r15
> +xor    (%r15,%rax,1),%r16,%r16
> +xor    $0x1234,%r30,%r30
> +and    %r15,%r17,%r17
> +andb   %r15b,%r17b,%r17b
> +and    %r15,(%r8),%r15
> +and    (%r15,%rax,1),%r16,%r16
> +and    $0x1234,%r30,%r30
> +ror    %r31,%r31
> +rorb   %r31b,%r31b
> +ror    $0x2,%r12,%r12
> +rorb   $0x2,%r12b,%r12b
> +rol    %r31,%r31
> +rolb   %r31b,%r31b
> +rol    $0x2,%r12,%r12
> +rolb   $0x2,%r12b,%r12b
> +rcr    %r31,%r31
> +rcrb   %r31b,%r31b
> +rcr    $0x2,%r12,%r12
> +rcrb   $0x2,%r12b,%r12b
> +rcl    %r31,%r31
> +rclb   %r31b,%r31b
> +rcl    $0x2,%r12,%r12
> +rclb   $0x2,%r12b,%r12b
> +shl    %r31,%r31
> +shlb   %r31b,%r31b
> +shl    $0x2,%r12,%r12
> +shlb   $0x2,%r12b,%r12b
> +sar    %r31,%r31
> +sarb   %r31b,%r31b
> +sar    $0x2,%r12,%r12
> +sarb   $0x2,%r12b,%r12b
> +shl    %r31,%r31
> +shlb   %r31b,%r31b
> +shl    $0x2,%r12,%r12
> +shlb   $0x2,%r12b,%r12b
> +shr    %r31,%r31
> +shrb   %r31b,%r31b
> +shr    $0x2,%r12,%r12
> +shrb   $0x2,%r12b,%r12b
> +shld   $0x1,%r12,(%rax),%r12
> +shld   $0x2,%r8,%r12,%r12
> +shld   %cl,%r9,(%rax),%r9
> +shld   %cl,%r12,%r16,%r16
> +shld   %cl,%r13,(%r19,%rax,4),%r13

What's the difference (in what is being tested) between this and the
first of the %cl tests? Shouldn't one of them rather be of the form
"reg1,reg2,reg1"? And then shouldn't there be a similar test with an
immediate operand? (Same for SHRD then, obviously.)

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -145,6 +145,8 @@
>  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
>  // the bit to mark commutative VEX encodings where swapping the source
>  // operands may allow to switch from 3-byte to 2-byte VEX encoding.
> +// And re-use the bit to mark some NDD insns that swapping the source operands
> +// may allow to switch from 3 operands to 2 operands.
>  #define C StaticRounding

The 3-to-2 conversion isn't what we're primarily after (see comments above).
It's the EVEX->REX2 encoding conversion which we'd like to do.

> @@ -166,6 +168,10 @@
>  
>  ### MARKER ###
>  
> +// Please don't add a NDD insn which may be optimized to a REX2 insn before the
> +// mov. It may result that a good UB checker object the behavior
> +// "template->start - 1" at the end of match_template.
> +
>  // Move instructions.

While I mentioned adding a comment here as a minimal solution, did you try
to think of better approaches, or some enforcement of this restriction
(like gas_assert() before the expression in question)? You could even go
as far as simply not trying the optimization when t == i386_optab, with no
need to have a comment here (the comment would then be next to that
part of the condition, thus right where it's really relevant). Then anyone
misplacing a new template in the opcode table would simply observe that
an expected optimization doesn't happen, and they surely would find the
conditional with its comment.
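
In sketch form (toy_optab[] merely stands in for i386_optab[], and
toy_can_step_back() is an invented name), the guard would sit right in
front of the "start - 1" expression:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_template { unsigned int base_opcode; };

/* Toy table standing in for i386_optab[].  */
const struct toy_template toy_optab[] = { { 0x88 }, { 0x00 }, { 0x00 } };

/* Only consider the optimization when a preceding template exists;
   forming "start - 1" for the very first entry would be undefined.  */
bool
toy_can_step_back (const struct toy_template *start)
{
  assert (start >= toy_optab);
  return start != toy_optab;
}
```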

For all of the changes below (which are a little hard to review in email),
aiui they only add C as needed. I once again would prefer if that attribute
could be added right as the templates are introduced, with the description
stating the intention and that the actual use of the attribute will be
added later (i.e. as expressed earlier already for NF).

Jan

* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-09  8:02     ` Cui, Lili
@ 2023-11-09 10:52       ` Jan Beulich
  2023-11-09 13:27         ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-09 10:52 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 09.11.2023 09:02, Cui, Lili wrote:
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 02.11.2023 12:29, Cui, Lili wrote:
>>> @@ -9292,13 +9338,17 @@ print_insn (bfd_vma pc, disassemble_info
>> *info, int intel_syntax)
>>>        goto out;
>>>      }
>>>
>>> -  if (*ins.codep == 0x0f)
>>> +  /* M0 in rex2 prefix represents map0 or map1.  */
>>> +  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
>>
>> I'm struggling with the M0 in the comment. DYM just M, or maybe REX2.M?
>>
> 
> Changed to REX2.M.
> 
>> Also is this, ...
>>
>>>      {
>>>        unsigned char threebyte;
>>>
>>> -      ins.codep++;
>>> -      if (!fetch_code (info, ins.codep + 1))
>>> -	goto fetch_error_out;
>>> +      if (!ins.rex2)
>>> +	{
>>> +	  ins.codep++;
>>> +	  if (!fetch_code (info, ins.codep + 1))
>>> +	    goto fetch_error_out;
>>> +	}
>>>        threebyte = *ins.codep;
>>>        dp = &dis386_twobyte[threebyte];
>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
>>
>> ... all the way to here, really correct for d5 00 0f?
>>
> 
> I think the 0f here must indicate that it is the first byte of the legacy map1 instruction, meaning legacy map0 does not have 0f opcode. If this instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80. If a bad binary does appear, our original code also has the same issue.
> 
> static const struct dis386 dis386[] = {
> ...
> /* 0f  */
> { Bad_Opcode },       /* 0x0f extended opcode escape */

No, this entry simply will never be used, because of how decoding is done.
My comment was about what's going to happen if you encounter the d5 00 0f
byte sequence. That's _not_ an indication to use map1 for decoding, nor
to read another opcode byte. In this case the table entry you quote above
will need to come into play, not any entry from dis386_twobyte[]. (As long
as both are Bad_Opcode the difference may not even be noticeable, but it
would be a latent trap for someone to fall into down the road.)
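
To make the intended behaviour concrete, a standalone sketch of the map
selection (not the i386-dis.c code; the toy_* names are invented, and
TOY_REX2_M0 assumes M0 is the top bit of the REX2 payload byte, which may
differ from how ins.rex2 actually stores it):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_REX2_M0 0x80	/* Top bit of the REX2 payload byte.  */

enum toy_map { TOY_MAP0, TOY_MAP1 };

/* Pick the opcode map for the byte at *CODEP.  Without REX2 a leading
   0x0f is an escape byte that gets consumed; with REX2 the map comes
   from M0 alone and no byte is consumed.  */
enum toy_map
toy_select_map (bool have_rex2, uint8_t rex2_payload,
		const uint8_t **codep)
{
  assert (codep && *codep);
  if (have_rex2)
    return (rex2_payload & TOY_REX2_M0) ? TOY_MAP1 : TOY_MAP0;
  if (**codep == 0x0f)
    {
      ++*codep;			/* Skip the escape byte.  */
      return TOY_MAP1;
    }
  return TOY_MAP0;
}
```

With this shape, d5 00 0f stays in map 0 with 0x0f as the opcode byte, so
the dis386[] entry for 0x0f is the one that gets looked at.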

>>> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info *info,
>> int intel_syntax)
>>>        && !ins.need_vex && ins.last_rex_prefix >= 0)
>>>      ins.all_prefixes[ins.last_rex_prefix] = 0;
>>>
>>> +  /* Check if the REX2 prefix is used.  */
>>> +  if (ins.last_rex2_prefix >= 0
>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
>>> +	   && (ins.rex2 & 0x7))
>>
>> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
>>
> 
> Here's an example of a negative scenario, when ins.rex2 == 1 and ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has egpr and we don't want to add {rex2} to it. 

Well, that would be dealt with as well by the simpler code I suggested,
wouldn't it?

>> @@
>>> print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>>>  	if (name == NULL)
>>>  	  abort ();
>>>  	prefix_length += strlen (name) + 1;
>>> -	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
>>> +	if (ins.all_prefixes[i] == REX2_OPCODE)
>>> +	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);
>>
>> Do braces really count as part of the mnemonic?
> 
> Yes, the rex2 prefix is printed as the mnemonic {rex2}, unlike the rex prefix, which is printed as rex, rex.B, ...

Maybe you didn't understand what I mean: My comment was regarding the use
of dis_style_mnemonic for the entire {rex2} (rather than perhaps just for
what's inside the figure braces).

>>> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned int
>> reg, unsigned int rexmask,
>>>      ins->illegal_masking = true;
>>>
>>>    USED_REX (rexmask);
>>> +  USED_REX2 (rexmask);
>>
>> Do both really need tracking separately? Whatever consumes REX.B will also
>> consume REX2.B4, and so on.
>>
> I was confused here. I think we only need to print {rex2} when the upper 4 bits == *000, which means no eGPR is used and we need {rex2} to distinguish it from the legacy encoding. Maybe we need neither ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0 nor USED_REX2 (rexmask). I intend to delete them.
> 
> +  /* Check if the REX2 prefix is used.  */
> +  if (ins.last_rex2_prefix >= 0
> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> +	   && (ins.rex2 & 0x7))

But that's the same you had before. I'm afraid I don't see what you're
trying to tell me.

Jan

* Re: [PATCH 8/8] Support APX JMPABS
  2023-11-02 11:29 ` [PATCH 8/8] Support APX JMPABS Cui, Lili
@ 2023-11-09 12:59   ` Jan Beulich
  2023-11-14  3:26     ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-09 12:59 UTC (permalink / raw)
  To: Cui, Lili, Hu, Lin1; +Cc: hongjiu.lu, ccoutant, binutils

On 02.11.2023 12:29, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -7790,7 +7790,8 @@ match_template (char mnem_suffix)
>    if (!quiet_warnings)
>      {
>        if (!intel_syntax
> -	  && (i.jumpabsolute != (t->opcode_modifier.jump == JUMP_ABSOLUTE)))
> +	  && (i.jumpabsolute != (t->opcode_modifier.jump == JUMP_ABSOLUTE))
> +	  && t->mnem_off != MN_jmpabs)
>  	as_warn (_("indirect %s without `*'"), insn_name (t));

Coming back to this, which I did comment on before already. The insn
taking an immediate operand doesn't really justify this, as it leaves
open the underlying question of why you use JumpAbsolute in the insn
template in the first place. I've gone through all the uses of
JUMP_ABSOLUTE, and I didn't find any where the respective handling
would be applicable here. In fact it's unclear whether the insn needs
marking as any JUMP_* at all: It's not really different from, say,
"mov $imm, %rcx".

There's a further question regarding its operand representation,
though: Can you explain why it's Imm64, not Disp64? The latter would,
to me, seem more natural to use here. Not only from an assembler
internals perspective, but also from the users' one: The $ in the
operand carries absolutely no meaning (see also the related testcase
comment below) in AT&T syntax, and there's no noticeable difference
in Intel syntax afaict.

> @@ -8939,6 +8940,9 @@ process_operands (void)
>  	}
>      }
>  
> +  if (i.tm.mnem_off == MN_jmpabs)
> +    i.rex2_encoding = true;

Please see my earlier remarks wrt "rex2" vs "{rex2}". What you do
here is effect the latter. Yet as indicated, the pseudo-prefix isn't
really an indication of "must have REX2 prefix", but only a weak
request to do so if possible. I think you want to set i.rex2 here
instead, requiring a way to express that an empty REX2 prefix is
wanted.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
> @@ -0,0 +1,17 @@
> +# Check bytecode of APX_F jmpabs instructions with illegal encode.
> +
> +	.text
> +# With 66 prefix
> +	.byte 0x66,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> +	.byte 0x66,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> +# With 67 prefix
> +	.byte 0x67,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> +	.byte 0x67,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> +# With F2 prefix
> +	.byte 0xf2,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> +	.byte 0xf2,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> +# With F3 prefix
> +	.byte 0xf3,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> +	.byte 0xf3,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> +# REX2.M0 = 0 REX2.W = 1
> +	.byte 0xd5,0x08,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00

As per earlier comments: This wants expressing via .insn, to make the input
to gas human-readable (even if, as it looks, two .insn are going to be
required per resulting construct). Further, in the last comment, why is
REX2.M0 mentioned there, but not elsewhere? Also, what purpose do the
0x64 bytes serve here? The encodings are invalid irrespective of them. Instead
I'd kind of have expected LOCK to also be covered.

Also a spec question as we're talking of what is or is not valid (i.e.
causing #UD) here: Why would XCR0.APX=0 need to cause #UD? There's no
use of eGPR-s here.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
> @@ -0,0 +1,10 @@
> +# Check 64bit APX_F JMPABS instructions
> +
> +	.text
> + _start:
> +	jmpabs	      $0x0202020202020202
> +	jmpabs	      $0x2
> +
> +.intel_syntax noprefix
> +	jmpabs	      0x0202020202020202
> +	jmpabs	      0x2

I expect this isn't going to be the normal use of the insn. Instead I
would foresee the typical users to be "jmpabs symbol" (and - as per
above - intentionally omitting the $ already). IOW the testcase also
wants to cover the case requiring a relocation, including a check that
the correct relocation is emitted (covering both ELF and COFF; I'm not
going to insist on also covering Mach-O, as - for a reason that
escapes me - gas can't even be configured for x86_64-*-darwin*).

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -106,6 +106,7 @@ static bool MOVSXD_Fixup (instr_info *, int, int);
>  static bool DistinctDest_Fixup (instr_info *, int, int);
>  static bool PREFETCHI_Fixup (instr_info *, int, int);
>  static bool PUSH2_POP2_Fixup (instr_info *, int, int);
> +static bool JMPABS_Fixup (instr_info *, int, int);
>  
>  static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
>  						enum disassembler_style,
> @@ -258,6 +259,9 @@ struct instr_info
>    char scale_char;
>  
>    enum x86_64_isa isa64;
> +
> +  /* Remember if the current op is jmpabs.  */
> +  bool is_jmpabs;
>  };

This field would probably best live next to op_is_jump (and then also
be named op_is_jmpabs, assuming a separate boolean is indeed needed).
I further expect that op_is_jump also wants setting for JMPABS.

> @@ -2032,7 +2036,7 @@ static const struct dis386 dis386[] = {
>    { "lahf",		{ XX }, 0 },
>    /* a0 */
>    { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
> -  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
> +  { "mov%LS",		{ { JMPABS_Fixup, eAX_reg }, { JMPABS_Fixup, v_mode } }, PREFIX_REX2_ILLEGAL },
>    { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
>    { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
>    { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
> @@ -9648,7 +9652,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>      }
>  
>    if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
> -      && ins.last_rex2_prefix >= 0)
> +      && ins.last_rex2_prefix >= 0 && !ins.is_jmpabs)
>      {
>        i386_dis_printf (info, dis_style_text, "(bad)");
>        ret = ins.end_codep - priv.the_buffer;
> @@ -13857,3 +13861,38 @@ PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
>  
>    return OP_VEX (ins, bytemode, sizeflag);
>  }
> +
> +static bool
> +JMPABS_Fixup (instr_info *ins, int bytemode, int sizeflag)
> +{
> +  if (ins->address_mode == mode_64bit
> +      && ins->last_rex2_prefix >= 0
> +      && (ins->rex2 & 0x80) == 0x0)
> +    {
> +      uint64_t op;
> +
> +      if (bytemode == eAX_reg)
> +	return true;
> +
> +      if (!get64 (ins, &op))
> +	return false;
> +
> +      if ((ins->prefixes & (PREFIX_OPCODE | PREFIX_ADDR)) != 0x0
> +	  || (ins->rex & REX_W) != 0x0)
> +	{
> +	  oappend (ins, "(bad)");
> +	  return true;
> +	}
> +
> +      ins->mnemonicendp = stpcpy (ins->obuf, "jmpabs");
> +      ins->all_prefixes[ins->last_rex2_prefix] = 0;

This doesn't look right. REX2.{R,X,B}{3,4} set still want recording
in the output. I expect you may need to set a bit in rex2_used here,
but how exactly that ought to work depends on how comments on earlier
patches are going to be addressed. This may then also eliminate the
need for ...

> +      ins->is_jmpabs = true;

... this field, which likely will be covered by a more generic
approach.

> +      oappend_immediate (ins, op);
> +
> +      return true;
> +    }
> +
> +  if (bytemode == eAX_reg)
> +    return OP_IMREG (ins, bytemode, sizeflag);
> +  return OP_OFF64 (ins, v_mode, sizeflag);

v_mode is, afaics, properly passed into here. Why would you open-code
that, instead of using bytemode? Not doing so will give the compiler
more ICF opportunities.

Jan

* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-09 10:52       ` Jan Beulich
@ 2023-11-09 13:27         ` Cui, Lili
  2023-11-09 15:22           ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-09 13:27 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> >> Also is this, ...
> >>
> >>>      {
> >>>        unsigned char threebyte;
> >>>
> >>> -      ins.codep++;
> >>> -      if (!fetch_code (info, ins.codep + 1))
> >>> -	goto fetch_error_out;
> >>> +      if (!ins.rex2)
> >>> +	{
> >>> +	  ins.codep++;
> >>> +	  if (!fetch_code (info, ins.codep + 1))
> >>> +	    goto fetch_error_out;
> >>> +	}
> >>>        threebyte = *ins.codep;
> >>>        dp = &dis386_twobyte[threebyte];
> >>>        ins.need_modrm = twobyte_has_modrm[threebyte];
> >>
> >> ... all the way to here, really correct for d5 00 0f?
> >>
> >
> > I think the 0f here must indicate that it is the first byte of the legacy map1
> instruction, meaning legacy map0 does not have 0f opcode. If this instruction
> has a rex2 prefix, rex2.w must be 1 and should be d5 80. If a bad binary does
> appear, our original code also has the same issue.
> >
> > static const struct dis386 dis386[] = { ...
> > /* 0f  */
> > { Bad_Opcode },       /* 0x0f extended opcode escape */
> 
> No, this entry simply will never be used, because of how decoding is done.
> My comment was about what's going to happen if you encounter the d5 00 0f
> byte sequence. That's _not_ an indication to use map1 for decoding, nor to
> read another opcode byte. In this case the table entry you quote above will
> need to come into play, not any entry from dis386_twobyte[]. (As long as both
> are Bad_Opcode the difference may not even be noticeable, but it would be a
> latent trap for someone to fall into down the road.)
> 


  /* REX2.M in rex2 prefix represents map0 or map1.  */
  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
    {
      unsigned char threebyte;

      if (!ins.rex2)
        {
          ins.codep++;
          if (!fetch_code (info, ins.codep + 1))
            goto fetch_error_out;  ---> When there are no bytes after 0f, it will jump to fetch error, but no error will be reported.
        }
      threebyte = *ins.codep;
      dp = &dis386_twobyte[threebyte];
      ins.need_modrm = twobyte_has_modrm[threebyte];
      ins.codep++;
    }

For d5 00 0f, this decodes to:
   0:   d5                      rex2
   1:   00 0f                   add    %cl,(%rdi)

For 40 0f, this decodes to:
   0:   40                      rex
   1:   0f                      .byte 0xf


> >>> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info
> >>> *info,
> >> int intel_syntax)
> >>>        && !ins.need_vex && ins.last_rex_prefix >= 0)
> >>>      ins.all_prefixes[ins.last_rex_prefix] = 0;
> >>>
> >>> +  /* Check if the REX2 prefix is used.  */
> >>> +  if (ins.last_rex2_prefix >= 0
> >>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> >>> +	   && (ins.rex2 & 0x7))
> >>
> >> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
> >>
> >
> > Here's an example of a negative scenario, when ins.rex2 == 1 and
> ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has egpr and
> we don't want to add {rex2} to it.
> 
> Well, that would be dealt with as well by the simpler code I suggested,
> wouldn't it?
> 

No, for d5 10, ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) == 0. Anyway, I want to delete them; I don't see any point in them at all.

> >> @@
> >>> print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
> >>>  	if (name == NULL)
> >>>  	  abort ();
> >>>  	prefix_length += strlen (name) + 1;
> >>> -	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
> >>> +	if (ins.all_prefixes[i] == REX2_OPCODE)
> >>> +	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);
> >>
> >> Do braces really count as part of the mnemonic?
> >
> > Yes, the rex2 prefix is printed as the mnemonic {rex2}, unlike the rex prefix,
> which is printed as rex, rex.B, ...
> 
> Maybe you didn't understand what I mean: My comment was regarding the
> use of dis_style_mnemonic for the entire {rex2} (rather than perhaps just for
> what's inside the figure braces).
> 
> >>> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned
> >>> int
> >> reg, unsigned int rexmask,
> >>>      ins->illegal_masking = true;
> >>>
> >>>    USED_REX (rexmask);
> >>> +  USED_REX2 (rexmask);
> >>
> >> Do both really need tracking separately? Whatever consumes REX.B will
> >> also consume REX2.B4, and so on.
> >>
> > I was confused here. I think we only need to print {rex2} when the upper 4
> bits == *000, which means no eGPR is used and we need {rex2} to distinguish
> it from the legacy encoding. Maybe we need neither ((ins.rex2 & 0x7) ^
> (ins.rex2_used & 0x7)) == 0 nor USED_REX2 (rexmask). I intend to delete
> them.
> >
> > +  /* Check if the REX2 prefix is used.  */
> > +  if (ins.last_rex2_prefix >= 0
> > +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> > +	   && (ins.rex2 & 0x7))
> 
> But that's the same you had before. I'm afraid I don't see what you're trying
> to tell me.
> 
After removing "((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0", the code changes to

  +  /* Check if the REX2 prefix is used.  */
  +  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 0x7))  

When it is true, the disassembler will not print {rex2} for this insn.

Lili.

* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-09 13:27         ` Cui, Lili
@ 2023-11-09 15:22           ` Jan Beulich
  2023-11-10  7:11             ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-09 15:22 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 09.11.2023 14:27, Cui, Lili wrote:
>>>> Also is this, ...
>>>>
>>>>>      {
>>>>>        unsigned char threebyte;
>>>>>
>>>>> -      ins.codep++;
>>>>> -      if (!fetch_code (info, ins.codep + 1))
>>>>> -	goto fetch_error_out;
>>>>> +      if (!ins.rex2)
>>>>> +	{
>>>>> +	  ins.codep++;
>>>>> +	  if (!fetch_code (info, ins.codep + 1))
>>>>> +	    goto fetch_error_out;
>>>>> +	}
>>>>>        threebyte = *ins.codep;
>>>>>        dp = &dis386_twobyte[threebyte];
>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>
>>>> ... all the way to here, really correct for d5 00 0f?
>>>>
>>>
>>> I think the 0f here must indicate that it is the first byte of the legacy map1
>> instruction, meaning legacy map0 does not have 0f opcode. If this instruction
>> has a rex2 prefix, rex2.w must be 1 and should be d5 80. If a bad binary does
>> appear, our original code also has the same issue.
>>>
>>> static const struct dis386 dis386[] = { ...
>>> / * 0f  */
>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
>>
>> No, this entry simply will never be used, because of how decoding is done.
>> My comment was about what's going to happen if you encounter the d5 00 0f
>> byte sequence. That's _not_ an indication to use map1 for decoding, nor to
>> read another opcode byte. In this case the table entry you quote above will
>> need to come into play, not any entry from dis386_twobyte[]. (As long as both
>> are Bad_Opcode the difference may not even be noticeable, but it would be a
>> latent trap for someone to fall into down the road.)
>>
> 
> 
>   /* REX2.M in rex2 prefix represents map0 or map1.  */
>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
>     {
>       unsigned char threebyte;
> 
>       if (!ins.rex2)
>         {
>           ins.codep++;
>           if (!fetch_code (info, ins.codep + 1))
>             goto fetch_error_out;                                                      ---> When there are no bytes after 0f, it will jump to fetch error, but no error will be reported.
>         }
>       threebyte = *ins.codep;
>       dp = &dis386_twobyte[threebyte];
>       ins.need_modrm = twobyte_has_modrm[threebyte];
>       ins.codep++;
>     }
> 
> For d5 00 0f
> Decode to:
>    0:   d5                      rex2
>    1:   00 0f                   add    %cl,(%rdi)

But this would better have d5 00 0f all on the first line (it
definitely needs to have d5 00 on the same line, as the bytes belong
together), as opposed to ...

> For 40 0f
> Decode to:
>    0:   40                      rex
>    1:   0f                      .byte 0xf

... this where there truly is a known missing byte before we could
proceed further. (It's still a little questionable to print REX
separately in this case, but that's the way the binutils disassembler
has always worked.)

Yet to restate - to see what I mean, you'd need to populate at least
one of the two 0f slots in the mentioned arrays. What I'm suspecting
from the code as this patch version has it is that d5 00 0f would
wrongly descend into dis386_twobyte[]. Yet you can tell that from
it correctly using dis386[] only if the two 0f slots of these arrays
are meaningfully different (or by actually looking at things in e.g.
a debugger).

>>>>> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info
>>>>> *info,
>>>> int intel_syntax)
>>>>>        && !ins.need_vex && ins.last_rex_prefix >= 0)
>>>>>      ins.all_prefixes[ins.last_rex_prefix] = 0;
>>>>>
>>>>> +  /* Check if the REX2 prefix is used.  */
>>>>> +  if (ins.last_rex2_prefix >= 0
>>>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
>>>>> +	   && (ins.rex2 & 0x7))
>>>>
>>>> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
>>>>
>>>
>>> Here's an example of a negative scenario, when ins.rex2 == 1 and
>> ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has egpr and
>> we don't want to add {rex2} to it.
>>
>> Well, that would be dealt with as well by the simpler code I suggested,
>> wouldn't it?
>>
> 
> No, for d510 ,  ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) == 0. Anyway, I want to delete them. I don't see any point in it at all.

Hmm, I guess I'm confused. How would you present unconsumed REX2.{R,X,B}{3,4}
then?

>>>>> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned
>>>>> int
>>>> reg, unsigned int rexmask,
>>>>>      ins->illegal_masking = true;
>>>>>
>>>>>    USED_REX (rexmask);
>>>>> +  USED_REX2 (rexmask);
>>>>
>>>> Do both really need tracking separately? Whatever consumes REX.B will
>>>> also consume REX2.B4, an so on.
>>>>
>>> I was confused here, I think we only need to print {rex2} for the upper 4 bits
>> == *000, which means egpr is not used and we need to use {rex2} to
>> distinguish it from legacy encoding.  maybe we don’t need ((ins.rex2 & 0x7) ^
>> (ins.rex2_used & 0x7)) == 0, and nor USED_REX2 (rexmask). I intend to delete
>> them.
>>>
>>> +  /* Check if the REX2 prefix is used.  */
>>> +  if (ins.last_rex2_prefix >= 0
>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
>>> +	   && (ins.rex2 & 0x7))
>>
>> But that's the same you had before. I'm afraid I don't see what you're trying
>> to tell me.
>>
> After removing "((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0", the code becomes
> 
>   +  /* Check if the REX2 prefix is used.  */
>   +  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 0x7))
> 
> When this condition is true, the disassembler will not print {rex2} for this insn.

Yet ins.rex2 having any of the low 3 bits set says nothing about whether
every one of these was consumed while processing operands / suffixes.
You need to consult .rex{,2}_used; my earlier point was merely that you
don't need a separate .rex2_used; the bits in .rex_used are all you
require to get this right (as a consumer of, say, REX.X / REX2.X3 is
also a consumer of REX2.X4; leaving aside EVEX encoded insns for the
moment).
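
To make the point above concrete, here is a minimal sketch (not code from
the patch). It assumes the raw REX2 payload byte is available with the
layout M0 R4 X4 B4 W R3 X3 B3, and that operand processing records consumed
bits in a single REX-style mask; the SK_ and sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* REX-style bits as tracked in a single "used" mask (sketch names).  */
#define SK_REX_B 0x1u
#define SK_REX_X 0x2u
#define SK_REX_R 0x4u

/* PAYLOAD is the byte following 0xd5: M0 R4 X4 B4 W R3 X3 B3.
   REX_USED holds the REX-style bits consumed while printing operands.
   Return true if some R/X/B extension bit is set in the payload although
   nothing consumed the corresponding REX-style bit.  */
static bool
sk_rex2_has_unconsumed_bits (unsigned int payload, unsigned int rex_used)
{
  unsigned int low = payload & 0x7;          /* R3 X3 B3 */
  unsigned int high = (payload >> 4) & 0x7;  /* R4 X4 B4 */

  /* Only W, R, X and B are meaningful in the "used" mask.  */
  assert ((rex_used & ~0xfu) == 0);

  /* A consumer of X3 is also a consumer of X4 (same for R and B), so one
     mask covers both halves.  */
  return ((low | high) & ~rex_used & 0x7) != 0;
}
```

W is deliberately left out of this sketch; it would need the same kind of
check against the W bit of the mask.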

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 7/8] Support APX NDD optimized encoding.
  2023-11-09 10:36   ` Jan Beulich
@ 2023-11-10  5:43     ` Hu, Lin1
  2023-11-10  9:54       ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-10  5:43 UTC (permalink / raw)
  To: Beulich, Jan, Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

> On 02.11.2023 12:29, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -7208,6 +7208,44 @@ check_EgprOperands (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Optimize APX NDD insns to non-NDD insns.  */
> > +
> > +static bool
> > +optimize_NDD_to_nonNDD (const insn_template *t) {
> > +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> > +      && t->opcode_space == SPACE_EVEXMAP4
> > +      && !i.has_nf
> 
> As mentioned before, I'd still prefer the optimization to only be added after NF
> handling was put in place. But I'm not meaning to make this a strict requirement:
> Introducing the has_nf field here (with it only being read, never [explicitly]
> written) is certainly okay-ish.
> (But see also below.)
>

Of course we can put the optimization patch after the NF patch. For now, we are just putting it here for review.

> 
> Similarly I'm concerned of the ND form of CFCMOVcc, which isn't there yet in
> the patches, but which will also need excluding from this optimization. Obviously
> this concern then extends to any future ND- encoded insns, which (likely) won't
> have legacy-encoded (and hence
> REX2-encodable) counterparts. Are there any plans how to deal with such?
> (There's a possible approach mentioned further down.)
>

Looking at the other current NDD instructions, it should be possible to keep using the EVEX encoding when an insn doesn't have a REX2 encoding.
 
>
> > +      && i.reg_operands >= 2
> > +      && i.types[i.operands - 1].bitfield.class == Reg)
> 
> Isn't this implicit from the VexVVVV check further up?
>

Yes.

> 
> > +    {
> > +      unsigned int readonly_var = ~0;
> > +      unsigned int dest = i.operands - 1;
> > +      unsigned int src1 = (i.operands > 2) ? i.operands - 2 : 0;
> 
> Since we already know i.operands >= 2 from the earlier check of i.reg_operands,
> can't this simply be
> 
>       unsigned int src1 = i.operands - 2;
> 
> ?
>

OK.

> 
> > +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> > +
> > +      if (i.types[src1].bitfield.class == Reg
> > +	  && i.op[src1].regs == i.op[dest].regs)
> > +	readonly_var = src2;
> 
> As can be seen in the testcase, this also results in ADCX/ADOX to be converted to
> non-ND EVEX forms, i.e. even when that's not a win at all.
> We shouldn't change what the user has written when the encoding doesn't
> actually improve. (Or else, but I'd be hesitant to accept that, at the very least the
> effect would need pointing out in the description or even a code comment, so
> that later on it is possible to figure out whether that was intentional or an
> oversight.)
> 
> This is where my template ordering remark in reply to patch 5 comes into play:
> Whether invoking re-parse is okay would further need to depend on whether an
> alternative (earlier) template actually allows
> REX2 encoding (same base-opcode could be one of the criteria for how far to
> look back through earlier templates; an option might also be to put the 3-
> operand templates first, so that looking backwards wouldn't be necessary in the
> first place). This would then likely also address one of the forward looking
> concerns I've raised above.
>

Indeed, adcx's legacy insn can't support rex2.

To address this, I prefer to reorder the templates, because for the moment I haven't thought of a simple way to move t back to the farthest template with the same base_opcode. A tentative scheme: the order will be NDD EVEX, then REX2, then EVEX. I will also need a temporary variable for the case where the insn doesn't match the REX2 template, so that I can restore the match result and the value of i.

> 
> > +      /* adcx, adox and imul don't have D bit.  */
> > +      else if (i.types[src2].bitfield.class == Reg
> > +	       && i.op[src2].regs == i.op[dest].regs
> > +	       && t->opcode_modifier.commutative)
> 
> There's a disconnect between comment and code here: You don't use the D
> attribute, so why is it being mentioned?
>

I forgot to update the comment.
 
>
> > +	readonly_var = src1;
> > +      if (readonly_var != (unsigned int) ~0)
> > +	{
> > +	  --i.operands;
> > +	  --i.reg_operands;
> > +	  --i.tm.operands;
> > +
> > +	  if (readonly_var != src2)
> > +	    swap_2_operands (readonly_var, src2);
> 
> May I suggest that just out of precaution the swapping be done before operand
> counts are decremented? In principle swap_2_operands() could do with having
> assertions added as to it actually dealing with valid operands. (You'll note that
> elsewhere, when we add a new operand, we increment first and then swap.)
> 

Indeed, it's safer; I've exchanged the order of execution. Do you have any further comments on the assertions? (If I understand correctly, you would like some gas_assert() calls?) For the time being I can guarantee that the two indexes are definitely in range; is there anything else that needs to be checked?
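
As a minimal sketch of the "swap first, then decrement" order with range assertions: struct sk_insn and the sk_ helpers below are hypothetical stand-ins for illustration only, not gas's i386_insn or swap_2_operands().

```c
#include <assert.h>

#define SK_MAX_OPERANDS 5

/* Hypothetical, much-reduced stand-in for gas's instruction state.  */
struct sk_insn
{
  unsigned int operands;      /* Number of operands parsed.  */
  unsigned int reg_operands;  /* How many of them are registers.  */
  int regs[SK_MAX_OPERANDS];  /* Register number of each operand.  */
};

/* Swap two operands, asserting that both indexes are valid.  */
static void
sk_swap_2_operands (struct sk_insn *insn, unsigned int a, unsigned int b)
{
  assert (a < insn->operands && b < insn->operands);
  int tmp = insn->regs[a];
  insn->regs[a] = insn->regs[b];
  insn->regs[b] = tmp;
}

/* Drop the destination of a 3-operand form whose destination repeats one
   source.  READONLY_VAR is the source that differs from the destination,
   SRC2 is the slot where it has to end up.  */
static void
sk_convert_to_2_operands (struct sk_insn *insn, unsigned int readonly_var,
                          unsigned int src2)
{
  /* Swap while the operand counts still cover every index involved ...  */
  if (readonly_var != src2)
    sk_swap_2_operands (insn, readonly_var, src2);

  /* ... and only then shrink the operand counts.  */
  --insn->operands;
  --insn->reg_operands;
}
```

With the counts decremented first, the assertion in the swap helper could fire for an index that was still valid a moment earlier, which is the reason for this order.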

> > @@ -7728,6 +7766,14 @@ match_template (char mnem_suffix)
> >  	  i.memshift = memshift;
> >  	}
> >
> > +      /* If we can optimize a NDD insn to non-NDD insn, like
> > +	 add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
> > +      if (optimize == 1 && optimize_NDD_to_nonNDD (t))
> 
> So you do this optimization at -O1, but not at -O2? Imo the "== 1"
> simply needs dropping. Furthermore the {nooptimize} and {evex} pseudo
> prefixes need respecting. Quite likely respecting {evex} would eliminate the need
> for the explicit .has_nf check in the helper function, as I expect .vec_encoding to
> be set alongside that bit anyway. Further quite likely respecting {evex} here will
> mean that in patch 3 you need to introduce a new enumerator (e.g.
> vex_encoding_egpr, vaguely similar to vex_encoding_evex512), to avoid
> setting .vec_encoding to vex_encoding_evex when an eGPR is parsed.
> 
> As to optimization level: In build_vex_prefix() we leverage C only at -O2 or
> higher (including -Os). We may want to be consistent in this regard here (i.e. by
> an extra check in the helper function).
> 

It was a mistake; I have fixed it. After the NF patch is done, I will check whether the i.has_nf constraint can be removed. The condition will become:

       /* If we can optimize a NDD insn to non-NDD insn, like
         add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
-      if (optimize == 1 && optimize_NDD_to_nonNDD (t))
+      if (!i.no_optimize && i.vec_encoding != vex_encoding_evex
+         && optimize && optimize_NDD_to_nonNDD (t))
        {

>
> > +	{
> > +	  t = current_templates->start - 1;
> 
> As per a remark further up, this adjustment could be avoided if the ND templates
> came ahead of the legacy ones. They can't be wrongly used in place of the
> legacy ones, due to the extra operand they require. Then a comment here would
> merely point out this ordering aspect. But of course care will then need to be
> taken to not go past i386_optab[]'s bounds (by having suitably ordered
> conditionals when looking for whether there is an alternative template in the
> first place; again see the respective remark further up).
>

Yes, if we reorder the templates, I will remove that line. Here is just one example of a possible implementation:

        }

+      bool have_converted_NDD_to_nonNDD = false;
+      i386_insn tmp_i;
+
+      if (!i.no_optimize && i.vec_encoding != vex_encoding_evex
+         && optimize && !have_converted_NDD_to_nonNDD
+         && convert_NDD_to_nonNDD (t))
+       {
+         have_converted_NDD_to_nonNDD = true;
+         tmp_i = i;
+       }
+
       /* We've found a match; break out of loop.  */
       break;
     }
@@ -7787,6 +7802,9 @@ match_template (char mnem_suffix)
       return NULL;
     }

+  if (have_converted_NDD_to_nonNDD)
+    i = tmp_i;
+
   if (!quiet_warnings)

> 
> > +	  continue;
> > +	}
> 
> Btw, considering this re-matching, I wonder whether "convert" wouldn't be
> better in the function name compared to "optimize".
> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
> > @@ -0,0 +1,117 @@
> > +# Check 64bit APX NDD instructions with optimized encoding
> > +
> > +	.text
> > +_start:
> > +inc    %r31,%r31
> > +incb   %r31b,%r31b
> > +add    %r31,%r8,%r8
> > +addb   %r31b,%r8b,%r8b
> > +{store} add    %r31,%r8,%r8
> > +{load}  add    %r31,%r8,%r8
> > +add    %r31,(%r8),%r31
> > +add    (%r31),%r8,%r8
> > +add    $0x12344433,%r15,%r15
> > +add    $0xfffffffff4332211,%r8,%r8
> > +dec    %r17,%r17
> > +decb   %r17b,%r17b
> > +not    %r17,%r17
> > +notb   %r17b,%r17b
> > +neg    %r17,%r17
> > +negb   %r17b,%r17b
> > +sub    %r15,%r17,%r17
> > +subb   %r15b,%r17b,%r17b
> > +sub    %r15,(%r8),%r15
> > +sub    (%r15,%rax,1),%r16,%r16
> > +sub    $0x1234,%r30,%r30
> > +sbb    %r15,%r17,%r17
> > +sbbb   %r15b,%r17b,%r17b
> > +sbb    %r15,(%r8),%r15
> > +sbb    (%r15,%rax,1),%r16,%r16
> > +sbb    $0x1234,%r30,%r30
> > +adc    %r15,%r17,%r17
> > +adcb   %r15b,%r17b,%r17b
> > +adc    %r15,(%r8),%r15
> > +adc    (%r15,%rax,1),%r16,%r16
> > +adc    $0x1234,%r30,%r30
> > +or     %r15,%r17,%r17
> > +orb    %r15b,%r17b,%r17b
> > +or     %r15,(%r8),%r15
> > +or     (%r15,%rax,1),%r16,%r16
> > +or     $0x1234,%r30,%r30
> > +xor    %r15,%r17,%r17
> > +xorb   %r15b,%r17b,%r17b
> > +xor    %r15,(%r8),%r15
> > +xor    (%r15,%rax,1),%r16,%r16
> > +xor    $0x1234,%r30,%r30
> > +and    %r15,%r17,%r17
> > +andb   %r15b,%r17b,%r17b
> > +and    %r15,(%r8),%r15
> > +and    (%r15,%rax,1),%r16,%r16
> > +and    $0x1234,%r30,%r30
> > +ror    %r31,%r31
> > +rorb   %r31b,%r31b
> > +ror    $0x2,%r12,%r12
> > +rorb   $0x2,%r12b,%r12b
> > +rol    %r31,%r31
> > +rolb   %r31b,%r31b
> > +rol    $0x2,%r12,%r12
> > +rolb   $0x2,%r12b,%r12b
> > +rcr    %r31,%r31
> > +rcrb   %r31b,%r31b
> > +rcr    $0x2,%r12,%r12
> > +rcrb   $0x2,%r12b,%r12b
> > +rcl    %r31,%r31
> > +rclb   %r31b,%r31b
> > +rcl    $0x2,%r12,%r12
> > +rclb   $0x2,%r12b,%r12b
> > +shl    %r31,%r31
> > +shlb   %r31b,%r31b
> > +shl    $0x2,%r12,%r12
> > +shlb   $0x2,%r12b,%r12b
> > +sar    %r31,%r31
> > +sarb   %r31b,%r31b
> > +sar    $0x2,%r12,%r12
> > +sarb   $0x2,%r12b,%r12b
> > +shl    %r31,%r31
> > +shlb   %r31b,%r31b
> > +shl    $0x2,%r12,%r12
> > +shlb   $0x2,%r12b,%r12b
> > +shr    %r31,%r31
> > +shrb   %r31b,%r31b
> > +shr    $0x2,%r12,%r12
> > +shrb   $0x2,%r12b,%r12b
> > +shld   $0x1,%r12,(%rax),%r12
> > +shld   $0x2,%r8,%r12,%r12
> > +shld   %cl,%r9,(%rax),%r9
> > +shld   %cl,%r12,%r16,%r16
> > +shld   %cl,%r13,(%r19,%rax,4),%r13
> 
> What's the difference (in what is being tested) between this and the first of
> the %cl tests? Shouldn't one of them rather be of the form "reg1,reg2,reg1"?
> And then shouldn't there be a similar test with an immediate operand? (Same for
> SHRD then, obviously.)
>

I have modified the tests.

> 
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -145,6 +145,8 @@
> >  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
> >  // the bit to mark commutative VEX encodings where swapping the source
> >  // operands may allow to switch from 3-byte to 2-byte VEX encoding.
> > +// And re-use the bit to mark some NDD insns that swapping the source operands
> > +// may allow to switch from 3 operands to 2 operands.
> >  #define C StaticRounding
> 
> The 3-to-2 conversion isn't what we're primarily after (see comments above).
> It's the EVEX->REX2 encoding conversion which we'd like to do.
> 
> > @@ -166,6 +168,10 @@
> >
> >  ### MARKER ###
> >
> > +// Please don't add a NDD insn which may be optimized to a REX2 insn before the
> > +// mov. It may result that a good UB checker object the behavior
> > +// "template->start - 1" at the end of match_template.
> > +
> >  // Move instructions.
> 
> While I mentioned adding a comment here as a minimal solution, did you try to
> think of better approaches, or some enforcement of this restriction (like
> gas_assert() before the expression in question)? You could even go as far as
> simply not trying the optimization when t == i386_optab, with no need to have a
> comment here (the comment would then be next to that part of the condition,
> thus right where it's really relevant). Then anyone misplacing a new template in
> the opcode table would simply observe that an expected optimization doesn't
> happen, and they surely would find the conditional with its comment.
>

If we don't reorder i386-opc.tbl, I will consider changing t = current_templates->start - 1 to

if (t != current_templates->start)
  t = current_templates->start - 1;
 
>
> For all of the changes below (which are a little hard to review in email), aiui they
> only add C as needed. I once again would prefer if that attribute could be added
> right as the templates are introduced, with the description stating the intention
> and that the actual use of the attribute will be added later (i.e. as expressed
> earlier already for NF).
>

After the changes are finalized, I'll split out the part of this patch that adds the C attribute and hand it to Lili so she can put it where it belongs.

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-09 15:22           ` Jan Beulich
@ 2023-11-10  7:11             ` Cui, Lili
  2023-11-10  9:14               ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-10  7:11 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 09.11.2023 14:27, Cui, Lili wrote:
> >>>> Also is this, ...
> >>>>
> >>>>>      {
> >>>>>        unsigned char threebyte;
> >>>>>
> >>>>> -      ins.codep++;
> >>>>> -      if (!fetch_code (info, ins.codep + 1))
> >>>>> -	goto fetch_error_out;
> >>>>> +      if (!ins.rex2)
> >>>>> +	{
> >>>>> +	  ins.codep++;
> >>>>> +	  if (!fetch_code (info, ins.codep + 1))
> >>>>> +	    goto fetch_error_out;
> >>>>> +	}
> >>>>>        threebyte = *ins.codep;
> >>>>>        dp = &dis386_twobyte[threebyte];
> >>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>>
> >>>> ... all the way to here, really correct for d5 00 0f?
> >>>>
> >>>
> >>> I think the 0f here must indicate that it is the first byte of the
> >>> legacy map1
> >> instruction, meaning legacy map0 does not have 0f opcode. If this
> >> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
> >> If a bad binary does appear, our original code also has the same issue.
> >>>
> >>> static const struct dis386 dis386[] = { ...
> >>> / * 0f  */
> >>> { Bad_Opcode },       /* 0x0f extended opcode escape */
> >>
> >> No, this entry simply will never be used, because of how decoding is done.
> >> My comment was about what's going to happen if you encounter the d5
> >> 00 0f byte sequence. That's _not_ an indication to use map1 for
> >> decoding, nor to read another opcode byte. In this case the table
> >> entry you quote above will need to come into play, not any entry from
> >> dis386_twobyte[]. (As long as both are Bad_Opcode the difference may
> >> not even be noticeable, but it would be a latent trap for someone to
> >> fall into down the road.)
> >>
> >
> >
> >   /* REX2.M in rex2 prefix represents map0 or map1.  */
> >   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> >     {
> >       unsigned char threebyte;
> >
> >       if (!ins.rex2)
> >         {
> >           ins.codep++;
> >           if (!fetch_code (info, ins.codep + 1))
> >             goto fetch_error_out;   ---> When there are no bytes after 0f, it will jump to fetch error, but no error will be reported.
> >         }
> >       threebyte = *ins.codep;
> >       dp = &dis386_twobyte[threebyte];
> >       ins.need_modrm = twobyte_has_modrm[threebyte];
> >       ins.codep++;
> >     }
> >
> > For d5 00 0f
> > Decode to:
> >    0:   d5                      rex2
> >    1:   00 0f                   add    %cl,(%rdi)
> 
> But this would better have d5 00 0f all on the first line (it definitely needs to
> have d5 00 on the same line, as the bytes belong together), as opposed to ...
> 
> > For 40 0f
> > Decode to:
> >    0:   40                      rex
> >    1:   0f                      .byte 0xf
> 
> ... this where there truly is a known missing byte before we could proceed
> further. (It's still a little questionable to print REX separately in this case, but
> that's the way the binutils disassembler has always worked.)
> 
> Yet to restate - to see what I mean, you'd need to populate at least one of the
> two 0f slots in the mentioned arrays. What I'm suspecting from the code as
> this patch version has it is that d5 00 0f would wrongly descend into
> dis386_twobyte[]. Yet you can tell that from it correctly using dis386[] only if
> the two 0f slots of these arrays are meaningfully different (or by actually
> looking at things in e.g.
> a debugger).
> 

I'm confused here. For d5 00 0f, when it fetches the next byte after 0f it finds there is no byte there, goes to fetch_error_out, and returns from print_insn, so I don't get a chance to do anything for it. It cannot reach dis386_twobyte[].

> >>>>> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info
> >>>>> *info,
> >>>> int intel_syntax)
> >>>>>        && !ins.need_vex && ins.last_rex_prefix >= 0)
> >>>>>      ins.all_prefixes[ins.last_rex_prefix] = 0;
> >>>>>
> >>>>> +  /* Check if the REX2 prefix is used.  */
> >>>>> +  if (ins.last_rex2_prefix >= 0
> >>>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> >>>>> +	   && (ins.rex2 & 0x7))
> >>>>
> >>>> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
> >>>>
> >>>
> >>> Here's an example of a negative scenario, when ins.rex2 == 1 and
> >> ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has
> >> egpr and we don't want to add {rex2} to it.
> >>
> >> Well, that would be dealt with as well by the simpler code I
> >> suggested, wouldn't it?
> >>
> >
> > No, for d510 ,  ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) == 0. Anyway, I want to
> delete them. I don't see any point in it at all.
> 
> Hmm, I guess I'm confused. How would you present unconsumed
> REX2.{R,X,B}{3,4} then?
> 

For rex2.R4X4B4, I can't imagine why RXB would be set but not consumed in the end. Since {rex2} is printed without listing its contents, I don't think rex2_used is useful.
For rex2.WR3X3B3, I think these bits can be ignored for {rex2}, which doesn't list which of them are set.

We currently use the following rules to disambiguate.

When the instruction uses an eGPR, that implies a REX2 prefix, so we don't print {rex2} for it. When it also has an EVEX version, we add {evex} to the EVEX instruction.
When a REX2-prefixed instruction uses no eGPR, we need to print {rex2} for it. For an EVEX instruction, we need to print {evex}.

       adc     $0x7b,%r8b
{rex2} adc     $0x7b,%r8b
       adc     $0x7b,%r16b
{evex} adc     $0x7b,%r16b
{evex} adc     $0x7b,%r8b
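
The rules and examples above can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum names and sk_pseudo_prefix_for() are hypothetical, and the insn is assumed to have legacy, REX2 and EVEX forms so that disambiguation is needed at all.

```c
#include <assert.h>
#include <stdbool.h>

/* How the insn was encoded (sketch names, not binutils ones).  */
enum sk_encoding { SK_ENC_LEGACY, SK_ENC_REX2, SK_ENC_EVEX };

/* Which pseudo prefix the disassembler would print.  */
enum sk_pseudo_prefix { SK_PFX_NONE, SK_PFX_REX2, SK_PFX_EVEX };

static enum sk_pseudo_prefix
sk_pseudo_prefix_for (enum sk_encoding enc, bool uses_egpr)
{
  switch (enc)
    {
    case SK_ENC_REX2:
      /* An eGPR operand already implies REX2, so nothing is printed;
	 without one, {rex2} tells the form apart from the legacy one.  */
      return uses_egpr ? SK_PFX_NONE : SK_PFX_REX2;

    case SK_ENC_EVEX:
      /* The EVEX form is always marked, with or without an eGPR.  */
      return SK_PFX_EVEX;

    default:
      /* Legacy encoding: no pseudo prefix.  */
      assert (enc == SK_ENC_LEGACY);
      return SK_PFX_NONE;
    }
}
```

The five adc lines correspond, in order, to (legacy, no eGPR), (REX2, no eGPR), (REX2, eGPR), (EVEX, eGPR) and (EVEX, no eGPR).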

> >>>>> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned
> >>>>> int
> >>>> reg, unsigned int rexmask,
> >>>>>      ins->illegal_masking = true;
> >>>>>
> >>>>>    USED_REX (rexmask);
> >>>>> +  USED_REX2 (rexmask);
> >>>>
> >>>> Do both really need tracking separately? Whatever consumes REX.B
> >>>> will also consume REX2.B4, an so on.
> >>>>
> >>> I was confused here, I think we only need to print {rex2} for the
> >>> upper 4 bits
> >> == *000, which means egpr is not used and we need to use {rex2} to
> >> distinguish it from legacy encoding.  maybe we don’t need ((ins.rex2
> >> & 0x7) ^ (ins.rex2_used & 0x7)) == 0, and nor USED_REX2 (rexmask). I
> >> intend to delete them.
> >>>
> >>> +  /* Check if the REX2 prefix is used.  */
> >>> +  if (ins.last_rex2_prefix >= 0
> >>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> >>> +	   && (ins.rex2 & 0x7))
> >>
> >> But that's the same you had before. I'm afraid I don't see what
> >> you're trying to tell me.
> >>
> > After removing "((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0",
> > the code becomes
> >
> >   +  /* Check if the REX2 prefix is used.  */
> >   +  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 0x7))
> >
> > When this condition is true, the disassembler will not print {rex2} for this insn.
> 
> Yet ins.rex2 having any of the low 3 bits set says nothing about whether every
> one of these was consumed while processing operands / suffixes.
> You need to consult .rex{,2}_used; my earlier point was merely that you don't
> need a separate .rex2_used; the bits in .rex_used are all you require to get
> this right (as a consumer of, say, REX.X / REX2.X3 is also a consumer of
> REX2.X4; leaving aside EVEX encoded insns for the moment).
> 

Here are some cases:
0:   41 0f a8       rex.B push %gs    ---> rex.B was not consumed, so it is printed as rex.B.
0:   d5 01 0f a8    {rex2} push %gs   ---> No eGPR, so we need to print {rex2}; but the printed prefix says nothing about REX2.B3.
4:   d5 19 58       pop    %r24       ---> An eGPR is used, so we know the insn has a REX2 prefix; the low 4 bits of REX2 are not visible.

So checking rex_used is not helpful for REX2.

Thanks.
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-10  7:11             ` Cui, Lili
@ 2023-11-10  9:14               ` Jan Beulich
  2023-11-10  9:21                 ` Jan Beulich
  2023-11-10  9:47                 ` Cui, Lili
  0 siblings, 2 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-10  9:14 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 10.11.2023 08:11, Cui, Lili wrote:
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 09.11.2023 14:27, Cui, Lili wrote:
>>>>>> Also is this, ...
>>>>>>
>>>>>>>      {
>>>>>>>        unsigned char threebyte;
>>>>>>>
>>>>>>> -      ins.codep++;
>>>>>>> -      if (!fetch_code (info, ins.codep + 1))
>>>>>>> -	goto fetch_error_out;
>>>>>>> +      if (!ins.rex2)
>>>>>>> +	{
>>>>>>> +	  ins.codep++;
>>>>>>> +	  if (!fetch_code (info, ins.codep + 1))
>>>>>>> +	    goto fetch_error_out;
>>>>>>> +	}
>>>>>>>        threebyte = *ins.codep;
>>>>>>>        dp = &dis386_twobyte[threebyte];
>>>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>>>
>>>>>> ... all the way to here, really correct for d5 00 0f?
>>>>>>
>>>>>
>>>>> I think the 0f here must indicate that it is the first byte of the
>>>>> legacy map1
>>>> instruction, meaning legacy map0 does not have 0f opcode. If this
>>>> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
>>>> If a bad binary does appear, our original code also has the same issue.
>>>>>
>>>>> static const struct dis386 dis386[] = { ...
>>>>> / * 0f  */
>>>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
>>>>
>>>> No, this entry simply will never be used, because of how decoding is done.
>>>> My comment was about what's going to happen if you encounter the d5
>>>> 00 0f byte sequence. That's _not_ an indication to use map1 for
>>>> decoding, nor to read another opcode byte. In this case the table
>>>> entry you quote above will need to come into play, not any entry from
>>>> dis386_twobyte[]. (As long as both are Bad_Opcode the difference may
>>>> not even be noticeable, but it would be a latent trap for someone to
>>>> fall into down the road.)
>>>>
>>>
>>>
>>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
>>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
>>>     {
>>>       unsigned char threebyte;
>>>
>>>       if (!ins.rex2)
>>>         {
>>>           ins.codep++;
>>>           if (!fetch_code (info, ins.codep + 1))
>>>             goto fetch_error_out;   ---> When there are no bytes after 0f, it will jump to fetch error, but no error will be reported.
>>>         }
>>>       threebyte = *ins.codep;
>>>       dp = &dis386_twobyte[threebyte];
>>>       ins.need_modrm = twobyte_has_modrm[threebyte];
>>>       ins.codep++;
>>>     }
>>>
>>> For d5 00 0f
>>> Decode to:
>>>    0:   d5                      rex2
>>>    1:   00 0f                   add    %cl,(%rdi)
>>
>> But this would better have d5 00 0f all on the first line (it definitely needs to
>> have d5 00 on the same line, as the bytes belong together), as opposed to ...
>>
>>> For 40 0f
>>> Decode to:
>>>    0:   40                      rex
>>>    1:   0f                      .byte 0xf
>>
>> ... this where there truly is a known missing byte before we could proceed
>> further. (It's still a little questionable to print REX separately in this case, but
>> that's the way the binutils disassembler has always worked.)
>>
>> Yet to restate - to see what I mean, you'd need to populate at least one of the
>> two 0f slots in the mentioned arrays. What I'm suspecting from the code as
>> this patch version has it is that d5 00 0f would wrongly descend into
>> dis386_twobyte[]. Yet you can tell that from it correctly using dis386[] only if
>> the two 0f slots of these arrays are meaningfully different (or by actually
>> looking at things in e.g.
>> a debugger).
>>
> 
> I'm confused here. For d5 00 0f, when it fetches the next byte after 0f it finds there is no byte there, goes to fetch_error_out, and returns from print_insn, so I don't get a chance to do anything for it. It cannot reach dis386_twobyte[].

But why would it even try to fetch the next byte? 0f already is the major
opcode byte in this case. Fetching more can only mean either there's an
entry in dis386[] specifying operands, or there's an attempt to index
dis386_twobyte[]. Since dis386[] has Bad_Opcode at that slot, I conclude
that what you say confirms my suspicion that dis386_twobyte[] is
(attempted to be) used here.
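
A minimal sketch of the map selection I have in mind (not the actual
print_insn() code; the names are made up for illustration, and SK_REX2_M
assumes the M0 bit is the top bit of the REX2 payload byte):

```c
#include <assert.h>
#include <stdbool.h>

enum sk_map { SK_MAP0, SK_MAP1 };

#define SK_REX2_M 0x80u  /* M0 bit of the REX2 payload byte.  */

/* Pick the opcode map for OPCODE, the byte following any prefixes.
   *CONSUME_ESCAPE is set when OPCODE is the 0f escape byte, i.e. when one
   more byte has to be fetched to obtain the real opcode.  */
static enum sk_map
sk_select_opcode_map (bool have_rex2, unsigned int rex2_payload,
		      unsigned char opcode, bool *consume_escape)
{
  assert (consume_escape);

  if (have_rex2)
    {
      /* With REX2 only the M0 bit selects the map; 0f is an ordinary
	 (here: invalid) map 0 opcode and never an escape byte.  */
      *consume_escape = false;
      return (rex2_payload & SK_REX2_M) ? SK_MAP1 : SK_MAP0;
    }

  /* Legacy / REX: 0f escapes to map 1 and is followed by the opcode.  */
  *consume_escape = opcode == 0x0f;
  return *consume_escape ? SK_MAP1 : SK_MAP0;
}
```

For d5 00 0f this yields map 0 with no further opcode byte fetched, which
is the dis386[] slot rather than anything in dis386_twobyte[].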

>>>>>>> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info
>>>>>>> *info,
>>>>>> int intel_syntax)
>>>>>>>        && !ins.need_vex && ins.last_rex_prefix >= 0)
>>>>>>>      ins.all_prefixes[ins.last_rex_prefix] = 0;
>>>>>>>
>>>>>>> +  /* Check if the REX2 prefix is used.  */
>>>>>>> +  if (ins.last_rex2_prefix >= 0
>>>>>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
>>>>>>> +	   && (ins.rex2 & 0x7))
>>>>>>
>>>>>> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
>>>>>>
>>>>>
>>>>> Here's an example of a negative scenario, when ins.rex2 == 1 and
>>>> ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has
>>>> egpr and we don't want to add {rex2} to it.
>>>>
>>>> Well, that would be dealt with as well by the simpler code I
>>>> suggested, wouldn't it?
>>>>
>>>
>>> No, for d5 10, ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) == 0. Anyway, I want to
>>> delete them. I don't see any point in it at all.
>>
>> Hmm, I guess I'm confused. How would you present unconsumed
>> REX2.{R,X,B}{3,4} then?
>>
> 
> For rex2.R4X4B4, I can't imagine why anyone would set RXB but not consume it in the end. Since {rex2} is printed without listing its contents, I don't think rex2_used is useful.
> For rex2.WR3X3B3, I think it can be ignored, since {rex2} doesn't list what is there.
> 
> We currently use the following rules to disambiguate.
> 
> When the instruction has egpr, it means it has rex2 prefix and we don't print {rex2} for it. When it also has an evex version, we add {evex} to the evex instruction.
> When the instruction has no egpr, we need to print {rex2} for it. In case of evex instruction, we need to print {evex} for it.
> 
>        adc     $0x7b,%r8b
> {rex2} adc     $0x7b,%r8b
>        adc     $0x7b,%r16b
> {evex} adc     $0x7b,%r16b
> {evex} adc     $0x7b,%r8b

Hmm, I guess I see where your and my thinking diverges: You view REX2 more
like EVEX, when I view it more like REX. See more on this below.

>>>>>>> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned
>>>>>>> int
>>>>>> reg, unsigned int rexmask,
>>>>>>>      ins->illegal_masking = true;
>>>>>>>
>>>>>>>    USED_REX (rexmask);
>>>>>>> +  USED_REX2 (rexmask);
>>>>>>
>>>>>> Do both really need tracking separately? Whatever consumes REX.B
>>>>>> will also consume REX2.B4, and so on.
>>>>>>
>>>>> I was confused here, I think we only need to print {rex2} for the
>>>>> upper 4 bits
>>>> == *000, which means egpr is not used and we need to use {rex2} to
>>>> distinguish it from legacy encoding.  maybe we don’t need ((ins.rex2
>>>> & 0x7) ^ (ins.rex2_used & 0x7)) == 0, and nor USED_REX2 (rexmask). I
>>>> intend to delete them.
>>>>>
>>>>> +  /* Check if the REX2 prefix is used.  */
>>>>> +  if (ins.last_rex2_prefix >= 0
>>>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
>>>>> +	   && (ins.rex2 & 0x7))
>>>>
>>>> But that's the same you had before. I'm afraid I don't see what
>>>> you're trying to tell me.
>>>>
>>> After removing  " ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0 ",
>>> the code changes to
>>>
>>>   +  /* Check if the REX2 prefix is used.  */
>>>   +  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 0x7))
>>>
>>> When it is true, decode will not print the {rex2} for this insn.
>>
>> Yet ins.rex2 having any of the low 3 bits set says nothing about whether every
>> one of these was consumed while processing operands / suffixes.
>> You need to consult .rex{,2}_used; my earlier point was merely that you don't
>> need a separate .rex2_used; the bits in .rex_used are all you require to get
>> this right (as a consumer of, say, REX.X / REX2.X3 is also a consumer of
>> REX2.X4; leaving aside EVEX encoded insns for the moment).
>>
> 
> Here are some cases:
> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex will print it.
> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses rex2 prefix. We cannot see the information of the lower 4 bits of rex2
> 
> It is not helpful to judge rex_used in rex2.

So: REX2, like REX, is a prefix for legacy encodings. Therefore my view
is that it ought to be treated similarly to REX in the disassembler (I'm
fine to avoid the introduction of a myriad of rex2... [no figure braces]
prefixes in gas). Just like in the first line of what you present above
for the REX.B case, I'd expect the same for REX2. E.g., taking the 2nd
line from above

0:   d5 01 0f a8             {rex2.B3} push %gs

An alternative, matching your intention of treating REX2 more like EVEX,
would be to (subsequently, not right away) do away with rex.B and alike
as well, and only print {rex} in that case as well. Such an intention
would then want mentioning in the description (and eventually carrying
out). The main difference between REX/REX2 and VEX/EVEX, as I view it
(and as would be speaking against this alternative approach), is that
in VEX/EVEX it is kind of normal that certain bits are deliberately
ignored (and might be set either way - see gas'es command line options
actually allowing to drive the values for some of such ignored fields).
REX/REX2, otoh, shouldn't normally specify unused bits, much like other
legacy prefixes aren't expected to be present for no reason (and hence
are explicitly printed when present).

This explicit printing has, btw, a purpose beyond merely trying to not
hide information: If there was a related bug in the disassembler, this
extra information may allow the observer to still recognize what insn
(form) is actually being dealt with.
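
For reference, the two checks discussed further up can be put side by side
in a minimal standalone sketch. This is not disassembler code: the function
names are made up for illustration, and rex2 / used merely stand in for
ins.rex2 and the corresponding "used" mask, with only the low three bits
looked at. It shows that the XOR form answers "were exactly the set bits
consumed?", while the AND-NOT form yields the bits that were set but never
consumed, which is what would need printing:

```c
#include <stdbool.h>

/* XOR form as in the patch: true only if at least one of the low three
   bits is set and the consumed mask equals the set mask exactly.  */
static bool
set_bits_match_used (unsigned int rex2, unsigned int used)
{
  return ((rex2 & 7) ^ (used & 7)) == 0 && (rex2 & 7) != 0;
}

/* AND-NOT form as suggested in review: the result holds every bit that
   is set in the prefix but was not consumed while processing operands.
   A non-zero result means there is something left to report.  */
static unsigned int
unconsumed_bits (unsigned int rex2, unsigned int used)
{
  return (rex2 & 7) & ~(used & 7);
}
```

For rex2 == 1 and used == 1 both forms agree that nothing is left over. For
rex2 == 1 and used == 3 the XOR form reports a mismatch although no set bit
went unconsumed, which is where the two differ.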

Jan


* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-10  9:14               ` Jan Beulich
@ 2023-11-10  9:21                 ` Jan Beulich
  2023-11-10 12:38                   ` Cui, Lili
  2023-12-14 10:13                   ` Cui, Lili
  2023-11-10  9:47                 ` Cui, Lili
  1 sibling, 2 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-10  9:21 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 10.11.2023 10:14, Jan Beulich wrote:
> On 10.11.2023 08:11, Cui, Lili wrote:
>> Here are some cases:
>> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex will print it.
>> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
>> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses rex2 prefix. We cannot see the information of the lower 4 bits of rex2
>>
>> It is not helpful to judge rex_used in rex2.
> 
> So: REX2, like REX, is a prefix for legacy encodings. Therefore my view
> is that it ought to be treated similarly to REX in the disassembler (I'm
> fine to avoid the introduction of a myriad of rex2... [no figure braces]
> prefixes in gas). Just like in the first line of what you present above
> for the REX.B case, I'd expect the same for REX2. E.g., taking the 2nd
> line from above
> 
> 0:   d5 01 0f a8             {rex2.B3} push %gs
> 
> An alternative, matching your intention of treating REX2 more like EVEX,
> would be to (subsequently, not right away) do away with rex.B and alike
> as well, and only print {rex} in that case as well. Such an intention
> would then want mentioning in the description (and eventually carrying
> out). The main difference between REX/REX2 and VEX/EVEX, as I view it
> (and as would be speaking against this alternative approach), is that
> in VEX/EVEX it is kind of normal that certain bits are deliberately
> ignored (and might be set either way - see gas'es command line options
> actually allowing to drive the values for some of such ignored fields).
> REX/REX2, otoh, shouldn't normally specify unused bits, much like other
> legacy prefixes aren't expected to be present for no reason (and hence
> are explicitly printed when present).

However, irrespective of what I said above, please feel free to go ahead
with the simplified approach, to allow making progress. I'd like to have
H.J.'s input on how to achieve overall consistency, and we can make
further adjustments later on. But please make sure you actually mention
this aspect in the description of the patch.

Jan


* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-10  9:14               ` Jan Beulich
  2023-11-10  9:21                 ` Jan Beulich
@ 2023-11-10  9:47                 ` Cui, Lili
  2023-11-10  9:57                   ` Jan Beulich
  1 sibling, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-10  9:47 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 10.11.2023 08:11, Cui, Lili wrote:
> >> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> >>
> >> On 09.11.2023 14:27, Cui, Lili wrote:
> >>>>>> Also is this, ...
> >>>>>>
> >>>>>>>      {
> >>>>>>>        unsigned char threebyte;
> >>>>>>>
> >>>>>>> -      ins.codep++;
> >>>>>>> -      if (!fetch_code (info, ins.codep + 1))
> >>>>>>> -	goto fetch_error_out;
> >>>>>>> +      if (!ins.rex2)
> >>>>>>> +	{
> >>>>>>> +	  ins.codep++;
> >>>>>>> +	  if (!fetch_code (info, ins.codep + 1))
> >>>>>>> +	    goto fetch_error_out;
> >>>>>>> +	}
> >>>>>>>        threebyte = *ins.codep;
> >>>>>>>        dp = &dis386_twobyte[threebyte];
> >>>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>>>>
> >>>>>> ... all the way to here, really correct for d5 00 0f?
> >>>>>>
> >>>>>
> >>>>> I think the 0f here must indicate that it is the first byte of the
> >>>>> legacy map1
> >>>> instruction, meaning legacy map0 does not have 0f opcode. If this
> >>>> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
> >>>> If a bad binary does appear, our original code also has the same issue.
> >>>>>
> >>>>> static const struct dis386 dis386[] = { ...
> >>>>> / * 0f  */
> >>>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
> >>>>
> >>>> No, this entry simply will never be used, because of how decoding is
> done.
> >>>> My comment was about what's going to happen if you encounter the d5
> >>>> 00 0f byte sequence. That's _not_ an indication to use map1 for
> >>>> decoding, nor to read another opcode byte. In this case the table
> >>>> entry you quote above will need to come into play, not any entry
> >>>> from dis386_twobyte[]. (As long as both are Bad_Opcode the
> >>>> difference may not even be noticeable, but it would be a latent
> >>>> trap for someone to fall into down the road.)
> >>>>
> >>>
> >>>
> >>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
> >>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> >>>     {
> >>>       unsigned char threebyte;
> >>>
> >>>       if (!ins.rex2)
> >>>         {
> >>>           ins.codep++;
> >>>           if (!fetch_code (info, ins.codep + 1))
> >>>             goto fetch_error_out;  ---> When there are no bytes after 0f, it will jump to fetch error, but no error will be reported.
> >>>         }
> >>>       threebyte = *ins.codep;
> >>>       dp = &dis386_twobyte[threebyte];
> >>>       ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>       ins.codep++;
> >>>     }
> >>>
> >>> For d5 00 0f
> >>> Decode to:
> >>>    0:   d5                      rex2
> >>>    1:   00 0f                   add    %cl,(%rdi)
> >>
> >> But this would better have d5 00 0f all on the first line (it
> >> definitely needs to have d5 00 on the same line, as the bytes belong
> together), as opposed to ...
> >>
> >>> For 40 0f
> >>> Decode to:
> >>>    0:   40                      rex
> >>>    1:   0f                      .byte 0xf
> >>
> >> ... this where there truly is a known missing byte before we could
> >> proceed further. (It's still a little questionable to print REX
> >> separately in this case, but that's the way the binutils disassembler
> >> has always worked.)
> >>
> >> Yet to restate - to see what I mean, you'd need to populate at least
> >> one of the two 0f slots in the mentioned arrays. What I'm suspecting
> >> from the code as this patch version has it is that d5 00 0f would
> >> wrongly descend into dis386_twobyte[]. Yet you can tell that from it
> >> correctly using dis386[] only if the two 0f slots of these arrays are
> >> meaningfully different (or by actually looking at things in e.g.
> >> a debugger).
> >>
> >
> > I'm confused here, for d5 00 0f when it fetches the next byte after 0f it will
> find there is no byte there and then go to fetch_error_out and then it will
> return from print_insn and I don't have a chance to do anything for it. It
> cannot reach dis386_twobyte[].
> 
> But why would it even try to fetch the next byte? 0f already is the major
> opcode byte in this case. Fetching more can only mean either there's an entry
> in dis386[] specifying operands, or there's an attempt to index
> dis386_twobyte[]. Since dis386[] has Bad_Opcode at that slot, I conclude that
> what you say confirms my suspicion that dis386_twobyte[] is (attempted to
> be) used here.
> 

I don't know how to identify that 0f is the last byte of the binary. If we could get this information in advance, we could use dis386[] to report bad. In the current case, only after ins.codep++, when fetch_code returns an error, do we know that 0f is the last byte and that we should use dis386[] for it, but by then it has already returned. This is what I'm confused about.

Lili.


* Re: [PATCH 7/8] Support APX NDD optimized encoding.
  2023-11-10  5:43     ` Hu, Lin1
@ 2023-11-10  9:54       ` Jan Beulich
  2023-11-14  2:28         ` Hu, Lin1
  2023-11-14  2:58         ` [PATCH 1/2] Reorder APX insns in i386.tbl Hu, Lin1
  0 siblings, 2 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-10  9:54 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

On 10.11.2023 06:43, Hu, Lin1 wrote:
>> On 02.11.2023 12:29, Cui, Lili wrote:

Btw, your shrinking of reply context above from here is problematic. Someone
reading just this mail can't tell who ...

>> Similarly I'm concerned of the ND form of CFCMOVcc, which isn't there yet in
>> the patches, but which will also need excluding from this optimization. Obviously
>> this concern then extends to any future ND- encoded insns, which (likely) won't
>> have legacy-encoded (and hence
>> REX2-encodable) counterparts. Are there any plans how to deal with such?
>> (There's a possible approach mentioned further down.)

... originally said this.

> Looking at other current NDD instructions, it should be possible to use evex encoding even if it doesn't have rex2 encoding.

Should be possible - yes. But why would you do such a transformation? That's
not an optimization at all, afaict. And we shouldn't alter what the
programmer wrote if the result isn't in at least some respect deemed better
than the original. Considering this, the helper function may want naming
yet differently from what was already suggested, e.g. convert_NDD_to_REX2().

>>> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
>>> +
>>> +      if (i.types[src1].bitfield.class == Reg
>>> +	  && i.op[src1].regs == i.op[dest].regs)
>>> +	readonly_var = src2;
>>
>> As can be seen in the testcase, this also results in ADCX/ADOX to be converted to
>> non-ND EVEX forms, i.e. even when that's not a win at all.
>> We shouldn't change what the user has written when the encoding doesn't
>> actually improve. (Or else, but I'd be hesitant to accept that, at the very least the
>> effect would need pointing out in the description or even a code comment, so
>> that later on it is possible to figure out whether that was intentional or an
>> oversight.)
>>
>> This is where my template ordering remark in reply to patch 5 comes into play:
>> Whether invoking re-parse is okay would further need to depend on whether an
>> alternative (earlier) template actually allows
>> REX2 encoding (same base-opcode could be one of the criteria for how far to
>> look back through earlier templates; an option might also be to put the 3-
>> operand templates first, so that looking backwards wouldn't be necessary in the
>> first place). This would then likely also address one of the forward looking
>> concerns I've raised above.
>>
> 
> Indeed, adcx's legacy insn can't support rex2.
> 
> For my problem, I prefer to reorder the templates, because for the moment I haven't thought of a simple way to move t to the farthest template with the same base_opcode. A tentative scenario: the order will be ndd evex - rex2 - evex.

Yes, this matches my understanding / expectation.

> And I will need a temporary variable for the case where the insn doesn't match the rex2 template, so that I can backtrack the match result and the value of i.

This, however, I'm not convinced of. I'd rather see this vaguely in line
with 58bceb182740 ("x86: prefer VEX encodings over EVEX ones when
possible"): Do another full matching round with the removed operand,
arranging for "internal error" to be raised in case that fails. Your
approach would, I think, result in silent bad code generation in case
something went wrong. Thing is - you don't even need to advance (or
backtrack) t in that case.

>>> +	readonly_var = src1;
>>> +      if (readonly_var != (unsigned int) ~0)
>>> +	{
>>> +	  --i.operands;
>>> +	  --i.reg_operands;
>>> +	  --i.tm.operands;
>>> +
>>> +	  if (readonly_var != src2)
>>> +	    swap_2_operands (readonly_var, src2);
>>
>> May I suggest that just out of precaution the swapping be done before operand
>> counts are decremented? In principle swap_2_operands() could do with having
>> assertions added as to it actually dealing with valid operands. (You'll note that
>> elsewhere, when we add a new operand, we increment first and then swap.)
>>
> 
> Indeed, it's safer; I've exchanged the order of execution. Do you have any other comments on the assertions? (If I understand correctly, some gcc_assert is desired.) For the time being I can guarantee that the two indexes are definitely in range; is there anything else that needs to be checked?

That was a remark towards possible future changes, independent of your
work here. I merely want to make sure that possibly introducing such an
assertion wouldn't require code changes elsewhere when that can be
easily avoided right away.
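
To make the ordering point concrete, here is a toy model, not gas code.
struct toy_insn and fold_ndd() are made-up names, only register operands
are modelled, and operands are held in AT&T order (src2, src1, dest). The
swap helper carries the kind of assertion mentioned above, which only holds
if the swap happens while both indexes still refer to valid operands, i.e.
before the operand count is decremented:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPS 4

struct toy_insn
{
  unsigned int operands;   /* number of operands currently in use */
  int regs[MAX_OPS];       /* register number of each operand */
};

static void
swap_2_operands (struct toy_insn *insn, unsigned int a, unsigned int b)
{
  /* Both indexes must name operands that are still in use.  */
  assert (a < insn->operands && b < insn->operands);

  int tmp = insn->regs[a];
  insn->regs[a] = insn->regs[b];
  insn->regs[b] = tmp;
}

/* Try to fold "op src2, src1, dest" into "op src, dest".  Returns true
   if the destination operand was dropped.  */
static bool
fold_ndd (struct toy_insn *insn, bool commutative)
{
  if (insn->operands != 3)
    return false;

  const unsigned int src2 = 0, src1 = 1, dest = 2;
  unsigned int readonly = ~0u;

  if (insn->regs[src1] == insn->regs[dest])
    readonly = src2;
  else if (commutative && insn->regs[src2] == insn->regs[dest])
    readonly = src1;

  if (readonly == ~0u)
    return false;

  /* Swap first, then shrink the operand count.  */
  if (readonly != src2)
    swap_2_operands (insn, readonly, src2);
  --insn->operands;
  return true;
}
```

With this model "add %r16, %r8, %r8" folds to "add %r16, %r8" without a
swap, while a commutative "add %r8, %r16, %r8" needs the swap before the
third operand is dropped.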

>>> @@ -7728,6 +7766,14 @@ match_template (char mnem_suffix)
>>>  	  i.memshift = memshift;
>>>  	}
>>>
>>> +      /* If we can optimize a NDD insn to non-NDD insn, like
>>> +	 add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
>>> +      if (optimize == 1 && optimize_NDD_to_nonNDD (t))
>>
>> So you do this optimization at -O1, but not at -O2? Imo the "== 1"
>> simply needs dropping. Furthermore the {nooptimize} and {evex} pseudo
>> prefixes need respecting. Quite likely respecting {evex} would eliminate the need
>> for the explicit .has_nf check in the helper function, as I expect .vec_encoding to
>> be set alongside that bit anyway. Further quite likely respecting {evex} here will
>> mean that in patch 3 you need to introduce a new enumerator (e.g.
>> vex_encoding_egpr, vaguely similar to vex_encoding_evex512), to avoid
>> setting .vec_encoding to vex_encoding_evex when an eGPR is parsed.
>>
>> As to optimization level: In build_vex_prefix() we leverage C only at -O2 or
>> higher (including -Os). We may want to be consistent in this regard here (i.e. by
>> an extra check in the helper function).
>>
> 
> It's a mistake; I have fixed it. The condition will be as shown below. After the NF patch is done, I will try to see whether the i.has_nf constraint can be removed.
> 
>        /* If we can optimize a NDD insn to non-NDD insn, like
>          add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
> -      if (optimize == 1 && optimize_NDD_to_nonNDD (t))
> +      if (!i.no_optimize && i.vec_encoding != vex_encoding_evex
> +         && optimize && optimize_NDD_to_nonNDD (t))
>         {

Regardless of what the final expression is going to be, please keep the
check of "optimize" first, such that the common case of optimization
being disabled will be impacted as little as possible.

>>> +	{
>>> +	  t = current_templates->start - 1;
>>
>> As per a remark further up, this adjustment could be avoided if the ND templates
>> came ahead of the legacy ones. They can't be wrongly used in place of the
>> legacy ones, due to the extra operand they require. Then a comment here would
>> merely point out this ordering aspect. But of course care will then need to be
>> taken to not go past i386_optab[]'s bounds (by having suitably ordered
>> conditionals when looking for whether there is an alternative template in the
>> first place; again see the respective remark further up).
>>
> 
> Yes, if we reorder the template's order, I will remove the line. Only one example of a possible implementation is given here:
> 
>         }
> 
> +      bool have_converted_NDD_to_nonNDD = false;
> +      i386_insn tmp_i;
> +
> +      if (!i.no_optimize && i.vec_encoding != vex_encoding_evex
> +         && optimize && !have_converted_NDD_to_nonNDD
> +         && convert_NDD_to_nonNDD (t))
> +       {
> +         have_converted_NDD_to_nonNDD = true;
> +         tmp_i = i;
> +       }
> +
>        /* We've found a match; break out of loop.  */
>        break;
>      }
> @@ -7787,6 +7802,9 @@ match_template (char mnem_suffix)
>        return NULL;
>      }
> 
> +  if (have_converted_NDD_to_nonNDD)
> +    i = tmp_i;
> +
>    if (!quiet_warnings)

I have to admit that I don't understand what the goal is of this playing
with i and tmp_i.

>> For all of the changes below (which are a little hard to review in email), aiui they
>> only add C as needed. I once again would prefer if that attribute could be added
>> right as the templates are introduced, with the description stating the intention
>> and that the actual use of the attribute will be added later (i.e. as expressed
>> earlier already for NF).
> 
> After the changes are finalized, I'll break out the part of the modification that adds the C attribute and hand it to Lili so she can put it where it belongs.

Hmm, that will need doing early, as the NDD patch is hopefully going to
land soon-ish. Same for the template re-ordering (which will need
explaining when pulled ahead, but it will want pulling ahead to reduce
churn).

Jan


* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-10  9:47                 ` Cui, Lili
@ 2023-11-10  9:57                   ` Jan Beulich
  2023-11-10 12:05                     ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-10  9:57 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 10.11.2023 10:47, Cui, Lili wrote:
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 10.11.2023 08:11, Cui, Lili wrote:
>>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>>>
>>>> On 09.11.2023 14:27, Cui, Lili wrote:
>>>>>>>> Also is this, ...
>>>>>>>>
>>>>>>>>>      {
>>>>>>>>>        unsigned char threebyte;
>>>>>>>>>
>>>>>>>>> -      ins.codep++;
>>>>>>>>> -      if (!fetch_code (info, ins.codep + 1))
>>>>>>>>> -	goto fetch_error_out;
>>>>>>>>> +      if (!ins.rex2)
>>>>>>>>> +	{
>>>>>>>>> +	  ins.codep++;
>>>>>>>>> +	  if (!fetch_code (info, ins.codep + 1))
>>>>>>>>> +	    goto fetch_error_out;
>>>>>>>>> +	}
>>>>>>>>>        threebyte = *ins.codep;
>>>>>>>>>        dp = &dis386_twobyte[threebyte];
>>>>>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>>>>>
>>>>>>>> ... all the way to here, really correct for d5 00 0f?
>>>>>>>>
>>>>>>>
>>>>>>> I think the 0f here must indicate that it is the first byte of the
>>>>>>> legacy map1
>>>>>> instruction, meaning legacy map0 does not have 0f opcode. If this
>>>>>> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
>>>>>> If a bad binary does appear, our original code also has the same issue.
>>>>>>>
>>>>>>> static const struct dis386 dis386[] = { ...
>>>>>>> / * 0f  */
>>>>>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
>>>>>>
>>>>>> No, this entry simply will never be used, because of how decoding is
>> done.
>>>>>> My comment was about what's going to happen if you encounter the d5
>>>>>> 00 0f byte sequence. That's _not_ an indication to use map1 for
>>>>>> decoding, nor to read another opcode byte. In this case the table
>>>>>> entry you quote above will need to come into play, not any entry
>>>>>> from dis386_twobyte[]. (As long as both are Bad_Opcode the
>>>>>> difference may not even be noticeable, but it would be a latent
>>>>>> trap for someone to fall into down the road.)
>>>>>>
>>>>>
>>>>>
>>>>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
>>>>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
>>>>>     {
>>>>>       unsigned char threebyte;
>>>>>
>>>>>       if (!ins.rex2)
>>>>>         {
>>>>>           ins.codep++;
>>>>>           if (!fetch_code (info, ins.codep + 1))
>>>>>             goto fetch_error_out;  ---> When there are no bytes after 0f, it will jump to fetch error, but no error will be reported.
>>>>>         }
>>>>>       threebyte = *ins.codep;
>>>>>       dp = &dis386_twobyte[threebyte];
>>>>>       ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>>       ins.codep++;
>>>>>     }
>>>>>
>>>>> For d5 00 0f
>>>>> Decode to:
>>>>>    0:   d5                      rex2
>>>>>    1:   00 0f                   add    %cl,(%rdi)
>>>>
>>>> But this would better have d5 00 0f all on the first line (it
>>>> definitely needs to have d5 00 on the same line, as the bytes belong
>> together), as opposed to ...
>>>>
>>>>> For 40 0f
>>>>> Decode to:
>>>>>    0:   40                      rex
>>>>>    1:   0f                      .byte 0xf
>>>>
>>>> ... this where there truly is a known missing byte before we could
>>>> proceed further. (It's still a little questionable to print REX
>>>> separately in this case, but that's the way the binutils disassembler
>>>> has always worked.)
>>>>
>>>> Yet to restate - to see what I mean, you'd need to populate at least
>>>> one of the two 0f slots in the mentioned arrays. What I'm suspecting
>>>> from the code as this patch version has it is that d5 00 0f would
>>>> wrongly descend into dis386_twobyte[]. Yet you can tell that from it
>>>> correctly using dis386[] only if the two 0f slots of these arrays are
>>>> meaningfully different (or by actually looking at things in e.g.
>>>> a debugger).
>>>>
>>>
>>> I'm confused here, for d5 00 0f when it fetches the next byte after 0f it will
>> find there is no byte there and then go to fetch_error_out and then it will
>> return from print_insn and I don't have a chance to do anything for it. It
>> cannot reach dis386_twobyte[].
>>
>> But why would it even try to fetch the next byte? 0f already is the major
>> opcode byte in this case. Fetching more can only mean either there's an entry
>> in dis386[] specifying operands, or there's an attempt to index
>> dis386_twobyte[]. Since dis386[] has Bad_Opcode at that slot, I conclude that
>> what you say confirms my suspicion that dis386_twobyte[] is (attempted to
>> be) used here.
>>
> 
> I don't know how to identify that 0f is the last byte of the binary,

That's entirely irrelevant here. I gave the byte sequence d5 00 0f just
as the minimal one required to make my point. My original concern equally
applies to e.g. d5 00 0f 01 00, which must not use dis386_twobyte[0x01].
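
To illustrate the distinction, here is a minimal sketch, which is not the
actual disassembler logic. select_map(), MAP0 / MAP1 and REX2_PAYLOAD_M0
are names made up for illustration, legacy prefixes and plain REX are
ignored, and 64-bit mode is assumed. The only point is that with a REX2
prefix the map comes from the M0 payload bit, so a following 0f is the
major opcode in map 0 rather than an escape byte:

```c
#include <stddef.h>

#define REX2_OPCODE      0xd5
#define REX2_PAYLOAD_M0  0x80   /* assumed: payload bit 7 selects map 1 */

enum opcode_map { MAP_INVALID = -1, MAP0 = 0, MAP1 = 1 };

/* Decide which opcode map applies and store the index of the major
   opcode byte in *opc_idx.  Returns MAP_INVALID if bytes run out.  */
static enum opcode_map
select_map (const unsigned char *code, size_t len, size_t *opc_idx)
{
  if (len == 0)
    return MAP_INVALID;

  if (code[0] == REX2_OPCODE)
    {
      /* Prefix byte, payload byte and major opcode are all required.  */
      if (len < 3)
        return MAP_INVALID;

      /* The byte after the payload is always the major opcode, even
         when it happens to be 0x0f.  No further byte is fetched.  */
      *opc_idx = 2;
      return (code[1] & REX2_PAYLOAD_M0) ? MAP1 : MAP0;
    }

  if (code[0] == 0x0f)
    {
      /* Legacy encoding: 0x0f is an escape, one more byte is needed.  */
      if (len < 2)
        return MAP_INVALID;
      *opc_idx = 1;
      return MAP1;
    }

  *opc_idx = 0;
  return MAP0;
}
```

With this, d5 00 0f 01 00 indexes the map 0 table at 0f, whereas the legacy
sequence 0f a8 still selects map 1.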

Jan

> if we could get this information in advance, we could use dis386[] to report bad. In the current case, only after ins.codep++, when fetch_code returns an error, do we know that 0f is the last byte and that we should use dis386[] for it, but by then it has already returned. This is what I'm confused about.
> 
> Lili.



* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-10  9:57                   ` Jan Beulich
@ 2023-11-10 12:05                     ` Cui, Lili
  2023-11-10 12:35                       ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-10 12:05 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 10, 2023 5:57 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 10.11.2023 10:47, Cui, Lili wrote:
> >> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> >>
> >> On 10.11.2023 08:11, Cui, Lili wrote:
> >>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> >>>>
> >>>> On 09.11.2023 14:27, Cui, Lili wrote:
> >>>>>>>> Also is this, ...
> >>>>>>>>
> >>>>>>>>>      {
> >>>>>>>>>        unsigned char threebyte;
> >>>>>>>>>
> >>>>>>>>> -      ins.codep++;
> >>>>>>>>> -      if (!fetch_code (info, ins.codep + 1))
> >>>>>>>>> -	goto fetch_error_out;
> >>>>>>>>> +      if (!ins.rex2)
> >>>>>>>>> +	{
> >>>>>>>>> +	  ins.codep++;
> >>>>>>>>> +	  if (!fetch_code (info, ins.codep + 1))
> >>>>>>>>> +	    goto fetch_error_out;
> >>>>>>>>> +	}
> >>>>>>>>>        threebyte = *ins.codep;
> >>>>>>>>>        dp = &dis386_twobyte[threebyte];
> >>>>>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>>>>>>
> >>>>>>>> ... all the way to here, really correct for d5 00 0f?
> >>>>>>>>
> >>>>>>>
> >>>>>>> I think the 0f here must indicate that it is the first byte of
> >>>>>>> the legacy map1
> >>>>>> instruction, meaning legacy map0 does not have 0f opcode. If this
> >>>>>> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
> >>>>>> If a bad binary does appear, our original code also has the same
> issue.
> >>>>>>>
> >>>>>>> static const struct dis386 dis386[] = { ...
> >>>>>>> / * 0f  */
> >>>>>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
> >>>>>>
> >>>>>> No, this entry simply will never be used, because of how decoding
> >>>>>> is
> >> done.
> >>>>>> My comment was about what's going to happen if you encounter the
> >>>>>> d5
> >>>>>> 00 0f byte sequence. That's _not_ an indication to use map1 for
> >>>>>> decoding, nor to read another opcode byte. In this case the table
> >>>>>> entry you quote above will need to come into play, not any entry
> >>>>>> from dis386_twobyte[]. (As long as both are Bad_Opcode the
> >>>>>> difference may not even be noticeable, but it would be a latent
> >>>>>> trap for someone to fall into down the road.)
> >>>>>>
> >>>>>
> >>>>>
> >>>>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
> >>>>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> >>>>>     {
> >>>>>       unsigned char threebyte;
> >>>>>
> >>>>>       if (!ins.rex2)
> >>>>>         {
> >>>>>           ins.codep++;
> >>>>>           if (!fetch_code (info, ins.codep + 1))
> >>>>>             goto fetch_error_out;  ---> When there are no bytes after 0f, it will jump to fetch error, but no error will be reported.
> >>>>>         }
> >>>>>       threebyte = *ins.codep;
> >>>>>       dp = &dis386_twobyte[threebyte];
> >>>>>       ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>>>       ins.codep++;
> >>>>>     }
> >>>>>
> >>>>> For d5 00 0f
> >>>>> Decode to:
> >>>>>    0:   d5                      rex2
> >>>>>    1:   00 0f                   add    %cl,(%rdi)
> >>>>
> >>>> But this would better have d5 00 0f all on the first line (it
> >>>> definitely needs to have d5 00 on the same line, as the bytes
> >>>> belong
> >> together), as opposed to ...
> >>>>
> >>>>> For 40 0f
> >>>>> Decode to:
> >>>>>    0:   40                      rex
> >>>>>    1:   0f                      .byte 0xf
> >>>>
> >>>> ... this where there truly is a known missing byte before we could
> >>>> proceed further. (It's still a little questionable to print REX
> >>>> separately in this case, but that's the way the binutils
> >>>> disassembler has always worked.)
> >>>>
> >>>> Yet to restate - to see what I mean, you'd need to populate at
> >>>> least one of the two 0f slots in the mentioned arrays. What I'm
> >>>> suspecting from the code as this patch version has it is that d5 00
> >>>> 0f would wrongly descend into dis386_twobyte[]. Yet you can tell
> >>>> that from it correctly using dis386[] only if the two 0f slots of
> >>>> these arrays are meaningfully different (or by actually looking at things in
> e.g.
> >>>> a debugger).
> >>>>
> >>>
> >>> I'm confused here, for d5 00 0f when it fetches the next byte after
> >>> 0f it will
> >> find there is no byte there and then go to fetch_error_out and then
> >> it will return from print_insn and I don't have a chance to do
> >> anything for it. It cannot reach dis386_twobyte[].
> >>
> >> But why would it even try to fetch the next byte? 0f already is the
> >> major opcode byte in this case. Fetching more can only mean either
> >> there's an entry in dis386[] specifying operands, or there's an
> >> attempt to index dis386_twobyte[]. Since dis386[] has Bad_Opcode at
> >> that slot, I conclude that what you say confirms my suspicion that
> >> dis386_twobyte[] is (attempted to
> >> be) used here.
> >>
> >
> > I don't know how to identify that 0f is the last byte of the binary,
> 
> That's entirely irrelevant here. I gave the byte sequence d5 00 0f just as the
> minimal one required to make my point. My original concern equally applies
> to e.g. d5 00 0f 01 00, which may not use dis386_twobyte[0x01].
> 
Aha, I got you. I changed the code to:

   /* REX2.M in rex2 prefix represents map0 or map1.  */
-  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
+  if ((*ins.codep == 0x0f && ins.last_rex2_prefix < 0) || (ins.rex2 & REX2_M))

For d5 00 0f c0 this now decodes to:
0000000000000000 <_start>:
   0:   d5 00 0f                {rex2} (bad)
   3:   c0                      .byte 0xc0

Thanks,
Lili.

> 
> > if we can get this information in advance, we can use dis386[] to report bad,
> in the current case, only when ins.codep++ and fetch code return error, then
> we can know 0f is the last byte, we should use dis386[] for it, but it has
> returned. This is what I'm confused about.
> >
> > Lili.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-10 12:05                     ` Cui, Lili
@ 2023-11-10 12:35                       ` Jan Beulich
  2023-11-13  0:18                         ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-10 12:35 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 10.11.2023 13:05, Cui, Lili wrote:
> 
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, November 10, 2023 5:57 PM
>> To: Cui, Lili <lili.cui@intel.com>
>> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
>> binutils@sourceware.org
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 10.11.2023 10:47, Cui, Lili wrote:
>>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>>>
>>>> On 10.11.2023 08:11, Cui, Lili wrote:
>>>>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>>>>>
>>>>>> On 09.11.2023 14:27, Cui, Lili wrote:
>>>>>>>>>> Also is this, ...
>>>>>>>>>>
>>>>>>>>>>>      {
>>>>>>>>>>>        unsigned char threebyte;
>>>>>>>>>>>
>>>>>>>>>>> -      ins.codep++;
>>>>>>>>>>> -      if (!fetch_code (info, ins.codep + 1))
>>>>>>>>>>> -	goto fetch_error_out;
>>>>>>>>>>> +      if (!ins.rex2)
>>>>>>>>>>> +	{
>>>>>>>>>>> +	  ins.codep++;
>>>>>>>>>>> +	  if (!fetch_code (info, ins.codep + 1))
>>>>>>>>>>> +	    goto fetch_error_out;
>>>>>>>>>>> +	}
>>>>>>>>>>>        threebyte = *ins.codep;
>>>>>>>>>>>        dp = &dis386_twobyte[threebyte];
>>>>>>>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>>>>>>>
>>>>>>>>>> ... all the way to here, really correct for d5 00 0f?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think the 0f here must indicate that it is the first byte of
>>>>>>>>> the legacy map1
>>>>>>>> instruction, meaning legacy map0 does not have 0f opcode. If this
>>>>>>>> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
>>>>>>>> If a bad binary does appear, our original code also has the same
>> issue.
>>>>>>>>>
>>>>>>>>> static const struct dis386 dis386[] = { ...
>>>>>>>>> /* 0f  */
>>>>>>>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
>>>>>>>>
>>>>>>>> No, this entry simply will never be used, because of how decoding
>>>>>>>> is
>>>> done.
>>>>>>>> My comment was about what's going to happen if you encounter the
>>>>>>>> d5
>>>>>>>> 00 0f byte sequence. That's _not_ an indication to use map1 for
>>>>>>>> decoding, nor to read another opcode byte. In this case the table
>>>>>>>> entry you quote above will need to come into play, not any entry
>>>>>>>> from dis386_twobyte[]. (As long as both are Bad_Opcode the
>>>>>>>> difference may not even be noticeable, but it would be a latent
>>>>>>>> trap for someone to fall into down the road.)
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
>>>>>>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
>>>>>>>     {
>>>>>>>       unsigned char threebyte;
>>>>>>>
>>>>>>>       if (!ins.rex2)
>>>>>>>         {
>>>>>>>           ins.codep++;
>>>>>>>           if (!fetch_code (info, ins.codep + 1))
>>>>>>>             goto fetch_error_out;   ---> When there are no bytes after 0f, it will
>>>>>>>                                     jump to fetch error, but no error will be reported.
>>>>>>>         }
>>>>>>>       threebyte = *ins.codep;
>>>>>>>       dp = &dis386_twobyte[threebyte];
>>>>>>>       ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>>>>       ins.codep++;
>>>>>>>     }
>>>>>>>
>>>>>>> For d5 00 0f
>>>>>>> Decode to:
>>>>>>>    0:   d5                      rex2
>>>>>>>    1:   00 0f                   add    %cl,(%rdi)
>>>>>>
>>>>>> But this would better have d5 00 0f all on the first line (it
>>>>>> definitely needs to have d5 00 on the same line, as the bytes
>>>>>> belong
>>>> together), as opposed to ...
>>>>>>
>>>>>>> For 40 0f
>>>>>>> Decode to:
>>>>>>>    0:   40                      rex
>>>>>>>    1:   0f                      .byte 0xf
>>>>>>
>>>>>> ... this where there truly is a known missing byte before we could
>>>>>> proceed further. (It's still a little questionable to print REX
>>>>>> separately in this case, but that's the way the binutils
>>>>>> disassembler has always worked.)
>>>>>>
>>>>>> Yet to restate - to see what I mean, you'd need to populate at
>>>>>> least one of the two 0f slots in the mentioned arrays. What I'm
>>>>>> suspecting from the code as this patch version has it is that d5 00
>>>>>> 0f would wrongly descend into dis386_twobyte[]. Yet you can tell
>>>>>> that from it correctly using dis386[] only if the two 0f slots of
>>>>>> these arrays are meaningfully different (or by actually looking at things in
>> e.g.
>>>>>> a debugger).
>>>>>>
>>>>>
>>>>> I'm confused here, for d5 00 0f when it fetches the next byte after
>>>>> 0f it will
>>>> find there is no byte there and then go to fetch_error_out and then
>>>> it will return from print_insn and I don't have a chance to do
>>>> anything for it. It cannot reach dis386_twobyte[].
>>>>
>>>> But why would it even try to fetch the next byte? 0f already is the
>>>> major opcode byte in this case. Fetching more can only mean either
>>>> there's an entry in dis386[] specifying operands, or there's an
>>>> attempt to index dis386_twobyte[]. Since dis386[] has Bad_Opcode at
>>>> that slot, I conclude that what you say confirms my suspicion that
>>>> dis386_twobyte[] is (attempted to
>>>> be) used here.
>>>>
>>>
>>> I don't know how to identify that 0f is the last byte of the binary,
>>
>> That's entirely irrelevant here. I gave the byte sequence d5 00 0f just as the
>> minimal one required to make my point. My original concern equally applies
>> to e.g. d5 00 0f 01 00, which may not use dis386_twobyte[0x01].
>>
> Aha, I  got you. Changed the code to
> 
>    /* REX2.M in rex2 prefix represents map0 or map1.  */
> -  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> +  if ((*ins.codep == 0x0f && ins.last_rex2_prefix < 0) || (ins.rex2 & REX2_M))

Would you mind considering

  if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))

as an alternative?
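
To make the difference between the three conditions concrete, here is a minimal standalone sketch. It is not the i386-dis.c code: the struct, the function names and the value chosen for REX2_M are assumptions made purely for illustration; only the three conditions themselves are taken from the discussion above. It also assumes that rex2 is 0 whenever no REX2 prefix was seen.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only.  */
#define REX2_M 0x8

/* Hypothetical, stripped-down stand-in for the decoder state.  */
struct mini_ins
{
  int last_rex2_prefix;   /* < 0 when no REX2 prefix was seen.  */
  unsigned char rex2;     /* REX2 bits; assumed 0 without a REX2 prefix.  */
  unsigned char opcode;   /* The byte *ins.codep points at.  */
};

/* Condition before the fix.  */
bool
use_twobyte_old (const struct mini_ins *ins)
{
  return ins->opcode == 0x0f || (ins->rex2 & REX2_M) != 0;
}

/* Condition as first changed.  */
bool
use_twobyte_or (const struct mini_ins *ins)
{
  return (ins->opcode == 0x0f && ins->last_rex2_prefix < 0)
         || (ins->rex2 & REX2_M) != 0;
}

/* Suggested alternative.  */
bool
use_twobyte_cond (const struct mini_ins *ins)
{
  return ins->last_rex2_prefix < 0
         ? ins->opcode == 0x0f
         : (ins->rex2 & REX2_M) != 0;
}

/* Compare the two new forms over every state that respects the
   assumption "rex2 is 0 when no REX2 prefix was seen".  */
void
check_forms_agree (void)
{
  for (int seen = 0; seen < 2; seen++)
    for (unsigned int rex2 = 0; rex2 < 16; rex2++)
      for (unsigned int op = 0; op < 256; op++)
        {
          struct mini_ins ins;

          ins.last_rex2_prefix = seen ? 0 : -1;
          ins.rex2 = (unsigned char) (seen ? rex2 : 0);
          ins.opcode = (unsigned char) op;
          assert (use_twobyte_or (&ins) == use_twobyte_cond (&ins));
        }
}
```

In this model, d5 00 0f (REX2 seen, M clear, opcode byte 0f) selects the two-byte table only under the old condition; both new forms leave it to the one-byte table, and they agree with each other as long as the stated assumption about rex2 holds.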

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-10  9:21                 ` Jan Beulich
@ 2023-11-10 12:38                   ` Cui, Lili
  2023-12-14 10:13                   ` Cui, Lili
  1 sibling, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-10 12:38 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 10, 2023 5:22 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 10.11.2023 10:14, Jan Beulich wrote:
> > On 10.11.2023 08:11, Cui, Lili wrote:
> >> Here are some cases:
> >> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex
> will print it.
> >> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to
> print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
> >> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it
> uses rex2 prefix. We cannot see the information of the lower 4 bits of rex2
> >>
> >> It is not helpful to judge rex_used in rex2.
> >
> > So: REX2, like REX, is a prefix for legacy encodings. Therefore my
> > view is that it ought to be treated similarly to REX in the
> > disassembler (I'm fine to avoid the introduction of a myriad of
> > rex2... [no figure braces] prefixes in gas). Just like in the first
> > line of what you present above for the REX.B case, I'd expect the same
> > for REX2. E.g., taking the 2nd line from above
> >
> > 0:   d5 01 0f a8             {rex2.B3} push %gs
> >
> > An alternative, matching your intention of treating REX2 more like
> > EVEX, would be to (subsequently, not right away) do away with rex.B
> > and alike as well, and only print {rex} in that case as well. Such an
> > intention would then want mentioning in the description (and
> > eventually carrying out). The main difference between REX/REX2 and
> > VEX/EVEX, as I view it (and as would be speaking against this
> > alternative approach), is that in VEX/EVEX it is kind of normal that
> > certain bits are deliberately ignored (and might be set either way -
> > see gas'es command line options actually allowing to drive the values for
> some of such ignored fields).
> > REX/REX2, otoh, shouldn't normally specify unused bits, much like
> > other legacy prefixes aren't expected to be present for no reason (and
> > hence are explicitly printed when present).
> 
> However, irrespective of what I said above, please feel free to go ahead with
> the simplified approach, to allow making progress. I'd like to have H.J.'s input
> on how to achieve overall consistency, and we can make further adjustments
> later on. But please make sure you actually mention this aspect in the
> description of the patch.
> 

OK, I'll add a description of the current implementation.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-10 12:35                       ` Jan Beulich
@ 2023-11-13  0:18                         ` Cui, Lili
  0 siblings, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-13  0:18 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils



> >>>>>>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
> >>>>>>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> >>>>>>>     {
> >>>>>>>       unsigned char threebyte;
> >>>>>>>
> >>>>>>>       if (!ins.rex2)
> >>>>>>>         {
> >>>>>>>           ins.codep++;
> >>>>>>>           if (!fetch_code (info, ins.codep + 1))
> >>>>>>>             goto fetch_error_out;   ---> When there are no bytes after 0f, it will
> >>>>>>>                                     jump to fetch error, but no error will be reported.
> >>>>>>>         }
> >>>>>>>       threebyte = *ins.codep;
> >>>>>>>       dp = &dis386_twobyte[threebyte];
> >>>>>>>       ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>>>>>       ins.codep++;
> >>>>>>>     }
> >>>>>>>
> >>>>>>> For d5 00 0f
> >>>>>>> Decode to:
> >>>>>>>    0:   d5                      rex2
> >>>>>>>    1:   00 0f                   add    %cl,(%rdi)
> >>>>>>
> >>>>>> But this would better have d5 00 0f all on the first line (it
> >>>>>> definitely needs to have d5 00 on the same line, as the bytes
> >>>>>> belong
> >>>> together), as opposed to ...
> >>>>>>
> >>>>>>> For 40 0f
> >>>>>>> Decode to:
> >>>>>>>    0:   40                      rex
> >>>>>>>    1:   0f                      .byte 0xf
> >>>>>>
> >>>>>> ... this where there truly is a known missing byte before we
> >>>>>> could proceed further. (It's still a little questionable to print
> >>>>>> REX separately in this case, but that's the way the binutils
> >>>>>> disassembler has always worked.)
> >>>>>>
> >>>>>> Yet to restate - to see what I mean, you'd need to populate at
> >>>>>> least one of the two 0f slots in the mentioned arrays. What I'm
> >>>>>> suspecting from the code as this patch version has it is that d5
> >>>>>> 00 0f would wrongly descend into dis386_twobyte[]. Yet you can
> >>>>>> tell that from it correctly using dis386[] only if the two 0f
> >>>>>> slots of these arrays are meaningfully different (or by actually
> >>>>>> looking at things in
> >> e.g.
> >>>>>> a debugger).
> >>>>>>
> >>>>>
> >>>>> I'm confused here, for d5 00 0f when it fetches the next byte
> >>>>> after 0f it will
> >>>> find there is no byte there and then go to fetch_error_out and then
> >>>> it will return from print_insn and I don't have a chance to do
> >>>> anything for it. It cannot reach dis386_twobyte[].
> >>>>
> >>>> But why would it even try to fetch the next byte? 0f already is the
> >>>> major opcode byte in this case. Fetching more can only mean either
> >>>> there's an entry in dis386[] specifying operands, or there's an
> >>>> attempt to index dis386_twobyte[]. Since dis386[] has Bad_Opcode at
> >>>> that slot, I conclude that what you say confirms my suspicion that
> >>>> dis386_twobyte[] is (attempted to
> >>>> be) used here.
> >>>>
> >>>
> >>> I don't know how to identify that 0f is the last byte of the binary,
> >>
> >> That's entirely irrelevant here. I gave the byte sequence d5 00 0f
> >> just as the minimal one required to make my point. My original
> >> concern equally applies to e.g. d5 00 0f 01 00, which may not use
> dis386_twobyte[0x01].
> >>
> > Aha, I  got you. Changed the code to
> >
> >    /* REX2.M in rex2 prefix represents map0 or map1.  */
> > -  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> > +  if ((*ins.codep == 0x0f && ins.last_rex2_prefix < 0) || (ins.rex2 &
> > + REX2_M))
> 
> Would you mind considering
> 
>   if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))
> 
> as an alternative?
> 

Done, thanks!

Lili

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 7/8] Support APX NDD optimized encoding.
  2023-11-10  9:54       ` Jan Beulich
@ 2023-11-14  2:28         ` Hu, Lin1
  2023-11-14 10:50           ` Jan Beulich
  2023-11-14  2:58         ` [PATCH 1/2] Reorder APX insns in i386.tbl Hu, Lin1
  1 sibling, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-14  2:28 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 10, 2023 5:54 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org; Cui, Lili <lili.cui@intel.com>
> Subject: Re: [PATCH 7/8] Support APX NDD optimized encoding.
> 
> On 10.11.2023 06:43, Hu, Lin1 wrote:
> >> On 02.11.2023 12:29, Cui, Lili wrote:
> 
> Btw, your shrinking of reply context above from here is problematic. Someone
> reading just this mail can't tell who ...
> 
> >> Similarly I'm concerned of the ND form of CFCMOVcc, which isn't there
> >> yet in the patches, but which will also need excluding from this
> >> optimization. Obviously this concern then extends to any future ND-
> >> encoded insns, which (likely) won't have legacy-encoded (and hence
> >> REX2-encodable) counterparts. Are there any plans how to deal with such?
> >> (There's a possible approach mentioned further down.)
> 
> ... originally said this.
>

Thanks for your advice.

> 
> > Looking at other current NDD instructions, it should be possible to use evex
> encoding even if it doesn't have rex2 encoding.
> 
> Should be possible - yes. But why would you do such a transformation? That's
> not an optimization at all, afaict. And we shouldn't alter what the programmer
> wrote if the result isn't in at least some respect deemed better than the original.
> Considering this, the helper function may want further naming differently than
> already suggested, to e.g. convert_NDD_to_REX2().
>

Indeed.
 
>
> >>> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> >>> +
> >>> +      if (i.types[src1].bitfield.class == Reg
> >>> +	  && i.op[src1].regs == i.op[dest].regs)
> >>> +	readonly_var = src2;
> >>
> >> As can be seen in the testcase, this also results in ADCX/ADOX to be
> >> converted to non-ND EVEX forms, i.e. even when that's not a win at all.
> >> We shouldn't change what the user has written when the encoding
> >> doesn't actually improve. (Or else, but I'd be hesitant to accept
> >> that, at the very least the effect would need pointing out in the
> >> description or even a code comment, so that later on it is possible
> >> to figure out whether that was intentional or an
> >> oversight.)
> >>
> >> This is where my template ordering remark in reply to patch 5 comes into play:
> >> Whether invoking re-parse is okay would further need to depend on
> >> whether an alternative (earlier) template actually allows
> >> REX2 encoding (same base-opcode could be one of the criteria for how
> >> far to look back through earlier templates; an option might also be
> >> to put the 3- operand templates first, so that looking backwards
> >> wouldn't be necessary in the first place). This would then likely
> >> also address one of the forward looking concerns I've raised above.
> >>
> >
> > Indeed, adcx's legacy insn can't support rex2.
> >
> > For my problem, I prefer to re-order templates order, because, I hadn't
> thought of a way to simply move t to the farthest same base_opcode template
> for the moment. The following is a tentative scenario: the order will be ndd evex
> - rex2 - evex.
> 
> Yes, this matches my understanding / expectation.
> 
> > And I will need a tmp_variable to avoid the insn doesn't match the rex2, let me
> backtrack the match's result and the value of i.
> 
> This, however, I'm not convinced of. I'd rather see this vaguely in line with
> 58bceb182740 ("x86: prefer VEX encodings over EVEX ones when
> possible"): Do another full matching round with the removed operand, arranging
> for "internal error" to be raised in case that fails. Your approach would, I think,
> result in silent bad code generation in case something went wrong. Thing is - you
> don't even need to advance (or
> backtrack) t in that case
>

I tried to reorder the templates and modify the code as follows:

@@ -7728,6 +7765,40 @@ match_template (char mnem_suffix)
          i.memshift = memshift;
        }

+      /* If we can optimize a NDD insn to non-NDD insn, like
+        add %r16, %r8, %r8 -> add %r16, %r8,
+        add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
+        Note that the semantics have not been changed.  */
+      if (optimize
+         && !i.no_optimize
+         && i.vec_encoding != vex_encoding_evex
+         && t + 1 < current_templates->end
+         && !t[1].opcode_modifier.evex)
+       {
+         unsigned int readonly_var = convert_NDD_to_REX2 (t);
+         if (readonly_var != ~0)
+           {
+             if (!check_EgprOperands (t + 1))
+               {
+                 specific_error = progress (internal_error);
+                 continue;
+               }
+             ++i.operands;
+             ++i.reg_operands;
+             ++i.tm.operands;
+
+             if (readonly_var == 1)
+               swap_2_operands (0, 1);
+           }
+       }

convert_NDD_to_REX2 now returns readonly_var. check_EgprOperands is there to exclude insns like adcx and adox: their legacy forms live in opcode space legacy-map2, which REX2 cannot encode.
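
As a rough standalone model of what the conversion looks for, here is a small sketch. The names, types and operand indices are hypothetical and need not match readonly_var or anything else in tc-i386.c; it only encodes the two examples from the comment above plus the adcx/adox exclusion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: three register operands in AT&T order
   (source, source, destination), identified by register number.  */
struct ndd_insn
{
  unsigned int reg[3];
  bool commutative;         /* True for e.g. add, false for e.g. sub.  */
  bool legacy_allows_rex2;  /* False for e.g. adcx/adox.  */
};

#define NO_FOLD (~0u)

/* Return the index of the source operand that equals the destination
   (and hence becomes redundant), or NO_FOLD if the three-operand
   form has to stay.  */
unsigned int
redundant_source (const struct ndd_insn *insn)
{
  if (!insn->legacy_allows_rex2)
    return NO_FOLD;
  if (insn->reg[1] == insn->reg[2])
    return 1;
  if (insn->commutative && insn->reg[0] == insn->reg[2])
    return 0;
  return NO_FOLD;
}

/* Rewrite INSN into the two-operand form in OUT if possible.  */
bool
fold_ndd (const struct ndd_insn *insn, unsigned int out[2])
{
  unsigned int idx = redundant_source (insn);

  if (idx == NO_FOLD)
    return false;
  /* The surviving source comes first; the shared register is the
     combined source/destination.  */
  out[0] = insn->reg[idx == 1 ? 0 : 1];
  out[1] = insn->reg[2];
  return true;
}

/* The two examples from the comment, plus a non-commutative case.  */
void
check_examples (void)
{
  unsigned int out[2];
  struct ndd_insn a = { { 16, 8, 8 }, true, true };   /* add %r16, %r8, %r8 */
  struct ndd_insn b = { { 8, 16, 8 }, true, true };   /* add %r8, %r16, %r8 */
  struct ndd_insn c = { { 8, 16, 8 }, false, true };  /* sub %r8, %r16, %r8 */

  assert (fold_ndd (&a, out) && out[0] == 16 && out[1] == 8);
  assert (fold_ndd (&b, out) && out[0] == 16 && out[1] == 8);
  assert (!fold_ndd (&c, out));
}
```

The non-commutative case stays in three-operand form because swapping the sources would change the result; an insn whose legacy form cannot take REX2 is never folded in this model.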

I also need some modifications in tc-i386.c after reordering i386-opc.tbl:

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 7a86aff1828..d98950c7dfd 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -14401,7 +14401,9 @@ static bool check_register (const reg_entry *r)

   if (r->reg_flags & RegRex2)
     {
-      if (is_evex_encoding (current_templates->start))
+      if (is_evex_encoding (current_templates->start)
+         && ((current_templates->start + 1 >= current_templates->end)
+             || (is_evex_encoding (current_templates->start + 1))))
        i.vec_encoding = vex_encoding_evex;

       if (!cpu_arch_flags.bitfield.cpuapx_f

What's your opinion?

> 
> >>> @@ -7728,6 +7766,14 @@ match_template (char mnem_suffix)
> >>>  	  i.memshift = memshift;
> >>>  	}
> >>>
> >>> +      /* If we can optimize a NDD insn to non-NDD insn, like
> >>> +	 add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
> >>> +      if (optimize == 1 && optimize_NDD_to_nonNDD (t))
> >>
> >> So you do this optimization at -O1, but not at -O2? Imo the "== 1"
> >> simply needs dropping. Furthermore the {nooptimize} and {evex} pseudo
> >> prefixes need respecting. Quite likely respecting {evex} would
> >> eliminate the need for the explicit .has_nf check in the helper
> >> function, as I expect .vec_encoding to be set alongside that bit
> >> anyway. Further quite likely respecting {evex} here will mean that in patch 3
> you need to introduce a new enumerator (e.g.
> >> vex_encoding_egpr, vaguely similar to vex_encoding_evex512), to avoid
> >> setting .vec_encoding to vex_encoding_evex when an eGPR is parsed.
> >>
> >> As to optimization level: In build_vex_prefix() we leverage C only at
> >> -O2 or higher (including -Os). We may want to be consistent in this
> >> regard here (i.e. by an extra check in the helper function).
> >>
> >
> > It's a mistake, I have fixed it. The conditions will be. I will try later, after the NF
> patch is done, to see if the constraint i.has_nf can be removed or not.
> >
> >        /* If we can optimize a NDD insn to non-NDD insn, like
> >          add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
> > -      if (optimize == 1 && optimize_NDD_to_nonNDD (t))
> > +      if (!i.no_optimize && i.vec_encoding != vex_encoding_evex
> > +         && optimize && optimize_NDD_to_nonNDD (t))
> >         {
> 
> Regardless of what the final expression is going to be, please keep the check of
> "optimize" first, such that the common case of optimization being disabled will
> be impacted as little as possible.
>

Okay.
 
>
> >>> +	{
> >>> +	  t = current_templates->start - 1;
> >>
> >> As per a remark further up, this adjustment could be avoided if the
> >> ND templates came ahead of the legacy ones. They can't be wrongly
> >> used in place of the legacy ones, due to the extra operand they
> >> require. Then a comment here would merely point out this ordering
> >> aspect. But of course care will then need to be taken to not go past
> >> i386_optab[]'s bounds (by having suitably ordered conditionals when
> >> looking for whether there is an alternative template in the first place; again
> see the respective remark further up).
> >>
> >
> > Yes, if we reorder the template's order, I will remove the line. Only one
> example of a possible implementation is given here:
> >
> >         }
> >
> > +      bool have_converted_NDD_to_nonNDD = false;
> > +      i386_insn tmp_i;
> > +
> > +      if (!i.no_optimize && i.vec_encoding != vex_encoding_evex
> > +         && optimize && !have_converted_NDD_to_nonNDD
> > +         && convert_NDD_to_nonNDD (t))
> > +       {
> > +         have_converted_NDD_to_nonNDD = true;
> > +         tmp_i = i;
> > +       }
> > +
> >        /* We've found a match; break out of loop.  */
> >        break;
> >      }
> > @@ -7787,6 +7802,9 @@ match_template (char mnem_suffix)
> >        return NULL;
> >      }
> >
> > +  if (have_converted_to_nonNDD)
> > +    i = tmp_i;
> > +
> >    if (!quiet_warnings)
> 
> I have to admit that I don't understand what the goal is of this playing with i and
> tmp_i.
>

This variable has been removed in the new version, so there's no need to think about it.

> 
> >> For all of the changes below (which are a little hard to review in
> >> email), aiui they only add C as needed. I once again would prefer if
> >> that attribute could be added right as the templates are introduced,
> >> with the description stating the intention and that the actual use of
> >> the attribute will be added later (i.e. as expressed earlier already for NF).
> >
> > After the changes are finalized, I'll break out this part of the modification that
> adds the C to lili so she can put it where it belongs.
> 
> Hmm, that will need doing early, as the NDD patch is hopefully going to land
> soon-ish. Same for the template re-ordering (which will need explaining when
> pulled ahead, but it will want pulling ahead to reduce churn).
> 

OK. If there are no major problems with these changes, I'll hand a patch to Lili (it will be attached to this email for a double check) and explain in the changelog why we made these changes.

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* [PATCH 1/2] Reorder APX insns in i386.tbl
  2023-11-10  9:54       ` Jan Beulich
  2023-11-14  2:28         ` Hu, Lin1
@ 2023-11-14  2:58         ` Hu, Lin1
  2023-11-14 11:20           ` Jan Beulich
  1 sibling, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-14  2:58 UTC (permalink / raw)
  To: binutils; +Cc: JBeulich, hongjiu.lu

---
 gas/config/tc-i386.c |     4 +-
 opcodes/i386-opc.tbl |   156 +-
 5 files changed, 13189 insertions(+), 10771 deletions(-)

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 7a86aff1828..d98950c7dfd 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -14401,7 +14401,9 @@ static bool check_register (const reg_entry *r)
 
   if (r->reg_flags & RegRex2)
     {
-      if (is_evex_encoding (current_templates->start))
+      if (is_evex_encoding (current_templates->start)
+	  && ((current_templates->start + 1 >= current_templates->end)
+	      || (is_evex_encoding (current_templates->start + 1))))
 	i.vec_encoding = vex_encoding_evex;
 
       if (!cpu_arch_flags.bitfield.cpuapx_f
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 370bf82c13d..7e77db1205c 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -145,6 +145,8 @@
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
+// Also re-use the bit to mark NDD insns where swapping the source operands
+// may allow switching from EVEX encoding to REX2 encoding.
 #define C StaticRounding
 
 #define FP 387|287|8087
@@ -291,40 +293,40 @@ std, 0xfd, 0, NoSuf, {}
 sti, 0xfb, 0, NoSuf, {}
 
 // Arithmetic.
+add, 0x0, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
-add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-add, 0x0, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
-add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}
+add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
-inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 inc, 0xfe/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, {Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}
+inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64, }
 sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sub, 0x83/5, APX_F, Modrm|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 sub, 0x2c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
-sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64, }
-sub, 0x83/5, APX_F, Modrm|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 sub, 0x80/5, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
-dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 dec, 0xfe/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x83/3, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 sbb, 0x1c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+sbb, 0x80/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 sbb, 0x80/3, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
-sbb, 0x83/3, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
-sbb, 0x80/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 cmp, 0x38, 0, D|W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 cmp, 0x83/7, 0, Modrm|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
@@ -335,50 +337,50 @@ test, 0x84, 0, D|W|C|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, R
 test, 0xa8, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 test, 0xf6/0, 0, W|Modrm|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+and, 0x20, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 and, 0x20, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+and, 0x83/4, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 and, 0x83/4, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 and, 0x24, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
-and, 0x80/4, 0, W|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-and, 0x20, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
-and, 0x83/4, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 and, 0x80/4, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+and, 0x80/4, 0, W|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+or, 0x8, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 or, 0x8, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+or, 0x83/1, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 or, 0x83/1, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 or, 0xc, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
-or, 0x80/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-or, 0x8, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
-or, 0x83/1, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 or, 0x80/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+or, 0x80/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+xor, 0x30, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
-xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-xor, 0x30, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
-xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 xor, 0x80/6, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 // clr with 1 operand is really xor with 2 operands.
 clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
 
+adc, 0x10, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 adc, 0x10, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x83/2, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 adc, 0x83/2, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 adc, 0x14, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+adc, 0x80/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 adc, 0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
-adc, 0x83/2, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
-adc, 0x80/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
-neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 neg, 0xf6/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+not, 0xf6/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 not, 0xf6/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-not, 0xf6/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 aaa, 0x37, No64, NoSuf, {}
 aas, 0x3f, No64, NoSuf, {}
@@ -411,8 +413,8 @@ cqto, 0x99, x64, Size64|NoSuf, {}
 // These multiplies can only be selected with single operand forms.
 mul, 0xf6/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 imul, 0xf6/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+imul, 0xaf, APX_F, C|Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 imul, 0xfaf, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64 }
-imul, 0xaf, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 // imul with 2 operands mimics imul with 3 by putting the register in
@@ -426,99 +428,99 @@ div, 0xf6/6, 0, W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|
 idiv, 0xf6/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 idiv, 0xf6/7, 0, W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
 
-rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
-ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-ror, 0xc0/1, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-ror, 0xd2/1, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ror, 0xd0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ror, 0xc0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+ror, 0xc0/1, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ror, 0xd2/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+ror, 0xd2/1, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ror, 0xd0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcl, 0xd0/2, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcl, 0xc0/2, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcl, 0xd2/2, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcl, 0xd0/2, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcl, 0xc0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcl, 0xd2/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcl, 0xc0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcl, 0xc0/2, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xc0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcl, 0xd2/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcl, 0xd2/2, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd2/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcl, 0xd0/2, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcr, 0xd0/3, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcr, 0xc0/3, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcr, 0xd2/3, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcr, 0xd0/3, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcr, 0xc0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcr, 0xd2/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcr, 0xc0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcr, 0xc0/3, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xc0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcr, 0xd2/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcr, 0xd2/3, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd2/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcr, 0xd0/3, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
-sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-sal, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-sal, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sal, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sal, 0xc0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sal, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sal, 0xd2/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sal, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sal, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
-shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-shl, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-shl, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shl, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shl, 0xc0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shl, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shl, 0xd2/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shl, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shl, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
-shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-shr, 0xc0/5, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-shr, 0xd2/5, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shr, 0xd0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shr, 0xc0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shr, 0xc0/5, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shr, 0xd2/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shr, 0xd2/5, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shr, 0xd0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
-sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-sar, 0xc0/7, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-sar, 0xd2/7, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sar, 0xd0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sar, 0xc0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sar, 0xc0/7, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sar, 0xd2/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sar, 0xd2/7, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sar, 0xd0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
-shld, 0xfa4, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
-shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
-shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shld, 0x24, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shld, 0xfa4, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shld, 0xa5, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shld, 0xa5, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 
-shrd, 0xfac, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
-shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
-shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shrd, 0x2c, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shrd, 0xfac, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shrd, 0xad, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shrd, 0xad, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 
 // Control transfer instructions.
 call, 0xe8, No64, JumpDword|DefaultSize|No_bSuf|No_sSuf|No_qSuf|BNDPrefixOk, { Disp16|Disp32 }
@@ -1023,8 +1025,8 @@ ud2b, 0xfb9, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|U
 // 3rd official undefined instr (older CPUs don't take a ModR/M byte)
 ud0, 0xfff, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
-cmov<cc>, 0xf4<cc:opc>, CMOV, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 cmov<cc>, 0x4<cc:opc>, CMOV|APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
+cmov<cc>, 0xf4<cc:opc>, CMOV, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 fcmovb, 0xda/0, i687, Modrm|NoSuf, { FloatReg, FloatAcc }
 fcmovnae, 0xda/0, i687, Modrm|NoSuf, { FloatReg, FloatAcc }
@@ -2124,12 +2126,12 @@ xcryptofb, 0xf30fa7e8, PadLock, NoSuf|RepPrefixOk, {}
 xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
 
 // Multy-precision Add Carry, rdseed instructions.
+adcx, 0x6666, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+adox, 0xf366, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 rdseed, 0xfc7/7, RdSeed, Modrm|NoSuf, { Reg16|Reg32|Reg64 }
 
 // SMAP instructions.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 8/8] Support APX JMPABS
  2023-11-09 12:59   ` Jan Beulich
@ 2023-11-14  3:26     ` Hu, Lin1
  2023-11-14 11:15       ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-14  3:26 UTC (permalink / raw)
  To: Beulich, Jan, Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

 > On 02.11.2023 12:29, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -7790,7 +7790,8 @@ match_template (char mnem_suffix)
> >    if (!quiet_warnings)
> >      {
> >        if (!intel_syntax
> > -	  && (i.jumpabsolute != (t->opcode_modifier.jump == JUMP_ABSOLUTE)))
> > +	  && (i.jumpabsolute != (t->opcode_modifier.jump == JUMP_ABSOLUTE))
> > +	  && t->mnem_off != MN_jmpabs)
> >  	as_warn (_("indirect %s without `*'"), insn_name (t));
> 
> Coming back to this, which I did comment on before already. The insn taking an
> immediate operand doesn't really justify this, as it leaves open the underlying
> question of why you use JumpAbsolute in the insn template in the first place. I've
> gone through all the uses of JUMP_ABSOLUTE, and I didn't find any where the
> respective handling would be applicable here. In fact it's unclear whether the
> insn needs marking as any JUMP_* at all: It's not really different from, say, "mov
> $imm, %rcx".
>

According to the doc, JMPABS is a 64-bit-only ISA extension and acts as a near-direct branch with an absolute target. I added this attribute simply because I was mimicking the other jmp templates. If we don't need the attribute, I'm glad to remove it.

> 
> There's a further question regarding its operand representation,
> though: Can you explain why it's Imm64, not Disp64? The latter would, to me,
> seem more natural to use here. Not only from a assembler internals perspective,
> but also from the users' one: The $ in the operand carries absolutely no meaning
> (see also the related testcase comment below) in AT&T syntax, and there's no
> noticeable difference in Intel syntax afaict.
>

In my opinion, if the compiler wants to jump "anywhere" and the displacement cannot fit in a 32-bit immediate, the compiler will fall back to indirect branches. My current understanding is that jmpabs came about as a solution to that problem with indirect branches. It's not the same as jmp. Currently the operands of jmpabs are absolute addresses optimized by PLT or JIT. I think using imm64 avoids confusion with disp64. That's why the designer designed it this way.

A colleague in our group has written an introductory document that can be referred to. (https://kanrobert.github.io/rfc/All-about-APX-JMPABS/)

>
> > @@ -8939,6 +8940,9 @@ process_operands (void)
> >  	}
> >      }
> >
> > +  if (i.tm.mnem_off == MN_jmpabs)
> > +    i.rex2_encoding = true;
> 
> Please see my earlier remarks wrt "rex2" vs "{rex2}". What you do here is effect
> the latter. Yet as indicated, the pseudo-prefix isn't really an indication of "must
> have REX2 prefix", but only a weak request to do so if possible. I think you want
> to set i.rex2 here instead, requiring a way to express that an empty REX2 prefix
> is wanted.
>

But in terms of encoding, i.rex2 should be 0. Can I do special handling in build_rex2_prefix? 

> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
> > @@ -0,0 +1,17 @@
> > +# Check bytecode of APX_F jmpabs instructions with illegal encode.
> > +
> > +	.text
> > +# With 66 prefix
> > +	.byte 0x66,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> > +	.byte 0x66,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> > +# With 67 prefix
> > +	.byte 0x67,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> > +	.byte 0x67,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> > +# With F2 prefix
> > +	.byte 0xf2,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> > +	.byte 0xf2,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> > +# With F3 prefix
> > +	.byte 0xf3,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> > +	.byte 0xf3,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> > +# REX2.M0 = 0 REX2.W = 1
> > +	.byte 0xd5,0x08,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> 
> As per earlier comments: This wants expressing via .insn, to yield input to gas
> human-readable (even if, as it looks, two .insn are going to be required per
> resulting construct). Further in the last comment, why is
> REX2.M0 mentioned there, but not elsewhere? Also what purpose serve the
> 0x64 bytes here? The encodings are invalid irrespective of them. Instead I'd kind
> have expected LOCK to also be covered.
>

Because this error line is only for the special case where M0 == 0 and base_opcode == 0xa1: there W should be 0 rather than 1. If M0 = 1, W = 1, and base_opcode == 0xa1, I think it could decode as mov rax, moffs (or some future insn). Elsewhere it's just excluding invalid prefixes. I don't see in the docs that it triggers #UD; am I missing something?
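
To make the cases in the -inval testcase concrete, here is a minimal sketch of a classifier over those byte patterns. It is not gas or opcodes code: the names jmpabs_class and classify_jmpabs are made up for illustration. It assumes the REX2 prefix byte is 0xd5, with M0 in bit 7 and W in bit 3 of the payload byte, and it only knows the prefixes the testcase uses (0x66, 0x67, 0xf2, 0xf3, plus the 0x64 segment override, which it skips). LOCK is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification, for illustration only.  */
typedef enum { NOT_JMPABS, JMPABS_OK, JMPABS_BAD } jmpabs_class;

static jmpabs_class
classify_jmpabs (const uint8_t *p, size_t n)
{
  size_t i = 0;
  bool bad_prefix = false;

  /* Consume the legacy prefixes used in the testcase.  The 0x64 segment
     override is skipped; 66/67/F2/F3 make the encoding invalid.  */
  while (i < n && (p[i] == 0x64 || p[i] == 0x66 || p[i] == 0x67
                   || p[i] == 0xf2 || p[i] == 0xf3))
    {
      if (p[i] != 0x64)
        bad_prefix = true;
      i++;
    }

  /* Need the REX2 byte, its payload, the opcode and 8 immediate bytes.  */
  if (n - i < 11 || p[i] != 0xd5)
    return NOT_JMPABS;

  uint8_t payload = p[i + 1];

  /* M0 == 1 selects another opcode map, so this is not the JMPABS slot.  */
  if ((payload & 0x80) != 0 || p[i + 2] != 0xa1)
    return NOT_JMPABS;

  /* In the JMPABS slot: a 66/67/F2/F3 prefix or REX2.W == 1 is invalid.  */
  if (bad_prefix || (payload & 0x08) != 0)
    return JMPABS_BAD;

  return JMPABS_OK;
}
```

With this, d5 00 a1 followed by eight immediate bytes classifies as JMPABS_OK, while every line of the -inval testcase classifies as JMPABS_BAD.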

> 
> Also a spec question as we're talking of what is or is not valid (i.e.
> causing #UD) here: Why would XCR0.APX=0 need to cause #UD? There's no use
> of eGPR-s here.
>

Sorry, what is XCR0.APX?
 
>
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
> > @@ -0,0 +1,10 @@
> > +# Check 64bit APX_F JMPABS instructions
> > +
> > +	.text
> > + _start:
> > +	jmpabs	      $0x0202020202020202
> > +	jmpabs	      $0x2
> > +
> > +.intel_syntax noprefix
> > +	jmpabs	      0x0202020202020202
> > +	jmpabs	      0x2
> 
> I expect this isn't going to be the normal use of the insn. Instead I would foresee
> the typical users to be "jmpabs symbol" (and - as per above - intentionally
> omitting the $ already). IOW the testcase also wants to cover the case requiring
> a relocation, including a check that the correct relocation is emitted (covering
> both ELF and COFF; I'm not going to insist on also covering Mach-O, as - for a
> reason that escapes me - gas can't even be configured for x86_64-*-darwin*).
>

Based on the previous discussion, we think that this usage is not currently supported. If users want to use a symbol, I think they can use "jmp symbol".

> 
> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> > @@ -106,6 +106,7 @@ static bool MOVSXD_Fixup (instr_info *, int, int);
> >  static bool DistinctDest_Fixup (instr_info *, int, int);
> >  static bool PREFETCHI_Fixup (instr_info *, int, int);
> >  static bool PUSH2_POP2_Fixup (instr_info *, int, int);
> > +static bool JMPABS_Fixup (instr_info *, int, int);
> >
> >  static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
> >  						enum disassembler_style,
> > @@ -258,6 +259,9 @@ struct instr_info
> >    char scale_char;
> >
> >    enum x86_64_isa isa64;
> > +
> > +  /* Remember if the current op is jmpabs.  */
> > +  bool is_jmpabs;
> >  };
> 
> This field would probably best live next to op_is_jump (and then also be named
> op_is_jmpabs, assuming a separate boolean is indeed needed).
> I further expect that op_is_jump also wants setting for JMPABS.
>

Can I change op_is_jump's type from bool to unsigned int?
 
>
> > @@ -2032,7 +2036,7 @@ static const struct dis386 dis386[] = {
> >    { "lahf",		{ XX }, 0 },
> >    /* a0 */
> >    { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
> > -  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
> > +  { "mov%LS",		{ { JMPABS_Fixup, eAX_reg }, { JMPABS_Fixup, v_mode } }, PREFIX_REX2_ILLEGAL },
> >    { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
> >    { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
> >    { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
> > @@ -9648,7 +9652,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
> >      }
> >
> >    if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
> > -      && ins.last_rex2_prefix >= 0)
> > +      && ins.last_rex2_prefix >= 0 && !ins.is_jmpabs)
> >      {
> >        i386_dis_printf (info, dis_style_text, "(bad)");
> >        ret = ins.end_codep - priv.the_buffer;
> > @@ -13857,3 +13861,38 @@ PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
> >
> >    return OP_VEX (ins, bytemode, sizeflag);
> >  }
> > +
> > +static bool
> > +JMPABS_Fixup (instr_info *ins, int bytemode, int sizeflag)
> > +{
> > +  if (ins->address_mode == mode_64bit
> > +      && ins->last_rex2_prefix >= 0
> > +      && (ins->rex2 & 0x80) == 0x0)
> > +    {
> > +      uint64_t op;
> > +
> > +      if (bytemode == eAX_reg)
> > +	return true;
> > +
> > +      if (!get64 (ins, &op))
> > +	return false;
> > +
> > +      if ((ins->prefixes & (PREFIX_OPCODE | PREFIX_ADDR)) != 0x0
> > +	  || (ins->rex & REX_W) != 0x0)
> > +	{
> > +	  oappend (ins, "(bad)");
> > +	  return true;
> > +	}
> > +
> > +      ins->mnemonicendp = stpcpy (ins->obuf, "jmpabs");
> > +      ins->all_prefixes[ins->last_rex2_prefix] = 0;
> 
> This doesn't look right. REX2.{R,X,B}{3,4} set still want recording in the output. I
> expect you may need to set a bit in rex2_used here, but how exactly that ought
> to work depends on how comments on earlier patches are going to be
> addressed. This may then also eliminate the need for ...
> 
> > +      ins->is_jmpabs = true;
> 
> ... this field, which likely will be covered by a more generic approach.
> 
>

For this part of the discussion, as well as the modifications, I will wait until the earlier patch is finalized.

>
> > +      oappend_immediate (ins, op);
> > +
> > +      return true;
> > +    }
> > +
> > +  if (bytemode == eAX_reg)
> > +    return OP_IMREG (ins, bytemode, sizeflag);
> > +  return OP_OFF64 (ins, v_mode, sizeflag);
> 
> v_mode is, afaics, properly passed into here. Why would you open-code that,
> instead of using bytemode? Not doing so will give the compiler more ICF
> opportunities.
> 

I reckon it's a mistake. I have fixed it.

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 7/8] Support APX NDD optimized encoding.
  2023-11-14  2:28         ` Hu, Lin1
@ 2023-11-14 10:50           ` Jan Beulich
  2023-11-15  2:52             ` Hu, Lin1
  2023-11-15  2:59             ` [PATCH][v3] " Hu, Lin1
  0 siblings, 2 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-14 10:50 UTC (permalink / raw)
  To: Hu, Lin1, Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 14.11.2023 03:28, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>>
>> On 10.11.2023 06:43, Hu, Lin1 wrote:
>>>> On 02.11.2023 12:29, Cui, Lili wrote:
>>>>> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
>>>>> +
>>>>> +      if (i.types[src1].bitfield.class == Reg
>>>>> +	  && i.op[src1].regs == i.op[dest].regs)
>>>>> +	readonly_var = src2;
>>>>
>>>> As can be seen in the testcase, this also results in ADCX/ADOX to be
>>>> converted to non-ND EVEX forms, i.e. even when that's not a win at all.
>>>> We shouldn't change what the user has written when the encoding
>>>> doesn't actually improve. (Or else, but I'd be hesitant to accept
>>>> that, at the very least the effect would need pointing out in the
>>>> description or even a code comment, so that later on it is possible
>>>> to figure out whether that was intentional or an
>>>> oversight.)
>>>>
>>>> This is where my template ordering remark in reply to patch 5 comes into play:
>>>> Whether invoking re-parse is okay would further need to depend on
>>>> whether an alternative (earlier) template actually allows
>>>> REX2 encoding (same base-opcode could be one of the criteria for how
>>>> far to look back through earlier templates; an option might also be
>>>> to put the 3- operand templates first, so that looking backwards
>>>> wouldn't be necessary in the first place). This would then likely
>>>> also address one of the forward looking concerns I've raised above.
>>>>
>>>
>>> Indeed, adcx's legacy insn can't support rex2.
>>>
>>> For my problem, I prefer to reorder the templates, because I haven't yet
>>> thought of a simple way to move t to the farthest template with the same
>>> base_opcode. The following is a tentative scenario: the order will be ndd evex
>>> - rex2 - evex.
>>
>> Yes, this matches my understanding / expectation.
>>
>>> And I will need a temporary variable for the case where the insn doesn't match
>>> the rex2 template, so that I can backtrack the match's result and the value of i.
>>
>> This, however, I'm not convinced of. I'd rather see this vaguely in line with
>> 58bceb182740 ("x86: prefer VEX encodings over EVEX ones when
>> possible"): Do another full matching round with the removed operand, arranging
>> for "internal error" to be raised in case that fails. Your approach would, I think,
>> result in silent bad code generation in case something went wrong. Thing is - you
>> don't even need to advance (or
>> backtrack) t in that case
>>
> 
> I tried to reorder the templates and modify the code as follows:
> 
> @ -7728,6 +7765,40 @@ match_template (char mnem_suffix)
>           i.memshift = memshift;
>         }
> 
> +      /* If we can optimize a NDD insn to non-NDD insn, like
> +        add %r16, %r8, %r8 -> add %r16, %r8,
> +        add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> +        Note that the semantics have not been changed.  */
> +      if (optimize
> +         && !i.no_optimize
> +         && i.vec_encoding != vex_encoding_evex
> +         && t + 1 < current_templates->end
> +         && !t[1].opcode_modifier.evex)
> +       {
> +         unsigned int readonly_var = convert_NDD_to_REX2 (t);
> +         if (readonly_var != ~0)
> +           {
> +             if (!check_EgprOperands (t + 1))
> +               {
> +                 specific_error = progress (internal_error);
> +                 continue;
> +               }
> +             ++i.operands;
> +             ++i.reg_operands;

DYM decrement rather than increment for these? We're trying to go from
3 to 2 operands, after all.

> +             ++i.tm.operands;

Why is this? Aren't we ahead of filling i.tm here?

> +
> +             if (readonly_var == 1)
> +               swap_2_operands (0, 1);
> +           }
> +       }
> 
> convert_NDD_to_REX2 returns readonly_var now. check_EgprOperands aims to exclude insns like adcx and adox, because their opcode space is legacy map 2, which can't support rex2.

Good. Looking forward to seeing the full change.
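
For reference, the operand bookkeeping under discussion (going from 3 operands to 2 when the NDD destination duplicates a source) can be sketched in isolation. This is a toy model, not the gas code: struct toy_insn and ndd_to_two_operand are hypothetical names, registers are plain integers in AT&T operand order, and the eGPR / opcode-space check that check_EgprOperands is meant to perform is deliberately left out.

```c
#include <stdbool.h>

/* Toy stand-in for the parsed instruction; not the gas "i" structure.  */
struct toy_insn
{
  unsigned int operands;  /* operand count, AT&T order */
  int reg[3];             /* register numbers; reg[2] is the NDD destination */
  bool commutative;       /* whether the two sources may be swapped */
};

/* Try to turn "op src1, src2, dest" into "op src1, src2" when dest
   duplicates one of the sources.  Returns true if the insn was changed.  */
static bool
ndd_to_two_operand (struct toy_insn *in)
{
  if (in->operands != 3)
    return false;

  int dest = in->reg[2];

  if (in->reg[1] == dest)
    {
      /* add %r16, %r8, %r8 -> add %r16, %r8: nothing to swap.  */
    }
  else if (in->commutative && in->reg[0] == dest)
    {
      /* add %r8, %r16, %r8 -> add %r16, %r8: swap the sources first.  */
      int tmp = in->reg[0];
      in->reg[0] = in->reg[1];
      in->reg[1] = tmp;
    }
  else
    return false;

  /* Going from 3 to 2 operands, so the count is decremented.  */
  --in->operands;
  return true;
}
```

Both examples from the patch comment end up as "add %r16, %r8" here, and a non-commutative insn whose first source equals the destination is left alone.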

> And I need some modifications in tc-i386.c after reorder i386-opc.tbl.
> 
> diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
> index 7a86aff1828..d98950c7dfd 100644
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -14401,7 +14401,9 @@ static bool check_register (const reg_entry *r)
> 
>    if (r->reg_flags & RegRex2)
>      {
> -      if (is_evex_encoding (current_templates->start))
> +      if (is_evex_encoding (current_templates->start)
> +         && ((current_templates->start + 1 >= current_templates->end)
> +             || (is_evex_encoding (current_templates->start + 1))))
>         i.vec_encoding = vex_encoding_evex;
> 
>        if (!cpu_arch_flags.bitfield.cpuapx_f
> 
> What's your opinion?

See my comments to Lili, already on the original code (which you further
modify) here. There cannot be a dependency on current_templates here,
imo. Lili - the fact Lin needs the modification above actually looks to
support my view on this.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 8/8] Support APX JMPABS
  2023-11-14  3:26     ` Hu, Lin1
@ 2023-11-14 11:15       ` Jan Beulich
  2023-11-24  5:40         ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-14 11:15 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

On 14.11.2023 04:26, Hu, Lin1 wrote:
>  > On 02.11.2023 12:29, Cui, Lili wrote:
>>> --- a/gas/config/tc-i386.c
>>> +++ b/gas/config/tc-i386.c
>>> @@ -7790,7 +7790,8 @@ match_template (char mnem_suffix)
>>>    if (!quiet_warnings)
>>>      {
>>>        if (!intel_syntax
>>> -	  && (i.jumpabsolute != (t->opcode_modifier.jump == JUMP_ABSOLUTE)))
>>> +	  && (i.jumpabsolute != (t->opcode_modifier.jump == JUMP_ABSOLUTE))
>>> +	  && t->mnem_off != MN_jmpabs)
>>>  	as_warn (_("indirect %s without `*'"), insn_name (t));
>>
>> Coming back to this, which I did comment on before already. The insn taking an
>> immediate operand doesn't really justify this, as it leaves open the underlying
>> question of why you use JumpAbsolute in the insn template in the first place. I've
>> gone through all the uses of JUMP_ABSOLUTE, and I didn't find any where the
>> respective handling would be applicable here. In fact it's unclear whether the
>> insn needs marking as any JUMP_* at all: It's not really different from, say, "mov
>> $imm, %rcx".
>>
> 
> According to the doc, JMPABS is a 64-bit-only ISA extension and acts as a near-direct branch with an absolute target. I added this attribute simply because I was mimicking the other jmp templates.

"absolute target" can have different meaning. There's nothing wrong with the
linker establishing the ultimate value; it may well not be an assembly-time
constant. In terms of ELF relocations it simply wouldn't be R_X86_64_PC64
(resembling R_X86_64_PC32 used for other direct branches), but R_X86_64_64.
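
To spell out the difference between the two relocation kinds: in the usual psABI notation (S = symbol value, A = addend, P = address of the field being relocated), R_X86_64_64 computes S + A, while R_X86_64_PC32 and R_X86_64_PC64 compute S + A - P. A minimal sketch of that arithmetic follows; the function names are made up for illustration and this is not linker code.

```c
#include <stdint.h>

/* R_X86_64_64: S + A.  The result does not depend on where the field lives,
   which is what an absolute 64-bit target needs.  */
static uint64_t
reloc_abs64 (uint64_t s, int64_t a)
{
  return s + (uint64_t) a;
}

/* R_X86_64_PC64: S + A - P, a 64-bit PC-relative value.  */
static uint64_t
reloc_pc64 (uint64_t s, int64_t a, uint64_t p)
{
  return s + (uint64_t) a - p;
}

/* R_X86_64_PC32: S + A - P, truncated to 32 bits, as used for the
   displacement of ordinary direct branches.  */
static uint32_t
reloc_pc32 (uint64_t s, int64_t a, uint64_t p)
{
  return (uint32_t) (s + (uint64_t) a - p);
}
```

The absolute form yields the same value wherever the instruction ends up, whereas the PC-relative forms change with P.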

> If we don't need the attribute, I'm glad to remove it.

Please do, unless you can explain why you add it.

>> There's a further question regarding its operand representation,
>> though: Can you explain why it's Imm64, not Disp64? The latter would, to me,
>> seem more natural to use here. Not only from a assembler internals perspective,
>> but also from the users' one: The $ in the operand carries absolutely no meaning
>> (see also the related testcase comment below) in AT&T syntax, and there's no
>> noticeable difference in Intel syntax afaict.
> 
> In my opinion, if the compiler wants to jump "anywhere" and the displacement cannot fit in a 32-bit immediate, the compiler will fall back to indirect branches.

Unless it knows that it may use this ISA extension.

> My current understanding is that jmpabs came about as a solution to that problem with indirect branches. It's not the same as jmp. Currently the operands of jmpabs are absolute addresses optimized by PLT or JIT. I think using imm64 avoids confusion with disp64. That's why the designer designed it this way.
> 
> A colleague in our group has written an introductory document that can be referred to. (https://kanrobert.github.io/rfc/All-about-APX-JMPABS/)

Well, "Compiler usages" there ignores other than small-code-model programs.
Furthermore "none currently" doesn't mean the assembler shouldn't support
all possible uses. If I was going from what's said there, there wouldn't be
a need to support JMPABS in gas at all.

>>> @@ -8939,6 +8940,9 @@ process_operands (void)
>>>  	}
>>>      }
>>>
>>> +  if (i.tm.mnem_off == MN_jmpabs)
>>> +    i.rex2_encoding = true;
>>
>> Please see my earlier remarks wrt "rex2" vs "{rex2}". What you do here is effect
>> the latter. Yet as indicated, the pseudo-prefix isn't really an indication of "must
>> have REX2 prefix", but only a weak request to do so if possible. I think you want
>> to set i.rex2 here instead, requiring a way to express that an empty REX2 prefix
>> is wanted.
>>
> 
> But in terms of encoding, i.rex2 should be 0. Can I do special handling in build_rex2_prefix? 

build_rex2_prefix() wants to remain generic. What I was trying to hint at though
is that it ought to be possible to set bits in i.rex2 (to make it non-zero) which
then aren't encoded into the REX2 payload byte (leveraging that only the low
three bits are actually contributing to the final encoding). The important point
is that both i.rex2 and i.rex2_encoding retain the specific meaning they are
intended to have.
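
A minimal sketch of that idea, under stated assumptions: the toy_* names and the TOY_REX2_EMPTY marker bit are hypothetical and not what gas uses, and the payload layout is taken to be M0 R4 X4 B4 W R3 X3 B3 from bit 7 down to bit 0. The point is only that a non-zero rex2 value can request the prefix while the marker bit never reaches the payload byte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as the classic REX bits (toy copies for this sketch).  */
#define TOY_REX_B 0x1
#define TOY_REX_X 0x2
#define TOY_REX_R 0x4
#define TOY_REX_W 0x8

/* Hypothetical marker living above the three bits of rex2 that get
   encoded: "emit a REX2 prefix even though no payload bit is set".  */
#define TOY_REX2_EMPTY 0x8

struct toy_state
{
  unsigned int rex;   /* W, R3, X3, B3 */
  unsigned int rex2;  /* R4, X4, B4 in the low three bits, plus the marker */
};

/* A REX2 prefix is needed whenever rex2 is non-zero.  */
static bool
toy_need_rex2 (const struct toy_state *s)
{
  return s->rex2 != 0;
}

/* Build the payload byte; only the low three bits of rex2 contribute,
   so the marker bit is dropped here without any special casing.  */
static uint8_t
toy_rex2_payload (const struct toy_state *s, bool m0)
{
  return (uint8_t) ((m0 ? 0x80 : 0) | ((s->rex2 & 0x7) << 4) | (s->rex & 0xf));
}
```

For JMPABS this would give a prefix of d5 00: the marker makes toy_need_rex2 true, yet the payload stays zero.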

>>> --- /dev/null
>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
>>> @@ -0,0 +1,17 @@
>>> +# Check bytecode of APX_F jmpabs instructions with illegal encode.
>>> +
>>> +	.text
>>> +# With 66 prefix
>>> +	.byte 0x66,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>> +	.byte 0x66,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>> +# With 67 prefix
>>> +	.byte 0x67,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>> +	.byte 0x67,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>> +# With F2 prefix
>>> +	.byte 0xf2,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>> +	.byte 0xf2,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>> +# With F3 prefix
>>> +	.byte 0xf3,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>> +	.byte 0xf3,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>> +# REX2.M0 = 0 REX2.W = 1
>>> +	.byte 0xd5,0x08,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>
>> As per earlier comments: This wants expressing via .insn, to yield input to gas
>> human-readable (even if, as it looks, two .insn are going to be required per
>> resulting construct). Further in the last comment, why is
>> REX2.M0 mentioned there, but not elsewhere? Also what purpose serve the
>> 0x64 bytes here? The encodings are invalid irrespective of them. Instead I'd kind
>> have expected LOCK to also be covered.
>>
> 
> Because this error line is only for the special case where M0 == 0 and base_opcode == 0xa1: there W should be 0 rather than 1. If M0 = 1, W = 1, and base_opcode == 0xa1, I think it could decode as mov rax, moffs (or some future insn). Elsewhere it's just excluding invalid prefixes.

Yet REX2.M == 0 is as relevant there (until such time as some of the
prefixes used there are assigned meaning).

> I don't see in the docs that it triggers #UD, am I missing something?

What's "it" here?

>> Also a spec question as we're talking of what is or is not valid (i.e.
>> causing #UD) here: Why would XCR0.APX=0 need to cause #UD? There's no use
>> of eGPR-s here.
>>
> 
> Sorry, what is XCR0.APX?

Bit 19 of the XCR0 register. It is mentioned in exactly this way in the
APX-LEGACY-JMPABS exception class description.

>>> --- /dev/null
>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
>>> @@ -0,0 +1,10 @@
>>> +# Check 64bit APX_F JMPABS instructions
>>> +
>>> +	.text
>>> + _start:
>>> +	jmpabs	      $0x0202020202020202
>>> +	jmpabs	      $0x2
>>> +
>>> +.intel_syntax noprefix
>>> +	jmpabs	      0x0202020202020202
>>> +	jmpabs	      0x2
>>
>> I expect this isn't going to be the normal use of the insn. Instead I would foresee
>> the typical users to be "jmpabs symbol" (and - as per above - intentionally
>> omitting the $ already). IOW the testcase also wants to cover the case requiring
>> a relocation, including a check that the correct relocation is emitted (covering
>> both ELF and COFF; I'm not going to insist on also covering Mach-O, as - for a
>> reason that escapes me - gas can't even be configured for x86_64-*-darwin*).
>>
> 
> Based on the previous discussion, we think that this usage is not currently supported. If users want to use a symbol, I think they can use "jmp symbol".

See my respective comments above, the summary of which was "Then why add it to
gas in the first place?"

>>> --- a/opcodes/i386-dis.c
>>> +++ b/opcodes/i386-dis.c
>>> @@ -106,6 +106,7 @@ static bool MOVSXD_Fixup (instr_info *, int, int);
>>>  static bool DistinctDest_Fixup (instr_info *, int, int);
>>>  static bool PREFETCHI_Fixup (instr_info *, int, int);
>>>  static bool PUSH2_POP2_Fixup (instr_info *, int, int);
>>> +static bool JMPABS_Fixup (instr_info *, int, int);
>>>
>>>  static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
>>>  						enum disassembler_style,
>>> @@ -258,6 +259,9 @@ struct instr_info
>>>    char scale_char;
>>>
>>>    enum x86_64_isa isa64;
>>> +
>>> +  /* Remember if the current op is jmpabs.  */
>>> +  bool is_jmpabs;
>>>  };
>>
>> This field would probably best live next to op_is_jump (and then also be named
>> op_is_jmpabs, assuming a separate boolean is indeed needed).
>> I further expect that op_is_jump also wants setting for JMPABS.
>>
> 
> Can I change op_is_jump's type from bool to unsigned int?

If you need it to hold a 3rd value, perhaps. Albeit more to an enum then
than to unsigned int.
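
As a rough illustration of the enum variant (names are hypothetical, not taken from i386-dis.c): keeping the "not a jump" value at zero means existing truth-value tests of the field would keep working.

```c
/* Hypothetical three-state replacement for a boolean op_is_jump field.  */
enum toy_jump_kind
{
  TOY_NOT_JUMP = 0,  /* zero, so "if (op_is_jump)" style tests still work */
  TOY_JUMP,          /* ordinary branch */
  TOY_JMPABS         /* JMPABS, should it need to be told apart */
};

/* True for any kind of jump, including JMPABS.  */
static int
toy_is_any_jump (enum toy_jump_kind kind)
{
  return kind != TOY_NOT_JUMP;
}
```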

>>> @@ -2032,7 +2036,7 @@ static const struct dis386 dis386[] = {
>>>    { "lahf",		{ XX }, 0 },
>>>    /* a0 */
>>>    { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
>>> -  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
>>> +  { "mov%LS",		{ { JMPABS_Fixup, eAX_reg }, { JMPABS_Fixup, v_mode } }, PREFIX_REX2_ILLEGAL },
>>>    { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
>>>    { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
>>>    { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
>>> @@ -9648,7 +9652,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>>>      }
>>>
>>>    if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
>>> -      && ins.last_rex2_prefix >= 0)
>>> +      && ins.last_rex2_prefix >= 0 && !ins.is_jmpabs)
>>>      {
>>>        i386_dis_printf (info, dis_style_text, "(bad)");
>>>        ret = ins.end_codep - priv.the_buffer;
>>> @@ -13857,3 +13861,38 @@ PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
>>>
>>>    return OP_VEX (ins, bytemode, sizeflag);
>>>  }
>>> +
>>> +static bool
>>> +JMPABS_Fixup (instr_info *ins, int bytemode, int sizeflag)
>>> +{
>>> +  if (ins->address_mode == mode_64bit
>>> +      && ins->last_rex2_prefix >= 0
>>> +      && (ins->rex2 & 0x80) == 0x0)
>>> +    {
>>> +      uint64_t op;
>>> +
>>> +      if (bytemode == eAX_reg)
>>> +	return true;
>>> +
>>> +      if (!get64 (ins, &op))
>>> +	return false;
>>> +
>>> +      if ((ins->prefixes & (PREFIX_OPCODE | PREFIX_ADDR)) != 0x0
>>> +	  || (ins->rex & REX_W) != 0x0)
>>> +	{
>>> +	  oappend (ins, "(bad)");
>>> +	  return true;
>>> +	}
>>> +
>>> +      ins->mnemonicendp = stpcpy (ins->obuf, "jmpabs");
>>> +      ins->all_prefixes[ins->last_rex2_prefix] = 0;
>>
>> This doesn't look right. REX2.{R,X,B}{3,4} set still want recording in the output. I
>> expect you may need to set a bit in rex2_used here, but how exactly that ought
>> to work depends on how comments on earlier patches are going to be
>> addressed. This may then also eliminate the need for ...
>>
>>> +      ins->is_jmpabs = true;
>>
>> ... this field, which likely will be covered by a more generic approach.
>>
>>
> 
> For this part of the discussion, as well as the modifications, I will wait until the earlier patch is finalized.

I suggested to Lili to go with the simplified approach for now, printing
just {rex2} but no details. Yet such minimal printing may still be needed
here then as well, consistent with what is (going to be) done in the
earlier patch. ("Consistent" as in: If nothing would be printed there,
printing nothing would be the goal here as well then.)

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 1/2] Reorder APX insns in i386.tbl
  2023-11-14  2:58         ` [PATCH 1/2] Reorder APX insns in i386.tbl Hu, Lin1
@ 2023-11-14 11:20           ` Jan Beulich
  2023-11-15  1:49             ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-14 11:20 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: hongjiu.lu, binutils

On 14.11.2023 03:58, Hu, Lin1 wrote:
> ---
>  gas/config/tc-i386.c |     4 +-
>  opcodes/i386-opc.tbl |   156 +-
>  5 files changed, 13189 insertions(+), 10771 deletions(-)

What was the goal of sending this patch to the list, without any further
comments or explanations? It quite clearly doesn't apply to the present
code base.

The diffstat is pretty odd, too: There clearly aren't as many files/lines
changed.

> @@ -2124,12 +2126,12 @@ xcryptofb, 0xf30fa7e8, PadLock, NoSuf|RepPrefixOk, {}
>  xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
>  
>  // Multy-precision Add Carry, rdseed instructions.
> +adcx, 0x6666, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>  adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>  adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> +adox, 0xf366, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>  adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>  adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }

What is this part about? I thought we agreed that ADCX/ADOX aren't suitable
for NDD->REX2 optimization, at which point the ordering of templates here
could as well be left alone.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 1/2] Reorder APX insns in i386.tbl
  2023-11-14 11:20           ` Jan Beulich
@ 2023-11-15  1:49             ` Hu, Lin1
  2023-11-15  8:52               ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-15  1:49 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, November 14, 2023 7:21 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH 1/2] Reorder APX insns in i386.tbl
> 
> On 14.11.2023 03:58, Hu, Lin1 wrote:
> > ---
> >  gas/config/tc-i386.c |     4 +-
> >  opcodes/i386-opc.tbl |   156 +-
> >  5 files changed, 13189 insertions(+), 10771 deletions(-)
> 
> What was the goal of sending this patch to the list, without any further
> comments or explanations? It quite clearly doesn't apply to the present code
> base.
> 
> The diffstat is pretty odd, too: There clearly aren't as many files/lines changed.

This patch is only meant to show you the changes I would need to make if I reordered the .tbl file. The diffstat looks odd because I removed the changes to mnem.h, init.h, etc. from the email.

> 
> > @@ -2124,12 +2126,12 @@ xcryptofb, 0xf30fa7e8, PadLock, NoSuf|RepPrefixOk, {}
> >  xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
> >
> >  // Multy-precision Add Carry, rdseed instructions.
> > +adcx, 0x6666, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> >  adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> >  adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> > -adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > +adox, 0xf366, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> >  adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> >  adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> > -adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> 
> What is this part about? I thought we agreed that ADCX/ADOX aren't suitable for
> NDD->REX2 optimization, at which point the ordering of templates here could as
> well be left alone.
>
      62 54 bd 18 66 c7       adcx   %r15,%r8,%r8
      66 4d 0f 38 f6 c7       adcx   %r15,%r8

Currently the code can optimize adcx/adox from NDD to legacy. Do you mean that we shouldn't apply the optimization if the code length remains the same?

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 7/8] Support APX NDD optimized encoding.
  2023-11-14 10:50           ` Jan Beulich
@ 2023-11-15  2:52             ` Hu, Lin1
  2023-11-15  8:57               ` Jan Beulich
  2023-11-15  2:59             ` [PATCH][v3] " Hu, Lin1
  1 sibling, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-15  2:52 UTC (permalink / raw)
  To: Beulich, Jan, Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, November 14, 2023 6:51 PM
> To: Hu, Lin1 <lin1.hu@intel.com>; Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 7/8] Support APX NDD optimized encoding.
> 
> On 14.11.2023 03:28, Hu, Lin1 wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >>
> >> On 10.11.2023 06:43, Hu, Lin1 wrote:
> >>>> On 02.11.2023 12:29, Cui, Lili wrote:
> >>>>> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> >>>>> +
> >>>>> +      if (i.types[src1].bitfield.class == Reg
> >>>>> +	  && i.op[src1].regs == i.op[dest].regs)
> >>>>> +	readonly_var = src2;
> >>>>
> >>>> As can be seen in the testcase, this also results in ADCX/ADOX to
> >>>> be converted to non-ND EVEX forms, i.e. even when that's not a win at all.
> >>>> We shouldn't change what the user has written when the encoding
> >>>> doesn't actually improve. (Or else, but I'd be hesitant to accept
> >>>> that, at the very least the effect would need pointing out in the
> >>>> description or even a code comment, so that later on it is possible
> >>>> to figure out whether that was intentional or an
> >>>> oversight.)
> >>>>
> >>>> This is where my template ordering remark in reply to patch 5 comes
> >>>> into play:
> >>>> Whether invoking re-parse is okay would further need to depend on
> >>>> whether an alternative (earlier) template actually allows
> >>>> REX2 encoding (same base-opcode could be one of the criteria for
> >>>> how far to look back through earlier templates; an option might
> >>>> also be to put the 3-operand templates first, so that looking
> >>>> backwards wouldn't be necessary in the first place). This would
> >>>> then likely also address one of the forward looking concerns I've
> >>>> raised above.
> >>>>
> >>>
> >>> Indeed, adcx's legacy insn can't support rex2.
> >>>
> >>> For my problem, I prefer to re-order templates order, because, I
> >>> hadn't thought of a way to simply move t to the farthest same
> >>> base_opcode template for the moment. The following is a tentative
> >>> scenario: the order will be ndd evex - rex2 - evex.
> >>
> >> Yes, this matches my understanding / expectation.
> >>
> >>> And I will need a tmp_variable to avoid the insn doesn't match the
> >>> rex2, let me backtrack the match's result and the value of i.
> >>
> >> This, however, I'm not convinced of. I'd rather see this vaguely in
> >> line with
> >> 58bceb182740 ("x86: prefer VEX encodings over EVEX ones when
> >> possible"): Do another full matching round with the removed operand,
> >> arranging for "internal error" to be raised in case that fails. Your
> >> approach would, I think, result in silent bad code generation in case
> >> something went wrong. Thing is - you don't even need to advance (or
> >> backtrack) t in that case
> >>
> >
> > I tried to reorder the templates and modify the code as follows:
> >
> > @@ -7728,6 +7765,40 @@ match_template (char mnem_suffix)
> >           i.memshift = memshift;
> >         }
> >
> > +      /* If we can optimize a NDD insn to non-NDD insn, like
> > +        add %r16, %r8, %r8 -> add %r16, %r8,
> > +        add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> > +        Note that the semantics have not been changed.  */
> > +      if (optimize
> > +         && !i.no_optimize
> > +         && i.vec_encoding != vex_encoding_evex
> > +         && t + 1 < current_templates->end
> > +         && !t[1].opcode_modifier.evex)
> > +       {
> > +         unsigned int readonly_var = convert_NDD_to_REX2 (t);
> > +         if (readonly_var != ~0)
> > +           {
> > +             if (!check_EgprOperands (t + 1))
> > +               {
> > +                 specific_error = progress (internal_error);
> > +                 continue;
> > +               }
> > +             ++i.operands;
> > +             ++i.reg_operands;
> 
> DYM decrement rather than increment for these? We're trying to go from
> 3 to 2 operands, after all.
>

This is a backtrack, to account for instructions in other opcode spaces (0f38, ...) that can't accept the r16+ registers but can accept the other REX registers or the legacy ones. I decrement i.operands and i.reg_operands in convert_NDD_to_REX2. If the legacy or REX version of the insn can't support the REX2 registers, I won't optimize it, so I need to increment these again.
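
To make the decrement-then-increment sequence concrete, here is a minimal standalone sketch. It is not the gas code: `insn_state`, `try_drop_ndd_dest` and the `next_template_allows_egpr` flag are hypothetical stand-ins for `i`, convert_NDD_to_REX2 and the result of check_EgprOperands. It restores a saved copy rather than incrementing by hand, which is only one possible way to express the undo step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of the assembler's per-insn state.  */
struct insn_state
{
  unsigned int operands;      /* total operand count */
  unsigned int reg_operands;  /* register operand count */
  bool uses_egpr;             /* any of r16-r31 referenced */
};

/* Tentatively drop the NDD destination.  If the next (shorter) template
   cannot take the registers in use, undo the change and report failure.  */
static bool
try_drop_ndd_dest (struct insn_state *insn, bool next_template_allows_egpr)
{
  assert (insn->operands >= 2 && insn->reg_operands >= 2);

  const struct insn_state saved = *insn;

  --insn->operands;
  --insn->reg_operands;

  if (insn->uses_egpr && !next_template_allows_egpr)
    {
      /* Backtrack: same effect as re-incrementing both counters.  */
      *insn = saved;
      return false;
    }

  return true;
}
```

With this shape the caller never observes a half-updated operand count: either both counters are reduced, or the state is exactly what it was before the attempt.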

> 
> > +             ++i.tm.operands;
> 
> Why is this? Aren't we ahead of filling i.tm here?
>

Indeed. I removed it.

> 
> > +
> > +             if (readonly_var == 1)
> > +               swap_2_operands (0, 1);
> > +           }
> > +       }
> >
> > convert_NDD_to_REX2 returns readonly_var now. check_EgprOperands aims
> > to exclude some insns like adcx and adox, because their opcode_space,
> > legacy-map2, can't support rex2.
> 
> Good. Looking forward to seeing the full change.
> 

I'd like to add some details about insns like adcx and adox. check_EgprOperands is only used to exclude the cases where these insns use gpr32 registers. If we think about the optimization in terms of encoding length: is it safe to assume that insns with a 66, f2, or f3 prefix whose opcode_space isn't legacy-map0 or legacy-map1 won't get any shorter even if they are optimized? If yes, I think the code can be simplified to:

       /* If we can optimize a NDD insn to non-NDD insn, like
          add %r16, %r8, %r8 -> add %r16, %r8,
           add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
           Note that the semantics have not been changed.  */
        if (optimize
            && !i.no_optimize
            && i.vec_encoding != vex_encoding_evex
            && t + 1 < current_templates->end
            && !t[1].opcode_modifier.evex
            && convert_NDD_to_REX2 (t))
          {
            specific_error = progress (internal_error);
            continue;
          }  

For instructions that don't need to be optimized, like adcx and adox, we simply don't reorder their templates, so we don't need check_EgprOperands or the backtracking, and convert_NDD_to_REX2 has the same return value as before.
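
A small standalone sketch of how template order alone can gate the rematch. The `tmpl` struct, the two tables and `can_rematch` are hypothetical and only mirror the `t + 1 < current_templates->end && !t[1].opcode_modifier.evex` test above; the real templates carry far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced template: just what the ordering test looks at.  */
struct tmpl
{
  const char *name;
  unsigned int operands;
  bool evex;
};

/* Reordered group: NDD (EVEX) first, then legacy/REX2, then plain EVEX.  */
static const struct tmpl add_tmpls[] = {
  { "add", 3, true },
  { "add", 2, false },
  { "add", 2, true },
};

/* Group left in its original order: the NDD template stays last.  */
static const struct tmpl adcx_tmpls[] = {
  { "adcx", 2, false },
  { "adcx", 2, true },
  { "adcx", 3, true },
};

/* A rematch is only attempted when a non-EVEX template directly follows.  */
static bool
can_rematch (const struct tmpl *t, const struct tmpl *end)
{
  assert (t < end);
  return t + 1 < end && !t[1].evex;
}
```

Here `can_rematch (&add_tmpls[0], add_tmpls + 3)` holds, while the NDD entry of `adcx_tmpls` has no following template at all, so it is never considered.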

PS. Shouldn't the function then be named convert_NDD_to_legacy?

>
> > And I need some modifications in tc-i386.c after reorder i386-opc.tbl.
> >
> > diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c index
> > 7a86aff1828..d98950c7dfd 100644
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -14401,7 +14401,9 @@ static bool check_register (const reg_entry
> > *r)
> >
> >    if (r->reg_flags & RegRex2)
> >      {
> > -      if (is_evex_encoding (current_templates->start))
> > +      if (is_evex_encoding (current_templates->start)
> > +         && ((current_templates->start + 1 >= current_templates->end)
> > +             || (is_evex_encoding (current_templates->start + 1))))
> >         i.vec_encoding = vex_encoding_evex;
> >
> >        if (!cpu_arch_flags.bitfield.cpuapx_f
> >
> > What's your opinion?
> 
> See my comments to Lili on already the original code (which you further
> modify) here. There cannot be a dependency on current_templates here, imo.
> Lili - the fact Lin needs the modification above actually looks to support my view
> on this.

This part of the code won't be in the NDD optimization patch; it belongs in the NDD patch. So I'd like to skip this part of the discussion until I get a new base branch.

Although I still have questions about the previous discussion, I will send out the current patch now.

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* [PATCH][v3] Support APX NDD optimized encoding.
  2023-11-14 10:50           ` Jan Beulich
  2023-11-15  2:52             ` Hu, Lin1
@ 2023-11-15  2:59             ` Hu, Lin1
  2023-11-15  9:34               ` Jan Beulich
  1 sibling, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-15  2:59 UTC (permalink / raw)
  To: binutils; +Cc: JBeulich, hongjiu.lu

Hi,

This patch is one possible version of what we discussed earlier. It was
developed on top of my reordering of i386.tbl.

It only optimizes insns in opcode_space legacy-map0 or legacy-map1.
For adcx/adox, I left the templates unreordered, as you expected, so I can use
the condition !t[1].opcode_modifier.evex to exclude them.

The other version in the previous email was meant to also optimize adcx and
adox to legacy.

BRs,
Lin


This patch aims to optimize:

add %r16, %r15, %r15 -> add %r16, %r15
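
As a rough, standalone illustration of the operand test behind this transformation (not the patch code itself: the `operand` struct and the `same_reg` and `fold_ndd_operands` names are made up for this sketch, and it only handles the 2- and 3-operand cases):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified operand: only what the folding test needs.  */
struct operand
{
  bool is_reg;  /* register, as opposed to memory or immediate */
  int regno;    /* register number, meaningful only if is_reg */
};

static bool
same_reg (const struct operand *a, const struct operand *b)
{
  return a->is_reg && b->is_reg && a->regno == b->regno;
}

/* ops[] is in AT&T order: [src2,] src1, dest.  If dest duplicates src1,
   or duplicates src2 of a commutative insn, drop dest and return true.  */
static bool
fold_ndd_operands (struct operand *ops, size_t *n, bool commutative)
{
  assert (*n == 2 || *n == 3);

  size_t dest = *n - 1;
  size_t src1 = *n - 2;
  size_t src2 = 0;

  if (!same_reg (&ops[src1], &ops[dest]))
    {
      if (*n != 3 || !commutative || !same_reg (&ops[src2], &ops[dest]))
        return false;

      /* Swap the sources so the surviving last operand equals dest.  */
      struct operand tmp = ops[src1];
      ops[src1] = ops[src2];
      ops[src2] = tmp;
    }

  --*n;
  return true;
}
```

For example, both `add %r16, %r8, %r8` and (since add is commutative) `add %r8, %r16, %r8` end up as the two operands `%r16, %r8`, whereas a non-commutative `sub %r15, (%r8), %r15` is left alone.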

gas/ChangeLog:

	* config/tc-i386.c (convert_NDD_to_REX2): New function.
	(match_template): If an APX NDD insn can be optimized, rematch
	the template.
	* testsuite/gas/i386/x86-64.exp: Add test.
	* testsuite/gas/i386/x86-64-apx-ndd-optimize.d: New test.
	* testsuite/gas/i386/x86-64-apx-ndd-optimize.s: Ditto.

opcodes/ChangeLog:

	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-tbl.h: Ditto.
	* i386-opc.tbl: Add C to some instructions to support the
	optimization.
---
 gas/config/tc-i386.c                          |  52 ++++++++
 .../gas/i386/x86-64-apx-ndd-optimize.d        | 126 ++++++++++++++++++
 .../gas/i386/x86-64-apx-ndd-optimize.s        | 119 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |   1 +
 4 files changed, 298 insertions(+)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index d98950c7dfd..6d6fc65383e 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -7208,6 +7208,43 @@ check_EgprOperands (const insn_template *t)
   return 0;
 }
 
+/* Optimize APX NDD insns to legacy insns.  */
+static bool
+convert_NDD_to_REX2 (const insn_template *t)
+{
+  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
+      && t->opcode_space == SPACE_EVEXMAP4
+      && !i.has_nf
+      && i.reg_operands >= 2)
+    {
+      unsigned int readonly_var = ~0;
+      unsigned int dest = i.operands - 1;
+      unsigned int src1 = i.operands - 2;
+      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+      if (i.types[src1].bitfield.class == Reg
+	  && i.op[src1].regs == i.op[dest].regs)
+	readonly_var = src2;
+      /* adcx, adox and imul can't support to swap the source operands.  */
+      else if (i.types[src2].bitfield.class == Reg
+	       && i.op[src2].regs == i.op[dest].regs
+	       && optimize > 1
+	       && t->opcode_modifier.commutative)
+	readonly_var = src1;
+      if (readonly_var != (unsigned int) ~0)
+	{
+	  if (readonly_var != src2)
+	    swap_2_operands (readonly_var, src2);
+
+	  --i.operands;
+	  --i.reg_operands;
+
+	  return true;
+	}
+    }
+  return false;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7728,6 +7765,21 @@ match_template (char mnem_suffix)
 	  i.memshift = memshift;
 	}
 
+      /* If we can optimize a NDD insn to non-NDD insn, like
+	 add %r16, %r8, %r8 -> add %r16, %r8,
+	 add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.  
+	 Note that the semantics have not been changed.  */
+      if (optimize
+	  && !i.no_optimize
+	  && i.vec_encoding != vex_encoding_evex
+	  && t + 1 < current_templates->end
+	  && !t[1].opcode_modifier.evex
+	  && convert_NDD_to_REX2 (t))
+	{
+	  specific_error = progress (internal_error);
+	  continue;
+	}
+
       /* We've found a match; break out of loop.  */
       break;
     }
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
new file mode 100644
index 00000000000..d13daed38b5
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
@@ -0,0 +1,126 @@
+#as: -Os
+#objdump: -drw
+#name: x86-64 APX NDD optimized encoding
+#source: x86-64-apx-ndd-optimize.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 19 ff c7          	inc    %r31
+\s*[a-f0-9]+:\s*d5 11 fe c7          	inc    %r31b
+\s*[a-f0-9]+:\s*d5 4d 01 f8          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 45 00 f8          	add    %r31b,%r8b
+\s*[a-f0-9]+:\s*d5 4d 01 f8          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 1d 03 c7          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 4d 03 38          	add    \(%r8\),%r31
+\s*[a-f0-9]+:\s*d5 1d 03 07          	add    \(%r31\),%r8
+\s*[a-f0-9]+:\s*49 81 c7 33 44 34 12 	add    \$0x12344433,%r15
+\s*[a-f0-9]+:\s*49 81 c0 11 22 33 f4 	add    \$0xfffffffff4332211,%r8
+\s*[a-f0-9]+:\s*d5 18 ff c9          	dec    %r17
+\s*[a-f0-9]+:\s*d5 10 fe c9          	dec    %r17b
+\s*[a-f0-9]+:\s*d5 18 f7 d1          	not    %r17
+\s*[a-f0-9]+:\s*d5 10 f6 d1          	not    %r17b
+\s*[a-f0-9]+:\s*d5 18 f7 d9          	neg    %r17
+\s*[a-f0-9]+:\s*d5 10 f6 d9          	neg    %r17b
+\s*[a-f0-9]+:\s*d5 1c 29 f9          	sub    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 28 f9          	sub    %r15b,%r17b
+\s*[a-f0-9]+:\s*62 54 84 18 29 38    	sub    %r15,\(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 2b 04 07       	sub    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 ee 34 12 00 00 	sub    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 19 f9          	sbb    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 18 f9          	sbb    %r15b,%r17b
+\s*[a-f0-9]+:\s*62 54 84 18 19 38    	sbb    %r15,\(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 1b 04 07       	sbb    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 de 34 12 00 00 	sbb    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 11 f9          	adc    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 10 f9          	adc    %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 13 38             	adc    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 13 04 07       	adc    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 d6 34 12 00 00 	adc    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 09 f9          	or     %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 08 f9          	or     %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 0b 38             	or     \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 0b 04 07       	or     \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 ce 34 12 00 00 	or     \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 31 f9          	xor    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 30 f9          	xor    %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 33 38             	xor    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 33 04 07       	xor    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 f6 34 12 00 00 	xor    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 21 f9          	and    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 20 f9          	and    %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 23 38             	and    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 23 04 07       	and    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 11 81 e6 34 12 00 00 	and    \$0x1234,%r30d
+\s*[a-f0-9]+:\s*d5 19 d1 cf          	ror    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 cf          	ror    %r31b
+\s*[a-f0-9]+:\s*49 c1 cc 02          	ror    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 cc 02          	ror    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 c7          	rol    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 c7          	rol    %r31b
+\s*[a-f0-9]+:\s*49 c1 c4 02          	rol    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 c4 02          	rol    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 df          	rcr    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 df          	rcr    %r31b
+\s*[a-f0-9]+:\s*49 c1 dc 02          	rcr    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 dc 02          	rcr    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 d7          	rcl    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 d7          	rcl    %r31b
+\s*[a-f0-9]+:\s*49 c1 d4 02          	rcl    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 d4 02          	rcl    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 e7          	shl    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 e7          	shl    %r31b
+\s*[a-f0-9]+:\s*49 c1 e4 02          	shl    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 e4 02          	shl    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 ff          	sar    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 ff          	sar    %r31b
+\s*[a-f0-9]+:\s*49 c1 fc 02          	sar    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 fc 02          	sar    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 e7          	shl    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 e7          	shl    %r31b
+\s*[a-f0-9]+:\s*49 c1 e4 02          	shl    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 e4 02          	shl    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 ef          	shr    %r31
+\s*[a-f0-9]+:\s*d5 11 d0 ef          	shr    %r31b
+\s*[a-f0-9]+:\s*49 c1 ec 02          	shr    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 ec 02          	shr    \$0x2,%r12b
+\s*[a-f0-9]+:\s*62 74 9c 18 24 20 01 	shld   \$0x1,%r12,\(%rax\),%r12
+\s*[a-f0-9]+:\s*4d 0f a4 c4 02       	shld   \$0x2,%r8,%r12
+\s*[a-f0-9]+:\s*62 54 bc 18 24 c4 02 	shld   \$0x2,%r8,%r12,%r8
+\s*[a-f0-9]+:\s*62 74 b4 18 a5 08    	shld   %cl,%r9,\(%rax\),%r9
+\s*[a-f0-9]+:\s*d5 9c a5 e0          	shld   %cl,%r12,%r16
+\s*[a-f0-9]+:\s*62 7c 9c 18 a5 e0    	shld   %cl,%r12,%r16,%r12
+\s*[a-f0-9]+:\s*62 74 9c 18 2c 20 01 	shrd   \$0x1,%r12,\(%rax\),%r12
+\s*[a-f0-9]+:\s*4d 0f ac ec 01       	shrd   \$0x1,%r13,%r12
+\s*[a-f0-9]+:\s*62 54 94 18 2c ec 01 	shrd   \$0x1,%r13,%r12,%r13
+\s*[a-f0-9]+:\s*62 74 b4 18 ad 08    	shrd   %cl,%r9,\(%rax\),%r9
+\s*[a-f0-9]+:\s*d5 9c ad e0          	shrd   %cl,%r12,%r16
+\s*[a-f0-9]+:\s*62 7c 9c 18 ad e0    	shrd   %cl,%r12,%r16,%r12
+\s*[a-f0-9]+:\s*62 54 bd 18 66 c7    	adcx   %r15,%r8,%r8
+\s*[a-f0-9]+:\s*62 14 b9 18 66 04 3f 	adcx   \(%r15,%r31,1\),%r8,%r8
+\s*[a-f0-9]+:\s*62 54 bd 18 66 c8    	adcx   %r8,%r9,%r8
+\s*[a-f0-9]+:\s*62 54 be 18 66 c7    	adox   %r15,%r8,%r8
+\s*[a-f0-9]+:\s*62 14 ba 18 66 04 3f 	adox   \(%r15,%r31,1\),%r8,%r8
+\s*[a-f0-9]+:\s*62 54 be 18 66 c8    	adox   %r8,%r9,%r8
+\s*[a-f0-9]+:\s*67 0f 40 90 90 90 90 90 	cmovo  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 41 90 90 90 90 90 	cmovno -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 42 90 90 90 90 90 	cmovb  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 43 90 90 90 90 90 	cmovae -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 44 90 90 90 90 90 	cmove  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 45 90 90 90 90 90 	cmovne -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 46 90 90 90 90 90 	cmovbe -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 47 90 90 90 90 90 	cmova  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 48 90 90 90 90 90 	cmovs  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 49 90 90 90 90 90 	cmovns -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4c 90 90 90 90 90 	cmovl  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4d 90 90 90 90 90 	cmovge -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4e 90 90 90 90 90 	cmovle -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4f 90 90 90 90 90 	cmovg  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f af 90 09 09 09 00 	imul   0x90909\(%eax\),%edx
+\s*[a-f0-9]+:\s*d5 aa af 94 f8 09 09 00 00 	imul   0x909\(%rax,%r31,8\),%rdx
+\s*[a-f0-9]+:\s*48 0f af d0          	imul   %rax,%rdx
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
new file mode 100644
index 00000000000..80c39059143
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
@@ -0,0 +1,119 @@
+# Check 64bit APX NDD instructions with optimized encoding
+
+	.text
+_start:
+inc    %r31,%r31
+incb   %r31b,%r31b
+add    %r31,%r8,%r8
+addb   %r31b,%r8b,%r8b
+{store} add    %r31,%r8,%r8
+{load}  add    %r31,%r8,%r8
+add    %r31,(%r8),%r31
+add    (%r31),%r8,%r8
+add    $0x12344433,%r15,%r15
+add    $0xfffffffff4332211,%r8,%r8
+dec    %r17,%r17
+decb   %r17b,%r17b
+not    %r17,%r17
+notb   %r17b,%r17b
+neg    %r17,%r17
+negb   %r17b,%r17b
+sub    %r15,%r17,%r17
+subb   %r15b,%r17b,%r17b
+sub    %r15,(%r8),%r15
+sub    (%r15,%rax,1),%r16,%r16
+sub    $0x1234,%r30,%r30
+sbb    %r15,%r17,%r17
+sbbb   %r15b,%r17b,%r17b
+sbb    %r15,(%r8),%r15
+sbb    (%r15,%rax,1),%r16,%r16
+sbb    $0x1234,%r30,%r30
+adc    %r15,%r17,%r17
+adcb   %r15b,%r17b,%r17b
+adc    %r15,(%r8),%r15
+adc    (%r15,%rax,1),%r16,%r16
+adc    $0x1234,%r30,%r30
+or     %r15,%r17,%r17
+orb    %r15b,%r17b,%r17b
+or     %r15,(%r8),%r15
+or     (%r15,%rax,1),%r16,%r16
+or     $0x1234,%r30,%r30
+xor    %r15,%r17,%r17
+xorb   %r15b,%r17b,%r17b
+xor    %r15,(%r8),%r15
+xor    (%r15,%rax,1),%r16,%r16
+xor    $0x1234,%r30,%r30
+and    %r15,%r17,%r17
+andb   %r15b,%r17b,%r17b
+and    %r15,(%r8),%r15
+and    (%r15,%rax,1),%r16,%r16
+and    $0x1234,%r30,%r30
+ror    %r31,%r31
+rorb   %r31b,%r31b
+ror    $0x2,%r12,%r12
+rorb   $0x2,%r12b,%r12b
+rol    %r31,%r31
+rolb   %r31b,%r31b
+rol    $0x2,%r12,%r12
+rolb   $0x2,%r12b,%r12b
+rcr    %r31,%r31
+rcrb   %r31b,%r31b
+rcr    $0x2,%r12,%r12
+rcrb   $0x2,%r12b,%r12b
+rcl    %r31,%r31
+rclb   %r31b,%r31b
+rcl    $0x2,%r12,%r12
+rclb   $0x2,%r12b,%r12b
+sal    %r31,%r31
+salb   %r31b,%r31b
+sal    $0x2,%r12,%r12
+salb   $0x2,%r12b,%r12b
+sar    %r31,%r31
+sarb   %r31b,%r31b
+sar    $0x2,%r12,%r12
+sarb   $0x2,%r12b,%r12b
+shl    %r31,%r31
+shlb   %r31b,%r31b
+shl    $0x2,%r12,%r12
+shlb   $0x2,%r12b,%r12b
+shr    %r31,%r31
+shrb   %r31b,%r31b
+shr    $0x2,%r12,%r12
+shrb   $0x2,%r12b,%r12b
+shld   $0x1,%r12,(%rax),%r12
+shld   $0x2,%r8,%r12,%r12
+shld   $0x2,%r8,%r12,%r8
+shld   %cl,%r9,(%rax),%r9
+shld   %cl,%r12,%r16,%r16
+shld   %cl,%r12,%r16,%r12
+shrd   $0x1,%r12,(%rax),%r12
+shrd   $0x1,%r13,%r12,%r12
+shrd   $0x1,%r13,%r12,%r13
+shrd   %cl,%r9,(%rax),%r9
+shrd   %cl,%r12,%r16,%r16
+shrd   %cl,%r12,%r16,%r12
+adcx   %r15,%r8,%r8
+adcx   (%r15,%r31,1),%r8,%r8
+adcx   %r8,%r9,%r8
+adox   %r15,%r8,%r8
+adox   (%r15,%r31,1),%r8,%r8
+adox   %r8,%r9,%r8
+cmovo  0x90909090(%eax),%edx,%edx
+cmovno 0x90909090(%eax),%edx,%edx
+cmovb  0x90909090(%eax),%edx,%edx
+cmovae 0x90909090(%eax),%edx,%edx
+cmove  0x90909090(%eax),%edx,%edx
+cmovne 0x90909090(%eax),%edx,%edx
+cmovbe 0x90909090(%eax),%edx,%edx
+cmova  0x90909090(%eax),%edx,%edx
+cmovs  0x90909090(%eax),%edx,%edx
+cmovns 0x90909090(%eax),%edx,%edx
+cmovp  0x90909090(%eax),%edx,%edx
+cmovnp 0x90909090(%eax),%edx,%edx
+cmovl  0x90909090(%eax),%edx,%edx
+cmovge 0x90909090(%eax),%edx,%edx
+cmovle 0x90909090(%eax),%edx,%edx
+cmovg  0x90909090(%eax),%edx,%edx
+imul   0x90909(%eax),%edx,%edx
+imul   0x909(%rax,%r31,8),%rdx,%rdx
+imul   %rdx,%rax,%rdx
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 668b366a212..eab99f9e52b 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -552,6 +552,7 @@ run_dump_test "x86-64-optimize-6"
 run_list_test "x86-64-optimize-7a" "-I${srcdir}/$subdir -march=+noavx -al"
 run_dump_test "x86-64-optimize-7b"
 run_list_test "x86-64-optimize-8" "-I${srcdir}/$subdir -march=+noavx2 -al"
+run_dump_test "x86-64-apx-ndd-optimize"
 run_dump_test "x86-64-align-branch-1a"
 run_dump_test "x86-64-align-branch-1b"
 run_dump_test "x86-64-align-branch-1c"
-- 
2.31.1


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 1/2] Reorder APX insns in i386.tbl
  2023-11-15  1:49             ` Hu, Lin1
@ 2023-11-15  8:52               ` Jan Beulich
  2023-11-17  3:27                 ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-15  8:52 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, binutils

On 15.11.2023 02:49, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Tuesday, November 14, 2023 7:21 PM
>> To: Hu, Lin1 <lin1.hu@intel.com>
>> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
>> Subject: Re: [PATCH 1/2] Reorder APX insns in i386.tbl
>>
>> On 14.11.2023 03:58, Hu, Lin1 wrote:
>>> ---
>>>  gas/config/tc-i386.c |     4 +-
>>>  opcodes/i386-opc.tbl |   156 +-
>>>  5 files changed, 13189 insertions(+), 10771 deletions(-)
>>
>> What was the goal of sending this patch to the list, without any further
>> comments or explanations? It quite clearly doesn't apply to the present code
>> base.
>>
>> The diffstat is pretty odd, too: There clearly aren't as many files/lines changed.
> 
> This patch is only meant to show you the changes I would need to make if I reordered the .tbl file. The diffstat looks odd because I removed the changes to mnem.h, init.h, etc. from the email.

Okay, but this isn't going to be a separate patch. It ought to be merged
into the patch introducing the new templates, so there's not going to be
extra diff or churn.

>>> @@ -2124,12 +2126,12 @@ xcryptofb, 0xf30fa7e8, PadLock, NoSuf|RepPrefixOk, {}
>>>  xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
>>>
>>>  // Multy-precision Add Carry, rdseed instructions.
>>> +adcx, 0x6666, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>>  adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>>  adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>> -adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>> +adox, 0xf366, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>>  adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>>  adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>> -adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>
>> What is this part about? I thought we agreed that ADCX/ADOX aren't suitable for
>> NDD->REX2 optimization, at which point the ordering of templates here could as
>> well be left alone.
>>
>       62 54 bd 18 66 c7       adcx   %r15,%r8,%r8
>       66 4d 0f 38 f6 c7       adcx   %r15,%r8
>
> Currently the code can optimize adcx/adox from NDD to legacy. Do you mean that we shouldn't apply the optimization if the code length remains the same?

As said before, any optimization we do has to actually result in some sort
of win. If neither performance nor code size change, there's no point in
making any adjustments to what the user has written.

That said, though: Looks like there would still be a 1-byte win for 32-bit
ADCX/ADOX with just the low 8 registers used as operands, i.e. when no REX
prefix is needed in the encoding.
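
The byte counts behind this can be tallied with a small sketch. The helper names are hypothetical, and the model only covers the register-register forms shown earlier in the thread (4-byte EVEX prefix, opcode and ModRM versus mandatory prefix, optional REX, `0f 38 f6` and ModRM); memory operands with displacements are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* EVEX (NDD) register-register form: 4-byte EVEX prefix, opcode, ModRM.  */
static unsigned int
adx_evex_len (void)
{
  return 4 + 1 + 1;
}

/* Legacy form: 66/f3 prefix, optional REX, 0f 38 f6, ModRM.  */
static unsigned int
adx_legacy_len (bool need_rex)
{
  return 1 + (need_rex ? 1 : 0) + 3 + 1;
}

/* True if re-encoding "adcx/adox src, dst, dst" as legacy saves bytes.
   src and dst are register numbers 0-31 (assumed numbering).  */
static bool
adx_fold_is_win (unsigned int src, unsigned int dst, bool is_64bit)
{
  assert (src < 32 && dst < 32);

  /* r16-r31 would need REX2, which cannot reach the 0f38 opcode map.  */
  if (src >= 16 || dst >= 16)
    return false;

  /* REX is needed for 64-bit operand size or for r8-r15.  */
  bool need_rex = is_64bit || src >= 8 || dst >= 8;

  return adx_legacy_len (need_rex) < adx_evex_len ();
}
```

Under these assumptions only the 32-bit case with both registers among the low 8 comes out shorter (5 bytes instead of 6); every case that needs a REX prefix ties at 6 bytes.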

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 7/8] Support APX NDD optimized encoding.
  2023-11-15  2:52             ` Hu, Lin1
@ 2023-11-15  8:57               ` Jan Beulich
  0 siblings, 0 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-15  8:57 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

On 15.11.2023 03:52, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Tuesday, November 14, 2023 6:51 PM
>>
>> On 14.11.2023 03:28, Hu, Lin1 wrote:
>>> @@ -7728,6 +7765,40 @@ match_template (char mnem_suffix)
>>>           i.memshift = memshift;
>>>         }
>>>
>>> +      /* If we can optimize a NDD insn to non-NDD insn, like
>>> +        add %r16, %r8, %r8 -> add %r16, %r8,
>>> +        add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
>>> +        Note that the semantics have not been changed.  */
>>> +      if (optimize
>>> +         && !i.no_optimize
>>> +         && i.vec_encoding != vex_encoding_evex
>>> +         && t + 1 < current_templates->end
>>> +         && !t[1].opcode_modifier.evex)
>>> +       {
>>> +         unsigned int readonly_var = convert_NDD_to_REX2 (t);
>>> +         if (readonly_var != ~0)
>>> +           {
>>> +             if (!check_EgprOperands (t + 1))
>>> +               {
>>> +                 specific_error = progress (internal_error);
>>> +                 continue;
>>> +               }
>>> +             ++i.operands;
>>> +             ++i.reg_operands;
>>
>> DYM decrement rather than increment for these? We're trying to go from
>> 3 to 2 operands, after all.
>>
> 
> This is a backtrack, to account for instructions in other opcode spaces (0f38, ...) that can't accept the r16+ registers but can accept the other REX registers or the legacy ones. I decrement i.operands and i.reg_operands in convert_NDD_to_REX2. If the legacy or REX version of the insn can't support the REX2 registers, I won't optimize it, so I need to increment these again.

Okay, I need to see the full patch for this. Incrementing to undo earlier
decrementing still looks suspicious to me (for now).
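
To illustrate the concern about a decrement that is undone later, here is a minimal sketch of the alternative ordering: validate first, and modify the operand counts only once the conversion is known to succeed. The struct, the function, and the legacy_form_ok flag are hypothetical stand-ins for illustration, not the gas data structures or the submitted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the assembler's per-insn state.  */
struct insn_state
{
  unsigned int operands;
  unsigned int reg_operands;
};

/* Drop the NDD destination operand only if the 2-operand legacy form
   is usable.  LEGACY_FORM_OK is an assumed, precomputed answer to
   "can the legacy template take these registers?".  */
static bool
try_drop_ndd_operand (struct insn_state *s, bool legacy_form_ok)
{
  assert (s != NULL);

  /* Nothing has been modified yet, so failure needs no undo step.  */
  if (!legacy_form_ok || s->operands < 3 || s->reg_operands < 2)
    return false;

  /* Commit: go from 3 operands to 2.  */
  --s->operands;
  --s->reg_operands;
  return true;
}
```

With this ordering the caller never has to re-increment the counts on the failure path.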

>>> +
>>> +             if (readonly_var == 1)
>>> +               swap_2_operands (0, 1);
>>> +           }
>>> +       }
>>>
>>> convert_NDD_to_REX2 returns readonly_var now. check_EgprOperands aims
>>> to exclude some insns like adcx and adox, because their opcode_space is
>>> legacy-map2, which can't support REX2.
>>
>> Good. Looking forward to seeing the full change.
>>
> 
> For insns like adcx and adox, I'd like to add some details. check_EgprOperands is only used to exclude the cases where these insns are used with gpr32 registers. Thinking about the optimization in terms of encoding length: is it safe to assume that insns with a 66, f2, or f3 prefix whose opcode_space isn't legacy-map0 or legacy-map1 won't become shorter even if they are optimized?

Well, no, not always. See my other reply regarding 32-bit ADCX/ADOX.

> If yes, I think the code can be simplified like:
> 
>        /* If we can optimize a NDD insn to non-NDD insn, like
>           add %r16, %r8, %r8 -> add %r16, %r8,
>            add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
>            Note that the semantics have not been changed.  */
>         if (optimize
>             && !i.no_optimize
>             && i.vec_encoding != vex_encoding_evex
>             && t + 1 < current_templates->end
>             && !t[1].opcode_modifier.evex
>             && convert_NDD_to_REX2 (t))
>           {
>             specific_error = progress (internal_error);
>             continue;
>           }  
> 
> For those instructions that don't need to be optimized, like adcx and adox we just don't swap the order, so we don't need check_EgprOperands and backtrack, and convert_NDD_to_REX2 has the same return value as before.
> 
> PS. So shouldn't the name of the function be convert_NDD_to_legacy?

Perhaps yes, if the result can also be non-REX2 encodings.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH][v3] Support APX NDD optimized encoding.
  2023-11-15  2:59             ` [PATCH][v3] " Hu, Lin1
@ 2023-11-15  9:34               ` Jan Beulich
  2023-11-17  7:24                 ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-15  9:34 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: hongjiu.lu, binutils

On 15.11.2023 03:59, Hu, Lin1 wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -7208,6 +7208,43 @@ check_EgprOperands (const insn_template *t)
>    return 0;
>  }
>  
> +/* Optimize APX NDD insns to legacy insns.  */
> +static bool
> +convert_NDD_to_REX2 (const insn_template *t)
> +{
> +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> +      && t->opcode_space == SPACE_EVEXMAP4
> +      && !i.has_nf
> +      && i.reg_operands >= 2)
> +    {
> +      unsigned int readonly_var = ~0;
> +      unsigned int dest = i.operands - 1;
> +      unsigned int src1 = i.operands - 2;
> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> +
> +      if (i.types[src1].bitfield.class == Reg
> +	  && i.op[src1].regs == i.op[dest].regs)
> +	readonly_var = src2;
> +      /* adcx, adox and imul can't support to swap the source operands.  */
> +      else if (i.types[src2].bitfield.class == Reg
> +	       && i.op[src2].regs == i.op[dest].regs
> +	       && optimize > 1
> +	       && t->opcode_modifier.commutative)

Comment and code still aren't in line: "support to swap the source operands"
really is the D attribute in the opcode table, whereas
t->opcode_modifier.commutative is related to the C attribute (and all three
insns named really are commutative). It looks to me that the code is
correct, so it would then be the comment that may need updating. But it may
also be better to additionally check .d here (making the code robust against
C being added to the truly commutative yet not eligible to be optimized
insns). In which case the comment might say "adcx, adox, and imul, while
commutative, don't support to swap the source operands".
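
A minimal sketch of the combined check suggested above, assuming a simplified view of the template in which the C and D attributes are plain booleans. The struct and function names are hypothetical, not the gas definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the two template attributes.  */
struct tmpl_attrs
{
  bool commutative;	/* "C" in the opcode table.  */
  bool d;		/* "D" in the opcode table.  */
};

/* Allow swapping the two source operands only when the template is
   marked commutative AND also carries D.  Requiring both keeps the
   check correct even if C is later added to insns that are
   commutative but must not be optimized this way.  */
static bool
may_swap_sources (const struct tmpl_attrs *t, int optimize_level)
{
  assert (t != NULL);
  return optimize_level > 1 && t->commutative && t->d;
}
```

Under this sketch, a template with C but without D is rejected regardless of the optimization level.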

> +	readonly_var = src1;
> +      if (readonly_var != (unsigned int) ~0)
> +	{
> +	  if (readonly_var != src2)
> +	    swap_2_operands (readonly_var, src2);
> +
> +	  --i.operands;
> +	  --i.reg_operands;
> +
> +	  return true;
> +	}
> +    }
> +  return false;
> +}
> +
>  /* Helper function for the progress() macro in match_template().  */
>  static INLINE enum i386_error progress (enum i386_error new,
>  					enum i386_error last,
> @@ -7728,6 +7765,21 @@ match_template (char mnem_suffix)
>  	  i.memshift = memshift;
>  	}
>  
> +      /* If we can optimize a NDD insn to non-NDD insn, like

The terminology here wants to match the function name below, i.e. (as
indicated elsewhere for the name, in reply to your question) "legacy"
instead of "non-NDD" (assuming the function name is changed as well,
in line with that).

> +	 add %r16, %r8, %r8 -> add %r16, %r8,
> +	 add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.  
> +	 Note that the semantics have not been changed.  */
> +      if (optimize
> +	  && !i.no_optimize
> +	  && i.vec_encoding != vex_encoding_evex
> +	  && t + 1 < current_templates->end
> +	  && !t[1].opcode_modifier.evex

This is more fragile than it needs to be; it would imo be better to indeed
go from opcode space of the supposed alternative encoding. Perhaps that's
going to mean checking both.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
  2023-11-08  9:11   ` Jan Beulich
@ 2023-11-15 14:56     ` Cui, Lili
  2023-11-16  9:17       ` Jan Beulich
  2023-11-16 15:34     ` Cui, Lili
  1 sibling, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-15 14:56 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> > +	phminposuw (%r23),%xmm4
> > +	pinsrb $100,%r23,%xmm4
> > +	pinsrb $100,(%r23),%xmm4
> > +	pinsrd $100, %r23d, %xmm4
> > +	pinsrd $100,(%r23),%xmm4
> > +	pinsrq $100, %r24, %xmm4
> > +	pinsrq $100,(%r24),%xmm4
> > +	pmaxsb (%r24),%xmm6
> > +	pmaxsd (%r24),%xmm6
> > +	pmaxud (%r24),%xmm6
> > +	pmaxuw (%r24),%xmm6
> > +	pminsb (%r24),%xmm6
> > +	pminsd (%r24),%xmm6
> > +	pminud (%r24),%xmm6
> > +	pminuw (%r24),%xmm6
> > +	pmovsxbw (%r24),%xmm4
> > +	pmovsxbd (%r24),%xmm4
> > +	pmovsxbq (%r24),%xmm4
> > +	pmovsxwd (%r24),%xmm4
> > +	pmovsxwq (%r24),%xmm4
> > +	pmovsxdq (%r24),%xmm4
> > +	pmovsxbw (%r24),%xmm4
> > +	pmovzxbd (%r24),%xmm4
> > +	pmovzxbq (%r24),%xmm4
> > +	pmovzxwd (%r24),%xmm4
> > +	pmovzxwq (%r24),%xmm4
> > +	pmovzxdq (%r24),%xmm4
> > +	pmuldq (%r24),%xmm4
> > +	pmulld (%r24),%xmm4
> > +	roundpd $100,(%r24),%xmm6
> > +	roundps $100,(%r24),%xmm6
> > +	roundsd $100,(%r24),%xmm6
> > +	roundss $100,(%r24),%xmm6
> > +	pcmpestri $100,(%r25),%xmm6
> > +	pcmpestrm $100,(%r25),%xmm6
> > +	pcmpgtq (%r25),%xmm4
> > +	pcmpistri $100,(%r25),%xmm6
> > +	pcmpistrm $100,(%r25),%xmm6
> > +#AES
> > +	aesdec (%r26),%xmm6
> > +	aesdeclast (%r26),%xmm6
> > +	aesenc (%r26),%xmm6
> > +	aesenclast (%r26),%xmm6
> > +	aesimc (%r26),%xmm6
> > +	aeskeygenassist $100,(%r26),%xmm6
> > +	pclmulqdq $100,(%r26),%xmm6
> > +	pclmullqlqdq (%r26),%xmm6
> > +	pclmulhqlqdq (%r26),%xmm6
> > +	pclmullqhqdq (%r26),%xmm6
> > +	pclmulhqhqdq (%r26),%xmm6
> > +#GFNI
> > +	gf2p8affineqb $100,(%r26),%xmm6
> > +	gf2p8affineinvqb $100,(%r26),%xmm6
> > +	gf2p8mulb (%r26),%xmm6
> > +#VEX without evex
> > +	vblendpd $7,(%r27),%xmm6,%xmm2
> > +	vblendpd $7,(%r27),%ymm6,%ymm2
> > +	vblendps $7,(%r27),%xmm6,%xmm2
> > +	vblendps $7,(%r27),%ymm6,%ymm2
> > +	vblendvpd %xmm4,(%r27),%xmm2,%xmm7
> > +	vblendvpd %ymm4,(%r27),%ymm2,%ymm7
> > +	vblendvps %xmm4,(%r27),%xmm2,%xmm7
> > +	vblendvps %ymm4,(%r27),%ymm2,%ymm7
> > +	vdppd $7,(%r27),%xmm6,%xmm2
> > +	vdpps $7,(%r27),%xmm6,%xmm2
> > +	vdpps $7,(%r27),%ymm6,%ymm2
> > +	vhaddpd (%r27),%xmm6,%xmm5
> > +	vhaddpd (%r27),%ymm6,%ymm5
> > +	vhsubps (%r27),%xmm6,%xmm5
> > +	vhsubps (%r27),%ymm6,%ymm5
> > +	vlddqu (%r27),%xmm4
> > +	vlddqu (%r27),%ymm4
> > +	vldmxcsr (%r27)
> 
> As mentioned before, for this, ...
> 
> > +	vmaskmovpd (%r27),%xmm4,%xmm6
> > +	vmaskmovpd %xmm4,%xmm6,(%r27)
> > +	vmaskmovps (%r27),%xmm4,%xmm6
> > +	vmaskmovps %xmm4,%xmm6,(%r27)
> > +	vmaskmovpd (%r27),%ymm4,%ymm6
> > +	vmaskmovpd %ymm4,%ymm6,(%r27)
> > +	vmaskmovps (%r27),%ymm4,%ymm6
> > +	vmaskmovps %ymm4,%ymm6,(%r27)
> > +	vmovmskpd %xmm4,%r27d
> > +	vmovmskpd %xmm8,%r27d
> > +	vmovmskps %xmm4,%r27d
> > +	vmovmskps %ymm8,%r27d
> > +	vpblendvb %xmm4,(%r27),%xmm2,%xmm7
> > +	vpblendvb %ymm4,(%r27),%ymm2,%ymm7
> > +	vpblendw $7,(%r27),%xmm6,%xmm2
> > +	vpblendw $7,(%r27),%ymm6,%ymm2
> > +	vpcmpestri $7,(%r27),%xmm6
> > +	vpcmpestrm $7,(%r27),%xmm6
> > +	vperm2f128 $7,(%r27),%ymm6,%ymm2
> > +	vphaddd (%r27),%xmm6,%xmm7
> > +	vphaddsw (%r27),%xmm6,%xmm7
> > +	vphaddw (%r27),%xmm6,%xmm7
> > +	vphsubd (%r27),%xmm6,%xmm7
> > +	vphsubsw (%r27),%xmm6,%xmm7
> > +	vphsubw (%r27),%xmm6,%xmm7
> > +	vphaddd (%r27),%ymm6,%ymm7
> > +	vphaddsw (%r27),%ymm6,%ymm7
> > +	vphaddw (%r27),%ymm6,%ymm7
> > +	vphsubd (%r27),%ymm6,%ymm7
> > +	vphsubsw (%r27),%ymm6,%ymm7
> > +	vphsubw (%r27),%ymm6,%ymm7
> > +	vphminposuw (%r27),%xmm6
> > +	vpmovmskb %xmm4,%r27
> > +	vpmovmskb %ymm4,%r27d
> > +	vpsignb (%r27),%xmm6,%xmm7
> > +	vpsignw (%r27),%xmm6,%xmm7
> > +	vpsignd (%r27),%xmm6,%xmm7
> > +	vpsignb (%r27),%xmm6,%xmm7
> > +	vpsignw (%r27),%xmm6,%xmm7
> > +	vpsignd (%r27),%xmm6,%xmm7
> > +	vptest (%r27),%xmm6
> > +	vptest (%r27),%ymm6
> > +	vrcpps (%r27),%xmm6
> > +	vrcpps (%r27),%ymm6
> > +	vrcpss (%r27),%xmm6,%xmm6
> > +	vrsqrtps (%r27),%xmm6
> > +	vrsqrtps (%r27),%ymm6
> > +	vrsqrtss (%r27),%xmm6,%xmm6
> > +	vstmxcsr (%r27)
> 
> ... this, and ...
> 
> > +	vtestps (%r27),%xmm6
> > +	vtestps (%r27),%ymm6
> > +	vtestpd (%r27),%xmm6
> > +	vtestps (%r27),%ymm6
> > +	vtestpd (%r27),%ymm6
> > +	vpblendd $7,(%r27),%xmm6,%xmm2
> > +	vpblendd $7,(%r27),%ymm6,%ymm2
> > +	vperm2i128 $7,(%r27),%ymm6,%ymm2
> > +	vpmaskmovd (%r27),%xmm4,%xmm6
> > +	vpmaskmovd %xmm4,%xmm6,(%r27)
> > +	vpmaskmovq (%r27),%xmm4,%xmm6
> > +	vpmaskmovq %xmm4,%xmm6,(%r27)
> > +	vpmaskmovd (%r27),%ymm4,%ymm6
> > +	vpmaskmovd %ymm4,%ymm6,(%r27)
> > +	vpmaskmovq (%r27),%ymm4,%ymm6
> > +	vpmaskmovq %ymm4,%ymm6,(%r27)
> > +	vaesimc (%r27), %xmm3
> > +	vaeskeygenassist $7,(%r27),%xmm3
> > +	vroundpd $1,(%r24),%xmm6
> > +	vroundps $2,(%r24),%xmm6
> > +	vroundsd $3,(%r24),%xmm6,%xmm3
> > +	vroundss $4,(%r24),%xmm6,%xmm3
> 
> ... and these four I wonder whether the documentation shouldn't at least
> allow room for translating them, for there being functionally equivalent
> encodings.
> 

Could you give an example with equivalent encodings? Thanks.

Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
  2023-11-15 14:56     ` Cui, Lili
@ 2023-11-16  9:17       ` Jan Beulich
  0 siblings, 0 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-16  9:17 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 15.11.2023 15:56, Cui, Lili wrote:
>>> +	phminposuw (%r23),%xmm4
>>> +	pinsrb $100,%r23,%xmm4
>>> +	pinsrb $100,(%r23),%xmm4
>>> +	pinsrd $100, %r23d, %xmm4
>>> +	pinsrd $100,(%r23),%xmm4
>>> +	pinsrq $100, %r24, %xmm4
>>> +	pinsrq $100,(%r24),%xmm4
>>> +	pmaxsb (%r24),%xmm6
>>> +	pmaxsd (%r24),%xmm6
>>> +	pmaxud (%r24),%xmm6
>>> +	pmaxuw (%r24),%xmm6
>>> +	pminsb (%r24),%xmm6
>>> +	pminsd (%r24),%xmm6
>>> +	pminud (%r24),%xmm6
>>> +	pminuw (%r24),%xmm6
>>> +	pmovsxbw (%r24),%xmm4
>>> +	pmovsxbd (%r24),%xmm4
>>> +	pmovsxbq (%r24),%xmm4
>>> +	pmovsxwd (%r24),%xmm4
>>> +	pmovsxwq (%r24),%xmm4
>>> +	pmovsxdq (%r24),%xmm4
>>> +	pmovsxbw (%r24),%xmm4
>>> +	pmovzxbd (%r24),%xmm4
>>> +	pmovzxbq (%r24),%xmm4
>>> +	pmovzxwd (%r24),%xmm4
>>> +	pmovzxwq (%r24),%xmm4
>>> +	pmovzxdq (%r24),%xmm4
>>> +	pmuldq (%r24),%xmm4
>>> +	pmulld (%r24),%xmm4
>>> +	roundpd $100,(%r24),%xmm6
>>> +	roundps $100,(%r24),%xmm6
>>> +	roundsd $100,(%r24),%xmm6
>>> +	roundss $100,(%r24),%xmm6
>>> +	pcmpestri $100,(%r25),%xmm6
>>> +	pcmpestrm $100,(%r25),%xmm6
>>> +	pcmpgtq (%r25),%xmm4
>>> +	pcmpistri $100,(%r25),%xmm6
>>> +	pcmpistrm $100,(%r25),%xmm6
>>> +#AES
>>> +	aesdec (%r26),%xmm6
>>> +	aesdeclast (%r26),%xmm6
>>> +	aesenc (%r26),%xmm6
>>> +	aesenclast (%r26),%xmm6
>>> +	aesimc (%r26),%xmm6
>>> +	aeskeygenassist $100,(%r26),%xmm6
>>> +	pclmulqdq $100,(%r26),%xmm6
>>> +	pclmullqlqdq (%r26),%xmm6
>>> +	pclmulhqlqdq (%r26),%xmm6
>>> +	pclmullqhqdq (%r26),%xmm6
>>> +	pclmulhqhqdq (%r26),%xmm6
>>> +#GFNI
>>> +	gf2p8affineqb $100,(%r26),%xmm6
>>> +	gf2p8affineinvqb $100,(%r26),%xmm6
>>> +	gf2p8mulb (%r26),%xmm6
>>> +#VEX without evex
>>> +	vblendpd $7,(%r27),%xmm6,%xmm2
>>> +	vblendpd $7,(%r27),%ymm6,%ymm2
>>> +	vblendps $7,(%r27),%xmm6,%xmm2
>>> +	vblendps $7,(%r27),%ymm6,%ymm2
>>> +	vblendvpd %xmm4,(%r27),%xmm2,%xmm7
>>> +	vblendvpd %ymm4,(%r27),%ymm2,%ymm7
>>> +	vblendvps %xmm4,(%r27),%xmm2,%xmm7
>>> +	vblendvps %ymm4,(%r27),%ymm2,%ymm7
>>> +	vdppd $7,(%r27),%xmm6,%xmm2
>>> +	vdpps $7,(%r27),%xmm6,%xmm2
>>> +	vdpps $7,(%r27),%ymm6,%ymm2
>>> +	vhaddpd (%r27),%xmm6,%xmm5
>>> +	vhaddpd (%r27),%ymm6,%ymm5
>>> +	vhsubps (%r27),%xmm6,%xmm5
>>> +	vhsubps (%r27),%ymm6,%ymm5
>>> +	vlddqu (%r27),%xmm4
>>> +	vlddqu (%r27),%ymm4
>>> +	vldmxcsr (%r27)
>>
>> As mentioned before, for this, ...
>>
>>> +	vmaskmovpd (%r27),%xmm4,%xmm6
>>> +	vmaskmovpd %xmm4,%xmm6,(%r27)
>>> +	vmaskmovps (%r27),%xmm4,%xmm6
>>> +	vmaskmovps %xmm4,%xmm6,(%r27)
>>> +	vmaskmovpd (%r27),%ymm4,%ymm6
>>> +	vmaskmovpd %ymm4,%ymm6,(%r27)
>>> +	vmaskmovps (%r27),%ymm4,%ymm6
>>> +	vmaskmovps %ymm4,%ymm6,(%r27)
>>> +	vmovmskpd %xmm4,%r27d
>>> +	vmovmskpd %xmm8,%r27d
>>> +	vmovmskps %xmm4,%r27d
>>> +	vmovmskps %ymm8,%r27d
>>> +	vpblendvb %xmm4,(%r27),%xmm2,%xmm7
>>> +	vpblendvb %ymm4,(%r27),%ymm2,%ymm7
>>> +	vpblendw $7,(%r27),%xmm6,%xmm2
>>> +	vpblendw $7,(%r27),%ymm6,%ymm2
>>> +	vpcmpestri $7,(%r27),%xmm6
>>> +	vpcmpestrm $7,(%r27),%xmm6
>>> +	vperm2f128 $7,(%r27),%ymm6,%ymm2
>>> +	vphaddd (%r27),%xmm6,%xmm7
>>> +	vphaddsw (%r27),%xmm6,%xmm7
>>> +	vphaddw (%r27),%xmm6,%xmm7
>>> +	vphsubd (%r27),%xmm6,%xmm7
>>> +	vphsubsw (%r27),%xmm6,%xmm7
>>> +	vphsubw (%r27),%xmm6,%xmm7
>>> +	vphaddd (%r27),%ymm6,%ymm7
>>> +	vphaddsw (%r27),%ymm6,%ymm7
>>> +	vphaddw (%r27),%ymm6,%ymm7
>>> +	vphsubd (%r27),%ymm6,%ymm7
>>> +	vphsubsw (%r27),%ymm6,%ymm7
>>> +	vphsubw (%r27),%ymm6,%ymm7
>>> +	vphminposuw (%r27),%xmm6
>>> +	vpmovmskb %xmm4,%r27
>>> +	vpmovmskb %ymm4,%r27d
>>> +	vpsignb (%r27),%xmm6,%xmm7
>>> +	vpsignw (%r27),%xmm6,%xmm7
>>> +	vpsignd (%r27),%xmm6,%xmm7
>>> +	vpsignb (%r27),%xmm6,%xmm7
>>> +	vpsignw (%r27),%xmm6,%xmm7
>>> +	vpsignd (%r27),%xmm6,%xmm7
>>> +	vptest (%r27),%xmm6
>>> +	vptest (%r27),%ymm6
>>> +	vrcpps (%r27),%xmm6
>>> +	vrcpps (%r27),%ymm6
>>> +	vrcpss (%r27),%xmm6,%xmm6
>>> +	vrsqrtps (%r27),%xmm6
>>> +	vrsqrtps (%r27),%ymm6
>>> +	vrsqrtss (%r27),%xmm6,%xmm6
>>> +	vstmxcsr (%r27)
>>
>> ... this, and ...
>>
>>> +	vtestps (%r27),%xmm6
>>> +	vtestps (%r27),%ymm6
>>> +	vtestpd (%r27),%xmm6
>>> +	vtestps (%r27),%ymm6
>>> +	vtestpd (%r27),%ymm6
>>> +	vpblendd $7,(%r27),%xmm6,%xmm2
>>> +	vpblendd $7,(%r27),%ymm6,%ymm2
>>> +	vperm2i128 $7,(%r27),%ymm6,%ymm2
>>> +	vpmaskmovd (%r27),%xmm4,%xmm6
>>> +	vpmaskmovd %xmm4,%xmm6,(%r27)
>>> +	vpmaskmovq (%r27),%xmm4,%xmm6
>>> +	vpmaskmovq %xmm4,%xmm6,(%r27)
>>> +	vpmaskmovd (%r27),%ymm4,%ymm6
>>> +	vpmaskmovd %ymm4,%ymm6,(%r27)
>>> +	vpmaskmovq (%r27),%ymm4,%ymm6
>>> +	vpmaskmovq %ymm4,%ymm6,(%r27)
>>> +	vaesimc (%r27), %xmm3
>>> +	vaeskeygenassist $7,(%r27),%xmm3
>>> +	vroundpd $1,(%r24),%xmm6
>>> +	vroundps $2,(%r24),%xmm6
>>> +	vroundsd $3,(%r24),%xmm6,%xmm3
>>> +	vroundss $4,(%r24),%xmm6,%xmm3
>>
>> ... and these four I wonder whether the documentation shouldn't at least
>> allow room for translating them, for there being functionally equivalent
>> encodings.
> 
> Could you give an example with equivalent encodings? Thanks.

	ldmxcsr (%r27)
	stmxcsr (%r27)
	vrndscalepd $1,(%r24),%xmm6
	vrndscaleps $2,(%r24),%xmm6
	vrndscalesd $3,(%r24),%xmm6,%xmm3
	vrndscaless $4,(%r24),%xmm6,%xmm3

Of course for the former two the decision to not support EVEX-encoded
V{LD,ST}MXCSR needs to be firm, or else later on what these mnemonics
translate to (when using extended registers for addressing) would change.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
  2023-11-08  9:11   ` Jan Beulich
  2023-11-15 14:56     ` Cui, Lili
@ 2023-11-16 15:34     ` Cui, Lili
  2023-11-16 16:50       ` Jan Beulich
  1 sibling, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-16 15:34 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> > +	vpcmpistri $100,(%r25),%xmm6
> > +	vpcmpistrm $100,(%r25),%xmm6
> > +	vpcmpeqb (%r26),%ymm6,%ymm2
> > +	vpcmpeqw (%r16),%ymm6,%ymm2
> > +	vpcmpeqd (%r26),%ymm6,%ymm2
> > +	vpcmpeqq (%r16),%ymm6,%ymm2
> > +	vpcmpgtb (%r26),%ymm6,%ymm2
> > +	vpcmpgtw (%r16),%ymm6,%ymm2
> > +	vpcmpgtd (%r26),%ymm6,%ymm2
> > +	vpcmpgtq (%r16),%ymm6,%ymm2
> 
> As an overall remark to this (and perhaps similar) test(s): It would be nice if
> there was some consistent sorting criteria applied throughout the test as
> whole or (here) the sub-sections (validly grouped by category). Without that
> it's needlessly hard to spot any omissions.
> 

Re-sorted for each group.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
> > @@ -0,0 +1,16 @@
> > +.*: Assembler messages:
> > +.*:4: Error: `movbe' is not supported on `x86_64.nomovbe'
> > +.*:5: Error: `movbe' is not supported on `x86_64.nomovbe'
> > +.*:7: Error: `invept' is not supported on `x86_64.nomovbe.noept'
> > +.*:8: Error: `invept' is not supported on `x86_64.nomovbe.noept'
> > +.*:10: Error: `kmovq' is not supported on `x86_64.nomovbe.noept.noavx512bw'
> > +.*:11: Error: `kmovq' is not supported on `x86_64.nomovbe.noept.noavx512bw'
> > +.*:13: Error: `kmovb' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq'
> > +.*:14: Error: `kmovb' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq'
> > +.*:16: Error: `kmovw' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq.noavx512f'
> > +.*:17: Error: `kmovw' is not supported on `x86_64.nomovbe.noept.noavx512bw.noavx512dq.noavx512f'
> 
> Can the irrelevant middle parts of these .no* expectations please be omitted?
> The construction of these strings is in need of improvement, and it would be
> nice if testcases where the precise string doesn't matter would then not need
> touching. (This is a more general principle: Testcase expectations would
> better be only as specific as needed for what is under test. Certainly multiple
> aspects may be tested in one go, but quite commonly expectations are
> needlessly strict, and hence needlessly prone to breaking when unrelated
> changes are made somewhere in the code.)
> 

Done.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
> > @@ -0,0 +1,17 @@
> > +# Check illegal 64bit APX EVEX promoted instructions
> > +	.text
> > +	.arch .nomovbe
> > +	movbe (%r16), %r17
> > +	movbe (%rax), %rcx
> > +	.arch .noept
> > +	invept (%r16), %r17
> > +	invept (%rax), %rcx
> > +	.arch .noavx512bw
> > +	kmovq %k1, (%r16)
> > +	kmovq %k1, (%r8)
> > +	.arch .noavx512dq
> > +	kmovb %k1, %r16d
> > +	kmovb %k1, %r8d
> > +	.arch .noavx512f
> > +	kmovw %k1, %r16d
> > +	kmovw %k1, %r8d
> 
> What about BMI/BMI2 insns? Or AMX ones? (I surely missed further groups.)
> 

We don’t want to list all the instructions here, just a few representatives.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > @@ -0,0 +1,29 @@
# Check Illegal prefix for 64bit EVEX-promoted instructions

        .allow_index_reg
        .text
_start:
        #movbe %r18w,%ax set EVEX.pp = f3 (illegal value).
        .byte 0x62, 0xfc, 0x7e, 0x08, 0x60, 0xc2
        .byte 0xff, 0xff
        #movbe %r18w,%ax set EVEX.pp = f2 (illegal value).
        .byte 0x62, 0xfc, 0x7f, 0x08, 0x60, 0xc2
        .byte 0xff, 0xff
        #VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
        #(illegal value).
        .byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
        .byte 0xff
        #EVEX_MAP4 movbe %r18w,%ax set EVEX.mm == b01 (illegal value).
        .byte 0x62, 0xfd, 0x7d, 0x08, 0x60, 0xc2
        .byte 0xff, 0xff
        #EVEX_MAP4 movbe %r18w,%ax set EVEX.aa(P[17:16]) == b01 (illegal value).
        .byte 0x62, 0xfd, 0x7d, 0x09, 0x60, 0xc2
        .byte 0xff, 0xff
        #EVEX_MAP4 movbe %r18w,%ax set EVEX.zL'L == b001 (illegal value).
        .byte 0x62, 0xfd, 0x7d, 0x28, 0x60, 0xc2
        .byte 0xff, 0xff
        #EVEX from VEX ldtilecfg 0x123(%r31,%rax,4),%r31 EVEX.P[17:16](EVEX.aa) == 1 (illegal value).
        .byte 0x62, 0xda, 0x7c, 0x09, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00
        #EVEX from VEX ldtilecfg 0x123(%r31,%rax,4),%r31 EVEX.P[22:21](EVEX.L’L) == 1 (illegal value).
        .byte 0x62, 0xda, 0x7c, 0x28, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00
        #EVEX from VEX ldtilecfg 0x123(%r31,%rax,4),%r31 EVEX.P[20](EVEX.b) == 1 (illegal value).
        .byte 0x62, 0xda, 0x7c, 0x18, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00

> I suspect at least some of these can be expressed via .insn, which would
> greatly help readability (i.e. recognizing what is actually being done, and
> what's expected-wrong about it).
> 

Update test cases.
I tried to express the first case using .insn, but I can't find a way to express EVEX.P[3:2] == 11. Do you have any ideas?

0x62, 0xfc  ---> EVEX.P[3:2] of normal EVEX must be 00.

> Also - nit - there are again indentation inconsistencies here.

Done.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
> > @@ -0,0 +1,322 @@
> > +# Check 64bit APX_F EVEX-Promoted instructions.
> > +
> > +	.text
> > +_start:
> >[...]
> > +.intel_syntax noprefix
> 
> Didn't you say you corrected directive indentation throughout the series?
> 

Done.

> > +	aadd	DWORD PTR [r31+rax*4+0x123],r25d
> > +	aadd	QWORD PTR [r31+rax*4+0x123],r31
> > +	aand	DWORD PTR [r31+rax*4+0x123],r25d
> > +	aand	QWORD PTR [r31+rax*4+0x123],r31
> > +	aesdec128kl	xmm22,[r31+rax*4+0x123]
> > +	aesdec256kl	xmm22,[r31+rax*4+0x123]
> > +	aesdecwide128kl	[r31+rax*4+0x123]
> > +	aesdecwide256kl	[r31+rax*4+0x123]
> > +	aesenc128kl	xmm22,[r31+rax*4+0x123]
> > +	aesenc256kl	xmm22,[r31+rax*4+0x123]
> > +	aesencwide128kl	[r31+rax*4+0x123]
> > +	aesencwide256kl	[r31+rax*4+0x123]
> > +	aor	DWORD PTR [r31+rax*4+0x123],r25d
> > +	aor	QWORD PTR [r31+rax*4+0x123],r31
> > +	axor	DWORD PTR [r31+rax*4+0x123],r25d
> > +	axor	QWORD PTR [r31+rax*4+0x123],r31
> > +	bextr	r10d,edx,r25d
> > +	bextr	edx,DWORD PTR [r31+rax*4+0x123],r25d
> > +	bextr	r11,r15,r31
> > +	bextr	r15,QWORD PTR [r31+rax*4+0x123],r31
> 
> Going just down to here (it extends throughout the Intel syntax part):
> Can there please also be cases where the xxx PTR is omitted from the
> memory operands? That doesn't mean there always need to be both forms,
> but there should be a fair mix. (I notice you have one such example with
> INVPCID below.)
> 

Changed.

> >[...]
> > +	crc32	r22,r31
> > +	crc32	r22,QWORD PTR [r31]
> > +	crc32	r17,r19b
> > +	crc32	r21d,r19b
> > +	crc32	ebx,BYTE PTR [r19]
> > +	crc32	r23d,r31d
> > +	crc32	r23d,DWORD PTR [r31]
> > +	crc32	r21d,r31w
> > +	crc32	r21d,WORD PTR [r31]
> > +	crc32	r18,rax
> 
> These could do with moving up, since otherwise things look to be sorted
> alphabetically here. But seeing these also reminds me that the noreg64 test
> also needs extending, to cover these new forms (handled by separate
> templates).
> 

I'm confused about adding a crc32 test case to noreg64.s. Could you elaborate on which test case you want to add?

        pfx crc32       (%rax), %eax
        pfx16 crc32     (%rax), %rax
+       pfx crc32       (%r31),%r21d   ---> data size prefix invalid with `crc32'
+       pfx crc32       (%r31),%r21     ---> data size prefix invalid with `crc32'

> > +	kmovb	k5,k3 
> 
> This (and its siblings) doesn't belong, here, does it? It continues to be VEX-
> encoded.
> 

Done.

> > --- a/gas/testsuite/gas/i386/x86-64.exp
> > +++ b/gas/testsuite/gas/i386/x86-64.exp
> > @@ -360,8 +360,13 @@ run_dump_test "x86-64-avx512f-rcigrne-intel"
> >  run_dump_test "x86-64-avx512f-rcigrne"
> >  run_dump_test "x86-64-avx512f-rcigru-intel"
> >  run_dump_test "x86-64-avx512f-rcigru"
> > -run_list_test "x86-64-apx-egpr-inval" "-al"
> > +run_list_test "x86-64-apx-egpr-inval"
> 
> This should be put in its final shape right in patch 1; no need to touch it here
> again. (Else you'd need to mention the change in the ChangeLog
> entry.)
> 

Done.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
  2023-11-16 15:34     ` Cui, Lili
@ 2023-11-16 16:50       ` Jan Beulich
  2023-11-17 12:42         ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-16 16:50 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 16.11.2023 16:34, Cui, Lili wrote:
>>> --- /dev/null
>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
>>> @@ -0,0 +1,17 @@
>>> +# Check illegal 64bit APX EVEX promoted instructions
>>> +	.text
>>> +	.arch .nomovbe
>>> +	movbe (%r16), %r17
>>> +	movbe (%rax), %rcx
>>> +	.arch .noept
>>> +	invept (%r16), %r17
>>> +	invept (%rax), %rcx
>>> +	.arch .noavx512bw
>>> +	kmovq %k1, (%r16)
>>> +	kmovq %k1, (%r8)
>>> +	.arch .noavx512dq
>>> +	kmovb %k1, %r16d
>>> +	kmovb %k1, %r8d
>>> +	.arch .noavx512f
>>> +	kmovw %k1, %r16d
>>> +	kmovw %k1, %r8d
>>
>> What about BMI/BMI2 insns? Or AMX ones? (I surely missed further groups.)
> 
> We don’t want to list all the instructions here, just a few representatives.

Sure. I'm asking for representatives from the BMI and BMI2 groups.

>>> --- /dev/null
>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
>>> @@ -0,0 +1,29 @@
> # Check Illegal prefix for 64bit EVEX-promoted instructions
> 
>         .allow_index_reg
>         .text
> _start:
>         #movbe %r18w,%ax set EVEX.pp = f3 (illegal value).
>         .byte 0x62, 0xfc, 0x7e, 0x08, 0x60, 0xc2
>         .byte 0xff, 0xff
>         #movbe %r18w,%ax set EVEX.pp = f2 (illegal value).
>         .byte 0x62, 0xfc, 0x7f, 0x08, 0x60, 0xc2
>         .byte 0xff, 0xff
>         #VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
>         #(illegal value).
>         .byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
>         .byte 0xff
>         #EVEX_MAP4 movbe %r18w,%ax set EVEX.mm == b01 (illegal value).
>         .byte 0x62, 0xfd, 0x7d, 0x08, 0x60, 0xc2
>         .byte 0xff, 0xff
>         #EVEX_MAP4 movbe %r18w,%ax set EVEX.aa(P[17:16]) == b01 (illegal value).
>         .byte 0x62, 0xfd, 0x7d, 0x09, 0x60, 0xc2
>         .byte 0xff, 0xff
>         #EVEX_MAP4 movbe %r18w,%ax set EVEX.zL'L == b001 (illegal value).
>         .byte 0x62, 0xfd, 0x7d, 0x28, 0x60, 0xc2
>         .byte 0xff, 0xff
>         #EVEX from VEX ldtilecfg 0x123(%r31,%rax,4),%r31 EVEX.P[17:16](EVEX.aa) == 1 (illegal value).
>         .byte 0x62, 0xda, 0x7c, 0x09, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00
>         #EVEX from VEX ldtilecfg 0x123(%r31,%rax,4),%r31 EVEX.P[22:21](EVEX.L’L) == 1 (illegal value).
>         .byte 0x62, 0xda, 0x7c, 0x28, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00
>         #EVEX from VEX ldtilecfg 0x123(%r31,%rax,4),%r31 EVEX.P[20](EVEX.b) == 1 (illegal value).
>         .byte 0x62, 0xda, 0x7c, 0x18, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00
> 
>> I suspect at least some of these can be expressed via .insn, which would
>> greatly help readability (i.e. recognizing what is actually being done, and
>> what's expected-wrong about it).
>>
> 
> Update test cases.
> I tried to express the first case using .insn, but I can't find a way to express EVEX.P[3:2] == 11. Do you have any ideas?
> 
> 0x62, 0xfc  ---> EVEX.P[3:2] of normal EVEX must be 00.

There are terminology issues here again. The first case in the test talks
about EVEX.pp set to the equivalent of an F3 prefix. That's neither
encoded as 11, nor in EVEX.P[3:2] (I don't like the EVEX.P[] notation
anyway), but in EVEX.P[9:8].
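
As a small sketch of where that field sits: EVEX.pp occupies the low two bits of the second payload byte (the third byte of the instruction, after the 0x62 escape). The helper name below is hypothetical and the decoder does nothing beyond extracting that one field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return EVEX.pp (payload bits 9:8, i.e. the low two bits of the
   second payload byte) of the EVEX-encoded insn starting at INSN.
   0 = no prefix, 1 = 66, 2 = F3, 3 = F2.  */
static unsigned int
evex_pp (const uint8_t *insn, size_t len)
{
  /* 0x62 escape plus three payload bytes must be present.  */
  assert (insn != NULL && len >= 4 && insn[0] == 0x62);
  return insn[2] & 3;
}
```

For the first two byte sequences in the test the second payload byte is 0x7e and 0x7f, which gives pp = 2 (F3) and pp = 3 (F2) respectively.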

Irrespective, these are some examples of what I use to encode MOVBE (note
that all of this Intel syntax and the comments are MASM-style):

	.insn EVEX.L0.66.M12.W0 0x60, di, ax		; movbe di, r16w
	.insn EVEX.L0.66.M12.W0 0x60, di, [rax]		; movbe di, [r16]
	.insn EVEX.L0.M4 0x60, xmm16, rdi		; movbe r16, rdi
	.insn EVEX.L0.M4.W0 0x60, xmm16, [rdi]		; movbe r16d, [rdi]
	.insn EVEX.L0.66.M4.W0 0x61, [rdi], xmm16	; movbe [rdi], r16w
	.insn EVEX.L0.M4 0x61, xmm16, edi		; movbe edi, r16d
	.insn EVEX.L0.M12 0x61, [rax], rdi		; movbe [r16], rdi

Surely you can find variations to support the forms you're after. Plus
if you think the .insn documentation is unclear, please point out what
you think needs improving.

>>> [...]
>>> +	crc32	r22,r31
>>> +	crc32	r22,QWORD PTR [r31]
>>> +	crc32	r17,r19b
>>> +	crc32	r21d,r19b
>>> +	crc32	ebx,BYTE PTR [r19]
>>> +	crc32	r23d,r31d
>>> +	crc32	r23d,DWORD PTR [r31]
>>> +	crc32	r21d,r31w
>>> +	crc32	r21d,WORD PTR [r31]
>>> +	crc32	r18,rax
>>
>> These could do with moving up, since otherwise things look to be sorted
>> alphabetically here. But seeing these also reminds me that the noreg64 test
>> also needs extending, to cover these new forms (handled by separate
>> templates).
>>
> 
> I'm confused about adding a crc32 test case to noreg64.s. Could you elaborate on which test case you want to add?
> 
>         pfx crc32       (%rax), %eax
>         pfx16 crc32     (%rax), %rax
> +       pfx crc32       (%r31),%r21d   ---> data size prefix invalid with `crc32'
> +       pfx crc32       (%r31),%r21     ---> data size prefix invalid with `crc32'

Well, of course you can't use the "pfx" macro (at least not as is), which
will emit a data size prefix when DATA16 is defined. Likewise it would emit
"rex64" when REX64 is defined, which doesn't make sense with EVEX-encoded
insns. Ideally you would introduce a new macro to control operand size in
an EVEX-like manner, just that I'm afraid that the way you're adding EVEX-
encoding support to gas doesn't offer any means equivalent to that of legacy
encodings. Hence only the "bare" EVEX-encoded insns (without the use of any
pfx*) should be added for the time being.

Also, ftaod, CRC32 was only an example here. Any new template you add which
allows for potentially ambiguous operand size will need an example added
here. This set of tests (noreg64*) is intended to be (and remain) exhaustive.

Albeit, thinking a little further, perhaps you simply want to introduce a
noreg64-evex.d referencing the same source file, but arranging for {evex} to
be emitted in the pfx macro (or a further clone thereof, as some of the
insns cannot be EVEX-encoded)? That would then also deal with covering all
relevant new templates (I think). You'd need to check what, if anything,
needs doing to the pfx16 and pfx64 macros. But of course you could also
introduce a fully standalone noreg64-apx.{s,d} test, to escape some of the
possible hassles.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 1/2] Reorder APX insns in i386.tbl
  2023-11-15  8:52               ` Jan Beulich
@ 2023-11-17  3:27                 ` Hu, Lin1
  0 siblings, 0 replies; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-17  3:27 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, November 15, 2023 4:52 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH 1/2] Reorder APX insns in i386.tbl
> 
> On 15.11.2023 02:49, Hu, Lin1 wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Tuesday, November 14, 2023 7:21 PM
> >> To: Hu, Lin1 <lin1.hu@intel.com>
> >> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> >> Subject: Re: [PATCH 1/2] Reorder APX insns in i386.tbl
> >>
> >> On 14.11.2023 03:58, Hu, Lin1 wrote:
> >>> ---
> >>>  gas/config/tc-i386.c |     4 +-
> >>>  opcodes/i386-opc.tbl |   156 +-
> >>>  5 files changed, 13189 insertions(+), 10771 deletions(-)
> >>
> >> What was the goal of sending this patch to the list, without any
> >> further comments or explanations? It quite clearly doesn't apply to
> >> the present code base.
> >>
> >> The diffstat is pretty odd, too: There clearly aren't as many files/lines
> >> changed.
> >
> > This patch is just to show you the changes I would need to make if I
> > reorder .tbl, the big difference in lines is because I removed the changes to
> > mnem.h, init.h, etc. in the email.
> 
> Okay, but this isn't going to be a separate patch. It ought to be merged into
> the patch introducing the new templates, so there's not going to be extra
> diff or churn.
>

Don't worry, it won't be a standalone patch. It just seems that Lili hasn't made this change in the NDD patch yet, so I sent it out for you to look at first.
 
>
> >>> @@ -2124,12 +2126,12 @@ xcryptofb, 0xf30fa7e8, PadLock, NoSuf|RepPrefixOk, {}
> >>>  xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
> >>>
> >>>  // Multy-precision Add Carry, rdseed instructions.
> >>> +adcx, 0x6666, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> >>>  adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> >>>  adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> >>> -adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> >>> +adox, 0xf366, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> >>>  adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> >>>  adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> >>> -adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> >>
> >> What is this part about? I thought we agreed that ADCX/ADOX aren't
> >> suitable for NDD->REX2 optimization, at which point the ordering of
> >> templates here could as well be left alone.
> >>
> >       62 54 bd 18 66 c7       adcx   %r15,%r8,%r8
> >       66 4d 0f  38 f6  c7       adcx   %r15,%r8
> >
> > The code can optimize adcx/adox from NDD to legacy, currently. Do you
> > mean we don't consider the optimization If the code length remain the same?
> 
> As said before, any optimization we do has to actually result in some sort of
> win. If neither performance nor code size change, there's no point in making
> any adjustments to what the user has written.
> 
> That said, though: Looks like there would still be a 1-byte win for 32-bit
> ADCX/ADOX with just the low 8 registers used as operands, i.e. when no REX
> prefix is needed in the encoding.
>

OK, I will try to optimize adcx/adox for the case where only the low 8 registers are used.
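
To make the byte counts discussed above concrete, here is a small sketch (not part of the patch; the function names are hypothetical) that adds up the encoding lengths of register-only ADCX/ADOX. It assumes the layouts visible in the two encodings quoted earlier: the EVEX form is the 0x62 escape, three payload bytes, one opcode byte and ModRM, while the legacy form is a mandatory 66/F3 prefix, an optional REX byte, the 0F 38 escape, the F6 opcode and ModRM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Length in bytes of the EVEX (map 4) NDD form with register operands:
   0x62 escape + 3 payload bytes + 1 opcode byte + ModRM.  */
static size_t
ndd_length (void)
{
  return 1 + 3 + 1 + 1;
}

/* Length in bytes of the legacy form with register operands:
   mandatory prefix (66 or F3) + optional REX + 0F 38 escape
   + opcode F6 + ModRM.  */
static size_t
legacy_length (bool needs_rex)
{
  return 1 + (needs_rex ? 1 : 0) + 2 + 1 + 1;
}
```

With these assumptions ndd_length () and legacy_length (true) both give 6, matching the two encodings quoted above, while legacy_length (false) gives 5, which is the 1-byte win for 32-bit operands in the low 8 registers.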
 
>
> Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH][v3] Support APX NDD optimized encoding.
  2023-11-15  9:34               ` Jan Beulich
@ 2023-11-17  7:24                 ` Hu, Lin1
  2023-11-17  9:47                   ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-17  7:24 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, November 15, 2023 5:35 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH][v3] Support APX NDD optimized encoding.
> 
> On 15.11.2023 03:59, Hu, Lin1 wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -7208,6 +7208,43 @@ check_EgprOperands (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Optimize APX NDD insns to legacy insns.  */
> > +static bool
> > +convert_NDD_to_REX2 (const insn_template *t)
> > +{
> > +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> > +      && t->opcode_space == SPACE_EVEXMAP4
> > +      && !i.has_nf
> > +      && i.reg_operands >= 2)
> > +    {
> > +      unsigned int readonly_var = ~0;
> > +      unsigned int dest = i.operands - 1;
> > +      unsigned int src1 = i.operands - 2;
> > +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> > +
> > +      if (i.types[src1].bitfield.class == Reg
> > +	  && i.op[src1].regs == i.op[dest].regs)
> > +	readonly_var = src2;
> > +      /* adcx, adox and imul can't support to swap the source operands.  */
> > +      else if (i.types[src2].bitfield.class == Reg
> > +	       && i.op[src2].regs == i.op[dest].regs
> > +	       && optimize > 1
> > +	       && t->opcode_modifier.commutative)
> 
> Comment and code still aren't in line: "support to swap the source
> operands" really is the D attribute in the opcode table, whereas
> t->opcode_modifier.commutative is related to the C attribute (and all
> three insns named really are commutative). It looks to me that the code is
> correct, so it would then be the comment that may need updating. But it may
> also be better to additionally check .d here (making the code robust against
> C being added to the truly commutative yet not eligible to be optimized
> insns). In which case the comment might say "adcx, adox, and imul, while
> commutative, don't support to swap the source operands".
>

I think we don't need to worry about it for now, because we've constrained the function with VexVVVVDest, so these instructions must be NDD instructions. Also, adcx, adox and imul don't have the D attribute. If I add a check of .d here, I will need to exclude them, which would bring back the special-casing we had initially hoped to avoid by using C.
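
As an illustration of the selection logic under discussion, here is a standalone model (a sketch only, not the gas code). The struct, the NO_OPERAND marker and the function name are hypothetical, registers are plain integers, and only the three-register case is modeled. Operands are in AT&T order, so the last one is the destination.

```c
#include <stdbool.h>

/* Marker for "no source operand can be kept", like the ~0 in the patch.  */
#define NO_OPERAND (~0u)

/* Toy stand-in for a three-operand NDD insn "op src2, src1, dest".  */
struct ndd_insn
{
  unsigned int nops;	/* Number of operands, 3 in this model.  */
  int regs[3];		/* Register number of each operand.  */
  bool commutative;	/* Models the C attribute of the template.  */
};

/* Return the index of the source operand which survives next to the
   destination once the insn is shrunk to two operands, or NO_OPERAND
   if the insn cannot be shrunk.  */
static unsigned int
surviving_source (const struct ndd_insn *insn, int optimize)
{
  unsigned int dest = insn->nops - 1;
  unsigned int src1 = insn->nops - 2;
  unsigned int src2 = 0;

  /* op src2, X, X -> op src2, X: always possible.  */
  if (insn->regs[src1] == insn->regs[dest])
    return src2;

  /* op X, src1, X -> op src1, X: needs the sources to be swappable,
     which this model ties to the C attribute and -O2 or higher.  */
  if (insn->regs[src2] == insn->regs[dest]
      && optimize > 1
      && insn->commutative)
    return src1;

  return NO_OPERAND;
}
```

For `add %r16, %r8, %r8` this returns 0 at any optimization level, for `add %r8, %r16, %r8` it returns 1 only when optimize > 1, and a template without C is never swapped.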

>
> > +	readonly_var = src1;
> > +      if (readonly_var != (unsigned int) ~0)
> > +	{
> > +	  if (readonly_var != src2)
> > +	    swap_2_operands (readonly_var, src2);
> > +
> > +	  --i.operands;
> > +	  --i.reg_operands;
> > +
> > +	  return true;
> > +	}
> > +    }
> > +  return false;
> > +}
> > +
> >  /* Helper function for the progress() macro in match_template().  */
> >  static INLINE enum i386_error progress (enum i386_error new,
> >  					enum i386_error last,
> > @@ -7728,6 +7765,21 @@ match_template (char mnem_suffix)
> >  	  i.memshift = memshift;
> >  	}
> >
> > +      /* If we can optimize a NDD insn to non-NDD insn, like
> 
> The terminology here wants to match the function name below, i.e. (as
> indicated elsewhere for the name, in reply to your question) "legacy"
> instead of "non-NDD" (assuming the function name is changed as well, in
> line with that).
>

OK.

> 
> > +	 add %r16, %r8, %r8 -> add %r16, %r8,
> > +	 add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> > +	 Note that the semantics have not been changed.  */
> > +      if (optimize
> > +	  && !i.no_optimize
> > +	  && i.vec_encoding != vex_encoding_evex
> > +	  && t + 1 < current_templates->end
> > +	  && !t[1].opcode_modifier.evex
> 
> This is more fragile than it needs to be; it would imo be better to indeed go
> from opcode space of the supposed alternative encoding. Perhaps that's
> going to mean checking both.
>

Based on our previous discussion, I modified tc-i386.c as follows

+/* Check if the instruction uses the REX registers.  */
+static bool
+check_RexOperands (const insn_template *t)
+{
+  for (unsigned int op = 0; op < i.operands; op++)
+    {
+      if (i.types[op].bitfield.class != Reg
+         /* Special case for (%dx) while doing input/output op */
+         || i.input_output_operand)
+       continue;
+
+      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
+       return true;
+    }
+
+  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
+      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
+    return true;
+
+  /* Check pseudo prefix {rex} are valid.  */
+  if (i.rex_encoding)
+    return true;
+  return false;
+}
+
+/* Optimize APX NDD insns to legacy insns.  */
+static unsigned int
+convert_NDD_to_legacy (const insn_template *t)
+{
+  unsigned int readonly_var = ~0;
+
+  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
+      && t->opcode_space == SPACE_EVEXMAP4
+      && !i.has_nf
+      && i.reg_operands >= 2)
+    {
+      unsigned int dest = i.operands - 1;
+      unsigned int src1 = i.operands - 2;
+      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+      if (i.types[src1].bitfield.class == Reg
+         && i.op[src1].regs == i.op[dest].regs)
+       readonly_var = src2;
+      /* adcx, adox, and imul, while commutative, don't support to swap
+        the source operands.  */
+      else if (i.types[src2].bitfield.class == Reg
+              && i.op[src2].regs == i.op[dest].regs
+              && optimize > 1
+              && t->opcode_modifier.commutative)
+       readonly_var = src1;
+    }
+  return readonly_var;
+}
+

@@ -7728,6 +7782,55 @@ match_template (char mnem_suffix)
          i.memshift = memshift;
        }

+      /* If we can optimize a NDD insn to legacy insn, like
+        add %r16, %r8, %r8 -> add %r16, %r8,
+        add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
+        Note that the semantics have not been changed.  */
+      if (optimize
+         && !i.no_optimize
+         && i.vec_encoding != vex_encoding_evex
+         && t + 1 < current_templates->end
+         && !t[1].opcode_modifier.evex
+         && t[1].opcode_space <= SPACE_0F38)
+       {
+         unsigned int readonly_var = convert_NDD_to_legacy (t);
+         size_match = true;
+
+         if (readonly_var != (unsigned int) ~0)
+           {
+             for (j = 0; j < i.operands - 2; j++)
+               {
+                 check_register = j;
+                 if (t->opcode_modifier.d)
+                   check_register ^= 1;
+                 overlap0 = operand_type_and (i.types[check_register],
+                                              t[1].operand_types[check_register]);
+                 if (!operand_type_match (overlap0, i.types[check_register]))
+                   size_match = false;
+               }
+
+             if (size_match
+                 && (t[1].opcode_space <= SPACE_0F
+                     || (!check_EgprOperands (t + 1)	 // These conditions are exclude adcx/adox with inappropriate registers.
+                         && !check_RexOperands (t + 1)
+                         && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))
+               {
+                 unsigned int src1 = i.operands - 2;
+                 unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+                 if (readonly_var != src2)
+                   swap_2_operands (readonly_var, src2);
+
+                 --i.operands;
+                 --i.reg_operands;
+
+                 specific_error = progress (internal_error);
+                 continue;
+               }
+
+           }
+       }
+
       /* We've found a match; break out of loop.  */
       break;

What's your opinion?

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH][v3] Support APX NDD optimized encoding.
  2023-11-17  7:24                 ` Hu, Lin1
@ 2023-11-17  9:47                   ` Jan Beulich
  2023-11-20  3:28                     ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-17  9:47 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, binutils

On 17.11.2023 08:24, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, November 15, 2023 5:35 PM
>>
>> On 15.11.2023 03:59, Hu, Lin1 wrote:
>>> --- a/gas/config/tc-i386.c
>>> +++ b/gas/config/tc-i386.c
>>> @@ -7208,6 +7208,43 @@ check_EgprOperands (const insn_template *t)
>>>    return 0;
>>>  }
>>>
>>> +/* Optimize APX NDD insns to legacy insns.  */
>>> +static bool
>>> +convert_NDD_to_REX2 (const insn_template *t)
>>> +{
>>> +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
>>> +      && t->opcode_space == SPACE_EVEXMAP4
>>> +      && !i.has_nf
>>> +      && i.reg_operands >= 2)
>>> +    {
>>> +      unsigned int readonly_var = ~0;
>>> +      unsigned int dest = i.operands - 1;
>>> +      unsigned int src1 = i.operands - 2;
>>> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
>>> +
>>> +      if (i.types[src1].bitfield.class == Reg
>>> +	  && i.op[src1].regs == i.op[dest].regs)
>>> +	readonly_var = src2;
>>> +      /* adcx, adox and imul can't support to swap the source operands.  */
>>> +      else if (i.types[src2].bitfield.class == Reg
>>> +	       && i.op[src2].regs == i.op[dest].regs
>>> +	       && optimize > 1
>>> +	       && t->opcode_modifier.commutative)
>>
>> Comment and code still aren't in line: "support to swap the source
>> operands" really is the D attribute in the opcode table, whereas
>> t->opcode_modifier.commutative is related to the C attribute (and all
>> three insns named really are commutative). It looks to me that the code is
>> correct, so it would then be the comment that may need updating. But it may
>> also be better to additionally check .d here (making the code robust against
>> C being added to the truly commutative yet not eligible to be optimized
>> insns). In which case the comment might say "adcx, adox, and imul, while
>> commutative, don't support to swap the source operands".
>>
> 
> I think we don't need to worry about it for now, because we've constrained the function with VexVVVVDest, and these instructions must be NDD instructions. And adcx, adox and imul don't have D attribute.

Right, and I thought to leverage this. IOW ...

> If I add check .d here, I will need to exclude them.

... I don't think I understand this.

> Based on our previous discussion, I modified tc-i386.c as follows
> 
> +/* Check if the instruction use the REX registers.  */
> +static bool
> +check_RexOperands (const insn_template *t)

I don't think I can spot a use of the parameter in the function.

> +{
> +  for (unsigned int op = 0; op < i.operands; op++)
> +    {
> +      if (i.types[op].bitfield.class != Reg
> +         /* Special case for (%dx) while doing input/output op */
> +         || i.input_output_operand)

Once again: Is this needed? Respective insns shouldn't even make it here.
Plus if they did, ...

> +       continue;
> +
> +      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
> +       return true;

... the loop would continue for (%dx) kind operands anyway.

> +    }
> +
> +  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
> +      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
> +    return true;
> +
> +  /* Check pseudo prefix {rex} are valid.  */
> +  if (i.rex_encoding)
> +    return true;
> +  return false;

Just "return i.rex_encoding;"?

> +}
> +
> +/* Optimize APX NDD insns to legacy insns.  */
> +static unsigned int
> +convert_NDD_to_legacy (const insn_template *t)
> +{
> +  unsigned int readonly_var = ~0;

One issue I continue to have is the name of this variable. Good names
help understanding what code is doing. And in 3-operand NDD insns there
are uniformly 2 operands which are only read.

> +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> +      && t->opcode_space == SPACE_EVEXMAP4
> +      && !i.has_nf
> +      && i.reg_operands >= 2)
> +    {
> +      unsigned int dest = i.operands - 1;
> +      unsigned int src1 = i.operands - 2;
> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> +
> +      if (i.types[src1].bitfield.class == Reg
> +         && i.op[src1].regs == i.op[dest].regs)
> +       readonly_var = src2;
> +      /* adcx, adox, and imul, while commutative, don't support to swap
> +        the source operands.  */
> +      else if (i.types[src2].bitfield.class == Reg
> +              && i.op[src2].regs == i.op[dest].regs
> +              && optimize > 1
> +              && t->opcode_modifier.commutative)
> +       readonly_var = src1;
> +    }
> +  return readonly_var;
> +}

You're no longer converting anything in this function, which - I'm sorry to
say that - once again makes its name unsuitable.

> @@ -7728,6 +7782,55 @@ match_template (char mnem_suffix)
>           i.memshift = memshift;
>         }
> 
> +      /* If we can optimize a NDD insn to legacy insn, like
> +        add %r16, %r8, %r8 -> add %r16, %r8,
> +        add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> +        Note that the semantics have not been changed.  */
> +      if (optimize
> +         && !i.no_optimize
> +         && i.vec_encoding != vex_encoding_evex
> +         && t + 1 < current_templates->end
> +         && !t[1].opcode_modifier.evex
> +         && t[1].opcode_space <= SPACE_0F38)

In all of these checks what I'm missing is a check that we're actually
dealing with an NDD template.

> +       {
> +         unsigned int readonly_var = convert_NDD_to_legacy (t);
> +         size_match = true;
> +
> +         if (readonly_var != (unsigned int) ~0)
> +           {
> +             for (j = 0; j < i.operands - 2; j++)
> +               {
> +                 check_register = j;
> +                 if (t->opcode_modifier.d)
> +                   check_register ^= 1;
> +                 overlap0 = operand_type_and (i.types[check_register],
> +                                              t[1].operand_types[check_register]);
> +                 if (!operand_type_match (overlap0, i.types[check_register]))
> +                   size_match = false;
> +               }

I'm afraid that without a comment I don't understand what this is about.

> +             if (size_match
> +                 && (t[1].opcode_space <= SPACE_0F
> +                     || (!check_EgprOperands (t + 1)	 // These conditions are exclude adcx/adox with inappropriate registers.
> +                         && !check_RexOperands (t + 1)
> +                         && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))

Saying "inappropriate" in such a comment doesn't really help, as it's then
still unclear what is "appropriate". But the comment will need re-formatting
anyway.

> +               {
> +                 unsigned int src1 = i.operands - 2;

Looks like this variable is no longer used?

> +                 unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> +
> +                 if (readonly_var != src2)
> +                   swap_2_operands (readonly_var, src2);
> +
> +                 --i.operands;
> +                 --i.reg_operands;
> +
> +                 specific_error = progress (internal_error);
> +                 continue;
> +               }
> +
> +           }
> +       }
> +
>        /* We've found a match; break out of loop.  */
>        break;
> 
> What's your opinion?

I need some further clarification first, as per above. I also don't think I
can properly identify (yet) which parts of the code are solely related to
the ADCX/ADOX special case. The more code that's special for these, the more
I'd be inclined to ask that dealing with them be a separate patch, for us to
judge whether effort and effect are in reasonable balance.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
  2023-11-16 16:50       ` Jan Beulich
@ 2023-11-17 12:42         ` Cui, Lili
  2023-11-17 14:38           ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-17 12:42 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils

> Subject: Re: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
> 
> On 16.11.2023 16:34, Cui, Lili wrote:
> >>> --- /dev/null
> >>> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
> >>> @@ -0,0 +1,17 @@
> >>> +# Check illegal 64bit APX EVEX promoted instructions
> >>> +	.text
> >>> +	.arch .nomovbe
> >>> +	movbe (%r16), %r17
> >>> +	movbe (%rax), %rcx
> >>> +	.arch .noept
> >>> +	invept (%r16), %r17
> >>> +	invept (%rax), %rcx
> >>> +	.arch .noavx512bw
> >>> +	kmovq %k1, (%r16)
> >>> +	kmovq %k1, (%r8)
> >>> +	.arch .noavx512dq
> >>> +	kmovb %k1, %r16d
> >>> +	kmovb %k1, %r8d
> >>> +	.arch .noavx512f
> >>> +	kmovw %k1, %r16d
> >>> +	kmovw %k1, %r8d
> >>
> >> What about BMI/BMI2 insns? Or AMX ones? (I surely missed further
> >> groups.)
> >
> > We don’t want to list all the instructions here, just a few representatives.
> 
> Sure. I'm asking for representatives from the BMI and BMI2 groups.
> 

Done.

> >>> --- /dev/null
> >>> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> >>> @@ -0,0 +1,29 @@
> > # Check Illegal prefix for 64bit EVEX-promoted instructions
> >
> >         .allow_index_reg
> >         .text
> > _start:
> >         #movbe %r18w,%ax set EVEX.pp = f3 (illegal value).
> >         .byte 0x62, 0xfc, 0x7e, 0x08, 0x60, 0xc2
> >         .byte 0xff, 0xff
> >         #movbe %r18w,%ax set EVEX.pp = f2 (illegal value).
> >         .byte 0x62, 0xfc, 0x7f, 0x08, 0x60, 0xc2
> >         .byte 0xff, 0xff
> >         #VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
> >         #(illegal value).
> >         .byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
> >         .byte 0xff
> >         #EVEX_MAP4 movbe %r18w,%ax set EVEX.mm == b01 (illegal value).
> >         .byte 0x62, 0xfd, 0x7d, 0x08, 0x60, 0xc2
> >         .byte 0xff, 0xff
> >         #EVEX_MAP4 movbe %r18w,%ax set EVEX.aa(P[17:16]) == b01 (illegal value).
> >         .byte 0x62, 0xfd, 0x7d, 0x09, 0x60, 0xc2
> >         .byte 0xff, 0xff
> >         #EVEX_MAP4 movbe %r18w,%ax set EVEX.zL'L == b001 (illegal value).
> >         .byte 0x62, 0xfd, 0x7d, 0x28, 0x60, 0xc2
> >         .byte 0xff, 0xff
> >         #EVEX from VEX ldtilecfg 0x123(%r31,%rax,4),%r31 EVEX.P[17:16](EVEX.aa) == 1 (illegal value).
> >         .byte 0x62, 0xda, 0x7c, 0x09, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00
> >         #EVEX from VEX ldtilecfg 0x123(%r31,%rax,4),%r31 EVEX.P[22:21](EVEX.L’L) == 1 (illegal value).
> >         .byte 0x62, 0xda, 0x7c, 0x28, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00
> >         #EVEX from VEX ldtilecfg 0x123(%r31,%rax,4),%r31 EVEX.P[20](EVEX.b) == 1 (illegal value).
> >         .byte 0x62, 0xda, 0x7c, 0x18, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00
> >
> >> I suspect at least some of these can be expressed via .insn, which
> >> would greatly help readability (i.e. recognizing what is actually
> >> being done, and what's expected-wrong about it).
> >>
> >
> > Update test cases.
> > I try to express the first case using .insn. I can't find a way to express
> > EVEX.P[3:2] == 11, do you have any ideas?
> >
> > 0x62, 0xfc  ---> EVEX.P[3:2] of normal EVEX must be 00.
> 
> There are terminology issues here again. The first case in the test talks about
> EVEX.pp set to the equivalent of an F3 prefix. That's neither encoded as 11,
> nor in EVEX.P[3:2] (I don't like the EVEX.P[] notation anyway), but in
> EVEX.P[9:8].
> 
> Irrespective, these are some examples of what I use to encode MOVBE (note
> that all of this Intel syntax and the comments are MASM-style):
> 
> 	.insn EVEX.L0.66.M12.W0 0x60, di, ax		; movbe di, r16w
> 	.insn EVEX.L0.66.M12.W0 0x60, di, [rax]		; movbe di, [r16]
> 	.insn EVEX.L0.M4 0x60, xmm16, rdi		; movbe r16, rdi
> 	.insn EVEX.L0.M4.W0 0x60, xmm16, [rdi]		; movbe r16d, [rdi]
> 	.insn EVEX.L0.66.M4.W0 0x61, [rdi], xmm16	; movbe [rdi], r16w
> 	.insn EVEX.L0.M4 0x61, xmm16, edi		; movbe edi, r16d
> 	.insn EVEX.L0.M12 0x61, [rax], rdi		; movbe [r16], rdi
> 
> Surely you can find variations to support the forms you're after. Plus if you
> think the .insn documentation is unclear, please point out what you think
> needs improving.
> 

Done.

From your examples I see that M covers all 4 bits of the 00mm field in EVEX. I was confused because mm is only 2 bits yet M can express M0-M15, so I didn't understand what M meant before.
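
The point about M and pp can be checked against the raw bytes of the test quoted above. The following is a sketch only (not part of the patch, with hypothetical helper names). It assumes the APX view of the EVEX prefix, where the low three bits of the first payload byte select the opcode map, bit 3 of that byte is B4, and the low two bits of the second payload byte are pp.

```c
#include <stdint.h>

/* INSN points at the 0x62 escape byte; insn[1] and insn[2] are the
   first two payload bytes.  */

/* Opcode map number: low three bits of the first payload byte.  */
static unsigned int
evex_map (const uint8_t *insn)
{
  return insn[1] & 0x7;
}

/* Raw value of bit 3 of the first payload byte (B4 under APX).  */
static unsigned int
evex_b4 (const uint8_t *insn)
{
  return (insn[1] >> 3) & 0x1;
}

/* The four low bits taken together, i.e. what M<n> in the .insn
   examples above appears to cover (M12 = map 4 plus bit 3 set).  */
static unsigned int
evex_m (const uint8_t *insn)
{
  return insn[1] & 0xf;
}

/* Embedded prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2.  */
static unsigned int
evex_pp (const uint8_t *insn)
{
  return insn[2] & 0x3;
}
```

For the first test sequence, 0x62 0xfc 0x7e, this gives map 4 with bit 3 set, so an M value of 12, and pp = 2, the F3 equivalent named in the test's comment. The 0x62 0xfc 0x7f variant gives pp = 3, the F2 equivalent.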

> >>> [...]
> >>> +	crc32	r22,r31
> >>> +	crc32	r22,QWORD PTR [r31]
> >>> +	crc32	r17,r19b
> >>> +	crc32	r21d,r19b
> >>> +	crc32	ebx,BYTE PTR [r19]
> >>> +	crc32	r23d,r31d
> >>> +	crc32	r23d,DWORD PTR [r31]
> >>> +	crc32	r21d,r31w
> >>> +	crc32	r21d,WORD PTR [r31]
> >>> +	crc32	r18,rax
> >>
> >> These could do with moving up, since otherwise things look to be
> >> sorted alphabetically here. But seeing these also reminds me that the
> >> noreg64 test also needs extending, to cover these new forms (handled
> >> by separate templates).
> >>
> >
> > I'm confused here about adding crc test case in noreg64.s, could you
> > elaborate on what testcase you want to add?
> >
> >         pfx crc32       (%rax), %eax
> >         pfx16 crc32     (%rax), %rax
> > +       pfx crc32       (%r31),%r21d   ---> data size prefix invalid with `crc32'
> > +       pfx crc32       (%r31),%r21     ---> data size prefix invalid with `crc32'
> 
> Well, of course you can't use the "pfx" macro (at least not as is), which will
> emit a data size prefix when DATA16 is defined. Likewise it would emit "rex64"
> when REX64 is defined, which doesn't make sense with EVEX-encoded insns.
> Ideally you would introduce a new macro to control operand size in an EVEX-
> like manner, just that I'm afraid that the way you're adding EVEX- encoding
> support to gas doesn't offer any means equivalent to that of legacy
> encodings. Hence only the "bare" EVEX-encoded insns (without the use of any
> pfx*) should be added for the time being.
> 
> Also, ftaod, CRC32 was only an example here. Any new template you add
> which allows for potentially ambiguous operand size will need an example
> added here. This set of tests (noreg64*) is intended to be (and remain)
> exhaustive.
> 
> Albeit, thinking a little further, perhaps you simply want to introduce a
> noreg64-evex.d referencing the same source file, but arranging for {evex} to
> be emitted in the pfx macro (or a further clone thereof, as some of the insns
> cannot be EVEX-encoded)? That would then also deal with covering all
> relevant new templates (I think). You'd need to check what, if anything, needs
> doing to the pfx16 and pfx64 macros. But of course you could also introduce a
> fully standalone noreg64-apx.{s,d} test, to escape some of the possible
> hassles.
> 

I listed some tests below. Most EVEX-promoted instructions support the 66 prefix, and we included all of these test cases in Part II 1/6 (except for crc32, which is already listed in the current file). Part II 1/6 is suspended because it also covers the NF patch instructions.

        /* Set EVEX.pp to 66.  */
        crc32  %r31w,%r21d
        crc32w (%r31),%r21d
        adc $1, (%r31w)
        adcw $1, (%r31)

        /* Set EVEX.W to 1.  */
        crc32  %rax,%r18
        adc %r15,%r16

Thanks.
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
  2023-11-17 12:42         ` Cui, Lili
@ 2023-11-17 14:38           ` Jan Beulich
  2023-11-22 13:40             ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-17 14:38 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 17.11.2023 13:42, Cui, Lili wrote:
>> Subject: Re: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
>>
>> On 16.11.2023 16:34, Cui, Lili wrote:
>>> I'm confused here about adding crc test case in noreg64.s, could you
>>> elaborate on what testcase you want to add?
>>>
>>>         pfx crc32       (%rax), %eax
>>>         pfx16 crc32     (%rax), %rax
>>> +       pfx crc32       (%r31),%r21d   ---> data size prefix invalid with `crc32'
>>> +       pfx crc32       (%r31),%r21     ---> data size prefix invalid with `crc32'
>>
>> Well, of course you can't use the "pfx" macro (at least not as is), which will
>> emit a data size prefix when DATA16 is defined. Likewise it would emit "rex64"
>> when REX64 is defined, which doesn't make sense with EVEX-encoded insns.
>> Ideally you would introduce a new macro to control operand size in an EVEX-
>> like manner, just that I'm afraid that the way you're adding EVEX- encoding
>> support to gas doesn't offer any means equivalent to that of legacy
>> encodings. Hence only the "bare" EVEX-encoded insns (without the use of any
>> pfx*) should be added for the time being.
>>
>> Also, ftaod, CRC32 was only an example here. Any new template you add
>> which allows for potentially ambiguous operand size will need an example
>> added here. This set of tests (noreg64*) is intended to be (and remain)
>> exhaustive.
>>
>> Albeit, thinking a little further, perhaps you simply want to introduce a
>> noreg64-evex.d referencing the same source file, but arranging for {evex} to
>> be emitted in the pfx macro (or a further clone thereof, as some of the insns
>> cannot be EVEX-encoded)? That would then also deal with covering all
>> relevant new templates (I think). You'd need to check what, if anything, needs
>> doing to the pfx16 and pfx64 macros. But of course you could also introduce a
>> fully standalone noreg64-apx.{s,d} test, to escape some of the possible
>> hassles.
>>
> 
> I listed some tests, most of EVEX-promoted instructions support prefix 66, we included all of these test cases in Part II 1/6 (except for crc32 which is already listed in the current file). Part II 1/6 it is suspended, because it also covers the NF patch instructions.
> 
>         /* Set EVEX.pp to 66.  */
>         crc32  %r31w,%r21d
>         crc32w (%r31),%r21d
>         adc $1, (%r31w)

This one ought to be a mistake.

>         adcw $1, (%r31)
> 
>         /* Set EVEX.W to 1.  */
>         crc32  %rax,%r18
>         adc %r15,%r16

Of the above most aren't ambiguous as to operand size. The purpose of the
test (or group of tests) is not so much to check correct encoding (except
of course to prove correct [aka intended] choice of defaults), but to
check that all ambiguities are properly detected and reported (with the
exception of a few where H.J. is of the opinion that they shouldn't be
diagnosed in AT&T mode, even if that lack of diagnostics had - back at
the time - allowed for a gcc bug to go unnoticed for quite some time).

Therefore if e.g. "data16" cannot be used with an insn (as is the case
for EVEX-encoded ones), there's also no need to have special checking
for the EVEX.pp=01 case. Thus my suggestion to simply arrange for the
pfx macro to emit {evex} prefixes (or to clone the test in order to
escape issues with insns which don't allow for EVEX encodings).

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 5/8] Support APX NDD
  2023-11-08 10:39   ` Jan Beulich
@ 2023-11-20  1:19     ` Cui, Lili
  0 siblings, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-20  1:19 UTC (permalink / raw)
  To: Beulich, Jan, Kong, Lingling; +Cc: Lu, Hongjiu, ccoutant, binutils

> > 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
> >
> > diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
> > index 398909a6a30..5b925505435 100644
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -2317,8 +2317,10 @@ operand_size_match (const insn_template *t)
> >        unsigned int given = i.operands - j - 1;
> >
> >        /* For FMA4 and XOP insns VEX.W controls just the first two
> > -	 register operands.  */
> > -      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP))
> > +	 register operands. And APX_F insns just swap the two source
> operands,
> > +	 with the 3rd one being the destination.  */
> > +      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP)
> > +	  || is_cpu (t,CpuAPX_F))
> >  	given = j < 2 ? 1 - j : j;
> 
> Nit: Please retain consistency wrt style (here: missing blank after comma in
> the addition). Feels like I said so before.
> 

Done.

> > @@ -3959,6 +3961,7 @@ static INLINE bool  is_any_apx_evex_encoding
> > (void)  {
> >    return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> > +    || i.rex2_encoding
> >      || (i.vex.register_specifier
> >  	&& i.vex.register_specifier->reg_flags & RegRex2);  }
> 
> See my comment on an earlier patch regarding the use of i.rex2 here. But I
> doubt this is correct in the first place: Why would {rex2} cause EVEX encoding
> to be picked?
> 

Deleted.

> > @@ -7481,26 +7484,33 @@ match_template (char mnem_suffix)
> >  	  overlap1 = operand_type_and (operand_types[0],
> operand_types[1]);
> >  	  if (t->opcode_modifier.d && i.reg_operands == i.operands
> >  	      && !operand_type_all_zero (&overlap1))
> > -	    switch (i.dir_encoding)
> > -	      {
> > -	      case dir_encoding_load:
> > -		if (operand_type_check (operand_types[i.operands - 1],
> anymem)
> > -		    || t->opcode_modifier.regmem)
> > -		  goto check_reverse;
> > -		break;
> > +	    {
> >
> > -	      case dir_encoding_store:
> > -		if (!operand_type_check (operand_types[i.operands - 1],
> anymem)
> > -		    && !t->opcode_modifier.regmem)
> > -		  goto check_reverse;
> > -		break;
> > +	      int MemOperand = i.operands - 1 -
> > +		(t->opcode_space == SPACE_EVEXMAP4
> > +		 && t->opcode_modifier.vexvvvv);
> 
> Nit: I don't think local variables should start with a capital letter. I wonder
> anyway - can't you just re-use e.g. j here? That would then also avoid the
> misleading name: Right here you don't know yet whether that's the
> "memory" operand. Finally, if you set the variable ahead of the enclosing if(),
> you could (I think) avoid all this re-indentation, making the change quite a bit
> easier to review (i.e. to see what really changes).
> 

Good idea. Changed.

> > @@ -7530,11 +7540,13 @@ match_template (char mnem_suffix)
> >  		continue;
> >  	      /* Try reversing direction of operands.  */
> >  	      j = is_cpu (t, CpuFMA4)
> > -		  || is_cpu (t, CpuXOP) ? 1 : i.operands - 1;
> > +		  || is_cpu (t, CpuXOP)
> > +		  || is_cpu (t, CpuAPX_F) ? 1 : i.operands - 1;
> >  	      overlap0 = operand_type_and (i.types[0], operand_types[j]);
> >  	      overlap1 = operand_type_and (i.types[j], operand_types[0]);
> >  	      overlap2 = operand_type_and (i.types[1], operand_types[1]);
> > -	      gas_assert (t->operands != 3 || !check_register);
> > +	      gas_assert (t->operands != 3 || !check_register
> > +			  || is_cpu (t,CpuAPX_F));
> 
> Nit: Missing blank again. And again in the next hunk. I won't comment on
> such any further, expecting you to go through globally.
>
 
Done.

> > @@ -8588,11 +8609,10 @@ process_operands (void)
> >    const reg_entry *default_seg = NULL;
> >
> >    /* We only need to check those implicit registers for instructions
> > -     with 3 operands or less.  */
> > -  if (i.operands <= 3)
> > -    for (unsigned int j = 0; j < i.operands; j++)
> > -      if (i.types[j].bitfield.instance != InstanceNone)
> > -	i.reg_operands--;
> > +     with 4 operands or less.  */
> > +  for (unsigned int j = 0; j < i.operands; j++)
> > +    if (i.types[j].bitfield.instance != InstanceNone)
> > +      i.reg_operands--;
> 
> While you made the requested code adjustment, adjusting the comment
> renders it stale now. It needs dropping instead, as it did only explain the if(),
> not the for().
> 

Done.

> > @@ -8946,26 +8966,35 @@ build_modrm_byte (void)
> >  				     || i.vec_encoding == vex_encoding_evex));
> >      }
> >
> > -  for (v = source + 1; v < dest; ++v)
> > -    if (v != reg_slot)
> > -      break;
> > -  if (v >= dest)
> > -    v = ~0;
> > -  if (i.tm.extension_opcode != None)
> > +  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
> >      {
> > -      if (dest != source)
> > -	v = dest;
> > -      dest = ~0;
> > +      v = dest;
> > +      dest-- ;
> >      }
> > -  gas_assert (source < dest);
> > -  if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
> > -      && source != op)
> > +  else if (i.tm.opcode_modifier.vexvvvv == VexVVVV_SRC)
> >      {
> > -      unsigned int tmp = source;
> > +      v = source + 1;
> > +      for (v = source + 1; v < dest; ++v)
> > +	if (v != reg_slot)
> > +	  break;
> > +      if (i.tm.extension_opcode != None)
> > +	{
> > +	  if (dest != source)
> > +	    v = dest;
> > +	  dest = ~0;
> > +	}
> > +      gas_assert (source < dest);
> > +      if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
> > +	  && source != op)
> > +	{
> > +	  unsigned int tmp = source;
> >
> > -      source = v;
> > -      v = tmp;
> > +	  source = v;
> > +	  v = tmp;
> > +	}
> >      }
> > +  else
> > +    v = ~0;
> >
> >    if (v < MAX_OPERANDS)
> >      {
> 
> I'm having trouble following this change. The VexVVVV-is-source case
> shouldn't change at all, I'd expect. This would ideally be easily visible from
> the change done. Yet if looking a little more closely I can e.g. spot a stray "v =
> source + 1;" which wasn't there before. And there also look to be things being
> dropped. Such a change (again) wants doing such that it is easy to see what
> changes. If need be by making a mechanical prereq change doing just re-
> indentation, but nothing else.
> It feels though as if there is too much changing here anyway: The difference
> for VexVVVV-is-dest is that you need to consume the destination early. If you
> do so, then the rest should be possible to keep as is: You'll subsequently deal
> with just the normal ModR/M, as if without a VexVVVV-encoded operand.
> 

Changed: the following code is now added ahead of the old code.

  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
    {
      v = dest;
      dest-- ;
    }
  else
    {
...
    }
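
To illustrate the idea of consuming the destination early, here is a minimal, self-contained sketch. It is not the code from tc-i386.c: the enum, the struct and pick_vvvv_dst are hypothetical stand-ins, and immediates as well as the reg_slot handling are ignored.

```c
/* Hypothetical stand-ins for the VexVVVV attribute values.  */
enum vvvv_kind { VVVV_NONE = 0, VVVV_SRC = 1, VVVV_DST = 2 };

struct operand_slots
{
  unsigned int source; /* first operand still to be encoded */
  unsigned int dest;   /* last operand still to be encoded */
  unsigned int v;      /* operand encoded in vvvv, ~0u if none yet */
};

/* If the destination lives in EVEX.vvvv, take it out of the operand
   range first; whatever remains can then be handled like an insn
   without a vvvv-encoded operand.  */
static struct operand_slots
pick_vvvv_dst (unsigned int operands, enum vvvv_kind kind)
{
  struct operand_slots s = { 0, operands - 1, ~0u };

  if (kind == VVVV_DST)
    {
      s.v = s.dest;
      s.dest--;
    }
  return s;
}
```

For a three-operand NDD insn this gives v == 2 and dest == 1, so the remaining two operands go through the usual ModR/M path.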

> > --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > @@ -10,7 +10,7 @@ _start:
> >          .byte 0x62, 0xfc, 0x7f, 0x08, 0x60, 0xc2
> >          .byte 0xff, 0xff
> >          #VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set
> EVEX.P[10] == 0
> > -	#(illegal value).
> > +        #(illegal value).
> >          .byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
> >          .byte 0xff
> >          #EVEX_MAP4 movbe %r18w,%ax set EVEX.mm == b01 (illegal value).
> 
> ???
>
> > @@ -27,3 +27,6 @@ _start:
> >          .byte 0xff
> >          #EVEX from VEX enqcmd 0x123(%r31,%rax,4),%r31 EVEX.P[23:22] == 1
> (illegal value).
> >          .byte 0x62, 0x4c, 0x7f, 0x28, 0xf8, 0xbc, 0x87, 0x23, 0x01,
> > 0x00, 0x00
> > +        .byte 0xff
> 
> This then similarly looks to belong into the earlier patch, suggesting that it
> had #pass too early.
> 
Changed.

> > +        #{evex} inc %rax EVEX.vvvv' > 0 (illegal value).
> > +        .byte 0x62, 0xf4, 0xec, 0x08, 0xff, 0xc0
> 
> Again I suspect this can be expressed via .insn, thus ending up more clear.
> I can only stress again that one of the reasons of introducing .insn was to
> reduce the number of such entirely unreadable byte sequences.

Done.

> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> > @@ -0,0 +1,154 @@
> > +# Check 64bit APX NDD instructions with evex prefix encoding
> > +
> > +	.allow_index_reg
> > +	.text
> > +_start:
> > +inc    %rax,%rbx
> 
> Please can instructions be indented by a tab?
> 

Done.

> > +inc    %r31,%r8
> > +inc    %r31,%r16
> > +add    %r31b,%r8b,%r16b
> > +addb    %r31b,%r8b,%r16b
> > +add    %r31,%r8,%r16
> > +addq    %r31,%r8,%r16
> > +add    %r31d,%r8d,%r16d
> > +addl    %r31d,%r8d,%r16d
> > +add    %r31w,%r8w,%r16w
> > +addw    %r31w,%r8w,%r16w
> > +{store} add    %r31,%r8,%r16
> > +{load}  add    %r31,%r8,%r16
> > +add    %r31,(%r8),%r16
> > +add    (%r31),%r8,%r16
> > +add    0x9090(%r31,%r16,1),%r8,%r16
> > +add    %r31,(%r8,%r16,8),%r16
> > +add    $0x34,%r13b,%r17b
> > +addl   $0x11,(%r19,%rax,4),%r20d
> > +add    $0x1234,%ax,%r30w
> > +add    $0x12344433,%r15,%r16
> > +addq   $0x12344433,(%r15,%rcx,4),%r16
> > +add    $0xfffffffff4332211,%rax,%r8
> > +dec    %rax,%r17
> > +decb   (%r31,%r12,1),%r8b
> > +not    %rax,%r17
> > +notb   (%r31,%r12,1),%r8b
> > +neg    %rax,%r17
> > +negb   (%r31,%r12,1),%r8b
> > +sub    %r15b,%r17b,%r18b
> > +sub    %r15d,(%r8),%r18d
> > +sub    (%r15,%rax,1),%r16b,%r8b
> > +sub    (%r15,%rax,1),%r16w,%r8w
> > +subl   $0x11,(%r19,%rax,4),%r20d
> > +sub    $0x1234,%ax,%r30w
> > +sbb    %r15b,%r17b,%r18b
> > +sbb    %r15d,(%r8),%r18d
> > +sbb    (%r15,%rax,1),%r16b,%r8b
> > +sbb    (%r15,%rax,1),%r16w,%r8w
> > +sbbl   $0x11,(%r19,%rax,4),%r20d
> > +sbb    $0x1234,%ax,%r30w
> > +adc    %r15b,%r17b,%r18b
> > +adc    %r15d,(%r8),%r18d
> > +adc    (%r15,%rax,1),%r16b,%r8b
> > +adc    (%r15,%rax,1),%r16w,%r8w
> > +adcl   $0x11,(%r19,%rax,4),%r20d
> > +adc    $0x1234,%ax,%r30w
> > +or     %r15b,%r17b,%r18b
> > +or     %r15d,(%r8),%r18d
> > +or     (%r15,%rax,1),%r16b,%r8b
> > +or     (%r15,%rax,1),%r16w,%r8w
> > +orl    $0x11,(%r19,%rax,4),%r20d
> > +or     $0x1234,%ax,%r30w
> > +xor    %r15b,%r17b,%r18b
> > +xor    %r15d,(%r8),%r18d
> > +xor    (%r15,%rax,1),%r16b,%r8b
> > +xor    (%r15,%rax,1),%r16w,%r8w
> > +xorl   $0x11,(%r19,%rax,4),%r20d
> > +xor    $0x1234,%ax,%r30w
> > +and    %r15b,%r17b,%r18b
> > +and    %r15d,(%r8),%r18d
> > +and    (%r15,%rax,1),%r16b,%r8b
> > +and    (%r15,%rax,1),%r16w,%r8w
> > +andl   $0x11,(%r19,%rax,4),%r20d
> > +and    $0x1234,%ax,%r30w
> > +rorb   (%rax),%r31b
> 
> While there's a doc problem here as well, the question here and there is the
> same: Which form of ROR does this represent, when the shift count isn't
> specified explicitly? It could be %cl or $1. I would strongly advise against
> introducing further ambiguous instruction patterns like this, and instead
> demand that both %cl and $1 be always named explicitly as operands.
> 

Done.  GCC also changed.

> > +ror    $0x2,%r12b,%r31b
> > +rorl   $0x2,(%rax),%r31d
> > +rorw   (%rax),%r31w
> > +ror    %cl,%r16b,%r8b
> > +rorw   %cl,(%r19,%rax,4),%r31w
> > +rolb   (%rax),%r31b
> > +rol    $0x2,%r12b,%r31b
> > +roll   $0x2,(%rax),%r31d
> > +rolw   (%rax),%r31w
> > +rol    %cl,%r16b,%r8b
> > +rolw   %cl,(%r19,%rax,4),%r31w
> > +rcrb   (%rax),%r31b
> > +rcr    $0x2,%r12b,%r31b
> > +rcrl   $0x2,(%rax),%r31d
> > +rcrw   (%rax),%r31w
> > +rcr    %cl,%r16b,%r8b
> > +rcrw   %cl,(%r19,%rax,4),%r31w
> > +rclb   (%rax),%r31b
> > +rcl    $0x2,%r12b,%r31b
> > +rcll   $0x2,(%rax),%r31d
> > +rclw   (%rax),%r31w
> > +rcl    %cl,%r16b,%r8b
> > +rclw   %cl,(%r19,%rax,4),%r31w
> > +shlb   (%rax),%r31b
> > +shl    $0x2,%r12b,%r31b
> > +shll   $0x2,(%rax),%r31d
> > +shlw   (%rax),%r31w
> > +shl    %cl,%r16b,%r8b
> > +shlw   %cl,(%r19,%rax,4),%r31w
> > +sarb   (%rax),%r31b
> > +sar    $0x2,%r12b,%r31b
> > +sarl   $0x2,(%rax),%r31d
> > +sarw   (%rax),%r31w
> > +sar    %cl,%r16b,%r8b
> > +sarw   %cl,(%r19,%rax,4),%r31w
> > +shlb   (%rax),%r31b
> > +shl    $0x2,%r12b,%r31b
> > +shll   $0x2,(%rax),%r31d
> > +shlw   (%rax),%r31w
> > +shl    %cl,%r16b,%r8b
> > +shlw   %cl,(%r19,%rax,4),%r31w
> > +shrb   (%rax),%r31b
> > +shr    $0x2,%r12b,%r31b
> > +shrl   $0x2,(%rax),%r31d
> > +shrw   (%rax),%r31w
> > +shr    %cl,%r16b,%r8b
> > +shrw   %cl,(%r19,%rax,4),%r31w
> > +shld   $0x1,%r12,(%rax),%r31
> > +shld   $0x2,%r8w,%r12w,%r31w
> > +shld   $0x2,%r15d,(%rax),%r31d
> > +shld   %cl,%r9w,(%rax),%r31w
> > +shld   %cl,%r12,%r16,%r8
> > +shld   %cl,%r13w,(%r19,%rax,4),%r31w
> > +shrd   $0x1,%r12,(%rax),%r31
> > +shrd   $0x2,%r8w,%r12w,%r31w
> > +shrd   $0x2,%r15d,(%rax),%r31d
> > +shrd   %cl,%r9w,(%rax),%r31w
> > +shrd   %cl,%r12,%r16,%r8
> > +shrd   %cl,%r13w,(%r19,%rax,4),%r31w
> > +adcx   %r15d,%r8d,%r18d
> > +adcx   (%r15,%r31,1),%r8d,%r18d
> > +adcx   (%r15,%r31,1),%r8
> > +adox   %r15d,%r8d,%r18d
> > +adox   (%r15,%r31,1),%r8d,%r18d
> > +adox   (%r15,%r31,1),%r8
> > +cmovo  0x90909090(%eax),%edx,%r8d
> > +cmovno 0x90909090(%eax),%edx,%r8d
> > +cmovb  0x90909090(%eax),%edx,%r8d
> > +cmovae 0x90909090(%eax),%edx,%r8d
> > +cmove  0x90909090(%eax),%edx,%r8d
> > +cmovne 0x90909090(%eax),%edx,%r8d
> > +cmovbe 0x90909090(%eax),%edx,%r8d
> > +cmova  0x90909090(%eax),%edx,%r8d
> > +cmovs  0x90909090(%eax),%edx,%r8d
> > +cmovns 0x90909090(%eax),%edx,%r8d
> > +cmovp  0x90909090(%eax),%edx,%r8d
> > +cmovnp 0x90909090(%eax),%edx,%r8d
> > +cmovl  0x90909090(%eax),%edx,%r8d
> > +cmovge 0x90909090(%eax),%edx,%r8d
> > +cmovle 0x90909090(%eax),%edx,%r8d
> > +cmovg  0x90909090(%eax),%edx,%r8d
> > +imul   0x90909(%eax),%edx,%r8d
> > +imul   0x909(%rax,%r31,8),%rdx,%r25
> 
> Overall there's also again the sorting criteria question: Without any sorting,
> how is one to (easily) check full coverage?
> 

> > --- a/opcodes/i386-gen.c
> > +++ b/opcodes/i386-gen.c
> > @@ -473,6 +473,7 @@ static bitfield opcode_modifiers[] =
> >    BITFIELD (IntelSyntax),
> >    BITFIELD (ISA64),
> >    BITFIELD (NoEgpr),
> > +  BITFIELD (NF),
> >  };
> 
> This wants introducing earlier, when the BMI / BMI2 templates are touched
> anyway. As said before, it would be very nice if within such a series the same
> places wouldn't needlessly be touched more than once. You don't implement
> parsing of NF here anyway, so how much in advance the attribute is added to
> the opcode table is really irrelevant, and should hence be done as
> conveniently as possible.
> 

Done.

> > --- a/opcodes/i386-opc.h
> > +++ b/opcodes/i386-opc.h
> > @@ -636,7 +636,10 @@ enum
> >    /* How to encode VEX.vvvv:
> >       0: VEX.vvvv must be 1111b.
> >       1: VEX.vvvv encodes one of the register operands.
> > +     2: VEX.vvvv encodes as the dest register operands.
> >     */
> > +#define VexVVVV_SRC   1
> > +#define VexVVVV_DST   2
> >    VexVVVV,
> 
> Nit: Singular in the new comment line, please (plus perhaps drop "as").
> And "source" wants adding to the earlier comment line.
> 

Done.

> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -138,6 +138,9 @@
> >  #define Vsz512 Vsz=VSZ512
> >
> >  #define APX_F APX_F|x64
> > +#define VexVVVVSrc  VexVVVV=VexVVVV_SRC #define VexVVVVDest
> > +VexVVVV=VexVVVV_DST
> 
> I don't think we need the former. Continuing to use just VexVVVV there is going
> to be quite fine, and easier to read. Plus, I'm sorry to say it this bluntly, it is
> entirely inappropriate to bloat an already large patch by mechanically
> replacing VexVVVV by this new alias. If such a change was really wanted, it
> would need separating out as an entirely mechanical one.
> 

Removed VexVVVV_SRC.

> For the latter, to help readability, how about DstVVVV?
> 

Done.

> > +
> >
> >  // The EVEX purpose of StaticRounding appears only together with SAE.
> > Re-use  // the bit to mark commutative VEX encodings where swapping
> > the source
> 
> Please don't add stray (especially double) blank lines. Instead a blank line
> would be wanted _ahead_ of the addition.
> 

Done.

> > @@ -190,6 +193,8 @@ mov, 0xf21, i386|No64,
> > D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { De  mov,
> 0xf21,
> > x64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoRex64, { Debug,
> Reg64
> > }  mov, 0xf24, i386|No64,
> > D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Test,
> Reg32 }
> >
> > +// Move after swapping the bytes
> > +movbe, 0x0f38f0, Movbe,
> D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, {
> > +Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> >  // Move after swapping the bytes
> >  movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf,
> {
> > Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> movbe,
> > 0x60, Movbe|APX_F,
> > D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, {
> > Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> 
> What is this addition about? Please can you look at your own patches before
> sending them out?
> 

Done.

> > @@ -290,22 +295,36 @@ add, 0x0, 0,
> > D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock,
> { Reg8|Reg16|Reg3
> > add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> > Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  add,
> 0x4,
> > 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> Acc|Byte|Word|Dword|Qword }
> > add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> > Imm8|Imm16|Imm32|Imm32S,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x }
> > +add, 0x0, APX_F,
> >
> +D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMa
> p4|NF, {
> > +Reg8|Reg16|Reg32|Reg64,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> ex,
> > +Reg8|Reg16|Reg32|Reg64 } add, 0x83/0, APX_F,
> >
> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVex
> Map4|N
> > +F, { Imm8S,
> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> > +Reg16|Reg32|Reg64 } add, 0x80/0, APX_F,
> >
> +W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4
> |NF, {
> > +Imm8|Imm16|Imm32|Imm32S,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> ex,
> > +Reg8|Reg16|Reg32|Reg64}
> 
> Earlier review comments were not addressed. I'll stop here, as far as this file is
> concerned, expecting a new version to be sent addressing earlier comments
> and having been sanity checked and having the bogus VexVVVV renaming
> dropped.
> 

 Ok.

Thanks,
Lili.



* RE: [PATCH 5/8] Support APX NDD
  2023-11-09  9:37   ` [PATCH 5/8] Support APX NDD Jan Beulich
@ 2023-11-20  1:33     ` Cui, Lili
  2023-11-20  8:19       ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-20  1:33 UTC (permalink / raw)
  To: Beulich, Jan, Kong, Lingling; +Cc: Lu, Hongjiu, ccoutant, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, November 9, 2023 5:37 PM
> To: Cui, Lili <lili.cui@intel.com>; Kong, Lingling <lingling.kong@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 5/8] Support APX NDD
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -190,6 +193,8 @@ mov, 0xf21, i386|No64,
> > D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { De  mov,
> 0xf21,
> > x64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoRex64, { Debug,
> Reg64
> > }  mov, 0xf24, i386|No64,
> > D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Test,
> Reg32 }
> >
> > +// Move after swapping the bytes
> > +movbe, 0x0f38f0, Movbe,
> D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, {
> > +Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> >  // Move after swapping the bytes
> >  movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf,
> {
> > Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> movbe,
> > 0x60, Movbe|APX_F,
> > D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, {
> > Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 } @@
> > -290,22 +295,36 @@ add, 0x0, 0,
> > D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock,
> { Reg8|Reg16|Reg3
> > add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> > Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  add,
> 0x4,
> > 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> Acc|Byte|Word|Dword|Qword }
> > add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> > Imm8|Imm16|Imm32|Imm32S,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x }
> > +add, 0x0, APX_F,
> >
> +D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMa
> p4|NF, {
> > +Reg8|Reg16|Reg32|Reg64,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> ex,
> > +Reg8|Reg16|Reg32|Reg64 } add, 0x83/0, APX_F,
> >
> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVex
> Map4|N
> > +F, { Imm8S,
> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> > +Reg16|Reg32|Reg64 } add, 0x80/0, APX_F,
> >
> +W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4
> |NF, {
> > +Imm8|Imm16|Imm32|Imm32S,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> ex,
> > +Reg8|Reg16|Reg32|Reg64}
> 
> Why are there (just as an example) only 3 new forms of ADD, but ...
> 
> > @@ -338,10 +366,19 @@ adc, 0x10, 0,
> > D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock,
> { Reg8|Reg16|Reg
> > adc, 0x83/2, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> > Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  adc,
> 0x14,
> > 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> Acc|Byte|Word|Dword|Qword }
> > adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> > Imm8|Imm16|Imm32|Imm32S,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x }
> > +adc, 0x10, APX_F,
> > +D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, {
> > +Reg8|Reg16|Reg32|Reg64,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> ex }
> > +adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf,
> { Imm8S,
> > +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex } adc,
> > +0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, {
> > +Imm8|Imm16|Imm32|Imm32S,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> ex }
> > +adc, 0x10, APX_F,
> >
> +D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMa
> p4, {
> > +Reg8|Reg16|Reg32|Reg64,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> ex,
> > +Reg8|Reg16|Reg32|Reg64 } adc, 0x83/2, APX_F,
> >
> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVex
> Map4,
> > +{ Imm8S,
> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> > +Reg16|Reg32|Reg64 } adc, 0x80/2, APX_F,
> >
> +W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4,
> {
> > +Imm8|Imm16|Imm32|Imm32S,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> ex,
> > +Reg8|Reg16|Reg32|Reg64 }
> 
> .... 6 new forms of ADC? My guess is that's NF-related, but doesn't that mean
> that until NF support is added e.g. "{evex} add %eax, %eax" won't assemble
> as intended? IOW in line with NF as an attribute being added right here, those
> other templates also will want introducing right here.
> 
> While looking at patch 7 I'm also wondering whether same-base-opcode
> forms wouldn't better be kept together. I haven't finished yet looking at
> what's needed there, but even if it doesn't turn out strictly necessary there it
> may still be a good idea anyway (unless of course that would get in the way of
> anything).
> 

For ADC, there are 3 templates that support neither NDD nor NF. I moved them here from patch 3/8, which we discussed before, because decoding for them is added together with the NDD patch.

Thanks,
Lili.


* RE: [PATCH][v3] Support APX NDD optimized encoding.
  2023-11-17  9:47                   ` Jan Beulich
@ 2023-11-20  3:28                     ` Hu, Lin1
  2023-11-20  8:34                       ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-20  3:28 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 17, 2023 5:48 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH][v3] Support APX NDD optimized encoding.
> 
> On 17.11.2023 08:24, Hu, Lin1 wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Wednesday, November 15, 2023 5:35 PM
> >>
> >> On 15.11.2023 03:59, Hu, Lin1 wrote:
> >>> --- a/gas/config/tc-i386.c
> >>> +++ b/gas/config/tc-i386.c
> >>> @@ -7208,6 +7208,43 @@ check_EgprOperands (const insn_template *t)
> >>>    return 0;
> >>>  }
> >>>
> >>> +/* Optimize APX NDD insns to legacy insns.  */ static bool
> >>> +convert_NDD_to_REX2 (const insn_template *t) {
> >>> +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> >>> +      && t->opcode_space == SPACE_EVEXMAP4
> >>> +      && !i.has_nf
> >>> +      && i.reg_operands >= 2)
> >>> +    {
> >>> +      unsigned int readonly_var = ~0;
> >>> +      unsigned int dest = i.operands - 1;
> >>> +      unsigned int src1 = i.operands - 2;
> >>> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> >>> +
> >>> +      if (i.types[src1].bitfield.class == Reg
> >>> +	  && i.op[src1].regs == i.op[dest].regs)
> >>> +	readonly_var = src2;
> >>> +      /* adcx, adox and imul can't support to swap the source operands.
> */
> >>> +      else if (i.types[src2].bitfield.class == Reg
> >>> +	       && i.op[src2].regs == i.op[dest].regs
> >>> +	       && optimize > 1
> >>> +	       && t->opcode_modifier.commutative)
> >>
> >> Comment and code still aren't in line: "support to swap the source
> >> operands"
> >> really is the D attribute in the opcode table, whereas
> >> t->opcode_modifier.commutative is related to the C attribute (and all
> >> t->three
> >> insns named really are commutative). It looks to me that the code is
> >> correct, so it would then be the comment that may need updating. But
> >> it may also be better to additionally check .d here (making the code
> >> robust against C being added to the truly commutative yet not eligible to
> be optimized insns).
> >> In which case the comment might say "adcx, adox, and imul, while
> >> commutative, don't support to swap the source operands".
> >>
> >
> > I think we don't need to worry about it for now, because we've constrained
> the function with VexVVVVDest, and these instructions must be NDD
> instructions. And adcx, adox and imul don't have the D attribute.
> 
> Right, and I thought to leverage this. IOW ...
> 
> > If I add check .d here, I will need to exclude them.
> 
> ... I don't think I understand this.
>

What I mean is that this place checks whether we can optimize something like "adcx %eax, %ebx, %eax -> adcx %ebx, %eax". Adcx doesn't have the D attribute, so if we want to add a t->opcode_modifier.d constraint, the condition should be && (t->opcode_modifier.d || t->mnem_off == MN_adcx) && t->opcode_modifier.commutative. If we don't make an exception for adcx, it will be rejected by the t->opcode_modifier.d check.
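
A small sketch of the two variants of the condition, only to make the difference concrete. The struct and function names are hypothetical, and it assumes (as the paragraph above does) that the adcx template carries the commutative attribute but not D.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a template's attributes.  */
struct tmpl_flags
{
  bool d;            /* D attribute in the opcode table */
  bool commutative;  /* C attribute in the opcode table */
  bool is_adcx;      /* stands in for t->mnem_off == MN_adcx */
};

/* Plain ".d && .commutative": rejects adcx, which has no D.  */
static bool
swap_ok_strict (const struct tmpl_flags *t)
{
  return t->d && t->commutative;
}

/* With the explicit exception described above.  */
static bool
swap_ok_with_exception (const struct tmpl_flags *t)
{
  return (t->d || t->is_adcx) && t->commutative;
}
```

With d == false and commutative == true for adcx, only the second predicate lets the swap through.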
 
>
> > Based on our previous discussion, I modified tc-i386.c as follows
> >
> > +/* Check if the instruction use the REX registers.  */ static bool
> > +check_RexOperands (const insn_template *t)
> 
> I don't think I can spot a use of the parameter in the function.
>

I merely mimicked check_EgprOperands and didn't pay attention to your comments about it. I will remove the parameter.

> 
> > +{
> > +  for (unsigned int op = 0; op < i.operands; op++)
> > +    {
> > +      if (i.types[op].bitfield.class != Reg
> > +         /* Special case for (%dx) while doing input/output op */
> > +         || i.input_output_operand)
> 
> Once again: Is this needed? Respective insns shouldn't even make it here.
> Plus if they did, ...
>

I modified the constraint to be

	if (i.types[op].bitfield.class != Reg)
	  continue;
 
>
> > +       continue;
> > +
> > +      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
> > +       return true;
> 
> ... the loop would continue for (%dx) kind operands anyway.
>
>
> > +    }
> > +
> > +  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
> > +      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
> > +    return true;
> > +
> > +  /* Check pseudo prefix {rex} are valid.  */  if (i.rex_encoding)
> > +    return true;
> > +  return false;
> 
> Just "return i.rex_encoding;"?
>

Indeed, that's simpler.
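
Putting the review comments together (no unused parameter, no input/output special case, direct return of the flag), the function might look roughly like the sketch below. It is self-contained for illustration only: struct insn, struct reg and the REG_* flag values are hypothetical stand-ins for gas's i, reg_entry and RegRex/RegRex64.

```c
#include <stdbool.h>
#include <stddef.h>

#define REG_REX    0x1   /* stand-in for RegRex */
#define REG_REX64  0x2   /* stand-in for RegRex64 */

enum op_kind { KIND_OTHER, KIND_REG };

struct reg { unsigned int reg_flags; };

struct insn
{
  unsigned int operands;
  enum op_kind kind[5];
  const struct reg *regs[5];
  const struct reg *base_reg;
  const struct reg *index_reg;
  bool rex_encoding;      /* {rex} pseudo prefix seen */
};

static bool
needs_rex (const struct reg *r)
{
  return r != NULL && (r->reg_flags & (REG_REX | REG_REX64)) != 0;
}

/* Check whether the instruction uses REX registers or {rex}.  */
static bool
check_rex_operands (const struct insn *insn)
{
  for (unsigned int op = 0; op < insn->operands; op++)
    {
      if (insn->kind[op] != KIND_REG)
	continue;
      if (needs_rex (insn->regs[op]))
	return true;
    }

  if (needs_rex (insn->index_reg) || needs_rex (insn->base_reg))
    return true;

  return insn->rex_encoding;
}
```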
 
>
> > +}
> > +
> > +/* Optimize APX NDD insns to legacy insns.  */ static unsigned int
> > +convert_NDD_to_legacy (const insn_template *t) {
> > +  unsigned int readonly_var = ~0;
> 
> One issue I continue to have is the name of this variable. Good names help
> understanding what code is doing. And in 3-operand NDD insns there are
> uniformly 2 operands which are only read.
>

Maybe I should rename the variable to readonly_reg_pos.

> 
> > +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> > +      && t->opcode_space == SPACE_EVEXMAP4
> > +      && !i.has_nf
> > +      && i.reg_operands >= 2)
> > +    {
> > +      unsigned int dest = i.operands - 1;
> > +      unsigned int src1 = i.operands - 2;
> > +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> > +
> > +      if (i.types[src1].bitfield.class == Reg
> > +         && i.op[src1].regs == i.op[dest].regs)
> > +       readonly_var = src2;
> > +      /* adcx, adox, and imul, while commutative, don't support to swap
> > +        the source operands.  */
> > +      else if (i.types[src2].bitfield.class == Reg
> > +              && i.op[src2].regs == i.op[dest].regs
> > +              && optimize > 1
> > +              && t->opcode_modifier.commutative)
> > +       readonly_var = src1;
> > +    }
> > +  return readonly_var;
> > +}
> 
> You're no longer converting anything in this function, which - I'm sorry to say
> that - once again makes its name unsuitable.
>

True, but I think the function's name can be changed later; removing some operations on i saves me from unnecessary backtracking.

Maybe the function can be named can_convert_NDD_to_legacy. Returning 0 would mean it can't, anything else would mean it can.
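
For reference, the decision the function makes can be sketched in isolation as below. This follows the quoted hunk, including its ~0 "no candidate" value (position 0 is a valid result there). The struct and the names are hypothetical, and registers are modelled as plain numbers rather than reg_entry pointers.

```c
#include <stdbool.h>

#define NO_POS (~0u)   /* no operand can be dropped */

/* Hypothetical, reduced view of a parsed NDD insn (AT&T operand order,
   destination last).  */
struct ndd_insn
{
  unsigned int operands;
  bool is_reg[4];     /* operand is a general register */
  int regno[4];       /* register number, valid if is_reg */
  bool commutative;   /* template has the C attribute */
  int optimize;       /* -O level given to the assembler */
};

/* Return the position of the source that stays read-only once the
   other source is merged with the destination, or NO_POS.  */
static unsigned int
readonly_reg_pos (const struct ndd_insn *x)
{
  unsigned int dest = x->operands - 1;
  unsigned int src1 = x->operands - 2;
  unsigned int src2 = x->operands > 3 ? x->operands - 3 : 0;

  if (x->is_reg[src1] && x->regno[src1] == x->regno[dest])
    return src2;
  /* Swapping the sources is only done for commutative insns at -O2.  */
  if (x->is_reg[src2] && x->regno[src2] == x->regno[dest]
      && x->optimize > 1 && x->commutative)
    return src1;
  return NO_POS;
}
```

For "add %r16, %r8, %r8" this gives position 0; for "add %r8, %r16, %r8" it gives position 1, but only with optimize > 1.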

> 
> > @@ -7728,6 +7782,55 @@ match_template (char mnem_suffix)
> >           i.memshift = memshift;
> >         }
> >
> > +      /* If we can optimize a NDD insn to legacy insn, like
> > +        add %r16, %r8, %r8 -> add %r16, %r8,
> > +        add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> > +        Note that the semantics have not been changed.  */
> > +      if (optimize
> > +         && !i.no_optimize
> > +         && i.vec_encoding != vex_encoding_evex
> > +         && t + 1 < current_templates->end
> > +         && !t[1].opcode_modifier.evex
> > +         && t[1].opcode_space <= SPACE_0F38)
> 
> In all of these checks what I'm missing is a check that we're actually dealing
> with an NDD template.
>

Sure, I can add it here and then remove it from convert_NDD_to_legacy.
 
>
> > +       {
> > +         unsigned int readonly_var = convert_NDD_to_legacy (t);
> > +         size_match = true;
> > +
> > +         if (readonly_var != (unsigned int) ~0)
> > +           {
> > +             for (j = 0; j < i.operands - 2; j++)
> > +               {
> > +                 check_register = j;
> > +                 if (t->opcode_modifier.d)
> > +                   check_register ^= 1;
> > +                 overlap0 = operand_type_and (i.types[check_register],
> > +                                              t[1].operand_types[check_register]);
> > +                 if (!operand_type_match (overlap0, i.types[check_register]))
> > +                   size_match = false;
> > +               }
> 
> I'm afraid that without a comment I don't understand what this is about.
> 

I want to make sure that the two neighboring templates take the same input. I tried base_code, but it misses some special cases (like shld Imm8 and shld shiftCount), so I want to start with the first operand (in AT&T order). But some insns have .d, so I also need to check whether the first operand can match the second operand type. It seems NDD insns can simply be matched that way now.

I had some problems with this version, so I've modified it. Deleting this part of the code wouldn't have any effect right now, because the NDD templates are sorted by us; I think this part only matters if someone sorts the .tbl incorrectly.

The current version is

+         if (readonly_var != (unsigned int) ~0)
+           {
+             overlap0 = operand_type_and (i.types[0],
+                                          t[1].operand_types[0]);
+             if (t->opcode_modifier.d)
+               overlap1 = operand_type_and (i.types[0],
+                                            t[1].operand_types[1]);
+             if (!operand_type_match (overlap0, i.types[0])
+                 && (!t->opcode_modifier.d
+                     || (t->opcode_modifier.d
+                         && !operand_type_match (overlap1, i.types[0]))))
+               size_match = false;
+
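For illustration only, the check described above can be sketched with plain bit masks. Everything here is a simplified assumption: the type bits, `struct tmpl_sketch`, and the helpers `type_and`, `type_match` and `first_operand_fits` are hypothetical stand-ins and not the real gas `i386_operand_type`, `operand_type_and ()` or `operand_type_match ()`.

```c
#include <stdbool.h>

/* Hypothetical operand-type bits; gas's i386_operand_type is far richer.  */
enum
{
  T_REG = 1 << 0,
  T_MEM = 1 << 1,
  T_IMM8 = 1 << 2,
  T_SHIFTCOUNT = 1 << 3
};

/* Hypothetical, reduced template: just the D attribute and two operand
   type masks.  */
struct tmpl_sketch
{
  bool d;
  unsigned int operand_types[2];
};

/* Simplified stand-in for operand_type_and ().  */
static unsigned int
type_and (unsigned int a, unsigned int b)
{
  return a & b;
}

/* Simplified stand-in for operand_type_match (): any overlap with the
   given operand counts as a match.  */
static bool
type_match (unsigned int overlap, unsigned int given)
{
  return (overlap & given) != 0;
}

/* Return true if the first given operand is acceptable to the template
   following the NDD one, either in its first operand slot or, when the
   NDD template has D, in its second slot.  */
static bool
first_operand_fits (unsigned int given, const struct tmpl_sketch *ndd,
		    const struct tmpl_sketch *next)
{
  if (type_match (type_and (given, next->operand_types[0]), given))
    return true;
  return ndd->d
	 && type_match (type_and (given, next->operand_types[1]), given);
}
```

With these stand-ins, an Imm8 first operand is rejected against a neighboring template whose first operand is ShiftCount, which is the shld case mentioned above.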

>
> > +             if (size_match
> > +                 && (t[1].opcode_space <= SPACE_0F
> > +                     || (!check_EgprOperands (t + 1)	 // These conditions are
> exclude adcx/adox with inappropriate registers.
> > +                         && !check_RexOperands (t + 1)
> > +                         && !i.op[i.operands -
> > + 1].regs->reg_type.bitfield.qword)))
> 
> Saying "inappropriate" in such a comment doesn't really help, as it's then
> still unclear what is "appropriate". But the comment will need re-formatting
> anyway.
>

I modified the comment to "Optimizing some non-legacy-map0/1 without REX/REX2 prefix will be valuable."
 
>
> > +               {
> > +                 unsigned int src1 = i.operands - 2;
> 
> Looks like this variable is no longer used?
>

Yes, I removed it.

> 
> > +                 unsigned int src2 = (i.operands > 3) ? i.operands -
> > + 3 : 0;
> > +
> > +                 if (readonly_var != src2)
> > +                   swap_2_operands (readonly_var, src2);
> > +
> > +                 --i.operands;
> > +                 --i.reg_operands;
> > +
> > +                 specific_error = progress (internal_error);
> > +                 continue;
> > +               }
> > +
> > +           }
> > +       }
> > +
> >        /* We've found a match; break out of loop.  */
> >        break;
> >
> > What's your opinion?
> 
> I need some further clarification first, as per above. I also don't think I can
> properly identify (yet) which parts of the code are solely related to the
> ADCX/ADOX special case. The more code that's special for these, the more I'd
> be inclined to ask that dealing with them be a separate patch, for us to
> judge whether effort and effect are in reasonable balance.
> 

Currently only this part of the code 

+                 /* Optimizing some non-legacy-map0/1 without REX/REX2 prefix will be valuable.  */
+                 && (t[1].opcode_space <= SPACE_0F
+                     || (!check_EgprOperands (t + 1)
+                         && !check_RexOperands ()
+                         && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))

is related to adcx/adox (including check_RexOperands); all the other constraints are just there to make the code more robust. If you still think this part is too complicated, I would consider holding off on optimizing adcx/adox; after all, there aren't many insns of this kind at the moment.
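
As a reading aid, the gating condition quoted above can be restated as a small predicate. This is only a sketch: `struct candidate_sketch`, the `SK_SPACE_*` values and `legacy_form_allowed` are hypothetical names, and it assumes that check_EgprOperands and check_RexOperands return nonzero when the respective kind of register is involved.

```c
#include <stdbool.h>

/* Hypothetical ordering of opcode spaces, mirroring only the comparison
   "t[1].opcode_space <= SPACE_0F" from the quoted code.  */
enum space_sketch { SK_SPACE_BASE, SK_SPACE_0F, SK_SPACE_0F38 };

/* Hypothetical summary of the facts the quoted condition looks at.  */
struct candidate_sketch
{
  enum space_sketch legacy_space; /* Opcode space of the legacy template.  */
  bool uses_egpr;                 /* Assumed meaning of check_EgprOperands.  */
  bool needs_rex;                 /* Assumed meaning of check_RexOperands.  */
  bool dest_is_qword;             /* Last operand is a 64-bit register.  */
};

/* Legacy map 0/1 forms are always candidates; forms in other maps (the
   adcx/adox case) only when no REX or REX2 prefix would be required.  */
static bool
legacy_form_allowed (const struct candidate_sketch *c)
{
  return c->legacy_space <= SK_SPACE_0F
	 || (!c->uses_egpr && !c->needs_rex && !c->dest_is_qword);
}
```

Read this way, only the second half of the disjunction is specific to adcx/adox.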

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 5/8] Support APX NDD
  2023-11-20  1:33     ` Cui, Lili
@ 2023-11-20  8:19       ` Jan Beulich
  2023-11-20 12:54         ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-20  8:19 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils, Kong, Lingling

On 20.11.2023 02:33, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, November 9, 2023 5:37 PM
>>
>> On 02.11.2023 12:29, Cui, Lili wrote:
>>> @@ -190,6 +193,8 @@ mov, 0xf21, i386|No64,
>>> D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { De  mov,
>> 0xf21,
>>> x64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoRex64, { Debug,
>> Reg64
>>> }  mov, 0xf24, i386|No64,
>>> D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Test,
>> Reg32 }
>>>
>>> +// Move after swapping the bytes
>>> +movbe, 0x0f38f0, Movbe,
>> D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, {
>>> +Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>>>  // Move after swapping the bytes
>>>  movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf,
>> {
>>> Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>> movbe,
>>> 0x60, Movbe|APX_F,
>>> D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, {
>>> Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 } @@
>>> -290,22 +295,36 @@ add, 0x0, 0,
>>> D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock,
>> { Reg8|Reg16|Reg3
>>> add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
>>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  add,
>> 0x4,
>>> 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
>> Acc|Byte|Word|Dword|Qword }
>>> add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>> Imm8|Imm16|Imm32|Imm32S,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x }
>>> +add, 0x0, APX_F,
>>>
>> +D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMa
>> p4|NF, {
>>> +Reg8|Reg16|Reg32|Reg64,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
>> ex,
>>> +Reg8|Reg16|Reg32|Reg64 } add, 0x83/0, APX_F,
>>>
>> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVex
>> Map4|N
>>> +F, { Imm8S,
>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
>>> +Reg16|Reg32|Reg64 } add, 0x80/0, APX_F,
>>>
>> +W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4
>> |NF, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
>> ex,
>>> +Reg8|Reg16|Reg32|Reg64}
>>
>> Why are there (just as an example) only 3 new forms of ADD, but ...
>>
>>> @@ -338,10 +366,19 @@ adc, 0x10, 0,
>>> D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock,
>> { Reg8|Reg16|Reg
>>> adc, 0x83/2, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
>>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  adc,
>> 0x14,
>>> 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
>> Acc|Byte|Word|Dword|Qword }
>>> adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>> Imm8|Imm16|Imm32|Imm32S,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x }
>>> +adc, 0x10, APX_F,
>>> +D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, {
>>> +Reg8|Reg16|Reg32|Reg64,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
>> ex }
>>> +adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf,
>> { Imm8S,
>>> +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex } adc,
>>> +0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
>> ex }
>>> +adc, 0x10, APX_F,
>>>
>> +D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMa
>> p4, {
>>> +Reg8|Reg16|Reg32|Reg64,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
>> ex,
>>> +Reg8|Reg16|Reg32|Reg64 } adc, 0x83/2, APX_F,
>>>
>> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVex
>> Map4,
>>> +{ Imm8S,
>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
>>> +Reg16|Reg32|Reg64 } adc, 0x80/2, APX_F,
>>>
>> +W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4,
>> {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
>> ex,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>
>> .... 6 new forms of ADC? My guess is that's NF-related, but doesn't that mean
>> that until NF support is added e.g. "{evex} add %eax, %eax" won't assemble
>> as intended? IOW in line with NF as an attribute being added right here, those
>> other templates also will want introducing right here.
>>
>> While looking at patch 7 I'm also wondering whether same-base-opcode
>> forms wouldn't better be kept together. I haven't finished yet looking at
>> what's needed there, but even if it doesn't turn out strictly necessary there it
>> may still be a good idea anyway (unless of course that would get in the way of
>> anything).
>>
> 
> For ADC, there are 3 templates that do not support NDD and NF, I moved them here from patch 3/8 which we discussed before since its decoding is supported together in the NDD patch.

Feels like my question wasn't really answered: Why are there 6 new ADC templates
here, but only 3 ADD ones? Or in other words: Why are the other 3 ADD ones only
added later?

Also I'm curious: Why do ADC and SBB not allow for NF? They always consume
EFLAGS.CF, but subsequent insns may have no need for their EFLAGS output,
just like for e.g. ADD and SUB. Yet NF is only about EFLAGS output aiui, not
about any bit(s) consumed from EFLAGS.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH][v3] Support APX NDD optimized encoding.
  2023-11-20  3:28                     ` Hu, Lin1
@ 2023-11-20  8:34                       ` Jan Beulich
  0 siblings, 0 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-20  8:34 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, binutils

On 20.11.2023 04:28, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, November 17, 2023 5:48 PM
>>
>> On 17.11.2023 08:24, Hu, Lin1 wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Wednesday, November 15, 2023 5:35 PM
>>>>
>>>> On 15.11.2023 03:59, Hu, Lin1 wrote:
>>>>> --- a/gas/config/tc-i386.c
>>>>> +++ b/gas/config/tc-i386.c
>>>>> @@ -7208,6 +7208,43 @@ check_EgprOperands (const insn_template *t)
>>>>>    return 0;
>>>>>  }
>>>>>
>>>>> +/* Optimize APX NDD insns to legacy insns.  */ static bool
>>>>> +convert_NDD_to_REX2 (const insn_template *t) {
>>>>> +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
>>>>> +      && t->opcode_space == SPACE_EVEXMAP4
>>>>> +      && !i.has_nf
>>>>> +      && i.reg_operands >= 2)
>>>>> +    {
>>>>> +      unsigned int readonly_var = ~0;
>>>>> +      unsigned int dest = i.operands - 1;
>>>>> +      unsigned int src1 = i.operands - 2;
>>>>> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
>>>>> +
>>>>> +      if (i.types[src1].bitfield.class == Reg
>>>>> +	  && i.op[src1].regs == i.op[dest].regs)
>>>>> +	readonly_var = src2;
>>>>> +      /* adcx, adox and imul can't support to swap the source operands.
>> */
>>>>> +      else if (i.types[src2].bitfield.class == Reg
>>>>> +	       && i.op[src2].regs == i.op[dest].regs
>>>>> +	       && optimize > 1
>>>>> +	       && t->opcode_modifier.commutative)
>>>>
>>>> Comment and code still aren't in line: "support to swap the source
>>>> operands"
>>>> really is the D attribute in the opcode table, whereas
>>>> t->opcode_modifier.commutative is related to the C attribute (and all
>>>> t->three
>>>> insns named really are commutative). It looks to me that the code is
>>>> correct, so it would then be the comment that may need updating. But
>>>> it may also be better to additionally check .d here (making the code
>>>> robust against C being added to the truly commutative yet not eligible to
>> be optimized insns).
>>>> In which case the comment might say "adcx, adox, and imul, while
>>>> commutative, don't support to swap the source operands".
>>>>
>>>
>>> I think we don't need to worry about it for now, because we've constrained
>> the function with vexvvvvvvdest, and these instructions must be NDD
>> instructions. And adcx, adox and imul don't have D attribute.
>>
>> Right, and I thought to leverage this. IOW ...
>>
>>> If I add check .d here, I will need to exclude them.
>>
>> ... I don't think I understand this.
>>
> 
> I mean this place is to check if we can optimize something like "adcx %eax, %ebx, %eax -> adcx %ebx, %eax".  Adcx doesn't support D attribute, if we want to add a constraint t->opcode_modifier.d. The constraints should be && (t->opcode_modifier.d || t->mnem_off == MN_adcx) && t->opcode_modifier.commutative, because, adcx doesn't support D attribute. If we doesn't exclude it, it will be stopped by t->opcode_modifier.d.

Just to clarify again: My main issue is with comment and code not being
in sync. Once that's sorted, maybe it'll become clear why you need to
check C rather than D, when at the same time you avoid to add C for
those insns which don't support D anyway (and would hence be
recognizable either way).

>>> +/* Optimize APX NDD insns to legacy insns.  */ static unsigned int
>>> +convert_NDD_to_legacy (const insn_template *t) {
>>> +  unsigned int readonly_var = ~0;
>>
>> One issue I continue to have is the name of this variable. Good names help
>> understanding what code is doing. And in 3-operand NDD insns there are
>> uniformly 2 operands which are only read.
> 
> Maybe I should change the variable to be called readonly_reg_pos.

What would this change? There are, as before, potentially two operands
which are input only (i.e. readonly). What property of the operand is it
that you're after? Answering that will tell what a reasonable name for
the variable is.
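
To make the property in question concrete, here is a minimal sketch of the selection logic from the quoted convert_NDD_to_REX2, reduced to register numbers. The names `struct insn_sketch`, `NO_OPERAND` and `surviving_source` are hypothetical, and the result of the second branch is an assumption: the quoted hunk is cut off there, but returning src1 is consistent with the "add %r8, %r16, %r8 -> add %r16, %r8" example in the patch comment.

```c
#include <stdbool.h>

#define NO_OPERAND (~0u)

/* Hypothetical, reduced view of the parsed insn: operand count, one
   register number per operand (-1 if the operand is not a register),
   and whether the template is marked commutative (C).  */
struct insn_sketch
{
  unsigned int operands;
  int regs[4];
  bool commutative;
};

/* Return the index of the source operand that is NOT the same register
   as the destination, i.e. the one that has to remain a source after
   dropping the separate destination; NO_OPERAND if no conversion
   applies.  */
static unsigned int
surviving_source (const struct insn_sketch *in, int optimize)
{
  unsigned int dest = in->operands - 1;
  unsigned int src1 = in->operands - 2;
  unsigned int src2 = in->operands > 3 ? in->operands - 3 : 0;

  if (in->regs[src1] >= 0 && in->regs[src1] == in->regs[dest])
    return src2;
  /* Swapping the sources is only valid for commutative insns.  */
  if (in->regs[src2] >= 0 && in->regs[src2] == in->regs[dest]
      && optimize > 1 && in->commutative)
    return src1;	/* Assumed; not visible in the quoted hunk.  */
  return NO_OPERAND;
}
```

In this reading the value identifies the one source operand that differs from the destination, which may help in picking a more descriptive name than readonly_var.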

>>> +       {
>>> +         unsigned int readonly_var = convert_NDD_to_legacy (t);
>>> +         size_match = true;
>>> +
>>> +         if (readonly_var != (unsigned int) ~0)
>>> +           {
>>> +             for (j = 0; j < i.operands - 2; j++)
>>> +               {
>>> +                 check_register = j;
>>> +                 if (t->opcode_modifier.d)
>>> +                   check_register ^= 1;
>>> +                 overlap0 = operand_type_and (i.types[check_register],
>>> +                                              t[1].operand_types[check_register]);
>>> +                 if (!operand_type_match (overlap0, i.types[check_register]))
>>> +                   size_match = false;
>>> +               }
>>
>> I'm afraid that without a comment I don't understand what this is about.
>>
> 
> I want to make sure that the two neighboring templates have the same input. I tried base_code but it misses some special cases, so I want to start with the first parameter (ATT), like shld Imm8 and shld shiftCount. But some insns with .d. So I need to check if the first operand can match the second type.  Seems like it looks like NDD can simply match that way now.

Okay, something along these lines will want saying in a comment then. That's
not to say though that I'm convinced this is needed. With all these code
fragments and out-of-order patches that were sent I'm really looking forward
to getting a consistent, properly ordered series again, such that I can
actually see how things fit together.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 5/8] Support APX NDD
  2023-11-08 11:13   ` Jan Beulich
@ 2023-11-20 12:36     ` Cui, Lili
  2023-11-20 16:33       ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-20 12:36 UTC (permalink / raw)
  To: Beulich, Jan, Kong, Lingling; +Cc: Lu, Hongjiu, ccoutant, binutils

> On 02.11.2023 12:29, Cui, Lili wrote:
> > --- a/opcodes/i386-dis-evex-prefix.h
> > +++ b/opcodes/i386-dis-evex-prefix.h
> > @@ -338,10 +338,6 @@
> >      { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
> >      { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
> >    },
> > -  /* PREFIX_EVEX_MAP4_66 */
> > -  {
> > -    { "wrssK",	{ M, Gdq }, 0 },
> > -  },
> >    /* PREFIX_EVEX_MAP4_D8 */
> >    {
> >      { "sha1nexte", { XM, EXxmm }, 0 },
> 
> What's going on here?
>

It should be in patch 3/8. Changed.

> > --- a/opcodes/i386-dis-evex-reg.h
> > +++ b/opcodes/i386-dis-evex-reg.h
> > @@ -56,6 +56,36 @@
> >      { "blsmskS",	{ VexGdq, Edq }, 0 },
> >      { "blsiS",		{ VexGdq, Edq }, 0 },
> >    },
> > +  /* REG_EVEX_MAP4_80 */
> > +  {
> > +    { "addA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "orA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "adcA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "sbbA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "andA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "subA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "xorA",	{ VexGb, Eb, Ib }, 0 },
> > +  },
> > +  /* REG_EVEX_MAP4_81 */
> > +  {
> > +    { "addQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "orQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "adcQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "sbbQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "andQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "subQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "xorQ",	{ VexGv, Ev, Iv }, 0 },
> > +  },
> > +  /* REG_EVEX_MAP4_83 */
> > +  {
> > +    { "addQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "orQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "adcQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "sbbQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "andQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "subQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "xorQ",	{ VexGv, Ev, sIb }, 0 },
> > +  },
> 
> No sign of prefix decoding, and also no PREFIX_OPCODE present?
> 

Added NO_PREFIX and PREFIX_NP_OR_DATA for them.

> > @@ -63,3 +93,27 @@
> >      { "aesencwide256kl",	{ M }, 0 },
> >      { "aesdecwide256kl",	{ M }, 0 },
> >    },
> > +  /* REG_EVEX_MAP4_F6 */
> > +  {
> > +    { Bad_Opcode },
> > +    { Bad_Opcode },
> > +    { "notA",	{ VexGb, Eb }, 0 },
> > +    { "negA",	{ VexGb, Eb }, 0 },
> > +  },
> > +  /* REG_EVEX_MAP4_F7 */
> > +  {
> > +    { Bad_Opcode },
> > +    { Bad_Opcode },
> > +    { "notQ",	{ VexGv, Ev }, 0 },
> > +    { "negQ",	{ VexGv, Ev }, 0 },
> > +  },
> > +  /* REG_EVEX_MAP4_FE */
> > +  {
> > +    { "incA",   { VexGb ,Eb }, 0 },
> > +    { "decA",   { VexGb ,Eb }, 0 },
> > +  },
> > +  /* REG_EVEX_MAP4_FF */
> > +  {
> > +    { "incQ",   { VexGv ,Ev }, 0 },
> > +    { "decQ",   { VexGv ,Ev }, 0 },
> > +  },
> 
> Same here, plus for the inc/dec some commas are misplaced. Padding also
> looks to be inconsistent (tab vs blanks).
> 

Done.

> > --- a/opcodes/i386-dis-evex.h
> > +++ b/opcodes/i386-dis-evex.h
> >[...]
> > @@ -947,23 +947,23 @@ static const struct dis386 evex_table[][256] = {
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      /* 40 */
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > +    { "cmovoS",		{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovnoS",	{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovbS",		{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovaeS",	{ VexGv, Gv, Ev }, 0 },
> > +    { "cmoveS",		{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovneS",	{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovbeS",	{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovaS",		{ VexGv, Gv, Ev }, 0 },
> >      /* 48 */
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > +    { "cmovsS",		{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovnsS",	{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovpS",		{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovnpS",	{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovlS",		{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovgeS",	{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovleS",	{ VexGv, Gv, Ev }, 0 },
> > +    { "cmovgS",		{ VexGv, Gv, Ev }, 0 },
> 
> Considering CFCMOVcc which sits at the same opcode, doing things like this
> sets us up for needing to touch all of these again. Maybe that's the best that
> can be done, but I still wonder whether this couldn't be taken care of right
> away when introducing these entries.
> 

How about adding a special letter CF% in front of them?

> > @@ -989,7 +989,7 @@ static const struct dis386 evex_table[][256] = {
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      { "wrussK",	{ M, Gdq }, PREFIX_DATA },
> > -    { PREFIX_TABLE (PREFIX_EVEX_MAP4_66) },
> > +    { PREFIX_TABLE (PREFIX_0F38F6) },
> 
> Perhaps related to the earlier question: What's going on here?
> 

It should be in patch 3/8. Changed.

> > @@ -1060,7 +1060,7 @@ static const struct dis386 evex_table[][256] = {
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> > -    { Bad_Opcode },
> > +    { "shldS",		{ VexGv, Ev, Gv, CL }, 0 },
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      /* A8 */
> > @@ -1069,9 +1069,9 @@ static const struct dis386 evex_table[][256] = {
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> > +    { "shrdS",		{ VexGv, Ev, Gv, CL }, 0 },
> >      { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > +    { "imulS",		{ VexGv, Gv, Ev }, 0 },
> 
> PREFIX_OPCODE (or prefix decoding) again missing in all of these?
> 

Done.

> > @@ -1091,8 +1091,8 @@ static const struct dis386 evex_table[][256] = {
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      /* C0 */
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > +    { REG_TABLE (REG_C0) },
> > +    { REG_TABLE (REG_C1) },
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> > @@ -1109,10 +1109,10 @@ static const struct dis386 evex_table[][256] = {
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      /* D0 */
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > +    { REG_TABLE (REG_D0) },
> > +    { REG_TABLE (REG_D1) },
> > +    { REG_TABLE (REG_D2) },
> > +    { REG_TABLE (REG_D3) },
> 
> Some form of prefix decoding is going to be needed for these too, despite the
> goal of wanting to re-use the legacy table entries. Perhaps adding
> PREFIX_OPCODE there would be benign to the legacy insns?
> 

Added PREFIX_NP_OR_DATA and NO_PREFIX for them.

> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> >[...]
> > @@ -2660,47 +2668,47 @@ static const struct dis386 reg_table[][8] = {
> >    },
> >    /* REG_D0 */
> >    {
> > -    { "rolA",	{ Eb, I1 }, 0 },
> > -    { "rorA",	{ Eb, I1 }, 0 },
> > -    { "rclA",	{ Eb, I1 }, 0 },
> > -    { "rcrA",	{ Eb, I1 }, 0 },
> > -    { "shlA",	{ Eb, I1 }, 0 },
> > -    { "shrA",	{ Eb, I1 }, 0 },
> > -    { "shlA",	{ Eb, I1 }, 0 },
> > -    { "sarA",	{ Eb, I1 }, 0 },
> > +    { "rolA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "rorA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "rclA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "rcrA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "shrA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "sarA",	{ VexGb, Eb, I1 }, 0 },
> >    },
> >    /* REG_D1 */
> >    {
> > -    { "rolQ",	{ Ev, I1 }, 0 },
> > -    { "rorQ",	{ Ev, I1 }, 0 },
> > -    { "rclQ",	{ Ev, I1 }, 0 },
> > -    { "rcrQ",	{ Ev, I1 }, 0 },
> > -    { "shlQ",	{ Ev, I1 }, 0 },
> > -    { "shrQ",	{ Ev, I1 }, 0 },
> > -    { "shlQ",	{ Ev, I1 }, 0 },
> > -    { "sarQ",	{ Ev, I1 }, 0 },
> > +    { "rolQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "rorQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "rclQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "rcrQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "shrQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "sarQ",	{ VexGv, Ev, I1 }, 0 },  
> >    },
> 
> As mentioned on the assembler side already, I think we would be better off
> making const_1_mode print $1 in AT&T syntax at least for these new insn
> forms, to eliminate the ambiguity.
> 

It is related to correctness and should be revised. Since they share the same entries, I will create a new patch to modify the legacy instructions and then extend them to NDD. Do you agree?

> > @@ -9061,6 +9069,15 @@ get_valid_dis386 (const struct dis386 *dp,
> instr_info *ins)
> >  	  ins->rex &= ~REX_B;
> >  	  ins->rex2 &= ~REX_R;
> >  	}
> > +      if (ins->evex_type == evex_from_legacy)
> > +	{
> > +	  /* EVEX from legacy instructions, when the EVEX.ND bit is 0,
> > +	     all bits of EVEX.vvvv and EVEX.V' must be 1.  */
> > +	  if (!ins->vex.b && (ins->vex.register_specifier
> > +				  || !ins->vex.v))
> > +	    return &bad_opcode;
> > +	  ins->rex |= REX_OPCODE;
> 
> What is this line about?
> 
Changed to:

      /* EVEX from legacy instructions, when the EVEX.ND bit is 0,
         all bits of EVEX.vvvv and EVEX.V' must be 1.  */
      if (ins->evex_type == evex_from_legacy && !ins->vex.b
          && (ins->vex.register_specifier || !ins->vex.v))
        return &bad_opcode;
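
The rule enforced by the snippet above can be sketched in isolation as follows. `struct evex_fields_sketch` and `nd_encoding_valid` are hypothetical; the sketch assumes, as the quoted condition suggests, that register_specifier holds the already-inverted EVEX.vvvv value (so 0 means all bits were 1 in the encoding) and that v holds the raw EVEX.V' bit.

```c
#include <stdbool.h>

/* Hypothetical subset of the decoded EVEX state.  */
struct evex_fields_sketch
{
  bool nd;                         /* EVEX.ND (vex.b in the quoted code).  */
  unsigned int register_specifier; /* Assumed: inverted EVEX.vvvv.  */
  bool v;                          /* Assumed: raw EVEX.V' bit.  */
};

/* With ND clear there is no separate destination, so vvvv and V' must
   encode "no register": all of their bits set in the encoding.  */
static bool
nd_encoding_valid (const struct evex_fields_sketch *f)
{
  if (!f->nd && (f->register_specifier != 0 || !f->v))
    return false;
  return true;
}
```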

> > @@ -9087,7 +9104,7 @@ get_valid_dis386 (const struct dis386 *dp,
> instr_info *ins)
> >  	return &err_opcode;
> >
> >        /* Set vector length.  */
> > -      if (ins->modrm.mod == 3 && ins->vex.b)
> > +      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type ==
> > + evex_default)
> >  	ins->vex.length = 512;
> >        else
> >  	{
> 
> Is this change really needed for anything?
> 

If it's NDD and ins->vex.b == 1, we need to avoid setting a wrong vector length for it.
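
A small sketch of why the extra evex_type test matters: the same bit means broadcast/rounding control for ordinary EVEX insns but ND for EVEX-promoted legacy insns. The names `enum evex_type_sketch` and `vector_length_sketch` are hypothetical, and the fallback length is simply passed in by the caller, since the real else branch is not shown in the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical mirror of the evex_type distinction in the patch.  */
enum evex_type_sketch { SK_EVEX_DEFAULT, SK_EVEX_FROM_LEGACY };

/* For ordinary EVEX insns, a register operand (mod == 3) with the b bit
   set implies a 512-bit vector length.  For EVEX-promoted legacy insns
   the same bit is ND, so it must not influence the length.  */
static unsigned int
vector_length_sketch (unsigned int mod, bool b, enum evex_type_sketch type,
		      unsigned int encoded_length)
{
  if (mod == 3 && b && type == SK_EVEX_DEFAULT)
    return 512;
  return encoded_length;
}
```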

> > @@ -9530,6 +9547,7 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >  		    {
> >  		      oappend (&ins, "{bad}");
> >  		      continue;
> > +
> >  		    }
> >
> >  		  /* Instructions with a mask register destination allow for
> 
> Stray and bogus change.
> 

Done.

> > @@ -9553,7 +9571,7 @@ print_insn (bfd_vma pc, disassemble_info *info,
> > int intel_syntax)
> >
> >  	  /* Check whether rounding control was enabled for an insn not
> >  	     supporting it.  */
> > -	  if (ins.modrm.mod == 3 && ins.vex.b
> > +	  if (ins.modrm.mod == 3 && ins.vex.b && ins.evex_type ==
> > +evex_default
> >  	      && !(ins.evex_used & EVEX_b_used))
> >  	    {
> 
> This could do with extending the comment, mentioning the aliasing of
> EVEX.brs and EVEX.nd.
> 

Done.

> > @@ -11013,7 +11031,7 @@ print_displacement (instr_info *ins,
> > bfd_signed_vma val)  static void  intel_operand_size (instr_info *ins,
> > int bytemode, int sizeflag)  {
> > -  if (ins->vex.b)
> > +  if (ins->vex.b && ins->evex_type == evex_default)
> >      {
> >        if (!ins->vex.no_broadcast)
> >  	switch (bytemode)
> 
> This aliasing would also be worthwhile mentioning here and ...
> 
> > @@ -11946,7 +11964,7 @@ OP_E_memory (instr_info *ins, int bytemode,
> int sizeflag)
> >  	  print_operand_value (ins, disp & 0xffff, dis_style_text);
> >  	}
> >      }
> > -  if (ins->vex.b)
> > +  if (ins->vex.b && ins->evex_type == evex_default)
> >      {
> >        ins->evex_used |= EVEX_b_used;
> 
> ... here.
> 

Done.

> > @@ -13307,6 +13325,14 @@ OP_VEX (instr_info *ins, int bytemode, int
> sizeflag ATTRIBUTE_UNUSED)
> >    if (!ins->need_vex)
> >      return true;
> >
> > +  if (ins->evex_type == evex_from_legacy)
> > +    {
> > +      ins->evex_used |= EVEX_b_used;
> > +      /* Here vex.b is treated as "EVEX.ND.  */
> 
> Okay, here you have such a helpful comment, just that - nit - there's an
> unbalanced double-quote. (The comment would also more logically come
> first in this block.)
> 

Done.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 5/8] Support APX NDD
  2023-11-20  8:19       ` Jan Beulich
@ 2023-11-20 12:54         ` Cui, Lili
  2023-11-20 16:43           ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-20 12:54 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Kong, Lingling



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, November 20, 2023 4:19 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org; Kong, Lingling <lingling.kong@intel.com>
> Subject: Re: [PATCH 5/8] Support APX NDD
> 
> On 20.11.2023 02:33, Cui, Lili wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Thursday, November 9, 2023 5:37 PM
> >>
> >> On 02.11.2023 12:29, Cui, Lili wrote:
> >>> @@ -190,6 +193,8 @@ mov, 0xf21, i386|No64,
> >>> D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { De  mov,
> >> 0xf21,
> >>> x64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoRex64,
> { Debug,
> >> Reg64
> >>> }  mov, 0xf24, i386|No64,
> >>> D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Test,
> >> Reg32 }
> >>>
> >>> +// Move after swapping the bytes
> >>> +movbe, 0x0f38f0, Movbe,
> >> D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, {
> >>> +Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> >>>  // Move after swapping the bytes
> >>>  movbe, 0x0f38f0, Movbe,
> D|Modrm|CheckOperandSize|No_bSuf|No_sSuf,
> >> {
> >>> Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> >> movbe,
> >>> 0x60, Movbe|APX_F,
> >>> D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, {
> >>> Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 } @@
> >>> -290,22 +295,36 @@ add, 0x0, 0,
> >>> D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock,
> >> { Reg8|Reg16|Reg3
> >>> add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> >>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> add,
> >> 0x4,
> >>> 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> >> Acc|Byte|Word|Dword|Qword }
> >>> add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> >>> Imm8|Imm16|Imm32|Imm32S,
> >>>
> >>
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> >> x }
> >>> +add, 0x0, APX_F,
> >>>
> >>
> +D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMa
> >> p4|NF, {
> >>> +Reg8|Reg16|Reg32|Reg64,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> >> ex,
> >>> +Reg8|Reg16|Reg32|Reg64 } add, 0x83/0, APX_F,
> >>>
> >>
> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVex
> >> Map4|N
> >>> +F, { Imm8S,
> >> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> >>> +Reg16|Reg32|Reg64 } add, 0x80/0, APX_F,
> >>>
> >>
> +W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4
> >> |NF, {
> >>> +Imm8|Imm16|Imm32|Imm32S,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> >> ex,
> >>> +Reg8|Reg16|Reg32|Reg64}
> >>
> >> Why are there (just as an example) only 3 new forms of ADD, but ...
> >>
> >>> @@ -338,10 +366,19 @@ adc, 0x10, 0,
> >>> D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock,
> >> { Reg8|Reg16|Reg
> >>> adc, 0x83/2, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> >>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> adc,
> >> 0x14,
> >>> 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> >> Acc|Byte|Word|Dword|Qword }
> >>> adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> >>> Imm8|Imm16|Imm32|Imm32S,
> >>>
> >>
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> >> x }
> >>> +adc, 0x10, APX_F,
> >>> +D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, {
> >>> +Reg8|Reg16|Reg32|Reg64,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> >> ex }
> >>> +adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf,
> >> { Imm8S,
> >>> +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> adc,
> >>> +0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, {
> >>> +Imm8|Imm16|Imm32|Imm32S,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> >> ex }
> >>> +adc, 0x10, APX_F,
> >>>
> >>
> +D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMa
> >> p4, {
> >>> +Reg8|Reg16|Reg32|Reg64,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> >> ex,
> >>> +Reg8|Reg16|Reg32|Reg64 } adc, 0x83/2, APX_F,
> >>>
> >>
> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVex
> >> Map4,
> >>> +{ Imm8S,
> >> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> >>> +Reg16|Reg32|Reg64 } adc, 0x80/2, APX_F,
> >>>
> >>
> +W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4,
> >> {
> >>> +Imm8|Imm16|Imm32|Imm32S,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInd
> >> ex,
> >>> +Reg8|Reg16|Reg32|Reg64 }
> >>
> >> .... 6 new forms of ADC? My guess is that's NF-related, but doesn't
> >> that mean that until NF support is added e.g. "{evex} add %eax, %eax"
> >> won't assemble as intended? IOW in line with NF as an attribute being
> >> added right here, those other templates also will want introducing right
> here.
> >>
> >> While looking at patch 7 I'm also wondering whether same-base-opcode
> >> forms wouldn't better be kept together. I haven't finished yet
> >> looking at what's needed there, but even if it doesn't turn out
> >> strictly necessary there it may still be a good idea anyway (unless
> >> of course that would get in the way of anything).
> >>
> >
> > For ADC, there are 3 templates that do not support NDD and NF, I moved
> > them here from patch 3/8 which we discussed before since its decoding is
> > supported together in the NDD patch.
> 
> Feels like my question wasn't really answered: Why are there 6 new ADC
> templates here, but only 3 ADD ones? Or in other words: Why are the other 3
> ADD ones only added later?
> 

The three extra ADC templates are special: they support neither ND nor NF. ADD does not have this situation. The ND part of ADD is here, and the remaining NF part is placed in the NF patch.

> Also I'm curious: Why do ADC and SBB not allow for NF? They always
> consume EFLAGS.CF, but subsequent insns may have no need for their EFLAGS
> output, just like for e.g. ADD and SUB. Yet NF is only about EFLAGS output
> aiui, not about any bit(s) consumed from EFLAGS.
> 

I will get back to you when I know the answer.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 5/8] Support APX NDD
  2023-11-20 12:36     ` Cui, Lili
@ 2023-11-20 16:33       ` Jan Beulich
  2023-11-22  7:46         ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-20 16:33 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils, Kong, Lingling

On 20.11.2023 13:36, Cui, Lili wrote:
>> On 02.11.2023 12:29, Cui, Lili wrote:
>>> --- a/opcodes/i386-dis-evex.h
>>> +++ b/opcodes/i386-dis-evex.h
>>> [...]
>>> @@ -947,23 +947,23 @@ static const struct dis386 evex_table[][256] = {
>>>      { Bad_Opcode },
>>>      { Bad_Opcode },
>>>      /* 40 */
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> +    { "cmovoS",		{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovnoS",	{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovbS",		{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovaeS",	{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmoveS",		{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovneS",	{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovbeS",	{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovaS",		{ VexGv, Gv, Ev }, 0 },
>>>      /* 48 */
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> +    { "cmovsS",		{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovnsS",	{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovpS",		{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovnpS",	{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovlS",		{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovgeS",	{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovleS",	{ VexGv, Gv, Ev }, 0 },
>>> +    { "cmovgS",		{ VexGv, Gv, Ev }, 0 },
>>
>> Considering CFCMOVcc which sits at the same opcode, doing things like this
>> sets us up for needing to touch all of these again. Maybe that's the best that
>> can be done, but I still wonder whether this couldn't be taken care of right
>> away when introducing these entries.
> 
> How about adding a special letter CF% in front of them?

Why not.

>>> --- a/opcodes/i386-dis.c
>>> +++ b/opcodes/i386-dis.c
>>> [...]
>>> @@ -2660,47 +2668,47 @@ static const struct dis386 reg_table[][8] = {
>>>    },
>>>    /* REG_D0 */
>>>    {
>>> -    { "rolA",	{ Eb, I1 }, 0 },
>>> -    { "rorA",	{ Eb, I1 }, 0 },
>>> -    { "rclA",	{ Eb, I1 }, 0 },
>>> -    { "rcrA",	{ Eb, I1 }, 0 },
>>> -    { "shlA",	{ Eb, I1 }, 0 },
>>> -    { "shrA",	{ Eb, I1 }, 0 },
>>> -    { "shlA",	{ Eb, I1 }, 0 },
>>> -    { "sarA",	{ Eb, I1 }, 0 },
>>> +    { "rolA",	{ VexGb, Eb, I1 }, 0 },
>>> +    { "rorA",	{ VexGb, Eb, I1 }, 0 },
>>> +    { "rclA",	{ VexGb, Eb, I1 }, 0 },
>>> +    { "rcrA",	{ VexGb, Eb, I1 }, 0 },
>>> +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
>>> +    { "shrA",	{ VexGb, Eb, I1 }, 0 },
>>> +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
>>> +    { "sarA",	{ VexGb, Eb, I1 }, 0 },
>>>    },
>>>    /* REG_D1 */
>>>    {
>>> -    { "rolQ",	{ Ev, I1 }, 0 },
>>> -    { "rorQ",	{ Ev, I1 }, 0 },
>>> -    { "rclQ",	{ Ev, I1 }, 0 },
>>> -    { "rcrQ",	{ Ev, I1 }, 0 },
>>> -    { "shlQ",	{ Ev, I1 }, 0 },
>>> -    { "shrQ",	{ Ev, I1 }, 0 },
>>> -    { "shlQ",	{ Ev, I1 }, 0 },
>>> -    { "sarQ",	{ Ev, I1 }, 0 },
>>> +    { "rolQ",	{ VexGv, Ev, I1 }, 0 },
>>> +    { "rorQ",	{ VexGv, Ev, I1 }, 0 },
>>> +    { "rclQ",	{ VexGv, Ev, I1 }, 0 },
>>> +    { "rcrQ",	{ VexGv, Ev, I1 }, 0 },
>>> +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
>>> +    { "shrQ",	{ VexGv, Ev, I1 }, 0 },
>>> +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
>>> +    { "sarQ",	{ VexGv, Ev, I1 }, 0 },  
>>>    },
>>
>> As mentioned on the assembler side already, I think we would be better off
>> making const_1_mode print $1 in AT&T syntax at least for these new insn
>> forms, to eliminate the ambiguity.
>>
> 
> It is related to correctness and should be revised. Since they share the same entries, I will create a new patch to modify the legacy instruction and then extend them to NDD. Do you agree?

I certainly appreciate any reusing, where it is possible (and it ought to be
possible here, yes).

>>> @@ -9087,7 +9104,7 @@ get_valid_dis386 (const struct dis386 *dp,
>> instr_info *ins)
>>>  	return &err_opcode;
>>>
>>>        /* Set vector length.  */
>>> -      if (ins->modrm.mod == 3 && ins->vex.b)
>>> +      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type ==
>>> + evex_default)
>>>  	ins->vex.length = 512;
>>>        else
>>>  	{
>>
>> Is this change really needed for anything?
> 
> If it's NDD and ins->vex.b ==1, we need to avoid giving NDD a wrong value. 

But this is recording ->vex.length, not anything NDD related (afaics).

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 5/8] Support APX NDD
  2023-11-20 12:54         ` Cui, Lili
@ 2023-11-20 16:43           ` Jan Beulich
  0 siblings, 0 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-20 16:43 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils, Kong, Lingling

On 20.11.2023 13:54, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, November 20, 2023 4:19 PM
>>
>> On 20.11.2023 02:33, Cui, Lili wrote:
>>> For ADC, there are 3 templates that do not support NDD and NF, I moved
>> them here from patch 3/8 which we discussed before since its decoding is
>> supported together in the NDD patch.
>>
>> Feels like my question wasn't really answered: Why are there 6 new ADC
>> templates here, but only 3 ADD ones? Or in other words: Why are the other 3
>> ADD ones only added later?
>>
> 
> The three extra items of ADC are special, they support neither ND nor NF. ADD does not have this situation. The ND part of add is here, and the remaining part of NF place in NF patch.

Why would ADD not have this situation? Without 2-operand EVEX templates,
how would you deal with e.g.

	{evex} add %eax, %ecx

when at the same time you (correctly) permit

	{evex} adc %eax, %ecx

? The difference between ADD and ADC is solely the presence / absence of
the NF attribute.
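
To make the template-lookup concern concrete, here is a minimal sketch under stated assumptions. The `toy_template` structure, the `toy_table` contents and `toy_match` are hypothetical stand-ins, not the real insn_template machinery in tc-i386.c; the table only mirrors the state described in this thread, where ADC has 2- and 3-operand EVEX templates while ADD has just the 3-operand (NDD) one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for an opcode-table entry.  */
struct toy_template
{
  const char *mnemonic;
  unsigned int operands;	/* 2 = legacy-style form, 3 = NDD form.  */
  int evex;			/* Nonzero if the template is EVEX-encoded.  */
};

/* Toy table mirroring the situation discussed in the thread: ADC has
   both 2- and 3-operand EVEX templates, ADD only the 3-operand one.  */
static const struct toy_template toy_table[] =
{
  { "adc", 2, 0 },
  { "adc", 2, 1 },
  { "adc", 3, 1 },
  { "add", 2, 0 },
  { "add", 3, 1 },
};

/* Return the first template with the given mnemonic and operand count
   that is EVEX-encoded when WANT_EVEX is set; NULL if there is none.  */
static const struct toy_template *
toy_match (const char *mnemonic, unsigned int operands, int want_evex)
{
  assert (mnemonic != NULL);
  for (size_t n = 0; n < sizeof toy_table / sizeof toy_table[0]; n++)
    {
      const struct toy_template *t = &toy_table[n];

      if (strcmp (t->mnemonic, mnemonic) == 0
	  && t->operands == operands
	  && (!want_evex || t->evex))
	return t;
    }
  return NULL;
}
```

With this toy table, a 2-operand lookup that insists on EVEX finds an entry for "adc" but none for "add", which is the asymmetry being questioned.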

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 6/8] Support APX Push2/Pop2
  2023-11-08 11:44   ` Jan Beulich
  2023-11-08 12:52     ` Jan Beulich
@ 2023-11-22  5:48     ` Cui, Lili
  2023-11-22  8:53       ` Jan Beulich
  1 sibling, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-22  5:48 UTC (permalink / raw)
  To: Beulich, Jan, Mo, Zewei; +Cc: Lu, Hongjiu, ccoutant, binutils

> On 02.11.2023 12:29, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -256,6 +256,7 @@ enum i386_error
> >      mask_not_on_destination,
> >      no_default_mask,
> >      unsupported_rc_sae,
> > +    unsupported_rsp_register,
> >      invalid_register_operand,
> >      internal_error,
> >    };
> > @@ -5476,6 +5477,9 @@ md_assemble (char *line)
> >  	case unsupported_rc_sae:
> >  	  err_msg = _("unsupported static rounding/sae");
> >  	  break;
> > +	case unsupported_rsp_register:
> > +	  err_msg = _("unsupported rsp register");
> > +	  break;
> 
> Perhaps you mean "cannot be used with" or some such? Also the register
> name needs conditionally prefixing with % in diagnostics.
> 

Done.

> > @@ -6854,6 +6858,24 @@ check_VecOperands (const insn_template *t)
> >  	}
> >      }
> >
> > +  /* Push2/Pop2 cannot use RSP and Pop2 cannot pop two same registers.  */
> > +  if (t->opcode_modifier.push2pop2)
> 
> I question this way of recognizing these two insns: You introduce a whole new
> table column here just to have two entries set this bit.
> This is cheaper by comparing the mnemonic offsets, as we do elsewhere in
> various cases.
> 

Done.

> I also disagree with putting the check in check_VecOperands():
> There's nothing vector-ish here. Either you put it straight in the caller, or you
> introduce a new check_APX_operands().
> 

How about putting check_EgprOperands into check_APX_operands?

      /* Check if EGPR operands (r16-r31) are valid.  */
      if (check_EgprOperands (t))
        {
          specific_error = progress (i.error);
          continue;
        }

      /* Check if APX operands are valid.  */
      if (check_APX_operands (t))
        {
          specific_error = progress (i.error);
          continue;
        }

> > +    {
> > +      unsigned int reg1 = register_number (i.op[0].regs);
> > +      unsigned int reg2 = register_number (i.op[1].regs);
> > +
> > +      if (reg1 == 0x4 || reg2 == 0x4)
> > +	{
> > +	  i.error = unsupported_rsp_register;
> > +	  return 1;
> > +	}
> > +      if (t->base_opcode == 0x8f && reg1 == reg2)
> > +	{
> > +	  i.error = invalid_dest_and_src_register_set;
> 
> This enumerator's diagnostic talks about source and destination register,
> which isn't applicable here.
> 

Done.
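
For reference, the operand rule quoted above can be sketched in isolation as follows. This is only an illustration: `toy_error`, `TOY_REG_RSP` and `toy_check_push2pop2` are hypothetical names, not identifiers from tc-i386.c, and the function takes plain hardware register numbers instead of the assembler's operand structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error codes; the names are not taken from tc-i386.c.  */
enum toy_error
{
  toy_ok,
  toy_rsp_not_allowed,
  toy_same_registers
};

#define TOY_REG_RSP 4u		/* Hardware register number of %rsp.  */

/* REG1 and REG2 are hardware register numbers in the range 0..31.
   IS_POP2 selects the additional POP2-only rule.  */
static enum toy_error
toy_check_push2pop2 (unsigned int reg1, unsigned int reg2, bool is_pop2)
{
  assert (reg1 < 32 && reg2 < 32);

  /* Neither PUSH2 nor POP2 may name the stack pointer.  */
  if (reg1 == TOY_REG_RSP || reg2 == TOY_REG_RSP)
    return toy_rsp_not_allowed;

  /* POP2 must not pop into the same register twice.  */
  if (is_pop2 && reg1 == reg2)
    return toy_same_registers;

  return toy_ok;
}
```

Keeping the two failure cases as separate codes matches the review comment that each one wants its own diagnostic wording.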

> > --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > @@ -30,3 +30,9 @@ _start:
> >          .byte 0xff
> >          #{evex} inc %rax EVEX.vvvv' > 0 (illegal value).
> >          .byte 0x62, 0xf4, 0xec, 0x08, 0xff, 0xc0
> > +        .byte 0xff, 0xff
> > +	# pop2 %rax, %rbx set EVEX.ND=0.
> > +        .byte 0x62,0xf4,0x64,0x08,0x8f,0xc0
> > +        .byte 0xff, 0xff, 0xff
> > +	# pop2 %rax, %rsp set EVEX.VVVV=0xf.
> > +        .byte 0x62,0xf4,0x7c,0x18,0x8f,0xc0
> 
> This 2nd comment looks bogus. What is it that's being tested here?
> 

I think it should be # pop2 %rax set EVEX.vvvv' = 1111. The test is meant to check that pop2 has only one operand when decoding.

> Also again note indentation inconsistencies.
> 

Done.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
> > @@ -0,0 +1,15 @@
> > +# Check illegal APX-Push2Pop2 instructions
> > +
> > +	.allow_index_reg
> > +	.text
> > +_start:
> > +	push2  %eax, %ebx
> 
> It's okay to test 32-bit operands, but more important is to test 16-bit ones, as
> only those could (also) be used with PUSH/POP.
> 

Done.

> > --- a/opcodes/i386-dis-evex-mod.h
> > +++ b/opcodes/i386-dis-evex-mod.h
> > @@ -1,4 +1,9 @@
> >  /* Nothing at present.  */
> > +  /* MOD_EVEX_MAP4_8F_R_0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_EVEX_MAP4_8F_R_0_M_1) },
> > +  },
> >    /* MOD_EVEX_MAP4_DA_PREFIX_1 */
> >    {
> >      { Bad_Opcode },
> > @@ -41,3 +46,8 @@
> >    {
> >      { "movdiri",	{ Edq, Gdq }, 0 },
> >    },
> > +  /* MOD_EVEX_MAP4_FF_R_6 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_EVEX_MAP4_FF_R_6_M_1) },
> > +  },
> 
> Same comment as before regarding additions to this file.
> 

Done.

> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> >[...]
> > @@ -9011,6 +9020,8 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
> >  	case 0x4:
> >  	  vex_table_index = EVEX_MAP4;
> >  	  ins->evex_type = evex_from_legacy;
> > +	  if (ins->address_mode != mode_64bit)
> > +	    return &bad_opcode;
> >  	  break;
> 
> This looks to belong into an earlier patch.
> 

Done.

> > @@ -9073,8 +9084,9 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
> >  	{
> >  	  /* EVEX from legacy instructions, when the EVEX.ND bit is 0,
> >  	     all bits of EVEX.vvvv and EVEX.V' must be 1.  */
> > -	  if (!ins->vex.b && (ins->vex.register_specifier
> > -				  || !ins->vex.v))
> > +	  if (ins->vex.ll || (!ins->vex.b
> > +			      && (ins->vex.register_specifier
> > +				  || !ins->vex.v)))
> >  	    return &bad_opcode;
> 
> This as well.
> 

Deleted, redundant.

> > @@ -13821,3 +13836,24 @@ PREFETCHI_Fixup (instr_info *ins, int bytemode, int sizeflag)
> >
> >    return OP_M (ins, bytemode, sizeflag);
> >  }
> > +
> > +static bool
> > +PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
> > +{
> > +  unsigned int vvvv_reg = ins->vex.register_specifier
> > +    | !ins->vex.v << 4;
> 
> Nit: Please parenthesize the shift.
> 

Done.

> > +  unsigned int rm_reg = ins->modrm.rm + (ins->rex & REX_B ? 8 : 0)
> > +    + (ins->rex2 & REX_B ? 16 : 0);
> > +
> > +  /* Here vex.b is treated as "EVEX.ND.  */
> > +  /* Push2/Pop2 cannot use RSP and Pop2 cannot pop two same
> > +     registers.  */
> 
> The two comments want folding. As to the former, though: How about having
> 
> #define nd b
> 
> in the EVEX struct declaration (provided we don't have any variables named
> "nd" right now), ...
> 
> > +  if (!ins->vex.b || vvvv_reg == 0x4 || rm_reg == 0x4
> 
> ... allowing to use ins->vex.nd here (at which point that comment is
> unnecessary)?
> 

Done.
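
A minimal, standalone sketch of the register-number composition and the `nd` alias discussed above, assuming a cut-down decoder state. `struct toy_evex`, `TOY_REX_B` and the `toy_*` helpers are hypothetical and only model the fields mentioned in the quoted hunk; they are not the real instr_info layout.

```c
#include <assert.h>

/* Hypothetical, cut-down decoder state modelled on the quoted hunk.  */
struct toy_evex
{
  unsigned int register_specifier;	/* EVEX.vvvv, already un-inverted.  */
  unsigned int v;			/* Raw EVEX.V'; nonzero: high bit clear.  */
  unsigned int b;			/* EVEX.b, doubling as ND for map 4.  */
};

/* Alias suggested in review so that ND uses read naturally.  */
#define nd b

#define TOY_REX_B 1u

/* Combine vvvv with the inverted V' bit into a 5-bit register number.
   The shift is parenthesized as requested.  */
static unsigned int
toy_vvvv_reg (const struct toy_evex *e)
{
  return e->register_specifier | ((unsigned int) !e->v << 4);
}

/* Combine ModRM.rm with the two extension bits (assumed to sit in the
   same bit position of the rex and rex2 words).  */
static unsigned int
toy_rm_reg (unsigned int modrm_rm, unsigned int rex, unsigned int rex2)
{
  assert (modrm_rm < 8);
  return modrm_rm + ((rex & TOY_REX_B) ? 8 : 0)
	 + ((rex2 & TOY_REX_B) ? 16 : 0);
}

/* Nonzero if ND is set and neither register is %rsp (number 4).  */
static int
toy_push2pop2_ok (const struct toy_evex *e, unsigned int rm_reg)
{
  return e->nd && toy_vvvv_reg (e) != 4 && rm_reg != 4;
}
```

Since unary `!` and `<<` both bind tighter than `|`, the added parentheses do not change the value; they only make the intended grouping obvious to the reader.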

> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -3494,3 +3494,10 @@ erets, 0xf20f01ca, FRED|x64, NoSuf, {}
> >  eretu, 0xf30f01ca, FRED|x64, NoSuf, {}
> >
> >  // FRED instructions end.
> > +
> > +// APX Push2/Pop2 instruction.
> > +
> > +push2, 0xff/6, APX_F, Modrm|VexW0|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
> > +push2p, 0xff/6, APX_F, Modrm|VexW1|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
> > +pop2, 0x8f/0, APX_F, Modrm|VexW0|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
> > +pop2p, 0x8f/0, APX_F, Modrm|VexW1|EVex128|Push2Pop2|EVexMap4|VexVVVVSrc|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
> 
> Like other extensions have it, there also wants to be an "end" comment.

Done.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 5/8] Support APX NDD
  2023-11-20 16:33       ` Jan Beulich
@ 2023-11-22  7:46         ` Cui, Lili
  2023-11-22  8:47           ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-22  7:46 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Kong, Lingling

> >>> --- a/opcodes/i386-dis-evex.h
> >>> +++ b/opcodes/i386-dis-evex.h
> >>> [...]
> >>> @@ -947,23 +947,23 @@ static const struct dis386 evex_table[][256] = {
> >>>      { Bad_Opcode },
> >>>      { Bad_Opcode },
> >>>      /* 40 */
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> +    { "cmovoS",		{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovnoS",	{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovbS",		{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovaeS",	{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmoveS",		{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovneS",	{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovbeS",	{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovaS",		{ VexGv, Gv, Ev }, 0 },
> >>>      /* 48 */
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> -    { Bad_Opcode },
> >>> +    { "cmovsS",		{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovnsS",	{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovpS",		{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovnpS",	{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovlS",		{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovgeS",	{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovleS",	{ VexGv, Gv, Ev }, 0 },
> >>> +    { "cmovgS",		{ VexGv, Gv, Ev }, 0 },
> >>
> >> Considering CFCMOVcc which sits at the same opcode, doing things like
> >> this sets us up for needing to touch all of these again. Maybe that's
> >> the best that can be done, but I still wonder whether this couldn't
> >> be taken care of right away when introducing these entries.
> >
> > How about adding a special letter %CF in front of them?
> 
> Why not.
>

Done.
 
> >>> --- a/opcodes/i386-dis.c
> >>> +++ b/opcodes/i386-dis.c
> >>> [...]
> >>> @@ -2660,47 +2668,47 @@ static const struct dis386 reg_table[][8] = {
> >>>    },
> >>>    /* REG_D0 */
> >>>    {
> >>> -    { "rolA",	{ Eb, I1 }, 0 },
> >>> -    { "rorA",	{ Eb, I1 }, 0 },
> >>> -    { "rclA",	{ Eb, I1 }, 0 },
> >>> -    { "rcrA",	{ Eb, I1 }, 0 },
> >>> -    { "shlA",	{ Eb, I1 }, 0 },
> >>> -    { "shrA",	{ Eb, I1 }, 0 },
> >>> -    { "shlA",	{ Eb, I1 }, 0 },
> >>> -    { "sarA",	{ Eb, I1 }, 0 },
> >>> +    { "rolA",	{ VexGb, Eb, I1 }, 0 },
> >>> +    { "rorA",	{ VexGb, Eb, I1 }, 0 },
> >>> +    { "rclA",	{ VexGb, Eb, I1 }, 0 },
> >>> +    { "rcrA",	{ VexGb, Eb, I1 }, 0 },
> >>> +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
> >>> +    { "shrA",	{ VexGb, Eb, I1 }, 0 },
> >>> +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
> >>> +    { "sarA",	{ VexGb, Eb, I1 }, 0 },
> >>>    },
> >>>    /* REG_D1 */
> >>>    {
> >>> -    { "rolQ",	{ Ev, I1 }, 0 },
> >>> -    { "rorQ",	{ Ev, I1 }, 0 },
> >>> -    { "rclQ",	{ Ev, I1 }, 0 },
> >>> -    { "rcrQ",	{ Ev, I1 }, 0 },
> >>> -    { "shlQ",	{ Ev, I1 }, 0 },
> >>> -    { "shrQ",	{ Ev, I1 }, 0 },
> >>> -    { "shlQ",	{ Ev, I1 }, 0 },
> >>> -    { "sarQ",	{ Ev, I1 }, 0 },
> >>> +    { "rolQ",	{ VexGv, Ev, I1 }, 0 },
> >>> +    { "rorQ",	{ VexGv, Ev, I1 }, 0 },
> >>> +    { "rclQ",	{ VexGv, Ev, I1 }, 0 },
> >>> +    { "rcrQ",	{ VexGv, Ev, I1 }, 0 },
> >>> +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
> >>> +    { "shrQ",	{ VexGv, Ev, I1 }, 0 },
> >>> +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
> >>> +    { "sarQ",	{ VexGv, Ev, I1 }, 0 },
> >>>    },
> >>
> >> As mentioned on the assembler side already, I think we would be
> >> better off making const_1_mode print $1 in AT&T syntax at least for
> >> these new insn forms, to eliminate the ambiguity.
> >>
> >
> > > It is related to correctness and should be revised. Since they share the same
> > > entries, I will create a new patch to modify the legacy instruction and then
> > > extend them to NDD. Do you agree?
> 
> I certainly appreciate any reusing, where it is possible (and it ought to be
> possible here, yes).
> 

Done.

> >>> @@ -9087,7 +9104,7 @@ get_valid_dis386 (const struct dis386 *dp,
> >> instr_info *ins)
> >>>  	return &err_opcode;
> >>>
> >>>        /* Set vector length.  */
> >>> -      if (ins->modrm.mod == 3 && ins->vex.b)
> >>> +      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type ==
> >>> + evex_default)
> >>>  	ins->vex.length = 512;
> >>>        else
> >>>  	{
> >>
> >> Is this change really needed for anything?
> >
> > If it's NDD and ins->vex.b ==1, we need to avoid giving NDD a wrong value.
> 
> But this is recording ->vex.length, not anything NDD related (afaics).

There are some instructions that use OP_VEX, which will use ->vex.length.

For example:
"addB",             { VexGb, Eb, Gb }

 

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 5/8] Support APX NDD
  2023-11-22  7:46         ` Cui, Lili
@ 2023-11-22  8:47           ` Jan Beulich
  2023-11-22 10:45             ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-22  8:47 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils, Kong, Lingling

On 22.11.2023 08:46, Cui, Lili wrote:
>>>>> @@ -9087,7 +9104,7 @@ get_valid_dis386 (const struct dis386 *dp,
>>>> instr_info *ins)
>>>>>  	return &err_opcode;
>>>>>
>>>>>        /* Set vector length.  */
>>>>> -      if (ins->modrm.mod == 3 && ins->vex.b)
>>>>> +      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type ==
>>>>> + evex_default)
>>>>>  	ins->vex.length = 512;
>>>>>        else
>>>>>  	{
>>>>
>>>> Is this change really needed for anything?
>>>
>>> If it's NDD and ins->vex.b ==1, we need to avoid giving NDD a wrong value.
>>
>> But this is recording ->vex.length, not anything NDD related (afaics).
> 
> There are some instructions that use OP_VEX, which will use ->vex.length.
> 
> For example:
> "addB",             { VexGb, Eb, Gb }

But that's a GPR, for which ->vex.length is not supposed to have an effect.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 6/8] Support APX Push2/Pop2
  2023-11-22  5:48     ` Cui, Lili
@ 2023-11-22  8:53       ` Jan Beulich
  2023-11-22 12:26         ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-22  8:53 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils, Mo, Zewei

On 22.11.2023 06:48, Cui, Lili wrote:
>> On 02.11.2023 12:29, Cui, Lili wrote:
>>> @@ -6854,6 +6858,24 @@ check_VecOperands (const insn_template *t)
>>>  	}
>>>      }
>>>
>>> +  /* Push2/Pop2 cannot use RSP and Pop2 cannot pop two same registers.  */
>>> +  if (t->opcode_modifier.push2pop2)
>>
>> I question this way of recognizing these two insns: You introduce a whole new
>> table column here just to have two entries set this bit.
>> This is cheaper by comparing the mnemonic offsets, as we do elsewhere in
>> various cases.
>>
> 
> Done.
> 
>> I also disagree with putting the check in check_VecOperands():
>> There's nothing vector-ish here. Either you put it straight in the caller, or you
>> introduce a new check_APX_operands().
>>
> 
> How about  putting check_EgprOperands into check_APX_operands ?
> 
>       /* Check if EGRPS operands(r16-r31) are valid.  */
>       if (check_EgprOperands (t))
>         {
>           specific_error = progress (i.error);
>           continue;
>         }
> 
>       /* Check if APX operands are valid.  */
>       if (check_APX_operands (t))
>         {
>           specific_error = progress (i.error);
>           continue;
>         }

Hmm, question and suggested code don't fit together. The suggested code
certainly fits what I was suggesting earlier.

>>> --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
>>> @@ -30,3 +30,9 @@ _start:
>>>          .byte 0xff
>>>          #{evex} inc %rax EVEX.vvvv' > 0 (illegal value).
>>>          .byte 0x62, 0xf4, 0xec, 0x08, 0xff, 0xc0
>>> +        .byte 0xff, 0xff
>>> +	# pop2 %rax, %rbx set EVEX.ND=0.
>>> +        .byte 0x62,0xf4,0x64,0x08,0x8f,0xc0
>>> +        .byte 0xff, 0xff, 0xff
>>> +	# pop2 %rax, %rsp set EVEX.VVVV=0xf.
>>> +        .byte 0x62,0xf4,0x7c,0x18,0x8f,0xc0
>>
>> This 2nd comment looks bogus. What is it that's being tested here?
>>
> 
> I think it should be  # pop2 %rax set EVEX.vvvv' = 1111. It wants to test that pop2 has only one operand when decoding.

But POP2 has two operands, one encoded in EVEX.vvvv. The use of %rsp as
an operand is what I would think is being tested here, but then I don't
see why the comment mentions EVEX.vvvv.
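
To check what the two .byte sequences quoted above actually encode, a small decoder sketch may help. It assumes the usual EVEX payload layout: map in the low three bits of the first payload byte, inverted vvvv in bits 6..3 of the second, and b (ND for map 4) in bit 4 plus inverted V' in bit 3 of the third. `toy_evex_fields` and `toy_decode_evex` are hypothetical helpers, not disassembler code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the few EVEX fields of interest here.  */
struct toy_evex_fields
{
  unsigned int map;	/* Opcode map, low three bits of payload byte 0.  */
  unsigned int vvvv;	/* 5-bit register number from V' and vvvv.  */
  unsigned int nd;	/* Payload byte 2, bit 4 (EVEX.b, ND in map 4).  */
};

/* INSN points at the 0x62 escape byte followed by three payload bytes.  */
static struct toy_evex_fields
toy_decode_evex (const uint8_t *insn)
{
  struct toy_evex_fields f;

  assert (insn[0] == 0x62);
  f.map = insn[1] & 0x7;

  /* vvvv is stored inverted in bits 6..3 of the second payload byte.  */
  f.vvvv = ~(unsigned int) (insn[2] >> 3) & 0xf;

  /* V' is stored inverted as well: a clear bit selects r16-r31.  */
  if (!(insn[3] & 0x08))
    f.vvvv |= 0x10;

  f.nd = (insn[3] >> 4) & 1;
  return f;
}
```

If the layout assumed here is right, the first sequence (0x62,0xf4,0x64,0x08) yields map 4, vvvv register 3 and ND clear, while the second (0x62,0xf4,0x7c,0x18) yields vvvv register 0 and ND set, which may be worth comparing against the register the test comment is supposed to name.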

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 5/8] Support APX NDD
  2023-11-22  8:47           ` Jan Beulich
@ 2023-11-22 10:45             ` Cui, Lili
  2023-11-23 10:57               ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-22 10:45 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Kong, Lingling



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, November 22, 2023 4:48 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org; Kong, Lingling <lingling.kong@intel.com>
> Subject: Re: [PATCH 5/8] Support APX NDD
> 
> On 22.11.2023 08:46, Cui, Lili wrote:
> >>>>> @@ -9087,7 +9104,7 @@ get_valid_dis386 (const struct dis386 *dp,
> >>>> instr_info *ins)
> >>>>>  	return &err_opcode;
> >>>>>
> >>>>>        /* Set vector length.  */
> >>>>> -      if (ins->modrm.mod == 3 && ins->vex.b)
> >>>>> +      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type ==
> >>>>> + evex_default)
> >>>>>  	ins->vex.length = 512;
> >>>>>        else
> >>>>>  	{
> >>>>
> >>>> Is this change really needed for anything?
> >>>
> >>> If it's NDD and ins->vex.b ==1, we need to avoid giving NDD a wrong
> value.
> >>
> >> But this is recording ->vex.length, not anything NDD related (afaics).
> >
> > There are some instructions that use OP_VEX, which will use ->vex.length.
> >
> > For example:
> > "addB",             { VexGb, Eb, Gb }
> 
> But that's a GPR, for which ->vex.length is not supposed to have an effect.
> 

For EVEX-promoted instructions, evex.ll == 0b00, which has the same encoding as vex.length == 128, so they can share the same processing via ->vex.length; ->vex.length also handles GPRs in OP_VEX.


  switch (ins->vex.length)
    {
    case 128:
      switch (bytemode)
        {
        case x_mode:
          names = att_names_xmm;
          ins->evex_used |= EVEX_len_used;
          break;
        case v_mode:
        case dq_mode:
          if (ins->rex & REX_W)
            names = att_names64;
          else if (bytemode == v_mode
                   && !(sizeflag & DFLAG))
            names = att_names16;
          else
            names = att_names32;
          break;
        case b_mode:
          names = att_names8rex;
          break;
        case q_mode:
          names = att_names64;
          break;
        case mask_bd_mode:
        case mask_mode:
          if (reg > 0x7)
            {
              oappend (ins, "(bad)");
              return true;
            }
          names = att_names_mask;
          break;
        default:
          abort ();
          return true;
        }
      break;
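
The interaction being debated can be sketched as a standalone function, under the assumption that L'L values 0, 1 and 2 map to 128, 256 and 512 bits and that 3 is reserved. `toy_evex_type` and `toy_vector_length` are hypothetical names that only mirror the condition in the quoted hunk.

```c
#include <assert.h>

/* Hypothetical mirror of the two EVEX flavours named in the thread.  */
enum toy_evex_type
{
  toy_evex_default,
  toy_evex_from_legacy
};

/* Return the vector length in bits, or 0 for a reserved L'L value.
   With a register operand (mod == 3) and EVEX.b set, ordinary EVEX
   insns are treated as 512-bit; for legacy-promoted insns the same
   bit means ND, so the L'L field is decoded normally instead.  */
static unsigned int
toy_vector_length (unsigned int mod, unsigned int b, unsigned int ll,
		   enum toy_evex_type type)
{
  assert (mod < 4 && ll < 4);

  if (mod == 3 && b && type == toy_evex_default)
    return 512;

  switch (ll)
    {
    case 0:
      return 128;
    case 1:
      return 256;
    case 2:
      return 512;
    default:
      return 0;
    }
}
```

Under these assumptions an NDD insn with ND set and L'L == 0 reports 128, which is the case the `case 128:` arm quoted above relies on for picking GPR names.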

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 6/8] Support APX Push2/Pop2
  2023-11-22  8:53       ` Jan Beulich
@ 2023-11-22 12:26         ` Cui, Lili
  0 siblings, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-22 12:26 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Mo, Zewei

> >>> --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> >>> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> >>> @@ -30,3 +30,9 @@ _start:
> >>>          .byte 0xff
> >>>          #{evex} inc %rax EVEX.vvvv' > 0 (illegal value).
> >>>          .byte 0x62, 0xf4, 0xec, 0x08, 0xff, 0xc0
> >>> +        .byte 0xff, 0xff
> >>> +	# pop2 %rax, %rbx set EVEX.ND=0.
> >>> +        .byte 0x62,0xf4,0x64,0x08,0x8f,0xc0
> >>> +        .byte 0xff, 0xff, 0xff
> >>> +	# pop2 %rax, %rsp set EVEX.VVVV=0xf.
> >>> +        .byte 0x62,0xf4,0x7c,0x18,0x8f,0xc0
> >>
> >> This 2nd comment looks bogus. What is it that's being tested here?
> >>
> >
> > I think it should be  # pop2 %rax set EVEX.vvvv' = 1111. It wants to test that
> pop2 has only one operand when decoding.
> 
> But POP2 has two operands, one encoded in EVEX.vvvv. The use of %rsp as an
> operand is what I would think is being tested here, but then I don't see why
> the comment mentions EVEX.vvvv.
> 

Ok, I changed it to test %rsp.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
  2023-11-17 14:38           ` Jan Beulich
@ 2023-11-22 13:40             ` Cui, Lili
  0 siblings, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-22 13:40 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 17, 2023 10:38 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 4/8] Add tests for APX GPR32 with extend evex prefix
> 
> On 17.11.2023 13:42, Cui, Lili wrote:
> >> Subject: Re: [PATCH 4/8] Add tests for APX GPR32 with extend evex
> >> prefix
> >>
> >> On 16.11.2023 16:34, Cui, Lili wrote:
> >>> I'm confused here about adding crc test case in noreg64.s, could you
> >> elaborate on what testcase you want to add?
> >>>
> >>>         pfx crc32       (%rax), %eax
> >>>         pfx16 crc32     (%rax), %rax
> >>> +       pfx crc32       (%r31),%r21d   ---> data size prefix invalid with `crc32'
> >>> +       pfx crc32       (%r31),%r21     ---> data size prefix invalid with `crc32'
> >>
> >> Well, of course you can't use the "pfx" macro (at least not as is),
> >> which will emit a data size prefix when DATA16 is defined. Likewise it
> would emit "rex64"
> >> when REX64 is defined, which doesn't make sense with EVEX-encoded
> insns.
> >> Ideally you would introduce a new macro to control operand size in an
> >> EVEX- like manner, just that I'm afraid that the way you're adding
> >> EVEX- encoding support to gas doesn't offer any means equivalent to
> >> that of legacy encodings. Hence only the "bare" EVEX-encoded insns
> >> (without the use of any
> >> pfx*) should be added for the time being.
> >>
> >> Also, ftaod, CRC32 was only an example here. Any new template you add
> >> which allows for potentially ambiguous operand size will need an
> >> example added here. This set of tests (noreg64*) is intended to be
> >> (and remain) exhaustive.
> >>
> >> Albeit, thinking a little further, perhaps you simply want to
> >> introduce a noreg64-evex.d referencing the same source file, but
> >> arranging for {evex} to be emitted in the pfx macro (or a further
> >> clone thereof, as some of the insns cannot be EVEX-encoded)? That
> >> would then also deal with covering all relevant new templates (I
> >> think). You'd need to check what, if anything, needs doing to the
> >> pfx16 and pfx64 macros. But of course you could also introduce a
> >> fully standalone noreg64-apx.{s,d} test, to escape some of the possible
> hassles.
> >>
> >
> > I listed some tests, most of EVEX-promoted instructions support prefix 66,
> we included all of these test cases in Part II 1/6 (except for crc32 which is
> already listed in the current file). Part II 1/6 it is suspended, because it also
> covers the NF patch instructions.
> >
> >         /* Set EVEX.pp to 66.  */
> >         crc32  %r31w,%r21d
> >         crc32w (%r31),%r21d
> >         adc $1, (%r31w)
> 
> This one ought to be a mistake.
> 
> >         adcw $1, (%r31)
> >
> >         /* Set EVEX.W to 1.  */
> >         crc32  %rax,%r18
> >         adc %r15,%r16
> 
> Of the above most aren't ambiguous as to operand size. The purpose of the
> test (or group of tests) is not so much to check correct encoding (except of
> course to prove correct [aka intended] choice of defaults), but to check that
> all ambiguities are properly detected and reported (with the exception of a
> few where H.J. is of the opinion that they shouldn't be diagnosed in AT&T
> mode, even if that lack of diagnostics had - back at the time - allowed for a gcc
> bug to go unnoticed for quite some time).
> 
> Therefore if e.g. "data16" cannot be used with an insn (as is the case for
> EVEX-encoded ones), there's also no need to have special checking for the
> EVEX.pp=01 case. Thus my suggestion to simply arrange for the pfx macro to
> emit {evex} prefixes (or to clone the test in order to escape issues with insns
> which don't allow for EVEX encodings).
> 

OK, some APX instructions are still only supported in the NF patch, so we will add the entire test case after it.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 5/8] Support APX NDD
  2023-11-22 10:45             ` Cui, Lili
@ 2023-11-23 10:57               ` Jan Beulich
  2023-11-23 12:14                 ` Cui, Lili
  2023-11-24  6:56                 ` [PATCH v3 0/9] Support Intel APX EGPR Cui, Lili
  0 siblings, 2 replies; 113+ messages in thread
From: Jan Beulich @ 2023-11-23 10:57 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils, Kong, Lingling

On 22.11.2023 11:45, Cui, Lili wrote:
> 
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, November 22, 2023 4:48 PM
>> To: Cui, Lili <lili.cui@intel.com>
>> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
>> binutils@sourceware.org; Kong, Lingling <lingling.kong@intel.com>
>> Subject: Re: [PATCH 5/8] Support APX NDD
>>
>> On 22.11.2023 08:46, Cui, Lili wrote:
>>>>>>> @@ -9087,7 +9104,7 @@ get_valid_dis386 (const struct dis386 *dp,
>>>>>> instr_info *ins)
>>>>>>>  	return &err_opcode;
>>>>>>>
>>>>>>>        /* Set vector length.  */
>>>>>>> -      if (ins->modrm.mod == 3 && ins->vex.b)
>>>>>>> +      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type ==
>>>>>>> + evex_default)
>>>>>>>  	ins->vex.length = 512;
>>>>>>>        else
>>>>>>>  	{
>>>>>>
>>>>>> Is this change really needed for anything?
>>>>>
>>>>> If it's NDD and ins->vex.b ==1, we need to avoid giving NDD a wrong
>> value.
>>>>
>>>> But this is recording ->vex.length, not anything NDD related (afaics).
>>>
>>> There are some instructions that use OP_VEX, which will use ->vex.length.
>>>
>>> For example:
>>> "addB",             { VexGb, Eb, Gb }
>>
>> But that's a GPR, for which ->vex.length is not supposed to have an effect.
>>
> 
> For EVEX-promoted instructions,  evex.ll == 0b00, which has the same encoding as vex.length == 128 and they can share the same processing with ->vex.length, ->vex.length also handles GPR in OP_VEX.
> 
> 
>   switch (ins->vex.length)
>     {
>     case 128:
>       switch (bytemode)
>         {
>         case x_mode:
>           names = att_names_xmm;
>           ins->evex_used |= EVEX_len_used;
>           break;
>         case v_mode:
>         case dq_mode:
>           if (ins->rex & REX_W)
>             names = att_names64;
>           else if (bytemode == v_mode
>                    && !(sizeflag & DFLAG))
>             names = att_names16;
>           else
>             names = att_names32;
>           break;
>         case b_mode:
>           names = att_names8rex;
>           break;
>         case q_mode:
>           names = att_names64;
>           break;
>         case mask_bd_mode:
>         case mask_mode:
>           if (reg > 0x7)
>             {
>               oappend (ins, "(bad)");
>               return true;
>             }
>           names = att_names_mask;
>           break;
>         default:
>           abort ();
>           return true;
>         }
>       break;

Hmm, okay, I see that this is then down to an anomaly in pre-existing code.
I don't think any of what's quoted above should actually depend on
->vex.length; it'll surely need to change once the first VEX/EVEX-encoded
insn appears which has .l / .ll != 0 but still encodes a GPR. Yet that's
nothing I can sensibly demand you fix up front. With that left as is, what
I'd like to ask for (re-iterating earlier remarks): For anything not obvious
(which this falls under), please add some explanation to the patch description.
The general issue there is that while ChangeLog entries enumerate what is
being done, they hardly ever say _why_ certain changes are needed.
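
To illustrate what I mean by the anomaly, here is a rough sketch (not the
actual i386-dis.c code; gpr_width and the GPR_*_MODE names are made up for
this mail) of GPR width selection that mirrors the quoted "case 128" block
but takes no vector length at all:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for b_mode, v_mode, dq_mode and q_mode.  */
enum gpr_mode { GPR_B_MODE, GPR_V_MODE, GPR_DQ_MODE, GPR_Q_MODE };

/* Return the GPR width in bits.  Only the operand mode, REX.W and the
   data-size flag matter; there is no vector-length parameter.  */
unsigned int
gpr_width (enum gpr_mode mode, bool rex_w, bool data32)
{
  switch (mode)
    {
    case GPR_B_MODE:
      return 8;
    case GPR_Q_MODE:
      return 64;
    case GPR_V_MODE:
    case GPR_DQ_MODE:
      if (rex_w)
        return 64;
      if (mode == GPR_V_MODE && !data32)
        return 16;
      return 32;
    }
  assert (!"unknown mode");
  return 0;
}
```

With a helper shaped like this, an insn with .l / .ll != 0 that still
encodes a GPR would need no special casing.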

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 5/8] Support APX NDD
  2023-11-23 10:57               ` Jan Beulich
@ 2023-11-23 12:14                 ` Cui, Lili
  2023-11-24  6:56                 ` [PATCH v3 0/9] Support Intel APX EGPR Cui, Lili
  1 sibling, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-11-23 12:14 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Kong, Lingling



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, November 23, 2023 6:58 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org; Kong, Lingling <lingling.kong@intel.com>
> Subject: Re: [PATCH 5/8] Support APX NDD
> 
> On 22.11.2023 11:45, Cui, Lili wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Wednesday, November 22, 2023 4:48 PM
> >> To: Cui, Lili <lili.cui@intel.com>
> >> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> >> binutils@sourceware.org; Kong, Lingling <lingling.kong@intel.com>
> >> Subject: Re: [PATCH 5/8] Support APX NDD
> >>
> >> On 22.11.2023 08:46, Cui, Lili wrote:
> >>>>>>> @@ -9087,7 +9104,7 @@ get_valid_dis386 (const struct dis386
> *dp,
> >>>>>> instr_info *ins)
> >>>>>>>  	return &err_opcode;
> >>>>>>>
> >>>>>>>        /* Set vector length.  */
> >>>>>>> -      if (ins->modrm.mod == 3 && ins->vex.b)
> >>>>>>> +      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type
> >>>>>>> + ==
> >>>>>>> + evex_default)
> >>>>>>>  	ins->vex.length = 512;
> >>>>>>>        else
> >>>>>>>  	{
> >>>>>>
> >>>>>> Is this change really needed for anything?
> >>>>>
> >>>>> If it's NDD and ins->vex.b ==1, we need to avoid giving NDD a
> >>>>> wrong
> >> value.
> >>>>
> >>>> But this is recording ->vex.length, not anything NDD related (afaics).
> >>>
> >>> There are some instructions that use OP_VEX, which will use ->vex.length.
> >>>
> >>> For example:
> >>> "addB",             { VexGb, Eb, Gb }
> >>
> >> But that's a GPR, for which ->vex.length is not supposed to have an effect.
> >>
> >
> > For EVEX-promoted instructions,  evex.ll == 0b00, which has the same
> encoding as vex.length == 128 and they can share the same processing with -
> >vex.length, ->vex.length also handles GPR in OP_VEX.
> >
> >
> >   switch (ins->vex.length)
> >     {
> >     case 128:
> >       switch (bytemode)
> >         {
> >         case x_mode:
> >           names = att_names_xmm;
> >           ins->evex_used |= EVEX_len_used;
> >           break;
> >         case v_mode:
> >         case dq_mode:
> >           if (ins->rex & REX_W)
> >             names = att_names64;
> >           else if (bytemode == v_mode
> >                    && !(sizeflag & DFLAG))
> >             names = att_names16;
> >           else
> >             names = att_names32;
> >           break;
> >         case b_mode:
> >           names = att_names8rex;
> >           break;
> >         case q_mode:
> >           names = att_names64;
> >           break;
> >         case mask_bd_mode:
> >         case mask_mode:
> >           if (reg > 0x7)
> >             {
> >               oappend (ins, "(bad)");
> >               return true;
> >             }
> >           names = att_names_mask;
> >           break;
> >         default:
> >           abort ();
> >           return true;
> >         }
> >       break;
> 
> Hmm, okay, I see that this is then down to an anomaly in pre-existing code.
> I don't think any of what's quoted above should actually depend on
> ->vex.length; it'll surely need to change once the first
> ->VEX/EVEX-encoded
> insns appears while has .l / .ll != 0 but still encodes a GPR. Yet then nothing I
> can sensibly demand you fix up front. Without which what I'd like to ask for
> (re-iterating earlier remarks): For anything not obvious (which this falls under),
> please add some explanation to the patch description.
> The general issue there is that while ChangeLog entries enumerate what is
> being done, they hardly ever say _why_ certain changes are needed.
> 
Thank you for the suggestion; comments really are important. These patches were not written by me, and when you asked about the many odd places without comments, I often could not understand them either. I could only remove the code, rebuild, and infer its purpose from the test cases that then failed. This made me realize how important comments are.

Regards,
Lili.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 8/8] Support APX JMPABS
  2023-11-14 11:15       ` Jan Beulich
@ 2023-11-24  5:40         ` Hu, Lin1
  2023-11-24  7:21           ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-24  5:40 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, November 14, 2023 7:15 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org; Cui, Lili <lili.cui@intel.com>
> Subject: Re: [PATCH 8/8] Support APX JMPABS
> 
> On 14.11.2023 04:26, Hu, Lin1 wrote:
> >  > On 02.11.2023 12:29, Cui, Lili wrote:
> >> There's a further question regarding its operand representation,
> >> though: Can you explain why it's Imm64, not Disp64? The latter would,
> >> to me, seem more natural to use here. Not only from a assembler
> >> internals perspective, but also from the users' one: The $ in the
> >> operand carries absolutely no meaning (see also the related testcase
> >> comment below) in AT&T syntax, and there's no noticeable difference in Intel
> syntax afaict.
> >
> > In my opinion, If compiler want  to jump "anywhere" and the displacement can
> not fit in a 32-bit immediate , compiler will fallback to indirect branches.
> 
> Unless it knows that it may use this ISA extension.
> 
> > My current knowledge is that jmpabs came about as a solution to the problem
> about indirect braches. It's not the same as the jmp. Currently the parameters of
> jmpabs are absolute addresses optimized by PLT or JIT. I think using imm64
> avoids confusion with the disp64. That's why the designer designed it this way.
> >
> > One colleague in our group have written an introductory document can
> > be referred. (https://kanrobert.github.io/rfc/All-about-APX-JMPABS/)
> 
> Well, "Compiler usages" there ignores other than small-code-model programs.
> Furthermore "none currently" doesn't mean the assembler shouldn't support all
> possible uses. If I was going from what's said there, there wouldn't be a need to
> support JMPABS in gas at all.
>

We've decided to support JMPABS only in the disassembler for now. For assembler support, we are going to wait for H.J. to come back and see whether the linker can support the way "jmpabs symbol" is used.

>
> >>> @@ -8939,6 +8940,9 @@ process_operands (void)
> >>>  	}
> >>>      }
> >>>
> >>> +  if (i.tm.mnem_off == MN_jmpabs)
> >>> +    i.rex2_encoding = true;
> >>
> >> Please see my earlier remarks wrt "rex2" vs "{rex2}". What you do
> >> here is effect the latter. Yet as indicated, the pseudo-prefix isn't
> >> really an indication of "must have REX2 prefix", but only a weak
> >> request to do so if possible. I think you want to set i.rex2 here
> >> instead, requiring a way to express that an empty REX2 prefix is wanted.
> >>
> >
> > But in terms of encoding, i.rex2 should be 0. Can I do special handling in
> build_rex2_prefix?
> 
> build_rex2_prefix() wants to remain generic. What I was trying to hint at though
> is that it ought to be possible to set bits in i.rex2 (to make it non-zero) which
> then aren't encoded into the REX2 payload byte (leveraging that only the low
> three bits are actually contributing to the final encoding). The important point is
> that both i.rex2 and i.rex2_encoding retain the specific meaning they are
> intended to have.
>

I have set i.rex2 = 16;
 
>
> >>> --- /dev/null
> >>> +++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
> >>> @@ -0,0 +1,17 @@
> >>> +# Check bytecode of APX_F jmpabs instructions with illegal encode.
> >>> +
> >>> +	.text
> >>> +# With 66 prefix
> >>> +	.byte
> >> 0x66,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>> +	.byte 0x66,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>> +# With 67 prefix
> >>> +	.byte
> >> 0x67,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>> +	.byte 0x67,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>> +# With F2 prefix
> >>> +	.byte
> >> 0xf2,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>> +	.byte 0xf2,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>> +# With F3 prefix
> >>> +	.byte
> >> 0xf3,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>> +	.byte 0xf3,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>> +# REX2.M0 = 0 REX2.W = 1
> >>> +	.byte 0xd5,0x08,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>
> >> As per earlier comments: This wants expressing via .insn, to yield
> >> input to gas human-readable (even if, as it looks, two .insn are
> >> going to be required per resulting construct). Further in the last
> >> comment, why is
> >> REX2.M0 mentioned there, but not elsewhere? Also what purpose serve
> >> the
> >> 0x64 bytes here? The encodings are invalid irrespective of them.
> >> Instead I'd kind have expected LOCK to also be covered.
> >>
> >
> > Because this error line is only for the special case where M0 == 0, and
> base_opcode == 0xa1, W should be 0, other than 1. If M0 = 1, W = 1,
> base_opcode == 0xa1, I think it could decoding as mov rax, moffs or ( some
> future insn). Elsewhere it's just excluding invalid prefixes.
> 
> Yet REX2.M == 0 is as relevant there (until such time where some of those
> prefixes used is assigned meaning).
>

I don't think so. The original decoding with W = 1 was MOV RAX, moffs, but that insn cannot be used with REX2. So if someone inputs, via .byte, an encoding with REX2.M = 0 && REX2.W = 1, it should be a Bad_Opcode.

> 
> >> Also a spec question as we're talking of what is or is not valid (i.e.
> >> causing #UD) here: Why would XCR0.APX=0 need to cause #UD? There's no
> >> use of eGPR-s here.
> >>
> >
> > Sorry, what is XCR0.APX?
> 
> Bit 19 of the XCR0 register. It is mentioned in exactly this way in the APX-
> LEGACY-JMPABS exception class description.
>

I think XCR0.APX is a state bit controlling whether the APX instruction set is supported, not whether eGPR-s are supported.
 
>
> >>> --- a/opcodes/i386-dis.c
> >>> +++ b/opcodes/i386-dis.c
> >>> @@ -106,6 +106,7 @@ static bool MOVSXD_Fixup (instr_info *, int, int);
> >>>  static bool DistinctDest_Fixup (instr_info *, int, int);
> >>>  static bool PREFETCHI_Fixup (instr_info *, int, int);
> >>>  static bool PUSH2_POP2_Fixup (instr_info *, int, int);
> >>> +static bool JMPABS_Fixup (instr_info *, int, int);
> >>>
> >>>  static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
> >>>  						enum disassembler_style,
> >>> @@ -258,6 +259,9 @@ struct instr_info
> >>>    char scale_char;
> >>>
> >>>    enum x86_64_isa isa64;
> >>> +
> >>> +  /* Remember if the current op is jmpabs.  */
> >>> +  bool is_jmpabs;
> >>>  };
> >>
> >> This field would probably best live next to op_is_jump (and then also
> >> be named op_is_jmpabs, assuming a separate boolean is indeed needed).
> >> I further expect that op_is_jump also wants setting for JMPABS.
> >>
> >
> > Can I change op_is_jump's type from bool to unsigned int?
> 
> If you need it to hold a 3rd value, perhaps. Albeit more to an enum then than to
> unsigned int.
>

OK, I have modified it.
 
BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* [PATCH v3 0/9] Support Intel APX EGPR
  2023-11-23 10:57               ` Jan Beulich
  2023-11-23 12:14                 ` Cui, Lili
@ 2023-11-24  6:56                 ` Cui, Lili
  2023-12-07  8:17                   ` Cui, Lili
  1 sibling, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-11-24  6:56 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils


This is V3 of all APX patches.
1. Created a patch to make const_1_mode print $1 in AT&T syntax.
2. How to print the rex2 prefix needs to be discussed later.
3. After the NF patch, tests for the pfx macros emitting {evex} need to be added to noreg64.s.

Cui, Lili (5):
  Make const_1_mode print $1 in AT&T syntax
  Support APX GPR32 with rex2 prefix
  Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
  Support APX GPR32 with extend evex prefix
  Add tests for APX GPR32 with extend evex prefix

Hu, Lin1 (2):
  Support APX NDD optimized encoding.
  Support APX JMPABS for disassembler

Mo, Zewei (1):
  Support APX Push2/Pop2

konglin1 (1):
  Support APX NDD


Thanks,
Lili.
 gas/config/tc-i386.c                          | 472 +++++++++++--
 gas/doc/c-i386.texi                           |   6 +-
 gas/testsuite/gas/i386/apx-push2pop2-inval.l  |   5 +
 gas/testsuite/gas/i386/apx-push2pop2-inval.s  |   9 +
 gas/testsuite/gas/i386/i386.exp               |   1 +
 .../i386/ilp32/x86-64-opcode-inval-intel.d    |  47 +-
 .../gas/i386/ilp32/x86-64-opcode-inval.d      |  47 +-
 gas/testsuite/gas/i386/intel.d                |   6 +-
 gas/testsuite/gas/i386/lfence-load.d          |   2 +-
 gas/testsuite/gas/i386/noreg16-data32.d       |  32 +-
 gas/testsuite/gas/i386/noreg16.d              |  32 +-
 gas/testsuite/gas/i386/noreg32-data16.d       |  32 +-
 gas/testsuite/gas/i386/noreg32.d              |  32 +-
 gas/testsuite/gas/i386/noreg64-data16.d       |  32 +-
 gas/testsuite/gas/i386/noreg64-rex64.d        |  32 +-
 gas/testsuite/gas/i386/noreg64.d              |  32 +-
 gas/testsuite/gas/i386/opcode-suffix.d        |   6 +-
 gas/testsuite/gas/i386/opcode.d               |  10 +-
 .../gas/i386/x86-64-apx-egpr-inval.l          | 203 ++++++
 .../gas/i386/x86-64-apx-egpr-inval.s          | 210 ++++++
 .../gas/i386/x86-64-apx-egpr-promote-inval.l  |  20 +
 .../gas/i386/x86-64-apx-egpr-promote-inval.s  |  29 +
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d |  20 +
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s |  21 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |  34 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |  36 +
 .../gas/i386/x86-64-apx-evex-promoted-intel.d | 318 +++++++++
 .../gas/i386/x86-64-apx-evex-promoted.d       | 318 +++++++++
 .../gas/i386/x86-64-apx-evex-promoted.s       | 314 +++++++++
 .../gas/i386/x86-64-apx-jmpabs-intel.d        |  11 +
 .../gas/i386/x86-64-apx-jmpabs-inval.d        |  40 ++
 .../gas/i386/x86-64-apx-jmpabs-inval.s        |  15 +
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d    |  11 +
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s    |   5 +
 .../gas/i386/x86-64-apx-ndd-optimize.d        | 130 ++++
 .../gas/i386/x86-64-apx-ndd-optimize.s        | 123 ++++
 gas/testsuite/gas/i386/x86-64-apx-ndd.d       | 160 +++++
 gas/testsuite/gas/i386/x86-64-apx-ndd.s       | 155 +++++
 .../gas/i386/x86-64-apx-push2pop2-intel.d     |  42 ++
 .../gas/i386/x86-64-apx-push2pop2-inval.l     |  13 +
 .../gas/i386/x86-64-apx-push2pop2-inval.s     |  17 +
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d |  42 ++
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s |  39 ++
 gas/testsuite/gas/i386/x86-64-apx-rex2.d      |  83 +++
 gas/testsuite/gas/i386/x86-64-apx-rex2.s      |  86 +++
 gas/testsuite/gas/i386/x86-64-evex.d          |   2 +-
 gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |   6 +
 gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |   4 +
 gas/testsuite/gas/i386/x86-64-lfence-load.d   |   2 +-
 .../gas/i386/x86-64-opcode-inval-intel.d      |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.d  |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.s  |   4 -
 gas/testsuite/gas/i386/x86-64-opcode.d        |   6 +-
 gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |  59 +-
 gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |  58 ++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |  63 ++
 gas/testsuite/gas/i386/x86-64-pseudos.s       |  65 ++
 gas/testsuite/gas/i386/x86-64.exp             |  17 +-
 include/opcode/i386.h                         |   2 +
 opcodes/i386-dis-evex-len.h                   |  10 +
 opcodes/i386-dis-evex-prefix.h                |  66 ++
 opcodes/i386-dis-evex-reg.h                   |  71 ++
 opcodes/i386-dis-evex-w.h                     |  10 +
 opcodes/i386-dis-evex-x86-64.h                |  60 ++
 opcodes/i386-dis-evex.h                       | 347 +++++++++-
 opcodes/i386-dis.c                            | 644 +++++++++++++-----
 opcodes/i386-gen.c                            |  57 +-
 opcodes/i386-opc.h                            |  30 +-
 opcodes/i386-opc.tbl                          | 210 ++++--
 opcodes/i386-reg.tbl                          |  64 ++
 70 files changed, 4669 insertions(+), 570 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s
 create mode 100644 opcodes/i386-dis-evex-x86-64.h

--
2.25.1


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 8/8] Support APX JMPABS
  2023-11-24  5:40         ` Hu, Lin1
@ 2023-11-24  7:21           ` Jan Beulich
  2023-11-27  2:16             ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-24  7:21 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

On 24.11.2023 06:40, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Tuesday, November 14, 2023 7:15 PM
>>
>> On 14.11.2023 04:26, Hu, Lin1 wrote:
>>>  > On 02.11.2023 12:29, Cui, Lili wrote:
>>>>> @@ -8939,6 +8940,9 @@ process_operands (void)
>>>>>  	}
>>>>>      }
>>>>>
>>>>> +  if (i.tm.mnem_off == MN_jmpabs)
>>>>> +    i.rex2_encoding = true;
>>>>
>>>> Please see my earlier remarks wrt "rex2" vs "{rex2}". What you do
>>>> here is effect the latter. Yet as indicated, the pseudo-prefix isn't
>>>> really an indication of "must have REX2 prefix", but only a weak
>>>> request to do so if possible. I think you want to set i.rex2 here
>>>> instead, requiring a way to express that an empty REX2 prefix is wanted.
>>>>
>>>
>>> But in terms of encoding, i.rex2 should be 0. Can I do special handling in
>> build_rex2_prefix?
>>
>> build_rex2_prefix() wants to remain generic. What I was trying to hint at though
>> is that it ought to be possible to set bits in i.rex2 (to make it non-zero) which
>> then aren't encoded into the REX2 payload byte (leveraging that only the low
>> three bits are actually contributing to the final encoding). The important point is
>> that both i.rex2 and i.rex2_encoding retain the specific meaning they are
>> intended to have.
>>
> 
> I have set i.rex2 = 16;

But hopefully not exactly this way, but using a self-descriptive #define for
the integer literal.
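
Roughly along these lines (a sketch only; REX2_FORCE_EMPTY,
REX2_PAYLOAD_MASK and the two helpers are names invented here, not anything
present in tc-i386.c):

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: only the low three bits of the field reach the encoding.  */
#define REX2_PAYLOAD_MASK 0x7u
/* Hypothetical marker bit: "emit a REX2 prefix even if its payload is 0".  */
#define REX2_FORCE_EMPTY  0x10u

/* The marker must never leak into the encoded payload.  */
_Static_assert ((REX2_FORCE_EMPTY & REX2_PAYLOAD_MASK) == 0,
                "marker overlaps payload bits");

/* A prefix is wanted whenever any bit, payload or marker, is set.  */
bool
rex2_prefix_wanted (unsigned int rex2)
{
  return rex2 != 0;
}

/* Only the payload bits contribute to the final encoding.  */
unsigned int
rex2_payload_bits (unsigned int rex2)
{
  return rex2 & REX2_PAYLOAD_MASK;
}
```

That way the generic prefix builder needs no knowledge of JMPABS, and the
literal 16 is documented at its definition.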

>>>>> --- /dev/null
>>>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
>>>>> @@ -0,0 +1,17 @@
>>>>> +# Check bytecode of APX_F jmpabs instructions with illegal encode.
>>>>> +
>>>>> +	.text
>>>>> +# With 66 prefix
>>>>> +	.byte
>>>> 0x66,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>> +	.byte 0x66,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>> +# With 67 prefix
>>>>> +	.byte
>>>> 0x67,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>> +	.byte 0x67,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>> +# With F2 prefix
>>>>> +	.byte
>>>> 0xf2,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>> +	.byte 0xf2,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>> +# With F3 prefix
>>>>> +	.byte
>>>> 0xf3,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>> +	.byte 0xf3,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>> +# REX2.M0 = 0 REX2.W = 1
>>>>> +	.byte 0xd5,0x08,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>
>>>> As per earlier comments: This wants expressing via .insn, to yield
>>>> input to gas human-readable (even if, as it looks, two .insn are
>>>> going to be required per resulting construct). Further in the last
>>>> comment, why is
>>>> REX2.M0 mentioned there, but not elsewhere? Also what purpose serve
>>>> the
>>>> 0x64 bytes here? The encodings are invalid irrespective of them.
>>>> Instead I'd kind have expected LOCK to also be covered.
>>>>
>>>
>>> Because this error line is only for the special case where M0 == 0, and
>> base_opcode == 0xa1, W should be 0, other than 1. If M0 = 1, W = 1,
>> base_opcode == 0xa1, I think it could decoding as mov rax, moffs or ( some
>> future insn). Elsewhere it's just excluding invalid prefixes.
>>
>> Yet REX2.M == 0 is as relevant there (until such time where some of those
>> prefixes used is assigned meaning).
>>
> 
> I don't think so. the original result with W = 1 was MOV RAX, moffs, but the insn can't support REX2. So If someone input .byte REX2.M = 0 && REX2.W = 1, it should be a Bad_Opcode. 

I fully agree with this. But that doesn't invalidate my original comment:
REX2.M is relevant in all of these tests, and hence comments end up
inconsistent. At the risk of repeating myself - especially when you use
.byte for encoding an instruction, what that (unreadable) byte sequence
is supposed to encode wants to be properly stated in a comment then.
When using .insn, some parts may become self-descriptive, hence why
generally .insn ought to be preferred over .byte.
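
As an illustration of what such a comment has to spell out, here is a small
sketch decoding the byte following 0xd5. It assumes the payload layout
M0, R4, X4, B4, W, R3, X3, B3 from bit 7 down to bit 0, which is how I read
the APX spec; struct rex2_fields and decode_rex2_payload are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Individual fields of the REX2 payload byte (assumed layout, see above).  */
struct rex2_fields
{
  bool m0, r4, x4, b4, w, r3, x3, b3;
};

/* Split the payload byte into its eight one-bit fields.  */
struct rex2_fields
decode_rex2_payload (uint8_t payload)
{
  struct rex2_fields f;

  f.m0 = (payload >> 7) & 1;
  f.r4 = (payload >> 6) & 1;
  f.x4 = (payload >> 5) & 1;
  f.b4 = (payload >> 4) & 1;
  f.w  = (payload >> 3) & 1;
  f.r3 = (payload >> 2) & 1;
  f.x3 = (payload >> 1) & 1;
  f.b3 = payload & 1;
  return f;
}
```

Under that assumption the 0xd5,0x08 pair in the last test decodes to
M0 = 0, W = 1, while the 0xd5,0x00 pairs in the other tests have M0 = 0 and
W = 0, which is what each comment should state.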

>>>> Also a spec question as we're talking of what is or is not valid (i.e.
>>>> causing #UD) here: Why would XCR0.APX=0 need to cause #UD? There's no
>>>> use of eGPR-s here.
>>>>
>>>
>>> Sorry, what is XCR0.APX?
>>
>> Bit 19 of the XCR0 register. It is mentioned in exactly this way in the APX-
>> LEGACY-JMPABS exception class description.
>>
> 
> I think XCR0.APX is a state bit to control if support APX instruction set not if support eGPR-s.

No, XCR0 very certainly is a set of controls affecting register use. If
JMPABS is also controlled by it, then I'd view this as an erratum if it
appeared like that in silicon; that erratum may well be a design one then,
or one "justified" by simplifying the implementation in some way, but it
would still be wrong from a conceptual pov.
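
For reference, the check under discussion amounts to no more than the
following sketch (XCR0_APX_BIT and xcr0_apx_enabled are names invented for
this mail; the value passed in would come from XGETBV, which is left out
here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 19 of XCR0, as referenced in the exception class description.  */
#define XCR0_APX_BIT 19

/* Return true if the given XCR0 value has the APX state bit set.  */
bool
xcr0_apx_enabled (uint64_t xcr0)
{
  return (xcr0 >> XCR0_APX_BIT) & 1;
}
```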

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 8/8] Support APX JMPABS
  2023-11-24  7:21           ` Jan Beulich
@ 2023-11-27  2:16             ` Hu, Lin1
  2023-11-27  8:03               ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-27  2:16 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 24, 2023 3:21 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org; Cui, Lili <lili.cui@intel.com>
> Subject: Re: [PATCH 8/8] Support APX JMPABS
> 
> On 24.11.2023 06:40, Hu, Lin1 wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Tuesday, November 14, 2023 7:15 PM
> >>
> >> On 14.11.2023 04:26, Hu, Lin1 wrote:
> >>>  > On 02.11.2023 12:29, Cui, Lili wrote:
> >>>>> @@ -8939,6 +8940,9 @@ process_operands (void)
> >>>>>  	}
> >>>>>      }
> >>>>>
> >>>>> +  if (i.tm.mnem_off == MN_jmpabs)
> >>>>> +    i.rex2_encoding = true;
> >>>>
> >>>> Please see my earlier remarks wrt "rex2" vs "{rex2}". What you do
> >>>> here is effect the latter. Yet as indicated, the pseudo-prefix
> >>>> isn't really an indication of "must have REX2 prefix", but only a
> >>>> weak request to do so if possible. I think you want to set i.rex2
> >>>> here instead, requiring a way to express that an empty REX2 prefix is
> wanted.
> >>>>
> >>>
> >>> But in terms of encoding, i.rex2 should be 0. Can I do special
> >>> handling in
> >> build_rex2_prefix?
> >>
> >> build_rex2_prefix() wants to remain generic. What I was trying to
> >> hint at though is that it ought to be possible to set bits in i.rex2
> >> (to make it non-zero) which then aren't encoded into the REX2 payload
> >> byte (leveraging that only the low three bits are actually
> >> contributing to the final encoding). The important point is that both
> >> i.rex2 and i.rex2_encoding retain the specific meaning they are intended to
> have.
> >>
> >
> > I have set i.rex2 = 16;
> 
> But hopefully not exactly this way, but using a self-descriptive #define for the
> integer literal.
>

OK.

> 
> >>>>> --- /dev/null
> >>>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
> >>>>> @@ -0,0 +1,17 @@
> >>>>> +# Check bytecode of APX_F jmpabs instructions with illegal encode.
> >>>>> +
> >>>>> +	.text
> >>>>> +# With 66 prefix
> >>>>> +	.byte
> >>>> 0x66,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>>>> +	.byte
> >>>>> +0x66,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>>>> +# With 67 prefix
> >>>>> +	.byte
> >>>> 0x67,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>>>> +	.byte
> >>>>> +0x67,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>>>> +# With F2 prefix
> >>>>> +	.byte
> >>>> 0xf2,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>>>> +	.byte
> >>>>> +0xf2,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>>>> +# With F3 prefix
> >>>>> +	.byte
> >>>> 0xf3,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>>>> +	.byte
> >>>>> +0xf3,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>>>> +# REX2.M0 = 0 REX2.W = 1
> >>>>> +	.byte 0xd5,0x08,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
> >>>>
> >>>> As per earlier comments: This wants expressing via .insn, to yield
> >>>> input to gas human-readable (even if, as it looks, two .insn are
> >>>> going to be required per resulting construct). Further in the last
> >>>> comment, why is
> >>>> REX2.M0 mentioned there, but not elsewhere? Also what purpose serve
> >>>> the
> >>>> 0x64 bytes here? The encodings are invalid irrespective of them.
> >>>> Instead I'd kind have expected LOCK to also be covered.
> >>>>
> >>>
> >>> Because this error line is only for the special case where M0 == 0,
> >>> and
> >> base_opcode == 0xa1, W should be 0, other than 1. If M0 = 1, W = 1,
> >> base_opcode == 0xa1, I think it could decoding as mov rax, moffs or (
> >> some future insn). Elsewhere it's just excluding invalid prefixes.
> >>
> >> Yet REX2.M == 0 is as relevant there (until such time where some of
> >> those prefixes used is assigned meaning).
> >>
> >
> > I don't think so. the original result with W = 1 was MOV RAX, moffs, but the
> insn can't support REX2. So If someone input .byte REX2.M = 0 && REX2.W = 1, it
> should be a Bad_Opcode.
> 
> I fully agree with this. But that doesn't invalidate my original comment:
> REX2.M is relevant in all of these tests, and hence comments end up inconsistent.
> At the risk of repeating myself - especially when you use .byte for encoding an
> instruction, what that (unreadable) byte sequence is supposed to encode wants
> to be properly stated in a comment then.
> When using .insn, some parts may become self-descriptive, hence why
> generally .insn ought to be preferred over .byte.
>

OK, I will add REX2.M = 0 to the other comments. I'm afraid I can't use .insn, because I don't know how to express the 0xd5 byte with it.
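
As an illustration of the cases the test file above enumerates, here is a minimal C sketch. It is not code from the patch: the function name is hypothetical, the REX2 payload bit positions (M0 in bit 7, W in bit 3) are an assumption based on the APX spec, and only the byte patterns listed in the test are considered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: returns true when the
   byte sequence matches one of the invalid JMPABS cases listed in the
   test file (a 66/67/F2/F3 prefix, or REX2.M0 = 0 with REX2.W = 1).  */
static bool
jmpabs_test_case_is_invalid (const uint8_t *insn, size_t len)
{
  bool bad_prefix = false;
  size_t i = 0;

  /* Walk the legacy prefixes in front of the REX2 prefix (0xd5).  */
  for (; i < len && insn[i] != 0xd5; i++)
    {
      switch (insn[i])
	{
	case 0x66:
	case 0x67:
	case 0xf2:
	case 0xf3:
	  bad_prefix = true;
	  break;
	case 0x64:
	  /* Segment override, present in half of the test vectors.  */
	  break;
	default:
	  /* Not one of the byte patterns considered here.  */
	  return false;
	}
    }

  /* The REX2 prefix, its payload byte and the 0xa1 opcode must follow.  */
  if (i + 2 >= len || insn[i + 2] != 0xa1)
    return false;

  uint8_t payload = insn[i + 1];
  bool m0 = (payload & 0x80) != 0;	/* Assumed position of REX2.M0.  */
  bool w = (payload & 0x08) != 0;	/* Assumed position of REX2.W.  */

  return bad_prefix || (!m0 && w);
}
```

A sequence starting with 0xd5,0x00,0xa1 is not flagged by this sketch, while each of the nine sequences in the test file is. LOCK (0xf0) is deliberately left out, since the test does not cover it.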

 
>
> >>>> Also a spec question as we're talking of what is or is not valid (i.e.
> >>>> causing #UD) here: Why would XCR0.APX=0 need to cause #UD? There's
> >>>> no use of eGPR-s here.
> >>>>
> >>>
> >>> Sorry, what is XCR0.APX?
> >>
> >> Bit 19 of the XCR0 register. It is mentioned in exactly this way in
> >> the APX- LEGACY-JMPABS exception class description.
> >>
> >
> > I think XCR0.APX is a state bit to control if support APX instruction set not if
> support eGPR-s.
> 
> No, XCR0 very certainly is a set of controls affecting register use. If JMPABS is
> also controlled by it, then I'd view this as an erratum if it appeared like that in
> silicon; that erratum may well be a design one then, or one "justified" by
> simplifying the implementation in some way, but it would still be wrong from a
> conceptual pov.
> 

In APX.pdf (https://cdrdv2.intel.com/v1/dl/getContent/784266), section 3.1.4.2, XCR0 governs APX state and prefixes. And in sdm.pdf (https://cdrdv2.intel.com/v1/dl/getContent/671200), section 13.3, page 323: "Software can execute Intel AVX-512 instructions only if CR4.OSXSAVE = 1 and XCR0[7:5] = 111b." The focus there is on XCR0[7:5], not on which of the corresponding registers are used. For example,
per Table 2-37 on page 567, although bit 6 covers the upper 256 bits of registers ZMM0-ZMM15, instructions with a 256-bit encoding still require that bit to be set.

I admit that when XCR0[7:5] was first introduced, it looked like the bits were for register state only, but judging from how they have actually been used since, they act more like instruction state. Going by how the XCR0.APX bit is introduced, that's not a problem.

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 8/8] Support APX JMPABS
  2023-11-27  2:16             ` Hu, Lin1
@ 2023-11-27  8:03               ` Jan Beulich
  2023-11-27  8:46                 ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-27  8:03 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

On 27.11.2023 03:16, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, November 24, 2023 3:21 PM
>>
>> On 24.11.2023 06:40, Hu, Lin1 wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Tuesday, November 14, 2023 7:15 PM
>>>>
>>>> On 14.11.2023 04:26, Hu, Lin1 wrote:
>>>>>  > On 02.11.2023 12:29, Cui, Lili wrote:
>>>>>>> --- /dev/null
>>>>>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
>>>>>>> @@ -0,0 +1,17 @@
>>>>>>> +# Check bytecode of APX_F jmpabs instructions with illegal encode.
>>>>>>> +
>>>>>>> +	.text
>>>>>>> +# With 66 prefix
>>>>>>> +	.byte 0x66,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>>>> +	.byte 0x66,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>>>> +# With 67 prefix
>>>>>>> +	.byte 0x67,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>>>> +	.byte 0x67,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>>>> +# With F2 prefix
>>>>>>> +	.byte 0xf2,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>>>> +	.byte 0xf2,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>>>> +# With F3 prefix
>>>>>>> +	.byte 0xf3,0x64,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>>>> +	.byte 0xf3,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>>>> +# REX2.M0 = 0 REX2.W = 1
>>>>>>> +	.byte 0xd5,0x08,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
>>>>>>
>>>>>> As per earlier comments: This wants expressing via .insn, to yield
>>>>>> input to gas human-readable (even if, as it looks, two .insn are
>>>>>> going to be required per resulting construct). Further in the last
>>>>>> comment, why is
>>>>>> REX2.M0 mentioned there, but not elsewhere? Also what purpose serve
>>>>>> the
>>>>>> 0x64 bytes here? The encodings are invalid irrespective of them.
>>>>>> Instead I'd kind have expected LOCK to also be covered.
>>>>>>
>>>>>
>>>>> Because this error line is only for the special case where M0 == 0,
>>>>> and
>>>> base_opcode == 0xa1, W should be 0, other than 1. If M0 = 1, W = 1,
>>>> base_opcode == 0xa1, I think it could decoding as mov rax, moffs or (
>>>> some future insn). Elsewhere it's just excluding invalid prefixes.
>>>>
>>>> Yet REX2.M == 0 is as relevant there (until such time where some of
>>>> those prefixes used is assigned meaning).
>>>>
>>>
>>> I don't think so. the original result with W = 1 was MOV RAX, moffs, but the
>> insn can't support REX2. So If someone input .byte REX2.M = 0 && REX2.W = 1, it
>> should be a Bad_Opcode.
>>
>> I fully agree with this. But that doesn't invalidate my original comment:
>> REX2.M is relevant in all of these tests, and hence comments end up inconsistent.
>> At the risk of repeating myself - especially when you use .byte for encoding an
>> instruction, what that (unreadable) byte sequence is supposed to encode wants
>> to be properly stated in a comment then.
>> When using .insn, some parts may become self-descriptive, hence why
>> generally .insn ought to be preferred over .byte.
>>
> 
> OK, I will add REX2.M = 0 in other comments. I'm afraid I can't use .insn, because I don't know how to express d5.

Things are going to already be far better if you use .byte only for the REX2
prefix, but a normal insn or .insn for the "main part". Then it'll be easily
recognizable which registers are involved in the operation, or in addressing
a memory operand. Of course you won't be able to use 64-bit register names,
as that would lead to unwanted REX prefixes. But using 32-bit register names
there fully serves the purpose, even when the actual encoding has REX2.W=1.
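
To make the role of the byte following 0xd5 concrete, the sketch below splits a REX2 payload byte into its fields. This is only an illustration: the struct and function names are hypothetical, and the bit layout (M0, R4, X4, B4, W, R3, X3, B3 from bit 7 down to bit 0) is an assumption based on the APX spec rather than anything stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of the byte that follows the 0xd5 REX2 prefix.  The bit
   positions are an assumption based on the APX spec.  */
struct rex2_payload
{
  bool m0;		/* Bit 7: opcode map select.  */
  bool r4, x4, b4;	/* Bits 6..4: upper register-extension bits.  */
  bool w;		/* Bit 3: 64-bit operand size.  */
  bool r3, x3, b3;	/* Bits 2..0: lower register-extension bits.  */
};

/* Hypothetical helper: split a REX2 payload byte into its fields.  */
static struct rex2_payload
rex2_decode (uint8_t byte)
{
  struct rex2_payload p;

  p.m0 = (byte & 0x80) != 0;
  p.r4 = (byte & 0x40) != 0;
  p.x4 = (byte & 0x20) != 0;
  p.b4 = (byte & 0x10) != 0;
  p.w = (byte & 0x08) != 0;
  p.r3 = (byte & 0x04) != 0;
  p.x3 = (byte & 0x02) != 0;
  p.b3 = (byte & 0x01) != 0;
  return p;
}
```

Under that layout, the payload 0x08 from the last test case decodes to M0 = 0 and W = 1 with all register-extension bits clear, which matches the comment in the test file.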

>>>>>> Also a spec question as we're talking of what is or is not valid (i.e.
>>>>>> causing #UD) here: Why would XCR0.APX=0 need to cause #UD? There's
>>>>>> no use of eGPR-s here.
>>>>>>
>>>>>
>>>>> Sorry, what is XCR0.APX?
>>>>
>>>> Bit 19 of the XCR0 register. It is mentioned in exactly this way in
>>>> the APX- LEGACY-JMPABS exception class description.
>>>>
>>>
>>> I think XCR0.APX is a state bit to control if support APX instruction set not if
>> support eGPR-s.
>>
>> No, XCR0 very certainly is a set of controls affecting register use. If JMPABS is
>> also controlled by it, then I'd view this as an erratum if it appeared like that in
>> silicon; that erratum may well be a design one then, or one "justified" by
>> simplifying the implementation in some way, but it would still be wrong from a
>> conceptual pov.
>>
> 
> In APX.pdf (https://cdrdv2.intel.com/v1/dl/getContent/784266) section 3.1.4.2, XCR0 govern APX State and prefixes. And In sdm.pdf (https://cdrdv2.intel.com/v1/dl/getContent/671200), section 13.3 page 323,  "Software can excute Intel AVX-512 instructions only if CR4.OSXSAVE = 1 and XCR0[7:5] = 111b." Their focus is on XCR0[7:5], not on what corresponding registers are used, For example, 
> page 567 Table 2-37, alougth bit 6 is used for the upper 256 bits of the register ZMM0-ZMM15, 256 encoding instructions still require to set the bit.
> 
> I admit that in the earliest introductions of XCR0[7:5], it looks like it's for register state use only, but from actual use later on, they're more of an instruction state, but from the introductions of the XCR0.APX bit, that's not a problem.

Since you take AVX512 for analogy: Can you please point me at an insn which
doesn't use any AVX512 register covered by said three XCR0 bits? Talking of
insn use and talking of register use simply is the same there. Hence the
analogy cannot be used when discussing JMPABS. Furthermore the specific
wording in SDM, ISE, or APX doc also cannot be blindly trusted. What matters
is how silicon is going to behave, and for JMPABS my impression is that if
a dependency on XCR0 was existing there, it would have been introduced
artificially, i.e. without real need. _That's_ what I'm putting under
question.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 8/8] Support APX JMPABS
  2023-11-27  8:03               ` Jan Beulich
@ 2023-11-27  8:46                 ` Hu, Lin1
  2023-11-27  8:54                   ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-27  8:46 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

> >>>>>> Also a spec question as we're talking of what is or is not valid (i.e.
> >>>>>> causing #UD) here: Why would XCR0.APX=0 need to cause #UD?
> >>>>>> There's no use of eGPR-s here.
> >>>>>>
> >>>>>
> >>>>> Sorry, what is XCR0.APX?
> >>>>
> >>>> Bit 19 of the XCR0 register. It is mentioned in exactly this way in
> >>>> the APX- LEGACY-JMPABS exception class description.
> >>>>
> >>>
> >>> I think XCR0.APX is a state bit to control if support APX
> >>> instruction set not if
> >> support eGPR-s.
> >>
> >> No, XCR0 very certainly is a set of controls affecting register use.
> >> If JMPABS is also controlled by it, then I'd view this as an erratum
> >> if it appeared like that in silicon; that erratum may well be a
> >> design one then, or one "justified" by simplifying the implementation
> >> in some way, but it would still be wrong from a conceptual pov.
> >>
> >
> > In APX.pdf (https://cdrdv2.intel.com/v1/dl/getContent/784266) section
> > 3.1.4.2, XCR0 govern APX State and prefixes. And In sdm.pdf
> (https://cdrdv2.intel.com/v1/dl/getContent/671200), section 13.3 page 323,
> "Software can excute Intel AVX-512 instructions only if CR4.OSXSAVE = 1 and
> XCR0[7:5] = 111b." Their focus is on XCR0[7:5], not on what corresponding
> registers are used, For example, page 567 Table 2-37, alougth bit 6 is used for
> the upper 256 bits of the register ZMM0-ZMM15, 256 encoding instructions still
> require to set the bit.
> >
> > I admit that in the earliest introductions of XCR0[7:5], it looks like it's for
> register state use only, but from actual use later on, they're more of an
> instruction state, but from the introductions of the XCR0.APX bit, that's not a
> problem.
> 
> Since you take AVX512 for analogy: Can you please point me at an insn which
> doesn't use any AVX512 register covered by said three XCR0 bits? Talking of insn
> use and talking of register use simply is the same there. Hence the analogy
> cannot be used when discussing JMPABS. Furthermore the specific wording in
> SDM, ISE, or APX doc also cannot be blindly trusted. What matters is how silicon
> is going to behave, and for JMPABS my impression is that if a dependency on
> XCR0 was existing there, it would have been introduced artificially, i.e. without
> real need. _That's_ what I'm putting under question.
> 

Take "{evex} vaddpd ymm0, ymm1, ymm2". If this were only about which registers are used, I think bits 5, 6 and 7 could all be zero, because the insn uses neither k0-k7 nor the upper 256 bits of registers ZMM0-ZMM15.

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 8/8] Support APX JMPABS
  2023-11-27  8:46                 ` Hu, Lin1
@ 2023-11-27  8:54                   ` Jan Beulich
  2023-11-27  9:03                     ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-27  8:54 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

On 27.11.2023 09:46, Hu, Lin1 wrote:
>>>>>>>> Also a spec question as we're talking of what is or is not valid (i.e.
>>>>>>>> causing #UD) here: Why would XCR0.APX=0 need to cause #UD?
>>>>>>>> There's no use of eGPR-s here.
>>>>>>>>
>>>>>>>
>>>>>>> Sorry, what is XCR0.APX?
>>>>>>
>>>>>> Bit 19 of the XCR0 register. It is mentioned in exactly this way in
>>>>>> the APX- LEGACY-JMPABS exception class description.
>>>>>>
>>>>>
>>>>> I think XCR0.APX is a state bit to control if support APX
>>>>> instruction set not if
>>>> support eGPR-s.
>>>>
>>>> No, XCR0 very certainly is a set of controls affecting register use.
>>>> If JMPABS is also controlled by it, then I'd view this as an erratum
>>>> if it appeared like that in silicon; that erratum may well be a
>>>> design one then, or one "justified" by simplifying the implementation
>>>> in some way, but it would still be wrong from a conceptual pov.
>>>>
>>>
>>> In APX.pdf (https://cdrdv2.intel.com/v1/dl/getContent/784266) section
>>> 3.1.4.2, XCR0 govern APX State and prefixes. And In sdm.pdf
>> (https://cdrdv2.intel.com/v1/dl/getContent/671200), section 13.3 page 323,
>> "Software can excute Intel AVX-512 instructions only if CR4.OSXSAVE = 1 and
>> XCR0[7:5] = 111b." Their focus is on XCR0[7:5], not on what corresponding
>> registers are used, For example, page 567 Table 2-37, alougth bit 6 is used for
>> the upper 256 bits of the register ZMM0-ZMM15, 256 encoding instructions still
>> require to set the bit.
>>>
>>> I admit that in the earliest introductions of XCR0[7:5], it looks like it's for
>> register state use only, but from actual use later on, they're more of an
>> instruction state, but from the introductions of the XCR0.APX bit, that's not a
>> problem.
>>
>> Since you take AVX512 for analogy: Can you please point me at an insn which
>> doesn't use any AVX512 register covered by said three XCR0 bits? Talking of insn
>> use and talking of register use simply is the same there. Hence the analogy
>> cannot be used when discussing JMPABS. Furthermore the specific wording in
>> SDM, ISE, or APX doc also cannot be blindly trusted. What matters is how silicon
>> is going to behave, and for JMPABS my impression is that if a dependency on
>> XCR0 was existing there, it would have been introduced artificially, i.e. without
>> real need. _That's_ what I'm putting under question.
>>
> 
> If I use "{evex} vaddpd ymm0, ymm1, ymm2". If it's for register considerations, I think bit 5, 6, 7 can all be zero. Because the insn doesn't use k0-k7 and the upper 256 bits of the registers ZMM0-ZMM15. 

How that? It clears the upper 256 bits of the destination register.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 8/8] Support APX JMPABS
  2023-11-27  8:54                   ` Jan Beulich
@ 2023-11-27  9:03                     ` Hu, Lin1
  2023-11-27 10:32                       ` Jan Beulich
  0 siblings, 1 reply; 113+ messages in thread
From: Hu, Lin1 @ 2023-11-27  9:03 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, November 27, 2023 4:55 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org; Cui, Lili <lili.cui@intel.com>
> Subject: Re: [PATCH 8/8] Support APX JMPABS
> 
> On 27.11.2023 09:46, Hu, Lin1 wrote:
> >>>>>>>> Also a spec question as we're talking of what is or is not valid (i.e.
> >>>>>>>> causing #UD) here: Why would XCR0.APX=0 need to cause #UD?
> >>>>>>>> There's no use of eGPR-s here.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Sorry, what is XCR0.APX?
> >>>>>>
> >>>>>> Bit 19 of the XCR0 register. It is mentioned in exactly this way
> >>>>>> in the APX- LEGACY-JMPABS exception class description.
> >>>>>>
> >>>>>
> >>>>> I think XCR0.APX is a state bit to control if support APX
> >>>>> instruction set not if
> >>>> support eGPR-s.
> >>>>
> >>>> No, XCR0 very certainly is a set of controls affecting register use.
> >>>> If JMPABS is also controlled by it, then I'd view this as an
> >>>> erratum if it appeared like that in silicon; that erratum may well
> >>>> be a design one then, or one "justified" by simplifying the
> >>>> implementation in some way, but it would still be wrong from a conceptual
> pov.
> >>>>
> >>>
> >>> In APX.pdf (https://cdrdv2.intel.com/v1/dl/getContent/784266)
> >>> section 3.1.4.2, XCR0 govern APX State and prefixes. And In sdm.pdf
> >> (https://cdrdv2.intel.com/v1/dl/getContent/671200), section 13.3 page
> >> 323, "Software can excute Intel AVX-512 instructions only if
> >> CR4.OSXSAVE = 1 and XCR0[7:5] = 111b." Their focus is on XCR0[7:5],
> >> not on what corresponding registers are used, For example, page 567
> >> Table 2-37, alougth bit 6 is used for the upper 256 bits of the
> >> register ZMM0-ZMM15, 256 encoding instructions still require to set the bit.
> >>>
> >>> I admit that in the earliest introductions of XCR0[7:5], it looks
> >>> like it's for
> >> register state use only, but from actual use later on, they're more
> >> of an instruction state, but from the introductions of the XCR0.APX
> >> bit, that's not a problem.
> >>
> >> Since you take AVX512 for analogy: Can you please point me at an insn
> >> which doesn't use any AVX512 register covered by said three XCR0
> >> bits? Talking of insn use and talking of register use simply is the
> >> same there. Hence the analogy cannot be used when discussing JMPABS.
> >> Furthermore the specific wording in SDM, ISE, or APX doc also cannot
> >> be blindly trusted. What matters is how silicon is going to behave,
> >> and for JMPABS my impression is that if a dependency on
> >> XCR0 was existing there, it would have been introduced artificially,
> >> i.e. without real need. _That's_ what I'm putting under question.
> >>
> >
> > If I use "{evex} vaddpd ymm0, ymm1, ymm2". If it's for register considerations,
> I think bit 5, 6, 7 can all be zero. Because the insn doesn't use k0-k7 and the
> upper 256 bits of the registers ZMM0-ZMM15.
> 
> How that? It clears the upper 256 bits of the destination register.
> 

OK, so bits 5 and 7 are not affected. If they are zero, I think the exception shouldn't be triggered.

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 8/8] Support APX JMPABS
  2023-11-27  9:03                     ` Hu, Lin1
@ 2023-11-27 10:32                       ` Jan Beulich
  2023-12-04  7:33                         ` Hu, Lin1
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-11-27 10:32 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

On 27.11.2023 10:03, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, November 27, 2023 4:55 PM
>>
>> On 27.11.2023 09:46, Hu, Lin1 wrote:
>>> If I use "{evex} vaddpd ymm0, ymm1, ymm2". If it's for register considerations,
>> I think bit 5, 6, 7 can all be zero. Because the insn doesn't use k0-k7 and the
>> upper 256 bits of the registers ZMM0-ZMM15.
>>
>> How that? It clears the upper 256 bits of the destination register.
> 
> OK, so bit 5 and 7 are not affected, it they are zero, I  think the exception shouldn't be triggered.

Except that the spec mandates that the three bits are all set or all clear.
This could have been less strict, but that's too late now. For APX otoh it's
not too late yet to avoid quirky behavior.

Jan

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH 8/8] Support APX JMPABS
  2023-11-27 10:32                       ` Jan Beulich
@ 2023-12-04  7:33                         ` Hu, Lin1
  0 siblings, 0 replies; 113+ messages in thread
From: Hu, Lin1 @ 2023-12-04  7:33 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils, Cui, Lili

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, November 27, 2023 6:32 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org; Cui, Lili <lili.cui@intel.com>
> Subject: Re: [PATCH 8/8] Support APX JMPABS
> 
> On 27.11.2023 10:03, Hu, Lin1 wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Monday, November 27, 2023 4:55 PM
> >>
> >> On 27.11.2023 09:46, Hu, Lin1 wrote:
> >>> If I use "{evex} vaddpd ymm0, ymm1, ymm2". If it's for register
> >>> considerations,
> >> I think bit 5, 6, 7 can all be zero. Because the insn doesn't use
> >> k0-k7 and the upper 256 bits of the registers ZMM0-ZMM15.
> >>
> >> How that? It clears the upper 256 bits of the destination register.
> >
> > OK, so bit 5 and 7 are not affected, it they are zero, I  think the exception
> shouldn't be triggered.
> 
> Except that the spec mandates that the three bits are all set or all clear.
> This could have been less strict, but that's too late now. For APX otoh it's not too
> late yet to avoid quirky behavior.
> 

We had a discussion with the people involved. XCR0.APX currently controls whether the CPU enables the REX2 and Extended EVEX prefixes, so JMPABS is affected by this bit.

New fields in XCR0:
	– APX_F – Intel® APX state and prefixes are governed by XCR0[APX_F=19]. This control bit
	enables Intel® APX ISA by enabling the use of the REX2 and Extended EVEX prefixes in IA-32e
	64-bit mode and by enabling the XSAVE feature set to manage Intel® APX state. Note that in
	64-bit mode, none of the Intel® APX features (including the REX2 and Extended EVEX prefixes
	and all new Intel® APX instructions) can be used until they are XCR0-enabled.
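
The two XCR0 conditions quoted in this thread can be written down as a small C sketch. The macro and function names are hypothetical; the only facts used are the ones cited above (XCR0[7:5] = 111b for AVX-512, XCR0 bit 19 for APX). The CR4.OSXSAVE requirement and the actual reading of XCR0 are deliberately not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0[7:5]: the three AVX-512 state components (hypothetical names).  */
#define XCR0_AVX512_MASK (UINT64_C (7) << 5)
/* XCR0[19]: APX_F, governing APX state and the REX2 / Extended EVEX
   prefixes.  */
#define XCR0_APX_F (UINT64_C (1) << 19)

/* True only if all three AVX-512 bits are set, as the SDM text requires.  */
static bool
xcr0_allows_avx512 (uint64_t xcr0)
{
  return (xcr0 & XCR0_AVX512_MASK) == XCR0_AVX512_MASK;
}

/* True if the APX_F bit is set.  Per the text quoted above, the REX2
   prefix, and hence JMPABS, is usable only in that case.  */
static bool
xcr0_allows_apx (uint64_t xcr0)
{
  return (xcr0 & XCR0_APX_F) != 0;
}
```

For example, an XCR0 value of 0xe7 satisfies the AVX-512 check but not the APX one, so under this reading JMPABS would fault even though it touches no extended register.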

BRs,
Lin

^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH v3 0/9] Support Intel APX EGPR
  2023-11-24  6:56                 ` [PATCH v3 0/9] Support Intel APX EGPR Cui, Lili
@ 2023-12-07  8:17                   ` Cui, Lili
  2023-12-07  8:33                     ` Cui, Lili
  0 siblings, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-12-07  8:17 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils

Hi H.J,

Could you help review these APX patches? There are 11 patches in total.

[PATCH] Clean reg class and base_reg for input output operand (%dx). (for legacy)
[PATCH 1/9] Make const_1_mode print $1 in AT&T syntax. (for legacy)
[PATCH v3 2/9] Support APX GPR32 with rex2 prefix.
[PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
[PATCH v3 4/9] Support APX GPR32 with extend evex prefix.
[PATCH v3 5/9] Add tests for APX GPR32 with extend evex prefix.
[PATCH v3 6/9] Support APX NDD.
[PATCH v3 7/9] Support APX Push2/Pop2.
[PATCH] Support APX PUSHP/POPP.
[PATCH v3 8/9] Support APX NDD optimized encoding.
[PATCH v3 9/9] Support APX JMPABS for disassembler.

There are 3 patches that need to be updated. Lin and I will post later.
 [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
 [PATCH] Support APX PUSHP/POPP
 [PATCH v3 9/9] Support APX JMPABS for disassembler

Thanks,
Lili.

> -----Original Message-----
> From: Cui, Lili <lili.cui@intel.com>
> Sent: Friday, November 24, 2023 2:56 PM
> To: Beulich, Jan <JBeulich@suse.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: [PATCH v3 0/9] Support Intel APX EGPR
> 
> 
> This is V3 of all APX patches.
> 1. Created a patch to make const_1_mode print $1 in AT&T syntax.
> 2. How to print the rex2 prefix needs to be discussed later.
> 3. After NF patch, need to add tests for pfx macros emitting {evex} in
> noreg64.s.
> 
> Cui, Lili (5):
>   Make const_1_mode print $1 in AT&T syntax
>   Support APX GPR32 with rex2 prefix
>   Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
>   Support APX GPR32 with extend evex prefix
>   Add tests for APX GPR32 with extend evex prefix
> 
> Hu, Lin1 (2):
>   Support APX NDD optimized encoding.
>   Support APX JMPABS for disassembler
> 
> Mo, Zewei (1):
>   Support APX Push2/Pop2
> 
> konglin1 (1):
>   Support APX NDD
> 
> 
> Thanks,
> Lili.
>  gas/config/tc-i386.c                          | 472 +++++++++++--
>  gas/doc/c-i386.texi                           |   6 +-
>  gas/testsuite/gas/i386/apx-push2pop2-inval.l  |   5 +
>  gas/testsuite/gas/i386/apx-push2pop2-inval.s  |   9 +
>  gas/testsuite/gas/i386/i386.exp               |   1 +
>  .../i386/ilp32/x86-64-opcode-inval-intel.d    |  47 +-
>  .../gas/i386/ilp32/x86-64-opcode-inval.d      |  47 +-
>  gas/testsuite/gas/i386/intel.d                |   6 +-
>  gas/testsuite/gas/i386/lfence-load.d          |   2 +-
>  gas/testsuite/gas/i386/noreg16-data32.d       |  32 +-
>  gas/testsuite/gas/i386/noreg16.d              |  32 +-
>  gas/testsuite/gas/i386/noreg32-data16.d       |  32 +-
>  gas/testsuite/gas/i386/noreg32.d              |  32 +-
>  gas/testsuite/gas/i386/noreg64-data16.d       |  32 +-
>  gas/testsuite/gas/i386/noreg64-rex64.d        |  32 +-
>  gas/testsuite/gas/i386/noreg64.d              |  32 +-
>  gas/testsuite/gas/i386/opcode-suffix.d        |   6 +-
>  gas/testsuite/gas/i386/opcode.d               |  10 +-
>  .../gas/i386/x86-64-apx-egpr-inval.l          | 203 ++++++
>  .../gas/i386/x86-64-apx-egpr-inval.s          | 210 ++++++
>  .../gas/i386/x86-64-apx-egpr-promote-inval.l  |  20 +
>  .../gas/i386/x86-64-apx-egpr-promote-inval.s  |  29 +
>  gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d |  20 +
>  gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s |  21 +
>  .../gas/i386/x86-64-apx-evex-promoted-bad.d   |  34 +
>  .../gas/i386/x86-64-apx-evex-promoted-bad.s   |  36 +
>  .../gas/i386/x86-64-apx-evex-promoted-intel.d | 318 +++++++++
>  .../gas/i386/x86-64-apx-evex-promoted.d       | 318 +++++++++
>  .../gas/i386/x86-64-apx-evex-promoted.s       | 314 +++++++++
>  .../gas/i386/x86-64-apx-jmpabs-intel.d        |  11 +
>  .../gas/i386/x86-64-apx-jmpabs-inval.d        |  40 ++
>  .../gas/i386/x86-64-apx-jmpabs-inval.s        |  15 +
>  gas/testsuite/gas/i386/x86-64-apx-jmpabs.d    |  11 +
>  gas/testsuite/gas/i386/x86-64-apx-jmpabs.s    |   5 +
>  .../gas/i386/x86-64-apx-ndd-optimize.d        | 130 ++++
>  .../gas/i386/x86-64-apx-ndd-optimize.s        | 123 ++++
>  gas/testsuite/gas/i386/x86-64-apx-ndd.d       | 160 +++++
>  gas/testsuite/gas/i386/x86-64-apx-ndd.s       | 155 +++++
>  .../gas/i386/x86-64-apx-push2pop2-intel.d     |  42 ++
>  .../gas/i386/x86-64-apx-push2pop2-inval.l     |  13 +
>  .../gas/i386/x86-64-apx-push2pop2-inval.s     |  17 +
>  gas/testsuite/gas/i386/x86-64-apx-push2pop2.d |  42 ++
>  gas/testsuite/gas/i386/x86-64-apx-push2pop2.s |  39 ++
>  gas/testsuite/gas/i386/x86-64-apx-rex2.d      |  83 +++
>  gas/testsuite/gas/i386/x86-64-apx-rex2.s      |  86 +++
>  gas/testsuite/gas/i386/x86-64-evex.d          |   2 +-
>  gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |   6 +
>  gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |   4 +
>  gas/testsuite/gas/i386/x86-64-lfence-load.d   |   2 +-
>  .../gas/i386/x86-64-opcode-inval-intel.d      |  26 +-
>  gas/testsuite/gas/i386/x86-64-opcode-inval.d  |  26 +-
>  gas/testsuite/gas/i386/x86-64-opcode-inval.s  |   4 -
>  gas/testsuite/gas/i386/x86-64-opcode.d        |   6 +-
>  gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |  59 +-
>  gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |  58 ++
>  gas/testsuite/gas/i386/x86-64-pseudos.d       |  63 ++
>  gas/testsuite/gas/i386/x86-64-pseudos.s       |  65 ++
>  gas/testsuite/gas/i386/x86-64.exp             |  17 +-
>  include/opcode/i386.h                         |   2 +
>  opcodes/i386-dis-evex-len.h                   |  10 +
>  opcodes/i386-dis-evex-prefix.h                |  66 ++
>  opcodes/i386-dis-evex-reg.h                   |  71 ++
>  opcodes/i386-dis-evex-w.h                     |  10 +
>  opcodes/i386-dis-evex-x86-64.h                |  60 ++
>  opcodes/i386-dis-evex.h                       | 347 +++++++++-
>  opcodes/i386-dis.c                            | 644 +++++++++++++-----
>  opcodes/i386-gen.c                            |  57 +-
>  opcodes/i386-opc.h                            |  30 +-
>  opcodes/i386-opc.tbl                          | 210 ++++--
>  opcodes/i386-reg.tbl                          |  64 ++
>  70 files changed, 4669 insertions(+), 570 deletions(-)
>  create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.l
>  create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s
>  create mode 100644 opcodes/i386-dis-evex-x86-64.h
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 113+ messages in thread

* RE: [PATCH v3 0/9] Support Intel APX EGPR
  2023-12-07  8:17                   ` Cui, Lili
@ 2023-12-07  8:33                     ` Cui, Lili
  0 siblings, 0 replies; 113+ messages in thread
From: Cui, Lili @ 2023-12-07  8:33 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils

Sorry, resending with the formatting fixed.

 [PATCH] Clean reg class and base_reg for input output operand (%dx). (for legacy)
 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax. (for legacy)
 [PATCH v3 2/9] Support APX GPR32 with rex2 prefix.
 [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
 [PATCH v3 4/9] Support APX GPR32 with extend evex prefix.
 [PATCH v3 5/9] Add tests for APX GPR32 with extend evex prefix.
 [PATCH v3 6/9] Support APX NDD.
 [PATCH v3 7/9] Support APX Push2/Pop2.
 [PATCH] Support APX PUSHP/POPP.
 [PATCH v3 8/9] Support APX NDD optimized encoding.
 [PATCH v3 9/9] Support APX JMPABS for disassembler.
 
 There are 3 patches that need to be updated. Lin and I will post later.
  [PATCH v3 2/9] Support APX GPR32 with rex2 prefix.
  [PATCH] Support APX PUSHP/POPP.
  [PATCH v3 9/9] Support APX JMPABS for disassembler.


Regards,
Lili.

> -----Original Message-----
> From: Cui, Lili
> Sent: Thursday, December 7, 2023 4:17 PM
> To: H.J. Lu <hjl.tools@gmail.com>
> Cc: binutils@sourceware.org
> Subject: RE: [PATCH v3 0/9] Support Intel APX EGPR
> 
> Hi H.J,
> 
> Could you help review these APX patches? There are 11 patches in total.
> 
> [PATCH] Clean reg class and base_reg for input output operand (%dx). (for legacy)
> [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax. (for legacy)
> [PATCH v3 2/9] Support APX GPR32 with rex2 prefix.
> [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
> [PATCH v3 4/9] Support APX GPR32 with extend evex prefix.
> [PATCH v3 5/9] Add tests for APX GPR32 with extend evex prefix.
> [PATCH v3 6/9] Support APX NDD.
> [PATCH v3 7/9] Support APX Push2/Pop2.
> [PATCH] Support APX PUSHP/POPP.
> [PATCH v3 8/9] Support APX NDD optimized encoding.
> [PATCH v3 9/9] Support APX JMPABS for disassembler.
> 
> There are 3 patches that need to be updated. Lin and I will post later.
>  [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
>  [PATCH] Support APX PUSHP/POPP
>  [PATCH v3 9/9] Support APX JMPABS for disassembler
> 
> Thanks,
> Lili.
> 
> > -----Original Message-----
> > From: Cui, Lili <lili.cui@intel.com>
> > Sent: Friday, November 24, 2023 2:56 PM
> > To: Beulich, Jan <JBeulich@suse.com>
> > Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> > Subject: [PATCH v3 0/9] Support Intel APX EGPR
> >
> >
> > This is V3 of all APX patches.
> > 1. Created a patch to make const_1_mode print $1 in AT&T syntax.
> > 2. How to print the rex2 prefix needs to be discussed later.
> > 3. After NF patch, need to add tests for pfx macros emitting {evex} in
> > noreg64.s.
> >
> > Cui, Lili (5):
> >   Make const_1_mode print $1 in AT&T syntax
> >   Support APX GPR32 with rex2 prefix
> >   Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
> >   Support APX GPR32 with extend evex prefix
> >   Add tests for APX GPR32 with extend evex prefix
> >
> > Hu, Lin1 (2):
> >   Support APX NDD optimized encoding.
> >   Support APX JMPABS for disassembler
> >
> > Mo, Zewei (1):
> >   Support APX Push2/Pop2
> >
> > konglin1 (1):
> >   Support APX NDD
> >
> >
> > Thanks,
> > Lili.
> >  gas/config/tc-i386.c                          | 472 +++++++++++--
> >  gas/doc/c-i386.texi                           |   6 +-
> >  gas/testsuite/gas/i386/apx-push2pop2-inval.l  |   5 +
> >  gas/testsuite/gas/i386/apx-push2pop2-inval.s  |   9 +
> >  gas/testsuite/gas/i386/i386.exp               |   1 +
> >  .../i386/ilp32/x86-64-opcode-inval-intel.d    |  47 +-
> >  .../gas/i386/ilp32/x86-64-opcode-inval.d      |  47 +-
> >  gas/testsuite/gas/i386/intel.d                |   6 +-
> >  gas/testsuite/gas/i386/lfence-load.d          |   2 +-
> >  gas/testsuite/gas/i386/noreg16-data32.d       |  32 +-
> >  gas/testsuite/gas/i386/noreg16.d              |  32 +-
> >  gas/testsuite/gas/i386/noreg32-data16.d       |  32 +-
> >  gas/testsuite/gas/i386/noreg32.d              |  32 +-
> >  gas/testsuite/gas/i386/noreg64-data16.d       |  32 +-
> >  gas/testsuite/gas/i386/noreg64-rex64.d        |  32 +-
> >  gas/testsuite/gas/i386/noreg64.d              |  32 +-
> >  gas/testsuite/gas/i386/opcode-suffix.d        |   6 +-
> >  gas/testsuite/gas/i386/opcode.d               |  10 +-
> >  .../gas/i386/x86-64-apx-egpr-inval.l          | 203 ++++++
> >  .../gas/i386/x86-64-apx-egpr-inval.s          | 210 ++++++
> >  .../gas/i386/x86-64-apx-egpr-promote-inval.l  |  20 +
> >  .../gas/i386/x86-64-apx-egpr-promote-inval.s  |  29 +
> >  gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d |  20 +
> >  gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s |  21 +
> >  .../gas/i386/x86-64-apx-evex-promoted-bad.d   |  34 +
> >  .../gas/i386/x86-64-apx-evex-promoted-bad.s   |  36 +
> >  .../gas/i386/x86-64-apx-evex-promoted-intel.d | 318 +++++++++
> >  .../gas/i386/x86-64-apx-evex-promoted.d       | 318 +++++++++
> >  .../gas/i386/x86-64-apx-evex-promoted.s       | 314 +++++++++
> >  .../gas/i386/x86-64-apx-jmpabs-intel.d        |  11 +
> >  .../gas/i386/x86-64-apx-jmpabs-inval.d        |  40 ++
> >  .../gas/i386/x86-64-apx-jmpabs-inval.s        |  15 +
> >  gas/testsuite/gas/i386/x86-64-apx-jmpabs.d    |  11 +
> >  gas/testsuite/gas/i386/x86-64-apx-jmpabs.s    |   5 +
> >  .../gas/i386/x86-64-apx-ndd-optimize.d        | 130 ++++
> >  .../gas/i386/x86-64-apx-ndd-optimize.s        | 123 ++++
> >  gas/testsuite/gas/i386/x86-64-apx-ndd.d       | 160 +++++
> >  gas/testsuite/gas/i386/x86-64-apx-ndd.s       | 155 +++++
> >  .../gas/i386/x86-64-apx-push2pop2-intel.d     |  42 ++
> >  .../gas/i386/x86-64-apx-push2pop2-inval.l     |  13 +
> >  .../gas/i386/x86-64-apx-push2pop2-inval.s     |  17 +
> >  gas/testsuite/gas/i386/x86-64-apx-push2pop2.d |  42 ++
> >  gas/testsuite/gas/i386/x86-64-apx-push2pop2.s |  39 ++
> >  gas/testsuite/gas/i386/x86-64-apx-rex2.d      |  83 +++
> >  gas/testsuite/gas/i386/x86-64-apx-rex2.s      |  86 +++
> >  gas/testsuite/gas/i386/x86-64-evex.d          |   2 +-
> >  gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |   6 +
> >  gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |   4 +
> >  gas/testsuite/gas/i386/x86-64-lfence-load.d   |   2 +-
> >  .../gas/i386/x86-64-opcode-inval-intel.d      |  26 +-
> >  gas/testsuite/gas/i386/x86-64-opcode-inval.d  |  26 +-
> >  gas/testsuite/gas/i386/x86-64-opcode-inval.s  |   4 -
> >  gas/testsuite/gas/i386/x86-64-opcode.d        |   6 +-
> >  gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |  59 +-
> >  gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |  58 ++
> >  gas/testsuite/gas/i386/x86-64-pseudos.d       |  63 ++
> >  gas/testsuite/gas/i386/x86-64-pseudos.s       |  65 ++
> >  gas/testsuite/gas/i386/x86-64.exp             |  17 +-
> >  include/opcode/i386.h                         |   2 +
> >  opcodes/i386-dis-evex-len.h                   |  10 +
> >  opcodes/i386-dis-evex-prefix.h                |  66 ++
> >  opcodes/i386-dis-evex-reg.h                   |  71 ++
> >  opcodes/i386-dis-evex-w.h                     |  10 +
> >  opcodes/i386-dis-evex-x86-64.h                |  60 ++
> >  opcodes/i386-dis-evex.h                       | 347 +++++++++-
> >  opcodes/i386-dis.c                            | 644 +++++++++++++-----
> >  opcodes/i386-gen.c                            |  57 +-
> >  opcodes/i386-opc.h                            |  30 +-
> >  opcodes/i386-opc.tbl                          | 210 ++++--
> >  opcodes/i386-reg.tbl                          |  64 ++
> >  70 files changed, 4669 insertions(+), 570 deletions(-)
> >  create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.l
> >  create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s
> >  create mode 100644 opcodes/i386-dis-evex-x86-64.h
> >
> > --
> > 2.25.1



* RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-11-10  9:21                 ` Jan Beulich
  2023-11-10 12:38                   ` Cui, Lili
@ 2023-12-14 10:13                   ` Cui, Lili
  2023-12-18 15:24                     ` Jan Beulich
  1 sibling, 1 reply; 113+ messages in thread
From: Cui, Lili @ 2023-12-14 10:13 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, ccoutant, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 10, 2023 5:22 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 10.11.2023 10:14, Jan Beulich wrote:
> > On 10.11.2023 08:11, Cui, Lili wrote:
> >> Here are some cases:
> >> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex
> >> will print it.
> >> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to
> >> print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
> >> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses
> >> rex2 prefix. We cannot see the information of the lower 4 bits of rex2
> >>
> >> It is not helpful to judge rex_used in rex2.
> >
> > So: REX2, like REX, is a prefix for legacy encodings. Therefore my
> > view is that it ought to be treated similarly to REX in the
> > disassembler (I'm fine to avoid the introduction of a myriad of
> > rex2... [no figure braces] prefixes in gas). Just like in the first
> > line of what you present above for the REX.B case, I'd expect the same
> > for REX2. E.g., taking the 2nd line from above
> >
> > 0:   d5 01 0f a8             {rex2.B3} push %gs
> >
> > An alternative, matching your intention of treating REX2 more like
> > EVEX, would be to (subsequently, not right away) do away with rex.B
> > and alike as well, and only print {rex} in that case as well. Such an
> > intention would then want mentioning in the description (and
> > eventually carrying out). The main difference between REX/REX2 and
> > VEX/EVEX, as I view it (and as would be speaking against this
> > alternative approach), is that in VEX/EVEX it is kind of normal that
> > certain bits are deliberately ignored (and might be set either way -
> > see gas'es command line options actually allowing to drive the values for
> > some of such ignored fields).
> > REX/REX2, otoh, shouldn't normally specify unused bits, much like
> > other legacy prefixes aren't expected to be present for no reason (and
> > hence are explicitly printed when present).
> 
> However, irrespective of what I said above, please feel free to go ahead with
> the simplified approach, to allow making progress. I'd like to have H.J.'s input
> on how to achieve overall consistency, and we can make further adjustments
> later on. But please make sure you actually mention this aspect in the
> description of the patch.
> 

Hi Jan,

We are going to use {rex2 0xXX}.

Lili.




* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-12-14 10:13                   ` Cui, Lili
@ 2023-12-18 15:24                     ` Jan Beulich
  2023-12-18 16:23                       ` H.J. Lu
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Beulich @ 2023-12-18 15:24 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, ccoutant, binutils

On 14.12.2023 11:13, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, November 10, 2023 5:22 PM
>>
>> On 10.11.2023 10:14, Jan Beulich wrote:
>>> On 10.11.2023 08:11, Cui, Lili wrote:
>>>> Here are some cases:
>>>> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex
>>>> will print it.
>>>> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to
>>>> print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
>>>> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses
>>>> rex2 prefix. We cannot see the information of the lower 4 bits of rex2
>>>>
>>>> It is not helpful to judge rex_used in rex2.
>>>
>>> So: REX2, like REX, is a prefix for legacy encodings. Therefore my
>>> view is that it ought to be treated similarly to REX in the
>>> disassembler (I'm fine to avoid the introduction of a myriad of
>>> rex2... [no figure braces] prefixes in gas). Just like in the first
>>> line of what you present above for the REX.B case, I'd expect the same
>>> for REX2. E.g., taking the 2nd line from above
>>>
>>> 0:   d5 01 0f a8             {rex2.B3} push %gs
>>>
>>> An alternative, matching your intention of treating REX2 more like
>>> EVEX, would be to (subsequently, not right away) do away with rex.B
>>> and alike as well, and only print {rex} in that case as well. Such an
>>> intention would then want mentioning in the description (and
>>> eventually carrying out). The main difference between REX/REX2 and
>>> VEX/EVEX, as I view it (and as would be speaking against this
>>> alternative approach), is that in VEX/EVEX it is kind of normal that
>>> certain bits are deliberately ignored (and might be set either way -
>>> see gas'es command line options actually allowing to drive the values for
>>> some of such ignored fields).
>>> REX/REX2, otoh, shouldn't normally specify unused bits, much like
>>> other legacy prefixes aren't expected to be present for no reason (and
>>> hence are explicitly printed when present).
>>
>> However, irrespective of what I said above, please feel free to go ahead with
>> the simplified approach, to allow making progress. I'd like to have H.J.'s input
>> on how to achieve overall consistency, and we can make further adjustments
>> later on. But please make sure you actually mention this aspect in the
>> description of the patch.
> 
> We are going to use {rex2 0xXX}.

So that's going to be {rex2} alone as equivalent to {rex} and {rex2 0xXX}
as equivalent to rex (no braces)?

Independent of that I'm of course curious how you're meaning to deal with
- collisions with REX2 bits already set for other reasons, and
- yet more specifically control over REX2.M.

Jan


* Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
  2023-12-18 15:24                     ` Jan Beulich
@ 2023-12-18 16:23                       ` H.J. Lu
  0 siblings, 0 replies; 113+ messages in thread
From: H.J. Lu @ 2023-12-18 16:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Cui, Lili, Lu, Hongjiu, ccoutant, binutils

On Mon, Dec 18, 2023 at 7:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 14.12.2023 11:13, Cui, Lili wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Friday, November 10, 2023 5:22 PM
> >>
> >> On 10.11.2023 10:14, Jan Beulich wrote:
> >>> On 10.11.2023 08:11, Cui, Lili wrote:
> >>>> Here are some cases:
> >>>> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex
> >>>> will print it.
> >>>> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to
> >>>> print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
> >>>> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses
> >>>> rex2 prefix. We cannot see the information of the lower 4 bits of rex2
> >>>>
> >>>> It is not helpful to judge rex_used in rex2.
> >>>
> >>> So: REX2, like REX, is a prefix for legacy encodings. Therefore my
> >>> view is that it ought to be treated similarly to REX in the
> >>> disassembler (I'm fine to avoid the introduction of a myriad of
> >>> rex2... [no figure braces] prefixes in gas). Just like in the first
> >>> line of what you present above for the REX.B case, I'd expect the same
> >>> for REX2. E.g., taking the 2nd line from above
> >>>
> >>> 0:   d5 01 0f a8             {rex2.B3} push %gs
> >>>
> >>> An alternative, matching your intention of treating REX2 more like
> >>> EVEX, would be to (subsequently, not right away) do away with rex.B
> >>> and alike as well, and only print {rex} in that case as well. Such an
> >>> intention would then want mentioning in the description (and
> >>> eventually carrying out). The main difference between REX/REX2 and
> >>> VEX/EVEX, as I view it (and as would be speaking against this
> >>> alternative approach), is that in VEX/EVEX it is kind of normal that
> >>> certain bits are deliberately ignored (and might be set either way -
> >>> see gas'es command line options actually allowing to drive the values for
> >>> some of such ignored fields).
> >>> REX/REX2, otoh, shouldn't normally specify unused bits, much like
> >>> other legacy prefixes aren't expected to be present for no reason (and
> >>> hence are explicitly printed when present).
> >>
> >> However, irrespective of what I said above, please feel free to go ahead with
> >> the simplified approach, to allow making progress. I'd like to have H.J.'s input
> >> on how to achieve overall consistency, and we can make further adjustments
> >> later on. But please make sure you actually mention this aspect in the
> >> description of the patch.
> >
> > We are going to use {rex2 0xXX}.
>
> So that's going to be {rex2} alone as equivalent to {rex} and {rex2 0xXX}
> as equivalent to rex (no braces)?
>
> Independent of that I'm of course curious how you're meaning to deal with
> - collisions with REX2 bits already set for other reasons, and
> - yet more specifically control over REX2.M.
>

This is only for the disassembler, to display the unused REX2 bits.
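
To make the "unused REX2 bits" concrete, here is a minimal sketch in C. It is not taken from the patches. It assumes the REX2 payload layout as I understand it from the APX specification (bit 7 down to bit 0: M0, R4, X4, B4, W, R3, X3, B3). The names rex2_decode, rex2_base_regno and rex2_unused_bits are hypothetical, and the "used" mask is assumed to be supplied by the caller rather than tracked the way i386-dis.c tracks rex_used.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the REX2 payload byte, i.e. the byte following 0xd5.
   Assumed layout, bit 7 to bit 0: M0 R4 X4 B4 W R3 X3 B3.  */
struct rex2_fields
{
  unsigned m0, r4, x4, b4, w, r3, x3, b3;
};

/* Split a payload byte into its individual one-bit fields.  */
static struct rex2_fields
rex2_decode (uint8_t payload)
{
  struct rex2_fields f;

  f.m0 = (payload >> 7) & 1;
  f.r4 = (payload >> 6) & 1;
  f.x4 = (payload >> 5) & 1;
  f.b4 = (payload >> 4) & 1;
  f.w  = (payload >> 3) & 1;
  f.r3 = (payload >> 2) & 1;
  f.x3 = (payload >> 1) & 1;
  f.b3 = payload & 1;
  return f;
}

/* Register number for an operand whose low three bits come from the
   opcode byte or ModRM.rm: B4 and B3 extend it to five bits.  */
static unsigned
rex2_base_regno (uint8_t payload, unsigned low3)
{
  struct rex2_fields f = rex2_decode (payload);

  assert (low3 < 8);
  return (f.b4 << 4) | (f.b3 << 3) | low3;
}

/* Hypothetical helper: payload bits that were set in the encoding but
   not consumed while decoding the insn.  USED_MASK has a 1 for every
   payload bit the decoder actually consumed.  */
static uint8_t
rex2_unused_bits (uint8_t payload, uint8_t used_mask)
{
  return payload & (uint8_t) ~used_mask;
}
```

With the bytes quoted above, "d5 19 58" has payload 0x19 (B4, W and B3 set), so rex2_base_regno (0x19, 0x58 & 7) yields 24, matching "pop %r24". For "d5 01 0f a8" the payload is 0x01 (only B3 set); push %gs has no register operand to consume B3, so rex2_unused_bits (0x01, 0) returns 0x01, which is the kind of value a "{rex2 0xXX}" annotation would carry. The exact text the disassembler prints is not specified in this thread.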

-- 
H.J.


end of thread, other threads:[~2023-12-18 16:23 UTC | newest]

Thread overview: 113+ messages
2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
2023-11-02 17:05   ` Jan Beulich
2023-11-03  6:20     ` Cui, Lili
2023-11-03 13:05     ` Jan Beulich
2023-11-03 14:19   ` Jan Beulich
2023-11-06 15:20     ` Cui, Lili
2023-11-06 16:08       ` Jan Beulich
2023-11-07  8:16         ` Cui, Lili
2023-11-07 10:43           ` Jan Beulich
2023-11-07 15:31             ` Cui, Lili
2023-11-07 15:43               ` Jan Beulich
2023-11-07 15:53                 ` Cui, Lili
2023-11-06 15:02   ` Jan Beulich
2023-11-07  8:06     ` Cui, Lili
2023-11-07 10:20       ` Jan Beulich
2023-11-07 14:32         ` Cui, Lili
2023-11-07 15:08           ` Jan Beulich
2023-11-06 15:39   ` Jan Beulich
2023-11-09  8:02     ` Cui, Lili
2023-11-09 10:52       ` Jan Beulich
2023-11-09 13:27         ` Cui, Lili
2023-11-09 15:22           ` Jan Beulich
2023-11-10  7:11             ` Cui, Lili
2023-11-10  9:14               ` Jan Beulich
2023-11-10  9:21                 ` Jan Beulich
2023-11-10 12:38                   ` Cui, Lili
2023-12-14 10:13                   ` Cui, Lili
2023-12-18 15:24                     ` Jan Beulich
2023-12-18 16:23                       ` H.J. Lu
2023-11-10  9:47                 ` Cui, Lili
2023-11-10  9:57                   ` Jan Beulich
2023-11-10 12:05                     ` Cui, Lili
2023-11-10 12:35                       ` Jan Beulich
2023-11-13  0:18                         ` Cui, Lili
2023-11-02 11:29 ` [PATCH 2/8] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-02 11:29 ` [PATCH 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
2023-11-02 11:29 ` [PATCH 4/8] Add tests for " Cui, Lili
2023-11-08  9:11   ` Jan Beulich
2023-11-15 14:56     ` Cui, Lili
2023-11-16  9:17       ` Jan Beulich
2023-11-16 15:34     ` Cui, Lili
2023-11-16 16:50       ` Jan Beulich
2023-11-17 12:42         ` Cui, Lili
2023-11-17 14:38           ` Jan Beulich
2023-11-22 13:40             ` Cui, Lili
2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
2023-11-08 10:39   ` Jan Beulich
2023-11-20  1:19     ` Cui, Lili
2023-11-08 11:13   ` Jan Beulich
2023-11-20 12:36     ` Cui, Lili
2023-11-20 16:33       ` Jan Beulich
2023-11-22  7:46         ` Cui, Lili
2023-11-22  8:47           ` Jan Beulich
2023-11-22 10:45             ` Cui, Lili
2023-11-23 10:57               ` Jan Beulich
2023-11-23 12:14                 ` Cui, Lili
2023-11-24  6:56                 ` [PATCH v3 0/9] Support Intel APX EGPR Cui, Lili
2023-12-07  8:17                   ` Cui, Lili
2023-12-07  8:33                     ` Cui, Lili
2023-11-09  9:37   ` [PATCH 5/8] Support APX NDD Jan Beulich
2023-11-20  1:33     ` Cui, Lili
2023-11-20  8:19       ` Jan Beulich
2023-11-20 12:54         ` Cui, Lili
2023-11-20 16:43           ` Jan Beulich
2023-11-02 11:29 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
2023-11-08 11:44   ` Jan Beulich
2023-11-08 12:52     ` Jan Beulich
2023-11-22  5:48     ` Cui, Lili
2023-11-22  8:53       ` Jan Beulich
2023-11-22 12:26         ` Cui, Lili
2023-11-09  9:57   ` Jan Beulich
2023-11-02 11:29 ` [PATCH 7/8] Support APX NDD optimized encoding Cui, Lili
2023-11-09 10:36   ` Jan Beulich
2023-11-10  5:43     ` Hu, Lin1
2023-11-10  9:54       ` Jan Beulich
2023-11-14  2:28         ` Hu, Lin1
2023-11-14 10:50           ` Jan Beulich
2023-11-15  2:52             ` Hu, Lin1
2023-11-15  8:57               ` Jan Beulich
2023-11-15  2:59             ` [PATCH][v3] " Hu, Lin1
2023-11-15  9:34               ` Jan Beulich
2023-11-17  7:24                 ` Hu, Lin1
2023-11-17  9:47                   ` Jan Beulich
2023-11-20  3:28                     ` Hu, Lin1
2023-11-20  8:34                       ` Jan Beulich
2023-11-14  2:58         ` [PATCH 1/2] Reorder APX insns in i386.tbl Hu, Lin1
2023-11-14 11:20           ` Jan Beulich
2023-11-15  1:49             ` Hu, Lin1
2023-11-15  8:52               ` Jan Beulich
2023-11-17  3:27                 ` Hu, Lin1
2023-11-02 11:29 ` [PATCH 8/8] Support APX JMPABS Cui, Lili
2023-11-09 12:59   ` Jan Beulich
2023-11-14  3:26     ` Hu, Lin1
2023-11-14 11:15       ` Jan Beulich
2023-11-24  5:40         ` Hu, Lin1
2023-11-24  7:21           ` Jan Beulich
2023-11-27  2:16             ` Hu, Lin1
2023-11-27  8:03               ` Jan Beulich
2023-11-27  8:46                 ` Hu, Lin1
2023-11-27  8:54                   ` Jan Beulich
2023-11-27  9:03                     ` Hu, Lin1
2023-11-27 10:32                       ` Jan Beulich
2023-12-04  7:33                         ` Hu, Lin1
2023-11-02 13:22 ` [PATCH v2 0/8] Support Intel APX EGPR Jan Beulich
2023-11-03 16:42   ` Cui, Lili
2023-11-06  7:30     ` Jan Beulich
2023-11-06 14:20       ` Cui, Lili
2023-11-06 14:44         ` Jan Beulich
2023-11-06 16:03           ` Cui, Lili
2023-11-06 16:10             ` Jan Beulich
2023-11-07  1:53               ` Cui, Lili
2023-11-07 10:11                 ` Jan Beulich
