public inbox for binutils@sourceware.org
* [PATCH v4 0/9] Support Intel APX EGPR
@ 2023-12-19 12:12 Cui, Lili
  2023-12-19 12:12 ` [PATCH v4 1/9] Support APX GPR32 with rex2 prefix Cui, Lili
                   ` (9 more replies)
  0 siblings, 10 replies; 34+ messages in thread
From: Cui, Lili @ 2023-12-19 12:12 UTC (permalink / raw)
  To: binutils; +Cc: hongjiu.lu, jbeulich

Future optimizations to be made:
1. The current implementation of vexvvvv needs to be optimized.
2. The handling of double VEX/EVEX templates in check_register() needs to be optimized.
3. Convert vround* with EGPR to VRNDSCALE* instead of reporting an error.
4. Find a suitable variable to replace OperandConstraint=REX2_REQUIRED.

Cui, Lili (5):
  Support APX GPR32 with rex2 prefix
  Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
  Support APX GPR32 with extend evex prefix
  Add tests for APX GPR32 with extend evex prefix
  Support APX PUSHP/POPP

Hu, Lin1 (2):
  Support APX NDD optimized encoding.
  Support APX JMPABS for disassembler

Mo, Zewei (1):
  Support APX Push2/Pop2

konglin1 (1):
  Support APX NDD

 gas/config/tc-i386.c                          | 466 +++++++++++-
 gas/doc/c-i386.texi                           |   7 +-
 gas/testsuite/gas/i386/apx-push2pop2-inval.l  |   5 +
 gas/testsuite/gas/i386/apx-push2pop2-inval.s  |   9 +
 gas/testsuite/gas/i386/i386.exp               |   1 +
 .../i386/ilp32/x86-64-opcode-inval-intel.d    |  47 +-
 .../gas/i386/ilp32/x86-64-opcode-inval.d      |  47 +-
 .../gas/i386/x86-64-apx-egpr-inval.l          | 202 +++++
 .../gas/i386/x86-64-apx-egpr-inval.s          | 209 +++++
 .../gas/i386/x86-64-apx-egpr-promote-inval.l  |  20 +
 .../gas/i386/x86-64-apx-egpr-promote-inval.s  |  29 +
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d |  20 +
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s |  21 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |  41 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |  43 ++
 .../gas/i386/x86-64-apx-evex-promoted-intel.d | 318 ++++++++
 .../gas/i386/x86-64-apx-evex-promoted.d       | 318 ++++++++
 .../gas/i386/x86-64-apx-evex-promoted.s       | 314 ++++++++
 .../gas/i386/x86-64-apx-jmpabs-intel.d        |  12 +
 .../gas/i386/x86-64-apx-jmpabs-inval.d        |  40 +
 .../gas/i386/x86-64-apx-jmpabs-inval.s        |  15 +
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d    |  12 +
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s    |   5 +
 .../gas/i386/x86-64-apx-ndd-optimize.d        | 132 ++++
 .../gas/i386/x86-64-apx-ndd-optimize.s        | 125 +++
 gas/testsuite/gas/i386/x86-64-apx-ndd.d       | 160 ++++
 gas/testsuite/gas/i386/x86-64-apx-ndd.s       | 155 ++++
 .../gas/i386/x86-64-apx-push2pop2-intel.d     |  42 +
 .../gas/i386/x86-64-apx-push2pop2-inval.l     |  13 +
 .../gas/i386/x86-64-apx-push2pop2-inval.s     |  17 +
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d |  42 +
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s |  39 +
 .../gas/i386/x86-64-apx-pushp-popp-intel.d    |  14 +
 .../gas/i386/x86-64-apx-pushp-popp-inval.l    |   5 +
 .../gas/i386/x86-64-apx-pushp-popp-inval.s    |   7 +
 .../gas/i386/x86-64-apx-pushp-popp.d          |  14 +
 .../gas/i386/x86-64-apx-pushp-popp.s          |   8 +
 gas/testsuite/gas/i386/x86-64-apx-rex2.d      |  83 ++
 gas/testsuite/gas/i386/x86-64-apx-rex2.s      |  86 +++
 gas/testsuite/gas/i386/x86-64-evex.d          |   2 +-
 gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |   6 +
 gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |   4 +
 .../gas/i386/x86-64-opcode-inval-intel.d      |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.d  |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.s  |   4 -
 gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |  75 +-
 gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |  74 ++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |  63 ++
 gas/testsuite/gas/i386/x86-64-pseudos.s       |  64 ++
 gas/testsuite/gas/i386/x86-64.exp             |  20 +-
 include/opcode/i386.h                         |   2 +
 opcodes/i386-dis-evex-len.h                   |  10 +
 opcodes/i386-dis-evex-prefix.h                |  66 ++
 opcodes/i386-dis-evex-reg.h                   |  70 ++
 opcodes/i386-dis-evex-w.h                     |  10 +
 opcodes/i386-dis-evex-x86-64.h                |  50 ++
 opcodes/i386-dis-evex.h                       | 347 ++++++++-
 opcodes/i386-dis.c                            | 715 +++++++++++++-----
 opcodes/i386-gen.c                            |  52 +-
 opcodes/i386-opc.h                            |  27 +-
 opcodes/i386-opc.tbl                          | 223 ++++--
 opcodes/i386-reg.tbl                          |  64 ++
 62 files changed, 4688 insertions(+), 455 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-pushp-popp-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-pushp-popp-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-pushp-popp-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-pushp-popp.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-pushp-popp.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s
 create mode 100644 opcodes/i386-dis-evex-x86-64.h

-- 
2.25.1



* [PATCH v4 1/9] Support APX GPR32 with rex2 prefix
  2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
@ 2023-12-19 12:12 ` Cui, Lili
  2023-12-22 13:08   ` Jan Beulich
  2023-12-19 12:12 ` [PATCH v4 2/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Cui, Lili @ 2023-12-19 12:12 UTC (permalink / raw)
  To: binutils; +Cc: hongjiu.lu, jbeulich

APX uses the REX2 prefix to make the EGPRs (r16-r31) available to legacy
instructions in map0 and map1. The NoEgpr flag is now set in i386-gen.c
for instructions that do not support EGPRs.
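
For readers unfamiliar with the prefix, here is a minimal sketch of how the
second REX2 byte is assembled, following the layout in the build_rex2_prefix
comment (| m | R4 X4 B4 | W R X B |). The helper name rex2_payload and its
parameters are hypothetical and not part of the patch; only the bit layout
and the 0xd5 escape byte are taken from it. Register numbers are assumed to
be in the range 0..31, and map is 0 or 1.

```c
#include <assert.h>
#include <stdint.h>

/* First byte of the two-byte REX2 prefix, as used by the patch.  */
#define REX2_ESCAPE 0xd5

/* Hypothetical helper (not in the patch): build the second REX2 byte.
   map   - opcode map (0 or 1), stored in bit 7.
   w     - non-zero for a 64-bit operand size, stored in bit 3.
   reg   - number of the ModRM.reg register (0..31).
   index - number of the SIB index register (0..31).
   base  - number of the ModRM.rm / SIB base register (0..31).
   Bit 3 of each register number lands in the low nibble (R X B),
   bit 4 lands in bits 6..4 (R4 X4 B4).  */
uint8_t
rex2_payload (unsigned map, unsigned w, unsigned reg,
	      unsigned index, unsigned base)
{
  unsigned rex = (w ? 8u : 0u)
		 | (((reg >> 3) & 1u) << 2)
		 | (((index >> 3) & 1u) << 1)
		 | ((base >> 3) & 1u);
  unsigned rex2 = (((reg >> 4) & 1u) << 2)
		  | (((index >> 4) & 1u) << 1)
		  | ((base >> 4) & 1u);

  return (uint8_t) (((map & 1u) << 7) | (rex2 << 4) | rex);
}
```

With this sketch, rex2_payload (0, 0, 0, 0, 24) gives 0x11, which matches
the "d5 11 f6 c0 07" encoding of "test $0x7,%r24b" in x86-64-apx-rex2.d, and
rex2_payload (1, 0, 16, 0, 0) gives 0xc0, matching "d5 c0 af c0" for
"imul %eax,%r16d".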

gas/ChangeLog:

2023-12-19  Lingling Kong <lingling.kong@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>
	    Lili Cui <lili.cui@intel.com>
	    Lin Hu   <lin1.hu@intel.com>

	* config/tc-i386.c
	(enum i386_error): Add unsupported_EGPR_for_addressing
	and invalid_pseudo_prefix.
	(struct _i386_insn): Add rex2 and rex2_encoding for
	gpr32.
	(cpu_arch): Add apx_f.
	(is_cpu): Ditto.
	(register_number): Handle RegRex2 for gpr32.
	(is_apx_rex2_encoding): New func. Test rex2 prefix encoding.
	(build_rex2_prefix): New func. Build the rex2 prefix for legacy
	insns in opcode map 0/1 that use gpr32.
	(optimize_encoding): Handle the addition of r16-r31 to the registers.
	(md_assemble): Handle apx encoding.
	(parse_insn): Handle Prefix_REX2.
	(check_EgprOperands): New func. Check if EGPR operands
	are valid for the instruction.
	(match_template): Handle EGPR operands check.
	(set_rex_rex2): New func. Set i.rex and i.rex2.
	(build_modrm_byte): Use set_rex_rex2.
	(output_insn): Handle rex2 2-byte prefix output.
	(check_register): Reject egpr without target apx, outside
	64-bit mode, or with rex_prefix.
	* doc/c-i386.texi: Document .apx_f.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d: D5 valid
	in 64-bit mode.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-inval-pseudo.l: Add rex2 invalid testcase.
	* testsuite/gas/i386/x86-64-inval-pseudo.s: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos-bad.l: Add illegal rex2 test.
	* testsuite/gas/i386/x86-64-pseudos-bad.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos.d: Add rex2 test.
	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Run APX tests.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.l: New test.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.s: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.d: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.s: New test.

include/ChangeLog:

	* opcode/i386.h (REX2_OPCODE): New.

opcodes/ChangeLog:

	* i386-dis.c (struct instr_info): Add erex for gpr32.
	Add last_erex_prefix for rex2 prefix.
	(REX2_M): Extend for gpr32.
	(PREFIX_REX2): Ditto.
	(PREFIX_REX2_ILLEGAL): Ditto.
	(ckprefix): Ditto.
	(prefix_name): Ditto.
	(print_insn): Ditto.
	(print_register): Ditto.
	(OP_E_memory): Ditto.
	(OP_REG): Ditto.
	(OP_EX): Ditto.
	* i386-gen.c (rex2_disallowed): Some instructions do not allow the rex2 prefix.
	(process_i386_opcode_modifier): Set NoEgpr for VEX and some special instructions.
	(output_i386_opcode): Handle if_entry_needs_special_handle.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Regenerated.
	* i386-opc.h (enum i386_cpu): Add CpuAPX_F.
	(NoEgpr): New.
	(Prefix_NoOptimize): Ditto.
	(Prefix_REX2): Ditto.
	(RegRex2): Ditto.
	* i386-opc.tbl: Add rex2 prefix.
	* i386-reg.tbl: Add egprs (r16-r31).
	* i386-tbl.h: Regenerated.
---
 gas/config/tc-i386.c                          | 187 ++++++++++--
 gas/doc/c-i386.texi                           |   7 +-
 .../i386/ilp32/x86-64-opcode-inval-intel.d    |  47 +---
 .../gas/i386/ilp32/x86-64-opcode-inval.d      |  47 +---
 .../gas/i386/x86-64-apx-egpr-inval.l          |  15 +
 .../gas/i386/x86-64-apx-egpr-inval.s          |  18 ++
 gas/testsuite/gas/i386/x86-64-apx-rex2.d      |  83 ++++++
 gas/testsuite/gas/i386/x86-64-apx-rex2.s      |  86 ++++++
 gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |   6 +
 gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |   4 +
 .../gas/i386/x86-64-opcode-inval-intel.d      |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.d  |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.s  |   4 -
 gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |  75 ++++-
 gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |  74 +++++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |  21 ++
 gas/testsuite/gas/i386/x86-64-pseudos.s       |  21 ++
 gas/testsuite/gas/i386/x86-64.exp             |   2 +
 include/opcode/i386.h                         |   2 +
 opcodes/i386-dis.c                            | 265 ++++++++++++------
 opcodes/i386-gen.c                            |  50 +++-
 opcodes/i386-opc.h                            |  13 +-
 opcodes/i386-opc.tbl                          |  27 +-
 opcodes/i386-reg.tbl                          |  64 +++++
 24 files changed, 907 insertions(+), 263 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index aa26f5ce034..051cdef2a3a 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -239,6 +239,7 @@ enum i386_error
     bad_imm4,
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
+    unsupported_EGPR_for_addressing,
     unsupported,
     unsupported_on_arch,
     unsupported_64bit,
@@ -249,6 +250,7 @@ enum i386_error
     invalid_vector_register_set,
     invalid_tmm_register_set,
     invalid_dest_and_src_register_set,
+    invalid_pseudo_prefix,
     unsupported_vector_index_register,
     unsupported_broadcast,
     broadcast_needed,
@@ -356,6 +358,7 @@ struct _i386_insn
     modrm_byte rm;
     rex_byte rex;
     rex_byte vrex;
+    rex_byte rex2;
     sib_byte sib;
     vex_prefix vex;
 
@@ -429,6 +432,9 @@ struct _i386_insn
     /* Prefer the REX byte in encoding.  */
     bool rex_encoding;
 
+    /* Prefer the REX2 prefix in encoding.  */
+    bool rex2_encoding;
+
     /* Disable instruction size optimization.  */
     bool no_optimize;
 
@@ -1146,6 +1152,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
   VECARCH (avx10.1, AVX10_1, ANY_AVX512F, set),
   SUBARCH (user_msr, USER_MSR, USER_MSR, false),
+  SUBARCH (apx_f, APX_F, APX_F, false),
 };
 
 #undef SUBARCH
@@ -1661,6 +1668,7 @@ _is_cpu (const i386_cpu_attr *a, enum i386_cpu cpu)
     case CpuHLE:      return a->bitfield.cpuhle;
     case CpuAVX512F:  return a->bitfield.cpuavx512f;
     case CpuAVX512VL: return a->bitfield.cpuavx512vl;
+    case CpuAPX_F:    return a->bitfield.cpuapx_f;
     case Cpu64:       return a->bitfield.cpu64;
     case CpuNo64:     return a->bitfield.cpuno64;
     default:
@@ -2332,7 +2340,7 @@ register_number (const reg_entry *r)
   if (r->reg_flags & RegRex)
     nr += 8;
 
-  if (r->reg_flags & RegVRex)
+  if (r->reg_flags & (RegVRex | RegRex2))
     nr += 16;
 
   return nr;
@@ -3868,6 +3876,12 @@ is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || t->opcode_modifier.evex;
 }
 
+static INLINE bool
+is_apx_rex2_encoding (void)
+{
+  return i.rex2 || i.rex2_encoding;
+}
+
 static unsigned int
 get_broadcast_bytes (const insn_template *t, bool diag)
 {
@@ -4123,6 +4137,22 @@ build_evex_prefix (void)
     i.vex.bytes[3] |= i.mask.reg->reg_num;
 }
 
+/* Build (2 bytes) rex2 prefix.
+   | D5h |
+   | m | R4 X4 B4 | W R X B |
+
+   Rex2 reuses i.vex as they both encode i.tm.opcode_space in their prefixes.
+ */
+static void
+build_rex2_prefix (void)
+{
+  i.vex.length = 2;
+  i.vex.bytes[0] = 0xd5;
+  /* For the W R X B bits, the variables of rex prefix will be reused.  */
+  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
+		    | (i.rex2 << 4) | i.rex);
+}
+
 static void
 process_immext (void)
 {
@@ -4386,14 +4416,22 @@ optimize_encoding (void)
 	  i.types[1].bitfield.byte = 1;
 	  /* Ignore the suffix.  */
 	  i.suffix = 0;
-	  /* Convert to byte registers.  */
+	  /* Convert to byte registers. 8-bit registers are special,
+	     RegRex64 and non-RegRex64 each have 8 registers.  */
 	  if (i.types[1].bitfield.word)
-	    j = 16;
-	  else if (i.types[1].bitfield.dword)
+	    /* 32 (or 40) 8-bit registers.  */
 	    j = 32;
+	  else if (i.types[1].bitfield.dword)
+	    /* 32 (or 40) 8-bit registers + 32 16-bit registers.  */
+	    j = 64;
 	  else
-	    j = 48;
-	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
+	    /* 32 (or 40) 8-bit registers + 32 16-bit registers
+	       + 32 32-bit registers.  */
+	    j = 96;
+
+	  /* In 64-bit mode, the following byte registers cannot be accessed
+	     if using the Rex and Rex2 prefix: AH, BH, CH, DH */
+	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
 	    j += 8;
 	  i.op[1].regs -= j;
 	}
@@ -5283,6 +5321,9 @@ md_assemble (char *line)
 	case unsupported_syntax:
 	  err_msg = _("unsupported syntax");
 	  break;
+	case unsupported_EGPR_for_addressing:
+	  err_msg = _("extended GPR cannot be used as base/index");
+	  break;
 	case unsupported:
 	  as_bad (_("unsupported instruction `%s'"),
 		  pass1_mnem ? pass1_mnem : insn_name (current_templates.start));
@@ -5336,6 +5377,9 @@ md_assemble (char *line)
 	case invalid_dest_and_src_register_set:
 	  err_msg = _("destination and source registers must be distinct");
 	  break;
+	case invalid_pseudo_prefix:
+	  err_msg = _("rex2 pseudo prefix cannot be used here");
+	  break;
 	case unsupported_vector_index_register:
 	  err_msg = _("unsupported vector index register");
 	  break;
@@ -5591,6 +5635,13 @@ md_assemble (char *line)
 	  return;
 	}
 
+      /* Check for explicit REX2 prefix.  */
+      if (i.rex2_encoding)
+	{
+	  as_bad (_("{rex2} prefix invalid with `%s'"), insn_name (&i.tm));
+	  return;
+	}
+
       if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
@@ -5630,11 +5681,12 @@ md_assemble (char *line)
 	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
 	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
-	  && i.rex != 0))
+	  && (i.rex != 0 || i.rex2 != 0)))
     {
       int x;
 
-      i.rex |= REX_OPCODE;
+      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
+	i.rex |= REX_OPCODE;
       for (x = 0; x < 2; x++)
 	{
 	  /* Look for 8 bit operand that uses old registers.  */
@@ -5645,7 +5697,7 @@ md_assemble (char *line)
 	      /* In case it is "hi" register, give up.  */
 	      if (i.op[x].regs->reg_num > 3)
 		as_bad (_("can't encode register '%s%s' in an "
-			  "instruction requiring REX prefix."),
+			  "instruction requiring REX/REX2 prefix."),
 			register_prefix, i.op[x].regs->reg_name);
 
 	      /* Otherwise it is equivalent to the extended register.
@@ -5657,11 +5709,11 @@ md_assemble (char *line)
 	}
     }
 
-  if (i.rex == 0 && i.rex_encoding)
+  if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding || i.rex2_encoding))
     {
       /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
 	 that uses legacy register.  If it is "hi" register, don't add
-	 the REX_OPCODE byte.  */
+	 rex and rex2 prefix.  */
       int x;
       for (x = 0; x < 2; x++)
 	if (i.types[x].bitfield.class == Reg
@@ -5671,6 +5723,7 @@ md_assemble (char *line)
 	  {
 	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
 	    i.rex_encoding = false;
+	    i.rex2_encoding = false;
 	    break;
 	  }
 
@@ -5678,7 +5731,13 @@ md_assemble (char *line)
 	i.rex = REX_OPCODE;
     }
 
-  if (i.rex != 0)
+  if (is_apx_rex2_encoding ())
+    {
+      build_rex2_prefix ();
+      /* The individual REX.RXBW bits got consumed.  */
+      i.rex &= REX_OPCODE;
+    }
+  else if (i.rex != 0)
     add_prefix (REX_OPCODE | i.rex);
 
   insert_lfence_before (last_insn);
@@ -5752,6 +5811,20 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
 	    goto too_long;
 	  *mnem_p = '\0';
 
+	  /* Point l at the closing brace if there's no other separator.  */
+	  if (*l != END_OF_INSN && !is_space_char (*l)
+	      && *l != PREFIX_SEPARATOR)
+	    --l;
+	}
+      /* Skip the immediate 0x** of {rex2 0x00} prefix.  */
+      else if (*mnemonic == '{'&& is_space_char (*l))
+	{
+	  while ( *l != '}')
+	    ++l;
+	  *mnem_p++ = *l++;
+	  if (mnem_p >= mnemonic + MAX_MNEM_SIZE)
+	    goto too_long;
+	  *mnem_p = '\0';
 	  /* Point l at the closing brace if there's no other separator.  */
 	  if (*l != END_OF_INSN && !is_space_char (*l)
 	      && *l != PREFIX_SEPARATOR)
@@ -5856,6 +5929,10 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
 		  /* {rex} */
 		  i.rex_encoding = true;
 		  break;
+		case Prefix_REX2:
+		  /* {rex2} */
+		  i.rex2_encoding = true;
+		  break;
 		case Prefix_NoOptimize:
 		  /* {nooptimize} */
 		  i.no_optimize = true;
@@ -7005,6 +7082,43 @@ VEX_check_encoding (const insn_template *t)
   return 0;
 }
 
+/* Check if Egprs operands are valid for the instruction.  */
+
+static int
+check_EgprOperands (const insn_template *t)
+{
+  if (!t->opcode_modifier.noegpr)
+    return 0;
+
+  for (unsigned int op = 0; op < i.operands; op++)
+    {
+      if (i.types[op].bitfield.class != Reg)
+	continue;
+
+      if (i.op[op].regs->reg_flags & RegRex2)
+	{
+	  i.error = register_type_mismatch;
+	  return 1;
+	}
+    }
+
+  if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
+      || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
+    {
+      i.error = unsupported_EGPR_for_addressing;
+      return 1;
+    }
+
+  /* Check if pseudo prefix {rex2} is valid.  */
+  if (i.rex2_encoding)
+    {
+      i.error = invalid_pseudo_prefix;
+      return 1;
+    }
+
+  return 0;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7149,6 +7263,14 @@ match_template (char mnem_suffix)
 	      continue;
 	    }
 
+	  /* Check if pseudo prefix {rex2} is valid.  */
+	  if (t->opcode_modifier.noegpr && i.rex2_encoding)
+	    {
+	      i.error = invalid_pseudo_prefix;
+	      specific_error = progress (i.error);
+	      continue;
+	    }
+
 	  /* We've found a match; break out of loop.  */
 	  break;
 	}
@@ -7479,6 +7601,13 @@ match_template (char mnem_suffix)
 	  continue;
 	}
 
+      /* Check if EGPR operands(r16-r31) are valid.  */
+      if (check_EgprOperands (t))
+	{
+	  specific_error = progress (i.error);
+	  continue;
+	}
+
       /* Check whether to use the shorter VEX encoding for certain insns where
 	 the EVEX encoding comes first in the table.  This requires the respective
 	 AVX-* feature to be explicitly enabled.
@@ -8377,6 +8506,18 @@ static INLINE void set_rex_vrex (const reg_entry *r, unsigned int rex_bit,
 
   if (r->reg_flags & RegVRex)
     i.vrex |= rex_bit;
+
+  if (r->reg_flags & RegRex2)
+    i.rex2 |= rex_bit;
+}
+
+static INLINE void
+set_rex_rex2 (const reg_entry *r, unsigned int rex_bit)
+{
+  if ((r->reg_flags & RegRex) != 0)
+    i.rex |= rex_bit;
+  if ((r->reg_flags & RegRex2) != 0)
+    i.rex2 |= rex_bit;
 }
 
 static int
@@ -8860,8 +9001,7 @@ build_modrm_byte (void)
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
 		  i.types[op] = operand_type_and_not (i.types[op], anydisp);
 		  i.types[op].bitfield.disp32 = 1;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 	    }
 	  /* RIP addressing for 64bit mode.  */
@@ -8932,8 +9072,7 @@ build_modrm_byte (void)
 
 	      if (!i.tm.opcode_modifier.sib)
 		i.rm.regmem = i.base_reg->reg_num;
-	      if ((i.base_reg->reg_flags & RegRex) != 0)
-		i.rex |= REX_B;
+	      set_rex_rex2 (i.base_reg, REX_B);
 	      i.sib.base = i.base_reg->reg_num;
 	      /* x86-64 ignores REX prefix bit here to avoid decoder
 		 complications.  */
@@ -8971,8 +9110,7 @@ build_modrm_byte (void)
 		  else
 		    i.sib.index = i.index_reg->reg_num;
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 
 	      if (i.disp_operands
@@ -10116,6 +10254,12 @@ output_insn (const struct last_insn *last_insn)
 	  for (j = ARRAY_SIZE (i.prefix), q = i.prefix; j > 0; j--, q++)
 	    if (*q)
 	      frag_opcode_byte (*q);
+
+	  if (is_apx_rex2_encoding ())
+	    {
+	      frag_opcode_byte (i.vex.bytes[0]);
+	      frag_opcode_byte (i.vex.bytes[1]);
+	    }
 	}
       else
 	{
@@ -14144,6 +14288,13 @@ static bool check_register (const reg_entry *r)
 	i.vec_encoding = vex_encoding_error;
     }
 
+  if (r->reg_flags & RegRex2)
+    {
+      if (!cpu_arch_flags.bitfield.cpuapx_f
+	  || flag_code != CODE_64BIT)
+	return false;
+    }
+
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
       && (!cpu_arch_flags.bitfield.cpu64
 	  || r->reg_type.bitfield.class != RegCR
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 03ee980bef7..21f48c93300 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -217,6 +217,7 @@ accept various extension mnemonics.  For example,
 @code{avx10.1/256},
 @code{avx10.1/128},
 @code{user_msr},
+@code{apx_f},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -983,6 +984,10 @@ Different encoding options can be specified via pseudo prefixes:
 instructions (x86-64 only).  Note that this differs from the @samp{rex}
 prefix which generates REX prefix unconditionally.
 
+@item
+@samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
+instructions (APX_F only).
+
 @item
 @samp{@{nooptimize@}} -- disable instruction size optimization.
 @end itemize
@@ -1663,7 +1668,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
 @item @samp{.padlock} @tab @samp{.clzero} @tab @samp{.mwaitx} @tab @samp{.rdpru}
 @item @samp{.mcommit} @tab @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb}
-@item @samp{.tlbsync}
+@item @samp{.tlbsync} @tab @samp{.apx_f}
 @end multitable
 
 Apart from the warning, there are only two other effects on
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
index a2b09d2e74f..56834371133 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
@@ -2,49 +2,4 @@
 #as: --32
 #objdump: -dw -Mx86-64 -Mintel
 #name: x86-64 (ILP32) illegal opcodes (Intel mode)
-
-.*: +file format .*
-
-Disassembly of section .text:
-
-0+ <aaa>:
-[ 	]*[a-f0-9]+:	37                   	\(bad\)
-
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+7 <aam1>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+9 <aas>:
-[ 	]*[a-f0-9]+:	3f                   	\(bad\)
-
-0+a <bound>:
-[ 	]*[a-f0-9]+:	62                   	.byte 0x62
-[ 	]*[a-f0-9]+:	10                   	.byte 0x10
-
-0+c <daa>:
-[ 	]*[a-f0-9]+:	27                   	\(bad\)
-
-0+d <das>:
-[ 	]*[a-f0-9]+:	2f                   	\(bad\)
-
-0+e <into>:
-[ 	]*[a-f0-9]+:	ce                   	\(bad\)
-
-0+f <pusha>:
-[ 	]*[a-f0-9]+:	60                   	\(bad\)
-
-0+10 <popa>:
-[ 	]*[a-f0-9]+:	61                   	\(bad\)
-#pass
+#dump: ../x86-64-opcode-inval-intel.d
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
index 5a17b0b412e..b5233a5cf93 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
@@ -2,49 +2,4 @@
 #as: --32
 #objdump: -dw -Mx86-64
 #name: x86-64 (ILP32) illegal opcodes
-
-.*: +file format .*
-
-Disassembly of section .text:
-
-0+ <aaa>:
-[ 	]*[a-f0-9]+:	37                   	\(bad\)
-
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+7 <aam1>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+9 <aas>:
-[ 	]*[a-f0-9]+:	3f                   	\(bad\)
-
-0+a <bound>:
-[ 	]*[a-f0-9]+:	62                   	.byte 0x62
-[ 	]*[a-f0-9]+:	10                   	.byte 0x10
-
-0+c <daa>:
-[ 	]*[a-f0-9]+:	27                   	\(bad\)
-
-0+d <das>:
-[ 	]*[a-f0-9]+:	2f                   	\(bad\)
-
-0+e <into>:
-[ 	]*[a-f0-9]+:	ce                   	\(bad\)
-
-0+f <pusha>:
-[ 	]*[a-f0-9]+:	60                   	\(bad\)
-
-0+10 <popa>:
-[ 	]*[a-f0-9]+:	61                   	\(bad\)
-#pass
+#dump: ../x86-64-opcode-inval.d
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
new file mode 100644
index 00000000000..bb5c602a2e2
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
@@ -0,0 +1,15 @@
+.*: Assembler messages:
+.*:4: Error: bad register name `%r17d'
+.*:7: Error: extended GPR cannot be used as base/index for `xsave'
+.*:8: Error: extended GPR cannot be used as base/index for `xsave64'
+.*:9: Error: extended GPR cannot be used as base/index for `xrstor'
+.*:10: Error: extended GPR cannot be used as base/index for `xrstor64'
+.*:11: Error: extended GPR cannot be used as base/index for `xsaves'
+.*:12: Error: extended GPR cannot be used as base/index for `xsaves64'
+.*:13: Error: extended GPR cannot be used as base/index for `xrstors'
+.*:14: Error: extended GPR cannot be used as base/index for `xrstors64'
+.*:15: Error: extended GPR cannot be used as base/index for `xsaveopt'
+.*:16: Error: extended GPR cannot be used as base/index for `xsaveopt64'
+.*:17: Error: extended GPR cannot be used as base/index for `xsavec'
+.*:18: Error: extended GPR cannot be used as base/index for `xsavec64'
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
new file mode 100644
index 00000000000..bfb6b3fd03b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
@@ -0,0 +1,18 @@
+# Check illegal 64bit APX_F instructions
+	.text
+	.arch .noapx_f
+	test    $0x7, %r17d
+	.arch .apx_f
+	test    $0x7, %r17d
+	xsave (%r16, %rbx)
+	xsave64 (%r16, %r31)
+	xrstor (%r16, %rbx)
+	xrstor64 (%r16, %rbx)
+	xsaves (%rbx, %r16)
+	xsaves64 (%r16, %rbx)
+	xrstors (%rbx, %r31)
+	xrstors64 (%r16, %rbx)
+	xsaveopt (%r16, %rbx)
+	xsaveopt64 (%r16, %r31)
+	xsavec (%r16, %rbx)
+	xsavec64 (%r16, %r31)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.d b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
new file mode 100644
index 00000000000..e3cd534da11
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
@@ -0,0 +1,83 @@
+#as:
+#objdump: -dw
+#name: x86-64 APX_F use gpr32 with rex2 prefix
+#source: x86-64-apx-rex2.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f6 c0 07[	 ]+test   \$0x7,%r24b
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 19 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24
+[	 ]*[a-f0-9]+:[	 ]*66 d5 11 f7 c0 07 00[	 ]+test   \$0x7,%r24w
+[	 ]*[a-f0-9]+:[	 ]*44 0f af f8[	 ]+imul   %eax,%r15d
+[	 ]*[a-f0-9]+:[	 ]*d5 c0 af c0[	 ]+imul   %eax,%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 90 62 12[	 ]+punpckldq %mm2,\(%r18\)
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 00[	 ]+lea    \(%rax\),%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 08[	 ]+lea    \(%rax\),%r17d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 10[	 ]+lea    \(%rax\),%r18d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 18[	 ]+lea    \(%rax\),%r19d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 20[	 ]+lea    \(%rax\),%r20d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 28[	 ]+lea    \(%rax\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 30[	 ]+lea    \(%rax\),%r22d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 38[	 ]+lea    \(%rax\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 00[	 ]+lea    \(%rax\),%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 08[	 ]+lea    \(%rax\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 10[	 ]+lea    \(%rax\),%r26d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 18[	 ]+lea    \(%rax\),%r27d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 20[	 ]+lea    \(%rax\),%r28d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 28[	 ]+lea    \(%rax\),%r29d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 30[	 ]+lea    \(%rax\),%r30d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 38[	 ]+lea    \(%rax\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r16,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r17,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r18,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r19,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r21,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r22,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r23,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r24,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r25,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r26,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r27,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r28,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r29,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r30,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r31,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 00[	 ]+lea    \(%r16\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 01[	 ]+lea    \(%r17\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 02[	 ]+lea    \(%r18\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 03[	 ]+lea    \(%r19\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 04 24       	lea    \(%r20\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 45 00       	lea    0x0\(%r21\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 06[	 ]+lea    \(%r22\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 07[	 ]+lea    \(%r23\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 00[	 ]+lea    \(%r24\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 01[	 ]+lea    \(%r25\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 02[	 ]+lea    \(%r26\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 03[	 ]+lea    \(%r27\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 04 24       	lea    \(%r28\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 45 00       	lea    0x0\(%r29\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 06          	lea    \(%r30\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 07          	lea    \(%r31\),%eax
+[	 ]*[a-f0-9]+:[	 ]*4c 8d 38             	lea    \(%rax\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 48 8d 00          	lea    \(%rax\),%r16
+[	 ]*[a-f0-9]+:[	 ]*49 8d 07             	lea    \(%r15\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 18 8d 00          	lea    \(%r16\),%rax
+[	 ]*[a-f0-9]+:[	 ]*4a 8d 04 3d 00 00 00 00 	lea    0x0\(,%r15,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 28 8d 04 05 00 00 00 00 	lea    0x0\(,%r16,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 00          	add    \(%r16\),%r8
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 38          	add    \(%r16\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 0d 00 00 00 00 	mov    0x0\(,%r9,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 35 00 00 00 00 	mov    0x0\(,%r14,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 3a          	sub    \(%r10\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 7d 00       	sub    0x0\(%r13\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 30 8d 44 20 01    	lea    0x1\(%r16,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 76 8d 7c 20 01    	lea    0x1\(%r16,%r28,1\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 12 8d 84 04 81 00 00 00 	lea    0x81\(%r20,%r8,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 57 8d bc 04 81 00 00 00 	lea    0x81\(%r28,%r8,1\),%r31d
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.s b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
new file mode 100644
index 00000000000..7adf55ff82c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
@@ -0,0 +1,86 @@
+# Check 64-bit instructions with REX2 prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+         test	$0x7, %r24b
+         test	$0x7, %r24d
+         test	$0x7, %r24
+         test	$0x7, %r24w
+## REX2.M bit
+         imull	%eax, %r15d
+         imull	%eax, %r16d
+         punpckldq (%r18), %mm2
+## REX2.R4 bit
+         leal	(%rax), %r16d
+         leal	(%rax), %r17d
+         leal	(%rax), %r18d
+         leal	(%rax), %r19d
+         leal	(%rax), %r20d
+         leal	(%rax), %r21d
+         leal	(%rax), %r22d
+         leal	(%rax), %r23d
+         leal	(%rax), %r24d
+         leal	(%rax), %r25d
+         leal	(%rax), %r26d
+         leal	(%rax), %r27d
+         leal	(%rax), %r28d
+         leal	(%rax), %r29d
+         leal	(%rax), %r30d
+         leal	(%rax), %r31d
+## REX2.X4 bit
+         leal	(,%r16), %eax
+         leal	(,%r17), %eax
+         leal	(,%r18), %eax
+         leal	(,%r19), %eax
+         leal	(,%r20), %eax
+         leal	(,%r21), %eax
+         leal	(,%r22), %eax
+         leal	(,%r23), %eax
+         leal	(,%r24), %eax
+         leal	(,%r25), %eax
+         leal	(,%r26), %eax
+         leal	(,%r27), %eax
+         leal	(,%r28), %eax
+         leal	(,%r29), %eax
+         leal	(,%r30), %eax
+         leal	(,%r31), %eax
+## REX2.B4 bit
+         leal	(%r16), %eax
+         leal	(%r17), %eax
+         leal	(%r18), %eax
+         leal	(%r19), %eax
+         leal	(%r20), %eax
+         leal	(%r21), %eax
+         leal	(%r22), %eax
+         leal	(%r23), %eax
+         leal	(%r24), %eax
+         leal	(%r25), %eax
+         leal	(%r26), %eax
+         leal	(%r27), %eax
+         leal	(%r28), %eax
+         leal	(%r29), %eax
+         leal	(%r30), %eax
+         leal	(%r31), %eax
+## REX2.W bit
+         leaq	(%rax), %r15
+         leaq	(%rax), %r16
+         leaq	(%r15), %rax
+         leaq	(%r16), %rax
+         leaq	(,%r15), %rax
+         leaq	(,%r16), %rax
+## REX2.R3 bit
+         add    (%r16), %r8
+         add    (%r16), %r15
+## REX2.X3 bit
+         mov    (,%r9), %r16
+         mov    (,%r14), %r16
+## REX2.B3 bit
+	 sub   (%r10), %r31
+	 sub   (%r13), %r31
+
+## SIB
+         leal	1(%r16, %r20), %eax
+         leal	1(%r16, %r28), %r31d
+         leal	129(%r20, %r8), %eax
+         leal	129(%r28, %r8), %r31d
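For readers checking the expected encodings in x86-64-apx-rex2.d by hand, here is a minimal sketch (not part of the patch) of how the byte following 0xd5 splits into fields. The field order M0 R4 X4 B4 W R3 X3 B3, from bit 7 down to bit 0, is inferred from the section labels in the test source and from the `rex2_payload >> 4` / `rex2_payload & 0xf` split in ckprefix(). The names `rex2_bits`, `rex2_decode` and `gpr_number` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type; not part of the patch.  */
struct rex2_bits
{
  unsigned m0, r4, x4, b4;	/* High nibble of the payload.  */
  unsigned w, r3, x3, b3;	/* Low nibble, same layout as legacy REX.  */
};

/* Split the REX2 payload byte (the byte after 0xd5) into its fields.  */
static struct rex2_bits
rex2_decode (uint8_t payload)
{
  struct rex2_bits b;

  b.m0 = (payload >> 7) & 1;
  b.r4 = (payload >> 6) & 1;
  b.x4 = (payload >> 5) & 1;
  b.b4 = (payload >> 4) & 1;
  b.w = (payload >> 3) & 1;
  b.r3 = (payload >> 2) & 1;
  b.x3 = (payload >> 1) & 1;
  b.b3 = payload & 1;
  return b;
}

/* Combine a 3-bit ModRM/SIB field with its bit-3 and bit-4 extensions
   to get a register number in the range 0..31.  */
static unsigned
gpr_number (unsigned low3, unsigned bit3, unsigned bit4)
{
  assert (low3 < 8 && bit3 < 2 && bit4 < 2);
  return low3 | (bit3 << 3) | (bit4 << 4);
}
```

Taking the first expected line, `d5 11 f6 c0 07`: payload 0x11 sets B4 and B3, and ModRM 0xc0 has rm = 0, so `gpr_number (0, 1, 1)` is 24, which matches `test $0x7,%r24b`. Likewise `d5 c0 af c0` sets M0 and R4, selecting map1 (imul) with destination register 16.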
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
index 13ad0fb768f..256e1b9a370 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
@@ -1,10 +1,16 @@
 .*: Assembler messages:
 .*:2: Error: .*
 .*:3: Error: .*
+.*:6: Error: .*
+.*:7: Error: .*
 GAS LISTING .*
 
 
 [ 	]*1[ 	]+\.text
 [ 	]*2[ 	]+\{disp16\} movb \(%ebp\),%al
 [ 	]*3[ 	]+\{disp16\} movb \(%rbp\),%al
+[ 	]*4[ 	]+
+[ 	]*5[ 	]+.*
+[ 	]*6[ 	]+\{rex2\} xsave \(%r15, %rbx\)
+[ 	]*7[ 	]+\{rex2\} xsave64 \(%r15, %rbx\)
 #...
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
index c10b14c2099..ae30476e500 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
@@ -1,4 +1,8 @@
 	.text
 	{disp16} movb (%ebp),%al
 	{disp16} movb (%rbp),%al
+
+	/* Instructions that do not support APX.  */
+	{rex2} xsave (%r15, %rbx)
+	{rex2} xsave64 (%r15, %rbx)
 	.p2align 4,0
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
index 6ee5b2f95ce..66c4d2cddc0 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
@@ -10,41 +10,33 @@ Disassembly of section .text:
 0+ <aaa>:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
+0+1 <aam0>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
-0+7 <aam1>:
+0+3 <aam1>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
-0+9 <aas>:
+0+5 <aas>:
 [ 	]*[a-f0-9]+:	3f                   	\(bad\)
 
-0+a <bound>:
+0+6 <bound>:
 [ 	]*[a-f0-9]+:	62                   	.byte 0x62
 [ 	]*[a-f0-9]+:	10                   	.byte 0x10
 
-0+c <daa>:
+0+8 <daa>:
 [ 	]*[a-f0-9]+:	27                   	\(bad\)
 
-0+d <das>:
+0+9 <das>:
 [ 	]*[a-f0-9]+:	2f                   	\(bad\)
 
-0+e <into>:
+0+a <into>:
 [ 	]*[a-f0-9]+:	ce                   	\(bad\)
 
-0+f <pusha>:
+0+b <pusha>:
 [ 	]*[a-f0-9]+:	60                   	\(bad\)
 
-0+10 <popa>:
+0+c <popa>:
 [ 	]*[a-f0-9]+:	61                   	\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
index 12f02c1766c..fbb850b56da 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
@@ -9,41 +9,33 @@ Disassembly of section .text:
 0+ <aaa>:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
+0+1 <aam0>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
-0+7 <aam1>:
+0+3 <aam1>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
-0+9 <aas>:
+0+5 <aas>:
 [ 	]*[a-f0-9]+:	3f                   	\(bad\)
 
-0+a <bound>:
+0+6 <bound>:
 [ 	]*[a-f0-9]+:	62                   	.byte 0x62
 [ 	]*[a-f0-9]+:	10                   	.byte 0x10
 
-0+c <daa>:
+0+8 <daa>:
 [ 	]*[a-f0-9]+:	27                   	\(bad\)
 
-0+d <das>:
+0+9 <das>:
 [ 	]*[a-f0-9]+:	2f                   	\(bad\)
 
-0+e <into>:
+0+a <into>:
 [ 	]*[a-f0-9]+:	ce                   	\(bad\)
 
-0+f <pusha>:
+0+b <pusha>:
 [ 	]*[a-f0-9]+:	60                   	\(bad\)
 
-0+10 <popa>:
+0+c <popa>:
 [ 	]*[a-f0-9]+:	61                   	\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.s b/gas/testsuite/gas/i386/x86-64-opcode-inval.s
index 6cbfe7705a8..fbcda3df773 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.s
@@ -2,10 +2,6 @@
 # All the followings are illegal opcodes for x86-64.
 aaa:
 	aaa
-aad0:
-	aad
-aad1:
-	aad $2
 aam0:
 	aam
 aam1:
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
index 3f9f67fcf4b..a72f847085d 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
@@ -1,6 +1,71 @@
 .*: Assembler messages:
-.*:3: Error: .*`vmovaps'.*
-.*:4: Error: .*`vmovaps'.*
-.*:5: Error: .*`vmovaps'.*
-.*:6: Error: .*`vmovaps'.*
-.*:7: Error: .*`rorx'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`rorx'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`xsave'.*
+.*:[0-9]+: Error: .*`xsaves'.*
+.*:[0-9]+: Error: .*`xsaves64'.*
+.*:[0-9]+: Error: .*`xsavec'.*
+.*:[0-9]+: Error: .*`xrstors'.*
+.*:[0-9]+: Error: .*`xrstors64'.*
+.*:[0-9]+: Error: .*`mov'.*
+.*:[0-9]+: Error: .*`movabs'.*
+.*:[0-9]+: Error: .*`cmps'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`movs'.*
+.*:[0-9]+: Error: .*`movs'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`jo'.*
+.*:[0-9]+: Error: .*`jno'.*
+.*:[0-9]+: Error: .*`jb'.*
+.*:[0-9]+: Error: .*`jae'.*
+.*:[0-9]+: Error: .*`je'.*
+.*:[0-9]+: Error: .*`jne'.*
+.*:[0-9]+: Error: .*`jbe'.*
+.*:[0-9]+: Error: .*`ja'.*
+.*:[0-9]+: Error: .*`js'.*
+.*:[0-9]+: Error: .*`jns'.*
+.*:[0-9]+: Error: .*`jp'.*
+.*:[0-9]+: Error: .*`jnp'.*
+.*:[0-9]+: Error: .*`jl'.*
+.*:[0-9]+: Error: .*`jge'.*
+.*:[0-9]+: Error: .*`jle'.*
+.*:[0-9]+: Error: .*`jg'.*
+.*:[0-9]+: Error: .*`jo'.*
+.*:[0-9]+: Error: .*`jno'.*
+.*:[0-9]+: Error: .*`jb'.*
+.*:[0-9]+: Error: .*`jae'.*
+.*:[0-9]+: Error: .*`je'.*
+.*:[0-9]+: Error: .*`jne'.*
+.*:[0-9]+: Error: .*`jbe'.*
+.*:[0-9]+: Error: .*`ja'.*
+.*:[0-9]+: Error: .*`js'.*
+.*:[0-9]+: Error: .*`jns'.*
+.*:[0-9]+: Error: .*`jp'.*
+.*:[0-9]+: Error: .*`jnp'.*
+.*:[0-9]+: Error: .*`jl'.*
+.*:[0-9]+: Error: .*`jge'.*
+.*:[0-9]+: Error: .*`jle'.*
+.*:[0-9]+: Error: .*`jg'.*
+.*:[0-9]+: Error: .*`in'.*
+.*:[0-9]+: Error: .*`in'.*
+.*:[0-9]+: Error: .*`out'.*
+.*:[0-9]+: Error: .*`out'.*
+.*:[0-9]+: Error: .*`jmp'.*
+.*:[0-9]+: Error: .*`loop'.*
+.*:[0-9]+: Error: .*`wrmsr'.*
+.*:[0-9]+: Error: .*`rdtsc'.*
+.*:[0-9]+: Error: .*`rdmsr'.*
+.*:[0-9]+: Error: .*`sysenter'.*
+.*:[0-9]+: Error: .*`sysexit'.*
+.*:[0-9]+: Error: .*`rdpmc'.*
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
index 3b923593a6a..54c17a9eab7 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
@@ -5,3 +5,77 @@ pseudos:
 	{rex} vmovaps %xmm7,%xmm2
 	{rex} vmovaps %xmm17,%xmm2
 	{rex} rorx $7,%eax,%ebx
+	{rex2} vmovaps %xmm7,%xmm2
+	{rex2} xsave (%rax)
+	{rex2} xsaves (%ecx)
+	{rex2} xsaves64 (%ecx)
+	{rex2} xsavec (%ecx)
+	{rex2} xrstors (%ecx)
+	{rex2} xrstors64 (%ecx)
+
+	#All opcodes in row 0xA* (map0) are illegal with a REX2 prefix.
+	#{rex2} test (0xa8) is a special case; it is remapped to test (0xf6).
+	{rex2} mov    0x90909090,%al
+	{rex2} movabs 0x1,%al
+	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
+	{rex2} lodsb
+	{rex2} lods   %ds:(%esi),%al
+	{rex2} lodsb   (%esi)
+	{rex2} movs
+	{rex2} movs   (%esi), (%edi)
+	{rex2} scasl
+	{rex2} scas   %es:(%edi),%eax
+	{rex2} scasb   (%edi)
+	{rex2} stosb
+	{rex2} stosb   (%edi)
+	{rex2} stos   %eax,%es:(%edi)
+
+	#All opcodes in rows 0x7* (map0) and 0x8* (map1) are illegal with a REX2 prefix.
+	{rex2} jo     .+2-0x70
+	{rex2} jno    .+2-0x70
+	{rex2} jb     .+2-0x70
+	{rex2} jae    .+2-0x70
+	{rex2} je     .+2-0x70
+	{rex2} jne    .+2-0x70
+	{rex2} jbe    .+2-0x70
+	{rex2} ja     .+2-0x70
+	{rex2} js     .+2-0x70
+	{rex2} jns    .+2-0x70
+	{rex2} jp     .+2-0x70
+	{rex2} jnp    .+2-0x70
+	{rex2} jl     .+2-0x70
+	{rex2} jge    .+2-0x70
+	{rex2} jle    .+2-0x70
+	{rex2} jg     .+2-0x70
+	{rex2} jo     .+6+0x90909090
+	{rex2} jno    .+6+0x90909090
+	{rex2} jb     .+6+0x90909090
+	{rex2} jae    .+6+0x90909090
+	{rex2} je     .+6+0x90909090
+	{rex2} jne    .+6+0x90909090
+	{rex2} jbe    .+6+0x90909090
+	{rex2} ja     .+6+0x90909090
+	{rex2} js     .+6+0x90909090
+	{rex2} jns    .+6+0x90909090
+	{rex2} jp     .+6+0x90909090
+	{rex2} jnp    .+6+0x90909090
+	{rex2} jl     .+6+0x90909090
+	{rex2} jge    .+6+0x90909090
+	{rex2} jle    .+6+0x90909090
+	{rex2} jg     .+6+0x90909090
+
+	#All opcodes in row 0xE* (map0) are illegal with a REX2 prefix.
+	{rex2} in $0x90,%al
+	{rex2} in $0x90
+	{rex2} out $0x90,%al
+	{rex2} out $0x90
+	{rex2} jmp  *%eax
+	{rex2} loop foo
+
+	#All opcodes in row 0x3* (map1) are illegal with a REX2 prefix.
+	{rex2} wrmsr
+	{rex2} rdtsc
+	{rex2} rdmsr
+	{rex2} sysenter
+	{rex2} sysexitl
+	{rex2} rdpmc
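The row-based rule stated in the test comments above can be sketched as a small predicate. This is an illustration only, with a hypothetical function name. It follows the comments in x86-64-pseudos-bad.s, not the disassembler tables, which flag entries individually with PREFIX_REX2_ILLEGAL. It also does not cover the xsave family, which this series rejects separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper; not part of the patch.  MAP is 0 for one-byte
   opcodes and 1 for 0x0f-escaped opcodes, the only two maps the REX2
   M0 bit can select.  Returns true when OPCODE sits in a row that the
   test comments describe as illegal with a REX2 prefix.  */
static bool
rex2_row_is_illegal (unsigned map, uint8_t opcode)
{
  unsigned row = opcode >> 4;

  assert (map <= 1);
  if (map == 0)
    return row == 0x7 || row == 0xa || row == 0xe;
  return row == 0x3 || row == 0x8;
}
```

For example, 0x70 (jo rel8) in map0 and 0x30 (wrmsr) in map1 are rejected, while 0x89 (mov) in map0 and 0x28 (movaps) in map1 are accepted, consistent with the `{rex2} mov` and `{rex2} movaps` cases in x86-64-pseudos.s.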
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.d b/gas/testsuite/gas/i386/x86-64-pseudos.d
index 0cc75ef2457..9c45851cc73 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.d
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.d
@@ -404,6 +404,18 @@ Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 d3 e0          	{rex2 0x0} shl %cl,%eax
+ +[a-f0-9]+:	d5 00 38 ca          	{rex2 0x0} cmp %cl,%dl
+ +[a-f0-9]+:	d5 00 b3 01          	{rex2 0x0} mov \$(0x)?1,%bl
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2 0x0} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2 0x1} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2 0x1} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2 0x80} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2 0x84} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2 0x80} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2 0x81} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2 0x80} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
@@ -458,6 +470,15 @@ Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2 0x0} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2 0x1} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2 0x1} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2 0x80} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2 0x84} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2 0x80} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2 0x81} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2 0x80} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.s b/gas/testsuite/gas/i386/x86-64-pseudos.s
index 08fac8381c6..a4582cfa6f1 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
@@ -360,6 +360,18 @@ _start:
 	{rex} movaps (%r8),%xmm2
 	{rex} phaddw (%rcx),%mm0
 	{rex} phaddw (%r8),%mm0
+	{rex2} mov %al,%ah
+	{rex2} shl %cl, %eax
+	{rex2} cmp %cl, %dl
+	{rex2} mov $1, %bl
+	{rex2} movl %eax,%ebx
+	{rex2} movl %eax,%r14d
+	{rex2} movl %eax,(%r8)
+	{rex2} movaps %xmm7,%xmm2
+	{rex2} movaps %xmm7,%xmm12
+	{rex2} movaps (%rcx),%xmm2
+	{rex2} movaps (%r8),%xmm2
+	{rex2 0x80} pmullw %mm0,%mm6
 
 	movb (%rbp),%al
 	{disp8} movb (%rbp),%al
@@ -422,6 +434,15 @@ _start:
 	{rex} movaps xmm2,XMMWORD PTR [r8]
 	{rex} phaddw mm0,QWORD PTR [rcx]
 	{rex} phaddw mm0,QWORD PTR [r8]
+	{rex2} mov ah,al
+	{rex2} mov ebx,eax
+	{rex2} mov r14d,eax
+	{rex2} mov DWORD PTR [r8],eax
+	{rex2} movaps xmm2,xmm7
+	{rex2} movaps xmm12,xmm7
+	{rex2} movaps xmm2,XMMWORD PTR [rcx]
+	{rex2} movaps xmm2,XMMWORD PTR [r8]
+	{rex2} pmullw mm6,mm0
 
 	mov al, BYTE PTR [rbp]
 	{disp8} mov al, BYTE PTR [rbp]
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index a7f5547017f..2be0df0e981 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -363,6 +363,8 @@ run_dump_test "x86-64-avx512f-rcigrne-intel"
 run_dump_test "x86-64-avx512f-rcigrne"
 run_dump_test "x86-64-avx512f-rcigru-intel"
 run_dump_test "x86-64-avx512f-rcigru"
+run_list_test "x86-64-apx-egpr-inval"
+run_dump_test "x86-64-apx-rex2"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/include/opcode/i386.h b/include/opcode/i386.h
index dec7652c1cc..a6af3d54da0 100644
--- a/include/opcode/i386.h
+++ b/include/opcode/i386.h
@@ -112,6 +112,8 @@
 /* x86-64 extension prefix.  */
 #define REX_OPCODE	0x40
 
+#define REX2_OPCODE	0xd5
+
 /* Non-zero if OPCODE is the rex prefix.  */
 #define REX_PREFIX_P(opcode) (((opcode) & 0xf0) == REX_OPCODE)
 
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index e78a2a9350e..a9fd17621da 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -144,6 +144,12 @@ struct instr_info
   /* Bits of REX we've already used.  */
   uint8_t rex_used;
 
+  /* Record M0 R4 X4 B4 bits for rex2.  */
+  unsigned char rex2;
+  /* Bits of REX2 we've already used.  */
+  unsigned char rex2_used;
+  unsigned char rex2_payload;
+
   bool need_modrm;
   unsigned char need_vex;
   bool has_sib;
@@ -169,6 +175,7 @@ struct instr_info
   signed char last_data_prefix;
   signed char last_addr_prefix;
   signed char last_rex_prefix;
+  signed char last_rex2_prefix;
   signed char last_seg_prefix;
   signed char fwait_prefix;
   /* The active segment register prefix.  */
@@ -265,8 +272,13 @@ struct dis_private {
   {							\
     if (value)						\
       {							\
-	if ((ins->rex & value))				\
+	if (ins->rex & value)				\
 	  ins->rex_used |= (value) | REX_OPCODE;	\
+	if (ins->rex2 & value)				\
+	  {						\
+	    ins->rex2_used |= value;			\
+	    ins->rex_used |= REX_OPCODE;		\
+	  }						\
       }							\
     else						\
       ins->rex_used |= REX_OPCODE;			\
@@ -276,6 +288,9 @@ struct dis_private {
 #define EVEX_b_used 1
 #define EVEX_len_used 2
 
+/* The M0 bit in the REX2 prefix selects map0 or map1.  */
+#define REX2_M 0x8
+
 /* Flags stored in PREFIXES.  */
 #define PREFIX_REPZ 1
 #define PREFIX_REPNZ 2
@@ -289,6 +304,7 @@ struct dis_private {
 #define PREFIX_DATA 0x200
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
+#define PREFIX_REX2 0x1000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -370,6 +386,7 @@ fetch_error (const instr_info *ins)
 #define PREFIX_IGNORED_DATA	(PREFIX_DATA << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_ADDR	(PREFIX_ADDR << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_LOCK	(PREFIX_LOCK << PREFIX_IGNORED_SHIFT)
+#define PREFIX_REX2_ILLEGAL	(PREFIX_REX2 << PREFIX_IGNORED_SHIFT)
 
 /* Opcode prefixes.  */
 #define PREFIX_OPCODE		(PREFIX_REPZ \
@@ -1888,23 +1905,23 @@ static const struct dis386 dis386[] = {
   { "outs{b|}",		{ indirDXr, Xb }, 0 },
   { X86_64_TABLE (X86_64_6F) },
   /* 70 */
-  { "joH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnoH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jneH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "joH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnoH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jneH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 78 */
-  { "jsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jlH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jleH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "jsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jlH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jleH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 80 */
   { REG_TABLE (REG_80) },
   { REG_TABLE (REG_81) },
@@ -1942,23 +1959,23 @@ static const struct dis386 dis386[] = {
   { "sahf",		{ XX }, 0 },
   { "lahf",		{ XX }, 0 },
   /* a0 */
-  { "mov%LB",		{ AL, Ob }, 0 },
-  { "mov%LS",		{ eAX, Ov }, 0 },
-  { "mov%LB",		{ Ob, AL }, 0 },
-  { "mov%LS",		{ Ov, eAX }, 0 },
-  { "movs{b|}",		{ Ybr, Xb }, 0 },
-  { "movs{R|}",		{ Yvr, Xv }, 0 },
-  { "cmps{b|}",		{ Xb, Yb }, 0 },
-  { "cmps{R|}",		{ Xv, Yv }, 0 },
+  { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
+  { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
+  { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "movs{R|}",		{ Yvr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "cmps{b|}",		{ Xb, Yb }, PREFIX_REX2_ILLEGAL },
+  { "cmps{R|}",		{ Xv, Yv }, PREFIX_REX2_ILLEGAL },
   /* a8 */
-  { "testB",		{ AL, Ib }, 0 },
-  { "testS",		{ eAX, Iv }, 0 },
-  { "stosB",		{ Ybr, AL }, 0 },
-  { "stosS",		{ Yvr, eAX }, 0 },
-  { "lodsB",		{ ALr, Xb }, 0 },
-  { "lodsS",		{ eAXr, Xv }, 0 },
-  { "scasB",		{ AL, Yb }, 0 },
-  { "scasS",		{ eAX, Yv }, 0 },
+  { "testB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
+  { "testS",		{ eAX, Iv }, PREFIX_REX2_ILLEGAL },
+  { "stosB",		{ Ybr, AL }, PREFIX_REX2_ILLEGAL },
+  { "stosS",		{ Yvr, eAX }, PREFIX_REX2_ILLEGAL },
+  { "lodsB",		{ ALr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "lodsS",		{ eAXr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "scasB",		{ AL, Yb }, PREFIX_REX2_ILLEGAL },
+  { "scasS",		{ eAX, Yv }, PREFIX_REX2_ILLEGAL },
   /* b0 */
   { "movB",		{ RMAL, Ib }, 0 },
   { "movB",		{ RMCL, Ib }, 0 },
@@ -2014,23 +2031,23 @@ static const struct dis386 dis386[] = {
   { FLOAT },
   { FLOAT },
   /* e0 */
-  { "loopneFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "loopeFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "loopFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "jEcxzH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "inB",		{ AL, Ib }, 0 },
-  { "inG",		{ zAX, Ib }, 0 },
-  { "outB",		{ Ib, AL }, 0 },
-  { "outG",		{ Ib, zAX }, 0 },
+  { "loopneFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "loopeFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "loopFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "jEcxzH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "inB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
+  { "inG",		{ zAX, Ib }, PREFIX_REX2_ILLEGAL },
+  { "outB",		{ Ib, AL }, PREFIX_REX2_ILLEGAL },
+  { "outG",		{ Ib, zAX }, PREFIX_REX2_ILLEGAL },
   /* e8 */
   { X86_64_TABLE (X86_64_E8) },
   { X86_64_TABLE (X86_64_E9) },
   { X86_64_TABLE (X86_64_EA) },
-  { "jmp",		{ Jb, BND }, 0 },
-  { "inB",		{ AL, indirDX }, 0 },
-  { "inG",		{ zAX, indirDX }, 0 },
-  { "outB",		{ indirDX, AL }, 0 },
-  { "outG",		{ indirDX, zAX }, 0 },
+  { "jmp",		{ Jb, BND }, PREFIX_REX2_ILLEGAL },
+  { "inB",		{ AL, indirDX }, PREFIX_REX2_ILLEGAL },
+  { "inG",		{ zAX, indirDX }, PREFIX_REX2_ILLEGAL },
+  { "outB",		{ indirDX, AL }, PREFIX_REX2_ILLEGAL },
+  { "outG",		{ indirDX, zAX }, PREFIX_REX2_ILLEGAL },
   /* f0 */
   { Bad_Opcode },	/* lock prefix */
   { "int1",		{ XX }, 0 },
@@ -2107,12 +2124,12 @@ static const struct dis386 dis386_twobyte[] = {
   { PREFIX_TABLE (PREFIX_0F2E) },
   { PREFIX_TABLE (PREFIX_0F2F) },
   /* 30 */
-  { "wrmsr",		{ XX }, 0 },
-  { "rdtsc",		{ XX }, 0 },
-  { "rdmsr",		{ XX }, 0 },
-  { "rdpmc",		{ XX }, 0 },
-  { "sysenter",		{ SEP }, 0 },
-  { "sysexit%LQ",	{ SEP }, 0 },
+  { "wrmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdtsc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdpmc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "sysenter",		{ SEP }, PREFIX_REX2_ILLEGAL },
+  { "sysexit%LQ",	{ SEP }, PREFIX_REX2_ILLEGAL },
   { Bad_Opcode },
   { "getsec",		{ XX }, 0 },
   /* 38 */
@@ -2197,23 +2214,23 @@ static const struct dis386 dis386_twobyte[] = {
   { PREFIX_TABLE (PREFIX_0F7E) },
   { PREFIX_TABLE (PREFIX_0F7F) },
   /* 80 */
-  { "joH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnoH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jbH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jaeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jneH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jbeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jaH",		{ Jv, BND, cond_jump_flag }, 0 },
+  { "joH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnoH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jneH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 88 */
-  { "jsH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnsH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jpH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnpH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jlH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jgeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jleH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jgH",		{ Jv, BND, cond_jump_flag }, 0 },
+  { "jsH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnsH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jpH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnpH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jlH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jleH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 90 */
   { "seto",		{ Eb }, 0 },
   { "setno",		{ Eb }, 0 },
@@ -2406,22 +2423,30 @@ static const char intel_index16[][6] = {
 
 static const char att_names64[][8] = {
   "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
-  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15"
+  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
+  "%r16", "%r17", "%r18", "%r19", "%r20", "%r21", "%r22", "%r23",
+  "%r24", "%r25", "%r26", "%r27", "%r28", "%r29", "%r30", "%r31",
 };
 static const char att_names32[][8] = {
   "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi",
-  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d"
+  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d",
+  "%r16d", "%r17d", "%r18d", "%r19d", "%r20d", "%r21d", "%r22d", "%r23d",
+  "%r24d", "%r25d", "%r26d", "%r27d", "%r28d", "%r29d", "%r30d", "%r31d",
 };
 static const char att_names16[][8] = {
   "%ax", "%cx", "%dx", "%bx", "%sp", "%bp", "%si", "%di",
-  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w"
+  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w",
+  "%r16w", "%r17w", "%r18w", "%r19w", "%r20w", "%r21w", "%r22w", "%r23w",
+  "%r24w", "%r25w", "%r26w", "%r27w", "%r28w", "%r29w", "%r30w", "%r31w",
 };
 static const char att_names8[][8] = {
   "%al", "%cl", "%dl", "%bl", "%ah", "%ch", "%dh", "%bh",
 };
 static const char att_names8rex[][8] = {
   "%al", "%cl", "%dl", "%bl", "%spl", "%bpl", "%sil", "%dil",
-  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b"
+  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b",
+  "%r16b", "%r17b", "%r18b", "%r19b", "%r20b", "%r21b", "%r22b", "%r23b",
+  "%r24b", "%r25b", "%r26b", "%r27b", "%r28b", "%r29b", "%r30b", "%r31b",
 };
 static const char att_names_seg[][4] = {
   "%es", "%cs", "%ss", "%ds", "%fs", "%gs", "%?", "%?",
@@ -2810,9 +2835,9 @@ static const struct dis386 reg_table[][8] = {
     { Bad_Opcode },
     { "cmpxchg8b", { { CMPXCHG8B_Fixup, q_mode } }, 0 },
     { Bad_Opcode },
-    { "xrstors", { FXSAVE }, 0 },
-    { "xsavec", { FXSAVE }, 0 },
-    { "xsaves", { FXSAVE }, 0 },
+    { "xrstors", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsavec", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsaves", { FXSAVE }, PREFIX_REX2_ILLEGAL },
     { MOD_TABLE (MOD_0FC7_REG_6) },
     { MOD_TABLE (MOD_0FC7_REG_7) },
   },
@@ -3384,7 +3409,7 @@ static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_4_MOD_0 */
   {
-    { "xsave",	{ FXSAVE }, 0 },
+    { "xsave",	{ FXSAVE }, PREFIX_REX2_ILLEGAL },
     { "ptwrite{%LQ|}", { Edq }, 0 },
   },
 
@@ -3402,7 +3427,7 @@ static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_6_MOD_0 */
   {
-    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE },
+    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { "clrssbsy",	{ Mq }, PREFIX_OPCODE },
     { "clwb",	{ Mb }, PREFIX_OPCODE },
   },
@@ -4196,19 +4221,19 @@ static const struct dis386 x86_64_table[][2] = {
 
   /* X86_64_E8 */
   {
-    { "callP",		{ Jv, BND }, 0 },
-    { "call@",		{ Jv, BND }, 0 }
+    { "callP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
+    { "call@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
   },
 
   /* X86_64_E9 */
   {
-    { "jmpP",		{ Jv, BND }, 0 },
-    { "jmp@",		{ Jv, BND }, 0 }
+    { "jmpP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
+    { "jmp@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
   },
 
   /* X86_64_EA */
   {
-    { "{l|}jmp{P|}", { Ap }, 0 },
+    { "{l|}jmp{P|}", { Ap }, PREFIX_REX2_ILLEGAL },
   },
 
   /* X86_64_0F00_REG_6 */
@@ -8184,7 +8209,7 @@ static const struct dis386 mod_table[][2] = {
   },
   {
     /* MOD_0FAE_REG_5 */
-    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE },
+    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { PREFIX_TABLE (PREFIX_0FAE_REG_5_MOD_3) },
   },
   {
@@ -8387,6 +8412,24 @@ ckprefix (instr_info *ins)
 	    return ckp_okay;
 	  ins->last_rex_prefix = i;
 	  break;
+	/* REX2 must be the last prefix. */
+	case REX2_OPCODE:
+	  if (ins->address_mode == mode_64bit)
+	    {
+	      if (ins->last_rex_prefix >= 0)
+		return ckp_bogus;
+
+	      ins->codep++;
+	      if (!fetch_code (ins->info, ins->codep + 1))
+		return ckp_fetch_error;
+	      ins->rex2_payload = *ins->codep;
+	      ins->rex2 = ins->rex2_payload >> 4;
+	      ins->rex = (ins->rex2_payload & 0xf) | REX_OPCODE;
+	      ins->codep++;
+	      ins->last_rex2_prefix = i;
+	      ins->all_prefixes[i] = REX2_OPCODE;
+	    }
+	  return ckp_okay;
 	case 0xf3:
 	  ins->prefixes |= PREFIX_REPZ;
 	  ins->last_repz_prefix = i;
@@ -8554,6 +8597,8 @@ prefix_name (enum address_mode mode, uint8_t pref, int sizeflag)
       return "bnd";
     case NOTRACK_PREFIX:
       return "notrack";
+    case REX2_OPCODE:
+      return "rex2";
     default:
       return NULL;
     }
@@ -9202,6 +9247,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
     .last_data_prefix = -1,
     .last_addr_prefix = -1,
     .last_rex_prefix = -1,
+    .last_rex2_prefix = -1,
     .last_seg_prefix = -1,
     .fwait_prefix = -1,
   };
@@ -9367,13 +9413,18 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
-  if (*ins.codep == 0x0f)
+  /* REX2.M in rex2 prefix represents map0 or map1.  */
+  if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))
     {
       unsigned char threebyte;
 
-      ins.codep++;
-      if (!fetch_code (info, ins.codep + 1))
-	goto fetch_error_out;
+      if (!ins.rex2)
+	{
+	  ins.codep++;
+	  if (!fetch_code (info, ins.codep + 1))
+	    goto fetch_error_out;
+	}
+
       threebyte = *ins.codep;
       dp = &dis386_twobyte[threebyte];
       ins.need_modrm = twobyte_has_modrm[threebyte];
@@ -9529,7 +9580,15 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
-  switch (dp->prefix_requirement)
+  if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
+      && ins.last_rex2_prefix >= 0)
+    {
+      i386_dis_printf (info, dis_style_text, "(bad)");
+      ret = ins.end_codep - priv.the_buffer;
+      goto out;
+    }
+
+  switch (dp->prefix_requirement & ~PREFIX_REX2_ILLEGAL)
     {
     case PREFIX_DATA:
       /* If only the data prefix is marked as mandatory, its absence renders
@@ -9588,6 +9647,13 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       && !ins.need_vex && ins.last_rex_prefix >= 0)
     ins.all_prefixes[ins.last_rex_prefix] = 0;
 
+  /* Check if the REX2 prefix is used.  */
+  if (ins.last_rex2_prefix >= 0
+      && ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
+      && (ins.rex ^ ins.rex_used) == 0
+      && (ins.rex2 & 0x7))
+    ins.all_prefixes[ins.last_rex2_prefix] = 0;
+
   /* Check if the SEG prefix is used.  */
   if ((ins.prefixes & (PREFIX_CS | PREFIX_SS | PREFIX_DS | PREFIX_ES
 		       | PREFIX_FS | PREFIX_GS)) != 0
@@ -9616,7 +9682,11 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
+	if (ins.all_prefixes[i] == REX2_OPCODE)
+	  i386_dis_printf (info, dis_style_mnemonic, "{%s 0x%x} ", name,
+			   (unsigned int) ins.rex2_payload);
+	else
+	  i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
@@ -11163,6 +11233,8 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
   USED_REX (rexmask);
   if (ins->rex & rexmask)
     reg += 8;
+  if (ins->rex2 & rexmask)
+    reg += 16;
 
   switch (bytemode)
     {
@@ -11170,7 +11242,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     case b_swap_mode:
       if (reg & 4)
 	USED_REX (0);
-      if (ins->rex)
+      if (ins->rex || ins->rex2)
 	names = att_names8rex;
       else
 	names = att_names8;
@@ -11386,6 +11458,8 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
   int riprel = 0;
   int shift;
 
+  add += (ins->rex2 & REX_B) ? 16 : 0;
+
   if (ins->vex.evex)
     {
 
@@ -11559,6 +11633,9 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		}
 	      break;
 	    default:
+	      if (ins->rex2 & REX_X)
+		vindex += 16;
+
 	      if (vindex != 4)
 		indexes = ins->address_mode == mode_64bit && !addr32flag
 			  ? att_names64 : att_names32;
@@ -11946,7 +12023,7 @@ static bool
 OP_REG (instr_info *ins, int code, int sizeflag)
 {
   const char *s;
-  int add;
+  int add = 0;
 
   switch (code)
     {
@@ -11959,8 +12036,8 @@ OP_REG (instr_info *ins, int code, int sizeflag)
   USED_REX (REX_B);
   if (ins->rex & REX_B)
     add = 8;
-  else
-    add = 0;
+  if (ins->rex2 & REX_B)
+    add += 16;
 
   switch (code)
     {
@@ -12674,6 +12751,8 @@ OP_EX (instr_info *ins, int bytemode, int sizeflag)
   USED_REX (REX_B);
   if (ins->rex & REX_B)
     reg += 8;
+  if (ins->rex2 & REX_B)
+    reg += 16;
   if (ins->vex.evex)
     {
       USED_REX (REX_X);
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index c0223d16964..fb5a4f81288 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -275,6 +275,8 @@ static const dependency isa_dependencies[] =
     "64" },
   { "USER_MSR",
     "64" },
+  { "APX_F",
+    "XSAVE|64" },
 };
 
 /* This array is populated as process_i386_initializers() walks cpu_flags[].  */
@@ -397,6 +399,7 @@ static bitfield cpu_flags[] =
   BITFIELD (FRED),
   BITFIELD (LKGS),
   BITFIELD (USER_MSR),
+  BITFIELD (APX_F),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
@@ -484,6 +487,7 @@ static bitfield opcode_modifiers[] =
   BITFIELD (Optimize),
   BITFIELD (Dialect),
   BITFIELD (ISA64),
+  BITFIELD (NoEgpr),
 };
 
 #define CLASS(n) #n, n
@@ -1070,10 +1074,44 @@ get_element_size (char **opnd, int lineno)
   return elem_size;
 }
 
+static bool
+rex2_disallowed (const unsigned long long opcode, unsigned int length,
+		 unsigned int space, const char *cpu_flags)
+{
+  /* Some opcodes encode a ModR/M-like byte directly in the opcode.  */
+  unsigned int base_opcode = opcode >> (8 * length - 8);
+
+  /* All opcodes in map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
+     are reserved under REX2 and trigger #UD when prefixed with REX2.  */
+  if (space == 0)
+    switch (base_opcode >> 4)
+      {
+      case 0x4:
+      case 0x7:
+      case 0xA:
+      case 0xE:
+	return true;
+      default:
+	return false;
+      }
+
+  if (space == SPACE_0F)
+    switch (base_opcode >> 4)
+      {
+      case 0x3:
+      case 0x8:
+	return true;
+      default:
+	return false;
+      }
+
+  return false;
+}
+
 static void
 process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 			      unsigned int prefix, const char *extension_opcode,
-			      char **opnd, int lineno)
+			      char **opnd, int lineno, bool rex2_disallowed)
 {
   char *str, *next, *last;
   bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
@@ -1200,6 +1238,12 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 	  || modifiers[SAE].value))
     modifiers[EVex].value = EVEXDYN;
 
+  /* Vex, legacy map2 and map3, and rex2_disallowed templates do not support
+     EGPR.  Templates supporting both Vex and EVex do allow EGPR.  */
+  if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
+      && !modifiers[EVex].value)
+    modifiers[NoEgpr].value = 1;
+
   output_opcode_modifier (table, modifiers, ARRAY_SIZE (modifiers));
 }
 
@@ -1424,7 +1468,9 @@ output_i386_opcode (FILE *table, const char *name, char *str,
   free (ident);
 
   process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
-				extension_opcode, operand_types, lineno);
+				extension_opcode, operand_types, lineno,
+				rex2_disallowed (opcode, length, space,
+						 cpu_flags));
 
   process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
 
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index b76f9ecc2a6..8b4a48c8c85 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -319,6 +319,8 @@ enum i386_cpu
   CpuAVX512F,
   /* Intel AVX-512 VL Instructions support required.  */
   CpuAVX512VL,
+  /* Intel APX_F Instructions support required.  */
+  CpuAPX_F,
   /* Not supported in the 64bit mode  */
   CpuNo64,
 
@@ -354,6 +356,7 @@ enum i386_cpu
 		   cpuhle:1, \
 		   cpuavx512f:1, \
 		   cpuavx512vl:1, \
+		   cpuapx_f:1, \
       /* NOTE: This field needs to remain last. */ \
 		   cpuno64:1
 
@@ -745,6 +748,11 @@ enum
 #define INTEL64		2
 #define INTEL64ONLY	3
   ISA64,
+
+  /* EGPRs (r16-r31) are illegal on this instruction.  We also use it to
+     judge whether the instruction supports pseudo-prefix {rex2}.  */
+  NoEgpr,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -790,6 +798,7 @@ typedef struct i386_opcode_modifier
   unsigned int optimize:1;
   unsigned int dialect:2;
   unsigned int isa64:2;
+  unsigned int noegpr:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
@@ -1004,7 +1013,8 @@ typedef struct insn_template
 #define Prefix_VEX3		6	/* {vex3} */
 #define Prefix_EVEX		7	/* {evex} */
 #define Prefix_REX		8	/* {rex} */
-#define Prefix_NoOptimize	9	/* {nooptimize} */
+#define Prefix_REX2		9	/* {rex2} */
+#define Prefix_NoOptimize	10	/* {nooptimize} */
 
   /* the bits in opcode_modifier are used to generate the final opcode from
      the base_opcode.  These bits also are used to detect alternate forms of
@@ -1031,6 +1041,7 @@ typedef struct
 #define RegRex	    0x1  /* Extended register.  */
 #define RegRex64    0x2  /* Extended 8 bit register.  */
 #define RegVRex	    0x4  /* Extended vector register.  */
+#define RegRex2	    0x8  /* Extended GPR (R16–R31).  */
   unsigned char reg_num;
 #define RegIP	((unsigned char ) ~0)
 /* EIZ and RIZ are fake index registers.  */
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index fffe05d4399..8c8f6695d22 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -895,7 +895,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
 <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:No64, disp32:Disp32:i386, +
                       load:Load:0, store:Store:0, +
                       vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
-                      rex:REX:x64, nooptimize:NoOptimize:0>
+                      rex:REX:x64, rex2:REX2:APX_F, nooptimize:NoOptimize:0>
 
 {<pseudopfx>}, PSEUDO_PREFIX/Prefix_<pseudopfx:ident>, <pseudopfx:cpu>, NoSuf|IsPrefix, {}
 
@@ -1428,16 +1428,17 @@ crc32, 0xf20f38f0, SSE4_2&x64, W|Modrm|No_wSuf|No_lSuf|No_sSuf, { Reg8|Reg64|Uns
 
 // xsave/xrstor New Instructions.
 
-xsave, 0xfae/4, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xsave64, 0xfae/4, Xsave&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
-xrstor, 0xfae/5, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xrstor64, 0xfae/5, Xsave&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xsave, 0xfae/4, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xsave64, 0xfae/4, Xsave&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
+xrstor, 0xfae/5, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xrstor64, 0xfae/5, Xsave&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 xgetbv, 0xf01d0, Xsave, NoSuf, {}
 xsetbv, 0xf01d1, Xsave, NoSuf, {}
 
 // xsaveopt
-xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+
+xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // AES instructions.
 
@@ -2480,17 +2481,17 @@ clflushopt, 0x660fae/7, ClflushOpt, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex
 
 // XSAVES/XRSTORS instructions.
 
-xrstors, 0xfc7/3, XSAVES, Modrm|NoSuf, { Unspecified|BaseIndex }
-xrstors64, 0xfc7/3, XSAVES&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
-xsaves, 0xfc7/5, XSAVES, Modrm|NoSuf, { Unspecified|BaseIndex }
-xsaves64, 0xfc7/5, XSAVES&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xrstors, 0xfc7/3, XSAVES, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xrstors64, 0xfc7/3, XSAVES&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
+xsaves, 0xfc7/5, XSAVES, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xsaves64, 0xfc7/5, XSAVES&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // XSAVES instructions end.
 
 // XSAVEC instructions.
 
-xsavec, 0xfc7/4, XSAVEC, Modrm|NoSuf, { Unspecified|BaseIndex }
-xsavec64, 0xfc7/4, XSAVEC&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xsavec, 0xfc7/4, XSAVEC, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xsavec64, 0xfc7/4, XSAVEC&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // XSAVEC instructions end.
 
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index 2ac56e3fd0b..8fead35e320 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -43,6 +43,22 @@ r12b, Class=Reg|Byte, RegRex|RegRex64, 4, Dw2Inval, Dw2Inval
 r13b, Class=Reg|Byte, RegRex|RegRex64, 5, Dw2Inval, Dw2Inval
 r14b, Class=Reg|Byte, RegRex|RegRex64, 6, Dw2Inval, Dw2Inval
 r15b, Class=Reg|Byte, RegRex|RegRex64, 7, Dw2Inval, Dw2Inval
+r16b, Class=Reg|Byte, RegRex2|RegRex64, 0, Dw2Inval, Dw2Inval
+r17b, Class=Reg|Byte, RegRex2|RegRex64, 1, Dw2Inval, Dw2Inval
+r18b, Class=Reg|Byte, RegRex2|RegRex64, 2, Dw2Inval, Dw2Inval
+r19b, Class=Reg|Byte, RegRex2|RegRex64, 3, Dw2Inval, Dw2Inval
+r20b, Class=Reg|Byte, RegRex2|RegRex64, 4, Dw2Inval, Dw2Inval
+r21b, Class=Reg|Byte, RegRex2|RegRex64, 5, Dw2Inval, Dw2Inval
+r22b, Class=Reg|Byte, RegRex2|RegRex64, 6, Dw2Inval, Dw2Inval
+r23b, Class=Reg|Byte, RegRex2|RegRex64, 7, Dw2Inval, Dw2Inval
+r24b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 0, Dw2Inval, Dw2Inval
+r25b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 1, Dw2Inval, Dw2Inval
+r26b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 2, Dw2Inval, Dw2Inval
+r27b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 3, Dw2Inval, Dw2Inval
+r28b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 4, Dw2Inval, Dw2Inval
+r29b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 5, Dw2Inval, Dw2Inval
+r30b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 6, Dw2Inval, Dw2Inval
+r31b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 7, Dw2Inval, Dw2Inval
 // 16 bit regs
 ax, Class=Reg|Instance=Accum|Word, 0, 0, Dw2Inval, Dw2Inval
 cx, Class=Reg|Word, 0, 1, Dw2Inval, Dw2Inval
@@ -60,6 +76,22 @@ r12w, Class=Reg|Word, RegRex, 4, Dw2Inval, Dw2Inval
 r13w, Class=Reg|Word, RegRex, 5, Dw2Inval, Dw2Inval
 r14w, Class=Reg|Word, RegRex, 6, Dw2Inval, Dw2Inval
 r15w, Class=Reg|Word, RegRex, 7, Dw2Inval, Dw2Inval
+r16w, Class=Reg|Word, RegRex2, 0, Dw2Inval, Dw2Inval
+r17w, Class=Reg|Word, RegRex2, 1, Dw2Inval, Dw2Inval
+r18w, Class=Reg|Word, RegRex2, 2, Dw2Inval, Dw2Inval
+r19w, Class=Reg|Word, RegRex2, 3, Dw2Inval, Dw2Inval
+r20w, Class=Reg|Word, RegRex2, 4, Dw2Inval, Dw2Inval
+r21w, Class=Reg|Word, RegRex2, 5, Dw2Inval, Dw2Inval
+r22w, Class=Reg|Word, RegRex2, 6, Dw2Inval, Dw2Inval
+r23w, Class=Reg|Word, RegRex2, 7, Dw2Inval, Dw2Inval
+r24w, Class=Reg|Word, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25w, Class=Reg|Word, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26w, Class=Reg|Word, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27w, Class=Reg|Word, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28w, Class=Reg|Word, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29w, Class=Reg|Word, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30w, Class=Reg|Word, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31w, Class=Reg|Word, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 // 32 bit regs
 eax, Class=Reg|Instance=Accum|Dword|BaseIndex, 0, 0, 0, Dw2Inval
 ecx, Class=Reg|Instance=RegC|Dword|BaseIndex, 0, 1, 1, Dw2Inval
@@ -77,6 +109,22 @@ r12d, Class=Reg|Dword|BaseIndex, RegRex, 4, Dw2Inval, Dw2Inval
 r13d, Class=Reg|Dword|BaseIndex, RegRex, 5, Dw2Inval, Dw2Inval
 r14d, Class=Reg|Dword|BaseIndex, RegRex, 6, Dw2Inval, Dw2Inval
 r15d, Class=Reg|Dword|BaseIndex, RegRex, 7, Dw2Inval, Dw2Inval
+r16d, Class=Reg|Dword|BaseIndex, RegRex2, 0, Dw2Inval, Dw2Inval
+r17d, Class=Reg|Dword|BaseIndex, RegRex2, 1, Dw2Inval, Dw2Inval
+r18d, Class=Reg|Dword|BaseIndex, RegRex2, 2, Dw2Inval, Dw2Inval
+r19d, Class=Reg|Dword|BaseIndex, RegRex2, 3, Dw2Inval, Dw2Inval
+r20d, Class=Reg|Dword|BaseIndex, RegRex2, 4, Dw2Inval, Dw2Inval
+r21d, Class=Reg|Dword|BaseIndex, RegRex2, 5, Dw2Inval, Dw2Inval
+r22d, Class=Reg|Dword|BaseIndex, RegRex2, 6, Dw2Inval, Dw2Inval
+r23d, Class=Reg|Dword|BaseIndex, RegRex2, 7, Dw2Inval, Dw2Inval
+r24d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 rax, Class=Reg|Instance=Accum|Qword|BaseIndex, 0, 0, Dw2Inval, 0
 rcx, Class=Reg|Instance=RegC|Qword|BaseIndex, 0, 1, Dw2Inval, 2
 rdx, Class=Reg|Instance=RegD|Qword|BaseIndex, 0, 2, Dw2Inval, 1
@@ -93,6 +141,22 @@ r12, Class=Reg|Qword|BaseIndex, RegRex, 4, Dw2Inval, 12
 r13, Class=Reg|Qword|BaseIndex, RegRex, 5, Dw2Inval, 13
 r14, Class=Reg|Qword|BaseIndex, RegRex, 6, Dw2Inval, 14
 r15, Class=Reg|Qword|BaseIndex, RegRex, 7, Dw2Inval, 15
+r16, Class=Reg|Qword|BaseIndex, RegRex2, 0, Dw2Inval, 130
+r17, Class=Reg|Qword|BaseIndex, RegRex2, 1, Dw2Inval, 131
+r18, Class=Reg|Qword|BaseIndex, RegRex2, 2, Dw2Inval, 132
+r19, Class=Reg|Qword|BaseIndex, RegRex2, 3, Dw2Inval, 133
+r20, Class=Reg|Qword|BaseIndex, RegRex2, 4, Dw2Inval, 134
+r21, Class=Reg|Qword|BaseIndex, RegRex2, 5, Dw2Inval, 135
+r22, Class=Reg|Qword|BaseIndex, RegRex2, 6, Dw2Inval, 136
+r23, Class=Reg|Qword|BaseIndex, RegRex2, 7, Dw2Inval, 137
+r24, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, 138
+r25, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, 139
+r26, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, 140
+r27, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, 141
+r28, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, 142
+r29, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, 143
+r30, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, 144
+r31, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, 145
 // Vector mask registers.
 k0, Class=RegMask, 0, 0, 93, 118
 k1, Class=RegMask, 0, 1, 94, 119
-- 
2.25.1
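
For readers following the REX2 handling in the patch above, here is a minimal sketch of how the payload byte is split and how a full register number is formed from it. It mirrors the arithmetic in ckprefix, print_register and OP_REG, but the names split_rex2_payload, full_reg_num and struct rex2_bits are hypothetical and do not exist in binutils; the REX_* values are assumed to match the usual binutils definitions.

```c
#include <assert.h>
#include <stdint.h>

#define REX_OPCODE 0x40  /* base value of a legacy REX prefix (assumed) */
#define REX_B 1
#define REX_X 2
#define REX_R 4

/* Hypothetical holder for the two nibbles of a REX2 payload byte.  */
struct rex2_bits
{
  uint8_t rex;   /* low nibble, presented as a legacy REX value */
  uint8_t rex2;  /* high nibble: M0, R4, X4, B4 */
};

/* Split the payload the same way the new REX2_OPCODE case does.  */
static struct rex2_bits
split_rex2_payload (uint8_t payload)
{
  struct rex2_bits b;

  b.rex2 = payload >> 4;
  b.rex = (payload & 0xf) | REX_OPCODE;
  return b;
}

/* Combine a 3-bit ModR/M-style field with the matching REX and REX2
   bits: +8 for the legacy extension bit, +16 for the APX one.  */
static unsigned int
full_reg_num (struct rex2_bits b, unsigned int field, unsigned int mask)
{
  unsigned int reg = field;

  assert (field < 8);
  if (b.rex & mask)
    reg += 8;
  if (b.rex2 & mask)
    reg += 16;
  return reg;
}
```

With a payload of 0x11 both B3 and B4 are set, so a field value of 0 maps to register 24. This agrees with the r24 rows in i386-reg.tbl, which carry RegRex2|RegRex and register number 0.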



* [PATCH v4 2/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
  2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
  2023-12-19 12:12 ` [PATCH v4 1/9] Support APX GPR32 with rex2 prefix Cui, Lili
@ 2023-12-19 12:12 ` Cui, Lili
  2023-12-19 12:12 ` [PATCH v4 3/9] Support APX GPR32 with extend evex prefix Cui, Lili
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Cui, Lili @ 2023-12-19 12:12 UTC (permalink / raw)
  To: binutils; +Cc: hongjiu.lu, jbeulich

opcodes/ChangeLog:

	* i386-dis-evex.h: Add an empty EVEX_MAP4_ sub-table for
	legacy insns promoted to EVEX.
	* i386-dis.c: Add EVEX_MAP4.
---
 opcodes/i386-dis-evex.h | 291 ++++++++++++++++++++++++++++++++++++++++
 opcodes/i386-dis.c      |   1 +
 2 files changed, 292 insertions(+)

diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index e6295119d2b..7ad1edbe72d 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -872,6 +872,297 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
   },
+  /* EVEX_MAP4_ */
+  {
+    /* 00 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 08 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 10 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 18 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 20 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 28 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 30 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 38 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 40 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 48 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 50 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 58 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 60 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 68 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 70 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 78 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 80 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 88 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 90 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 98 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* A0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* A8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* B0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* B8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* C0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* C8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* D0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* D8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* E0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* E8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* F0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* F8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+  },
   /* EVEX_MAP5_ */
   {
     /* 00 */
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index a9fd17621da..20fe7070b35 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -1298,6 +1298,7 @@ enum
   EVEX_0F = 0,
   EVEX_0F38,
   EVEX_0F3A,
+  EVEX_MAP4,
   EVEX_MAP5,
   EVEX_MAP6,
 };
-- 
2.25.1



* [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
  2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
  2023-12-19 12:12 ` [PATCH v4 1/9] Support APX GPR32 with rex2 prefix Cui, Lili
  2023-12-19 12:12 ` [PATCH v4 2/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
@ 2023-12-19 12:12 ` Cui, Lili
  2023-12-22 13:49   ` Jan Beulich
  2023-12-22 14:19   ` Jan Beulich
  2023-12-19 12:12 ` [PATCH v4 4/9] Add tests for " Cui, Lili
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 34+ messages in thread
From: Cui, Lili @ 2023-12-19 12:12 UTC (permalink / raw)
  To: binutils; +Cc: hongjiu.lu, jbeulich

This patch adds the non-ND, non-NF forms of the EVEX-promoted insns.

EVEX extension of legacy instructions:
  All promoted legacy instructions are placed in EVEX map 4, which is
  currently reserved.
EVEX extension of EVEX instructions:
  All existing EVEX instructions are extended by APX using the extended
  EVEX prefix, so that they can access all 32 GPRs.
EVEX extension of VEX instructions:
  Promoting a VEX instruction into the EVEX space does not change the map
  id, the opcode, or the operand encoding of the VEX instruction.

Note: The promoted versions of MOVBE are extended to include the “MOVBE
  reg1, reg2” form.
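
As an illustration of the extended EVEX prefix, the following sketch shows the three GPR extension bits that build_apx_evex_prefix in this patch folds into an already built EVEX prefix. It is a simplified sketch only: apply_egpr_bits is a hypothetical name, the byte layout (bytes[0] holding 0x62) is assumed from the patch, and the handling of the vvvv register specifier is left out.

```c
#include <assert.h>
#include <stdint.h>

#define REX_B 1
#define REX_X 2
#define REX_R 4

/* bytes[0] is the 0x62 escape; bytes[1..3] are the EVEX payload bytes.
   rex2 carries the R4/X4/B4 requests using the REX_R/REX_X/REX_B masks.  */
static void
apply_egpr_bits (uint8_t bytes[4], unsigned int rex2)
{
  assert (bytes[0] == 0x62);

  /* R4 is stored inverted, so requesting it clears the bit.  */
  if (rex2 & REX_R)
    bytes[1] &= (uint8_t) ~0x10;
  /* B4 is stored non-inverted, so requesting it sets the bit.  */
  if (rex2 & REX_B)
    bytes[1] |= 0x08;
  /* X4 is stored inverted, like R4.  */
  if (rex2 & REX_X)
    bytes[2] &= (uint8_t) ~0x04;
}
```

When none of the three bits is requested the prefix is left untouched, so insns that use only r0-r15 keep the encoding produced by build_evex_prefix.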

  gas/ChangeLog:

  2023-12-19  Lingling Kong <lingling.kong@intel.com>
	      H.J. Lu  <hongjiu.lu@intel.com>
	      Lili Cui <lili.cui@intel.com>
	      Lin Hu   <lin1.hu@intel.com>

	* config/tc-i386.c (APX_F): New define.
	(install_template): Handle APX combinations.
	(is_apx_evex_encoding): Test APX EVEX encoding.
	(build_apx_evex_prefix): Enable APX EVEX prefix.
	(md_assemble): Handle apx with evex encoding.
	(process_suffix): Handle apx map4 prefix.
	(check_register): Assign i.vec_encoding for APX evex instructions.
	* testsuite/gas/i386/x86-64-evex.d: Adjust test cases.
	* testsuite/gas/i386/x86-64.exp: Adjust x86-64-inval-movbe.

opcodes/ChangeLog:

	* i386-dis-evex-len.h: Handle EVEX_LEN_0F38F2, EVEX_LEN_0F38F3.
	* i386-dis-evex-prefix.h: Handle PREFIX_EVEX_0F38F2_L_0,
	PREFIX_EVEX_0F38F3_L_0, PREFIX_EVEX_MAP4_D8,
	PREFIX_EVEX_MAP4_DA, PREFIX_EVEX_MAP4_DB,
	PREFIX_EVEX_MAP4_DC, PREFIX_EVEX_MAP4_DD,
	PREFIX_EVEX_MAP4_DE, PREFIX_EVEX_MAP4_DF,
	PREFIX_EVEX_MAP4_F0, PREFIX_EVEX_MAP4_F1,
	PREFIX_EVEX_MAP4_F2, PREFIX_EVEX_MAP4_F8.
	* i386-dis-evex-reg.h: Handle REG_EVEX_0F38F3_L_0_P_0.
	* i386-dis-evex.h: Add EVEX_MAP4_ for legacy insns
	promoted to APX to use gpr32.
	* i386-dis-evex-x86-64.h: Add X86_64_EVEX_0F90,
	X86_64_EVEX_0F92, X86_64_EVEX_0F93, X86_64_EVEX_0F38F2,
	X86_64_EVEX_0F38F3, X86_64_EVEX_0F38F5, X86_64_EVEX_0F38F6,
	X86_64_EVEX_0F38F7, X86_64_EVEX_0F3AF0, X86_64_EVEX_0F91.
	* i386-dis.c
	(struct instr_info): Delete bool r.
	(PREFIX_NP_OR_DATA): New.
	(NO_PREFIX): New.
	(putop): Ditto.
	(X86_64_EVEX_FROM_VEX_TABLE): Ditto.
	(get_valid_dis386): Decode insn erex in extended EVEX prefix.
	Handle EVEX_MAP4.
	(print_insn): Handle PREFIX_DATA_AND_NP_ONLY.
	(print_register): Handle APX instruction decoding.
	(OP_E_memory): Ditto.
	(OP_G): Ditto.
	(OP_XMM): Ditto.
	(DistinctDest_Fixup): Ditto.
	* i386-gen.c (process_i386_opcode_modifier): Add EVEXMAP4.
	* i386-opc.h (SPACE_EVEXMAP4): Add for legacy insns
	promoted to EVEX.
	* i386-opc.tbl: Handle some legacy and VEX insns that don't
	support gpr32.  Add some legacy insns (map2 / 3) promoted
	to EVEX.
---
 gas/config/tc-i386.c                 |  65 ++++++++++-
 gas/testsuite/gas/i386/x86-64-evex.d |   2 +-
 gas/testsuite/gas/i386/x86-64.exp    |   2 +-
 opcodes/i386-dis-evex-len.h          |  10 ++
 opcodes/i386-dis-evex-prefix.h       |  66 +++++++++++
 opcodes/i386-dis-evex-reg.h          |   7 ++
 opcodes/i386-dis-evex-x86-64.h       |  50 ++++++++
 opcodes/i386-dis-evex.h              |  94 +++++++--------
 opcodes/i386-dis.c                   | 165 +++++++++++++++++++++++----
 opcodes/i386-gen.c                   |   2 +
 opcodes/i386-opc.h                   |   6 +
 opcodes/i386-opc.tbl                 | 109 ++++++++++++------
 12 files changed, 466 insertions(+), 112 deletions(-)
 create mode 100644 opcodes/i386-dis-evex-x86-64.h

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 051cdef2a3a..25cfacce138 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -89,6 +89,7 @@
 /* This matches the C -> StaticRounding alias in the opcode table.  */
 #define commutative staticrounding
 
+#define APX_F(cpuid) (maybe_cpu (t, CpuAPX_F) && maybe_cpu (t, cpuid))
 /*
   'templates' is for grouping together 'template' structures for opcodes
   of the same name.  This is only used for storing the insns in the grand
@@ -3673,7 +3674,7 @@ install_template (const insn_template *t)
 
   /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
-  {
+    {
       if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
 	   || maybe_cpu (t, CpuFMA))
 	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
@@ -3695,7 +3696,15 @@ install_template (const insn_template *t)
 		gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
 	    }
 	}
-  }
+
+      if (APX_F (CpuCMPCCXADD) || APX_F (CpuAMX_TILE) || APX_F (CpuAVX512F)
+	  || APX_F (CpuAVX512DQ) || APX_F (CpuAVX512BW) || APX_F (CpuBMI)
+	  || APX_F (CpuBMI2))
+	if (need_evex_encoding ())
+	  i.tm.opcode_modifier.vex = 0;
+	else
+	  i.tm.opcode_modifier.evex = 0;
+    }
 
   /* Note that for pseudo prefixes this produces a length of 1. But for them
      the length isn't interesting at all.  */
@@ -3876,6 +3885,15 @@ is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || t->opcode_modifier.evex;
 }
 
+/* We can use this function only when the current encoding is evex.  */
+static INLINE bool
+is_apx_evex_encoding (void)
+{
+  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
+    || (i.vex.register_specifier
+	&& i.vex.register_specifier->reg_flags & RegRex2);
+}
+
 static INLINE bool
 is_apx_rex2_encoding (void)
 {
@@ -4153,6 +4171,27 @@ build_rex2_prefix (void)
 		    | (i.rex2 << 4) | i.rex);
 }
 
+/* Build the EVEX prefix (4-byte) for evex insn
+   | 62h |
+   | `R`X`B`R' | B'mmm |
+   | W | v`v`v`v | `x' | pp |
+   | z| L'L | b | `v | aaa |
+*/
+static void
+build_apx_evex_prefix (void)
+{
+  build_evex_prefix ();
+  if (i.rex2 & REX_R)
+    i.vex.bytes[1] &= ~0x10;
+  if (i.rex2 & REX_B)
+    i.vex.bytes[1] |= 0x08;
+  if (i.rex2 & REX_X)
+    i.vex.bytes[2] &= ~0x04;
+  if (i.vex.register_specifier
+      && i.vex.register_specifier->reg_flags & RegRex2)
+    i.vex.bytes[3] &= ~0x08;
+}
+
 static void
 process_immext (void)
 {
@@ -5642,13 +5681,18 @@ md_assemble (char *line)
 	  return;
 	}
 
-      if (i.tm.opcode_modifier.vex)
+      if (is_apx_evex_encoding ())
+	build_apx_evex_prefix ();
+      else if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
 	build_evex_prefix ();
 
       /* The individual REX.RXBW bits got consumed.  */
       i.rex &= REX_OPCODE;
+
+      /* The rex2 bits got consumed.  */
+      i.rex2 = 0;
     }
 
   /* Handle conversion of 'int $3' --> special int3 insn.  */
@@ -8087,7 +8131,8 @@ process_suffix (void)
       if (i.suffix != QWORD_MNEM_SUFFIX
 	  && i.tm.opcode_modifier.mnemonicsize != IGNORESIZE
 	  && !i.tm.opcode_modifier.floatmf
-	  && !is_any_vex_encoding (&i.tm)
+	  && (!is_any_vex_encoding (&i.tm)
+	      || i.tm.opcode_space == SPACE_EVEXMAP4)
 	  && ((i.suffix == LONG_MNEM_SUFFIX) == (flag_code == CODE_16BIT)
 	      || (flag_code == CODE_64BIT
 		  && i.tm.opcode_modifier.jump == JUMP_BYTE)))
@@ -8097,7 +8142,11 @@ process_suffix (void)
 	  if (i.tm.opcode_modifier.jump == JUMP_BYTE) /* jcxz, loop */
 	    prefix = ADDR_PREFIX_OPCODE;
 
-	  if (!add_prefix (prefix))
+	  /* The DATA PREFIX of EVEX promoted from legacy APX instructions
+	     needs to be adjusted.  */
+	  if (i.tm.opcode_space == SPACE_EVEXMAP4)
+	    i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
+	  else if (!add_prefix (prefix))
 	    return 0;
 	}
 
@@ -14293,6 +14342,12 @@ static bool check_register (const reg_entry *r)
       if (!cpu_arch_flags.bitfield.cpuapx_f
 	  || flag_code != CODE_64BIT)
 	return false;
+
+      /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX
+	 for the later install_template function.  */
+      if (current_templates.start->opcode_modifier.vex
+	  && current_templates.start->opcode_modifier.evex)
+	i.vec_encoding = vex_encoding_evex;
     }
 
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
diff --git a/gas/testsuite/gas/i386/x86-64-evex.d b/gas/testsuite/gas/i386/x86-64-evex.d
index 041747db892..5d974c312da 100644
--- a/gas/testsuite/gas/i386/x86-64-evex.d
+++ b/gas/testsuite/gas/i386/x86-64-evex.d
@@ -17,6 +17,6 @@ Disassembly of section .text:
  +[a-f0-9]+:	62 f1 d6 38 7b f0    	vcvtusi2ss %rax,\{rd-sae\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 57 38 7b f0    	vcvtusi2sd %eax,\{rd-bad\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 d7 38 7b f0    	vcvtusi2sd %rax,\{rd-sae\},%xmm5,%xmm6
- +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,\(bad\)
+ +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,%r16d
  +[a-f0-9]+:	62 e1 7c 08 c2 c0 00 	vcmpeqps %xmm0,%xmm0,\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 2be0df0e981..7a1fef58735 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -250,7 +250,7 @@ run_dump_test "x86-64-sse-noavx"
 run_dump_test "x86-64-movbe"
 run_dump_test "x86-64-movbe-intel"
 run_dump_test "x86-64-movbe-suffix"
-run_list_test "x86-64-inval-movbe" "-al"
+run_list_test "x86-64-inval-movbe" "-march=+noapx_f -al"
 run_dump_test "x86-64-ept"
 run_dump_test "x86-64-ept-intel"
 run_list_test "x86-64-inval-ept" "-al"
diff --git a/opcodes/i386-dis-evex-len.h b/opcodes/i386-dis-evex-len.h
index a02609c50f2..ad59a559e0d 100644
--- a/opcodes/i386-dis-evex-len.h
+++ b/opcodes/i386-dis-evex-len.h
@@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
     { REG_TABLE (REG_EVEX_0F38C7_L_2) },
   },
 
+  /* EVEX_LEN_0F38F2 */
+  {
+    { PREFIX_TABLE (PREFIX_EVEX_0F38F2_L_0) },
+  },
+
+  /* EVEX_LEN_0F38F3 */
+  {
+    { PREFIX_TABLE (PREFIX_EVEX_0F38F3_L_0) },
+  },
+
   /* EVEX_LEN_0F3A00 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index 28da54922c7..b11b7adb443 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -285,6 +285,14 @@
     { "%XEvfmsub213s%XW",	{ XMScalar, VexScalar, EXdq, EXxEVexR }, 0 },
     { "v4fnmadds%XS",	{ XMScalar, VexScalar, Mxmm }, 0 },
   },
+  /* PREFIX_EVEX_0F38F2_L_0 */
+  {
+    { "andnS",	{ Gdq, VexGdq, Edq }, 0 },
+  },
+  /* PREFIX_EVEX_0F38F3_L_0 */
+  {
+    { REG_TABLE (REG_EVEX_0F38F3_L_0_P_0) },
+  },
   /* PREFIX_EVEX_0F3A08 */
   {
     { "vrndscalep%XH",  { XM, EXxh, EXxEVexS, Ib }, 0 },
@@ -338,6 +346,64 @@
     { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
     { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
   },
+  /* PREFIX_EVEX_MAP4_D8 */
+  {
+    { "sha1nexte", { XM, EXxmm }, 0 },
+    { REG_TABLE (REG_0F38D8_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DA */
+  {
+    { "sha1msg2", { XM, EXxmm }, 0 },
+    { "encodekey128", { Gd, Rd }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DB */
+  {
+    { "sha256rnds2", { XM, EXxmm, XMM0 }, 0 },
+    { "encodekey256", { Gd, Rd }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DC */
+  {
+    { "sha256msg1", { XM, EXxmm }, 0 },
+    { "aesenc128kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DD */
+  {
+    { "sha256msg2", { XM, EXxmm }, 0 },
+    { "aesdec128kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DE */
+  {
+    { Bad_Opcode },
+    { "aesenc256kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DF */
+  {
+    { Bad_Opcode },
+    { "aesdec256kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F0 */
+  {
+    { "crc32A", { Gdq, Eb }, 0 },
+    { "invept", { Gm, Mo }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F1 */
+  {
+    { "crc32Q", { Gdq, Ev }, 0 },
+    { "invvpid", { Gm, Mo }, 0 },
+    { "crc32Q", { Gdq, Ev }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F2 */
+  {
+    { Bad_Opcode },
+    { "invpcid", { Gm, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F8 */
+  {
+    { Bad_Opcode },
+    { "enqcmds", { Gva, M }, 0 },
+    { "movdir64b", { Gva, M }, 0 },
+    { "enqcmd", { Gva, M }, 0 },
+  },
   /* PREFIX_EVEX_MAP5_10 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index 2885063628b..8374f0ea93a 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -49,3 +49,10 @@
     { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
     { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
   },
+  /* REG_EVEX_0F38F3_L_0_P_0 */
+  {
+    { Bad_Opcode },
+    { "blsrS",	{ VexGdq, Edq }, 0 },
+    { "blsmskS",	{ VexGdq, Edq }, 0 },
+    { "blsiS",	{ VexGdq, Edq }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex-x86-64.h b/opcodes/i386-dis-evex-x86-64.h
new file mode 100644
index 00000000000..0d9d98a7691
--- /dev/null
+++ b/opcodes/i386-dis-evex-x86-64.h
@@ -0,0 +1,50 @@
+  /* X86_64_EVEX_0F90 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F90_L_0) },
+  },
+  /* X86_64_EVEX_0F91 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F91_L_0) },
+  },
+  /* X86_64_EVEX_0F92 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F92_L_0) },
+  },
+  /* X86_64_EVEX_0F93 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F93_L_0) },
+  },
+  /* X86_64_EVEX_0F38F2 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },
+  },
+  /* X86_64_EVEX_0F38F3 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },
+  },
+  /* X86_64_EVEX_0F38F5 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F5_L_0) },
+  },
+  /* X86_64_EVEX_0F38F6 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F6_L_0) },
+  },
+  /* X86_64_EVEX_0F38F7 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F7_L_0) },
+  },
+  /* X86_64_EVEX_0F3AF0 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 7ad1edbe72d..90c063b2188 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -164,10 +164,10 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 90 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F90) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F91) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F92) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F93) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -375,9 +375,9 @@ static const struct dis386 evex_table[][256] = {
     { "vpsllv%DQ",	{ XM, Vex, EXx }, PREFIX_DATA },
     /* 48 */
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F384B) },
     { "vrcp14p%XW",	{ XM, EXx }, PREFIX_DATA },
     { "vrcp14s%XW",	{ XMScalar, VexScalar, EXdq }, PREFIX_DATA },
     { "vrsqrt14p%XW",	{ XM, EXx }, 0 },
@@ -545,32 +545,32 @@ static const struct dis386 evex_table[][256] = {
     { "%XEvaesdecY",	{ XM, Vex, EXx }, PREFIX_DATA },
     { "%XEvaesdeclastY", { XM, Vex, EXx }, PREFIX_DATA },
     /* E0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E0) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E1) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E3) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E4) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E7) },
     /* E8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E8) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E9) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EA) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EB) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EC) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38ED) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EE) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EF) },
     /* F0 */
     { Bad_Opcode },
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F3) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F7) },
     /* F8 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -854,7 +854,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F3AF0) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -983,13 +983,13 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 60 */
+    { "movbeS",	{ Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "movbeS",	{ Ev, Gv }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "wrussK",	{ M, Gdq }, PREFIX_DATA },
+    { PREFIX_TABLE (PREFIX_0F38F6) },
     { Bad_Opcode },
     /* 68 */
     { Bad_Opcode },
@@ -1113,19 +1113,19 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { "sha1rnds4",	{ XM, EXxmm, Ib }, NO_PREFIX },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* D8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_D8) },
+    { "sha1msg1",	{ XM, EXxmm }, NO_PREFIX },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DA) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DB) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DC) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DD) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DE) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DF) },
     /* E0 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1145,20 +1145,20 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F0) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F1) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F2) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* F8 */
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F8) },
+    { "movdiri",	{ Mdq, Gdq }, NO_PREFIX },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_0F38FC) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 20fe7070b35..5b6f063d016 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -132,6 +132,13 @@ enum x86_64_isa
   intel64
 };
 
+enum evex_type
+{
+  evex_default = 0,
+  evex_from_legacy,
+  evex_from_vex,
+};
+
 struct instr_info
 {
   enum address_mode address_mode;
@@ -212,7 +219,6 @@ struct instr_info
     int ll;
     bool w;
     bool evex;
-    bool r;
     bool v;
     bool zeroing;
     bool b;
@@ -220,6 +226,8 @@ struct instr_info
   }
   vex;
 
+  enum evex_type evex_type;
+
   /* Remember if the current op is a jump instruction.  */
   bool op_is_jump;
 
@@ -305,6 +313,8 @@ struct dis_private {
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
 #define PREFIX_REX2 0x1000
+#define PREFIX_NP_OR_DATA 0x2000
+#define NO_PREFIX   0x4000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -802,6 +812,7 @@ enum
   USE_RM_TABLE,
   USE_PREFIX_TABLE,
   USE_X86_64_TABLE,
+  USE_X86_64_EVEX_FROM_VEX_TABLE,
   USE_3BYTE_TABLE,
   USE_XOP_8F_TABLE,
   USE_VEX_C4_TABLE,
@@ -820,6 +831,8 @@ enum
 #define RM_TABLE(I)		DIS386 (USE_RM_TABLE, (I))
 #define PREFIX_TABLE(I)		DIS386 (USE_PREFIX_TABLE, (I))
 #define X86_64_TABLE(I)		DIS386 (USE_X86_64_TABLE, (I))
+#define X86_64_EVEX_FROM_VEX_TABLE(I) \
+  DIS386 (USE_X86_64_EVEX_FROM_VEX_TABLE, (I))
 #define THREE_BYTE_TABLE(I)	DIS386 (USE_3BYTE_TABLE, (I))
 #define XOP_8F_TABLE()		DIS386 (USE_XOP_8F_TABLE, 0)
 #define VEX_C4_TABLE()		DIS386 (USE_VEX_C4_TABLE, 0)
@@ -868,7 +881,7 @@ enum
   REG_VEX_0F73,
   REG_VEX_0FAE,
   REG_VEX_0F3849_X86_64_L_0_W_0_M_1_P_0,
-  REG_VEX_0F38F3_L_0,
+  REG_VEX_0F38F3_L_0_P_0,
   REG_VEX_MAP7_F8_L_0_W_0,
 
   REG_XOP_09_01_L_0,
@@ -880,7 +893,8 @@ enum
   REG_EVEX_0F72,
   REG_EVEX_0F73,
   REG_EVEX_0F38C6_L_2,
-  REG_EVEX_0F38C7_L_2
+  REG_EVEX_0F38C7_L_2,
+  REG_EVEX_0F38F3_L_0_P_0,
 };
 
 enum
@@ -1096,6 +1110,8 @@ enum
   PREFIX_VEX_0F38CC,
   PREFIX_VEX_0F38CD,
   PREFIX_VEX_0F38DA_W_0,
+  PREFIX_VEX_0F38F2_L_0,
+  PREFIX_VEX_0F38F3_L_0,
   PREFIX_VEX_0F38F5_L_0,
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
@@ -1147,6 +1163,8 @@ enum
   PREFIX_EVEX_0F389B,
   PREFIX_EVEX_0F38AA,
   PREFIX_EVEX_0F38AB,
+  PREFIX_EVEX_0F38F2_L_0,
+  PREFIX_EVEX_0F38F3_L_0,
 
   PREFIX_EVEX_0F3A08,
   PREFIX_EVEX_0F3A0A,
@@ -1158,6 +1176,18 @@ enum
   PREFIX_EVEX_0F3A67,
   PREFIX_EVEX_0F3AC2,
 
+  PREFIX_EVEX_MAP4_D8,
+  PREFIX_EVEX_MAP4_DA,
+  PREFIX_EVEX_MAP4_DB,
+  PREFIX_EVEX_MAP4_DC,
+  PREFIX_EVEX_MAP4_DD,
+  PREFIX_EVEX_MAP4_DE,
+  PREFIX_EVEX_MAP4_DF,
+  PREFIX_EVEX_MAP4_F0,
+  PREFIX_EVEX_MAP4_F1,
+  PREFIX_EVEX_MAP4_F2,
+  PREFIX_EVEX_MAP4_F8,
+
   PREFIX_EVEX_MAP5_10,
   PREFIX_EVEX_MAP5_11,
   PREFIX_EVEX_MAP5_1D,
@@ -1269,7 +1299,19 @@ enum
   X86_64_VEX_0F38ED,
   X86_64_VEX_0F38EE,
   X86_64_VEX_0F38EF,
+
   X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
+
+  X86_64_EVEX_0F90,
+  X86_64_EVEX_0F91,
+  X86_64_EVEX_0F92,
+  X86_64_EVEX_0F93,
+  X86_64_EVEX_0F38F2,
+  X86_64_EVEX_0F38F3,
+  X86_64_EVEX_0F38F5,
+  X86_64_EVEX_0F38F6,
+  X86_64_EVEX_0F38F7,
+  X86_64_EVEX_0F3AF0,
 };
 
 enum
@@ -1454,6 +1496,8 @@ enum
   EVEX_LEN_0F385B,
   EVEX_LEN_0F38C6,
   EVEX_LEN_0F38C7,
+  EVEX_LEN_0F38F2,
+  EVEX_LEN_0F38F3,
   EVEX_LEN_0F3A00,
   EVEX_LEN_0F3A01,
   EVEX_LEN_0F3A18,
@@ -2884,12 +2928,12 @@ static const struct dis386 reg_table[][8] = {
   {
     { RM_TABLE (RM_VEX_0F3849_X86_64_L_0_W_0_M_1_P_0_R_0) },
   },
-  /* REG_VEX_0F38F3_L_0 */
+  /* REG_VEX_0F38F3_L_0_P_0 */
   {
     { Bad_Opcode },
-    { "blsrS",		{ VexGdq, Edq }, PREFIX_OPCODE },
-    { "blsmskS",	{ VexGdq, Edq }, PREFIX_OPCODE },
-    { "blsiS",		{ VexGdq, Edq }, PREFIX_OPCODE },
+    { "blsrS",		{ VexGdq, Edq }, 0 },
+    { "blsmskS",	{ VexGdq, Edq }, 0 },
+    { "blsiS",		{ VexGdq, Edq }, 0 },
   },
   /* REG_VEX_MAP7_F8_L_0_W_0 */
   {
@@ -4037,6 +4081,16 @@ static const struct dis386 prefix_table[][4] = {
     { "vsm4rnds4", { XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F38F2_L_0 */
+  {
+    { "andnS",          { Gdq, VexGdq, Edq }, 0 },
+  },
+
+  /* PREFIX_VEX_0F38F3_L_0 */
+  {
+    { REG_TABLE (REG_VEX_0F38F3_L_0_P_0) },
+  },
+
   /* PREFIX_VEX_0F38F5_L_0 */
   {
     { "bzhiS",		{ Gdq, Edq, VexGdq }, 0 },
@@ -4529,6 +4583,7 @@ static const struct dis386 x86_64_table[][2] = {
     { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
   },
 
+#include "i386-dis-evex-x86-64.h"
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -7115,12 +7170,12 @@ static const struct dis386 vex_len_table[][2] = {
 
   /* VEX_LEN_0F38F2 */
   {
-    { "andnS",		{ Gdq, VexGdq, Edq }, PREFIX_OPCODE },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },
   },
 
   /* VEX_LEN_0F38F3 */
   {
-    { REG_TABLE(REG_VEX_0F38F3_L_0) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },
   },
 
   /* VEX_LEN_0F38F5 */
@@ -8734,6 +8789,17 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       dp = &prefix_table[dp->op[1].bytemode][vindex];
       break;
 
+    case USE_X86_64_EVEX_FROM_VEX_TABLE:
+      ins->evex_type = evex_from_vex;
+      /* EVEX from VEX instructions require that EVEX.z, EVEX.L'L, EVEX.b and
+	 the lower 2 bits of EVEX.aaa be 0.  */
+      if ((ins->vex.mask_register_specifier & 0x3) != 0
+	  || ins->vex.ll != 0
+	  || ins->vex.zeroing != 0
+	  || ins->vex.b)
+	return &bad_opcode;
+
+      /* Fall through.  */
     case USE_X86_64_TABLE:
       vindex = ins->address_mode == mode_64bit ? 1 : 0;
       dp = &x86_64_table[dp->op[1].bytemode][vindex];
@@ -8979,9 +9045,13 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       if (!fetch_code (ins->info, ins->codep + 4))
 	return &err_opcode;
       /* The first byte after 0x62.  */
+      if (*ins->codep & 0x8)
+	ins->rex2 |= REX_B;
+      if (!(*ins->codep & 0x10))
+	ins->rex2 |= REX_R;
+
       ins->rex = ~(*ins->codep >> 5) & 0x7;
-      ins->vex.r = *ins->codep & 0x10;
-      switch ((*ins->codep & 0xf))
+      switch (*ins->codep & 0x7)
 	{
 	default:
 	  return &bad_opcode;
@@ -8994,6 +9064,12 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	case 0x3:
 	  vex_table_index = EVEX_0F3A;
 	  break;
+	case 0x4:
+	  vex_table_index = EVEX_MAP4;
+	  ins->evex_type = evex_from_legacy;
+	  if (ins->address_mode != mode_64bit)
+	    return &bad_opcode;
+	  break;
 	case 0x5:
 	  vex_table_index = EVEX_MAP5;
 	  break;
@@ -9010,9 +9086,8 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       ins->vex.register_specifier = (~(*ins->codep >> 3)) & 0xf;
 
-      /* The U bit.  */
       if (!(*ins->codep & 0x4))
-	return &bad_opcode;
+	ins->rex2 |= REX_X;
 
       switch ((*ins->codep & 0x3))
 	{
@@ -9042,12 +9117,26 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       if (ins->address_mode != mode_64bit)
 	{
+	  /* Report bad for !evex_default and when either of the two fixed
+	     EVEX bits (B4 and X4) is changed.  */
+	  if (ins->evex_type != evex_default
+	      || (ins->rex2 & (REX_B | REX_X)))
+	    return &bad_opcode;
 	  /* In 16/32-bit mode silently ignore following bits.  */
 	  ins->rex &= ~REX_B;
-	  ins->vex.r = true;
+	  ins->rex2 &= ~REX_R;
 	}
 
       ins->need_vex = 4;
+
+      /* EVEX from legacy instructions require that EVEX.z, EVEX.L'L and the
+	 lower 2 bits of EVEX.aaa be 0.  */
+      if (ins->evex_type == evex_from_legacy
+	  && ((ins->vex.mask_register_specifier & 0x3) != 0
+	      || ins->vex.ll != 0
+	      || ins->vex.zeroing != 0))
+	return &bad_opcode;
+
       ins->codep++;
       vindex = *ins->codep++;
       dp = &evex_table[vex_table_index][vindex];
@@ -9462,6 +9551,13 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       dp = get_valid_dis386 (dp, &ins);
       if (dp == &err_opcode)
 	goto fetch_error_out;
+
+      /* For APX instructions promoted from legacy maps 0/1, embedded prefix
+	 is interpreted as the operand size override.  */
+      if (ins.evex_type == evex_from_legacy
+	  && ins.vex.prefix == DATA_PREFIX_OPCODE)
+	sizeflag ^= DFLAG;
+
       if (dp != NULL && putop (&ins, dp->name, sizeflag) == 0)
 	{
 	  if (!get_sib (&ins, sizeflag))
@@ -9641,6 +9737,25 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       if (ins.last_repnz_prefix >= 0)
 	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
       break;
+
+    case PREFIX_NP_OR_DATA:
+      if (ins.vex.prefix == REPE_PREFIX_OPCODE
+	  || ins.vex.prefix == REPNE_PREFIX_OPCODE)
+	{
+	  i386_dis_printf (info, dis_style_text, "(bad)");
+	  ret = ins.end_codep - priv.the_buffer;
+	  goto out;
+	}
+      break;
+
+    case NO_PREFIX:
+      if (ins.vex.prefix)
+	{
+	  i386_dis_printf (info, dis_style_text, "(bad)");
+	  ret = ins.end_codep - priv.the_buffer;
+	  goto out;
+	}
+      break;
     }
 
   /* Check if the REX prefix is used.  */
@@ -10350,7 +10465,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
 		{
 		case 'X':
 		  if (!ins->vex.evex || ins->vex.b || ins->vex.ll >= 2
-		      || !ins->vex.r
+		      || (ins->rex2 & REX_R)
 		      || (ins->modrm.mod == 3 && (ins->rex & REX_X))
 		      || !ins->vex.v || ins->vex.mask_register_specifier)
 		    break;
@@ -11461,7 +11576,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
   add += (ins->rex2 & REX_B) ? 16 : 0;
 
-  if (ins->vex.evex)
+  if (ins->vex.evex && ins->evex_type == evex_default)
     {
 
       /* Zeroing-masking is invalid for memory destinations. Set the flag
@@ -11605,6 +11720,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		abort ();
 	      if (ins->vex.evex)
 		{
+		  /* S/G EVEX insns require EVEX.X4 not to be set.  */
+		  if (ins->rex2 & REX_X)
+		    {
+		      oappend (ins, "(bad)");
+		      return true;
+		    }
+
 		  if (!ins->vex.v)
 		    vindex += 16;
 		  check_gather = ins->obufp == ins->op_out[1];
@@ -11807,7 +11929,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
 	      if (ins->rex & REX_R)
 	        modrm_reg += 8;
-	      if (!ins->vex.r)
+	      if (ins->rex2 & REX_R)
 	        modrm_reg += 16;
 	      if (vindex == modrm_reg)
 		oappend (ins, "/(bad)");
@@ -12013,10 +12135,7 @@ OP_indirE (instr_info *ins, int bytemode, int sizeflag)
 static bool
 OP_G (instr_info *ins, int bytemode, int sizeflag)
 {
-  if (ins->vex.evex && !ins->vex.r && ins->address_mode == mode_64bit)
-    oappend (ins, "(bad)");
-  else
-    print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
+  print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
   return true;
 }
 
@@ -12647,7 +12766,7 @@ OP_XMM (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
     reg += 8;
   if (ins->vex.evex)
     {
-      if (!ins->vex.r)
+      if (ins->rex2 & REX_R)
 	reg += 16;
     }
 
@@ -13654,7 +13773,7 @@ DistinctDest_Fixup (instr_info *ins, int bytemode, int sizeflag)
   /* Calc destination register number.  */
   if (ins->rex & REX_R)
     modrm_reg += 8;
-  if (!ins->vex.r)
+  if (ins->rex2 & REX_R)
     modrm_reg += 16;
 
   /* Calc src1 register number.  */
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index fb5a4f81288..c42b05d6c9d 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -488,6 +488,7 @@ static bitfield opcode_modifiers[] =
   BITFIELD (Dialect),
   BITFIELD (ISA64),
   BITFIELD (NoEgpr),
+  BITFIELD (NF),
 };
 
 #define CLASS(n) #n, n
@@ -1121,6 +1122,7 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
     SPACE(0F),
     SPACE(0F38),
     SPACE(0F3A),
+    SPACE(EVEXMAP4),
     SPACE(EVEXMAP5),
     SPACE(EVEXMAP6),
     SPACE(VEXMAP7),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 8b4a48c8c85..68dbeca343d 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -753,6 +753,9 @@ enum
      whether the instruction supports pseudo-prefix {rex2}.  */
   NoEgpr,
 
+  /* No CSPAZO flags update indication.  */
+  NF,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -799,6 +802,7 @@ typedef struct i386_opcode_modifier
   unsigned int dialect:2;
   unsigned int isa64:2;
   unsigned int noegpr:1;
+  unsigned int nf:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
@@ -974,6 +978,7 @@ typedef struct insn_template
      1: 0F opcode prefix / space.
      2: 0F38 opcode prefix / space.
      3: 0F3A opcode prefix / space.
+     4: EVEXMAP4 opcode prefix / space.
      5: EVEXMAP5 opcode prefix / space.
      6: EVEXMAP6 opcode prefix / space.
      7: VEXMAP7 opcode prefix / space.
@@ -985,6 +990,7 @@ typedef struct insn_template
 #define SPACE_0F	1
 #define SPACE_0F38	2
 #define SPACE_0F3A	3
+#define SPACE_EVEXMAP4	4
 #define SPACE_EVEXMAP5	5
 #define SPACE_EVEXMAP6	6
 #define SPACE_VEXMAP7	7
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 8c8f6695d22..139ec8ce33a 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -106,16 +106,6 @@
 #define HLEPrefixRelease PrefixOk=PrefixHLERelease
 #define NoTrackPrefixOk  PrefixOk=PrefixNoTrack
 
-#define Space0F    OpcodeSpace=SPACE_0F
-#define Space0F38  OpcodeSpace=SPACE_0F38
-#define Space0F3A  OpcodeSpace=SPACE_0F3A
-#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
-#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
-#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
-
-#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
-#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
-
 #define VexMap7 OpcodeSpace=SPACE_VEXMAP7
 
 #define VexW0 VexW=VEXW0
@@ -137,11 +127,25 @@
 #define EVexLIG EVex=EVEXLIG
 #define EVexDYN EVex=EVEXDYN
 
+#define Space0F    OpcodeSpace=SPACE_0F
+#define Space0F38  OpcodeSpace=SPACE_0F38
+#define Space0F3A  OpcodeSpace=SPACE_0F3A
+#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
+#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
+#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
+
+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex128
+#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
+#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
+
 #define Disp8ShiftVL Disp8MemShift=DISP8_SHIFT_VL
 
 #define Vsz256 Vsz=VSZ256
 #define Vsz512 Vsz=VSZ512
 
+// The template supports VEX format for cpuid and EVEX format for cpuid & apx_f.
+#define APX_F(cpuid) cpuid&(cpuid|APX_F)
+
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
@@ -197,6 +201,7 @@ mov, 0xf24, i386&No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Te
 
 // Move after swapping the bytes
 movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movbe, 0x60, Movbe&APX_F, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Move with sign extend.
 movsb, 0xfbe, i386, Modrm|No_bSuf|No_sSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
@@ -1318,13 +1323,16 @@ getsec, 0xf37, SMX, NoSuf, {}
 
 invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // INVPCID instruction
 
 invpcid, 0x660f3882, INVPCID&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invpcid, 0x660f3882, INVPCID&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invpcid, 0xf3f2, INVPCID&APX_F, Modrm|NoSuf|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // SSSE3 instructions.
 
@@ -1425,6 +1433,8 @@ pcmpistri<sse42>, 0x660f3a63, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, Reg
 pcmpistrm<sse42>, 0x660f3a62, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 crc32, 0xf20f38f0, SSE4_2, W|Modrm|No_sSuf|No_qSuf, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
 crc32, 0xf20f38f0, SSE4_2&x64, W|Modrm|No_wSuf|No_lSuf|No_sSuf, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
+crc32, 0xf0, APX_F, W|Modrm|No_sSuf|No_qSuf|EVexMap4, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
+crc32, 0xf0, APX_F, W|Modrm|No_wSuf|No_lSuf|No_sSuf|EVexMap4, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
 
 // xsave/xrstor New Instructions.
 
@@ -1839,14 +1849,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
 
 // BMI2 instructions.
 
-bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+bzhi, 0xf5, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+mulx, 0xf2f6, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pdep, 0xf2f5, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pext, 0xf3f5, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+rorx, 0xf2f0, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+sarx, 0xf3f7, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shlx, 0x66f7, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shrx, 0xf2f7, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // FMA4 instructions
 
@@ -1916,11 +1926,11 @@ lwpins, 0x12/0, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV|Vex, { Imm32|Imm32S, Reg32|U
 
 // BMI instructions
 
-andn, 0xf2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsi, 0xf3/3, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsmsk, 0xf3/2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsr, 0xf3/1, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+andn, 0xf2, APX_F(BMI), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+bextr, 0xf7, APX_F(BMI), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsi, 0xf3/3, APX_F(BMI), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsmsk, 0xf3/2, APX_F(BMI), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsr, 0xf3/1, APX_F(BMI), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 tzcnt, 0xf30fbc, BMI, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // TBM instructions
@@ -2049,13 +2059,20 @@ bndldx, 0x0f1a, MPX, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex, RegBND }
 
 // SHA instructions.
 sha1rnds4, 0xf3acc, SHA, Modrm|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1rnds4, 0xd4, SHA&APX_F, Modrm|NoSuf|EVexMap4, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1nexte, 0xf38c8, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1nexte, 0xd8, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg1, 0xf38c9, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg1, 0xd9, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg2, 0xf38ca, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg2, 0xda, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256rnds2, 0xdb, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg1, 0xf38cc, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg1, 0xdc, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg2, 0xf38cd, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg2, 0xdd, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
 // SHA512 instructions.
 
@@ -2117,9 +2134,9 @@ kor<bw>, 0x<bw:kpfx>45, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { R
 kxnor<bw>, 0x<bw:kpfx>46, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 kxor<bw>, 0x<bw:kpfx>47, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 
-kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
-kmov<bw>, 0x<bw:kpfx>91, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
-kmov<bw>, 0x<bw:kpfx>92, <bw:kcpu>, D|Modrm|Vex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
+kmov<bw>, 0x<bw:kpfx>90, APX_F(<bw:kcpu>), Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
+kmov<bw>, 0x<bw:kpfx>91, APX_F(<bw:kcpu>), Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
+kmov<bw>, 0x<bw:kpfx>92, APX_F(<bw:kcpu>), D|Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
 
 knot<bw>, 0x<bw:kpfx>44, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
 kortest<bw>, 0x<bw:kpfx>98, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
@@ -2594,9 +2611,9 @@ vpmovzxdq, 0x6635, AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexW=1|Disp8MemShift
 kadd<dq>, 0x<dq:kpfx>4a, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kand<dq>, 0x<dq:kpfx>41, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kandn<dq>, 0x<dq:kpfx>42, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf|Optimize, { RegMask, RegMask, RegMask }
-kmov<dq>, 0x<dq:kpfx>90, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
-kmov<dq>, 0x<dq:kpfx>91, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
-kmov<dq>, 0xf292, AVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
+kmov<dq>, 0x<dq:kpfx>90, APX_F(AVX512BW), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
+kmov<dq>, 0x<dq:kpfx>91, APX_F(AVX512BW), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
+kmov<dq>, 0xf292, APX_F(AVX512BW), D|Modrm|Vex128|EVex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
 knot<dq>, 0x<dq:kpfx>44, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
 kor<dq>, 0x<dq:kpfx>45, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kortest<dq>, 0x<dq:kpfx>98, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
@@ -2995,9 +3012,13 @@ rdsspq, 0xf30f1e/1, SHSTK&x64, Modrm|NoSuf, { Reg64 }
 saveprevssp, 0xf30f01ea, SHSTK, NoSuf, {}
 rstorssp, 0xf30f01/5, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 wrssd, 0x0f38f6, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrssd, 0x66, SHSTK&APX_F, Modrm|IgnoreSize|NoSuf|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrssq, 0x0f38f6, SHSTK&x64, Modrm|NoSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrssq, 0x66, SHSTK&APX_F, Modrm|NoSuf|Size64|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 wrussd, 0x660f38f5, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrussd, 0x6665, SHSTK&APX_F, Modrm|IgnoreSize|NoSuf|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrussq, 0x660f38f5, SHSTK&x64, Modrm|NoSuf, { Reg64, Qword|Unspecified|BaseIndex }
+wrussq, 0x6665, SHSTK&APX_F, Modrm|NoSuf|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 setssbsy, 0xf30f01e8, SHSTK, NoSuf, {}
 clrssbsy, 0xf30fae/6, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 endbr64, 0xf30f1efa, IBT, NoSuf, {}
@@ -3045,7 +3066,9 @@ cldemote, 0x0f1c/0, CLDEMOTE, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 // MOVDIR[I,64B] instructions.
 
 movdiri, 0xf38f9, MOVDIRI, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+movdiri, 0xf9, MOVDIRI&APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 movdir64b, 0x660f38f8, MOVDIR64B, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movdir64b, 0x66f8, MOVDIR64B&APX_F, Modrm|AddrPrefixOpReg|NoSuf|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // MOVEDIR instructions end.
 
@@ -3074,7 +3097,9 @@ vcvtneps2bf16<Vxy>, 0xf372, AVX_NE_CONVERT, Modrm|<Vxy:vex>|Space0F38|VexW0|NoSu
 // ENQCMD instructions.
 
 enqcmd, 0xf20f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmd, 0xf2f8, APX_F(ENQCMD), Modrm|AddrPrefixOpReg|NoSuf|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 enqcmds, 0xf30f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmds, 0xf3f8, APX_F(ENQCMD), Modrm|AddrPrefixOpReg|NoSuf|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // ENQCMD instructions end.
 
@@ -3135,8 +3160,8 @@ xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf, {}
 
 // AMX instructions.
 
-ldtilecfg, 0x49/0, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
-sttilecfg, 0x6649/0, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+ldtilecfg, 0x49/0, APX_F(AMX_TILE), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+sttilecfg, 0x6649/0, APX_F(AMX_TILE), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
 
 tcmmimfp16ps, 0x666c, AMX_COMPLEX, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tcmmrlfp16ps, 0x6c, AMX_COMPLEX, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
@@ -3148,9 +3173,9 @@ tdpbuud, 0x5e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf,
 tdpbusd, 0x665e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tdpbsud, 0xf35e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 
-tileloadd, 0xf24b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tileloaddt1, 0x664b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tilestored, 0xf34b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
+tileloadd, 0xf24b, APX_F(AMX_TILE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 0x664b, APX_F(AMX_TILE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 0xf34b, APX_F(AMX_TILE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
 
 tilerelease, 0x49c0, AMX_TILE, Vex128|Space0F38|VexW0|NoSuf, {}
 
@@ -3162,15 +3187,25 @@ tilezero, 0xf249, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { RegTMM }
 
 loadiwkey, 0xf30f38dc, KL, Load|Modrm|NoSuf, { RegXMM, RegXMM }
 encodekey128, 0xf30f38fa, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey128, 0xf3da, KL&APX_F, Modrm|NoSuf|EVexMap4, { Reg32, Reg32 }
 encodekey256, 0xf30f38fb, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey256, 0xf3db, KL&APX_F, Modrm|NoSuf|EVexMap4, { Reg32, Reg32 }
 aesenc128kl, 0xf30f38dc, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc128kl, 0xf3dc, KL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec128kl, 0xf30f38dd, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec128kl, 0xf3dd, KL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesenc256kl, 0xf30f38de, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc256kl, 0xf3de, KL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec256kl, 0xf30f38df, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec256kl, 0xf3df, KL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesencwide128kl, 0xf30f38d8/0, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide128kl, 0xf3d8/0, WideKL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide128kl, 0xf30f38d8/1, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide128kl, 0xf3d8/1, WideKL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex }
 aesencwide256kl, 0xf30f38d8/2, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide256kl, 0xf3d8/2, WideKL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide256kl, 0xf30f38d8/3, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide256kl, 0xf3d8/3, WideKL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex }
 
 // KEYLOCKER instructions end.
 
@@ -3318,7 +3353,7 @@ prefetchit1, 0xf18/6, PREFETCHI, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 
 // CMPCCXADD instructions.
 
-cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD, Modrm|Vex|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+cmp<cc>xadd, 0x66e<cc:opc>, APX_F(CMPCCXADD), Modrm|Vex|EVex128|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // CMPCCXADD instructions end.
 
@@ -3338,9 +3373,13 @@ wrmsrlist, 0xf30f01c6, MSRLIST, NoSuf, {}
 // RAO-INT instructions.
 
 aadd, 0xf38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aadd, 0xfc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aand, 0x660f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aand, 0x66fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aor, 0xf20f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aor, 0xf2fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 axor, 0xf30f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+axor, 0xf3fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // RAO-INT instructions end.
 
-- 
2.25.1



* [PATCH v4 4/9] Add tests for APX GPR32 with extend evex prefix
  2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
                   ` (2 preceding siblings ...)
  2023-12-19 12:12 ` [PATCH v4 3/9] Support APX GPR32 with extend evex prefix Cui, Lili
@ 2023-12-19 12:12 ` Cui, Lili
  2023-12-22 14:41   ` Jan Beulich
  2023-12-19 12:12 ` [PATCH v4 5/9] Support APX NDD Cui, Lili
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Cui, Lili @ 2023-12-19 12:12 UTC (permalink / raw)
  To: binutils; +Cc: hongjiu.lu, jbeulich

gas/ChangeLog:

2023-12-19 Lingling Kong <lingling.kong@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>
	    Lili Cui <lili.cui@intel.com>
	    Lin Hu   <lin1.hu@intel.com>

	* testsuite/gas/i386/x86-64-apx-egpr-inval.l: Add insns that don't
	support GPR32.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Add new test.
	* testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l: New test.
	* testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s: New test.
	* testsuite/gas/i386/x86-64-apx-evex-egpr.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-egpr.s: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted.s: New test.
---
 .../gas/i386/x86-64-apx-egpr-inval.l          | 187 ++++++++++
 .../gas/i386/x86-64-apx-egpr-inval.s          | 191 +++++++++++
 .../gas/i386/x86-64-apx-egpr-promote-inval.l  |  20 ++
 .../gas/i386/x86-64-apx-egpr-promote-inval.s  |  29 ++
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d |  20 ++
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s |  21 ++
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |  33 ++
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |  34 ++
 .../gas/i386/x86-64-apx-evex-promoted-intel.d | 318 ++++++++++++++++++
 .../gas/i386/x86-64-apx-evex-promoted.d       | 318 ++++++++++++++++++
 .../gas/i386/x86-64-apx-evex-promoted.s       | 314 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |   5 +
 12 files changed, 1490 insertions(+)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s

diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
index bb5c602a2e2..0472748978a 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
@@ -12,4 +12,191 @@
 .*:16: Error: extended GPR cannot be used as base/index for `xsaveopt64'
 .*:17: Error: extended GPR cannot be used as base/index for `xsavec'
 .*:18: Error: extended GPR cannot be used as base/index for `xsavec64'
+.*:20: Error: extended GPR cannot be used as base/index for `blendpd'
+.*:21: Error: extended GPR cannot be used as base/index for `blendps'
+.*:22: Error: extended GPR cannot be used as base/index for `blendvpd'
+.*:23: Error: extended GPR cannot be used as base/index for `blendvpd'
+.*:24: Error: extended GPR cannot be used as base/index for `blendvps'
+.*:25: Error: extended GPR cannot be used as base/index for `blendvps'
+.*:26: Error: extended GPR cannot be used as base/index for `dppd'
+.*:27: Error: extended GPR cannot be used as base/index for `dpps'
+.*:28: Error: register type mismatch for `extractps'
+.*:29: Error: extended GPR cannot be used as base/index for `extractps'
+.*:30: Error: extended GPR cannot be used as base/index for `insertps'
+.*:31: Error: extended GPR cannot be used as base/index for `movntdqa'
+.*:32: Error: extended GPR cannot be used as base/index for `mpsadbw'
+.*:33: Error: extended GPR cannot be used as base/index for `pabsb'
+.*:34: Error: extended GPR cannot be used as base/index for `pabsd'
+.*:35: Error: extended GPR cannot be used as base/index for `pabsw'
+.*:36: Error: extended GPR cannot be used as base/index for `packusdw'
+.*:37: Error: extended GPR cannot be used as base/index for `palignr'
+.*:38: Error: extended GPR cannot be used as base/index for `pblendvb'
+.*:39: Error: extended GPR cannot be used as base/index for `pblendvb'
+.*:40: Error: extended GPR cannot be used as base/index for `pblendw'
+.*:41: Error: extended GPR cannot be used as base/index for `pcmpeqq'
+.*:42: Error: extended GPR cannot be used as base/index for `pcmpestri'
+.*:43: Error: extended GPR cannot be used as base/index for `pcmpestrm'
+.*:44: Error: extended GPR cannot be used as base/index for `pcmpgtq'
+.*:45: Error: extended GPR cannot be used as base/index for `pcmpistri'
+.*:46: Error: extended GPR cannot be used as base/index for `pcmpistrm'
+.*:47: Error: register type mismatch for `pextrb'
+.*:48: Error: extended GPR cannot be used as base/index for `pextrb'
+.*:49: Error: extended GPR cannot be used as base/index for `pextrd'
+.*:50: Error: extended GPR cannot be used as base/index for `pextrq'
+.*:51: Error: extended GPR cannot be used as base/index for `pextrw'
+.*:52: Error: extended GPR cannot be used as base/index for `phaddd'
+.*:53: Error: extended GPR cannot be used as base/index for `phaddsw'
+.*:54: Error: extended GPR cannot be used as base/index for `phaddw'
+.*:55: Error: extended GPR cannot be used as base/index for `phminposuw'
+.*:56: Error: extended GPR cannot be used as base/index for `phsubw'
+.*:57: Error: register type mismatch for `pinsrb'
+.*:58: Error: extended GPR cannot be used as base/index for `pinsrb'
+.*:59: Error: register type mismatch for `pinsrd'
+.*:60: Error: extended GPR cannot be used as base/index for `pinsrd'
+.*:61: Error: register type mismatch for `pinsrq'
+.*:62: Error: extended GPR cannot be used as base/index for `pinsrq'
+.*:63: Error: extended GPR cannot be used as base/index for `pmaddubsw'
+.*:64: Error: extended GPR cannot be used as base/index for `pmaxsb'
+.*:65: Error: extended GPR cannot be used as base/index for `pmaxsd'
+.*:66: Error: extended GPR cannot be used as base/index for `pmaxud'
+.*:67: Error: extended GPR cannot be used as base/index for `pmaxuw'
+.*:68: Error: extended GPR cannot be used as base/index for `pminsb'
+.*:69: Error: extended GPR cannot be used as base/index for `pminsd'
+.*:70: Error: extended GPR cannot be used as base/index for `pminud'
+.*:71: Error: extended GPR cannot be used as base/index for `pminuw'
+.*:72: Error: extended GPR cannot be used as base/index for `pmovsxbd'
+.*:73: Error: extended GPR cannot be used as base/index for `pmovsxbq'
+.*:74: Error: extended GPR cannot be used as base/index for `pmovsxbw'
+.*:75: Error: extended GPR cannot be used as base/index for `pmovsxbw'
+.*:76: Error: extended GPR cannot be used as base/index for `pmovsxdq'
+.*:77: Error: extended GPR cannot be used as base/index for `pmovsxwd'
+.*:78: Error: extended GPR cannot be used as base/index for `pmovsxwq'
+.*:79: Error: extended GPR cannot be used as base/index for `pmovzxbd'
+.*:80: Error: extended GPR cannot be used as base/index for `pmovzxbq'
+.*:81: Error: extended GPR cannot be used as base/index for `pmovzxdq'
+.*:82: Error: extended GPR cannot be used as base/index for `pmovzxwd'
+.*:83: Error: extended GPR cannot be used as base/index for `pmovzxwq'
+.*:84: Error: extended GPR cannot be used as base/index for `pmuldq'
+.*:85: Error: extended GPR cannot be used as base/index for `pmulhrsw'
+.*:86: Error: extended GPR cannot be used as base/index for `pmulld'
+.*:87: Error: extended GPR cannot be used as base/index for `pshufb'
+.*:88: Error: extended GPR cannot be used as base/index for `psignb'
+.*:89: Error: extended GPR cannot be used as base/index for `psignd'
+.*:90: Error: extended GPR cannot be used as base/index for `psignw'
+.*:91: Error: extended GPR cannot be used as base/index for `roundpd'
+.*:92: Error: extended GPR cannot be used as base/index for `roundps'
+.*:93: Error: extended GPR cannot be used as base/index for `roundsd'
+.*:94: Error: extended GPR cannot be used as base/index for `roundss'
+.*:96: Error: extended GPR cannot be used as base/index for `aesdec'
+.*:97: Error: extended GPR cannot be used as base/index for `aesdeclast'
+.*:98: Error: extended GPR cannot be used as base/index for `aesenc'
+.*:99: Error: extended GPR cannot be used as base/index for `aesenclast'
+.*:100: Error: extended GPR cannot be used as base/index for `aesimc'
+.*:101: Error: extended GPR cannot be used as base/index for `aeskeygenassist'
+.*:102: Error: extended GPR cannot be used as base/index for `pclmulhqhqdq'
+.*:103: Error: extended GPR cannot be used as base/index for `pclmulhqlqdq'
+.*:104: Error: extended GPR cannot be used as base/index for `pclmullqhqdq'
+.*:105: Error: extended GPR cannot be used as base/index for `pclmullqlqdq'
+.*:106: Error: extended GPR cannot be used as base/index for `pclmulqdq'
+.*:108: Error: extended GPR cannot be used as base/index for `gf2p8affineinvqb'
+.*:109: Error: extended GPR cannot be used as base/index for `gf2p8affineqb'
+.*:110: Error: extended GPR cannot be used as base/index for `gf2p8mulb'
+.*:112: Error: extended GPR cannot be used as base/index for `vaesimc'
+.*:113: Error: extended GPR cannot be used as base/index for `vaeskeygenassist'
+.*:114: Error: extended GPR cannot be used as base/index for `vblendpd'
+.*:115: Error: extended GPR cannot be used as base/index for `vblendpd'
+.*:116: Error: extended GPR cannot be used as base/index for `vblendps'
+.*:117: Error: extended GPR cannot be used as base/index for `vblendps'
+.*:118: Error: extended GPR cannot be used as base/index for `vblendvpd'
+.*:119: Error: extended GPR cannot be used as base/index for `vblendvpd'
+.*:120: Error: extended GPR cannot be used as base/index for `vblendvps'
+.*:121: Error: extended GPR cannot be used as base/index for `vblendvps'
+.*:122: Error: extended GPR cannot be used as base/index for `vdppd'
+.*:123: Error: extended GPR cannot be used as base/index for `vdpps'
+.*:124: Error: extended GPR cannot be used as base/index for `vdpps'
+.*:125: Error: extended GPR cannot be used as base/index for `vhaddpd'
+.*:126: Error: extended GPR cannot be used as base/index for `vhaddpd'
+.*:127: Error: extended GPR cannot be used as base/index for `vhsubps'
+.*:128: Error: extended GPR cannot be used as base/index for `vhsubps'
+.*:129: Error: extended GPR cannot be used as base/index for `vlddqu'
+.*:130: Error: extended GPR cannot be used as base/index for `vlddqu'
+.*:131: Error: extended GPR cannot be used as base/index for `vldmxcsr'
+.*:132: Error: extended GPR cannot be used as base/index for `vmaskmovpd'
+.*:133: Error: extended GPR cannot be used as base/index for `vmaskmovpd'
+.*:134: Error: extended GPR cannot be used as base/index for `vmaskmovpd'
+.*:135: Error: extended GPR cannot be used as base/index for `vmaskmovpd'
+.*:136: Error: extended GPR cannot be used as base/index for `vmaskmovps'
+.*:137: Error: extended GPR cannot be used as base/index for `vmaskmovps'
+.*:138: Error: extended GPR cannot be used as base/index for `vmaskmovps'
+.*:139: Error: extended GPR cannot be used as base/index for `vmaskmovps'
+.*:140: Error: register type mismatch for `vmovmskpd'
+.*:141: Error: register type mismatch for `vmovmskpd'
+.*:142: Error: register type mismatch for `vmovmskps'
+.*:143: Error: register type mismatch for `vmovmskps'
+.*:144: Error: extended GPR cannot be used as base/index for `vpblendd'
+.*:145: Error: extended GPR cannot be used as base/index for `vpblendd'
+.*:146: Error: extended GPR cannot be used as base/index for `vpblendvb'
+.*:147: Error: extended GPR cannot be used as base/index for `vpblendvb'
+.*:148: Error: extended GPR cannot be used as base/index for `vpblendw'
+.*:149: Error: extended GPR cannot be used as base/index for `vpblendw'
+.*:150: Error: extended GPR cannot be used as base/index for `vpcmpeqb'
+.*:151: Error: extended GPR cannot be used as base/index for `vpcmpeqd'
+.*:152: Error: extended GPR cannot be used as base/index for `vpcmpeqq'
+.*:153: Error: extended GPR cannot be used as base/index for `vpcmpeqw'
+.*:154: Error: extended GPR cannot be used as base/index for `vpcmpestri'
+.*:155: Error: extended GPR cannot be used as base/index for `vpcmpestrm'
+.*:156: Error: extended GPR cannot be used as base/index for `vpcmpgtb'
+.*:157: Error: extended GPR cannot be used as base/index for `vpcmpgtd'
+.*:158: Error: extended GPR cannot be used as base/index for `vpcmpgtq'
+.*:159: Error: extended GPR cannot be used as base/index for `vpcmpgtw'
+.*:160: Error: extended GPR cannot be used as base/index for `vpcmpistri'
+.*:161: Error: extended GPR cannot be used as base/index for `vpcmpistrm'
+.*:162: Error: extended GPR cannot be used as base/index for `vperm2f128'
+.*:163: Error: extended GPR cannot be used as base/index for `vperm2i128'
+.*:164: Error: extended GPR cannot be used as base/index for `vphaddd'
+.*:165: Error: extended GPR cannot be used as base/index for `vphaddd'
+.*:166: Error: extended GPR cannot be used as base/index for `vphaddsw'
+.*:167: Error: extended GPR cannot be used as base/index for `vphaddsw'
+.*:168: Error: extended GPR cannot be used as base/index for `vphaddw'
+.*:169: Error: extended GPR cannot be used as base/index for `vphaddw'
+.*:170: Error: extended GPR cannot be used as base/index for `vphminposuw'
+.*:171: Error: extended GPR cannot be used as base/index for `vphsubd'
+.*:172: Error: extended GPR cannot be used as base/index for `vphsubd'
+.*:173: Error: extended GPR cannot be used as base/index for `vphsubsw'
+.*:174: Error: extended GPR cannot be used as base/index for `vphsubsw'
+.*:175: Error: extended GPR cannot be used as base/index for `vphsubw'
+.*:176: Error: extended GPR cannot be used as base/index for `vphsubw'
+.*:177: Error: extended GPR cannot be used as base/index for `vpmaskmovd'
+.*:178: Error: extended GPR cannot be used as base/index for `vpmaskmovd'
+.*:179: Error: extended GPR cannot be used as base/index for `vpmaskmovd'
+.*:180: Error: extended GPR cannot be used as base/index for `vpmaskmovd'
+.*:181: Error: extended GPR cannot be used as base/index for `vpmaskmovq'
+.*:182: Error: extended GPR cannot be used as base/index for `vpmaskmovq'
+.*:183: Error: extended GPR cannot be used as base/index for `vpmaskmovq'
+.*:184: Error: extended GPR cannot be used as base/index for `vpmaskmovq'
+.*:185: Error: register type mismatch for `vpmovmskb'
+.*:186: Error: register type mismatch for `vpmovmskb'
+.*:187: Error: extended GPR cannot be used as base/index for `vpsignb'
+.*:188: Error: extended GPR cannot be used as base/index for `vpsignb'
+.*:189: Error: extended GPR cannot be used as base/index for `vpsignd'
+.*:190: Error: extended GPR cannot be used as base/index for `vpsignd'
+.*:191: Error: extended GPR cannot be used as base/index for `vpsignw'
+.*:192: Error: extended GPR cannot be used as base/index for `vpsignw'
+.*:193: Error: extended GPR cannot be used as base/index for `vptest'
+.*:194: Error: extended GPR cannot be used as base/index for `vptest'
+.*:195: Error: extended GPR cannot be used as base/index for `vrcpps'
+.*:196: Error: extended GPR cannot be used as base/index for `vrcpps'
+.*:197: Error: extended GPR cannot be used as base/index for `vrcpss'
+.*:198: Error: extended GPR cannot be used as base/index for `vroundpd'
+.*:199: Error: extended GPR cannot be used as base/index for `vroundps'
+.*:200: Error: extended GPR cannot be used as base/index for `vroundsd'
+.*:201: Error: extended GPR cannot be used as base/index for `vroundss'
+.*:202: Error: extended GPR cannot be used as base/index for `vrsqrtps'
+.*:203: Error: extended GPR cannot be used as base/index for `vrsqrtps'
+.*:204: Error: extended GPR cannot be used as base/index for `vrsqrtss'
+.*:205: Error: extended GPR cannot be used as base/index for `vstmxcsr'
+.*:206: Error: extended GPR cannot be used as base/index for `vtestpd'
+.*:207: Error: extended GPR cannot be used as base/index for `vtestpd'
+.*:208: Error: extended GPR cannot be used as base/index for `vtestps'
+.*:209: Error: extended GPR cannot be used as base/index for `vtestps'
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
index bfb6b3fd03b..fde038d6b2f 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
@@ -16,3 +16,194 @@
 	xsaveopt64 (%r16, %r31)
 	xsavec (%r16, %rbx)
 	xsavec64 (%r16, %r31)
+#SSE
+	blendpd $100,(%r18),%xmm6
+	blendps $100,(%r18),%xmm6
+	blendvpd %xmm0,(%r19),%xmm6
+	blendvpd (%r19),%xmm6
+	blendvps %xmm0,(%r19),%xmm6
+	blendvps (%r19),%xmm6
+	dppd $100,(%r20),%xmm6
+	dpps $100,(%r20),%xmm6
+	extractps $100,%xmm4,%r21
+	extractps $100,%xmm4,(%r21)
+	insertps $100,(%r21),%xmm6
+	movntdqa (%r21),%xmm4
+	mpsadbw $100,(%r21),%xmm6
+	pabsb (%r17),%xmm0
+	pabsd (%r17),%xmm0
+	pabsw (%r17),%xmm0
+	packusdw (%r21),%xmm6
+	palignr $100,(%r17),%xmm6
+	pblendvb %xmm0,(%r22),%xmm6
+	pblendvb (%r22),%xmm6
+	pblendw $100,(%r22),%xmm6
+	pcmpeqq (%r22),%xmm6
+	pcmpestri $100,(%r25),%xmm6
+	pcmpestrm $100,(%r25),%xmm6
+	pcmpgtq (%r25),%xmm4
+	pcmpistri $100,(%r25),%xmm6
+	pcmpistrm $100,(%r25),%xmm6
+	pextrb $100,%xmm4,%r22
+	pextrb $100,%xmm4,(%r22)
+	pextrd $100,%xmm4,(%r22)
+	pextrq $100,%xmm4,(%r22)
+	pextrw $100,%xmm4,(%r22)
+	phaddd  (%r17),%xmm0
+	phaddsw (%r17),%xmm0
+	phaddw  (%r17),%xmm0
+	phminposuw (%r23),%xmm4
+	phsubw (%r17),%xmm0
+	pinsrb $100,%r23,%xmm4
+	pinsrb $100,(%r23),%xmm4
+	pinsrd $100, %r23d, %xmm4
+	pinsrd $100,(%r23),%xmm4
+	pinsrq $100, %r24, %xmm4
+	pinsrq $100,(%r24),%xmm4
+	pmaddubsw (%r17),%xmm0
+	pmaxsb (%r24),%xmm6
+	pmaxsd (%r24),%xmm6
+	pmaxud (%r24),%xmm6
+	pmaxuw (%r24),%xmm6
+	pminsb (%r24),%xmm6
+	pminsd (%r24),%xmm6
+	pminud (%r24),%xmm6
+	pminuw (%r24),%xmm6
+	pmovsxbd (%r24),%xmm4
+	pmovsxbq (%r24),%xmm4
+	pmovsxbw (%r24),%xmm4
+	pmovsxbw (%r24),%xmm4
+	pmovsxdq (%r24),%xmm4
+	pmovsxwd (%r24),%xmm4
+	pmovsxwq (%r24),%xmm4
+	pmovzxbd (%r24),%xmm4
+	pmovzxbq (%r24),%xmm4
+	pmovzxdq (%r24),%xmm4
+	pmovzxwd (%r24),%xmm4
+	pmovzxwq (%r24),%xmm4
+	pmuldq (%r24),%xmm4
+	pmulhrsw (%r17),%xmm0
+	pmulld (%r24),%xmm4
+	pshufb (%r17),%xmm0
+	psignb (%r17),%xmm0
+	psignd (%r17),%xmm0
+	psignw (%r17),%xmm0
+	roundpd $100,(%r24),%xmm6
+	roundps $100,(%r24),%xmm6
+	roundsd $100,(%r24),%xmm6
+	roundss $100,(%r24),%xmm6
+#AES
+	aesdec (%r26),%xmm6
+	aesdeclast (%r26),%xmm6
+	aesenc (%r26),%xmm6
+	aesenclast (%r26),%xmm6
+	aesimc (%r26),%xmm6
+	aeskeygenassist $100,(%r26),%xmm6
+	pclmulhqhqdq (%r26),%xmm6
+	pclmulhqlqdq (%r26),%xmm6
+	pclmullqhqdq (%r26),%xmm6
+	pclmullqlqdq (%r26),%xmm6
+	pclmulqdq $100,(%r26),%xmm6
+#GFNI
+	gf2p8affineinvqb $100,(%r26),%xmm6
+	gf2p8affineqb $100,(%r26),%xmm6
+	gf2p8mulb (%r26),%xmm6
+#VEX without evex
+	vaesimc (%r27), %xmm3
+	vaeskeygenassist $7,(%r27),%xmm3
+	vblendpd $7,(%r27),%xmm6,%xmm2
+	vblendpd $7,(%r27),%ymm6,%ymm2
+	vblendps $7,(%r27),%xmm6,%xmm2
+	vblendps $7,(%r27),%ymm6,%ymm2
+	vblendvpd %xmm4,(%r27),%xmm2,%xmm7
+	vblendvpd %ymm4,(%r27),%ymm2,%ymm7
+	vblendvps %xmm4,(%r27),%xmm2,%xmm7
+	vblendvps %ymm4,(%r27),%ymm2,%ymm7
+	vdppd $7,(%r27),%xmm6,%xmm2
+	vdpps $7,(%r27),%xmm6,%xmm2
+	vdpps $7,(%r27),%ymm6,%ymm2
+	vhaddpd (%r27),%xmm6,%xmm5
+	vhaddpd (%r27),%ymm6,%ymm5
+	vhsubps (%r27),%xmm6,%xmm5
+	vhsubps (%r27),%ymm6,%ymm5
+	vlddqu (%r27),%xmm4
+	vlddqu (%r27),%ymm4
+	vldmxcsr (%r27)
+	vmaskmovpd %xmm4,%xmm6,(%r27)
+	vmaskmovpd %ymm4,%ymm6,(%r27)
+	vmaskmovpd (%r27),%xmm4,%xmm6
+	vmaskmovpd (%r27),%ymm4,%ymm6
+	vmaskmovps %xmm4,%xmm6,(%r27)
+	vmaskmovps %ymm4,%ymm6,(%r27)
+	vmaskmovps (%r27),%xmm4,%xmm6
+	vmaskmovps (%r27),%ymm4,%ymm6
+	vmovmskpd %xmm4,%r27d
+	vmovmskpd %xmm8,%r27d
+	vmovmskps %xmm4,%r27d
+	vmovmskps %ymm8,%r27d
+	vpblendd $7,(%r27),%xmm6,%xmm2
+	vpblendd $7,(%r27),%ymm6,%ymm2
+	vpblendvb %xmm4,(%r27),%xmm2,%xmm7
+	vpblendvb %ymm4,(%r27),%ymm2,%ymm7
+	vpblendw $7,(%r27),%xmm6,%xmm2
+	vpblendw $7,(%r27),%ymm6,%ymm2
+	vpcmpeqb (%r26),%ymm6,%ymm2
+	vpcmpeqd (%r26),%ymm6,%ymm2
+	vpcmpeqq (%r16),%ymm6,%ymm2
+	vpcmpeqw (%r16),%ymm6,%ymm2
+	vpcmpestri $7,(%r27),%xmm6
+	vpcmpestrm $7,(%r27),%xmm6
+	vpcmpgtb (%r26),%ymm6,%ymm2
+	vpcmpgtd (%r26),%ymm6,%ymm2
+	vpcmpgtq (%r16),%ymm6,%ymm2
+	vpcmpgtw (%r16),%ymm6,%ymm2
+	vpcmpistri $100,(%r25),%xmm6
+	vpcmpistrm $100,(%r25),%xmm6
+	vperm2f128 $7,(%r27),%ymm6,%ymm2
+	vperm2i128 $7,(%r27),%ymm6,%ymm2
+	vphaddd (%r27),%xmm6,%xmm7
+	vphaddd (%r27),%ymm6,%ymm7
+	vphaddsw (%r27),%xmm6,%xmm7
+	vphaddsw (%r27),%ymm6,%ymm7
+	vphaddw (%r27),%xmm6,%xmm7
+	vphaddw (%r27),%ymm6,%ymm7
+	vphminposuw (%r27),%xmm6
+	vphsubd (%r27),%xmm6,%xmm7
+	vphsubd (%r27),%ymm6,%ymm7
+	vphsubsw (%r27),%xmm6,%xmm7
+	vphsubsw (%r27),%ymm6,%ymm7
+	vphsubw (%r27),%xmm6,%xmm7
+	vphsubw (%r27),%ymm6,%ymm7
+	vpmaskmovd %xmm4,%xmm6,(%r27)
+	vpmaskmovd %ymm4,%ymm6,(%r27)
+	vpmaskmovd (%r27),%xmm4,%xmm6
+	vpmaskmovd (%r27),%ymm4,%ymm6
+	vpmaskmovq %xmm4,%xmm6,(%r27)
+	vpmaskmovq %ymm4,%ymm6,(%r27)
+	vpmaskmovq (%r27),%xmm4,%xmm6
+	vpmaskmovq (%r27),%ymm4,%ymm6
+	vpmovmskb %xmm4,%r27
+	vpmovmskb %ymm4,%r27d
+	vpsignb (%r27),%xmm6,%xmm7
+	vpsignb (%r27),%xmm6,%xmm7
+	vpsignd (%r27),%xmm6,%xmm7
+	vpsignd (%r27),%xmm6,%xmm7
+	vpsignw (%r27),%xmm6,%xmm7
+	vpsignw (%r27),%xmm6,%xmm7
+	vptest (%r27),%ymm6
+	vptest (%r27),%xmm6
+	vrcpps (%r27),%xmm6
+	vrcpps (%r27),%ymm6
+	vrcpss (%r27),%xmm6,%xmm6
+	vroundpd $1,(%r24),%xmm6
+	vroundps $2,(%r24),%xmm6
+	vroundsd $3,(%r24),%xmm6,%xmm3
+	vroundss $4,(%r24),%xmm6,%xmm3
+	vrsqrtps (%r27),%xmm6
+	vrsqrtps (%r27),%ymm6
+	vrsqrtss (%r27),%xmm6,%xmm6
+	vstmxcsr (%r27)
+	vtestpd (%r27),%xmm6
+	vtestpd (%r27),%ymm6
+	vtestps (%r27),%xmm6
+	vtestps (%r27),%ymm6
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
new file mode 100644
index 00000000000..f8701d7ec22
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
@@ -0,0 +1,20 @@
+.*: Assembler messages:
+.*:4: Error: `movbe' is not supported on `x86_64.nomovbe'
+.*:5: Error: `movbe' is not supported on `x86_64.nomovbe'
+.*:8: Error: `invept' is not supported on `x86_64.noept'
+.*:9: Error: `invept' is not supported on `x86_64.noept'
+.*:12: Error: `kmovq' is not supported on `x86_64.noavx512bw'
+.*:13: Error: `kmovq' is not supported on `x86_64.noavx512bw'
+.*:16: Error: `kmovb' is not supported on `x86_64.noavx512dq'
+.*:17: Error: `kmovb' is not supported on `x86_64.noavx512dq'
+.*:20: Error: `kmovw' is not supported on `x86_64.noavx512f'
+.*:21: Error: `kmovw' is not supported on `x86_64.noavx512f'
+.*:24: Error: `andn' is not supported on `x86_64.nobmi'
+.*:25: Error: `andn' is not supported on `x86_64.nobmi'
+.*:28: Error: `bzhi' is not supported on `x86_64.nobmi2'
+.*:29: Error: `bzhi' is not supported on `x86_64.nobmi2'
+GAS LISTING .*
+#...
+[ 	]*1[ 	]+\# Check illegal 64bit APX EVEX promoted instructions
+[ 	]*2[ 	]+\.text
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
new file mode 100644
index 00000000000..2ea47419b4d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
@@ -0,0 +1,29 @@
+# Check illegal 64bit APX EVEX promoted instructions
+	.text
+	.arch .nomovbe
+	movbe (%r16), %r17
+	movbe (%rax), %rcx
+	.arch default
+	.arch .noept
+	invept (%r16), %r17
+	invept (%rax), %rcx
+	.arch default
+	.arch .noavx512bw
+	kmovq %k1, (%r16)
+	kmovq %k1, (%r8)
+	.arch default
+	.arch .noavx512dq
+	kmovb %k1, %r16d
+	kmovb %k1, %r8d
+	.arch default
+	.arch .noavx512f
+	kmovw %k1, %r16d
+	kmovw %k1, %r8d
+	.arch default
+	.arch .nobmi
+	andn %r16,%r15,%r11
+	andn %r15,%r15,%r11
+	.arch default
+	.arch .nobmi2
+	bzhi %r16,%r15,%r11
+	bzhi %r15,%r15,%r11
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
new file mode 100644
index 00000000000..c3c578675c0
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
@@ -0,0 +1,20 @@
+#as:
+#objdump: -dw
+#name: x86-64 APX old evex insn use gpr32 with extend-evex prefix
+#source: x86-64-apx-evex-egpr.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 fb 79 48 19 04 08 01[	 ]+vextractf32x4 \$0x1,%zmm0,\(%r16,%r17,1\)
+\s*[a-f0-9]+:\s*62 fa 79 48 5a 04 1a[	 ]+vbroadcasti32x4 \(%r18,%r19,1\),%zmm0
+\s*[a-f0-9]+:\s*62 eb 7d 08 17 c4 01[	 ]+vextractps \$0x1,%xmm16,%r20d
+\s*[a-f0-9]+:\s*62 69 97 00 2a f5[	 ]+vcvtsi2sd %r21,%xmm29,%xmm30
+\s*[a-f0-9]+:\s*67 62 fe 55 58 96 36[	 ]+vfmaddsub132ph \(%r22d\)\{1to32\},%zmm5,%zmm6
+\s*[a-f0-9]+:\s*62 81 fe 18 78 fe[	 ]+vcvttss2usi \{sae\},%xmm30,%r23
+\s*[a-f0-9]+:\s*62 25 10 47 58 b4 c5 00 00 00 10[	 ]+vaddph 0x10000000\(%rbp,%r24,8\),%zmm29,%zmm30\{%k7\}
+\s*[a-f0-9]+:\s*62 4d 7c 08 2f 71 7f[	 ]+vcomish 0xfe\(%r25\),%xmm30
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
new file mode 100644
index 00000000000..7d1c5de2b6d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
@@ -0,0 +1,21 @@
+# Check 64bit old evex instructions use gpr32 with evex prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+## DestMem
+	 vextractf32x4	$1, %zmm0, (%r16,%r17)
+## SrcMem
+	 vbroadcasti32x4	(%r18,%r19), %zmm0
+## DestReg
+	 vextractps	$1, %xmm16, %r20d
+## SrcReg
+	 vcvtsi2sdq      %r21, %xmm29, %xmm30
+## Broadcast
+	 vfmaddsub132ph  (%r22d){1to32}, %zmm5, %zmm6
+## SAE
+	 vcvttss2usi     {sae}, %xmm30, %r23
+## Masking
+	 vaddph  0x10000000(%rbp, %r24, 8), %zmm29, %zmm30{%k7}
+## Disp8memshift
+	 vcomish 254(%r25), %xmm30
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
new file mode 100644
index 00000000000..f0bff5cde21
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
@@ -0,0 +1,33 @@
+#objdump: -dw
+#name: x86-64 EVEX-promoted bad
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7e 08 60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c7[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7f 08 60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c7[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+62 e2 f9 41 91 84[ 	]+vpgatherqq \(bad\),%zmm16\{%k1\}
+[ 	]*[a-f0-9]+:[ 	]+cd ff[ 	]+int    \$0xff
+[ 	]*[a-f0-9]+:[ 	]+62 fd 7d 08 60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c7[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7d[ 	]+\(bad\).*
+[ 	]*[a-f0-9]+:[ 	]+09 60 c7[ 	]+or     %esp,-0x39\(%rax\)
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7d[ 	]+\(bad\).*
+[ 	]*[a-f0-9]+:[ 	]+28 60 c7[ 	]+.*
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7d[ 	]+\(bad\).*
+[ 	]*[a-f0-9]+:[ 	]+8f[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c7[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+62 f2 7c 09 f5[ 	]+\(bad\).*
+[ 	]*[a-f0-9]+:[ 	]+0c 18[ 	]+or.*
+[ 	]*[a-f0-9]+:[ 	]+62 f2 7c 28 f5[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+0c 18[ 	]+or.*
+[ 	]*[a-f0-9]+:[ 	]+62 f2 7c 8f f5[ 	]+\(bad\).*
+[ 	]*[a-f0-9]+:[ 	]+0c 18[ 	]+or.*
+[ 	]*[a-f0-9]+:[ 	]+62 f2 7c 18 f5[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+0c 18[ 	]+or.*
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
new file mode 100644
index 00000000000..b777e48fa41
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
@@ -0,0 +1,34 @@
+# Check Illegal prefix for 64bit EVEX-promoted instructions
+
+        .allow_index_reg
+        .text
+_start:
+	#movbe %r23w,%ax set EVEX.pp = f3 (illegal value).
+	.insn EVEX.L0.f3.M12.W0 0x60, %di, %ax
+	#movbe %r23w,%ax set EVEX.pp = f2 (illegal value).
+	.insn EVEX.L0.f2.M12.W0 0x60, %di, %ax
+	#VSIB vpgatherqq (%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
+	#(illegal value).
+	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd
+	.byte 0xff
+	#EVEX_MAP4 movbe %r23w,%ax set EVEX.mm == 0b01 (illegal value).
+	.insn EVEX.L0.66.M13.W0 0x60, %di, %ax
+	#EVEX_MAP4 movbe %r23w,%ax set EVEX.a1a0 (P[17:16]) == 0b01
+	#(illegal value).
+	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax{%k1}
+	#EVEX_MAP4 movbe %r18w,%ax set EVEX.L'L == 0b01 (illegal value).
+	.insn EVEX.L1.66.M12.W0 0x60, %di, %ax
+	#EVEX_MAP4 movbe %r18w,%ax set EVEX.z == 0b1 (illegal value).
+	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax {%k7}{z}
+	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[17:16](EVEX.aa) == 0b01
+	#(illegal value).
+	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx{%k1}
+	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[22:21](EVEX.L'L) == 0b01
+	#(illegal value).
+	.insn EVEX.L1.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx
+	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[23](EVEX.z) == 0b1
+	#(illegal value).
+	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx {%k7}{z}
+	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[20](EVEX.b) == 0b1
+	#(illegal value).
+	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax ,(%rax,%rbx){1to8}, %rcx
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
new file mode 100644
index 00000000000..02e811de88d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
@@ -0,0 +1,318 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 APX_F EVEX-Promoted insns (Intel disassembly)
+#source: x86-64-apx-evex-promoted.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32[	 ]+r22,r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32[	 ]+r22,QWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32[	 ]+r17,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32[	 ]+r21d,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32[	 ]+ebx,BYTE PTR \[r19\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32[	 ]+r23d,r31d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32[	 ]+r23d,DWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32[	 ]+r21d,r31w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32[	 ]+r21d,WORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32[	 ]+r18,rax
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+BYTE PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+k5,BYTE PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+k5,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+r31,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+k5,r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+k5,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+k5,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+ax,r18w
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r16\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+DWORD PTR \[r16\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r16\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+r31,QWORD PTR \[r16\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+r18w,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4 xmm22,xmm23,0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\],0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2 xmm12,XMMWORD PTR \[r31\+rax\*4\+0x123\],xmm0
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1 tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+\[r31\+rax\*4\+0x123\],tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+\[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+\[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32[	 ]+r22,r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32[	 ]+r22,QWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32[	 ]+r17,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32[	 ]+r21d,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32[	 ]+ebx,BYTE PTR \[r19\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32[	 ]+r23d,r31d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32[	 ]+r23d,DWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32[	 ]+r21d,r31w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32[	 ]+r21d,WORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32[	 ]+r18,rax
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+BYTE PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+k5,BYTE PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+k5,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+r31,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+k5,r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+k5,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+k5,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+ax,r18w
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r16\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+DWORD PTR \[r16\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r16\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+r31,QWORD PTR \[r16\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+r18w,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4 xmm22,xmm23,0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\],0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2 xmm12,XMMWORD PTR \[r31\+rax\*4\+0x123\],xmm0
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1 tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+\[r31\+rax\*4\+0x123\],tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+\[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+\[r31\+rax\*4\+0x123\],r31
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
new file mode 100644
index 00000000000..3a7dffc013b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
@@ -0,0 +1,318 @@
+#as:
+#objdump: -dw
+#name: x86_64 APX_F EVEX-Promoted insns
+#source: x86-64-apx-evex-promoted.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32  %r31,%r22
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32q \(%r31\),%r22
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32  %r19b,%r17
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32  %r19b,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32b \(%r19\),%ebx
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32  %r31d,%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32l \(%r31\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32  %r31w,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32w \(%r31\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32  %rax,%r18
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+%k5,%r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+%r31,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+%r18w,%ax
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+%r25d,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r16,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r18w
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4[	 ]+\$0x7b,%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4[	 ]+\$0x7b,0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2[	 ]+%xmm0,0x123\(%r31,%rax,4\),%xmm12
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32  %r31,%r22
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32q \(%r31\),%r22
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32  %r19b,%r17
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32  %r19b,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32b \(%r19\),%ebx
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32  %r31d,%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32l \(%r31\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32  %r31w,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32w \(%r31\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32  %rax,%r18
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+%k5,%r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+%r31,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+%r18w,%ax
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+%r25d,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r16,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r18w
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4[	 ]+\$0x7b,%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4[	 ]+\$0x7b,0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2[	 ]+%xmm0,0x123\(%r31,%rax,4\),%xmm12
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+%r31,0x123\(%r31,%rax,4\)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
new file mode 100644
index 00000000000..39752c27432
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
@@ -0,0 +1,314 @@
+# Check 64bit APX_F EVEX-Promoted instructions.
+
+	.text
+_start:
+	aadd	%r25d,0x123(%r31,%rax,4)
+	aadd	%r31,0x123(%r31,%rax,4)
+	aand	%r25d,0x123(%r31,%rax,4)
+	aand	%r31,0x123(%r31,%rax,4)
+	aesdec128kl	0x123(%r31,%rax,4),%xmm22
+	aesdec256kl	0x123(%r31,%rax,4),%xmm22
+	aesdecwide128kl	0x123(%r31,%rax,4)
+	aesdecwide256kl	0x123(%r31,%rax,4)
+	aesenc128kl	0x123(%r31,%rax,4),%xmm22
+	aesenc256kl	0x123(%r31,%rax,4),%xmm22
+	aesencwide128kl	0x123(%r31,%rax,4)
+	aesencwide256kl	0x123(%r31,%rax,4)
+	aor	%r25d,0x123(%r31,%rax,4)
+	aor	%r31,0x123(%r31,%rax,4)
+	axor	%r25d,0x123(%r31,%rax,4)
+	axor	%r31,0x123(%r31,%rax,4)
+	bextr	%r25d,%edx,%r10d
+	bextr	%r25d,0x123(%r31,%rax,4),%edx
+	bextr	%r31,%r15,%r11
+	bextr	%r31,0x123(%r31,%rax,4),%r15
+	blsi	%r25d,%edx
+	blsi	%r31,%r15
+	blsi	0x123(%r31,%rax,4),%r25d
+	blsi	0x123(%r31,%rax,4),%r31
+	blsmsk	%r25d,%edx
+	blsmsk	%r31,%r15
+	blsmsk	0x123(%r31,%rax,4),%r25d
+	blsmsk	0x123(%r31,%rax,4),%r31
+	blsr	%r25d,%edx
+	blsr	%r31,%r15
+	blsr	0x123(%r31,%rax,4),%r25d
+	blsr	0x123(%r31,%rax,4),%r31
+	bzhi	%r25d,%edx,%r10d
+	bzhi	%r25d,0x123(%r31,%rax,4),%edx
+	bzhi	%r31,%r15,%r11
+	bzhi	%r31,0x123(%r31,%rax,4),%r15
+	cmpbexadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpbexadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpbxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpbxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmplxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmplxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnbexadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnbexadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnbxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnbxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnlexadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnlexadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnlxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnlxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnoxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnoxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnpxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnpxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnsxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnsxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnzxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnzxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpoxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpoxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmppxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmppxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpsxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpsxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpzxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpzxadd	%r31,%r15,0x123(%r31,%rax,4)
+	crc32q	%r31, %r22
+	crc32q	(%r31), %r22
+	crc32b	%r19b, %r17
+	crc32b	%r19b, %r21d
+	crc32b	(%r19),%ebx
+	crc32l	%r31d, %r23d
+	crc32l	(%r31), %r23d
+	crc32w	%r31w, %r21d
+	crc32w	(%r31),%r21d
+	crc32	%rax, %r18
+	encodekey128	%r25d,%edx
+	encodekey256	%r25d,%edx
+	enqcmd	0x123(%r31d,%eax,4),%r25d
+	enqcmd	0x123(%r31,%rax,4),%r31
+	enqcmds	0x123(%r31d,%eax,4),%r25d
+	enqcmds	0x123(%r31,%rax,4),%r31
+	invept	0x123(%r31,%rax,4),%r31
+	invpcid	0x123(%r31,%rax,4),%r31
+	invvpid	0x123(%r31,%rax,4),%r31
+	kmovb	%k5,%r25d
+	kmovb	%k5,0x123(%r31,%rax,4)
+	kmovb	%r25d,%k5
+	kmovb	0x123(%r31,%rax,4),%k5
+	kmovd	%k5,%r25d
+	kmovd	%k5,0x123(%r31,%rax,4)
+	kmovd	%r25d,%k5
+	kmovd	0x123(%r31,%rax,4),%k5
+	kmovq	%k5,%r31
+	kmovq	%k5,0x123(%r31,%rax,4)
+	kmovq	%r31,%k5
+	kmovq	0x123(%r31,%rax,4),%k5
+	kmovw	%k5,%r25d
+	kmovw	%k5,0x123(%r31,%rax,4)
+	kmovw	%r25d,%k5
+	kmovw	0x123(%r31,%rax,4),%k5
+	ldtilecfg	0x123(%r31,%rax,4)
+	movbe	%r18w,%ax
+	movbe	%r18w,0x123(%r16,%rax,4)
+	movbe	%r18w,0x123(%r31,%rax,4)
+	movbe	%r25d,%edx
+	movbe	%r25d,0x123(%r16,%rax,4)
+	movbe	%r31,%r15
+	movbe	%r31,0x123(%r16,%rax,4)
+	movbe	%r31,0x123(%r31,%rax,4)
+	movbe	0x123(%r16,%rax,4),%r31
+	movbe	0x123(%r31,%rax,4),%r18w
+	movbe	0x123(%r31,%rax,4),%r25d
+	movdir64b	0x123(%r31d,%eax,4),%r25d
+	movdir64b	0x123(%r31,%rax,4),%r31
+	movdiri	%r25d,0x123(%r31,%rax,4)
+	movdiri	%r31,0x123(%r31,%rax,4)
+	pdep	%r25d,%edx,%r10d
+	pdep	%r31,%r15,%r11
+	pdep	0x123(%r31,%rax,4),%r25d,%edx
+	pdep	0x123(%r31,%rax,4),%r31,%r15
+	pext	%r25d,%edx,%r10d
+	pext	%r31,%r15,%r11
+	pext	0x123(%r31,%rax,4),%r25d,%edx
+	pext	0x123(%r31,%rax,4),%r31,%r15
+	sha1msg1	%xmm23,%xmm22
+	sha1msg1	0x123(%r31,%rax,4),%xmm22
+	sha1msg2	%xmm23,%xmm22
+	sha1msg2	0x123(%r31,%rax,4),%xmm22
+	sha1nexte	%xmm23,%xmm22
+	sha1nexte	0x123(%r31,%rax,4),%xmm22
+	sha1rnds4	$0x7b,%xmm23,%xmm22
+	sha1rnds4	$0x7b,0x123(%r31,%rax,4),%xmm22
+	sha256msg1	%xmm23,%xmm22
+	sha256msg1	0x123(%r31,%rax,4),%xmm22
+	sha256msg2	%xmm23,%xmm22
+	sha256msg2	0x123(%r31,%rax,4),%xmm22
+	sha256rnds2	0x123(%r31,%rax,4),%xmm12
+	shlx	%r25d,%edx,%r10d
+	shlx	%r25d,0x123(%r31,%rax,4),%edx
+	shlx	%r31,%r15,%r11
+	shlx	%r31,0x123(%r31,%rax,4),%r15
+	shrx	%r25d,%edx,%r10d
+	shrx	%r25d,0x123(%r31,%rax,4),%edx
+	shrx	%r31,%r15,%r11
+	shrx	%r31,0x123(%r31,%rax,4),%r15
+	sttilecfg	0x123(%r31,%rax,4)
+	tileloadd	0x123(%r31,%rax,4),%tmm6
+	tileloaddt1	0x123(%r31,%rax,4),%tmm6
+	tilestored	%tmm6,0x123(%r31,%rax,4)
+	wrssd	%r25d,0x123(%r31,%rax,4)
+	wrssq	%r31,0x123(%r31,%rax,4)
+	wrussd	%r25d,0x123(%r31,%rax,4)
+	wrussq	%r31,0x123(%r31,%rax,4)
+
+	.intel_syntax noprefix
+	aadd	[r31+rax*4+0x123],r25d
+	aadd	[r31+rax*4+0x123],r31
+	aand	[r31+rax*4+0x123],r25d
+	aand	[r31+rax*4+0x123],r31
+	aesdec128kl	xmm22,[r31+rax*4+0x123]
+	aesdec256kl	xmm22,[r31+rax*4+0x123]
+	aesdecwide128kl	[r31+rax*4+0x123]
+	aesdecwide256kl	[r31+rax*4+0x123]
+	aesenc128kl	xmm22,[r31+rax*4+0x123]
+	aesenc256kl	xmm22,[r31+rax*4+0x123]
+	aesencwide128kl	[r31+rax*4+0x123]
+	aesencwide256kl	[r31+rax*4+0x123]
+	aor	[r31+rax*4+0x123],r25d
+	aor	[r31+rax*4+0x123],r31
+	axor	[r31+rax*4+0x123],r25d
+	axor	[r31+rax*4+0x123],r31
+	bextr	r10d,edx,r25d
+	bextr	edx, [r31+rax*4+0x123],r25d
+	bextr	r11,r15,r31
+	bextr	r15, [r31+rax*4+0x123],r31
+	blsi	edx,r25d
+	blsi	r15,r31
+	blsi	r25d, [r31+rax*4+0x123]
+	blsi	r31,  [r31+rax*4+0x123]
+	blsmsk	edx,r25d
+	blsmsk	r15,r31
+	blsmsk	r25d, [r31+rax*4+0x123]
+	blsmsk	r31,  [r31+rax*4+0x123]
+	blsr	edx,r25d
+	blsr	r15,r31
+	blsr	r25d, [r31+rax*4+0x123]
+	blsr	r31,  [r31+rax*4+0x123]
+	bzhi	r10d,edx,r25d
+	bzhi	edx, [r31+rax*4+0x123],r25d
+	bzhi	r11,r15,r31
+	bzhi	r15, [r31+rax*4+0x123],r31
+	cmpbexadd	 [r31+rax*4+0x123],edx,r25d
+	cmpbexadd	 [r31+rax*4+0x123],r15,r31
+	cmpbxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpbxadd	 [r31+rax*4+0x123],r15,r31
+	cmplxadd	 [r31+rax*4+0x123],edx,r25d
+	cmplxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnbexadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnbexadd	 [r31+rax*4+0x123],r15,r31
+	cmpnbxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnbxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnlexadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnlexadd	 [r31+rax*4+0x123],r15,r31
+	cmpnlxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnlxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnoxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnoxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnpxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnpxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnsxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnsxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnzxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnzxadd	 [r31+rax*4+0x123],r15,r31
+	cmpoxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpoxadd	 [r31+rax*4+0x123],r15,r31
+	cmppxadd	 [r31+rax*4+0x123],edx,r25d
+	cmppxadd	 [r31+rax*4+0x123],r15,r31
+	cmpsxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpsxadd	 [r31+rax*4+0x123],r15,r31
+	cmpzxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpzxadd	 [r31+rax*4+0x123],r15,r31
+	crc32	r22,r31
+	crc32	r22,QWORD PTR [r31]
+	crc32	r17,r19b
+	crc32	r21d,r19b
+	crc32	ebx,BYTE PTR [r19]
+	crc32	r23d,r31d
+	crc32	r23d,DWORD PTR [r31]
+	crc32	r21d,r31w
+	crc32	r21d,WORD PTR [r31]
+	crc32	r18,rax
+	encodekey128	edx,r25d
+	encodekey256	edx,r25d
+	enqcmd	r25d,[r31d+eax*4+0x123]
+	enqcmd	r31,[r31+rax*4+0x123]
+	enqcmds	r25d,[r31d+eax*4+0x123]
+	enqcmds	r31,[r31+rax*4+0x123]
+	invept	r31,OWORD PTR [r31+rax*4+0x123]
+	invpcid	r31,[r31+rax*4+0x123]
+	invvpid	r31,OWORD PTR [r31+rax*4+0x123]
+	kmovb	r25d,k5
+	kmovb	BYTE PTR [r31+rax*4+0x123],k5
+	kmovb	k5,r25d
+	kmovb	k5,BYTE PTR [r31+rax*4+0x123]
+	kmovd	r25d,k5
+	kmovd	DWORD PTR [r31+rax*4+0x123],k5
+	kmovd	k5,r25d
+	kmovd	k5,DWORD PTR [r31+rax*4+0x123]
+	kmovq	r31,k5
+	kmovq	QWORD PTR [r31+rax*4+0x123],k5
+	kmovq	k5,r31
+	kmovq	k5,QWORD PTR [r31+rax*4+0x123]
+	kmovw	r25d,k5
+	kmovw	WORD PTR [r31+rax*4+0x123],k5
+	kmovw	k5,r25d
+	kmovw	k5,WORD PTR [r31+rax*4+0x123]
+	ldtilecfg	[r31+rax*4+0x123]
+	movbe	ax,r18w
+	movbe	WORD PTR [r16+rax*4+0x123],r18w
+	movbe	WORD PTR [r31+rax*4+0x123],r18w
+	movbe	edx,r25d
+	movbe	DWORD PTR [r16+rax*4+0x123],r25d
+	movbe	r15,r31
+	movbe	QWORD PTR [r16+rax*4+0x123],r31
+	movbe	QWORD PTR [r31+rax*4+0x123],r31
+	movbe	r31,QWORD PTR [r16+rax*4+0x123]
+	movbe	r18w,WORD PTR [r31+rax*4+0x123]
+	movbe	r25d,DWORD PTR [r31+rax*4+0x123]
+	movdir64b	r25d,[r31d+eax*4+0x123]
+	movdir64b	r31,[r31+rax*4+0x123]
+	movdiri	DWORD PTR [r31+rax*4+0x123],r25d
+	movdiri	QWORD PTR [r31+rax*4+0x123],r31
+	pdep	r10d,edx,r25d
+	pdep	r11,r15,r31
+	pdep	edx,r25d,DWORD PTR [r31+rax*4+0x123]
+	pdep	r15,r31,QWORD PTR [r31+rax*4+0x123]
+	pext	r10d,edx,r25d
+	pext	r11,r15,r31
+	pext	edx,r25d,DWORD PTR [r31+rax*4+0x123]
+	pext	r15,r31,QWORD PTR [r31+rax*4+0x123]
+	sha1msg1	xmm22,xmm23
+	sha1msg1	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha1msg2	xmm22,xmm23
+	sha1msg2	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha1nexte	xmm22,xmm23
+	sha1nexte	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha1rnds4	xmm22,xmm23,0x7b
+	sha1rnds4	xmm22,XMMWORD PTR [r31+rax*4+0x123],0x7b
+	sha256msg1	xmm22,xmm23
+	sha256msg1	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha256msg2	xmm22,xmm23
+	sha256msg2	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha256rnds2	xmm12,XMMWORD PTR [r31+rax*4+0x123]
+	shlx	r10d,edx,r25d
+	shlx	edx,DWORD PTR [r31+rax*4+0x123],r25d
+	shlx	r11,r15,r31
+	shlx	r15,QWORD PTR [r31+rax*4+0x123],r31
+	shrx	r10d,edx,r25d
+	shrx	edx,DWORD PTR [r31+rax*4+0x123],r25d
+	shrx	r11,r15,r31
+	shrx	r15,QWORD PTR [r31+rax*4+0x123],r31
+	sttilecfg	[r31+rax*4+0x123]
+	tileloadd	tmm6,[r31+rax*4+0x123]
+	tileloaddt1	tmm6,[r31+rax*4+0x123]
+	tilestored	[r31+rax*4+0x123],tmm6
+	wrssd	DWORD PTR [r31+rax*4+0x123],r25d
+	wrssq	QWORD PTR [r31+rax*4+0x123],r31
+	wrussd	DWORD PTR [r31+rax*4+0x123],r25d
+	wrussq	QWORD PTR [r31+rax*4+0x123],r31
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 7a1fef58735..f1931b510a1 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -364,7 +364,12 @@ run_dump_test "x86-64-avx512f-rcigrne"
 run_dump_test "x86-64-avx512f-rcigru-intel"
 run_dump_test "x86-64-avx512f-rcigru"
 run_list_test "x86-64-apx-egpr-inval"
+run_dump_test "x86-64-apx-evex-promoted-bad"
+run_list_test "x86-64-apx-egpr-promote-inval" "-al"
 run_dump_test "x86-64-apx-rex2"
+run_dump_test "x86-64-apx-evex-promoted"
+run_dump_test "x86-64-apx-evex-promoted-intel"
+run_dump_test "x86-64-apx-evex-egpr"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
-- 
2.25.1



* [PATCH v4 5/9] Support APX NDD
  2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
                   ` (3 preceding siblings ...)
  2023-12-19 12:12 ` [PATCH v4 4/9] Add tests for " Cui, Lili
@ 2023-12-19 12:12 ` Cui, Lili
  2023-12-19 12:12 ` [PATCH v4 6/9] Support APX Push2/Pop2 Cui, Lili
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Cui, Lili @ 2023-12-19 12:12 UTC (permalink / raw)
  To: binutils; +Cc: hongjiu.lu, jbeulich, konglin1

From: konglin1 <lingling.kong@intel.com>

opcodes/ChangeLog:

	* opcodes/i386-dis-evex-reg.h: Handle REG_EVEX_MAP4_80,
	REG_EVEX_MAP4_81, REG_EVEX_MAP4_83, REG_EVEX_MAP4_F6,
	REG_EVEX_MAP4_F7, REG_EVEX_MAP4_FE, REG_EVEX_MAP4_FF.
	* opcodes/i386-dis-evex.h: Add NDD insn.
	* opcodes/i386-dis.c (nd): New define.
	(VexGb): Ditto.
	(VexGv): Ditto.
	(get_valid_dis386): Adjust for NDD decoding.
	(print_insn): Ditto.
	(putop): Ditto.
	(intel_operand_size): Ditto.
	(OP_E_memory): Ditto.
	(OP_VEX): Ditto.
	* opcodes/i386-opc.h (VexVVVV_DST): New.
	* opcodes/i386-opc.tbl: Add APX NDD instructions and adjust VexVVVV.
	* opcodes/i386-tbl.h: Regenerated.

gas/ChangeLog:

	* gas/config/tc-i386.c (operand_size_match): Support APX NDD
	insns with 3 operands.
	(build_apx_evex_prefix): Adjust for NDD encoding.
	(process_operands): Ditto.
	(build_modrm_byte): Ditto.
	(match_template): Support swapping the first two operands for
	APX NDD.
	* testsuite/gas/i386/x86-64.exp: Add x86-64-apx-ndd.
	* testsuite/gas/i386/x86-64-apx-ndd.d: New test.
	* testsuite/gas/i386/x86-64-apx-ndd.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos.d: Add test.
	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.
---
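Note for reviewers: the sketch below illustrates how the NDD bit and the
new-data-destination register can be read back from the EVEX prefix
bytes listed in x86-64-apx-ndd.d.  It is only an illustration, not code
from this patch: apx_evex_nd and apx_evex_vvvv are hypothetical helper
names, and the bit positions are assumptions taken from
build_apx_evex_prefix (0x10 in byte 3 for ND, 0x08 in byte 3 for the
inverted high register bit) and from the usual inverted EVEX.vvvv field
in bits 6:3 of byte 2.

```c
#include <stdint.h>

/* Hypothetical helper, not part of binutils: return nonzero when the
   EVEX.ND bit is set.  This is the 0x10 that build_apx_evex_prefix
   ORs into i.vex.bytes[3].  */
static int
apx_evex_nd (const uint8_t evex[4])
{
  return (evex[3] & 0x10) != 0;
}

/* Hypothetical helper: recover the 5-bit register number encoded in
   EVEX.vvvv (bits 6:3 of byte 2) and the high register bit (bit 3 of
   byte 3).  Both fields are assumed to be stored inverted.  */
static unsigned int
apx_evex_vvvv (const uint8_t evex[4])
{
  unsigned int low4 = (~(unsigned int) evex[2] >> 3) & 0xf;
  unsigned int high = (~(unsigned int) evex[3] >> 3) & 0x1;

  return (high << 4) | low4;
}
```

For example, the prefix 62 f4 e4 18 of "inc %rax,%rbx" has ND set and
yields register number 3 (%rbx), while the same prefix with a final
byte of 08 has ND clear with vvvv != 1111, which is the case the new
x86-64-apx-evex-promoted-bad entry expects to be rejected.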
 gas/config/tc-i386.c                          |  62 +++++--
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |   3 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |   2 +
 gas/testsuite/gas/i386/x86-64-apx-ndd.d       | 160 ++++++++++++++++
 gas/testsuite/gas/i386/x86-64-apx-ndd.s       | 155 ++++++++++++++++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |  42 +++++
 gas/testsuite/gas/i386/x86-64-pseudos.s       |  43 +++++
 gas/testsuite/gas/i386/x86-64.exp             |   1 +
 opcodes/i386-dis-evex-reg.h                   |  54 ++++++
 opcodes/i386-dis-evex.h                       | 124 ++++++-------
 opcodes/i386-dis.c                            | 171 +++++++++++-------
 opcodes/i386-opc.h                            |   6 +-
 opcodes/i386-opc.tbl                          |  75 ++++++++
 13 files changed, 753 insertions(+), 145 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 25cfacce138..4a3dd5e96ca 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -2234,8 +2234,10 @@ operand_size_match (const insn_template *t)
       unsigned int given = i.operands - j - 1;
 
       /* For FMA4 and XOP insns VEX.W controls just the first two
-	 register operands.  */
-      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP))
+	 register operands.  APX_F insns likewise swap just the two source
+	 operands, with the 3rd one being the destination.  */
+      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP)
+	  || is_cpu (t, CpuAPX_F))
 	given = j < 2 ? 1 - j : j;
 
       if (t->operand_types[j].bitfield.class == Reg
@@ -4190,6 +4192,11 @@ build_apx_evex_prefix (void)
   if (i.vex.register_specifier
       && i.vex.register_specifier->reg_flags & RegRex2)
     i.vex.bytes[3] &= ~0x08;
+
+  /* Encode the NDD bit of the instruction promoted from the legacy
+     space.  */
+  if (i.vex.register_specifier && i.tm.opcode_space == SPACE_EVEXMAP4)
+    i.vex.bytes[3] |= 0x10;
 }
 
 static void
@@ -7469,18 +7476,22 @@ match_template (char mnem_suffix)
 	     - the store form is requested, and the template is a load form,
 	     - the non-default (swapped) form is requested.  */
 	  overlap1 = operand_type_and (operand_types[0], operand_types[1]);
+
+	  j = i.operands - 1 - (t->opcode_space == SPACE_EVEXMAP4
+				&& t->opcode_modifier.vexvvvv);
+
 	  if (t->opcode_modifier.d && i.reg_operands == i.operands
 	      && !operand_type_all_zero (&overlap1))
 	    switch (i.dir_encoding)
 	      {
 	      case dir_encoding_load:
-		if (operand_type_check (operand_types[i.operands - 1], anymem)
+		if (operand_type_check (operand_types[j], anymem)
 		    || t->opcode_modifier.regmem)
 		  goto check_reverse;
 		break;
 
 	      case dir_encoding_store:
-		if (!operand_type_check (operand_types[i.operands - 1], anymem)
+		if (!operand_type_check (operand_types[j], anymem)
 		    && !t->opcode_modifier.regmem)
 		  goto check_reverse;
 		break;
@@ -7491,6 +7502,7 @@ match_template (char mnem_suffix)
 	      case dir_encoding_default:
 		break;
 	      }
+
 	  /* If we want store form, we skip the current load.  */
 	  if ((i.dir_encoding == dir_encoding_store
 	       || i.dir_encoding == dir_encoding_swap)
@@ -7520,11 +7532,13 @@ match_template (char mnem_suffix)
 		continue;
 	      /* Try reversing direction of operands.  */
 	      j = is_cpu (t, CpuFMA4)
-		  || is_cpu (t, CpuXOP) ? 1 : i.operands - 1;
+		  || is_cpu (t, CpuXOP)
+		  || is_cpu (t, CpuAPX_F) ? 1 : i.operands - 1;
 	      overlap0 = operand_type_and (i.types[0], operand_types[j]);
 	      overlap1 = operand_type_and (i.types[j], operand_types[0]);
 	      overlap2 = operand_type_and (i.types[1], operand_types[1]);
-	      gas_assert (t->operands != 3 || !check_register);
+	      gas_assert (t->operands != 3 || !check_register
+			  || is_cpu (t, CpuAPX_F));
 	      if (!operand_type_match (overlap0, i.types[0])
 		  || !operand_type_match (overlap1, i.types[j])
 		  || (t->operands == 3
@@ -7559,6 +7573,11 @@ match_template (char mnem_suffix)
 		  found_reverse_match = Opcode_VexW;
 		  goto check_operands_345;
 		}
+	      else if (is_cpu (t, CpuAPX_F) && i.operands == 3)
+		{
+		  found_reverse_match = Opcode_D;
+		  goto check_operands_345;
+		}
 	      else if (t->opcode_space != SPACE_BASE
 		       && (t->opcode_space != SPACE_0F
 			   /* MOV to/from CR/DR/TR, as an exception, follow
@@ -7740,6 +7759,9 @@ match_template (char mnem_suffix)
 
       i.tm.base_opcode ^= found_reverse_match;
 
+      if (i.tm.opcode_space == SPACE_EVEXMAP4)
+	goto swap_first_2;
+
       /* Certain SIMD insns have their load forms specified in the opcode
 	 table, and hence we need to _set_ RegMem instead of clearing it.
 	 We need to avoid setting the bit though on insns like KMOVW.  */
@@ -7759,6 +7781,7 @@ match_template (char mnem_suffix)
 	 flipping VEX.W.  */
       i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
 
+    swap_first_2:
       j = i.tm.operand_types[0].bitfield.imm8;
       i.tm.operand_types[j] = operand_types[j + 1];
       i.tm.operand_types[j + 1] = operand_types[j];
@@ -8577,12 +8600,9 @@ process_operands (void)
      unnecessary segment overrides.  */
   const reg_entry *default_seg = NULL;
 
-  /* We only need to check those implicit registers for instructions
-     with 3 operands or less.  */
-  if (i.operands <= 3)
-    for (unsigned int j = 0; j < i.operands; j++)
-      if (i.types[j].bitfield.instance != InstanceNone)
-	i.reg_operands--;
+  for (unsigned int j = 0; j < i.operands; j++)
+    if (i.types[j].bitfield.instance != InstanceNone)
+      i.reg_operands--;
 
   if (i.tm.opcode_modifier.sse2avx)
     {
@@ -8936,11 +8956,19 @@ build_modrm_byte (void)
 				     || i.vec_encoding == vex_encoding_evex));
     }
 
-  for (v = source + 1; v < dest; ++v)
-    if (v != reg_slot)
-      break;
-  if (v >= dest)
-    v = ~0;
+  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
+    {
+      v = dest;
+      dest--;
+    }
+  else
+    {
+      for (v = source + 1; v < dest; ++v)
+	if (v != reg_slot)
+	  break;
+      if (v >= dest)
+	v = ~0;
+    }
   if (i.tm.extension_opcode != None)
     {
       if (dest != source)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
index f0bff5cde21..e0b14e30178 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
@@ -31,3 +31,6 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:[ 	]+0c 18[ 	]+or.*
 [ 	]*[a-f0-9]+:[ 	]+62 f2 7c 18 f5[ 	]+\(bad\)
 [ 	]*[a-f0-9]+:[ 	]+0c 18[ 	]+or.*
+[ 	]*[a-f0-9]+:[ 	]+62 f4 e4[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+08 ff[ 	]+.*
+[ 	]*[a-f0-9]+:[ 	]+04 08[ 	]+.*
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
index b777e48fa41..9a08a45eb76 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
@@ -32,3 +32,5 @@ _start:
 	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[20](EVEX.b) == 0b1
 	#(illegal value).
 	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax ,(%rax,%rbx){1to8}, %rcx
+	#{evex} inc %rax, %rbx EVEX.vvvv != 1111 && EVEX.ND = 0.
+	.insn EVEX.L0.NP.M4.W1 0xff/0, (%rax,%rcx), %rbx
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd.d b/gas/testsuite/gas/i386/x86-64-apx-ndd.d
new file mode 100644
index 00000000000..73410606ce3
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.d
@@ -0,0 +1,160 @@
+#as:
+#objdump: -dw
+#name: x86-64 APX NDD instructions with evex prefix encoding
+#source: x86-64-apx-ndd.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 d0 34 12 	adc    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 10 f9    	adc    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 12 04 07 	adc    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 13 04 07 	adc    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 14 83 11 	adcl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 54 6d 10 66 c7    	adcx   %r15d,%r8d,%r18d
+\s*[a-f0-9]+:\s*62 14 f9 08 66 04 3f 	adcx   \(%r15,%r31,1\),%r8
+\s*[a-f0-9]+:\s*62 14 69 10 66 04 3f 	adcx   \(%r15,%r31,1\),%r8d,%r18d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 c0 34 12 	add    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 d4 fc 10 81 c7 33 44 34 12 	add    \$0x12344433,%r15,%r16
+\s*[a-f0-9]+:\s*62 d4 74 10 80 c5 34 	add    \$0x34,%r13b,%r17b
+\s*[a-f0-9]+:\s*62 f4 bc 18 81 c0 11 22 33 f4 	add    \$0xfffffffff4332211,%rax,%r8
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8    	add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 44 fc 10 01 38    	add    %r31,\(%r8\),%r16
+\s*[a-f0-9]+:\s*62 44 f8 10 01 3c c0 	add    %r31,\(%r8,%r16,8\),%r16
+\s*[a-f0-9]+:\s*62 44 7c 10 00 f8    	add    %r31b,%r8b,%r16b
+\s*[a-f0-9]+:\s*62 44 7c 10 01 f8    	add    %r31d,%r8d,%r16d
+\s*[a-f0-9]+:\s*62 44 7d 10 01 f8    	add    %r31w,%r8w,%r16w
+\s*[a-f0-9]+:\s*62 5c fc 10 03 07    	add    \(%r31\),%r8,%r16
+\s*[a-f0-9]+:\s*62 5c f8 10 03 84 07 90 90 00 00 	add    0x9090\(%r31,%r16,1\),%r8,%r16
+\s*[a-f0-9]+:\s*62 44 7c 10 00 f8    	add    %r31b,%r8b,%r16b
+\s*[a-f0-9]+:\s*62 44 7c 10 01 f8    	add    %r31d,%r8d,%r16d
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 04 83 11 	addl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8    	add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 d4 fc 10 81 04 8f 33 44 34 12 	addq   \$0x12344433,\(%r15,%rcx,4\),%r16
+\s*[a-f0-9]+:\s*62 44 7d 10 01 f8    	add    %r31w,%r8w,%r16w
+\s*[a-f0-9]+:\s*62 54 6e 10 66 c7    	adox   %r15d,%r8d,%r18d
+\s*[a-f0-9]+:\s*62 5c fc 10 03 c7    	add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8    	add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 14 fa 08 66 04 3f 	adox   \(%r15,%r31,1\),%r8
+\s*[a-f0-9]+:\s*62 14 6a 10 66 04 3f 	adox   \(%r15,%r31,1\),%r8d,%r18d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 e0 34 12 	and    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 20 f9    	and    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 22 04 07 	and    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 23 04 07 	and    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 24 83 11 	andl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 47 90 90 90 90 90 	cmova  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 43 90 90 90 90 90 	cmovae -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 42 90 90 90 90 90 	cmovb  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 46 90 90 90 90 90 	cmovbe -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 44 90 90 90 90 90 	cmove  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4f 90 90 90 90 90 	cmovg  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4d 90 90 90 90 90 	cmovge -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4c 90 90 90 90 90 	cmovl  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4e 90 90 90 90 90 	cmovle -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 45 90 90 90 90 90 	cmovne -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 41 90 90 90 90 90 	cmovno -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 49 90 90 90 90 90 	cmovns -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 40 90 90 90 90 90 	cmovo  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 48 90 90 90 90 90 	cmovs  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*62 f4 f4 10 ff c8    	dec    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 fe 0c 27 	decb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 b4 b0 10 af 94 f8 09 09 00 00 	imul   0x909\(%rax,%r31,8\),%rdx,%r25
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 af 90 09 09 09 00 	imul   0x90909\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*62 dc fc 10 ff c7    	inc    %r31,%r16
+\s*[a-f0-9]+:\s*62 dc bc 18 ff c7    	inc    %r31,%r8
+\s*[a-f0-9]+:\s*62 f4 e4 18 ff c0    	inc    %rax,%rbx
+\s*[a-f0-9]+:\s*62 f4 f4 10 f7 d8    	neg    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 f6 1c 27 	negb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 f4 f4 10 f7 d0    	not    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 f6 14 27 	notb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 c8 34 12 	or     \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 08 f9    	or     %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 0a 04 07 	or     \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 0b 04 07 	or     \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 0c 83 11 	orl    \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 d4 02 	rcl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 d0    	rcl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 10    	rclb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 10 02 	rcll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 10    	rclw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 14 83 	rclw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 dc 02 	rcr    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 d8    	rcr    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 18    	rcrb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 18 02 	rcrl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 18    	rcrw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 1c 83 	rcrw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 c4 02 	rol    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 c0    	rol    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 00    	rolb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 00 02 	roll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 00    	rolw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 04 83 	rolw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 cc 02 	ror    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 c8    	ror    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 08    	rorb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 08 02 	rorl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 08    	rorw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 0c 83 	rorw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 fc 02 	sar    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 f8    	sar    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 38    	sarb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 38 02 	sarl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 38    	sarw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 3c 83 	sarw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 d8 34 12 	sbb    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 18 f9    	sbb    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 1a 04 07 	sbb    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 1b 04 07 	sbb    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 1c 83 11 	sbbl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 e4 02 	shl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 e4 02 	shl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e0    	shl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e0    	shl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 20    	shlb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 20    	shlb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 74 84 10 24 20 01 	shld   \$0x1,%r12,\(%rax\),%r31
+\s*[a-f0-9]+:\s*62 74 04 10 24 38 02 	shld   \$0x2,%r15d,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 54 05 10 24 c4 02 	shld   \$0x2,%r8w,%r12w,%r31w
+\s*[a-f0-9]+:\s*62 7c bc 18 a5 e0    	shld   %cl,%r12,%r16,%r8
+\s*[a-f0-9]+:\s*62 7c 05 10 a5 2c 83 	shld   %cl,%r13w,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 74 05 10 a5 08    	shld   %cl,%r9w,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 20 02 	shll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 20 02 	shll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 20    	shlw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 20    	shlw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 24 83 	shlw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 24 83 	shlw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 ec 02 	shr    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e8    	shr    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 28    	shrb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 74 84 10 2c 20 01 	shrd   \$0x1,%r12,\(%rax\),%r31
+\s*[a-f0-9]+:\s*62 74 04 10 2c 38 02 	shrd   \$0x2,%r15d,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 54 05 10 2c c4 02 	shrd   \$0x2,%r8w,%r12w,%r31w
+\s*[a-f0-9]+:\s*62 7c bc 18 ad e0    	shrd   %cl,%r12,%r16,%r8
+\s*[a-f0-9]+:\s*62 7c 05 10 ad 2c 83 	shrd   %cl,%r13w,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 74 05 10 ad 08    	shrd   %cl,%r9w,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 28 02 	shrl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 28    	shrw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 2c 83 	shrw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 e8 34 12 	sub    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 28 f9    	sub    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 2a 04 07 	sub    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 2b 04 07 	sub    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 2c 83 11 	subl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 f0 34 12 	xor    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 30 f9    	xor    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 32 04 07 	xor    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 33 04 07 	xor    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 34 83 11 	xorl   \$0x11,\(%r19,%rax,4\),%r20d
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd.s b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
new file mode 100644
index 00000000000..4e248f737a9
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
@@ -0,0 +1,155 @@
+# Check 64-bit APX NDD instructions with EVEX prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+	adc    $0x1234,%ax,%r30w
+	adc    %r15b,%r17b,%r18b
+	adc    %r15d,(%r8),%r18d
+	adc    (%r15,%rax,1),%r16b,%r8b
+	adc    (%r15,%rax,1),%r16w,%r8w
+	adcl   $0x11,(%r19,%rax,4),%r20d
+	adcx   %r15d,%r8d,%r18d
+	adcx   (%r15,%r31,1),%r8
+	adcx   (%r15,%r31,1),%r8d,%r18d
+	add    $0x1234,%ax,%r30w
+	add    $0x12344433,%r15,%r16
+	add    $0x34,%r13b,%r17b
+	add    $0xfffffffff4332211,%rax,%r8
+	add    %r31,%r8,%r16
+	add    %r31,(%r8),%r16
+	add    %r31,(%r8,%r16,8),%r16
+	add    %r31b,%r8b,%r16b
+	add    %r31d,%r8d,%r16d
+	add    %r31w,%r8w,%r16w
+	add    (%r31),%r8,%r16
+	add    0x9090(%r31,%r16,1),%r8,%r16
+	addb   %r31b,%r8b,%r16b
+	addl   %r31d,%r8d,%r16d
+	addl   $0x11,(%r19,%rax,4),%r20d
+	addq   %r31,%r8,%r16
+	addq   $0x12344433,(%r15,%rcx,4),%r16
+	addw   %r31w,%r8w,%r16w
+	adox   %r15d,%r8d,%r18d
+	{load}  add    %r31,%r8,%r16
+	{store} add    %r31,%r8,%r16
+	adox   (%r15,%r31,1),%r8
+	adox   (%r15,%r31,1),%r8d,%r18d
+	and    $0x1234,%ax,%r30w
+	and    %r15b,%r17b,%r18b
+	and    %r15d,(%r8),%r18d
+	and    (%r15,%rax,1),%r16b,%r8b
+	and    (%r15,%rax,1),%r16w,%r8w
+	andl   $0x11,(%r19,%rax,4),%r20d
+	cmova  0x90909090(%eax),%edx,%r8d
+	cmovae 0x90909090(%eax),%edx,%r8d
+	cmovb  0x90909090(%eax),%edx,%r8d
+	cmovbe 0x90909090(%eax),%edx,%r8d
+	cmove  0x90909090(%eax),%edx,%r8d
+	cmovg  0x90909090(%eax),%edx,%r8d
+	cmovge 0x90909090(%eax),%edx,%r8d
+	cmovl  0x90909090(%eax),%edx,%r8d
+	cmovle 0x90909090(%eax),%edx,%r8d
+	cmovne 0x90909090(%eax),%edx,%r8d
+	cmovno 0x90909090(%eax),%edx,%r8d
+	cmovnp 0x90909090(%eax),%edx,%r8d
+	cmovns 0x90909090(%eax),%edx,%r8d
+	cmovo  0x90909090(%eax),%edx,%r8d
+	cmovp  0x90909090(%eax),%edx,%r8d
+	cmovs  0x90909090(%eax),%edx,%r8d
+	dec    %rax,%r17
+	decb   (%r31,%r12,1),%r8b
+	imul   0x909(%rax,%r31,8),%rdx,%r25
+	imul   0x90909(%eax),%edx,%r8d
+	inc    %r31,%r16
+	inc    %r31,%r8
+	inc    %rax,%rbx
+	neg    %rax,%r17
+	negb   (%r31,%r12,1),%r8b
+	not    %rax,%r17
+	notb   (%r31,%r12,1),%r8b
+	or     $0x1234,%ax,%r30w
+	or     %r15b,%r17b,%r18b
+	or     %r15d,(%r8),%r18d
+	or     (%r15,%rax,1),%r16b,%r8b
+	or     (%r15,%rax,1),%r16w,%r8w
+	orl    $0x11,(%r19,%rax,4),%r20d
+	rcl    $0x2,%r12b,%r31b
+	rcl    %cl,%r16b,%r8b
+	rclb   $0x1,(%rax),%r31b
+	rcll   $0x2,(%rax),%r31d
+	rclw   $0x1,(%rax),%r31w
+	rclw   %cl,(%r19,%rax,4),%r31w
+	rcr    $0x2,%r12b,%r31b
+	rcr    %cl,%r16b,%r8b
+	rcrb   $0x1,(%rax),%r31b
+	rcrl   $0x2,(%rax),%r31d
+	rcrw   $0x1,(%rax),%r31w
+	rcrw   %cl,(%r19,%rax,4),%r31w
+	rol    $0x2,%r12b,%r31b
+	rol    %cl,%r16b,%r8b
+	rolb   $0x1,(%rax),%r31b
+	roll   $0x2,(%rax),%r31d
+	rolw   $0x1,(%rax),%r31w
+	rolw   %cl,(%r19,%rax,4),%r31w
+	ror    $0x2,%r12b,%r31b
+	ror    %cl,%r16b,%r8b
+	rorb   $0x1,(%rax),%r31b
+	rorl   $0x2,(%rax),%r31d
+	rorw   $0x1,(%rax),%r31w
+	rorw   %cl,(%r19,%rax,4),%r31w
+	sar    $0x2,%r12b,%r31b
+	sar    %cl,%r16b,%r8b
+	sarb   $0x1,(%rax),%r31b
+	sarl   $0x2,(%rax),%r31d
+	sarw   $0x1,(%rax),%r31w
+	sarw   %cl,(%r19,%rax,4),%r31w
+	sbb    $0x1234,%ax,%r30w
+	sbb    %r15b,%r17b,%r18b
+	sbb    %r15d,(%r8),%r18d
+	sbb    (%r15,%rax,1),%r16b,%r8b
+	sbb    (%r15,%rax,1),%r16w,%r8w
+	sbbl   $0x11,(%r19,%rax,4),%r20d
+	shl    $0x2,%r12b,%r31b
+	shl    $0x2,%r12b,%r31b
+	shl    %cl,%r16b,%r8b
+	shl    %cl,%r16b,%r8b
+	shlb   $0x1,(%rax),%r31b
+	shlb   $0x1,(%rax),%r31b
+	shld   $0x1,%r12,(%rax),%r31
+	shld   $0x2,%r15d,(%rax),%r31d
+	shld   $0x2,%r8w,%r12w,%r31w
+	shld   %cl,%r12,%r16,%r8
+	shld   %cl,%r13w,(%r19,%rax,4),%r31w
+	shld   %cl,%r9w,(%rax),%r31w
+	shll   $0x2,(%rax),%r31d
+	shll   $0x2,(%rax),%r31d
+	shlw   $0x1,(%rax),%r31w
+	shlw   $0x1,(%rax),%r31w
+	shlw   %cl,(%r19,%rax,4),%r31w
+	shlw   %cl,(%r19,%rax,4),%r31w
+	shr    $0x2,%r12b,%r31b
+	shr    %cl,%r16b,%r8b
+	shrb   $0x1,(%rax),%r31b
+	shrd   $0x1,%r12,(%rax),%r31
+	shrd   $0x2,%r15d,(%rax),%r31d
+	shrd   $0x2,%r8w,%r12w,%r31w
+	shrd   %cl,%r12,%r16,%r8
+	shrd   %cl,%r13w,(%r19,%rax,4),%r31w
+	shrd   %cl,%r9w,(%rax),%r31w
+	shrl   $0x2,(%rax),%r31d
+	shrw   $0x1,(%rax),%r31w
+	shrw   %cl,(%r19,%rax,4),%r31w
+	sub    $0x1234,%ax,%r30w
+	sub    %r15b,%r17b,%r18b
+	sub    %r15d,(%r8),%r18d
+	sub    (%r15,%rax,1),%r16b,%r8b
+	sub    (%r15,%rax,1),%r16w,%r8w
+	subl   $0x11,(%r19,%rax,4),%r20d
+	xor    $0x1234,%ax,%r30w
+	xor    %r15b,%r17b,%r18b
+	xor    %r15d,(%r8),%r18d
+	xor    (%r15,%rax,1),%r16b,%r8b
+	xor    (%r15,%rax,1),%r16w,%r8w
+	xorl   $0x11,(%r19,%rax,4),%r20d
+
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.d b/gas/testsuite/gas/i386/x86-64-pseudos.d
index 9c45851cc73..976264c27e9 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.d
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.d
@@ -137,6 +137,48 @@ Disassembly of section .text:
  +[a-f0-9]+:	33 07                	xor    \(%rdi\),%eax
  +[a-f0-9]+:	31 07                	xor    %eax,\(%rdi\)
  +[a-f0-9]+:	33 07                	xor    \(%rdi\),%eax
+ +[a-f0-9]+:	62 44 fc 10 01 38    	add    %r31,\(%r8\),%r16
+ +[a-f0-9]+:	62 44 fc 10 03 38    	add    \(%r8\),%r31,%r16
+ +[a-f0-9]+:	62 44 fc 10 01 38    	add    %r31,\(%r8\),%r16
+ +[a-f0-9]+:	62 44 fc 10 03 38    	add    \(%r8\),%r31,%r16
+ +[a-f0-9]+:	62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 2b 38    	sub    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 2b 38    	sub    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 1b 38    	sbb    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 1b 38    	sbb    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 23 38    	and    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 23 38    	and    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 0b 38    	or     \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 0b 38    	or     \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 33 38    	xor    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 33 38    	xor    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 13 38    	adc    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 13 38    	adc    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 44 fc 10 01 f8    	add    %r31,%r8,%r16
+ +[a-f0-9]+:	62 5c fc 10 03 c7    	add    %r31,%r8,%r16
+ +[a-f0-9]+:	62 7c 6c 10 28 f9    	sub    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 2a cf    	sub    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 18 f9    	sbb    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 1a cf    	sbb    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 20 f9    	and    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 22 cf    	and    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 08 f9    	or     %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 0a cf    	or     %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 30 f9    	xor    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 32 cf    	xor    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 10 f9    	adc    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 12 cf    	adc    %r15b,%r17b,%r18b
  +[a-f0-9]+:	b0 12                	mov    \$0x12,%al
  +[a-f0-9]+:	b8 45 03 00 00       	mov    \$0x345,%eax
  +[a-f0-9]+:	b0 12                	mov    \$0x12,%al
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.s b/gas/testsuite/gas/i386/x86-64-pseudos.s
index a4582cfa6f1..f958fa65c72 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
@@ -134,6 +134,49 @@ _start:
 	{load} xor (%rdi), %eax
 	{store} xor %eax, (%rdi)
 	{store} xor (%rdi), %eax
+	{load}  add    %r31,(%r8),%r16
+	{load}	add    (%r8),%r31,%r16
+	{store} add    %r31,(%r8),%r16
+	{store}	add    (%r8),%r31,%r16
+	{load} 	sub    %r15d,(%r8),%r18d
+	{load}	sub    (%r8),%r15d,%r18d
+	{store} sub    %r15d,(%r8),%r18d
+	{store} sub    (%r8),%r15d,%r18d
+	{load} 	sbb    %r15d,(%r8),%r18d
+	{load}	sbb    (%r8),%r15d,%r18d
+	{store} sbb    %r15d,(%r8),%r18d
+	{store} sbb    (%r8),%r15d,%r18d
+	{load} 	and    %r15d,(%r8),%r18d
+	{load}	and    (%r8),%r15d,%r18d
+	{store} and    %r15d,(%r8),%r18d
+	{store} and    (%r8),%r15d,%r18d
+	{load} 	or     %r15d,(%r8),%r18d
+	{load}	or     (%r8),%r15d,%r18d
+	{store} or     %r15d,(%r8),%r18d
+	{store} or     (%r8),%r15d,%r18d
+	{load} 	xor    %r15d,(%r8),%r18d
+	{load}	xor    (%r8),%r15d,%r18d
+	{store} xor    %r15d,(%r8),%r18d
+	{store} xor    (%r8),%r15d,%r18d
+	{load} 	adc    %r15d,(%r8),%r18d
+	{load}	adc    (%r8),%r15d,%r18d
+	{store} adc    %r15d,(%r8),%r18d
+	{store} adc    (%r8),%r15d,%r18d
+
+	{store} add    %r31,%r8,%r16
+	{load}  add    %r31,%r8,%r16
+	{store} sub    %r15b,%r17b,%r18b
+	{load}	sub    %r15b,%r17b,%r18b
+	{store}	sbb    %r15b,%r17b,%r18b
+	{load}	sbb    %r15b,%r17b,%r18b
+	{store}	and    %r15b,%r17b,%r18b
+	{load}	and    %r15b,%r17b,%r18b
+	{store}	or     %r15b,%r17b,%r18b
+	{load}	or     %r15b,%r17b,%r18b
+	{store}	xor    %r15b,%r17b,%r18b
+	{load}	xor    %r15b,%r17b,%r18b
+	{store}	adc    %r15b,%r17b,%r18b
+	{load}	adc    %r15b,%r17b,%r18b
 
 	.irp m, mov, adc, add, and, cmp, or, sbb, sub, test, xor
 	\m	$0x12, %al
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index f1931b510a1..5e2e302b22a 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -370,6 +370,7 @@ run_dump_test "x86-64-apx-rex2"
 run_dump_test "x86-64-apx-evex-promoted"
 run_dump_test "x86-64-apx-evex-promoted-intel"
 run_dump_test "x86-64-apx-evex-egpr"
+run_dump_test "x86-64-apx-ndd"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index 8374f0ea93a..4dc736253b1 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -56,3 +56,57 @@
     { "blsmskS",	{ VexGdq, Edq }, 0 },
     { "blsiS",	{ VexGdq, Edq }, 0 },
   },
+  /* REG_EVEX_MAP4_80 */
+  {
+    { "addA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "orA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "adcA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "sbbA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "andA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "subA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "xorA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+  },
+  /* REG_EVEX_MAP4_81 */
+  {
+    { "addQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "orQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "adcQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "sbbQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "andQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "subQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "xorQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+  },
+  /* REG_EVEX_MAP4_83 */
+  {
+    { "addQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "orQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "adcQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "sbbQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "andQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "subQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "xorQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+  },
+  /* REG_EVEX_MAP4_F6 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "notA",	{ VexGb, Eb }, NO_PREFIX },
+    { "negA",	{ VexGb, Eb }, NO_PREFIX },
+  },
+  /* REG_EVEX_MAP4_F7 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "notQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
+    { "negQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
+  },
+  /* REG_EVEX_MAP4_FE */
+  {
+    { "incA",	{ VexGb, Eb }, NO_PREFIX },
+    { "decA",	{ VexGb, Eb }, NO_PREFIX },
+  },
+  /* REG_EVEX_MAP4_FF */
+  {
+    { "incQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
+    { "decQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 90c063b2188..a8a891d7f0e 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -875,64 +875,64 @@ static const struct dis386 evex_table[][256] = {
   /* EVEX_MAP4_ */
   {
     /* 00 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "addB",             { VexGb, Eb, Gb }, NO_PREFIX },
+    { "addS",             { VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "addB",             { VexGb, Gb, EbS }, NO_PREFIX },
+    { "addS",             { VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 08 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "orB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "orS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "orB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "orS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 10 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "adcB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "adcS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "adcB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "adcS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 18 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "sbbB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "sbbS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "sbbB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "sbbS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 20 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "andB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "andS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "andB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "andS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
+    { "shldS",		{ VexGv, Ev, Gv, Ib }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 28 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "subB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "subS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "subB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "subS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
+    { "shrdS",		{ VexGv, Ev, Gv, Ib }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 30 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "xorB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "xorS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "xorB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "xorS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -947,23 +947,23 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 40 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "%CFcmovoS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovnoS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovbS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovaeS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmoveS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovneS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovbeS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovaS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
     /* 48 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "%CFcmovsS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovnsS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovpS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovnpS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovlS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovgeS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovleS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovgS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
     /* 50 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1019,10 +1019,10 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 80 */
+    { REG_TABLE (REG_EVEX_MAP4_80) },
+    { REG_TABLE (REG_EVEX_MAP4_81) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_83) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1060,7 +1060,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { "shldS",	{ VexGv, Ev, Gv, CL }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     /* A8 */
@@ -1069,9 +1069,9 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
+    { "shrdS",	{ VexGv, Ev, Gv, CL }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "imulS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
     /* B0 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1091,8 +1091,8 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* C0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_C0) },
+    { REG_TABLE (REG_C1) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1109,10 +1109,10 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* D0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_D0) },
+    { REG_TABLE (REG_D1) },
+    { REG_TABLE (REG_D2) },
+    { REG_TABLE (REG_D3) },
     { "sha1rnds4",	{ XM, EXxmm, Ib }, NO_PREFIX },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1151,8 +1151,8 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_F6) },
+    { REG_TABLE (REG_EVEX_MAP4_F7) },
     /* F8 */
     { PREFIX_TABLE (PREFIX_EVEX_MAP4_F8) },
     { "movdiri",	{ Mdq, Gdq }, NO_PREFIX },
@@ -1160,8 +1160,8 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { PREFIX_TABLE (PREFIX_0F38FC) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_FE) },
+    { REG_TABLE (REG_EVEX_MAP4_FF) },
   },
   /* EVEX_MAP5_ */
   {
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 5b6f063d016..50274e39ba6 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -226,6 +226,9 @@ struct instr_info
   }
   vex;
 
+/* In APX EVEX-promoted instructions, EVEX.ND shares its bit with vex.b.  */
+#define nd b
+
   enum evex_type evex_type;
 
   /* Remember if the current op is a jump instruction.  */
@@ -580,6 +583,8 @@ fetch_error (const instr_info *ins)
 #define VexGatherD { OP_VEX, vex_vsib_d_w_dq_mode }
 #define VexGatherQ { OP_VEX, vex_vsib_q_w_dq_mode }
 #define VexGdq { OP_VEX, dq_mode }
+#define VexGb { OP_VEX, b_mode }
+#define VexGv { OP_VEX, v_mode }
 #define VexTmm { OP_VEX, tmm_mode }
 #define XMVexI4 { OP_REG_VexI4, x_mode }
 #define XMVexScalarI4 { OP_REG_VexI4, scalar_mode }
@@ -895,6 +900,13 @@ enum
   REG_EVEX_0F38C6_L_2,
   REG_EVEX_0F38C7_L_2,
   REG_EVEX_0F38F3_L_0_P_0,
+  REG_EVEX_MAP4_80,
+  REG_EVEX_MAP4_81,
+  REG_EVEX_MAP4_83,
+  REG_EVEX_MAP4_F6,
+  REG_EVEX_MAP4_F7,
+  REG_EVEX_MAP4_FE,
+  REG_EVEX_MAP4_FF,
 };
 
 enum
@@ -2606,25 +2618,25 @@ static const struct dis386 reg_table[][8] = {
   },
   /* REG_C0 */
   {
-    { "rolA",	{ Eb, Ib }, 0 },
-    { "rorA",	{ Eb, Ib }, 0 },
-    { "rclA",	{ Eb, Ib }, 0 },
-    { "rcrA",	{ Eb, Ib }, 0 },
-    { "shlA",	{ Eb, Ib }, 0 },
-    { "shrA",	{ Eb, Ib }, 0 },
-    { "shlA",	{ Eb, Ib }, 0 },
-    { "sarA",	{ Eb, Ib }, 0 },
+    { "rolA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "rorA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "rclA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "rcrA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "shrA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "sarA",	{ VexGb, Eb, Ib }, NO_PREFIX },
   },
   /* REG_C1 */
   {
-    { "rolQ",	{ Ev, Ib }, 0 },
-    { "rorQ",	{ Ev, Ib }, 0 },
-    { "rclQ",	{ Ev, Ib }, 0 },
-    { "rcrQ",	{ Ev, Ib }, 0 },
-    { "shlQ",	{ Ev, Ib }, 0 },
-    { "shrQ",	{ Ev, Ib }, 0 },
-    { "shlQ",	{ Ev, Ib }, 0 },
-    { "sarQ",	{ Ev, Ib }, 0 },
+    { "rolQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "rorQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "rclQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "rcrQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "shrQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "sarQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
   },
   /* REG_C6 */
   {
@@ -2650,47 +2662,47 @@ static const struct dis386 reg_table[][8] = {
   },
   /* REG_D0 */
   {
-    { "rolA",	{ Eb, I1 }, 0 },
-    { "rorA",	{ Eb, I1 }, 0 },
-    { "rclA",	{ Eb, I1 }, 0 },
-    { "rcrA",	{ Eb, I1 }, 0 },
-    { "shlA",	{ Eb, I1 }, 0 },
-    { "shrA",	{ Eb, I1 }, 0 },
-    { "shlA",	{ Eb, I1 }, 0 },
-    { "sarA",	{ Eb, I1 }, 0 },
+    { "rolA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "rorA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "rclA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "rcrA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "shrA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "sarA",	{ VexGb, Eb, I1 }, NO_PREFIX },
   },
   /* REG_D1 */
   {
-    { "rolQ",	{ Ev, I1 }, 0 },
-    { "rorQ",	{ Ev, I1 }, 0 },
-    { "rclQ",	{ Ev, I1 }, 0 },
-    { "rcrQ",	{ Ev, I1 }, 0 },
-    { "shlQ",	{ Ev, I1 }, 0 },
-    { "shrQ",	{ Ev, I1 }, 0 },
-    { "shlQ",	{ Ev, I1 }, 0 },
-    { "sarQ",	{ Ev, I1 }, 0 },
+    { "rolQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "rorQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "rclQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "rcrQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "shrQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "sarQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
   },
   /* REG_D2 */
   {
-    { "rolA",	{ Eb, CL }, 0 },
-    { "rorA",	{ Eb, CL }, 0 },
-    { "rclA",	{ Eb, CL }, 0 },
-    { "rcrA",	{ Eb, CL }, 0 },
-    { "shlA",	{ Eb, CL }, 0 },
-    { "shrA",	{ Eb, CL }, 0 },
-    { "shlA",	{ Eb, CL }, 0 },
-    { "sarA",	{ Eb, CL }, 0 },
+    { "rolA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "rorA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "rclA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "rcrA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "shrA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "sarA",	{ VexGb, Eb, CL }, NO_PREFIX },
   },
   /* REG_D3 */
   {
-    { "rolQ",	{ Ev, CL }, 0 },
-    { "rorQ",	{ Ev, CL }, 0 },
-    { "rclQ",	{ Ev, CL }, 0 },
-    { "rcrQ",	{ Ev, CL }, 0 },
-    { "shlQ",	{ Ev, CL }, 0 },
-    { "shrQ",	{ Ev, CL }, 0 },
-    { "shlQ",	{ Ev, CL }, 0 },
-    { "sarQ",	{ Ev, CL }, 0 },
+    { "rolQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "rorQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "rclQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "rcrQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "shrQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "sarQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
   },
   /* REG_F6 */
   {
@@ -3640,8 +3652,8 @@ static const struct dis386 prefix_table[][4] = {
   /* PREFIX_0F38F6 */
   {
     { "wrssK",	{ M, Gdq }, 0 },
-    { "adoxS",	{ Gdq, Edq}, 0 },
-    { "adcxS",	{ Gdq, Edq}, 0 },
+    { "adoxS",	{ VexGdq, Gdq, Edq}, 0 },
+    { "adcxS",	{ VexGdq, Gdq, Edq}, 0 },
     { Bad_Opcode },
   },
 
@@ -9127,6 +9139,12 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	  ins->rex2 &= ~REX_R;
 	}
 
+      /* For EVEX from legacy instructions, when the EVEX.ND bit is 0,
+	 all bits of EVEX.vvvv and EVEX.V' must be 1.  */
+      if (ins->evex_type == evex_from_legacy && !ins->vex.nd
+	  && (ins->vex.register_specifier || !ins->vex.v))
+	return &bad_opcode;
+
       ins->need_vex = 4;
 
       /* EVEX from legacy instructions require that EVEX.z, EVEX.L’L and the
@@ -9144,8 +9162,10 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       if (!fetch_modrm (ins))
 	return &err_opcode;
 
-      /* Set vector length.  */
-      if (ins->modrm.mod == 3 && ins->vex.b)
+      /* Set vector length.  For EVEX-promoted instructions, evex.ll == 0b00,
+	 which has the same encoding as vex.length == 128, so they can share
+	 the vex.length handling in OP_VEX.  */
+      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type != evex_from_legacy)
 	ins->vex.length = 512;
       else
 	{
@@ -9612,8 +9632,8 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 	    }
 
 	  /* Check whether rounding control was enabled for an insn not
-	     supporting it.  */
-	  if (ins.modrm.mod == 3 && ins.vex.b
+	     supporting it, when evex.b is not treated as evex.nd.  */
+	  if (ins.modrm.mod == 3 && ins.vex.b && ins.evex_type == evex_default
 	      && !(ins.evex_used & EVEX_b_used))
 	    {
 	      for (i = 0; i < MAX_OPERANDS; ++i)
@@ -10506,16 +10526,23 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
 	  ins->used_prefixes |= (ins->prefixes & PREFIX_ADDR);
 	  break;
 	case 'F':
-	  if (ins->intel_syntax)
-	    break;
-	  if ((ins->prefixes & PREFIX_ADDR) || (sizeflag & SUFFIX_ALWAYS))
+	  if (l == 0)
 	    {
-	      if (sizeflag & AFLAG)
-		*ins->obufp++ = ins->address_mode == mode_64bit ? 'q' : 'l';
-	      else
-		*ins->obufp++ = ins->address_mode == mode_64bit ? 'l' : 'w';
-	      ins->used_prefixes |= (ins->prefixes & PREFIX_ADDR);
+	      if (ins->intel_syntax)
+		break;
+	      if ((ins->prefixes & PREFIX_ADDR) || (sizeflag & SUFFIX_ALWAYS))
+		{
+		  if (sizeflag & AFLAG)
+		    *ins->obufp++ = ins->address_mode == mode_64bit ? 'q' : 'l';
+		  else
+		    *ins->obufp++ = ins->address_mode == mode_64bit ? 'l' : 'w';
+		  ins->used_prefixes |= (ins->prefixes & PREFIX_ADDR);
+		}
 	    }
+	  else if (l == 1 && last[0] == 'C')
+	    break;
+	  else
+	    abort ();
 	  break;
 	case 'G':
 	  if (ins->intel_syntax || (ins->obufp[-1] != 's'
@@ -11079,7 +11106,8 @@ print_displacement (instr_info *ins, bfd_signed_vma val)
 static void
 intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
 {
-  if (ins->vex.b)
+  /* Check if there is a broadcast, when evex.b is not treated as evex.nd.  */
+  if (ins->vex.b && ins->evex_type == evex_default)
     {
       if (!ins->vex.no_broadcast)
 	switch (bytemode)
@@ -11576,6 +11604,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
   add += (ins->rex2 & REX_B) ? 16 : 0;
 
+  /* Handles EVEX other than APX EVEX-promoted instructions.  */
   if (ins->vex.evex && ins->evex_type == evex_default)
     {
 
@@ -12011,7 +12040,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	  print_operand_value (ins, disp & 0xffff, dis_style_text);
 	}
     }
-  if (ins->vex.b)
+  if (ins->vex.b && ins->evex_type == evex_default)
     {
       ins->evex_used |= EVEX_b_used;
 
@@ -13377,6 +13406,13 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   if (!ins->need_vex)
     return true;
 
+  if (ins->evex_type == evex_from_legacy)
+    {
+      ins->evex_used |= EVEX_b_used;
+      if (!ins->vex.nd)
+	return true;
+    }
+
   reg = ins->vex.register_specifier;
   ins->vex.register_specifier = 0;
   if (ins->address_mode != mode_64bit)
@@ -13468,12 +13504,19 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  names = att_names_xmm;
 	  ins->evex_used |= EVEX_len_used;
 	  break;
+	case v_mode:
 	case dq_mode:
 	  if (ins->rex & REX_W)
 	    names = att_names64;
+	  else if (bytemode == v_mode
+		   && !(sizeflag & DFLAG))
+	    names = att_names16;
 	  else
 	    names = att_names32;
 	  break;
+	case b_mode:
+	  names = att_names8rex;
+	  break;
 	case mask_bd_mode:
 	case mask_mode:
 	  if (reg > 0x7)
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 68dbeca343d..5e4f4f97f52 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -638,8 +638,10 @@ enum
   Vex,
   /* How to encode VEX.vvvv:
      0: VEX.vvvv must be 1111b.
-     1: VEX.vvvv encodes one of the register operands.
+     1: VEX.vvvv encodes one of the src register operands.
+     2: VEX.vvvv encodes the dest register operand.
    */
+#define VexVVVV_DST   2
   VexVVVV,
   /* How the VEX.W bit is used:
      0: Set by the REX.W bit.
@@ -786,7 +788,7 @@ typedef struct i386_opcode_modifier
   unsigned int immext:1;
   unsigned int norex64:1;
   unsigned int vex:2;
-  unsigned int vexvvvv:1;
+  unsigned int vexvvvv:2;
   unsigned int vexw:2;
   unsigned int opcodeprefix:2;
   unsigned int sib:3;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 139ec8ce33a..8f4ce62c789 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -143,12 +143,16 @@
 #define Vsz256 Vsz=VSZ256
 #define Vsz512 Vsz=VSZ512
 
+#define DstVVVV VexVVVV=VexVVVV_DST
+
 // The template supports VEX format for cpuid and EVEX format for cpuid & apx_f.
 #define APX_F(cpuid) cpuid&(cpuid|APX_F)
 
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
+// Also re-use the bit to mark NDD insns where swapping the source operands
+// may allow switching from EVEX encoding to REX2 encoding.
 #define C StaticRounding
 
 #define FP 387|287|8087
@@ -295,26 +299,38 @@ std, 0xfd, 0, NoSuf, {}
 sti, 0xfb, 0, NoSuf, {}
 
 // Arithmetic.
+add, 0x0, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
 inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
+inc, 0xfe/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+sub, 0x83/5, APX_F, Modrm|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 sub, 0x2c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+sub, 0x80/5, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
 dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
+dec, 0xfe/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+sbb, 0x83/3, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 sbb, 0x1c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+sbb, 0x80/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+sbb, 0x80/3, APX_F, W|Modrm|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
 cmp, 0x38, 0, D|W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 cmp, 0x83/7, 0, Modrm|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
@@ -325,30 +341,45 @@ test, 0x84, 0, D|W|C|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, R
 test, 0xa8, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 test, 0xf6/0, 0, W|Modrm|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+and, 0x20, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 and, 0x20, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+and, 0x83/4, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 and, 0x83/4, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock|Optimize, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 and, 0x24, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+and, 0x80/4, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 and, 0x80/4, 0, W|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+or, 0x8, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 or, 0x8, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+or, 0x83/1, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 or, 0x83/1, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 or, 0xc, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+or, 0x80/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 or, 0x80/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+xor, 0x30, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+xor, 0x80/6, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
 // clr with 1 operand is really xor with 2 operands.
 clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
 
+adc, 0x10, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 adc, 0x10, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+adc, 0x83/2, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 adc, 0x83/2, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 adc, 0x14, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+adc, 0x80/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+neg, 0xf6/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+
+not, 0xf6/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
 aaa, 0x37, No64, NoSuf, {}
@@ -382,6 +413,7 @@ cqto, 0x99, x64, Size64|NoSuf, {}
 // These multiplies can only be selected with single operand forms.
 mul, 0xf6/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 imul, 0xf6/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+imul, 0xaf, APX_F, C|Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 imul, 0xfaf, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
@@ -396,52 +428,90 @@ div, 0xf6/6, 0, W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspe
 idiv, 0xf6/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 idiv, 0xf6/7, 0, W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
 
+rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+ror, 0xd0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+ror, 0xc0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 ror, 0xc0/1, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+ror, 0xd2/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 ror, 0xd2/1, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcl, 0xd0/2, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcl, 0xc0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcl, 0xc0/2, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcl, 0xc0/2, APX_F, W|Modrm|No_sSuf|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcl, 0xd2/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcl, 0xd2/2, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcl, 0xd2/2, APX_F, W|Modrm|No_sSuf|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 rcl, 0xd0/2, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcr, 0xd0/3, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcr, 0xc0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcr, 0xc0/3, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcr, 0xc0/3, APX_F, W|Modrm|No_sSuf|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcr, 0xd2/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcr, 0xd2/3, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcr, 0xd2/3, APX_F, W|Modrm|No_sSuf|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 rcr, 0xd0/3, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+sal, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+sal, 0xc0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sal, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+sal, 0xd2/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sal, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+shl, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+shl, 0xc0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shl, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+shl, 0xd2/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shl, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+shr, 0xd0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+shr, 0xc0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shr, 0xc0/5, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+shr, 0xd2/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shr, 0xd2/5, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+sar, 0xd0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+sar, 0xc0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sar, 0xc0/7, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+sar, 0xd2/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sar, 0xd2/7, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+shld, 0x24, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shld, 0xfa4, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+shld, 0xa5, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+shld, 0xa5, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
+shrd, 0x2c, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shrd, 0xfac, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+shrd, 0xad, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
+shrd, 0xad, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
 
 // Control transfer instructions.
@@ -943,6 +1013,7 @@ ud2b, 0xfb9, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|U
 // 3rd official undefined instr (older CPUs don't take a ModR/M byte)
 ud0, 0xfff, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
+cmov<cc>, 0x4<cc:opc>, CMOV&APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 cmov<cc>, 0xf4<cc:opc>, CMOV, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 fcmovb, 0xda/0, i687, Modrm|NoSuf, { FloatReg, FloatAcc }
@@ -2034,8 +2105,12 @@ xcryptofb, 0xf30fa7e8, PadLock, NoSuf|RepPrefixOk, {}
 xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
 
 // Multy-precision Add Carry, rdseed instructions.
+adcx, 0x6666, ADX&APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|DstVVVV|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adcx, 0x6666, ADX&APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adox, 0xf366, ADX&APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|DstVVVV|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adox, 0xf366, ADX&APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 rdseed, 0xfc7/7, RdSeed, Modrm|NoSuf, { Reg16|Reg32|Reg64 }
 
 // SMAP instructions.
-- 
2.25.1
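
Note on the get_valid_dis386 hunk above: the new check rejects an EVEX-promoted legacy encoding whose ND bit is clear unless vvvv and V' are all ones. Below is a minimal standalone sketch of that rule on the raw payload bytes. The helper name evex_legacy_vvvv_ok and its two-byte interface are hypothetical and not part of the patch. It assumes vvvv sits in bits 6:3 of the second payload byte, and that ND and V' sit in bits 4 and 3 of the third. The disassembler itself works on already-decoded fields (ins->vex.register_specifier, ins->vex.v, ins->vex.nd), not on raw bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only; it is not in the patch.
   P1 and P2 are the second and third EVEX payload bytes.  Assumed
   layout: vvvv in P1[6:3], ND in P2[4], V' in P2[3].  All of these
   bits are stored inverted, so "unused" reads as all ones.  */
static bool
evex_legacy_vvvv_ok (uint8_t p1, uint8_t p2)
{
  bool nd = (p2 & 0x10) != 0;
  unsigned int vvvv = (p1 >> 3) & 0xf;
  bool v_prime = (p2 & 0x08) != 0;

  /* With ND set, vvvv/V' name the new data destination, so any value
     is acceptable here.  */
  if (nd)
    return true;

  /* With ND clear there is no extra operand: every bit must be 1.  */
  return vvvv == 0xf && v_prime;
}
```

For example, (p1, p2) = (0x78, 0x08) passes, while (0x00, 0x08) and (0x78, 0x00) would be reported as bad opcodes under this rule.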



* [PATCH v4 6/9] Support APX Push2/Pop2
  2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
                   ` (4 preceding siblings ...)
  2023-12-19 12:12 ` [PATCH v4 5/9] Support APX NDD Cui, Lili
@ 2023-12-19 12:12 ` Cui, Lili
  2023-12-19 12:12 ` [PATCH v4 7/9] Support APX PUSHP/POPP Cui, Lili
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Cui, Lili @ 2023-12-19 12:12 UTC (permalink / raw)
  To: binutils; +Cc: hongjiu.lu, jbeulich, Mo, Zewei

From: "Mo, Zewei" <zewei.mo@intel.com>

PPX functionality for PUSH/POP is not implemented in this patch
and will be implemented separately.

gas/ChangeLog:

2023-12-19  Zewei Mo <zewei.mo@intel.com>
            H.J. Lu  <hongjiu.lu@intel.com>
            Lili Cui <lili.cui@intel.com>

	* config/tc-i386.c (enum i386_error): New unsupported_rsp_register
	and invalid_dest_register_set.
	(md_assemble): Add handler for unsupported_rsp_register and
	invalid_dest_register_set.
	(check_APX_operands): Add invalid check for push2/pop2.
	(match_template): Handle check_APX_operands.
	* testsuite/gas/i386/i386.exp: Add apx-push2pop2 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2.d: New test.
	* testsuite/gas/i386/x86-64-apx-push2pop2.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2-inval.s: Ditto.
	* testsuite/gas/i386/apx-push2pop2-inval.s: Ditto.
	* testsuite/gas/i386/apx-push2pop2-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Added bad
	testcases for POP2.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-reg.h: Add REG_EVEX_MAP4_8F.
	* i386-dis-evex-w.h: Add EVEX_W_MAP4_8F_R_0 and EVEX_W_MAP4_FF_R_6.
	* i386-dis-evex.h: Add REG_EVEX_MAP4_8F.
	* i386-dis.c (PUSH2_POP2_Fixup): Add special handling for PUSH2/POP2.
	(get_valid_dis386): Add handler for vector length and address_mode for
	APX-Push2/Pop2 insn.
	(nd): Define nd as b for EVEX-promoted instructions.
	(OP_VEX): Add handler of 64-bit vvvv register for APX-Push2/Pop2 insn.
	* i386-gen.c: Add Push2Pop2 bitfield.
	* i386-opc.h: Regenerated.
	* i386-opc.tbl: Regenerated.
---
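
Note: the operand rule enforced by check_APX_operands can be sketched in isolation as follows. This is a simplified illustration, not the code in the patch. The enum apx_operand_error, the function check_push2pop2 and its plain-integer register numbers are hypothetical stand-ins for i.error, the MN_* mnemonic switch and register_number (). The value 4 for RSP matches the comparison used in the patch.

```c
#include <stdbool.h>

/* Hypothetical result codes standing in for the new i386_error values
   invalid_dest_register_set and unsupported_rsp_register.  */
enum apx_operand_error
{
  APX_OK,
  APX_SAME_DEST,
  APX_RSP_USED
};

/* Encoding number of %rsp, as compared against in the patch.  */
#define RSP_REGNUM 4u

/* Sketch of the rule: POP2* must not name the same register twice,
   and neither PUSH2* nor POP2* may name RSP.  The distinctness test
   comes first, mirroring the fall-through order in the patch.  */
static enum apx_operand_error
check_push2pop2 (bool is_pop, unsigned int reg0, unsigned int reg1)
{
  if (is_pop && reg0 == reg1)
    return APX_SAME_DEST;

  if (reg0 == RSP_REGNUM || reg1 == RSP_REGNUM)
    return APX_RSP_USED;

  return APX_OK;
}
```

Under this sketch, pop2 with %rax twice yields APX_SAME_DEST, whereas push2 with %rax twice is accepted, since only the POP2 forms are checked for distinct registers.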
 gas/config/tc-i386.c                          | 44 +++++++++++++++++++
 gas/testsuite/gas/i386/apx-push2pop2-inval.l  |  5 +++
 gas/testsuite/gas/i386/apx-push2pop2-inval.s  |  9 ++++
 gas/testsuite/gas/i386/i386.exp               |  1 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |  5 +++
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |  7 +++
 .../gas/i386/x86-64-apx-push2pop2-intel.d     | 42 ++++++++++++++++++
 .../gas/i386/x86-64-apx-push2pop2-inval.l     | 13 ++++++
 .../gas/i386/x86-64-apx-push2pop2-inval.s     | 17 +++++++
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d | 42 ++++++++++++++++++
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s | 39 ++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |  3 ++
 opcodes/i386-dis-evex-reg.h                   |  9 ++++
 opcodes/i386-dis-evex-w.h                     | 10 +++++
 opcodes/i386-dis-evex.h                       |  2 +-
 opcodes/i386-dis.c                            | 31 +++++++++++++
 opcodes/i386-opc.tbl                          |  9 ++++
 17 files changed, 287 insertions(+), 1 deletion(-)
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 4a3dd5e96ca..4771373f4f8 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -251,6 +251,7 @@ enum i386_error
     invalid_vector_register_set,
     invalid_tmm_register_set,
     invalid_dest_and_src_register_set,
+    invalid_dest_register_set,
     invalid_pseudo_prefix,
     unsupported_vector_index_register,
     unsupported_broadcast,
@@ -260,6 +261,7 @@ enum i386_error
     no_default_mask,
     unsupported_rc_sae,
     unsupported_vector_size,
+    unsupported_rsp_register,
     internal_error,
   };
 
@@ -5423,6 +5425,9 @@ md_assemble (char *line)
 	case invalid_dest_and_src_register_set:
 	  err_msg = _("destination and source registers must be distinct");
 	  break;
+	case invalid_dest_register_set:
+	  err_msg = _("two dest registers must be distinct");
+	  break;
 	case invalid_pseudo_prefix:
 	  err_msg = _("rex2 pseudo prefix cannot be used here");
 	  break;
@@ -5451,6 +5456,9 @@ md_assemble (char *line)
 	  as_bad (_("vector size above %u required for `%s'"), 128u << vector_size,
 		  pass1_mnem ? pass1_mnem : insn_name (current_templates.start));
 	  return;
+	case unsupported_rsp_register:
+	  err_msg = _("'rsp' register cannot be used");
+	  break;
 	case internal_error:
 	  err_msg = _("internal error");
 	  break;
@@ -7170,6 +7178,35 @@ check_EgprOperands (const insn_template *t)
   return 0;
 }
 
+/* Check if APX operands are valid for the instruction.  */
+static bool
+check_APX_operands (const insn_template *t)
+{
+  /* Push2* and Pop2* cannot use RSP, and Pop2* cannot pop into the same
+     register twice.  */
+  switch (t->mnem_off)
+    {
+    case MN_pop2:
+    case MN_pop2p:
+      if (register_number (i.op[0].regs) == register_number (i.op[1].regs))
+	{
+	  i.error = invalid_dest_register_set;
+	  return 1;
+	}
+    /* fall through */
+    case MN_push2:
+    case MN_push2p:
+      if (register_number (i.op[0].regs) == 4
+	  || register_number (i.op[1].regs) == 4)
+	{
+	  i.error = unsupported_rsp_register;
+	  return 1;
+	}
+      break;
+    }
+  return 0;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7671,6 +7708,13 @@ match_template (char mnem_suffix)
 	  continue;
 	}
 
+      /* Check if APX operands are valid.  */
+      if (check_APX_operands (t))
+	{
+	  specific_error = progress (i.error);
+	  continue;
+	}
+
       /* Check whether to use the shorter VEX encoding for certain insns where
 	 the EVEX encoding comes first in the table.  This requires the respective
 	 AVX-* feature to be explicitly enabled.
diff --git a/gas/testsuite/gas/i386/apx-push2pop2-inval.l b/gas/testsuite/gas/i386/apx-push2pop2-inval.l
new file mode 100644
index 00000000000..a55a71520c8
--- /dev/null
+++ b/gas/testsuite/gas/i386/apx-push2pop2-inval.l
@@ -0,0 +1,5 @@
+.* Assembler messages:
+.*:6: Error: `push2' is only supported in 64-bit mode
+.*:7: Error: `push2p' is only supported in 64-bit mode
+.*:8: Error: `pop2' is only supported in 64-bit mode
+.*:9: Error: `pop2p' is only supported in 64-bit mode
diff --git a/gas/testsuite/gas/i386/apx-push2pop2-inval.s b/gas/testsuite/gas/i386/apx-push2pop2-inval.s
new file mode 100644
index 00000000000..77166327ed1
--- /dev/null
+++ b/gas/testsuite/gas/i386/apx-push2pop2-inval.s
@@ -0,0 +1,9 @@
+# Check 32bit APX-PUSH2/POP2 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	push2 %rax, %rbx
+	push2p %rax, %rbx
+	pop2 %rax, %rbx
+	pop2p %rax, %rbx
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 3917be6be70..f9ee85b4bb3 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -511,6 +511,7 @@ if [gas_32_check] then {
     run_dump_test "sm4-intel"
     run_list_test "pbndkb-inval"
     run_list_test "user_msr-inval"
+    run_list_test "apx-push2pop2-inval"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
index e0b14e30178..1e5ad254b5e 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
@@ -34,3 +34,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:[ 	]+62 f4 e4[ 	]+\(bad\)
 [ 	]*[a-f0-9]+:[ 	]+08 ff[ 	]+.*
 [ 	]*[a-f0-9]+:[ 	]+04 08[ 	]+.*
+[ 	]*[a-f0-9]+:[ 	]+62 f4 3c[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+08 8f c0 ff ff ff[ 	]+or.*
+[ 	]*[a-f0-9]+:[ 	]+62 74 7c 18 8f c0[ 	]+pop2   %rax,\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+62 d4 3c 18 8f[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c0[ 	]+.*
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
index 9a08a45eb76..5f85648d39c 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
@@ -34,3 +34,10 @@ _start:
 	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax ,(%rax,%rbx){1to8}, %rcx
 	#{evex} inc %rax %rbx EVEX.vvvv != 1111 && EVEX.ND = 0.
 	.insn EVEX.L0.NP.M4.W1 0xff/0, (%rax,%rcx), %rbx
+	# pop2 %rax, %r8 set EVEX.ND=0.
+	.insn EVEX.L0.M4.W0 0x8f/0,  %rax, %r8
+	.byte 0xff, 0xff, 0xff
+	# pop2 %rax, %r8 set EVEX.vvvv = 1111.
+	.insn EVEX.L0.M4.W0 0x8f,  %rax, {rn-sae},%r8
+	# pop2 %r8, %r8.
+	.insn EVEX.L0.M4.W0 0x8f/0,  %r8,{rn-sae}, %r8
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
new file mode 100644
index 00000000000..46b21219582
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
@@ -0,0 +1,42 @@
+#as: --64
+#objdump: -dw -Mintel
+#name: i386 APX-push2pop2 insns (Intel disassembly)
+#source: x86-64-apx-push2pop2.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+r31,r24
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+r31,r24
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+r31,r24
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+r31,r24
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
new file mode 100644
index 00000000000..2cd142885a1
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
@@ -0,0 +1,13 @@
+.* Assembler messages:
+.*:6: Error: operand size mismatch for `push2'
+.*:7: Error: operand size mismatch for `push2'
+.*:8: Error: 'rsp' register cannot be used for `push2'
+.*:9: Error: 'rsp' register cannot be used for `push2'
+.*:10: Error: operand size mismatch for `push2p'
+.*:11: Error: 'rsp' register cannot be used for `push2p'
+.*:12: Error: operand size mismatch for `pop2'
+.*:13: Error: 'rsp' register cannot be used for `pop2'
+.*:14: Error: 'rsp' register cannot be used for `pop2'
+.*:15: Error: two dest registers must be distinct for `pop2'
+.*:16: Error: 'rsp' register cannot be used for `pop2p'
+.*:17: Error: two dest registers must be distinct for `pop2p'
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
new file mode 100644
index 00000000000..83cef97d57e
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
@@ -0,0 +1,17 @@
+# Check illegal APX-Push2Pop2 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	push2  %ax, %bx
+	push2  %eax, %ebx
+	push2  %rsp, %r17
+	push2  %r17, %rsp
+	push2p %eax, %ebx
+	push2p %rsp, %r17
+	pop2   %ax, %bx
+	pop2   %rax, %rsp
+	pop2   %rsp, %rax
+	pop2   %r12, %r12
+	pop2p  %rax, %rsp
+	pop2p  %r12, %r12
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2.d b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
new file mode 100644
index 00000000000..54f22a7f94e
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
@@ -0,0 +1,42 @@
+#as: --64
+#objdump: -dw
+#name: x86_64 APX-push2pop2 insns
+#source: x86-64-apx-push2pop2.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+%r24,%r31
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+%r24,%r31
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+%r24,%r31
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+%r24,%r31
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
new file mode 100644
index 00000000000..5c28c13ba2e
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
@@ -0,0 +1,39 @@
+# Check 64bit APX-Push2Pop2 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	push2 %rbx, %rax
+	push2 %r17, %r8
+	push2 %r9, %r31
+	push2 %r31, %r24
+	push2p %rbx, %rax
+	push2p %r17, %r8
+	push2p %r9, %r31
+	push2p %r31, %r24
+	pop2 %rax, %rbx
+	pop2 %r8, %r17
+	pop2 %r31, %r9
+	pop2 %r24, %r31
+	pop2p %rax, %rbx
+	pop2p %r8, %r17
+	pop2p %r31, %r9
+	pop2p %r24, %r31
+
+	.intel_syntax noprefix
+	push2 rax, rbx
+	push2 r8, r17
+	push2 r31, r9
+	push2 r24, r31
+	push2p rax, rbx
+	push2p r8, r17
+	push2p r31, r9
+	push2p r24, r31
+	pop2 rbx, rax
+	pop2 r17, r8
+	pop2 r9, r31
+	pop2 r31, r24
+	pop2p rbx, rax
+	pop2p r17, r8
+	pop2p r9, r31
+	pop2p r31, r24
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 5e2e302b22a..2296ad4af7d 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -345,6 +345,9 @@ run_dump_test "x86-64-avx512dq-rcigrd-intel"
 run_dump_test "x86-64-avx512dq-rcigrd"
 run_dump_test "x86-64-avx512dq-rcigrne-intel"
 run_dump_test "x86-64-avx512dq-rcigrne"
+run_dump_test "x86-64-apx-push2pop2"
+run_dump_test "x86-64-apx-push2pop2-intel"
+run_list_test "x86-64-apx-push2pop2-inval"
 run_dump_test "x86-64-avx512dq-rcigru-intel"
 run_dump_test "x86-64-avx512dq-rcigru"
 run_dump_test "x86-64-avx512dq-rcigrz-intel"
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index 4dc736253b1..eeae0603721 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -86,6 +86,10 @@
     { "subQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
     { "xorQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
   },
+  /* REG_EVEX_MAP4_8F */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP4_8F_R_0) },
+  },
   /* REG_EVEX_MAP4_F6 */
   {
     { Bad_Opcode },
@@ -109,4 +113,9 @@
   {
     { "incQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
     { "decQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP4_FF_R_6) },
   },
diff --git a/opcodes/i386-dis-evex-w.h b/opcodes/i386-dis-evex-w.h
index b828277d413..12ab29544bb 100644
--- a/opcodes/i386-dis-evex-w.h
+++ b/opcodes/i386-dis-evex-w.h
@@ -442,6 +442,16 @@
     { Bad_Opcode },
     { "vpshrdw",   { XM, Vex, EXx, Ib }, 0 },
   },
+  /* EVEX_W_MAP4_8F_R_0 */
+  {
+    { "pop2", { { PUSH2_POP2_Fixup, q_mode}, Eq }, NO_PREFIX },
+    { "pop2p", { { PUSH2_POP2_Fixup, q_mode}, Eq }, NO_PREFIX },
+  },
+  /* EVEX_W_MAP4_FF_R_6 */
+  {
+    { "push2", { { PUSH2_POP2_Fixup, q_mode}, Eq }, 0 },
+    { "push2p", { { PUSH2_POP2_Fixup, q_mode}, Eq }, 0 },
+  },
   /* EVEX_W_MAP5_5B_P_0 */
   {
     { "vcvtdq2ph%XY",	{ XMxmmq, EXx, EXxEVexR }, 0 },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index a8a891d7f0e..4f2ec966457 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -1035,7 +1035,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_8F) },
     /* 90 */
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 50274e39ba6..cf02e34bcc8 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -105,6 +105,7 @@ static bool FXSAVE_Fixup (instr_info *, int, int);
 static bool MOVSXD_Fixup (instr_info *, int, int);
 static bool DistinctDest_Fixup (instr_info *, int, int);
 static bool PREFETCHI_Fixup (instr_info *, int, int);
+static bool PUSH2_POP2_Fixup (instr_info *, int, int);
 
 static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
 						enum disassembler_style,
@@ -903,6 +904,7 @@ enum
   REG_EVEX_MAP4_80,
   REG_EVEX_MAP4_81,
   REG_EVEX_MAP4_83,
+  REG_EVEX_MAP4_8F,
   REG_EVEX_MAP4_F6,
   REG_EVEX_MAP4_F7,
   REG_EVEX_MAP4_FE,
@@ -1746,6 +1748,9 @@ enum
   EVEX_W_0F3A70,
   EVEX_W_0F3A72,
 
+  EVEX_W_MAP4_8F_R_0,
+  EVEX_W_MAP4_FF_R_6,
+
   EVEX_W_MAP5_5B_P_0,
   EVEX_W_MAP5_7A_P_3,
 };
@@ -13517,6 +13522,9 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	case b_mode:
 	  names = att_names8rex;
 	  break;
+	case q_mode:
+	  names = att_names64;
+	  break;
 	case mask_bd_mode:
 	case mask_mode:
 	  if (reg > 0x7)
@@ -13901,3 +13909,26 @@ PREFETCHI_Fixup (instr_info *ins, int bytemode, int sizeflag)
 
   return OP_M (ins, bytemode, sizeflag);
 }
+
+static bool
+PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
+{
+  if (ins->modrm.mod != 3)
+    return true;
+
+  unsigned int vvvv_reg = ins->vex.register_specifier
+    | (!ins->vex.v << 4);
+  unsigned int rm_reg = ins->modrm.rm + (ins->rex & REX_B ? 8 : 0)
+    + (ins->rex2 & REX_B ? 16 : 0);
+
+  /* Push2/Pop2 cannot use RSP, and Pop2 cannot pop the same register twice.  */
+  if (!ins->vex.nd || vvvv_reg == 0x4 || rm_reg == 0x4
+      || (!ins->modrm.reg
+	  && vvvv_reg == rm_reg))
+    {
+      oappend (ins, "(bad)");
+      return true;
+    }
+
+  return OP_VEX (ins, bytemode, sizeflag);
+}
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 8f4ce62c789..4bb268c4bfb 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3482,3 +3482,12 @@ uwrmsr, 0xf30f38f8, USER_MSR, Modrm|NoSuf|NoRex64, { Reg64, Reg64 }
 uwrmsr, 0xf3f8/0, USER_MSR, Modrm|Vex128|VexMap7|VexW0|NoSuf, { Imm32, Reg64 }
 
 // USER_MSR instructions end.
+
+// APX Push2/Pop2 instructions.
+
+push2, 0xff/6, APX_F, Modrm|VexW0|EVex128|EVexMap4|VexVVVV|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+push2p, 0xff/6, APX_F, Modrm|VexW1|EVex128|EVexMap4|VexVVVV|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+pop2, 0x8f/0, APX_F, Modrm|VexW0|EVex128|EVexMap4|VexVVVV|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+pop2p, 0x8f/0, APX_F, Modrm|VexW1|EVex128|EVexMap4|VexVVVV|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+
+// APX Push2/Pop2 instructions end.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v4 7/9] Support APX PUSHP/POPP
  2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
                   ` (5 preceding siblings ...)
  2023-12-19 12:12 ` [PATCH v4 6/9] Support APX Push2/Pop2 Cui, Lili
@ 2023-12-19 12:12 ` Cui, Lili
  2023-12-19 12:12 ` [PATCH v4 8/9] Support APX NDD optimized encoding Cui, Lili
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Cui, Lili @ 2023-12-19 12:12 UTC (permalink / raw)
  To: binutils; +Cc: hongjiu.lu, jbeulich

gas/ChangeLog:

	* config/tc-i386.c (process_operands): Handle "PUSHP/POPP requires
	rex2.w == 1."
	* testsuite/gas/i386/x86-64.exp: Add new test for PUSHP/POPP.
	* testsuite/gas/i386/x86-64-apx-pushp-popp-intel.d: New test.
	* testsuite/gas/i386/x86-64-apx-pushp-popp-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-apx-pushp-popp-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-pushp-popp.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-pushp-popp.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (putop): Print pushp and popp.
	* i386-opc.tbl: Add new insns.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Regenerated.
	* i386-tbl.h: Regenerated.
---
 gas/config/tc-i386.c                          |  3 +-
 .../gas/i386/x86-64-apx-pushp-popp-intel.d    | 14 +++++
 .../gas/i386/x86-64-apx-pushp-popp-inval.l    |  5 ++
 .../gas/i386/x86-64-apx-pushp-popp-inval.s    |  7 +++
 .../gas/i386/x86-64-apx-pushp-popp.d          | 14 +++++
 .../gas/i386/x86-64-apx-pushp-popp.s          |  8 +++
 gas/testsuite/gas/i386/x86-64.exp             |  3 +
 opcodes/i386-dis.c                            | 55 ++++++++++++-------
 opcodes/i386-opc.h                            |  2 +
 opcodes/i386-opc.tbl                          |  3 +
 10 files changed, 94 insertions(+), 20 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-pushp-popp-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-pushp-popp-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-pushp-popp-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-pushp-popp.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-pushp-popp.s
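
Note for reviewers (illustration only, not part of the patch): the bytes
expected by the new tests, e.g. "d5 19 57" for "pushp %r31", follow from
the 2-byte REX2 prefix (0xd5 plus one payload byte) with REX2.W set.  The
following is a minimal sketch of that layout, assuming the REX2 payload
bit order M0 R4 X4 B4 W R3 X3 B3.  The helper name encode_pushp_popp is
hypothetical and does not exist in gas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: emit the 3 bytes of
   pushp/popp for 64-bit register number REG (0..31).  */
static size_t
encode_pushp_popp (unsigned int reg, int is_pop, uint8_t out[3])
{
  uint8_t payload = 0x08;	/* REX2.W = 1 selects pushp/popp.  */

  assert (reg < 32);
  if (reg & 0x08)
    payload |= 0x01;		/* B3: bit 3 of the register number.  */
  if (reg & 0x10)
    payload |= 0x10;		/* B4: bit 4 of the register number.  */

  out[0] = 0xd5;		/* REX2 prefix byte.  */
  out[1] = payload;
  out[2] = (uint8_t) ((is_pop ? 0x58 : 0x50) | (reg & 7));
  return 3;
}
```

With reg = 31 and is_pop = 0 this yields d5 19 57, and with reg = 0 and
is_pop = 1 it yields d5 08 58, matching the dump tests added below.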

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 4771373f4f8..d62f7d83b05 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -3901,7 +3901,8 @@ is_apx_evex_encoding (void)
 static INLINE bool
 is_apx_rex2_encoding (void)
 {
-  return i.rex2 || i.rex2_encoding;
+  return i.rex2 || i.rex2_encoding
+	|| i.tm.opcode_modifier.operandconstraint == REX2_REQUIRED;
 }
 
 static unsigned int
diff --git a/gas/testsuite/gas/i386/x86-64-apx-pushp-popp-intel.d b/gas/testsuite/gas/i386/x86-64-apx-pushp-popp-intel.d
new file mode 100644
index 00000000000..44e3e96a5df
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-pushp-popp-intel.d
@@ -0,0 +1,14 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 APX_F pushp popp insns (Intel disassembly)
+#source: x86-64-apx-pushp-popp.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 08 50[	 ]+pushp  rax
+\s*[a-f0-9]+:\s*d5 19 57[	 ]+pushp  r31
+\s*[a-f0-9]+:\s*d5 08 58[	 ]+popp   rax
+\s*[a-f0-9]+:\s*d5 19 5f[	 ]+popp   r31
diff --git a/gas/testsuite/gas/i386/x86-64-apx-pushp-popp-inval.l b/gas/testsuite/gas/i386/x86-64-apx-pushp-popp-inval.l
new file mode 100644
index 00000000000..c4d774b9673
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-pushp-popp-inval.l
@@ -0,0 +1,5 @@
+.* Assembler messages:
+.*:4: Error: operand size mismatch for `pushp'
+.*:5: Error: operand size mismatch for `popp'
+.*:6: Error: operand size mismatch for `pushp'
+.*:7: Error: operand size mismatch for `popp'
diff --git a/gas/testsuite/gas/i386/x86-64-apx-pushp-popp-inval.s b/gas/testsuite/gas/i386/x86-64-apx-pushp-popp-inval.s
new file mode 100644
index 00000000000..28ed5d8145a
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-pushp-popp-inval.s
@@ -0,0 +1,7 @@
+# Check that illegal operands of APX_F pushp/popp instructions are rejected.
+
+	.text
+	pushp %eax
+	popp  %eax
+	pushp (%rax)
+	popp  (%rax)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-pushp-popp.d b/gas/testsuite/gas/i386/x86-64-apx-pushp-popp.d
new file mode 100644
index 00000000000..b20e5ba9a35
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-pushp-popp.d
@@ -0,0 +1,14 @@
+#as:
+#objdump: -dw
+#name: x86_64 APX_F pushp popp insns
+#source: x86-64-apx-pushp-popp.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 08 50[ 	]+pushp  %rax
+\s*[a-f0-9]+:\s*d5 19 57[ 	]+pushp  %r31
+\s*[a-f0-9]+:\s*d5 08 58[ 	]+popp   %rax
+\s*[a-f0-9]+:\s*d5 19 5f[ 	]+popp   %r31
diff --git a/gas/testsuite/gas/i386/x86-64-apx-pushp-popp.s b/gas/testsuite/gas/i386/x86-64-apx-pushp-popp.s
new file mode 100644
index 00000000000..0ea66d0e70c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-pushp-popp.s
@@ -0,0 +1,8 @@
+# Check 64bit APX_F pushp popp instructions
+
+       .text
+ _start:
+	pushp %rax
+	pushp %r31
+	popp  %rax
+	popp  %r31
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 2296ad4af7d..112ab35fdb8 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -348,6 +348,9 @@ run_dump_test "x86-64-avx512dq-rcigrne"
 run_dump_test "x86-64-apx-push2pop2"
 run_dump_test "x86-64-apx-push2pop2-intel"
 run_list_test "x86-64-apx-push2pop2-inval"
+run_dump_test "x86-64-apx-pushp-popp"
+run_dump_test "x86-64-apx-pushp-popp-intel"
+run_list_test "x86-64-apx-pushp-popp-inval"
 run_dump_test "x86-64-avx512dq-rcigru-intel"
 run_dump_test "x86-64-avx512dq-rcigru"
 run_dump_test "x86-64-avx512dq-rcigrz-intel"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index cf02e34bcc8..b9fd010062a 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -303,6 +303,9 @@ struct dis_private {
 /* M0 in rex2 prefix represents map0 or map1.  */
 #define REX2_M 0x8
 
+/* {rex2} is not printed when the REX2_SPECIAL is set.  */
+#define REX2_SPECIAL 16
+
 /* Flags stored in PREFIXES.  */
 #define PREFIX_REPZ 1
 #define PREFIX_REPNZ 2
@@ -1931,23 +1934,23 @@ static const struct dis386 dis386[] = {
   { "dec{S|}",		{ RMeSI }, 0 },
   { "dec{S|}",		{ RMeDI }, 0 },
   /* 50 */
-  { "push{!P|}",		{ RMrAX }, 0 },
-  { "push{!P|}",		{ RMrCX }, 0 },
-  { "push{!P|}",		{ RMrDX }, 0 },
-  { "push{!P|}",		{ RMrBX }, 0 },
-  { "push{!P|}",		{ RMrSP }, 0 },
-  { "push{!P|}",		{ RMrBP }, 0 },
-  { "push{!P|}",		{ RMrSI }, 0 },
-  { "push{!P|}",		{ RMrDI }, 0 },
+  { "push!P",		{ RMrAX }, 0 },
+  { "push!P",		{ RMrCX }, 0 },
+  { "push!P",		{ RMrDX }, 0 },
+  { "push!P",		{ RMrBX }, 0 },
+  { "push!P",		{ RMrSP }, 0 },
+  { "push!P",		{ RMrBP }, 0 },
+  { "push!P",		{ RMrSI }, 0 },
+  { "push!P",		{ RMrDI }, 0 },
   /* 58 */
-  { "pop{!P|}",		{ RMrAX }, 0 },
-  { "pop{!P|}",		{ RMrCX }, 0 },
-  { "pop{!P|}",		{ RMrDX }, 0 },
-  { "pop{!P|}",		{ RMrBX }, 0 },
-  { "pop{!P|}",		{ RMrSP }, 0 },
-  { "pop{!P|}",		{ RMrBP }, 0 },
-  { "pop{!P|}",		{ RMrSI }, 0 },
-  { "pop{!P|}",		{ RMrDI }, 0 },
+  { "pop!P",		{ RMrAX }, 0 },
+  { "pop!P",		{ RMrCX }, 0 },
+  { "pop!P",		{ RMrDX }, 0 },
+  { "pop!P",		{ RMrBX }, 0 },
+  { "pop!P",		{ RMrSP }, 0 },
+  { "pop!P",		{ RMrBP }, 0 },
+  { "pop!P",		{ RMrSI }, 0 },
+  { "pop!P",		{ RMrDI }, 0 },
   /* 60 */
   { X86_64_TABLE (X86_64_60) },
   { X86_64_TABLE (X86_64_61) },
@@ -9790,9 +9793,10 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 
   /* Check if the REX2 prefix is used.  */
   if (ins.last_rex2_prefix >= 0
-      && ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
-      && (ins.rex ^ ins.rex_used) == 0
-      && (ins.rex2 & 0x7))
+      && ((ins.rex2 & REX2_SPECIAL)
+	  || (((ins.rex2 & 7) ^ (ins.rex2_used & 7)) == 0
+	      && (ins.rex ^ ins.rex_used) == 0
+	      && (ins.rex2 & 7))))
     ins.all_prefixes[ins.last_rex2_prefix] = 0;
 
   /* Check if the SEG prefix is used.  */
@@ -10639,6 +10643,19 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
 	case 'P':
 	  if (l == 0)
 	    {
+	      if (!cond && ins->last_rex2_prefix >= 0 && (ins->rex & REX_W))
+		{
+		  /* For pushp and popp, print the 'p' suffix and do not print
+		     {rex2}.  */
+		  *ins->obufp++ = 'p';
+		  ins->rex2 |= REX2_SPECIAL;
+		  break;
+		}
+
+	      /* For "!P" print nothing else in Intel syntax.  */
+	      if (!cond && ins->intel_syntax)
+		break;
+
 	      if ((ins->modrm.mod == 3 || !cond)
 		  && !(sizeflag & SUFFIX_ALWAYS))
 		break;
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 5e4f4f97f52..cd261718f59 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -579,6 +579,8 @@ enum
   /* Instrucion requires that destination must be distinct from source
      registers.  */
 #define DISTINCT_DEST 9
+  /* Instruction requires REX2 prefix.  */
+#define REX2_REQUIRED 10
   OperandConstraint,
   /* instruction ignores operand size prefix and in Intel mode ignores
      mnemonic size suffix check.  */
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 4bb268c4bfb..eb054c9cfc7 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -85,6 +85,7 @@
 #define RegKludge         OperandConstraint=REG_KLUDGE
 #define SwapSources       OperandConstraint=SWAP_SOURCES
 #define Ugh               OperandConstraint=UGH
+#define Rex2              OperandConstraint=REX2_REQUIRED
 
 #define ATTSyntax         Dialect=ATT_SYNTAX
 #define ATTMnemonic       Dialect=ATT_MNEMONIC
@@ -232,6 +233,7 @@ push, 0x68, i186&No64, DefaultSize|No_bSuf|No_sSuf|No_qSuf, { Imm16|Imm32 }
 push, 0x6, No64, DefaultSize|No_bSuf|No_sSuf|No_qSuf, { SReg }
 // In 64bit mode, the operand size is implicitly 64bit.
 push, 0x50, x64, No_bSuf|No_lSuf|No_sSuf|NoRex64, { Reg16|Reg64 }
+pushp, 0x50, APX_F, No_bSuf|No_wSuf|No_lSuf|No_sSuf|Rex2, { Reg64 }
 push, 0xff/6, x64, Modrm|DefaultSize|No_bSuf|No_lSuf|No_sSuf|NoRex64, { Reg16|Reg64|Unspecified|BaseIndex }
 push, 0x6a, x64, DefaultSize|No_bSuf|No_lSuf|No_sSuf|NoRex64, { Imm8S }
 push, 0x68, x64, DefaultSize|No_bSuf|No_lSuf|No_sSuf|NoRex64, { Imm16|Imm32S }
@@ -245,6 +247,7 @@ pop, 0x8f/0, No64, Modrm|DefaultSize|No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32|Unsp
 pop, 0x7, No64, DefaultSize|No_bSuf|No_sSuf|No_qSuf, { SReg }
 // In 64bit mode, the operand size is implicitly 64bit.
 pop, 0x58, x64, No_bSuf|No_lSuf|No_sSuf|NoRex64, { Reg16|Reg64 }
+popp, 0x58, APX_F, No_bSuf|No_wSuf|No_lSuf|No_sSuf|Rex2, { Reg64 }
 pop, 0x8f/0, x64, Modrm|DefaultSize|No_bSuf|No_lSuf|No_sSuf|NoRex64, { Reg16|Reg64|Unspecified|BaseIndex }
 pop, 0xfa1, x64, DefaultSize|No_bSuf|No_lSuf|No_sSuf|NoRex64, { SReg }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v4 8/9] Support APX NDD optimized encoding.
  2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
                   ` (6 preceding siblings ...)
  2023-12-19 12:12 ` [PATCH v4 7/9] Support APX PUSHP/POPP Cui, Lili
@ 2023-12-19 12:12 ` Cui, Lili
  2023-12-19 12:12 ` [PATCH v4 9/9] Support APX JMPABS for disassembler Cui, Lili
  2023-12-19 12:35 ` [PATCH v4 0/9] Support Intel APX EGPR Jan Beulich
  9 siblings, 0 replies; 34+ messages in thread
From: Cui, Lili @ 2023-12-19 12:12 UTC (permalink / raw)
  To: binutils; +Cc: hongjiu.lu, jbeulich, Hu, Lin1

From: "Hu, Lin1" <lin1.hu@intel.com>

This patch aims to optimize:

add %r16, %r15, %r15 -> add %r16, %r15

gas/ChangeLog:

	* config/tc-i386.c (check_Rex_required): New function.
	(can_convert_NDD_to_legacy): Ditto.
	(match_template): If an APX NDD insn can be optimized, rematch
	the template.
	* testsuite/gas/i386/x86-64.exp: Add test.
	* testsuite/gas/i386/x86-64-apx-ndd-optimize.d: New test.
	* testsuite/gas/i386/x86-64-apx-ndd-optimize.s: Ditto.
---
 gas/config/tc-i386.c                          | 107 ++++++++++++++
 .../gas/i386/x86-64-apx-ndd-optimize.d        | 132 ++++++++++++++++++
 .../gas/i386/x86-64-apx-ndd-optimize.s        | 125 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |   1 +
 4 files changed, 365 insertions(+)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
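
Note for reviewers (illustration only, not part of the patch): the
operand matching done by can_convert_NDD_to_legacy() can be modelled in
isolation.  The sketch below assumes that every operand is a register and
represents each one by its register number; the names ndd_match_dest and
NO_MATCH are hypothetical.  Operands are in AT&T order, so the last one
is the destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_MATCH (~0u)

/* Simplified model: return the index of the source operand that equals
   the destination, or NO_MATCH if the NDD form has to be kept.  */
static unsigned int
ndd_match_dest (const unsigned int *regs, unsigned int n_ops,
		bool commutative, int opt_level)
{
  assert (regs != NULL);
  if (n_ops < 2)
    return NO_MATCH;

  unsigned int dest = n_ops - 1;
  unsigned int src1 = n_ops - 2;
  unsigned int src2 = n_ops > 3 ? n_ops - 3 : 0;

  /* add %r16, %r8, %r8: the operand next to the destination matches.  */
  if (regs[src1] == regs[dest])
    return src1;

  /* add %r8, %r16, %r8: only usable when the sources may be swapped
     and the higher optimization level was requested.  */
  if (opt_level > 1 && commutative && regs[src2] == regs[dest])
    return src2;

  return NO_MATCH;
}
```

For {16, 8, 8} the model returns index 1.  For {8, 16, 8} it returns
index 0 only when the insn is commutative and opt_level is above 1; in
that case match_template() swaps the two sources before dropping the
destination operand.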

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index d62f7d83b05..88934b4e3e5 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -7208,6 +7208,56 @@ check_APX_operands (const insn_template *t)
   return 0;
 }
 
+/* Check if the instruction uses REX registers or the REX prefix.  */
+static bool
+check_Rex_required (void)
+{
+  for (unsigned int op = 0; op < i.operands; op++)
+    {
+      if (i.types[op].bitfield.class != Reg)
+	continue;
+
+      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
+	return true;
+    }
+
+  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
+      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
+    return true;
+
+  /* Check whether the {rex} pseudo prefix was specified.  */
+  return i.rex_encoding;
+}
+
+/* Optimize APX NDD insns to legacy insns.  */
+static unsigned int
+can_convert_NDD_to_legacy (const insn_template *t)
+{
+  unsigned int match_dest_op = ~0;
+
+  if (!i.tm.opcode_modifier.nf
+      && i.reg_operands >= 2)
+    {
+      unsigned int dest = i.operands - 1;
+      unsigned int src1 = i.operands - 2;
+      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+      if (i.types[src1].bitfield.class == Reg
+	  && i.op[src1].regs == i.op[dest].regs)
+	match_dest_op = src1;
+      /* If the first operand is the same as the third operand,
+	 the instruction must be commutative, so that swapping the
+	 first two operands does not change the semantics, in order
+	 to be optimized.  */
+      else if (optimize > 1
+	       && t->opcode_modifier.commutative
+	       && i.types[src2].bitfield.class == Reg
+	       && i.op[src2].regs == i.op[dest].regs)
+	match_dest_op = src2;
+    }
+  return match_dest_op;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7751,6 +7801,63 @@ match_template (char mnem_suffix)
 	  i.memshift = memshift;
 	}
 
+      /* If we can optimize an NDD insn to a legacy insn, like
+	 add %r16, %r8, %r8 -> add %r16, %r8,
+	 add  %r8, %r16, %r8 -> add %r16, %r8, then rematch the template.
+	 Note that the semantics have not been changed.  */
+      if (optimize
+	  && !i.no_optimize
+	  && i.vec_encoding != vex_encoding_evex
+	  && t + 1 < current_templates.end
+	  && !t[1].opcode_modifier.evex
+	  && t[1].opcode_space <= SPACE_0F38
+	  && t->opcode_modifier.vexvvvv == VexVVVV_DST
+	  && (i.types[i.operands - 1].bitfield.dword
+	      || i.types[i.operands - 1].bitfield.qword))
+	{
+	  unsigned int match_dest_op = can_convert_NDD_to_legacy (t);
+
+	  if (match_dest_op != (unsigned int) ~0)
+	    {
+	      size_match = true;
+	      /* Ensure that the next template has the same input
+		 operands as the original matching template by checking
+		 the first operand (ATT).  This guards against a new NDD
+		 insn being added in the wrong position.  */
+	      overlap0 = operand_type_and (i.types[0],
+					   t[1].operand_types[0]);
+	      if (t->opcode_modifier.d)
+		overlap1 = operand_type_and (i.types[0],
+					     t[1].operand_types[1]);
+	      if (!operand_type_match (overlap0, i.types[0])
+		  && (!t->opcode_modifier.d
+		      || !operand_type_match (overlap1, i.types[0])))
+		size_match = false;
+
+	      if (size_match
+		  && (t[1].opcode_space <= SPACE_0F
+		      /* Some non-legacy-map0/1 insns can be shorter when
+			 legacy-encoded and when no REX prefix is required.  */
+		      || (!check_EgprOperands (t + 1)
+			  && !check_Rex_required ()
+			  && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))
+		{
+		  unsigned int src1 = i.operands - 2;
+		  unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+		  if (i.operands > 2 && match_dest_op == i.operands - 3)
+		    swap_2_operands (match_dest_op, i.operands - 2);
+
+		  --i.operands;
+		  --i.reg_operands;
+
+		  specific_error = progress (internal_error);
+		  continue;
+		}
+
+	    }
+	}
+
       /* We've found a match; break out of loop.  */
       break;
     }
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
new file mode 100644
index 00000000000..48f0f1ceee3
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
@@ -0,0 +1,132 @@
+#as: -Os
+#objdump: -drw
+#name: x86-64 APX NDD optimized encoding
+#source: x86-64-apx-ndd-optimize.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 4d 01 f8          	add    %r31,%r8
+\s*[a-f0-9]+:\s*62 44 3c 18 00 f8    	add    %r31b,%r8b,%r8b
+\s*[a-f0-9]+:\s*d5 4d 01 f8          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 1d 03 c7          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 4d 03 38          	add    \(%r8\),%r31
+\s*[a-f0-9]+:\s*d5 1d 03 07          	add    \(%r31\),%r8
+\s*[a-f0-9]+:\s*49 81 c7 33 44 34 12 	add    \$0x12344433,%r15
+\s*[a-f0-9]+:\s*49 81 c0 11 22 33 f4 	add    \$0xfffffffff4332211,%r8
+\s*[a-f0-9]+:\s*d5 19 ff c7          	inc    %r31
+\s*[a-f0-9]+:\s*62 dc 04 10 fe c7    	inc    %r31b,%r31b
+\s*[a-f0-9]+:\s*d5 1c 29 f9          	sub    %r15,%r17
+\s*[a-f0-9]+:\s*62 7c 74 10 28 f9    	sub    %r15b,%r17b,%r17b
+\s*[a-f0-9]+:\s*62 54 84 18 29 38    	sub    %r15,\(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 2b 04 07       	sub    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 ee 34 12 00 00 	sub    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 18 ff c9          	dec    %r17
+\s*[a-f0-9]+:\s*62 fc 74 10 fe c9    	dec    %r17b,%r17b
+\s*[a-f0-9]+:\s*d5 1c 19 f9          	sbb    %r15,%r17
+\s*[a-f0-9]+:\s*62 7c 74 10 18 f9    	sbb    %r15b,%r17b,%r17b
+\s*[a-f0-9]+:\s*62 54 84 18 19 38    	sbb    %r15,\(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 1b 04 07       	sbb    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 de 34 12 00 00 	sbb    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 21 f9          	and    %r15,%r17
+\s*[a-f0-9]+:\s*62 7c 74 10 20 f9    	and    %r15b,%r17b,%r17b
+\s*[a-f0-9]+:\s*4d 23 38             	and    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 23 04 07       	and    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 11 81 e6 34 12 00 00 	and    \$0x1234,%r30d
+\s*[a-f0-9]+:\s*d5 1c 09 f9          	or     %r15,%r17
+\s*[a-f0-9]+:\s*62 7c 74 10 08 f9    	or     %r15b,%r17b,%r17b
+\s*[a-f0-9]+:\s*4d 0b 38             	or     \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 0b 04 07       	or     \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 ce 34 12 00 00 	or     \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 31 f9          	xor    %r15,%r17
+\s*[a-f0-9]+:\s*62 7c 74 10 30 f9    	xor    %r15b,%r17b,%r17b
+\s*[a-f0-9]+:\s*4d 33 38             	xor    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 33 04 07       	xor    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 f6 34 12 00 00 	xor    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 11 f9          	adc    %r15,%r17
+\s*[a-f0-9]+:\s*62 7c 74 10 10 f9    	adc    %r15b,%r17b,%r17b
+\s*[a-f0-9]+:\s*4d 13 38             	adc    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 13 04 07       	adc    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 d6 34 12 00 00 	adc    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 18 f7 d9          	neg    %r17
+\s*[a-f0-9]+:\s*62 fc 74 10 f6 d9    	neg    %r17b,%r17b
+\s*[a-f0-9]+:\s*d5 18 f7 d1          	not    %r17
+\s*[a-f0-9]+:\s*62 fc 74 10 f6 d1    	not    %r17b,%r17b
+\s*[a-f0-9]+:\s*67 0f af 90 09 09 09 00 	imul   0x90909\(%eax\),%edx
+\s*[a-f0-9]+:\s*d5 aa af 94 f8 09 09 00 00 	imul   0x909\(%rax,%r31,8\),%rdx
+\s*[a-f0-9]+:\s*48 0f af d0          	imul   %rax,%rdx
+\s*[a-f0-9]+:\s*d5 19 d1 c7          	rol    \$1,%r31
+\s*[a-f0-9]+:\s*62 dc 04 10 d0 c7    	rol    \$1,%r31b,%r31b
+\s*[a-f0-9]+:\s*49 c1 c4 02          	rol    \$0x2,%r12
+\s*[a-f0-9]+:\s*62 d4 1c 18 c0 c4 02 	rol    \$0x2,%r12b,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 cf          	ror    \$1,%r31
+\s*[a-f0-9]+:\s*62 dc 04 10 d0 cf    	ror    \$1,%r31b,%r31b
+\s*[a-f0-9]+:\s*49 c1 cc 02          	ror    \$0x2,%r12
+\s*[a-f0-9]+:\s*62 d4 1c 18 c0 cc 02 	ror    \$0x2,%r12b,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 d7          	rcl    \$1,%r31
+\s*[a-f0-9]+:\s*62 dc 04 10 d0 d7    	rcl    \$1,%r31b,%r31b
+\s*[a-f0-9]+:\s*49 c1 d4 02          	rcl    \$0x2,%r12
+\s*[a-f0-9]+:\s*62 d4 1c 18 c0 d4 02 	rcl    \$0x2,%r12b,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 df          	rcr    \$1,%r31
+\s*[a-f0-9]+:\s*62 dc 04 10 d0 df    	rcr    \$1,%r31b,%r31b
+\s*[a-f0-9]+:\s*49 c1 dc 02          	rcr    \$0x2,%r12
+\s*[a-f0-9]+:\s*62 d4 1c 18 c0 dc 02 	rcr    \$0x2,%r12b,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 e7          	shl    \$1,%r31
+\s*[a-f0-9]+:\s*62 dc 04 10 d0 e7    	shl    \$1,%r31b,%r31b
+\s*[a-f0-9]+:\s*49 c1 e4 02          	shl    \$0x2,%r12
+\s*[a-f0-9]+:\s*62 d4 1c 18 c0 e4 02 	shl    \$0x2,%r12b,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 e7          	shl    \$1,%r31
+\s*[a-f0-9]+:\s*62 dc 04 10 d0 e7    	shl    \$1,%r31b,%r31b
+\s*[a-f0-9]+:\s*49 c1 e4 02          	shl    \$0x2,%r12
+\s*[a-f0-9]+:\s*62 d4 1c 18 c0 e4 02 	shl    \$0x2,%r12b,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 ef          	shr    \$1,%r31
+\s*[a-f0-9]+:\s*62 dc 04 10 d0 ef    	shr    \$1,%r31b,%r31b
+\s*[a-f0-9]+:\s*49 c1 ec 02          	shr    \$0x2,%r12
+\s*[a-f0-9]+:\s*62 d4 1c 18 c0 ec 02 	shr    \$0x2,%r12b,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 ff          	sar    \$1,%r31
+\s*[a-f0-9]+:\s*62 dc 04 10 d0 ff    	sar    \$1,%r31b,%r31b
+\s*[a-f0-9]+:\s*49 c1 fc 02          	sar    \$0x2,%r12
+\s*[a-f0-9]+:\s*62 d4 1c 18 c0 fc 02 	sar    \$0x2,%r12b,%r12b
+\s*[a-f0-9]+:\s*62 74 9c 18 24 20 01 	shld   \$0x1,%r12,\(%rax\),%r12
+\s*[a-f0-9]+:\s*4d 0f a4 c4 02       	shld   \$0x2,%r8,%r12
+\s*[a-f0-9]+:\s*62 54 bc 18 24 c4 02 	shld   \$0x2,%r8,%r12,%r8
+\s*[a-f0-9]+:\s*62 74 b4 18 a5 08    	shld   %cl,%r9,\(%rax\),%r9
+\s*[a-f0-9]+:\s*d5 9c a5 e0          	shld   %cl,%r12,%r16
+\s*[a-f0-9]+:\s*62 7c 9c 18 a5 e0    	shld   %cl,%r12,%r16,%r12
+\s*[a-f0-9]+:\s*62 74 9c 18 2c 20 01 	shrd   \$0x1,%r12,\(%rax\),%r12
+\s*[a-f0-9]+:\s*4d 0f ac ec 01       	shrd   \$0x1,%r13,%r12
+\s*[a-f0-9]+:\s*62 54 94 18 2c ec 01 	shrd   \$0x1,%r13,%r12,%r13
+\s*[a-f0-9]+:\s*62 74 b4 18 ad 08    	shrd   %cl,%r9,\(%rax\),%r9
+\s*[a-f0-9]+:\s*d5 9c ad e0          	shrd   %cl,%r12,%r16
+\s*[a-f0-9]+:\s*62 7c 9c 18 ad e0    	shrd   %cl,%r12,%r16,%r12
+\s*[a-f0-9]+:\s*67 0f 40 90 90 90 90 90 	cmovo  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 41 90 90 90 90 90 	cmovno -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 42 90 90 90 90 90 	cmovb  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 43 90 90 90 90 90 	cmovae -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 44 90 90 90 90 90 	cmove  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 45 90 90 90 90 90 	cmovne -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 46 90 90 90 90 90 	cmovbe -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 47 90 90 90 90 90 	cmova  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 48 90 90 90 90 90 	cmovs  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 49 90 90 90 90 90 	cmovns -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4c 90 90 90 90 90 	cmovl  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4d 90 90 90 90 90 	cmovge -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4e 90 90 90 90 90 	cmovle -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4f 90 90 90 90 90 	cmovg  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*66 0f 38 f6 c3       	adcx   %ebx,%eax
+\s*[a-f0-9]+:\s*66 0f 38 f6 c3       	adcx   %ebx,%eax
+\s*[a-f0-9]+:\s*62 f4 fd 18 66 c3    	adcx   %rbx,%rax,%rax
+\s*[a-f0-9]+:\s*62 74 3d 18 66 c0    	adcx   %eax,%r8d,%r8d
+\s*[a-f0-9]+:\s*62 d4 7d 18 66 c7    	adcx   %r15d,%eax,%eax
+\s*[a-f0-9]+:\s*67 66 0f 38 f6 04 0a 	adcx   \(%edx,%ecx,1\),%eax
+\s*[a-f0-9]+:\s*f3 0f 38 f6 c3       	adox   %ebx,%eax
+\s*[a-f0-9]+:\s*f3 0f 38 f6 c3       	adox   %ebx,%eax
+\s*[a-f0-9]+:\s*62 f4 fe 18 66 c3    	adox   %rbx,%rax,%rax
+\s*[a-f0-9]+:\s*62 74 3e 18 66 c0    	adox   %eax,%r8d,%r8d
+\s*[a-f0-9]+:\s*62 d4 7e 18 66 c7    	adox   %r15d,%eax,%eax
+\s*[a-f0-9]+:\s*67 f3 0f 38 f6 04 0a 	adox   \(%edx,%ecx,1\),%eax
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
new file mode 100644
index 00000000000..6ffdf5a6390
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
@@ -0,0 +1,125 @@
+# Check 64bit APX NDD instructions with optimized encoding
+
+	.text
+_start:
+add    %r31,%r8,%r8
+addb   %r31b,%r8b,%r8b
+{store} add    %r31,%r8,%r8
+{load}  add    %r31,%r8,%r8
+add    %r31,(%r8),%r31
+add    (%r31),%r8,%r8
+add    $0x12344433,%r15,%r15
+add    $0xfffffffff4332211,%r8,%r8
+inc    %r31,%r31
+incb   %r31b,%r31b
+sub    %r15,%r17,%r17
+subb   %r15b,%r17b,%r17b
+sub    %r15,(%r8),%r15
+sub    (%r15,%rax,1),%r16,%r16
+sub    $0x1234,%r30,%r30
+dec    %r17,%r17
+decb   %r17b,%r17b
+sbb    %r15,%r17,%r17
+sbbb   %r15b,%r17b,%r17b
+sbb    %r15,(%r8),%r15
+sbb    (%r15,%rax,1),%r16,%r16
+sbb    $0x1234,%r30,%r30
+and    %r15,%r17,%r17
+andb   %r15b,%r17b,%r17b
+and    %r15,(%r8),%r15
+and    (%r15,%rax,1),%r16,%r16
+and    $0x1234,%r30,%r30
+or     %r15,%r17,%r17
+orb    %r15b,%r17b,%r17b
+or     %r15,(%r8),%r15
+or     (%r15,%rax,1),%r16,%r16
+or     $0x1234,%r30,%r30
+xor    %r15,%r17,%r17
+xorb   %r15b,%r17b,%r17b
+xor    %r15,(%r8),%r15
+xor    (%r15,%rax,1),%r16,%r16
+xor    $0x1234,%r30,%r30
+adc    %r15,%r17,%r17
+adcb   %r15b,%r17b,%r17b
+adc    %r15,(%r8),%r15
+adc    (%r15,%rax,1),%r16,%r16
+adc    $0x1234,%r30,%r30
+neg    %r17,%r17
+negb   %r17b,%r17b
+not    %r17,%r17
+notb   %r17b,%r17b
+imul   0x90909(%eax),%edx,%edx
+imul   0x909(%rax,%r31,8),%rdx,%rdx
+imul   %rdx,%rax,%rdx
+rol    $0x1,%r31,%r31
+rolb   $0x1,%r31b,%r31b
+rol    $0x2,%r12,%r12
+rolb   $0x2,%r12b,%r12b
+ror    $0x1,%r31,%r31
+rorb   $0x1,%r31b,%r31b
+ror    $0x2,%r12,%r12
+rorb   $0x2,%r12b,%r12b
+rcl    $0x1,%r31,%r31
+rclb   $0x1,%r31b,%r31b
+rcl    $0x2,%r12,%r12
+rclb   $0x2,%r12b,%r12b
+rcr    $0x1,%r31,%r31
+rcrb   $0x1,%r31b,%r31b
+rcr    $0x2,%r12,%r12
+rcrb   $0x2,%r12b,%r12b
+sal    $0x1,%r31,%r31
+salb   $0x1,%r31b,%r31b
+sal    $0x2,%r12,%r12
+salb   $0x2,%r12b,%r12b
+shl    $0x1,%r31,%r31
+shlb   $0x1,%r31b,%r31b
+shl    $0x2,%r12,%r12
+shlb   $0x2,%r12b,%r12b
+shr    $0x1,%r31,%r31
+shrb   $0x1,%r31b,%r31b
+shr    $0x2,%r12,%r12
+shrb   $0x2,%r12b,%r12b
+sar    $0x1,%r31,%r31
+sarb   $0x1,%r31b,%r31b
+sar    $0x2,%r12,%r12
+sarb   $0x2,%r12b,%r12b
+shld   $0x1,%r12,(%rax),%r12
+shld   $0x2,%r8,%r12,%r12
+shld   $0x2,%r8,%r12,%r8
+shld   %cl,%r9,(%rax),%r9
+shld   %cl,%r12,%r16,%r16
+shld   %cl,%r12,%r16,%r12
+shrd   $0x1,%r12,(%rax),%r12
+shrd   $0x1,%r13,%r12,%r12
+shrd   $0x1,%r13,%r12,%r13
+shrd   %cl,%r9,(%rax),%r9
+shrd   %cl,%r12,%r16,%r16
+shrd   %cl,%r12,%r16,%r12
+cmovo  0x90909090(%eax),%edx,%edx
+cmovno 0x90909090(%eax),%edx,%edx
+cmovb  0x90909090(%eax),%edx,%edx
+cmovae 0x90909090(%eax),%edx,%edx
+cmove  0x90909090(%eax),%edx,%edx
+cmovne 0x90909090(%eax),%edx,%edx
+cmovbe 0x90909090(%eax),%edx,%edx
+cmova  0x90909090(%eax),%edx,%edx
+cmovs  0x90909090(%eax),%edx,%edx
+cmovns 0x90909090(%eax),%edx,%edx
+cmovp  0x90909090(%eax),%edx,%edx
+cmovnp 0x90909090(%eax),%edx,%edx
+cmovl  0x90909090(%eax),%edx,%edx
+cmovge 0x90909090(%eax),%edx,%edx
+cmovle 0x90909090(%eax),%edx,%edx
+cmovg  0x90909090(%eax),%edx,%edx
+adcx   %ebx,%eax,%eax
+adcx   %eax,%ebx,%eax
+adcx   %rbx,%rax,%rax
+adcx   %eax,%r8d,%r8d
+adcx   %r15d,%eax,%eax
+adcx   (%edx,%ecx,1),%eax,%eax
+adox   %ebx,%eax,%eax
+adox   %eax,%ebx,%eax
+adox   %rbx,%rax,%rax
+adox   %eax,%r8d,%r8d
+adox   %r15d,%eax,%eax
+adox   (%edx,%ecx,1),%eax,%eax
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 112ab35fdb8..5d81f89bdef 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -561,6 +561,7 @@ run_dump_test "x86-64-optimize-6"
 run_list_test "x86-64-optimize-7a" "-I${srcdir}/$subdir -march=+noavx -al"
 run_dump_test "x86-64-optimize-7b"
 run_list_test "x86-64-optimize-8" "-I${srcdir}/$subdir -march=+noavx2 -al"
+run_dump_test "x86-64-apx-ndd-optimize"
 run_dump_test "x86-64-align-branch-1a"
 run_dump_test "x86-64-align-branch-1b"
 run_dump_test "x86-64-align-branch-1c"
-- 
2.25.1
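
The register-identity part of the folding exercised by x86-64-apx-ndd-optimize.s can be sketched as below. This is only an illustration, not the logic in tc-i386.c: the names ndd_fold, FOLD_DIRECT and FOLD_SWAPPED are hypothetical, operands are compared as plain AT&T strings, and the extra conditions the patch applies (template matching, operand size, EGPR/REX checks) are left out. In the expected output above the byte-sized forms stay EVEX-encoded, which this sketch does not model.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result codes for this sketch only.  */
enum fold { FOLD_NONE, FOLD_DIRECT, FOLD_SWAPPED };

/* AT&T operand order: src1, src2, dst.
   A three-operand NDD form can drop its destination when the destination
   repeats the second source.  If it repeats the first source instead, the
   two sources must be swapped first, which is only valid for commutative
   operations (add, and, or, xor, imul, ...).  */
static enum fold
ndd_fold (const char *src1, const char *src2, const char *dst,
	  bool commutative)
{
  if (strcmp (dst, src2) == 0)
    return FOLD_DIRECT;
  if (commutative && strcmp (dst, src1) == 0)
    return FOLD_SWAPPED;
  return FOLD_NONE;
}
```

For instance, "add %r31,(%r8),%r31" maps to FOLD_SWAPPED (the test expects "add (%r8),%r31"), while "sub %r15,(%r8),%r15" maps to FOLD_NONE and keeps its EVEX form.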


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v4 9/9] Support APX JMPABS for disassembler
  2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
                   ` (7 preceding siblings ...)
  2023-12-19 12:12 ` [PATCH v4 8/9] Support APX NDD optimized encoding Cui, Lili
@ 2023-12-19 12:12 ` Cui, Lili
  2023-12-19 12:35 ` [PATCH v4 0/9] Support Intel APX EGPR Jan Beulich
  9 siblings, 0 replies; 34+ messages in thread
From: Cui, Lili @ 2023-12-19 12:12 UTC (permalink / raw)
  To: binutils; +Cc: hongjiu.lu, jbeulich, Hu, Lin1

From: "Hu, Lin1" <lin1.hu@intel.com>

gas/ChangeLog:

	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (JMPABS_Fixup): New Fixup function to disassemble jmpabs.
	(print_insn): Add #UD exception for jmpabs.
	(dis386): Modify the a1 entry to support jmpabs.
	* i386-mnem.h: Regenerated.
	* i386-opc.tbl: New insns.
	* i386-tbl.h: Regenerated.
---
 .../gas/i386/x86-64-apx-jmpabs-intel.d        | 12 ++++++
 .../gas/i386/x86-64-apx-jmpabs-inval.d        | 40 +++++++++++++++++++
 .../gas/i386/x86-64-apx-jmpabs-inval.s        | 15 +++++++
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d    | 12 ++++++
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s    |  5 +++
 gas/testsuite/gas/i386/x86-64.exp             |  3 ++
 opcodes/i386-dis.c                            | 37 ++++++++++++++++-
 7 files changed, 122 insertions(+), 2 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s

diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
new file mode 100644
index 00000000000..2b87f95532f
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
@@ -0,0 +1,12 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 APX_F JMPABS insns (Intel disassembly)
+#source: x86-64-apx-jmpabs.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 00 a1 02 00 00 00 00 00 00 00[	 ]+jmpabs 0x2
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
new file mode 100644
index 00000000000..86f313f0873
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
@@ -0,0 +1,40 @@
+#as: --64
+#objdump: -dw
+#name: illegal decoding of APX_F jmpabs insns
+#source: x86-64-apx-jmpabs-inval.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <.text>:
+\s*[a-f0-9]+:	66 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	67 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	f2 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	f3 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	f0 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	d5 08 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
new file mode 100644
index 00000000000..de4440a5466
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
@@ -0,0 +1,15 @@
+# Check bytecode of APX_F jmpabs instructions with illegal encodings.
+
+	.text
+# With 66 prefix
+	.byte 0x66,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With 67 prefix
+	.byte 0x67,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With F2 prefix
+	.byte 0xf2,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With F3 prefix
+	.byte 0xf3,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With LOCK prefix
+	.byte 0xf0,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# REX2.M0 = 0 REX2.W = 1
+	.byte 0xd5,0x08,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs.d b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
new file mode 100644
index 00000000000..e95b54f5dab
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
@@ -0,0 +1,12 @@
+#as:
+#objdump: -dw
+#name: x86_64 APX_F JMPABS insns
+#source: x86-64-apx-jmpabs.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 00 a1 02 00 00 00 00 00 00 00[	 ]+jmpabs \$0x2
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
new file mode 100644
index 00000000000..69ffb763260
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
@@ -0,0 +1,5 @@
+# Check 64bit APX_F JMPABS instructions
+
+	.text
+ _start:
+	.byte 0xd5,0x00,0xa1,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 5d81f89bdef..dbef4efb83c 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -377,6 +377,9 @@ run_dump_test "x86-64-apx-evex-promoted"
 run_dump_test "x86-64-apx-evex-promoted-intel"
 run_dump_test "x86-64-apx-evex-egpr"
 run_dump_test "x86-64-apx-ndd"
+run_dump_test "x86-64-apx-jmpabs"
+run_dump_test "x86-64-apx-jmpabs-intel"
+run_dump_test "x86-64-apx-jmpabs-inval"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index b9fd010062a..abbe0724616 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -106,6 +106,7 @@ static bool MOVSXD_Fixup (instr_info *, int, int);
 static bool DistinctDest_Fixup (instr_info *, int, int);
 static bool PREFETCHI_Fixup (instr_info *, int, int);
 static bool PUSH2_POP2_Fixup (instr_info *, int, int);
+static bool JMPABS_Fixup (instr_info *, int, int);
 
 static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
 						enum disassembler_style,
@@ -2025,7 +2026,7 @@ static const struct dis386 dis386[] = {
   { "lahf",		{ XX }, 0 },
   /* a0 */
   { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
-  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ { JMPABS_Fixup, eAX_reg }, { JMPABS_Fixup, v_mode } }, PREFIX_REX2_ILLEGAL },
   { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
   { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
   { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
@@ -9706,7 +9707,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
     }
 
   if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
-      && ins.last_rex2_prefix >= 0)
+      && ins.last_rex2_prefix >= 0 && (ins.rex2 & REX2_SPECIAL) == 0)
     {
       i386_dis_printf (info, dis_style_text, "(bad)");
       ret = ins.end_codep - priv.the_buffer;
@@ -13949,3 +13950,35 @@ PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
 
   return OP_VEX (ins, bytemode, sizeflag);
 }
+
+static bool
+JMPABS_Fixup (instr_info *ins, int bytemode, int sizeflag)
+{
+  if (ins->last_rex2_prefix >= 0)
+    {
+      uint64_t op;
+
+      if ((ins->prefixes & (PREFIX_OPCODE | PREFIX_ADDR | PREFIX_LOCK)) != 0x0
+	  || (ins->rex & REX_W) != 0x0)
+	{
+	  oappend (ins, "(bad)");
+	  return true;
+	}
+
+      if (bytemode == eAX_reg)
+	return true;
+
+      if (!get64 (ins, &op))
+	return false;
+
+      ins->mnemonicendp = stpcpy (ins->obuf, "jmpabs");
+      ins->rex2 |= REX2_SPECIAL;
+      oappend_immediate (ins, op);
+
+      return true;
+    }
+
+  if (bytemode == eAX_reg)
+    return OP_IMREG (ins, bytemode, sizeflag);
+  return OP_OFF64 (ins, bytemode, sizeflag);
+}
-- 
2.25.1
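
The byte layout that the jmpabs tests rely on can be sketched as a small standalone recognizer. This is a simplified illustration and not the code in i386-dis.c: decode_jmpabs and the macro names are hypothetical, and the bit positions of REX2.M0 (0x80) and REX2.W (0x08) in the payload byte are taken from the APX description as I understand it. The other payload bits are ignored here, just as the patch only checks REX.W.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REX2_PREFIX   0xd5u	/* first byte of a REX2 prefix */
#define REX2_M0       0x80u	/* payload bit selecting legacy map 1 */
#define REX2_W        0x08u	/* payload bit for 64-bit operand size */
#define JMPABS_OPCODE 0xa1u

/* Return true and store the 64-bit absolute target if BUF starts with a
   well-formed JMPABS; return false otherwise.  */
static bool
decode_jmpabs (const uint8_t *buf, size_t len, uint64_t *target)
{
  assert (buf != NULL && target != NULL);

  /* REX2 prefix + payload + opcode + 8-byte immediate = 11 bytes.
     Requiring 0xd5 first also rejects a leading 66/67/f2/f3/f0.  */
  if (len < 11 || buf[0] != REX2_PREFIX)
    return false;
  if ((buf[1] & (REX2_M0 | REX2_W)) != 0 || buf[2] != JMPABS_OPCODE)
    return false;

  /* The immediate is stored little-endian.  */
  uint64_t value = 0;
  for (int i = 7; i >= 0; i--)
    value = (value << 8) | buf[3 + i];
  *target = value;
  return true;
}
```

Fed the bytes from x86-64-apx-jmpabs.s this yields a target of 0x2; the sequences from x86-64-apx-jmpabs-inval.s are all rejected.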


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 0/9] Support Intel APX EGPR
  2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
                   ` (8 preceding siblings ...)
  2023-12-19 12:12 ` [PATCH v4 9/9] Support APX JMPABS for disassembler Cui, Lili
@ 2023-12-19 12:35 ` Jan Beulich
  2023-12-20  8:50   ` Cui, Lili
  9 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-12-19 12:35 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, binutils

On 19.12.2023 13:12, Cui, Lili wrote:
> *** BLURB HERE ***
> Future optimizations to be made.
> 1. The current implementation of vexvvvvv needs to be optimized.
> 2. The handling of double VEX/EVEX templates in check_register() needs to be optimized.

I hope this is just stale here, and the dependency on templates was now
removed again from check_register().

Jan

> 3. Convert vround* with egpr to VRNDSCALE* instead of reporting an error.
> 4. Find a suitable variable to replace OperandConstraint=REX2_REQUIRED.
> 
> Cui, Lili (5):
>   Support APX GPR32 with rex2 prefix
>   Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
>   Support APX GPR32 with extend evex prefix
>   Add tests for APX GPR32 with extend evex prefix
>   Support APX PUSHP/POPP
> 
> Hu, Lin1 (2):
>   Support APX NDD optimized encoding.
>   Support APX JMPABS for disassembler
> 
> Mo, Zewei (1):
>   Support APX Push2/Pop2
> 
> konglin1 (1):
>   Support APX NDD


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v4 0/9] Support Intel APX EGPR
  2023-12-19 12:35 ` [PATCH v4 0/9] Support Intel APX EGPR Jan Beulich
@ 2023-12-20  8:50   ` Cui, Lili
  2023-12-20  8:57     ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Cui, Lili @ 2023-12-20  8:50 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 19.12.2023 13:12, Cui, Lili wrote:
> > *** BLURB HERE ***
> > Future optimizations to be made.
> > 1. The current implementation of vexvvvvv needs to be optimized.
> > 2. The handling of double VEX/EVEX templates in check_register() needs to
> be optimized.
> 
> I hope this is just stale here, and the dependency on templates was now
> removed again from check_register().
> 
> Jan
> 

In fact, I didn't remove it in V4 because I couldn't find a better place to deal with it. Would you agree with the implementation below?

/* For dual VEX/EVEX templates, evex encoding is required when the input has
   egpr.*/
static INLINE void
vex_with_Egpr_requires_evex_encoding (const insn_template *t)
{
  for (unsigned int op = 0; op < i.operands; op++)
    {
      if (i.types[op].bitfield.class != Reg)
        continue;

      if (i.op[op].regs->reg_flags & RegRex2)
        i.vec_encoding = vex_encoding_evex;
    }

  if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
      || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
    i.vec_encoding = vex_encoding_evex;
}

static INLINE void
install_template (const insn_template *t)
{
  unsigned int l;

  i.tm = *t;

  /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
  if (t->opcode_modifier.vex && t->opcode_modifier.evex)
    {
      vex_with_Egpr_requires_evex_encoding (t);


Regards,
Lili.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 0/9] Support Intel APX EGPR
  2023-12-20  8:50   ` Cui, Lili
@ 2023-12-20  8:57     ` Jan Beulich
  2023-12-20 10:42       ` Cui, Lili
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-12-20  8:57 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 20.12.2023 09:50, Cui, Lili wrote:
>> On 19.12.2023 13:12, Cui, Lili wrote:
>>> *** BLURB HERE ***
>>> Future optimizations to be made.
>>> 1. The current implementation of vexvvvvv needs to be optimized.
>>> 2. The handling of double VEX/EVEX templates in check_register() needs to
>> be optimized.
>>
>> I hope this is just stale here, and the dependency on templates was now
>> removed again from check_register().
> 
> In fact, I didn't remove it in V4, I didn't find a better place to deal with it. I don't know if you agree with this implementation below.

I'm afraid I don't, both because it still isn't clear to me what's wrong
with my alternative proposal, and also for the formal reason of ...

> /* For dual VEX/EVEX templates, evex encoding is required when the input has
>    egpr.*/
> static INLINE void
> vex_with_Egpr_requires_evex_encoding (const insn_template *t)
> {
>   for (unsigned int op = 0; op < i.operands; op++)
>     {
>       if (i.types[op].bitfield.class != Reg)
>         continue;
> 
>       if (i.op[op].regs->reg_flags & RegRex2)
>         i.vec_encoding = vex_encoding_evex;

... it not being okay to override i.vec_encoding like this, when it
may already have been set to another value.

Jan

>     }
> 
>   if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
>       || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
>     i.vec_encoding = vex_encoding_evex;
> }
> 
> static INLINE void
> install_template (const insn_template *t)
> {
>   unsigned int l;
> 
>   i.tm = *t;
> 
>   /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>     {
>       vex_with_Egpr_requires_evex_encoding (t);
> 
> 
> Regards,
> Lili.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v4 0/9] Support Intel APX EGPR
  2023-12-20  8:57     ` Jan Beulich
@ 2023-12-20 10:42       ` Cui, Lili
  2023-12-20 11:00         ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Cui, Lili @ 2023-12-20 10:42 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, December 20, 2023 4:57 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH v4 0/9] Support Intel APX EGPR
> 
> On 20.12.2023 09:50, Cui, Lili wrote:
> >> On 19.12.2023 13:12, Cui, Lili wrote:
> >>> *** BLURB HERE ***
> >>> Future optimizations to be made.
> >>> 1. The current implementation of vexvvvvv needs to be optimized.
> >>> 2. The handling of double VEX/EVEX templates in check_register()
> >>> needs to
> >> be optimized.
> >>
> >> I hope this is just stale here, and the dependency on templates was
> >> now removed again from check_register().
> >
> > In fact, I didn't remove it in V4, I didn't find a better place to deal with it. I
> don't know if you agree with this implementation below.
> 
> I'm afraid I don't, both because it still isn't clear to me what's wrong with my
> alternative proposal, and also for the formal reason of ...
> 

For the alternative proposal, do you mean adding a new variable to avoid introducing new loops over all operands? How about this? Or do you want to add another variable and handle it in check_register()?

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -464,6 +464,9 @@ struct _i386_insn
     /* Have NOTRACK prefix.  */
     const char *notrack_prefix;

+    /* Has Egpr.  */
+    bool has_egpr;
+
     /* Error message.  */
     enum i386_error error;
   };
@@ -3683,7 +3686,7 @@ install_template (const insn_template *t)
           || maybe_cpu (t, CpuFMA))
          && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
        {
-         if (need_evex_encoding ())
+         if (need_evex_encoding () || i.has_egpr)
            {
              i.tm.opcode_modifier.vex = 0;
              i.tm.cpu.bitfield.cpuavx512f = i.tm.cpu_any.bitfield.cpuavx512f;
@@ -3704,7 +3707,7 @@ install_template (const insn_template *t)
       if (APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
          || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
          || APX_F(CpuBMI2))
-       if (need_evex_encoding ())
+       if (need_evex_encoding () || i.has_egpr)
          i.tm.opcode_modifier.vex = 0;
        else
          i.tm.opcode_modifier.evex = 0;
@@ -14523,11 +14526,12 @@ static bool check_register (const reg_entry *r)
          || flag_code != CODE_64BIT)
        return false;

+      i.has_egpr = true;


Lili.

> > /* For dual VEX/EVEX templates, evex encoding is required when the input
> has
> >    egpr.*/
> > static INLINE void
> > vex_with_Egpr_requires_evex_encoding (const insn_template *t) {
> >   for (unsigned int op = 0; op < i.operands; op++)
> >     {
> >       if (i.types[op].bitfield.class != Reg)
> >         continue;
> >
> >       if (i.op[op].regs->reg_flags & RegRex2)
> >         i.vec_encoding = vex_encoding_evex;
> 
> ... it not being okay to override i.vec_encoding like this, when it may already
> have been set to another value.
> 
> Jan
> 
> >     }
> >
> >   if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> >       || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> >     i.vec_encoding = vex_encoding_evex; }
> >
> > static INLINE void
> > install_template (const insn_template *t) {
> >   unsigned int l;
> >
> >   i.tm = *t;
> >
> >   /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
> >   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >     {
> >       vex_with_Egpr_requires_evex_encoding (t);
> >
> >
> > Regards,
> > Lili.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 0/9] Support Intel APX EGPR
  2023-12-20 10:42       ` Cui, Lili
@ 2023-12-20 11:00         ` Jan Beulich
  2023-12-20 11:50           ` Cui, Lili
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-12-20 11:00 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 20.12.2023 11:42, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, December 20, 2023 4:57 PM
>>
>> On 20.12.2023 09:50, Cui, Lili wrote:
>>>> On 19.12.2023 13:12, Cui, Lili wrote:
>>>>> *** BLURB HERE ***
>>>>> Future optimizations to be made.
>>>>> 1. The current implementation of vexvvvvv needs to be optimized.
>>>>> 2. The handling of double VEX/EVEX templates in check_register()
>>>>> needs to
>>>> be optimized.
>>>>
>>>> I hope this is just stale here, and the dependency on templates was
>>>> now removed again from check_register().
>>>
>>> In fact, I didn't remove it in V4, I didn't find a better place to deal with it. I
>> don't know if you agree with this implementation below.
>>
>> I'm afraid I don't, both because it still isn't clear to me what's wrong with my
>> alternative proposal, and also for the formal reason of ...
>>
> 
> For the alternative proposal, do you mean adding a new variable to avoid introducing new loops over all operands? How about this ? or do you want to add other variable and handle it in check_register?

No, the alternative proposal continues to be to introduce a new
enumerator to record in i.vec_encoding (vex_encoding_egpr is what
iirc I had suggested before, despite the naming anomaly). What you
outline below would, however, still be better than adding another
loop (as you had it earlier), imo.
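
A rough sketch of the idea, for illustration only: the enumerator and function names here are hypothetical stand-ins rather than the ones in tc-i386.c, and the transition rules are an assumption about how such an enumerator might behave, not something settled in this thread. The point is that seeing an eGPR is recorded as its own state and never overwrites an encoding that was already requested.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the values kept in i.vec_encoding.  */
enum encoding_request
{
  ENC_DEFAULT,	/* nothing requested yet */
  ENC_VEX,	/* explicitly requested VEX */
  ENC_EVEX,	/* explicitly requested EVEX */
  ENC_EGPR,	/* an extended GPR was seen, no explicit request */
  ENC_ERROR	/* conflicting requests */
};

/* Called when a register operand turns out to be an eGPR.  An earlier
   explicit request is never silently overwritten.  */
static enum encoding_request
note_egpr (enum encoding_request current)
{
  switch (current)
    {
    case ENC_DEFAULT:
      return ENC_EGPR;
    case ENC_VEX:
      return ENC_ERROR;	/* assumed: eGPRs cannot be VEX-encoded */
    default:
      return current;	/* EVEX, EGPR and ERROR stay as they are */
    }
}

/* For a dual VEX/EVEX template, decide whether to keep the EVEX variant.  */
static bool
dual_template_wants_evex (enum encoding_request req)
{
  return req == ENC_EVEX || req == ENC_EGPR;
}
```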

> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -464,6 +464,9 @@ struct _i386_insn
>      /* Have NOTRACK prefix.  */
>      const char *notrack_prefix;
> 
> +    /* Has Egpr.  */
> +    bool has_egpr;
> +
>      /* Error message.  */
>      enum i386_error error;
>    };

As a general remark, when you add new fields to a struct, please
try to find a slot that ideally is using existing padding _and_ is
next to related fields, or at least one of the two.

Jan
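
The remark about padding can be illustrated with two hypothetical structs (they do not mirror struct _i386_insn; the field names are made up for this sketch). The exact sizes depend on the ABI, so the comments give typical LP64 numbers only.

```c
#include <stdbool.h>
#include <stddef.h>

/* New flag appended at the end: on a typical LP64 ABI it costs a whole
   extra 8-byte slot (1 byte of data plus 7 bytes of tail padding).  */
struct appended
{
  bool existing_flag;	/* followed by 3 bytes of padding */
  int error;
  const char *prefix;
  bool added;		/* new field */
};

/* New flag placed next to the related flag: it occupies one of the
   padding bytes that were already there, so the struct does not grow.  */
struct in_padding
{
  bool existing_flag;
  bool added;		/* new field, reuses existing padding */
  int error;
  const char *prefix;
};

/* Number of bytes saved by the second layout on the current ABI.  */
static size_t
bytes_saved (void)
{
  return sizeof (struct appended) - sizeof (struct in_padding);
}
```

On common 64-bit targets bytes_saved () would return 8; on an ABI without alignment padding it would be 0.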

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v4 0/9] Support Intel APX EGPR
  2023-12-20 11:00         ` Jan Beulich
@ 2023-12-20 11:50           ` Cui, Lili
  2023-12-20 12:01             ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Cui, Lili @ 2023-12-20 11:50 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> >>>> On 19.12.2023 13:12, Cui, Lili wrote:
> >>>>> *** BLURB HERE ***
> >>>>> Future optimizations to be made.
> >>>>> 1. The current implementation of vexvvvvv needs to be optimized.
> >>>>> 2. The handling of double VEX/EVEX templates in check_register()
> >>>>> needs to be optimized.
> >>>>
> >>>> I hope this is just stale here, and the dependency on templates was
> >>>> now removed again from check_register().
> >>>
> >>> In fact, I didn't remove it in V4; I didn't find a better place to
> >>> deal with it. I don't know if you agree with this implementation below.
> >>
> >> I'm afraid I don't, both because it still isn't clear to me what's
> >> wrong with my alternative proposal, and also for the formal reason of ...
> >>
> >
> > For the alternative proposal, do you mean adding a new variable to avoid
> > introducing new loops over all operands? How about this? Or do you want to
> > add another variable and handle it in check_register()?
> 
> No, the alternative proposal continues to be to introduce a new enumerator
> to record in i.vec_encoding (vex_encoding_egpr is what iirc I had suggested
> before, despite the naming anomaly). What you outline below would,
> however, still be better than adding another loop (as you had it earlier), imo.
> 

I guessed you wanted to add a new type like vex_encoding_egpr, but I don't know how to do it differently from before. When an instruction supports legacy, VEX and EVEX encodings, and we put the VEX and EVEX templates in front of the legacy templates (in i386-opc.tbl), we'll assign vex_encoding_egpr for the legacy input, and it will have the same problem as before. We would also need to handle it in check_register(). Maybe you hinted at some other way of handling it, but I didn't get it.


     if (current_templates.start->opcode_modifier.vex
        && current_templates.start->opcode_modifier.evex)
      i.vec_encoding = vex_encoding_egpr;


> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -464,6 +464,9 @@ struct _i386_insn
> >      /* Have NOTRACK prefix.  */
> >      const char *notrack_prefix;
> >
> > +    /* Has Egpr.  */
> > +    bool has_egpr;
> > +
> >      /* Error message.  */
> >      enum i386_error error;
> >    };
> 
> As a general remark, when you add new fields to a struct, please try to find a
> slot that ideally is using existing padding _and_ is next to related fields, or at
> least one of the two.
> 

Moved to

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -438,6 +438,9 @@ struct _i386_insn
     /* Prefer the REX2 prefix in encoding.  */
     bool rex2_encoding;

+    /* Has Egpr.  */
+    bool has_egpr;
+
     /* Disable instruction size optimization.  */
     bool no_optimize;

Lili.

* Re: [PATCH v4 0/9] Support Intel APX EGPR
  2023-12-20 11:50           ` Cui, Lili
@ 2023-12-20 12:01             ` Jan Beulich
  2023-12-20 12:16               ` Cui, Lili
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-12-20 12:01 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 20.12.2023 12:50, Cui, Lili wrote:
>>>>>> On 19.12.2023 13:12, Cui, Lili wrote:
>>>>>>> *** BLURB HERE ***
>>>>>>> Future optimizations to be made.
>>>>>>> 1. The current implementation of vexvvvvv needs to be optimized.
>>>>>>> 2. The handling of double VEX/EVEX templates in check_register()
>>>>>>> needs to be optimized.
>>>>>>
>>>>>> I hope this is just stale here, and the dependency on templates was
>>>>>> now removed again from check_register().
>>>>>
>>>>> In fact, I didn't remove it in V4; I didn't find a better place to
>>>>> deal with it. I don't know if you agree with this implementation below.
>>>>
>>>> I'm afraid I don't, both because it still isn't clear to me what's
>>>> wrong with my alternative proposal, and also for the formal reason of ...
>>>>
>>>
>>> For the alternative proposal, do you mean adding a new variable to avoid
>>> introducing new loops over all operands? How about this? Or do you want to
>>> add another variable and handle it in check_register()?
>>
>> No, the alternative proposal continues to be to introduce a new enumerator
>> to record in i.vec_encoding (vex_encoding_egpr is what iirc I had suggested
>> before, despite the naming anomaly). What you outline below would,
>> however, still be better than adding another loop (as you had it earlier), imo.
>>
> 
> I guessed you wanted to add a new type like vex_encoding_egpr, but I don't know how to do it differently from before. When an instruction supports legacy, VEX and EVEX encodings, and we put the VEX and EVEX templates in front of the legacy templates (in i386-opc.tbl), we'll assign vex_encoding_egpr for the legacy input, and it will have the same problem as before. We would also need to handle it in check_register(). Maybe you hinted at some other way of handling it, but I didn't get it.
> 
> 
>      if (current_templates.start->opcode_modifier.vex
>         && current_templates.start->opcode_modifier.evex)
>       i.vec_encoding = vex_encoding_egpr;

Since setting of the new encoding type has to happen in check_register(),
using current_templates (as said several times before) is not an option.

Anyway, in the interest of forward progress, feel free to go with ...

>>> --- a/gas/config/tc-i386.c
>>> +++ b/gas/config/tc-i386.c
>>> @@ -464,6 +464,9 @@ struct _i386_insn
>>>      /* Have NOTRACK prefix.  */
>>>      const char *notrack_prefix;
>>>
>>> +    /* Has Egpr.  */
>>> +    bool has_egpr;
>>> +
>>>      /* Error message.  */
>>>      enum i386_error error;
>>>    };
>>
>> As a general remark, when you add new fields to a struct, please try to find a
>> slot that ideally is using existing padding _and_ is next to related fields, or at
>> least one of the two.
>>
> 
> Moved to
> 
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -438,6 +438,9 @@ struct _i386_insn
>      /* Prefer the REX2 prefix in encoding.  */
>      bool rex2_encoding;
> 
> +    /* Has Egpr.  */
> +    bool has_egpr;

... this approach then, and subsequently I'll see if I can re-arrange things
(and if I'm bothered enough to do so). The comment is pretty unhelpful as is,
how about "Need to use an eGPR capable encoding (REX2 or EVEX)" or some such?

Jan

* RE: [PATCH v4 0/9] Support Intel APX EGPR
  2023-12-20 12:01             ` Jan Beulich
@ 2023-12-20 12:16               ` Cui, Lili
  0 siblings, 0 replies; 34+ messages in thread
From: Cui, Lili @ 2023-12-20 12:16 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 20.12.2023 12:50, Cui, Lili wrote:
> >>>>>> On 19.12.2023 13:12, Cui, Lili wrote:
> >>>>>>> *** BLURB HERE ***
> >>>>>>> Future optimizations to be made.
> >>>>>>> 1. The current implementation of vexvvvvv needs to be optimized.
> >>>>>>> 2. The handling of double VEX/EVEX templates in check_register()
> >>>>>>> needs to be optimized.
> >>>>>>
> >>>>>> I hope this is just stale here, and the dependency on templates
> >>>>>> was now removed again from check_register().
> >>>>>
> >>>>> In fact, I didn't remove it in V4; I didn't find a better place to
> >>>>> deal with it. I don't know if you agree with this implementation below.
> >>>>
> >>>> I'm afraid I don't, both because it still isn't clear to me what's
> >>>> wrong with my alternative proposal, and also for the formal reason of ...
> >>>>
> >>>
> >>> For the alternative proposal, do you mean adding a new variable to
> >>> avoid introducing new loops over all operands? How about this? Or do
> >>> you want to add another variable and handle it in check_register()?
> >>
> >> No, the alternative proposal continues to be to introduce a new
> >> enumerator to record in i.vec_encoding (vex_encoding_egpr is what
> >> iirc I had suggested before, despite the naming anomaly). What you
> >> outline below would, however, still be better than adding another
> >> loop (as you had it earlier), imo.
> >>
> >
> > I guessed you wanted to add a new type like vex_encoding_egpr, but I
> > don't know how to do it differently from before. When an instruction
> > supports legacy, VEX and EVEX encodings, and we put the VEX and EVEX
> > templates in front of the legacy templates (in i386-opc.tbl), we'll
> > assign vex_encoding_egpr for the legacy input, and it will have the
> > same problem as before. We would also need to handle it in
> > check_register(). Maybe you hinted at some other way of handling it,
> > but I didn't get it.
> >
> >
> >      if (current_templates.start->opcode_modifier.vex
> >         && current_templates.start->opcode_modifier.evex)
> >       i.vec_encoding = vex_encoding_egpr;
> 
> Since setting of the new encoding type has to happen in check_register(),
> using current_templates (as said several times before) is not an option.
> 
> Anyway, in the interest of forward progress, feel free to go with ...
> 
> >>> --- a/gas/config/tc-i386.c
> >>> +++ b/gas/config/tc-i386.c
> >>> @@ -464,6 +464,9 @@ struct _i386_insn
> >>>      /* Have NOTRACK prefix.  */
> >>>      const char *notrack_prefix;
> >>>
> >>> +    /* Has Egpr.  */
> >>> +    bool has_egpr;
> >>> +
> >>>      /* Error message.  */
> >>>      enum i386_error error;
> >>>    };
> >>
> >> As a general remark, when you add new fields to a struct, please try
> >> to find a slot that ideally is using existing padding _and_ is next
> >> to related fields, or at least one of the two.
> >>
> >
> > Moved to
> >
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -438,6 +438,9 @@ struct _i386_insn
> >      /* Prefer the REX2 prefix in encoding.  */
> >      bool rex2_encoding;
> >
> > +    /* Has Egpr.  */
> > +    bool has_egpr;
> 
> ... this approach then, and subsequently I'll see if I can re-arrange things (and
> if I'm bothered enough to do so). The comment is pretty unhelpful as is, how
> about "Need to use an eGPR capable encoding (REX2 or EVEX)" or some such?
> 

Great, thanks!

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -438,6 +438,9 @@ struct _i386_insn
     /* Prefer the REX2 prefix in encoding.  */
     bool rex2_encoding;

+    /* Need to use an Egpr capable encoding (REX2 or EVEX).  */
+    bool has_egpr;
+

Lili.

* Re: [PATCH v4 1/9] Support APX GPR32 with rex2 prefix
  2023-12-19 12:12 ` [PATCH v4 1/9] Support APX GPR32 with rex2 prefix Cui, Lili
@ 2023-12-22 13:08   ` Jan Beulich
  2023-12-25  6:14     ` Cui, Lili
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-12-22 13:08 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, binutils

On 19.12.2023 13:12, Cui, Lili wrote:
> @@ -5283,6 +5321,9 @@ md_assemble (char *line)
>  	case unsupported_syntax:
>  	  err_msg = _("unsupported syntax");
>  	  break;
> +	case unsupported_EGPR_for_addressing:
> +	  err_msg = _("extended GPR cannot be used as base/index");
> +	  break;

While this one's now suitable for the as_bad() below the switch, ...

> @@ -5336,6 +5377,9 @@ md_assemble (char *line)
>  	case invalid_dest_and_src_register_set:
>  	  err_msg = _("destination and source registers must be distinct");
>  	  break;
> +	case invalid_pseudo_prefix:
> +	  err_msg = _("rex2 pseudo prefix cannot be used here");
> +	  break;

... this one still doesn't really fit the "... for `<insn>'" there. At
least the "here" needs dropping.

> @@ -5630,11 +5681,12 @@ md_assemble (char *line)
>  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
>        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
>  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && i.rex != 0))
> +	  && (i.rex != 0 || i.rex2 != 0)))
>      {
>        int x;
>  
> -      i.rex |= REX_OPCODE;
> +      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
> +	i.rex |= REX_OPCODE;
>        for (x = 0; x < 2; x++)
>  	{
>  	  /* Look for 8 bit operand that uses old registers.  */
> @@ -5645,7 +5697,7 @@ md_assemble (char *line)
>  	      /* In case it is "hi" register, give up.  */
>  	      if (i.op[x].regs->reg_num > 3)
>  		as_bad (_("can't encode register '%s%s' in an "
> -			  "instruction requiring REX prefix."),
> +			  "instruction requiring REX/REX2 prefix."),
>  			register_prefix, i.op[x].regs->reg_name);
>  
>  	      /* Otherwise it is equivalent to the extended register.
> @@ -5657,11 +5709,11 @@ md_assemble (char *line)
>  	}
>      }
>  
> -  if (i.rex == 0 && i.rex_encoding)
> +  if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding || i.rex2_encoding))
>      {
>        /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
>  	 that uses legacy register.  If it is "hi" register, don't add
> -	 the REX_OPCODE byte.  */
> +	 rex and rex2 prefix.  */
>        int x;
>        for (x = 0; x < 2; x++)
>  	if (i.types[x].bitfield.class == Reg
> @@ -5671,6 +5723,7 @@ md_assemble (char *line)
>  	  {
>  	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
>  	    i.rex_encoding = false;
> +	    i.rex2_encoding = false;
>  	    break;
>  	  }
>  
> @@ -5678,7 +5731,13 @@ md_assemble (char *line)
>  	i.rex = REX_OPCODE;
>      }
>  
> -  if (i.rex != 0)
> +  if (is_apx_rex2_encoding ())
> +    {
> +      build_rex2_prefix ();
> +      /* The individual REX.RXBW bits got consumed.  */
> +      i.rex &= REX_OPCODE;
> +    }
> +  else if (i.rex != 0)
>      add_prefix (REX_OPCODE | i.rex);
>  
>    insert_lfence_before (last_insn);

All of this will need re-basing over "x86: properly respect rex/{rex}",
with the result (I hope) that .insn will then also be covered REX2-wise.

> @@ -5752,6 +5811,20 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
>  	    goto too_long;
>  	  *mnem_p = '\0';
>  
> +	  /* Point l at the closing brace if there's no other separator.  */
> +	  if (*l != END_OF_INSN && !is_space_char (*l)
> +	      && *l != PREFIX_SEPARATOR)
> +	    --l;
> +	}
> +      /* Skip the immediate 0x** of {rex2 0x00} prefix.  */
> +      else if (*mnemonic == '{'&& is_space_char (*l))

Nit: Missing blank.

> +	{
> +	  while ( *l != '}')

Nit: Stray blank.

> +	    ++l;

What if there's no '}' on the line?

> +	  *mnem_p++ = *l++;
> +	  if (mnem_p >= mnemonic + MAX_MNEM_SIZE)
> +	    goto too_long;
> +	  *mnem_p = '\0';

You skip everything, not just 0xNN. You also skip stuff after other pseudo
prefixes, if I'm not mistaken. Both are too lax. However, skipping
isn't an option here anyway. Either we actually respect what the user has
written, or we error out. Skipping therefore is an option only if the
provided expression (not just plain number!) evaluates to 0. I don't
understand anyway why this code was added: When I asked about the specific
plans, H.J. clearly said the form with a constant would be a disassembler-
only thing for now.
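
For the missing-'}' case, a bounded scan could look roughly like the sketch
below. find_closing_brace() is a hypothetical helper, not something that
exists in gas, and END_OF_INSN is redefined locally on the assumption that it
is the NUL terminator. The sketch covers only the termination problem; it
does not evaluate the expression between the braces.

```c
#include <assert.h>
#include <stddef.h>

#define END_OF_INSN '\0'   /* assumed to match the gas definition */

/* Hypothetical helper: scan forward from L for the '}' closing a
   pseudo prefix.  Returns a pointer to the brace, or NULL when the
   line ends first, so the caller can report an error instead of
   running past the end of the buffer.  */
static const char *
find_closing_brace (const char *l)
{
  assert (l != NULL);
  while (*l != END_OF_INSN && *l != '}')
    ++l;
  return *l == '}' ? l : NULL;
}
```

A caller would treat a NULL result as a syntax error rather than continuing
to parse.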

> @@ -7005,6 +7082,43 @@ VEX_check_encoding (const insn_template *t)
>    return 0;
>  }
>  
> +/* Check if Egprs operands are valid for the instruction.  */
> +
> +static int
> +check_EgprOperands (const insn_template *t)

Hmm, I thought I had asked before to make functions with boolean return
values have a return type of bool, and then use "true" for success. An
alternative would be to return the error indicator, rather than putting
it in i.error here.

Then again I realize this is in line with VEX_check_encoding() and
check_VecOperands() (which I think would better be changed, but anyway).
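
The two alternatives mentioned here can be sketched side by side. Everything
below is hypothetical (the enum, the struct and both function names); it only
contrasts a bool-returning check with one that returns the error indicator
itself instead of storing it in shared state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error codes and operand summary.  */
enum err_sketch { ERR_NONE, ERR_EGPR_FOR_ADDRESSING };

struct operands_sketch
{
  bool template_forbids_egpr;
  bool base_or_index_is_egpr;
};

/* Variant 1: boolean result, true meaning the operands are fine.  */
static bool
egpr_operands_ok (const struct operands_sketch *o)
{
  assert (o != NULL);
  return !(o->template_forbids_egpr && o->base_or_index_is_egpr);
}

/* Variant 2: hand the error indicator back to the caller, which can
   then pass it on without an intermediate store.  */
static enum err_sketch
egpr_operands_error (const struct operands_sketch *o)
{
  return egpr_operands_ok (o) ? ERR_NONE : ERR_EGPR_FOR_ADDRESSING;
}
```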

> @@ -7149,6 +7263,14 @@ match_template (char mnem_suffix)
>  	      continue;
>  	    }
>  
> +	  /* Check if pseudo prefix {rex2} is valid.  */
> +	  if (t->opcode_modifier.noegpr && i.rex2_encoding)
> +	    {
> +	      i.error = invalid_pseudo_prefix;

What is this needed for? I.e. why can't you pass the value ...

> +	      specific_error = progress (i.error);

... in here directly (as is done elsewhere as well)?

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
> @@ -0,0 +1,86 @@
> +# Check 64bit instructions with rex2 prefix encoding
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +         test	$0x7, %r24b
> +         test	$0x7, %r24d
> +         test	$0x7, %r24
> +         test	$0x7, %r24w
> +## REX2.M bit
> +         imull	%eax, %r15d
> +         imull	%eax, %r16d
> +         punpckldq (%r18), %mm2
> +## REX2.R4 bit
> +         leal	(%rax), %r16d
> +         leal	(%rax), %r17d
> +         leal	(%rax), %r18d
> +         leal	(%rax), %r19d
> +         leal	(%rax), %r20d
> +         leal	(%rax), %r21d
> +         leal	(%rax), %r22d
> +         leal	(%rax), %r23d
> +         leal	(%rax), %r24d
> +         leal	(%rax), %r25d
> +         leal	(%rax), %r26d
> +         leal	(%rax), %r27d
> +         leal	(%rax), %r28d
> +         leal	(%rax), %r29d
> +         leal	(%rax), %r30d
> +         leal	(%rax), %r31d
> +## REX2.X4 bit
> +         leal	(,%r16), %eax
> +         leal	(,%r17), %eax
> +         leal	(,%r18), %eax
> +         leal	(,%r19), %eax
> +         leal	(,%r20), %eax
> +         leal	(,%r21), %eax
> +         leal	(,%r22), %eax
> +         leal	(,%r23), %eax
> +         leal	(,%r24), %eax
> +         leal	(,%r25), %eax
> +         leal	(,%r26), %eax
> +         leal	(,%r27), %eax
> +         leal	(,%r28), %eax
> +         leal	(,%r29), %eax
> +         leal	(,%r30), %eax
> +         leal	(,%r31), %eax
> +## REX2.B4 bit
> +         leal	(%r16), %eax
> +         leal	(%r17), %eax
> +         leal	(%r18), %eax
> +         leal	(%r19), %eax
> +         leal	(%r20), %eax
> +         leal	(%r21), %eax
> +         leal	(%r22), %eax
> +         leal	(%r23), %eax
> +         leal	(%r24), %eax
> +         leal	(%r25), %eax
> +         leal	(%r26), %eax
> +         leal	(%r27), %eax
> +         leal	(%r28), %eax
> +         leal	(%r29), %eax
> +         leal	(%r30), %eax
> +         leal	(%r31), %eax
> +## REX2.W bit
> +         leaq	(%rax), %r15
> +         leaq	(%rax), %r16
> +         leaq	(%r15), %rax
> +         leaq	(%r16), %rax
> +         leaq	(,%r15), %rax
> +         leaq	(,%r16), %rax
> +## REX2.R3 bit
> +         add    (%r16), %r8
> +         add    (%r16), %r15
> +## REX2.X3 bit
> +         mov    (,%r9), %r16
> +         mov    (,%r14), %r16
> +## REX2.B3 bit
> +	 sub   (%r10), %r31
> +	 sub   (%r13), %r31

Nit: There's still an indentation anomaly here.

> --- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
> +++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
> @@ -1,4 +1,8 @@
>  	.text
>  	{disp16} movb (%ebp),%al
>  	{disp16} movb (%rbp),%al
> +
> +	/* Instruction not support APX.  */
> +	{rex2} xsave (%r15, %rbx)
> +	{rex2} xsave64 (%r15, %rbx)
>  	.p2align 4,0

Aren't these dealt with (in a more complete fashion) in x86-64-pseudos-bad.s?

> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> @@ -5,3 +5,77 @@ pseudos:
>  	{rex} vmovaps %xmm7,%xmm2
>  	{rex} vmovaps %xmm17,%xmm2
>  	{rex} rorx $7,%eax,%ebx
> +	{rex2} vmovaps %xmm7,%xmm2
> +	{rex2} xsave (%rax)
> +	{rex2} xsaves (%ecx)
> +	{rex2} xsaves64 (%ecx)
> +	{rex2} xsavec (%ecx)
> +	{rex2} xrstors (%ecx)
> +	{rex2} xrstors64 (%ecx)

Here.

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -144,6 +144,12 @@ struct instr_info
>    /* Bits of REX we've already used.  */
>    uint8_t rex_used;
>  
> +  /* Record W R4 X4 B4 bits for rex2.  */
> +  unsigned char rex2;
> +  /* Bits of REX2 we've already used.  */
> +  unsigned char rex2_used;

When you say REX2, one ought to be permitted to imply you mean the
prefix, not the struct field. That's ambiguous here, though - bit
positions used match those in rex2, not those in the REX2 payload.
IOW either you use lower case to make more obvious that you mean the
other struct field, or you say e.g. "REX2 prefix bits we've already
used." Albeit that would still be imprecise, as other REX2 prefix
bits' use is recorded in rex_used.

> @@ -265,8 +272,13 @@ struct dis_private {
>    {							\
>      if (value)						\
>        {							\
> -	if ((ins->rex & value))				\
> +	if (ins->rex & value)				\
>  	  ins->rex_used |= (value) | REX_OPCODE;	\

Like is done here, ...

> +	if (ins->rex2 & value)				\
> +	  {						\
> +	    ins->rex2_used |= value;			\

other uses of "value" also want parenthesizing, unless not used with
any kind of operator (e.g. in the if() above).
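
A small sketch of why the parentheses matter, using made-up macro names and
illustrative bit values rather than the disassembler's own: once the argument
is itself an expression with an operator, the bare form regroups.

```c
#include <assert.h>

/* Argument used bare: an operator inside VALUE can regroup the test.  */
#define REX_TEST_BARE(rex, value)   ((rex) & value)
/* Argument parenthesized: VALUE is always evaluated as one unit.  */
#define REX_TEST_PAREN(rex, value)  ((rex) & (value))

#define SKETCH_REX_W 0x8   /* illustrative bit values */
#define SKETCH_REX_B 0x1

/* Expands to ((rex) & 0x8 | 0x1), i.e. ((rex & 0x8) | 0x1).  */
static int
bare_says_set (unsigned int rex)
{
  return REX_TEST_BARE (rex, SKETCH_REX_W | SKETCH_REX_B) != 0;
}

/* Expands to ((rex) & (0x8 | 0x1)).  */
static int
paren_says_set (unsigned int rex)
{
  return REX_TEST_PAREN (rex, SKETCH_REX_W | SKETCH_REX_B) != 0;
}

static void
demo (void)
{
  /* With no bits set, only the parenthesized form reports "clear".  */
  assert (paren_says_set (0) == 0);
  assert (bare_says_set (0) != 0);
}
```

Some compilers warn about the bare expansion, which is a further hint that
the argument wants parenthesizing.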

> @@ -276,6 +288,9 @@ struct dis_private {
>  #define EVEX_b_used 1
>  #define EVEX_len_used 2
>  
> +/* M0 in rex2 prefix represents map0 or map1.  */
> +#define REX2_M 0x8

Extending an earlier comment: This really should go next to REX_W and
friends. In principle the assembler could want to use this constant as
well, which is why it would better go into opcode/i386.h anyway.

> @@ -4196,19 +4221,19 @@ static const struct dis386 x86_64_table[][2] = {
>  
>    /* X86_64_E8 */
>    {
> -    { "callP",		{ Jv, BND }, 0 },
> -    { "call@",		{ Jv, BND }, 0 }
> +    { "callP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },

This, ...

> +    { "call@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
>    },
>  
>    /* X86_64_E9 */
>    {
> -    { "jmpP",		{ Jv, BND }, 0 },
> -    { "jmp@",		{ Jv, BND }, 0 }
> +    { "jmpP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },

... this, and ...

> +    { "jmp@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
>    },
>  
>    /* X86_64_EA */
>    {
> -    { "{l|}jmp{P|}", { Ap }, 0 },
> +    { "{l|}jmp{P|}", { Ap }, PREFIX_REX2_ILLEGAL },
>    },

... this change isn't really needed, is it? The marker is only needed
on 64-bit insns (i.e. respective slots 2 of x86_64_table[] entries).

Jan

* Re: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
  2023-12-19 12:12 ` [PATCH v4 3/9] Support APX GPR32 with extend evex prefix Cui, Lili
@ 2023-12-22 13:49   ` Jan Beulich
  2023-12-25 12:23     ` Cui, Lili
  2023-12-22 14:19   ` Jan Beulich
  1 sibling, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-12-22 13:49 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, binutils

On 19.12.2023 13:12, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -89,6 +89,7 @@
>  /* This matches the C -> StaticRounding alias in the opcode table.  */
>  #define commutative staticrounding
>  
> +#define APX_F(cpuid) (maybe_cpu (t, CpuAPX_F) && maybe_cpu (t, cpuid))

Why is this still here? I said more than once that it's not helpful to
have. As can be seen ...

> @@ -3673,7 +3674,7 @@ install_template (const insn_template *t)
>  
>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> -  {
> +    {
>        if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>  	   || maybe_cpu (t, CpuFMA))
>  	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> @@ -3695,7 +3696,15 @@ install_template (const insn_template *t)
>  		gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
>  	    }
>  	}
> -  }
> +
> +      if (APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
> +	  || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
> +	  || APX_F(CpuBMI2))

... right here: There's no point in checking CpuAPX_F a whopping 7 times.

> +	if (need_evex_encoding ())
> +	  i.tm.opcode_modifier.vex = 0;
> +	else
> +	  i.tm.opcode_modifier.evex = 0;
> +    }

I'm also pretty sure that I asked before that such nested if/else please
have proper braces for the body of the outer if().
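
Both remarks (test the APX_F feature once, and brace the body of the outer
if) could be combined along the lines of the sketch below. The feature enum,
the template struct and maybe_cpu_sketch() are reduced, hypothetical
stand-ins, and only three of the seven features are listed to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced feature set and template; not the gas types.  */
enum cpu_feat { FEAT_APX_F, FEAT_BMI, FEAT_BMI2, FEAT_CMPCCXADD, FEAT_COUNT };

struct tmpl_sketch
{
  bool cpu[FEAT_COUNT];
  bool vex;
  bool evex;
};

static bool
maybe_cpu_sketch (const struct tmpl_sketch *t, enum cpu_feat f)
{
  return t->cpu[f];
}

/* Strip one variant from a dual VEX/EVEX template.  FEAT_APX_F is
   tested once, and the nested if/else sits in a braced body.  */
static void
strip_dual_variant (struct tmpl_sketch *t, bool need_evex)
{
  assert (t != NULL);
  if (t->vex && t->evex)
    {
      if (maybe_cpu_sketch (t, FEAT_APX_F)
          && (maybe_cpu_sketch (t, FEAT_CMPCCXADD)
              || maybe_cpu_sketch (t, FEAT_BMI)
              || maybe_cpu_sketch (t, FEAT_BMI2)))
        {
          if (need_evex)
            t->vex = false;
          else
            t->evex = false;
        }
    }
}
```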

To say it very clearly again: When you submit a new version, _all_ prior
review comments should be addressed. Whether that's verbally (by
explaining why a change cannot be made) or by adjusting the code is
another matter. I said before that reviewing this work has proven
extremely time consuming. I shouldn't be required to needlessly put in
yet more time, just to re-spot and re-comment things already pointed
out.

> @@ -3876,6 +3885,15 @@ is_any_vex_encoding (const insn_template *t)
>    return t->opcode_modifier.vex || t->opcode_modifier.evex;
>  }
>  
> +/* We can use this function only when the current encoding is evex.  */
> +static INLINE bool
> +is_apx_evex_encoding (void)
> +{
> +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> +    || (i.vex.register_specifier
> +	&& i.vex.register_specifier->reg_flags & RegRex2);

Nit: Parentheses please around the & expression.

> @@ -8097,7 +8142,11 @@ process_suffix (void)
>  	  if (i.tm.opcode_modifier.jump == JUMP_BYTE) /* jcxz, loop */
>  	    prefix = ADDR_PREFIX_OPCODE;
>  
> -	  if (!add_prefix (prefix))
> +	  /* The DATA PREFIX of EVEX promoted from legacy APX instructions
> +	     needs to be adjusted.  */
> +	  if (i.tm.opcode_space == SPACE_EVEXMAP4)
> +	    i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;

Feels like I did ask before: What if i.tm.opcode_modifier.opcodeprefix
is already set? Aiui that would be a bug, but one that's likely easy
to introduce and hard to find. IOW - better assert the field is clear
before filling it?
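
The kind of guard meant here might look like the following sketch. It uses
plain assert() where gas itself would use gas_assert(), and the enum values
and struct are local stand-ins whose names and values are assumptions.

```c
#include <assert.h>

/* Stand-ins for the opcode-prefix encodings; values are assumptions.  */
enum { SK_PREFIX_NONE = 0, SK_PREFIX_0X66 = 1 };

struct opcode_modifier_sketch { unsigned int opcodeprefix; };

/* Install the 0x66 data prefix, but only after checking that no other
   opcode prefix was recorded, so a conflicting template is caught at
   once instead of being silently overwritten.  */
static void
set_data_prefix (struct opcode_modifier_sketch *m)
{
  assert (m->opcodeprefix == SK_PREFIX_NONE);
  m->opcodeprefix = SK_PREFIX_0X66;
}
```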

> @@ -14293,6 +14342,12 @@ static bool check_register (const reg_entry *r)
>        if (!cpu_arch_flags.bitfield.cpuapx_f
>  	  || flag_code != CODE_64BIT)
>  	return false;
> +
> +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX.
> +	 For the later install_template function.  */
> +      if (current_templates.start->opcode_modifier.vex
> +	  && current_templates.start->opcode_modifier.evex)
> +	i.vec_encoding = vex_encoding_evex;
>      }

Just to state it again - no use of current_templates in this function,
please.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -106,16 +106,6 @@
>  #define HLEPrefixRelease PrefixOk=PrefixHLERelease
>  #define NoTrackPrefixOk  PrefixOk=PrefixNoTrack
>  
> -#define Space0F    OpcodeSpace=SPACE_0F
> -#define Space0F38  OpcodeSpace=SPACE_0F38
> -#define Space0F3A  OpcodeSpace=SPACE_0F3A
> -#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
> -#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
> -#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
> -
> -#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
> -#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
> -

Why are you moving these, leaving ...

>  #define VexMap7 OpcodeSpace=SPACE_VEXMAP7

... this one disconnected? IOW - I see no need for the moving, but
if there is a need, then this one also needs moving (see how it'll
become relevant for USER_MSR+APX_F now). Specifically ...

> @@ -137,11 +127,25 @@
>  #define EVexLIG EVex=EVEXLIG
>  #define EVexDYN EVex=EVEXDYN
>  
> +#define Space0F    OpcodeSpace=SPACE_0F
> +#define Space0F38  OpcodeSpace=SPACE_0F38
> +#define Space0F3A  OpcodeSpace=SPACE_0F3A
> +#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
> +#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
> +#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
> +
> +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex128

... there's no need for this to live after EVex128 was #define-d.

> +#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
> +#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
> +
>  #define Disp8ShiftVL Disp8MemShift=DISP8_SHIFT_VL
>  
>  #define Vsz256 Vsz=VSZ256
>  #define Vsz512 Vsz=VSZ512
>  
> +// The template supports VEX format for cpuid and EVEX format for cpuid & apx_f.
> +#define APX_F(cpuid) cpuid&(cpuid|APX_F)

I think the comment wants to go into further detail. Please can you
go back to read what I said when I suggested this construct, in
particular regarding the stripping then done? However, with you not
having found a need to fiddle with cpu_flags_match(), I wonder if
this construct is needed in the first place. The earlier suggestion
was entirely based on the assumption that stripping similar to that
for other combined VEX/EVEX templates would be needed here, too.

> @@ -2049,13 +2059,20 @@ bndldx, 0x0f1a, MPX, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex, RegBND }
>  
>  // SHA instructions.
>  sha1rnds4, 0xf3acc, SHA, Modrm|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
> +sha1rnds4, 0xd4, SHA&APX_F, Modrm|NoSuf|EVexMap4, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
>  sha1nexte, 0xf38c8, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> +sha1nexte, 0xd8, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  sha1msg1, 0xf38c9, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> +sha1msg1, 0xd9, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  sha1msg2, 0xf38ca, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> +sha1msg2, 0xda, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }

What about this form? It surely also wants to gain an EVEX counterpart.

Jan

* Re: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
  2023-12-19 12:12 ` [PATCH v4 3/9] Support APX GPR32 with extend evex prefix Cui, Lili
  2023-12-22 13:49   ` Jan Beulich
@ 2023-12-22 14:19   ` Jan Beulich
  2023-12-26  7:00     ` Cui, Lili
  1 sibling, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-12-22 14:19 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, binutils

On 19.12.2023 13:12, Cui, Lili wrote:
> --- a/opcodes/i386-dis-evex-prefix.h
> +++ b/opcodes/i386-dis-evex-prefix.h
> @@ -285,6 +285,14 @@
>      { "%XEvfmsub213s%XW",	{ XMScalar, VexScalar, EXdq, EXxEVexR }, 0 },
>      { "v4fnmadds%XS",	{ XMScalar, VexScalar, Mxmm }, 0 },
>    },
> +  /* PREFIX_EVEX_0F38F2_L_0 */
> +  {
> +    { "andnS",	{ Gdq, VexGdq, Edq }, 0 },
> +  },

So not being able to re-use the VEX entry for this and ...

> --- a/opcodes/i386-dis-evex-reg.h
> +++ b/opcodes/i386-dis-evex-reg.h
> @@ -49,3 +49,10 @@
>      { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
>      { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
>    },
> +  /* REG_EVEX_0F38F3_L_0_P_0 */
> +  {
> +    { Bad_Opcode },
> +    { "blsrS",	{ VexGdq, Edq }, 0 },
> +    { "blsmskS",	{ VexGdq, Edq }, 0 },
> +    { "blsiS",	{ VexGdq, Edq }, 0 },
> +  },

... this was due to the VEX entries having PREFIX_OPCODE, which would be
getting in the way? This is the sort of thing that would be useful to
have in the description, to avoid raising the same question again that
(I think) was raised before.

Yet then - why do you strip PREFIX_OPCODE from the VEX entries? If you
do that (as iirc I did suggest), there's no need for having separate EVEX
ones (the suggestion, after all, was to be able to re-use the VEX
entries). That re-work of existing VEX encodings could, btw, also have
been split out quite easily. That way this huge patch would have further
shrunk a little.

> --- /dev/null
> +++ b/opcodes/i386-dis-evex-x86-64.h
> @@ -0,0 +1,50 @@
> +  /* X86_64_EVEX_0F90 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_W_TABLE (VEX_W_0F90_L_0) },
> +  },
> +  /* X86_64_EVEX_0F91 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_W_TABLE (VEX_W_0F91_L_0) },
> +  },
> +  /* X86_64_EVEX_0F92 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_W_TABLE (VEX_W_0F92_L_0) },
> +  },
> +  /* X86_64_EVEX_0F93 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_W_TABLE (VEX_W_0F93_L_0) },
> +  },
> +  /* X86_64_EVEX_0F38F2 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },
> +  },
> +  /* X86_64_EVEX_0F38F3 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },
> +  },
> +  /* X86_64_EVEX_0F38F5 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_VEX_0F38F5_L_0) },
> +  },
> +  /* X86_64_EVEX_0F38F6 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE(PREFIX_VEX_0F38F6_L_0) },
> +  },
> +  /* X86_64_EVEX_0F38F7 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE(PREFIX_VEX_0F38F7_L_0) },
> +  },
> +  /* X86_64_EVEX_0F3AF0 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },
> +  },

Am I misremembering that we had agreed that this new file isn't necessary,
by having USE_X86_64_EVEX_FROM_VEX_TABLE handle the non-64-bit case? At
least I couldn't find a mail from you saying this isn't possible (and why).

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 4/9] Add tests for APX GPR32 with extend evex prefix
  2023-12-19 12:12 ` [PATCH v4 4/9] Add tests for " Cui, Lili
@ 2023-12-22 14:41   ` Jan Beulich
  2023-12-25 13:40     ` Cui, Lili
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-12-22 14:41 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, binutils

On 19.12.2023 13:12, Cui, Lili wrote:
> --- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> @@ -16,3 +16,194 @@
>  	xsaveopt64 (%r16, %r31)
>  	xsavec (%r16, %rbx)
>  	xsavec64 (%r16, %r31)
> +#SSE
> +	blendpd $100,(%r18),%xmm6
> +	blendps $100,(%r18),%xmm6
> +	blendvpd %xmm0,(%r19),%xmm6
> +	blendvpd (%r19),%xmm6
> +	blendvps %xmm0,(%r19),%xmm6
> +	blendvps (%r19),%xmm6
> +	dppd $100,(%r20),%xmm6
> +	dpps $100,(%r20),%xmm6
> +	extractps $100,%xmm4,%r21
> +	extractps $100,%xmm4,(%r21)
> +	insertps $100,(%r21),%xmm6
> +	movntdqa (%r21),%xmm4
> +	mpsadbw $100,(%r21),%xmm6
> +	pabsb (%r17),%xmm0
> +	pabsd (%r17),%xmm0
> +	pabsw (%r17),%xmm0
> +	packusdw (%r21),%xmm6
> +	palignr $100,(%r17),%xmm6
> +	pblendvb %xmm0,(%r22),%xmm6
> +	pblendvb (%r22),%xmm6
> +	pblendw $100,(%r22),%xmm6
> +	pcmpeqq (%r22),%xmm6
> +	pcmpestri $100,(%r25),%xmm6
> +	pcmpestrm $100,(%r25),%xmm6
> +	pcmpgtq (%r25),%xmm4
> +	pcmpistri $100,(%r25),%xmm6
> +	pcmpistrm $100,(%r25),%xmm6
> +	pextrb $100,%xmm4,%r22
> +	pextrb $100,%xmm4,(%r22)
> +	pextrd $100,%xmm4,(%r22)
> +	pextrq $100,%xmm4,(%r22)
> +	pextrw $100,%xmm4,(%r22)
> +	phaddd  (%r17),%xmm0
> +	phaddsw (%r17),%xmm0
> +	phaddw  (%r17),%xmm0
> +	phminposuw (%r23),%xmm4
> +	phsubw (%r17),%xmm0
> +	pinsrb $100,%r23,%xmm4
> +	pinsrb $100,(%r23),%xmm4
> +	pinsrd $100, %r23d, %xmm4
> +	pinsrd $100,(%r23),%xmm4
> +	pinsrq $100, %r24, %xmm4
> +	pinsrq $100,(%r24),%xmm4
> +	pmaddubsw (%r17),%xmm0
> +	pmaxsb (%r24),%xmm6
> +	pmaxsd (%r24),%xmm6
> +	pmaxud (%r24),%xmm6
> +	pmaxuw (%r24),%xmm6
> +	pminsb (%r24),%xmm6
> +	pminsd (%r24),%xmm6
> +	pminud (%r24),%xmm6
> +	pminuw (%r24),%xmm6
> +	pmovsxbd (%r24),%xmm4
> +	pmovsxbq (%r24),%xmm4
> +	pmovsxbw (%r24),%xmm4
> +	pmovsxbw (%r24),%xmm4
> +	pmovsxdq (%r24),%xmm4
> +	pmovsxwd (%r24),%xmm4
> +	pmovsxwq (%r24),%xmm4
> +	pmovzxbd (%r24),%xmm4
> +	pmovzxbq (%r24),%xmm4
> +	pmovzxdq (%r24),%xmm4
> +	pmovzxwd (%r24),%xmm4
> +	pmovzxwq (%r24),%xmm4
> +	pmuldq (%r24),%xmm4
> +	pmulhrsw (%r17),%xmm0
> +	pmulld (%r24),%xmm4
> +	pshufb (%r17),%xmm0
> +	psignb (%r17),%xmm0
> +	psignd (%r17),%xmm0
> +	psignw (%r17),%xmm0
> +	roundpd $100,(%r24),%xmm6
> +	roundps $100,(%r24),%xmm6
> +	roundsd $100,(%r24),%xmm6
> +	roundss $100,(%r24),%xmm6
> +#AES
> +	aesdec (%r26),%xmm6
> +	aesdeclast (%r26),%xmm6
> +	aesenc (%r26),%xmm6
> +	aesenclast (%r26),%xmm6
> +	aesimc (%r26),%xmm6
> +	aeskeygenassist $100,(%r26),%xmm6
> +	pclmulhqhqdq (%r26),%xmm6
> +	pclmulhqlqdq (%r26),%xmm6
> +	pclmullqhqdq (%r26),%xmm6
> +	pclmullqlqdq (%r26),%xmm6
> +	pclmulqdq $100,(%r26),%xmm6
> +#GFNI
> +	gf2p8affineinvqb $100,(%r26),%xmm6
> +	gf2p8affineqb $100,(%r26),%xmm6
> +	gf2p8mulb (%r26),%xmm6
> +#VEX without evex
> +	vaesimc (%r27), %xmm3
> +	vaeskeygenassist $7,(%r27),%xmm3
> +	vblendpd $7,(%r27),%xmm6,%xmm2
> +	vblendpd $7,(%r27),%ymm6,%ymm2
> +	vblendps $7,(%r27),%xmm6,%xmm2
> +	vblendps $7,(%r27),%ymm6,%ymm2
> +	vblendvpd %xmm4,(%r27),%xmm2,%xmm7
> +	vblendvpd %ymm4,(%r27),%ymm2,%ymm7
> +	vblendvps %xmm4,(%r27),%xmm2,%xmm7
> +	vblendvps %ymm4,(%r27),%ymm2,%ymm7
> +	vdppd $7,(%r27),%xmm6,%xmm2
> +	vdpps $7,(%r27),%xmm6,%xmm2
> +	vdpps $7,(%r27),%ymm6,%ymm2
> +	vhaddpd (%r27),%xmm6,%xmm5
> +	vhaddpd (%r27),%ymm6,%ymm5
> +	vhsubps (%r27),%xmm6,%xmm5
> +	vhsubps (%r27),%ymm6,%ymm5
> +	vlddqu (%r27),%xmm4
> +	vlddqu (%r27),%ymm4
> +	vldmxcsr (%r27)
> +	vmaskmovpd %xmm4,%xmm6,(%r27)
> +	vmaskmovpd %ymm4,%ymm6,(%r27)
> +	vmaskmovpd (%r27),%xmm4,%xmm6
> +	vmaskmovpd (%r27),%ymm4,%ymm6
> +	vmaskmovps %xmm4,%xmm6,(%r27)
> +	vmaskmovps %ymm4,%ymm6,(%r27)
> +	vmaskmovps (%r27),%xmm4,%xmm6
> +	vmaskmovps (%r27),%ymm4,%ymm6
> +	vmovmskpd %xmm4,%r27d
> +	vmovmskpd %xmm8,%r27d
> +	vmovmskps %xmm4,%r27d
> +	vmovmskps %ymm8,%r27d
> +	vpblendd $7,(%r27),%xmm6,%xmm2
> +	vpblendd $7,(%r27),%ymm6,%ymm2
> +	vpblendvb %xmm4,(%r27),%xmm2,%xmm7
> +	vpblendvb %ymm4,(%r27),%ymm2,%ymm7
> +	vpblendw $7,(%r27),%xmm6,%xmm2
> +	vpblendw $7,(%r27),%ymm6,%ymm2
> +	vpcmpeqb (%r26),%ymm6,%ymm2
> +	vpcmpeqd (%r26),%ymm6,%ymm2
> +	vpcmpeqq (%r16),%ymm6,%ymm2
> +	vpcmpeqw (%r16),%ymm6,%ymm2
> +	vpcmpestri $7,(%r27),%xmm6
> +	vpcmpestrm $7,(%r27),%xmm6
> +	vpcmpgtb (%r26),%ymm6,%ymm2
> +	vpcmpgtd (%r26),%ymm6,%ymm2
> +	vpcmpgtq (%r16),%ymm6,%ymm2
> +	vpcmpgtw (%r16),%ymm6,%ymm2
> +	vpcmpistri $100,(%r25),%xmm6
> +	vpcmpistrm $100,(%r25),%xmm6
> +	vperm2f128 $7,(%r27),%ymm6,%ymm2
> +	vperm2i128 $7,(%r27),%ymm6,%ymm2
> +	vphaddd (%r27),%xmm6,%xmm7
> +	vphaddd (%r27),%ymm6,%ymm7
> +	vphaddsw (%r27),%xmm6,%xmm7
> +	vphaddsw (%r27),%ymm6,%ymm7
> +	vphaddw (%r27),%xmm6,%xmm7
> +	vphaddw (%r27),%ymm6,%ymm7
> +	vphminposuw (%r27),%xmm6
> +	vphsubd (%r27),%xmm6,%xmm7
> +	vphsubd (%r27),%ymm6,%ymm7
> +	vphsubsw (%r27),%xmm6,%xmm7
> +	vphsubsw (%r27),%ymm6,%ymm7
> +	vphsubw (%r27),%xmm6,%xmm7
> +	vphsubw (%r27),%ymm6,%ymm7
> +	vpmaskmovd %xmm4,%xmm6,(%r27)
> +	vpmaskmovd %ymm4,%ymm6,(%r27)
> +	vpmaskmovd (%r27),%xmm4,%xmm6
> +	vpmaskmovd (%r27),%ymm4,%ymm6
> +	vpmaskmovq %xmm4,%xmm6,(%r27)
> +	vpmaskmovq %ymm4,%ymm6,(%r27)
> +	vpmaskmovq (%r27),%xmm4,%xmm6
> +	vpmaskmovq (%r27),%ymm4,%ymm6
> +	vpmovmskb %xmm4,%r27
> +	vpmovmskb %ymm4,%r27d
> +	vpsignb (%r27),%xmm6,%xmm7
> +	vpsignb (%r27),%xmm6,%xmm7
> +	vpsignd (%r27),%xmm6,%xmm7
> +	vpsignd (%r27),%xmm6,%xmm7
> +	vpsignw (%r27),%xmm6,%xmm7
> +	vpsignw (%r27),%xmm6,%xmm7
> +	vptest (%r27),%ymm6
> +	vptest (%r27),%xmm6
> +	vrcpps (%r27),%xmm6
> +	vrcpps (%r27),%ymm6
> +	vrcpss (%r27),%xmm6,%xmm6
> +	vroundpd $1,(%r24),%xmm6
> +	vroundps $2,(%r24),%xmm6
> +	vroundsd $3,(%r24),%xmm6,%xmm3
> +	vroundss $4,(%r24),%xmm6,%xmm3

These are still here, when they can be expressed.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> @@ -0,0 +1,34 @@
> +# Check Illegal prefix for 64bit EVEX-promoted instructions
> +
> +        .allow_index_reg
> +        .text
> +_start:
> +	#movbe %r23w,%ax set EVEX.pp = f3 (illegal value).
> +	.insn EVEX.L0.f3.M12.W0 0x60, %di, %ax
> +	#movbe %r23w,%ax set EVEX.pp = f2 (illegal value).
> +	.insn EVEX.L0.f2.M12.W0 0x60, %di, %ax
> +	#VSIB vpgatherqq (%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
> +	#(illegal value).
> +	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd
> +	.byte 0xff

This one's still using .byte and still referencing P[10] in the comment.
If that's really unavoidable, then - as elsewhere - the description of
the patch could (and should) provide the reason. And there continue to
be no separating blank lines, making it as hard as before to find one's
way through all of this.

> +	#EVEX_MAP4 movbe %r23w,%ax set EVEX.mm == 0b01 (illegal value).
> +	.insn EVEX.L0.66.M13.W0 0x60, %di, %ax
> +	#EVEX_MAP4 movbe %r23w,%ax set EVEX.a1a0 (P[17:16]) == 0b01

See my earlier comment about a1a0. Once again - please can you use SDM
terms as much as possible, and prefer descriptive names over non-
descriptive ones?

> +        #(illegal value).

??? At the very least indentation is bogus here, but as to this
particular comment (recurring elsewhere) - isn't the entire test about
illegal values?

> +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax{%k1}
> +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.L'L == 0b01 (illegal value).
> +	.insn EVEX.L1.66.M12.W0 0x60, %di, %ax
> +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.z == 0b1 (illegal value).
> +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax {%k7}{z}
> +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[17:16](EVEX.aa) == 0b01
> +	#(illegal value).
> +	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx{%k1}
> +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%ecx EVEX.P[22:21](EVEX.L’L) == 0b01
> +	#(illegal value).
> +	.insn EVEX.L1.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx
> +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[23](EVEX.z) == 0b1
> +	#(illegal value).
> +        .insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx {%k7}{z}

Isn't {%k7} alone rendering the encoding invalid, which was checked
for above already? Also - bogus indentation again.

> +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[20](EVEX.b) == 0b1
> +	#(illegal value).
> +	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax ,(%rax,%rbx){1to8}, %rcx

64-bit register operands with .W0 are kind of okay, but still odd to
see. Note however the one misplaced comma here.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v4 1/9] Support APX GPR32 with rex2 prefix
  2023-12-22 13:08   ` Jan Beulich
@ 2023-12-25  6:14     ` Cui, Lili
  2024-01-04  8:57       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Cui, Lili @ 2023-12-25  6:14 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 19.12.2023 13:12, Cui, Lili wrote:
> > @@ -5283,6 +5321,9 @@ md_assemble (char *line)
> >  	case unsupported_syntax:
> >  	  err_msg = _("unsupported syntax");
> >  	  break;
> > +	case unsupported_EGPR_for_addressing:
> > +	  err_msg = _("extended GPR cannot be used as base/index");
> > +	  break;
> 
> While this one's now suitable for the as_bad() below the switch, ...
> 
> > @@ -5336,6 +5377,9 @@ md_assemble (char *line)
> >  	case invalid_dest_and_src_register_set:
> >  	  err_msg = _("destination and source registers must be distinct");
> >  	  break;
> > +	case invalid_pseudo_prefix:
> > +	  err_msg = _("rex2 pseudo prefix cannot be used here");
> > +	  break;
> 
> ... this one still doesn't really fit the "... for `<insn>'" there. At least the "here"
> needs dropping.
> 
Changed to:

"rex2 pseudo prefix cannot be used"

> > @@ -5630,11 +5681,12 @@ md_assemble (char *line)
> >  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> >        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> >  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && i.rex != 0))
> > +	  && (i.rex != 0 || i.rex2 != 0)))
> >      {
> >        int x;
> >
> > -      i.rex |= REX_OPCODE;
> > +      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
> > +	i.rex |= REX_OPCODE;
> >        for (x = 0; x < 2; x++)
> >  	{
> >  	  /* Look for 8 bit operand that uses old registers.  */ @@ -5645,7
> > +5697,7 @@ md_assemble (char *line)
> >  	      /* In case it is "hi" register, give up.  */
> >  	      if (i.op[x].regs->reg_num > 3)
> >  		as_bad (_("can't encode register '%s%s' in an "
> > -			  "instruction requiring REX prefix."),
> > +			  "instruction requiring REX/REX2 prefix."),
> >  			register_prefix, i.op[x].regs->reg_name);
> >
> >  	      /* Otherwise it is equivalent to the extended register.
> > @@ -5657,11 +5709,11 @@ md_assemble (char *line)
> >  	}
> >      }
> >
> > -  if (i.rex == 0 && i.rex_encoding)
> > +  if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding ||
> > + i.rex2_encoding))
> >      {
> >        /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
> >  	 that uses legacy register.  If it is "hi" register, don't add
> > -	 the REX_OPCODE byte.  */
> > +	 rex and rex2 prefix.  */
> >        int x;
> >        for (x = 0; x < 2; x++)
> >  	if (i.types[x].bitfield.class == Reg @@ -5671,6 +5723,7 @@
> > md_assemble (char *line)
> >  	  {
> >  	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
> >  	    i.rex_encoding = false;
> > +	    i.rex2_encoding = false;
> >  	    break;
> >  	  }
> >
> > @@ -5678,7 +5731,13 @@ md_assemble (char *line)
> >  	i.rex = REX_OPCODE;
> >      }
> >
> > -  if (i.rex != 0)
> > +  if (is_apx_rex2_encoding ())
> > +    {
> > +      build_rex2_prefix ();
> > +      /* The individual REX.RXBW bits got consumed.  */
> > +      i.rex &= REX_OPCODE;
> > +    }
> > +  else if (i.rex != 0)
> >      add_prefix (REX_OPCODE | i.rex);
> >
> >    insert_lfence_before (last_insn);
> 
> All of this will need re-basing over "x86: properly respect rex/{rex}", with the
> result (I hope) that .insn will then also be covered REX2-wise.
> 
Done.

> > @@ -5752,6 +5811,20 @@ parse_insn (const char *line, char *mnemonic,
> bool prefix_only)
> >  	    goto too_long;
> >  	  *mnem_p = '\0';
> >
> > +	  /* Point l at the closing brace if there's no other separator.  */
> > +	  if (*l != END_OF_INSN && !is_space_char (*l)
> > +	      && *l != PREFIX_SEPARATOR)
> > +	    --l;
> > +	}
> > +      /* Skip the immediate 0x** of {rex2 0x00} prefix.  */
> > +      else if (*mnemonic == '{'&& is_space_char (*l))
> 
> Nit: Missing blank.
> 
> > +	{
> > +	  while ( *l != '}')
> 
> Nit: Stray blank.
> 
> > +	    ++l;
> 
> What if there's no '}' on the line?
> 
> > +	  *mnem_p++ = *l++;
> > +	  if (mnem_p >= mnemonic + MAX_MNEM_SIZE)
> > +	    goto too_long;
> > +	  *mnem_p = '\0';
> 
> You skip everything, not just 0xNN. You also skip stuff after other pseudo
> prefixes, if I'm not mistaken. That's both too lax. However, skipping isn't an
> option here anyway. Either we actually respect what the user has written, or
> we error out. Skipping therefore is an option only if the provided expression
> (not just plain number!) evaluates to 0. I don't understand anyway why this
> code was added: When I asked about the specific plans, H.J. clearly said the
> form with a constant would be a disassembler- only thing for now.
> 

It should only be supported in the disassembler; I will drop it from the assembler.
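
For reference, the "what if there's no '}' on the line" concern raised above comes down to an unbounded scan. A minimal sketch of a bounded one follows; find_closing_brace is a hypothetical helper written only for illustration, it is not a gas function and not the code from the patch, and it assumes the line is a NUL-terminated string.

```c
#include <stddef.h>

/* Hypothetical helper, not part of gas: return a pointer to the first '}'
   in S, or NULL if the string ends before one is found.  */
static const char *
find_closing_brace (const char *s)
{
  for (; *s != '\0'; ++s)
    if (*s == '}')
      return s;
  return NULL;
}
```

A caller would treat a NULL result as a syntax error rather than continuing to advance the pointer.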

> > @@ -7005,6 +7082,43 @@ VEX_check_encoding (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Check if Egprs operands are valid for the instruction.  */
> > +
> > +static int
> > +check_EgprOperands (const insn_template *t)
> 
> Hmm, I thought I had asked before to make functions with boolean return
> values have a return type of bool, and then use "true" for success. An
> alternative would be to return the error indicator, rather than putting it in
> i.error here.
> 
> Then again I realize this is in line with VEX_check_encoding() and
> check_VecOperands() (which I think would better be changed, but anyway).
> 

Changed it to bool. For the rest, it's a bit strange to only change check_EgprOperands. Can this place be left unchanged? Or should I submit a new patch and change the old one first?

> > @@ -7149,6 +7263,14 @@ match_template (char mnem_suffix)
> >  	      continue;
> >  	    }
> >
> > +	  /* Check if pseudo prefix {rex2} is valid.  */
> > +	  if (t->opcode_modifier.noegpr && i.rex2_encoding)
> > +	    {
> > +	      i.error = invalid_pseudo_prefix;
> 
> What is this needed for? I.e. why can't you pass the value ...
> 
> > +	      specific_error = progress (i.error);
> 
> ... in here directly (as is done elsewhere as well)?
> 

Done.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
> > @@ -0,0 +1,86 @@
> > +## REX2.B3 bit
> > +	 sub   (%r10), %r31
> > +	 sub   (%r13), %r31
> 
> Nit: There's still an indentation anomaly here.
> 
Done.

> > --- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
> > +++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
> > @@ -1,4 +1,8 @@
> >  	.text
> >  	{disp16} movb (%ebp),%al
> >  	{disp16} movb (%rbp),%al
> > +
> > +	/* Instruction not support APX.  */
> > +	{rex2} xsave (%r15, %rbx)
> > +	{rex2} xsave64 (%r15, %rbx)
> >  	.p2align 4,0
> 
> Aren't these dealt with (in a more complete fashion) in x86-64-pseudos-
> bad.s?
> 
Removed.

> > --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> > +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> > @@ -5,3 +5,77 @@ pseudos:
> >  	{rex} vmovaps %xmm7,%xmm2
> >  	{rex} vmovaps %xmm17,%xmm2
> >  	{rex} rorx $7,%eax,%ebx
> > +	{rex2} vmovaps %xmm7,%xmm2
> > +	{rex2} xsave (%rax)
> > +	{rex2} xsaves (%ecx)
> > +	{rex2} xsaves64 (%ecx)
> > +	{rex2} xsavec (%ecx)
> > +	{rex2} xrstors (%ecx)
> > +	{rex2} xrstors64 (%ecx)
> 
> Here.
> 
> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> > @@ -144,6 +144,12 @@ struct instr_info
> >    /* Bits of REX we've already used.  */
> >    uint8_t rex_used;
> >
> > +  /* Record W R4 X4 B4 bits for rex2.  */  unsigned char rex2;
> > +  /* Bits of REX2 we've already used.  */  unsigned char rex2_used;
> 
> When you say REX2, one ought to be permitted to imply you mean the prefix,
> not the struct field. That's ambiguous here, though - bit positions used match
> those in rex2, not those in the REX2 payload.
> IOW either you use lower case to make more obvious that you mean the other
> struct field, or you say e.g. "REX2 prefix bits we've already used." Albeit that
> would still be imprecise, as other REX2 prefix bits' use is recorded in rex_used.
> 

Changed to:

  /* Bits of rex2 we've already used.  */
  unsigned char rex2_used;

> > @@ -265,8 +272,13 @@ struct dis_private {
> >    {							\
> >      if (value)						\
> >        {							\
> > -	if ((ins->rex & value))				\
> > +	if (ins->rex & value)				\
> >  	  ins->rex_used |= (value) | REX_OPCODE;	\
> 
> Like is done here, ...
> 
> > +	if (ins->rex2 & value)				\
> > +	  {						\
> > +	    ins->rex2_used |= value;			\
> 
> other uses of "value" also want parenthesizing, unless not used with any kind
> of operator (e.g. in the if() above).
> 
Done.
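
To illustrate why the macro argument wants parenthesizing wherever it meets an operator, here is a small standalone sketch. The names DEMO_W, DEMO_B, TEST_UNSAFE and TEST_SAFE are made up for this example and do not exist in i386-dis.c; only the precedence issue is the same.

```c
/* Hypothetical bit values, chosen for illustration only.  */
#define DEMO_W 0x8
#define DEMO_B 0x1

/* Argument not parenthesized: with "A | B" as VALUE this expands to
   "(bits) & A | B", which parses as "((bits) & A) | B".  */
#define TEST_UNSAFE(bits, value) ((bits) & value)

/* Argument parenthesized: expands to "(bits) & (A | B)" as intended.  */
#define TEST_SAFE(bits, value) ((bits) & (value))

static int
test_unsafe (unsigned int bits)
{
  return TEST_UNSAFE (bits, DEMO_W | DEMO_B) != 0;
}

static int
test_safe (unsigned int bits)
{
  return TEST_SAFE (bits, DEMO_W | DEMO_B) != 0;
}
```

With no bits set, test_unsafe still reports a match because DEMO_B is OR-ed in after the mask is applied, while test_safe reports none.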

> > @@ -276,6 +288,9 @@ struct dis_private {  #define EVEX_b_used 1
> > #define EVEX_len_used 2
> >
> > +/* M0 in rex2 prefix represents map0 or map1.  */ #define REX2_M 0x8
> 
> Extending an earlier comment: This really should go next to REX_W and
> friends. In principle the assembler could want to use this constant as well,
> hence why it would better go the opcode/i386.h anyway.
> 
Done.

> > @@ -4196,19 +4221,19 @@ static const struct dis386 x86_64_table[][2] =
> > {
> >
> >    /* X86_64_E8 */
> >    {
> > -    { "callP",		{ Jv, BND }, 0 },
> > -    { "call@",		{ Jv, BND }, 0 }
> > +    { "callP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
> 
> This, ...
> 
> > +    { "call@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
> >    },
> >
> >    /* X86_64_E9 */
> >    {
> > -    { "jmpP",		{ Jv, BND }, 0 },
> > -    { "jmp@",		{ Jv, BND }, 0 }
> > +    { "jmpP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
> 
> ... this, and ...
> 
> > +    { "jmp@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
> >    },
> >
> >    /* X86_64_EA */
> >    {
> > -    { "{l|}jmp{P|}", { Ap }, 0 },
> > +    { "{l|}jmp{P|}", { Ap }, PREFIX_REX2_ILLEGAL },
> >    },
> 
> ... this change isn't really needed, is it? The marker is only needed on 64-bit
> insns (i.e. respective slots 2 of x86_64_table[] entries).
> 
Done.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
  2023-12-22 13:49   ` Jan Beulich
@ 2023-12-25 12:23     ` Cui, Lili
  2024-01-04  9:08       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Cui, Lili @ 2023-12-25 12:23 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 19.12.2023 13:12, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -89,6 +89,7 @@
> >  /* This matches the C -> StaticRounding alias in the opcode table.
> > */  #define commutative staticrounding
> >
> > +#define APX_F(cpuid) (maybe_cpu (t, CpuAPX_F) && maybe_cpu (t,
> > +cpuid))
> 
> Why is this still here? I said more than once that it's not helpful to have. As can
> be seen ...
> 
> > @@ -3673,7 +3674,7 @@ install_template (const insn_template *t)
> >
> >    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
> >    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> > -  {
> > +    {
> >        if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >  	   || maybe_cpu (t, CpuFMA))
> >  	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> @@
> > -3695,7 +3696,15 @@ install_template (const insn_template *t)
> >  		gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
> >  	    }
> >  	}
> > -  }
> > +
> > +      if (APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) ||
> APX_F(CpuAVX512F)
> > +	  || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) ||
> APX_F(CpuBMI)
> > +	  || APX_F(CpuBMI2))
> 
> ... right here: There's no point in checking CpuAPX_F a whopping 7 times.
> 
> > +	if (need_evex_encoding ())
> > +	  i.tm.opcode_modifier.vex = 0;
> > +	else
> > +	  i.tm.opcode_modifier.evex = 0;
> > +    }
> 
> I'm also pretty sure that I asked before that such nested if/else please have
> proper braces for the body of the outer if().
> 
Sorry, I missed that email.

Changed to:

      if ((maybe_cpu (t, CpuCMPCCXADD) || maybe_cpu (t, CpuAMX_TILE)
           || maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512DQ)
           || maybe_cpu (t, CpuAVX512BW) || maybe_cpu (t, CpuBMI)
           || maybe_cpu (t, CpuBMI2))
          && maybe_cpu (t, CpuAPX_F))
        {
          if (need_evex_encoding () || i.has_egpr)
            i.tm.opcode_modifier.vex = 0;
          else
            i.tm.opcode_modifier.evex = 0;
        }
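
As a sanity check on the rewrite above: hoisting the common CpuAPX_F test out of the per-feature checks does not change the condition. The sketch below uses made-up flag bits and a helper named has() in place of maybe_cpu() and the real Cpu* fields, so it shows only the boolean shape, not the gas data structures, and it leaves out the i.has_egpr addition.

```c
#include <stdbool.h>

/* Hypothetical feature bits standing in for the Cpu* template flags.  */
enum { F_APX_F = 1 << 0, F_BMI = 1 << 1, F_BMI2 = 1 << 2, F_AVX512F = 1 << 3 };

/* Hypothetical stand-in for maybe_cpu (): is BIT set in FLAGS?  */
static bool
has (unsigned int flags, unsigned int bit)
{
  return (flags & bit) != 0;
}

/* Shape of the macro-based condition: APX_F is re-tested per feature.  */
static bool
repeated_check (unsigned int flags)
{
  return (has (flags, F_APX_F) && has (flags, F_BMI))
	 || (has (flags, F_APX_F) && has (flags, F_BMI2))
	 || (has (flags, F_APX_F) && has (flags, F_AVX512F));
}

/* Shape of the revised condition: APX_F is tested once.  */
static bool
hoisted_check (unsigned int flags)
{
  return (has (flags, F_BMI) || has (flags, F_BMI2)
	  || has (flags, F_AVX512F))
	 && has (flags, F_APX_F);
}
```

Both functions agree for every combination of the four bits, which is just the distributive law applied to the original expression.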

> To say it very clearly again: When you submit a new version, _all_ prior review
> comments should be addressed. Whether that's verbally (by explaining why a
> change cannot be made) or by adjusting the code is another matter. I said
> before that reviewing this work has proven extremely time consuming. I
> shouldn't be required to needlessly put in yet more time, just to re-spot and
> re-comment things already pointed out.
> 
> > @@ -3876,6 +3885,15 @@ is_any_vex_encoding (const insn_template *t)
> >    return t->opcode_modifier.vex || t->opcode_modifier.evex;  }
> >
> > +/* We can use this function only when the current encoding is evex.
> > +*/ static INLINE bool is_apx_evex_encoding (void) {
> > +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> > +    || (i.vex.register_specifier
> > +	&& i.vex.register_specifier->reg_flags & RegRex2);
> 
> Nit: Parentheses please around the & expression.
> 
Done.

> > @@ -8097,7 +8142,11 @@ process_suffix (void)
> >  	  if (i.tm.opcode_modifier.jump == JUMP_BYTE) /* jcxz, loop */
> >  	    prefix = ADDR_PREFIX_OPCODE;
> >
> > -	  if (!add_prefix (prefix))
> > +	  /* The DATA PREFIX of EVEX promoted from legacy APX instructions
> > +	     needs to be adjusted.  */
> > +	  if (i.tm.opcode_space == SPACE_EVEXMAP4)
> > +	    i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
> 
> Feels like I did ask before: What if i.tm.opcode_modifier.opcodeprefix is
> already set? Aiui that would be a bug, but one that's likely easy to introduce
> and hard to find. IOW - better assert the field is clear before filling it?
>
Something like that, so we moved it here from somewhere else.  Added.

          /* The DATA PREFIX of EVEX promoted from legacy APX instructions
             needs to be adjusted.  */
          if (i.tm.opcode_space == SPACE_EVEXMAP4)
            {
              gas_assert (!i.tm.opcode_modifier.opcodeprefix);
              i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
            }

> > @@ -14293,6 +14342,12 @@ static bool check_register (const reg_entry
> *r)
> >        if (!cpu_arch_flags.bitfield.cpuapx_f
> >  	  || flag_code != CODE_64BIT)
> >  	return false;
> > +
> > +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as
> EVEX.
> > +	 For the later install_template function.  */
> > +      if (current_templates.start->opcode_modifier.vex
> > +	  && current_templates.start->opcode_modifier.evex)
> > +	i.vec_encoding = vex_encoding_evex;
> >      }
> 
> Just to state it again - no use of current_templates in this function, please.
> 
Removed, mentioned in my previous comment.

> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -106,16 +106,6 @@
> >  #define HLEPrefixRelease PrefixOk=PrefixHLERelease  #define
> > NoTrackPrefixOk  PrefixOk=PrefixNoTrack
> >
> > -#define Space0F    OpcodeSpace=SPACE_0F
> > -#define Space0F38  OpcodeSpace=SPACE_0F38 -#define Space0F3A
> > OpcodeSpace=SPACE_0F3A -#define SpaceXOP08
> OpcodeSpace=SPACE_XOP08
> > -#define SpaceXOP09 OpcodeSpace=SPACE_XOP09 -#define SpaceXOP0A
> > OpcodeSpace=SPACE_XOP0A
> > -
> > -#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5 -#define EVexMap6
> > OpcodeSpace=SPACE_EVEXMAP6
> > -
> 
> Why are you moving these, leaving ...
> 
> >  #define VexMap7 OpcodeSpace=SPACE_VEXMAP7
>
Done.

> ... this one disconnected? IOW - I see no need for the moving, but if there is a
> need, then this one also needs moving (see how it'll become relevant for
> USER_MSR+APX_F now). Specifically ...
> 
> > @@ -137,11 +127,25 @@
> >  #define EVexLIG EVex=EVEXLIG
> >  #define EVexDYN EVex=EVEXDYN
> >
> > +#define Space0F    OpcodeSpace=SPACE_0F
> > +#define Space0F38  OpcodeSpace=SPACE_0F38 #define Space0F3A
> > +OpcodeSpace=SPACE_0F3A #define SpaceXOP08
> OpcodeSpace=SPACE_XOP08
> > +#define SpaceXOP09 OpcodeSpace=SPACE_XOP09 #define SpaceXOP0A
> > +OpcodeSpace=SPACE_XOP0A
> > +
> > +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex128
> 
> ... there's no need for this to live after EVex128 was #define-s.
> 
Yes, done.

> > +#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5 #define EVexMap6
> > +OpcodeSpace=SPACE_EVEXMAP6
> > +
> >  #define Disp8ShiftVL Disp8MemShift=DISP8_SHIFT_VL
> >
> >  #define Vsz256 Vsz=VSZ256
> >  #define Vsz512 Vsz=VSZ512
> >
> > +// The template supports VEX format for cpuid and EVEX format for cpuid &
> apx_f.
> > +#define APX_F(cpuid) cpuid&(cpuid|APX_F)
> 
> I think the comment wants to go into further detail. Please can you go back to
> read what I said when I suggested this construct, in particular regarding the
> stripping then done? However, with you not having found a need to fiddle
> with cpu_flags_match(), I wonder if this construct is needed in the first place.
> The earlier suggestion was entirely based on the assumption that stripping
> similar to that for other combined VEX/EVEX templates would be needed here,
> t
> 

Seeing this, I realized the problem and checked opcodes/i386-tbl.h. For the following entry we want both CpuAPX_F and CpuBMI set to 1, but gen.c doesn't seem to support the format "cpuid&(cpuid|APX_F)"; in fact bzhi ends up with CpuBMI set to 1 and CpuAPX_F set to 0. I'm not familiar with the relevant logic in gen.c and don't know how to debug it. When you have time, could you help take a look?

bzhi, 0xf5, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }


> > @@ -2049,13 +2059,20 @@ bndldx, 0x0f1a, MPX,
> > Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex, RegBND }
> >
> >  // SHA instructions.
> >  sha1rnds4, 0xf3acc, SHA, Modrm|NoSuf, { Imm8|Imm8S,
> > RegXMM|Unspecified|BaseIndex, RegXMM }
> > +sha1rnds4, 0xd4, SHA&APX_F, Modrm|NoSuf|EVexMap4, { Imm8|Imm8S,
> > +RegXMM|Unspecified|BaseIndex, RegXMM }
> >  sha1nexte, 0xf38c8, SHA, Modrm|NoSuf,
> { RegXMM|Unspecified|BaseIndex,
> > RegXMM }
> > +sha1nexte, 0xd8, SHA&APX_F, Modrm|NoSuf|EVexMap4, {
> > +RegXMM|Unspecified|BaseIndex, RegXMM }
> >  sha1msg1, 0xf38c9, SHA, Modrm|NoSuf,
> { RegXMM|Unspecified|BaseIndex,
> > RegXMM }
> > +sha1msg1, 0xd9, SHA&APX_F, Modrm|NoSuf|EVexMap4, {
> > +RegXMM|Unspecified|BaseIndex, RegXMM }
> >  sha1msg2, 0xf38ca, SHA, Modrm|NoSuf,
> { RegXMM|Unspecified|BaseIndex,
> > RegXMM }
> > +sha1msg2, 0xda, SHA&APX_F, Modrm|NoSuf|EVexMap4, {
> > +RegXMM|Unspecified|BaseIndex, RegXMM }
> >  sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { Acc|Xmmword,
> > RegXMM|Unspecified|BaseIndex, RegXMM }
> 
> What about this form? It surely also wants to gain an EVEX counterpart.
> 

Added.

Thanks,
Lili.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v4 4/9] Add tests for APX GPR32 with extend evex prefix
  2023-12-22 14:41   ` Jan Beulich
@ 2023-12-25 13:40     ` Cui, Lili
  2024-01-04  9:16       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Cui, Lili @ 2023-12-25 13:40 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 19.12.2023 13:12, Cui, Lili wrote:
> > --- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> > @@ -16,3 +16,194 @@
> > +	vroundpd $1,(%r24),%xmm6
> > +	vroundps $2,(%r24),%xmm6
> > +	vroundsd $3,(%r24),%xmm6,%xmm3
> > +	vroundss $4,(%r24),%xmm6,%xmm3
> 
> These are still here, when they can be expressed.

I think we should keep them until we have a reasonably equivalent replacement.

> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > @@ -0,0 +1,34 @@
> > +# Check Illegal prefix for 64bit EVEX-promoted instructions
> > +
> > +        .allow_index_reg
> > +        .text
> > +_start:
> > +	#movbe %r23w,%ax set EVEX.pp = f3 (illegal value).
> > +	.insn EVEX.L0.f3.M12.W0 0x60, %di, %ax
> > +	#movbe %r23w,%ax set EVEX.pp = f2 (illegal value).
> > +	.insn EVEX.L0.f2.M12.W0 0x60, %di, %ax
> > +	#VSIB vpgatherqq (%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10]
> == 0
> > +	#(illegal value).
> > +	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd
> > +	.byte 0xff
> 
> This one's still using .byte and still referencing P[10] in the comment.
> If that's really unavoidable, then - as elsewhere - the description of the patch
> could (and should) provide the reason. And there continue to be no
> separating blank lines, making it as hard as before to find one's way through all
> of this.
> 

Copied from the previous comments,
------------------------------------------------------------------------
>>> +	#(illegal value).
>>> +	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
>>> +	.byte 0xff
>>
>> For the purpose of this test (whatever P[10] again is) you don't need 
>> a 32-bit displacement, do you? Shorter is (almost always) better in such tests.
>>
> 
> P[10] is a fixed value, in normal EVEX format we don't use this bit.  Dropped 0x7b.
------------------------------------------------------------------------

> > +	#EVEX_MAP4 movbe %r23w,%ax set EVEX.mm == 0b01 (illegal value).
> > +	.insn EVEX.L0.66.M13.W0 0x60, %di, %ax
> > +	#EVEX_MAP4 movbe %r23w,%ax set EVEX.a1a0 (P[17:16]) == 0b01
> 
> See my earlier comment about a1a0. Once again - please can you use SDM
> terms as much as possible, and prefer descriptive names over non- descriptive
> ones?
> 
Done.

> > +        #(illegal value).
> 
> ??? At the very least indentation is bogus here, but as to this particular
> comment (recurring elsewhere) - isn't the entire test about illegal values?
> 
Removed them.

> > +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax{%k1}
> > +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.L'L == 0b01 (illegal value).
> > +	.insn EVEX.L1.66.M12.W0 0x60, %di, %ax
> > +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.z == 0b1 (illegal value).
> > +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax {%k7}{z}
> > +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[17:16](EVEX.aa)
> == 0b01
> > +	#(illegal value).
> > +	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx{%k1}
> > +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%ecx EVEX.P[22:21](EVEX.L’L)
> == 0b01
> > +	#(illegal value).
> > +	.insn EVEX.L1.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx
> > +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[23](EVEX.z) ==
> 0b1
> > +	#(illegal value).
> > +        .insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx
> > +{%k7}{z}
> 
> Isn't {%k7} alone rendering the encoding invalid, which was checked for above
> already? Also - bogus indentation again.
>

That one is for the legacy case; this one is for VEX. Adjusted indentation and added separating blank lines for all insns.

> > +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[20](EVEX.b) ==
> 0b1
> > +	#(illegal value).
> > +	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax ,(%rax,%rbx){1to8}, %rcx
> 
> 64-bit register operands with .W0 are kind of okay, but still odd to see. Note
> however the one misplaced comma here.
> 
Changed to:

        #EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[20](EVEX.b) == 0b1
        .insn EVEX.L0.NP.0f38.W1 0xf5, %rax, (%rax,%rbx){1to8}, %rcx

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
  2023-12-22 14:19   ` Jan Beulich
@ 2023-12-26  7:00     ` Cui, Lili
  2024-01-04  9:01       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Cui, Lili @ 2023-12-26  7:00 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> > --- a/opcodes/i386-dis-evex-prefix.h
> > +++ b/opcodes/i386-dis-evex-prefix.h
> > @@ -285,6 +285,14 @@
> >      { "%XEvfmsub213s%XW",	{ XMScalar, VexScalar, EXdq, EXxEVexR }, 0 },
> >      { "v4fnmadds%XS",	{ XMScalar, VexScalar, Mxmm }, 0 },
> >    },
> > +  /* PREFIX_EVEX_0F38F2_L_0 */
> > +  {
> > +    { "andnS",	{ Gdq, VexGdq, Edq }, 0 },
> > +  },
> 
> So not being able to re-use the VEX entry for this and ...
> 
> > --- a/opcodes/i386-dis-evex-reg.h
> > +++ b/opcodes/i386-dis-evex-reg.h
> > @@ -49,3 +49,10 @@
> >      { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
> >      { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
> >    },
> > +  /* REG_EVEX_0F38F3_L_0_P_0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { "blsrS",	{ VexGdq, Edq }, 0 },
> > +    { "blsmskS",	{ VexGdq, Edq }, 0 },
> > +    { "blsiS",	{ VexGdq, Edq }, 0 },
> > +  },              
> 
> ... this was due to the VEX entries having PREFIX_OPCODE, which would be
> getting in the way? This is the sort of thing that would be useful to have in the
> description, to avoid raising the same question again that (I think) was raised
> before.
> 
> Yet then - why do you strip PREFIX_OPCODE from the VEX entries? If you do
> that (as iirc I did suggest), there's no need for having separate EVEX ones (the
> suggestion, after all, was to be able to re-use the VEX entries). That re-work of
> existing VEX encodings could, btw, also have been split out quite easily. That
> way this huge patch would have further shrunk a little.
> 
I re-used the VEX entry but forgot to remove the redundant one; it is removed now.

> > --- /dev/null
> > +++ b/opcodes/i386-dis-evex-x86-64.h
> > @@ -0,0 +1,50 @@
> > +  /* X86_64_EVEX_0F90 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_W_TABLE (VEX_W_0F90_L_0) },  },
> > +  /* X86_64_EVEX_0F91 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_W_TABLE (VEX_W_0F91_L_0) },  },
> > +  /* X86_64_EVEX_0F92 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_W_TABLE (VEX_W_0F92_L_0) },  },
> > +  /* X86_64_EVEX_0F93 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_W_TABLE (VEX_W_0F93_L_0) },  },
> > +  /* X86_64_EVEX_0F38F2 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },  },
> > +  /* X86_64_EVEX_0F38F3 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },  },
> > +  /* X86_64_EVEX_0F38F5 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_VEX_0F38F5_L_0) },  },
> > +  /* X86_64_EVEX_0F38F6 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE(PREFIX_VEX_0F38F6_L_0) },  },
> > +  /* X86_64_EVEX_0F38F7 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE(PREFIX_VEX_0F38F7_L_0) },  },
> > +  /* X86_64_EVEX_0F3AF0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },  },
> 
> Am I misremembering that we had agreed that this new file isn't necessary, by
> having USE_X86_64_EVEX_FROM_VEX_TABLE handle the non-64-bit case? At
> least I couldn't find a mail from you saying this isn't possible (and why).
> 
I prefer not to change the current implementation. We need a table that all instructions must go through; it can be either x86-64 or len_table,
but I think x86-64 is better. It can reuse more of the existing x86-64 parts (for example X86_64_VEX_0F38E*) than len_table can. If we used len_table instead, another 18 instructions would have to go through len_table, which would also increase the number of entries.

The 18 instructions are:
X86_64_VEX_0F38E0 ~ X86_64_VEX_0F38EF, X86_64_VEX_0F3849, and X86_64_VEX_0F384B.

Thanks,
Lili.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 1/9] Support APX GPR32 with rex2 prefix
  2023-12-25  6:14     ` Cui, Lili
@ 2024-01-04  8:57       ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2024-01-04  8:57 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 25.12.2023 07:14, Cui, Lili wrote:
>> On 19.12.2023 13:12, Cui, Lili wrote:
>>> @@ -7005,6 +7082,43 @@ VEX_check_encoding (const insn_template *t)
>>>    return 0;
>>>  }
>>>
>>> +/* Check if Egprs operands are valid for the instruction.  */
>>> +
>>> +static int
>>> +check_EgprOperands (const insn_template *t)
>>
>> Hmm, I thought I had asked before to make functions with boolean return
>> values have a return type of bool, and then use "true" for success. An
>> alternative would be to return the error indicator, rather than putting it in
>> i.error here.
>>
>> Then again I realize this is in line with VEX_check_encoding() and
>> check_VecOperands() (which I think would better be changed, but anyway).
>>
> 
> Changed it to bool. For the rest, it's a bit strange to only change check_EgprOperands. Can this place be left unchanged? Or should I submit a new patch and change the old one first?

Leaving alone is okay; I'll see about cleaning that up at some later point.
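
To make the two options concrete, here is a minimal sketch in C. All names (sketch_insn, sketch_error, check_egpr_bool, check_egpr_code, sketch_last_error) are hypothetical and are not taken from tc-i386.c; the sketch only contrasts a bool return that records the error out of band with a function that returns the error indicator itself.

```c
#include <stdbool.h>

/* Hypothetical error indicators; not the enum used by gas.  */
enum sketch_error { SKETCH_OK = 0, SKETCH_EGPR_NOT_ALLOWED };

/* Hypothetical stand-in for the instruction being checked.  */
struct sketch_insn {
  bool uses_egpr;            /* an operand names r16..r31 */
  bool template_allows_egpr; /* the matched template permits that */
};

/* Out-of-band error slot, playing the role i.error has in the thread.  */
enum sketch_error sketch_last_error = SKETCH_OK;

/* Option 1: bool return, true means the operands are acceptable.  */
bool
check_egpr_bool (const struct sketch_insn *insn)
{
  if (insn->uses_egpr && !insn->template_allows_egpr)
    {
      sketch_last_error = SKETCH_EGPR_NOT_ALLOWED;
      return false;
    }
  return true;
}

/* Option 2: return the error indicator, leaving no global state behind.  */
enum sketch_error
check_egpr_code (const struct sketch_insn *insn)
{
  if (insn->uses_egpr && !insn->template_allows_egpr)
    return SKETCH_EGPR_NOT_ALLOWED;
  return SKETCH_OK;
}
```

With option 2 the caller decides where the error ends up, which is why it can be changed for one function without touching its siblings.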

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
  2023-12-26  7:00     ` Cui, Lili
@ 2024-01-04  9:01       ` Jan Beulich
  2024-01-04 12:47         ` Cui, Lili
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2024-01-04  9:01 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 26.12.2023 08:00, Cui, Lili wrote:
>>> --- /dev/null
>>> +++ b/opcodes/i386-dis-evex-x86-64.h
>>> @@ -0,0 +1,50 @@
>>> +  /* X86_64_EVEX_0F90 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { VEX_W_TABLE (VEX_W_0F90_L_0) },  },
>>> +  /* X86_64_EVEX_0F91 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { VEX_W_TABLE (VEX_W_0F91_L_0) },  },
>>> +  /* X86_64_EVEX_0F92 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { VEX_W_TABLE (VEX_W_0F92_L_0) },  },
>>> +  /* X86_64_EVEX_0F93 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { VEX_W_TABLE (VEX_W_0F93_L_0) },  },
>>> +  /* X86_64_EVEX_0F38F2 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },  },
>>> +  /* X86_64_EVEX_0F38F3 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },  },
>>> +  /* X86_64_EVEX_0F38F5 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F5_L_0) },  },
>>> +  /* X86_64_EVEX_0F38F6 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE(PREFIX_VEX_0F38F6_L_0) },  },
>>> +  /* X86_64_EVEX_0F38F7 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE(PREFIX_VEX_0F38F7_L_0) },  },
>>> +  /* X86_64_EVEX_0F3AF0 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },  },
>>
>> Am I misremembering that we had agreed that this new file isn't necessary, by
>> having USE_X86_64_EVEX_FROM_VEX_TABLE handle the non-64-bit case? At
>> least I couldn't find a mail from you saying this isn't possible (and why).
>>
> I Prefer not to change the current implement, we need a table that all instructions must go through, it can be x86-64 or len_table,
> but I think x86-64 is better. It can reuse more old parts of x86-64 (for example X86_64_VEX_0F38E*) than len_table. If we use len_table instead, we need to let another 18 instructions through the len_table, this will also add the number of entries.
> 
> 18 instructions are:
> X86_64_VEX_0F38E0~ X86_64_VEX_0F38EF, X86_64_VEX_0F3849, and X86_64_VEX_0F384B.

I don't see this as a (necessary) result. Since the patches were (imo
prematurely) committed already, it'll now be (again) on me to see about
cleaning up. Oh well.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
  2023-12-25 12:23     ` Cui, Lili
@ 2024-01-04  9:08       ` Jan Beulich
  2024-01-04 12:32         ` Cui, Lili
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2024-01-04  9:08 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 25.12.2023 13:23, Cui, Lili wrote:
>> On 19.12.2023 13:12, Cui, Lili wrote:
>>>  #define Vsz256 Vsz=VSZ256
>>>  #define Vsz512 Vsz=VSZ512
>>>
>>> +// The template supports VEX format for cpuid and EVEX format for cpuid &
>> apx_f.
>>> +#define APX_F(cpuid) cpuid&(cpuid|APX_F)
>>
>> I think the comment wants to go into further detail. Please can you go back to
>> read what I said when I suggested this construct, in particular regarding the
>> stripping then done? However, with you not having found a need to fiddle
>> with cpu_flags_match(), I wonder if this construct is needed in the first place.
>> The earlier suggestion was entirely based on the assumption that stripping
>> similar to that for other combined VEX/EVEX templates would be needed here,
>> t
>>
> 
> Seeing this, I realized the problem and checked opcodes/i386-tbl.h, for the following entry, we want to set CpuAPX_F and CpuBMI to 1, but gen.c doesn't seem to support the format "cpuid&(cpuid|APX_F)",

I suppose you mean "cpuid|(cpuid&APX_F)"?

> in fact bzhi sets CpuBMI to 1 and CpuAPX_F to 0 . I'm not familiar with the relevant logic in gen.c and don't know how to debug it. when you have time, could you help take a look ?

Well, now that the patch was committed I of course can easily take a look.
But with such an outstanding issue, how could the patch have been committed
in the first place? And isn't it a requirement for making changes to
i386-gen.c that you sufficiently understand the logic there (which admittedly
has grown non-trivial)? Anyway - I will need to find time to play with what
has been committed, but that's not likely to happen in time for 2.42. Hence
it is quite possible that 2.42 will turn out to have incomplete and/or broken
APX_F support. But perhaps this was intended to be this way by H.J., when he
approved the entire series without even a single comment.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 4/9] Add tests for APX GPR32 with extend evex prefix
  2023-12-25 13:40     ` Cui, Lili
@ 2024-01-04  9:16       ` Jan Beulich
  2024-01-05  6:58         ` Cui, Lili
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2024-01-04  9:16 UTC (permalink / raw)
  To: Cui, Lili, Lu, Hongjiu; +Cc: binutils

On 25.12.2023 14:40, Cui, Lili wrote:
>> On 19.12.2023 13:12, Cui, Lili wrote:
>>> --- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
>>> @@ -16,3 +16,194 @@
>>> +	vroundpd $1,(%r24),%xmm6
>>> +	vroundps $2,(%r24),%xmm6
>>> +	vroundsd $3,(%r24),%xmm6,%xmm3
>>> +	vroundss $4,(%r24),%xmm6,%xmm3
>>
>> These are still here, when they can be expressed.
> 
> I think we should keep it until we complete a reasonably equivalent replacement. 

This replacement should have been introduced in this series, as asked for
more than once. And, H.J., I think you shouldn't have approved this patch
(and more generally any one) with unaddressed review comments.

>>> --- /dev/null
>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
>>> @@ -0,0 +1,34 @@
>>> +# Check Illegal prefix for 64bit EVEX-promoted instructions
>>> +
>>> +        .allow_index_reg
>>> +        .text
>>> +_start:
>>> +	#movbe %r23w,%ax set EVEX.pp = f3 (illegal value).
>>> +	.insn EVEX.L0.f3.M12.W0 0x60, %di, %ax
>>> +	#movbe %r23w,%ax set EVEX.pp = f2 (illegal value).
>>> +	.insn EVEX.L0.f2.M12.W0 0x60, %di, %ax
>>> +	#VSIB vpgatherqq (%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10]
>> == 0
>>> +	#(illegal value).
>>> +	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd
>>> +	.byte 0xff
>>
>> This one's still using .byte and still referencing P[10] in the comment.
>> If that's really unavoidable, then - as elsewhere - the description of the patch
>> could (and should) provide the reason. And there continue to be no
>> separating blank lines, making it as hard as before to find ones way through all
>> of this.
>>
> 
> Copied from the previous comments,
> ------------------------------------------------------------------------
>>>> +	#(illegal value).
>>>> +	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
>>>> +	.byte 0xff
>>>
>>> For the purpose of this test (whatever P[10] again is) you don't need 
>>> a 32-bit displacement, do you? Shorter is (almost always) better in such tests.
>>>
>>
>> P[10] is a fixed value, in normal EVEX format we don't use this bit.  Dropped 0x7b.
> ------------------------------------------------------------------------

This is not addressing my comment. Iirc all bits in the EVEX prefix have
names now in SDM plus APX spec, and hence they should preferably be
referred to by their names rather than their bit positions. Plus there's
still no justification for using .byte instead of .insn.
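
For reference, a small sketch of how the bit positions used in these comments map onto field names. It assumes the usual EVEX payload layout (three bytes after the 0x62 escape, numbered P[23:0] from the first payload byte upwards); the struct and function names are hypothetical, and p10 is simply the bit that is fixed to 1 in the pre-APX layout.

```c
#include <stdint.h>

/* Named view of the three EVEX payload bytes following 0x62.  */
struct evex_fields {
  unsigned map;   /* P[2:0]   opcode map select */
  unsigned pp;    /* P[9:8]   implied SIMD prefix */
  unsigned p10;   /* P[10]    fixed 1 in the pre-APX layout */
  unsigned vvvv;  /* P[14:11] stored inverted */
  unsigned w;     /* P[15]    EVEX.W */
  unsigned aaa;   /* P[18:16] opmask register */
  unsigned b;     /* P[20]    EVEX.b */
  unsigned ll;    /* P[22:21] EVEX.L'L */
  unsigned z;     /* P[23]    EVEX.z */
};

/* Split the payload bytes into the fields above.  */
struct evex_fields
evex_decode (uint8_t p0, uint8_t p1, uint8_t p2)
{
  uint32_t p = (uint32_t) p0 | ((uint32_t) p1 << 8) | ((uint32_t) p2 << 16);
  struct evex_fields f;

  f.map  = p & 0x7;
  f.pp   = (p >> 8) & 0x3;
  f.p10  = (p >> 10) & 0x1;
  f.vvvv = (p >> 11) & 0xf;
  f.w    = (p >> 15) & 0x1;
  f.aaa  = (p >> 16) & 0x7;
  f.b    = (p >> 20) & 0x1;
  f.ll   = (p >> 21) & 0x3;
  f.z    = (p >> 23) & 0x1;
  return f;
}
```

Feeding it the payload bytes 0xe2, 0xf9, 0x41 from the .byte line quoted above gives p10 == 0 with aaa == 1 and ll == 2, i.e. exactly the one bit the test means to flip.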

>>> +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax{%k1}
>>> +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.L'L == 0b01 (illegal value).
>>> +	.insn EVEX.L1.66.M12.W0 0x60, %di, %ax
>>> +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.z == 0b1 (illegal value).
>>> +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax {%k7}{z}
>>> +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[17:16](EVEX.aa)
>> == 0b01
>>> +	#(illegal value).
>>> +	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx{%k1}
>>> +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%ecx EVEX.P[22:21](EVEX.L’L)
>> == 0b01
>>> +	#(illegal value).
>>> +	.insn EVEX.L1.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx
>>> +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[23](EVEX.z) ==
>> 0b1
>>> +	#(illegal value).
>>> +        .insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx
>>> +{%k7}{z}
>>
>> Isn't {%k7} alone rendering the encoding invalid, which was checked for above
>> already? Also - bogus indentation again.
>>
> 
> That is for legacy, this is for vex.

What is "that" here? And how can anything with operand mask applied be
VEX? All the more so when .insn's operand says EVEX? Here you want to set _only_
EVEX.z, without EVEX.aaa being non-zero.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
  2024-01-04  9:08       ` Jan Beulich
@ 2024-01-04 12:32         ` Cui, Lili
  2024-01-04 12:55           ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Cui, Lili @ 2024-01-04 12:32 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, January 4, 2024 5:08 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
> 
> On 25.12.2023 13:23, Cui, Lili wrote:
> >> On 19.12.2023 13:12, Cui, Lili wrote:
> >>>  #define Vsz256 Vsz=VSZ256
> >>>  #define Vsz512 Vsz=VSZ512
> >>>
> >>> +// The template supports VEX format for cpuid and EVEX format for
> >>> +cpuid &
> >> apx_f.
> >>> +#define APX_F(cpuid) cpuid&(cpuid|APX_F)
> >>
> >> I think the comment wants to go into further detail. Please can you
> >> go back to read what I said when I suggested this construct, in
> >> particular regarding the stripping then done? However, with you not
> >> having found a need to fiddle with cpu_flags_match(), I wonder if this
> construct is needed in the first place.
> >> The earlier suggestion was entirely based on the assumption that
> >> stripping similar to that for other combined VEX/EVEX templates would
> >> be needed here, t
> >>
> >
> > Seeing this, I realized the problem and checked opcodes/i386-tbl.h,
> > for the following entry, we want to set CpuAPX_F and CpuBMI to 1, but
> > gen.c doesn't seem to support the format "cpuid&(cpuid|APX_F)",
> 
> I suppose you mean "cpuid|(cpuid&APX_F)"?
> 

I tried cpuid|(cpuid&APX_F). Currently this format is not supported.

> > in fact bzhi sets CpuBMI to 1 and CpuAPX_F to 0 . I'm not familiar with the
> relevant logic in gen.c and don't know how to debug it. when you have time,
> could you help take a look ?
> 
> Well, now that the patch was committed I of course can easily take a look.
> But with such an outstanding issue, how could the patch have been
> committed in the first place? And isn't it a requirement for making changes to
> i386-gen.c that you sufficiently understand the logic there (which admittedly
> has grown non-trivial)? Anyway - I will need to find time to play with what has
> been committed, but that's not likely to happen in time for 2.42. Hence it is
> quite possible that 2.42 will turn out to have incomplete and/or broken APX_F
> support. But perhaps this was intended to be this way by H.J., when he
> approved the entire series without even a single comment.
> 

Although there is a problem here, APX currently handles all cases correctly. Conversely, once this is corrected, cpu_flags_match() will need to be adjusted as well; otherwise some testcases will fail.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
  2024-01-04  9:01       ` Jan Beulich
@ 2024-01-04 12:47         ` Cui, Lili
  0 siblings, 0 replies; 34+ messages in thread
From: Cui, Lili @ 2024-01-04 12:47 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 26.12.2023 08:00, Cui, Lili wrote:
> >>> --- /dev/null
> >>> +++ b/opcodes/i386-dis-evex-x86-64.h
> >>> @@ -0,0 +1,50 @@
> >>> +  /* X86_64_EVEX_0F90 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { VEX_W_TABLE (VEX_W_0F90_L_0) },  },
> >>> +  /* X86_64_EVEX_0F91 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { VEX_W_TABLE (VEX_W_0F91_L_0) },  },
> >>> +  /* X86_64_EVEX_0F92 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { VEX_W_TABLE (VEX_W_0F92_L_0) },  },
> >>> +  /* X86_64_EVEX_0F93 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { VEX_W_TABLE (VEX_W_0F93_L_0) },  },
> >>> +  /* X86_64_EVEX_0F38F2 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },  },
> >>> +  /* X86_64_EVEX_0F38F3 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },  },
> >>> +  /* X86_64_EVEX_0F38F5 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F5_L_0) },  },
> >>> +  /* X86_64_EVEX_0F38F6 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE(PREFIX_VEX_0F38F6_L_0) },  },
> >>> +  /* X86_64_EVEX_0F38F7 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE(PREFIX_VEX_0F38F7_L_0) },  },
> >>> +  /* X86_64_EVEX_0F3AF0 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },  },
> >>
> >> Am I misremembering that we had agreed that this new file isn't
> >> necessary, by having USE_X86_64_EVEX_FROM_VEX_TABLE handle the
> >> non-64-bit case? At least I couldn't find a mail from you saying this isn't
> possible (and why).
> >>
> > I Prefer not to change the current implement, we need a table that all
> > instructions must go through, it can be x86-64 or len_table, but I think x86-
> 64 is better. It can reuse more old parts of x86-64 (for example
> X86_64_VEX_0F38E*) than len_table. If we use len_table instead, we need to
> let another 18 instructions through the len_table, this will also add the
> number of entries.
> >
> > 18 instructions are:
> > X86_64_VEX_0F38E0~ X86_64_VEX_0F38EF, X86_64_VEX_0F3849, and
> X86_64_VEX_0F384B.
> 
> I don't see this as a (necessary) result. Since the patches were (imo
> prematurely) committed already, it'll now be (again) on me to see about
> cleaning up. Oh well.
> 

I don't quite understand; I think this modification is a trade-off.

Lili.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
  2024-01-04 12:32         ` Cui, Lili
@ 2024-01-04 12:55           ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2024-01-04 12:55 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 04.01.2024 13:32, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, January 4, 2024 5:08 PM
>>
>> On 25.12.2023 13:23, Cui, Lili wrote:
>>>> On 19.12.2023 13:12, Cui, Lili wrote:
>>>>>  #define Vsz256 Vsz=VSZ256
>>>>>  #define Vsz512 Vsz=VSZ512
>>>>>
>>>>> +// The template supports VEX format for cpuid and EVEX format for
>>>>> +cpuid &
>>>> apx_f.
>>>>> +#define APX_F(cpuid) cpuid&(cpuid|APX_F)
>>>>
>>>> I think the comment wants to go into further detail. Please can you
>>>> go back to read what I said when I suggested this construct, in
>>>> particular regarding the stripping then done? However, with you not
>>>> having found a need to fiddle with cpu_flags_match(), I wonder if this
>> construct is needed in the first place.
>>>> The earlier suggestion was entirely based on the assumption that
>>>> stripping similar to that for other combined VEX/EVEX templates would
>>>> be needed here, t
>>>>
>>>
>>> Seeing this, I realized the problem and checked opcodes/i386-tbl.h,
>>> for the following entry, we want to set CpuAPX_F and CpuBMI to 1, but
>>> gen.c doesn't seem to support the format "cpuid&(cpuid|APX_F)",
>>
>> I suppose you mean "cpuid|(cpuid&APX_F)"?
>>
> 
> I tried cpuid|(cpuid&APX_F). Currently this format is not supported.

Right, and it cannot be without adding yet another CPU field to the templates
(which I'd like to avoid). Hence why I thought you meant this form, despite
writing the other (recognized) one in your earlier reply.
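
As a side note on the two spellings, a minimal sketch, assuming the expressions are read as plain boolean predicates over "feature enabled" flags (which is not how i386-gen.c parses them). Under that reading both collapse to cpuid by the absorption law, so any difference between them can only come from how the generator splits the expression into separate template fields. The function names are hypothetical.

```c
#include <stdbool.h>

/* Spelling used by the APX_F() macro in the patch: cpuid&(cpuid|APX_F).  */
bool
and_of_or (bool cpuid, bool apx_f)
{
  return cpuid && (cpuid || apx_f);
}

/* The other spelling discussed: cpuid|(cpuid&APX_F).  */
bool
or_of_and (bool cpuid, bool apx_f)
{
  return cpuid || (cpuid && apx_f);
}

/* True when both spellings equal plain cpuid for every input.  */
bool
both_collapse_to_cpuid (void)
{
  for (int c = 0; c <= 1; c++)
    for (int a = 0; a <= 1; a++)
      {
        bool cpuid = c != 0, apx_f = a != 0;

        if (and_of_or (cpuid, apx_f) != cpuid
            || or_of_and (cpuid, apx_f) != cpuid)
          return false;
      }
  return true;
}
```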

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v4 4/9] Add tests for APX GPR32 with extend evex prefix
  2024-01-04  9:16       ` Jan Beulich
@ 2024-01-05  6:58         ` Cui, Lili
  0 siblings, 0 replies; 34+ messages in thread
From: Cui, Lili @ 2024-01-05  6:58 UTC (permalink / raw)
  To: Beulich, Jan, Lu, Hongjiu; +Cc: binutils

> >>> +	vroundpd $1,(%r24),%xmm6
> >>> +	vroundps $2,(%r24),%xmm6
> >>> +	vroundsd $3,(%r24),%xmm6,%xmm3
> >>> +	vroundss $4,(%r24),%xmm6,%xmm3
> >>
> >> These are still here, when they can be expressed.
> >
> > I think we should keep it until we complete a reasonably equivalent
> replacement.
> 
> This replacement should have been introduced in this series, as asked for more
> than once. And, H.J., I think you shouldn't have approved this patch (and
> more generally any one) with unaddressed review comments.
>

We have different opinions on this part; it is currently under discussion.

> >>> --- /dev/null
> >>> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> >>> @@ -0,0 +1,34 @@
> >>> +# Check Illegal prefix for 64bit EVEX-promoted instructions
> >>> +
> >>> +        .allow_index_reg
> >>> +        .text
> >>> +_start:
> >>> +	#movbe %r23w,%ax set EVEX.pp = f3 (illegal value).
> >>> +	.insn EVEX.L0.f3.M12.W0 0x60, %di, %ax
> >>> +	#movbe %r23w,%ax set EVEX.pp = f2 (illegal value).
> >>> +	.insn EVEX.L0.f2.M12.W0 0x60, %di, %ax
> >>> +	#VSIB vpgatherqq (%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10]
> >> == 0
> >>> +	#(illegal value).
> >>> +	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd
> >>> +	.byte 0xff
> >>
> >> This one's still using .byte and still referencing P[10] in the comment.
> >> If that's really unavoidable, then - as elsewhere - the description
> >> of the patch could (and should) provide the reason. And there
> >> continue to be no separating blank lines, making it as hard as before
> >> to find ones way through all of this.
> >>
> >
> > Copied from the previous comments,
> > ----------------------------------------------------------------------
> > --
> >>>> +	#(illegal value).
> >>>> +	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00,
> 0x00
> >>>> +	.byte 0xff
> >>>
> >>> For the purpose of this test (whatever P[10] again is) you don't
> >>> need a 32-bit displacement, do you? Shorter is (almost always) better in
> such tests.
> >>>
> >>
> >> P[10] is a fixed value, in normal EVEX format we don't use this bit.  Dropped
> 0x7b.
> > ----------------------------------------------------------------------
> > --
> 
> This is not addressing my comment. Iirc all bit in the EVEX prefix have names
> now in SDM plus APX spec, and hence they should preferably be referred to by
> their names rather than their bit positions. Plus there's still no justification for
> using .byte instead of .insn.
> 

I will add EVEX.X4 to the comment. As for using .insn instead of .byte, I think the current .insn does not support the new EVEX format, and I don't know how to express the fixed value P[10] with .insn; maybe you have some good suggestions.
(.insn is really hard to use. I admit it has advantages for some test expressions, but I don't think it works for all of them.)

> >>> +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax{%k1}
> >>> +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.L'L == 0b01 (illegal value).
> >>> +	.insn EVEX.L1.66.M12.W0 0x60, %di, %ax
> >>> +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.z == 0b1 (illegal value).
> >>> +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax {%k7}{z}
> >>> +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[17:16](EVEX.aa)
> >> == 0b01
> >>> +	#(illegal value).
> >>> +	.insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx{%k1}
> >>> +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%ecx EVEX.P[22:21](EVEX.L’L)
> >> == 0b01
> >>> +	#(illegal value).
> >>> +	.insn EVEX.L1.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx
> >>> +	#EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[23](EVEX.z) ==
> >> 0b1
> >>> +	#(illegal value).
> >>> +        .insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx
> >>> +{%k7}{z}
> >>
> >> Isn't {%k7} alone rendering the encoding invalid, which was checked
> >> for above already? Also - bogus indentation again.
> >>
> >
> > That is for legacy, this is for vex.
> 
> What is "that" here? And how can anything with operand mask applied be
> VEX? The more that .insn's operand says EVEX? Here you want to set _only_
> EVEX.z, without EVEX.aaa being non-zero.
> 

Of these two test cases, one is for EVEX from legacy and the other is for EVEX from VEX.

        #EVEX_MAP4 movbe %r18w,%ax set EVEX.z == 0b1.
        .insn EVEX.L0.66.M12.W0 0x60, %di, %ax{%k7}{z}

        #EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[23](EVEX.z) == 0b1
        .insn EVEX.L0.NP.0f38.W1 0xf5, %rax, (%rax,%rbx), %rcx {%k7}{z}

If I put only {z} here, I get the error "Only write masking is allowed for clear masking.", so I had to add {%k7}.

Lili.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2024-01-05  6:58 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-19 12:12 [PATCH v4 0/9] Support Intel APX EGPR Cui, Lili
2023-12-19 12:12 ` [PATCH v4 1/9] Support APX GPR32 with rex2 prefix Cui, Lili
2023-12-22 13:08   ` Jan Beulich
2023-12-25  6:14     ` Cui, Lili
2024-01-04  8:57       ` Jan Beulich
2023-12-19 12:12 ` [PATCH v4 2/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-12-19 12:12 ` [PATCH v4 3/9] Support APX GPR32 with extend evex prefix Cui, Lili
2023-12-22 13:49   ` Jan Beulich
2023-12-25 12:23     ` Cui, Lili
2024-01-04  9:08       ` Jan Beulich
2024-01-04 12:32         ` Cui, Lili
2024-01-04 12:55           ` Jan Beulich
2023-12-22 14:19   ` Jan Beulich
2023-12-26  7:00     ` Cui, Lili
2024-01-04  9:01       ` Jan Beulich
2024-01-04 12:47         ` Cui, Lili
2023-12-19 12:12 ` [PATCH v4 4/9] Add tests for " Cui, Lili
2023-12-22 14:41   ` Jan Beulich
2023-12-25 13:40     ` Cui, Lili
2024-01-04  9:16       ` Jan Beulich
2024-01-05  6:58         ` Cui, Lili
2023-12-19 12:12 ` [PATCH v4 5/9] Support APX NDD Cui, Lili
2023-12-19 12:12 ` [PATCH v4 6/9] Support APX Push2/Pop2 Cui, Lili
2023-12-19 12:12 ` [PATCH v4 7/9] Support APX PUSHP/POPP Cui, Lili
2023-12-19 12:12 ` [PATCH v4 8/9] Support APX NDD optimized encoding Cui, Lili
2023-12-19 12:12 ` [PATCH v4 9/9] Support APX JMPABS for disassembler Cui, Lili
2023-12-19 12:35 ` [PATCH v4 0/9] Support Intel APX EGPR Jan Beulich
2023-12-20  8:50   ` Cui, Lili
2023-12-20  8:57     ` Jan Beulich
2023-12-20 10:42       ` Cui, Lili
2023-12-20 11:00         ` Jan Beulich
2023-12-20 11:50           ` Cui, Lili
2023-12-20 12:01             ` Jan Beulich
2023-12-20 12:16               ` Cui, Lili
