public inbox for binutils@sourceware.org
* [PATCH v2 00/14] x86: new .insn directive
@ 2023-03-10 10:17 Jan Beulich
  2023-03-10 10:19 ` [PATCH v2 01/14] x86: introduce " Jan Beulich
                   ` (14 more replies)
  0 siblings, 15 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:17 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

Encoding instructions not yet known to gas is often difficult in code
which actually wants to use them, especially when they take register
or, worse yet, memory operands. Typically people resort to hard-coding
the registers involved, so that things can be expressed via .byte. To
overcome this limitation (to a sufficient degree at least), introduce
.insn. This allows users to specify operands in their "normal" shape
(possibly in slightly altered order). A few peculiarities require two
small syntax extensions; see the implementation or documentation for
details.

In order to re-use enough of the functionality md_assemble() already
uses, some adjustments to existing code were necessary. The one item to
call out here is the partial re-write of build_modrm_byte() (patch 4),
which actually turned out to simplify things. Further tidying which
this makes possible is carried out right away (patches 5 and 6), even
if not strictly related to the .insn work.

I'm pretty sure there are still corner cases which aren't taken care of
correctly. It's also quite possible that I've overlooked further places
in pre-existing code which need tweaking for .insn. People taking a
close look and/or playing with the new functionality would be much
appreciated.

The last patch in the series continues to be RFC, as I'm uncertain
whether we actually want this kind of a testcase.

The main changes in v2 are testsuite adjustments for certain non-Linux
targets, needed because I had not properly re-run the wider tests with
the last few patches in the series in place.

01: introduce .insn directive
02: parse VEX and alike specifiers for .insn
03: parse special opcode modifiers for .insn
04: re-work build_modrm_byte()'s register assignment
05: VexVVVV is now merely a boolean
06: drop "shimm" special case template expansions
07: AT&T: restrict recognition of the "absolute branch" prefix character
08: process instruction operands for .insn
09: handle EVEX Disp8 for .insn
10: allow for multiple immediates in output_disp()
11: handle immediate operands for .insn
12: document .insn
13: convert testcases to use .insn
14: .insn example - VEX-encoded instructions of original Xeon Phi

Jan


* [PATCH v2 01/14] x86: introduce .insn directive
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
@ 2023-03-10 10:19 ` Jan Beulich
  2023-03-10 10:19 ` [PATCH v2 02/14] x86: parse VEX and alike specifiers for .insn Jan Beulich
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:19 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

For starters this deals with only very basic constructs.
---
Subsequently a dot_insn() predicate will be introduced. If deemed better
than parse_insn()'s new parameter, its introduction could be pulled
ahead into here, and the predicate then be used there instead.
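
For illustration only, the way s_insn() dissects the numeric opcode in
this patch can be sketched as a standalone function. This is a
simplified model, not the gas code: struct split, split_opcode() and
the enum are made-up names, the VEX/XOP/EVEX checks are left out, and an
over-wide residual is reported through a flag rather than as_bad().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for gas's SPACE_* constants.  */
enum space { SPACE_BASE, SPACE_0F, SPACE_0F38, SPACE_0F3A };

struct split
{
  uint8_t prefix;       /* 0x66, 0xf3, 0xf2, or 0 if none.  */
  enum space space;     /* Encoding space taken off the front.  */
  unsigned int length;  /* Number of opcode bytes left over.  */
  uint64_t opcode;      /* Value of the opcode bytes left over.  */
  int ok;               /* 0 if more than two opcode bytes remain.  */
};

static struct split
split_opcode (uint64_t val)
{
  struct split s = { 0, SPACE_BASE, 0, 0, 0 };
  unsigned int j;

  /* Count the significant bytes of the value (at least one).  */
  for (j = 1; j < sizeof (val); ++j)
    if (!(val >> (j * 8)))
      break;

  /* Trim off a leading 66/F3/F2 prefix byte, if present.  */
  if (j > 1)
    {
      uint8_t byte = val >> ((j - 1) * 8);

      if (byte == 0x66 || byte == 0xf3 || byte == 0xf2)
        {
          s.prefix = byte;
          val &= ((uint64_t) 1 << (--j * 8)) - 1;
        }
    }

  /* Trim off a leading 0F, 0F 38, or 0F 3A encoding space.  */
  if (j > 1 && (val >> ((j - 1) * 8)) == 0x0f)
    {
      uint8_t byte = val >> ((--j - 1) * 8);

      s.space = SPACE_0F;
      /* 38/3A only select a space if an opcode byte still follows.  */
      if (j > 1 && (byte == 0x38 || byte == 0x3a))
        {
          s.space = byte == 0x38 ? SPACE_0F38 : SPACE_0F3A;
          --j;
        }
      val &= ((uint64_t) 1 << (j * 8)) - 1;
    }

  s.length = j;
  s.opcode = val;
  s.ok = j <= 2;
  return s;
}
```

With the testcase input 0xf30f01e8 (setssbsy) this yields prefix F3,
space 0F, and the two opcode bytes 01 E8; with 0xf390 (pause) it yields
prefix F3 and the single opcode byte 90.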

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -137,6 +137,7 @@ typedef struct
 arch_entry;
 
 static void update_code_flag (int, int);
+static void s_insn (int);
 static void set_code_flag (int);
 static void set_16bit_gcc_code_flag (int);
 static void set_intel_syntax (int);
@@ -159,7 +160,7 @@ static int i386_intel_operand (char *, i
 static int i386_intel_simplify (expressionS *);
 static int i386_intel_parse_name (const char *, expressionS *);
 static const reg_entry *parse_register (char *, char **);
-static const char *parse_insn (const char *, char *);
+static const char *parse_insn (const char *, char *, bool);
 static char *parse_operands (char *, const char *);
 static void swap_operands (void);
 static void swap_2_operands (unsigned int, unsigned int);
@@ -1198,6 +1199,7 @@ const pseudo_typeS md_pseudo_table[] =
   {"bfloat16", float_cons, 'b'},
   {"value", cons, 2},
   {"slong", signed_cons, 4},
+  {"insn", s_insn, 0},
   {"noopt", s_ignore, 0},
   {"optim", s_ignore, 0},
   {"code16gcc", set_16bit_gcc_code_flag, CODE_16BIT},
@@ -4856,6 +4858,20 @@ insert_lfence_before (void)
     }
 }
 
+/* Shared helper for md_assemble() and s_insn().  */
+static void init_globals (void)
+{
+  unsigned int j;
+
+  memset (&i, '\0', sizeof (i));
+  i.rounding.type = rc_none;
+  for (j = 0; j < MAX_OPERANDS; j++)
+    i.reloc[j] = NO_RELOC;
+  memset (disp_expressions, '\0', sizeof (disp_expressions));
+  memset (im_expressions, '\0', sizeof (im_expressions));
+  save_stack_p = save_stack;
+}
+
 /* Helper for md_assemble() to decide whether to prepare for a possible 2nd
    parsing pass. Instead of introducing a rarely use new insn attribute this
    utilizes a common pattern between affected templates. It is deemed
@@ -4888,19 +4904,13 @@ md_assemble (char *line)
   /* Initialize globals.  */
   current_templates = NULL;
  retry:
-  memset (&i, '\0', sizeof (i));
-  i.rounding.type = rc_none;
-  for (j = 0; j < MAX_OPERANDS; j++)
-    i.reloc[j] = NO_RELOC;
-  memset (disp_expressions, '\0', sizeof (disp_expressions));
-  memset (im_expressions, '\0', sizeof (im_expressions));
-  save_stack_p = save_stack;
+  init_globals ();
 
   /* First parse an instruction mnemonic & call i386_operand for the operands.
      We assume that the scrubber has arranged it so that line[0] is the valid
      start of a (possibly prefixed) mnemonic.  */
 
-  end = parse_insn (line, mnemonic);
+  end = parse_insn (line, mnemonic, false);
   if (end == NULL)
     {
       if (pass1_mnem != NULL)
@@ -5439,7 +5449,7 @@ static INLINE bool q_suffix_allowed(cons
 }
 
 static const char *
-parse_insn (const char *line, char *mnemonic)
+parse_insn (const char *line, char *mnemonic, bool prefix_only)
 {
   const char *l = line, *token_start = l;
   char *mnem_p;
@@ -5469,6 +5479,8 @@ parse_insn (const char *line, char *mnem
 	      || (*l != PREFIX_SEPARATOR
 		  && *l != ',')))
 	{
+	  if (prefix_only)
+	    break;
 	  as_bad (_("invalid character %s in mnemonic"),
 		  output_invalid (*l));
 	  return NULL;
@@ -5590,6 +5602,9 @@ parse_insn (const char *line, char *mnem
 	break;
     }
 
+  if (prefix_only)
+    return token_start;
+
   if (!current_templates)
     {
       /* Deprecated functionality (new code should use pseudo-prefixes instead):
@@ -10705,6 +10720,136 @@ signed_cons (int size)
   cons_sign = -1;
 }
 
+static void
+s_insn (int dummy ATTRIBUTE_UNUSED)
+{
+  char mnemonic[MAX_MNEM_SIZE], *line = input_line_pointer;
+  char *saved_ilp = find_end_of_line (line, false), saved_char;
+  const char *end;
+  unsigned int j;
+  valueT val;
+  bool vex = false, xop = false, evex = false;
+  static const templates tt = { &i.tm, &i.tm + 1 };
+
+  init_globals ();
+
+  saved_char = *saved_ilp;
+  *saved_ilp = 0;
+
+  end = parse_insn (line, mnemonic, true);
+  if (end == NULL)
+    {
+  bad:
+      *saved_ilp = saved_char;
+      ignore_rest_of_line ();
+      return;
+    }
+  line += end - line;
+
+  current_templates = &tt;
+  i.tm.mnem_off = MN__insn;
+
+  if (startswith (line, "VEX")
+      && (line[3] == '.' || is_space_char (line[3])))
+    {
+      vex = true;
+      line += 3;
+    }
+  else if (startswith (line, "XOP") && ISDIGIT (line[3]))
+    {
+      char *e;
+      unsigned long n = strtoul (line + 3, &e, 16);
+
+      if (e == line + 5 && n >= 0x08 && n <= 0x1f
+	  && (*e == '.' || is_space_char (*e)))
+	{
+	  xop = true;
+	  line = e;
+	}
+    }
+  else if (startswith (line, "EVEX")
+	   && (line[4] == '.' || is_space_char (line[4])))
+    {
+      evex = true;
+      line += 4;
+    }
+
+  if (vex || xop
+      ? i.vec_encoding == vex_encoding_evex
+      : evex
+	? i.vec_encoding == vex_encoding_vex
+	  || i.vec_encoding == vex_encoding_vex3
+	: i.vec_encoding != vex_encoding_default)
+    {
+      as_bad (_("pseudo-prefix conflicts with encoding specifier"));
+      goto bad;
+    }
+
+  if (line > end && *line == '.')
+    {
+    }
+
+  input_line_pointer = line;
+  val = get_absolute_expression ();
+  line = input_line_pointer;
+
+  for (j = 1; j < sizeof(val); ++j)
+    if (!(val >> (j * 8)))
+      break;
+
+  /* Trim off a prefix if present.  */
+  if (j > 1 && !vex && !xop && !evex)
+    {
+      uint8_t byte = val >> ((j - 1) * 8);
+
+      switch (byte)
+	{
+	case DATA_PREFIX_OPCODE:
+	case REPE_PREFIX_OPCODE:
+	case REPNE_PREFIX_OPCODE:
+	  if (!add_prefix (byte))
+	    goto bad;
+	  val &= ((uint64_t)1 << (--j * 8)) - 1;
+	  break;
+	}
+    }
+
+  /* Trim off encoding space.  */
+  if (j > 1 && !i.tm.opcode_space && (val >> ((j - 1) * 8)) == 0x0f)
+    {
+      uint8_t byte = val >> ((--j - 1) * 8);
+
+      i.tm.opcode_space = SPACE_0F;
+      switch (byte & -(j > 1))
+	{
+	case 0x38:
+	  i.tm.opcode_space = SPACE_0F38;
+	  --j;
+	  break;
+	case 0x3a:
+	  i.tm.opcode_space = SPACE_0F3A;
+	  --j;
+	  break;
+	}
+      val &= ((uint64_t)1 << (j * 8)) - 1;
+    }
+
+  if (j > 2)
+    {
+      as_bad (_("opcode residual (%#"PRIx64") too wide"), val);
+      goto bad;
+    }
+  i.opcode_length = j;
+  i.tm.base_opcode = val;
+
+  output_insn ();
+
+  *saved_ilp = saved_char;
+  input_line_pointer = line;
+
+  demand_empty_rest_of_line ();
+}
+
 #ifdef TE_PE
 static void
 pe_directive_secrel (int dummy ATTRIBUTE_UNUSED)
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -68,6 +68,7 @@ if [gas_32_check] then {
     run_dump_test "intelok"
     run_dump_test "prefix"
     run_list_test "prefix32" "-al"
+    run_dump_test "insn-32"
     run_dump_test "lea"
     run_dump_test "lea16"
     run_dump_test "amd"
@@ -873,6 +874,7 @@ if [gas_64_check] then {
     run_dump_test "x86-64-sysenter-mixed"
     run_dump_test "x86-64-sysenter-amd"
     run_list_test "x86-64-sysenter-amd" "-mamd64"
+    run_dump_test "insn-64"
     run_dump_test "noreg64"
     run_list_test "noreg64"
     run_dump_test "noreg64-data16"
--- /dev/null
+++ b/gas/testsuite/gas/i386/insn-32.d
@@ -0,0 +1,14 @@
+#objdump: -dw
+#name: .insn (32-bit code)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <insn>:
+[ 	]*[a-f0-9]+:	90[ 	]+nop
+[ 	]*[a-f0-9]+:	f3 90[ 	]+pause
+[ 	]*[a-f0-9]+:	f3 90[ 	]+pause
+[ 	]*[a-f0-9]+:	d9 ee[ 	]+fldz
+[ 	]*[a-f0-9]+:	f3 0f 01 e8[ 	]+setssbsy
+#pass
--- /dev/null
+++ b/gas/testsuite/gas/i386/insn-32.s
@@ -0,0 +1,14 @@
+	.text
+insn:
+	# nop
+	.insn 0x90
+
+	# pause
+	.insn 0xf390
+	.insn repe 0x90
+
+	# fldz
+	.insn 0xd9ee
+
+	# setssbsy
+	.insn 0xf30f01e8
--- /dev/null
+++ b/gas/testsuite/gas/i386/insn-64.d
@@ -0,0 +1,14 @@
+#objdump: -dw
+#name: .insn (64-bit code)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <insn>:
+[ 	]*[a-f0-9]+:	90[ 	]+nop
+[ 	]*[a-f0-9]+:	f3 90[ 	]+pause
+[ 	]*[a-f0-9]+:	f3 90[ 	]+pause
+[ 	]*[a-f0-9]+:	d9 ee[ 	]+fldz
+[ 	]*[a-f0-9]+:	f3 0f 01 e8[ 	]+setssbsy
+#pass
--- /dev/null
+++ b/gas/testsuite/gas/i386/insn-64.s
@@ -0,0 +1,14 @@
+	.text
+insn:
+	# nop
+	.insn 0x90
+
+	# pause
+	.insn 0xf390
+	.insn repe 0x90
+
+	# fldz
+	.insn 0xd9ee
+
+	# setssbsy
+	.insn 0xf30f01e8
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -1814,6 +1814,9 @@ process_i386_opcodes (FILE *table)
       l = l1;
     }
 
+  fprintf (table, "  \"\\0\"\".insn\"\n");
+  fprintf (fp, "#define MN__insn %#x\n", offs + 1);
+
   fprintf (table, ";\n");
 
   fclose (fp);
--- a/opcodes/i386-mnem.h
+++ b/opcodes/i386-mnem.h
@@ -2339,3 +2339,4 @@ extern const char i386_mnemonics[];
 #define MN__rex_ 0x469c
 #define MN__evex_ 0x46a2
 #define MN__vex_ 0x46a9
+#define MN__insn 0x46af
--- a/opcodes/i386-tbl.h
+++ b/opcodes/i386-tbl.h
@@ -60199,6 +60199,7 @@ const char i386_mnemonics[] =
   "\0""{rex}"
   "\0""{evex}"
   "\0""{vex}"
+  "\0"".insn"
 ;
 
 /* i386 register table.  */



* [PATCH v2 02/14] x86: parse VEX and alike specifiers for .insn
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
  2023-03-10 10:19 ` [PATCH v2 01/14] x86: introduce " Jan Beulich
@ 2023-03-10 10:19 ` Jan Beulich
  2023-03-10 10:20 ` [PATCH v2 03/14] x86: parse special opcode modifiers " Jan Beulich
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:19 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

All encoding spaces can be used this way; there's a certain risk that
the bits presently reserved could be used for other purposes down the
road, but people using .insn are expected to know what they're doing
anyway. Plus this way there's at least _some_ way to have those bits
set.

For now this will only allow operand-less insns to be encoded this way.
---
For now only numeric parts of the specifiers are handled in a case-
insensitive manner. All other parts have to use upper case letters.
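
As a rough cross-check of the two vzeroall encodings added to the
testcases (c5 fc 77 and c4 e1 7c 77), the prefix bytes can be
recomputed from the specifier fields. The helpers below are made-up
names for illustration and are not part of gas; they follow the VEX
layout as documented in the SDM, with the R/X/B bits and vvvv stored
inverted.

```c
#include <assert.h>
#include <stdint.h>

/* Second byte of a 2-byte (C5) VEX prefix.  R is the non-inverted
   REX.R equivalent, VVVV the register number (0 if unused), L the
   length bit, PP the embedded prefix (0 none, 1 66, 2 F3, 3 F2).  */
static uint8_t
vex2_byte1 (unsigned int r, unsigned int vvvv, unsigned int l,
            unsigned int pp)
{
  assert (r <= 1 && vvvv <= 15 && l <= 1 && pp <= 3);
  return (uint8_t) ((~r & 1) << 7 | (~vvvv & 0xf) << 3 | l << 2 | pp);
}

/* Second byte of a 3-byte (C4) VEX prefix.  RXB holds the
   non-inverted R, X, and B bits (R being bit 2), MMMMM the
   encoding space number.  */
static uint8_t
vex3_byte1 (unsigned int rxb, unsigned int mmmmm)
{
  assert (rxb <= 7 && mmmmm <= 31);
  return (uint8_t) ((~rxb & 7) << 5 | mmmmm);
}

/* Third byte of a 3-byte (C4) VEX prefix.  */
static uint8_t
vex3_byte2 (unsigned int w, unsigned int vvvv, unsigned int l,
            unsigned int pp)
{
  assert (w <= 1 && vvvv <= 15 && l <= 1 && pp <= 3);
  return (uint8_t) (w << 7 | (~vvvv & 0xf) << 3 | l << 2 | pp);
}
```

For "VEX.256.0F.WIG 0x77" (L = 1, no embedded prefix, space 1, vvvv
unused) vex2_byte1 (0, 0, 1, 0) gives 0xfc, while for the {vex3} form
vex3_byte1 (0, 1) and vex3_byte2 (0, 0, 1, 0) give 0xe1 and 0x7c.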

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -307,6 +307,9 @@ struct _i386_insn
     unsigned int prefixes;
     unsigned char prefix[MAX_PREFIXES];
 
+    /* .insn allows for reserved opcode spaces.  */
+    unsigned char insn_opcode_space;
+
     /* Register is in low 3 bits of opcode.  */
     bool short_form;
 
@@ -568,6 +571,9 @@ static expressionS im_expressions[MAX_IM
 /* Current operand we are working on.  */
 static int this_operand = -1;
 
+/* Are we processing a .insn directive?  */
+#define dot_insn() (i.tm.mnem_off == MN__insn)
+
 /* We support four different modes.  FLAG_CODE variable is used to distinguish
    these.  */
 
@@ -3648,6 +3654,8 @@ build_vex_prefix (const insn_template *t
     vector_length = avxscalar;
   else if (i.tm.opcode_modifier.vex == VEX256)
     vector_length = 1;
+  else if (dot_insn () && i.tm.opcode_modifier.vex == VEX128)
+    vector_length = 0;
   else
     {
       unsigned int op;
@@ -3715,7 +3723,9 @@ build_vex_prefix (const insn_template *t
 
       /* The high 3 bits of the second VEX byte are 1's compliment
 	 of RXB bits from REX.  */
-      i.vex.bytes[1] = (~i.rex & 0x7) << 5 | i.tm.opcode_space;
+      i.vex.bytes[1] = ((~i.rex & 7) << 5)
+		       | (!dot_insn () ? i.tm.opcode_space
+				       : i.insn_opcode_space);
 
       i.vex.bytes[2] = (w << 7
 			| register_specifier << 3
@@ -3851,7 +3861,9 @@ build_evex_prefix (void)
      bits from REX.  */
   gas_assert (i.tm.opcode_space >= SPACE_0F);
   gas_assert (i.tm.opcode_space <= SPACE_EVEXMAP6);
-  i.vex.bytes[1] = (~i.rex & 0x7) << 5 | i.tm.opcode_space;
+  i.vex.bytes[1] = ((~i.rex & 7) << 5)
+		   | (!dot_insn () ? i.tm.opcode_space
+				   : i.insn_opcode_space);
 
   /* The fifth bit of the second EVEX byte is 1's compliment of the
      REX_R bit in VREX.  */
@@ -3968,6 +3980,13 @@ build_evex_prefix (void)
 	case EVEX512:
 	  vec_length = 2 << 5;
 	  break;
+	case EVEX_L3:
+	  if (dot_insn ())
+	    {
+	      vec_length = 3 << 5;
+	      break;
+	    }
+	  /* Fall through.  */
 	default:
 	  abort ();
 	  break;
@@ -10742,6 +10761,7 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
   bad:
       *saved_ilp = saved_char;
       ignore_rest_of_line ();
+      i.tm.mnem_off = 0;
       return;
     }
   line += end - line;
@@ -10764,6 +10784,9 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
 	  && (*e == '.' || is_space_char (*e)))
 	{
 	  xop = true;
+	  /* Arrange for build_vex_prefix() to emit 0x8f.  */
+	  i.tm.opcode_space = SPACE_XOP08;
+	  i.insn_opcode_space = n;
 	  line = e;
 	}
     }
@@ -10787,6 +10810,188 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
 
   if (line > end && *line == '.')
     {
+      /* Length specifier (VEX.L, XOP.L, EVEX.L'L).  */
+      switch (line[1])
+	{
+	case 'L':
+	  switch (line[2])
+	    {
+	    case '0':
+	      if (evex)
+		i.tm.opcode_modifier.evex = EVEX128;
+	      else
+		i.tm.opcode_modifier.vex = VEX128;
+	      break;
+
+	    case '1':
+	      if (evex)
+		i.tm.opcode_modifier.evex = EVEX256;
+	      else
+		i.tm.opcode_modifier.vex = VEX256;
+	      break;
+
+	    case '2':
+	      if (evex)
+		i.tm.opcode_modifier.evex = EVEX512;
+	      break;
+
+	    case '3':
+	      if (evex)
+		i.tm.opcode_modifier.evex = EVEX_L3;
+	      break;
+
+	    case 'I':
+	      if (line[3] == 'G')
+		{
+		  if (evex)
+		    i.tm.opcode_modifier.evex = EVEXLIG;
+		  else
+		    i.tm.opcode_modifier.vex = VEXScalar; /* LIG */
+		  ++line;
+		}
+	      break;
+	    }
+
+	  if (i.tm.opcode_modifier.vex || i.tm.opcode_modifier.evex)
+	    line += 3;
+	  break;
+
+	case '1':
+	  if (line[2] == '2' && line[3] == '8')
+	    {
+	      if (evex)
+		i.tm.opcode_modifier.evex = EVEX128;
+	      else
+		i.tm.opcode_modifier.vex = VEX128;
+	      line += 4;
+	    }
+	  break;
+
+	case '2':
+	  if (line[2] == '5' && line[3] == '6')
+	    {
+	      if (evex)
+		i.tm.opcode_modifier.evex = EVEX256;
+	      else
+		i.tm.opcode_modifier.vex = VEX256;
+	      line += 4;
+	    }
+	  break;
+
+	case '5':
+	  if (evex && line[2] == '1' && line[3] == '2')
+	    {
+	      i.tm.opcode_modifier.evex = EVEX512;
+	      line += 4;
+	    }
+	  break;
+	}
+    }
+
+  if (line > end && *line == '.')
+    {
+      /* embedded prefix (VEX.pp, XOP.pp, EVEX.pp).  */
+      switch (line[1])
+	{
+	case 'N':
+	  if (line[2] == 'P')
+	    line += 3;
+	  break;
+
+	case '6':
+	  if (line[2] == '6')
+	    {
+	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
+	      line += 3;
+	    }
+	  break;
+
+	case 'F': case 'f':
+	  if (line[2] == '3')
+	    {
+	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0XF3;
+	      line += 3;
+	    }
+	  else if (line[2] == '2')
+	    {
+	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0XF2;
+	      line += 3;
+	    }
+	  break;
+	}
+    }
+
+  if (line > end && !xop && *line == '.')
+    {
+      /* Encoding space (VEX.mmmmm, EVEX.mmmm).  */
+      switch (line[1])
+	{
+	case '0':
+	  if (TOUPPER (line[2]) != 'F')
+	    break;
+	  if (line[3] == '.' || is_space_char (line[3]))
+	    {
+	      i.insn_opcode_space = SPACE_0F;
+	      line += 3;
+	    }
+	  else if (line[3] == '3'
+		   && (line[4] == '8' || TOUPPER (line[4]) == 'A')
+		   && (line[5] == '.' || is_space_char (line[5])))
+	    {
+	      i.insn_opcode_space = line[4] == '8' ? SPACE_0F38 : SPACE_0F3A;
+	      line += 5;
+	    }
+	  break;
+
+	case 'M':
+	  if (ISDIGIT (line[2]) && line[2] != '0')
+	    {
+	      char *e;
+	      unsigned long n = strtoul (line + 2, &e, 10);
+
+	      if (n <= (evex ? 15 : 31)
+		  && (*e == '.' || is_space_char (*e)))
+		{
+		  i.insn_opcode_space = n;
+		  line = e;
+		}
+	    }
+	  break;
+	}
+    }
+
+  if (line > end && *line == '.' && line[1] == 'W')
+    {
+      /* VEX.W, XOP.W, EVEX.W  */
+      switch (line[2])
+	{
+	case '0':
+	  i.tm.opcode_modifier.vexw = VEXW0;
+	  break;
+
+	case '1':
+	  i.tm.opcode_modifier.vexw = VEXW1;
+	  break;
+
+	case 'I':
+	  if (line[3] == 'G')
+	    {
+	      i.tm.opcode_modifier.vexw = VEXWIG;
+	      ++line;
+	    }
+	  break;
+	}
+
+      if (i.tm.opcode_modifier.vexw)
+	line += 3;
+    }
+
+  if (line > end && *line && !is_space_char (*line))
+    {
+      /* Improve diagnostic a little.  */
+      if (*line == '.' && line[1] && !is_space_char (line[1]))
+	++line;
+      goto done;
     }
 
   input_line_pointer = line;
@@ -10815,24 +11020,30 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
     }
 
   /* Trim off encoding space.  */
-  if (j > 1 && !i.tm.opcode_space && (val >> ((j - 1) * 8)) == 0x0f)
+  if (j > 1 && !i.insn_opcode_space && (val >> ((j - 1) * 8)) == 0x0f)
     {
       uint8_t byte = val >> ((--j - 1) * 8);
 
-      i.tm.opcode_space = SPACE_0F;
+      i.insn_opcode_space = SPACE_0F;
       switch (byte & -(j > 1))
 	{
 	case 0x38:
-	  i.tm.opcode_space = SPACE_0F38;
+	  i.insn_opcode_space = SPACE_0F38;
 	  --j;
 	  break;
 	case 0x3a:
-	  i.tm.opcode_space = SPACE_0F3A;
+	  i.insn_opcode_space = SPACE_0F3A;
 	  --j;
 	  break;
 	}
+      i.tm.opcode_space = i.insn_opcode_space;
       val &= ((uint64_t)1 << (j * 8)) - 1;
     }
+  if (!i.tm.opcode_space && (vex || evex))
+    /* Arrange for build_vex_prefix() to properly emit 0xC4/0xC5.
+       Also avoid hitting abort() there or in build_evex_prefix().  */
+    i.tm.opcode_space = i.insn_opcode_space == SPACE_0F ? SPACE_0F
+						   : SPACE_0F38;
 
   if (j > 2)
     {
@@ -10842,12 +11053,33 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
   i.opcode_length = j;
   i.tm.base_opcode = val;
 
+  if (vex || xop)
+    {
+      if (!i.tm.opcode_modifier.vex)
+	i.tm.opcode_modifier.vex = VEXScalar; /* LIG */
+
+      build_vex_prefix (NULL);
+      i.rex &= REX_OPCODE;
+    }
+  else if (evex)
+    {
+      if (!i.tm.opcode_modifier.evex)
+	i.tm.opcode_modifier.evex = EVEXLIG;
+
+      build_evex_prefix ();
+      i.rex &= REX_OPCODE;
+    }
+
   output_insn ();
 
+ done:
   *saved_ilp = saved_char;
   input_line_pointer = line;
 
   demand_empty_rest_of_line ();
+
+  /* Make sure dot_insn() won't yield "true" anymore.  */
+  i.tm.mnem_off = 0;
 }
 
 #ifdef TE_PE
--- a/gas/testsuite/gas/i386/insn-32.d
+++ b/gas/testsuite/gas/i386/insn-32.d
@@ -11,4 +11,6 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f3 90[ 	]+pause
 [ 	]*[a-f0-9]+:	d9 ee[ 	]+fldz
 [ 	]*[a-f0-9]+:	f3 0f 01 e8[ 	]+setssbsy
+[ 	]*[a-f0-9]+:	c5 fc 77[ 	]+vzeroall
+[ 	]*[a-f0-9]+:	c4 e1 7c 77[ 	]+vzeroall
 #pass
--- a/gas/testsuite/gas/i386/insn-32.s
+++ b/gas/testsuite/gas/i386/insn-32.s
@@ -12,3 +12,7 @@ insn:
 
 	# setssbsy
 	.insn 0xf30f01e8
+
+	# vzeroall
+	.insn VEX.256.0F.WIG 0x77
+	.insn {vex3} VEX.L1 0x0f77
--- a/gas/testsuite/gas/i386/insn-64.d
+++ b/gas/testsuite/gas/i386/insn-64.d
@@ -11,4 +11,6 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f3 90[ 	]+pause
 [ 	]*[a-f0-9]+:	d9 ee[ 	]+fldz
 [ 	]*[a-f0-9]+:	f3 0f 01 e8[ 	]+setssbsy
+[ 	]*[a-f0-9]+:	c5 fc 77[ 	]+vzeroall
+[ 	]*[a-f0-9]+:	c4 e1 7c 77[ 	]+vzeroall
 #pass
--- a/gas/testsuite/gas/i386/insn-64.s
+++ b/gas/testsuite/gas/i386/insn-64.s
@@ -12,3 +12,7 @@ insn:
 
 	# setssbsy
 	.insn 0xf30f01e8
+
+	# vzeroall
+	.insn VEX.256.0F.WIG 0x77
+	.insn {vex3} VEX.L1 0x0f77
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -643,12 +643,14 @@ enum
 	3: 256bit EVEX prefix.
 	4: Length-ignored (LIG) EVEX prefix.
 	5: Length determined from actual operands.
+	6: L'L = 3 (reserved, .insn only)
    */
 #define EVEX512                1
 #define EVEX128                2
 #define EVEX256                3
 #define EVEXLIG                4
 #define EVEXDYN                5
+#define EVEX_L3                6
   EVex,
 
   /* AVX512 masking support:



* [PATCH v2 03/14] x86: parse special opcode modifiers for .insn
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
  2023-03-10 10:19 ` [PATCH v2 01/14] x86: introduce " Jan Beulich
  2023-03-10 10:19 ` [PATCH v2 02/14] x86: parse VEX and alike specifiers for .insn Jan Beulich
@ 2023-03-10 10:20 ` Jan Beulich
  2023-03-10 10:21 ` [PATCH v2 04/14] x86: re-work build_modrm_byte()'s register assignment Jan Beulich
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:20 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

So-called "short form" encoding is specified by a trailing "+r", whereas
a possible extension opcode is specified by the usual "/<digit>". Strip
these from the expression before handing it to
get_absolute_expression().

Note that on targets where / starts a comment, --divide needs to be
passed to gas in order to make use of the extension opcode
functionality.
---
I don't think it makes sense to further complicate things and also
consider the use of quotation inside the major opcode expression.
Ambiguities in particular with "/<digit>" can easily be resolved by
simply parenthesizing the actual expression part of the construct.
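
To illustrate the scan, here is a standalone approximation of the loop
added below. It is only a sketch: struct modifiers, scan_modifiers()
and the helpers are made-up names, blanks are assumed to be spaces or
tabs, and "op" in the examples merely stands in for whatever operand
text follows the comma.

```c
#include <assert.h>
#include <string.h>

struct modifiers
{
  int short_form;  /* Trailing "+r" seen.  */
  int extension;   /* Digit of trailing "/<digit>", or -1 if none.  */
};

static int
blank (char c)
{
  return c == ' ' || c == '\t';
}

/* True if P points at a comma, optionally preceded by one blank.  */
static int
before_comma (const char *p)
{
  return p[0] == ',' || (blank (p[0]) && p[1] == ',');
}

/* Look for "+r" or "/<digit>" right ahead of the first comma in LINE
   and blank it out, so that only the expression proper is left.  */
static struct modifiers
scan_modifiers (char *line)
{
  struct modifiers m = { 0, -1 };
  char *ptr;

  assert (line != NULL);
  for (ptr = line; ; ++ptr)
    {
      ptr = strpbrk (ptr, "+/,");
      if (ptr == NULL || *ptr == ',')
        break;

      if (*ptr == '+' && ptr[1] == 'r' && before_comma (ptr + 2))
        {
          m.short_form = 1;
          ptr[0] = ptr[1] = ' ';
          break;
        }

      if (*ptr == '/' && ptr[1] >= '0' && ptr[1] <= '7'
          && before_comma (ptr + 2))
        {
          m.extension = ptr[1] - '0';
          ptr[0] = ptr[1] = ' ';
          break;
        }
    }
  return m;
}
```

Given "0xff/1, op" this reports extension opcode 1 and leaves
"0xff  , op" behind, whereas "(8/2), op" is left alone because the
division is not directly followed by the comma.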

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -10742,7 +10742,7 @@ signed_cons (int size)
 static void
 s_insn (int dummy ATTRIBUTE_UNUSED)
 {
-  char mnemonic[MAX_MNEM_SIZE], *line = input_line_pointer;
+  char mnemonic[MAX_MNEM_SIZE], *line = input_line_pointer, *ptr;
   char *saved_ilp = find_end_of_line (line, false), saved_char;
   const char *end;
   unsigned int j;
@@ -10768,6 +10768,7 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
 
   current_templates = &tt;
   i.tm.mnem_off = MN__insn;
+  i.tm.extension_opcode = None;
 
   if (startswith (line, "VEX")
       && (line[3] == '.' || is_space_char (line[3])))
@@ -10994,10 +10995,46 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
       goto done;
     }
 
+  /* Before processing the opcode expression, find trailing "+r" or
+     "/<digit>" specifiers.  */
+  for (ptr = line; ; ++ptr)
+    {
+      unsigned long n;
+      char *e;
+
+      ptr = strpbrk (ptr, "+/,");
+      if (ptr == NULL || *ptr == ',')
+	break;
+
+      if (*ptr == '+' && ptr[1] == 'r'
+	  && (ptr[2] == ',' || (is_space_char (ptr[2]) && ptr[3] == ',')))
+	{
+	  *ptr = ' ';
+	  ptr[1] = ' ';
+	  i.short_form = true;
+	  break;
+	}
+
+      if (*ptr == '/' && ISDIGIT (ptr[1])
+	  && (n = strtoul (ptr + 1, &e, 8)) < 8
+	  && e == ptr + 2
+	  && (ptr[2] == ',' || (is_space_char (ptr[2]) && ptr[3] == ',')))
+	{
+	  *ptr = ' ';
+	  ptr[1] = ' ';
+	  i.tm.extension_opcode = n;
+	  i.tm.opcode_modifier.modrm = 1;
+	  break;
+	}
+    }
+
   input_line_pointer = line;
   val = get_absolute_expression ();
   line = input_line_pointer;
 
+  if (i.short_form && (val & 7))
+    as_warn ("`+r' assumes low three opcode bits to be clear");
+
   for (j = 1; j < sizeof(val); ++j)
     if (!(val >> (j * 8)))
       break;



* [PATCH v2 04/14] x86: re-work build_modrm_byte()'s register assignment
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (2 preceding siblings ...)
  2023-03-10 10:20 ` [PATCH v2 03/14] x86: parse special opcode modifiers " Jan Beulich
@ 2023-03-10 10:21 ` Jan Beulich
  2023-03-10 10:21 ` [PATCH v2 05/14] x86: VexVVVV is now merely a boolean Jan Beulich
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:21 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

The function has accumulated a number of special cases for no real
reason. Some were necessary because insn attributes (SwapSources in
particular) weren't suitably utilized instead. Note that the addition of
SwapSources actually increases consistency among the templates: Like
others which already have the attribute, these are all insns where the
VEX.VVVV-encoded register comes first (or last when looking at the SDM).

Note that the vexvvvv attribute now has merely boolean meaning, in line
with the SDM long having dropped the NDS/NDD/DDS concept of identifying
encoding variants. The fallout will be taken care of subsequently,
though, so as not to further clutter the change here.

As to the TILEZERO special case: If more instructions like this
appeared, a new attribute would likely be the way to go. But as long as
it's only a single insn, going from the mnemonic is cheaper.
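
As a small illustration of why TILEZERO is special, consider how its
ModR/M byte is put together. The helpers are hypothetical and not taken
from gas; they merely follow the standard mod/reg/rm layout, with the
faked source operand accounting for the zero in the rm field.

```c
#include <assert.h>
#include <stdint.h>

/* Compose a ModR/M byte from its three fields.  */
static uint8_t
modrm (unsigned int mod, unsigned int reg, unsigned int rm)
{
  assert (mod <= 3 && reg <= 7 && rm <= 7);
  return (uint8_t) (mod << 6 | reg << 3 | rm);
}

/* TILEZERO: the sole operand lands in ModR/M.reg, while ModR/M.rm is
   0 and mod is 3 (register form).  */
static uint8_t
tilezero_modrm (unsigned int tmm)
{
  return modrm (3, tmm, 0);
}
```

For %tmm1 this gives 0xc8, i.e. the register number sits in bits 3-5
rather than in the low three bits.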

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -8056,6 +8056,22 @@ process_operands (void)
   else if (i.tm.opcode_modifier.immext)
     process_immext ();
 
+  /* TILEZERO is unusual in that it has a single operand encoded in ModR/M.reg,
+     not ModR/M.rm.  To avoid special casing this in build_modrm_byte(), fake a
+     new destination operand here, while converting the source one to register
+     number 0.  */
+  if (i.tm.mnem_off == MN_tilezero)
+    {
+      i.op[1].regs = i.op[0].regs;
+      i.op[0].regs -= i.op[0].regs->reg_num;
+      i.types[1] = i.types[0];
+      i.tm.operand_types[1] = i.tm.operand_types[0];
+      i.flags[1] = i.flags[0];
+      i.operands++;
+      i.reg_operands++;
+      i.tm.operands++;
+    }
+
   if (i.tm.opcode_modifier.sse2avx && i.tm.opcode_modifier.vexvvvv)
     {
       static const i386_operand_type regxmm = {
@@ -8287,17 +8303,27 @@ static const reg_entry *
 build_modrm_byte (void)
 {
   const reg_entry *default_seg = NULL;
-  unsigned int source, dest;
-  bool vex_3_sources = (i.reg_operands + i.mem_operands == 4);
+  unsigned int source = i.imm_operands - i.tm.opcode_modifier.immext
+			/* Compensate for kludge in md_assemble().  */
+			+ i.tm.operand_types[0].bitfield.imm1;
+  unsigned int dest = i.operands - 1 - i.tm.opcode_modifier.immext;
+  unsigned int v, op, reg_slot = ~0;
+
+  /* Accumulator (in particular %st), shift count (%cl), and alike need
+     to be skipped just like immediate operands do.  */
+  if (i.tm.operand_types[source].bitfield.instance)
+    ++source;
+  while (i.tm.operand_types[dest].bitfield.instance)
+    --dest;
+
+  for (op = source; op < i.operands; ++op)
+    if (i.tm.operand_types[op].bitfield.baseindex)
+      break;
 
-  if (vex_3_sources)
+  if (i.reg_operands + i.mem_operands + (i.tm.extension_opcode != None) == 4)
     {
-      unsigned int nds, reg_slot;
       expressionS *exp;
 
-      dest = i.operands - 1;
-      nds = dest - 1;
-
       /* There are 2 kinds of instructions:
 	 1. 5 operands: 4 register operands or 3 register operands
 	 plus 1 memory operand plus one Imm4 operand, VexXDS, and
@@ -8309,18 +8335,12 @@ build_modrm_byte (void)
 		  && i.tm.opcode_modifier.vexw
 		  && i.tm.operand_types[dest].bitfield.class == RegSIMD);
 
-      /* If VexW1 is set, the first non-immediate operand is the source and
-	 the second non-immediate one is encoded in the immediate operand.  */
-      if (i.tm.opcode_modifier.vexw == VEXW1)
-	{
-	  source = i.imm_operands;
-	  reg_slot = i.imm_operands + 1;
-	}
+      /* Of the first two non-immediate operands the one with the template
+	 not allowing for a memory one is encoded in the immediate operand.  */
+      if (source == op)
+	reg_slot = source + 1;
       else
-	{
-	  source = i.imm_operands + 1;
-	  reg_slot = i.imm_operands;
-	}
+	reg_slot = source++;
 
       if (i.imm_operands == 0)
 	{
@@ -8350,159 +8370,42 @@ build_modrm_byte (void)
 	      |= register_number (i.op[reg_slot].regs) << 4;
 	  gas_assert ((i.op[reg_slot].regs->reg_flags & RegVRex) == 0);
 	}
-
-      gas_assert (i.tm.operand_types[nds].bitfield.class == RegSIMD);
-      i.vex.register_specifier = i.op[nds].regs;
     }
-  else
-    source = dest = 0;
-
-  /* i.reg_operands MUST be the number of real register operands;
-     implicit registers do not count.  If there are 3 register
-     operands, it must be a instruction with VexNDS.  For a
-     instruction with VexNDD, the destination register is encoded
-     in VEX prefix.  If there are 4 register operands, it must be
-     a instruction with VEX prefix and 3 sources.  */
-  if (i.mem_operands == 0
-      && ((i.reg_operands == 2
-	   && i.tm.opcode_modifier.vexvvvv <= VEXXDS)
-	  || (i.reg_operands == 3
-	      && i.tm.opcode_modifier.vexvvvv == VEXXDS)
-	  || (i.reg_operands == 4 && vex_3_sources)))
-    {
-      switch (i.operands)
-	{
-	case 2:
-	  source = 0;
-	  break;
-	case 3:
-	  /* When there are 3 operands, one of them may be immediate,
-	     which may be the first or the last operand.  Otherwise,
-	     the first operand must be shift count register (cl) or it
-	     is an instruction with VexNDS. */
-	  gas_assert (i.imm_operands == 1
-		      || (i.imm_operands == 0
-			  && (i.tm.opcode_modifier.vexvvvv == VEXXDS
-			      || (i.types[0].bitfield.instance == RegC
-				  && i.types[0].bitfield.byte))));
-	  if (operand_type_check (i.types[0], imm)
-	      || (i.types[0].bitfield.instance == RegC
-		  && i.types[0].bitfield.byte))
-	    source = 1;
-	  else
-	    source = 0;
-	  break;
-	case 4:
-	  /* When there are 4 operands, the first two must be 8bit
-	     immediate operands. The source operand will be the 3rd
-	     one.
-
-	     For instructions with VexNDS, if the first operand
-	     an imm8, the source operand is the 2nd one.  If the last
-	     operand is imm8, the source operand is the first one.  */
-	  gas_assert ((i.imm_operands == 2
-		       && i.types[0].bitfield.imm8
-		       && i.types[1].bitfield.imm8)
-		      || (i.tm.opcode_modifier.vexvvvv == VEXXDS
-			  && i.imm_operands == 1
-			  && (i.types[0].bitfield.imm8
-			      || i.types[0].bitfield.imm8s
-			      || i.types[i.operands - 1].bitfield.imm8)));
-	  if (i.imm_operands == 2)
-	    source = 2;
-	  else
-	    {
-	      if (i.types[0].bitfield.imm8)
-		source = 1;
-	      else
-		source = 0;
-	    }
-	  break;
-	case 5:
-	  gas_assert (!is_evex_encoding (&i.tm));
-	  gas_assert (i.imm_operands == 1 && vex_3_sources);
-	  break;
-	default:
-	  abort ();
-	}
-
-      if (!vex_3_sources)
-	{
-	  dest = source + 1;
-
-	  if (i.tm.opcode_modifier.vexvvvv == VEXXDS)
-	    {
-	      /* For instructions with VexNDS, the register-only source
-		 operand must be a 32/64bit integer, XMM, YMM, ZMM, or mask
-		 register.  It is encoded in VEX prefix.  */
 
-	      i386_operand_type op;
-	      unsigned int vvvv;
-
-	      /* Swap two source operands if needed.  */
-	      if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES)
-		{
-		  vvvv = source;
-		  source = dest;
-		}
-	      else
-		vvvv = dest;
+  for (v = source + 1; v < dest; ++v)
+    if (v != reg_slot)
+      break;
+  if (v >= dest)
+    v = ~0;
+  if (i.tm.extension_opcode != None)
+    {
+      if (dest != source)
+	v = dest;
+      dest = ~0;
+    }
+  gas_assert (source < dest);
+  if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
+      && source != op)
+    {
+      unsigned int tmp = source;
 
-	      op = i.tm.operand_types[vvvv];
-	      if ((dest + 1) >= i.operands
-		  || ((op.bitfield.class != Reg
-		       || (!op.bitfield.dword && !op.bitfield.qword))
-		      && op.bitfield.class != RegSIMD
-		      && op.bitfield.class != RegMask))
-		abort ();
-	      i.vex.register_specifier = i.op[vvvv].regs;
-	      dest++;
-	    }
-	}
+      source = v;
+      v = tmp;
+    }
 
-      i.rm.mode = 3;
-      /* One of the register operands will be encoded in the i.rm.reg
-	 field, the other in the combined i.rm.mode and i.rm.regmem
-	 fields.  If no form of this instruction supports a memory
-	 destination operand, then we assume the source operand may
-	 sometimes be a memory operand and so we need to store the
-	 destination in the i.rm.reg field.  */
-      if (!i.tm.opcode_modifier.regmem
-	  && operand_type_check (i.tm.operand_types[dest], anymem) == 0)
-	{
-	  i.rm.reg = i.op[dest].regs->reg_num;
-	  i.rm.regmem = i.op[source].regs->reg_num;
-	  set_rex_vrex (i.op[dest].regs, REX_R, i.tm.opcode_modifier.sse2avx);
-	  set_rex_vrex (i.op[source].regs, REX_B, false);
-	}
-      else
-	{
-	  i.rm.reg = i.op[source].regs->reg_num;
-	  i.rm.regmem = i.op[dest].regs->reg_num;
-	  set_rex_vrex (i.op[dest].regs, REX_B, i.tm.opcode_modifier.sse2avx);
-	  set_rex_vrex (i.op[source].regs, REX_R, false);
-	}
-      if (flag_code != CODE_64BIT && (i.rex & REX_R))
-	{
-	  if (i.types[!i.tm.opcode_modifier.regmem].bitfield.class != RegCR)
-	    abort ();
-	  i.rex &= ~REX_R;
-	  add_prefix (LOCK_PREFIX_OPCODE);
-	}
+  if (v < MAX_OPERANDS)
+    {
+      gas_assert (i.tm.opcode_modifier.vexvvvv);
+      i.vex.register_specifier = i.op[v].regs;
     }
-  else
-    {			/* If it's not 2 reg operands...  */
-      unsigned int mem;
 
+  if (op < i.operands)
+    {
       if (i.mem_operands)
 	{
 	  unsigned int fake_zero_displacement = 0;
-	  unsigned int op;
 
-	  for (op = 0; op < i.operands; op++)
-	    if (i.flags[op] & Operand_Mem)
-	      break;
-	  gas_assert (op < i.operands);
+	  gas_assert (i.flags[op] & Operand_Mem);
 
 	  if (i.tm.opcode_modifier.sib)
 	    {
@@ -8732,140 +8635,62 @@ build_modrm_byte (void)
 	      exp->X_add_symbol = (symbolS *) 0;
 	      exp->X_op_symbol = (symbolS *) 0;
 	    }
+	}
+    else
+	{
+      i.rm.mode = 3;
+      i.rm.regmem = i.op[op].regs->reg_num;
+      set_rex_vrex (i.op[op].regs, REX_B, false);
+	}
 
-	  mem = op;
+      if (op == dest)
+	dest = ~0;
+      if (op == source)
+	source = ~0;
+    }
+  else
+    {
+      i.rm.mode = 3;
+      if (!i.tm.opcode_modifier.regmem)
+	{
+	  gas_assert (source < MAX_OPERANDS);
+	  i.rm.regmem = i.op[source].regs->reg_num;
+	  set_rex_vrex (i.op[source].regs, REX_B,
+			dest >= MAX_OPERANDS && i.tm.opcode_modifier.sse2avx);
+	  source = ~0;
 	}
       else
-	mem = ~0;
-
-      if (i.tm.opcode_modifier.vexvvvv == VEXLWP)
 	{
-	  i.vex.register_specifier = i.op[2].regs;
-	  if (!i.mem_operands)
-	    {
-	      i.rm.mode = 3;
-	      i.rm.regmem = i.op[1].regs->reg_num;
-	      if ((i.op[1].regs->reg_flags & RegRex) != 0)
-		i.rex |= REX_B;
-	    }
+	  gas_assert (dest < MAX_OPERANDS);
+	  i.rm.regmem = i.op[dest].regs->reg_num;
+	  set_rex_vrex (i.op[dest].regs, REX_B, i.tm.opcode_modifier.sse2avx);
+	  dest = ~0;
 	}
-      /* Fill in i.rm.reg or i.rm.regmem field with register operand
-	 (if any) based on i.tm.extension_opcode.  Again, we must be
-	 careful to make sure that segment/control/debug/test/MMX
-	 registers are coded into the i.rm.reg field.  */
-      else if (i.reg_operands)
-	{
-	  unsigned int op;
-	  unsigned int vex_reg = ~0;
-
-	  for (op = 0; op < i.operands; op++)
-	    if (i.types[op].bitfield.class == Reg
-		|| i.types[op].bitfield.class == RegBND
-		|| i.types[op].bitfield.class == RegMask
-		|| i.types[op].bitfield.class == SReg
-		|| i.types[op].bitfield.class == RegCR
-		|| i.types[op].bitfield.class == RegDR
-		|| i.types[op].bitfield.class == RegTR
-		|| i.types[op].bitfield.class == RegSIMD
-		|| i.types[op].bitfield.class == RegMMX)
-	      break;
-
-	  if (vex_3_sources)
-	    op = dest;
-	  else if (i.tm.opcode_modifier.vexvvvv == VEXXDS)
-	    {
-	      /* For instructions with VexNDS, the register-only
-		 source operand is encoded in VEX prefix. */
-	      gas_assert (mem != (unsigned int) ~0);
-
-	      if (op > mem || i.tm.cpu_flags.bitfield.cpucmpccxadd)
-		{
-		  vex_reg = op++;
-		  gas_assert (op < i.operands);
-		}
-	      else
-		{
-		  /* Check register-only source operand when two source
-		     operands are swapped.  */
-		  if (!i.tm.operand_types[op].bitfield.baseindex
-		      && i.tm.operand_types[op + 1].bitfield.baseindex)
-		    {
-		      vex_reg = op;
-		      op += 2;
-		      gas_assert (mem == (vex_reg + 1)
-				  && op < i.operands);
-		    }
-		  else
-		    {
-		      vex_reg = op + 1;
-		      gas_assert (vex_reg < i.operands);
-		    }
-		}
-	    }
-	  else if (i.tm.opcode_modifier.vexvvvv == VEXNDD)
-	    {
-	      /* For instructions with VexNDD, the register destination
-		 is encoded in VEX prefix.  */
-	      if (i.mem_operands == 0)
-		{
-		  /* There is no memory operand.  */
-		  gas_assert ((op + 2) == i.operands);
-		  vex_reg = op + 1;
-		}
-	      else
-		{
-		  /* There are only 2 non-immediate operands.  */
-		  gas_assert (op < i.imm_operands + 2
-			      && i.operands == i.imm_operands + 2);
-		  vex_reg = i.imm_operands + 1;
-		}
-	    }
-	  else
-	    gas_assert (op < i.operands);
-
-	  if (vex_reg != (unsigned int) ~0)
-	    {
-	      i386_operand_type *type = &i.tm.operand_types[vex_reg];
-
-	      if ((type->bitfield.class != Reg
-		   || (!type->bitfield.dword && !type->bitfield.qword))
-		  && type->bitfield.class != RegSIMD
-		  && type->bitfield.class != RegMask)
-		abort ();
-
-	      i.vex.register_specifier = i.op[vex_reg].regs;
-	    }
-
-	  /* Don't set OP operand twice.  */
-	  if (vex_reg != op)
-	    {
-	      /* If there is an extension opcode to put here, the
-		 register number must be put into the regmem field.  */
-	      if (i.tm.extension_opcode != None)
-		{
-		  i.rm.regmem = i.op[op].regs->reg_num;
-		  set_rex_vrex (i.op[op].regs, REX_B,
-				i.tm.opcode_modifier.sse2avx);
-		}
-	      else
-		{
-		  i.rm.reg = i.op[op].regs->reg_num;
-		  set_rex_vrex (i.op[op].regs, REX_R,
-				i.tm.opcode_modifier.sse2avx);
-		}
-	    }
+    }
 
-	  /* Now, if no memory operand has set i.rm.mode = 0, 1, 2 we
-	     must set it to 3 to indicate this is a register operand
-	     in the regmem field.  */
-	  if (!i.mem_operands)
-	    i.rm.mode = 3;
-	}
+  /* Fill in i.rm.reg field with extension opcode (if any) or the
+     appropriate register.  */
+  if (i.tm.extension_opcode != None)
+    i.rm.reg = i.tm.extension_opcode;
+  else if (!i.tm.opcode_modifier.regmem && dest < MAX_OPERANDS)
+    {
+      i.rm.reg = i.op[dest].regs->reg_num;
+      set_rex_vrex (i.op[dest].regs, REX_R, i.tm.opcode_modifier.sse2avx);
+    }
+  else
+    {
+      gas_assert (source < MAX_OPERANDS);
+      i.rm.reg = i.op[source].regs->reg_num;
+      set_rex_vrex (i.op[source].regs, REX_R, false);
+    }
 
-      /* Fill in i.rm.reg field with extension opcode (if any).  */
-      if (i.tm.extension_opcode != None)
-	i.rm.reg = i.tm.extension_opcode;
+  if (flag_code != CODE_64BIT && (i.rex & REX_R))
+    {
+      gas_assert (i.types[!i.tm.opcode_modifier.regmem].bitfield.class == RegCR);
+      i.rex &= ~REX_R;
+      add_prefix (LOCK_PREFIX_OPCODE);
     }
+
   return default_seg;
 }
 
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -1749,18 +1749,18 @@ vpsravd, 0x6646, AVX2, Modrm|Vex|Space0F
 vpsrlv<dq>, 0x6645, AVX2, Modrm|Vex|Space0F38|VexVVVV|<dq:vexw>|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // AVX gather instructions
-vgatherdpd, 0x6692, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW1|CheckOperandSize|NoSuf|VecSIB128, { RegXMM|RegYMM, Qword|Unspecified|BaseIndex, RegXMM|RegYMM }
-vgatherdps, 0x6692, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW0|NoSuf|VecSIB128, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
-vgatherdps, 0x6692, AVX2, Modrm|Vex=2|Space0F38|VexVVVV|VexW0|NoSuf|VecSIB256, { RegYMM, Dword|Unspecified|BaseIndex, RegYMM }
-vgatherqp<sd>, 0x6693, AVX2, Modrm|Vex|Space0F38|VexVVVV|<sd:vexw>|NoSuf|VecSIB128, { RegXMM, <sd:elem>|Unspecified|BaseIndex, RegXMM }
-vgatherqpd, 0x6693, AVX2, Modrm|Vex=2|Space0F38|VexVVVV|VexW1|NoSuf|VecSIB256, { RegYMM, Qword|Unspecified|BaseIndex, RegYMM }
-vgatherqps, 0x6693, AVX2, Modrm|Vex=2|Space0F38|VexVVVV|VexW0|NoSuf|VecSIB256, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
-vpgatherdd, 0x6690, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW0|NoSuf|VecSIB128, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
-vpgatherdd, 0x6690, AVX2, Modrm|Vex=2|Space0F38|VexVVVV|VexW0|NoSuf|VecSIB256, { RegYMM, Dword|Unspecified|BaseIndex, RegYMM }
-vpgatherdq, 0x6690, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW1|CheckOperandSize|NoSuf|VecSIB128, { RegXMM|RegYMM, Qword|Unspecified|BaseIndex, RegXMM|RegYMM }
-vpgatherq<dq>, 0x6691, AVX2, Modrm|Vex|Space0F38|VexVVVV|<dq:vexw>|NoSuf|VecSIB128, { RegXMM, <dq:elem>|Unspecified|BaseIndex, RegXMM }
-vpgatherqd, 0x6691, AVX2, Modrm|Vex=2|Space0F38|VexVVVV|VexW0|NoSuf|VecSIB256, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
-vpgatherqq, 0x6691, AVX2, Modrm|Vex=2|Space0F38|VexVVVV|VexW1|NoSuf|VecSIB256, { RegYMM, Qword|Unspecified|BaseIndex, RegYMM }
+vgatherdpd, 0x6692, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW1|SwapSources|CheckOperandSize|NoSuf|VecSIB128, { RegXMM|RegYMM, Qword|Unspecified|BaseIndex, RegXMM|RegYMM }
+vgatherdps, 0x6692, AVX2, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB128, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
+vgatherdps, 0x6692, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB256, { RegYMM, Dword|Unspecified|BaseIndex, RegYMM }
+vgatherqp<sd>, 0x6693, AVX2, Modrm|Vex|Space0F38|VexVVVV|<sd:vexw>|SwapSources|NoSuf|VecSIB128, { RegXMM, <sd:elem>|Unspecified|BaseIndex, RegXMM }
+vgatherqpd, 0x6693, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW1|SwapSources|NoSuf|VecSIB256, { RegYMM, Qword|Unspecified|BaseIndex, RegYMM }
+vgatherqps, 0x6693, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB256, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
+vpgatherdd, 0x6690, AVX2, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB128, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
+vpgatherdd, 0x6690, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB256, { RegYMM, Dword|Unspecified|BaseIndex, RegYMM }
+vpgatherdq, 0x6690, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW1|SwapSources|CheckOperandSize|NoSuf|VecSIB128, { RegXMM|RegYMM, Qword|Unspecified|BaseIndex, RegXMM|RegYMM }
+vpgatherq<dq>, 0x6691, AVX2, Modrm|Vex128|Space0F38|VexVVVV|<dq:vexw>|SwapSources|NoSuf|VecSIB128, { RegXMM, <dq:elem>|Unspecified|BaseIndex, RegXMM }
+vpgatherqd, 0x6691, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf|VecSIB256, { RegXMM, Dword|Unspecified|BaseIndex, RegXMM }
+vpgatherqq, 0x6691, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW1|SwapSources|NoSuf|VecSIB256, { RegYMM, Qword|Unspecified|BaseIndex, RegYMM }
 
 // AES + AVX
 
@@ -3321,7 +3321,7 @@ prefetchit1, 0xf18/6, PREFETCHI|x64, Mod
 
 // CMPCCXADD instructions.
 
-cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD|x64, Modrm|Vex|Space0F38|VexVVVV|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD|x64, Modrm|Vex|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // CMPCCXADD instructions end.
 



* [PATCH v2 05/14] x86: VexVVVV is now merely a boolean
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (3 preceding siblings ...)
  2023-03-10 10:21 ` [PATCH v2 04/14] x86: re-work build_modrm_byte()'s register assignment Jan Beulich
@ 2023-03-10 10:21 ` Jan Beulich
  2023-03-10 10:22 ` [PATCH v2 06/14] x86: drop "shimm" special case template expansions Jan Beulich
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:21 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

With the SDM having long since dropped the NDS/NDD/DDS notation for
identifying encoding variants, we can finally do away with the concept
as well. Of the few consumers of the attribute, only a single assertion
was still checking for a particular value, and that check doesn't really
need retaining.

While touching these lines anyway, also modernize other attributes
(e.g. Vex=2 -> Vex256, VexW=1 -> VexW0). This often improves similarity
with adjacent lines.
---
In fact, except for SSE2AVX templates, the attribute apparently isn't
needed at all anymore. As is done for .insn, whether an operand needs
encoding in VEX.VVVV can be derived from the number of operands
(counting extension_opcode != None as one operand).
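
To illustrate the derivation mentioned above, here is a minimal,
standalone sketch. It is not code from gas: needs_vex_vvvv() is a
hypothetical helper, and it assumes the caller passes only operands
which are encoded via ModR/M or VEX.vvvv (i.e. immediates and a register
encoded in an immediate byte are already excluded). ModR/M offers two
slots (reg and r/m); an extension opcode occupies the reg slot, so any
third "slot user" has to go into VEX.vvvv.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of gas.  OPERANDS is the number of
   register/memory operands (immediates excluded, by assumption);
   HAS_EXTENSION_OPCODE mirrors "extension_opcode != None".  */
static bool
needs_vex_vvvv (unsigned int operands, bool has_extension_opcode)
{
  /* Count the extension opcode as if it were one more operand.  */
  unsigned int slots = operands + (has_extension_opcode ? 1 : 0);

  /* ModR/M.reg, ModR/M.rm, and VEX.vvvv are all there is.  */
  assert (slots <= 3);

  /* Two slots fit into ModR/M; a third one requires VEX.vvvv.  */
  return slots == 3;
}
```

For example, a three-operand form like vaddps would yield true, a
two-operand form like vmovdqa false, and an immediate shift such as
vpslld (two register operands plus /6) true again.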

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -8331,7 +8331,7 @@ build_modrm_byte (void)
 	 ZMM register.
 	 2. 4 operands: 4 register operands or 3 register operands
 	 plus 1 memory operand, with VexXDS.  */
-      gas_assert (i.tm.opcode_modifier.vexvvvv == VEXXDS
+      gas_assert (i.tm.opcode_modifier.vexvvvv
 		  && i.tm.opcode_modifier.vexw
 		  && i.tm.operand_types[dest].bitfield.class == RegSIMD);
 
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -583,23 +583,8 @@ enum
   Vex,
   /* How to encode VEX.vvvv:
      0: VEX.vvvv must be 1111b.
-     1: VEX.NDS.  Register-only source is encoded in VEX.vvvv where
-	the content of source registers will be preserved.
-	VEX.DDS.  The second register operand is encoded in VEX.vvvv
-	where the content of first source register will be overwritten
-	by the result.
-	VEX.NDD2.  The second destination register operand is encoded in
-	VEX.vvvv for instructions with 2 destination register operands.
-	For assembler, there are no difference between VEX.NDS, VEX.DDS
-	and VEX.NDD2.
-     2. VEX.NDD.  Register destination is encoded in VEX.vvvv for
-     instructions with 1 destination register operand.
-     3. VEX.LWP.  Register destination is encoded in VEX.vvvv and one
-	of the operands can access a memory location.
-   */
-#define VEXXDS	1
-#define VEXNDD	2
-#define VEXLWP	3
+     1: VEX.vvvv encodes one of the register operands.
+   */
   VexVVVV,
   /* How the VEX.W bit is used:
      0: Set by the REX.W bit.
@@ -735,7 +720,7 @@ typedef struct i386_opcode_modifier
   unsigned int immext:1;
   unsigned int norex64:1;
   unsigned int vex:2;
-  unsigned int vexvvvv:2;
+  unsigned int vexvvvv:1;
   unsigned int vexw:2;
   unsigned int opcodeprefix:2;
   unsigned int sib:3;
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -975,12 +975,12 @@ pause, 0xf390, i186, NoSuf, {}
 // MMX/SSE2 instructions.
 
 <mmx:cpu:pfx:attr:shimm:reg:mem, +
-    $avx:AVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:Vex128|VexVVVV=2|VexW0|SSE2AVX:RegXMM:Xmmword, +
+    $avx:AVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:Vex128|VexVVVV|VexW0|SSE2AVX:RegXMM:Xmmword, +
     $sse:SSE2:66:::RegXMM:Xmmword, +
     $mmx:MMX::::RegMMX:Qword>
 
 <sse2:cpu:attr:scal:vvvv:shimm, +
-    $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV:Vex128|VexVVVV=2|VexW0|SSE2AVX, +
+    $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV:Vex128|VexVVVV|VexW0|SSE2AVX, +
     $sse:SSE2::::>
 
 <bw:opc:vexw:elem:kcpu:kpfx:cpubmi, +
@@ -1079,8 +1079,8 @@ comiss<sse>, 0x0f2f, <sse:cpu>, Modrm|<s
 cvtpi2ps, 0xf2a, SSE, Modrm|NoSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegXMM }
 cvtps2pi, 0xf2d, SSE, Modrm|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
 cvtsi2ss<sse>, 0xf30f2a, <sse:cpu>|No64, Modrm|<sse:scal>|<sse:vvvv>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
-cvtsi2ss, 0xf32a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV=1|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
-cvtsi2ss, 0xf32a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
+cvtsi2ss, 0xf32a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
+cvtsi2ss, 0xf32a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf30f2a, SSE|x64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf30f2a, SSE|x64, Modrm|No_bSuf|No_wSuf|No_sSuf|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtss2si, 0xf32d, AVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
@@ -1110,7 +1110,7 @@ movntps<sse>, 0x0f2b, <sse:cpu>, Modrm|<
 movntq, 0xfe7, SSE|3dnowA, Modrm|NoSuf, { RegMMX, Qword|Unspecified|BaseIndex }
 movntdq<sse2>, 0x660fe7, <sse2:cpu>, Modrm|<sse2:attr>|NoSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
 movss, 0xf310, AVX, D|Modrm|VexLIG|Space0F|VexW0|NoSuf|SSE2AVX, { Dword|Unspecified|BaseIndex, RegXMM }
-movss, 0xf310, AVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|NoSuf|SSE2AVX, { RegXMM, RegXMM }
+movss, 0xf310, AVX, D|Modrm|VexLIG|Space0F|VexVVVV|VexW0|NoSuf|SSE2AVX, { RegXMM, RegXMM }
 movss, 0xf30f10, SSE, D|Modrm|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movups<sse>, 0x0f10, <sse:cpu>, D|Modrm|<sse:attr>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulps<sse>, 0x0f59, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1174,8 +1174,8 @@ cvtpi2pd, 0x660f2a, SSE2, Modrm|NoSuf, {
 cvtpi2pd, 0xf3e6, AVX, Modrm|Vex|Space0F|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
 cvtpi2pd, 0x660f2a, SSE2, Modrm|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd<sse2>, 0xf20f2a, <sse2:cpu>|No64, Modrm|IgnoreSize|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
-cvtsi2sd, 0xf22a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV=1|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
-cvtsi2sd, 0xf22a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
+cvtsi2sd, 0xf22a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
+cvtsi2sd, 0xf22a, AVX|x64, Modrm|Vex=3|Space0F|VexVVVV|No_bSuf|No_wSuf|No_sSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd, 0xf20f2a, SSE2|x64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd, 0xf20f2a, SSE2|x64, Modrm|No_bSuf|No_wSuf|No_sSuf|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 divpd<sse2>, 0x660f5e, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1194,7 +1194,7 @@ movlpd, 0x660f12, SSE2, D|Modrm|NoSuf, {
 movmskpd<sse2>, 0x660f50, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 movntpd<sse2>, 0x660f2b, <sse2:cpu>, Modrm|<sse2:attr>|NoSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
 movsd, 0xf210, AVX, D|Modrm|VexLIG|Space0F|VexW0|NoSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movsd, 0xf210, AVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|NoSuf|SSE2AVX, { RegXMM, RegXMM }
+movsd, 0xf210, AVX, D|Modrm|VexLIG|Space0F|VexVVVV|VexW0|NoSuf|SSE2AVX, { RegXMM, RegXMM }
 movsd, 0xf20f10, SSE2, D|Modrm|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movupd<sse2>, 0x660f10, <sse2:cpu>, D|Modrm|<sse2:attr>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulpd<sse2>, 0x660f59, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1373,7 +1373,7 @@ phminposuw<sse41>, 0x660f3841, <sse41:cp
 pinsrb<sse41>, 0x660f3a20, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf|IgnoreSize|NoRex64, { Imm8, Reg32|Reg64, RegXMM }
 pinsrb<sse41>, 0x660f3a20, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { Imm8, Byte|Unspecified|BaseIndex, RegXMM }
 pinsrd<sse41>, 0x660f3a22, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf|IgnoreSize, { Imm8, Reg32|Unspecified|BaseIndex, RegXMM }
-pinsrq, 0x6622, AVX|x64, Modrm|Vex|Space0F3A|VexVVVV=1|VexW1|NoSuf|SSE2AVX, { Imm8, Reg64|Unspecified|BaseIndex, RegXMM }
+pinsrq, 0x6622, AVX|x64, Modrm|Vex|Space0F3A|VexVVVV|VexW1|NoSuf|SSE2AVX, { Imm8, Reg64|Unspecified|BaseIndex, RegXMM }
 pinsrq, 0x660f3a22, SSE4_1|x64, Modrm|Size64|NoSuf, { Imm8, Reg64|Unspecified|BaseIndex, RegXMM }
 pmaxsb<sse41>, 0x660f383c, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pmaxsd<sse41>, 0x660f383d, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1443,10 +1443,10 @@ aeskeygenassist<aes>, 0x660f3adf, <aes:c
 
 // VAES
 
-vaesdec, 0x66de, VAES, Modrm|Vex=2|Space0F38|VexVVVV=1|VexWIG|NoSuf, { RegYMM|Unspecified|BaseIndex, RegYMM, RegYMM }
-vaesdeclast, 0x66df, VAES, Modrm|Vex=2|Space0F38|VexVVVV=1|VexWIG|NoSuf, { RegYMM|Unspecified|BaseIndex, RegYMM, RegYMM }
-vaesenc, 0x66dc, VAES, Modrm|Vex=2|Space0F38|VexVVVV=1|VexWIG|NoSuf, { RegYMM|Unspecified|BaseIndex, RegYMM, RegYMM }
-vaesenclast, 0x66dd, VAES, Modrm|Vex=2|Space0F38|VexVVVV=1|VexWIG|NoSuf, { RegYMM|Unspecified|BaseIndex, RegYMM, RegYMM }
+vaesdec, 0x66de, VAES, Modrm|Vex256|Space0F38|VexVVVV|VexWIG|NoSuf, { RegYMM|Unspecified|BaseIndex, RegYMM, RegYMM }
+vaesdeclast, 0x66df, VAES, Modrm|Vex256|Space0F38|VexVVVV|VexWIG|NoSuf, { RegYMM|Unspecified|BaseIndex, RegYMM, RegYMM }
+vaesenc, 0x66dc, VAES, Modrm|Vex256|Space0F38|VexVVVV|VexWIG|NoSuf, { RegYMM|Unspecified|BaseIndex, RegYMM, RegYMM }
+vaesenclast, 0x66dd, VAES, Modrm|Vex256|Space0F38|VexVVVV|VexWIG|NoSuf, { RegYMM|Unspecified|BaseIndex, RegYMM, RegYMM }
 
 // PCLMUL
 
@@ -1487,8 +1487,8 @@ gf2p8mulb<gfni>, 0x660f38cf, <gfni:cpu>G
 
 vaddp<sd>, 0x<sd:ppfx>58, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vadds<sd>, 0x<sd:spfx>58, AVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vaddsubpd, 0x66d0, AVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vaddsubps, 0xf2d0, AVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vaddsubpd, 0x66d0, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vaddsubps, 0xf2d0, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vandnp<sd>, 0x<sd:ppfx>55, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vandp<sd>, 0x<sd:ppfx>54, AVX, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vblendp<sd>, 0x660c | <sd:opc>, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1519,16 +1519,16 @@ vcvttps2dq, 0xf35b, AVX, Modrm|Vex|Space
 vcvtts<sd>2si, 0x<sd:spfx>2c, AVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 vdivp<sd>, 0x<sd:ppfx>5e, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vdivs<sd>, 0x<sd:spfx>5e, AVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vdppd, 0x6641, AVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vdpps, 0x6640, AVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vdppd, 0x6641, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vdpps, 0x6640, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vextractf128, 0x6619, AVX, Modrm|Vex=2|Space0F3A|VexW=1|NoSuf, { Imm8, RegYMM, Unspecified|BaseIndex|RegXMM }
 vextractps, 0x6617, AVX, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 vextractps, 0x6617, AVX|x64, RegMem|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg64 }
-vhaddpd, 0x667c, AVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vhaddps, 0xf27c, AVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vhsubpd, 0x667d, AVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vhsubps, 0xf27d, AVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vinsertf128, 0x6618, AVX, Modrm|Vex=2|Space0F3A|VexVVVV=1|VexW=1|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
+vhaddpd, 0x667c, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vhaddps, 0xf27c, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vhsubpd, 0x667d, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vhsubps, 0xf27d, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vinsertf128, 0x6618, AVX, Modrm|Vex256|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
 vinsertps, 0x6621, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vlddqu, 0xf2f0, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM }
 vldmxcsr, 0xae/2, AVX, Modrm|Vex128|Space0F|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex }
@@ -1551,10 +1551,10 @@ vmovddup, 0xf212, AVX, Modrm|Vex|Space0F
 vmovddup, 0xf212, AVX, Modrm|Vex=2|Space0F|VexWIG|NoSuf, { Unspecified|BaseIndex|RegYMM, RegYMM }
 vmovdqa, 0x666f, AVX, D|Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovdqu, 0xf36f, AVX, D|Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vmovhlps, 0x12, AVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM }
+vmovhlps, 0x12, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM }
 vmovhp<sd>, 0x<sd:ppfx>16, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 vmovhp<sd>, 0x<sd:ppfx>17, AVX, Modrm|Vex|Space0F|VexWIG|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex }
-vmovlhps, 0x16, AVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM }
+vmovlhps, 0x16, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM }
 vmovlp<sd>, 0x<sd:ppfx>12, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 vmovlp<sd>, 0x<sd:ppfx>13, AVX, Modrm|Vex|Space0F|VexWIG|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 vmovmskp<sd>, 0x<sd:ppfx>50, AVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_sSuf, { RegXMM|RegYMM, Reg32|Reg64 }
@@ -1602,7 +1602,7 @@ vpcmpgtd, 0x6666, AVX|AVX2, Modrm|Vex|Sp
 vpcmpgtq, 0x6637, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpcmpistri, 0x6663, AVX, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM }
 vpcmpistrm, 0x6662, AVX, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM }
-vperm2f128, 0x6606, AVX, Modrm|Vex=2|Space0F3A|VexVVVV=1|VexW=1|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
+vperm2f128, 0x6606, AVX, Modrm|Vex256|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
 vpermilp<sd>, 0x660c | <sd:opc>, AVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpermilp<sd>, 0x6604 | <sd:opc>, AVX, Modrm|Vex|Space0F3A|VexW0|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vpextr<dq>, 0x6616, AVX|<dq:cpu64>, Modrm|Vex|Space0F3A|<dq:vexw64>|NoSuf, { Imm8, RegXMM, <dq:gpr>|Unspecified|BaseIndex }
@@ -1616,10 +1616,10 @@ vphminposuw, 0x6641, AVX, Modrm|Vex|Spac
 vphsubd, 0x6606, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vphsubsw, 0x6607, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vphsubw, 0x6605, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpinsrb, 0x6620, AVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
+vpinsrb, 0x6620, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
 vpinsrb, 0x6620, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8, Byte|Unspecified|BaseIndex, RegXMM, RegXMM }
 vpinsr<dq>, 0x6622, AVX|<dq:cpu64>, Modrm|Vex|Space0F3A|VexVVVV|<dq:vexw64>|NoSuf, { Imm8, <dq:gpr>|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpinsrw, 0x66c4, AVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_sSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
+vpinsrw, 0x66c4, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_sSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
 vpinsrw, 0x66c4, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|NoSuf, { Imm8, Word|Unspecified|BaseIndex, RegXMM, RegXMM }
 vpmaddubsw, 0x6604, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpmaddwd, 0x66f5, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1663,19 +1663,19 @@ vpshufhw, 0xf370, AVX|AVX2, Modrm|Vex|Sp
 vpshuflw, 0xf270, AVX|AVX2, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vpsign<bw>, 0x6608 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsignd, 0x660a, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsll<dq>, 0x6672 | <dq:opc>/6, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV=2|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsll<dq>, 0x6672 | <dq:opc>/6, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsll<dq>, 0x66f2 | <dq:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpslldq, 0x6673/7, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV=2|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsllw, 0x6671/6, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV=2|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpslldq, 0x6673/7, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsllw, 0x6671/6, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsllw, 0x66f1, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrad, 0x6672/4, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV=2|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrad, 0x6672/4, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsrad, 0x66e2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsraw, 0x6671/4, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV=2|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsraw, 0x6671/4, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsraw, 0x66e1, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrl<dq>, 0x6672 | <dq:opc>/2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV=2|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrl<dq>, 0x6672 | <dq:opc>/2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsrl<dq>, 0x66d2 | <dq:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrldq, 0x6673/3, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV=2|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsrlw, 0x6671/2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV=2|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrldq, 0x6673/3, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsrlw, 0x6671/2, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsrlw, 0x66d1, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsub<bw>, 0x66f8 | <bw:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsub<dq>, 0x66fa | <dq:opc>, AVX|AVX2, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf|Optimize, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1736,16 +1736,16 @@ vpblendd, 0x6602, AVX2, Modrm|Vex|Space0
 vpbroadcast<bw>, 0x6678 | <bw:opc>, AVX2, Modrm|Vex|Space0F38|VexW0|NoSuf, { <bw:elem>|Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM }
 vpbroadcast<dq>, 0x6658 | <dq:opc>, AVX2, Modrm|Vex|Space0F38|VexW0|NoSuf, { <dq:elem>|Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM }
 vperm2i128, 0x6646, AVX2, Modrm|Vex=2|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-vpermd, 0x6636, AVX2, Modrm|Vex=2|Space0F38|VexVVVV=1|VexW=1|NoSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
+vpermd, 0x6636, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|NoSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
 vpermpd, 0x6601, AVX2, Modrm|Vex=2|Space0F3A|VexW1|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM }
-vpermps, 0x6616, AVX2, Modrm|Vex=2|Space0F38|VexVVVV=1|VexW=1|NoSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
+vpermps, 0x6616, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|NoSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
 vpermq, 0x6600, AVX2, Modrm|Vex=2|Space0F3A|VexW1|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM }
 vextracti128, 0x6639, AVX2, Modrm|Vex=2|Space0F3A|VexW=1|NoSuf, { Imm8, RegYMM, Unspecified|BaseIndex|RegXMM }
-vinserti128, 0x6638, AVX2, Modrm|Vex=2|Space0F3A|VexVVVV=1|VexW=1|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
+vinserti128, 0x6638, AVX2, Modrm|Vex256|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
 vpmaskmov<dq>, 0x668e, AVX2, Modrm|Vex|Space0F38|VexVVVV|<dq:vexw>|CheckOperandSize|NoSuf, { RegXMM|RegYMM, RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
 vpmaskmov<dq>, 0x668c, AVX2, Modrm|Vex|Space0F38|VexVVVV|<dq:vexw>|CheckOperandSize|NoSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsllv<dq>, 0x6647, AVX2, Modrm|Vex|Space0F38|VexVVVV|<dq:vexw>|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vpsravd, 0x6646, AVX2, Modrm|Vex|Space0F38|VexVVVV=1|VexW=1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vpsravd, 0x6646, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpsrlv<dq>, 0x6645, AVX2, Modrm|Vex|Space0F38|VexVVVV|<dq:vexw>|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // AVX gather instructions
@@ -1764,26 +1764,26 @@ vpgatherqq, 0x6691, AVX2, Modrm|Vex256|S
 
 // AES + AVX
 
-vaesdec, 0x66de, AVX|AES, Modrm|Vex|Space0F38|VexVVVV=1|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vaesdeclast, 0x66df, AVX|AES, Modrm|Vex|Space0F38|VexVVVV=1|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vaesenc, 0x66dc, AVX|AES, Modrm|Vex|Space0F38|VexVVVV=1|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vaesenclast, 0x66dd, AVX|AES, Modrm|Vex|Space0F38|VexVVVV=1|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vaesdec, 0x66de, AVX|AES, Modrm|Vex|Space0F38|VexVVVV|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vaesdeclast, 0x66df, AVX|AES, Modrm|Vex|Space0F38|VexVVVV|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vaesenc, 0x66dc, AVX|AES, Modrm|Vex|Space0F38|VexVVVV|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vaesenclast, 0x66dd, AVX|AES, Modrm|Vex|Space0F38|VexVVVV|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vaesimc, 0x66db, AVX|AES, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegXMM }
 vaeskeygenassist, 0x66df, AVX|AES, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM }
 
 // PCLMUL + AVX
 
 vpclmulqdq, 0x6644, AVX|PCLMUL, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vpclmullqlqdq, 0x6644/0x00, AVX|PCLMUL, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vpclmulhqlqdq, 0x6644/0x01, AVX|PCLMUL, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vpclmullqhqdq, 0x6644/0x10, AVX|PCLMUL, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vpclmulhqhqdq, 0x6644/0x11, AVX|PCLMUL, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vpclmullqlqdq, 0x6644/0x00, AVX|PCLMUL, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vpclmulhqlqdq, 0x6644/0x01, AVX|PCLMUL, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vpclmullqhqdq, 0x6644/0x10, AVX|PCLMUL, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vpclmulhqhqdq, 0x6644/0x11, AVX|PCLMUL, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf|ImmExt, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 
 // GFNI + AVX
 
-vgf2p8affineinvqb, 0x66cf, AVX|GFNI, Modrm|Vex|Space0F3A|VexVVVV=1|VexW=2|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vgf2p8affineqb, 0x66ce, AVX|GFNI, Modrm|Vex|Space0F3A|VexVVVV=1|VexW=2|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
-vgf2p8mulb, 0x66cf, AVX|GFNI, Modrm|Vex|Space0F38|VexVVVV=1|VexW=1|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vgf2p8affineinvqb, 0x66cf, AVX|GFNI, Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vgf2p8affineqb, 0x66ce, AVX|GFNI, Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
+vgf2p8mulb, 0x66cf, AVX|GFNI, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // FSGSBASE, RDRND and F16C
 
@@ -1824,14 +1824,15 @@ xend, 0xf01d5, RTM, NoSuf, {}
 xtest, 0xf01d6, HLE|RTM, NoSuf, {}
 
 // BMI2 instructions.
-bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=1|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+
+bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=1|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=1|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=1|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
+sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // FMA4 instructions
 
@@ -1896,29 +1897,30 @@ vpshl<xop>, 0x94 | <xop:opc>, XOP, D|Mod
 
 llwpcb, 0x12/0, LWP, Modrm|SpaceXOP09|NoSuf|Vex, { Reg32|Reg64 }
 slwpcb, 0x12/1, LWP, Modrm|SpaceXOP09|NoSuf|Vex, { Reg32|Reg64 }
-lwpval, 0x12/1, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
-lwpins, 0x12/0, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
+lwpval, 0x12/1, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
+lwpins, 0x12/0, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // BMI instructions
 
-andn, 0xf2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=1|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-blsi, 0xf3/3, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-blsmsk, 0xf3/2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-blsr, 0xf3/1, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
+andn, 0xf2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsi, 0xf3/3, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsmsk, 0xf3/2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsr, 0xf3/1, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 tzcnt, 0xf30fbc, BMI, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // TBM instructions
-bextr, 0x10, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP0A|VexVVVV=0|No_bSuf|No_wSuf|No_sSuf, { Imm32|Imm32S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-blcfill, 0x01/1, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-blci, 0x02/6, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-blcic, 0x01/5, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-blcmsk, 0x02/1, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-blcs, 0x01/3, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-blsfill, 0x01/2, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-blsic, 0x01/6, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-t1mskc, 0x01/7, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-tzmsk, 0x01/4, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV=2|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
+
+bextr, 0x10, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP0A|No_bSuf|No_wSuf|No_sSuf, { Imm32|Imm32S, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blcfill, 0x01/1, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blci, 0x02/6, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blcic, 0x01/5, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blcmsk, 0x02/1, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blcs, 0x01/3, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsfill, 0x01/2, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsic, 0x01/6, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+t1mskc, 0x01/7, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+tzmsk, 0x01/4, TBM, Modrm|CheckOperandSize|Vex128|SpaceXOP09|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // AMD 3DNow! instructions.
 
@@ -2044,10 +2046,10 @@ sha256msg2, 0xf38cd, SHA, Modrm|NoSuf, {
 // VPCLMULQDQ instructions
 
 vpclmulqdq, 0x6644, VPCLMULQDQ, Modrm|Vex256|Space0F3A|VexWIG|VexVVVV|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-vpclmullqlqdq, 0x6644/0x00, VPCLMULQDQ, Modrm|Vex=2|Space0F3A|VexWIG|VexVVVV=1|NoSuf|ImmExt, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-vpclmulhqlqdq, 0x6644/0x01, VPCLMULQDQ, Modrm|Vex=2|Space0F3A|VexWIG|VexVVVV=1|NoSuf|ImmExt, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-vpclmullqhqdq, 0x6644/0x10, VPCLMULQDQ, Modrm|Vex=2|Space0F3A|VexWIG|VexVVVV=1|NoSuf|ImmExt, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-vpclmulhqhqdq, 0x6644/0x11, VPCLMULQDQ, Modrm|Vex=2|Space0F3A|VexWIG|VexVVVV=1|NoSuf|ImmExt, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
+vpclmullqlqdq, 0x6644/0x00, VPCLMULQDQ, Modrm|Vex256|Space0F3A|VexWIG|VexVVVV|NoSuf|ImmExt, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
+vpclmulhqlqdq, 0x6644/0x01, VPCLMULQDQ, Modrm|Vex256|Space0F3A|VexWIG|VexVVVV|NoSuf|ImmExt, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
+vpclmullqhqdq, 0x6644/0x10, VPCLMULQDQ, Modrm|Vex256|Space0F3A|VexWIG|VexVVVV|NoSuf|ImmExt, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
+vpclmulhqhqdq, 0x6644/0x11, VPCLMULQDQ, Modrm|Vex256|Space0F3A|VexWIG|VexVVVV|NoSuf|ImmExt, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
 
 // VPCLMULQDQ instructions end
 
@@ -2085,7 +2087,7 @@ kortest<bw>, 0x<bw:kpfx>98, <bw:kcpu>, M
 kshiftl<bw>, 0x6632, <bw:kcpu>, Modrm|Vex128|Space0F3A|<bw:vexw>|NoSuf, { Imm8, RegMask, RegMask }
 kshiftr<bw>, 0x6630, <bw:kcpu>, Modrm|Vex128|Space0F3A|<bw:vexw>|NoSuf, { Imm8, RegMask, RegMask }
 
-kunpckbw, 0x664B, AVX512F, Modrm|Vex=2|Space0F|VexVVVV=1|VexW=1|NoSuf, { RegMask, RegMask, RegMask }
+kunpckbw, 0x664B, AVX512F, Modrm|Vex=2|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 
 vaddp<sdh>, 0x<sdh:ppfx>58, <sdh:cpu>, Modrm|Masking=3|<sdh:spc1>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vdivp<sdh>, 0x<sdh:ppfx>5e, <sdh:cpu>, Modrm|Masking=3|<sdh:spc1>|VexVVVV|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
@@ -2099,25 +2101,25 @@ vmuls<sdh>, 0x<sdh:spfx>59, <sdh:cpu>, M
 vsqrts<sdh>, 0x<sdh:spfx>51, <sdh:cpu>, Modrm|EVexLIG|Masking=3|<sdh:spc1>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 vsubs<sdh>, 0x<sdh:spfx>5C, <sdh:cpu>, Modrm|EVexLIG|Masking=3|<sdh:spc1>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-valign<dq>, 0x6603, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+valign<dq>, 0x6603, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vblendmp<sd>, 0x6665, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpblendm<dq>, 0x6664, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpermi2<dq>, 0x6676, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpblendm<dq>, 0x6664, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpermi2<dq>, 0x6676, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpermi2p<sd>, 0x6677, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpermt2<dq>, 0x667E, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpermt2<dq>, 0x667E, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpermt2p<sd>, 0x667F, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmaxs<dq>, 0x663D, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmaxu<dq>, 0x663F, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmins<dq>, 0x6639, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpminu<dq>, 0x663B, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmuldq, 0x6628, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmulld, 0x6640, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vprolv<dq>, 0x6615, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vprorv<dq>, 0x6614, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsllv<dq>, 0x6647, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsrav<dq>, 0x6646, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsrlv<dq>, 0x6645, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpternlog<dq>, 0x6625, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV=1|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaxs<dq>, 0x663D, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaxu<dq>, 0x663F, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmins<dq>, 0x6639, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpminu<dq>, 0x663B, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmuldq, 0x6628, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmulld, 0x6640, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vprolv<dq>, 0x6615, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vprorv<dq>, 0x6614, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsllv<dq>, 0x6647, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsrav<dq>, 0x6646, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsrlv<dq>, 0x6645, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpternlog<dq>, 0x6625, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vbroadcastf32x4, 0x661A, AVX512F, Modrm|Masking=3|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { XMMword|Unspecified|BaseIndex, RegYMM|RegZMM }
 vbroadcasti32x4, 0x665A, AVX512F, Modrm|Masking=3|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { XMMword|Unspecified|BaseIndex, RegYMM|RegZMM }
@@ -2258,11 +2260,11 @@ vmovntdqa, 0x662A, AVX512F, Modrm|Space0
 vgetexpp<sdh>, 0x6642, <sdh:cpu>, Modrm|Masking=3|<sdh:spc2>|<sdh:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { RegXMM|RegYMM|RegZMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vgetexps<sdh>, 0x6643, <sdh:cpu>, Modrm|EVexLIG|Masking=3|<sdh:spc2>|VexVVVV|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vinsertf32x4, 0x6618, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=1|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
-vinserti32x4, 0x6638, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=1|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vinsertf32x4, 0x6618, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|VexW0|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vinserti32x4, 0x6638, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|VexW0|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
-vinsertf64x4, 0x661A, AVX512F, Modrm|EVex=1|Masking=3|Space0F3A|VexVVVV=1|VexW=2|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
-vinserti64x4, 0x663A, AVX512F, Modrm|EVex=1|Masking=3|Space0F3A|VexVVVV=1|VexW=2|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
+vinsertf64x4, 0x661A, AVX512F, Modrm|EVex=1|Masking=3|Space0F3A|VexVVVV|VexW1|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
+vinserti64x4, 0x663A, AVX512F, Modrm|EVex=1|Masking=3|Space0F3A|VexVVVV|VexW1|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
 
 vinsertps, 0x6621, AVX512F, Modrm|EVex128|Space0F3A|VexVVVV|VexW0|Disp8MemShift=2|NoSuf, { Imm8, RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
@@ -2286,8 +2288,8 @@ vmovntdq, 0x66E7, AVX512F, Modrm|Space0F
 vmovdqu32, 0xF36F, AVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovdqu64, 0xF36F, AVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=2|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vmovhlps, 0x12, AVX512F, Modrm|EVex=4|Space0F|VexVVVV=1|VexW=1|NoSuf, { RegXMM, RegXMM, RegXMM }
-vmovlhps, 0x16, AVX512F, Modrm|EVex=4|Space0F|VexVVVV=1|VexW=1|NoSuf, { RegXMM, RegXMM, RegXMM }
+vmovhlps, 0x12, AVX512F, Modrm|EVex=4|Space0F|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM, RegXMM }
+vmovlhps, 0x16, AVX512F, Modrm|EVex=4|Space0F|VexVVVV|VexW0|NoSuf, { RegXMM, RegXMM, RegXMM }
 
 vmovhp<sd>, 0x<sd:ppfx>16, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 vmovhp<sd>, 0x<sd:ppfx>17, AVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex }
@@ -2305,24 +2307,24 @@ vmovshdup, 0xF316, AVX512F, Modrm|Maskin
 vmovsldup, 0xF312, AVX512F, Modrm|Masking=3|Space0F|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vpabs<dq>, 0x661e | <dq:opc>, AVX512F, Modrm|Masking=3|Space0F38|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpaddd, 0x66FE, AVX512F, Modrm|Masking=3|Space0F|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpaddd, 0x66FE, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpaddq, 0x66d4, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpand<dq>, 0x66db, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpandn<dq>, 0x66df, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpmuludq, 0x66f4, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpor<dq>, 0x66eb, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpsub<dq>, 0x66fa | <dq:opc>, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpckhdq, 0x666A, AVX512F, Modrm|Masking=3|Space0F|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpckhdq, 0x666A, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpunpckhqdq, 0x666d, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpckldq, 0x6662, AVX512F, Modrm|Masking=3|Space0F|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpckldq, 0x6662, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpunpcklqdq, 0x666c, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpxor<dq>, 0x66ef, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 <irel:imm, eq:0, lt:1, le:2, neq:4, nlt:5, nle:6>
 
-vpcmpeqd, 0x6676, AVX512F, Modrm|Masking=2|Space0F|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmpeqd, 0x6676, AVX512F, Modrm|Masking=2|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 vpcmpeqq, 0x6629, AVX512F, Modrm|Masking=2|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
-vpcmpgtd, 0x6666, AVX512F, Modrm|Masking=2|Space0F|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpcmpgtd, 0x6666, AVX512F, Modrm|Masking=2|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 vpcmpgtq, 0x6637, AVX512F, Modrm|Masking=2|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 vpcmp<dq>, 0x661f, AVX512F, Modrm|Masking=2|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 vpcmpu<dq>, 0x661e, AVX512F, Modrm|Masking=2|Space0F3A|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
@@ -2332,16 +2334,16 @@ vpcmp<irel>u<dq>, 0x661e/<irel:imm>, AVX
 vptestm<dq>, 0x6627, AVX512F, Modrm|Masking=2|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 vptestnm<dq>, 0xf327, AVX512F, Modrm|Masking=2|Space0F38|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
-vpermd, 0x6636, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
-vpermps, 0x6616, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vpermd, 0x6636, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vpermps, 0x6616, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
 vpermilp<sd>, 0x6604 | <sd:opc>, AVX512F, Modrm|Masking=3|Space0F3A|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpermilp<sd>, 0x660C | <sd:opc>, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vpermpd, 0x6601, AVX512F, Modrm|Masking=3|Space0F3A|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM }
-vpermpd, 0x6616, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vpermpd, 0x6616, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 vpermq, 0x6600, AVX512F, Modrm|Masking=3|Space0F3A|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM }
-vpermq, 0x6636, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vpermq, 0x6636, AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
 vpmovdb, 0xF331, AVX512F, Modrm|EVex=1|MaskingMorZ|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { RegZMM, RegXMM|Unspecified|BaseIndex }
 vpmovsdb, 0xF321, AVX512F, Modrm|EVex=1|MaskingMorZ|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { RegZMM, RegXMM|Unspecified|BaseIndex }
@@ -2378,17 +2380,17 @@ vpmovzxwd, 0x6633, AVX512F, Modrm|EVex=1
 vpmovsxwq, 0x6624, AVX512F, Modrm|EVex=1|Masking=3|Space0F38|VexWIG|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegZMM }
 vpmovzxwq, 0x6634, AVX512F, Modrm|EVex=1|Masking=3|Space0F38|VexWIG|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegZMM }
 
-vprol<dq>, 0x6672/1, AVX512F, Modrm|Masking=3|Space0F|VexVVVV=2|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpror<dq>, 0x6672/0, AVX512F, Modrm|Masking=3|Space0F|VexVVVV=2|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vprol<dq>, 0x6672/1, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpror<dq>, 0x6672/0, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vpshufd, 0x6670, AVX512F, Modrm|Masking=3|Space0F|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vpsll<dq>, 0x66f2 | <dq:opc>, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsll<dq>, 0x6672 | <dq:opc>/6, AVX512F, Modrm|Masking=3|Space0F|VexVVVV=2|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsll<dq>, 0x6672 | <dq:opc>/6, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpsra<dq>, 0x66e2, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsra<dq>, 0x6672/4, AVX512F, Modrm|Masking=3|Space0F|VexVVVV=2|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsra<dq>, 0x6672/4, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpsrl<dq>, 0x66d2 | <dq:opc>, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsrl<dq>, 0x6672 | <dq:opc>/2, AVX512F, Modrm|Masking=3|Space0F|VexVVVV=2|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsrl<dq>, 0x6672 | <dq:opc>/2, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<dq:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|<dq:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vrcp14p<sd>, 0x664C, AVX512F, Modrm|Masking=3|Space0F38|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vrcp14s<sd>, 0x664D, AVX512F, Modrm|EVexLIG|Masking=3|Space0F38|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
@@ -2396,11 +2398,11 @@ vrcp14s<sd>, 0x664D, AVX512F, Modrm|EVex
 vrsqrt14p<sd>, 0x664E, AVX512F, Modrm|Masking=3|Space0F38|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vrsqrt14s<sd>, 0x664F, AVX512F, Modrm|EVexLIG|Masking=3|Space0F38|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vshuff32x4, 0x6623, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
-vshufi32x4, 0x6643, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vshuff32x4, 0x6623, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vshufi32x4, 0x6643, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
-vshuff64x2, 0x6623, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
-vshufi64x2, 0x6643, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vshuff64x2, 0x6623, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vshufi64x2, 0x6643, AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
 vshufp<sd>, 0x<sd:ppfx>C6, AVX512F, Modrm|Masking=3|Space0F|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
@@ -2606,69 +2608,69 @@ kortest<dq>, 0x<dq:kpfx>98, AVX512BW, Mo
 ktest<dq>, 0x<dq:kpfx>99, AVX512BW, Modrm|Vex128|Space0F|VexW1|NoSuf, { RegMask, RegMask }
 kxnor<dq>, 0x<dq:kpfx>46, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|NoSuf, { RegMask, RegMask, RegMask }
 kxor<dq>, 0x<dq:kpfx>47, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|NoSuf|Optimize, { RegMask, RegMask, RegMask }
-kunpckdq, 0x4B, AVX512BW, Modrm|Vex=2|Space0F|VexVVVV=1|VexW=2|NoSuf, { RegMask, RegMask, RegMask }
-kunpckwd, 0x4B, AVX512BW, Modrm|Vex=2|Space0F|VexVVVV=1|VexW=1|NoSuf, { RegMask, RegMask, RegMask }
+kunpckdq, 0x4B, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|NoSuf, { RegMask, RegMask, RegMask }
+kunpckwd, 0x4B, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 kshiftl<dq>, 0x6633, AVX512BW, Modrm|Vex128|Space0F3A|<dq:vexw>|NoSuf, { Imm8, RegMask, RegMask }
 kshiftr<dq>, 0x6631, AVX512BW, Modrm|Vex128|Space0F3A|<dq:vexw>|NoSuf, { Imm8, RegMask, RegMask }
 
-vdbpsadbw, 0x6642, AVX512BW, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vdbpsadbw, 0x6642, AVX512BW, Modrm|Masking=3|Space0F3A|VexVVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vmovdqu8, 0xF26F, AVX512BW, D|Modrm|MaskingMorZ|Space0F|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovdqu16, 0xF26F, AVX512BW, D|Modrm|MaskingMorZ|Space0F|VexW=2|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vpabs<bw>, 0x661c | <bw:opc>, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpmaxsb, 0x663C, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpminsb, 0x6638, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpshufb, 0x6600, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-
-vpmaddubsw, 0x6604, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmaxuw, 0x663E, AVX512BW, Modrm|Masking=3|VexWIG|Space0F38|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpminuw, 0x663A, AVX512BW, Modrm|Masking=3|VexWIG|Space0F38|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmulhrsw, 0x660B, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-
-vpackssdw, 0x666B, AVX512BW, Modrm|Masking=3|Space0F|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpacksswb, 0x6663, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpackuswb, 0x6667, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpackusdw, 0x662B, AVX512BW, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaxsb, 0x663C, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpminsb, 0x6638, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpshufb, 0x6600, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+
+vpmaddubsw, 0x6604, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaxuw, 0x663E, AVX512BW, Modrm|Masking=3|VexWIG|Space0F38|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpminuw, 0x663A, AVX512BW, Modrm|Masking=3|VexWIG|Space0F38|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmulhrsw, 0x660B, AVX512BW, Modrm|Masking=3|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+
+vpackssdw, 0x666B, AVX512BW, Modrm|Masking=3|Space0F|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpacksswb, 0x6663, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpackuswb, 0x6667, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpackusdw, 0x662B, AVX512BW, Modrm|Masking=3|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vpadd<bw>, 0x66fc | <bw:opc>, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpadds<bw>, 0x66ec | <bw:opc>, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpaddus<bw>, 0x66dc | <bw:opc>, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpavg<bw>, 0x66e0 | (3 * <bw:opc>), AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmaxub, 0x66DE, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpminub, 0x66DA, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaxub, 0x66DE, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpminub, 0x66DA, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpsub<bw>, 0x66f8 | <bw:opc>, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpsubs<bw>, 0x66e8 | <bw:opc>, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpsubus<bw>, 0x66d8 | <bw:opc>, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpckhbw, 0x6668, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpcklbw, 0x6660, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpckhbw, 0x6668, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpcklbw, 0x6660, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vpmaxsw, 0x66EE, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpminsw, 0x66EA, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmulhuw, 0x66E4, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmulhw, 0x66E5, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmullw, 0x66D5, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsllw, 0x6671/6, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=2|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpmaxsw, 0x66EE, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpminsw, 0x66EA, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmulhuw, 0x66E4, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmulhw, 0x66E5, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmullw, 0x66D5, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsllw, 0x6671/6, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpsllw, 0x66F1, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsraw, 0x6671/4, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=2|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsraw, 0x6671/4, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpsraw, 0x66E1, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsrlw, 0x6671/2, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=2|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsrlw, 0x6671/2, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpsrlw, 0x66D1, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8MemShift=4|CheckOperandSize|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpckhwd, 0x6669, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpunpcklwd, 0x6661, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpckhwd, 0x6669, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpunpcklwd, 0x6661, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vpalignr, 0x660F, AVX512BW, Modrm|Masking=3|Space0F3A|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpalignr, 0x660F, AVX512BW, Modrm|Masking=3|Space0F3A|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vpblendm<bw>, 0x6666, AVX512BW, Modrm|Masking=3|Space0F38|VexVVVV=1|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpblendm<bw>, 0x6666, AVX512BW, Modrm|Masking=3|Space0F38|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpbroadcast<bw>, 0x6678 | <bw:opc>, AVX512BW, Modrm|Masking=3|Space0F38|VexW0|Disp8MemShift|NoSuf, { RegXMM|<bw:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpbroadcast<bw>, 0x667a | <bw:opc>, AVX512BW, Modrm|Masking=3|Space0F38|VexW0|NoSuf, { Reg32, RegXMM|RegYMM|RegZMM }
 
 vpermi2<bw>, 0x6675, <bw:cpubmi>, Modrm|Masking=3|Space0F38|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vpermt2<bw>, 0x667d, <bw:cpubmi>, Modrm|Masking=3|Space0F38|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vperm<bw>, 0x668d, <bw:cpubmi>, Modrm|Masking=3|Space0F38|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsllvw, 0x6612, AVX512BW, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=2|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsravw, 0x6611, AVX512BW, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=2|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpsrlvw, 0x6610, AVX512BW, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=2|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsllvw, 0x6612, AVX512BW, Modrm|Masking=3|Space0F38|VexVVVV|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsravw, 0x6611, AVX512BW, Modrm|Masking=3|Space0F38|VexVVVV|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsrlvw, 0x6610, AVX512BW, Modrm|Masking=3|Space0F38|VexVVVV|VexW1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vpcmpeq<bw>, 0x6674 | <bw:opc>, AVX512BW, Modrm|Masking=2|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 vpcmpgt<bw>, 0x6664 | <bw:opc>, AVX512BW, Modrm|Masking=2|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
@@ -2677,19 +2679,19 @@ vpcmpu<bw>, 0x663e, AVX512BW, Modrm|Mask
 vpcmp<irel><bw>, 0x663f/<irel:imm>, AVX512BW, Modrm|Masking=2|Space0F3A|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 vpcmp<irel>u<bw>, 0x663e/<irel:imm>, AVX512BW, Modrm|Masking=2|Space0F3A|VexVVVV|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
-vpslldq, 0x6673/7, AVX512BW, Modrm|Space0F|VexWIG|VexVVVV=2|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
-vpsrldq, 0x6673/3, AVX512BW, Modrm|Space0F|VexWIG|VexVVVV=2|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpslldq, 0x6673/7, AVX512BW, Modrm|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vpsrldq, 0x6673/3, AVX512BW, Modrm|Space0F|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
 vpextrw, 0x66C5, AVX512BW, Load|Modrm|EVex128|Space0F|VexWIG|NoSuf, { Imm8, RegXMM, Reg32|Reg64 }
 vpextr<bw>, 0x6614 | <bw:opc>, AVX512BW, RegMem|EVex128|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg32|Reg64 }
 vpextr<bw>, 0x6614 | <bw:opc>, AVX512BW, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift|NoSuf, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
 
-vpinsrw, 0x66C4, AVX512BW, Modrm|EVex128|Space0F|VexWIG|VexVVVV=1|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
+vpinsrw, 0x66C4, AVX512BW, Modrm|EVex128|Space0F|VexWIG|VexVVVV|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
 vpinsrw, 0x66C4, AVX512BW, Modrm|EVex128|Space0F|VexWIG|VexVVVV|Disp8MemShift=1|NoSuf, { Imm8, Word|Unspecified|BaseIndex, RegXMM, RegXMM }
-vpinsrb, 0x6620, AVX512BW, Modrm|EVex128|Space0F3A|VexWIG|VexVVVV=1|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
+vpinsrb, 0x6620, AVX512BW, Modrm|EVex128|Space0F3A|VexWIG|VexVVVV|NoSuf, { Imm8, Reg32|Reg64, RegXMM, RegXMM }
 vpinsrb, 0x6620, AVX512BW, Modrm|EVex128|Space0F3A|VexWIG|VexVVVV|NoSuf, { Imm8, Byte|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vpmaddwd, 0x66F5, AVX512BW, Modrm|Masking=3|Space0F|VexVVVV=1|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmaddwd, 0x66F5, AVX512BW, Modrm|Masking=3|Space0F|VexVVVV|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vpmov<bw>2m, 0xf329, AVX512BW, Modrm|EVexDYN|Space0F38|<bw:vexw>|NoSuf, { RegXMM|RegYMM|RegZMM, RegMask }
 vpmovm2<bw>, 0xf328, AVX512BW, Modrm|EVexDYN|Space0F38|<bw:vexw>|NoSuf, { RegMask, RegXMM|RegYMM|RegZMM }
@@ -2713,7 +2715,7 @@ vpmovzxbw, 0x6630, AVX512BW, Modrm|EVex=
 vpmovzxbw, 0x6630, AVX512BW|AVX512VL, Modrm|EVex=2|Masking=3|VexWIG|Space0F38|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
 vpmovzxbw, 0x6630, AVX512BW|AVX512VL, Modrm|EVex=3|Masking=3|VexWIG|Space0F38|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegYMM }
 
-vpsadbw, 0x66F6, AVX512BW, Modrm|Space0F|VexVVVV=1|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpsadbw, 0x66F6, AVX512BW, Modrm|Space0F|VexVVVV|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vpshufhw, 0xF370, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vpshuflw, 0xF270, AVX512BW, Modrm|Masking=3|Space0F|VexWIG|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
@@ -2777,16 +2779,16 @@ vcvtuqq2ps<Exy>, 0xf27a, AVX512DQ|<Exy:v
 
 vextractf32x8, 0x661B, AVX512DQ, Modrm|EVex=1|MaskingMorZ|Space0F3A|VexW=1|Disp8MemShift=5|NoSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 vextracti32x8, 0x663B, AVX512DQ, Modrm|EVex=1|MaskingMorZ|Space0F3A|VexW=1|Disp8MemShift=5|NoSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
-vinsertf32x8, 0x661A, AVX512DQ, Modrm|EVex=1|Masking=3|Space0F3A|VexVVVV=1|VexW=1|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
-vinserti32x8, 0x663A, AVX512DQ, Modrm|EVex=1|Masking=3|Space0F3A|VexVVVV=1|VexW=1|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
+vinsertf32x8, 0x661A, AVX512DQ, Modrm|EVex512|Masking=3|Space0F3A|VexVVVV|VexW0|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
+vinserti32x8, 0x663A, AVX512DQ, Modrm|EVex512|Masking=3|Space0F3A|VexVVVV|VexW0|Disp8MemShift=5|NoSuf, { Imm8, RegYMM|Unspecified|BaseIndex, RegZMM, RegZMM }
 
 vpextr<dq>, 0x6616, AVX512DQ|<dq:cpu64>, Modrm|EVex128|Space0F3A|<dq:vexw64>|Disp8MemShift|NoSuf, { Imm8, RegXMM, <dq:gpr>|Unspecified|BaseIndex }
 vpinsr<dq>, 0x6622, AVX512DQ|<dq:cpu64>, Modrm|EVex128|Space0F3A|VexVVVV|<dq:vexw64>|Disp8MemShift|NoSuf, { Imm8, <dq:gpr>|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 vextractf64x2, 0x6619, AVX512DQ, Modrm|MaskingMorZ|Space0F3A|VexW=2|Disp8MemShift=4|NoSuf, { Imm8, RegYMM|RegZMM, RegXMM|Unspecified|BaseIndex }
 vextracti64x2, 0x6639, AVX512DQ, Modrm|MaskingMorZ|Space0F3A|VexW=2|Disp8MemShift=4|NoSuf, { Imm8, RegYMM|RegZMM, RegXMM|Unspecified|BaseIndex }
-vinsertf64x2, 0x6618, AVX512DQ, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=2|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
-vinserti64x2, 0x6638, AVX512DQ, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=2|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vinsertf64x2, 0x6618, AVX512DQ, Modrm|Masking=3|Space0F3A|VexVVVV|VexW1|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
+vinserti64x2, 0x6638, AVX512DQ, Modrm|Masking=3|Space0F3A|VexVVVV|VexW1|Disp8MemShift=4|CheckOperandSize|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM }
 
 vfpclassp<sd>, 0x6666, AVX512DQ, Modrm|Masking=2|Space0F3A|<sd:vexw>|Broadcast|Disp8ShiftVL|NoSuf|IntelSyntax, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegMask }
 vfpclassp<sd>, 0x6666, AVX512DQ, Modrm|Masking=2|Space0F3A|<sd:vexw>|Broadcast|Disp8ShiftVL|NoSuf|ATTSyntax, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|<sd:elem>|BaseIndex, RegMask }
@@ -2798,7 +2800,7 @@ vfpclasss<sdh>, 0x<sdh:pfx>67, <sdh:cpud
 vpmov<dq>2m, 0xf339, AVX512DQ, Modrm|EVexDYN|Space0F38|<dq:vexw>|NoSuf, { RegXMM|RegYMM|RegZMM, RegMask }
 vpmovm2<dq>, 0xf338, AVX512DQ, Modrm|EVexDYN|Space0F38|<dq:vexw>|NoSuf, { RegMask, RegXMM|RegYMM|RegZMM }
 
-vpmullq, 0x6640, AVX512DQ, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmullq, 0x6640, AVX512DQ, Modrm|Masking=3|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 vrangep<sd>, 0x6650, AVX512DQ, Modrm|Masking=3|Space0F3A|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 vranges<sd>, 0x6651, AVX512DQ, Modrm|EVexLIG|Masking=3|Space0F3A|VexVVVV|<sd:vexw>|Disp8MemShift|NoSuf|SAE, { Imm8, RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
@@ -2816,8 +2818,8 @@ clwb, 0x660fae/6, CLWB, Modrm|Anysize|Ig
 
 // AVX512IFMA instructions
 
-vpmadd52huq, 0x66B5, AVX512IFMA, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpmadd52luq, 0x66B4, AVX512IFMA, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmadd52huq, 0x66B5, AVX512IFMA, Modrm|Masking=3|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmadd52luq, 0x66B4, AVX512IFMA, Modrm|Masking=3|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512IFMA instructions end
 
@@ -2830,23 +2832,23 @@ vpmadd52luq, 0x66B4, AVX_IFMA, Modrm|Vex
 
 // AVX512VBMI instructions
 
-vpmultishiftqb, 0x6683, AVX512VBMI, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpmultishiftqb, 0x6683, AVX512VBMI, Modrm|Masking=3|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512VBMI instructions end
 
 // AVX512_4FMAPS instructions
 
-v4fmaddps, 0xf29a, AVX512_4FMAPS, Modrm|EVex=1|Masking=3|Space0F38|VexVVVV=1|VexW=1|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
-v4fnmaddps, 0xf2aa, AVX512_4FMAPS, Modrm|EVex=1|Masking=3|Space0F38|VexVVVV=1|VexW=1|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
-v4fmaddss, 0xf29b, AVX512_4FMAPS, Modrm|EVex=4|Masking=3|Space0F38|VexVVVV=1|VexW=1|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegXMM, RegXMM }
-v4fnmaddss, 0xf2ab, AVX512_4FMAPS, Modrm|EVex=4|Masking=3|Space0F38|VexVVVV=1|VexW=1|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegXMM, RegXMM }
+v4fmaddps, 0xf29a, AVX512_4FMAPS, Modrm|EVex=1|Masking=3|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
+v4fnmaddps, 0xf2aa, AVX512_4FMAPS, Modrm|EVex=1|Masking=3|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
+v4fmaddss, 0xf29b, AVX512_4FMAPS, Modrm|EVex=4|Masking=3|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegXMM, RegXMM }
+v4fnmaddss, 0xf2ab, AVX512_4FMAPS, Modrm|EVex=4|Masking=3|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 // AVX512_4FMAPS instructions end
 
 // AVX512_4VNNIW instructions
 
-vp4dpwssd, 0xf252, AVX512_4VNNIW, Modrm|EVex=1|Masking=3|Space0F38|VexVVVV=1|VexW=1|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
-vp4dpwssds, 0xf253, AVX512_4VNNIW, Modrm|EVex=1|Masking=3|Space0F38|VexVVVV=1|VexW=1|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
+vp4dpwssd, 0xf252, AVX512_4VNNIW, Modrm|EVex=1|Masking=3|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
+vp4dpwssds, 0xf253, AVX512_4VNNIW, Modrm|EVex=1|Masking=3|Space0F38|VexVVVV|VexW0|Disp8MemShift=4|NoSuf|ImplicitQuadGroup, { XMMword|Unspecified|BaseIndex, RegZMM, RegZMM }
 
 // AVX512_4VNNIW instructions end
 
@@ -2880,11 +2882,11 @@ vpshrdw, 0x6672, AVX512_VBMI2, Modrm|Mas
 
 // AVX512_VNNI instructions
 
-vpdpbusd, 0x6650, AVX512_VNNI, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpdpwssd, 0x6652, AVX512_VNNI, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpdpbusd, 0x6650, AVX512_VNNI, Modrm|Masking=3|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpdpwssd, 0x6652, AVX512_VNNI, Modrm|Masking=3|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vpdpbusds, 0x6651, AVX512_VNNI, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpdpwssds, 0x6653, AVX512_VNNI, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpdpbusds, 0x6651, AVX512_VNNI, Modrm|Masking=3|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpdpwssds, 0x6653, AVX512_VNNI, Modrm|Masking=3|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512_VNNI instructions end
 
@@ -2913,34 +2915,34 @@ vpdpbsuds, 0xf351, AVX_VNNI_INT8, Modrm|
 
 vpopcnt<bw>, 0x6654, AVX512_BITALG, Modrm|Masking=3|Space0F38|<bw:vexw>|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vpshufbitqmb, 0x668f, AVX512_BITALG, Modrm|Masking=2|Space0F38|VexVVVV=1|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vpshufbitqmb, 0x668f, AVX512_BITALG, Modrm|Masking=2|Space0F38|VexVVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
 
 // AVX512_BITALG instructions end
 
 // AVX512 + GFNI instructions
 
-vgf2p8affineinvqb, 0x66cf, GFNI|AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vgf2p8affineqb, 0x66ce, GFNI|AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV=1|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vgf2p8mulb, 0x66cf, GFNI|AVX512F, Modrm|Masking=3|Space0F38|VexVVVV=1|VexW=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vgf2p8affineinvqb, 0x66cf, GFNI|AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vgf2p8affineqb, 0x66ce, GFNI|AVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vgf2p8mulb, 0x66cf, GFNI|AVX512F, Modrm|Masking=3|Space0F38|VexVVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512 + GFNI instructions end
 
 // AVX512 + VAES instructions
 
-vaesdec, 0x66de, VAES|AVX512F, Modrm|Space0F38|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vaesdeclast, 0x66df, VAES|AVX512F, Modrm|Space0F38|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vaesenc, 0x66dc, VAES|AVX512F, Modrm|Space0F38|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vaesenclast, 0x66dd, VAES|AVX512F, Modrm|Space0F38|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vaesdec, 0x66de, VAES|AVX512F, Modrm|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vaesdeclast, 0x66df, VAES|AVX512F, Modrm|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vaesenc, 0x66dc, VAES|AVX512F, Modrm|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vaesenclast, 0x66dd, VAES|AVX512F, Modrm|Space0F38|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512 + VAES instructions end
 
 // AVX512 + VPCLMULQDQ instructions
 
 vpclmulqdq, 0x6644, VPCLMULQDQ|AVX512F, Modrm|Space0F3A|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpclmullqlqdq, 0x6644/0x00, VPCLMULQDQ|AVX512F, Modrm|Space0F3A|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpclmulhqlqdq, 0x6644/0x01, VPCLMULQDQ|AVX512F, Modrm|Space0F3A|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpclmullqhqdq, 0x6644/0x10, VPCLMULQDQ|AVX512F, Modrm|Space0F3A|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
-vpclmulhqhqdq, 0x6644/0x11, VPCLMULQDQ|AVX512F, Modrm|Space0F3A|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpclmullqlqdq, 0x6644/0x00, VPCLMULQDQ|AVX512F, Modrm|Space0F3A|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpclmulhqlqdq, 0x6644/0x01, VPCLMULQDQ|AVX512F, Modrm|Space0F3A|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpclmullqhqdq, 0x6644/0x10, VPCLMULQDQ|AVX512F, Modrm|Space0F3A|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vpclmulhqhqdq, 0x6644/0x11, VPCLMULQDQ|AVX512F, Modrm|Space0F3A|VexWIG|VexVVVV|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
 // AVX512 + VPCLMULQDQ instructions end
 
@@ -3144,12 +3146,12 @@ xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf,
 ldtilecfg, 0x49/0, AMX_TILE|x64, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
 sttilecfg, 0x6649/0, AMX_TILE|x64, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
 
-tdpbf16ps, 0xf35c, AMX_BF16|x64, Modrm|Vex128|Space0F38|VexVVVV=1|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpbf16ps, 0xf35c, AMX_BF16|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tdpfp16ps, 0xf25c, AMX_FP16|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
-tdpbssd, 0xf25e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV=1|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
-tdpbuud, 0x5e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV=1|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
-tdpbusd, 0x665e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV=1|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
-tdpbsud, 0xf35e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV=1|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpbssd, 0xf25e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpbuud, 0x5e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpbusd, 0x665e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
+tdpbsud, 0xf35e, AMX_INT8|x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 
 tileloadd, 0xf24b, AMX_TILE|x64, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
 tileloaddt1, 0x664b, AMX_TILE|x64, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }



* [PATCH v2 06/14] x86: drop "shimm" special case template expansions
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (4 preceding siblings ...)
  2023-03-10 10:21 ` [PATCH v2 05/14] x86: VexVVVV is now merely a boolean Jan Beulich
@ 2023-03-10 10:22 ` Jan Beulich
  2023-03-10 10:22 ` [PATCH v2 07/14] x86/AT&T: restrict recognition of the "absolute branch" prefix character Jan Beulich
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:22 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

With VexVVVV only being boolean, the SSE shift-by-immediate instructions
don't need special casing anymore for SSE2AVX handling. Simplify the two
respective templates. (No change to generated tables.)

--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -974,14 +974,14 @@ pause, 0xf390, i186, NoSuf, {}
 
 // MMX/SSE2 instructions.
 
-<mmx:cpu:pfx:attr:shimm:reg:mem, +
-    $avx:AVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:Vex128|VexVVVV|VexW0|SSE2AVX:RegXMM:Xmmword, +
-    $sse:SSE2:66:::RegXMM:Xmmword, +
-    $mmx:MMX::::RegMMX:Qword>
-
-<sse2:cpu:attr:scal:vvvv:shimm, +
-    $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV:Vex128|VexVVVV|VexW0|SSE2AVX, +
-    $sse:SSE2::::>
+<mmx:cpu:pfx:attr:reg:mem, +
+    $avx:AVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:RegXMM:Xmmword, +
+    $sse:SSE2:66::RegXMM:Xmmword, +
+    $mmx:MMX:::RegMMX:Qword>
+
+<sse2:cpu:attr:scal:vvvv, +
+    $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, +
+    $sse:SSE2:::>
 
 <bw:opc:vexw:elem:kcpu:kpfx:cpubmi, +
     b:0:VexW0:Byte:AVX512DQ:66:AVX512VBMI, +
@@ -1032,17 +1032,17 @@ pmulhw<mmx>, 0x<mmx:pfx>0fe5, <mmx:cpu>,
 pmullw<mmx>, 0x<mmx:pfx>0fd5, <mmx:cpu>, Modrm|<mmx:attr>|C|NoSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 por<mmx>, 0x<mmx:pfx>0feb, <mmx:cpu>, Modrm|<mmx:attr>|C|NoSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psllw<mmx>, 0x<mmx:pfx>0ff1, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
-psllw<mmx>, 0x<mmx:pfx>0f71/6, <mmx:cpu>, Modrm|<mmx:shimm>|NoSuf, { Imm8, <mmx:reg> }
+psllw<mmx>, 0x<mmx:pfx>0f71/6, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { Imm8, <mmx:reg> }
 psll<dq><mmx>, 0x<mmx:pfx>0ff2 | <dq:opc>, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
-psll<dq><mmx>, 0x<mmx:pfx>0f72 | <dq:opc>/6, <mmx:cpu>, Modrm|<mmx:shimm>|NoSuf, { Imm8, <mmx:reg> }
+psll<dq><mmx>, 0x<mmx:pfx>0f72 | <dq:opc>/6, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { Imm8, <mmx:reg> }
 psraw<mmx>, 0x<mmx:pfx>0fe1, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
-psraw<mmx>, 0x<mmx:pfx>0f71/4, <mmx:cpu>, Modrm|<mmx:shimm>|NoSuf, { Imm8, <mmx:reg> }
+psraw<mmx>, 0x<mmx:pfx>0f71/4, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { Imm8, <mmx:reg> }
 psrad<mmx>, 0x<mmx:pfx>0fe2, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
-psrad<mmx>, 0x<mmx:pfx>0f72/4, <mmx:cpu>, Modrm|<mmx:shimm>|NoSuf, { Imm8, <mmx:reg> }
+psrad<mmx>, 0x<mmx:pfx>0f72/4, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { Imm8, <mmx:reg> }
 psrlw<mmx>, 0x<mmx:pfx>0fd1, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
-psrlw<mmx>, 0x<mmx:pfx>0f71/2, <mmx:cpu>, Modrm|<mmx:shimm>|NoSuf, { Imm8, <mmx:reg> }
+psrlw<mmx>, 0x<mmx:pfx>0f71/2, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { Imm8, <mmx:reg> }
 psrl<dq><mmx>, 0x<mmx:pfx>0fd2 | <dq:opc>, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
-psrl<dq><mmx>, 0x<mmx:pfx>0f72 | <dq:opc>/2, <mmx:cpu>, Modrm|<mmx:shimm>|NoSuf, { Imm8, <mmx:reg> }
+psrl<dq><mmx>, 0x<mmx:pfx>0f72 | <dq:opc>/2, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { Imm8, <mmx:reg> }
 psub<bw><mmx>, 0x<mmx:pfx>0ff8 | <bw:opc>, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubd<mmx>, 0x<mmx:pfx>0ffa, <mmx:cpu>, Modrm|<mmx:attr>|NoSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubq<sse2>, 0x660ffb, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1236,8 +1236,8 @@ pmuludq, 0xff4, SSE2, Modrm|NoSuf, { Qwo
 pshufd<sse2>, 0x660f70, <sse2:cpu>, Modrm|<sse2:attr>|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
 pshufhw<sse2>, 0xf30f70, <sse2:cpu>, Modrm|<sse2:attr>|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
 pshuflw<sse2>, 0xf20f70, <sse2:cpu>, Modrm|<sse2:attr>|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
-pslldq<sse2>, 0x660f73/7, <sse2:cpu>, Modrm|<sse2:shimm>|NoSuf, { Imm8, RegXMM }
-psrldq<sse2>, 0x660f73/3, <sse2:cpu>, Modrm|<sse2:shimm>|NoSuf, { Imm8, RegXMM }
+pslldq<sse2>, 0x660f73/7, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { Imm8, RegXMM }
+psrldq<sse2>, 0x660f73/3, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { Imm8, RegXMM }
 punpckhqdq<sse2>, 0x660f6d, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 punpcklqdq<sse2>, 0x660f6c, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 



* [PATCH v2 07/14] x86/AT&T: restrict recognition of the "absolute branch" prefix character
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (5 preceding siblings ...)
  2023-03-10 10:22 ` [PATCH v2 06/14] x86: drop "shimm" special case template expansions Jan Beulich
@ 2023-03-10 10:22 ` Jan Beulich
  2023-03-10 10:23 ` [PATCH v2 08/14] x86: process instruction operands for .insn Jan Beulich
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:22 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

While in principle merely rejecting this for .insn would suffice for the
purposes there, be more generic and reject it for anything that isn't
going to be a branch: the templates of any given mnemonic are either all
branches or all non-branches, and the few cases possibly requiring a 2nd
parsing pass aren't affected either. This also improves diagnostics for
misuses like

	inc	*%eax
	incl	%fs:*(%eax)
	add	*$1, %eax

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -11837,7 +11837,8 @@ i386_att_operand (char *operand_string)
 
   /* We check for an absolute prefix (differentiating,
      for example, 'jmp pc_relative_label' from 'jmp *absolute_label'.  */
-  if (*op_string == ABSOLUTE_PREFIX)
+  if (*op_string == ABSOLUTE_PREFIX
+      && current_templates->start->opcode_modifier.jump)
     {
       ++op_string;
       if (is_space_char (*op_string))
@@ -11868,7 +11869,8 @@ i386_att_operand (char *operand_string)
 	    ++op_string;
 
 	  /* Handle case of %es:*foo.  */
-	  if (!i.jumpabsolute && *op_string == ABSOLUTE_PREFIX)
+	  if (!i.jumpabsolute && *op_string == ABSOLUTE_PREFIX
+	      && current_templates->start->opcode_modifier.jump)
 	    {
 	      ++op_string;
 	      if (is_space_char (*op_string))



* [PATCH v2 08/14] x86: process instruction operands for .insn
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (6 preceding siblings ...)
  2023-03-10 10:22 ` [PATCH v2 07/14] x86/AT&T: restrict recognition of the "absolute branch" prefix character Jan Beulich
@ 2023-03-10 10:23 ` Jan Beulich
  2023-03-10 10:24 ` [PATCH v2 09/14] x86: handle EVEX Disp8 " Jan Beulich
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:23 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

Deal with register and memory operands; immediate operands will follow
later, as will the handling of EVEX embedded broadcast and EVEX Disp8
scaling.

Note that because we can't really know how to encode their use, %cr8 and
up cannot be used with .insn outside of 64-bit mode. Users would instead
need to specify an explicit LOCK prefix in combination with %cr0 etc.
---
I'm not convinced the assertions early in build_modrm_byte() are useful
to retain.
---
v2: Re-base over patch which was pulled ahead.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -2356,7 +2356,8 @@ fits_in_disp8 (offsetT num)
 static INLINE int
 fits_in_imm4 (offsetT num)
 {
-  return (num & 0xf) == num;
+  /* Despite the name, check for imm3 if we're dealing with EVEX.  */
+  return (num & (i.vec_encoding != vex_encoding_evex ? 0xf : 7)) == num;
 }
 
 static i386_operand_type
@@ -8228,7 +8229,7 @@ process_operands (void)
 	    }
 	}
     }
-  else if (i.types[0].bitfield.class == SReg)
+  else if (i.types[0].bitfield.class == SReg && !dot_insn ())
     {
       if (flag_code != CODE_64BIT
 	  ? i.tm.base_opcode == POP_SEG_SHORT
@@ -8261,15 +8262,32 @@ process_operands (void)
     }
   else if (i.short_form)
     {
-      /* The register operand is in operand 0 or 1.  */
-      const reg_entry *r = i.op[0].regs;
+      /* The register operand is in the 1st or 2nd non-immediate operand.  */
+      const reg_entry *r = i.op[i.imm_operands].regs;
 
-      if (i.imm_operands
-	  || (r->reg_type.bitfield.instance == Accum && i.op[1].regs))
-	r = i.op[1].regs;
+      if (!dot_insn ()
+	  && r->reg_type.bitfield.instance == Accum
+	  && i.op[i.imm_operands + 1].regs)
+	r = i.op[i.imm_operands + 1].regs;
       /* Register goes in low 3 bits of opcode.  */
       i.tm.base_opcode |= r->reg_num;
       set_rex_vrex (r, REX_B, false);
+
+      if (dot_insn () && i.reg_operands == 2)
+	{
+	  gas_assert (is_any_vex_encoding (&i.tm)
+		      || i.vec_encoding != vex_encoding_default);
+	  i.vex.register_specifier = i.op[i.operands - 1].regs;
+	}
+    }
+  else if (i.reg_operands == 1
+	   && !i.flags[i.operands - 1]
+	   && i.tm.operand_types[i.operands - 1].bitfield.instance
+	      == InstanceNone)
+    {
+      gas_assert (is_any_vex_encoding (&i.tm)
+		  || i.vec_encoding != vex_encoding_default);
+      i.vex.register_specifier = i.op[i.operands - 1].regs;
     }
 
   if ((i.seg[0] || i.prefix[SEG_PREFIX])
@@ -8330,10 +8348,12 @@ build_modrm_byte (void)
 	 VexW0 or VexW1.  The destination must be either XMM, YMM or
 	 ZMM register.
 	 2. 4 operands: 4 register operands or 3 register operands
-	 plus 1 memory operand, with VexXDS.  */
+	 plus 1 memory operand, with VexXDS.
+	 3. Other equivalent combinations when coming from s_insn().  */
       gas_assert (i.tm.opcode_modifier.vexvvvv
-		  && i.tm.opcode_modifier.vexw
-		  && i.tm.operand_types[dest].bitfield.class == RegSIMD);
+		  && i.tm.opcode_modifier.vexw);
+      gas_assert (dot_insn ()
+		  || i.tm.operand_types[dest].bitfield.class == RegSIMD);
 
       /* Of the first two non-immediate operands the one with the template
 	 not allowing for a memory one is encoded in the immediate operand.  */
@@ -8342,6 +8362,14 @@ build_modrm_byte (void)
       else
 	reg_slot = source++;
 
+      if (!dot_insn ())
+	{
+	  gas_assert (i.tm.operand_types[reg_slot].bitfield.class == RegSIMD);
+	  gas_assert (!(i.op[reg_slot].regs->reg_flags & RegVRex));
+	}
+      else
+	gas_assert (i.tm.operand_types[reg_slot].bitfield.class != ClassNone);
+
       if (i.imm_operands == 0)
 	{
 	  /* When there is no immediate operand, generate an 8bit
@@ -8351,10 +8379,7 @@ build_modrm_byte (void)
 	  i.types[i.operands].bitfield.imm8 = 1;
 	  i.operands++;
 
-	  gas_assert (i.tm.operand_types[reg_slot].bitfield.class == RegSIMD);
 	  exp->X_op = O_constant;
-	  exp->X_add_number = register_number (i.op[reg_slot].regs) << 4;
-	  gas_assert ((i.op[reg_slot].regs->reg_flags & RegVRex) == 0);
 	}
       else
 	{
@@ -8365,11 +8390,11 @@ build_modrm_byte (void)
 	  /* Turn on Imm8 again so that output_imm will generate it.  */
 	  i.types[0].bitfield.imm8 = 1;
 
-	  gas_assert (i.tm.operand_types[reg_slot].bitfield.class == RegSIMD);
-	  i.op[0].imms->X_add_number
-	      |= register_number (i.op[reg_slot].regs) << 4;
-	  gas_assert ((i.op[reg_slot].regs->reg_flags & RegVRex) == 0);
+	  exp = i.op[0].imms;
 	}
+      exp->X_add_number |= register_number (i.op[reg_slot].regs)
+			   << (3 + !(is_evex_encoding (&i.tm)
+				     || i.vec_encoding == vex_encoding_evex));
     }
 
   for (v = source + 1; v < dest; ++v)
@@ -10634,6 +10659,9 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
       goto bad;
     }
 
+  if (line > end && i.vec_encoding == vex_encoding_default)
+    i.vec_encoding = evex ? vex_encoding_evex : vex_encoding_vex;
+
   if (line > end && *line == '.')
     {
       /* Length specifier (VEX.L, XOP.L, EVEX.L'L).  */
@@ -10913,7 +10941,244 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
       goto bad;
     }
   i.opcode_length = j;
-  i.tm.base_opcode = val;
+
+  /* Handle operands, if any.  */
+  if (*line == ',')
+    {
+      i386_operand_type combined;
+      bool changed;
+
+      ptr = parse_operands (line + 1, &i386_mnemonics[MN__insn]);
+      this_operand = -1;
+      if (!ptr)
+	goto bad;
+      line = ptr;
+
+      if (!i.operands)
+	{
+	  as_bad (_("expecting operand after ','; got nothing"));
+	  goto done;
+	}
+
+      if (i.mem_operands > 1)
+	{
+	  as_bad (_("too many memory references for `%s'"),
+		  &i386_mnemonics[MN__insn]);
+	  goto done;
+	}
+
+      /* Are we to emit ModR/M encoding?  */
+      if (!i.short_form
+	  && (i.mem_operands
+	      || i.reg_operands > (i.vec_encoding != vex_encoding_default)
+	      || i.tm.extension_opcode != None))
+	i.tm.opcode_modifier.modrm = 1;
+
+      if (!i.tm.opcode_modifier.modrm
+	  && (i.reg_operands
+	      > i.short_form + 0U + (i.vec_encoding != vex_encoding_default)
+	      || i.mem_operands))
+	{
+	  as_bad (_("too many register/memory operands"));
+	  goto done;
+	}
+
+      /* Enforce certain constraints on operands.  */
+      switch (i.reg_operands + i.mem_operands
+	      + (i.tm.extension_opcode != None))
+	{
+	case 0:
+	  if (i.short_form)
+	    {
+	      as_bad (_("too few register/memory operands"));
+	      goto done;
+	    }
+	  /* Fall through.  */
+	case 1:
+	  if (i.tm.opcode_modifier.modrm)
+	    {
+	      as_bad (_("too few register/memory operands"));
+	      goto done;
+	    }
+	  break;
+
+	case 2:
+	  break;
+
+	case 4:
+	  if (i.imm_operands
+	      && (i.op[0].imms->X_op != O_constant
+		  || !fits_in_imm4 (i.op[0].imms->X_add_number)))
+	    {
+	      as_bad (_("constant doesn't fit in %d bits"), evex ? 3 : 4);
+	      goto done;
+	    }
+	  /* Fall through.  */
+	case 3:
+	  if (i.vec_encoding != vex_encoding_default)
+	    {
+	      i.tm.opcode_modifier.vexvvvv = 1;
+	      break;
+	    }
+	  /* Fall through.  */
+	default:
+	  as_bad (_("too many register/memory operands"));
+	  goto done;
+	}
+
+      /* Bring operands into canonical order (imm, mem, reg).  */
+      do
+	{
+	  changed = false;
+
+	  for (j = 1; j < i.operands; ++j)
+	    {
+	      if ((!operand_type_check (i.types[j - 1], imm)
+		   && operand_type_check (i.types[j], imm))
+		  || (i.types[j - 1].bitfield.class != ClassNone
+		      && i.types[j].bitfield.class == ClassNone))
+		{
+		  swap_2_operands (j - 1, j);
+		  changed = true;
+		}
+	    }
+	}
+      while (changed);
+
+      /* For Intel syntax swap the order of register operands.  */
+      if (intel_syntax)
+	switch (i.reg_operands)
+	  {
+	  case 0:
+	  case 1:
+	    break;
+
+	  case 4:
+	    swap_2_operands (i.imm_operands + i.mem_operands + 1, i.operands - 2);
+	    /* Fall through.  */
+	  case 3:
+	  case 2:
+	    swap_2_operands (i.imm_operands + i.mem_operands, i.operands - 1);
+	    break;
+
+	  default:
+	    abort ();
+	  }
+
+      /* Enforce constraints when using VSIB.  */
+      if (i.index_reg
+	  && (i.index_reg->reg_type.bitfield.xmmword
+	      || i.index_reg->reg_type.bitfield.ymmword
+	      || i.index_reg->reg_type.bitfield.zmmword))
+	{
+	  if (i.vec_encoding == vex_encoding_default)
+	    {
+	      as_bad (_("VSIB unavailable with legacy encoding"));
+	      goto done;
+	    }
+
+	  if (i.vec_encoding == vex_encoding_evex
+	      && i.reg_operands > 1)
+	    {
+	      /* We could allow two register operands, encoding the 2nd one in
+		 an 8-bit immediate like for 4-register-operand insns, but that
+		 would require ugly fiddling with process_operands() and/or
+		 build_modrm_byte().  */
+	      as_bad (_("too many register operands with VSIB"));
+	      goto done;
+	    }
+
+	  i.tm.opcode_modifier.sib = 1;
+	}
+
+      /* Establish operand size encoding.  */
+      operand_type_set (&combined, 0);
+      for (j = i.imm_operands; j < i.operands; ++j)
+	{
+	  i.types[j].bitfield.instance = InstanceNone;
+
+	  if (operand_type_check (i.types[j], disp))
+	    i.types[j].bitfield.baseindex = 1;
+
+	  if ((i.broadcast.type || i.broadcast.bytes)
+	      && j == i.broadcast.operand)
+	    continue;
+
+	  combined = operand_type_or (combined, i.types[j]);
+	  combined.bitfield.class = ClassNone;
+	}
+
+      if (i.vec_encoding == vex_encoding_default)
+	{
+	  if (flag_code == CODE_64BIT && combined.bitfield.qword)
+	    i.rex |= REX_W;
+	  else if ((flag_code == CODE_16BIT ? combined.bitfield.dword
+					    : combined.bitfield.word)
+	           && !add_prefix (DATA_PREFIX_OPCODE))
+	    goto done;
+	}
+      else if (!i.tm.opcode_modifier.vexw)
+	{
+	  if (flag_code == CODE_64BIT)
+	    {
+	      if (combined.bitfield.qword)
+	        i.tm.opcode_modifier.vexw = VEXW1;
+	      else if (combined.bitfield.dword)
+	        i.tm.opcode_modifier.vexw = VEXW0;
+	    }
+
+	  if (!i.tm.opcode_modifier.vexw)
+	    i.tm.opcode_modifier.vexw = VEXWIG;
+	}
+
+      if (vex || xop)
+	{
+	  if (!i.tm.opcode_modifier.vex)
+	    {
+	      if (combined.bitfield.ymmword)
+	        i.tm.opcode_modifier.vex = VEX256;
+	      else if (combined.bitfield.xmmword)
+	        i.tm.opcode_modifier.vex = VEX128;
+	    }
+	}
+      else if (evex)
+	{
+	  if (!i.tm.opcode_modifier.evex)
+	    {
+	      /* Do _not_ consider AVX512VL here.  */
+	      if (i.rounding.type != rc_none || combined.bitfield.zmmword)
+	        i.tm.opcode_modifier.evex = EVEX512;
+	      else if (combined.bitfield.ymmword)
+	        i.tm.opcode_modifier.evex = EVEX256;
+	      else if (combined.bitfield.xmmword)
+	        i.tm.opcode_modifier.evex = EVEX128;
+	    }
+	}
+
+      if (i.disp_operands && !optimize_disp (&i.tm))
+	goto done;
+
+      for (j = 0; j < i.operands; ++j)
+	i.tm.operand_types[j] = i.types[j];
+
+      process_operands ();
+    }
+
+  /* Don't set opcode until after processing operands, to avoid any
+     potential special casing there.  */
+  i.tm.base_opcode |= val;
+
+  if (i.vec_encoding == vex_encoding_error
+      || (i.vec_encoding != vex_encoding_evex
+	  ? i.broadcast.type || i.broadcast.bytes
+	    || i.rounding.type != rc_none
+	    || i.mask.reg
+	  : (i.broadcast.type || i.broadcast.bytes)
+	    && i.rounding.type != rc_none))
+    {
+      as_bad (_("conflicting .insn operands"));
+      goto done;
+    }
 
   if (vex || xop)
     {
@@ -10931,6 +11196,8 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
       build_evex_prefix ();
       i.rex &= REX_OPCODE;
     }
+  else if (i.rex != 0)
+    add_prefix (REX_OPCODE | i.rex);
 
   output_insn ();
 
@@ -11899,6 +12166,15 @@ i386_att_operand (char *operand_string)
 	  as_bad (_("junk `%s' after register"), op_string);
 	  return 0;
 	}
+
+       /* Reject pseudo registers for .insn.  */
+      if (dot_insn () && r->reg_type.bitfield.class == ClassNone)
+	{
+	  as_bad (_("`%s%s' cannot be used here"),
+		  register_prefix, r->reg_name);
+	  return 0;
+	}
+
       temp = r->reg_type;
       temp.bitfield.baseindex = 0;
       i.types[this_operand] = operand_type_or (i.types[this_operand],
@@ -13274,7 +13550,9 @@ static bool check_register (const reg_en
     }
 
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
-      && (!cpu_arch_flags.bitfield.cpulm || r->reg_type.bitfield.class != RegCR)
+      && (!cpu_arch_flags.bitfield.cpulm
+	  || r->reg_type.bitfield.class != RegCR
+	  || dot_insn ())
       && flag_code != CODE_64BIT)
     return false;
 
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -320,8 +320,10 @@ i386_intel_simplify_register (expression
 	  as_bad (_("invalid use of register"));
 	  return 0;
 	}
-      if (i386_regtab[reg_num].reg_type.bitfield.class == SReg
-	  && i386_regtab[reg_num].reg_num == RegFlat)
+      if ((i386_regtab[reg_num].reg_type.bitfield.class == SReg
+	   && i386_regtab[reg_num].reg_num == RegFlat)
+	  || (dot_insn ()
+	      && i386_regtab[reg_num].reg_type.bitfield.class == ClassNone))
 	{
 	  as_bad (_("invalid use of pseudo-register"));
 	  return 0;
@@ -342,6 +344,7 @@ i386_intel_simplify_register (expression
 
       if (intel_state.in_scale
 	  || i386_regtab[reg_num].reg_type.bitfield.baseindex
+	  || dot_insn ()
 	  || t->mnem_off == MN_bndmk
 	  || t->mnem_off == MN_bndldx
 	  || t->mnem_off == MN_bndstx)
--- a/gas/testsuite/gas/i386/insn-32.d
+++ b/gas/testsuite/gas/i386/insn-32.d
@@ -11,6 +11,24 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f3 90[ 	]+pause
 [ 	]*[a-f0-9]+:	d9 ee[ 	]+fldz
 [ 	]*[a-f0-9]+:	f3 0f 01 e8[ 	]+setssbsy
+[ 	]*[a-f0-9]+:	8b c1[ 	]+mov    %ecx,%eax
+[ 	]*[a-f0-9]+:	66 8b c8[ 	]+mov    %ax,%cx
+[ 	]*[a-f0-9]+:	89 48 04[ 	]+mov    %ecx,0x4\(%eax\)
+[ 	]*[a-f0-9]+:	8b 0c 05 44 44 00 00[ 	]+mov    0x4444\(,%eax,1\),%ecx
+[ 	]*[a-f0-9]+:	66 0f b6 cc[ 	]+movzbw %ah,%cx
+[ 	]*[a-f0-9]+:	0f b7 c8[ 	]+movzwl %ax,%ecx
+[ 	]*[a-f0-9]+:	0f ca[ 	]+bswap  %edx
 [ 	]*[a-f0-9]+:	c5 fc 77[ 	]+vzeroall
 [ 	]*[a-f0-9]+:	c4 e1 7c 77[ 	]+vzeroall
+[ 	]*[a-f0-9]+:	c5 f1 58 d0[ 	]+vaddpd %xmm0,%xmm1,%xmm2
+[ 	]*[a-f0-9]+:	c5 f5 58 d0[ 	]+vaddpd %ymm0,%ymm1,%ymm2
+[ 	]*[a-f0-9]+:	c5 f2 58 d0[ 	]+vaddss %xmm0,%xmm1,%xmm2
+[ 	]*[a-f0-9]+:	c4 e3 69 68 19 00[ 	]+vfmaddps %xmm0,\(%ecx\),%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	c4 e3 e9 68 19 00[ 	]+vfmaddps \(%ecx\),%xmm0,%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	c4 e3 e9 68 18 10[ 	]+vfmaddps \(%eax\),%xmm1,%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	c5 f8 92 c8[ 	]+kmovw  %eax,%k1
+[ 	]*[a-f0-9]+:	c5 f8 93 c1[ 	]+kmovw  %k1,%eax
+[ 	]*[a-f0-9]+:	62 f1 74 18 58 d0[ 	]+vaddps \{rn-sae\},%zmm0,%zmm1,%zmm2
+[ 	]*[a-f0-9]+:	c4 e2 79 92 1c 48[ 	]+vgatherdps %xmm0,\(%eax,%xmm1,2\),%xmm3
+[ 	]*[a-f0-9]+:	62 f2 fd 0c 93 1c 48[ 	]+vgatherqpd \(%eax,%xmm1,2\),%xmm3\{%k4\}
 #pass
--- a/gas/testsuite/gas/i386/insn-32.s
+++ b/gas/testsuite/gas/i386/insn-32.s
@@ -13,6 +13,42 @@ insn:
 	# setssbsy
 	.insn 0xf30f01e8
 
+	# mov
+	.insn 0x8b, %ecx, %eax
+	.insn 0x8b, %ax, %cx
+	.insn 0x89, %ecx, 4(%eax)
+	.insn 0x8b, 0x4444(,%eax), %ecx
+
+	# movzx
+	.insn 0x0fb6, %ah, %cx
+	.insn 0x0fb7, %eax, %ecx
+
+	# bswap
+	.insn 0x0fc8+r, %edx
+
 	# vzeroall
 	.insn VEX.256.0F.WIG 0x77
 	.insn {vex3} VEX.L1 0x0f77
+
+	# vaddpd
+	.insn VEX.66.0F 0x58, %xmm0, %xmm1, %xmm2
+	.insn VEX.66 0x0f58, %ymm0, %ymm1, %ymm2
+
+	# vaddss
+	.insn VEX.LIG.F3.0F 0x58, %xmm0, %xmm1, %xmm2
+
+	# vfmaddps
+	.insn VEX.66.0F3A.W0 0x68, %xmm0, (%ecx), %xmm2, %xmm3
+	.insn VEX.66.0F3A.W1 0x68, %xmm0, (%ecx), %xmm2, %xmm3
+	.insn VEX.66.0F3A.W1 0x68, (%eax), %xmm1, %xmm2, %xmm3
+
+	# kmovw
+	.insn VEX.L0.0F.W0 0x92, %eax, %k1
+	.insn VEX.L0.0F.W0 0x93, %k1, %eax
+
+	# vaddps
+	.insn EVEX.NP.0F.W0 0x58, {rn-sae}, %zmm0, %zmm1, %zmm2
+
+	# vgather...
+	.insn VEX.66.0f38.W0 0x92, %xmm0, (%eax, %xmm1, 2), %xmm3
+	.insn EVEX.66.0f38.W1 0x93, (%eax, %xmm1, 2), %xmm3{%k4}
--- a/gas/testsuite/gas/i386/insn-64.d
+++ b/gas/testsuite/gas/i386/insn-64.d
@@ -11,6 +11,35 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f3 90[ 	]+pause
 [ 	]*[a-f0-9]+:	d9 ee[ 	]+fldz
 [ 	]*[a-f0-9]+:	f3 0f 01 e8[ 	]+setssbsy
+[ 	]*[a-f0-9]+:	44 8b c1[ 	]+mov    %ecx,%r8d
+[ 	]*[a-f0-9]+:	48 8b c8[ 	]+mov    %rax,%rcx
+[ 	]*[a-f0-9]+:	41 89 48 08[ 	]+mov    %ecx,0x8\(%r8\)
+[ 	]*[a-f0-9]+:	42 8b 0c 05 80 80 00 00[ 	]+mov    0x8080\(,%r8,1\),%ecx
+[ 	]*[a-f0-9]+:	66 0f be cc[ 	]+movsbw %ah,%cx
+[ 	]*[a-f0-9]+:	0f bf c8[ 	]+movswl %ax,%ecx
+[ 	]*[a-f0-9]+:	48 63 c8[ 	]+movslq %eax,%rcx
+[ 	]*[a-f0-9]+:	48 0f ca[ 	]+bswap  %rdx
+[ 	]*[a-f0-9]+:	41 0f c8[ 	]+bswap  %r8d
 [ 	]*[a-f0-9]+:	c5 fc 77[ 	]+vzeroall
 [ 	]*[a-f0-9]+:	c4 e1 7c 77[ 	]+vzeroall
+[ 	]*[a-f0-9]+:	c4 c1 71 58 d0[ 	]+vaddpd %xmm8,%xmm1,%xmm2
+[ 	]*[a-f0-9]+:	c5 b5 58 d0[ 	]+vaddpd %ymm0,%ymm9,%ymm2
+[ 	]*[a-f0-9]+:	c5 72 58 d0[ 	]+vaddss %xmm0,%xmm1,%xmm10
+[ 	]*[a-f0-9]+:	c4 e3 69 68 19 80[ 	]+vfmaddps %xmm8,\(%rcx\),%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	67 c4 e3 e9 68 19 00[ 	]+vfmaddps \(%ecx\),%xmm0,%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	c4 c3 e9 68 18 10[ 	]+vfmaddps \(%r8\),%xmm1,%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	c4 c1 78 92 c8[ 	]+kmovw  %r8d,%k1
+[ 	]*[a-f0-9]+:	c5 78 93 c1[ 	]+kmovw  %k1,%r8d
+[ 	]*[a-f0-9]+:	62 b1 74 38 58 d0[ 	]+vaddps \{rd-sae\},%zmm16,%zmm1,%zmm2
+[ 	]*[a-f0-9]+:	62 f1 74 10 58 d0[ 	]+vaddps \{rn-sae\},%zmm0,%zmm17,%zmm2
+[ 	]*[a-f0-9]+:	62 e1 74 58 58 d0[ 	]+vaddps \{ru-sae\},%zmm0,%zmm1,%zmm18
+[ 	]*[a-f0-9]+:	c4 e2 39 92 1c 48[ 	]+vgatherdps %xmm8,\(%rax,%xmm1,2\),%xmm3
+[ 	]*[a-f0-9]+:	c4 c2 79 92 1c 48[ 	]+vgatherdps %xmm0,\(%r8,%xmm1,2\),%xmm3
+[ 	]*[a-f0-9]+:	c4 a2 79 92 1c 48[ 	]+vgatherdps %xmm0,\(%rax,%xmm9,2\),%xmm3
+[ 	]*[a-f0-9]+:	c4 62 79 92 1c 48[ 	]+vgatherdps %xmm0,\(%rax,%xmm1,2\),%xmm11
+[ 	]*[a-f0-9]+:	62 d2 fd 0c 93 1c 48[ 	]+vgatherqpd \(%r8,%xmm1,2\),%xmm3\{%k4\}
+[ 	]*[a-f0-9]+:	62 b2 fd 0c 93 1c 48[ 	]+vgatherqpd \(%rax,%xmm9,2\),%xmm3\{%k4\}
+[ 	]*[a-f0-9]+:	62 f2 fd 04 93 1c 48[ 	]+vgatherqpd \(%rax,%xmm17,2\),%xmm3\{%k4\}
+[ 	]*[a-f0-9]+:	62 72 fd 0c 93 1c 48[ 	]+vgatherqpd \(%rax,%xmm1,2\),%xmm11\{%k4\}
+[ 	]*[a-f0-9]+:	62 e2 fd 0c 93 1c 48[ 	]+vgatherqpd \(%rax,%xmm1,2\),%xmm19\{%k4\}
 #pass
--- a/gas/testsuite/gas/i386/insn-64.s
+++ b/gas/testsuite/gas/i386/insn-64.s
@@ -13,6 +13,53 @@ insn:
 	# setssbsy
 	.insn 0xf30f01e8
 
+	# mov
+	.insn 0x8b, %ecx, %r8d
+	.insn 0x8b, %rax, %rcx
+	.insn 0x89, %ecx, 8(%r8)
+	.insn 0x8b, 0x8080(,%r8), %ecx
+
+	# movsx
+	.insn 0x0fbe, %ah, %cx
+	.insn 0x0fbf, %eax, %ecx
+	.insn 0x63, %rax, %rcx
+
+	# bswap
+	.insn 0x0fc8+r, %rdx
+	.insn 0x0fc8+r, %r8d
+
 	# vzeroall
 	.insn VEX.256.0F.WIG 0x77
 	.insn {vex3} VEX.L1 0x0f77
+
+	# vaddpd
+	.insn VEX.66.0F 0x58, %xmm8, %xmm1, %xmm2
+	.insn VEX.66 0x0f58, %ymm0, %ymm9, %ymm2
+
+	# vaddss
+	.insn VEX.LIG.F3.0F 0x58, %xmm0, %xmm1, %xmm10
+
+	# vfmaddps
+	.insn VEX.66.0F3A.W0 0x68, %xmm8, (%rcx), %xmm2, %xmm3
+	.insn VEX.66.0F3A.W1 0x68, %xmm0, (%ecx), %xmm2, %xmm3
+	.insn VEX.66.0F3A.W1 0x68, (%r8), %xmm1, %xmm2, %xmm3
+
+	# kmovw
+	.insn VEX.L0.0F.W0 0x92, %r8d, %k1
+	.insn VEX.L0.0F.W0 0x93, %k1, %r8d
+
+	# vaddps
+	.insn EVEX.NP.0F.W0 0x58, {rd-sae}, %zmm16, %zmm1, %zmm2
+	.insn EVEX.NP.0F.W0 0x58, {rn-sae}, %zmm0, %zmm17, %zmm2
+	.insn EVEX.NP.0F.W0 0x58, {ru-sae}, %zmm0, %zmm1, %zmm18
+
+	# vgather...
+	.insn VEX.66.0f38.W0 0x92, %xmm8, (%rax, %xmm1, 2), %xmm3
+	.insn VEX.66.0f38.W0 0x92, %xmm0, (%r8, %xmm1, 2), %xmm3
+	.insn VEX.66.0f38.W0 0x92, %xmm0, (%rax, %xmm9, 2), %xmm3
+	.insn VEX.66.0f38.W0 0x92, %xmm0, (%rax, %xmm1, 2), %xmm11
+	.insn EVEX.66.0f38.W1 0x93, (%r8, %xmm1, 2), %xmm3{%k4}
+	.insn EVEX.66.0f38.W1 0x93, (%rax, %xmm9, 2), %xmm3{%k4}
+	.insn EVEX.66.0f38.W1 0x93, (%rax, %xmm17, 2), %xmm3{%k4}
+	.insn EVEX.66.0f38.W1 0x93, (%rax, %xmm1, 2), %xmm11{%k4}
+	.insn EVEX.66.0f38.W1 0x93, (%rax, %xmm1, 2), %xmm19{%k4}



* [PATCH v2 09/14] x86: handle EVEX Disp8 for .insn
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (7 preceding siblings ...)
  2023-03-10 10:23 ` [PATCH v2 08/14] x86: process instruction operands for .insn Jan Beulich
@ 2023-03-10 10:24 ` Jan Beulich
  2023-03-10 10:24 ` [PATCH v2 10/14] x86: allow for multiple immediates in output_disp() Jan Beulich
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:24 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

In particular the scaling factor cannot always be determined from pre-
existing operand attributes. Introduce a new {:d<N>} vector operand
syntax extension, restricted to .insn only, to allow specifying this in
(at least) otherwise ambiguous cases.
---
I was considering suppressing the {:d...} extension for Intel syntax, as
it should not be required there (and doing so would then also prevent
it from conflicting with the operand size specifiers available there).
Thoughts?
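
For readers less familiar with EVEX Disp8 scaling, here is a minimal
standalone sketch of the arithmetic involved. It is not code from this
patch: the helper names memshift_from_n() and fits_compressed_disp8()
are made up for illustration. It assumes N from {:d<N>} is a power of
two (the parser above only consumes the specifier in that case) and that
a displacement can use the one-byte form only if it is a multiple of N
and the quotient fits in a signed byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: turn the N of {:d<N>} into a shift count
   (log2 of N).  Returns -1 if N is not a power of two.  */
static int
memshift_from_n (unsigned long n)
{
  int shift = 0;

  if (n == 0 || (n & (n - 1)) != 0)
    return -1;
  while (!(n & 1))
    {
      n >>= 1;
      ++shift;
    }
  return shift;
}

/* Hypothetical helper: try to express DISP as a scaled 8-bit
   displacement with scaling factor 1 << SHIFT.  On success the byte
   to emit is stored in *DISP8.  */
static bool
fits_compressed_disp8 (int64_t disp, int shift, int8_t *disp8)
{
  int64_t scaled;

  assert (shift >= 0 && shift < 32);
  if (disp & (((int64_t) 1 << shift) - 1))
    return false;		/* Not a multiple of N.  */
  scaled = disp / ((int64_t) 1 << shift);
  if (scaled < -128 || scaled > 127)
    return false;		/* Needs a 32-bit displacement.  */
  *disp8 = (int8_t) scaled;
  return true;
}
```

This matches the test expectations below: 4(%eax){:d4}, 64(%eax) with a
64-byte operand, and 2(%eax){1to8:d2} all end up with a displacement
byte of 01.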

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -10946,8 +10946,11 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
   if (*line == ',')
     {
       i386_operand_type combined;
+      expressionS *disp_exp = NULL;
       bool changed;
 
+      i.memshift = -1;
+
       ptr = parse_operands (line + 1, &i386_mnemonics[MN__insn]);
       this_operand = -1;
       if (!ptr)
@@ -11093,12 +11096,40 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
 
       /* Establish operand size encoding.  */
       operand_type_set (&combined, 0);
+
       for (j = i.imm_operands; j < i.operands; ++j)
 	{
 	  i.types[j].bitfield.instance = InstanceNone;
 
 	  if (operand_type_check (i.types[j], disp))
-	    i.types[j].bitfield.baseindex = 1;
+	    {
+	      i.types[j].bitfield.baseindex = 1;
+	      disp_exp = i.op[j].disps;
+	    }
+
+	  if (evex && i.types[j].bitfield.baseindex)
+	    {
+	      unsigned int n = i.memshift;
+
+	      if (i.types[j].bitfield.byte)
+		n = 0;
+	      else if (i.types[j].bitfield.word)
+		n = 1;
+	      else if (i.types[j].bitfield.dword)
+		n = 2;
+	      else if (i.types[j].bitfield.qword)
+		n = 3;
+	      else if (i.types[j].bitfield.xmmword)
+		n = 4;
+	      else if (i.types[j].bitfield.ymmword)
+		n = 5;
+	      else if (i.types[j].bitfield.zmmword)
+		n = 6;
+
+	      if (i.memshift < 32 && n != i.memshift)
+		as_warn ("conflicting memory operand size specifiers");
+	      i.memshift = n;
+	    }
 
 	  if ((i.broadcast.type || i.broadcast.bytes)
 	      && j == i.broadcast.operand)
@@ -11108,6 +11139,16 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
 	  combined.bitfield.class = ClassNone;
 	}
 
+      switch ((i.broadcast.type ? i.broadcast.type : 1)
+	      << (i.memshift < 32 ? i.memshift : 0))
+	{
+	case 64: combined.bitfield.zmmword = 1; break;
+	case 32: combined.bitfield.ymmword = 1; break;
+	case 16: combined.bitfield.xmmword = 1; break;
+	case  8: combined.bitfield.qword = 1; break;
+	case  4: combined.bitfield.dword = 1; break;
+	}
+
       if (i.vec_encoding == vex_encoding_default)
 	{
 	  if (flag_code == CODE_64BIT && combined.bitfield.qword)
@@ -11153,8 +11194,40 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
 	      else if (combined.bitfield.xmmword)
 	        i.tm.opcode_modifier.evex = EVEX128;
 	    }
+
+	  if (i.memshift >= 32)
+	    {
+	      unsigned int n = 0;
+
+	      switch (i.tm.opcode_modifier.evex)
+		{
+		case EVEX512: n = 64; break;
+		case EVEX256: n = 32; break;
+		case EVEX128: n = 16; break;
+		}
+
+	      if (i.broadcast.type)
+		n /= i.broadcast.type;
+
+	      if (n > 0)
+		for (i.memshift = 0; !(n & 1); n >>= 1)
+		  ++i.memshift;
+	      else if (disp_exp != NULL && disp_exp->X_op == O_constant
+		       && disp_exp->X_add_number != 0
+		       && i.disp_encoding != disp_encoding_32bit)
+		{
+		  if (!quiet_warnings)
+		    as_warn ("cannot determine memory operand size");
+		  i.disp_encoding = disp_encoding_32bit;
+		}
+	    }
 	}
 
+      if (i.memshift >= 32)
+	i.memshift = 0;
+      else if (!evex)
+	i.vec_encoding = vex_encoding_error;
+
       if (i.disp_operands && !optimize_disp (&i.tm))
 	goto done;
 
@@ -11329,6 +11402,29 @@ check_VecOperations (char *op_string)
 
 	      i.broadcast.type = bcst_type;
 	      i.broadcast.operand = this_operand;
+
+	      /* For .insn a data size specifier may be appended.  */
+	      if (dot_insn () && *op_string == ':')
+		goto dot_insn_modifier;
+	    }
+	  /* Check .insn special cases.  */
+	  else if (dot_insn () && *op_string == ':')
+	    {
+	    dot_insn_modifier:
+	      if (op_string[1] == 'd')
+		{
+		  unsigned long n;
+
+		  if (i.memshift < 32)
+		    goto duplicated_vec_op;
+
+		  n = strtoul (op_string + 2, &end_op, 0);
+		  if (n)
+		    for (i.memshift = 0; !(n & 1); n >>= 1)
+		      ++i.memshift;
+		  if (i.memshift < 32 && n == 1)
+		    op_string = end_op;
+		}
 	    }
 	  /* Check masking operation.  */
 	  else if ((mask = parse_register (op_string, &end_op)) != NULL)
--- a/gas/testsuite/gas/i386/insn-32.d
+++ b/gas/testsuite/gas/i386/insn-32.d
@@ -23,6 +23,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	c5 f1 58 d0[ 	]+vaddpd %xmm0,%xmm1,%xmm2
 [ 	]*[a-f0-9]+:	c5 f5 58 d0[ 	]+vaddpd %ymm0,%ymm1,%ymm2
 [ 	]*[a-f0-9]+:	c5 f2 58 d0[ 	]+vaddss %xmm0,%xmm1,%xmm2
+[ 	]*[a-f0-9]+:	62 f1 76 08 58 50 01[ 	]+\{evex\} vaddss (0x)?4\(%eax\),%xmm1,%xmm2
 [ 	]*[a-f0-9]+:	c4 e3 69 68 19 00[ 	]+vfmaddps %xmm0,\(%ecx\),%xmm2,%xmm3
 [ 	]*[a-f0-9]+:	c4 e3 e9 68 19 00[ 	]+vfmaddps \(%ecx\),%xmm0,%xmm2,%xmm3
 [ 	]*[a-f0-9]+:	c4 e3 e9 68 18 10[ 	]+vfmaddps \(%eax\),%xmm1,%xmm2,%xmm3
@@ -31,4 +32,13 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	62 f1 74 18 58 d0[ 	]+vaddps \{rn-sae\},%zmm0,%zmm1,%zmm2
 [ 	]*[a-f0-9]+:	c4 e2 79 92 1c 48[ 	]+vgatherdps %xmm0,\(%eax,%xmm1,2\),%xmm3
 [ 	]*[a-f0-9]+:	62 f2 fd 0c 93 1c 48[ 	]+vgatherqpd \(%eax,%xmm1,2\),%xmm3\{%k4\}
+[ 	]*[a-f0-9]+:	62 f2 7d 28 88 48 01[ 	]+vexpandps (0x)?4\(%eax\),%ymm1
+[ 	]*[a-f0-9]+:	62 f5 fd 48 5a 40 01[ 	]+vcvtpd2phz 0x40\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	62 f5 fd 48 5a 40 01[ 	]+vcvtpd2phz 0x40\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	62 f5 fd 48 5a 40 01[ 	]+vcvtpd2phz 0x40\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	62 f5 fd 58 5a 40 01[ 	]+vcvtpd2ph (0x)?8\(%eax\)\{1to8\},%xmm0
+[ 	]*[a-f0-9]+:	62 f5 fd 58 5a 40 01[ 	]+vcvtpd2ph (0x)?8\(%eax\)\{1to8\},%xmm0
+[ 	]*[a-f0-9]+:	62 f5 fd 58 5a 40 01[ 	]+vcvtpd2ph (0x)?8\(%eax\)\{1to8\},%xmm0
+[ 	]*[a-f0-9]+:	62 f5 7c 48 5a 40 01[ 	]+vcvtph2pd 0x10\(%eax\),%zmm0
+[ 	]*[a-f0-9]+:	62 f5 7c 58 5a 40 01[ 	]+vcvtph2pd (0x)?2\(%eax\)\{1to8\},%zmm0
 #pass
--- a/gas/testsuite/gas/i386/insn-32.s
+++ b/gas/testsuite/gas/i386/insn-32.s
@@ -36,6 +36,7 @@ insn:
 
 	# vaddss
 	.insn VEX.LIG.F3.0F 0x58, %xmm0, %xmm1, %xmm2
+	.insn EVEX.LIG.F3.0F.W0 0x58, 4(%eax){:d4}, %xmm1, %xmm2
 
 	# vfmaddps
 	.insn VEX.66.0F3A.W0 0x68, %xmm0, (%ecx), %xmm2, %xmm3
@@ -52,3 +53,18 @@ insn:
 	# vgather...
 	.insn VEX.66.0f38.W0 0x92, %xmm0, (%eax, %xmm1, 2), %xmm3
 	.insn EVEX.66.0f38.W1 0x93, (%eax, %xmm1, 2), %xmm3{%k4}
+
+	# vexpandps
+	.insn EVEX.66.0F38.W0 0x88, 4(%eax){:d4}, %ymm1
+
+	# vcvtpd2phz
+	.insn EVEX.512.66.M5.W1 0x5a, 64(%eax), %xmm0
+	.insn EVEX.66.M5.W1 0x5a, 64(%eax), %zmm0
+	.insn EVEX.66.M5.W1 0x5a, 64(%eax){:d64}, %xmm0
+	.insn EVEX.512.66.M5.W1 0x5a, 8(%eax){1to8}, %xmm0
+	.insn EVEX.66.M5.W1 0x5a, 8(%eax){1to8}, %zmm0
+	.insn EVEX.66.M5.W1 0x5a, 8(%eax){1to8:d8}, %xmm0
+
+	# vcvtph2pd
+	.insn EVEX.M5.W0 0x5a, 16(%eax){:d16}, %zmm0
+	.insn EVEX.M5.W0 0x5a, 2(%eax){1to8:d2}, %zmm0
--- a/gas/testsuite/gas/i386/insn-64.d
+++ b/gas/testsuite/gas/i386/insn-64.d
@@ -25,6 +25,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	c4 c1 71 58 d0[ 	]+vaddpd %xmm8,%xmm1,%xmm2
 [ 	]*[a-f0-9]+:	c5 b5 58 d0[ 	]+vaddpd %ymm0,%ymm9,%ymm2
 [ 	]*[a-f0-9]+:	c5 72 58 d0[ 	]+vaddss %xmm0,%xmm1,%xmm10
+[ 	]*[a-f0-9]+:	62 f1 76 08 58 50 01[ 	]+\{evex\} vaddss (0x)?4\(%rax\),%xmm1,%xmm2
 [ 	]*[a-f0-9]+:	c4 e3 69 68 19 80[ 	]+vfmaddps %xmm8,\(%rcx\),%xmm2,%xmm3
 [ 	]*[a-f0-9]+:	67 c4 e3 e9 68 19 00[ 	]+vfmaddps \(%ecx\),%xmm0,%xmm2,%xmm3
 [ 	]*[a-f0-9]+:	c4 c3 e9 68 18 10[ 	]+vfmaddps \(%r8\),%xmm1,%xmm2,%xmm3
@@ -42,4 +43,13 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	62 f2 fd 04 93 1c 48[ 	]+vgatherqpd \(%rax,%xmm17,2\),%xmm3\{%k4\}
 [ 	]*[a-f0-9]+:	62 72 fd 0c 93 1c 48[ 	]+vgatherqpd \(%rax,%xmm1,2\),%xmm11\{%k4\}
 [ 	]*[a-f0-9]+:	62 e2 fd 0c 93 1c 48[ 	]+vgatherqpd \(%rax,%xmm1,2\),%xmm19\{%k4\}
+[ 	]*[a-f0-9]+:	62 f2 7d 28 88 48 01[ 	]+vexpandps (0x)?4\(%rax\),%ymm1
+[ 	]*[a-f0-9]+:	62 f5 fd 48 5a 40 01[ 	]+vcvtpd2phz 0x40\(%rax\),%xmm0
+[ 	]*[a-f0-9]+:	62 f5 fd 48 5a 40 01[ 	]+vcvtpd2phz 0x40\(%rax\),%xmm0
+[ 	]*[a-f0-9]+:	62 f5 fd 48 5a 40 01[ 	]+vcvtpd2phz 0x40\(%rax\),%xmm0
+[ 	]*[a-f0-9]+:	62 f5 fd 58 5a 40 01[ 	]+vcvtpd2ph (0x)?8\(%rax\)\{1to8\},%xmm0
+[ 	]*[a-f0-9]+:	62 f5 fd 58 5a 40 01[ 	]+vcvtpd2ph (0x)?8\(%rax\)\{1to8\},%xmm0
+[ 	]*[a-f0-9]+:	62 f5 fd 58 5a 40 01[ 	]+vcvtpd2ph (0x)?8\(%rax\)\{1to8\},%xmm0
+[ 	]*[a-f0-9]+:	62 f5 7c 48 5a 40 01[ 	]+vcvtph2pd 0x10\(%rax\),%zmm0
+[ 	]*[a-f0-9]+:	62 f5 7c 58 5a 40 01[ 	]+vcvtph2pd (0x)?2\(%rax\)\{1to8\},%zmm0
 #pass
--- a/gas/testsuite/gas/i386/insn-64.s
+++ b/gas/testsuite/gas/i386/insn-64.s
@@ -38,6 +38,7 @@ insn:
 
 	# vaddss
 	.insn VEX.LIG.F3.0F 0x58, %xmm0, %xmm1, %xmm10
+	.insn EVEX.LIG.F3.0F.W0 0x58, 4(%rax){:d4}, %xmm1, %xmm2
 
 	# vfmaddps
 	.insn VEX.66.0F3A.W0 0x68, %xmm8, (%rcx), %xmm2, %xmm3
@@ -63,3 +64,18 @@ insn:
 	.insn EVEX.66.0f38.W1 0x93, (%rax, %xmm17, 2), %xmm3{%k4}
 	.insn EVEX.66.0f38.W1 0x93, (%rax, %xmm1, 2), %xmm11{%k4}
 	.insn EVEX.66.0f38.W1 0x93, (%rax, %xmm1, 2), %xmm19{%k4}
+
+	# vexpandps
+	.insn EVEX.66.0F38.W0 0x88, 4(%rax){:d4}, %ymm1
+
+	# vcvtpd2phz
+	.insn EVEX.512.66.M5.W1 0x5a, 64(%rax), %xmm0
+	.insn EVEX.66.M5.W1 0x5a, 64(%rax), %zmm0
+	.insn EVEX.66.M5.W1 0x5a, 64(%rax){:d64}, %xmm0
+	.insn EVEX.512.66.M5.W1 0x5a, 8(%rax){1to8}, %xmm0
+	.insn EVEX.66.M5.W1 0x5a, 8(%rax){1to8}, %zmm0
+	.insn EVEX.66.M5.W1 0x5a, 8(%rax){1to8:d8}, %xmm0
+
+	# vcvtph2pd
+	.insn EVEX.M5.W0 0x5a, 16(%rax){:d16}, %zmm0
+	.insn EVEX.M5.W0 0x5a, 2(%rax){1to8:d2}, %zmm0



* [PATCH v2 10/14] x86: allow for multiple immediates in output_disp()
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (8 preceding siblings ...)
  2023-03-10 10:24 ` [PATCH v2 09/14] x86: handle EVEX Disp8 " Jan Beulich
@ 2023-03-10 10:24 ` Jan Beulich
  2023-03-10 10:25 ` [PATCH v2 11/14] x86: handle immediate operands for .insn Jan Beulich
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:24 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

.insn isn't going to be constrained to just a single immediate, which
matters in particular when RIP-relative addressing is used.
---
Of course this could be folded into the relevant subsequent patch, but
I'm wondering in particular whether limiting the new behavior to .insn
is actually necessary (if not, this wouldn't be a good fit to merge into
that later patch).
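
To illustrate the adjustment, here is a minimal standalone sketch (not
code from this patch; pcrel_disp_addend() is a made-up name). It relies
on the fact that a RIP-relative displacement is relative to the end of
the instruction, while immediate bytes are emitted after the
displacement field. The addend therefore has to be reduced by the total
size of all trailing immediates, not just the first one found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reduce a PC-relative displacement addend by the
   combined byte size of the immediates following the displacement.  */
static int64_t
pcrel_disp_addend (int64_t addend, const unsigned int *imm_sizes,
		   size_t n_imms)
{
  unsigned int sz = 0;
  size_t k;

  /* Accumulate rather than overwrite, so that more than one immediate
     is accounted for.  */
  for (k = 0; k < n_imms; ++k)
    sz += imm_sizes[k];

  /* As in the hunk below, this path is only meant to be taken when at
     least one immediate is present.  */
  assert (sz != 0);
  return addend - sz;
}
```

With a 1-byte and a 2-byte immediate, an addend of 16 becomes 13. With
a single 4-byte immediate the result is the same as before this change.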

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -10065,13 +10065,13 @@ output_disp (fragS *insn_start_frag, off
 		    if (operand_type_check (i.types[n1], imm))
 		      {
 			/* Only one immediate is allowed for PC
-			   relative address.  */
-			gas_assert (sz == 0);
-			sz = imm_size (n1);
-			i.op[n].disps->X_add_number -= sz;
+			   relative address, except with .insn.  */
+			gas_assert (sz == 0 || dot_insn ());
+			sz += imm_size (n1);
 		      }
-		  /* We should find the immediate.  */
+		  /* We should find at least one immediate.  */
 		  gas_assert (sz != 0);
+		  i.op[n].disps->X_add_number -= sz;
 		}
 
 	      p = frag_more (size);



* [PATCH v2 11/14] x86: handle immediate operands for .insn
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (9 preceding siblings ...)
  2023-03-10 10:24 ` [PATCH v2 10/14] x86: allow for multiple immediates in output_disp() Jan Beulich
@ 2023-03-10 10:25 ` Jan Beulich
  2023-03-10 10:26 ` [PATCH v2 12/14] x86: document .insn Jan Beulich
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:25 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

Since we have no insn suffix and it's also not realistic to infer
immediate size from the size of other (typically register) operands
(like optimize_imm() does), and since we also don't have a template
telling us permitted size(s), a new syntax construct is introduced to
allow size (and signedness) specification. In the absence of such, the
size is inferred from significant bits (which obviously may yield
inconsistent results at least for effectively negative values, depending
on whether BFD64 is enabled), and only if supplied expressions can be
evaluated at parsing time. Being explicit is generally recommended to
users.

Size specification is permitted at bit granularity, but of course the
eventually emitted immediate values will be padded up to 8-, 16-, 32-,
or 64-bit fields.
---
RFC: As to relocations emitted, there simply aren't enough relocation
     types to express the signed/unsigned distinction the new syntax
     allows. Similarly only significant bit counts 8, 16, 32, and 64 can
     be honored. The question is whether non-representable cases should
     be warned about or even rejected.
---
v2: Adjust 64-bit test expectations to also cover COFF. Xfail test for
    Darwin.
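
The padding of bit-granular sizes mentioned above can be sketched as
follows. This is a standalone illustration, not code from this patch;
imm_field_bits() is a made-up name. It assumes a size between 1 and 64
bits was specified explicitly and mirrors the thresholds used when
establishing immediate sizes in s_insn() below.

```c
#include <assert.h>

/* Hypothetical helper: map an explicitly specified immediate width in
   bits to the width of the field that ends up being emitted.  */
static unsigned int
imm_field_bits (unsigned int bits)
{
  assert (bits > 0 && bits <= 64);

  if (bits > 32)
    return 64;
  if (bits > 16)
    return 32;
  if (bits > 8)
    return 16;
  return 8;
}
```

A 9-bit immediate thus occupies a 16-bit field and a 17-bit one a
32-bit field. A width of 0 means no size was specified, in which case
the size is inferred as described in the commit message.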

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -288,6 +288,7 @@ struct _i386_insn
     unsigned int flags[MAX_OPERANDS];
 #define Operand_PCrel 1
 #define Operand_Mem   2
+#define Operand_Signed 4 /* .insn only */
 
     /* Relocation type for operand */
     enum bfd_reloc_code_real reloc[MAX_OPERANDS];
@@ -310,6 +311,9 @@ struct _i386_insn
     /* .insn allows for reserved opcode spaces.  */
     unsigned char insn_opcode_space;
 
+    /* .insn also allows (requires) specifying immediate size.  */
+    unsigned char imm_bits[MAX_OPERANDS];
+
     /* Register is in low 3 bits of opcode.  */
     bool short_form;
 
@@ -5938,6 +5942,10 @@ swap_2_operands (unsigned int xchg1, uns
   i.reloc[xchg2] = i.reloc[xchg1];
   i.reloc[xchg1] = temp_reloc;
 
+  temp_flags = i.imm_bits[xchg2];
+  i.imm_bits[xchg2] = i.imm_bits[xchg1];
+  i.imm_bits[xchg1] = temp_flags;
+
   if (i.mask.reg)
     {
       if (i.mask.operand == xchg1)
@@ -10203,7 +10211,8 @@ output_imm (fragS *insn_start_frag, offs
 
 	      if (i.types[n].bitfield.imm32s
 		  && (i.suffix == QWORD_MNEM_SUFFIX
-		      || (!i.suffix && i.tm.opcode_modifier.no_lsuf)))
+		      || (!i.suffix && i.tm.opcode_modifier.no_lsuf)
+		      || dot_insn ()))
 		sign = 1;
 	      else
 		sign = 0;
@@ -11231,6 +11240,57 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
       if (i.disp_operands && !optimize_disp (&i.tm))
 	goto done;
 
+      /* Establish size for immediate operands.  */
+      for (j = 0; j < i.imm_operands; ++j)
+	{
+	  expressionS *expP = i.op[j].imms;
+
+	  gas_assert (operand_type_check (i.types[j], imm));
+	  operand_type_set (&i.types[j], 0);
+
+	  if (i.imm_bits[j] > 32)
+	    i.types[j].bitfield.imm64 = 1;
+	  else if (i.imm_bits[j] > 16)
+	    {
+	      if (flag_code == CODE_64BIT && (i.flags[j] & Operand_Signed))
+		i.types[j].bitfield.imm32s = 1;
+	      else
+		i.types[j].bitfield.imm32 = 1;
+	    }
+	  else if (i.imm_bits[j] > 8)
+	    i.types[j].bitfield.imm16 = 1;
+	  else if (i.imm_bits[j] > 0)
+	    {
+	      if (i.flags[j] & Operand_Signed)
+		i.types[j].bitfield.imm8s = 1;
+	      else
+		i.types[j].bitfield.imm8 = 1;
+	    }
+	  else if (expP->X_op == O_constant)
+	    {
+	      i.types[j] = smallest_imm_type (expP->X_add_number);
+	      i.types[j].bitfield.imm1 = 0;
+	      /* Oddly enough imm_size() checks imm64 first, so the bit needs
+		 zapping since smallest_imm_type() sets it unconditionally.  */
+	      if (flag_code != CODE_64BIT)
+		{
+		  i.types[j].bitfield.imm64 = 0;
+		  i.types[j].bitfield.imm32s = 0;
+		  i.types[j].bitfield.imm32 = 1;
+		}
+	      else if (i.types[j].bitfield.imm32 || i.types[j].bitfield.imm32s)
+		i.types[j].bitfield.imm64 = 0;
+	    }
+	  else
+	    /* Non-constant expressions are sized heuristically.  */
+	    switch (flag_code)
+	      {
+	      case CODE_64BIT: i.types[j].bitfield.imm32s = 1; break;
+	      case CODE_32BIT: i.types[j].bitfield.imm32 = 1; break;
+	      case CODE_16BIT: i.types[j].bitfield.imm16 = 1; break;
+	      }
+	}
+
       for (j = 0; j < i.operands; ++j)
 	i.tm.operand_types[j] = i.types[j];
 
@@ -11411,10 +11471,11 @@ check_VecOperations (char *op_string)
 	  else if (dot_insn () && *op_string == ':')
 	    {
 	    dot_insn_modifier:
-	      if (op_string[1] == 'd')
+	      switch (op_string[1])
 		{
 		  unsigned long n;
 
+		case 'd':
 		  if (i.memshift < 32)
 		    goto duplicated_vec_op;
 
@@ -11424,6 +11485,27 @@ check_VecOperations (char *op_string)
 		      ++i.memshift;
 		  if (i.memshift < 32 && n == 1)
 		    op_string = end_op;
+		  break;
+
+		case 's': case 'u':
+		  /* This isn't really a "vector" operation, but a sign/size
+		     specifier for immediate operands of .insn.  Note that AT&T
+		     syntax handles the same in i386_immediate().  */
+		  if (!intel_syntax)
+		    break;
+
+		  if (i.imm_bits[this_operand])
+		    goto duplicated_vec_op;
+
+		  n = strtoul (op_string + 2, &end_op, 0);
+		  if (n && n <= (flag_code == CODE_64BIT ? 64 : 32))
+		    {
+		      i.imm_bits[this_operand] = n;
+		      if (op_string[1] == 's')
+			i.flags[this_operand] |= Operand_Signed;
+		      op_string = end_op;
+		    }
+		  break;
 		}
 	    }
 	  /* Check masking operation.  */
@@ -11562,6 +11644,22 @@ i386_immediate (char *imm_start)
 
   exp_seg = expression (exp);
 
+  /* For .insn immediates there may be a size specifier.  */
+  if (dot_insn () && *input_line_pointer == '{' && input_line_pointer[1] == ':'
+      && (input_line_pointer[2] == 's' || input_line_pointer[2] == 'u'))
+    {
+      char *e;
+      unsigned long n = strtoul (input_line_pointer + 3, &e, 0);
+
+      if (*e == '}' && n && n <= (flag_code == CODE_64BIT ? 64 : 32))
+	{
+	  i.imm_bits[this_operand] = n;
+	  if (input_line_pointer[2] == 's')
+	    i.flags[this_operand] |= Operand_Signed;
+	  input_line_pointer = e + 1;
+	}
+    }
+
   SKIP_WHITESPACE ();
   if (*input_line_pointer)
     as_bad (_("junk `%s' after expression"), input_line_pointer);
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -965,7 +965,8 @@ i386_intel_operand (char *operand_string
       i386_operand_type temp;
 
       /* Register operand.  */
-      if (intel_state.base || intel_state.index || intel_state.seg)
+      if (intel_state.base || intel_state.index || intel_state.seg
+          || i.imm_bits[this_operand])
 	{
 	  as_bad (_("invalid operand"));
 	  return 0;
@@ -998,6 +999,12 @@ i386_intel_operand (char *operand_string
 	   || intel_state.is_mem)
     {
       /* Memory operand.  */
+      if (i.imm_bits[this_operand])
+	{
+	  as_bad (_("invalid operand"));
+	  return 0;
+	}
+
       if (i.mem_operands)
 	{
 	  /* Handle
--- a/gas/testsuite/gas/i386/insn-32.d
+++ b/gas/testsuite/gas/i386/insn-32.d
@@ -1,5 +1,7 @@
+#as: --divide
 #objdump: -dw
 #name: .insn (32-bit code)
+#xfail: *-*-darwin*
 
 .*: +file format .*
 
@@ -10,6 +12,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f3 90[ 	]+pause
 [ 	]*[a-f0-9]+:	f3 90[ 	]+pause
 [ 	]*[a-f0-9]+:	d9 ee[ 	]+fldz
+[ 	]*[a-f0-9]+:	d9 ee[ 	]+fldz
 [ 	]*[a-f0-9]+:	f3 0f 01 e8[ 	]+setssbsy
 [ 	]*[a-f0-9]+:	8b c1[ 	]+mov    %ecx,%eax
 [ 	]*[a-f0-9]+:	66 8b c8[ 	]+mov    %ax,%cx
@@ -17,7 +20,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	8b 0c 05 44 44 00 00[ 	]+mov    0x4444\(,%eax,1\),%ecx
 [ 	]*[a-f0-9]+:	66 0f b6 cc[ 	]+movzbw %ah,%cx
 [ 	]*[a-f0-9]+:	0f b7 c8[ 	]+movzwl %ax,%ecx
+[ 	]*[a-f0-9]+:	64 f0 80 30 01[ 	]+lock xorb \$(0x)?1,%fs:\(%eax\)
 [ 	]*[a-f0-9]+:	0f ca[ 	]+bswap  %edx
+[ 	]*[a-f0-9]+:	c7 f8 02 00 00 00[ 	]+xbegin [0-9a-f]+ <insn\+.*>
+[ 	]*[a-f0-9]+:	e2 f8[ 	]+loop   [0-9a-f]+ <insn\+.*>
 [ 	]*[a-f0-9]+:	c5 fc 77[ 	]+vzeroall
 [ 	]*[a-f0-9]+:	c4 e1 7c 77[ 	]+vzeroall
 [ 	]*[a-f0-9]+:	c5 f1 58 d0[ 	]+vaddpd %xmm0,%xmm1,%xmm2
@@ -27,6 +33,9 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	c4 e3 69 68 19 00[ 	]+vfmaddps %xmm0,\(%ecx\),%xmm2,%xmm3
 [ 	]*[a-f0-9]+:	c4 e3 e9 68 19 00[ 	]+vfmaddps \(%ecx\),%xmm0,%xmm2,%xmm3
 [ 	]*[a-f0-9]+:	c4 e3 e9 68 18 10[ 	]+vfmaddps \(%eax\),%xmm1,%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	c4 e3 69 48 19 00[ 	]+vpermil2ps \$(0x)?0,%xmm0,\(%ecx\),%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	c4 e3 e9 48 19 02[ 	]+vpermil2ps \$(0x)?2,\(%ecx\),%xmm0,%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	c4 e3 e9 48 18 13[ 	]+vpermil2ps \$(0x)?3,\(%eax\),%xmm1,%xmm2,%xmm3
 [ 	]*[a-f0-9]+:	c5 f8 92 c8[ 	]+kmovw  %eax,%k1
 [ 	]*[a-f0-9]+:	c5 f8 93 c1[ 	]+kmovw  %k1,%eax
 [ 	]*[a-f0-9]+:	62 f1 74 18 58 d0[ 	]+vaddps \{rn-sae\},%zmm0,%zmm1,%zmm2
@@ -41,4 +50,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	62 f5 fd 58 5a 40 01[ 	]+vcvtpd2ph (0x)?8\(%eax\)\{1to8\},%xmm0
 [ 	]*[a-f0-9]+:	62 f5 7c 48 5a 40 01[ 	]+vcvtph2pd 0x10\(%eax\),%zmm0
 [ 	]*[a-f0-9]+:	62 f5 7c 58 5a 40 01[ 	]+vcvtph2pd (0x)?2\(%eax\)\{1to8\},%zmm0
+[ 	]*[a-f0-9]+:	62 f3 7d 28 66 40 01 ff[ 	]+vfpclasspsy \$0xff,0x20\(%eax\),%k0
+[ 	]*[a-f0-9]+:	62 f3 7d 28 66 40 01 ff[ 	]+vfpclasspsy \$0xff,0x20\(%eax\),%k0
+[ 	]*[a-f0-9]+:	62 f3 7d 38 66 40 01 ff[ 	]+vfpclassps \$0xff,(0x)?4\(%eax\)\{1to8\},%k0
+[ 	]*[a-f0-9]+:	62 f3 7d 38 66 40 01 ff[ 	]+vfpclassps \$0xff,(0x)?4\(%eax\)\{1to8\},%k0
 #pass
--- a/gas/testsuite/gas/i386/insn-32.s
+++ b/gas/testsuite/gas/i386/insn-32.s
@@ -9,6 +9,7 @@ insn:
 
 	# fldz
 	.insn 0xd9ee
+	.insn 0xd9, $0xee
 
 	# setssbsy
 	.insn 0xf30f01e8
@@ -23,9 +24,20 @@ insn:
 	.insn 0x0fb6, %ah, %cx
 	.insn 0x0fb7, %eax, %ecx
 
+	# xorb
+	.insn lock 0x80/6, $1, %fs:(%eax)
+
 	# bswap
 	.insn 0x0fc8+r, %edx
 
+1:
+	# xbegin 3f
+	.insn 0xc7f8, $3f-2f{:s32}
+2:
+	# loop 1b
+	.insn 0xe2, $1b-3f{:s8}
+3:
+
 	# vzeroall
 	.insn VEX.256.0F.WIG 0x77
 	.insn {vex3} VEX.L1 0x0f77
@@ -43,6 +55,11 @@ insn:
 	.insn VEX.66.0F3A.W1 0x68, %xmm0, (%ecx), %xmm2, %xmm3
 	.insn VEX.66.0F3A.W1 0x68, (%eax), %xmm1, %xmm2, %xmm3
 
+	# vpermil2ps
+	.insn VEX.66.0F3A.W0 0x48, $0, %xmm0, (%ecx), %xmm2, %xmm3
+	.insn VEX.66.0F3A.W1 0x48, $2, %xmm0, (%ecx), %xmm2, %xmm3
+	.insn VEX.66.0F3A.W1 0x48, $3, (%eax), %xmm1, %xmm2, %xmm3
+
 	# kmovw
 	.insn VEX.L0.0F.W0 0x92, %eax, %k1
 	.insn VEX.L0.0F.W0 0x93, %k1, %eax
@@ -68,3 +85,10 @@ insn:
 	# vcvtph2pd
 	.insn EVEX.M5.W0 0x5a, 16(%eax){:d16}, %zmm0
 	.insn EVEX.M5.W0 0x5a, 2(%eax){1to8:d2}, %zmm0
+
+	.intel_syntax noprefix
+	# vfpclassps
+	.insn EVEX.256.66.0f3a.W0 0x66, k0, [eax+32], 0xff
+	.insn EVEX.66.0f3a.W0 0x66, k0, ymmword ptr [eax+32], 0xff
+	.insn EVEX.256.66.0f3a.W0 0x66, k0, [eax+4]{1to8}, 0xff
+	.insn EVEX.66.0f3a.W0 0x66, k0, dword ptr [eax+4]{1to8}, 0xff
--- a/gas/testsuite/gas/i386/insn-64.d
+++ b/gas/testsuite/gas/i386/insn-64.d
@@ -1,5 +1,7 @@
-#objdump: -dw
+#as: --divide
+#objdump: -dwr
 #name: .insn (64-bit code)
+#xfail: *-*-darwin*
 
 .*: +file format .*
 
@@ -18,8 +20,14 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	66 0f be cc[ 	]+movsbw %ah,%cx
 [ 	]*[a-f0-9]+:	0f bf c8[ 	]+movswl %ax,%ecx
 [ 	]*[a-f0-9]+:	48 63 c8[ 	]+movslq %eax,%rcx
+[ 	]*[a-f0-9]+:	f0 80 35 ((00|ff) ){4}01[ 	]+lock xorb \$(0x)?1,[-x01]+\(%rip\) *# .*: (R_X86_64_PC32	lock-(0x)?5|IMAGE_REL_AMD64_REL32	lock)
 [ 	]*[a-f0-9]+:	48 0f ca[ 	]+bswap  %rdx
 [ 	]*[a-f0-9]+:	41 0f c8[ 	]+bswap  %r8d
+[ 	]*[a-f0-9]+:	c7 f8 02 00 00 00[ 	]+xbegin [0-9a-f]+ <insn\+.*>
+[ 	]*[a-f0-9]+:	e2 f8[ 	]+loop   [0-9a-f]+ <insn\+.*>
+[ 	]*[a-f0-9]+:	05 00 00 00 00[ 	]+add    \$(0x)?0,%eax	.*: (R_X86_64_32|IMAGE_REL_AMD64_ADDR32)	var
+[ 	]*[a-f0-9]+:	48 05 00 00 00 00[ 	]+add    \$(0x)?0,%rax	.*: R_X86_64_32S	var
+[ 	]*[a-f0-9]+:	81 3d (00|fc) ((00|ff) ){3}13 12 23 21[ 	]+cmpl   \$0x21231213,[-x04]+\(%rip\) *# .*: (R_X86_64_PC32	var-(0x)?8|IMAGE_REL_AMD64_REL32	var)
 [ 	]*[a-f0-9]+:	c5 fc 77[ 	]+vzeroall
 [ 	]*[a-f0-9]+:	c4 e1 7c 77[ 	]+vzeroall
 [ 	]*[a-f0-9]+:	c4 c1 71 58 d0[ 	]+vaddpd %xmm8,%xmm1,%xmm2
@@ -29,6 +37,9 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	c4 e3 69 68 19 80[ 	]+vfmaddps %xmm8,\(%rcx\),%xmm2,%xmm3
 [ 	]*[a-f0-9]+:	67 c4 e3 e9 68 19 00[ 	]+vfmaddps \(%ecx\),%xmm0,%xmm2,%xmm3
 [ 	]*[a-f0-9]+:	c4 c3 e9 68 18 10[ 	]+vfmaddps \(%r8\),%xmm1,%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	c4 e3 69 48 19 80[ 	]+vpermil2ps \$(0x)0,%xmm8,\(%rcx\),%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	67 c4 e3 e9 48 19 02[ 	]+vpermil2ps \$(0x)2,\(%ecx\),%xmm0,%xmm2,%xmm3
+[ 	]*[a-f0-9]+:	c4 c3 e9 48 18 13[ 	]+vpermil2ps \$(0x)3,\(%r8\),%xmm1,%xmm2,%xmm3
 [ 	]*[a-f0-9]+:	c4 c1 78 92 c8[ 	]+kmovw  %r8d,%k1
 [ 	]*[a-f0-9]+:	c5 78 93 c1[ 	]+kmovw  %k1,%r8d
 [ 	]*[a-f0-9]+:	62 b1 74 38 58 d0[ 	]+vaddps \{rd-sae\},%zmm16,%zmm1,%zmm2
--- a/gas/testsuite/gas/i386/insn-64.s
+++ b/gas/testsuite/gas/i386/insn-64.s
@@ -24,10 +24,30 @@ insn:
 	.insn 0x0fbf, %eax, %ecx
 	.insn 0x63, %rax, %rcx
 
+	# xorb
+	.insn lock 0x80/6, $1, lock(%rip)
+
 	# bswap
 	.insn 0x0fc8+r, %rdx
 	.insn 0x0fc8+r, %r8d
 
+1:
+	# xbegin 3f
+	.insn 0xc7f8, $3f-2f{:s32}
+2:
+	# loop 1b
+	.insn 0xe2, $1b-3f{:s8}
+3:
+
+	# add $var, %eax
+	.insn 0x05, $var{:u32}
+
+	# add $var, %rax
+	.insn rex.w 0x05, $var{:s32}
+
+	# cmpl (32-bit immediate split into two 16-bit halves)
+	.insn 0x81/7, $0x1213, $0x2123, var(%rip)
+
 	# vzeroall
 	.insn VEX.256.0F.WIG 0x77
 	.insn {vex3} VEX.L1 0x0f77
@@ -45,6 +65,11 @@ insn:
 	.insn VEX.66.0F3A.W1 0x68, %xmm0, (%ecx), %xmm2, %xmm3
 	.insn VEX.66.0F3A.W1 0x68, (%r8), %xmm1, %xmm2, %xmm3
 
+	# vpermil2ps
+	.insn VEX.66.0F3A.W0 0x48, $0, %xmm8, (%rcx), %xmm2, %xmm3
+	.insn VEX.66.0F3A.W1 0x48, $2, %xmm0, (%ecx), %xmm2, %xmm3
+	.insn VEX.66.0F3A.W1 0x48, $3, (%r8), %xmm1, %xmm2, %xmm3
+
 	# kmovw
 	.insn VEX.L0.0F.W0 0x92, %r8d, %k1
 	.insn VEX.L0.0F.W0 0x93, %k1, %r8d



* [PATCH v2 12/14] x86: document .insn
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (10 preceding siblings ...)
  2023-03-10 10:25 ` [PATCH v2 11/14] x86: handle immediate operands for .insn Jan Beulich
@ 2023-03-10 10:26 ` Jan Beulich
  2023-03-10 10:26 ` [PATCH v2 13/14] x86: convert testcases to use .insn Jan Beulich
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:26 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

... and mention its introduction in NEWS.

--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* A new .insn directive is recognized by x86 gas.
+
 Changes in 2.40:
 
 * Add support for Intel RAO-INT instructions.
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -613,6 +613,137 @@ This directive behaves in the same way a
 taking a series of comma separated expressions and storing them as
 two-byte wide values into the current section.
 
+@cindex @code{insn} directive
+@item .insn [@var{prefix}[,...]] [@var{encoding}] @var{major-opcode}[@code{+r}|@code{/@var{extension}}] [,@var{operand}[,...]]
+This directive allows composing instructions which @code{@value{AS}}
+may not know about yet, or which it has no way of expressing (which
+can be the case for certain alternative encodings).  It assumes certain
+basic structure in how operands are encoded, and it also only
+recognizes - with a few extensions as per below - operands otherwise
+valid for instructions.  Therefore there is no guarantee that
+everything can be expressed (e.g. the original Intel Xeon Phi's MVEX
+encodings cannot be expressed).
+
+@itemize @bullet
+@item
+@var{prefix} expresses one or more opcode prefixes in the usual way.
+Legacy encoding prefixes altering meaning (0x66, 0xF2, 0xF3) may be
+specified as high byte of @var{major-opcode} (perhaps already including an
+encoding space prefix).  Note that there can only be one such prefix.
+Segment overrides are better specified in the respective memory
+operand, as long as there is one.
+
+@item
+@var{encoding} is used to specify VEX, XOP, or EVEX encodings. The
+syntax tries to resemble that used in documentation:
+@itemize @bullet
+@item @code{VEX}[@code{.@var{len}}][@code{.@var{prefix}}][@code{.@var{space}}][@code{.@var{w}}]
+@item @code{EVEX}[@code{.@var{len}}][@code{.@var{prefix}}][@code{.@var{space}}][@code{.@var{w}}]
+@item @code{XOP}@var{space}[@code{.@var{len}}][@code{.@var{prefix}}][@code{.@var{w}}]
+@end itemize
+
+Here
+@itemize @bullet
+@item @var{len} can be @code{LIG}, @code{128}, @code{256}, or (EVEX
+only) @code{512} as well as @code{L0} / @code{L1} for VEX / XOP and
+@code{L0}...@code{L3} for EVEX
+@item @var{prefix} can be @code{NP}, @code{66}, @code{F3}, or @code{F2}
+@item @var{space} can be
+@itemize @bullet
+@item @code{0f}, @code{0f38}, @code{0f3a}, or @code{M0}...@code{M31}
+for VEX
+@item @code{08}...@code{1f} for XOP
+@item @code{0f}, @code{0f38}, @code{0f3a}, or @code{M0}...@code{M15}
+for EVEX
+@end itemize
+@item @var{w} can be @code{WIG}, @code{W0}, or @code{W1}
+@end itemize
+
+Defaults:
+@itemize @bullet
+@item Omitted @var{len} means "infer from operand size" if there is at
+least one sized vector operand, or @code{LIG} otherwise. (Obviously
+@var{len} has to be omitted when there's EVEX rounding control
+specified later in the operands.)
+@item Omitted @var{prefix} means @code{NP}.
+@item Omitted @var{space} (VEX/EVEX only) implies encoding space is
+taken from @var{major-opcode}.
+@item Omitted @var{w} means "infer from GPR operand size" in 64-bit
+code if there is at least one GPR(-like) operand, or @code{WIG}
+otherwise.
+@end itemize
+
+@item
+@var{major-opcode} is an absolute expression specifying the instruction
+opcode.  Legacy encoding prefixes altering encoding space (0x0f,
+0x0f38, 0x0f3a) have to be specified as high byte(s) here.
+"Degenerate" ModR/M bytes, as present in e.g. certain FPU opcodes or
+sub-spaces like that of major opcode 0x0f01, generally want encoding as
+immediate operand (such opcodes wouldn't normally have non-immediate
+operands); in some cases it may be possible to also encode these as low
+byte of the major opcode, but there are potential ambiguities.  Also
+note that after stripping encoding prefixes, the residual has to fit in
+two bytes (16 bits).  @code{+r} can be suffixed to the major opcode
+expression to specify register-only encoding forms not using a ModR/M
+byte.  @code{/@var{extension}} can alternatively be suffixed to the
+major opcode expression to specify an extension opcode, encoded in bits
+3-5 of the ModR/M byte.
+
+@item
+@var{operand} is an instruction operand expressed the usual way.
+Register operands are primarily used to express register numbers as
+encoded in ModR/M byte and REX/VEX/XOP/EVEX prefixes.  In certain
+cases the register type (really: size) is also used to derive other
+encoding attributes, if these aren't specified explicitly.  Note that
+there is no consistency checking among operands, so entirely bogus
+mixes of operands are possible.  Note further that only operands
+actually encoded in the instruction should be specified.  Operands like
+@samp{%cl} in shift/rotate instructions have to be omitted, or else
+they'll be encoded as an ordinary (register) operand.  Operand order
+may also not match that of the actual instruction (see below).
+@end itemize
+
+Encoding of operands: While for a memory operand (of which there can be
+only one) it is clear how to encode it in the resulting ModR/M byte,
+register operands are encoded strictly in this order (operand counts do
+not include immediate ones in the enumeration below, and if there was an
+extension opcode specified it counts as a register operand; VEX.vvvv
+is meant to cover XOP and EVEX as well):
+
+@itemize @bullet
+@item VEX.vvvv for 1-register-operand VEX/XOP/EVEX insns,
+@item ModR/M.rm, ModR/M.reg for 2-operand insns,
+@item ModR/M.rm, VEX.vvvv, ModR/M.reg for 3-operand insns, and
+@item Imm@{4,5@}, ModR/M.rm, VEX.vvvv, ModR/M.reg for 4-operand insns,
+@end itemize
+
+obviously with the ModR/M.rm slot skipped when there is a memory
+operand, and obviously with the ModR/M.reg slot skipped when there is
+an extension opcode.  For Intel syntax of course the opposite order
+applies.  With @code{+r} (and hence no ModR/M) there can only be a
+single register operand for legacy encodings.  VEX and alike can have
+two register operands, where the second (first in Intel syntax) would
+go into VEX.vvvv.
+
+Immediate operands (including immediate-like displacements, i.e. when
+not part of ModR/M addressing) are emitted in the order specified,
+regardless of AT&T or Intel syntax.  Since it may not be possible to
+infer the size of such immediates, they can be suffixed by
+@code{@{:s@var{n}@}} or @code{@{:u@var{n}@}}, representing signed /
+unsigned immediates of the given number of bits respectively.  When
+emitting such operands, the number of bits will be rounded up to the
+smallest suitable of 8, 16, 32, or 64.  Immediates wider than 32 bits
+are permitted in 64-bit code only.
+
+For EVEX encoding memory operands with a displacement need to know
+Disp8 scaling size in order to use an 8-bit displacement.  For many
+instructions this can be inferred from the types of other operands
+specified.  In Intel syntax @samp{DWORD PTR} and alike can be used to
+specify the respective size.  In AT&T syntax the memory operands can
+be suffixed by @code{@{:d@var{n}@}} to specify the size (in bytes).
+This can be combined with an embedded broadcast specifier:
+@samp{8(%eax)@{1to8:d8@}}.
+
 @c FIXME: Document other x86 specific directives ?  Eg: .code16gcc,
 
 @end table



* [PATCH v2 13/14] x86: convert testcases to use .insn
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (11 preceding siblings ...)
  2023-03-10 10:26 ` [PATCH v2 12/14] x86: document .insn Jan Beulich
@ 2023-03-10 10:26 ` Jan Beulich
  2023-04-20  8:56   ` Clément Chigot
  2023-03-10 10:27 ` [PATCH RFC v2 14/14] x86: .insn example - VEX-encoded instructions of original Xeon Phi Jan Beulich
  2023-03-24  9:51 ` [PATCH v2 00/14] x86: new .insn directive Jan Beulich
  14 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:26 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

This can't be done for all insns currently encoded with .byte. For one,
outside of 64-bit mode unused (typically ignored) register encoding bits
in VEX/XOP/EVEX prefixes can't be set to their non-default values, since
the necessary registers cannot be specified (and some of these bits
can't even be used outside of 64-bit mode). And then there are odd tests
like the first one in bad-bcast.s: Its purpose is to illegally set EVEX.b
together with EVEX.W (which could be expressed; note though EVEX.W set
is invalid on its own), but then it also clears EVEX.B and EVEX.R' plus
it sets EVEX.vvvv to other than 0xf (rendering the test ambiguous,
because that's another #UD reason).

In {,x86-64-}disassem.s many bogus encodings exist - some with ModR/M
byte but insufficient displacement bytes, some using SIB encoding with
the SIB byte actually being the supposed immediate. Some of these could
be expressed by .insn, but I don't want to introduce bogus examples.
These will all need adjustment anyway once the disassembler is improved
in the way it deals with unrecognized encodings.

Generally generated code is meant to remain the same. {,x86-64-}nops.d
are exceptions because insn prefixes are emitted in a different order.
opcode{,-intel,-suffix}.d are also adjusted (along with a corresponding
correction to opcode.s) to cover an apparent typo in the original tests
(xor when or was meant).

Where necessary --divide is added as gas option, to allow for the use
of the extension opcode functionality.

Comments are being adjusted where obviously wrong/misleading.
---
v2: Add --divide as necessary.
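
For illustration only (not part of the patch): a minimal C sketch of how
an extension opcode written as /N ends up in bits 3-5 of the ModR/M
byte, which is what makes conversions like ".byte 0xFF, 0xEF" to
".insn 0xFF/5, %edi" byte-for-byte equivalent. The helper name
modrm_byte is made up for this note and is not a gas function. (As far
as I'm aware, the '/' is also why --divide is needed on targets where
that character otherwise starts a comment.)

```c
/* Hypothetical helper, not taken from gas: compose a ModR/M byte from
   its three fields.  For .insn OPC/N, REG, the extension opcode N goes
   into the reg field and the register number into the rm field, with
   mod = 3 selecting the register-direct form.  */
static unsigned char
modrm_byte (unsigned int mod, unsigned int reg, unsigned int rm)
{
  return (unsigned char) (((mod & 3) << 6) | ((reg & 7) << 3) | (rm & 7));
}
```

With %edi being register 7 and %eax / %st(0) being register 0,
modrm_byte (3, 5, 7) yields 0xEF, modrm_byte (3, 3, 0) yields 0xD8, and
modrm_byte (3, 6, 0) yields 0xF0, matching the .byte sequences replaced
in disassem.s and fpu-bad.s.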

--- a/gas/testsuite/gas/i386/amd.s
+++ b/gas/testsuite/gas/i386/amd.s
@@ -32,4 +32,4 @@ foo:
 
 # This is a 3DNow! instruction, with a prefix, that isn't quite right
 # Everything's good bar the opcode suffix
-.byte 0x2e, 0x0f, 0x0f, 0x54, 0xc3, 0x07, 0xc3
+ .insn 0x0f0f, $0xc3, %cs:7(%ebx,%eax,8), %mm2
--- a/gas/testsuite/gas/i386/avx512f-nondef.s
+++ b/gas/testsuite/gas/i386/avx512f-nondef.s
@@ -1,27 +1,28 @@
 # Check if objdump works correctly when some bits in instruction
 # has non-default value
 
-# vrndscalesd	{sae}, $123, %xmm4, %xmm5, %xmm6{%k7}	 # with null RC
-.byte 0x62, 0xf3, 0xd5, 0x1f, 0x0b, 0xf4, 0x7b
+	 vrndscalesd	$123, {sae}, %xmm4, %xmm5, %xmm6{%k7} # with null RC
 # vrndscalesd	{sae}, $123, %xmm4, %xmm5, %xmm6{%k7}	 # with not-null RC
-.byte 0x62, 0xf3, 0xd5, 0x5f, 0x0b, 0xf4, 0x7b
-# vpminud	%zmm4, %zmm5, %zmm6{%k7}	# with 11 EVEX.{B,R'}
-.byte 0x62, 0xf2, 0x55, 0x4f, 0x3b, 0xf4
+	.insn EVEX.66.0f3a.W1 0x0b, $123, {ru-sae}, %xmm4, %xmm5, %xmm6{%k7}
+
+	 vpminud	%zmm4, %zmm5, %zmm6{%k7}	# with 11 EVEX.{B,R'}
 # vpminud	%zmm4, %zmm5, %zmm6{%k7}	# with not-11 EVEX.{B,R'}
 .byte 0x62, 0xc2, 0x55, 0x4f, 0x3b, 0xf4
 # vpminud	%zmm4, %zmm5, %zmm6{%k7}	# with set EVEX.b bit
-.byte 0x62, 0xf2, 0x55, 0x1f, 0x3b, 0xf4
-# vpmovdb	%zmm6, 2032(%rdx)		# with unset EVEX.b bit
-.byte 0x62, 0xf2, 0x7e, 0x48, 0x31, 0x72, 0x7f
-# vpmovdb	%zmm6, 2032(%rdx)		# with set EVEX.b bit - we should get (bad) operand
-.byte 0x62, 0xf2, 0x7e, 0x58, 0x31, 0x72, 0x7f
+	.insn EVEX.66.0F38.W0 0x3b, {rn-sae}, %zmm4, %zmm5, %zmm6{%k7}
+
+	 vpmovdb	%zmm6, 2032(%edx)	# with unset EVEX.b bit
+# vpmovdb	%zmm6, 2032(%edx)		# with set EVEX.b bit - we should get (bad) operand
+	.insn EVEX.f3.0f38.W0 0x31, %zmm6, 2032(%edx){1to4}
+
 # vaddps xmm0, xmm0, xmm3 # with EVEX.z set
 .byte 0x62, 0xf1, 0x7c, 0x88, 0x58, 0xc3
+
 # vgatherdps (%ecx), %zmm0{%k7}			# without SIB / index register
-.byte 0x62, 0xf2, 0x7d, 0x4f, 0x92, 0x01
+	.insn EVEX.66.0F38.W0 0x92, (%ecx), %zmm0{%k7}
 # vgatherdps (%bx,%xmm?), %zmm0{%k7}		# with 16-bit addressing
-.byte 0x67, 0x62, 0xf2, 0x7d, 0x4f, 0x92, 0x01
+	.insn EVEX.66.0F38.W0 0x92, (%bx,%di), %zmm0{%k7}
 # vgatherdps (%eax,%zmm1), %zmm0{%k7}{z}	# with set EVEX.z
-.byte 0x62, 0xf2, 0x7d, 0xcf, 0x92, 0x04, 0x08
+	.insn EVEX.66.0F38.W0 0x92, (%eax,%zmm1), %zmm0{%k7}{z}
 # vgatherdps (%eax,%zmm1), %zmm0		# without actual mask register
-.byte 0x62, 0xf2, 0x7d, 0x48, 0x92, 0x04, 0x08
+	.insn EVEX.66.0F38.W0 0x92, (%eax,%zmm1), %zmm0
--- a/gas/testsuite/gas/i386/cdr.s
+++ b/gas/testsuite/gas/i386/cdr.s
@@ -1,14 +1,7 @@
 	.text
 start:
-	.byte 0x0f
-	.byte 0x22
-	.byte 0x1f
-	.byte 0x0f
-	.byte 0x20
-	.byte 0x1f
-	.byte 0x0f
-	.byte 0x21
-	.byte 0x1f
-	.byte 0x0f
-	.byte 0x23
-	.byte 0x1f
+	.code32
+	.insn 0x0f22, (%edi), %cr3
+	.insn 0x0f20, %cr3, (%edi)
+	.insn 0x0f21, %db3, (%edi)
+	.insn 0x0f23, (%edi), %db3
--- a/gas/testsuite/gas/i386/disassem.d
+++ b/gas/testsuite/gas/i386/disassem.d
@@ -1,3 +1,4 @@
+#as: --divide
 #objdump: -drw
 #name: opcodes with invalid modrm byte
 
--- a/gas/testsuite/gas/i386/disassem.s
+++ b/gas/testsuite/gas/i386/disassem.s
@@ -1,124 +1,124 @@
 .text
-.byte 0xFF, 0xEF
-.byte 0xFF, 0xD8
+	.insn 0xFF/5, %edi
+	.insn 0xFF/3, %eax
 .fill 0x5, 0x1, 0x90
 .byte 0xC5, 0xEC, 0x4A, 0x9B
 .byte 0xC5, 0xEC, 0x4A, 0x6F
-.byte 0xC5, 0xEC, 0x4A, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x4a, (%edi), %k2, %k7
 .byte 0xC5, 0xED, 0x4A, 0x9B
 .byte 0xC5, 0xED, 0x4A, 0x6F
-.byte 0xC5, 0xED, 0x4A, 0x3F
+	.insn VEX.L1.66.0f.W0 0x4a, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x4A, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x4A, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x4A, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x4a, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x4A, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x4A, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x4A, 0x3F
+	.insn VEX.L1.66.0f.W1 0x4a, (%edi), %k2, %k7
 .byte 0xC5, 0xEC, 0x41, 0x9B
 .byte 0xC5, 0xEC, 0x41, 0x6F
-.byte 0xC5, 0xEC, 0x41, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x41, (%edi), %k2, %k7
 .byte 0xC5, 0xED, 0x41, 0x9B
 .byte 0xC5, 0xED, 0x41, 0x6F
-.byte 0xC5, 0xED, 0x41, 0x3F
+	.insn VEX.L1.66.0f.W0 0x41, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x41, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x41, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x41, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x41, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x41, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x41, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x41, 0x3F
+	.insn VEX.L1.66.0f.W1 0x41, (%edi), %k2, %k7
 .byte 0xC5, 0xEC, 0x42, 0x9B
 .byte 0xC5, 0xEC, 0x42, 0x6F
-.byte 0xC5, 0xEC, 0x42, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x42, (%edi), %k2, %k7
 .byte 0xC5, 0xED, 0x42, 0x9B
 .byte 0xC5, 0xED, 0x42, 0x6F
-.byte 0xC5, 0xED, 0x42, 0x3F
+	.insn VEX.L1.66.0f.W0 0x42, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x42, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x42, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x42, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x42, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x42, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x42, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x42, 0x3F
+	.insn VEX.L1.66.0f.W1 0x42, (%edi), %k2, %k7
 .byte 0xC5, 0xEC, 0x4B, 0x9B
 .byte 0xC5, 0xEC, 0x4B, 0x6F
-.byte 0xC5, 0xEC, 0x4B, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x4b, (%edi), %k2, %k7
 .byte 0xC5, 0xED, 0x4B, 0x9B
 .byte 0xC5, 0xED, 0x4B, 0x6F
-.byte 0xC5, 0xED, 0x4B, 0x3F
+	.insn VEX.L1.66.0f.W0 0x4b, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x4B, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x4B, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x4B, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x4b, (%edi), %k2, %k7
 .byte 0xC5, 0xF8, 0x44, 0x9B
 .byte 0xC5, 0xF8, 0x44, 0x6F
-.byte 0xC5, 0xF8, 0x44, 0x3F
+	.insn VEX.L0.NP.0f.W0 0x44, (%edi), %k7
 .byte 0xC5, 0xF9, 0x44, 0x9B
 .byte 0xC5, 0xF9, 0x44, 0x6F
-.byte 0xC5, 0xF9, 0x44, 0x3F
+	.insn VEX.L0.66.0f.W0 0x44, (%edi), %k7
 .byte 0xC4, 0xE1, 0xF8, 0x44, 0x9B
 .byte 0xC4, 0xE1, 0xF8, 0x44, 0x6F
-.byte 0xC4, 0xE1, 0xF8, 0x44, 0x3F
+	.insn VEX.L0.NP.0f.W1 0x44, (%edi), %k7
 .byte 0xC4, 0xE1, 0xF9, 0x44, 0x9B
 .byte 0xC4, 0xE1, 0xF9, 0x44, 0x6F
-.byte 0xC4, 0xE1, 0xF9, 0x44, 0x3F
+	.insn VEX.L0.66.0f.W1 0x44, (%edi), %k7
 .byte 0xC5, 0xEC, 0x45, 0x9B
 .byte 0xC5, 0xEC, 0x45, 0x6F
-.byte 0xC5, 0xEC, 0x45, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x45, (%edi), %k2, %k7
 .byte 0xC5, 0xED, 0x45, 0x9B
 .byte 0xC5, 0xED, 0x45, 0x6F
-.byte 0xC5, 0xED, 0x45, 0x3F
+	.insn VEX.L1.66.0f.W0 0x45, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x45, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x45, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x45, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x45, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x45, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x45, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x45, 0x3F
+	.insn VEX.L1.66.0f.W1 0x45, (%edi), %k2, %k7
 .byte 0xC5, 0xF8, 0x98, 0x9B
 .byte 0xC5, 0xF8, 0x98, 0x6F
-.byte 0xC5, 0xF8, 0x98, 0x3F
+	.insn VEX.L0.NP.0f.W0 0x98, (%edi), %k7
 .byte 0xC5, 0xF9, 0x98, 0x9B
 .byte 0xC5, 0xF9, 0x98, 0x6F
-.byte 0xC5, 0xF9, 0x98, 0x3F
+	.insn VEX.L0.66.0f.W0 0x98, (%edi), %k7
 .byte 0xC4, 0xE1, 0xF8, 0x98, 0x9B
 .byte 0xC4, 0xE1, 0xF8, 0x98, 0x6F
-.byte 0xC4, 0xE1, 0xF8, 0x98, 0x3F
+	.insn VEX.L0.NP.0f.W1 0x98, (%edi), %k7
 .byte 0xC4, 0xE1, 0xF9, 0x98, 0x9B
 .byte 0xC4, 0xE1, 0xF9, 0x98, 0x6F
-.byte 0xC4, 0xE1, 0xF9, 0x98, 0x3F
+	.insn VEX.L0.66.0f.W1 0x98, (%edi), %k7
 .byte 0xC5, 0xEC, 0x46, 0x9B
 .byte 0xC5, 0xEC, 0x46, 0x6F
-.byte 0xC5, 0xEC, 0x46, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x46, (%edi), %k2, %k7
 .byte 0xC5, 0xED, 0x46, 0x9B
 .byte 0xC5, 0xED, 0x46, 0x6F
-.byte 0xC5, 0xED, 0x46, 0x3F
+	.insn VEX.L1.66.0f.W0 0x46, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x46, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x46, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x46, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x46, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x46, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x46, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x46, 0x3F
+	.insn VEX.L1.66.0f.W1 0x46, (%edi), %k2, %k7
 .byte 0xC5, 0xEC, 0x47, 0x9B
 .byte 0xC5, 0xEC, 0x47, 0x6F
-.byte 0xC5, 0xEC, 0x47, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x47, (%edi), %k2, %k7
 .byte 0xC5, 0xED, 0x47, 0x9B
 .byte 0xC5, 0xED, 0x47, 0x6F
-.byte 0xC5, 0xED, 0x47, 0x3F
+	.insn VEX.L1.66.0f.W0 0x47, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x47, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x47, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x47, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x47, (%edi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x47, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x47, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x47, 0x3F
+	.insn VEX.L1.66.0f.W1 0x47, (%edi), %k2, %k7
 .byte 0xC5, 0xF8, 0x99, 0x9B
 .byte 0xC5, 0xF8, 0x99, 0x6F
-.byte 0xC5, 0xF8, 0x99, 0x3F
+	.insn VEX.L0.NP.0f.W0 0x99, (%edi), %k7
 .byte 0xC5, 0xF9, 0x99, 0x9B
 .byte 0xC5, 0xF9, 0x99, 0x6F
-.byte 0xC5, 0xF9, 0x99, 0x3F
+	.insn VEX.L0.66.0f.W0 0x99, (%edi), %k7
 .byte 0xC4, 0xE1, 0xF8, 0x99, 0x9B
 .byte 0xC4, 0xE1, 0xF8, 0x99, 0x6F
-.byte 0xC4, 0xE1, 0xF8, 0x99, 0x3F
+	.insn VEX.L0.NP.0f.W1 0x99, (%edi), %k7
 .byte 0xC4, 0xE1, 0xF9, 0x99, 0x9B
 .byte 0xC4, 0xE1, 0xF9, 0x99, 0x6F
-.byte 0xC4, 0xE1, 0xF9, 0x99, 0x3F
+	.insn VEX.L0.66.0f.W1 0x99, (%edi), %k7
 .byte 0xC4, 0xE3, 0xF9, 0x30, 0x8F, 0x01
 .byte 0xC4, 0xE3, 0xF9, 0x30, 0x6A, 0x01
 .byte 0xC4, 0xE3, 0xF9, 0x30, 0x04, 0x01
@@ -145,33 +145,34 @@
 .byte 0xC4, 0xE3, 0x79, 0x33, 0x04, 0x01
 .byte 0xC5, 0xF8, 0x92, 0x9B
 .byte 0xC5, 0xF8, 0x92, 0x6F
-.byte 0xC5, 0xF8, 0x92, 0x3F
+	.insn VEX.L0.NP.0f.W0 0x92, (%edi), %k7
 .byte 0xC5, 0xF9, 0x92, 0x9B
 .byte 0xC5, 0xF9, 0x92, 0x6F
-.byte 0xC5, 0xF9, 0x92, 0x3F
+	.insn VEX.L0.66.0f.W0 0x92, (%edi), %k7
 .byte 0xC5, 0xFB, 0x92, 0x9B
 .byte 0xC5, 0xFB, 0x92, 0x6F
-.byte 0xC5, 0xFB, 0x92, 0x3F
+	.insn VEX.L0.f2.0f.W0 0x92, (%edi), %k7
 .byte 0xC4, 0xE1, 0xF9, 0x92, 0x9B
 .byte 0xC4, 0xE1, 0xF9, 0x92, 0x6F
-.byte 0xC4, 0xE1, 0xF9, 0x92, 0x3F
+	.insn VEX.L0.66.0f.W1 0x92, (%edi), %k7
 .byte 0xC5, 0xF8, 0x93, 0x9B
 .byte 0xC5, 0xF8, 0x93, 0x6F
-.byte 0xC5, 0xF8, 0x93, 0x3F
+	.insn VEX.L0.NP.0f.W0 0x93, (%edi), %k7
 .byte 0xC5, 0xF9, 0x93, 0x9B
 .byte 0xC5, 0xF9, 0x93, 0x6F
-.byte 0xC5, 0xF9, 0x93, 0x3F
+	.insn VEX.L0.66.0f.W0 0x93, (%edi), %k7
 .byte 0xC5, 0xFB, 0x93, 0x9B
 .byte 0xC5, 0xFB, 0x93, 0x6F
-.byte 0xC5, 0xFB, 0x93, 0x3F
+	.insn VEX.L0.f2.0f.W0 0x93, (%edi), %k7
 .byte 0xC4, 0xE1, 0xF9, 0x93, 0x9B
 .byte 0xC4, 0xE1, 0xF9, 0x93, 0x6F
-.byte 0xC4, 0xE1, 0xF9, 0x93, 0x3F
+	.insn VEX.L0.66.0f.W1 0x93, (%edi), %k7
 .byte 0xc4, 0xe2, 0x1, 0x1c, 0x41, 0x37
 .byte 0x62, 0xf2, 0xad, 0x08, 0x1c, 0x01
 .byte 0x1
-.byte 0x62, 0xf3, 0x7d, 0x28, 0x1b, 0xc8, 0x25
+	.insn EVEX.66.0f3a.W0 0x1b, $0x25, %ymm0, %xmm1
 .byte 0x62, 0xf3
-.byte 0x62, 0xf3, 0x75, 0x08, 0x23, 0xc2, 0x25
+//.byte 0x62, 0xf3, 0x75, 0x08, 0x23, 0xc2, 0x25
+	.insn EVEX.66.0f3a.W0 0x23, $0x25, %xmm2, %xmm1, %xmm0
 .byte 0x62
-.byte 0x62, 0xf2, 0x7d, 0x28, 0x5b, 0x41, 0x37
+	.insn EVEX.66.0f38.W0 0x5b, 0x37(%ecx){:d1}, %ymm0
--- a/gas/testsuite/gas/i386/evex.s
+++ b/gas/testsuite/gas/i386/evex.s
@@ -3,14 +3,14 @@
 	.allow_index_reg
 	.text
 _start:
-	.byte 0x62, 0xf1, 0xd6, 0x38, 0x2a, 0xf0
-	.byte 0x62, 0xf1, 0x57, 0x38, 0x2a, 0xf0
-	.byte 0x62, 0xf1, 0xd7, 0x38, 0x2a, 0xf0
-	.byte 0x62, 0xf1, 0xd6, 0x08, 0x7b, 0xf0
-	.byte 0x62, 0xf1, 0x57, 0x08, 0x7b, 0xf0
-	.byte 0x62, 0xf1, 0xd7, 0x08, 0x7b, 0xf0
-	.byte 0x62, 0xf1, 0xd6, 0x38, 0x7b, 0xf0
-	.byte 0x62, 0xf1, 0x57, 0x38, 0x7b, 0xf0
-	.byte 0x62, 0xf1, 0xd7, 0x38, 0x7b, 0xf0
+	.insn EVEX.LIG.F3.0F.W1 0x2a, %eax,{rd-sae},%xmm5,%xmm6
+	.insn EVEX.LIG.F2.0F.W0 0x2a, %eax,{rd-sae},%xmm5,%xmm6
+	.insn EVEX.LIG.F2.0F.W1 0x2a, %eax,{rd-sae},%xmm5,%xmm6
+	.insn EVEX.LIG.F3.0F.W1 0x7b, %eax,%xmm5,%xmm6
+	.insn EVEX.LIG.F2.0F.W0 0x7b, %eax,%xmm5,%xmm6
+	.insn EVEX.LIG.F2.0F.W1 0x7b, %eax,%xmm5,%xmm6
+	.insn EVEX.LIG.F3.0F.W1 0x7b, %eax,{rd-sae},%xmm5,%xmm6
+	.insn EVEX.LIG.F2.0F.W0 0x7b, %eax,{rd-sae},%xmm5,%xmm6
+	.insn EVEX.LIG.F2.0F.W1 0x7b, %eax,{rd-sae},%xmm5,%xmm6
 	.byte 0x62, 0xe1, 0x7e, 0x08, 0x2d, 0xc0
 	.byte 0x62, 0xe1, 0x7c, 0x08, 0xc2, 0xc0, 0x00
--- a/gas/testsuite/gas/i386/fpu-bad.d
+++ b/gas/testsuite/gas/i386/fpu-bad.d
@@ -1,4 +1,4 @@
-#as: --32
+#as: --32 --divide
 #objdump: -dw
 #name: i386 fpu bad opcodes
 
--- a/gas/testsuite/gas/i386/fpu-bad.s
+++ b/gas/testsuite/gas/i386/fpu-bad.s
@@ -1,4 +1,3 @@
 	.text
 start:
-	.byte 0xdd
-	.byte 0xf0
+	.insn 0xdd/6, %st(0)
--- a/gas/testsuite/gas/i386/ilp32/x86-64-nops.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-nops.d
@@ -1,63 +1,5 @@
 #source: ../x86-64-nops.s
+#as: --divide
 #objdump: -drw
 #name: x86-64 (ILP32) nops
-
-.*: +file format .*
-
-Disassembly of section .text:
-
-0+ <.text>:
-[ 	]*[a-f0-9]+:	0f 1f 00             	nopl   \(%rax\)
-[ 	]*[a-f0-9]+:	0f 1f 40 00          	nopl   0x0\(%rax\)
-[ 	]*[a-f0-9]+:	0f 1f 44 00 00       	nopl   0x0\(%rax,%rax,1\)
-[ 	]*[a-f0-9]+:	66 0f 1f 44 00 00    	nopw   0x0\(%rax,%rax,1\)
-[ 	]*[a-f0-9]+:	0f 1f 80 00 00 00 00 	nopl   0x0\(%rax\)
-[ 	]*[a-f0-9]+:	0f 1f 84 00 00 00 00 00 	nopl   0x0\(%rax,%rax,1\)
-[ 	]*[a-f0-9]+:	66 0f 1f 84 00 00 00 00 00 	nopw   0x0\(%rax,%rax,1\)
-[ 	]*[a-f0-9]+:	66 2e 0f 1f 84 00 00 00 00 00 	cs nopw 0x0\(%rax,%rax,1\)
-[ 	]*[a-f0-9]+:	0f 19 ff             	nop    %edi
-[ 	]*[a-f0-9]+:	0f 1a ff             	nop    %edi
-[ 	]*[a-f0-9]+:	0f 1b ff             	nop    %edi
-[ 	]*[a-f0-9]+:	0f 1c ff             	nop    %edi
-[ 	]*[a-f0-9]+:	0f 1d ff             	nop    %edi
-[ 	]*[a-f0-9]+:	0f 1e ff             	nop    %edi
-[ 	]*[a-f0-9]+:	0f 1f ff             	nop    %edi
-[ 	]*[a-f0-9]+:	0f 19 5a 22          	nopl   0x22\(%rdx\)
-[ 	]*[a-f0-9]+:	0f 1c 5a 22          	nopl   0x22\(%rdx\)
-[ 	]*[a-f0-9]+:	0f 1d 5a 22          	nopl   0x22\(%rdx\)
-[ 	]*[a-f0-9]+:	0f 1e 5a 22          	nopl   0x22\(%rdx\)
-[ 	]*[a-f0-9]+:	0f 1f 5a 22          	nopl   0x22\(%rdx\)
-[ 	]*[a-f0-9]+:	0f 19 9c 1d 11 22 33 44 	nopl   0x44332211\(%rbp,%rbx,1\)
-[ 	]*[a-f0-9]+:	0f 1c 9c 1d 11 22 33 44 	nopl   0x44332211\(%rbp,%rbx,1\)
-[ 	]*[a-f0-9]+:	0f 1d 9c 1d 11 22 33 44 	nopl   0x44332211\(%rbp,%rbx,1\)
-[ 	]*[a-f0-9]+:	0f 1e 9c 1d 11 22 33 44 	nopl   0x44332211\(%rbp,%rbx,1\)
-[ 	]*[a-f0-9]+:	0f 1f 9c 1d 11 22 33 44 	nopl   0x44332211\(%rbp,%rbx,1\)
-[ 	]*[a-f0-9]+:	0f 19 04 60          	nopl   \(%rax,%riz,2\)
-[ 	]*[a-f0-9]+:	0f 1c 0c 60          	nopl   \(%rax,%riz,2\)
-[ 	]*[a-f0-9]+:	0f 1d 04 60          	nopl   \(%rax,%riz,2\)
-[ 	]*[a-f0-9]+:	0f 1e 04 60          	nopl   \(%rax,%riz,2\)
-[ 	]*[a-f0-9]+:	0f 1f 04 60          	nopl   \(%rax,%riz,2\)
-[ 	]*[a-f0-9]+:	0f 19 04 59          	nopl   \(%rcx,%rbx,2\)
-[ 	]*[a-f0-9]+:	0f 1c 0c 59          	nopl   \(%rcx,%rbx,2\)
-[ 	]*[a-f0-9]+:	0f 1d 04 59          	nopl   \(%rcx,%rbx,2\)
-[ 	]*[a-f0-9]+:	0f 1e 04 59          	nopl   \(%rcx,%rbx,2\)
-[ 	]*[a-f0-9]+:	0f 1f 04 59          	nopl   \(%rcx,%rbx,2\)
-[ 	]*[a-f0-9]+:	48 0f 1f c0          	nop    %rax
-[ 	]*[a-f0-9]+:	0f 1f c0             	nop    %eax
-[ 	]*[a-f0-9]+:	66 0f 1f c0          	nop    %ax
-[ 	]*[a-f0-9]+:	48 0f 1f 00          	nopq   \(%rax\)
-[ 	]*[a-f0-9]+:	0f 1f 00             	nopl   \(%rax\)
-[ 	]*[a-f0-9]+:	66 0f 1f 00          	nopw   \(%rax\)
-[ 	]*[a-f0-9]+:	48 0f 1f c0          	nop    %rax
-[ 	]*[a-f0-9]+:	0f 1f c0             	nop    %eax
-[ 	]*[a-f0-9]+:	66 0f 1f c0          	nop    %ax
-[ 	]*[a-f0-9]+:	49 0f 1f c2          	nop    %r10
-[ 	]*[a-f0-9]+:	41 0f 1f c2          	nop    %r10d
-[ 	]*[a-f0-9]+:	66 41 0f 1f c2       	nop    %r10w
-[ 	]*[a-f0-9]+:	49 0f 1f 02          	nopq   \(%r10\)
-[ 	]*[a-f0-9]+:	41 0f 1f 02          	nopl   \(%r10\)
-[ 	]*[a-f0-9]+:	66 41 0f 1f 02       	nopw   \(%r10\)
-[ 	]*[a-f0-9]+:	49 0f 1f c2          	nop    %r10
-[ 	]*[a-f0-9]+:	41 0f 1f c2          	nop    %r10d
-[ 	]*[a-f0-9]+:	66 41 0f 1f c2       	nop    %r10w
-#pass
+#dump: ../x86-64-nops.d
--- a/gas/testsuite/gas/i386/katmai.d
+++ b/gas/testsuite/gas/i386/katmai.d
@@ -1,3 +1,4 @@
+#as: --divide
 #objdump: -dw
 #name: i386 katmai
 
--- a/gas/testsuite/gas/i386/katmai.s
+++ b/gas/testsuite/gas/i386/katmai.s
@@ -158,6 +158,6 @@ foo:
  prefetcht2	(%ecx)
 
 # A bad sfence modrm byte
-.byte 0x65,0x0F,0xAE,0xff
+	.insn gs 0x0FAE/7, %edi
 # Pad out to good alignment
  .p2align 4,0
--- a/gas/testsuite/gas/i386/mpx.s
+++ b/gas/testsuite/gas/i386/mpx.s
@@ -158,14 +158,10 @@ start:
 
 foo:	bnd ret
 
+	.att_syntax prefix
 bad:
 	# bndldx (%eax),(bad)
-	.byte 0x0f
-	.byte 0x1a
-	.byte 0x30
+	.insn 0x0f1a, (%eax), %esi
 
 	# bndmov (bad),%bnd0
-	.byte 0x66
-	.byte 0x0f
-	.byte 0x1a
-	.byte 0xc4
+	.insn 0x660f1a, %k4, %bnd0
--- a/gas/testsuite/gas/i386/nops.d
+++ b/gas/testsuite/gas/i386/nops.d
@@ -1,3 +1,4 @@
+#as: --divide
 #objdump: -drw
 #name: i386 nops
 
@@ -13,7 +14,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 1f 80 00 00 00 00 	nopl   0x0\(%eax\)
 [ 	]*[a-f0-9]+:	0f 1f 84 00 00 00 00 00 	nopl   0x0\(%eax,%eax,1\)
 [ 	]*[a-f0-9]+:	66 0f 1f 84 00 00 00 00 00 	nopw   0x0\(%eax,%eax,1\)
-[ 	]*[a-f0-9]+:	66 2e 0f 1f 84 00 00 00 00 00 	nopw   %cs:0x0\(%eax,%eax,1\)
+[ 	]*[a-f0-9]+:	2e 66 0f 1f 84 00 00 00 00 00 	nopw   %cs:0x0\(%eax,%eax,1\)
 [ 	]*[a-f0-9]+:	0f 19 ff             	nop    %edi
 [ 	]*[a-f0-9]+:	0f 1a ff             	nop    %edi
 [ 	]*[a-f0-9]+:	0f 1b ff             	nop    %edi
--- a/gas/testsuite/gas/i386/nops.s
+++ b/gas/testsuite/gas/i386/nops.s
@@ -1,48 +1,49 @@
 	.text
 
-	.byte 0x0f, 0x1f, 0x0	
-	.byte 0x0f, 0x1f, 0x40, 0x0	
-	.byte 0x0f, 0x1f, 0x44, 0x0,  0x0	
-	.byte 0x66, 0x0f, 0x1f, 0x44, 0x0,  0x0	
-	.byte 0x0f, 0x1f, 0x80, 0x0,  0x0,  0x0, 0x0	
-	.byte 0x0f, 0x1f, 0x84, 0x0,  0x0,  0x0, 0x0, 0x0
-	.byte 0x66, 0x0f, 0x1f, 0x84, 0x0,  0x0, 0x0, 0x0, 0x0
-	.byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x0, 0x0, 0x0, 0x0, 0x0
+	.insn 0x0f1f/0, (%eax)
+	.insn {disp8} 0x0f1f/0, 0(%eax)
+	.insn {disp8} 0x0f1f/0, 0(%eax,%eax)
+	.insn {disp8} data16 0x0f1f/0, 0(%eax,%eax)
+	.insn {disp32} 0x0f1f/0, 0(%eax)
+	.insn {disp32} 0x0f1f/0, 0(%eax,%eax)
+	.insn {disp32} data16 0x0f1f/0, 0(%eax,%eax)
+	.insn {disp32} data16 0x0f1f/0, %cs:0(%eax,%eax)
 
 	# reg,reg
-	.byte 0x0f, 0x19, 0xff
-	.byte 0x0f, 0x1a, 0xff  
-	.byte 0x0f, 0x1b, 0xff
-	.byte 0x0f, 0x1c, 0xff  
-	.byte 0x0f, 0x1d, 0xff
-	.byte 0x0f, 0x1e, 0xff  
-	.byte 0x0f, 0x1f, 0xff
+	.insn 0x0f19, %edi, %edi
+	.insn 0x0f1a, %edi, %edi
+	.insn 0x0f1b, %edi, %edi
+	.insn 0x0f1c, %edi, %edi
+	.insn 0x0f1d, %edi, %edi
+	.insn 0x0f1e, %edi, %edi
+	.insn 0x0f1f, %edi, %edi
 
 	# with base and imm8
-	.byte 0x0f, 0x19, 0x5A, 0x22
-	.byte 0x0f, 0x1c, 0x5A, 0x22
-	.byte 0x0f, 0x1d, 0x5A, 0x22
-	.byte 0x0f, 0x1e, 0x5A, 0x22
-	.byte 0x0f, 0x1f, 0x5A, 0x22
+	.insn 0x0f19/3, 0x22(%edx)
+	.insn 0x0f1c/3, 0x22(%edx)
+	.insn 0x0f1d/3, 0x22(%edx)
+	.insn 0x0f1e/3, 0x22(%edx)
+	.insn 0x0f1f/3, 0x22(%edx)
 
 	# with sib and imm32
-	.byte 0x0f, 0x19, 0x9C, 0x1D, 0x11, 0x22, 0x33, 0x44
-	.byte 0x0f, 0x1c, 0x9C, 0x1D, 0x11, 0x22, 0x33, 0x44
-	.byte 0x0f, 0x1d, 0x9C, 0x1D, 0x11, 0x22, 0x33, 0x44
-	.byte 0x0f, 0x1e, 0x9C, 0x1D, 0x11, 0x22, 0x33, 0x44
-	.byte 0x0f, 0x1f, 0x9C, 0x1D, 0x11, 0x22, 0x33, 0x44
-
-	.byte 0x0f, 0x19, 0x04, 0x60
-	.byte 0x0f, 0x1c, 0x0c, 0x60
-	.byte 0x0f, 0x1d, 0x04, 0x60
-	.byte 0x0f, 0x1e, 0x04, 0x60
-	.byte 0x0f, 0x1f, 0x04, 0x60
-
-	.byte 0x0f, 0x19, 0x04, 0x59
-	.byte 0x0f, 0x1c, 0x0c, 0x59
-	.byte 0x0f, 0x1d, 0x04, 0x59
-	.byte 0x0f, 0x1e, 0x04, 0x59
-	.byte 0x0f, 0x1f, 0x04, 0x59
+	.insn 0x0f19/3, 0x44332211(%ebp,%ebx)
+	.insn 0x0f1c/3, 0x44332211(%ebp,%ebx)
+	.insn 0x0f1d/3, 0x44332211(%ebp,%ebx)
+	.insn 0x0f1e/3, 0x44332211(%ebp,%ebx)
+	.insn 0x0f1f/3, 0x44332211(%ebp,%ebx)
+
+	.allow_index_reg
+	.insn 0x0f19/0, (%eax,%eiz,2)
+	.insn 0x0f1c/1, (%eax,%eiz,2)
+	.insn 0x0f1d/0, (%eax,%eiz,2)
+	.insn 0x0f1e/0, (%eax,%eiz,2)
+	.insn 0x0f1f/0, (%eax,%eiz,2)
+
+	.insn 0x0f19/0, (%ecx,%ebx,2)
+	.insn 0x0f1c/1, (%ecx,%ebx,2)
+	.insn 0x0f1d/0, (%ecx,%ebx,2)
+	.insn 0x0f1e/0, (%ecx,%ebx,2)
+	.insn 0x0f1f/0, (%ecx,%ebx,2)
 
 	nop %eax
 	nop %ax
--- a/gas/testsuite/gas/i386/opcode.d
+++ b/gas/testsuite/gas/i386/opcode.d
@@ -1,4 +1,4 @@
-#as: -J
+#as: -J --divide
 #objdump: -dw
 #name: i386 opcodes
 
@@ -597,7 +597,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
 [ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
  +[a-f0-9]+:	82 c3 01             	add    \$0x1,%bl
- +[a-f0-9]+:	82 f3 01             	xor    \$0x1,%bl
+ +[a-f0-9]+:	82 cb 01             	or     \$0x1,%bl
  +[a-f0-9]+:	82 d3 01             	adc    \$0x1,%bl
  +[a-f0-9]+:	82 db 01             	sbb    \$0x1,%bl
  +[a-f0-9]+:	82 e3 01             	and    \$0x1,%bl
--- a/gas/testsuite/gas/i386/opcode.s
+++ b/gas/testsuite/gas/i386/opcode.s
@@ -597,23 +597,23 @@ foo:
  fistpq (%eax)
  fistpll (%eax)
 
-	.byte 0x82, 0xc3, 0x01
-	.byte 0x82, 0xf3, 0x01
-	.byte 0x82, 0xd3, 0x01
-	.byte 0x82, 0xdb, 0x01
-	.byte 0x82, 0xe3, 0x01
-	.byte 0x82, 0xeb, 0x01
-	.byte 0x82, 0xf3, 0x01
-	.byte 0x82, 0xfb, 0x01
+	.insn 0x82/0, $1, %bl
+	.insn 0x82/1, $1, %bl
+	.insn 0x82/2, $1, %bl
+	.insn 0x82/3, $1, %bl
+	.insn 0x82/4, $1, %bl
+	.insn 0x82/5, $1, %bl
+	.insn 0x82/6, $1, %bl
+	.insn 0x82/7, $1, %bl
 
 	{evex} {store} vpextrw $0xab, %xmm5, %eax
 
-	.byte 0xf6, 0xc9, 0x01
-	.byte 0x66, 0xf7, 0xc9, 0x02, 0x00
-	.byte 0xf7, 0xc9, 0x04, 0x00, 0x00, 0x00
-	.byte 0xc0, 0xf0, 0x02
-	.byte 0xc1, 0xf0, 0x01
-	.byte 0xd0, 0xf0
-	.byte 0xd1, 0xf0
-	.byte 0xd2, 0xf0
-	.byte 0xd3, 0xf0
+	.insn 0xf6/1, $1, %cl
+	.insn 0xf7/1, $2{:u16}, %cx
+	.insn 0xf7/1, $4{:u32}, %ecx
+	.insn 0xc0/6, $2, %al
+	.insn 0xc1/6, $1, %eax
+	.insn 0xd0/6, %al
+	.insn 0xd1/6, %eax
+	.insn 0xd2/6, %al
+	.insn 0xd3/6, %eax
--- a/gas/testsuite/gas/i386/opcode-intel.d
+++ b/gas/testsuite/gas/i386/opcode-intel.d
@@ -1,5 +1,5 @@
 #source: opcode.s
-#as: -J
+#as: -J --divide
 #objdump: -dwMintel
 #name: i386 opcodes (Intel disassembly)
 
@@ -598,7 +598,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 38                	fistp  QWORD PTR \[eax\]
 [ 	]*[a-f0-9]+:	df 38                	fistp  QWORD PTR \[eax\]
  +[a-f0-9]+:	82 c3 01             	add    bl,0x1
- +[a-f0-9]+:	82 f3 01             	xor    bl,0x1
+ +[a-f0-9]+:	82 cb 01             	or     bl,0x1
  +[a-f0-9]+:	82 d3 01             	adc    bl,0x1
  +[a-f0-9]+:	82 db 01             	sbb    bl,0x1
  +[a-f0-9]+:	82 e3 01             	and    bl,0x1
--- a/gas/testsuite/gas/i386/opcode-suffix.d
+++ b/gas/testsuite/gas/i386/opcode-suffix.d
@@ -1,5 +1,5 @@
 #source: opcode.s
-#as: -J
+#as: -J --divide
 #objdump: -dwMsuffix
 #name: i386 opcodes (w/ suffix)
 
@@ -598,7 +598,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
 [ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
  +[a-f0-9]+:	82 c3 01             	addb   \$0x1,%bl
- +[a-f0-9]+:	82 f3 01             	xorb   \$0x1,%bl
+ +[a-f0-9]+:	82 cb 01             	orb    \$0x1,%bl
  +[a-f0-9]+:	82 d3 01             	adcb   \$0x1,%bl
  +[a-f0-9]+:	82 db 01             	sbbb   \$0x1,%bl
  +[a-f0-9]+:	82 e3 01             	andb   \$0x1,%bl
--- a/gas/testsuite/gas/i386/pr29483.s
+++ b/gas/testsuite/gas/i386/pr29483.s
@@ -1,3 +1,3 @@
 	.text
 pr29483:
-	.byte 0x65,0x62,0x62,0x7d,0x97,0xa0,0x94,0xff,0x20,0x20,0x20,0xae
+	.insn EVEX.128.66.0f38.W0 0xa0, %gs:-0x51dfdfe0(%rdi,%xmm23,8){1to4}, %xmm26{%k7}{z}
--- a/gas/testsuite/gas/i386/prefetch.d
+++ b/gas/testsuite/gas/i386/prefetch.d
@@ -1,3 +1,4 @@
+#as: --divide
 #objdump: -dw
 #name: i386 prefetch
 
--- a/gas/testsuite/gas/i386/prefetch.s
+++ b/gas/testsuite/gas/i386/prefetch.s
@@ -1,18 +1,20 @@
+	.code32
+
 .macro try opcode:vararg
-	.byte \opcode, 0x00
-	.byte \opcode, 0x08
-	.byte \opcode, 0x10
-	.byte \opcode, 0x18
-	.byte \opcode, 0x20
-	.byte \opcode, 0x28
-	.byte \opcode, 0x30
-	.byte \opcode, 0x38
+	.insn 0x0f\opcode/0, (%eax)
+	.insn 0x0f\opcode/1, (%eax)
+	.insn 0x0f\opcode/2, (%eax)
+	.insn 0x0f\opcode/3, (%eax)
+	.insn 0x0f\opcode/4, (%eax)
+	.insn 0x0f\opcode/5, (%eax)
+	.insn 0x0f\opcode/6, (%eax)
+	.insn 0x0f\opcode/7, (%eax)
 .endm
 
 .text
 
 amd_prefetch:
-	try 0x0f, 0x0d
+	try 0d
 
 intel_prefetch:
-	try 0x0f, 0x18
+	try 18
--- a/gas/testsuite/gas/i386/prefetch-intel.d
+++ b/gas/testsuite/gas/i386/prefetch-intel.d
@@ -1,3 +1,4 @@
+#as: --divide
 #objdump: -dw -Mintel
 #name: i386 prefetch (Intel disassembly)
 #source: prefetch.s
--- a/gas/testsuite/gas/i386/prefix.s
+++ b/gas/testsuite/gas/i386/prefix.s
@@ -336,59 +336,32 @@
 	int $3
 
 # "repz" vmovaps %xmm7, %xmm7
-	.byte 0xc5
-	.byte 0xfa
-	.byte 0x28
-	.byte 0xff
+	.insn VEX.128.f3.0f.W0 0x28, %xmm7, %xmm7
 
 	int $3
 
 # "repnz" {vex3} vmovaps %xmm7, %xmm7
-	.byte 0xc4
-	.byte 0xe1
-	.byte 0x7b
-	.byte 0x28
-	.byte 0xff
+	.insn {vex3} VEX.128.f2.0f.W0 0x28, %xmm7, %xmm7
 
 	int $3
 
 # "EVEX.W1" vmovaps %xmm7, %xmm7
-	.byte 0x62
-	.byte 0xf1
-	.byte 0xfc
-	.byte 0x08
-	.byte 0x28
-	.byte 0xff
+	.insn EVEX.128.0f.W1 0x28, %xmm7, %xmm7
 
 	int $3
 
 # "repz" vmovaps %xmm7, %xmm7
-	.byte 0x62
-	.byte 0xf1
-	.byte 0x7e
-	.byte 0x08
-	.byte 0x28
-	.byte 0xff
+	.insn EVEX.128.f3.0f.W0 0x28, %xmm7, %xmm7
 
 	int $3
 
 # "EVEX.W0" vmovapd %xmm7, %xmm7
-	.byte 0x62
-	.byte 0xf1
-	.byte 0x7d
-	.byte 0x08
-	.byte 0x28
-	.byte 0xff
+	.insn EVEX.128.66.0f.W0 0x28, %xmm7, %xmm7
 
 	int $3
 
 # "repnz" vmovapd %xmm7, %xmm7
-	.byte 0x62
-	.byte 0xf1
-	.byte 0xff
-	.byte 0x08
-	.byte 0x28
-	.byte 0xff
+	.insn EVEX.128.f2.0f.W1 0x28, %xmm7, %xmm7
 
 	int $3
 
--- a/gas/testsuite/gas/i386/x86-64-amx-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-amx-bad.s
@@ -1,63 +1,32 @@
 .text
 	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.W = 1 (illegal value).
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0xd2
-	.byte 0x5c
-	.byte 0xdc
+	.insn VEX.128.F3.0F38.W1 0x5c, %tmm4, %tmm5, %tmm3
 	.fill 0x05, 0x01, 0x90
+
 	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.L = 1 (illegal value).
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0x56
-	.byte 0x5c
-	.byte 0xdc
+	.insn VEX.256.F3.0F38.W0 0x5c, %tmm4, %tmm5, %tmm3
 	.fill 0x05, 0x01, 0x90
+
 	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.R = 0 (illegal value).
-	.byte 0xc4
-	.byte 0x62
-	.byte 0x52
-	.byte 0x5c
-	.byte 0xdc
+	.insn VEX.128.F3.0F38.W0 0x5c, %xmm4, %xmm5, %xmm11
+
 	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.B = 0 (illegal value).
-	.byte 0xc4
-	.byte 0xc2
-	.byte 0x52
-	.byte 0x5c
-	.byte 0xdc
+	.insn VEX.128.F3.0F38.W0 0x5c, %xmm12, %xmm5, %xmm3
+
 	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.VVVV = 0110 (illegal value).
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0x32
-	.byte 0x5c
-	.byte 0xdc
-	#tileloadd (%rax),%tmm1 set R/M= 001 (illegal value) without SIB.
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0x7b
-	.byte 0x4b
-	.byte 0x09
+	.insn VEX.128.F3.0F38.W0 0x5c, %xmm4, %xmm9, %xmm3
+
+	#tileloadd (%rcx),%tmm1 set R/M= 001 (illegal value) without SIB.
+	.insn VEX.128.F2.0F38.W0 0x4b, (%rcx), %xmm1
+
 	#tdpbuud %tmm1,%tmm1,%tmm1 All 3 TMM registers can't be identical.
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0x70
-	.byte 0x5e
-	.byte 0xc9
+	.insn VEX.128.NP.0F38.W0 0x5e, %tmm1, %tmm1, %tmm1
+
 	#tdpbuud %tmm0,%tmm1,%tmm1 All 3 TMM registers can't be identical.
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0x78
-	.byte 0x5e
-	.byte 0xc9
+	.insn VEX.128.NP.0F38.W0 0x5e, %tmm1, %tmm0, %tmm1
+
 	#tdpbuud %tmm1,%tmm0,%tmm1 All 3 TMM registers can't be identical.
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0x70
-	.byte 0x5e
-	.byte 0xc8
+	.insn VEX.128.NP.0F38.W0 0x5e, %tmm0, %tmm1, %tmm1
+
 	#tdpbuud %tmm1,%tmm1,%tmm0 All 3 TMM registers can't be identical.
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0x70
-	.byte 0x5e
-	.byte 0xc1
+	.insn VEX.128.NP.0F38.W0 0x5e, %tmm1, %tmm1, %tmm0
--- a/gas/testsuite/gas/i386/x86-64-amx-fp16-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-amx-fp16-bad.s
@@ -2,34 +2,18 @@
 
 .text
 	#tdpfp16ps %tmm5,%tmm4,%tmm3 set VEX.W = 1 (illegal value).
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0xd3
-	.byte 0x5c
-	.byte 0xdc
+	.insn VEX.128.F2.0F38.W1 0x5c, %tmm4, %tmm5, %tmm3
 	.fill 0x05, 0x01, 0x90
+
 	#tdpfp16ps %tmm5,%tmm4,%tmm3 set VEX.L = 1 (illegal value).
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0x57
-	.byte 0x5c
-	.byte 0xdc
+	.insn VEX.256.F2.0F38.W0 0x5c, %tmm4, %tmm5, %tmm3
 	.fill 0x05, 0x01, 0x90
+
 	#tdpfp16ps %tmm5,%tmm4,%tmm3 set VEX.R = 0 (illegal value).
-	.byte 0xc4
-	.byte 0x62
-	.byte 0x53
-	.byte 0x5c
-	.byte 0xdc
-	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.B = 0 (illegal value).
-	.byte 0xc4
-	.byte 0xc2
-	.byte 0x53
-	.byte 0x5c
-	.byte 0xdc
-	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.VVVV = 0110 (illegal value).
-	.byte 0xc4
-	.byte 0xe2
-	.byte 0x33
-	.byte 0x5c
-	.byte 0xdc
+	.insn VEX.128.F2.0F38.W0 0x5c, %xmm4, %xmm5, %xmm11
+
+	#tdpfp16ps %tmm5,%tmm4,%tmm3 set VEX.B = 0 (illegal value).
+	.insn VEX.128.F2.0F38.W0 0x5c, %xmm12, %xmm5, %xmm3
+
+	#tdpfp16ps %tmm5,%tmm4,%tmm3 set VEX.VVVV = 0110 (illegal value).
+	.insn VEX.128.F2.0F38.W0 0x5c, %xmm4, %xmm9, %xmm3
--- a/gas/testsuite/gas/i386/x86-64-avx512f-nondef.s
+++ b/gas/testsuite/gas/i386/x86-64-avx512f-nondef.s
@@ -1,17 +1,15 @@
 # Check if objdump works correctly when some bits in instruction
 # has non-default value
 
-# vrndscalesd	{sae}, $123, %xmm4, %xmm5, %xmm6{%k7}	 # with null RC
-.byte 0x62, 0xf3, 0xd5, 0x1f, 0x0b, 0xf4, 0x7b
+	vrndscalesd	$123, {sae}, %xmm4, %xmm5, %xmm6{%k7} # with null RC
 # vrndscalesd	{sae}, $123, %xmm4, %xmm5, %xmm6{%k7}	 # with not-null RC
-.byte 0x62, 0xf3, 0xd5, 0x5f, 0x0b, 0xf4, 0x7b
-# vpminud	%zmm4, %zmm5, %zmm6{%k7}	# with 11 EVEX.{B,R'}
-.byte 0x62, 0xf2, 0x55, 0x4f, 0x3b, 0xf4
-# vpminud	%zmm4, %zmm5, %zmm6{%k7}	# with not-11 EVEX.{B,R'}
-.byte 0x62, 0xc2, 0x55, 0x4f, 0x3b, 0xf4
+	.insn EVEX.66.0f3a.W1 0x0b, $123, {ru-sae}, %xmm4, %xmm5, %xmm6{%k7}
+
+	vpminud	%zmm4, %zmm5, %zmm6{%k7}	# with 11 EVEX.{B,R'}
+	vpminud	%zmm12, %zmm5, %zmm22{%k7}	# with not-11 EVEX.{B,R'}
 # vpminud	%zmm4, %zmm5, %zmm6{%k7}	# with set EVEX.b bit
-.byte 0x62, 0xf2, 0x55, 0x1f, 0x3b, 0xf4
-# vpmovdb	%zmm6, 2032(%rdx)		# with unset EVEX.b bit
-.byte 0x62, 0xf2, 0x7e, 0x48, 0x31, 0x72, 0x7f
+	.insn EVEX.66.0F38.W0 0x3b, {rn-sae}, %zmm4, %zmm5, %zmm6{%k7}
+
+	vpmovdb	%zmm6, 2032(%rdx)		# with unset EVEX.b bit
 # vpmovdb	%zmm6, 2032(%rdx)		# with set EVEX.b bit - we should get (bad) operand
-.byte 0x62, 0xf2, 0x7e, 0x58, 0x31, 0x72, 0x7f
+	.insn EVEX.f3.0f38.W0 0x31, %zmm6, 2032(%rdx){1to4}
--- a/gas/testsuite/gas/i386/x86-64-avx512_fp16-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-avx512_fp16-bad.s
@@ -1,36 +1,15 @@
 .text
 	#vfcmaddcph %zmm30, %zmm29, %zmm30 dest and src registers must be distinct.
-	.byte 0x62
-	.byte 0x06
-	.byte 0x17
-	.byte 0x40
-	.byte 0x56
-	.byte 0xf6
+	.insn EVEX.f2.M6.W0 0x56, %zmm30, %zmm29, %zmm30
+
 	#vfcmaddcph (%rcx), %zmm3, %zmm3 dest and src registers must be distinct.
-	.byte 0x62
-	.byte 0xf6
-	.byte 0x67
-	.byte 0x48
-	.byte 0x56
-	.byte 0x19
+	.insn EVEX.f2.M6.W0 0x56, (%rcx), %zmm3, %zmm3
+
 	#vfcmaddcph %xmm3, %xmm2, %xmm2 dest and src registers must be distinct.
-	.byte 0x62
-	.byte 0xf6
-	.byte 0x6f
-	.byte 0x08
-	.byte 0x56
-	.byte 0xd3
+	.insn EVEX.f2.M6.W0 0x56, %xmm3, %xmm2, %xmm2
+
 	#vfcmaddcsh %xmm3, %xmm2, %xmm3 dest and src registers must be distinct.
-	.byte 0x62
-	.byte 0xf6
-	.byte 0x6f
-	.byte 0x08
-	.byte 0x57
-	.byte 0xdb
+	.insn EVEX.LIG.f2.M6.W0 0x57, %xmm3, %xmm2, %xmm3
+
 	#vfcmaddcsh %xmm3, %xmm2, %xmm2 dest and src registers must be distinct.
-	.byte 0x62
-	.byte 0xf6
-	.byte 0x6f
-	.byte 0x08
-	.byte 0x57
-	.byte 0xd3
+	.insn EVEX.LIG.f2.M6.W0 0x57, %xmm3, %xmm2, %xmm2
--- a/gas/testsuite/gas/i386/x86-64-disassem.d
+++ b/gas/testsuite/gas/i386/x86-64-disassem.d
@@ -1,3 +1,4 @@
+#as: --divide
 #objdump: -drw
 #name: x86-64 opcodes with invalid modrm byte
 
--- a/gas/testsuite/gas/i386/x86-64-disassem.s
+++ b/gas/testsuite/gas/i386/x86-64-disassem.s
@@ -1,124 +1,124 @@
 .text
-.byte 0xFF, 0xEF
-.byte 0xFF, 0xD8
+	.insn 0xFF/5, %edi
+	.insn 0xFF/3, %eax
 .fill 0x5, 0x1, 0x90
 .byte 0xC5, 0xEC, 0x4A, 0x9B
 .byte 0xC5, 0xEC, 0x4A, 0x6F
-.byte 0xC5, 0xEC, 0x4A, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x4a, (%rdi), %k2, %k7
 .byte 0xC5, 0xED, 0x4A, 0x9B
 .byte 0xC5, 0xED, 0x4A, 0x6F
-.byte 0xC5, 0xED, 0x4A, 0x3F
+	.insn VEX.L1.66.0f.W0 0x4a, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x4A, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x4A, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x4A, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x4a, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x4A, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x4A, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x4A, 0x3F
+	.insn VEX.L1.66.0f.W1 0x4a, (%rdi), %k2, %k7
 .byte 0xC5, 0xEC, 0x41, 0x9B
 .byte 0xC5, 0xEC, 0x41, 0x6F
-.byte 0xC5, 0xEC, 0x41, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x41, (%rdi), %k2, %k7
 .byte 0xC5, 0xED, 0x41, 0x9B
 .byte 0xC5, 0xED, 0x41, 0x6F
-.byte 0xC5, 0xED, 0x41, 0x3F
+	.insn VEX.L1.66.0f.W0 0x41, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x41, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x41, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x41, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x41, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x41, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x41, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x41, 0x3F
+	.insn VEX.L1.66.0f.W1 0x41, (%rdi), %k2, %k7
 .byte 0xC5, 0xEC, 0x42, 0x9B
 .byte 0xC5, 0xEC, 0x42, 0x6F
-.byte 0xC5, 0xEC, 0x42, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x42, (%rdi), %k2, %k7
 .byte 0xC5, 0xED, 0x42, 0x9B
 .byte 0xC5, 0xED, 0x42, 0x6F
-.byte 0xC5, 0xED, 0x42, 0x3F
+	.insn VEX.L1.66.0f.W0 0x42, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x42, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x42, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x42, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x42, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x42, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x42, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x42, 0x3F
+	.insn VEX.L1.66.0f.W1 0x42, (%rdi), %k2, %k7
 .byte 0xC5, 0xEC, 0x4B, 0x9B
 .byte 0xC5, 0xEC, 0x4B, 0x6F
-.byte 0xC5, 0xEC, 0x4B, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x4b, (%rdi), %k2, %k7
 .byte 0xC5, 0xED, 0x4B, 0x9B
 .byte 0xC5, 0xED, 0x4B, 0x6F
-.byte 0xC5, 0xED, 0x4B, 0x3F
+	.insn VEX.L1.66.0f.W0 0x4b, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x4B, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x4B, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x4B, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x4b, (%rdi), %k2, %k7
 .byte 0xC5, 0xF8, 0x44, 0x9B
 .byte 0xC5, 0xF8, 0x44, 0x6F
-.byte 0xC5, 0xF8, 0x44, 0x3F
+	.insn VEX.L0.NP.0f.W0 0x44, (%rdi), %k7
 .byte 0xC5, 0xF9, 0x44, 0x9B
 .byte 0xC5, 0xF9, 0x44, 0x6F
-.byte 0xC5, 0xF9, 0x44, 0x3F
+	.insn VEX.L0.66.0f.W0 0x44, (%rdi), %k7
 .byte 0xC4, 0xE1, 0xF8, 0x44, 0x9B
 .byte 0xC4, 0xE1, 0xF8, 0x44, 0x6F
-.byte 0xC4, 0xE1, 0xF8, 0x44, 0x3F
+	.insn VEX.L0.NP.0f.W1 0x44, (%rdi), %k7
 .byte 0xC4, 0xE1, 0xF9, 0x44, 0x9B
 .byte 0xC4, 0xE1, 0xF9, 0x44, 0x6F
-.byte 0xC4, 0xE1, 0xF9, 0x44, 0x3F
+	.insn VEX.L0.66.0f.W1 0x44, (%rdi), %k7
 .byte 0xC5, 0xEC, 0x45, 0x9B
 .byte 0xC5, 0xEC, 0x45, 0x6F
-.byte 0xC5, 0xEC, 0x45, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x45, (%rdi), %k2, %k7
 .byte 0xC5, 0xED, 0x45, 0x9B
 .byte 0xC5, 0xED, 0x45, 0x6F
-.byte 0xC5, 0xED, 0x45, 0x3F
+	.insn VEX.L1.66.0f.W0 0x45, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x45, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x45, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x45, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x45, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x45, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x45, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x45, 0x3F
+	.insn VEX.L1.66.0f.W1 0x45, (%rdi), %k2, %k7
 .byte 0xC5, 0xF8, 0x98, 0x9B
 .byte 0xC5, 0xF8, 0x98, 0x6F
-.byte 0xC5, 0xF8, 0x98, 0x3F
+	.insn VEX.L0.NP.0f.W0 0x98, (%rdi), %k7
 .byte 0xC5, 0xF9, 0x98, 0x9B
 .byte 0xC5, 0xF9, 0x98, 0x6F
-.byte 0xC5, 0xF9, 0x98, 0x3F
+	.insn VEX.L0.66.0f.W0 0x98, (%rdi), %k7
 .byte 0xC4, 0xE1, 0xF8, 0x98, 0x9B
 .byte 0xC4, 0xE1, 0xF8, 0x98, 0x6F
-.byte 0xC4, 0xE1, 0xF8, 0x98, 0x3F
+	.insn VEX.L0.NP.0f.W1 0x98, (%rdi), %k7
 .byte 0xC4, 0xE1, 0xF9, 0x98, 0x9B
 .byte 0xC4, 0xE1, 0xF9, 0x98, 0x6F
-.byte 0xC4, 0xE1, 0xF9, 0x98, 0x3F
+	.insn VEX.L0.66.0f.W1 0x98, (%rdi), %k7
 .byte 0xC5, 0xEC, 0x46, 0x9B
 .byte 0xC5, 0xEC, 0x46, 0x6F
-.byte 0xC5, 0xEC, 0x46, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x46, (%rdi), %k2, %k7
 .byte 0xC5, 0xED, 0x46, 0x9B
 .byte 0xC5, 0xED, 0x46, 0x6F
-.byte 0xC5, 0xED, 0x46, 0x3F
+	.insn VEX.L1.66.0f.W0 0x46, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x46, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x46, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x46, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x46, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x46, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x46, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x46, 0x3F
+	.insn VEX.L1.66.0f.W1 0x46, (%rdi), %k2, %k7
 .byte 0xC5, 0xEC, 0x47, 0x9B
 .byte 0xC5, 0xEC, 0x47, 0x6F
-.byte 0xC5, 0xEC, 0x47, 0x3F
+	.insn VEX.L1.NP.0f.W0 0x47, (%rdi), %k2, %k7
 .byte 0xC5, 0xED, 0x47, 0x9B
 .byte 0xC5, 0xED, 0x47, 0x6F
-.byte 0xC5, 0xED, 0x47, 0x3F
+	.insn VEX.L1.66.0f.W0 0x47, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xEC, 0x47, 0x9B
 .byte 0xC4, 0xE1, 0xEC, 0x47, 0x6F
-.byte 0xC4, 0xE1, 0xEC, 0x47, 0x3F
+	.insn VEX.L1.NP.0f.W1 0x47, (%rdi), %k2, %k7
 .byte 0xC4, 0xE1, 0xED, 0x47, 0x9B
 .byte 0xC4, 0xE1, 0xED, 0x47, 0x6F
-.byte 0xC4, 0xE1, 0xED, 0x47, 0x3F
+	.insn VEX.L1.66.0f.W1 0x47, (%rdi), %k2, %k7
 .byte 0xC5, 0xF8, 0x99, 0x9B
 .byte 0xC5, 0xF8, 0x99, 0x6F
-.byte 0xC5, 0xF8, 0x99, 0x3F
+	.insn VEX.L0.NP.0f.W0 0x99, (%rdi), %k7
 .byte 0xC5, 0xF9, 0x99, 0x9B
 .byte 0xC5, 0xF9, 0x99, 0x6F
-.byte 0xC5, 0xF9, 0x99, 0x3F
+	.insn VEX.L0.66.0f.W0 0x99, (%rdi), %k7
 .byte 0xC4, 0xE1, 0xF8, 0x99, 0x9B
 .byte 0xC4, 0xE1, 0xF8, 0x99, 0x6F
-.byte 0xC4, 0xE1, 0xF8, 0x99, 0x3F
+	.insn VEX.L0.NP.0f.W1 0x99, (%rdi), %k7
 .byte 0xC4, 0xE1, 0xF9, 0x99, 0x9B
 .byte 0xC4, 0xE1, 0xF9, 0x99, 0x6F
-.byte 0xC4, 0xE1, 0xF9, 0x99, 0x3F
+	.insn VEX.L0.66.0f.W1 0x99, (%rdi), %k7
 .byte 0xC4, 0xE3, 0xF9, 0x30, 0x8F, 0x01
 .byte 0xC4, 0xE3, 0xF9, 0x30, 0x6A, 0x01
 .byte 0xC4, 0xE3, 0xF9, 0x30, 0x04, 0x01
@@ -145,33 +145,33 @@
 .byte 0xC4, 0xE3, 0x79, 0x33, 0x04, 0x01
 .byte 0xC5, 0xF8, 0x92, 0x9B
 .byte 0xC5, 0xF8, 0x92, 0x6F
-.byte 0xC5, 0xF8, 0x92, 0x3F
+	.insn VEX.L0.NP.0f.W0 0x92, (%rdi), %k7
 .byte 0xC5, 0xF9, 0x92, 0x9B
 .byte 0xC5, 0xF9, 0x92, 0x6F
-.byte 0xC5, 0xF9, 0x92, 0x3F
+	.insn VEX.L0.66.0f.W0 0x92, (%rdi), %k7
 .byte 0xC5, 0xFB, 0x92, 0x9B
 .byte 0xC5, 0xFB, 0x92, 0x6F
-.byte 0xC5, 0xFB, 0x92, 0x3F
+	.insn VEX.L0.f2.0f.W0 0x92, (%rdi), %k7
 .byte 0xC4, 0xE1, 0xF9, 0x92, 0x9B
 .byte 0xC4, 0xE1, 0xF9, 0x92, 0x6F
-.byte 0xC4, 0xE1, 0xF9, 0x92, 0x3F
+	.insn VEX.L0.66.0f.W1 0x92, (%rdi), %k7
 .byte 0xC5, 0xF8, 0x93, 0x9B
 .byte 0xC5, 0xF8, 0x93, 0x6F
-.byte 0xC5, 0xF8, 0x93, 0x3F
+	.insn VEX.L0.NP.0f.W0 0x93, (%rdi), %k7
 .byte 0xC5, 0xF9, 0x93, 0x9B
 .byte 0xC5, 0xF9, 0x93, 0x6F
-.byte 0xC5, 0xF9, 0x93, 0x3F
+	.insn VEX.L0.66.0f.W0 0x93, (%rdi), %k7
 .byte 0xC5, 0xFB, 0x93, 0x9B
 .byte 0xC5, 0xFB, 0x93, 0x6F
-.byte 0xC5, 0xFB, 0x93, 0x3F
+	.insn VEX.L0.f2.0f.W0 0x93, (%rdi), %k7
 .byte 0xC4, 0xE1, 0xF9, 0x93, 0x9B
 .byte 0xC4, 0xE1, 0xF9, 0x93, 0x6F
-.byte 0xC4, 0xE1, 0xF9, 0x93, 0x3F
-.byte 0xc4, 0x62, 0x1, 0x1c, 0x41, 0x37
-.byte 0x62, 0x72, 0xad, 0x08, 0x1c, 0x01
+	.insn VEX.L0.66.0f.W1 0x93, (%rdi), %k7
+	.insn VEX.66.0f38.W0 0x1c, 0x37(%rcx), %xmm15, %xmm8
+	.insn EVEX.66.0f38.W1 0x1c, (%rcx), %xmm10, %xmm8
 .byte 0x1
-.byte 0x62, 0xf3, 0x7d, 0x28, 0x1b, 0xc8, 0x25
+	.insn EVEX.66.0f3a.W0 0x1b, $0x25, %ymm0, %xmm1
 .byte 0x62, 0xf3
-.byte 0x62, 0xf3, 0x75, 0x08, 0x23, 0xc2, 0x25
+	.insn EVEX.66.0f3a.W0 0x23, $0x25, %xmm2, %xmm1, %xmm0
 .byte 0x62
-.byte 0x62, 0xf2, 0x7d, 0x28, 0x5b, 0x41, 0x37
+	.insn EVEX.66.0f38.W0 0x5b, 0x37(%rcx){:d1}, %ymm0
--- a/gas/testsuite/gas/i386/x86-64-mpx.s
+++ b/gas/testsuite/gas/i386/x86-64-mpx.s
@@ -215,35 +215,19 @@ start:
 
 foo:	bnd ret
 
+	.att_syntax prefix
 bad:
-	# bndldx (%eax),(bad)
-	.byte 0x0f
-	.byte 0x1a
-	.byte 0x30
+	# bndldx (%rax),(bad)
+	.insn 0x0f1a, (%rax), %esi
 
 	# bndmov (bad),%bnd0
-	.byte 0x66
-	.byte 0x0f
-	.byte 0x1a
-	.byte 0xc4
+	.insn 0x660f1a, %esp, %bnd0
 
 	# bndmov with REX.B set
-	.byte 0x66
-	.byte 0x41
-	.byte 0x0f
-	.byte 0x1a
-	.byte 0xc0
+	.insn 0x660f1a, %r8d, %bnd0
 
 	# bndmov with REX.R set
-	.byte 0x66
-	.byte 0x44
-	.byte 0x0f
-	.byte 0x1a
-	.byte 0xc0
+	.insn 0x660f1a, %bnd0, %r8d
 
 	# bndmk (bad),%bnd0
-	.byte 0xf3
-	.byte 0x0f
-	.byte 0x1b
-	.byte 0x05
-	.long 0x90909090
+	.insn 0xf30f1b, -0x6f6f6f70(%rip), %bnd0
--- a/gas/testsuite/gas/i386/x86-64-nops.d
+++ b/gas/testsuite/gas/i386/x86-64-nops.d
@@ -1,3 +1,4 @@
+#as: --divide
 #objdump: -drw
 #name: x86-64 nops
 
@@ -13,7 +14,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 1f 80 00 00 00 00 	nopl   0x0\(%rax\)
 [ 	]*[a-f0-9]+:	0f 1f 84 00 00 00 00 00 	nopl   0x0\(%rax,%rax,1\)
 [ 	]*[a-f0-9]+:	66 0f 1f 84 00 00 00 00 00 	nopw   0x0\(%rax,%rax,1\)
-[ 	]*[a-f0-9]+:	66 2e 0f 1f 84 00 00 00 00 00 	cs nopw 0x0\(%rax,%rax,1\)
+[ 	]*[a-f0-9]+:	2e 66 0f 1f 84 00 00 00 00 00 	cs nopw 0x0\(%rax,%rax,1\)
 [ 	]*[a-f0-9]+:	0f 19 ff             	nop    %edi
 [ 	]*[a-f0-9]+:	0f 1a ff             	nop    %edi
 [ 	]*[a-f0-9]+:	0f 1b ff             	nop    %edi
--- a/gas/testsuite/gas/i386/x86-64-nops.s
+++ b/gas/testsuite/gas/i386/x86-64-nops.s
@@ -1,48 +1,49 @@
 	.text
 
-	.byte 0x0f, 0x1f, 0x0	
-	.byte 0x0f, 0x1f, 0x40, 0x0	
-	.byte 0x0f, 0x1f, 0x44, 0x0,  0x0	
-	.byte 0x66, 0x0f, 0x1f, 0x44, 0x0,  0x0	
-	.byte 0x0f, 0x1f, 0x80, 0x0,  0x0,  0x0, 0x0	
-	.byte 0x0f, 0x1f, 0x84, 0x0,  0x0,  0x0, 0x0, 0x0
-	.byte 0x66, 0x0f, 0x1f, 0x84, 0x0,  0x0, 0x0, 0x0, 0x0
-	.byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x0, 0x0, 0x0, 0x0, 0x0
+	.insn 0x0f1f/0, (%rax)
+	.insn {disp8} 0x0f1f/0, 0(%rax)
+	.insn {disp8} 0x0f1f/0, 0(%rax,%rax)
+	.insn {disp8} data16 0x0f1f/0, 0(%rax,%rax)
+	.insn {disp32} 0x0f1f/0, 0(%rax)
+	.insn {disp32} 0x0f1f/0, 0(%rax,%rax)
+	.insn {disp32} data16 0x0f1f/0, 0(%rax,%rax)
+	.insn {disp32} data16 0x0f1f/0, %cs:0(%rax,%rax)
 
 	# reg,reg
-	.byte 0x0f, 0x19, 0xff
-	.byte 0x0f, 0x1a, 0xff  
-	.byte 0x0f, 0x1b, 0xff
-	.byte 0x0f, 0x1c, 0xff  
-	.byte 0x0f, 0x1d, 0xff
-	.byte 0x0f, 0x1e, 0xff  
-	.byte 0x0f, 0x1f, 0xff
+	.insn 0x0f19, %edi, %edi
+	.insn 0x0f1a, %edi, %edi
+	.insn 0x0f1b, %edi, %edi
+	.insn 0x0f1c, %edi, %edi
+	.insn 0x0f1d, %edi, %edi
+	.insn 0x0f1e, %edi, %edi
+	.insn 0x0f1f, %edi, %edi
 
 	# with base and imm8
-	.byte 0x0f, 0x19, 0x5A, 0x22
-	.byte 0x0f, 0x1c, 0x5A, 0x22
-	.byte 0x0f, 0x1d, 0x5A, 0x22
-	.byte 0x0f, 0x1e, 0x5A, 0x22
-	.byte 0x0f, 0x1f, 0x5A, 0x22
+	.insn 0x0f19/3, 0x22(%rdx)
+	.insn 0x0f1c/3, 0x22(%rdx)
+	.insn 0x0f1d/3, 0x22(%rdx)
+	.insn 0x0f1e/3, 0x22(%rdx)
+	.insn 0x0f1f/3, 0x22(%rdx)
 
 	# with sib and imm32
-	.byte 0x0f, 0x19, 0x9C, 0x1D, 0x11, 0x22, 0x33, 0x44
-	.byte 0x0f, 0x1c, 0x9C, 0x1D, 0x11, 0x22, 0x33, 0x44
-	.byte 0x0f, 0x1d, 0x9C, 0x1D, 0x11, 0x22, 0x33, 0x44
-	.byte 0x0f, 0x1e, 0x9C, 0x1D, 0x11, 0x22, 0x33, 0x44
-	.byte 0x0f, 0x1f, 0x9C, 0x1D, 0x11, 0x22, 0x33, 0x44
-
-	.byte 0x0f, 0x19, 0x04, 0x60
-	.byte 0x0f, 0x1c, 0x0c, 0x60
-	.byte 0x0f, 0x1d, 0x04, 0x60
-	.byte 0x0f, 0x1e, 0x04, 0x60
-	.byte 0x0f, 0x1f, 0x04, 0x60
-
-	.byte 0x0f, 0x19, 0x04, 0x59
-	.byte 0x0f, 0x1c, 0x0c, 0x59
-	.byte 0x0f, 0x1d, 0x04, 0x59
-	.byte 0x0f, 0x1e, 0x04, 0x59
-	.byte 0x0f, 0x1f, 0x04, 0x59
+	.insn 0x0f19/3, 0x44332211(%rbp,%rbx)
+	.insn 0x0f1c/3, 0x44332211(%rbp,%rbx)
+	.insn 0x0f1d/3, 0x44332211(%rbp,%rbx)
+	.insn 0x0f1e/3, 0x44332211(%rbp,%rbx)
+	.insn 0x0f1f/3, 0x44332211(%rbp,%rbx)
+
+	.allow_index_reg
+	.insn 0x0f19/0, (%rax,%riz,2)
+	.insn 0x0f1c/1, (%rax,%riz,2)
+	.insn 0x0f1d/0, (%rax,%riz,2)
+	.insn 0x0f1e/0, (%rax,%riz,2)
+	.insn 0x0f1f/0, (%rax,%riz,2)
+
+	.insn 0x0f19/0, (%rcx,%rbx,2)
+	.insn 0x0f1c/1, (%rcx,%rbx,2)
+	.insn 0x0f1d/0, (%rcx,%rbx,2)
+	.insn 0x0f1e/0, (%rcx,%rbx,2)
+	.insn 0x0f1f/0, (%rcx,%rbx,2)
 
 	nop %rax
 	nop %eax
--- a/gas/testsuite/gas/i386/x86-64-opcode.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode.d
@@ -1,4 +1,4 @@
-#as: -J
+#as: -J --divide
 #objdump: -drw
 #name: x86-64 opcode
 
--- a/gas/testsuite/gas/i386/x86-64-opcode.s
+++ b/gas/testsuite/gas/i386/x86-64-opcode.s
@@ -458,16 +458,16 @@
 	int3
 	int    $0x90
 
-	.byte 0xf6, 0xc9, 0x01
-	.byte 0x66, 0xf7, 0xc9, 0x02, 0x00
-	.byte 0xf7, 0xc9, 0x04, 0x00, 0x00, 0x00
-	.byte 0x48, 0xf7, 0xc9, 0x08, 0x00, 0x00, 0x00
-	.byte 0xc0, 0xf0, 0x02
-	.byte 0xc1, 0xf0, 0x01
-	.byte 0x48, 0xc1, 0xf0, 0x01
-	.byte 0xd0, 0xf0
-	.byte 0xd1, 0xf0
-	.byte 0x48, 0xd1, 0xf0
-	.byte 0xd2, 0xf0
-	.byte 0xd3, 0xf0
-	.byte 0x48, 0xd3, 0xf0
+	.insn 0xf6/1, $1, %cl
+	.insn 0xf7/1, $2{:u16}, %cx
+	.insn 0xf7/1, $4{:u32}, %ecx
+	.insn 0xf7/1, $8{:s32}, %rcx
+	.insn 0xc0/6, $2, %al
+	.insn 0xc1/6, $1, %eax
+	.insn 0xc1/6, $1, %rax
+	.insn 0xd0/6, %al
+	.insn 0xd1/6, %eax
+	.insn 0xd1/6, %rax
+	.insn 0xd2/6, %al
+	.insn 0xd3/6, %eax
+	.insn 0xd3/6, %rax
--- a/gas/testsuite/gas/i386/x86-64-opcode-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-opcode-bad.s
@@ -1,10 +1,4 @@
 	.text
 # All the followings are bad opcodes for x86-64.
-	.byte 0xc5
-	.byte 0xac
-	.byte 0x46
-	.byte 0xf5
-	.byte 0xc5
-	.byte 0x2c
-	.byte 0x46
-	.byte 0xf5
+	.insn VEX.L1.0f 0x46, %k5, %r10d, %k6
+	.insn VEX.L1.0f 0x46, %k5, %r10d, %r14d
--- a/gas/testsuite/gas/i386/x86-64-prefetch.d
+++ b/gas/testsuite/gas/i386/x86-64-prefetch.d
@@ -1,3 +1,4 @@
+#as: --divide
 #objdump: -dw
 #name: x86-64 prefetch
 #source: prefetch.s
--- a/gas/testsuite/gas/i386/x86-64-prefetch-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-prefetch-intel.d
@@ -1,3 +1,4 @@
+#as: --divide
 #objdump: -dw -Mintel
 #name: x86-64 prefetch (Intel disassembly)
 #source: prefetch.s
--- a/gas/testsuite/gas/i386/x86-64-prefetchi-inval-register.d
+++ b/gas/testsuite/gas/i386/x86-64-prefetchi-inval-register.d
@@ -1,4 +1,4 @@
-#as:
+#as: --divide
 #objdump: -dw
 #name: x86-64 PREFETCHI INVAL REGISTER insns
 
--- a/gas/testsuite/gas/i386/x86-64-prefetchi-inval-register.s
+++ b/gas/testsuite/gas/i386/x86-64-prefetchi-inval-register.s
@@ -1,9 +1,6 @@
 .text
         #prefetchit0 (%rcx) PREFETCHIT0/1 apply without RIP-relative addressing, should stay NOPs.
-        .byte 0x0f
-        .byte 0x18
-        .byte 0x39
+        .insn 0x0f18/7, (%rcx)
+
         #prefetchit1 (%rcx) PREFETCHIT1/1 apply without RIP-relative addressing, should stay NOPs.
-        .byte 0x0f
-        .byte 0x18
-        .byte 0x31
+        .insn 0x0f18/6, (%rcx)



* [PATCH RFC v2 14/14] x86: .insn example - VEX-encoded instructions of original Xeon Phi
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (12 preceding siblings ...)
  2023-03-10 10:26 ` [PATCH v2 13/14] x86: convert testcases to use .insn Jan Beulich
@ 2023-03-10 10:27 ` Jan Beulich
  2023-03-24  9:51 ` [PATCH v2 00/14] x86: new .insn directive Jan Beulich
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-10 10:27 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

While the MVEX-encoded insns, which are otherwise unknown to gas,
obviously cannot be expressed (for now gas simply doesn't know how to
keep clear the bit distinguishing MVEX from EVEX, and of course
Phi-specific operand forms also aren't known to it), the VEX-encoded
ones can be.

Since the disassembler produces utter rubbish, have expectations - at
least for the time being - in raw hex dump form. (I've verified with my
own disassembler that generated code is correct.)
---
RFC: Do we want this as a new testcase?
---
v2: Pass --divide. Xfail test for Darwin.
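
For readers wanting to cross-check the expected hex dump by hand, here is
a minimal Python sketch of how the first two entries (clevict0 and
clevict1) come out as c5 fb ae 78 40 and c5 fa ae 78 40. It is a
simplified model of the two-byte VEX prefix and a base+disp8 ModRM byte,
not gas's actual code; the helper names vex2() and modrm_disp8() are
made up for illustration.

```python
# Simplified model of a two-byte VEX prefix plus a disp8 memory operand.
# Hypothetical helpers for illustration only; this is not how gas is written.

# Mapping from the legacy prefix named in the VEX spec to the 2-bit "pp" field.
PP = {None: 0, 0x66: 1, 0xF3: 2, 0xF2: 3}

def vex2(pp=None, l=0, vvvv=0, r=0):
    """Two-byte VEX prefix (implies the 0f opcode map; X, B and W all zero)."""
    # R and vvvv are stored inverted, so "unused" vvvv (0) becomes 1111.
    byte1 = ((~r & 1) << 7) | ((~vvvv & 0xF) << 3) | (l << 2) | PP[pp]
    return bytes([0xC5, byte1])

def modrm_disp8(ext, base, disp):
    """ModRM with mod=01: base register plus 8-bit displacement.

    Only valid for base registers that need neither SIB nor REX/VEX.B
    (e.g. %rax = 0); %rsp and %r8-%r15 are not modelled here.
    """
    return bytes([0x40 | (ext << 3) | base, disp & 0xFF])

# .insn VEX.L0.f2.0f 0xae/7, 0x40(%rax)   # clevict0 0x40(%rax)
clevict0 = vex2(pp=0xF2) + bytes([0xAE]) + modrm_disp8(7, 0, 0x40)
# .insn VEX.L0.f3.0f 0xae/7, 0x40(%rax)   # clevict1 0x40(%rax)
clevict1 = vex2(pp=0xF3) + bytes([0xAE]) + modrm_disp8(7, 0, 0x40)

print(clevict0.hex(" "))  # c5 fb ae 78 40
print(clevict1.hex(" "))  # c5 fa ae 78 40
```

The two results match the leading bytes "c5fbae78 40c5faae 7840" of the
.text dump expected in insn-Phi.d.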

--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -875,6 +875,7 @@ if [gas_64_check] then {
     run_dump_test "x86-64-sysenter-amd"
     run_list_test "x86-64-sysenter-amd" "-mamd64"
     run_dump_test "insn-64"
+    run_dump_test "insn-Phi"
     run_dump_test "noreg64"
     run_list_test "noreg64"
     run_dump_test "noreg64-data16"
--- /dev/null
+++ b/gas/testsuite/gas/i386/insn-Phi.d
@@ -0,0 +1,21 @@
+#as: --divide
+#objdump: -sj.text
+#name: .insn (Xeon Phi)
+#xfail: *-*-darwin*
+
+.*: +file format .*
+
+Contents of section .text:
+ 0000 c5fbae78 40c5faae 7840c4c1 7aaef0c4  .*
+ 0010 e1faaef1 c4e06074 e7c5d885 e0ffffff  .*
+ 0020 c5f841d1 c5f842d1 c5f843d1 c56895c1  .*
+ 0030 c5e897f9 c4c3783e d103c5f8 48d1c5f8  .*
+ 0040 49d1c5f8 90d1c578 93d1c4c1 7892d1c5  .*
+ 0050 f844d1c5 f845d1c5 f898d1c5 f846d1c5  .*
+ 0060 f847d1c4 c17abdc8 c461fabd c1c57ab8  .*
+ 0070 c1c4c1fa b8c8c5fb aef1c4c1 fbaef0c4  .*
+ 0080 c17abcc8 c461fabc c1c57bbc c1c4c1fb  .*
+ 0090 bcc8c5f8 184f40c4 c1781850 40c4c178  .*
+ 00a0 185f40c5 f8182d55 ffffffc4 a1781834  .*
+ 00b0 41c4c178 183c88c5 f81824c5 00000000  .*
+ 00c0 c5f81840 40.*
--- /dev/null
+++ b/gas/testsuite/gas/i386/insn-Phi.s
@@ -0,0 +1,44 @@
+	.text
+Phi:
+	.insn VEX.L0.f2.0f 0xae/7, 0x40(%rax)		# clevict0 0x40(%rax)
+	.insn VEX.L0.f3.0f 0xae/7, 0x40(%rax)		# clevict1 0x40(%rax)
+	.insn VEX.L0.f3.0f 0xae/6, %r8d			# delay %r8d
+	.insn VEX.L0.f3.0f 0xae/6, %rcx			# delay %rcx
+	.insn VEX.L0.W0 0x74, $Phi-1f{:s8}, %k3		# jkzd Phi, %k3
+1:
+	.insn VEX.L0.0f.W0 0x85, $Phi-2f{:s32}, %k4	# jknzd Phi, %k4
+2:
+	.insn VEX.L0.0f.W0 0x41, %k1, %k2		# kand %k1, %k2
+	.insn VEX.L0.0f.W0 0x42, %k1, %k2		# kandn %k1, %k2
+	.insn VEX.L0.0f.W0 0x43, %k1, %k2		# kandnr %k1, %k2
+	.insn VEX.L0.0f.W0 0x95, %k1, %k2, %r8		# kconcath %k1, %k2, %r8
+	.insn VEX.L0.0f.W0 0x97, %k1, %k2, %rdi		# kconcatl %k1, %k2, %rdi
+	.insn VEX.L0.0f3a.W0 0x3e, $3, %r9, %k2		# kextract $3, %r9, %k2
+	.insn VEX.L0.0f.W0 0x48, %k1, %k2		# kmergel1h %k1, %k2
+	.insn VEX.L0.0f.W0 0x49, %k1, %k2		# kmergel1l %k1, %k2
+	.insn VEX.L0.0f.W0 0x90, %k1, %k2		# kmov %k1, %k2
+	.insn VEX.L0.0f.W0 0x93, %k1, %r10d		# kmov %k1, %r10d
+	.insn VEX.L0.0f.W0 0x92, %r9d, %k2		# kmov %r9d, %k2
+	.insn VEX.L0.0f.W0 0x44, %k1, %k2		# knot %k1, %k2
+	.insn VEX.L0.0f.W0 0x45, %k1, %k2		# kor %k1, %k2
+	.insn VEX.L0.0f.W0 0x98, %k1, %k2		# kortest %k1, %k2
+	.insn VEX.L0.0f.W0 0x46, %k1, %k2		# kxnor %k1, %k2
+	.insn VEX.L0.0f.W0 0x47, %k1, %k2		# kxor %k1, %k2
+	.insn VEX.L0.f3.0f 0xbd, %r8d, %ecx		# lzcnt %r8d, %ecx
+	.insn VEX.L0.f3.0f 0xbd, %rcx, %r8		# lzcnt %rcx, %r8
+	.insn VEX.L0.f3.0f 0xb8, %ecx, %r8d		# popcnt %ecx, %r8d
+	.insn VEX.L0.f3.0f 0xb8, %r8, %rcx		# popcnt %r8, %rcx
+	.insn VEX.L0.f2.0f 0xae/6, %ecx			# spflt %ecx
+	.insn VEX.L0.f2.0f 0xae/6, %r8			# spflt %r8
+	.insn VEX.L0.f3.0f 0xbc, %r8d, %ecx		# tzcnt %r8d, %ecx
+	.insn VEX.L0.f3.0f 0xbc, %rcx, %r8		# tzcnt %rcx, %r8
+	.insn VEX.L0.f2.0f 0xbc, %ecx, %r8d		# tzcnti %ecx, %r8d
+	.insn VEX.L0.f2.0f 0xbc, %r8, %rcx		# tzcnti %r8, %rcx
+	.insn VEX.L0.0f 0x18/1, 0x40(%rdi)		# vprefetch0 0x40(%rdi)
+	.insn VEX.L0.0f 0x18/2, 0x40(%r8)		# vprefetch1 0x40(%r8)
+	.insn VEX.L0.0f 0x18/3, 0x40(%r15)		# vprefetch2 0x40(%r15)
+	.insn VEX.L0.0f 0x18/5, Phi(%rip)		# vprefetche0 Phi(%rip)
+	.insn VEX.L0.0f 0x18/6, (%rcx,%r8,2)		# vprefetche1 (%rcx,%r8,2)
+	.insn VEX.L0.0f 0x18/7, (%r8,%rcx,4)		# vprefetche2 (%r8,%rcx,4)
+	.insn VEX.L0.0f 0x18/4, (,%rax,8)		# vprefetchenta (,%rax,8)
+	.insn VEX.L0.0f 0x18/0, 0x40(%rax)		# vprefetchnta 0x40(%rax)



* Re: [PATCH v2 00/14] x86: new .insn directive
  2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
                   ` (13 preceding siblings ...)
  2023-03-10 10:27 ` [PATCH RFC v2 14/14] x86: .insn example - VEX-encoded instructions of original Xeon Phi Jan Beulich
@ 2023-03-24  9:51 ` Jan Beulich
  14 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-03-24  9:51 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Jiang, Haochen

On 10.03.2023 11:17, Jan Beulich via Binutils wrote:
> Especially when instructions which are not known to gas yet also take
> register or, yet worse, memory operands, encoding such in code actually
> wanting to make use of them is often difficult. Typically people resort
> to hard-coding the involved registers, thus being able to express
> things via .byte. To overcome this limitation (to a sufficient degree
> at least), introduce .insn. This allows users to specify operands in
> their "normal" shape (possibly in slightly altered order). Peculiarities
> require two small syntax extensions; see the implementation or
> documentation for details.
> 
> In order to re-use sufficiently much of the functionality md_assemble()
> already uses, some adjustments to existing code were necessary. The one
> item to call out here is the partial re-write of build_modrm_byte()
> (patch 7), which actually turned out to simplify things. Subsequently
> possible further tidying is carried out right away (patches 8 and 9),
> even if not strictly related to the .insn work.
> 
> I'm pretty sure there are still corner cases which aren't taken care of
> correctly. It's also quite possible that I've overlooked further places
> in pre-existing code which need tweaking for .insn. People taking a
> close look and/or playing with the new functionality would be much
> appreciated.
> 
> The last patch in the series continues to be RFC, as I'm uncertain
> whether we actually want this kind of a testcase.
> 
> Main changes in v2 are testsuite adjustments for certain non-Linux
> targets, resulting from me not properly having re-run wider tests with
> the last few patches in the series in place.
> 
> 01: introduce .insn directive
> 02: parse VEX and alike specifiers for .insn
> 03: parse special opcode modifiers for .insn
> 04: re-work build_modrm_byte()'s register assignment
> 05: VexVVVV is now merely a boolean
> 06: drop "shimm" special case template expansions
> 07: AT&T: restrict recognition of the "absolute branch" prefix character
> 08: process instruction operands for .insn
> 09: handle EVEX Disp8 for .insn
> 10: allow for multiple immediates in output_disp()
> 11: handle immediate operands for .insn
> 12: document .insn
> 13: convert testcases to use .insn
> 14: .insn example - VEX-encoded instructions of original Xeon Phi

Unless I hear back otherwise with some good reasons not to, I intend to
commit the remaining patches (with one further !BFD64 build fix) some time
next week. Would certainly be nice to have an explicit view voiced by
somebody on whether to include the last patch ...

Jan


* Re: [PATCH v2 13/14] x86: convert testcases to use .insn
  2023-03-10 10:26 ` [PATCH v2 13/14] x86: convert testcases to use .insn Jan Beulich
@ 2023-04-20  8:56   ` Clément Chigot
  2023-04-20  9:01     ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Clément Chigot @ 2023-04-20  8:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils, H.J. Lu, Jiang, Haochen

Hi Jan,

> --- a/gas/testsuite/gas/i386/x86-64-opcode.s
> +++ b/gas/testsuite/gas/i386/x86-64-opcode.s
> @@ -458,16 +458,16 @@
>         int3
>         int    $0x90
>
> -       .byte 0xf6, 0xc9, 0x01
> -       .byte 0x66, 0xf7, 0xc9, 0x02, 0x00
> -       .byte 0xf7, 0xc9, 0x04, 0x00, 0x00, 0x00
> -       .byte 0x48, 0xf7, 0xc9, 0x08, 0x00, 0x00, 0x00
> -       .byte 0xc0, 0xf0, 0x02
> -       .byte 0xc1, 0xf0, 0x01
> -       .byte 0x48, 0xc1, 0xf0, 0x01
> -       .byte 0xd0, 0xf0
> -       .byte 0xd1, 0xf0
> -       .byte 0x48, 0xd1, 0xf0
> -       .byte 0xd2, 0xf0
> -       .byte 0xd3, 0xf0
> -       .byte 0x48, 0xd3, 0xf0
> +       .insn 0xf6/1, $1, %cl
> +       .insn 0xf7/1, $2{:u16}, %cx
> +       .insn 0xf7/1, $4{:u32}, %ecx
> +       .insn 0xf7/1, $8{:s32}, %rcx
> +       .insn 0xc0/6, $2, %al
> +       .insn 0xc1/6, $1, %eax
> +       .insn 0xc1/6, $1, %rax
> +       .insn 0xd0/6, %al
> +       .insn 0xd1/6, %eax
> +       .insn 0xd1/6, %rax
> +       .insn 0xd2/6, %al
> +       .insn 0xd3/6, %eax
> +       .insn 0xd3/6, %rax

The test is failing on my side when building with --target=x86_64-elf.
I'm not sure what's wrong yet but gas seems to ignore everything after "/":
  | $ ../../binutils/objdump  -drw tmpdir/x86-64-opcode.o
  |  ...
  |  4ea: f6 f7                div    %bh
  |  4ec: f7 f7                div    %edi
  |  4ee: c0 c1 c1              rol    $0xc1,%cl
  |  4f1: d0 d1                rcl    %cl
  |  4f3: d1 d2                rcl    %edx
  |  4f5: d3 d3                rcl    %cl,%ebx

Thanks,
Clément


* Re: [PATCH v2 13/14] x86: convert testcases to use .insn
  2023-04-20  8:56   ` Clément Chigot
@ 2023-04-20  9:01     ` Jan Beulich
  2023-04-20  9:09       ` Clément Chigot
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2023-04-20  9:01 UTC (permalink / raw)
  To: Clément Chigot; +Cc: Binutils, H.J. Lu, Jiang, Haochen

On 20.04.2023 10:56, Clément Chigot wrote:
>> --- a/gas/testsuite/gas/i386/x86-64-opcode.s
>> +++ b/gas/testsuite/gas/i386/x86-64-opcode.s
>> @@ -458,16 +458,16 @@
>>         int3
>>         int    $0x90
>>
>> -       .byte 0xf6, 0xc9, 0x01
>> -       .byte 0x66, 0xf7, 0xc9, 0x02, 0x00
>> -       .byte 0xf7, 0xc9, 0x04, 0x00, 0x00, 0x00
>> -       .byte 0x48, 0xf7, 0xc9, 0x08, 0x00, 0x00, 0x00
>> -       .byte 0xc0, 0xf0, 0x02
>> -       .byte 0xc1, 0xf0, 0x01
>> -       .byte 0x48, 0xc1, 0xf0, 0x01
>> -       .byte 0xd0, 0xf0
>> -       .byte 0xd1, 0xf0
>> -       .byte 0x48, 0xd1, 0xf0
>> -       .byte 0xd2, 0xf0
>> -       .byte 0xd3, 0xf0
>> -       .byte 0x48, 0xd3, 0xf0
>> +       .insn 0xf6/1, $1, %cl
>> +       .insn 0xf7/1, $2{:u16}, %cx
>> +       .insn 0xf7/1, $4{:u32}, %ecx
>> +       .insn 0xf7/1, $8{:s32}, %rcx
>> +       .insn 0xc0/6, $2, %al
>> +       .insn 0xc1/6, $1, %eax
>> +       .insn 0xc1/6, $1, %rax
>> +       .insn 0xd0/6, %al
>> +       .insn 0xd1/6, %eax
>> +       .insn 0xd1/6, %rax
>> +       .insn 0xd2/6, %al
>> +       .insn 0xd3/6, %eax
>> +       .insn 0xd3/6, %rax
> 
> The test is failing on my side when building with --target=x86_64-elf.
> I'm not sure what's wrong yet but gas seems to ignore everything after "/":
>   | $ ../../binutils/objdump  -drw tmpdir/x86-64-opcode.o
>   |  ...
>   |  4ea: f6 f7                div    %bh
>   |  4ec: f7 f7                div    %edi
>   |  4ee: c0 c1 c1              rol    $0xc1,%cl
>   |  4f1: d0 d1                rcl    %cl
>   |  4f3: d1 d2                rcl    %edx
>   |  4f5: d3 d3                rcl    %cl,%ebx

Right, and I think I did address all of these issues (there were more than
just here) in what was committed (and in fact already in v2), by passing
--divide to as. Can you confirm --divide does not take the intended effect
in that case?
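
As a minimal sketch of why the missing option produces exactly the bytes
quoted above: on SVR4-derived targets (x86_64-elf among them) gas treats
'/' as a comment character unless --divide is given, so each .insn line is
cut off right after its opcode. The Python below is a toy model under that
assumption; strip_comment() and lone_opcode() are hypothetical helpers,
not gas functions.

```python
# Toy model of '/' acting as a comment character when --divide is absent.
# Helper names are made up for illustration; this is not gas's parser.

SOURCE = [
    ".insn 0xf6/1, $1, %cl",
    ".insn 0xf7/1, $2{:u16}, %cx",
    ".insn 0xf7/1, $4{:u32}, %ecx",
    ".insn 0xf7/1, $8{:s32}, %rcx",
    ".insn 0xc0/6, $2, %al",
    ".insn 0xc1/6, $1, %eax",
    ".insn 0xc1/6, $1, %rax",
    ".insn 0xd0/6, %al",
    ".insn 0xd1/6, %eax",
    ".insn 0xd1/6, %rax",
    ".insn 0xd2/6, %al",
    ".insn 0xd3/6, %eax",
    ".insn 0xd3/6, %rax",
]

def strip_comment(line, divide):
    """Drop everything from the first '/' on unless --divide is in effect."""
    return line if divide else line.split("/", 1)[0]

def lone_opcode(line):
    """After truncation only '.insn 0xNN' is left; return that opcode byte."""
    return int(line.split()[1], 16)

# Without --divide every directive degenerates to its bare opcode byte.
stream = bytes(lone_opcode(strip_comment(l, divide=False)) for l in SOURCE)
print(stream.hex(" "))  # f6 f7 f7 f7 c0 c1 c1 d0 d1 d1 d2 d3 d3
```

Those 13 bytes are the ones objdump then groups into the bogus div, rol
and rcl instructions at 4ea..4f6 in the report.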

Jan


* Re: [PATCH v2 13/14] x86: convert testcases to use .insn
  2023-04-20  9:01     ` Jan Beulich
@ 2023-04-20  9:09       ` Clément Chigot
  2023-04-20  9:19         ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Clément Chigot @ 2023-04-20  9:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils, H.J. Lu, Jiang, Haochen

On Thu, Apr 20, 2023 at 11:01 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 20.04.2023 10:56, Clément Chigot wrote:
> >> --- a/gas/testsuite/gas/i386/x86-64-opcode.s
> >> +++ b/gas/testsuite/gas/i386/x86-64-opcode.s
> >> @@ -458,16 +458,16 @@
> >>         int3
> >>         int    $0x90
> >>
> >> -       .byte 0xf6, 0xc9, 0x01
> >> -       .byte 0x66, 0xf7, 0xc9, 0x02, 0x00
> >> -       .byte 0xf7, 0xc9, 0x04, 0x00, 0x00, 0x00
> >> -       .byte 0x48, 0xf7, 0xc9, 0x08, 0x00, 0x00, 0x00
> >> -       .byte 0xc0, 0xf0, 0x02
> >> -       .byte 0xc1, 0xf0, 0x01
> >> -       .byte 0x48, 0xc1, 0xf0, 0x01
> >> -       .byte 0xd0, 0xf0
> >> -       .byte 0xd1, 0xf0
> >> -       .byte 0x48, 0xd1, 0xf0
> >> -       .byte 0xd2, 0xf0
> >> -       .byte 0xd3, 0xf0
> >> -       .byte 0x48, 0xd3, 0xf0
> >> +       .insn 0xf6/1, $1, %cl
> >> +       .insn 0xf7/1, $2{:u16}, %cx
> >> +       .insn 0xf7/1, $4{:u32}, %ecx
> >> +       .insn 0xf7/1, $8{:s32}, %rcx
> >> +       .insn 0xc0/6, $2, %al
> >> +       .insn 0xc1/6, $1, %eax
> >> +       .insn 0xc1/6, $1, %rax
> >> +       .insn 0xd0/6, %al
> >> +       .insn 0xd1/6, %eax
> >> +       .insn 0xd1/6, %rax
> >> +       .insn 0xd2/6, %al
> >> +       .insn 0xd3/6, %eax
> >> +       .insn 0xd3/6, %rax
> >
> > The test is failing on my side when building with --target=x86_64-elf.
> > I'm not sure what's wrong yet but gas seems to ignore everything after "/":
> >   | $ ../../binutils/objdump  -drw tmpdir/x86-64-opcode.o
> >   |  ...
> >   |  4ea: f6 f7                div    %bh
> >   |  4ec: f7 f7                div    %edi
> >   |  4ee: c0 c1 c1              rol    $0xc1,%cl
> >   |  4f1: d0 d1                rcl    %cl
> >   |  4f3: d1 d2                rcl    %edx
> >   |  4f5: d3 d3                rcl    %cl,%ebx
>
> Right, and I think I did address all of these issues (there were more than
> just here) in what was committed (and in fact already in v2), by passing
> --divide to as. Can you confirm --divide does not take the intended effect
> in that case?

--divide is not passed to x86_64-opcode test.
But adding it resolves the issue:
  | $ ../as-new  --x32 --divide -J  -o tmpdir/x86-64-opcode.o .../x86-64-opcode.s
  | $ ../../binutils/objdump  -drw tmpdir/x86-64-opcode.o
  |  4ea: f6 c9 01              test   $0x1,%cl
  |  4ed: 66 f7 c9 02 00        test   $0x2,%cx
  |  4f2: f7 c9 04 00 00 00    test   $0x4,%ecx
  |  4f8: 48 f7 c9 08 00 00 00 test   $0x8,%rcx
  |  4ff: c0 f0 02              shl    $0x2,%al
  |  502: c1 f0 01              shl    $0x1,%eax
  |  505: 48 c1 f0 01          shl    $0x1,%rax
  |  509: d0 f0                shl    %al
  |  50b: d1 f0                shl    %eax
  |  50d: 48 d1 f0              shl    %rax
  |  510: d2 f0                shl    %cl,%al
  |  512: d3 f0                shl    %cl,%eax
  |  514: 48 d3 f0              shl    %cl,%rax
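
For reference, a minimal Python sketch of how the "/digit" opcode
extension and the register operand combine into the bytes shown above. It
is a simplified model of register-direct encoding that reproduces the
observed output, not gas's implementation; encode() and the REGS table are
hypothetical and cover only the registers used in this test.

```python
# Simplified model of `.insn opcode/ext[, $imm], %reg` with a register operand.
# Hypothetical helper for illustration; not taken from gas.

# Register name -> (3-bit register number, operand width in bits).
REGS = {
    "al": (0, 8), "cl": (1, 8),
    "cx": (1, 16),
    "eax": (0, 32), "ecx": (1, 32),
    "rax": (0, 64), "rcx": (1, 64),
}

def encode(opcode, ext, reg, imm=None, imm_bytes=0):
    """Return the encoding for a one-byte opcode with a /ext extension."""
    num, width = REGS[reg]
    out = bytearray()
    if width == 16:
        out.append(0x66)   # operand-size override prefix
    elif width == 64:
        out.append(0x48)   # REX.W
    out.append(opcode)
    # ModRM: mod=11 (register direct), reg=opcode extension, rm=register.
    out.append(0xC0 | (ext << 3) | num)
    if imm is not None:
        # Immediate of the width requested via {:u16}, {:u32}, {:s32}, ...
        out += imm.to_bytes(imm_bytes, "little", signed=imm < 0)
    return bytes(out)

print(encode(0xF6, 1, "cl", 1, 1).hex(" "))    # f6 c9 01
print(encode(0xF7, 1, "cx", 2, 2).hex(" "))    # 66 f7 c9 02 00
print(encode(0xF7, 1, "rcx", 8, 4).hex(" "))   # 48 f7 c9 08 00 00 00
print(encode(0xD3, 6, "rax").hex(" "))         # 48 d3 f0
```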


* Re: [PATCH v2 13/14] x86: convert testcases to use .insn
  2023-04-20  9:09       ` Clément Chigot
@ 2023-04-20  9:19         ` Jan Beulich
  2023-04-20  9:22           ` Clément Chigot
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2023-04-20  9:19 UTC (permalink / raw)
  To: Clément Chigot; +Cc: Binutils, H.J. Lu, Jiang, Haochen

On 20.04.2023 11:09, Clément Chigot wrote:
> On Thu, Apr 20, 2023 at 11:01 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 20.04.2023 10:56, Clément Chigot wrote:
>>>> --- a/gas/testsuite/gas/i386/x86-64-opcode.s
>>>> +++ b/gas/testsuite/gas/i386/x86-64-opcode.s
>>>> @@ -458,16 +458,16 @@
>>>>         int3
>>>>         int    $0x90
>>>>
>>>> -       .byte 0xf6, 0xc9, 0x01
>>>> -       .byte 0x66, 0xf7, 0xc9, 0x02, 0x00
>>>> -       .byte 0xf7, 0xc9, 0x04, 0x00, 0x00, 0x00
>>>> -       .byte 0x48, 0xf7, 0xc9, 0x08, 0x00, 0x00, 0x00
>>>> -       .byte 0xc0, 0xf0, 0x02
>>>> -       .byte 0xc1, 0xf0, 0x01
>>>> -       .byte 0x48, 0xc1, 0xf0, 0x01
>>>> -       .byte 0xd0, 0xf0
>>>> -       .byte 0xd1, 0xf0
>>>> -       .byte 0x48, 0xd1, 0xf0
>>>> -       .byte 0xd2, 0xf0
>>>> -       .byte 0xd3, 0xf0
>>>> -       .byte 0x48, 0xd3, 0xf0
>>>> +       .insn 0xf6/1, $1, %cl
>>>> +       .insn 0xf7/1, $2{:u16}, %cx
>>>> +       .insn 0xf7/1, $4{:u32}, %ecx
>>>> +       .insn 0xf7/1, $8{:s32}, %rcx
>>>> +       .insn 0xc0/6, $2, %al
>>>> +       .insn 0xc1/6, $1, %eax
>>>> +       .insn 0xc1/6, $1, %rax
>>>> +       .insn 0xd0/6, %al
>>>> +       .insn 0xd1/6, %eax
>>>> +       .insn 0xd1/6, %rax
>>>> +       .insn 0xd2/6, %al
>>>> +       .insn 0xd3/6, %eax
>>>> +       .insn 0xd3/6, %rax
>>>
>>> The test is failing on my side when building with --target=x86_64-elf.
>>> I'm not sure what's wrong yet but gas seems to ignore everything after "/":
>>>   | $ ../../binutils/objdump  -drw tmpdir/x86-64-opcode.o
>>>   |  ...
>>>   |  4ea: f6 f7                div    %bh
>>>   |  4ec: f7 f7                div    %edi
>>>   |  4ee: c0 c1 c1              rol    $0xc1,%cl
>>>   |  4f1: d0 d1                rcl    %cl
>>>   |  4f3: d1 d2                rcl    %edx
>>>   |  4f5: d3 d3                rcl    %cl,%ebx
>>
>> Right, and I think I did address all of these issues (there were more than
>> just here) in what was committed (and in fact already in v2), by passing
>> --divide to as. Can you confirm --divide does not take the intended effect
>> in that case?
> 
> --divide is not passed to x86_64-opcode test.

Well, you continue to supply ambiguous information up to here; it only
becomes clear ...

> But adding it resolves the issue:
>   | $ ../as-new  --x32 --divide -J  -o tmpdir/x86-64-opcode.o

... here that what you mean is the ilp32/x86-64-opcode test (which is a
clone of the x86-64-opcode one). So yes, I did overlook the need to add
--divide there as well.

Jan

> .../x86-64-opcode.s
>   | $ ../../binutils/objdump  -drw tmpdir/x86-64-opcode.o
>   |  4ea: f6 c9 01              test   $0x1,%cl
>   |  4ed: 66 f7 c9 02 00        test   $0x2,%cx
>   |  4f2: f7 c9 04 00 00 00    test   $0x4,%ecx
>   |  4f8: 48 f7 c9 08 00 00 00 test   $0x8,%rcx
>   |  4ff: c0 f0 02              shl    $0x2,%al
>   |  502: c1 f0 01              shl    $0x1,%eax
>   |  505: 48 c1 f0 01          shl    $0x1,%rax
>   |  509: d0 f0                shl    %al
>   |  50b: d1 f0                shl    %eax
>   |  50d: 48 d1 f0              shl    %rax
>   |  510: d2 f0                shl    %cl,%al
>   |  512: d3 f0                shl    %cl,%eax
>   |  514: 48 d3 f0              shl    %cl,%rax



* Re: [PATCH v2 13/14] x86: convert testcases to use .insn
  2023-04-20  9:19         ` Jan Beulich
@ 2023-04-20  9:22           ` Clément Chigot
  0 siblings, 0 replies; 21+ messages in thread
From: Clément Chigot @ 2023-04-20  9:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils, H.J. Lu, Jiang, Haochen

On Thu, Apr 20, 2023 at 11:19 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 20.04.2023 11:09, Clément Chigot wrote:
> > On Thu, Apr 20, 2023 at 11:01 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 20.04.2023 10:56, Clément Chigot wrote:
> >>>> --- a/gas/testsuite/gas/i386/x86-64-opcode.s
> >>>> +++ b/gas/testsuite/gas/i386/x86-64-opcode.s
> >>>> @@ -458,16 +458,16 @@
> >>>>         int3
> >>>>         int    $0x90
> >>>>
> >>>> -       .byte 0xf6, 0xc9, 0x01
> >>>> -       .byte 0x66, 0xf7, 0xc9, 0x02, 0x00
> >>>> -       .byte 0xf7, 0xc9, 0x04, 0x00, 0x00, 0x00
> >>>> -       .byte 0x48, 0xf7, 0xc9, 0x08, 0x00, 0x00, 0x00
> >>>> -       .byte 0xc0, 0xf0, 0x02
> >>>> -       .byte 0xc1, 0xf0, 0x01
> >>>> -       .byte 0x48, 0xc1, 0xf0, 0x01
> >>>> -       .byte 0xd0, 0xf0
> >>>> -       .byte 0xd1, 0xf0
> >>>> -       .byte 0x48, 0xd1, 0xf0
> >>>> -       .byte 0xd2, 0xf0
> >>>> -       .byte 0xd3, 0xf0
> >>>> -       .byte 0x48, 0xd3, 0xf0
> >>>> +       .insn 0xf6/1, $1, %cl
> >>>> +       .insn 0xf7/1, $2{:u16}, %cx
> >>>> +       .insn 0xf7/1, $4{:u32}, %ecx
> >>>> +       .insn 0xf7/1, $8{:s32}, %rcx
> >>>> +       .insn 0xc0/6, $2, %al
> >>>> +       .insn 0xc1/6, $1, %eax
> >>>> +       .insn 0xc1/6, $1, %rax
> >>>> +       .insn 0xd0/6, %al
> >>>> +       .insn 0xd1/6, %eax
> >>>> +       .insn 0xd1/6, %rax
> >>>> +       .insn 0xd2/6, %al
> >>>> +       .insn 0xd3/6, %eax
> >>>> +       .insn 0xd3/6, %rax
> >>>
> >>> The test is failing on my side when building with --target=x86_64-elf.
> >>> I'm not sure what's wrong yet but gas seems to ignore everything after "/":
> >>>   | $ ../../binutils/objdump  -drw tmpdir/x86-64-opcode.o
> >>>   |  ...
> >>>   |  4ea: f6 f7                div    %bh
> >>>   |  4ec: f7 f7                div    %edi
> >>>   |  4ee: c0 c1 c1              rol    $0xc1,%cl
> >>>   |  4f1: d0 d1                rcl    %cl
> >>>   |  4f3: d1 d2                rcl    %edx
> >>>   |  4f5: d3 d3                rcl    %cl,%ebx
> >>
> >> Right, and I think I did address all of these issues (there were more than
> >> just here) in what was committed (and in fact already in v2), by passing
> >> --divide to as. Can you confirm --divide does not take the intended effect
> >> in that case?
> >
> > --divide is not passed to x86_64-opcode test.
>
> Well, you continue to supply ambiguous information up to here; it only
> becomes clear ...
>
> > But adding it resolves the issue:
> >   | $ ../as-new  --x32 --divide -J  -o tmpdir/x86-64-opcode.o
>
> ... here that what you mean is the ilp32/x86-64-opcode test (which is a
> clone of the x86-64-opcode one). So yes, I did overlook the need to add
> --divide there as well.

Oh yeah, my bad. This is indeed the ilp32 version of the test:
 FAIL: x86-64 (ILP32) opcode

Sorry about the confusion caused by not stating it earlier :(


end of thread, other threads:[~2023-04-20  9:22 UTC | newest]

Thread overview: 21+ messages
2023-03-10 10:17 [PATCH v2 00/14] x86: new .insn directive Jan Beulich
2023-03-10 10:19 ` [PATCH v2 01/14] x86: introduce " Jan Beulich
2023-03-10 10:19 ` [PATCH v2 02/14] x86: parse VEX and alike specifiers for .insn Jan Beulich
2023-03-10 10:20 ` [PATCH v2 03/14] x86: parse special opcode modifiers " Jan Beulich
2023-03-10 10:21 ` [PATCH v2 04/14] x86: re-work build_modrm_byte()'s register assignment Jan Beulich
2023-03-10 10:21 ` [PATCH v2 05/14] x86: VexVVVV is now merely a boolean Jan Beulich
2023-03-10 10:22 ` [PATCH v2 06/14] x86: drop "shimm" special case template expansions Jan Beulich
2023-03-10 10:22 ` [PATCH v2 07/14] x86/AT&T: restrict recognition of the "absolute branch" prefix character Jan Beulich
2023-03-10 10:23 ` [PATCH v2 08/14] x86: process instruction operands for .insn Jan Beulich
2023-03-10 10:24 ` [PATCH v2 09/14] x86: handle EVEX Disp8 " Jan Beulich
2023-03-10 10:24 ` [PATCH v2 10/14] x86: allow for multiple immediates in output_disp() Jan Beulich
2023-03-10 10:25 ` [PATCH v2 11/14] x86: handle immediate operands for .insn Jan Beulich
2023-03-10 10:26 ` [PATCH v2 12/14] x86: document .insn Jan Beulich
2023-03-10 10:26 ` [PATCH v2 13/14] x86: convert testcases to use .insn Jan Beulich
2023-04-20  8:56   ` Clément Chigot
2023-04-20  9:01     ` Jan Beulich
2023-04-20  9:09       ` Clément Chigot
2023-04-20  9:19         ` Jan Beulich
2023-04-20  9:22           ` Clément Chigot
2023-03-10 10:27 ` [PATCH RFC v2 14/14] x86: .insn example - VEX-encoded instructions of original Xeon Phi Jan Beulich
2023-03-24  9:51 ` [PATCH v2 00/14] x86: new .insn directive Jan Beulich
