public inbox for binutils@sourceware.org
* [PATCH 0/7] x86: suffix handling changes
@ 2022-08-16  7:27 Jan Beulich
  2022-08-16  7:30 ` [PATCH 1/7] x86/Intel: restrict suffix derivation Jan Beulich
                   ` (6 more replies)
  0 siblings, 7 replies; 45+ messages in thread
From: Jan Beulich @ 2022-08-16  7:27 UTC
  To: Binutils

... accompanied by a few other improvements (or so I hope) found
along the way.

1: Intel: restrict suffix derivation
2: insert "no error" enumerator in i386_error enumeration
3: move / quiesce pre-386 non-16-bit warning
4: improve match_template()'s diagnostics
5: re-work insn/suffix recognition
6: further re-work insn/suffix recognition to also cover MOVSL
7: don't recognize/derive Q suffix in the common case

Jan


* [PATCH 1/7] x86/Intel: restrict suffix derivation
  2022-08-16  7:27 [PATCH 0/7] x86: suffix handling changes Jan Beulich
@ 2022-08-16  7:30 ` Jan Beulich
  2022-08-17 19:19   ` H.J. Lu
  2022-08-16  7:30 ` [PATCH 2/7] x86: insert "no error" enumerator in i386_error enumeration Jan Beulich
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-16  7:30 UTC
  To: Binutils

While deriving an AT&T-style suffix from an Intel-syntax memory operand
size specifier is necessary in some cases, in many cases it is not only
pointless but has also led to the introduction of various workarounds:
excessive use of IgnoreSize and NoRex64, as well as the ToDword and
ToQword attributes. Suppress suffix derivation when we can clearly tell
that the memory operand's size isn't going to be needed to infer a
possible need for the low byte/word opcode bit or an operand size prefix
(0x66 or REX.W).
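
A minimal C sketch of the decision the new loop in i386_intel_operand()
makes, under simplifying assumptions: struct tmpl, enum op_class and
suffix_needed() are hypothetical stand-ins (not gas's insn_template or
its opcode_modifier fields), permitted suffixes are kept as a string
rather than No_*Suf bits, and the special case for BOUND (opcode 0x62)
is left out. The derived suffix survives only if at least one candidate
template could make use of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for gas's insn_template;
   names loosely follow the patch, not the real i386-opc.h layout.  */
enum op_class { CLASS_NONE, CLASS_REG, CLASS_REG_MMX, CLASS_REG_SIMD,
                CLASS_REG_MASK };

struct tmpl
{
  unsigned int operands;    /* operand count */
  bool att_syntax_only;     /* like ATTSyntax: unusable in Intel mode */
  const char *suffixes;     /* permitted suffixes, e.g. "lq" */
  bool modrm;               /* encoded with a ModR/M byte */
  bool d;                   /* swappable operands (D attribute) */
  bool baseindex[4];        /* operand may be memory (AT&T order) */
  enum op_class cls[4];     /* register class sharing that operand slot */
};

/* Return true if SUFFIX, derived from the size specifier on Intel
   operand THIS_OPERAND, may be needed by a template in [START, END).  */
static bool
suffix_needed (const struct tmpl *start, const struct tmpl *end,
               unsigned int this_operand, char suffix)
{
  assert (suffix != '\0');
  for (const struct tmpl *t = start; t < end; ++t)
    {
      /* Cheap rejects: too few operands, or AT&T-only template.  */
      if (this_operand >= t->operands || t->att_syntax_only)
        continue;
      /* This template doesn't permit the suffix at all.  */
      if (strchr (t->suffixes, suffix) == NULL)
        continue;
      /* Without ModR/M, or with swappable operands, keep the suffix.  */
      if (!t->modrm || t->d)
        return true;
      /* Operands haven't been swapped yet: Intel order is reversed.  */
      unsigned int op = t->operands - 1 - this_operand;
      if (!t->baseindex[op])
        continue;
      /* Mirroring the patch: memory sharing its slot with an MMX, SIMD
         or mask register is taken not to need the derived suffix.  */
      if (t->cls[op] == CLASS_REG_MMX || t->cls[op] == CLASS_REG_SIMD
          || t->cls[op] == CLASS_REG_MASK)
        continue;
      return true;
    }
  return false;
}
```

With a cvtss2si-like template, whose memory operand shares its slot with
an XMM register, the 'l' derived from DWORD PTR is dropped; with an
ADD-like template, whose memory operand shares its slot with a GPR, it
is kept.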

As a result, ToDword and ToQword can be dropped entirely, and a fair
number of IgnoreSize and NoRex64 attributes can be removed as well. Note
that IgnoreSize needs to remain on legacy-encoded SIMD insns with a GPR
operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
16-bit code using SIMD insns isn't well tested, clone an existing
testcase just enough to cover a few potentially problematic insns that
are touched here.)
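
The following hypothetical C model shows why IgnoreSize has to stay. The
function name and parameters are made up for illustration. The model
decides when a 0x66 operand-size prefix would be emitted for a GPR
operand. In 16-bit mode a 32-bit operand such as %eax would normally
need 0x66, yet the new simd16.d expects `cvtsi2ss %eax,%xmm1` to
assemble to `f3 0f 2a c8` without it. For SIMD insns 0x66 also serves as
a mandatory prefix that selects a different operation (compare
`66 0f 2a`, cvtpi2pd, with `0f 2a`, cvtpi2ps, in the same expected
output). An unwanted 0x66 is therefore more than a harmless size
override.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: is a 0x66 prefix needed for a GPR operand of
   OPERAND_BITS in CODE_BITS mode?  IGNORE_SIZE stands in for the
   IgnoreSize template attribute, which suppresses the prefix.  */
static bool
needs_opsize_prefix (unsigned int code_bits, unsigned int operand_bits,
                     bool ignore_size)
{
  assert (code_bits == 16 || code_bits == 32 || code_bits == 64);
  if (ignore_size)
    return false;
  if (code_bits == 16)
    return operand_bits == 32;  /* default is 16 bits: dword needs 0x66 */
  /* 32- and 64-bit modes default to 32-bit operands; 64-bit operands
     are selected via REX.W rather than 0x66.  */
  return operand_bits == 16;
}
```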

As a side effect of folding the VCVT{,T}S{S,D,H}2SI templates,
VCVT{,T}SH2SI will now accept L and Q suffixes, consistent with
VCVT{,T}S{S,D}2SI. All of these remain inconsistent with their 2USI
counterparts (which I think should also be corrected, though probably
better in a separate change).
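
The brace notation in the paragraph above is shorthand for a set of
mnemonics. As a small illustration, the C sketch below performs
shell-style brace expansion without nesting. The helper names expand,
collect and struct collected are hypothetical. The sketch shows that
VCVT{,T}S{S,D,H}2SI covers six instructions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand PAT (e.g. "VCVT{,T}S{S,D,H}2SI") into OUT starting at POS:
   literal text is copied, and each "{a,b,...}" group tries every
   alternative in turn.  Nested braces are not supported.  EMIT is
   called once per complete expansion; the count is returned.  */
static unsigned int
expand (const char *pat, char *out, size_t pos, size_t cap,
        void (*emit) (const char *, void *), void *ctx)
{
  const char *lbrace = strchr (pat, '{');
  if (lbrace == NULL)
    {
      size_t n = strlen (pat);
      assert (pos + n < cap);
      memcpy (out + pos, pat, n + 1);   /* copies the NUL as well */
      emit (out, ctx);
      return 1;
    }
  const char *rbrace = strchr (lbrace, '}');
  assert (rbrace != NULL);
  size_t lit = (size_t) (lbrace - pat);
  assert (pos + lit < cap);
  memcpy (out + pos, pat, lit);         /* literal text before '{' */
  unsigned int count = 0;
  const char *alt = lbrace + 1;
  for (;;)
    {
      const char *end = alt;
      while (end < rbrace && *end != ',')
        ++end;
      size_t alen = (size_t) (end - alt);
      assert (pos + lit + alen < cap);
      memcpy (out + pos + lit, alt, alen);
      count += expand (rbrace + 1, out, pos + lit + alen, cap, emit, ctx);
      if (end == rbrace)
        break;
      alt = end + 1;                    /* skip the ',' */
    }
  return count;
}

/* Collector for the usage example: stores up to 16 results.  */
struct collected { unsigned int n; char items[16][32]; };

static void
collect (const char *s, void *ctx)
{
  struct collected *c = ctx;
  if (c->n < 16)
    {
      snprintf (c->items[c->n], sizeof c->items[c->n], "%s", s);
      c->n++;
    }
}
```

Expanding VCVT{,T}SH2SI the same way yields VCVTSH2SI and VCVTTSH2SI.
These are the two insns that newly accept the L and Q suffixes.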
---
In the long term, suffix derivation should be dropped altogether, not
least so that bogus error messages like "incorrect register `...' used
with `...' suffix" no longer mislead people when no suffix was used at
all.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -7071,42 +7071,22 @@ process_suffix (void)
 	}
       else if (i.suffix == BYTE_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_bsuf)
-	    i.suffix = 0;
-	  else if (!check_byte_reg ())
+	  if (!check_byte_reg ())
 	    return 0;
 	}
       else if (i.suffix == LONG_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_lsuf
-	      && !i.tm.opcode_modifier.todword
-	      && !i.tm.opcode_modifier.toqword)
-	    i.suffix = 0;
-	  else if (!check_long_reg ())
+	  if (!check_long_reg ())
 	    return 0;
 	}
       else if (i.suffix == QWORD_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_qsuf
-	      && !i.tm.opcode_modifier.todword
-	      && !i.tm.opcode_modifier.toqword)
-	    i.suffix = 0;
-	  else if (!check_qword_reg ())
+	  if (!check_qword_reg ())
 	    return 0;
 	}
       else if (i.suffix == WORD_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_wsuf)
-	    i.suffix = 0;
-	  else if (!check_word_reg ())
+	  if (!check_word_reg ())
 	    return 0;
 	}
       else if (intel_syntax
@@ -7566,20 +7546,9 @@ check_long_reg (void)
 		 || i.tm.operand_types[op].bitfield.instance == Accum)
 	     && i.tm.operand_types[op].bitfield.dword)
       {
-	if (intel_syntax
-	    && i.tm.opcode_modifier.toqword
-	    && i.types[0].bitfield.class != RegSIMD)
-	  {
-	    /* Convert to QWORD.  We want REX byte. */
-	    i.suffix = QWORD_MNEM_SUFFIX;
-	  }
-	else
-	  {
-	    as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
-		    register_prefix, i.op[op].regs->reg_name,
-		    i.suffix);
-	    return 0;
-	  }
+	as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
+		register_prefix, i.op[op].regs->reg_name, i.suffix);
+	return 0;
       }
   return 1;
 }
@@ -7617,20 +7586,9 @@ check_qword_reg (void)
       {
 	/* Prohibit these changes in the 64bit mode, since the
 	   lowering is more complicated.  */
-	if (intel_syntax
-	    && i.tm.opcode_modifier.todword
-	    && i.types[0].bitfield.class != RegSIMD)
-	  {
-	    /* Convert to DWORD.  We don't want REX byte. */
-	    i.suffix = LONG_MNEM_SUFFIX;
-	  }
-	else
-	  {
-	    as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
-		    register_prefix, i.op[op].regs->reg_name,
-		    i.suffix);
-	    return 0;
-	  }
+	as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
+		register_prefix, i.op[op].regs->reg_name, i.suffix);
+	return 0;
       }
   return 1;
 }
@@ -7670,14 +7628,6 @@ check_word_reg (void)
 		i.suffix);
 	return 0;
       }
-    /* For some instructions need encode as EVEX.W=1 without explicit VexW1. */
-    else if (i.types[op].bitfield.qword
-	     && intel_syntax
-	     && i.tm.opcode_modifier.toqword)
-      {
-	  /* Convert to QWORD.  We want EVEX.W byte. */
-	  i.suffix = QWORD_MNEM_SUFFIX;
-      }
   return 1;
 }
 
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -790,9 +790,83 @@ i386_intel_operand (char *operand_string
 	  break;
 	}
 
+      /* Now check whether we actually want to infer an AT&T-like suffix.
+	 We really only need to do this when operand size determination (incl.
+	 REX.W) is going to be derived from it.  For this we check whether the
+	 given suffix is valid for any of the candidate templates.  */
+      if (suffix && suffix != i.suffix
+	  && (current_templates->start->opcode_modifier.opcodespace != SPACE_BASE
+	      || current_templates->start->base_opcode != 0x62 /* bound */))
+	{
+	  const insn_template *t;
+
+	  for (t = current_templates->start; t < current_templates->end; ++t)
+	    {
+	      /* Operands haven't been swapped yet.  */
+	      unsigned int op = t->operands - 1 - this_operand;
+
+	      /* Easy checks to skip templates which won't match anyway.  */
+	      if (this_operand >= t->operands || t->opcode_modifier.attsyntax)
+		continue;
+
+	      switch (suffix)
+		{
+		case BYTE_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_bsuf)
+		    continue;
+		  break;
+		case WORD_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_wsuf)
+		    continue;
+		  break;
+		case LONG_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_lsuf)
+		    continue;
+		  break;
+		case QWORD_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_qsuf)
+		    continue;
+		  break;
+		case SHORT_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_ssuf)
+		    continue;
+		  break;
+		case LONG_DOUBLE_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_ldsuf)
+		    continue;
+		  break;
+		default:
+		  abort ();
+		}
+
+	      /* In a few cases suffixes are permitted, but we can nevertheless
+		 derive that these aren't going to be needed.  This is only of
+		 interest for insns using ModR/M, plus we can skip templates with
+		 swappable operands here (simplifying subsequent logic).  */
+	      if (!t->opcode_modifier.modrm || t->opcode_modifier.d)
+		break;
+
+	      if (!t->operand_types[op].bitfield.baseindex)
+		continue;
+
+	      switch (t->operand_types[op].bitfield.class)
+		{
+		case RegMMX:
+		case RegSIMD:
+		case RegMask:
+		  continue;
+		}
+
+	      break;
+	    }
+
+	  if (t == current_templates->end)
+	    suffix = 0;
+	}
+
       if (!i.suffix)
 	i.suffix = suffix;
-      else if (i.suffix != suffix)
+      else if (suffix && i.suffix != suffix)
 	{
 	  as_bad (_("conflicting operand size modifiers"));
 	  return 0;
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -169,6 +169,7 @@ if [gas_32_check] then {
     run_dump_test "simd"
     run_dump_test "simd-intel"
     run_dump_test "simd-suffix"
+    run_dump_test "simd16"
     run_dump_test "mem"
     run_dump_test "mem-intel"
     run_dump_test "reg"
--- a/gas/testsuite/gas/i386/simd.s
+++ b/gas/testsuite/gas/i386/simd.s
@@ -1,5 +1,6 @@
 	.text
 _start:
+	.ifndef use16
 	addsubps 0x12345678,%xmm1
 	comisd 0x12345678,%xmm1
 	comiss 0x12345678,%xmm1
@@ -31,6 +32,7 @@ _start:
 	punpcklqdq 0x12345678,%xmm1
 	ucomisd 0x12345678,%xmm1
 	ucomiss 0x12345678,%xmm1
+	.endif
 
 	cmpeqsd (%eax),%xmm0
 	cmpeqss (%eax),%xmm0
@@ -101,6 +103,7 @@ cmpsd	$0x10,(%eax),%xmm7
 
 	.intel_syntax noprefix
 
+	.ifndef use16
 addsubps xmm1,XMMWORD PTR ds:0x12345678
 comisd xmm1,QWORD PTR ds:0x12345678
 comiss xmm1,DWORD PTR ds:0x12345678
@@ -132,6 +135,8 @@ punpcklwd xmm1,XMMWORD PTR ds:0x12345678
 punpcklqdq xmm1,XMMWORD PTR ds:0x12345678
 ucomisd xmm1,QWORD PTR ds:0x12345678
 ucomiss xmm1,DWORD PTR ds:0x12345678
+	.endif
+
 cmpeqsd xmm0,QWORD PTR [eax]
 cmpeqss xmm0,DWORD PTR [eax]
 cvtpi2pd xmm0,QWORD PTR [eax]
--- /dev/null
+++ b/gas/testsuite/gas/i386/simd16.d
@@ -0,0 +1,137 @@
+#as: --defsym use16=1 -I${srcdir}/$subdir
+#objdump: -dw -Mi8086
+#name: i386 SIMD (16-bit)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:	67 f2 0f c2 00 00    	cmpeqsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f c2 00 00    	cmpeqss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 2a 00       	cvtpi2pd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2a 00          	cvtpi2ps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2d 00          	cvtps2pi \(%eax\),%mm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2d 00       	cvtsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 2c 00       	cvttsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5a 00       	cvtsd2ss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5a 00       	cvtss2sd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 2d 00       	cvtss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f3 0f 2c 00       	cvttss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5e 00       	divsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5e 00       	divss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5f 00       	maxsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5f 00       	maxss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2b 00       	movntsd %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 2b 00       	movntss %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 10 00       	movsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 11 00       	movsd  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 10 00       	movss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 11 00       	movss  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 59 00       	mulsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 59 00       	mulss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 53 00       	rcpss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0b 00 00 	roundsd \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0a 00 00 	roundss \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 52 00       	rsqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 51 00       	sqrtsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 51 00       	sqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5c 00       	subsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5c 00       	subss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 20 00    	pmovsxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 21 00    	pmovsxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 22 00    	pmovsxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 23 00    	pmovsxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 24 00    	pmovsxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 25 00    	pmovsxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 30 00    	pmovzxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 31 00    	pmovzxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 32 00    	pmovzxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 33 00    	pmovzxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 34 00    	pmovzxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 35 00    	pmovzxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 21 00 00 	insertps \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 15 08       	unpckhpd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 0f 15 08          	unpckhps \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 66 0f 14 08       	unpcklpd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 0f 14 08          	unpcklps \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	f3 0f c2 f7 10       	cmpss  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f3 0f c2 38 10    	cmpss  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f2 0f c2 f7 10       	cmpsd  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f2 0f c2 38 10    	cmpsd  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f c2 00 00    	cmpeqsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f c2 00 00    	cmpeqss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 2a 00       	cvtpi2pd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2a 00          	cvtpi2ps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2d 00          	cvtps2pi \(%eax\),%mm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2d 00       	cvtsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 2c 00       	cvttsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5a 00       	cvtsd2ss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5a 00       	cvtss2sd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 2d 00       	cvtss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f3 0f 2c 00       	cvttss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5e 00       	divsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5e 00       	divss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5f 00       	maxsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5f 00       	maxss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2b 00       	movntsd %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 2b 00       	movntss %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 10 00       	movsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 11 00       	movsd  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 10 00       	movss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 11 00       	movss  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 59 00       	mulsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 59 00       	mulss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 53 00       	rcpss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0b 00 00 	roundsd \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0a 00 00 	roundss \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 52 00       	rsqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 51 00       	sqrtsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 51 00       	sqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5c 00       	subsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5c 00       	subss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 20 00    	pmovsxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 21 00    	pmovsxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 22 00    	pmovsxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 23 00    	pmovsxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 24 00    	pmovsxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 25 00    	pmovsxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 30 00    	pmovzxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 31 00    	pmovzxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 32 00    	pmovzxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 33 00    	pmovzxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 34 00    	pmovzxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 35 00    	pmovzxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 21 00 00 	insertps \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 15 00       	unpckhpd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 15 00          	unpckhps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 14 00       	unpcklpd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 14 00          	unpcklps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	f3 0f c2 f7 10       	cmpss  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f3 0f c2 38 10    	cmpss  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f2 0f c2 f7 10       	cmpsd  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f2 0f c2 38 10    	cmpsd  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 0f 2c 00          	cvttps2pi \(%eax\),%mm0
+#pass
--- /dev/null
+++ b/gas/testsuite/gas/i386/simd16.s
@@ -0,0 +1,2 @@
+	.code16
+	.include "simd.s"
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -706,8 +706,6 @@ static bitfield opcode_modifiers[] =
   BITFIELD (RegKludge),
   BITFIELD (Implicit1stXmm0),
   BITFIELD (PrefixOk),
-  BITFIELD (ToDword),
-  BITFIELD (ToQword),
   BITFIELD (AddrPrefixOpReg),
   BITFIELD (IsPrefix),
   BITFIELD (ImmExt),
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -521,10 +521,6 @@ enum
 #define PrefixHLELock		5 /* Okay with a LOCK prefix.  */
 #define PrefixHLEAny		6 /* Okay with or without a LOCK prefix.  */
   PrefixOk,
-  /* Convert to DWORD */
-  ToDword,
-  /* Convert to QWORD */
-  ToQword,
   /* Address prefix changes register operand */
   AddrPrefixOpReg,
   /* opcode is a prefix */
@@ -740,8 +736,6 @@ typedef struct i386_opcode_modifier
   unsigned int regkludge:1;
   unsigned int implicit1stxmm0:1;
   unsigned int prefixok:3;
-  unsigned int todword:1;
-  unsigned int toqword:1;
   unsigned int addrprefixopreg:1;
   unsigned int isprefix:1;
   unsigned int immext:1;
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -970,18 +970,18 @@ pause, 0xf390, None, Cpu186, No_bSuf|No_
 <mmx:cpu:pfx:attr:shimm:reg:mem, +
     $avx:CpuAVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:Vex128|VexVVVV=2|VexW0|SSE2AVX:RegXMM:Xmmword, +
     $sse:CpuSSE2:66:::RegXMM:Xmmword, +
-    $mmx:CpuMMX::NoRex64::RegMMX:Qword>
+    $mmx:CpuMMX::::RegMMX:Qword>
 
 <sse2:cpu:attr:scal:vvvv:shimm, +
     $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV:Vex128|VexVVVV=2|VexW0|SSE2AVX, +
-    $sse:CpuSSE2::NoRex64::>
+    $sse:CpuSSE2::::>
 
 <bw:opc:vexw:elem:kcpu:kpfx:cpubmi, +
     b:0:VexW0:Byte:CpuAVX512DQ:66:CpuAVX512VBMI, +
     w:1:VexW1:Word:CpuAVX512F::CpuAVX512BW>
 
 <dq:opc:vexw:vexw64:elem:cpu64:gpr:kpfx, +
-    d:0:VexW0:IgnoreSize:Dword::Reg32:66, +
+    d:0:VexW0::Dword::Reg32:66, +
     q:1:VexW1:VexW1:Qword:Cpu64:Reg64:>
 
 emms, 0xf77, None, CpuMMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
@@ -989,7 +989,7 @@ emms, 0xf77, None, CpuMMX, No_bSuf|No_wS
 // copying between Reg64/Mem64 and RegXMM/RegMMX, as is mandated by Intel's
 // spec). AMD's spec, having been in existence for much longer, failed to
 // recognize that and specified movd for 32- and 64-bit operations.
-movd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Reg32|Unspecified|BaseIndex, RegXMM }
+movd, 0x666e, None, CpuAVX, D|Modrm|Vex128|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Reg32|Unspecified|BaseIndex, RegXMM }
 movd, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|BaseIndex, RegXMM }
 movd, 0x660f6e, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 movd, 0x660f6e, None, CpuSSE2|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|BaseIndex, RegXMM }
@@ -998,10 +998,10 @@ movd, 0xf6e, None, CpuMMX|Cpu64, D|Modrm
 movq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movq, 0x66d6, None, CpuAVX, Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
 movq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|Unspecified|BaseIndex, RegXMM }
-movq, 0xf30f7e, None, CpuSSE2, Load|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Unspecified|Qword|BaseIndex|RegXMM, RegXMM }
-movq, 0x660fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { RegXMM, Unspecified|Qword|BaseIndex|RegXMM }
+movq, 0xf30f7e, None, CpuSSE2, Load|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|Qword|BaseIndex|RegXMM, RegXMM }
+movq, 0x660fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Unspecified|Qword|BaseIndex|RegXMM }
 movq, 0x660f6e, None, CpuSSE2|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
-movq, 0xf6f, None, CpuMMX, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Unspecified|Qword|BaseIndex|RegMMX, RegMMX }
+movq, 0xf6f, None, CpuMMX, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|Qword|BaseIndex|RegMMX, RegMMX }
 movq, 0xf6e, None, CpuMMX|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegMMX }
 packssdw<mmx>, 0x<mmx:pfx>0f6b, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 packsswb<mmx>, 0x<mmx:pfx>0f63, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
@@ -1009,7 +1009,7 @@ packuswb<mmx>, 0x<mmx:pfx>0f67, None, <m
 padd<bw><mmx>, 0x<mmx:pfx>0ffc | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 paddd<mmx>, 0x<mmx:pfx>0ffe, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 paddq<sse2>, 0x660fd4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-paddq, 0xfd4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+paddq, 0xfd4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 padds<bw><mmx>, 0x<mmx:pfx>0fec | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 paddus<bw><mmx>, 0x<mmx:pfx>0fdc | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 pand<mmx>, 0x<mmx:pfx>0fdb, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
@@ -1037,25 +1037,25 @@ psrl<dq><mmx>, 0x<mmx:pfx>0f72 | <dq:opc
 psub<bw><mmx>, 0x<mmx:pfx>0ff8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubd<mmx>, 0x<mmx:pfx>0ffa, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubq<sse2>, 0x660ffb, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-psubq, 0xffb, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+psubq, 0xffb, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 psubs<bw><mmx>, 0x<mmx:pfx>0fe8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubus<bw><mmx>, 0x<mmx:pfx>0fd8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpckhbw<mmx>, 0x<mmx:pfx>0f68, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpckhwd<mmx>, 0x<mmx:pfx>0f69, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpckhdq<mmx>, 0x<mmx:pfx>0f6a, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpcklbw<sse2>, 0x660f60, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-punpcklbw, 0xf60, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
+punpcklbw, 0xf60, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
 punpcklwd<sse2>, 0x660f61, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-punpcklwd, 0xf61, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
+punpcklwd, 0xf61, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
 punpckldq<sse2>, 0x660f62, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-punpckldq, 0xf62, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
+punpckldq, 0xf62, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pxor<mmx>, 0x<mmx:pfx>0fef, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 
 // SSE instructions.
 
 <sse:cpu:attr:scal:vvvv, +
     $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, +
-    $sse:CpuSSE::IgnoreSize:>
+    $sse:CpuSSE:::>
 <frel:imm:comm, eq:0:C, lt:1:, le:2:, unord:3:C, neq:4:C, nlt:5:, nle:6:, ord:7:C>
 
 addps<sse>, 0x0f58, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1067,21 +1067,21 @@ cmp<frel>ss<sse>, 0xf30fc2, <frel:imm>,
 cmpps<sse>, 0x0fc2, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 cmpss<sse>, 0xf30fc2, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 comiss<sse>, 0x0f2f, None, <sse:cpu>, Modrm|<sse:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-cvtpi2ps, 0xf2a, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegXMM }
-cvtps2pi, 0xf2d, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
+cvtpi2ps, 0xf2a, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegXMM }
+cvtps2pi, 0xf2d, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
 cvtsi2ss<sse>, 0xf30f2a, None, <sse:cpu>|CpuNo64, Modrm|<sse:scal>|<sse:vvvv>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf32a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf32a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf30f2a, None, CpuSSE|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf30f2a, None, CpuSSE|Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
-cvtss2si, 0xf32d, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvtss2si, 0xf30f2d, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvttps2pi, 0xf2c, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
-cvttss2si, 0xf32c, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvttss2si, 0xf30f2c, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtss2si, 0xf32d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtss2si, 0xf30f2d, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttps2pi, 0xf2c, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
+cvttss2si, 0xf32c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttss2si, 0xf30f2c, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 divps<sse>, 0x0f5e, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 divss<sse>, 0xf30f5e, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-ldmxcsr<sse>, 0x0fae, 2, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+ldmxcsr<sse>, 0x0fae, 2, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 maskmovq, 0xff7, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegMMX }
 maxps<sse>, 0x0f5f, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 maxss<sse>, 0xf30f5f, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1089,51 +1089,51 @@ minps<sse>, 0x0f5d, None, <sse:cpu>, Mod
 minss<sse>, 0xf30f5d, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movaps<sse>, 0x0f28, None, <sse:cpu>, D|Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movhlps<sse>, 0x0f12, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
-movhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movhps, 0x17, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movhps, 0xf16, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movhps, 0x17, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movhps, 0xf16, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movlhps<sse>, 0x0f16, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
-movlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movlps, 0x13, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movlps, 0xf12, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movlps, 0x13, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movlps, 0xf12, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movmskps<sse>, 0x0f50, None, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 movntps<sse>, 0x0f2b, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
-movntq, 0xfe7, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, Qword|Unspecified|BaseIndex }
+movntq, 0xfe7, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, Qword|Unspecified|BaseIndex }
 movntdq<sse2>, 0x660fe7, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
-movss, 0xf310, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex, RegXMM }
+movss, 0xf310, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex, RegXMM }
 movss, 0xf310, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, RegXMM }
-movss, 0xf30f10, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movss, 0xf30f10, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movups<sse>, 0x0f10, None, <sse:cpu>, D|Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulps<sse>, 0x0f59, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulss<sse>, 0xf30f59, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 orps<sse>, 0x0f56, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pavg<bw>, 0xfe0 | (3 * <bw:opc>), None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pavg<bw>, 0xfe0 | (3 * <bw:opc>), None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pavg<bw><sse2>, 0x660fe0 | (3 * <bw:opc>), None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pextrw<sse2>, 0x660fc5, None, <sse2:cpu>, Load|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, RegXMM, Reg32|Reg64 }
 pextrw, 0xfc5, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }
 pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, Reg32|Reg64, RegXMM }
-pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, Word|Unspecified|BaseIndex, RegXMM }
+pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegXMM }
 pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { Imm8, Reg32|Reg64, RegMMX }
-pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegMMX }
+pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegMMX }
 pmaxsw<sse2>, 0x660fee, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmaxsw, 0xfee, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmaxsw, 0xfee, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pmaxub<sse2>, 0x660fde, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmaxub, 0xfde, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmaxub, 0xfde, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pminsw<sse2>, 0x660fea, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pminsw, 0xfea, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pminsw, 0xfea, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pminub<sse2>, 0x660fda, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pminub, 0xfda, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pminub, 0xfda, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pmovmskb<sse2>, 0x660fd7, None, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 pmovmskb, 0xfd7, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegMMX, Reg32|Reg64 }
 pmulhuw<sse2>, 0x660fe4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmulhuw, 0xfe4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmulhuw, 0xfe4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 prefetchnta, 0xf18, 0, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetcht0, 0xf18, 1, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetcht1, 0xf18, 2, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetcht2, 0xf18, 3, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
-psadbw, 0xff6, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+psadbw, 0xff6, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 psadbw<sse2>, 0x660ff6, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pshufw, 0xf70, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pshufw, 0xf70, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 rcpps<sse>, 0x0f53, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 rcpss<sse>, 0xf30f53, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 rsqrtps<sse>, 0x0f52, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1142,7 +1142,7 @@ sfence, 0xfaef8, None, CpuSSE|Cpu3dnowA,
 shufps<sse>, 0x0fc6, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 sqrtps<sse>, 0x0f51, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sqrtss<sse>, 0xf30f51, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-stmxcsr<sse>, 0x0fae, 3, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+stmxcsr<sse>, 0x0fae, 3, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 subps<sse>, 0x0f5c, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 subss<sse>, 0xf30f5c, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 ucomiss<sse>, 0x0f2e, None, <sse:cpu>, Modrm|<sse:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1161,9 +1161,9 @@ cmp<frel>sd<sse2>, 0xf20fc2, <frel:imm>,
 cmppd<sse2>, 0x660fc2, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 cmpsd<sse2>, 0xf20fc2, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 comisd<sse2>, 0x660f2f, None, <sse2:cpu>, Modrm|<sse2:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { RegMMX, RegXMM }
-cvtpi2pd, 0xf3e6, None, CpuAVX, Modrm|Vex|Space0F|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex, RegXMM }
+cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegXMM }
+cvtpi2pd, 0xf3e6, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd<sse2>, 0xf20f2a, None, <sse2:cpu>|CpuNo64, Modrm|IgnoreSize|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd, 0xf22a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd, 0xf22a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
@@ -1176,17 +1176,17 @@ maxsd<sse2>, 0xf20f5f, None, <sse2:cpu>,
 minpd<sse2>, 0x660f5d, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 minsd<sse2>, 0xf20f5d, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movapd<sse2>, 0x660f28, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-movhpd, 0x6616, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movhpd, 0x6617, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movhpd, 0x660f16, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
-movlpd, 0x6612, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movlpd, 0x6613, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movlpd, 0x660f12, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movhpd, 0x6616, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movhpd, 0x6617, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movhpd, 0x660f16, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movlpd, 0x6612, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movlpd, 0x6613, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movlpd, 0x660f12, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movmskpd<sse2>, 0x660f50, None, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 movntpd<sse2>, 0x660f2b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
-movsd, 0xf210, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movsd, 0xf210, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
 movsd, 0xf210, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, RegXMM }
-movsd, 0xf20f10, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movsd, 0xf20f10, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movupd<sse2>, 0x660f10, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulpd<sse2>, 0x660f59, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulsd<sse2>, 0xf20f59, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1200,21 +1200,21 @@ ucomisd<sse2>, 0x660f2e, None, <sse2:cpu
 unpckhpd<sse2>, 0x660f15, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 unpcklpd<sse2>, 0x660f14, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 xorpd<sse2>, 0x660f57, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-cvtdq2pd<sse2>, 0xf30fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtdq2pd<sse2>, 0xf30fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 cvtpd2dq<sse2>, 0xf20fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cvtdq2ps<sse2>, 0x0f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cvtpd2pi, 0x660f2d, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegMMX }
 cvtpd2ps<sse2>, 0x660f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-cvtps2pd<sse2>, 0x0f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtps2pd<sse2>, 0x0f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 cvtps2dq<sse2>, 0x660f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-cvtsd2si, 0xf22d, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvtsd2si, 0xf20f2d, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvtsd2ss<sse2>, 0xf20f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-cvtss2sd<sse2>, 0xf30f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtsd2si, 0xf22d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtsd2si, 0xf20f2d, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtsd2ss<sse2>, 0xf20f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtss2sd<sse2>, 0xf30f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 
 cvttpd2pi, 0x660f2c, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegMMX }
-cvttsd2si, 0xf22c, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvttsd2si, 0xf20f2c, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttsd2si, 0xf22c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttsd2si, 0xf20f2c, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 cvttpd2dq<sse2>, 0x660fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cvttps2dq<sse2>, 0xf30f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 maskmovdqu<sse2>, 0x660ff7, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
@@ -1223,7 +1223,7 @@ movdqu<sse2>, 0xf30f6f, None, <sse2:cpu>
 movdq2q, 0xf20fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegMMX }
 movq2dq, 0xf30fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegXMM }
 pmuludq<sse2>, 0x660ff4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmuludq, 0xff4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmuludq, 0xff4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pshufd<sse2>, 0x660f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pshufhw<sse2>, 0xf30f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pshuflw<sse2>, 0xf20f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1245,7 +1245,7 @@ haddps<sse3>, 0xf20f7c, None, <sse3:cpu>
 hsubpd<sse3>, 0x660f7d, None, <sse3:cpu>, Modrm|<sse3:attr>|<sse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 hsubps<sse3>, 0xf20f7d, None, <sse3:cpu>, Modrm|<sse3:attr>|<sse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 lddqu<sse3>, 0xf20ff0, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegXMM }
-movddup<sse3>, 0xf20f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movddup<sse3>, 0xf20f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movshdup<sse3>, 0xf30f16, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movsldup<sse3>, 0xf30f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
@@ -1276,17 +1276,17 @@ mwait, 0xf01c9, None, CpuSSE3, CheckRegS
 // VMX instructions.
 
 vmcall, 0xf01c1, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-vmclear, 0x660fc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+vmclear, 0x660fc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 vmlaunch, 0xf01c2, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
 vmresume, 0xf01c3, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-vmptrld, 0xfc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
-vmptrst, 0xfc7, 7, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+vmptrld, 0xfc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
+vmptrst, 0xfc7, 7, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 vmread, 0xf78, None, CpuVMX|CpuNo64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Reg32|Unspecified|BaseIndex }
 vmread, 0xf78, None, CpuVMX|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Reg64, Reg64|Qword|Unspecified|BaseIndex }
 vmwrite, 0xf79, None, CpuVMX|CpuNo64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32 }
 vmwrite, 0xf79, None, CpuVMX|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Reg64|Qword|Unspecified|BaseIndex, Reg64 }
 vmxoff, 0xf01c4, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-vmxon, 0xf30fc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+vmxon, 0xf30fc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 
 // VMFUNC instruction
 
@@ -1313,7 +1313,7 @@ invpcid, 0x660f3882, None, CpuINVPCID|Cp
 <ssse3:cpu:pfx:attr:vvvv:reg:mem, +
     $avx:CpuAVX:66:Vex128|VexW0|SSE2AVX:VexVVVV:RegXMM:Xmmword, +
     $sse:CpuSSSE3:66:::RegXMM:Xmmword, +
-    $mmx:CpuSSSE3::NoRex64::RegMMX:Qword>
+    $mmx:CpuSSSE3::::RegMMX:Qword>
 
 phaddw<ssse3>, 0x<ssse3:pfx>0f3801, None, <ssse3:cpu>, Modrm|<ssse3:attr>|<ssse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <ssse3:reg>|<ssse3:mem>|Unspecified|BaseIndex, <ssse3:reg> }
 phaddd<ssse3>, 0x<ssse3:pfx>0f3802, None, <ssse3:cpu>, Modrm|<ssse3:attr>|<ssse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <ssse3:reg>|<ssse3:mem>|Unspecified|BaseIndex, <ssse3:reg> }
@@ -1333,7 +1333,7 @@ pabsd<ssse3>, 0x<ssse3:pfx>0f381e, None,
 // SSE4.1 instructions.
 
 <sse41:cpu:attr:scal:vvvv, $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, $sse:CpuSSE4_1:::>
-<sd:ppfx:spfx:opc:vexw:elem:scal, s::f3:0:VexW0:Dword:IgnoreSize, d:66:f2:1:VexW1:Qword:NoRex64>
+<sd:ppfx:spfx:opc:vexw:elem, s::f3:0:VexW0:Dword, d:66:f2:1:VexW1:Qword>
 
 blendp<sd><sse41>, 0x660f3a0c | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 blendvp<sd>, 0x664a | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexW=1|VexSources=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1341,11 +1341,11 @@ blendvp<sd>, 0x664a | <sd:opc>, None, Cp
 blendvp<sd>, 0x660f3814 | <sd:opc>, None, CpuSSE4_1, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 blendvp<sd>, 0x660f3814 | <sd:opc>, None, CpuSSE4_1, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 dpp<sd><sse41>, 0x660f3a40 | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
-extractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
+extractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 extractps, 0x6617, None, CpuAVX|Cpu64, RegMem|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg64 }
 extractps, 0x660f3a17, None, CpuSSE4_1, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 extractps, 0x660f3a17, None, CpuSSE4_1|Cpu64, RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, RegXMM, Reg64 }
-insertps<sse41>, 0x660f3a21, None, <sse41:cpu>, Modrm|IgnoreSize|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+insertps<sse41>, 0x660f3a21, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movntdqa<sse41>, 0x660f382a, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegXMM }
 mpsadbw<sse41>, 0x660f3a42, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 packusdw<sse41>, 0x660f382b, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1356,7 +1356,7 @@ pblendvb, 0x660f3810, None, CpuSSE4_1, M
 pblendw<sse41>, 0x660f3a0e, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pcmpeqq<sse41>, 0x660f3829, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, RegMem|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, RegXMM, Reg32|Reg64 }
-pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
+pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
 pextrd<sse41>, 0x660f3a16, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, RegXMM, Reg32|Unspecified|BaseIndex }
 pextrq, 0x6616, None, CpuAVX|Cpu64, Modrm|Vex|Space0F3A|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg64|Unspecified|BaseIndex }
 pextrq, 0x660f3a16, None, CpuSSE4_1|Cpu64, Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64|Unspecified|BaseIndex }
@@ -1374,23 +1374,23 @@ pminsb<sse41>, 0x660f3838, None, <sse41:
 pminsd<sse41>, 0x660f3839, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pminud<sse41>, 0x660f383b, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pminuw<sse41>, 0x660f383a, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmovsxbw<sse41>, 0x660f3820, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxbd<sse41>, 0x660f3821, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxbq<sse41>, 0x660f3822, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxwd<sse41>, 0x660f3823, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxwq<sse41>, 0x660f3824, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxdq<sse41>, 0x660f3825, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxbw<sse41>, 0x660f3830, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxbd<sse41>, 0x660f3831, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxbq<sse41>, 0x660f3832, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxwd<sse41>, 0x660f3833, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxwq<sse41>, 0x660f3834, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxdq<sse41>, 0x660f3835, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxbw<sse41>, 0x660f3820, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxbd<sse41>, 0x660f3821, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxbq<sse41>, 0x660f3822, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxwd<sse41>, 0x660f3823, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxwq<sse41>, 0x660f3824, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxdq<sse41>, 0x660f3825, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxbw<sse41>, 0x660f3830, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxbd<sse41>, 0x660f3831, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxbq<sse41>, 0x660f3832, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxwd<sse41>, 0x660f3833, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxwq<sse41>, 0x660f3834, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxdq<sse41>, 0x660f3835, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 pmuldq<sse41>, 0x660f3828, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pmulld<sse41>, 0x660f3840, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 ptest<sse41>, 0x660f3817, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 roundp<sd><sse41>, 0x660f3a08 | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
-rounds<sd><sse41>, 0x660f3a0a | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:scal>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<sd:scal>, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM }
+rounds<sd><sse41>, 0x660f3a0a | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:scal>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM }
 
 // SSE4.2 instructions.
 
@@ -1484,8 +1484,8 @@ vandp<sd>, 0x<sd:ppfx>54, None, CpuAVX,
 vblendp<sd>, 0x660c | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vblendvp<sd>, 0x664a | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexW0|VexSources=2|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vbroadcastf128, 0x661a, None, CpuAVX, Modrm|Vex=2|Space0F38|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegYMM }
-vbroadcastsd, 0x6619, None, CpuAVX, Modrm|Vex=2|Space0F38|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegYMM }
-vbroadcastss, 0x6618, None, CpuAVX, Modrm|Vex|Space0F38|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex, RegXMM|RegYMM }
+vbroadcastsd, 0x6619, None, CpuAVX, Modrm|Vex256|Space0F38|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegYMM }
+vbroadcastss, 0x6618, None, CpuAVX, Modrm|Vex128|Space0F38|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex, RegXMM|RegYMM }
 vcmp<frel>p<sd>, 0x<sd:ppfx>c2, 0x<frel:imm>, CpuAVX, Modrm|<frel:comm>|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 vcmp<frel>s<sd>, 0x<sd:spfx>c2, 0x<frel:imm>, CpuAVX, Modrm|<frel:comm>|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcmpp<sd>, 0x<sd:ppfx>c2, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1499,22 +1499,20 @@ vcvtpd2ps<xy>, 0x665a, None, CpuAVX, Mod
 vcvtps2dq, 0x665b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vcvtps2pd, 0x5a, None, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
 vcvtps2pd, 0x5a, None, CpuAVX, Modrm|Vex256|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegYMM }
-vcvtsd2si, 0xf22d, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+vcvts<sd>2si, 0x<sd:spfx>2d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 vcvtsd2ss, 0xf25a, None, CpuAVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vcvtsi2s<sd>, 0x<sd:spfx>2a, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcvtsi2s<sd>, 0x<sd:spfx>2a, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcvtss2sd, 0xf35a, None, CpuAVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vcvtss2si, 0xf32d, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 vcvttpd2dq<xy>, 0x66e6, None, CpuAVX, Modrm|<xy:vex>|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:syntax>, { <xy:dst>, RegXMM }
 vcvttps2dq, 0xf35b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vcvttsd2si, 0xf22c, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-vcvttss2si, 0xf32c, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+vcvtts<sd>2si, 0x<sd:spfx>2c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 vdivp<sd>, 0x<sd:ppfx>5e, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vdivs<sd>, 0x<sd:spfx>5e, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vdppd, 0x6641, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vdpps, 0x6640, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vextractf128, 0x6619, None, CpuAVX, Modrm|Vex=2|Space0F3A|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegYMM, Unspecified|BaseIndex|RegXMM }
-vextractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
+vextractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 vextractps, 0x6617, None, CpuAVX|Cpu64, RegMem|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64 }
 vhaddpd, 0x667c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vhaddps, 0xf27c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1523,7 +1521,7 @@ vhsubps, 0xf27d, None, CpuAVX, Modrm|Vex
 vinsertf128, 0x6618, None, CpuAVX, Modrm|Vex=2|Space0F3A|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
 vinsertps, 0x6621, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vlddqu, 0xf2f0, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM }
-vldmxcsr, 0xae, 2, CpuAVX, Modrm|Vex128|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+vldmxcsr, 0xae, 2, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 vmaskmovdqu, 0x66f7, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
 vmaskmovp<sd>, 0x662e | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
 vmaskmovp<sd>, 0x662c | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1537,26 +1535,26 @@ vmovap<sd>, 0x<sd:ppfx>28, None, CpuAVX,
 // by Intel AVX spec).  To avoid extra template in gcc x86 backend and
 // support assembler for AMD64, we accept 64bit operand on vmovd so
 // that we can use one template for both SSE and AVX instructions.
-vmovd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
+vmovd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 vmovd, 0x667e, None, CpuAVX|Cpu64, D|RegMem|Vex=1|Space0F|VexW=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { RegXMM, Reg64 }
 vmovddup, 0xf212, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovddup, 0xf212, None, CpuAVX, Modrm|Vex=2|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM }
 vmovdqa, 0x666f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovdqu, 0xf36f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovhlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
-vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 vmovlhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
-vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 vmovmskp<sd>, 0x<sd:ppfx>50, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { RegXMM|RegYMM, Reg32|Reg64 }
 vmovntdq, 0x66e7, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
 vmovntdqa, 0x662a, None, CpuAVX|CpuAVX2, Modrm|Vex|Space0F38|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM }
 vmovntp<sd>, 0x<sd:ppfx>2b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
 vmovq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovq, 0x66d6, None, CpuAVX, Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
-vmovq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
-vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex, RegXMM }
+vmovq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
+vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex, RegXMM }
 vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 vmovshdup, 0xf316, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovsldup, 0xf312, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1692,7 +1690,7 @@ vrsqrtss, 0xf352, None, CpuAVX, Modrm|Ve
 vshufp<sd>, 0x<sd:ppfx>c6, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vsqrtp<sd>, 0x<sd:ppfx>51, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vsqrts<sd>, 0x<sd:spfx>51, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vstmxcsr, 0xae, 3, CpuAVX, Modrm|Vex128|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+vstmxcsr, 0xae, 3, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 vsubp<sd>, 0x<sd:ppfx>5c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vsubs<sd>, 0x<sd:spfx>5c, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vtestp<sd>, 0x660e | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1889,8 +1887,8 @@ vpshl<xop>, 0x94 | <xop:opc>, None, CpuX
 
 llwpcb, 0x12, 0, CpuLWP, Modrm|SpaceXOP09|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Vex, { Reg32|Reg64 }
 slwpcb, 0x12, 1, CpuLWP, Modrm|SpaceXOP09|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Vex, { Reg32|Reg64 }
-lwpval, 0x12, 1, CpuLWP, Modrm|SpaceXOP0A|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
-lwpins, 0x12, 0, CpuLWP, Modrm|SpaceXOP0A|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
+lwpval, 0x12, 1, CpuLWP, Modrm|SpaceXOP0A|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
+lwpins, 0x12, 0, CpuLWP, Modrm|SpaceXOP0A|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // BMI instructions
 
@@ -1918,30 +1916,30 @@ tzmsk, 0x01, 4, CpuTBM, Modrm|CheckRegSi
 prefetch, 0xf0d, 0, Cpu3dnow|CpuPRFCHW, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetchw, 0xf0d, 1, Cpu3dnow|CpuPRFCHW, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 femms, 0xf0e, None, Cpu3dnow, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-pavgusb, 0xf0f, 0xbf, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pf2id, 0xf0f, 0x1d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pf2iw, 0xf0f, 0x1c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfacc, 0xf0f, 0xae, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfadd, 0xf0f, 0x9e, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfcmpeq, 0xf0f, 0xb0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfcmpge, 0xf0f, 0x90, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfcmpgt, 0xf0f, 0xa0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfmax, 0xf0f, 0xa4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfmin, 0xf0f, 0x94, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfmul, 0xf0f, 0xb4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfnacc, 0xf0f, 0x8a, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfpnacc, 0xf0f, 0x8e, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrcp, 0xf0f, 0x96, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrcpit1, 0xf0f, 0xa6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrcpit2, 0xf0f, 0xb6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrsqit1, 0xf0f, 0xa7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrsqrt, 0xf0f, 0x97, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfsub, 0xf0f, 0x9a, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfsubr, 0xf0f, 0xaa, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pi2fd, 0xf0f, 0x0d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pi2fw, 0xf0f, 0x0c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pmulhrw, 0xf0f, 0xb7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pswapd, 0xf0f, 0xbb, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pavgusb, 0xf0f, 0xbf, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pf2id, 0xf0f, 0x1d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pf2iw, 0xf0f, 0x1c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfacc, 0xf0f, 0xae, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfadd, 0xf0f, 0x9e, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfcmpeq, 0xf0f, 0xb0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfcmpge, 0xf0f, 0x90, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfcmpgt, 0xf0f, 0xa0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfmax, 0xf0f, 0xa4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfmin, 0xf0f, 0x94, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfmul, 0xf0f, 0xb4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfnacc, 0xf0f, 0x8a, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfpnacc, 0xf0f, 0x8e, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrcp, 0xf0f, 0x96, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrcpit1, 0xf0f, 0xa6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrcpit2, 0xf0f, 0xb6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrsqit1, 0xf0f, 0xa7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrsqrt, 0xf0f, 0x97, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfsub, 0xf0f, 0x9a, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfsubr, 0xf0f, 0xaa, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pi2fd, 0xf0f, 0x0d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pi2fw, 0xf0f, 0x0c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmulhrw, 0xf0f, 0xb7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pswapd, 0xf0f, 0xbb, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 
 // AMD extensions.
 syscall, 0xf05, None, CpuSYSCALL, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
@@ -1967,8 +1965,8 @@ vmsave, 0xf01db, None, CpuSVME, AddrPref
 
 
 // SSE4a instructions
-movntsd, 0xf20f2b, None, CpuSSE4a, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
-movntss, 0xf30f2b, None, CpuSSE4a, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Dword|Unspecified|BaseIndex }
+movntsd, 0xf20f2b, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+movntss, 0xf30f2b, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Dword|Unspecified|BaseIndex }
 extrq, 0x660f78, 0, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Imm8, RegXMM }
 extrq, 0x660f79, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
 insertq, 0xf20f79, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
@@ -2166,8 +2164,8 @@ vcvtps2pd, 0x5A, None, CpuAVX512F, Modrm
 
 vcvtps2ph, 0x661D, None, CpuAVX512F, Modrm|EVex512|MaskingMorZ|Space0F3A|VexW0|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 
-vcvtsd2si, 0xF22D, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvtsd2usi, 0xF279, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToDword|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvts<sdh>2si, 0x<sdh:spfx>2d, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvts<sdh>2usi, 0x<sdh:spfx>79, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
 
 vcvtsd2ss, 0xF25A, None, CpuAVX512F, Modrm|EVexLIG|Masking=3|Space0F|VexVVVV|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
@@ -2187,20 +2185,14 @@ vcvtusi2ss, 0xF37B, None, CpuAVX512F, Mo
 
 vcvtss2sd, 0xF35A, None, CpuAVX512F, Modrm|EVexLIG|Masking=3|Space0F|VexVVVV|VexW0|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtss2si, 0xF32D, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvtss2usi, 0xF379, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
-
 vcvttpd2dq<xy>, 0x66e6, None, CpuAVX512F|<xy:vl>, Modrm|<xy:attr>|Masking=3|Space0F|VexW1|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:sae>, { <xy:src>|Qword, <xy:dst> }
 vcvttpd2udq<xy>, 0x78, None, CpuAVX512F|<xy:vl>, Modrm|<xy:attr>|Masking=3|Space0F|VexW1|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:sae>, { <xy:src>|Qword, <xy:dst> }
 
 vcvttps2dq, 0xF35B, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vcvttps2udq, 0x78, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vcvttsd2si, 0xF22C, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvttsd2usi, 0xF278, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToDword|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-
-vcvttss2si, 0xF32C, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvttss2usi, 0xF378, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvtts<sdh>2si, 0x<sdh:spfx>2c, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvtts<sdh>2usi, 0x<sdh:spfx>78, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
 
 vcvtudq2ps, 0xF27A, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
@@ -2216,7 +2208,7 @@ vextracti32x4, 0x6639, None, CpuAVX512F,
 vextractf64x4, 0x661B, None, CpuAVX512F, Modrm|EVex=1|MaskingMorZ|Space0F3A|VexW=2|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 vextracti64x4, 0x663B, None, CpuAVX512F, Modrm|EVex=1|MaskingMorZ|Space0F3A|VexW=2|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 
-vextractps, 0x6617, None, CpuAVX512F, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
+vextractps, 0x6617, None, CpuAVX512F, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 vextractps, 0x6617, None, CpuAVX512F|Cpu64, RegMem|EVex128|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64 }
 
 vfixupimmp<sd>, 0x6654, None, CpuAVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
@@ -2274,7 +2266,7 @@ vmovap<sd>, 0x<sd:ppfx>28, None, CpuAVX5
 vmovntp<sd>, 0x<sd:ppfx>2B, None, CpuAVX512F, Modrm|Space0F|<sd:vexw>|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex }
 vmovup<sd>, 0x<sd:ppfx>10, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|<sd:vexw>|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vmovd, 0x666E, None, CpuAVX512F, D|Modrm|EVex=2|Space0F|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
+vmovd, 0x666E, None, CpuAVX512F, D|Modrm|EVex=2|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 
 vmovddup, 0xF212, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM|RegZMM|Unspecified|BaseIndex, RegYMM|RegZMM }
 
@@ -2287,16 +2279,16 @@ vmovdqu64, 0xF36F, None, CpuAVX512F, D|M
 vmovhlps, 0x12, None, CpuAVX512F, Modrm|EVex=4|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 vmovlhps, 0x16, None, CpuAVX512F, Modrm|EVex=4|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 
-vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
-vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 
-vmovq, 0x666E, None, CpuAVX512F|Cpu64, D|Modrm|EVex=2|Space0F|VexW=2|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64|Unspecified|BaseIndex, RegXMM }
+vmovq, 0x666E, None, CpuAVX512F|Cpu64, D|Modrm|EVex128|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64|Unspecified|BaseIndex, RegXMM }
 vmovq, 0xF37E, None, CpuAVX512F, Load|Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovq, 0x66D6, None, CpuAVX512F, Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
 
-vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|MaskingMorZ|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sdh:elem>|Unspecified|BaseIndex, RegXMM }
+vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|MaskingMorZ|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sdh:elem>|Unspecified|BaseIndex, RegXMM }
 vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|Masking=3|<sdh:spc1>|VexVVVV|<sdh:vexw>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 
 vmovshdup, 0xF316, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
@@ -2596,7 +2588,7 @@ kadd<dq>, 0x<dq:kpfx>4a, None, CpuAVX512
 kand<dq>, 0x<dq:kpfx>41, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask, RegMask }
 kandn<dq>, 0x<dq:kpfx>42, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegMask, RegMask, RegMask }
 kmov<dq>, 0x<dq:kpfx>90, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
-kmov<dq>, 0x<dq:kpfx>91, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
+kmov<dq>, 0x<dq:kpfx>91, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
 kmov<dq>, 0xf292, None, CpuAVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <dq:gpr>, RegMask }
 knot<dq>, 0x<dq:kpfx>44, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask }
 kor<dq>, 0x<dq:kpfx>45, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask, RegMask }
@@ -2985,13 +2977,13 @@ incsspq, 0xf30fae, 5, CpuSHSTK|Cpu64, Mo
 rdsspd, 0xf30f1e, 1, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32 }
 rdsspq, 0xf30f1e, 1, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64 }
 saveprevssp, 0xf30f01ea, None, CpuSHSTK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-rstorssp, 0xf30f01, 5, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+rstorssp, 0xf30f01, 5, CpuSHSTK, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 wrssd, 0x0f38f6, None, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Dword|Unspecified|BaseIndex }
-wrssq, 0x0f38f6, None, CpuSHSTK|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrssq, 0x0f38f6, None, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
 wrussd, 0x660f38f5, None, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Dword|Unspecified|BaseIndex }
-wrussq, 0x660f38f5, None, CpuSHSTK|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrussq, 0x660f38f5, None, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64, Qword|Unspecified|BaseIndex }
 setssbsy, 0xf30f01e8, None, CpuSHSTK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-clrssbsy, 0xf30fae, 6, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+clrssbsy, 0xf30fae, 6, CpuSHSTK, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 endbr64, 0xf30f1efa, None, CpuIBT, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
 endbr32, 0xf30f1efb, None, CpuIBT, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
 
@@ -3230,9 +3222,6 @@ vcvtusi2sh, 0xf37b, None, CpuAVX512_FP16
 vcvtsh2sd, 0xf35a, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|EVexMap5|VexVVVV|VexW0|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcvtsh2ss, 0x13, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|EVexMap6|VexVVVV|VexW0|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtsh2si, 0xf32d, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvtsh2usi, 0xf379, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
-
 vcvttph2dq, 0xf35b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|EVexMap5|VexW0|Broadcast|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegXMM }
 vcvttph2dq, 0xf35b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|EVexMap5|VexW0|Broadcast|Disp8MemShift=4|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegYMM }
 vcvttph2dq, 0xf35b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|EVexMap5|VexW0|Broadcast|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegYMM|Word|Unspecified|BaseIndex, RegZMM }
@@ -3256,9 +3245,6 @@ vcvtph2psx, 0x6613, None, CpuAVX512_FP16
 vcvttph2w, 0x667c, None, CpuAVX512_FP16, Modrm|Masking=3|EVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vcvttph2uw, 0x7c, None, CpuAVX512_FP16, Modrm|Masking=3|EVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vcvttsh2si, 0xf32c, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvttsh2usi, 0xf378, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
-
 vfpclassph<xyz>, 0x66, None, CpuAVX512_FP16|<xyz:vl>, Modrm|<xyz:attr>|Masking=2|Space0F3A|VexW0|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xyz:att>, { Imm8, <xyz:src>|Word, RegMask }
 
 vmovw, 0x666e, None, CpuAVX512_FP16, D|Modrm|EVex128|VexWIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex, RegXMM }



* [PATCH 2/7] x86: insert "no error" enumerator in i386_error enumeration
  2022-08-16  7:27 [PATCH 0/7] x86: suffix handling changes Jan Beulich
  2022-08-16  7:30 ` [PATCH 1/7] x86/Intel: restrict suffix derivation Jan Beulich
@ 2022-08-16  7:30 ` Jan Beulich
  2022-08-17 19:19   ` H.J. Lu
  2022-08-16  7:31 ` [PATCH 3/7] x86: move / quiesce pre-386 non-16-bit warning Jan Beulich
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-16  7:30 UTC (permalink / raw)
  To: Binutils

A value of zero had better not indicate any particular error; instead it
should hit the abort() at the top of the consuming switch().
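To illustrate the point, here is a minimal standalone sketch. It is not the tc-i386.c code: the enumeration is trimmed to its first few members, and error_text(), pending_error and error_is_set() are hypothetical names. With no_error first, a zero-initialized error no longer aliases operand_size_mismatch, and a switch without a no_error case lands in abort() instead of printing a misleading diagnostic.

```c
#include <stdlib.h>

/* Trimmed-down copy of the enumeration; the real one in tc-i386.c has
   many more enumerators after these.  */
enum i386_error
  {
    no_error, /* Must be first.  */
    operand_size_mismatch,
    operand_type_mismatch,
    register_type_mismatch,
  };

/* Hypothetical consumer shaped like the no-match switch(): there is
   deliberately no case for no_error, so a value that was never set
   (zero) reaches abort() rather than masquerading as a real error.  */
static const char *
error_text (enum i386_error e)
{
  switch (e)
    {
    default:
      abort ();
    case operand_size_mismatch:
      return "operand size mismatch";
    case operand_type_mismatch:
      return "operand type mismatch";
    case register_type_mismatch:
      return "register type mismatch";
    }
}

/* Static storage starts out as 0, i.e. no_error after this change
   (previously 0 meant operand_size_mismatch).  */
static enum i386_error pending_error;

static int
error_is_set (void)
{
  return pending_error != no_error;
}
```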

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -226,6 +226,7 @@ union i386_op
 
 enum i386_error
   {
+    no_error, /* Must be first.  */
     operand_size_mismatch,
     operand_type_mismatch,
     register_type_mismatch,



* [PATCH 3/7] x86: move / quiesce pre-386 non-16-bit warning
  2022-08-16  7:27 [PATCH 0/7] x86: suffix handling changes Jan Beulich
  2022-08-16  7:30 ` [PATCH 1/7] x86/Intel: restrict suffix derivation Jan Beulich
  2022-08-16  7:30 ` [PATCH 2/7] x86: insert "no error" enumerator in i386_error enumeration Jan Beulich
@ 2022-08-16  7:31 ` Jan Beulich
  2022-08-17 19:21   ` H.J. Lu
  2022-08-16  7:32 ` [PATCH 4/7] x86: improve match_template()'s diagnostics Jan Beulich
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-16  7:31 UTC (permalink / raw)
  To: Binutils

Emitting this warning for every insn, including ones having actual
errors, is annoying. Introduce a boolean variable to emit the warning
just once, on the first insn after a .arch directive may have changed
things, and move the warning to output_insn(). (I didn't want to go as
far as checking whether the .arch actually turned off the i386 bit, but
doing so would be an option.)
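A reduced sketch of the warn-once pattern described above. Only pre_386_16bit_warned and output_insn() mirror names from the patch; cpu_has_i386, code_16bit, set_arch() and warnings_emitted are hypothetical stand-ins for cpu_arch_flags.bitfield.cpui386, flag_code, set_cpu_arch() and the emitted warnings.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the assembler state touched by the patch.  */
static bool cpu_has_i386 = true;   /* models cpu_arch_flags.bitfield.cpui386 */
static bool code_16bit = false;    /* models flag_code == CODE_16BIT */
static bool pre_386_16bit_warned;  /* the new guard */
static unsigned int warnings_emitted;

/* Models a .arch directive: it may turn the i386 bit off, so the guard
   is re-armed.  */
static void
set_arch (bool i386)
{
  cpu_has_i386 = i386;
  pre_386_16bit_warned = false;
}

/* Models the check now living in output_insn(): it is only reached for
   insns that assembled successfully, and fires at most once per .arch.  */
static void
output_insn (void)
{
  if (!cpu_has_i386 && !code_16bit && !pre_386_16bit_warned)
    {
      fprintf (stderr,
	       "Warning: use .code16 to ensure correct addressing mode\n");
      ++warnings_emitted;
      pre_386_16bit_warned = true;
    }
}
```

Re-arming the guard in set_arch() rather than tracking whether the i386 bit actually changed keeps the sketch as simple as the patch itself.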
---
OTOH I wonder whether switching to a pre-386 architecture shouldn't
automatically move to CODE_16BIT: our emitting operand- or address-size
prefixes violates the architecture specification. Alternatively we
could outright reject such .arch directives when not already in 16-bit
mode.

I've left the message text unaltered, although I think "addressing mode"
is particularly misleading for instructions without memory operands (or
any other address-size-affected aspect, as e.g. with JCXZ).

Originally I thought the warning might get in the way of work done in
subsequent patches, but I think I've convinced myself that all affected
insns are post-286 and hence wouldn't yield CPU_FLAGS_PERFECT_MATCH.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -765,6 +765,9 @@ int optimize_align_code = 1;
 /* Non-zero to quieten some warnings.  */
 static int quiet_warnings = 0;
 
+/* Guard to avoid repeated warnings about non-16-bit code on 16-bit CPUs.  */
+static bool pre_386_16bit_warned;
+
 /* CPU name.  */
 static const char *cpu_arch_name = NULL;
 static char *cpu_sub_arch_name = NULL;
@@ -2809,6 +2812,7 @@ set_cpu_arch (int dummy ATTRIBUTE_UNUSED
 		      cpu_arch_tune = cpu_arch_isa;
 		      cpu_arch_tune_flags = cpu_arch_isa_flags;
 		    }
+		  pre_386_16bit_warned = false;
 		  break;
 		}
 
@@ -5486,12 +5490,7 @@ parse_insn (char *line, char *mnemonic)
     {
       supported |= cpu_flags_match (t);
       if (supported == CPU_FLAGS_PERFECT_MATCH)
-	{
-	  if (!cpu_arch_flags.bitfield.cpui386 && (flag_code != CODE_16BIT))
-	    as_warn (_("use .code16 to ensure correct addressing mode"));
-
-	  return l;
-	}
+	return l;
     }
 
   if (!(supported & CPU_FLAGS_64BIT_MATCH))
@@ -9491,6 +9490,13 @@ output_insn (void)
       fragP->tc_frag_data.max_bytes = max_branch_padding_size;
     }
 
+  if (!cpu_arch_flags.bitfield.cpui386 && (flag_code != CODE_16BIT)
+      && !pre_386_16bit_warned)
+    {
+      as_warn (_("use .code16 to ensure correct addressing mode"));
+      pre_386_16bit_warned = true;
+    }
+
   /* Output jumps.  */
   if (i.tm.opcode_modifier.jump == JUMP)
     output_branch ();



* [PATCH 4/7] x86: improve match_template()'s diagnostics
  2022-08-16  7:27 [PATCH 0/7] x86: suffix handling changes Jan Beulich
                   ` (2 preceding siblings ...)
  2022-08-16  7:31 ` [PATCH 3/7] x86: move / quiesce pre-386 non-16-bit warning Jan Beulich
@ 2022-08-16  7:32 ` Jan Beulich
  2022-08-17 20:24   ` H.J. Lu
  2022-08-16  7:32 ` [PATCH 5/7] x86: re-work insn/suffix recognition Jan Beulich
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-16  7:32 UTC (permalink / raw)
  To: Binutils

Take as an example

	extractps $0, %xmm0, %xmm0
	insertps $0, %xmm0, %eax

(both making the same mistake of using the wrong kind of destination
register): it is easy to see that current behavior is far from ideal.
The former results in "unsupported instruction" for 32-bit code simply
because the 2nd template we have is a Cpu64 one. Instead we should aim
at emitting the "best" possible error, which will typically be the one
where we passed the largest number of checks. Generalize the original
"specific_error" approach by making it apply to the entire matching
loop, utilizing the fact that line numbers increase as we pass further
checks.
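The selection rule can be sketched standalone as follows. progress() has the same shape as the helper added to tc-i386.c; struct tmpl, match() and its two-check loop are hypothetical simplifications, with cpu_ok and size_ok standing in for cpu_flags_match() and operand_size_match(). Since each check site sits on a later source line, __LINE__ records how far a template got, and the error from the most advanced template is kept.

```c
/* Trimmed-down error kinds, only those needed for the illustration.  */
enum i386_error
  {
    no_error,
    number_of_operands_mismatch,
    unsupported,
    operand_size_mismatch,
  };

/* Same shape as the helper in the patch: a new error replaces the
   previous one only if it was recorded at a later source line.  */
static enum i386_error
progress (enum i386_error new, enum i386_error last,
	  unsigned int line, unsigned int *line_p)
{
  if (line <= *line_p)
    return last;
  *line_p = line;
  return new;
}

/* Hypothetical template: fails either the CPU check or the size check.  */
struct tmpl { int cpu_ok; int size_ok; };

static enum i386_error
match (const struct tmpl *t, unsigned int n)
{
  unsigned int errline = __LINE__;
  enum i386_error specific_error = number_of_operands_mismatch;
  /* The macro is not re-expanded inside its own body, so this calls the
     function above, passing the line of each check site.  */
#define progress(err) progress (err, specific_error, __LINE__, &errline)

  for (unsigned int k = 0; k < n; k++)
    {
      specific_error = progress (unsupported);
      if (!t[k].cpu_ok)
	continue;

      specific_error = progress (operand_size_mismatch);
      if (!t[k].size_ok)
	continue;

      return no_error; /* Template matched.  */
    }

#undef progress
  return specific_error;
}
```

With templates { {1, 0}, {0, 0} }, loosely mirroring the extractps case in 32-bit mode, match() reports operand_size_mismatch rather than the second (Cpu64-like) template's unsupported.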
---
As to the inval-tls testcase: Why is KMOV special? Are e.g. VMOV or
other vector insns (legacy or EVEX-encoded) any different? Shouldn't the
use of the respective reloc types be limited to _exactly_ the insns they
are intended to be used with? Furthermore having this check in
match_template() is unhelpful, as the resulting diagnostic doesn't help
in understanding what's wrong. Template matching should be left alone
here, and the issue should instead be diagnosed later, say directly in
md_assemble() (alongside the various further consistency checks there)
or in process_operands().

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -2083,12 +2083,7 @@ operand_size_match (const insn_template
     }
 
   if (!t->opcode_modifier.d)
-    {
-    mismatch:
-      if (!match)
-	i.error = operand_size_mismatch;
-      return match;
-    }
+    return match;
 
   /* Check reverse.  */
   gas_assert ((i.operands >= 2 && i.operands <= 3)
@@ -2105,19 +2100,19 @@ operand_size_match (const insn_template
 
       if (t->operand_types[j].bitfield.class == Reg
 	  && !match_operand_size (t, j, given))
-	goto mismatch;
+	return match;
 
       if (t->operand_types[j].bitfield.class == RegSIMD
 	  && !match_simd_size (t, j, given))
-	goto mismatch;
+	return match;
 
       if (t->operand_types[j].bitfield.instance == Accum
 	  && (!match_operand_size (t, j, given)
 	      || !match_simd_size (t, j, given)))
-	goto mismatch;
+	return match;
 
       if ((i.flags[given] & Operand_Mem) && !match_mem_size (t, j, given))
-	goto mismatch;
+	return match;
     }
 
   return match | MATCH_REVERSE;
@@ -6386,6 +6381,17 @@ VEX_check_encoding (const insn_template
   return 0;
 }
 
+/* Helper function for the progress() macro in match_template().  */
+static INLINE enum i386_error progress (enum i386_error new,
+					enum i386_error last,
+					unsigned int line, unsigned int *line_p)
+{
+  if (line <= *line_p)
+    return last;
+  *line_p = line;
+  return new;
+}
+
 static const insn_template *
 match_template (char mnem_suffix)
 {
@@ -6397,8 +6403,9 @@ match_template (char mnem_suffix)
   i386_opcode_modifier suffix_check;
   i386_operand_type operand_types [MAX_OPERANDS];
   int addr_prefix_disp;
-  unsigned int j, size_match, check_register;
-  enum i386_error specific_error = 0;
+  unsigned int j, size_match, check_register, errline = __LINE__;
+  enum i386_error specific_error = number_of_operands_mismatch;
+#define progress(err) progress(err, specific_error, __LINE__, &errline)
 
 #if MAX_OPERANDS != 5
 # error "MAX_OPERANDS must be 5."
@@ -6436,36 +6443,33 @@ match_template (char mnem_suffix)
 	suffix_check.no_ldsuf = 1;
     }
 
-  /* Must have right number of operands.  */
-  i.error = number_of_operands_mismatch;
-
   for (t = current_templates->start; t < current_templates->end; t++)
     {
       addr_prefix_disp = -1;
       found_reverse_match = 0;
 
+      /* Must have right number of operands.  */
       if (i.operands != t->operands)
 	continue;
 
       /* Check processor support.  */
-      i.error = unsupported;
+      specific_error = progress (unsupported);
       if (cpu_flags_match (t) != CPU_FLAGS_PERFECT_MATCH)
 	continue;
 
       /* Check Pseudo Prefix.  */
-      i.error = unsupported;
       if (t->opcode_modifier.pseudovexprefix
 	  && !(i.vec_encoding == vex_encoding_vex
 	      || i.vec_encoding == vex_encoding_vex3))
 	continue;
 
       /* Check AT&T mnemonic.   */
-      i.error = unsupported_with_intel_mnemonic;
+      specific_error = progress (unsupported_with_intel_mnemonic);
       if (intel_mnemonic && t->opcode_modifier.attmnemonic)
 	continue;
 
       /* Check AT&T/Intel syntax.  */
-      i.error = unsupported_syntax;
+      specific_error = progress (unsupported_syntax);
       if ((intel_syntax && t->opcode_modifier.attsyntax)
 	  || (!intel_syntax && t->opcode_modifier.intelsyntax))
 	continue;
@@ -6491,7 +6495,7 @@ match_template (char mnem_suffix)
 	}
 
       /* Check the suffix.  */
-      i.error = invalid_instruction_suffix;
+      specific_error = progress (invalid_instruction_suffix);
       if ((t->opcode_modifier.no_bsuf && suffix_check.no_bsuf)
 	  || (t->opcode_modifier.no_wsuf && suffix_check.no_wsuf)
 	  || (t->opcode_modifier.no_lsuf && suffix_check.no_lsuf)
@@ -6500,6 +6504,7 @@ match_template (char mnem_suffix)
 	  || (t->opcode_modifier.no_ldsuf && suffix_check.no_ldsuf))
 	continue;
 
+      specific_error = progress (operand_size_mismatch);
       size_match = operand_size_match (t);
       if (!size_match)
 	continue;
@@ -6510,11 +6515,9 @@ match_template (char mnem_suffix)
 
 	 as the case of a missing * on the operand is accepted (perhaps with
 	 a warning, issued further down).  */
+      specific_error = progress (operand_type_mismatch);
       if (i.jumpabsolute && t->opcode_modifier.jump != JUMP_ABSOLUTE)
-	{
-	  i.error = operand_type_mismatch;
-	  continue;
-	}
+	continue;
 
       for (j = 0; j < MAX_OPERANDS; j++)
 	operand_types[j] = t->operand_types[j];
@@ -6522,6 +6525,8 @@ match_template (char mnem_suffix)
       /* In general, don't allow
 	 - 64-bit operands outside of 64-bit mode,
 	 - 32-bit operands on pre-386.  */
+      specific_error = progress (mnem_suffix ? invalid_instruction_suffix
+					     : operand_size_mismatch);
       j = i.imm_operands + (t->operands > i.imm_operands + 1);
       if (((i.suffix == QWORD_MNEM_SUFFIX
 	    && flag_code != CODE_64BIT
@@ -6550,7 +6555,7 @@ match_template (char mnem_suffix)
 	{
 	  if (VEX_check_encoding (t))
 	    {
-	      specific_error = i.error;
+	      specific_error = progress (i.error);
 	      continue;
 	    }
 
@@ -6711,6 +6716,8 @@ match_template (char mnem_suffix)
 						   i.types[1],
 						   operand_types[1])))
 	    {
+	      specific_error = progress (i.error);
+
 	      /* Check if other direction is valid ...  */
 	      if (!t->opcode_modifier.d)
 		continue;
@@ -6735,6 +6742,7 @@ match_template (char mnem_suffix)
 						       operand_types[0])))
 		{
 		  /* Does not match either direction.  */
+		  specific_error = progress (i.error);
 		  continue;
 		}
 	      /* found_reverse_match holds which of D or FloatR
@@ -6773,7 +6781,10 @@ match_template (char mnem_suffix)
 						       operand_types[3],
 						       i.types[4],
 						       operand_types[4]))
-		    continue;
+		    {
+		      specific_error = progress (i.error);
+		      continue;
+		    }
 		  /* Fall through.  */
 		case 4:
 		  overlap3 = operand_type_and (i.types[3], operand_types[3]);
@@ -6788,7 +6799,10 @@ match_template (char mnem_suffix)
 							    operand_types[2],
 							    i.types[3],
 							    operand_types[3])))
-		    continue;
+		    {
+		      specific_error = progress (i.error);
+		      continue;
+		    }
 		  /* Fall through.  */
 		case 3:
 		  overlap2 = operand_type_and (i.types[2], operand_types[2]);
@@ -6803,7 +6817,10 @@ match_template (char mnem_suffix)
 							    operand_types[1],
 							    i.types[2],
 							    operand_types[2])))
-		    continue;
+		    {
+		      specific_error = progress (i.error);
+		      continue;
+		    }
 		  break;
 		}
 	    }
@@ -6814,14 +6831,14 @@ match_template (char mnem_suffix)
       /* Check if vector operands are valid.  */
       if (check_VecOperands (t))
 	{
-	  specific_error = i.error;
+	  specific_error = progress (i.error);
 	  continue;
 	}
 
       /* Check if VEX/EVEX encoding requirements can be satisfied.  */
       if (VEX_check_encoding (t))
 	{
-	  specific_error = i.error;
+	  specific_error = progress (i.error);
 	  continue;
 	}
 
@@ -6829,11 +6846,13 @@ match_template (char mnem_suffix)
       break;
     }
 
+#undef progress
+
   if (t == current_templates->end)
     {
       /* We found no match.  */
       const char *err_msg;
-      switch (specific_error ? specific_error : i.error)
+      switch (specific_error)
 	{
 	default:
 	  abort ();
--- a/gas/testsuite/gas/i386/inval-tls.l
+++ b/gas/testsuite/gas/i386/inval-tls.l
@@ -1,3 +1,3 @@
 .*: Assembler messages:
-.*:3: Error: operand size mismatch for `kmovd'
-.*:4: Error: operand size mismatch for `kmovd'
+.*:3: Error: .* `kmovd'
+.*:4: Error: .* `kmovd'
--- a/gas/testsuite/gas/i386/noavx512-1.l
+++ b/gas/testsuite/gas/i386/noavx512-1.l
@@ -1,14 +1,14 @@
 .*: Assembler messages:
-.*:25: Error: .*unsupported instruction.*
+.*:25: Error: .*operand size mismatch.*
 .*:26: Error: .*unsupported masking.*
 .*:27: Error: .*unsupported masking.*
-.*:47: Error: .*unsupported instruction.*
+.*:47: Error: .*operand size mismatch.*
 .*:48: Error: .*unsupported masking.*
 .*:49: Error: .*unsupported masking.*
 .*:50: Error: .*not supported.*
 .*:51: Error: .*not supported.*
 .*:52: Error: .*not supported.*
-.*:69: Error: .*unsupported instruction.*
+.*:69: Error: .*operand size mismatch.*
 .*:70: Error: .*unsupported masking.*
 .*:71: Error: .*unsupported masking.*
 .*:72: Error: .*not supported.*
@@ -17,7 +17,7 @@
 .*:75: Error: .*not supported.*
 .*:76: Error: .*not supported.*
 .*:77: Error: .*not supported.*
-.*:91: Error: .*unsupported instruction.*
+.*:91: Error: .*operand size mismatch.*
 .*:92: Error: .*unsupported masking.*
 .*:93: Error: .*unsupported masking.*
 .*:94: Error: .*not supported.*
@@ -27,7 +27,7 @@
 .*:98: Error: .*not supported.*
 .*:99: Error: .*not supported.*
 .*:100: Error: .*not supported.*
-.*:113: Error: .*unsupported instruction.*
+.*:113: Error: .*operand size mismatch.*
 .*:114: Error: .*unsupported masking.*
 .*:115: Error: .*unsupported masking.*
 .*:116: Error: .*not supported.*
@@ -40,7 +40,7 @@
 .*:126: Error: .*not supported.*
 .*:127: Error: .*not supported.*
 .*:128: Error: .*not supported.*
-.*:135: Error: .*unsupported instruction.*
+.*:135: Error: .*operand size mismatch.*
 .*:136: Error: .*unsupported masking.*
 .*:137: Error: .*unsupported masking.*
 .*:138: Error: .*not supported.*
@@ -54,7 +54,7 @@
 .*:149: Error: .*not supported.*
 .*:150: Error: .*not supported.*
 .*:151: Error: .*not supported.*
-.*:157: Error: .*unsupported instruction.*
+.*:157: Error: .*operand size mismatch.*
 .*:158: Error: .*unsupported masking.*
 .*:159: Error: .*unsupported masking.*
 .*:160: Error: .*not supported.*
--- a/gas/testsuite/gas/i386/noavx512-2.l
+++ b/gas/testsuite/gas/i386/noavx512-2.l
@@ -1,12 +1,12 @@
 .*: Assembler messages:
-.*:26: Error: .*unsupported instruction.*
-.*:27: Error: .*unsupported instruction.*
+.*:26: Error: .*unsupported masking.*
+.*:27: Error: .*unsupported masking.*
 .*:29: Error: .*unsupported instruction.*
 .*:30: Error: .*unsupported instruction.*
 .*:32: Error: .*unsupported instruction.*
 .*:33: Error: .*unsupported instruction.*
-.*:36: Error: .*unsupported instruction.*
-.*:37: Error: .*unsupported instruction.*
+.*:36: Error: .*unsupported masking.*
+.*:37: Error: .*unsupported masking.*
 .*:39: Error: .*unsupported instruction.*
 .*:40: Error: .*unsupported instruction.*
 .*:43: Error: .*unsupported instruction.*
--- a/gas/testsuite/gas/i386/x86-64-branch-4.l
+++ b/gas/testsuite/gas/i386/x86-64-branch-4.l
@@ -1,19 +1,19 @@
 .*: Assembler messages:
 .*:2: Error: invalid instruction suffix for `call'
 .*:3: Error: invalid instruction suffix for `call'
-.*:4: Error: operand type mismatch for `jmp'
+.*:4: Error: operand (size|type) mismatch for `jmp'
 .*:5: Error: invalid instruction suffix for `jmp'
 .*:6: Error: invalid instruction suffix for `jmp'
 .*:7: Error: invalid instruction suffix for `ret'
 .*:8: Error: invalid instruction suffix for `ret'
-.*:11: Error: operand type mismatch for `call'
+.*:11: Error: operand (size|type) mismatch for `call'
 .*:12: Error: invalid instruction suffix for `call'
 .*:13: Error: invalid instruction suffix for `call'
-.*:14: Error: operand size mismatch for `call'
-.*:15: Error: operand type mismatch for `jmp'
+.*:14: Error: operand (size|type) mismatch for `call'
+.*:15: Error: operand (size|type) mismatch for `jmp'
 .*:16: Error: invalid instruction suffix for `jmp'
 .*:17: Error: invalid instruction suffix for `jmp'
-.*:18: Error: operand size mismatch for `jmp'
+.*:18: Error: operand (size|type) mismatch for `jmp'
 .*:19: Error: invalid instruction suffix for `ret'
 .*:20: Error: invalid instruction suffix for `ret'
 GAS LISTING .*
--- a/gas/testsuite/gas/i386/x86-64-branch-5.l
+++ b/gas/testsuite/gas/i386/x86-64-branch-5.l
@@ -1,19 +1,19 @@
 .*: Assembler messages:
-.*:2: Error: unsupported syntax for `lcall'
-.*:3: Error: unsupported syntax for `lfs'
-.*:4: Error: unsupported syntax for `lfs'
-.*:5: Error: unsupported syntax for `lgs'
-.*:6: Error: unsupported syntax for `lgs'
-.*:7: Error: unsupported syntax for `ljmp'
-.*:8: Error: unsupported syntax for `lss'
-.*:9: Error: unsupported syntax for `lss'
-.*:12: Error: unsupported syntax for `call'
-.*:13: Error: unsupported syntax for `lfs'
-.*:14: Error: unsupported syntax for `lfs'
-.*:15: Error: unsupported syntax for `lgs'
-.*:16: Error: unsupported syntax for `lgs'
-.*:17: Error: unsupported syntax for `jmp'
-.*:18: Error: unsupported syntax for `lss'
-.*:19: Error: unsupported syntax for `lss'
+.*:2: Error: invalid instruction suffix for `lcall'
+.*:3: Error: operand size mismatch for `lfs'
+.*:4: Error: invalid instruction suffix for `lfs'
+.*:5: Error: operand size mismatch for `lgs'
+.*:6: Error: invalid instruction suffix for `lgs'
+.*:7: Error: invalid instruction suffix for `ljmp'
+.*:8: Error: operand size mismatch for `lss'
+.*:9: Error: invalid instruction suffix for `lss'
+.*:12: Error: operand (size|type) mismatch for `call'
+.*:13: Error: operand size mismatch for `lfs'
+.*:14: Error: operand size mismatch for `lfs'
+.*:15: Error: operand size mismatch for `lgs'
+.*:16: Error: operand size mismatch for `lgs'
+.*:17: Error: operand (size|type) mismatch for `jmp'
+.*:18: Error: operand size mismatch for `lss'
+.*:19: Error: operand size mismatch for `lss'
 GAS LISTING .*
 #pass
--- a/gas/testsuite/gas/i386/x86-64-inval-tls.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-tls.l
@@ -1,3 +1,3 @@
 .*: Assembler messages:
-.*:3: Error: operand size mismatch for `kmovq'
-.*:4: Error: operand size mismatch for `kmovq'
+.*:3: Error: .* `kmovq'
+.*:4: Error: .* `kmovq'



* [PATCH 5/7] x86: re-work insn/suffix recognition
  2022-08-16  7:27 [PATCH 0/7] x86: suffix handling changes Jan Beulich
                   ` (3 preceding siblings ...)
  2022-08-16  7:32 ` [PATCH 4/7] x86: improve match_template()'s diagnostics Jan Beulich
@ 2022-08-16  7:32 ` Jan Beulich
  2022-08-17 20:29   ` H.J. Lu
  2022-08-16  7:33 ` [PATCH 6/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
  2022-08-16  7:34 ` [PATCH 7/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
  6 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-16  7:32 UTC
  To: Binutils

x86: re-work insn/suffix recognition

Having templates with a suffix explicitly present in their names has
always been quirky. Introduce a 2nd matching pass, used when the 1st one
couldn't find a suitable template _and_ didn't itself already need to
trim off a suffix to find a match at all. This requires adjustments to
error reporting (luckily fewer than I was afraid might be necessary), as
errors previously reported during matching now need deferring until
after the 2nd pass: obviously, no error may be emitted if the 2nd pass
succeeds.
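The retry rule can be illustrated with a small standalone sketch. It is a simplification under stated assumptions, not tc-i386.c code: the template table, toy_match_pass() and toy_assemble() are hypothetical, operand matching is reduced to an operand count, and pass 1 trims a suffix only when the full mnemonic isn't found at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins; not the real tc-i386.c types.  */
enum toy_error { toy_no_error, toy_no_such_insn, toy_operand_mismatch };

struct toy_template
{
  const char *name;    /* Mnemonic as spelled in the table.  */
  unsigned int nops;   /* Operand count this template takes.  */
};

/* Hypothetical table: "movsb" as the sign-extending move (two operands)
   and "movs" as the string move (suffixable, no operands here).  */
static const struct toy_template toy_table[] = {
  { "movsb", 2 },
  { "movs", 0 },
};

struct toy_result
{
  const struct toy_template *match;  /* Non-NULL on success.  */
  enum toy_error error;              /* Error to report on failure.  */
  const char *err_mnem;              /* Mnemonic to name in the message.  */
  char suffix;                       /* Suffix trimmed off, or 0.  */
};

static const struct toy_template *
toy_lookup (const char *name, size_t len)
{
  for (size_t k = 0; k < sizeof toy_table / sizeof toy_table[0]; k++)
    if (strlen (toy_table[k].name) == len
        && strncmp (toy_table[k].name, name, len) == 0)
      return &toy_table[k];
  return NULL;
}

static bool
toy_is_suffix (char c)
{
  return c == 'b' || c == 'w' || c == 'l' || c == 'q';
}

/* One matching pass.  Unless FORCE_TRIM, the full mnemonic is tried first
   and a trailing suffix letter is trimmed only if that lookup fails.  */
static struct toy_result
toy_match_pass (const char *mnem, unsigned int nops, bool force_trim)
{
  struct toy_result r = { NULL, toy_no_such_insn, mnem, 0 };
  size_t len = strlen (mnem);
  const struct toy_template *t = force_trim ? NULL : toy_lookup (mnem, len);

  if (t == NULL && len > 1 && toy_is_suffix (mnem[len - 1]))
    {
      t = toy_lookup (mnem, len - 1);
      if (t != NULL)
        r.suffix = mnem[len - 1];
    }
  if (t == NULL)
    return r;

  r.err_mnem = t->name;
  if (t->nops != nops)
    {
      r.error = toy_operand_mismatch;
      return r;
    }
  r.match = t;
  r.error = toy_no_error;
  return r;
}

/* Retry with the suffix trimmed only if pass 1 failed without trimming
   one itself; if pass 2 fails as well, report pass 1's error.  */
static struct toy_result
toy_assemble (const char *mnem, unsigned int nops)
{
  struct toy_result pass1 = toy_match_pass (mnem, nops, false);
  struct toy_result pass2;

  if (pass1.match != NULL || pass1.suffix != 0)
    return pass1;

  pass2 = toy_match_pass (mnem, nops, true);
  return pass2.match != NULL ? pass2 : pass1;
}
```

With this toy table, "movsb" with two operands matches the sign-extending template in pass 1, "movsb" with no operands only matches in pass 2 (as the string move with a 'b' suffix), and a mismatch in both passes reports pass 1's error against "movsb".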

Note that with the Intel syntax CMPSD and MOVSD string insn templates
dropped, mixed IsString/non-IsString template groups can no longer
occur. As a result maybe_adjust_templates() becomes unnecessary and is
hence being removed.

Note further that while the additions to the intel16 testcase aren't
really proper Intel syntax, we've been permitting all of them except
for the MOVD variant. The test is therefore meant to prevent such an
inconsistency from being re-introduced.
---
To limit code churn I'm using "goto" for the retry loop, but I'd be
happy to make this a proper loop either right here or in a follow-on
change doing just the necessary re-indentation.

The "too many memory references" errors which are being deleted weren't
fully consistent anyway: even the majority of IsString insns accept only
a single memory operand. If we want to retain that check, it would need
to be re-introduced in md_assemble(), latching the error into i.error
just like match_template() does.

Why is "MOVQ $imm64, %reg64" being optimized but "MOVABS $imm64, %reg64"
is not?

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -297,9 +297,6 @@ struct _i386_insn
        explicit segment overrides are given.  */
     const reg_entry *seg[2];
 
-    /* Copied first memory operand string, for re-checking.  */
-    char *memop1_string;
-
     /* PREFIX holds all the given prefix opcodes (usually null).
        PREFIXES is the number of prefix opcodes.  */
     unsigned int prefixes;
@@ -4273,7 +4270,20 @@ optimize_encoding (void)
 	   movq $imm31, %r64   -> movl $imm31, %r32
 	   movq $imm32, %r64   -> movl $imm32, %r32
         */
-      i.tm.opcode_modifier.norex64 = 1;
+      i.tm.opcode_modifier.size = SIZE32;
+      if (i.imm_operands)
+	{
+	  i.types[0].bitfield.imm32 = 1;
+	  i.types[0].bitfield.imm32s = 0;
+	  i.types[0].bitfield.imm64 = 0;
+	}
+      else
+	{
+	  i.types[0].bitfield.dword = 1;
+	  i.types[0].bitfield.qword = 0;
+	}
+      i.types[1].bitfield.dword = 1;
+      i.types[1].bitfield.qword = 0;
       if (i.tm.base_opcode == 0xb8 || (i.tm.base_opcode | 1) == 0xc7)
 	{
 	  /* Handle
@@ -4283,11 +4293,6 @@ optimize_encoding (void)
 	  i.tm.operand_types[0].bitfield.imm32 = 1;
 	  i.tm.operand_types[0].bitfield.imm32s = 0;
 	  i.tm.operand_types[0].bitfield.imm64 = 0;
-	  i.types[0].bitfield.imm32 = 1;
-	  i.types[0].bitfield.imm32s = 0;
-	  i.types[0].bitfield.imm64 = 0;
-	  i.types[1].bitfield.dword = 1;
-	  i.types[1].bitfield.qword = 0;
 	  if ((i.tm.base_opcode | 1) == 0xc7)
 	    {
 	      /* Handle
@@ -4819,10 +4824,17 @@ void
 md_assemble (char *line)
 {
   unsigned int j;
-  char mnemonic[MAX_MNEM_SIZE], mnem_suffix;
+  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy;
+  const char *pass1_mnem = NULL;
+  enum i386_error pass1_err = 0;
   const insn_template *t;
 
+  /* Make a copy of the full line in case we need to retry.  */
+  copy = xstrdup (line);
+
   /* Initialize globals.  */
+  current_templates = NULL;
+ retry:
   memset (&i, '\0', sizeof (i));
   i.rounding.type = rc_none;
   for (j = 0; j < MAX_OPERANDS; j++)
@@ -4837,15 +4849,21 @@ md_assemble (char *line)
 
   line = parse_insn (line, mnemonic);
   if (line == NULL)
-    return;
+    {
+      if (!copy)
+	goto match_error;
+      free (copy);
+      return;
+    }
   mnem_suffix = i.suffix;
 
   line = parse_operands (line, mnemonic);
   this_operand = -1;
-  xfree (i.memop1_string);
-  i.memop1_string = NULL;
   if (line == NULL)
-    return;
+    {
+      free (copy);
+      return;
+    }
 
   /* Now we've parsed the mnemonic into a set of templates, and have the
      operands at hand.  */
@@ -4921,7 +4939,97 @@ md_assemble (char *line)
      with the template operand types.  */
 
   if (!(t = match_template (mnem_suffix)))
-    return;
+    {
+      const char *err_msg;
+
+      if (!mnem_suffix)
+	{
+	  pass1_err = i.error;
+	  pass1_mnem = current_templates->start->name;
+	  line = copy;
+	  copy = NULL;
+	  goto retry;
+	}
+      free (copy);
+  match_error:
+      switch (pass1_mnem ? pass1_err : i.error)
+	{
+	default:
+	  abort ();
+	case operand_size_mismatch:
+	  err_msg = _("operand size mismatch");
+	  break;
+	case operand_type_mismatch:
+	  err_msg = _("operand type mismatch");
+	  break;
+	case register_type_mismatch:
+	  err_msg = _("register type mismatch");
+	  break;
+	case number_of_operands_mismatch:
+	  err_msg = _("number of operands mismatch");
+	  break;
+	case invalid_instruction_suffix:
+	  err_msg = _("invalid instruction suffix");
+	  break;
+	case bad_imm4:
+	  err_msg = _("constant doesn't fit in 4 bits");
+	  break;
+	case unsupported_with_intel_mnemonic:
+	  err_msg = _("unsupported with Intel mnemonic");
+	  break;
+	case unsupported_syntax:
+	  err_msg = _("unsupported syntax");
+	  break;
+	case unsupported:
+	  as_bad (_("unsupported instruction `%s'"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name);
+	  return;
+	case invalid_sib_address:
+	  err_msg = _("invalid SIB address");
+	  break;
+	case invalid_vsib_address:
+	  err_msg = _("invalid VSIB address");
+	  break;
+	case invalid_vector_register_set:
+	  err_msg = _("mask, index, and destination registers must be distinct");
+	  break;
+	case invalid_tmm_register_set:
+	  err_msg = _("all tmm registers must be distinct");
+	  break;
+	case invalid_dest_and_src_register_set:
+	  err_msg = _("destination and source registers must be distinct");
+	  break;
+	case unsupported_vector_index_register:
+	  err_msg = _("unsupported vector index register");
+	  break;
+	case unsupported_broadcast:
+	  err_msg = _("unsupported broadcast");
+	  break;
+	case broadcast_needed:
+	  err_msg = _("broadcast is needed for operand of such type");
+	  break;
+	case unsupported_masking:
+	  err_msg = _("unsupported masking");
+	  break;
+	case mask_not_on_destination:
+	  err_msg = _("mask not on destination operand");
+	  break;
+	case no_default_mask:
+	  err_msg = _("default mask isn't allowed");
+	  break;
+	case unsupported_rc_sae:
+	  err_msg = _("unsupported static rounding/sae");
+	  break;
+	case invalid_register_operand:
+	  err_msg = _("invalid register operand");
+	  break;
+	}
+      as_bad (_("%s for `%s'"), err_msg,
+	      pass1_mnem ? pass1_mnem : current_templates->start->name);
+      return;
+    }
+
+  free (copy);
 
   if (sse_check != check_none
       /* The opcode space check isn't strictly needed; it's there only to
@@ -5223,6 +5331,7 @@ parse_insn (char *line, char *mnemonic)
   char *l = line;
   char *token_start = l;
   char *mnem_p;
+  bool pass1 = !current_templates;
   int supported;
   const insn_template *t;
   char *dot_p = NULL;
@@ -5392,8 +5501,10 @@ parse_insn (char *line, char *mnemonic)
       current_templates = (const templates *) str_hash_find (op_hash, mnemonic);
     }
 
-  if (!current_templates)
+  if (!current_templates || !pass1)
     {
+      current_templates = NULL;
+
     check_suffix:
       if (mnem_p > mnemonic)
 	{
@@ -5441,7 +5552,8 @@ parse_insn (char *line, char *mnemonic)
 
       if (!current_templates)
 	{
-	  as_bad (_("no such instruction: `%s'"), token_start);
+	  if (pass1)
+	    as_bad (_("no such instruction: `%s'"), token_start);
 	  return NULL;
 	}
     }
@@ -6851,81 +6963,7 @@ match_template (char mnem_suffix)
   if (t == current_templates->end)
     {
       /* We found no match.  */
-      const char *err_msg;
-      switch (specific_error)
-	{
-	default:
-	  abort ();
-	case operand_size_mismatch:
-	  err_msg = _("operand size mismatch");
-	  break;
-	case operand_type_mismatch:
-	  err_msg = _("operand type mismatch");
-	  break;
-	case register_type_mismatch:
-	  err_msg = _("register type mismatch");
-	  break;
-	case number_of_operands_mismatch:
-	  err_msg = _("number of operands mismatch");
-	  break;
-	case invalid_instruction_suffix:
-	  err_msg = _("invalid instruction suffix");
-	  break;
-	case bad_imm4:
-	  err_msg = _("constant doesn't fit in 4 bits");
-	  break;
-	case unsupported_with_intel_mnemonic:
-	  err_msg = _("unsupported with Intel mnemonic");
-	  break;
-	case unsupported_syntax:
-	  err_msg = _("unsupported syntax");
-	  break;
-	case unsupported:
-	  as_bad (_("unsupported instruction `%s'"),
-		  current_templates->start->name);
-	  return NULL;
-	case invalid_sib_address:
-	  err_msg = _("invalid SIB address");
-	  break;
-	case invalid_vsib_address:
-	  err_msg = _("invalid VSIB address");
-	  break;
-	case invalid_vector_register_set:
-	  err_msg = _("mask, index, and destination registers must be distinct");
-	  break;
-	case invalid_tmm_register_set:
-	  err_msg = _("all tmm registers must be distinct");
-	  break;
-	case invalid_dest_and_src_register_set:
-	  err_msg = _("destination and source registers must be distinct");
-	  break;
-	case unsupported_vector_index_register:
-	  err_msg = _("unsupported vector index register");
-	  break;
-	case unsupported_broadcast:
-	  err_msg = _("unsupported broadcast");
-	  break;
-	case broadcast_needed:
-	  err_msg = _("broadcast is needed for operand of such type");
-	  break;
-	case unsupported_masking:
-	  err_msg = _("unsupported masking");
-	  break;
-	case mask_not_on_destination:
-	  err_msg = _("mask not on destination operand");
-	  break;
-	case no_default_mask:
-	  err_msg = _("default mask isn't allowed");
-	  break;
-	case unsupported_rc_sae:
-	  err_msg = _("unsupported static rounding/sae");
-	  break;
-	case invalid_register_operand:
-	  err_msg = _("invalid register operand");
-	  break;
-	}
-      as_bad (_("%s for `%s'"), err_msg,
-	      current_templates->start->name);
+      i.error = specific_error;
       return NULL;
     }
 
@@ -11334,49 +11372,6 @@ RC_SAE_immediate (const char *imm_start)
   return 1;
 }
 
-/* Only string instructions can have a second memory operand, so
-   reduce current_templates to just those if it contains any.  */
-static int
-maybe_adjust_templates (void)
-{
-  const insn_template *t;
-
-  gas_assert (i.mem_operands == 1);
-
-  for (t = current_templates->start; t < current_templates->end; ++t)
-    if (t->opcode_modifier.isstring)
-      break;
-
-  if (t < current_templates->end)
-    {
-      static templates aux_templates;
-      bool recheck;
-
-      aux_templates.start = t;
-      for (; t < current_templates->end; ++t)
-	if (!t->opcode_modifier.isstring)
-	  break;
-      aux_templates.end = t;
-
-      /* Determine whether to re-check the first memory operand.  */
-      recheck = (aux_templates.start != current_templates->start
-		 || t != current_templates->end);
-
-      current_templates = &aux_templates;
-
-      if (recheck)
-	{
-	  i.mem_operands = 0;
-	  if (i.memop1_string != NULL
-	      && i386_index_check (i.memop1_string) == 0)
-	    return 0;
-	  i.mem_operands = 1;
-	}
-    }
-
-  return 1;
-}
-
 static INLINE bool starts_memory_operand (char c)
 {
   return ISDIGIT (c)
@@ -11527,17 +11522,6 @@ i386_att_operand (char *operand_string)
       char *displacement_string_end;
 
     do_memory_reference:
-      if (i.mem_operands == 1 && !maybe_adjust_templates ())
-	return 0;
-      if ((i.mem_operands == 1
-	   && !current_templates->start->opcode_modifier.isstring)
-	  || i.mem_operands == 2)
-	{
-	  as_bad (_("too many memory references for `%s'"),
-		  current_templates->start->name);
-	  return 0;
-	}
-
       /* Check for base index form.  We detect the base index form by
 	 looking for an ')' at the end of the operand, searching
 	 for the '(' matching it, and finding a REGISTER_PREFIX or ','
@@ -11737,8 +11721,6 @@ i386_att_operand (char *operand_string)
       if (i386_index_check (operand_string) == 0)
 	return 0;
       i.flags[this_operand] |= Operand_Mem;
-      if (i.mem_operands == 0)
-	i.memop1_string = xstrdup (operand_string);
       i.mem_operands++;
     }
   else
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -993,10 +993,7 @@ i386_intel_operand (char *operand_string
 	   || intel_state.is_mem)
     {
       /* Memory operand.  */
-      if (i.mem_operands == 1 && !maybe_adjust_templates ())
-	return 0;
-      if ((int) i.mem_operands
-	  >= 2 - !current_templates->start->opcode_modifier.isstring)
+      if (i.mem_operands)
 	{
 	  /* Handle
 
@@ -1041,10 +1038,6 @@ i386_intel_operand (char *operand_string
 		    }
 		}
 	    }
-
-	  as_bad (_("too many memory references for `%s'"),
-		  current_templates->start->name);
-	  return 0;
 	}
 
       /* Swap base and index in 16-bit memory operands like
@@ -1158,8 +1151,6 @@ i386_intel_operand (char *operand_string
 	return 0;
 
       i.flags[this_operand] |= Operand_Mem;
-      if (i.mem_operands == 0)
-	i.memop1_string = xstrdup (operand_string);
       ++i.mem_operands;
     }
   else
--- a/gas/testsuite/gas/i386/code16.s
+++ b/gas/testsuite/gas/i386/code16.s
@@ -1,9 +1,9 @@
 	.text
 	.code16
-	rep; movsd
-	rep; cmpsd
-	rep movsd %ds:(%si),%es:(%di)
-	rep cmpsd %es:(%di),%ds:(%si)
+	rep; movsl
+	rep; cmpsl
+	rep movsl %ds:(%si),%es:(%di)
+	rep cmpsl %es:(%di),%ds:(%si)
 
 	mov	%cr2, %ecx
 	mov	%ecx, %cr2
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -73,6 +73,7 @@ if [gas_32_check] then {
     run_dump_test "amd"
     run_dump_test "katmai"
     run_dump_test "jump"
+    run_dump_test "movs32"
     run_dump_test "movz32"
     run_dump_test "relax-1"
     run_dump_test "relax-2"
@@ -806,6 +807,7 @@ if [gas_64_check] then {
     run_dump_test "x86-64-segovr"
     run_list_test "x86-64-inval-seg" "-al"
     run_dump_test "x86-64-branch"
+    run_dump_test "movs64"
     run_dump_test "movz64"
     run_dump_test "x86-64-relax-1"
     run_dump_test "svme64"
--- a/gas/testsuite/gas/i386/intel16.d
+++ b/gas/testsuite/gas/i386/intel16.d
@@ -20,4 +20,12 @@ Disassembly of section .text:
   2c:	8d 02 [ 	]*lea    \(%bp,%si\),%ax
   2e:	8d 01 [ 	]*lea    \(%bx,%di\),%ax
   30:	8d 03 [ 	]*lea    \(%bp,%di\),%ax
-	...
+[ 	]*[0-9a-f]+:	67 f7 13[ 	]+notw[ 	]+\(%ebx\)
+[ 	]*[0-9a-f]+:	66 f7 17[ 	]+notl[ 	]+\(%bx\)
+[ 	]*[0-9a-f]+:	67 0f 1f 03[ 	]+nopw[ 	]+\(%ebx\)
+[ 	]*[0-9a-f]+:	66 0f 1f 07[ 	]+nopl[ 	]+\(%bx\)
+[ 	]*[0-9a-f]+:	67 83 03 05[ 	]+addw[ 	]+\$0x5,\(%ebx\)
+[ 	]*[0-9a-f]+:	66 83 07 05[ 	]+addl[ 	]+\$0x5,\(%bx\)
+[ 	]*[0-9a-f]+:	67 c7 03 05 00[ 	]+movw[ 	]+\$0x5,\(%ebx\)
+[ 	]*[0-9a-f]+:	66 c7 07 05 00 00 00[ 	]+movl[ 	]+\$0x5,\(%bx\)
+#pass
--- a/gas/testsuite/gas/i386/intel16.s
+++ b/gas/testsuite/gas/i386/intel16.s
@@ -18,4 +18,14 @@
  lea	ax, [di][bx]
  lea	ax, [di][bp]
 
- .p2align 4,0
+ notw	[ebx]
+ notd	[bx]
+
+ nopw	[ebx]
+ nopd	[bx]
+
+ addw	[ebx], 5
+ addd	[bx], 5
+
+ movw	[ebx], 5
+ movd	[bx], 5
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs.s
@@ -0,0 +1,33 @@
+	.text
+movs:
+	movsb	%al,%ax
+	movsb	(%eax),%ax
+	movsb	%al,%eax
+	movsb	(%eax),%eax
+.ifdef x86_64
+	movsb	%al,%rax
+	movsb	(%rax),%rax
+.endif
+
+	movsbw	%al,%ax
+	movsbw	(%eax),%ax
+	movsbl	%al,%eax
+	movsbl	(%eax),%eax
+.ifdef x86_64
+	movsbq	%al,%rax
+	movsbq	(%rax),%rax
+.endif
+
+	movsw	%ax,%eax
+	movsw	(%eax),%eax
+.ifdef x86_64
+	movsw	%ax,%rax
+	movsw	(%rax),%rax
+.endif
+
+	movswl	%ax,%eax
+	movswl	(%eax),%eax
+.ifdef x86_64
+	movswq	%ax,%rax
+	movswq	(%rax),%rax
+.endif
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs32.d
@@ -0,0 +1,22 @@
+#objdump: -dw
+#source: movs.s
+#name: x86 mov with sign-extend (32-bit object)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <movs>:
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	0f bf 00 *	movswl \(%eax\),%eax
+#pass
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs64.d
@@ -0,0 +1,30 @@
+#objdump: -dw
+#source: movs.s
+#name: x86 mov with sign-extend (64-bit object)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <movs>:
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	67 66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	67 0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f be c0 *	movsbq %al,%rax
+[ 	]*[a-f0-9]+:	48 0f be 00 *	movsbq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	67 66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	67 0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f be c0 *	movsbq %al,%rax
+[ 	]*[a-f0-9]+:	48 0f be 00 *	movsbq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	67 0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f bf c0 *	movswq %ax,%rax
+[ 	]*[a-f0-9]+:	48 0f bf 00 *	movswq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	67 0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f bf c0 *	movswq %ax,%rax
+[ 	]*[a-f0-9]+:	48 0f bf 00 *	movswq \(%rax\),%rax
+#pass
--- a/gas/testsuite/gas/i386/movx16.l
+++ b/gas/testsuite/gas/i386/movx16.l
@@ -41,11 +41,11 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
@@ -82,7 +82,7 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movswl	%al, %cl
--- a/gas/testsuite/gas/i386/movx32.l
+++ b/gas/testsuite/gas/i386/movx32.l
@@ -41,11 +41,11 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
@@ -82,7 +82,7 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movswl	%al, %cl
--- a/gas/testsuite/gas/i386/movx64.l
+++ b/gas/testsuite/gas/i386/movx64.l
@@ -106,17 +106,17 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 480FBEC8[ 	]+movsb	%al, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %rcx
@@ -192,12 +192,12 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %rcx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 480FBFC8[ 	]+movsw	%ax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %rcx
 [ 	]*[1-9][0-9]*[ 	]*
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -135,47 +135,37 @@
 mov, 0xa0, None, CpuNo64, D|W|No_sSuf|No_qSuf|No_ldSuf, { Disp16|Disp32|Unspecified|Byte|Word|Dword, Acc|Byte|Word|Dword }
 mov, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
 movabs, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
-movq, 0xa1, None, Cpu64, D|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Disp64|Unspecified|Qword, Acc|Qword }
 mov, 0x88, None, 0, D|W|CheckRegSize|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-movq, 0x89, None, Cpu64, D|Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease, { Reg64, Reg64|Unspecified|Qword|BaseIndex }
 // In the 64bit mode the short form mov immediate is redefined to have
 // 64bit value.
 mov, 0xb0, None, 0, W|No_sSuf|No_qSuf|No_ldSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32 }
 mov, 0xc6, 0, 0, W|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-movq, 0xc7, 0, Cpu64, Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm32S, Reg64|Qword|Unspecified|BaseIndex }
 mov, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
 movabs, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf, { Imm64, Reg64 }
-movq, 0xb8, None, Cpu64, Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
 // The segment register moves accept WordReg so that a segment register
 // can be copied to a 32 bit register, and vice versa, without using a
 // size prefix.  When moving to a 32 bit register, the upper 16 bits
 // are set to an implementation defined value (on the Pentium Pro, the
 // implementation defined value is zero).
-mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
+mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
 mov, 0x8c, None, 0, D|Modrm|IgnoreSize|No_bSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { SReg, Word|Unspecified|BaseIndex }
-movq, 0x8c, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg64 }
-mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
+mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
 // Move to/from control debug registers.  In the 16 or 32bit modes
 // they are 32bit.  In the 64bit mode they are 64bit.
 mov, 0xf20, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Control, Reg32 }
 mov, 0xf20, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Control, Reg64 }
-movq, 0xf20, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Control, Reg64 }
 mov, 0xf21, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Debug, Reg32 }
 mov, 0xf21, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
-movq, 0xf21, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
 mov, 0xf24, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Test, Reg32 }
 
 // Move after swapping the bytes
 movbe, 0x0f38f0, None, CpuMovbe, D|Modrm|No_bSuf|No_sSuf|No_ldSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Move with sign extend.
-// "movsbl" & "movsbw" must not be unified into "movsb" to avoid
-// conflict with the "movs" string move instruction.
-movsbl, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg32 }
-movsbw, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg16 }
-movswl, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg16|Word|Unspecified|BaseIndex, Reg32 }
-movsbq, 0xfbe, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg8|Byte|Unspecified|BaseIndex, Reg64 }
-movswq, 0xfbf, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg16|Word|Unspecified|BaseIndex, Reg64 }
+movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
+// "movslq" must not be converted into "movsl" to avoid conflict with the
+// "movsl" string move instruction.
 movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
 movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
@@ -492,9 +482,6 @@ set<cc>, 0xf9<cc:opc>, 0, Cpu386, Modrm|
 // String manipulation.
 cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-// Intel mode string compare.
-cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
-cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
 scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ins, 0x6c, None, Cpu186, W|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
@@ -509,9 +496,6 @@ slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|
 slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
 movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-// Intel mode string move.
-movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
-movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
 smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 scas, 0xae, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}



* [PATCH 6/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-08-16  7:27 [PATCH 0/7] x86: suffix handling changes Jan Beulich
                   ` (4 preceding siblings ...)
  2022-08-16  7:32 ` [PATCH 5/7] x86: re-work insn/suffix recognition Jan Beulich
@ 2022-08-16  7:33 ` Jan Beulich
  2022-08-16  7:34 ` [PATCH 7/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
  6 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2022-08-16  7:33 UTC
  To: Binutils

In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
MOVSW{L,Q}, we need to defer parse_insn()'s emission of errors unrelated
to prefix parsing. Utilize i.error for this, just like match_template()
does.
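A similarly hypothetical sketch of the deferral: the lookup stage latches its reason into a result structure instead of printing it, and only the caller decides what to report once both passes are done, including dropping a pass-1 "64-bit only/not" complaint when pass 2 found a template group. All toy_* names and the table contents are assumptions for illustration; pass 1 here never trims a suffix, and pass 2 always treats the last letter as one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy error codes loosely mirroring a few i386_error values.  */
enum toy_error
{
  toy_no_error,
  toy_no_such_insn,
  toy_operand_mismatch,
  toy_unsupported_64bit
};

struct toy_template
{
  const char *name;
  bool only_64bit;     /* Template is valid in 64-bit mode only.  */
  unsigned int nops;   /* Operand count this template takes.  */
};

/* Hypothetical table: a 64-bit-only "movsl" sign-extending move, the
   "movs" string move (no operands here), and a 64-bit-only "swapgs".  */
static const struct toy_template toy_table[] = {
  { "movsl", true, 2 },
  { "movs", false, 0 },
  { "swapgs", true, 0 },
};

/* Result of one pass; the error is latched here rather than printed.  */
struct toy_pass
{
  enum toy_error error;
  const char *mnem;      /* Mnemonic to name in a diagnostic.  */
  bool found_templates;  /* Lookup found a template usable in this mode.  */
};

static struct toy_pass
toy_try (const char *name, bool code64, unsigned int nops)
{
  struct toy_pass p = { toy_no_such_insn, name, false };

  for (size_t k = 0; k < sizeof toy_table / sizeof toy_table[0]; k++)
    {
      const struct toy_template *t = &toy_table[k];

      if (strcmp (t->name, name) != 0)
        continue;
      if (t->only_64bit && !code64)
        {
          /* Latch the reason instead of reporting it right away.  */
          p.error = toy_unsupported_64bit;
          return p;
        }
      p.found_templates = true;
      p.error = t->nops == nops ? toy_no_error : toy_operand_mismatch;
      return p;
    }
  return p;
}

/* Pass 1 uses MNEM as is; pass 2 treats its last letter as a suffix.
   Returns true if either pass succeeds, else formats the error into BUF.  */
static bool
toy_assemble (const char *mnem, bool code64, unsigned int nops,
              char *buf, size_t len)
{
  char base[32];
  size_t n = strlen (mnem);
  struct toy_pass pass1, pass2;
  const struct toy_pass *rep;

  buf[0] = '\0';
  pass1 = toy_try (mnem, code64, nops);
  if (pass1.error == toy_no_error)
    return true;

  /* Everything but the last character (empty if MNEM is empty).  */
  snprintf (base, sizeof base, "%.*s", n > 0 ? (int) n - 1 : 0, mnem);
  pass2 = toy_try (base, code64, nops);
  if (pass2.error == toy_no_error)
    return true;

  /* Drop a pass-1 "64-bit only/not" complaint in favour of pass 2's
     error when pass 2 at least found templates.  */
  rep = (pass1.error == toy_unsupported_64bit && pass2.found_templates)
        ? &pass2 : &pass1;

  switch (rep->error)
    {
    case toy_unsupported_64bit:
      snprintf (buf, len, "`%s' is %s supported in 64-bit mode",
                rep->mnem, code64 ? "not" : "only");
      break;
    case toy_operand_mismatch:
      snprintf (buf, len, "operand mismatch for `%s'", rep->mnem);
      break;
    default:
      snprintf (buf, len, "no such instruction: `%s'", mnem);
      break;
    }
  return false;
}
```

For example, outside of 64-bit mode "movsl" with no operands succeeds in pass 2 as the string move, while "movsl" with two operands reports pass 2's mismatch against "movs" rather than a 64-bit mode complaint; "swapgs" still gets the "only supported in 64-bit mode" message because pass 2 finds nothing.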

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -236,6 +236,8 @@ enum i386_error
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
     unsupported,
+    unsupported_on_arch,
+    unsupported_64bit,
     invalid_sib_address,
     invalid_vsib_address,
     invalid_vector_register_set,
@@ -4852,6 +4854,14 @@ md_assemble (char *line)
     {
       if (!copy)
 	goto match_error;
+      if (i.error != no_error)
+	{
+	  if (!i.suffix)
+	    goto no_match;
+	  /* No point in trying a 2nd pass - it'll only find the same suffix
+	     again.  */
+	  goto match_error;
+	}
       free (copy);
       return;
     }
@@ -4944,14 +4954,23 @@ md_assemble (char *line)
 
       if (!mnem_suffix)
 	{
+  no_match:
 	  pass1_err = i.error;
 	  pass1_mnem = current_templates->start->name;
 	  line = copy;
 	  copy = NULL;
 	  goto retry;
 	}
-      free (copy);
+
+      /* If a non-/only-64bit template (group) was found in pass 1, and if
+	 _some_ template (group) was found in pass 2, squash pass 1's
+	 error.  */
+      if (pass1_err == unsupported_64bit)
+	pass1_mnem = NULL;
+
   match_error:
+      free (copy);
+
       switch (pass1_mnem ? pass1_err : i.error)
 	{
 	default:
@@ -4984,6 +5003,17 @@ md_assemble (char *line)
 	  as_bad (_("unsupported instruction `%s'"),
 		  pass1_mnem ? pass1_mnem : current_templates->start->name);
 	  return;
+	case unsupported_on_arch:
+	  as_bad (_("`%s' is not supported on `%s%s'"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name,
+		  cpu_arch_name ? cpu_arch_name : default_arch,
+		  cpu_sub_arch_name ? cpu_sub_arch_name : "");
+	  return;
+	case unsupported_64bit:
+	  as_bad (_("`%s' is %s supported in 64-bit mode"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name,
+		  flag_code == CODE_64BIT ? _("not") : _("only"));
+	  return;
 	case invalid_sib_address:
 	  err_msg = _("invalid SIB address");
 	  break;
@@ -5600,16 +5630,13 @@ parse_insn (char *line, char *mnemonic)
 	return l;
     }
 
-  if (!(supported & CPU_FLAGS_64BIT_MATCH))
-    as_bad (flag_code == CODE_64BIT
-	    ? _("`%s' is not supported in 64-bit mode")
-	    : _("`%s' is only supported in 64-bit mode"),
-	    current_templates->start->name);
-  else
-    as_bad (_("`%s' is not supported on `%s%s'"),
-	    current_templates->start->name,
-	    cpu_arch_name ? cpu_arch_name : default_arch,
-	    cpu_sub_arch_name ? cpu_sub_arch_name : "");
+  if (pass1)
+    {
+      if (supported & CPU_FLAGS_64BIT_MATCH)
+        i.error = unsupported_on_arch;
+      else
+        i.error = unsupported_64bit;
+    }
 
   return NULL;
 }
--- a/gas/testsuite/gas/i386/movs.s
+++ b/gas/testsuite/gas/i386/movs.s
@@ -30,4 +30,10 @@ movs:
 .ifdef x86_64
 	movswq	%ax,%rax
 	movswq	(%rax),%rax
+
+	movsl	%eax,%rax
+	movsl	(%rax),%rax
+
+	movslq	%eax,%rax
+	movslq	(%rax),%rax
 .endif
--- a/gas/testsuite/gas/i386/movx64.l
+++ b/gas/testsuite/gas/i386/movx64.l
@@ -241,6 +241,46 @@
 [ 	]*[1-9][0-9]*[ 	]+movswq	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movswq	%rax, %rcx
 [ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %cl
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %cx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %ecx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 4863C8[ 	]+movsl	%eax, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %rcx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %cl
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %cx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %ecx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 4863C8[ 	]+movslq	%eax, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %rcx
+[ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movzx:
 [ 	]*[1-9][0-9]*[ 	]+movzx	%al, %cl
 [ 	]*[1-9][0-9]*[ 	]+movzx	%ax, %cl
--- a/gas/testsuite/gas/i386/movx64.s
+++ b/gas/testsuite/gas/i386/movx64.s
@@ -241,6 +241,46 @@ movsx:
 	movswq	%eax, %rcx
 	movswq	%rax, %rcx
 
+	movsl	%al, %cl
+	movsl	%ax, %cl
+	movsl	%eax, %cl
+	movsl	%rax, %cl
+
+	movsl	%al, %cx
+	movsl	%ax, %cx
+	movsl	%eax, %cx
+	movsl	%rax, %cx
+
+	movsl	%al, %ecx
+	movsl	%ax, %ecx
+	movsl	%eax, %ecx
+	movsl	%rax, %ecx
+
+	movsl	%al, %rcx
+	movsl	%ax, %rcx
+	movsl	%eax, %rcx
+	movsl	%rax, %rcx
+
+	movslq	%al, %cl
+	movslq	%ax, %cl
+	movslq	%eax, %cl
+	movslq	%rax, %cl
+
+	movslq	%al, %cx
+	movslq	%ax, %cx
+	movslq	%eax, %cx
+	movslq	%rax, %cx
+
+	movslq	%al, %ecx
+	movslq	%ax, %ecx
+	movslq	%eax, %ecx
+	movslq	%rax, %ecx
+
+	movslq	%al, %rcx
+	movslq	%ax, %rcx
+	movslq	%eax, %rcx
+	movslq	%rax, %rcx
+
 movzx:
 	movzx	%al, %cl
 	movzx	%ax, %cl
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -164,9 +164,7 @@ movbe, 0x0f38f0, None, CpuMovbe, D|Modrm
 // Move with sign extend.
 movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
-// "movslq" must not be converted into "movsl" to avoid conflict with the
-// "movsl" string move instruction.
-movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
+movsl, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg64 }
 movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
 movsxd, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }



* [PATCH 7/7] ix86: don't recognize/derive Q suffix in the common case
  2022-08-16  7:27 [PATCH 0/7] x86: suffix handling changes Jan Beulich
                   ` (5 preceding siblings ...)
  2022-08-16  7:33 ` [PATCH 6/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
@ 2022-08-16  7:34 ` Jan Beulich
  2022-08-17 20:36   ` H.J. Lu
  6 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-16  7:34 UTC (permalink / raw)
  To: Binutils

Have use of the Q suffix, except where actually legitimate, result in
the same "only supported in 64-bit mode" diagnostic as is emitted for
other 64-bit-only insns. Also suppress deriving the suffix in Intel mode
except in the legitimate cases. In exchange, this allows dropping the
respective code from match_template().
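The Q-suffix rule can be modelled in isolation roughly as follows.
struct tmpl, q_suffix_ok() and the enum values are hypothetical stand-ins
for the opcode-table fields that the patch's q_suffix_allowed() inspects.
Outside of 64-bit mode, Q is accepted only for the x87 DF group with an
odd ModR/M.reg digit (fild / fistp / fisttp) and for 0F C7 /1 without a
mandatory prefix (cmpxchg8b).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the opcode-table fields used by the patch.  */
enum space { SPACE_BASE, SPACE_0F };
enum prefix { PREFIX_NONE, PREFIX_0X66 };

struct tmpl
{
  enum space opcodespace;
  unsigned int base_opcode;
  enum prefix opcodeprefix;
  unsigned int extension_opcode;  /* the /digit in ModR/M.reg */
};

/* Q is fine in 64-bit mode; elsewhere only for DF /odd and 0F C7 /1.  */
static bool
q_suffix_ok (const struct tmpl *t, bool mode64)
{
  return mode64
         || (t->opcodespace == SPACE_BASE
             && t->base_opcode == 0xdf
             && (t->extension_opcode & 1))          /* fild/fistp/fisttp */
         || (t->opcodespace == SPACE_0F
             && t->base_opcode == 0xc7
             && t->opcodeprefix == PREFIX_NONE
             && t->extension_opcode == 1);          /* cmpxchg8b */
}
```

For example, fbld (DF /4) and vmclear (66 0F C7 /6) are rejected outside
64-bit mode under this model, while fild m64 (DF /5) is accepted.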

Oddly enough, despite gcc's preference for FILDQ and FIST{,T}Q, there
were no testcases whatsoever for these, so such tests are being added
here. Note that the line removed from the x86-64-lfence-load testcase
was identical to one a few lines up, and hence redundant.
---
Given gcc's preference for FILDQ / FIST{,T}Q, I wonder whether the
disassembler should rather emit a Q suffix instead of the LL one.
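Both spellings produce identical bytes, which is why the expectations
below list FILDQ and FILDLL the same way. The small sketch below
reproduces the ModR/M bytes in those expectations from the architectural
encodings fild m64int = DF /5, fistp m64int = DF /7 and fisttp m64int =
DD /1. It also covers the 48 63 C8 of movsl(q) %eax,%rcx from the
previous patch. modrm() and encode_x87_mem() are hypothetical helpers,
not gas functions.

```c
#include <stdint.h>

/* Pack the three ModR/M fields: mod (2 bits), reg or /digit (3 bits),
   r/m (3 bits).  */
static uint8_t
modrm (unsigned int mod, unsigned int reg, unsigned int rm)
{
  return (uint8_t) (((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

/* Emit opcode + ModR/M, plus a disp8 when mod == 1.  r/m 5 (ebp/rbp)
   with mod 0 would mean disp32 / RIP-relative instead, so 0x0(%rbp)
   needs mod 1 with a zero disp8.  Returns the number of bytes written.  */
static unsigned int
encode_x87_mem (uint8_t *buf, uint8_t opcode, unsigned int digit,
                unsigned int mod, unsigned int rm, int8_t disp8)
{
  unsigned int n = 0;

  buf[n++] = opcode;
  buf[n++] = modrm (mod, digit, rm);
  if (mod == 1)
    buf[n++] = (uint8_t) disp8;
  return n;
}
```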

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -4826,7 +4826,7 @@ void
 md_assemble (char *line)
 {
   unsigned int j;
-  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy;
+  char mnemonic[MAX_MNEM_SIZE], mnem_suffix = 0, *copy;
   const char *pass1_mnem = NULL;
   enum i386_error pass1_err = 0;
   const insn_template *t;
@@ -4860,6 +4860,7 @@ md_assemble (char *line)
 	    goto no_match;
 	  /* No point in trying a 2nd pass - it'll only find the same suffix
 	     again.  */
+	  mnem_suffix = i.suffix;
 	  goto match_error;
 	}
       free (copy);
@@ -5010,9 +5011,15 @@ md_assemble (char *line)
 		  cpu_sub_arch_name ? cpu_sub_arch_name : "");
 	  return;
 	case unsupported_64bit:
-	  as_bad (_("`%s' is %s supported in 64-bit mode"),
-		  pass1_mnem ? pass1_mnem : current_templates->start->name,
-		  flag_code == CODE_64BIT ? _("not") : _("only"));
+	  if (ISLOWER (mnem_suffix))
+	    as_bad (_("`%s%c' is %s supported in 64-bit mode"),
+		    pass1_mnem ? pass1_mnem : current_templates->start->name,
+		    mnem_suffix,
+		    flag_code == CODE_64BIT ? _("not") : _("only"));
+	  else
+	    as_bad (_("`%s' is %s supported in 64-bit mode"),
+		    pass1_mnem ? pass1_mnem : current_templates->start->name,
+		    flag_code == CODE_64BIT ? _("not") : _("only"));
 	  return;
 	case invalid_sib_address:
 	  err_msg = _("invalid SIB address");
@@ -5355,6 +5362,23 @@ md_assemble (char *line)
     last_insn.kind = last_insn_other;
 }
 
+/* The Q suffix is generally valid only in 64-bit mode, with very few
+   exceptions: fild, fistp, fisttp, and cmpxchg8b.  Note that for fild
+   and fisttp only one of their two templates is matched below: That's
+   sufficient since other relevant attributes are the same between both
+   respective templates.  */
+static INLINE bool q_suffix_allowed(const insn_template *t)
+{
+  return flag_code == CODE_64BIT
+	 || (t->opcode_modifier.opcodespace == SPACE_BASE
+	     && t->base_opcode == 0xdf
+	     && (t->extension_opcode & 1)) /* fild / fistp / fisttp */
+	 || (t->opcode_modifier.opcodespace == SPACE_0F
+	     && t->base_opcode == 0xc7
+	     && t->opcode_modifier.opcodeprefix == PREFIX_NONE
+	     && t->extension_opcode == 1) /* cmpxchg8b */;
+}
+
 static char *
 parse_insn (char *line, char *mnemonic)
 {
@@ -5626,6 +5650,10 @@ parse_insn (char *line, char *mnemonic)
   for (t = current_templates->start; t < current_templates->end; ++t)
     {
       supported |= cpu_flags_match (t);
+
+      if (i.suffix == QWORD_MNEM_SUFFIX && !q_suffix_allowed (t))
+	supported &= ~CPU_FLAGS_64BIT_MATCH;
+
       if (supported == CPU_FLAGS_PERFECT_MATCH)
 	return l;
     }
@@ -6661,20 +6689,12 @@ match_template (char mnem_suffix)
       for (j = 0; j < MAX_OPERANDS; j++)
 	operand_types[j] = t->operand_types[j];
 
-      /* In general, don't allow
-	 - 64-bit operands outside of 64-bit mode,
-	 - 32-bit operands on pre-386.  */
+      /* In general, don't allow 32-bit operands on pre-386.  */
       specific_error = progress (mnem_suffix ? invalid_instruction_suffix
 					     : operand_size_mismatch);
       j = i.imm_operands + (t->operands > i.imm_operands + 1);
-      if (((i.suffix == QWORD_MNEM_SUFFIX
-	    && flag_code != CODE_64BIT
-	    && !(t->opcode_modifier.opcodespace == SPACE_0F
-		 && t->base_opcode == 0xc7
-		 && t->opcode_modifier.opcodeprefix == PREFIX_NONE
-		 && t->extension_opcode == 1) /* cmpxchg8b */)
-	   || (i.suffix == LONG_MNEM_SUFFIX
-	       && !cpu_arch_flags.bitfield.cpui386))
+      if (i.suffix == LONG_MNEM_SUFFIX
+	  && !cpu_arch_flags.bitfield.cpui386
 	  && (intel_syntax
 	      ? (t->opcode_modifier.mnemonicsize != IGNORESIZE
 		 && !intel_float_operand (t->name))
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -824,7 +824,7 @@ i386_intel_operand (char *operand_string
 		    continue;
 		  break;
 		case QWORD_MNEM_SUFFIX:
-		  if (t->opcode_modifier.no_qsuf)
+		  if (t->opcode_modifier.no_qsuf || !q_suffix_allowed (t))
 		    continue;
 		  break;
 		case SHORT_MNEM_SUFFIX:
--- a/gas/testsuite/gas/i386/opcode.d
+++ b/gas/testsuite/gas/i386/opcode.d
@@ -592,6 +592,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%dx
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%dx
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
  +[a-f0-9]+:	82 c3 01             	add    \$0x1,%bl
  +[a-f0-9]+:	82 f3 01             	xor    \$0x1,%bl
  +[a-f0-9]+:	82 d3 01             	adc    \$0x1,%bl
--- a/gas/testsuite/gas/i386/opcode.s
+++ b/gas/testsuite/gas/i386/opcode.s
@@ -592,6 +592,11 @@ foo:
  cmovpe  0x90909090(%eax),%dx
  cmovpo 0x90909090(%eax),%dx
 
+ fildq  (%eax)
+ fildll (%eax)
+ fistpq (%eax)
+ fistpll (%eax)
+
 	.byte 0x82, 0xc3, 0x01
 	.byte 0x82, 0xf3, 0x01
 	.byte 0x82, 0xd3, 0x01
--- a/gas/testsuite/gas/i386/opcode-intel.d
+++ b/gas/testsuite/gas/i386/opcode-intel.d
@@ -593,6 +593,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnp edx,DWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovp  dx,WORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnp dx,WORD PTR \[eax-0x6f6f6f70\]
+[ 	]*[a-f0-9]+:	df 28                	fild   QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 28                	fild   QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 38                	fistp  QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 38                	fistp  QWORD PTR \[eax\]
  +[a-f0-9]+:	82 c3 01             	add    bl,0x1
  +[a-f0-9]+:	82 f3 01             	xor    bl,0x1
  +[a-f0-9]+:	82 d3 01             	adc    bl,0x1
--- a/gas/testsuite/gas/i386/opcode-suffix.d
+++ b/gas/testsuite/gas/i386/opcode-suffix.d
@@ -593,6 +593,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnpl -0x6f6f6f70\(%eax\),%edx
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovpw -0x6f6f6f70\(%eax\),%dx
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnpw -0x6f6f6f70\(%eax\),%dx
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
  +[a-f0-9]+:	82 c3 01             	addb   \$0x1,%bl
  +[a-f0-9]+:	82 f3 01             	xorb   \$0x1,%bl
  +[a-f0-9]+:	82 d3 01             	adcb   \$0x1,%bl
--- a/gas/testsuite/gas/i386/sse3.d
+++ b/gas/testsuite/gas/i386/sse3.d
@@ -13,29 +13,30 @@ Disassembly of section .text:
   10:	df 88 90 90 90 90 [ 	]*fisttps -0x6f6f6f70\(%eax\)
   16:	db 88 90 90 90 90 [ 	]*fisttpl -0x6f6f6f70\(%eax\)
   1c:	dd 88 90 90 90 90 [ 	]*fisttpll -0x6f6f6f70\(%eax\)
-  22:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%ebp\),%xmm4
-  27:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
-  2b:	f2 0f 7c 37 [ 	]*haddps \(%edi\),%xmm6
-  2f:	f2 0f 7c f8 [ 	]*haddps %xmm0,%xmm7
-  33:	66 0f 7d c1 [ 	]*hsubpd %xmm1,%xmm0
-  37:	66 0f 7d 0a [ 	]*hsubpd \(%edx\),%xmm1
-  3b:	f2 0f 7d d2 [ 	]*hsubps %xmm2,%xmm2
-  3f:	f2 0f 7d 1c 24 [ 	]*hsubps \(%esp\),%xmm3
-  44:	f2 0f f0 2e [ 	]*lddqu  \(%esi\),%xmm5
-  48:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
-  4b:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
-  4e:	f2 0f 12 f7 [ 	]*movddup %xmm7,%xmm6
-  52:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
-  56:	f3 0f 16 01 [ 	]*movshdup \(%ecx\),%xmm0
-  5a:	f3 0f 16 ca [ 	]*movshdup %xmm2,%xmm1
-  5e:	f3 0f 12 13 [ 	]*movsldup \(%ebx\),%xmm2
-  62:	f3 0f 12 dc [ 	]*movsldup %xmm4,%xmm3
-  66:	0f 01 c9 [ 	]*mwait  %eax,%ecx
-  69:	0f 01 c9 [ 	]*mwait  %eax,%ecx
-  6c:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
-  70:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
-  74:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
-  78:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	dd 88 90 90 90 90 [ 	]*fisttpll -0x6f6f6f70\(%eax\)
+[ 	]*[0-9a-f]+:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%ebp\),%xmm4
+[ 	]*[0-9a-f]+:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
+[ 	]*[0-9a-f]+:	f2 0f 7c 37 [ 	]*haddps \(%edi\),%xmm6
+[ 	]*[0-9a-f]+:	f2 0f 7c f8 [ 	]*haddps %xmm0,%xmm7
+[ 	]*[0-9a-f]+:	66 0f 7d c1 [ 	]*hsubpd %xmm1,%xmm0
+[ 	]*[0-9a-f]+:	66 0f 7d 0a [ 	]*hsubpd \(%edx\),%xmm1
+[ 	]*[0-9a-f]+:	f2 0f 7d d2 [ 	]*hsubps %xmm2,%xmm2
+[ 	]*[0-9a-f]+:	f2 0f 7d 1c 24 [ 	]*hsubps \(%esp\),%xmm3
+[ 	]*[0-9a-f]+:	f2 0f f0 2e [ 	]*lddqu  \(%esi\),%xmm5
+[ 	]*[0-9a-f]+:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
+[ 	]*[0-9a-f]+:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
+[ 	]*[0-9a-f]+:	f2 0f 12 f7 [ 	]*movddup %xmm7,%xmm6
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	f3 0f 16 01 [ 	]*movshdup \(%ecx\),%xmm0
+[ 	]*[0-9a-f]+:	f3 0f 16 ca [ 	]*movshdup %xmm2,%xmm1
+[ 	]*[0-9a-f]+:	f3 0f 12 13 [ 	]*movsldup \(%ebx\),%xmm2
+[ 	]*[0-9a-f]+:	f3 0f 12 dc [ 	]*movsldup %xmm4,%xmm3
+[ 	]*[0-9a-f]+:	0f 01 c9 [ 	]*mwait  %eax,%ecx
+[ 	]*[0-9a-f]+:	0f 01 c9 [ 	]*mwait  %eax,%ecx
+[ 	]*[0-9a-f]+:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
+[ 	]*[0-9a-f]+:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
 [ 	]*[0-9a-f]+:	0f 01 c8[ 	]+monitor %eax,%ecx,%edx
 [ 	]*[0-9a-f]+:	67 0f 01 c8[ 	]+monitor %ax,%ecx,%edx
 [ 	]*[0-9a-f]+:	0f 01 c9[ 	]+mwait  %eax,%ecx
--- a/gas/testsuite/gas/i386/sse3.s
+++ b/gas/testsuite/gas/i386/sse3.s
@@ -8,6 +8,7 @@ foo:
 	addsubps	%xmm4,%xmm3
 	fisttps		0x90909090(%eax)
 	fisttpl		0x90909090(%eax)
+	fisttpq		0x90909090(%eax)
 	fisttpll	0x90909090(%eax)
 	haddpd		0x0(%ebp),%xmm4
 	haddpd		%xmm6,%xmm5
--- a/gas/testsuite/gas/i386/sse3-intel.d
+++ b/gas/testsuite/gas/i386/sse3-intel.d
@@ -14,6 +14,7 @@ Disassembly of section .text:
 [ 	]*[0-9a-f]+:	df 88 90 90 90 90[ 	]+fisttp WORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	db 88 90 90 90 90[ 	]+fisttp DWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	dd 88 90 90 90 90[ 	]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
+[ 	]*[0-9a-f]+:	dd 88 90 90 90 90[ 	]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	66 0f 7c 65 00[ 	]+haddpd xmm4,(XMMWORD PTR )?\[ebp(\+0x0)\]
 [ 	]*[0-9a-f]+:	66 0f 7c ee[ 	]+haddpd xmm5,xmm6
 [ 	]*[0-9a-f]+:	f2 0f 7c 37[ 	]+haddps xmm6,(XMMWORD PTR )?\[edi\]
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -44,16 +44,21 @@ Disassembly of section .text:
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	db 55 00             	fistl  0x0\(%rbp\)
  +[a-f0-9]+:	df 55 00             	fists  0x0\(%rbp\)
+ +[a-f0-9]+:	db 5d 00             	fistpl 0x0\(%rbp\)
+ +[a-f0-9]+:	df 5d 00             	fistps 0x0\(%rbp\)
+ +[a-f0-9]+:	df 7d 00             	fistpll 0x0\(%rbp\)
  +[a-f0-9]+:	db 45 00             	fildl  0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	df 45 00             	filds  0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
+ +[a-f0-9]+:	df 6d 00             	fildll 0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	9b dd 75 00          	fsave  0x0\(%rbp\)
  +[a-f0-9]+:	dd 65 00             	frstor 0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
- +[a-f0-9]+:	df 45 00             	filds  0x0\(%rbp\)
- +[a-f0-9]+:	0f ae e8             	lfence
+ +[a-f0-9]+:	db 4d 00             	fisttpl 0x0\(%rbp\)
  +[a-f0-9]+:	df 4d 00             	fisttps 0x0\(%rbp\)
+ +[a-f0-9]+:	dd 4d 00             	fisttpll 0x0\(%rbp\)
  +[a-f0-9]+:	d9 65 00             	fldenv 0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	9b d9 75 00          	fstenv 0x0\(%rbp\)
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -27,12 +27,17 @@ _start:
 	flds (%rbp)
 	fistl (%rbp)
 	fists (%rbp)
+	fistpl (%rbp)
+	fistps (%rbp)
+	fistpq (%rbp)
 	fildl (%rbp)
 	filds (%rbp)
+	fildq (%rbp)
 	fsave (%rbp)
 	frstor (%rbp)
-	filds (%rbp)
+	fisttpl (%rbp)
 	fisttps (%rbp)
+	fisttpq (%rbp)
 	fldenv (%rbp)
 	fstenv (%rbp)
 	fadds  (%rbp)
--- a/gas/testsuite/gas/i386/x86-64-sse3.d
+++ b/gas/testsuite/gas/i386/x86-64-sse3.d
@@ -13,6 +13,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 88 90 90 90 00 [ 	]*fisttps 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	db 88 90 90 90 00 [ 	]*fisttpl 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	dd 88 90 90 90 00 [ 	]*fisttpll 0x909090\(%rax\)
+[ 	]*[a-f0-9]+:	dd 88 90 90 90 00 [ 	]*fisttpll 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%rbp\),%xmm4
 [ 	]*[a-f0-9]+:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
 [ 	]*[a-f0-9]+:	f2 0f 7c 37 [ 	]*haddps \(%rdi\),%xmm6
--- a/gas/testsuite/gas/i386/x86-64-sse3.s
+++ b/gas/testsuite/gas/i386/x86-64-sse3.s
@@ -8,6 +8,7 @@ foo:
 	addsubps	%xmm4,%xmm3
 	fisttps		0x909090(%rax)
 	fisttpl		0x909090(%rax)
+	fisttpq		0x909090(%rax)
 	fisttpll	0x909090(%rax)
 	haddpd		0x0(%rbp),%xmm4
 	haddpd		%xmm6,%xmm5
--- a/gas/testsuite/gas/i386/x86-64-sse3-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-sse3-intel.d
@@ -14,6 +14,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 88 90 90 90 00[ 	]+fisttp WORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	db 88 90 90 90 00[ 	]+fisttp DWORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	dd 88 90 90 90 00[ 	]+fisttp QWORD PTR \[rax\+0x909090\]
+[ 	]*[a-f0-9]+:	dd 88 90 90 90 00[ 	]+fisttp QWORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	66 0f 7c 65 00[ 	]+haddpd xmm4,(XMMWORD PTR )?\[rbp(\+0x0)\]
 [ 	]*[a-f0-9]+:	66 0f 7c ee[ 	]+haddpd xmm5,xmm6
 [ 	]*[a-f0-9]+:	f2 0f 7c 37[ 	]+haddps xmm6,(XMMWORD PTR )?\[rdi\]



* Re: [PATCH 1/7] x86/Intel: restrict suffix derivation
  2022-08-16  7:30 ` [PATCH 1/7] x86/Intel: restrict suffix derivation Jan Beulich
@ 2022-08-17 19:19   ` H.J. Lu
  2022-08-18  6:07     ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2022-08-17 19:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Aug 16, 2022 at 12:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> While in some cases deriving an AT&T-style suffix from an Intel syntax
> memory operand size specifier is necessary, in many cases this is not
> only pointless, but has led to the introduction of various workarounds:
> Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
> ToQword attributes. Suppress suffix derivation when we can clearly tell
> that the memory operand's size isn't going to be needed to infer the
> possible need for the low byte/word opcode bit or an operand size prefix
> (0x66 or REX.W).
>
> As a result ToDword and ToQword can be dropped entirely, plus a fair
> number of IgnoreSize and NoRex64 can also be got rid of. Note that
> IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
> operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
> 16-bit code using SIMD insns isn't well tested, clone an existing
> testcase just enough to cover a few insns which are potentially
> problematic but are being touched here.)
>
> As a side effect of folding the VCVT{,T}S{S,D,H}2SI templates,
> VCVT{,T}SH2SI will now allow L and Q suffixes, consistent with
> VCVT{,T}S{S,D}2SI. All of these remain inconsistent with their 2USI
> counterparts (which I think should also be corrected, but perhaps better
> in a separate change).

I don't think allowing more unnecessary L and Q suffixes for AVX
instructions is desirable.   I prefer not to allow unnecessary L and
Q suffixes in folded entries.   We can add special entries to allow
the existing instructions with suffixes.
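The core idea of the quoted patch can be sketched as follows: derive a
suffix from an Intel size specifier only when at least one candidate
template accepts that suffix. struct tmpl_sufs and infer_suffix() are
hypothetical. The real loop in i386_intel_operand() also skips
templates based on ModR/M use, swappable operands and operand class
(MMX/SIMD/mask registers), and this sketch leaves all of that out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-template flags, loosely modelled on the No_?Suf bits.  */
struct tmpl_sufs
{
  bool no_bsuf, no_wsuf, no_lsuf, no_qsuf;
};

/* Return SUFFIX ('b', 'w', 'l' or 'q') if any of the N candidate
   templates accepts it, else 0, meaning "don't derive a suffix".  */
static char
infer_suffix (char suffix, const struct tmpl_sufs *start, size_t n)
{
  for (size_t k = 0; k < n; ++k)
    {
      const struct tmpl_sufs *t = &start[k];
      bool rejected = (suffix == 'b' && t->no_bsuf)
                      || (suffix == 'w' && t->no_wsuf)
                      || (suffix == 'l' && t->no_lsuf)
                      || (suffix == 'q' && t->no_qsuf);

      if (!rejected)
        return suffix;
    }
  return 0;
}
```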

> ---
> Long term, suffix derivation should be dropped altogether, not least so
> that bogus error messages like "incorrect register `...' used with
> `...' suffix" no longer mislead people when no suffix was used at
> all.
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -7071,42 +7071,22 @@ process_suffix (void)
>         }
>        else if (i.suffix == BYTE_MNEM_SUFFIX)
>         {
> -         if (intel_syntax
> -             && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
> -             && i.tm.opcode_modifier.no_bsuf)
> -           i.suffix = 0;
> -         else if (!check_byte_reg ())
> +         if (!check_byte_reg ())
>             return 0;
>         }
>        else if (i.suffix == LONG_MNEM_SUFFIX)
>         {
> -         if (intel_syntax
> -             && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
> -             && i.tm.opcode_modifier.no_lsuf
> -             && !i.tm.opcode_modifier.todword
> -             && !i.tm.opcode_modifier.toqword)
> -           i.suffix = 0;
> -         else if (!check_long_reg ())
> +         if (!check_long_reg ())
>             return 0;
>         }
>        else if (i.suffix == QWORD_MNEM_SUFFIX)
>         {
> -         if (intel_syntax
> -             && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
> -             && i.tm.opcode_modifier.no_qsuf
> -             && !i.tm.opcode_modifier.todword
> -             && !i.tm.opcode_modifier.toqword)
> -           i.suffix = 0;
> -         else if (!check_qword_reg ())
> +         if (!check_qword_reg ())
>             return 0;
>         }
>        else if (i.suffix == WORD_MNEM_SUFFIX)
>         {
> -         if (intel_syntax
> -             && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
> -             && i.tm.opcode_modifier.no_wsuf)
> -           i.suffix = 0;
> -         else if (!check_word_reg ())
> +         if (!check_word_reg ())
>             return 0;
>         }
>        else if (intel_syntax
> @@ -7566,20 +7546,9 @@ check_long_reg (void)
>                  || i.tm.operand_types[op].bitfield.instance == Accum)
>              && i.tm.operand_types[op].bitfield.dword)
>        {
> -       if (intel_syntax
> -           && i.tm.opcode_modifier.toqword
> -           && i.types[0].bitfield.class != RegSIMD)
> -         {
> -           /* Convert to QWORD.  We want REX byte. */
> -           i.suffix = QWORD_MNEM_SUFFIX;
> -         }
> -       else
> -         {
> -           as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
> -                   register_prefix, i.op[op].regs->reg_name,
> -                   i.suffix);
> -           return 0;
> -         }
> +       as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
> +               register_prefix, i.op[op].regs->reg_name, i.suffix);
> +       return 0;
>        }
>    return 1;
>  }
> @@ -7617,20 +7586,9 @@ check_qword_reg (void)
>        {
>         /* Prohibit these changes in the 64bit mode, since the
>            lowering is more complicated.  */
> -       if (intel_syntax
> -           && i.tm.opcode_modifier.todword
> -           && i.types[0].bitfield.class != RegSIMD)
> -         {
> -           /* Convert to DWORD.  We don't want REX byte. */
> -           i.suffix = LONG_MNEM_SUFFIX;
> -         }
> -       else
> -         {
> -           as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
> -                   register_prefix, i.op[op].regs->reg_name,
> -                   i.suffix);
> -           return 0;
> -         }
> +       as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
> +               register_prefix, i.op[op].regs->reg_name, i.suffix);
> +       return 0;
>        }
>    return 1;
>  }
> @@ -7670,14 +7628,6 @@ check_word_reg (void)
>                 i.suffix);
>         return 0;
>        }
> -    /* For some instructions need encode as EVEX.W=1 without explicit VexW1. */
> -    else if (i.types[op].bitfield.qword
> -            && intel_syntax
> -            && i.tm.opcode_modifier.toqword)
> -      {
> -         /* Convert to QWORD.  We want EVEX.W byte. */
> -         i.suffix = QWORD_MNEM_SUFFIX;
> -      }
>    return 1;
>  }
>
> --- a/gas/config/tc-i386-intel.c
> +++ b/gas/config/tc-i386-intel.c
> @@ -790,9 +790,83 @@ i386_intel_operand (char *operand_string
>           break;
>         }
>
> +      /* Now check whether we actually want to infer an AT&T-like suffix.
> +        We really only need to do this when operand size determination (incl.
> +        REX.W) is going to be derived from it.  For this we check whether the
> +        given suffix is valid for any of the candidate templates.  */
> +      if (suffix && suffix != i.suffix
> +         && (current_templates->start->opcode_modifier.opcodespace != SPACE_BASE
> +             || current_templates->start->base_opcode != 0x62 /* bound */))
> +       {
> +         const insn_template *t;
> +
> +         for (t = current_templates->start; t < current_templates->end; ++t)
> +           {
> +             /* Operands haven't been swapped yet.  */
> +             unsigned int op = t->operands - 1 - this_operand;
> +
> +             /* Easy checks to skip templates which won't match anyway.  */
> +             if (this_operand >= t->operands || t->opcode_modifier.attsyntax)
> +               continue;
> +
> +             switch (suffix)
> +               {
> +               case BYTE_MNEM_SUFFIX:
> +                 if (t->opcode_modifier.no_bsuf)
> +                   continue;
> +                 break;
> +               case WORD_MNEM_SUFFIX:
> +                 if (t->opcode_modifier.no_wsuf)
> +                   continue;
> +                 break;
> +               case LONG_MNEM_SUFFIX:
> +                 if (t->opcode_modifier.no_lsuf)
> +                   continue;
> +                 break;
> +               case QWORD_MNEM_SUFFIX:
> +                 if (t->opcode_modifier.no_qsuf)
> +                   continue;
> +                 break;
> +               case SHORT_MNEM_SUFFIX:
> +                 if (t->opcode_modifier.no_ssuf)
> +                   continue;
> +                 break;
> +               case LONG_DOUBLE_MNEM_SUFFIX:
> +                 if (t->opcode_modifier.no_ldsuf)
> +                   continue;
> +                 break;
> +               default:
> +                 abort ();
> +               }
> +
> +             /* In a few cases suffixes are permitted, but we can nevertheless
> +                derive that these aren't going to be needed.  This is only of
> +                interest for insns using ModR/M, plus we can skip templates with
> +                swappable operands here (simplifying subsequent logic).  */
> +             if (!t->opcode_modifier.modrm || t->opcode_modifier.d)
> +               break;
> +
> +             if (!t->operand_types[op].bitfield.baseindex)
> +               continue;
> +
> +             switch (t->operand_types[op].bitfield.class)
> +               {
> +               case RegMMX:
> +               case RegSIMD:
> +               case RegMask:
> +                 continue;
> +               }
> +
> +             break;
> +           }
> +
> +         if (t == current_templates->end)
> +           suffix = 0;
> +       }
> +
>        if (!i.suffix)
>         i.suffix = suffix;
> -      else if (i.suffix != suffix)
> +      else if (suffix && i.suffix != suffix)
>         {
>           as_bad (_("conflicting operand size modifiers"));
>           return 0;
> --- a/gas/testsuite/gas/i386/i386.exp
> +++ b/gas/testsuite/gas/i386/i386.exp
> @@ -169,6 +169,7 @@ if [gas_32_check] then {
>      run_dump_test "simd"
>      run_dump_test "simd-intel"
>      run_dump_test "simd-suffix"
> +    run_dump_test "simd16"
>      run_dump_test "mem"
>      run_dump_test "mem-intel"
>      run_dump_test "reg"
> --- a/gas/testsuite/gas/i386/simd.s
> +++ b/gas/testsuite/gas/i386/simd.s
> @@ -1,5 +1,6 @@
>         .text
>  _start:
> +       .ifndef use16
>         addsubps 0x12345678,%xmm1
>         comisd 0x12345678,%xmm1
>         comiss 0x12345678,%xmm1
> @@ -31,6 +32,7 @@ _start:
>         punpcklqdq 0x12345678,%xmm1
>         ucomisd 0x12345678,%xmm1
>         ucomiss 0x12345678,%xmm1
> +       .endif
>
>         cmpeqsd (%eax),%xmm0
>         cmpeqss (%eax),%xmm0
> @@ -101,6 +103,7 @@ cmpsd       $0x10,(%eax),%xmm7
>
>         .intel_syntax noprefix
>
> +       .ifndef use16
>  addsubps xmm1,XMMWORD PTR ds:0x12345678
>  comisd xmm1,QWORD PTR ds:0x12345678
>  comiss xmm1,DWORD PTR ds:0x12345678
> @@ -132,6 +135,8 @@ punpcklwd xmm1,XMMWORD PTR ds:0x12345678
>  punpcklqdq xmm1,XMMWORD PTR ds:0x12345678
>  ucomisd xmm1,QWORD PTR ds:0x12345678
>  ucomiss xmm1,DWORD PTR ds:0x12345678
> +       .endif
> +
>  cmpeqsd xmm0,QWORD PTR [eax]
>  cmpeqss xmm0,DWORD PTR [eax]
>  cvtpi2pd xmm0,QWORD PTR [eax]
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/simd16.d
> @@ -0,0 +1,137 @@
> +#as: --defsym use16=1 -I${srcdir}/$subdir
> +#objdump: -dw -Mi8086
> +#name: i386 SIMD (16-bit)
> +
> +.*: +file format .*
> +
> +Disassembly of section .text:
> +
> +0+ <_start>:
> +[      ]*[a-f0-9]+:    67 f2 0f c2 00 00       cmpeqsd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f c2 00 00       cmpeqss \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 2a 00          cvtpi2pd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 0f 2a 00             cvtpi2ps \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 0f 2d 00             cvtps2pi \(%eax\),%mm0
> +[      ]*[a-f0-9]+:    67 f2 0f 2d 00          cvtsd2si \(%eax\),%eax
> +[      ]*[a-f0-9]+:    67 f2 0f 2c 00          cvttsd2si \(%eax\),%eax
> +[      ]*[a-f0-9]+:    67 f2 0f 5a 00          cvtsd2ss \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5a 00          cvtss2sd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 2d 00          cvtss2si \(%eax\),%eax
> +[      ]*[a-f0-9]+:    67 f3 0f 2c 00          cvttss2si \(%eax\),%eax
> +[      ]*[a-f0-9]+:    67 f2 0f 5e 00          divsd  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5e 00          divss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f2 0f 5f 00          maxsd  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5f 00          maxss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5d 00          minss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5d 00          minss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f2 0f 2b 00          movntsd %xmm0,\(%eax\)
> +[      ]*[a-f0-9]+:    67 f3 0f 2b 00          movntss %xmm0,\(%eax\)
> +[      ]*[a-f0-9]+:    67 f2 0f 10 00          movsd  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f2 0f 11 00          movsd  %xmm0,\(%eax\)
> +[      ]*[a-f0-9]+:    67 f3 0f 10 00          movss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 11 00          movss  %xmm0,\(%eax\)
> +[      ]*[a-f0-9]+:    67 f2 0f 59 00          mulsd  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 59 00          mulss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 53 00          rcpss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 3a 0b 00 00    roundsd \$0x0,\(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 3a 0a 00 00    roundss \$0x0,\(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 52 00          rsqrtss \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f2 0f 51 00          sqrtsd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 51 00          sqrtss \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f2 0f 5c 00          subsd  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5c 00          subss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 20 00       pmovsxbw \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 21 00       pmovsxbd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 22 00       pmovsxbq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 23 00       pmovsxwd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 24 00       pmovsxwq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 25 00       pmovsxdq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 30 00       pmovzxbw \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 31 00       pmovzxbd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 32 00       pmovzxbq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 33 00       pmovzxwd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 34 00       pmovzxwq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 35 00       pmovzxdq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 3a 21 00 00    insertps \$0x0,\(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 15 08          unpckhpd \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 0f 15 08             unpckhps \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 66 0f 14 08          unpcklpd \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 0f 14 08             unpcklps \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    f3 0f c2 f7 10          cmpss  \$0x10,%xmm7,%xmm6
> +[      ]*[a-f0-9]+:    67 f3 0f c2 38 10       cmpss  \$0x10,\(%eax\),%xmm7
> +[      ]*[a-f0-9]+:    f2 0f c2 f7 10          cmpsd  \$0x10,%xmm7,%xmm6
> +[      ]*[a-f0-9]+:    67 f2 0f c2 38 10       cmpsd  \$0x10,\(%eax\),%xmm7
> +[      ]*[a-f0-9]+:    f3 0f 2a c8             cvtsi2ss %eax,%xmm1
> +[      ]*[a-f0-9]+:    f2 0f 2a c8             cvtsi2sd %eax,%xmm1
> +[      ]*[a-f0-9]+:    f3 0f 2a c8             cvtsi2ss %eax,%xmm1
> +[      ]*[a-f0-9]+:    f2 0f 2a c8             cvtsi2sd %eax,%xmm1
> +[      ]*[a-f0-9]+:    67 f3 0f 2a 08          cvtsi2ss \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 f2 0f 2a 08          cvtsi2sd \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 f3 0f 2a 08          cvtsi2ss \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 f2 0f 2a 08          cvtsi2sd \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 f2 0f c2 00 00       cmpeqsd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f c2 00 00       cmpeqss \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 2a 00          cvtpi2pd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 0f 2a 00             cvtpi2ps \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 0f 2d 00             cvtps2pi \(%eax\),%mm0
> +[      ]*[a-f0-9]+:    67 f2 0f 2d 00          cvtsd2si \(%eax\),%eax
> +[      ]*[a-f0-9]+:    67 f2 0f 2c 00          cvttsd2si \(%eax\),%eax
> +[      ]*[a-f0-9]+:    67 f2 0f 5a 00          cvtsd2ss \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5a 00          cvtss2sd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 2d 00          cvtss2si \(%eax\),%eax
> +[      ]*[a-f0-9]+:    67 f3 0f 2c 00          cvttss2si \(%eax\),%eax
> +[      ]*[a-f0-9]+:    67 f2 0f 5e 00          divsd  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5e 00          divss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f2 0f 5f 00          maxsd  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5f 00          maxss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5d 00          minss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5d 00          minss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f2 0f 2b 00          movntsd %xmm0,\(%eax\)
> +[      ]*[a-f0-9]+:    67 f3 0f 2b 00          movntss %xmm0,\(%eax\)
> +[      ]*[a-f0-9]+:    67 f2 0f 10 00          movsd  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f2 0f 11 00          movsd  %xmm0,\(%eax\)
> +[      ]*[a-f0-9]+:    67 f3 0f 10 00          movss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 11 00          movss  %xmm0,\(%eax\)
> +[      ]*[a-f0-9]+:    67 f2 0f 59 00          mulsd  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 59 00          mulss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 53 00          rcpss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 3a 0b 00 00    roundsd \$0x0,\(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 3a 0a 00 00    roundss \$0x0,\(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 52 00          rsqrtss \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f2 0f 51 00          sqrtsd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 51 00          sqrtss \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f2 0f 5c 00          subsd  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 f3 0f 5c 00          subss  \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 20 00       pmovsxbw \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 21 00       pmovsxbd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 22 00       pmovsxbq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 23 00       pmovsxwd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 24 00       pmovsxwq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 25 00       pmovsxdq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 30 00       pmovzxbw \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 31 00       pmovzxbd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 32 00       pmovzxbq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 33 00       pmovzxwd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 34 00       pmovzxwq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 38 35 00       pmovzxdq \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 3a 21 00 00    insertps \$0x0,\(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 15 00          unpckhpd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 0f 15 00             unpckhps \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 66 0f 14 00          unpcklpd \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    67 0f 14 00             unpcklps \(%eax\),%xmm0
> +[      ]*[a-f0-9]+:    f3 0f c2 f7 10          cmpss  \$0x10,%xmm7,%xmm6
> +[      ]*[a-f0-9]+:    67 f3 0f c2 38 10       cmpss  \$0x10,\(%eax\),%xmm7
> +[      ]*[a-f0-9]+:    f2 0f c2 f7 10          cmpsd  \$0x10,%xmm7,%xmm6
> +[      ]*[a-f0-9]+:    67 f2 0f c2 38 10       cmpsd  \$0x10,\(%eax\),%xmm7
> +[      ]*[a-f0-9]+:    f3 0f 2a c8             cvtsi2ss %eax,%xmm1
> +[      ]*[a-f0-9]+:    f2 0f 2a c8             cvtsi2sd %eax,%xmm1
> +[      ]*[a-f0-9]+:    f3 0f 2a c8             cvtsi2ss %eax,%xmm1
> +[      ]*[a-f0-9]+:    f2 0f 2a c8             cvtsi2sd %eax,%xmm1
> +[      ]*[a-f0-9]+:    67 f3 0f 2a 08          cvtsi2ss \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 f3 0f 2a 08          cvtsi2ss \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 f2 0f 2a 08          cvtsi2sd \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 f2 0f 2a 08          cvtsi2sd \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 f3 0f 2a 08          cvtsi2ss \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 f2 0f 2a 08          cvtsi2sd \(%eax\),%xmm1
> +[      ]*[a-f0-9]+:    67 0f 2c 00             cvttps2pi \(%eax\),%mm0
> +#pass
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/simd16.s
> @@ -0,0 +1,2 @@
> +       .code16
> +       .include "simd.s"
> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -706,8 +706,6 @@ static bitfield opcode_modifiers[] =
>    BITFIELD (RegKludge),
>    BITFIELD (Implicit1stXmm0),
>    BITFIELD (PrefixOk),
> -  BITFIELD (ToDword),
> -  BITFIELD (ToQword),
>    BITFIELD (AddrPrefixOpReg),
>    BITFIELD (IsPrefix),
>    BITFIELD (ImmExt),
> --- a/opcodes/i386-opc.h
> +++ b/opcodes/i386-opc.h
> @@ -521,10 +521,6 @@ enum
>  #define PrefixHLELock          5 /* Okay with a LOCK prefix.  */
>  #define PrefixHLEAny           6 /* Okay with or without a LOCK prefix.  */
>    PrefixOk,
> -  /* Convert to DWORD */
> -  ToDword,
> -  /* Convert to QWORD */
> -  ToQword,
>    /* Address prefix changes register operand */
>    AddrPrefixOpReg,
>    /* opcode is a prefix */
> @@ -740,8 +736,6 @@ typedef struct i386_opcode_modifier
>    unsigned int regkludge:1;
>    unsigned int implicit1stxmm0:1;
>    unsigned int prefixok:3;
> -  unsigned int todword:1;
> -  unsigned int toqword:1;
>    unsigned int addrprefixopreg:1;
>    unsigned int isprefix:1;
>    unsigned int immext:1;
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -970,18 +970,18 @@ pause, 0xf390, None, Cpu186, No_bSuf|No_
>  <mmx:cpu:pfx:attr:shimm:reg:mem, +
>      $avx:CpuAVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:Vex128|VexVVVV=2|VexW0|SSE2AVX:RegXMM:Xmmword, +
>      $sse:CpuSSE2:66:::RegXMM:Xmmword, +
> -    $mmx:CpuMMX::NoRex64::RegMMX:Qword>
> +    $mmx:CpuMMX::::RegMMX:Qword>
>
>  <sse2:cpu:attr:scal:vvvv:shimm, +
>      $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV:Vex128|VexVVVV=2|VexW0|SSE2AVX, +
> -    $sse:CpuSSE2::NoRex64::>
> +    $sse:CpuSSE2::::>
>
>  <bw:opc:vexw:elem:kcpu:kpfx:cpubmi, +
>      b:0:VexW0:Byte:CpuAVX512DQ:66:CpuAVX512VBMI, +
>      w:1:VexW1:Word:CpuAVX512F::CpuAVX512BW>
>
>  <dq:opc:vexw:vexw64:elem:cpu64:gpr:kpfx, +
> -    d:0:VexW0:IgnoreSize:Dword::Reg32:66, +
> +    d:0:VexW0::Dword::Reg32:66, +
>      q:1:VexW1:VexW1:Qword:Cpu64:Reg64:>
>
>  emms, 0xf77, None, CpuMMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
> @@ -989,7 +989,7 @@ emms, 0xf77, None, CpuMMX, No_bSuf|No_wS
>  // copying between Reg64/Mem64 and RegXMM/RegMMX, as is mandated by Intel's
>  // spec). AMD's spec, having been in existence for much longer, failed to
>  // recognize that and specified movd for 32- and 64-bit operations.
> -movd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Reg32|Unspecified|BaseIndex, RegXMM }
> +movd, 0x666e, None, CpuAVX, D|Modrm|Vex128|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Reg32|Unspecified|BaseIndex, RegXMM }
>  movd, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|BaseIndex, RegXMM }
>  movd, 0x660f6e, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
>  movd, 0x660f6e, None, CpuSSE2|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|BaseIndex, RegXMM }
> @@ -998,10 +998,10 @@ movd, 0xf6e, None, CpuMMX|Cpu64, D|Modrm
>  movq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  movq, 0x66d6, None, CpuAVX, Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
>  movq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|Unspecified|BaseIndex, RegXMM }
> -movq, 0xf30f7e, None, CpuSSE2, Load|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Unspecified|Qword|BaseIndex|RegXMM, RegXMM }
> -movq, 0x660fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { RegXMM, Unspecified|Qword|BaseIndex|RegXMM }
> +movq, 0xf30f7e, None, CpuSSE2, Load|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|Qword|BaseIndex|RegXMM, RegXMM }
> +movq, 0x660fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Unspecified|Qword|BaseIndex|RegXMM }
>  movq, 0x660f6e, None, CpuSSE2|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
> -movq, 0xf6f, None, CpuMMX, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Unspecified|Qword|BaseIndex|RegMMX, RegMMX }
> +movq, 0xf6f, None, CpuMMX, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|Qword|BaseIndex|RegMMX, RegMMX }
>  movq, 0xf6e, None, CpuMMX|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegMMX }
>  packssdw<mmx>, 0x<mmx:pfx>0f6b, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  packsswb<mmx>, 0x<mmx:pfx>0f63, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
> @@ -1009,7 +1009,7 @@ packuswb<mmx>, 0x<mmx:pfx>0f67, None, <m
>  padd<bw><mmx>, 0x<mmx:pfx>0ffc | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  paddd<mmx>, 0x<mmx:pfx>0ffe, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  paddq<sse2>, 0x660fd4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -paddq, 0xfd4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +paddq, 0xfd4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  padds<bw><mmx>, 0x<mmx:pfx>0fec | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  paddus<bw><mmx>, 0x<mmx:pfx>0fdc | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  pand<mmx>, 0x<mmx:pfx>0fdb, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
> @@ -1037,25 +1037,25 @@ psrl<dq><mmx>, 0x<mmx:pfx>0f72 | <dq:opc
>  psub<bw><mmx>, 0x<mmx:pfx>0ff8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  psubd<mmx>, 0x<mmx:pfx>0ffa, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  psubq<sse2>, 0x660ffb, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -psubq, 0xffb, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +psubq, 0xffb, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  psubs<bw><mmx>, 0x<mmx:pfx>0fe8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  psubus<bw><mmx>, 0x<mmx:pfx>0fd8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  punpckhbw<mmx>, 0x<mmx:pfx>0f68, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  punpckhwd<mmx>, 0x<mmx:pfx>0f69, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  punpckhdq<mmx>, 0x<mmx:pfx>0f6a, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>  punpcklbw<sse2>, 0x660f60, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -punpcklbw, 0xf60, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +punpcklbw, 0xf60, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  punpcklwd<sse2>, 0x660f61, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -punpcklwd, 0xf61, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +punpcklwd, 0xf61, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  punpckldq<sse2>, 0x660f62, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -punpckldq, 0xf62, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +punpckldq, 0xf62, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  pxor<mmx>, 0x<mmx:pfx>0fef, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
>
>  // SSE instructions.
>
>  <sse:cpu:attr:scal:vvvv, +
>      $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, +
> -    $sse:CpuSSE::IgnoreSize:>
> +    $sse:CpuSSE:::>
>  <frel:imm:comm, eq:0:C, lt:1:, le:2:, unord:3:C, neq:4:C, nlt:5:, nle:6:, ord:7:C>
>
>  addps<sse>, 0x0f58, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> @@ -1067,21 +1067,21 @@ cmp<frel>ss<sse>, 0xf30fc2, <frel:imm>,
>  cmpps<sse>, 0x0fc2, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
>  cmpss<sse>, 0xf30fc2, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  comiss<sse>, 0x0f2f, None, <sse:cpu>, Modrm|<sse:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -cvtpi2ps, 0xf2a, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegXMM }
> -cvtps2pi, 0xf2d, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
> +cvtpi2ps, 0xf2a, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegXMM }
> +cvtps2pi, 0xf2d, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
>  cvtsi2ss<sse>, 0xf30f2a, None, <sse:cpu>|CpuNo64, Modrm|<sse:scal>|<sse:vvvv>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
>  cvtsi2ss, 0xf32a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
>  cvtsi2ss, 0xf32a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
>  cvtsi2ss, 0xf30f2a, None, CpuSSE|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
>  cvtsi2ss, 0xf30f2a, None, CpuSSE|Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
> -cvtss2si, 0xf32d, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> -cvtss2si, 0xf30f2d, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> -cvttps2pi, 0xf2c, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
> -cvttss2si, 0xf32c, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> -cvttss2si, 0xf30f2c, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> +cvtss2si, 0xf32d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> +cvtss2si, 0xf30f2d, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> +cvttps2pi, 0xf2c, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
> +cvttss2si, 0xf32c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> +cvttss2si, 0xf30f2c, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
>  divps<sse>, 0x0f5e, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  divss<sse>, 0xf30f5e, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -ldmxcsr<sse>, 0x0fae, 2, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
> +ldmxcsr<sse>, 0x0fae, 2, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
>  maskmovq, 0xff7, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegMMX }
>  maxps<sse>, 0x0f5f, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  maxss<sse>, 0xf30f5f, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> @@ -1089,51 +1089,51 @@ minps<sse>, 0x0f5d, None, <sse:cpu>, Mod
>  minss<sse>, 0xf30f5d, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  movaps<sse>, 0x0f28, None, <sse:cpu>, D|Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  movhlps<sse>, 0x0f12, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
> -movhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> -movhps, 0x17, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
> -movhps, 0xf16, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
> +movhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> +movhps, 0x17, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
> +movhps, 0xf16, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
>  movlhps<sse>, 0x0f16, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
> -movlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> -movlps, 0x13, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
> -movlps, 0xf12, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
> +movlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> +movlps, 0x13, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
> +movlps, 0xf12, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
>  movmskps<sse>, 0x0f50, None, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
>  movntps<sse>, 0x0f2b, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
> -movntq, 0xfe7, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, Qword|Unspecified|BaseIndex }
> +movntq, 0xfe7, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, Qword|Unspecified|BaseIndex }
>  movntdq<sse2>, 0x660fe7, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
> -movss, 0xf310, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex, RegXMM }
> +movss, 0xf310, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex, RegXMM }
>  movss, 0xf310, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, RegXMM }
> -movss, 0xf30f10, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +movss, 0xf30f10, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  movups<sse>, 0x0f10, None, <sse:cpu>, D|Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  mulps<sse>, 0x0f59, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  mulss<sse>, 0xf30f59, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  orps<sse>, 0x0f56, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -pavg<bw>, 0xfe0 | (3 * <bw:opc>), None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pavg<bw>, 0xfe0 | (3 * <bw:opc>), None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  pavg<bw><sse2>, 0x660fe0 | (3 * <bw:opc>), None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  pextrw<sse2>, 0x660fc5, None, <sse2:cpu>, Load|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, RegXMM, Reg32|Reg64 }
>  pextrw, 0xfc5, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }
>  pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, Reg32|Reg64, RegXMM }
> -pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, Word|Unspecified|BaseIndex, RegXMM }
> +pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegXMM }
>  pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { Imm8, Reg32|Reg64, RegMMX }
> -pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegMMX }
> +pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegMMX }
>  pmaxsw<sse2>, 0x660fee, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -pmaxsw, 0xfee, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pmaxsw, 0xfee, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  pmaxub<sse2>, 0x660fde, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -pmaxub, 0xfde, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pmaxub, 0xfde, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  pminsw<sse2>, 0x660fea, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -pminsw, 0xfea, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pminsw, 0xfea, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  pminub<sse2>, 0x660fda, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -pminub, 0xfda, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pminub, 0xfda, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  pmovmskb<sse2>, 0x660fd7, None, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
>  pmovmskb, 0xfd7, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegMMX, Reg32|Reg64 }
>  pmulhuw<sse2>, 0x660fe4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -pmulhuw, 0xfe4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pmulhuw, 0xfe4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  prefetchnta, 0xf18, 0, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
>  prefetcht0, 0xf18, 1, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
>  prefetcht1, 0xf18, 2, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
>  prefetcht2, 0xf18, 3, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
> -psadbw, 0xff6, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +psadbw, 0xff6, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  psadbw<sse2>, 0x660ff6, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -pshufw, 0xf70, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pshufw, 0xf70, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  rcpps<sse>, 0x0f53, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  rcpss<sse>, 0xf30f53, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  rsqrtps<sse>, 0x0f52, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> @@ -1142,7 +1142,7 @@ sfence, 0xfaef8, None, CpuSSE|Cpu3dnowA,
>  shufps<sse>, 0x0fc6, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
>  sqrtps<sse>, 0x0f51, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  sqrtss<sse>, 0xf30f51, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -stmxcsr<sse>, 0x0fae, 3, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
> +stmxcsr<sse>, 0x0fae, 3, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
>  subps<sse>, 0x0f5c, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  subss<sse>, 0xf30f5c, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  ucomiss<sse>, 0x0f2e, None, <sse:cpu>, Modrm|<sse:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> @@ -1161,9 +1161,9 @@ cmp<frel>sd<sse2>, 0xf20fc2, <frel:imm>,
>  cmppd<sse2>, 0x660fc2, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
>  cmpsd<sse2>, 0xf20fc2, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  comisd<sse2>, 0x660f2f, None, <sse2:cpu>, Modrm|<sse2:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { RegMMX, RegXMM }
> -cvtpi2pd, 0xf3e6, None, CpuAVX, Modrm|Vex|Space0F|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> -cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex, RegXMM }
> +cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegXMM }
> +cvtpi2pd, 0xf3e6, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> +cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
>  cvtsi2sd<sse2>, 0xf20f2a, None, <sse2:cpu>|CpuNo64, Modrm|IgnoreSize|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
>  cvtsi2sd, 0xf22a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
>  cvtsi2sd, 0xf22a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
> @@ -1176,17 +1176,17 @@ maxsd<sse2>, 0xf20f5f, None, <sse2:cpu>,
>  minpd<sse2>, 0x660f5d, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  minsd<sse2>, 0xf20f5d, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  movapd<sse2>, 0x660f28, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -movhpd, 0x6616, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> -movhpd, 0x6617, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
> -movhpd, 0x660f16, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
> -movlpd, 0x6612, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> -movlpd, 0x6613, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
> -movlpd, 0x660f12, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
> +movhpd, 0x6616, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> +movhpd, 0x6617, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
> +movhpd, 0x660f16, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
> +movlpd, 0x6612, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> +movlpd, 0x6613, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
> +movlpd, 0x660f12, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
>  movmskpd<sse2>, 0x660f50, None, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
>  movntpd<sse2>, 0x660f2b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
> -movsd, 0xf210, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> +movsd, 0xf210, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
>  movsd, 0xf210, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, RegXMM }
> -movsd, 0xf20f10, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +movsd, 0xf20f10, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  movupd<sse2>, 0x660f10, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  mulpd<sse2>, 0x660f59, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  mulsd<sse2>, 0xf20f59, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> @@ -1200,21 +1200,21 @@ ucomisd<sse2>, 0x660f2e, None, <sse2:cpu
>  unpckhpd<sse2>, 0x660f15, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  unpcklpd<sse2>, 0x660f14, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  xorpd<sse2>, 0x660f57, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -cvtdq2pd<sse2>, 0xf30fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +cvtdq2pd<sse2>, 0xf30fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  cvtpd2dq<sse2>, 0xf20fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  cvtdq2ps<sse2>, 0x0f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  cvtpd2pi, 0x660f2d, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegMMX }
>  cvtpd2ps<sse2>, 0x660f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -cvtps2pd<sse2>, 0x0f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +cvtps2pd<sse2>, 0x0f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  cvtps2dq<sse2>, 0x660f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -cvtsd2si, 0xf22d, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> -cvtsd2si, 0xf20f2d, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> -cvtsd2ss<sse2>, 0xf20f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -cvtss2sd<sse2>, 0xf30f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +cvtsd2si, 0xf22d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> +cvtsd2si, 0xf20f2d, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> +cvtsd2ss<sse2>, 0xf20f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +cvtss2sd<sse2>, 0xf30f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
>
>  cvttpd2pi, 0x660f2c, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegMMX }
> -cvttsd2si, 0xf22c, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> -cvttsd2si, 0xf20f2c, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> +cvttsd2si, 0xf22c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> +cvttsd2si, 0xf20f2c, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
>  cvttpd2dq<sse2>, 0x660fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  cvttps2dq<sse2>, 0xf30f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  maskmovdqu<sse2>, 0x660ff7, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
> @@ -1223,7 +1223,7 @@ movdqu<sse2>, 0xf30f6f, None, <sse2:cpu>
>  movdq2q, 0xf20fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegMMX }
>  movq2dq, 0xf30fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegXMM }
>  pmuludq<sse2>, 0x660ff4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -pmuludq, 0xff4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pmuludq, 0xff4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>  pshufd<sse2>, 0x660f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
>  pshufhw<sse2>, 0xf30f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
>  pshuflw<sse2>, 0xf20f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
> @@ -1245,7 +1245,7 @@ haddps<sse3>, 0xf20f7c, None, <sse3:cpu>
>  hsubpd<sse3>, 0x660f7d, None, <sse3:cpu>, Modrm|<sse3:attr>|<sse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  hsubps<sse3>, 0xf20f7d, None, <sse3:cpu>, Modrm|<sse3:attr>|<sse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  lddqu<sse3>, 0xf20ff0, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegXMM }
> -movddup<sse3>, 0xf20f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +movddup<sse3>, 0xf20f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  movshdup<sse3>, 0xf30f16, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  movsldup<sse3>, 0xf30f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>
> @@ -1276,17 +1276,17 @@ mwait, 0xf01c9, None, CpuSSE3, CheckRegS
>  // VMX instructions.
>
>  vmcall, 0xf01c1, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
> -vmclear, 0x660fc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
> +vmclear, 0x660fc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
>  vmlaunch, 0xf01c2, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
>  vmresume, 0xf01c3, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
> -vmptrld, 0xfc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
> -vmptrst, 0xfc7, 7, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
> +vmptrld, 0xfc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
> +vmptrst, 0xfc7, 7, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
>  vmread, 0xf78, None, CpuVMX|CpuNo64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Reg32|Unspecified|BaseIndex }
>  vmread, 0xf78, None, CpuVMX|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Reg64, Reg64|Qword|Unspecified|BaseIndex }
>  vmwrite, 0xf79, None, CpuVMX|CpuNo64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32 }
>  vmwrite, 0xf79, None, CpuVMX|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Reg64|Qword|Unspecified|BaseIndex, Reg64 }
>  vmxoff, 0xf01c4, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
> -vmxon, 0xf30fc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
> +vmxon, 0xf30fc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
>
>  // VMFUNC instruction
>
> @@ -1313,7 +1313,7 @@ invpcid, 0x660f3882, None, CpuINVPCID|Cp
>  <ssse3:cpu:pfx:attr:vvvv:reg:mem, +
>      $avx:CpuAVX:66:Vex128|VexW0|SSE2AVX:VexVVVV:RegXMM:Xmmword, +
>      $sse:CpuSSSE3:66:::RegXMM:Xmmword, +
> -    $mmx:CpuSSSE3::NoRex64::RegMMX:Qword>
> +    $mmx:CpuSSSE3::::RegMMX:Qword>
>
>  phaddw<ssse3>, 0x<ssse3:pfx>0f3801, None, <ssse3:cpu>, Modrm|<ssse3:attr>|<ssse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <ssse3:reg>|<ssse3:mem>|Unspecified|BaseIndex, <ssse3:reg> }
>  phaddd<ssse3>, 0x<ssse3:pfx>0f3802, None, <ssse3:cpu>, Modrm|<ssse3:attr>|<ssse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <ssse3:reg>|<ssse3:mem>|Unspecified|BaseIndex, <ssse3:reg> }
> @@ -1333,7 +1333,7 @@ pabsd<ssse3>, 0x<ssse3:pfx>0f381e, None,
>  // SSE4.1 instructions.
>
>  <sse41:cpu:attr:scal:vvvv, $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, $sse:CpuSSE4_1:::>
> -<sd:ppfx:spfx:opc:vexw:elem:scal, s::f3:0:VexW0:Dword:IgnoreSize, d:66:f2:1:VexW1:Qword:NoRex64>
> +<sd:ppfx:spfx:opc:vexw:elem, s::f3:0:VexW0:Dword, d:66:f2:1:VexW1:Qword>
>
>  blendp<sd><sse41>, 0x660f3a0c | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
>  blendvp<sd>, 0x664a | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexW=1|VexSources=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
> @@ -1341,11 +1341,11 @@ blendvp<sd>, 0x664a | <sd:opc>, None, Cp
>  blendvp<sd>, 0x660f3814 | <sd:opc>, None, CpuSSE4_1, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
>  blendvp<sd>, 0x660f3814 | <sd:opc>, None, CpuSSE4_1, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  dpp<sd><sse41>, 0x660f3a40 | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
> -extractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
> +extractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
>  extractps, 0x6617, None, CpuAVX|Cpu64, RegMem|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg64 }
>  extractps, 0x660f3a17, None, CpuSSE4_1, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
>  extractps, 0x660f3a17, None, CpuSSE4_1|Cpu64, RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, RegXMM, Reg64 }
> -insertps<sse41>, 0x660f3a21, None, <sse41:cpu>, Modrm|IgnoreSize|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +insertps<sse41>, 0x660f3a21, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  movntdqa<sse41>, 0x660f382a, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegXMM }
>  mpsadbw<sse41>, 0x660f3a42, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
>  packusdw<sse41>, 0x660f382b, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> @@ -1356,7 +1356,7 @@ pblendvb, 0x660f3810, None, CpuSSE4_1, M
>  pblendw<sse41>, 0x660f3a0e, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
>  pcmpeqq<sse41>, 0x660f3829, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, RegMem|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, RegXMM, Reg32|Reg64 }
> -pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
> +pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
>  pextrd<sse41>, 0x660f3a16, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, RegXMM, Reg32|Unspecified|BaseIndex }
>  pextrq, 0x6616, None, CpuAVX|Cpu64, Modrm|Vex|Space0F3A|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg64|Unspecified|BaseIndex }
>  pextrq, 0x660f3a16, None, CpuSSE4_1|Cpu64, Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64|Unspecified|BaseIndex }
> @@ -1374,23 +1374,23 @@ pminsb<sse41>, 0x660f3838, None, <sse41:
>  pminsd<sse41>, 0x660f3839, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  pminud<sse41>, 0x660f383b, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  pminuw<sse41>, 0x660f383a, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> -pmovsxbw<sse41>, 0x660f3820, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovsxbd<sse41>, 0x660f3821, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovsxbq<sse41>, 0x660f3822, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovsxwd<sse41>, 0x660f3823, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovsxwq<sse41>, 0x660f3824, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovsxdq<sse41>, 0x660f3825, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovzxbw<sse41>, 0x660f3830, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovzxbd<sse41>, 0x660f3831, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovzxbq<sse41>, 0x660f3832, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovzxwd<sse41>, 0x660f3833, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovzxwq<sse41>, 0x660f3834, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> -pmovzxdq<sse41>, 0x660f3835, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovsxbw<sse41>, 0x660f3820, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovsxbd<sse41>, 0x660f3821, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovsxbq<sse41>, 0x660f3822, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovsxwd<sse41>, 0x660f3823, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovsxwq<sse41>, 0x660f3824, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovsxdq<sse41>, 0x660f3825, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovzxbw<sse41>, 0x660f3830, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovzxbd<sse41>, 0x660f3831, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovzxbq<sse41>, 0x660f3832, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovzxwd<sse41>, 0x660f3833, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovzxwq<sse41>, 0x660f3834, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +pmovzxdq<sse41>, 0x660f3835, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  pmuldq<sse41>, 0x660f3828, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  pmulld<sse41>, 0x660f3840, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  ptest<sse41>, 0x660f3817, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  roundp<sd><sse41>, 0x660f3a08 | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
> -rounds<sd><sse41>, 0x660f3a0a | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:scal>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<sd:scal>, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM }
> +rounds<sd><sse41>, 0x660f3a0a | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:scal>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM }
>
>  // SSE4.2 instructions.
>
> @@ -1484,8 +1484,8 @@ vandp<sd>, 0x<sd:ppfx>54, None, CpuAVX,
>  vblendp<sd>, 0x660c | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
>  vblendvp<sd>, 0x664a | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexW0|VexSources=2|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
>  vbroadcastf128, 0x661a, None, CpuAVX, Modrm|Vex=2|Space0F38|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegYMM }
> -vbroadcastsd, 0x6619, None, CpuAVX, Modrm|Vex=2|Space0F38|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegYMM }
> -vbroadcastss, 0x6618, None, CpuAVX, Modrm|Vex|Space0F38|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex, RegXMM|RegYMM }
> +vbroadcastsd, 0x6619, None, CpuAVX, Modrm|Vex256|Space0F38|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegYMM }
> +vbroadcastss, 0x6618, None, CpuAVX, Modrm|Vex128|Space0F38|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex, RegXMM|RegYMM }
>  vcmp<frel>p<sd>, 0x<sd:ppfx>c2, 0x<frel:imm>, CpuAVX, Modrm|<frel:comm>|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
>  vcmp<frel>s<sd>, 0x<sd:spfx>c2, 0x<frel:imm>, CpuAVX, Modrm|<frel:comm>|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
>  vcmpp<sd>, 0x<sd:ppfx>c2, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
> @@ -1499,22 +1499,20 @@ vcvtpd2ps<xy>, 0x665a, None, CpuAVX, Mod
>  vcvtps2dq, 0x665b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
>  vcvtps2pd, 0x5a, None, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
>  vcvtps2pd, 0x5a, None, CpuAVX, Modrm|Vex256|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegYMM }
> -vcvtsd2si, 0xf22d, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> +vcvts<sd>2si, 0x<sd:spfx>2d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
>  vcvtsd2ss, 0xf25a, None, CpuAVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
>  vcvtsi2s<sd>, 0x<sd:spfx>2a, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
>  vcvtsi2s<sd>, 0x<sd:spfx>2a, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
>  vcvtss2sd, 0xf35a, None, CpuAVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
> -vcvtss2si, 0xf32d, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
>  vcvttpd2dq<xy>, 0x66e6, None, CpuAVX, Modrm|<xy:vex>|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:syntax>, { <xy:dst>, RegXMM }
>  vcvttps2dq, 0xf35b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
> -vcvttsd2si, 0xf22c, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> -vcvttss2si, 0xf32c, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
> +vcvtts<sd>2si, 0x<sd:spfx>2c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
>  vdivp<sd>, 0x<sd:ppfx>5e, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
>  vdivs<sd>, 0x<sd:spfx>5e, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
>  vdppd, 0x6641, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
>  vdpps, 0x6640, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
>  vextractf128, 0x6619, None, CpuAVX, Modrm|Vex=2|Space0F3A|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegYMM, Unspecified|BaseIndex|RegXMM }
> -vextractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
> +vextractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
>  vextractps, 0x6617, None, CpuAVX|Cpu64, RegMem|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64 }
>  vhaddpd, 0x667c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
>  vhaddps, 0xf27c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
> @@ -1523,7 +1521,7 @@ vhsubps, 0xf27d, None, CpuAVX, Modrm|Vex
>  vinsertf128, 0x6618, None, CpuAVX, Modrm|Vex=2|Space0F3A|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
>  vinsertps, 0x6621, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
>  vlddqu, 0xf2f0, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM }
> -vldmxcsr, 0xae, 2, CpuAVX, Modrm|Vex128|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
> +vldmxcsr, 0xae, 2, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
>  vmaskmovdqu, 0x66f7, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
>  vmaskmovp<sd>, 0x662e | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
>  vmaskmovp<sd>, 0x662c | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> @@ -1537,26 +1535,26 @@ vmovap<sd>, 0x<sd:ppfx>28, None, CpuAVX,
>  // by Intel AVX spec).  To avoid extra template in gcc x86 backend and
>  // support assembler for AMD64, we accept 64bit operand on vmovd so
>  // that we can use one template for both SSE and AVX instructions.
> -vmovd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
> +vmovd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
>  vmovd, 0x667e, None, CpuAVX|Cpu64, D|RegMem|Vex=1|Space0F|VexW=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { RegXMM, Reg64 }
>  vmovddup, 0xf212, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  vmovddup, 0xf212, None, CpuAVX, Modrm|Vex=2|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM }
>  vmovdqa, 0x666f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
>  vmovdqu, 0xf36f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
>  vmovhlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
> -vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
> -vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
> +vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
> +vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
>  vmovlhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
> -vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
> -vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
> +vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
> +vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
>  vmovmskp<sd>, 0x<sd:ppfx>50, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { RegXMM|RegYMM, Reg32|Reg64 }
>  vmovntdq, 0x66e7, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
>  vmovntdqa, 0x662a, None, CpuAVX|CpuAVX2, Modrm|Vex|Space0F38|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM }
>  vmovntp<sd>, 0x<sd:ppfx>2b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
>  vmovq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  vmovq, 0x66d6, None, CpuAVX, Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
> -vmovq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
> -vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex, RegXMM }
> +vmovq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
> +vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex, RegXMM }
>  vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
>  vmovshdup, 0xf316, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
>  vmovsldup, 0xf312, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
> @@ -1692,7 +1690,7 @@ vrsqrtss, 0xf352, None, CpuAVX, Modrm|Ve
>  vshufp<sd>, 0x<sd:ppfx>c6, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
>  vsqrtp<sd>, 0x<sd:ppfx>51, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
>  vsqrts<sd>, 0x<sd:spfx>51, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
> -vstmxcsr, 0xae, 3, CpuAVX, Modrm|Vex128|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
> +vstmxcsr, 0xae, 3, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
>  vsubp<sd>, 0x<sd:ppfx>5c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
>  vsubs<sd>, 0x<sd:spfx>5c, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
>  vtestp<sd>, 0x660e | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
> @@ -1889,8 +1887,8 @@ vpshl<xop>, 0x94 | <xop:opc>, None, CpuX
>
>  llwpcb, 0x12, 0, CpuLWP, Modrm|SpaceXOP09|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Vex, { Reg32|Reg64 }
>  slwpcb, 0x12, 1, CpuLWP, Modrm|SpaceXOP09|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Vex, { Reg32|Reg64 }
> -lwpval, 0x12, 1, CpuLWP, Modrm|SpaceXOP0A|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
> -lwpins, 0x12, 0, CpuLWP, Modrm|SpaceXOP0A|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
> +lwpval, 0x12, 1, CpuLWP, Modrm|SpaceXOP0A|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
> +lwpins, 0x12, 0, CpuLWP, Modrm|SpaceXOP0A|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
>
>  // BMI instructions
>
> @@ -1918,30 +1916,30 @@ tzmsk, 0x01, 4, CpuTBM, Modrm|CheckRegSi
>  prefetch, 0xf0d, 0, Cpu3dnow|CpuPRFCHW, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
>  prefetchw, 0xf0d, 1, Cpu3dnow|CpuPRFCHW, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
>  femms, 0xf0e, None, Cpu3dnow, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
> -pavgusb, 0xf0f, 0xbf, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pf2id, 0xf0f, 0x1d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pf2iw, 0xf0f, 0x1c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfacc, 0xf0f, 0xae, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfadd, 0xf0f, 0x9e, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfcmpeq, 0xf0f, 0xb0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfcmpge, 0xf0f, 0x90, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfcmpgt, 0xf0f, 0xa0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfmax, 0xf0f, 0xa4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfmin, 0xf0f, 0x94, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfmul, 0xf0f, 0xb4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfnacc, 0xf0f, 0x8a, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfpnacc, 0xf0f, 0x8e, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfrcp, 0xf0f, 0x96, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfrcpit1, 0xf0f, 0xa6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfrcpit2, 0xf0f, 0xb6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfrsqit1, 0xf0f, 0xa7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfrsqrt, 0xf0f, 0x97, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfsub, 0xf0f, 0x9a, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pfsubr, 0xf0f, 0xaa, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pi2fd, 0xf0f, 0x0d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pi2fw, 0xf0f, 0x0c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pmulhrw, 0xf0f, 0xb7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> -pswapd, 0xf0f, 0xbb, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pavgusb, 0xf0f, 0xbf, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pf2id, 0xf0f, 0x1d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pf2iw, 0xf0f, 0x1c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfacc, 0xf0f, 0xae, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfadd, 0xf0f, 0x9e, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfcmpeq, 0xf0f, 0xb0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfcmpge, 0xf0f, 0x90, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfcmpgt, 0xf0f, 0xa0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfmax, 0xf0f, 0xa4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfmin, 0xf0f, 0x94, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfmul, 0xf0f, 0xb4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfnacc, 0xf0f, 0x8a, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfpnacc, 0xf0f, 0x8e, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfrcp, 0xf0f, 0x96, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfrcpit1, 0xf0f, 0xa6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfrcpit2, 0xf0f, 0xb6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfrsqit1, 0xf0f, 0xa7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfrsqrt, 0xf0f, 0x97, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfsub, 0xf0f, 0x9a, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pfsubr, 0xf0f, 0xaa, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pi2fd, 0xf0f, 0x0d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pi2fw, 0xf0f, 0x0c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pmulhrw, 0xf0f, 0xb7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> +pswapd, 0xf0f, 0xbb, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>
>  // AMD extensions.
>  syscall, 0xf05, None, CpuSYSCALL, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
> @@ -1967,8 +1965,8 @@ vmsave, 0xf01db, None, CpuSVME, AddrPref
>
>
>  // SSE4a instructions
> -movntsd, 0xf20f2b, None, CpuSSE4a, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
> -movntss, 0xf30f2b, None, CpuSSE4a, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Dword|Unspecified|BaseIndex }
> +movntsd, 0xf20f2b, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
> +movntss, 0xf30f2b, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Dword|Unspecified|BaseIndex }
>  extrq, 0x660f78, 0, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Imm8, RegXMM }
>  extrq, 0x660f79, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
>  insertq, 0xf20f79, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
> @@ -2166,8 +2164,8 @@ vcvtps2pd, 0x5A, None, CpuAVX512F, Modrm
>
>  vcvtps2ph, 0x661D, None, CpuAVX512F, Modrm|EVex512|MaskingMorZ|Space0F3A|VexW0|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
>
> -vcvtsd2si, 0xF22D, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
> -vcvtsd2usi, 0xF279, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToDword|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
> +vcvts<sdh>2si, 0x<sdh:spfx>2d, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
> +vcvts<sdh>2usi, 0x<sdh:spfx>79, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
>
>  vcvtsd2ss, 0xF25A, None, CpuAVX512F, Modrm|EVexLIG|Masking=3|Space0F|VexVVVV|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
>
> @@ -2187,20 +2185,14 @@ vcvtusi2ss, 0xF37B, None, CpuAVX512F, Mo
>
>  vcvtss2sd, 0xF35A, None, CpuAVX512F, Modrm|EVexLIG|Masking=3|Space0F|VexVVVV|VexW0|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
>
> -vcvtss2si, 0xF32D, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
> -vcvtss2usi, 0xF379, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
> -
>  vcvttpd2dq<xy>, 0x66e6, None, CpuAVX512F|<xy:vl>, Modrm|<xy:attr>|Masking=3|Space0F|VexW1|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:sae>, { <xy:src>|Qword, <xy:dst> }
>  vcvttpd2udq<xy>, 0x78, None, CpuAVX512F|<xy:vl>, Modrm|<xy:attr>|Masking=3|Space0F|VexW1|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:sae>, { <xy:src>|Qword, <xy:dst> }
>
>  vcvttps2dq, 0xF35B, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
>  vcvttps2udq, 0x78, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
>
> -vcvttsd2si, 0xF22C, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
> -vcvttsd2usi, 0xF278, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToDword|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
> -
> -vcvttss2si, 0xF32C, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
> -vcvttss2usi, 0xF378, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
> +vcvtts<sdh>2si, 0x<sdh:spfx>2c, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
> +vcvtts<sdh>2usi, 0x<sdh:spfx>78, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
>
>  vcvtudq2ps, 0xF27A, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
>
> @@ -2216,7 +2208,7 @@ vextracti32x4, 0x6639, None, CpuAVX512F,
>  vextractf64x4, 0x661B, None, CpuAVX512F, Modrm|EVex=1|MaskingMorZ|Space0F3A|VexW=2|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
>  vextracti64x4, 0x663B, None, CpuAVX512F, Modrm|EVex=1|MaskingMorZ|Space0F3A|VexW=2|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
>
> -vextractps, 0x6617, None, CpuAVX512F, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
> +vextractps, 0x6617, None, CpuAVX512F, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
>  vextractps, 0x6617, None, CpuAVX512F|Cpu64, RegMem|EVex128|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64 }
>
>  vfixupimmp<sd>, 0x6654, None, CpuAVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> @@ -2274,7 +2266,7 @@ vmovap<sd>, 0x<sd:ppfx>28, None, CpuAVX5
>  vmovntp<sd>, 0x<sd:ppfx>2B, None, CpuAVX512F, Modrm|Space0F|<sd:vexw>|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex }
>  vmovup<sd>, 0x<sd:ppfx>10, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|<sd:vexw>|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
>
> -vmovd, 0x666E, None, CpuAVX512F, D|Modrm|EVex=2|Space0F|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
> +vmovd, 0x666E, None, CpuAVX512F, D|Modrm|EVex=2|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
>
>  vmovddup, 0xF212, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM|RegZMM|Unspecified|BaseIndex, RegYMM|RegZMM }
>
> @@ -2287,16 +2279,16 @@ vmovdqu64, 0xF36F, None, CpuAVX512F, D|M
>  vmovhlps, 0x12, None, CpuAVX512F, Modrm|EVex=4|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
>  vmovlhps, 0x16, None, CpuAVX512F, Modrm|EVex=4|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
>
> -vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
> -vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
> -vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
> -vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
> +vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
> +vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
> +vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
> +vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
>
> -vmovq, 0x666E, None, CpuAVX512F|Cpu64, D|Modrm|EVex=2|Space0F|VexW=2|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64|Unspecified|BaseIndex, RegXMM }
> +vmovq, 0x666E, None, CpuAVX512F|Cpu64, D|Modrm|EVex128|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64|Unspecified|BaseIndex, RegXMM }
>  vmovq, 0xF37E, None, CpuAVX512F, Load|Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  vmovq, 0x66D6, None, CpuAVX512F, Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
>
> -vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|MaskingMorZ|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sdh:elem>|Unspecified|BaseIndex, RegXMM }
> +vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|MaskingMorZ|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sdh:elem>|Unspecified|BaseIndex, RegXMM }
>  vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|Masking=3|<sdh:spc1>|VexVVVV|<sdh:vexw>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
>
>  vmovshdup, 0xF316, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
> @@ -2596,7 +2588,7 @@ kadd<dq>, 0x<dq:kpfx>4a, None, CpuAVX512
>  kand<dq>, 0x<dq:kpfx>41, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask, RegMask }
>  kandn<dq>, 0x<dq:kpfx>42, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegMask, RegMask, RegMask }
>  kmov<dq>, 0x<dq:kpfx>90, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> -kmov<dq>, 0x<dq:kpfx>91, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
> +kmov<dq>, 0x<dq:kpfx>91, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
>  kmov<dq>, 0xf292, None, CpuAVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <dq:gpr>, RegMask }
>  knot<dq>, 0x<dq:kpfx>44, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask }
>  kor<dq>, 0x<dq:kpfx>45, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask, RegMask }
> @@ -2985,13 +2977,13 @@ incsspq, 0xf30fae, 5, CpuSHSTK|Cpu64, Mo
>  rdsspd, 0xf30f1e, 1, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32 }
>  rdsspq, 0xf30f1e, 1, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64 }
>  saveprevssp, 0xf30f01ea, None, CpuSHSTK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
> -rstorssp, 0xf30f01, 5, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
> +rstorssp, 0xf30f01, 5, CpuSHSTK, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
>  wrssd, 0x0f38f6, None, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Dword|Unspecified|BaseIndex }
> -wrssq, 0x0f38f6, None, CpuSHSTK|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
> +wrssq, 0x0f38f6, None, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
>  wrussd, 0x660f38f5, None, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Dword|Unspecified|BaseIndex }
> -wrussq, 0x660f38f5, None, CpuSHSTK|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
> +wrussq, 0x660f38f5, None, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64, Qword|Unspecified|BaseIndex }
>  setssbsy, 0xf30f01e8, None, CpuSHSTK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
> -clrssbsy, 0xf30fae, 6, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
> +clrssbsy, 0xf30fae, 6, CpuSHSTK, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
>  endbr64, 0xf30f1efa, None, CpuIBT, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
>  endbr32, 0xf30f1efb, None, CpuIBT, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
>
> @@ -3230,9 +3222,6 @@ vcvtusi2sh, 0xf37b, None, CpuAVX512_FP16
>  vcvtsh2sd, 0xf35a, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|EVexMap5|VexVVVV|VexW0|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
>  vcvtsh2ss, 0x13, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|EVexMap6|VexVVVV|VexW0|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
>
> -vcvtsh2si, 0xf32d, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
> -vcvtsh2usi, 0xf379, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
> -
>  vcvttph2dq, 0xf35b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|EVexMap5|VexW0|Broadcast|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegXMM }
>  vcvttph2dq, 0xf35b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|EVexMap5|VexW0|Broadcast|Disp8MemShift=4|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegYMM }
>  vcvttph2dq, 0xf35b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|EVexMap5|VexW0|Broadcast|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegYMM|Word|Unspecified|BaseIndex, RegZMM }
> @@ -3256,9 +3245,6 @@ vcvtph2psx, 0x6613, None, CpuAVX512_FP16
>  vcvttph2w, 0x667c, None, CpuAVX512_FP16, Modrm|Masking=3|EVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
>  vcvttph2uw, 0x7c, None, CpuAVX512_FP16, Modrm|Masking=3|EVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
>
> -vcvttsh2si, 0xf32c, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
> -vcvttsh2usi, 0xf378, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
> -
>  vfpclassph<xyz>, 0x66, None, CpuAVX512_FP16|<xyz:vl>, Modrm|<xyz:attr>|Masking=2|Space0F3A|VexW0|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xyz:att>, { Imm8, <xyz:src>|Word, RegMask }
>
>  vmovw, 0x666e, None, CpuAVX512_FP16, D|Modrm|EVex128|VexWIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex, RegXMM }
>


-- 
H.J.


* Re: [PATCH 2/7] x86: insert "no error" enumerator in i386_error enumeration
  2022-08-16  7:30 ` [PATCH 2/7] x86: insert "no error" enumerator in i386_error enumeration Jan Beulich
@ 2022-08-17 19:19   ` H.J. Lu
  0 siblings, 0 replies; 45+ messages in thread
From: H.J. Lu @ 2022-08-17 19:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Aug 16, 2022 at 12:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> The value of zero would better not indicate any error, but rather hit
> the abort() at the top of the consuming switch().
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -226,6 +226,7 @@ union i386_op
>
>  enum i386_error
>    {
> +    no_error, /* Must be first.  */
>      operand_size_mismatch,
>      operand_type_mismatch,
>      register_type_mismatch,
>
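A minimal standalone sketch of why the new enumerator has to come first. It assumes a trimmed-down copy of the enumeration, and a hypothetical `error_message()` helper that stands in for the consuming switch() at the end of match_template(). Because the instruction state is memset() to zero, any failure path that forgot to record an error now lands on `no_error`, which falls to `default:` and aborts. Before the patch it would have silently printed the first real error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed-down mirror of the enumeration (only a few real enumerators). */
enum i386_error
  {
    no_error, /* Must be first: a zeroed state means "nothing recorded". */
    operand_size_mismatch,
    operand_type_mismatch,
    register_type_mismatch,
  };

/* Hypothetical stand-in for the global instruction state `i`. */
struct insn_state
  {
    enum i386_error error;
  };

/* Mimics md_assemble() clearing all per-insn state with memset(). */
static void
reset_state (struct insn_state *s)
{
  memset (s, 0, sizeof (*s));
  assert (s->error == no_error);
}

/* Hypothetical message lookup; no_error must never get here. */
static const char *
error_message (enum i386_error err)
{
  switch (err)
    {
    default:
      abort (); /* Catches no_error, i.e. a path that never set an error. */
    case operand_size_mismatch:
      return "operand size mismatch";
    case operand_type_mismatch:
      return "operand type mismatch";
    case register_type_mismatch:
      return "register type mismatch";
    }
}
```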

OK.

Thanks.

-- 
H.J.


* Re: [PATCH 3/7] x86: move / quiesce pre-386 non-16-bit warning
  2022-08-16  7:31 ` [PATCH 3/7] x86: move / quiesce pre-386 non-16-bit warning Jan Beulich
@ 2022-08-17 19:21   ` H.J. Lu
  2022-08-18  7:21     ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2022-08-17 19:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Aug 16, 2022 at 12:31 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> Emitting this warning for every insn, including ones having actual
> errors, is annoying. Introduce a boolean variable to emit the warning
> just once, on the first insn after .arch may have changed things, and
> move the warning to output_insn(). (I didn't want to go as far as
> checking whether the .arch actually turned off the i386 bit, but doing
> so would be an option.)
> ---
> OTOH I wonder whether switching to a pre-386 architecture shouldn't
> automatically move to CODE_16BIT: our emitting operand- or address-size
> prefixes violates the architecture specification. Alternatively we
> could outright reject such .arch directives when not already in 16-bit
> mode.
>
> I've left the message text unaltered, albeit I think "addressing mode"
> is particularly misleading for instructions without memory operands (nor
> any other address-size affected aspect, like in e.g. JCXZ).
>
> Originally I thought the warning may get in the way of work done in
> subsequent patches, but I think I've convinced myself that all affected
> insns are post-286 and hence wouldn't yield CPU_FLAGS_PERFECT_MATCH.
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -765,6 +765,9 @@ int optimize_align_code = 1;
>  /* Non-zero to quieten some warnings.  */
>  static int quiet_warnings = 0;
>
> +/* Guard to avoid repeated warnings about non-16-bit code on 16-bit CPUs.  */
> +static bool pre_386_16bit_warned;
> +
>  /* CPU name.  */
>  static const char *cpu_arch_name = NULL;
>  static char *cpu_sub_arch_name = NULL;
> @@ -2809,6 +2812,7 @@ set_cpu_arch (int dummy ATTRIBUTE_UNUSED
>                       cpu_arch_tune = cpu_arch_isa;
>                       cpu_arch_tune_flags = cpu_arch_isa_flags;
>                     }
> +                 pre_386_16bit_warned = false;
>                   break;
>                 }
>
> @@ -5486,12 +5490,7 @@ parse_insn (char *line, char *mnemonic)
>      {
>        supported |= cpu_flags_match (t);
>        if (supported == CPU_FLAGS_PERFECT_MATCH)
> -       {
> -         if (!cpu_arch_flags.bitfield.cpui386 && (flag_code != CODE_16BIT))
> -           as_warn (_("use .code16 to ensure correct addressing mode"));
> -
> -         return l;
> -       }
> +       return l;
>      }
>
>    if (!(supported & CPU_FLAGS_64BIT_MATCH))
> @@ -9491,6 +9490,13 @@ output_insn (void)
>        fragP->tc_frag_data.max_bytes = max_branch_padding_size;
>      }
>
> +  if (!cpu_arch_flags.bitfield.cpui386 && (flag_code != CODE_16BIT)
> +      && !pre_386_16bit_warned)
> +    {
> +      as_warn (_("use .code16 to ensure correct addressing mode"));
> +      pre_386_16bit_warned = true;
> +    }
> +
>    /* Output jumps.  */
>    if (i.tm.opcode_modifier.jump == JUMP)
>      output_branch ();
>
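The warn-once behaviour can be modelled in isolation. This is a hypothetical sketch, not the gas code. `set_cpu_arch()` and `output_insn()` only mimic the two places the patch touches. `cpu_is_386_or_later` replaces `cpu_arch_flags.bitfield.cpui386`, and the `warnings_emitted` counter exists only to make the behaviour observable. Because the check sits in output_insn(), instructions that fail with errors never reach it and so never trigger the warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for gas state. */
enum code_mode { CODE_16BIT, CODE_32BIT, CODE_64BIT };

static bool cpu_is_386_or_later = true;
static enum code_mode flag_code = CODE_32BIT;
static bool pre_386_16bit_warned;
static unsigned int warnings_emitted; /* Demonstration only. */

/* Models .arch: every CPU switch re-arms the one-shot warning. */
static void
set_cpu_arch (bool is_386_or_later)
{
  cpu_is_386_or_later = is_386_or_later;
  pre_386_16bit_warned = false;
}

/* Models output_insn(): only reached for insns without errors, and
   warns at most once per .arch directive. */
static void
output_insn (void)
{
  if (!cpu_is_386_or_later && flag_code != CODE_16BIT
      && !pre_386_16bit_warned)
    {
      fprintf (stderr, "Warning: use .code16 to ensure correct addressing mode\n");
      pre_386_16bit_warned = true;
      warnings_emitted++;
    }
  /* Encoding would follow here. */
}
```

After, for example, `.arch i286` in 32-bit mode, the first emitted instruction warns and the following ones stay quiet until the next `.arch`.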

OK.

Thanks.

-- 
H.J.


* Re: [PATCH 4/7] x86: improve match_template()'s diagnostics
  2022-08-16  7:32 ` [PATCH 4/7] x86: improve match_template()'s diagnostics Jan Beulich
@ 2022-08-17 20:24   ` H.J. Lu
  2022-08-18  6:14     ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2022-08-17 20:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> Taking the example of
>
>         extractps $0, %xmm0, %xmm0
>         insertps $0, %xmm0, %eax
>
> (both making the same mistake of using the wrong kind of destination
> register) it is easy to see that current behavior is far
> from ideal: The former results in "unsupported instruction" for 32-bit
> code simply because the 2nd template we have is a Cpu64 one. Instead we
> should aim at emitting the "best" possible error, which will typically
> be the one where we passed the largest number of checks. Generalize the
> original "specific_error" approach by making it apply to the entire
> matching loop, utilizing that line numbers increase as we pass further
> checks.
> ---
> As to the inval-tls testcase: Why is KMOV special? Are e.g. VMOV or
> other vector insns (legacy or EVEX-encoded) any different? Shouldn't the
> use of the respective reloc types be limited to _exactly_ the insns they
> are intended to be used with? Furthermore having this check in
> match_template() is unhelpful, as the resulting diagnostic isn't aiding
> in understanding what's wrong. Template matching should be left alone
> here, and the issue be diagnosed later, say directly in md_assemble()
> (alongside the various further consistency checks there) or in
> process_operands().

GCC may generate invalid TLS code sequences with KMOV, but not with other
instructions. We want the assembler to catch them, and it is easier to
disallow the invalid instructions.


> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -2083,12 +2083,7 @@ operand_size_match (const insn_template
>      }
>
>    if (!t->opcode_modifier.d)
> -    {
> -    mismatch:
> -      if (!match)
> -       i.error = operand_size_mismatch;
> -      return match;
> -    }
> +    return match;
>
>    /* Check reverse.  */
>    gas_assert ((i.operands >= 2 && i.operands <= 3)
> @@ -2105,19 +2100,19 @@ operand_size_match (const insn_template
>
>        if (t->operand_types[j].bitfield.class == Reg
>           && !match_operand_size (t, j, given))
> -       goto mismatch;
> +       return match;
>
>        if (t->operand_types[j].bitfield.class == RegSIMD
>           && !match_simd_size (t, j, given))
> -       goto mismatch;
> +       return match;
>
>        if (t->operand_types[j].bitfield.instance == Accum
>           && (!match_operand_size (t, j, given)
>               || !match_simd_size (t, j, given)))
> -       goto mismatch;
> +       return match;
>
>        if ((i.flags[given] & Operand_Mem) && !match_mem_size (t, j, given))
> -       goto mismatch;
> +       return match;
>      }
>
>    return match | MATCH_REVERSE;
> @@ -6386,6 +6381,17 @@ VEX_check_encoding (const insn_template
>    return 0;
>  }
>
> +/* Helper function for the progress() macro in match_template().  */
> +static INLINE enum i386_error progress (enum i386_error new,
> +                                       enum i386_error last,
> +                                       unsigned int line, unsigned int *line_p)
> +{
> +  if (line <= *line_p)
> +    return last;
> +  *line_p = line;
> +  return new;
> +}
> +
>  static const insn_template *
>  match_template (char mnem_suffix)
>  {
> @@ -6397,8 +6403,9 @@ match_template (char mnem_suffix)
>    i386_opcode_modifier suffix_check;
>    i386_operand_type operand_types [MAX_OPERANDS];
>    int addr_prefix_disp;
> -  unsigned int j, size_match, check_register;
> -  enum i386_error specific_error = 0;
> +  unsigned int j, size_match, check_register, errline = __LINE__;
> +  enum i386_error specific_error = number_of_operands_mismatch;
> +#define progress(err) progress(err, specific_error, __LINE__, &errline)
                                                     Need a space before (.
>
>  #if MAX_OPERANDS != 5
>  # error "MAX_OPERANDS must be 5."
> @@ -6436,36 +6443,33 @@ match_template (char mnem_suffix)
>         suffix_check.no_ldsuf = 1;
>      }
>
> -  /* Must have right number of operands.  */
> -  i.error = number_of_operands_mismatch;
> -
>    for (t = current_templates->start; t < current_templates->end; t++)
>      {
>        addr_prefix_disp = -1;
>        found_reverse_match = 0;
>
> +      /* Must have right number of operands.  */
>        if (i.operands != t->operands)
>         continue;
>
>        /* Check processor support.  */
> -      i.error = unsupported;
> +      specific_error = progress (unsupported);
>        if (cpu_flags_match (t) != CPU_FLAGS_PERFECT_MATCH)
>         continue;
>
>        /* Check Pseudo Prefix.  */
> -      i.error = unsupported;
>        if (t->opcode_modifier.pseudovexprefix
>           && !(i.vec_encoding == vex_encoding_vex
>               || i.vec_encoding == vex_encoding_vex3))
>         continue;
>
>        /* Check AT&T mnemonic.   */
> -      i.error = unsupported_with_intel_mnemonic;
> +      specific_error = progress (unsupported_with_intel_mnemonic);
>        if (intel_mnemonic && t->opcode_modifier.attmnemonic)
>         continue;
>
>        /* Check AT&T/Intel syntax.  */
> -      i.error = unsupported_syntax;
> +      specific_error = progress (unsupported_syntax);
>        if ((intel_syntax && t->opcode_modifier.attsyntax)
>           || (!intel_syntax && t->opcode_modifier.intelsyntax))
>         continue;
> @@ -6491,7 +6495,7 @@ match_template (char mnem_suffix)
>         }
>
>        /* Check the suffix.  */
> -      i.error = invalid_instruction_suffix;
> +      specific_error = progress (invalid_instruction_suffix);
>        if ((t->opcode_modifier.no_bsuf && suffix_check.no_bsuf)
>           || (t->opcode_modifier.no_wsuf && suffix_check.no_wsuf)
>           || (t->opcode_modifier.no_lsuf && suffix_check.no_lsuf)
> @@ -6500,6 +6504,7 @@ match_template (char mnem_suffix)
>           || (t->opcode_modifier.no_ldsuf && suffix_check.no_ldsuf))
>         continue;
>
> +      specific_error = progress (operand_size_mismatch);
>        size_match = operand_size_match (t);
>        if (!size_match)
>         continue;
> @@ -6510,11 +6515,9 @@ match_template (char mnem_suffix)
>
>          as the case of a missing * on the operand is accepted (perhaps with
>          a warning, issued further down).  */
> +      specific_error = progress (operand_type_mismatch);
>        if (i.jumpabsolute && t->opcode_modifier.jump != JUMP_ABSOLUTE)
> -       {
> -         i.error = operand_type_mismatch;
> -         continue;
> -       }
> +       continue;
>
>        for (j = 0; j < MAX_OPERANDS; j++)
>         operand_types[j] = t->operand_types[j];
> @@ -6522,6 +6525,8 @@ match_template (char mnem_suffix)
>        /* In general, don't allow
>          - 64-bit operands outside of 64-bit mode,
>          - 32-bit operands on pre-386.  */
> +      specific_error = progress (mnem_suffix ? invalid_instruction_suffix
> +                                            : operand_size_mismatch);
>        j = i.imm_operands + (t->operands > i.imm_operands + 1);
>        if (((i.suffix == QWORD_MNEM_SUFFIX
>             && flag_code != CODE_64BIT
> @@ -6550,7 +6555,7 @@ match_template (char mnem_suffix)
>         {
>           if (VEX_check_encoding (t))
>             {
> -             specific_error = i.error;
> +             specific_error = progress (i.error);
>               continue;
>             }
>
> @@ -6711,6 +6716,8 @@ match_template (char mnem_suffix)
>                                                    i.types[1],
>                                                    operand_types[1])))
>             {
> +             specific_error = progress (i.error);
> +
>               /* Check if other direction is valid ...  */
>               if (!t->opcode_modifier.d)
>                 continue;
> @@ -6735,6 +6742,7 @@ match_template (char mnem_suffix)
>                                                        operand_types[0])))
>                 {
>                   /* Does not match either direction.  */
> +                 specific_error = progress (i.error);
>                   continue;
>                 }
>               /* found_reverse_match holds which of D or FloatR
> @@ -6773,7 +6781,10 @@ match_template (char mnem_suffix)
>                                                        operand_types[3],
>                                                        i.types[4],
>                                                        operand_types[4]))
> -                   continue;
> +                   {
> +                     specific_error = progress (i.error);
> +                     continue;
> +                   }
>                   /* Fall through.  */
>                 case 4:
>                   overlap3 = operand_type_and (i.types[3], operand_types[3]);
> @@ -6788,7 +6799,10 @@ match_template (char mnem_suffix)
>                                                             operand_types[2],
>                                                             i.types[3],
>                                                             operand_types[3])))
> -                   continue;
> +                   {
> +                     specific_error = progress (i.error);
> +                     continue;
> +                   }
>                   /* Fall through.  */
>                 case 3:
>                   overlap2 = operand_type_and (i.types[2], operand_types[2]);
> @@ -6803,7 +6817,10 @@ match_template (char mnem_suffix)
>                                                             operand_types[1],
>                                                             i.types[2],
>                                                             operand_types[2])))
> -                   continue;
> +                   {
> +                     specific_error = progress (i.error);
> +                     continue;
> +                   }
>                   break;
>                 }
>             }
> @@ -6814,14 +6831,14 @@ match_template (char mnem_suffix)
>        /* Check if vector operands are valid.  */
>        if (check_VecOperands (t))
>         {
> -         specific_error = i.error;
> +         specific_error = progress (i.error);
>           continue;
>         }
>
>        /* Check if VEX/EVEX encoding requirements can be satisfied.  */
>        if (VEX_check_encoding (t))
>         {
> -         specific_error = i.error;
> +         specific_error = progress (i.error);
>           continue;
>         }
>
> @@ -6829,11 +6846,13 @@ match_template (char mnem_suffix)
>        break;
>      }
>
> +#undef progress
> +
>    if (t == current_templates->end)
>      {
>        /* We found no match.  */
>        const char *err_msg;
> -      switch (specific_error ? specific_error : i.error)
> +      switch (specific_error)
>         {
>         default:
>           abort ();
> --- a/gas/testsuite/gas/i386/inval-tls.l
> +++ b/gas/testsuite/gas/i386/inval-tls.l
> @@ -1,3 +1,3 @@
>  .*: Assembler messages:
> -.*:3: Error: operand size mismatch for `kmovd'
> -.*:4: Error: operand size mismatch for `kmovd'
> +.*:3: Error: .* `kmovd'
> +.*:4: Error: .* `kmovd'
> --- a/gas/testsuite/gas/i386/noavx512-1.l
> +++ b/gas/testsuite/gas/i386/noavx512-1.l
> @@ -1,14 +1,14 @@
>  .*: Assembler messages:
> -.*:25: Error: .*unsupported instruction.*
> +.*:25: Error: .*operand size mismatch.*
>  .*:26: Error: .*unsupported masking.*
>  .*:27: Error: .*unsupported masking.*
> -.*:47: Error: .*unsupported instruction.*
> +.*:47: Error: .*operand size mismatch.*
>  .*:48: Error: .*unsupported masking.*
>  .*:49: Error: .*unsupported masking.*
>  .*:50: Error: .*not supported.*
>  .*:51: Error: .*not supported.*
>  .*:52: Error: .*not supported.*
> -.*:69: Error: .*unsupported instruction.*
> +.*:69: Error: .*operand size mismatch.*
>  .*:70: Error: .*unsupported masking.*
>  .*:71: Error: .*unsupported masking.*
>  .*:72: Error: .*not supported.*
> @@ -17,7 +17,7 @@
>  .*:75: Error: .*not supported.*
>  .*:76: Error: .*not supported.*
>  .*:77: Error: .*not supported.*
> -.*:91: Error: .*unsupported instruction.*
> +.*:91: Error: .*operand size mismatch.*
>  .*:92: Error: .*unsupported masking.*
>  .*:93: Error: .*unsupported masking.*
>  .*:94: Error: .*not supported.*
> @@ -27,7 +27,7 @@
>  .*:98: Error: .*not supported.*
>  .*:99: Error: .*not supported.*
>  .*:100: Error: .*not supported.*
> -.*:113: Error: .*unsupported instruction.*
> +.*:113: Error: .*operand size mismatch.*
>  .*:114: Error: .*unsupported masking.*
>  .*:115: Error: .*unsupported masking.*
>  .*:116: Error: .*not supported.*
> @@ -40,7 +40,7 @@
>  .*:126: Error: .*not supported.*
>  .*:127: Error: .*not supported.*
>  .*:128: Error: .*not supported.*
> -.*:135: Error: .*unsupported instruction.*
> +.*:135: Error: .*operand size mismatch.*
>  .*:136: Error: .*unsupported masking.*
>  .*:137: Error: .*unsupported masking.*
>  .*:138: Error: .*not supported.*
> @@ -54,7 +54,7 @@
>  .*:149: Error: .*not supported.*
>  .*:150: Error: .*not supported.*
>  .*:151: Error: .*not supported.*
> -.*:157: Error: .*unsupported instruction.*
> +.*:157: Error: .*operand size mismatch.*
>  .*:158: Error: .*unsupported masking.*
>  .*:159: Error: .*unsupported masking.*
>  .*:160: Error: .*not supported.*
> --- a/gas/testsuite/gas/i386/noavx512-2.l
> +++ b/gas/testsuite/gas/i386/noavx512-2.l
> @@ -1,12 +1,12 @@
>  .*: Assembler messages:
> -.*:26: Error: .*unsupported instruction.*
> -.*:27: Error: .*unsupported instruction.*
> +.*:26: Error: .*unsupported masking.*
> +.*:27: Error: .*unsupported masking.*
>  .*:29: Error: .*unsupported instruction.*
>  .*:30: Error: .*unsupported instruction.*
>  .*:32: Error: .*unsupported instruction.*
>  .*:33: Error: .*unsupported instruction.*
> -.*:36: Error: .*unsupported instruction.*
> -.*:37: Error: .*unsupported instruction.*
> +.*:36: Error: .*unsupported masking.*
> +.*:37: Error: .*unsupported masking.*
>  .*:39: Error: .*unsupported instruction.*
>  .*:40: Error: .*unsupported instruction.*
>  .*:43: Error: .*unsupported instruction.*
> --- a/gas/testsuite/gas/i386/x86-64-branch-4.l
> +++ b/gas/testsuite/gas/i386/x86-64-branch-4.l
> @@ -1,19 +1,19 @@
>  .*: Assembler messages:
>  .*:2: Error: invalid instruction suffix for `call'
>  .*:3: Error: invalid instruction suffix for `call'
> -.*:4: Error: operand type mismatch for `jmp'
> +.*:4: Error: operand (size|type) mismatch for `jmp'
>  .*:5: Error: invalid instruction suffix for `jmp'
>  .*:6: Error: invalid instruction suffix for `jmp'
>  .*:7: Error: invalid instruction suffix for `ret'
>  .*:8: Error: invalid instruction suffix for `ret'
> -.*:11: Error: operand type mismatch for `call'
> +.*:11: Error: operand (size|type) mismatch for `call'
>  .*:12: Error: invalid instruction suffix for `call'
>  .*:13: Error: invalid instruction suffix for `call'
> -.*:14: Error: operand size mismatch for `call'
> -.*:15: Error: operand type mismatch for `jmp'
> +.*:14: Error: operand (size|type) mismatch for `call'
> +.*:15: Error: operand (size|type) mismatch for `jmp'
>  .*:16: Error: invalid instruction suffix for `jmp'
>  .*:17: Error: invalid instruction suffix for `jmp'
> -.*:18: Error: operand size mismatch for `jmp'
> +.*:18: Error: operand (size|type) mismatch for `jmp'
>  .*:19: Error: invalid instruction suffix for `ret'
>  .*:20: Error: invalid instruction suffix for `ret'
>  GAS LISTING .*
> --- a/gas/testsuite/gas/i386/x86-64-branch-5.l
> +++ b/gas/testsuite/gas/i386/x86-64-branch-5.l
> @@ -1,19 +1,19 @@
>  .*: Assembler messages:
> -.*:2: Error: unsupported syntax for `lcall'
> -.*:3: Error: unsupported syntax for `lfs'
> -.*:4: Error: unsupported syntax for `lfs'
> -.*:5: Error: unsupported syntax for `lgs'
> -.*:6: Error: unsupported syntax for `lgs'
> -.*:7: Error: unsupported syntax for `ljmp'
> -.*:8: Error: unsupported syntax for `lss'
> -.*:9: Error: unsupported syntax for `lss'
> -.*:12: Error: unsupported syntax for `call'
> -.*:13: Error: unsupported syntax for `lfs'
> -.*:14: Error: unsupported syntax for `lfs'
> -.*:15: Error: unsupported syntax for `lgs'
> -.*:16: Error: unsupported syntax for `lgs'
> -.*:17: Error: unsupported syntax for `jmp'
> -.*:18: Error: unsupported syntax for `lss'
> -.*:19: Error: unsupported syntax for `lss'
> +.*:2: Error: invalid instruction suffix for `lcall'
> +.*:3: Error: operand size mismatch for `lfs'
> +.*:4: Error: invalid instruction suffix for `lfs'
> +.*:5: Error: operand size mismatch for `lgs'
> +.*:6: Error: invalid instruction suffix for `lgs'
> +.*:7: Error: invalid instruction suffix for `ljmp'
> +.*:8: Error: operand size mismatch for `lss'
> +.*:9: Error: invalid instruction suffix for `lss'
> +.*:12: Error: operand (size|type) mismatch for `call'
> +.*:13: Error: operand size mismatch for `lfs'
> +.*:14: Error: operand size mismatch for `lfs'
> +.*:15: Error: operand size mismatch for `lgs'
> +.*:16: Error: operand size mismatch for `lgs'
> +.*:17: Error: operand (size|type) mismatch for `jmp'
> +.*:18: Error: operand size mismatch for `lss'
> +.*:19: Error: operand size mismatch for `lss'
>  GAS LISTING .*
>  #pass
> --- a/gas/testsuite/gas/i386/x86-64-inval-tls.l
> +++ b/gas/testsuite/gas/i386/x86-64-inval-tls.l
> @@ -1,3 +1,3 @@
>  .*: Assembler messages:
> -.*:3: Error: operand size mismatch for `kmovq'
> -.*:4: Error: operand size mismatch for `kmovq'
> +.*:3: Error: .* `kmovq'
> +.*:4: Error: .* `kmovq'
>
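Here is a reduced model of the matching loop that shows why keying on line numbers selects the "best" error. `progress()` has the same body as the helper in the patch, with the parameter renamed from `new` so the snippet also compiles as C++. `struct tmpl` and its pass/fail flags are made up for illustration. The macro relies on `__LINE__` expanding to the line of each invocation, so a later check always has a larger number than an earlier one.

```c
#include <assert.h>

enum i386_error
  {
    no_error,
    number_of_operands_mismatch,
    unsupported,
    invalid_instruction_suffix,
    operand_size_mismatch,
  };

/* Same logic as the patch's helper: adopt NEW_ERR only if this check
   (LINE) lies further into the loop than any check reached before. */
static enum i386_error
progress (enum i386_error new_err, enum i386_error last,
          unsigned int line, unsigned int *line_p)
{
  if (line <= *line_p)
    return last;
  *line_p = line;
  return new_err;
}

/* Hypothetical template: 1 means the corresponding check passes. */
struct tmpl
  {
    int cpu_ok;
    int suffix_ok;
    int size_ok;
  };

/* Returns the index of the first matching template, or -1 with *ERR set
   to the error of whichever template got furthest through the checks. */
static int
match_template (const struct tmpl *t, int n, enum i386_error *err)
{
  unsigned int errline = __LINE__;
  enum i386_error specific_error = number_of_operands_mismatch;
#define progress(e) progress (e, specific_error, __LINE__, &errline)

  for (int k = 0; k < n; k++)
    {
      specific_error = progress (unsupported);
      if (!t[k].cpu_ok)
        continue;
      specific_error = progress (invalid_instruction_suffix);
      if (!t[k].suffix_ok)
        continue;
      specific_error = progress (operand_size_mismatch);
      if (!t[k].size_ok)
        continue;
      return k;
    }

#undef progress
  *err = specific_error;
  return -1;
}
```

With templates `{1, 1, 0}` and `{0, 1, 1}`, the first one reaches the size check. The second one's earlier CPU failure therefore cannot overwrite `operand_size_mismatch`. This is the situation from the commit message, where a trailing Cpu64 template used to turn the report into "unsupported instruction".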


-- 
H.J.


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
  2022-08-16  7:32 ` [PATCH 5/7] x86: re-work insn/suffix recognition Jan Beulich
@ 2022-08-17 20:29   ` H.J. Lu
  2022-08-18  6:24     ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2022-08-17 20:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> x86: re-work insn/suffix recognition
>
> Having templates with a suffix explicitly present has always been
> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find

I don't like the second pass. What problem does it solve?

> a suitable template _and_ didn't itself already need to trim off a
> suffix to find a match at all. This requires error reporting adjustments
> (albeit luckily fewer than I was afraid might be necessary), as errors
> previously reported during matching now need deferring until after the
> 2nd pass (because, obviously, we must not emit any error if the 2nd pass
> succeeds).
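A toy lookup can illustrate the ordering the two passes impose. Everything in it is hypothetical: the template table, `suffix_size()`, `match_pass()` and `lookup()` are not gas functions. The model also ignores the error deferral and the Intel-syntax suffix rules. It only shows that the mnemonic is matched as written first, and that a trailing letter is reinterpreted as a size suffix only when that first pass fails.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical template: a mnemonic plus the single operand size (bits)
   it accepts. */
struct insn_template
{
  const char *name;
  unsigned int opsize;
};

static const struct insn_template templates[] = {
  { "mov", 8 }, { "mov", 16 }, { "mov", 32 }, { "mov", 64 },
  { "movq", 128 }, /* Hypothetical template whose own name ends in 'q'. */
};

/* Operand size implied by an AT&T suffix letter; 0 if not a suffix. */
static unsigned int
suffix_size (char c)
{
  switch (c)
    {
    case 'b': return 8;
    case 'w': return 16;
    case 'l': return 32;
    case 'q': return 64;
    default: return 0;
    }
}

/* One pass: first LEN chars of NAME must equal a template name exactly,
   and that template must accept OPSIZE. */
static const struct insn_template *
match_pass (const char *name, size_t len, unsigned int opsize)
{
  for (size_t k = 0; k < sizeof templates / sizeof templates[0]; k++)
    if (strlen (templates[k].name) == len
        && strncmp (templates[k].name, name, len) == 0
        && templates[k].opsize == opsize)
      return &templates[k];
  return NULL;
}

/* Pass 1 uses MNEM verbatim.  Only if that fails is the last letter taken
   as a size suffix, which must agree with OPSIZE, for pass 2. */
static const struct insn_template *
lookup (const char *mnem, unsigned int opsize)
{
  size_t len = strlen (mnem);
  const struct insn_template *t = match_pass (mnem, len, opsize);

  if (t != NULL || len < 2)
    return t;

  unsigned int sfx = suffix_size (mnem[len - 1]);
  if (sfx == 0 || sfx != opsize)
    return NULL; /* No suffix, or one contradicting the operands. */
  return match_pass (mnem, len - 1, opsize);
}
```

Here `lookup ("movq", 128)` hits the template literally named `movq` in pass 1. By contrast, `lookup ("movq", 64)` falls through to pass 2 and resolves to `mov` with a 64-bit operand.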
>
> Note that with the dropped CMPSD and MOVSD Intel Syntax string insn
> templates, mixed IsString/non-IsString template groups cannot occur
> anymore. With that maybe_adjust_templates() becomes unnecessary (and is
> hence being removed).
>
> Note further that while the additions to the intel16 testcase aren't
> really proper Intel syntax, we've been permitting all of those except
> for the MOVD variant. The test is therefore meant to prevent
> re-introducing such an inconsistency.
> ---
> To limit code churn I'm using "goto" for the retry loop, but I'd be
> happy to make this a proper loop either right here or in a follow-on
> change doing just the necessary re-indentation.
>
> The "too many memory references" errors which are being deleted weren't
> fully consistent anyway - even the majority of IsString insns accepts
> only a single memory operand. If we want to retain that, it would need
> re-introducing in md_assemble(), latching the error into i.error just
> like match_template() does.
>
> Why is "MOVQ $imm64, %reg64" being optimized but "MOVABS $imm64, %reg64"
> is not?
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -297,9 +297,6 @@ struct _i386_insn
>         explicit segment overrides are given.  */
>      const reg_entry *seg[2];
>
> -    /* Copied first memory operand string, for re-checking.  */
> -    char *memop1_string;
> -
>      /* PREFIX holds all the given prefix opcodes (usually null).
>         PREFIXES is the number of prefix opcodes.  */
>      unsigned int prefixes;
> @@ -4273,7 +4270,20 @@ optimize_encoding (void)
>            movq $imm31, %r64   -> movl $imm31, %r32
>            movq $imm32, %r64   -> movl $imm32, %r32
>          */
> -      i.tm.opcode_modifier.norex64 = 1;
> +      i.tm.opcode_modifier.size = SIZE32;
> +      if (i.imm_operands)
> +       {
> +         i.types[0].bitfield.imm32 = 1;
> +         i.types[0].bitfield.imm32s = 0;
> +         i.types[0].bitfield.imm64 = 0;
> +       }
> +      else
> +       {
> +         i.types[0].bitfield.dword = 1;
> +         i.types[0].bitfield.qword = 0;
> +       }
> +      i.types[1].bitfield.dword = 1;
> +      i.types[1].bitfield.qword = 0;
>        if (i.tm.base_opcode == 0xb8 || (i.tm.base_opcode | 1) == 0xc7)
>         {
>           /* Handle
> @@ -4283,11 +4293,6 @@ optimize_encoding (void)
>           i.tm.operand_types[0].bitfield.imm32 = 1;
>           i.tm.operand_types[0].bitfield.imm32s = 0;
>           i.tm.operand_types[0].bitfield.imm64 = 0;
> -         i.types[0].bitfield.imm32 = 1;
> -         i.types[0].bitfield.imm32s = 0;
> -         i.types[0].bitfield.imm64 = 0;
> -         i.types[1].bitfield.dword = 1;
> -         i.types[1].bitfield.qword = 0;
>           if ((i.tm.base_opcode | 1) == 0xc7)
>             {
>               /* Handle
> @@ -4819,10 +4824,17 @@ void
>  md_assemble (char *line)
>  {
>    unsigned int j;
> -  char mnemonic[MAX_MNEM_SIZE], mnem_suffix;
> +  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy;
> +  const char *pass1_mnem = NULL;
> +  enum i386_error pass1_err = 0;
>    const insn_template *t;
>
> +  /* Make a copy of the full line in case we need to retry.  */
> +  copy = xstrdup (line);
> +
>    /* Initialize globals.  */
> +  current_templates = NULL;
> + retry:
>    memset (&i, '\0', sizeof (i));
>    i.rounding.type = rc_none;
>    for (j = 0; j < MAX_OPERANDS; j++)
> @@ -4837,15 +4849,21 @@ md_assemble (char *line)
>
>    line = parse_insn (line, mnemonic);
>    if (line == NULL)
> -    return;
> +    {
> +      if (!copy)
> +       goto match_error;
> +      free (copy);
> +      return;
> +    }
>    mnem_suffix = i.suffix;
>
>    line = parse_operands (line, mnemonic);
>    this_operand = -1;
> -  xfree (i.memop1_string);
> -  i.memop1_string = NULL;
>    if (line == NULL)
> -    return;
> +    {
> +      free (copy);
> +      return;
> +    }
>
>    /* Now we've parsed the mnemonic into a set of templates, and have the
>       operands at hand.  */
> @@ -4921,7 +4939,97 @@ md_assemble (char *line)
>       with the template operand types.  */
>
>    if (!(t = match_template (mnem_suffix)))
> -    return;
> +    {
> +      const char *err_msg;
> +
> +      if (!mnem_suffix)
> +       {
> +         pass1_err = i.error;
> +         pass1_mnem = current_templates->start->name;
> +         line = copy;
> +         copy = NULL;
> +         goto retry;
> +       }
> +      free (copy);
> +  match_error:
> +      switch (pass1_mnem ? pass1_err : i.error)
> +       {
> +       default:
> +         abort ();
> +       case operand_size_mismatch:
> +         err_msg = _("operand size mismatch");
> +         break;
> +       case operand_type_mismatch:
> +         err_msg = _("operand type mismatch");
> +         break;
> +       case register_type_mismatch:
> +         err_msg = _("register type mismatch");
> +         break;
> +       case number_of_operands_mismatch:
> +         err_msg = _("number of operands mismatch");
> +         break;
> +       case invalid_instruction_suffix:
> +         err_msg = _("invalid instruction suffix");
> +         break;
> +       case bad_imm4:
> +         err_msg = _("constant doesn't fit in 4 bits");
> +         break;
> +       case unsupported_with_intel_mnemonic:
> +         err_msg = _("unsupported with Intel mnemonic");
> +         break;
> +       case unsupported_syntax:
> +         err_msg = _("unsupported syntax");
> +         break;
> +       case unsupported:
> +         as_bad (_("unsupported instruction `%s'"),
> +                 pass1_mnem ? pass1_mnem : current_templates->start->name);
> +         return;
> +       case invalid_sib_address:
> +         err_msg = _("invalid SIB address");
> +         break;
> +       case invalid_vsib_address:
> +         err_msg = _("invalid VSIB address");
> +         break;
> +       case invalid_vector_register_set:
> +         err_msg = _("mask, index, and destination registers must be distinct");
> +         break;
> +       case invalid_tmm_register_set:
> +         err_msg = _("all tmm registers must be distinct");
> +         break;
> +       case invalid_dest_and_src_register_set:
> +         err_msg = _("destination and source registers must be distinct");
> +         break;
> +       case unsupported_vector_index_register:
> +         err_msg = _("unsupported vector index register");
> +         break;
> +       case unsupported_broadcast:
> +         err_msg = _("unsupported broadcast");
> +         break;
> +       case broadcast_needed:
> +         err_msg = _("broadcast is needed for operand of such type");
> +         break;
> +       case unsupported_masking:
> +         err_msg = _("unsupported masking");
> +         break;
> +       case mask_not_on_destination:
> +         err_msg = _("mask not on destination operand");
> +         break;
> +       case no_default_mask:
> +         err_msg = _("default mask isn't allowed");
> +         break;
> +       case unsupported_rc_sae:
> +         err_msg = _("unsupported static rounding/sae");
> +         break;
> +       case invalid_register_operand:
> +         err_msg = _("invalid register operand");
> +         break;
> +       }
> +      as_bad (_("%s for `%s'"), err_msg,
> +             pass1_mnem ? pass1_mnem : current_templates->start->name);
> +      return;
> +    }
> +
> +  free (copy);
>
>    if (sse_check != check_none
>        /* The opcode space check isn't strictly needed; it's there only to
> @@ -5223,6 +5331,7 @@ parse_insn (char *line, char *mnemonic)
>    char *l = line;
>    char *token_start = l;
>    char *mnem_p;
> +  bool pass1 = !current_templates;
>    int supported;
>    const insn_template *t;
>    char *dot_p = NULL;
> @@ -5392,8 +5501,10 @@ parse_insn (char *line, char *mnemonic)
>        current_templates = (const templates *) str_hash_find (op_hash, mnemonic);
>      }
>
> -  if (!current_templates)
> +  if (!current_templates || !pass1)
>      {
> +      current_templates = NULL;
> +
>      check_suffix:
>        if (mnem_p > mnemonic)
>         {
> @@ -5441,7 +5552,8 @@ parse_insn (char *line, char *mnemonic)
>
>        if (!current_templates)
>         {
> -         as_bad (_("no such instruction: `%s'"), token_start);
> +         if (pass1)
> +           as_bad (_("no such instruction: `%s'"), token_start);
>           return NULL;
>         }
>      }
> @@ -6851,81 +6963,7 @@ match_template (char mnem_suffix)
>    if (t == current_templates->end)
>      {
>        /* We found no match.  */
> -      const char *err_msg;
> -      switch (specific_error)
> -       {
> -       default:
> -         abort ();
> -       case operand_size_mismatch:
> -         err_msg = _("operand size mismatch");
> -         break;
> -       case operand_type_mismatch:
> -         err_msg = _("operand type mismatch");
> -         break;
> -       case register_type_mismatch:
> -         err_msg = _("register type mismatch");
> -         break;
> -       case number_of_operands_mismatch:
> -         err_msg = _("number of operands mismatch");
> -         break;
> -       case invalid_instruction_suffix:
> -         err_msg = _("invalid instruction suffix");
> -         break;
> -       case bad_imm4:
> -         err_msg = _("constant doesn't fit in 4 bits");
> -         break;
> -       case unsupported_with_intel_mnemonic:
> -         err_msg = _("unsupported with Intel mnemonic");
> -         break;
> -       case unsupported_syntax:
> -         err_msg = _("unsupported syntax");
> -         break;
> -       case unsupported:
> -         as_bad (_("unsupported instruction `%s'"),
> -                 current_templates->start->name);
> -         return NULL;
> -       case invalid_sib_address:
> -         err_msg = _("invalid SIB address");
> -         break;
> -       case invalid_vsib_address:
> -         err_msg = _("invalid VSIB address");
> -         break;
> -       case invalid_vector_register_set:
> -         err_msg = _("mask, index, and destination registers must be distinct");
> -         break;
> -       case invalid_tmm_register_set:
> -         err_msg = _("all tmm registers must be distinct");
> -         break;
> -       case invalid_dest_and_src_register_set:
> -         err_msg = _("destination and source registers must be distinct");
> -         break;
> -       case unsupported_vector_index_register:
> -         err_msg = _("unsupported vector index register");
> -         break;
> -       case unsupported_broadcast:
> -         err_msg = _("unsupported broadcast");
> -         break;
> -       case broadcast_needed:
> -         err_msg = _("broadcast is needed for operand of such type");
> -         break;
> -       case unsupported_masking:
> -         err_msg = _("unsupported masking");
> -         break;
> -       case mask_not_on_destination:
> -         err_msg = _("mask not on destination operand");
> -         break;
> -       case no_default_mask:
> -         err_msg = _("default mask isn't allowed");
> -         break;
> -       case unsupported_rc_sae:
> -         err_msg = _("unsupported static rounding/sae");
> -         break;
> -       case invalid_register_operand:
> -         err_msg = _("invalid register operand");
> -         break;
> -       }
> -      as_bad (_("%s for `%s'"), err_msg,
> -             current_templates->start->name);
> +      i.error = specific_error;
>        return NULL;
>      }
>
> @@ -11334,49 +11372,6 @@ RC_SAE_immediate (const char *imm_start)
>    return 1;
>  }
>
> -/* Only string instructions can have a second memory operand, so
> -   reduce current_templates to just those if it contains any.  */
> -static int
> -maybe_adjust_templates (void)
> -{
> -  const insn_template *t;
> -
> -  gas_assert (i.mem_operands == 1);
> -
> -  for (t = current_templates->start; t < current_templates->end; ++t)
> -    if (t->opcode_modifier.isstring)
> -      break;
> -
> -  if (t < current_templates->end)
> -    {
> -      static templates aux_templates;
> -      bool recheck;
> -
> -      aux_templates.start = t;
> -      for (; t < current_templates->end; ++t)
> -       if (!t->opcode_modifier.isstring)
> -         break;
> -      aux_templates.end = t;
> -
> -      /* Determine whether to re-check the first memory operand.  */
> -      recheck = (aux_templates.start != current_templates->start
> -                || t != current_templates->end);
> -
> -      current_templates = &aux_templates;
> -
> -      if (recheck)
> -       {
> -         i.mem_operands = 0;
> -         if (i.memop1_string != NULL
> -             && i386_index_check (i.memop1_string) == 0)
> -           return 0;
> -         i.mem_operands = 1;
> -       }
> -    }
> -
> -  return 1;
> -}
> -
>  static INLINE bool starts_memory_operand (char c)
>  {
>    return ISDIGIT (c)
> @@ -11527,17 +11522,6 @@ i386_att_operand (char *operand_string)
>        char *displacement_string_end;
>
>      do_memory_reference:
> -      if (i.mem_operands == 1 && !maybe_adjust_templates ())
> -       return 0;
> -      if ((i.mem_operands == 1
> -          && !current_templates->start->opcode_modifier.isstring)
> -         || i.mem_operands == 2)
> -       {
> -         as_bad (_("too many memory references for `%s'"),
> -                 current_templates->start->name);
> -         return 0;
> -       }
> -
>        /* Check for base index form.  We detect the base index form by
>          looking for an ')' at the end of the operand, searching
>          for the '(' matching it, and finding a REGISTER_PREFIX or ','
> @@ -11737,8 +11721,6 @@ i386_att_operand (char *operand_string)
>        if (i386_index_check (operand_string) == 0)
>         return 0;
>        i.flags[this_operand] |= Operand_Mem;
> -      if (i.mem_operands == 0)
> -       i.memop1_string = xstrdup (operand_string);
>        i.mem_operands++;
>      }
>    else
> --- a/gas/config/tc-i386-intel.c
> +++ b/gas/config/tc-i386-intel.c
> @@ -993,10 +993,7 @@ i386_intel_operand (char *operand_string
>            || intel_state.is_mem)
>      {
>        /* Memory operand.  */
> -      if (i.mem_operands == 1 && !maybe_adjust_templates ())
> -       return 0;
> -      if ((int) i.mem_operands
> -         >= 2 - !current_templates->start->opcode_modifier.isstring)
> +      if (i.mem_operands)
>         {
>           /* Handle
>
> @@ -1041,10 +1038,6 @@ i386_intel_operand (char *operand_string
>                     }
>                 }
>             }
> -
> -         as_bad (_("too many memory references for `%s'"),
> -                 current_templates->start->name);
> -         return 0;
>         }
>
>        /* Swap base and index in 16-bit memory operands like
> @@ -1158,8 +1151,6 @@ i386_intel_operand (char *operand_string
>         return 0;
>
>        i.flags[this_operand] |= Operand_Mem;
> -      if (i.mem_operands == 0)
> -       i.memop1_string = xstrdup (operand_string);
>        ++i.mem_operands;
>      }
>    else
> --- a/gas/testsuite/gas/i386/code16.s
> +++ b/gas/testsuite/gas/i386/code16.s
> @@ -1,9 +1,9 @@
>         .text
>         .code16
> -       rep; movsd
> -       rep; cmpsd
> -       rep movsd %ds:(%si),%es:(%di)
> -       rep cmpsd %es:(%di),%ds:(%si)
> +       rep; movsl
> +       rep; cmpsl
> +       rep movsl %ds:(%si),%es:(%di)
> +       rep cmpsl %es:(%di),%ds:(%si)
>
>         mov     %cr2, %ecx
>         mov     %ecx, %cr2
> --- a/gas/testsuite/gas/i386/i386.exp
> +++ b/gas/testsuite/gas/i386/i386.exp
> @@ -73,6 +73,7 @@ if [gas_32_check] then {
>      run_dump_test "amd"
>      run_dump_test "katmai"
>      run_dump_test "jump"
> +    run_dump_test "movs32"
>      run_dump_test "movz32"
>      run_dump_test "relax-1"
>      run_dump_test "relax-2"
> @@ -806,6 +807,7 @@ if [gas_64_check] then {
>      run_dump_test "x86-64-segovr"
>      run_list_test "x86-64-inval-seg" "-al"
>      run_dump_test "x86-64-branch"
> +    run_dump_test "movs64"
>      run_dump_test "movz64"
>      run_dump_test "x86-64-relax-1"
>      run_dump_test "svme64"
> --- a/gas/testsuite/gas/i386/intel16.d
> +++ b/gas/testsuite/gas/i386/intel16.d
> @@ -20,4 +20,12 @@ Disassembly of section .text:
>    2c:  8d 02 [         ]*lea    \(%bp,%si\),%ax
>    2e:  8d 01 [         ]*lea    \(%bx,%di\),%ax
>    30:  8d 03 [         ]*lea    \(%bp,%di\),%ax
> -       ...
> +[      ]*[0-9a-f]+:    67 f7 13[       ]+notw[         ]+\(%ebx\)
> +[      ]*[0-9a-f]+:    66 f7 17[       ]+notl[         ]+\(%bx\)
> +[      ]*[0-9a-f]+:    67 0f 1f 03[    ]+nopw[         ]+\(%ebx\)
> +[      ]*[0-9a-f]+:    66 0f 1f 07[    ]+nopl[         ]+\(%bx\)
> +[      ]*[0-9a-f]+:    67 83 03 05[    ]+addw[         ]+\$0x5,\(%ebx\)
> +[      ]*[0-9a-f]+:    66 83 07 05[    ]+addl[         ]+\$0x5,\(%bx\)
> +[      ]*[0-9a-f]+:    67 c7 03 05 00[         ]+movw[         ]+\$0x5,\(%ebx\)
> +[      ]*[0-9a-f]+:    66 c7 07 05 00 00 00[   ]+movl[         ]+\$0x5,\(%bx\)
> +#pass
> --- a/gas/testsuite/gas/i386/intel16.s
> +++ b/gas/testsuite/gas/i386/intel16.s
> @@ -18,4 +18,14 @@
>   lea   ax, [di][bx]
>   lea   ax, [di][bp]
>
> - .p2align 4,0
> + notw  [ebx]
> + notd  [bx]
> +
> + nopw  [ebx]
> + nopd  [bx]
> +
> + addw  [ebx], 5
> + addd  [bx], 5
> +
> + movw  [ebx], 5
> + movd  [bx], 5
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/movs.s
> @@ -0,0 +1,33 @@
> +       .text
> +movs:
> +       movsb   %al,%ax
> +       movsb   (%eax),%ax
> +       movsb   %al,%eax
> +       movsb   (%eax),%eax
> +.ifdef x86_64
> +       movsb   %al,%rax
> +       movsb   (%rax),%rax
> +.endif
> +
> +       movsbw  %al,%ax
> +       movsbw  (%eax),%ax
> +       movsbl  %al,%eax
> +       movsbl  (%eax),%eax
> +.ifdef x86_64
> +       movsbq  %al,%rax
> +       movsbq  (%rax),%rax
> +.endif
> +
> +       movsw   %ax,%eax
> +       movsw   (%eax),%eax
> +.ifdef x86_64
> +       movsw   %ax,%rax
> +       movsw   (%rax),%rax
> +.endif
> +
> +       movswl  %ax,%eax
> +       movswl  (%eax),%eax
> +.ifdef x86_64
> +       movswq  %ax,%rax
> +       movswq  (%rax),%rax
> +.endif
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/movs32.d
> @@ -0,0 +1,22 @@
> +#objdump: -dw
> +#source: movs.s
> +#name: x86 mov with sign-extend (32-bit object)
> +
> +.*: +file format .*
> +
> +Disassembly of section .text:
> +
> +0+ <movs>:
> +[      ]*[a-f0-9]+:    66 0f be c0 *   movsbw %al,%ax
> +[      ]*[a-f0-9]+:    66 0f be 00 *   movsbw \(%eax\),%ax
> +[      ]*[a-f0-9]+:    0f be c0 *      movsbl %al,%eax
> +[      ]*[a-f0-9]+:    0f be 00 *      movsbl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    66 0f be c0 *   movsbw %al,%ax
> +[      ]*[a-f0-9]+:    66 0f be 00 *   movsbw \(%eax\),%ax
> +[      ]*[a-f0-9]+:    0f be c0 *      movsbl %al,%eax
> +[      ]*[a-f0-9]+:    0f be 00 *      movsbl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    0f bf c0 *      movswl %ax,%eax
> +[      ]*[a-f0-9]+:    0f bf 00 *      movswl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    0f bf c0 *      movswl %ax,%eax
> +[      ]*[a-f0-9]+:    0f bf 00 *      movswl \(%eax\),%eax
> +#pass
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/movs64.d
> @@ -0,0 +1,30 @@
> +#objdump: -dw
> +#source: movs.s
> +#name: x86 mov with sign-extend (64-bit object)
> +
> +.*: +file format .*
> +
> +Disassembly of section .text:
> +
> +0+ <movs>:
> +[      ]*[a-f0-9]+:    66 0f be c0 *   movsbw %al,%ax
> +[      ]*[a-f0-9]+:    67 66 0f be 00 *        movsbw \(%eax\),%ax
> +[      ]*[a-f0-9]+:    0f be c0 *      movsbl %al,%eax
> +[      ]*[a-f0-9]+:    67 0f be 00 *   movsbl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    48 0f be c0 *   movsbq %al,%rax
> +[      ]*[a-f0-9]+:    48 0f be 00 *   movsbq \(%rax\),%rax
> +[      ]*[a-f0-9]+:    66 0f be c0 *   movsbw %al,%ax
> +[      ]*[a-f0-9]+:    67 66 0f be 00 *        movsbw \(%eax\),%ax
> +[      ]*[a-f0-9]+:    0f be c0 *      movsbl %al,%eax
> +[      ]*[a-f0-9]+:    67 0f be 00 *   movsbl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    48 0f be c0 *   movsbq %al,%rax
> +[      ]*[a-f0-9]+:    48 0f be 00 *   movsbq \(%rax\),%rax
> +[      ]*[a-f0-9]+:    0f bf c0 *      movswl %ax,%eax
> +[      ]*[a-f0-9]+:    67 0f bf 00 *   movswl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    48 0f bf c0 *   movswq %ax,%rax
> +[      ]*[a-f0-9]+:    48 0f bf 00 *   movswq \(%rax\),%rax
> +[      ]*[a-f0-9]+:    0f bf c0 *      movswl %ax,%eax
> +[      ]*[a-f0-9]+:    67 0f bf 00 *   movswl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    48 0f bf c0 *   movswq %ax,%rax
> +[      ]*[a-f0-9]+:    48 0f bf 00 *   movswq \(%rax\),%rax
> +#pass
> --- a/gas/testsuite/gas/i386/movx16.l
> +++ b/gas/testsuite/gas/i386/movx16.l
> @@ -41,11 +41,11 @@
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %cl
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cl
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %cx
> +[      ]*[1-9][0-9]* \?\?\?\? 0FBEC8[  ]+movsb %al, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cx
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 660FBEC8[        ]+movsb %al, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
> @@ -82,7 +82,7 @@
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %cx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movsw %al, %ecx
> -[      ]*[1-9][0-9]*[  ]+movsw %ax, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 660FBFC8[        ]+movsw %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movswl        %al, %cl
> --- a/gas/testsuite/gas/i386/movx32.l
> +++ b/gas/testsuite/gas/i386/movx32.l
> @@ -41,11 +41,11 @@
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %cl
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cl
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %cx
> +[      ]*[1-9][0-9]* \?\?\?\? 660FBEC8[        ]+movsb %al, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cx
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 0FBEC8[  ]+movsb %al, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
> @@ -82,7 +82,7 @@
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %cx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movsw %al, %ecx
> -[      ]*[1-9][0-9]*[  ]+movsw %ax, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 0FBFC8[  ]+movsw %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movswl        %al, %cl
> --- a/gas/testsuite/gas/i386/movx64.l
> +++ b/gas/testsuite/gas/i386/movx64.l
> @@ -106,17 +106,17 @@
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cl
>  [      ]*[1-9][0-9]*[  ]+movsb %rax, %cl
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %cx
> +[      ]*[1-9][0-9]* \?\?\?\? 660FBEC8[        ]+movsb %al, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %rax, %cx
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 0FBEC8[  ]+movsb %al, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %rax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %rcx
> +[      ]*[1-9][0-9]* \?\?\?\? 480FBEC8[        ]+movsb %al, %rcx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %rcx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %rcx
>  [      ]*[1-9][0-9]*[  ]+movsb %rax, %rcx
> @@ -192,12 +192,12 @@
>  [      ]*[1-9][0-9]*[  ]+movsw %rax, %cx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movsw %al, %ecx
> -[      ]*[1-9][0-9]*[  ]+movsw %ax, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 0FBFC8[  ]+movsw %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsw %rax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movsw %al, %rcx
> -[      ]*[1-9][0-9]*[  ]+movsw %ax, %rcx
> +[      ]*[1-9][0-9]* \?\?\?\? 480FBFC8[        ]+movsw %ax, %rcx
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %rcx
>  [      ]*[1-9][0-9]*[  ]+movsw %rax, %rcx
>  [      ]*[1-9][0-9]*[  ]*
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -135,47 +135,37 @@
>  mov, 0xa0, None, CpuNo64, D|W|No_sSuf|No_qSuf|No_ldSuf, { Disp16|Disp32|Unspecified|Byte|Word|Dword, Acc|Byte|Word|Dword }
>  mov, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
>  movabs, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
> -movq, 0xa1, None, Cpu64, D|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Disp64|Unspecified|Qword, Acc|Qword }
>  mov, 0x88, None, 0, D|W|CheckRegSize|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> -movq, 0x89, None, Cpu64, D|Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease, { Reg64, Reg64|Unspecified|Qword|BaseIndex }
>  // In the 64bit mode the short form mov immediate is redefined to have
>  // 64bit value.
>  mov, 0xb0, None, 0, W|No_sSuf|No_qSuf|No_ldSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32 }
>  mov, 0xc6, 0, 0, W|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> -movq, 0xc7, 0, Cpu64, Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm32S, Reg64|Qword|Unspecified|BaseIndex }
>  mov, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
>  movabs, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf, { Imm64, Reg64 }
> -movq, 0xb8, None, Cpu64, Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
>  // The segment register moves accept WordReg so that a segment register
>  // can be copied to a 32 bit register, and vice versa, without using a
>  // size prefix.  When moving to a 32 bit register, the upper 16 bits
>  // are set to an implementation defined value (on the Pentium Pro, the
>  // implementation defined value is zero).
> -mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
> +mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
>  mov, 0x8c, None, 0, D|Modrm|IgnoreSize|No_bSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { SReg, Word|Unspecified|BaseIndex }
> -movq, 0x8c, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg64 }
> -mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
> +mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
>  // Move to/from control debug registers.  In the 16 or 32bit modes
>  // they are 32bit.  In the 64bit mode they are 64bit.
>  mov, 0xf20, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Control, Reg32 }
>  mov, 0xf20, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Control, Reg64 }
> -movq, 0xf20, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Control, Reg64 }
>  mov, 0xf21, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Debug, Reg32 }
>  mov, 0xf21, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
> -movq, 0xf21, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
>  mov, 0xf24, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Test, Reg32 }
>
>  // Move after swapping the bytes
>  movbe, 0x0f38f0, None, CpuMovbe, D|Modrm|No_bSuf|No_sSuf|No_ldSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>
>  // Move with sign extend.
> -// "movsbl" & "movsbw" must not be unified into "movsb" to avoid
> -// conflict with the "movs" string move instruction.
> -movsbl, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg32 }
> -movsbw, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg16 }
> -movswl, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg16|Word|Unspecified|BaseIndex, Reg32 }
> -movsbq, 0xfbe, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg8|Byte|Unspecified|BaseIndex, Reg64 }
> -movswq, 0xfbf, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg16|Word|Unspecified|BaseIndex, Reg64 }
> +movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> +movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
> +// "movslq" must not be converted into "movsl" to avoid conflict with the
> +// "movsl" string move instruction.
>  movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
>  movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
> @@ -492,9 +482,6 @@ set<cc>, 0xf9<cc:opc>, 0, Cpu386, Modrm|
>  // String manipulation.
>  cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
>  cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> -// Intel mode string compare.
> -cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
> -cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
>  scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
>  scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  ins, 0x6c, None, Cpu186, W|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
> @@ -509,9 +496,6 @@ slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|
>  slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
>  movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
>  movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> -// Intel mode string move.
> -movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
> -movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
>  smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
>  smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  scas, 0xae, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
>


-- 
H.J.


* Re: [PATCH 7/7] ix86: don't recognize/derive Q suffix in the common case
  2022-08-16  7:34 ` [PATCH 7/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
@ 2022-08-17 20:36   ` H.J. Lu
  2022-08-18  6:29     ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2022-08-17 20:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Aug 16, 2022 at 12:34 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> Have its use, except where actually legitimate, result in the same "only
> supported in 64-bit mode" diagnostic as emitted for other 64-bit only
> insns. Also suppress deriving of the suffix in Intel mode except in the
> legitimate cases. This in exchange allows dropping the respective code
> from match_template().
>
> Oddly enough, despite gcc's preference towards FILDQ and FIST{,T}Q, we

This is for inline assembly:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39590

> had no testcase whatsoever for these. Therefore such tests are being
> added. Note that the removed line in the x86-64-lfence-load testcase
> was redundant with the exact same one a few lines up.
> ---
> Given gcc's preference towards FILDQ / FIST{,T}Q, I wonder whether the
> disassembler should rather emit a Q suffix instead of the LL one.
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -4826,7 +4826,7 @@ void
>  md_assemble (char *line)
>  {
>    unsigned int j;
> -  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy;
> +  char mnemonic[MAX_MNEM_SIZE], mnem_suffix = 0, *copy;
>    const char *pass1_mnem = NULL;
>    enum i386_error pass1_err = 0;
>    const insn_template *t;
> @@ -4860,6 +4860,7 @@ md_assemble (char *line)
>             goto no_match;
>           /* No point in trying a 2nd pass - it'll only find the same suffix
>              again.  */
> +         mnem_suffix = i.suffix;
>           goto match_error;
>         }
>        free (copy);
> @@ -5010,9 +5011,15 @@ md_assemble (char *line)
>                   cpu_sub_arch_name ? cpu_sub_arch_name : "");
>           return;
>         case unsupported_64bit:
> -         as_bad (_("`%s' is %s supported in 64-bit mode"),
> -                 pass1_mnem ? pass1_mnem : current_templates->start->name,
> -                 flag_code == CODE_64BIT ? _("not") : _("only"));
> +         if (ISLOWER (mnem_suffix))
> +           as_bad (_("`%s%c' is %s supported in 64-bit mode"),
> +                   pass1_mnem ? pass1_mnem : current_templates->start->name,
> +                   mnem_suffix,
> +                   flag_code == CODE_64BIT ? _("not") : _("only"));
> +         else
> +           as_bad (_("`%s' is %s supported in 64-bit mode"),
> +                   pass1_mnem ? pass1_mnem : current_templates->start->name,
> +                   flag_code == CODE_64BIT ? _("not") : _("only"));
>           return;
>         case invalid_sib_address:
>           err_msg = _("invalid SIB address");
> @@ -5355,6 +5362,23 @@ md_assemble (char *line)
>      last_insn.kind = last_insn_other;
>  }
>
> +/* The Q suffix is generally valid only in 64-bit mode, with very few
> +   exceptions: fild, fistp, fisttp, and cmpxchg8b.  Note that for fild
> +   and fisttp only one of their two templates is matched below: That's
> +   sufficient since other relevant attributes are the same between both
> +   respective templates.  */
> +static INLINE bool q_suffix_allowed(const insn_template *t)
> +{
> +  return flag_code == CODE_64BIT
> +        || (t->opcode_modifier.opcodespace == SPACE_BASE
> +            && t->base_opcode == 0xdf
> +            && (t->extension_opcode & 1)) /* fild / fistp / fisttp */
> +        || (t->opcode_modifier.opcodespace == SPACE_0F
> +            && t->base_opcode == 0xc7
> +            && t->opcode_modifier.opcodeprefix == PREFIX_NONE
> +            && t->extension_opcode == 1) /* cmpxchg8b */;
> +}
> +
>  static char *
>  parse_insn (char *line, char *mnemonic)
>  {
> @@ -5626,6 +5650,10 @@ parse_insn (char *line, char *mnemonic)
>    for (t = current_templates->start; t < current_templates->end; ++t)
>      {
>        supported |= cpu_flags_match (t);
> +
> +      if (i.suffix == QWORD_MNEM_SUFFIX && !q_suffix_allowed (t))
> +       supported &= ~CPU_FLAGS_64BIT_MATCH;
> +
>        if (supported == CPU_FLAGS_PERFECT_MATCH)
>         return l;
>      }
> @@ -6661,20 +6689,12 @@ match_template (char mnem_suffix)
>        for (j = 0; j < MAX_OPERANDS; j++)
>         operand_types[j] = t->operand_types[j];
>
> -      /* In general, don't allow
> -        - 64-bit operands outside of 64-bit mode,
> -        - 32-bit operands on pre-386.  */
> +      /* In general, don't allow 32-bit operands on pre-386.  */
>        specific_error = progress (mnem_suffix ? invalid_instruction_suffix
>                                              : operand_size_mismatch);
>        j = i.imm_operands + (t->operands > i.imm_operands + 1);
> -      if (((i.suffix == QWORD_MNEM_SUFFIX
> -           && flag_code != CODE_64BIT
> -           && !(t->opcode_modifier.opcodespace == SPACE_0F
> -                && t->base_opcode == 0xc7
> -                && t->opcode_modifier.opcodeprefix == PREFIX_NONE
> -                && t->extension_opcode == 1) /* cmpxchg8b */)
> -          || (i.suffix == LONG_MNEM_SUFFIX
> -              && !cpu_arch_flags.bitfield.cpui386))
> +      if (i.suffix == LONG_MNEM_SUFFIX
> +         && !cpu_arch_flags.bitfield.cpui386
>           && (intel_syntax
>               ? (t->opcode_modifier.mnemonicsize != IGNORESIZE
>                  && !intel_float_operand (t->name))
> --- a/gas/config/tc-i386-intel.c
> +++ b/gas/config/tc-i386-intel.c
> @@ -824,7 +824,7 @@ i386_intel_operand (char *operand_string
>                     continue;
>                   break;
>                 case QWORD_MNEM_SUFFIX:
> -                 if (t->opcode_modifier.no_qsuf)
> +                 if (t->opcode_modifier.no_qsuf || !q_suffix_allowed (t))
>                     continue;
>                   break;
>                 case SHORT_MNEM_SUFFIX:
> --- a/gas/testsuite/gas/i386/opcode.d
> +++ b/gas/testsuite/gas/i386/opcode.d
> @@ -592,6 +592,10 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    0f 4b 90 90 90 90 90    cmovnp -0x6f6f6f70\(%eax\),%edx
>  [      ]*[a-f0-9]+:    66 0f 4a 90 90 90 90 90         cmovp  -0x6f6f6f70\(%eax\),%dx
>  [      ]*[a-f0-9]+:    66 0f 4b 90 90 90 90 90         cmovnp -0x6f6f6f70\(%eax\),%dx
> +[      ]*[a-f0-9]+:    df 28                   fildll \(%eax\)
> +[      ]*[a-f0-9]+:    df 28                   fildll \(%eax\)
> +[      ]*[a-f0-9]+:    df 38                   fistpll \(%eax\)
> +[      ]*[a-f0-9]+:    df 38                   fistpll \(%eax\)
>   +[a-f0-9]+:   82 c3 01                add    \$0x1,%bl
>   +[a-f0-9]+:   82 f3 01                xor    \$0x1,%bl
>   +[a-f0-9]+:   82 d3 01                adc    \$0x1,%bl
> --- a/gas/testsuite/gas/i386/opcode.s
> +++ b/gas/testsuite/gas/i386/opcode.s
> @@ -592,6 +592,11 @@ foo:
>   cmovpe  0x90909090(%eax),%dx
>   cmovpo 0x90909090(%eax),%dx
>
> + fildq  (%eax)
> + fildll (%eax)
> + fistpq (%eax)
> + fistpll (%eax)
> +
>         .byte 0x82, 0xc3, 0x01
>         .byte 0x82, 0xf3, 0x01
>         .byte 0x82, 0xd3, 0x01
> --- a/gas/testsuite/gas/i386/opcode-intel.d
> +++ b/gas/testsuite/gas/i386/opcode-intel.d
> @@ -593,6 +593,10 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    0f 4b 90 90 90 90 90    cmovnp edx,DWORD PTR \[eax-0x6f6f6f70\]
>  [      ]*[a-f0-9]+:    66 0f 4a 90 90 90 90 90         cmovp  dx,WORD PTR \[eax-0x6f6f6f70\]
>  [      ]*[a-f0-9]+:    66 0f 4b 90 90 90 90 90         cmovnp dx,WORD PTR \[eax-0x6f6f6f70\]
> +[      ]*[a-f0-9]+:    df 28                   fild   QWORD PTR \[eax\]
> +[      ]*[a-f0-9]+:    df 28                   fild   QWORD PTR \[eax\]
> +[      ]*[a-f0-9]+:    df 38                   fistp  QWORD PTR \[eax\]
> +[      ]*[a-f0-9]+:    df 38                   fistp  QWORD PTR \[eax\]
>   +[a-f0-9]+:   82 c3 01                add    bl,0x1
>   +[a-f0-9]+:   82 f3 01                xor    bl,0x1
>   +[a-f0-9]+:   82 d3 01                adc    bl,0x1
> --- a/gas/testsuite/gas/i386/opcode-suffix.d
> +++ b/gas/testsuite/gas/i386/opcode-suffix.d
> @@ -593,6 +593,10 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    0f 4b 90 90 90 90 90    cmovnpl -0x6f6f6f70\(%eax\),%edx
>  [      ]*[a-f0-9]+:    66 0f 4a 90 90 90 90 90         cmovpw -0x6f6f6f70\(%eax\),%dx
>  [      ]*[a-f0-9]+:    66 0f 4b 90 90 90 90 90         cmovnpw -0x6f6f6f70\(%eax\),%dx
> +[      ]*[a-f0-9]+:    df 28                   fildll \(%eax\)
> +[      ]*[a-f0-9]+:    df 28                   fildll \(%eax\)
> +[      ]*[a-f0-9]+:    df 38                   fistpll \(%eax\)
> +[      ]*[a-f0-9]+:    df 38                   fistpll \(%eax\)
>   +[a-f0-9]+:   82 c3 01                addb   \$0x1,%bl
>   +[a-f0-9]+:   82 f3 01                xorb   \$0x1,%bl
>   +[a-f0-9]+:   82 d3 01                adcb   \$0x1,%bl
> --- a/gas/testsuite/gas/i386/sse3.d
> +++ b/gas/testsuite/gas/i386/sse3.d
> @@ -13,29 +13,30 @@ Disassembly of section .text:
>    10:  df 88 90 90 90 90 [     ]*fisttps -0x6f6f6f70\(%eax\)
>    16:  db 88 90 90 90 90 [     ]*fisttpl -0x6f6f6f70\(%eax\)
>    1c:  dd 88 90 90 90 90 [     ]*fisttpll -0x6f6f6f70\(%eax\)
> -  22:  66 0f 7c 65 00 [        ]*haddpd 0x0\(%ebp\),%xmm4
> -  27:  66 0f 7c ee [   ]*haddpd %xmm6,%xmm5
> -  2b:  f2 0f 7c 37 [   ]*haddps \(%edi\),%xmm6
> -  2f:  f2 0f 7c f8 [   ]*haddps %xmm0,%xmm7
> -  33:  66 0f 7d c1 [   ]*hsubpd %xmm1,%xmm0
> -  37:  66 0f 7d 0a [   ]*hsubpd \(%edx\),%xmm1
> -  3b:  f2 0f 7d d2 [   ]*hsubps %xmm2,%xmm2
> -  3f:  f2 0f 7d 1c 24 [        ]*hsubps \(%esp\),%xmm3
> -  44:  f2 0f f0 2e [   ]*lddqu  \(%esi\),%xmm5
> -  48:  0f 01 c8 [      ]*monitor %eax,%ecx,%edx
> -  4b:  0f 01 c8 [      ]*monitor %eax,%ecx,%edx
> -  4e:  f2 0f 12 f7 [   ]*movddup %xmm7,%xmm6
> -  52:  f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
> -  56:  f3 0f 16 01 [   ]*movshdup \(%ecx\),%xmm0
> -  5a:  f3 0f 16 ca [   ]*movshdup %xmm2,%xmm1
> -  5e:  f3 0f 12 13 [   ]*movsldup \(%ebx\),%xmm2
> -  62:  f3 0f 12 dc [   ]*movsldup %xmm4,%xmm3
> -  66:  0f 01 c9 [      ]*mwait  %eax,%ecx
> -  69:  0f 01 c9 [      ]*mwait  %eax,%ecx
> -  6c:  67 0f 01 c8 [   ]*monitor %ax,%ecx,%edx
> -  70:  67 0f 01 c8 [   ]*monitor %ax,%ecx,%edx
> -  74:  f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
> -  78:  f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
> +[      ]*[0-9a-f]+:    dd 88 90 90 90 90 [     ]*fisttpll -0x6f6f6f70\(%eax\)
> +[      ]*[0-9a-f]+:    66 0f 7c 65 00 [        ]*haddpd 0x0\(%ebp\),%xmm4
> +[      ]*[0-9a-f]+:    66 0f 7c ee [   ]*haddpd %xmm6,%xmm5
> +[      ]*[0-9a-f]+:    f2 0f 7c 37 [   ]*haddps \(%edi\),%xmm6
> +[      ]*[0-9a-f]+:    f2 0f 7c f8 [   ]*haddps %xmm0,%xmm7
> +[      ]*[0-9a-f]+:    66 0f 7d c1 [   ]*hsubpd %xmm1,%xmm0
> +[      ]*[0-9a-f]+:    66 0f 7d 0a [   ]*hsubpd \(%edx\),%xmm1
> +[      ]*[0-9a-f]+:    f2 0f 7d d2 [   ]*hsubps %xmm2,%xmm2
> +[      ]*[0-9a-f]+:    f2 0f 7d 1c 24 [        ]*hsubps \(%esp\),%xmm3
> +[      ]*[0-9a-f]+:    f2 0f f0 2e [   ]*lddqu  \(%esi\),%xmm5
> +[      ]*[0-9a-f]+:    0f 01 c8 [      ]*monitor %eax,%ecx,%edx
> +[      ]*[0-9a-f]+:    0f 01 c8 [      ]*monitor %eax,%ecx,%edx
> +[      ]*[0-9a-f]+:    f2 0f 12 f7 [   ]*movddup %xmm7,%xmm6
> +[      ]*[0-9a-f]+:    f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
> +[      ]*[0-9a-f]+:    f3 0f 16 01 [   ]*movshdup \(%ecx\),%xmm0
> +[      ]*[0-9a-f]+:    f3 0f 16 ca [   ]*movshdup %xmm2,%xmm1
> +[      ]*[0-9a-f]+:    f3 0f 12 13 [   ]*movsldup \(%ebx\),%xmm2
> +[      ]*[0-9a-f]+:    f3 0f 12 dc [   ]*movsldup %xmm4,%xmm3
> +[      ]*[0-9a-f]+:    0f 01 c9 [      ]*mwait  %eax,%ecx
> +[      ]*[0-9a-f]+:    0f 01 c9 [      ]*mwait  %eax,%ecx
> +[      ]*[0-9a-f]+:    67 0f 01 c8 [   ]*monitor %ax,%ecx,%edx
> +[      ]*[0-9a-f]+:    67 0f 01 c8 [   ]*monitor %ax,%ecx,%edx
> +[      ]*[0-9a-f]+:    f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
> +[      ]*[0-9a-f]+:    f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
>  [      ]*[0-9a-f]+:    0f 01 c8[       ]+monitor %eax,%ecx,%edx
>  [      ]*[0-9a-f]+:    67 0f 01 c8[    ]+monitor %ax,%ecx,%edx
>  [      ]*[0-9a-f]+:    0f 01 c9[       ]+mwait  %eax,%ecx
> --- a/gas/testsuite/gas/i386/sse3.s
> +++ b/gas/testsuite/gas/i386/sse3.s
> @@ -8,6 +8,7 @@ foo:
>         addsubps        %xmm4,%xmm3
>         fisttps         0x90909090(%eax)
>         fisttpl         0x90909090(%eax)
> +       fisttpq         0x90909090(%eax)
>         fisttpll        0x90909090(%eax)
>         haddpd          0x0(%ebp),%xmm4
>         haddpd          %xmm6,%xmm5
> --- a/gas/testsuite/gas/i386/sse3-intel.d
> +++ b/gas/testsuite/gas/i386/sse3-intel.d
> @@ -14,6 +14,7 @@ Disassembly of section .text:
>  [      ]*[0-9a-f]+:    df 88 90 90 90 90[      ]+fisttp WORD PTR \[eax-0x6f6f6f70\]
>  [      ]*[0-9a-f]+:    db 88 90 90 90 90[      ]+fisttp DWORD PTR \[eax-0x6f6f6f70\]
>  [      ]*[0-9a-f]+:    dd 88 90 90 90 90[      ]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
> +[      ]*[0-9a-f]+:    dd 88 90 90 90 90[      ]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
>  [      ]*[0-9a-f]+:    66 0f 7c 65 00[         ]+haddpd xmm4,(XMMWORD PTR )?\[ebp(\+0x0)\]
>  [      ]*[0-9a-f]+:    66 0f 7c ee[    ]+haddpd xmm5,xmm6
>  [      ]*[0-9a-f]+:    f2 0f 7c 37[    ]+haddps xmm6,(XMMWORD PTR )?\[edi\]
> --- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
> +++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
> @@ -44,16 +44,21 @@ Disassembly of section .text:
>   +[a-f0-9]+:   0f ae e8                lfence
>   +[a-f0-9]+:   db 55 00                fistl  0x0\(%rbp\)
>   +[a-f0-9]+:   df 55 00                fists  0x0\(%rbp\)
> + +[a-f0-9]+:   db 5d 00                fistpl 0x0\(%rbp\)
> + +[a-f0-9]+:   df 5d 00                fistps 0x0\(%rbp\)
> + +[a-f0-9]+:   df 7d 00                fistpll 0x0\(%rbp\)
>   +[a-f0-9]+:   db 45 00                fildl  0x0\(%rbp\)
>   +[a-f0-9]+:   0f ae e8                lfence
>   +[a-f0-9]+:   df 45 00                filds  0x0\(%rbp\)
>   +[a-f0-9]+:   0f ae e8                lfence
> + +[a-f0-9]+:   df 6d 00                fildll 0x0\(%rbp\)
> + +[a-f0-9]+:   0f ae e8                lfence
>   +[a-f0-9]+:   9b dd 75 00             fsave  0x0\(%rbp\)
>   +[a-f0-9]+:   dd 65 00                frstor 0x0\(%rbp\)
>   +[a-f0-9]+:   0f ae e8                lfence
> - +[a-f0-9]+:   df 45 00                filds  0x0\(%rbp\)
> - +[a-f0-9]+:   0f ae e8                lfence
> + +[a-f0-9]+:   db 4d 00                fisttpl 0x0\(%rbp\)
>   +[a-f0-9]+:   df 4d 00                fisttps 0x0\(%rbp\)
> + +[a-f0-9]+:   dd 4d 00                fisttpll 0x0\(%rbp\)
>   +[a-f0-9]+:   d9 65 00                fldenv 0x0\(%rbp\)
>   +[a-f0-9]+:   0f ae e8                lfence
>   +[a-f0-9]+:   9b d9 75 00             fstenv 0x0\(%rbp\)
> --- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
> +++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
> @@ -27,12 +27,17 @@ _start:
>         flds (%rbp)
>         fistl (%rbp)
>         fists (%rbp)
> +       fistpl (%rbp)
> +       fistps (%rbp)
> +       fistpq (%rbp)
>         fildl (%rbp)
>         filds (%rbp)
> +       fildq (%rbp)
>         fsave (%rbp)
>         frstor (%rbp)
> -       filds (%rbp)
> +       fisttpl (%rbp)
>         fisttps (%rbp)
> +       fisttpq (%rbp)
>         fldenv (%rbp)
>         fstenv (%rbp)
>         fadds  (%rbp)
> --- a/gas/testsuite/gas/i386/x86-64-sse3.d
> +++ b/gas/testsuite/gas/i386/x86-64-sse3.d
> @@ -13,6 +13,7 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    df 88 90 90 90 00 [     ]*fisttps 0x909090\(%rax\)
>  [      ]*[a-f0-9]+:    db 88 90 90 90 00 [     ]*fisttpl 0x909090\(%rax\)
>  [      ]*[a-f0-9]+:    dd 88 90 90 90 00 [     ]*fisttpll 0x909090\(%rax\)
> +[      ]*[a-f0-9]+:    dd 88 90 90 90 00 [     ]*fisttpll 0x909090\(%rax\)
>  [      ]*[a-f0-9]+:    66 0f 7c 65 00 [        ]*haddpd 0x0\(%rbp\),%xmm4
>  [      ]*[a-f0-9]+:    66 0f 7c ee [   ]*haddpd %xmm6,%xmm5
>  [      ]*[a-f0-9]+:    f2 0f 7c 37 [   ]*haddps \(%rdi\),%xmm6
> --- a/gas/testsuite/gas/i386/x86-64-sse3.s
> +++ b/gas/testsuite/gas/i386/x86-64-sse3.s
> @@ -8,6 +8,7 @@ foo:
>         addsubps        %xmm4,%xmm3
>         fisttps         0x909090(%rax)
>         fisttpl         0x909090(%rax)
> +       fisttpq         0x909090(%rax)
>         fisttpll        0x909090(%rax)
>         haddpd          0x0(%rbp),%xmm4
>         haddpd          %xmm6,%xmm5
> --- a/gas/testsuite/gas/i386/x86-64-sse3-intel.d
> +++ b/gas/testsuite/gas/i386/x86-64-sse3-intel.d
> @@ -14,6 +14,7 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    df 88 90 90 90 00[      ]+fisttp WORD PTR \[rax\+0x909090\]
>  [      ]*[a-f0-9]+:    db 88 90 90 90 00[      ]+fisttp DWORD PTR \[rax\+0x909090\]
>  [      ]*[a-f0-9]+:    dd 88 90 90 90 00[      ]+fisttp QWORD PTR \[rax\+0x909090\]
> +[      ]*[a-f0-9]+:    dd 88 90 90 90 00[      ]+fisttp QWORD PTR \[rax\+0x909090\]
>  [      ]*[a-f0-9]+:    66 0f 7c 65 00[         ]+haddpd xmm4,(XMMWORD PTR )?\[rbp(\+0x0)\]
>  [      ]*[a-f0-9]+:    66 0f 7c ee[    ]+haddpd xmm5,xmm6
>  [      ]*[a-f0-9]+:    f2 0f 7c 37[    ]+haddps xmm6,(XMMWORD PTR )?\[rdi\]
>


-- 
H.J.

* Re: [PATCH 1/7] x86/Intel: restrict suffix derivation
  2022-08-17 19:19   ` H.J. Lu
@ 2022-08-18  6:07     ` Jan Beulich
  2022-08-18 14:46       ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-18  6:07 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 17.08.2022 21:19, H.J. Lu wrote:
> On Tue, Aug 16, 2022 at 12:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> While in some cases deriving an AT&T-style suffix from an Intel syntax
>> memory operand size specifier is necessary, in many cases this is not
>> only pointless, but has led to the introduction of various workarounds:
>> Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
>> ToQword attributes. Suppress suffix derivation when we can clearly tell
>> that the memory operand's size isn't going to be needed to infer the
>> possible need for the low byte/word opcode bit or an operand size prefix
>> (0x66 or REX.W).
>>
>> As a result ToDword and ToQword can be dropped entirely, plus a fair
>> number of IgnoreSize and NoRex64 can also be got rid of. Note that
>> IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
>> operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
>> 16-bit code using SIMD insns isn't well tested, clone an existing
>> testcase just enough to cover a few insns which are potentially
>> problematic but are being touched here.)
>>
>> As a side effect of folding the VCVT{,T}S{S,D,H}2SI templates,
>> VCVT{,T}SH2SI will now allow L and Q suffixes, consistent with
>> VCVT{,T}S{S,D}2SI. All of these remain inconsistent with their 2USI
>> counterparts (which I think should also be corrected, but perhaps better
>> in a separate change).
> 
> I don't think allowing more unnecessary L and Q suffixes for AVX
> instructions is desirable.   I prefer not to allow unnecessary L and
> Q suffixes in folded entries.   We can add special entries to allow
> the existing instructions with suffixes.

I think we've been there before, and I continue to think that we should
be consistent throughout the entire ISA in allowing suffixes when GPRs
or their equivalent memory operands are involved. That's in the spirit
of the original AT&T syntax intentions, after all. I have to admit that
I find it particularly worrying that you suggest to introduce new
templates, when the overall / long term goal is to reduce the set, to
keep it manageable in spite of all the new additions that are yet to
come.

As pointed out elsewhere, any inconsistencies here make it harder for
people to write e.g. heavily macro-ized code. Similarly it can result
in surprises when cloning existing code to deal with new extensions.

Jan

* Re: [PATCH 4/7] x86: improve match_template()'s diagnostics
  2022-08-17 20:24   ` H.J. Lu
@ 2022-08-18  6:14     ` Jan Beulich
  2022-08-18 14:51       ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-18  6:14 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 17.08.2022 22:24, H.J. Lu wrote:
> On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> Using the example of
>>
>>         extractps $0, %xmm0, %xmm0
>>         insertps $0, %xmm0, %eax
>>
>> (both having respectively the same mistake of using the wrong kind of
>> destination register) it is easy to see that current behavior is far
>> from ideal: The former results in "unsupported instruction" for 32-bit
>> code simply because the 2nd template we have is a Cpu64 one. Instead we
>> should aim at emitting the "best" possible error, which will typically
>> be the one where we passed the largest number of checks. Generalize the
>> original "specific_error" approach by making it apply to the entire
>> matching loop, utilizing that line numbers increase as we pass further
>> checks.
>> ---
>> As to the inval-tls testcase: Why is KMOV special? Are e.g. VMOV or
>> other vector insns (legacy or EVEX-encoded) any different? Shouldn't the
>> use of the respective reloc types be limited to _exactly_ the insns they
>> are intended to be used with? Furthermore having this check in
>> match_template() is unhelpful, as the resulting diagnostic isn't aiding
>> in understanding what's wrong. Template matching should be left alone
>> here, and the issue be diagnosed later, say directly in md_assemble()
>> (alongside the various further consistency checks there) or in
>> process_operands().
> 
> GCC may generate invalid TLS code sequences with KMOV, not other
> instructions.  We want to catch them by assembler.   It is easier to disallow
> the invalid instructions.

I did actually check both the discussion and gcc code in question, and I
was not able to prove that it could have done so only for KMOV. And yes,
I agree with disallowing the invalid instructions. The question is why
we do so only for a limited and inconsistent subset.

In addition you don't say anything regarding the point in time when we
diagnose this, the placement of which - as said - looks sub-optimal to me.
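
The "best error" selection described in the quoted commit message (report the diagnostic of the template that passed the most checks) can be sketched as below. This is a minimal illustration, not the gas code: `struct match_result`, its fields and `best_match()` are hypothetical, and a plain integer stands in for the source-line-number progress value the commit message mentions.

```c
#include <stddef.h>

/* Hypothetical record of one template's failed match attempt. */
struct match_result {
    int progress;       /* larger means more checks were passed */
    const char *error;  /* diagnostic produced for this template */
};

/* Return the index of the template whose matching got furthest.
   On a tie the earlier template wins; returns n when n is zero. */
static size_t best_match(const struct match_result *r, size_t n)
{
    size_t best = n;

    for (size_t k = 0; k < n; k++)
        if (best == n || r[k].progress > r[best].progress)
            best = k;
    return best;
}
```

For `extractps $0, %xmm0, %xmm0` in 32-bit code, a template failing late on its operand check would then win over the Cpu64 template rejected early by the CPU check, so the reported error would concern the operand rather than being "unsupported instruction".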

Jan

* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
  2022-08-17 20:29   ` H.J. Lu
@ 2022-08-18  6:24     ` Jan Beulich
  2022-08-18 15:14       ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-18  6:24 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 17.08.2022 22:29, H.J. Lu wrote:
> On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> x86: re-work insn/suffix recognition
>>
>> Having templates with a suffix explicitly present has always been
>> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
> 
> I don't like the second pass.   What problem does it solve?

It addresses the reasons we have various pretty odd (and confusing by
their mere presence) insn templates which better would never have been
there. If you have a better suggestion to eliminate those, I'm all ears.

You can also easily see the issues this solves by looking at the
testsuite changes. Among other things this once again is a matter of
providing consistent and hence predictable behavior.

Further this sets the stage for the subsequent two changes, which I
don't think are easily possible without this 2nd pass.

And finally you've likely spotted that this is actually a reduction in
code size, first and foremost because the odd maybe_adjust_templates()
can now go away. Plus I think you realize that the 2nd pass wouldn't
be engaged in many cases - it requires a template match failure in the
1st pass, after all, which isn't going to happen very often.
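
As a rough illustration of the two-pass idea, here is a sketch in which "matching" is reduced to a name lookup in a toy table: pass 1 tries the mnemonic verbatim, and only if that fails does pass 2 treat a trailing suffix letter (this sketch accepts b, w, l, q and s) as an AT&T suffix and retry. The table, `lookup()` and `match_mnemonic()` are hypothetical; real template matching in gas also checks operands and CPU flags, and the patch's actual handling may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy template table; real gas templates also carry
   operand types, CPU flags and more. */
static const char *const templates[] = { "mov", "movsd", "fild", "add" };

static bool lookup(const char *name)
{
    for (size_t k = 0; k < sizeof templates / sizeof templates[0]; k++)
        if (strcmp(templates[k], name) == 0)
            return true;
    return false;
}

/* Pass 1 tries the mnemonic verbatim; pass 2 runs only if pass 1 fails
   and retries with a trailing suffix letter stripped.  Returns the length
   of the matched name (0 on failure) and stores the stripped suffix in
   *suffix ('\0' when pass 1 matched or nothing matched). */
static size_t match_mnemonic(const char *mnem, char *suffix)
{
    char base[32];
    size_t len = strlen(mnem);

    *suffix = '\0';
    if (lookup(mnem))
        return len;                        /* pass 1 matched */
    if (len < 2 || len > sizeof base
        || strchr("bwlqs", mnem[len - 1]) == NULL)
        return 0;                          /* no suffix to try */
    memcpy(base, mnem, len - 1);
    base[len - 1] = '\0';
    if (!lookup(base))
        return 0;
    *suffix = mnem[len - 1];               /* pass 2 matched */
    return len - 1;
}
```

With this table, `movl` matches `mov` in pass 2 with suffix `l`, while `movsd` matches verbatim in pass 1 and never reaches pass 2.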

Jan

* Re: [PATCH 7/7] ix86: don't recognize/derive Q suffix in the common case
  2022-08-17 20:36   ` H.J. Lu
@ 2022-08-18  6:29     ` Jan Beulich
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2022-08-18  6:29 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 17.08.2022 22:36, H.J. Lu wrote:
> On Tue, Aug 16, 2022 at 12:34 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> Have its use, except where actually legitimate, result in the same "only
>> supported in 64-bit mode" diagnostic as emitted for other 64-bit only
>> insns. Also suppress deriving of the suffix in Intel mode except in the
>> legitimate cases. This in exchange allows dropping the respective code
>> from match_template().
>>
>> Oddly enough despite gcc's preference towards FILDQ and FIST{,T}Q we
> 
> This is for inline assembly:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39590

I don't think this is affecting inline assembly only. The Z operand modifier
is also used in i386.md. And the lack of testcase when gcc uses it (no
matter for what purpose) is odd in any event.
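
For reference, the Q-suffix rule implemented by `q_suffix_allowed()` in the patch can be restated standalone. This is a sketch only: the enums and `struct tmpl` are hypothetical simplifications of the gas types, and the mode is passed as a parameter instead of being read from `flag_code`. Under opcode 0xDF the odd ModRM.reg values are FISTTP m16 (/1), FISTP m16 (/3), FILD m64 (/5) and FISTP m64 (/7), which is why the `& 1` test picks up the fild, fistp and fisttp templates the patch comment refers to.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the gas types involved. */
enum code_mode { CODE_16BIT, CODE_32BIT, CODE_64BIT };
enum opcode_space { SPACE_BASE, SPACE_0F };
enum opcode_prefix { PREFIX_NONE, PREFIX_0X66 };

struct tmpl {
    enum opcode_space space;
    unsigned int base_opcode;
    enum opcode_prefix prefix;
    unsigned int extension_opcode;   /* ModRM.reg ("/digit") */
};

/* Q is valid in 64-bit mode, and elsewhere only for a few x87 insns
   and cmpxchg8b. */
static bool q_suffix_allowed(enum code_mode mode, const struct tmpl *t)
{
    return mode == CODE_64BIT
           /* DF /1 fisttp, DF /3 and /7 fistp, DF /5 fild */
           || (t->space == SPACE_BASE && t->base_opcode == 0xdf
               && (t->extension_opcode & 1))
           /* 0F C7 /1 cmpxchg8b, no mandatory prefix */
           || (t->space == SPACE_0F && t->base_opcode == 0xc7
               && t->prefix == PREFIX_NONE && t->extension_opcode == 1);
}
```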

Jan

* Re: [PATCH 3/7] x86: move / quiesce pre-386 non-16-bit warning
  2022-08-17 19:21   ` H.J. Lu
@ 2022-08-18  7:21     ` Jan Beulich
  2022-08-18 15:30       ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-18  7:21 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 17.08.2022 21:21, H.J. Lu wrote:
> On Tue, Aug 16, 2022 at 12:31 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> Emitting this warning for every insn, including ones having actual
>> errors, is annoying. Introduce a boolean variable to emit the warning
>> just once on the first insn after .arch may have changed the things, and
>> move the warning to output_insn(). (I didn't want to go as far as
>> checking whether the .arch actually turned off the i386 bit, but doing
>> so would be an option.)
>> ---
>> Otoh I wonder whether switching to a pre-386 architecture shouldn't
>> automatically move to CODE_16BIT: Our emitting operand- or address-size
>> prefixes violates the architecture specification. Alternatively we
>> could outright reject such .arch directives when not already in 16-bit
>> mode.

Mind me asking - no opinion here?

> OK.

Thanks.
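
The warn-once scheme from the quoted commit message can be sketched like this: every `.arch` directive arms a flag, and the first insn that actually reaches the output stage consumes it, warning only if i386 support is off outside 16-bit mode. All names are hypothetical stand-ins, and a counter replaces the real `as_warn()` call.

```c
#include <stdbool.h>

/* Hypothetical, simplified assembler state. */
static bool cpu_has_i386 = true;     /* cleared by e.g. ".arch i286" */
static bool code16;                  /* would be set by ".code16" */
static bool pre386_check_pending;    /* armed by every .arch directive */
static unsigned int warnings_issued; /* stands in for as_warn() */

/* Called for each .arch directive; deliberately does not check whether
   the i386 bit actually changed. */
static void handle_arch(bool has_i386)
{
    cpu_has_i386 = has_i386;
    pre386_check_pending = true;
}

/* Called only for insns that are actually emitted, so insns with errors
   neither trigger nor consume the one-shot warning. */
static void output_insn(void)
{
    if (pre386_check_pending) {
        pre386_check_pending = false;
        if (!cpu_has_i386 && !code16)
            warnings_issued++;
    }
    /* ... encode and emit the insn ... */
}
```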

Jan

* Re: [PATCH 1/7] x86/Intel: restrict suffix derivation
  2022-08-18  6:07     ` Jan Beulich
@ 2022-08-18 14:46       ` H.J. Lu
  2022-08-19  8:19         ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2022-08-18 14:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Aug 17, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 17.08.2022 21:19, H.J. Lu wrote:
> > On Tue, Aug 16, 2022 at 12:30 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> While in some cases deriving an AT&T-style suffix from an Intel syntax
> >> memory operand size specifier is necessary, in many cases this is not
> >> only pointless, but has led to the introduction of various workarounds:
> >> Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
> >> ToQword attributes. Suppress suffix derivation when we can clearly tell
> >> that the memory operand's size isn't going to be needed to infer the
> >> possible need for the low byte/word opcode bit or an operand size prefix
> >> (0x66 or REX.W).
> >>
> >> As a result ToDword and ToQword can be dropped entirely, plus a fair
> >> number of IgnoreSize and NoRex64 can also be got rid of. Note that
> >> IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
> >> operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
> >> 16-bit code using SIMD insns isn't well tested, clone an existing
> >> testcase just enough to cover a few insns which are potentially
> >> problematic but are being touched here.)
> >>
> >> As a side effect of folding the VCVT{,T}S{S,D,H}2SI templates,
> >> VCVT{,T}SH2SI will now allow L and Q suffixes, consistent with
> >> VCVT{,T}S{S,D}2SI. All of these remain inconsistent with their 2USI
> >> counterparts (which I think should also be corrected, but perhaps better
> >> in a separate change).
> >
> > I don't think allowing more unnecessary L and Q suffixes for AVX
> > instructions is desirable.   I prefer not to allow unnecessary L and
> > Q suffixes in folded entries.   We can add special entries to allow
> > the existing instructions with suffixes.
>
> I think we've been there before, and I continue to think that we should
> be consistent throughout the entire ISA in allowing suffixes when GPRs
> or their equivalent memory operands are involved. That's in the spirit
> of the original AT&T syntax intentions, after all. I have to admit that
> I find it particularly worrying that you suggest to introduce new
> templates, when the overall / long term goal is to reduce the set, to
> keep it manageable in spite of all the new additions that yer yet to
> come.
>
> As pointed out elsewhere, any inconsistencies here make it harder for
> people to write e.g. heavily macro-ized code. Similarly it can result
> in surprises when cloning existing code to deal with new extensions.
>

Will it work without unnecessary suffixes?

-- 
H.J.

* Re: [PATCH 4/7] x86: improve match_template()'s diagnostics
  2022-08-18  6:14     ` Jan Beulich
@ 2022-08-18 14:51       ` H.J. Lu
  0 siblings, 0 replies; 45+ messages in thread
From: H.J. Lu @ 2022-08-18 14:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Aug 17, 2022 at 11:14 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 17.08.2022 22:24, H.J. Lu wrote:
> > On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> Using the example of
> >>
> >>         extractps $0, %xmm0, %xmm0
> >>         insertps $0, %xmm0, %eax
> >>
> >> (both having respectively the same mistake of using the wrong kind of
> >> destination register) it is easy to see that current behavior is far
> >> from ideal: The former results in "unsupported instruction" for 32-bit
> >> code simply because the 2nd template we have is a Cpu64 one. Instead we
> >> should aim at emitting the "best" possible error, which will typically
> >> be the one where we passed the largest number of checks. Generalize the
> >> original "specific_error" approach by making it apply to the entire
> >> matching loop, utilizing that line numbers increase as we pass further
> >> checks.
> >> ---
> >> As to the inval-tls testcase: Why is KMOV special? Are e.g. VMOV or
> >> other vector insns (legacy or EVEX-encoded) any different? Shouldn't the
> >> use of the respective reloc types be limited to _exactly_ the insns they
> >> are intended to be used with? Furthermore having this check in
> >> match_template() is unhelpful, as the resulting diagnostic isn't aiding
> >> in understanding what's wrong. Template matching should be left alone
> >> here, and the issue be diagnosed later, say directly in md_assemble()
> >> (alongside the various further consistency checks there) or in
> >> process_operands().
> >
> > GCC may generate invalid TLS code sequences with KMOV, not other
> > instructions.  We want to catch them by assembler.   It is easier to disallow
> > the invalid instructions.
>
> I did actually check both the discussion and gcc code in question, and I
> was not able to prove that it could have done so only for KMOV. And yes,
> I agree with disallowing the invalid instructions. The question is why
> we do so only for a limited and inconsistent subset.

So far, the linker has only seen it with KMOV.  The linker will issue an error
if other instructions are used.

> In addition you don't say anything regarding the point in time when we
> diagnose this, the placement of which - as said - looks sub-optimal to me.
>

Improvement is welcome.

-- 
H.J.

* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
  2022-08-18  6:24     ` Jan Beulich
@ 2022-08-18 15:14       ` H.J. Lu
  2022-08-19  8:28         ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2022-08-18 15:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Aug 17, 2022 at 11:24 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 17.08.2022 22:29, H.J. Lu wrote:
> > On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> x86: re-work insn/suffix recognition
> >>
> >> Having templates with a suffix explicitly present has always been
> >> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
> >
> > I don't like the second pass.   What problem does it solve?
>
> It addresses the reasons we have various pretty odd (and confusing by
> their mere presence) insn templates which better would never have been
> there. If you have a better suggestion to eliminate those, I'm all ears.
>
> You can also easily see the issues this solves by looking at the
> testsuite changes. Among other things this once again is a matter of
> providing consistent and hence predictable behavior.

Did you mean the error reporting behavior?  I don't think we should add
a second pass just for it.

> Further this sets the stage for the subsequent two changes, which I
> don't think are easily possible without this 2nd pass.

Does it indicate that the second pass is used quite often?

> And finally you've likely spotted that this is actually a reduction in
> code size, first and foremost because the odd maybe_adjust_templates()
> can now go away. Plus I think you realize that the 2nd pass wouldn't
> be engaged in many cases - it requires a template match failure in the
> 1st pass, after all, which isn't going to happen very often.
>
> Jan



--
H.J.

* Re: [PATCH 3/7] x86: move / quiesce pre-386 non-16-bit warning
  2022-08-18  7:21     ` Jan Beulich
@ 2022-08-18 15:30       ` H.J. Lu
  2022-08-19  6:13         ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2022-08-18 15:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Thu, Aug 18, 2022 at 12:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 17.08.2022 21:21, H.J. Lu wrote:
> > On Tue, Aug 16, 2022 at 12:31 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> Emitting this warning for every insn, including ones having actual
> >> errors, is annoying. Introduce a boolean variable to emit the warning
> >> just once on the first insn after .arch may have changed the things, and
> >> move the warning to output_insn(). (I didn't want to go as far as
> >> checking whether the .arch actually turned off the i386 bit, but doing
> >> so would be an option.)
> >> ---
> >> Otoh I wonder whether switching to a pre-386 architecture shouldn't
> >> automatically move to CODE_16BIT: Our emitting operand- or address-size
> >> prefixes violates the architecture specification. Alternatively we
> >> could outright reject such .arch directives when not already in 16-bit
> >> mode.
>
> Mind me asking - no opinion here?

We shouldn't change the current behavior to avoid any surprises.

> > OK.
>
> Thanks.
>
> Jan



-- 
H.J.

* Re: [PATCH 3/7] x86: move / quiesce pre-386 non-16-bit warning
  2022-08-18 15:30       ` H.J. Lu
@ 2022-08-19  6:13         ` Jan Beulich
  2022-08-19 14:18           ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-19  6:13 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 18.08.2022 17:30, H.J. Lu wrote:
> On Thu, Aug 18, 2022 at 12:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 17.08.2022 21:21, H.J. Lu wrote:
>>> On Tue, Aug 16, 2022 at 12:31 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> Emitting this warning for every insn, including ones having actual
>>>> errors, is annoying. Introduce a boolean variable to emit the warning
>>>> just once on the first insn after .arch may have changed the things, and
>>>> move the warning to output_insn(). (I didn't want to go as far as
>>>> checking whether the .arch actually turned off the i386 bit, but doing
>>>> so would be an option.)
>>>> ---
>>>> Otoh I wonder whether switching to a pre-386 architecture shouldn't
>>>> automatically move to CODE_16BIT: Our emitting operand- or address-size
>>>> prefixes violates the architecture specification. Alternatively we
>>>> could outright reject such .arch directives when not already in 16-bit
>>>> mode.
>>
>> Mind me asking - no opinion here?
> 
> We shouldn't change the current behavior to avoid any surprises.

And continue to emit non-working code. Recall that the warning message
talks of only "addressing mode", which even I initially took to mean
what it says, and hence being entirely bogus to emit for insns without
memory operand (or anything else susceptible to address size setting).

I did actually try to find better wording for the warning, but
couldn't come up with anything halfway sensible and not overly long.

Jan
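The warn-once behaviour described in the quoted commit message can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the gas implementation: every identifier below (`handle_arch_directive`, `output_insn_sketch`, `pre_386_warned` and so on) is hypothetical, and the warning text is made up. As in the patch description, any `.arch` directive re-arms the warning without checking whether the i386 bit really changed, and only insns that reach the output stage (i.e. that assembled without error) can trigger it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical assembler state (not the real gas variables). */
static bool cpu_has_i386 = true;    /* cleared by e.g. a pre-386 .arch */
static bool code_16bit = false;     /* set by .code16 */
static bool pre_386_warned = false; /* re-armed by every .arch directive */
static int warnings_emitted = 0;

/* Called for every .arch directive. Like the patch, this does not check
   whether the directive actually turned off the i386 bit. */
static void handle_arch_directive(bool has_i386)
{
    cpu_has_i386 = has_i386;
    pre_386_warned = false;
}

/* Called only for insns that assembled without error, so erroneous insns
   never produce the warning. */
static void output_insn_sketch(const char *mnemonic)
{
    assert(mnemonic != NULL);
    if (!cpu_has_i386 && !code_16bit && !pre_386_warned) {
        /* Illustrative wording only. */
        fprintf(stderr, "warning: `%s': non-16-bit code with pre-386 .arch\n",
                mnemonic);
        pre_386_warned = true;
        warnings_emitted++;
    }
    /* ... encoding and emission of the insn would follow here ... */
}
```

With a pre-386 `.arch` in effect and no `.code16`, a run of insns yields a single warning; the next `.arch` directive re-arms it.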

* Re: [PATCH 1/7] x86/Intel: restrict suffix derivation
  2022-08-18 14:46       ` H.J. Lu
@ 2022-08-19  8:19         ` Jan Beulich
  2022-08-19 14:23           ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-19  8:19 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 18.08.2022 16:46, H.J. Lu wrote:
> On Wed, Aug 17, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 17.08.2022 21:19, H.J. Lu wrote:
>>> On Tue, Aug 16, 2022 at 12:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> While in some cases deriving an AT&T-style suffix from an Intel syntax
>>>> memory operand size specifier is necessary, in many cases this is not
>>>> only pointless, but has led to the introduction of various workarounds:
>>>> Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
>>>> ToQword attributes. Suppress suffix derivation when we can clearly tell
>>>> that the memory operand's size isn't going to be needed to infer the
>>>> possible need for the low byte/word opcode bit or an operand size prefix
>>>> (0x66 or REX.W).
>>>>
>>>> As a result ToDword and ToQword can be dropped entirely, plus a fair
>>>> number of IgnoreSize and NoRex64 can also be got rid of. Note that
>>>> IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
>>>> operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
>>>> 16-bit code using SIMD insns isn't well tested, clone an existing
>>>> testcase just enough to cover a few insns which are potentially
>>>> problematic but are being touched here.)
>>>>
>>>> As a side effect of folding the VCVT{,T}S{S,D,H}2SI templates,
>>>> VCVT{,T}SH2SI will now allow L and Q suffixes, consistent with
>>>> VCVT{,T}S{S,D}2SI. All of these remain inconsistent with their 2USI
>>>> counterparts (which I think should also be corrected, but perhaps better
>>>> in a separate change).
>>>
>>> I don't think allowing more unnecessary L and Q suffixes for AVX
>>> instructions is desirable.   I prefer not to allow unnecessary L and
>>> Q suffixes in folded entries.   We can add special entries to allow
>>> the existing instructions with suffixes.
>>
>> I think we've been there before, and I continue to think that we should
>> be consistent throughout the entire ISA in allowing suffixes when GPRs
>> or their equivalent memory operands are involved. That's in the spirit
>> of the original AT&T syntax intentions, after all. I have to admit that
>> I find it particularly worrying that you suggest to introduce new
>> templates, when the overall / long term goal is to reduce the set, to
>> keep it manageable in spite of all the new additions that are yet to
>> come.
>>
>> As pointed out elsewhere, any inconsistencies here make it harder for
>> people to write e.g. heavily macro-ized code. Similarly it can result
>> in surprises when cloning existing code to deal with new extensions.
>>
> 
> Will it work without unnecessary suffixes?

I'm afraid I can only guess at what "it" means in your reply. Of course
things will work for people who have never used what you call
"unnecessary" suffixes. But there are other people who believe that the
spirit of AT&T syntax is to put suffixes everywhere where multiple
operand sizes are possible, and where the suffix allows to distinguish
them. One possible reason for that could be to have the re-assurance of
the assembler pointing out mismatches between suffix and operand(s).

Jan
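The "re-assurance" mentioned above amounts to a consistency check between an explicit suffix and the size of the GPR (or equivalent memory) operand. A minimal C sketch, assuming a simplified model in which each suffix letter maps to exactly one integer operand size; the helper names are hypothetical, and x87 suffix semantics (where `l` and `s` denote floating-point sizes) are deliberately ignored:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: operand size in bits implied by an AT&T suffix,
   or 0 if the character is not a size suffix. */
static int suffix_to_bits(char suffix)
{
    switch (suffix) {
    case 'b': return 8;
    case 'w': return 16;
    case 'l': return 32;
    case 'q': return 64;
    default:  return 0;
    }
}

/* No suffix ('\0') is always accepted, since suffixes are optional.
   An explicit suffix must agree with the GPR operand size. */
static bool suffix_matches_gpr(char suffix, int gpr_bits)
{
    assert(gpr_bits > 0);
    if (suffix == '\0')
        return true;
    int bits = suffix_to_bits(suffix);
    return bits != 0 && bits == gpr_bits;
}
```

Omitting the suffix always passes, which is in line with the point that suffixes would only be permitted, never required; an explicit `q` paired with a 32-bit register is what the assembler would flag.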

* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
  2022-08-18 15:14       ` H.J. Lu
@ 2022-08-19  8:28         ` Jan Beulich
  2022-08-23  2:00           ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-19  8:28 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 18.08.2022 17:14, H.J. Lu wrote:
> On Wed, Aug 17, 2022 at 11:24 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 17.08.2022 22:29, H.J. Lu wrote:
>>> On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> x86: re-work insn/suffix recognition
>>>>
>>>> Having templates with a suffix explicitly present has always been
>>>> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
>>>
>>> I don't like the second pass.   What problem does it solve?
>>
>> It addresses the reasons we have various pretty odd (and confusing by
>> their mere presence) insn templates which better would never have been
>> there. If you have a better suggestion to eliminate those, I'm all ears.
>>
>> You can also easily see the issues this solves by looking at the
>> testsuite changes. Among other things this once again is a matter of
>> providing consistent and hence predictable behavior.
> 
> Did you mean the error reporting behavior?  I don't think we should add
> a second pass just for it.

No. Certain insns simply were not accepted previously (this is actually
what finally made me think of a solution here; prior observations
weren't severe enough to try to get past your possible opposition which
was to be expected based on past discussions). And certain other ones
were wrongly accepted.

>> Further this sets the stage for the subsequent two changes, which I
>> don't think are easily possible without this 2nd pass.
> 
> Does it indicate that the second pass is used quite often?

No, what I did say ...

>> And finally you've likely spotted that this is actually a reduction in
>> code size, first and foremost because the odd maybe_adjust_templates()
>> can now go away. Plus I think you realize that the 2nd pass wouldn't
>> be engaged in many cases - it requires a template match failure in the
>> 1st pass, after all, which isn't going to happen very often.

... here will continue to be the case with those later changes.

Jan
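A heavily simplified C sketch of the two-pass idea discussed here, under these assumptions: templates are stored under their suffix-less names, a "match" is reduced to a name lookup (real operand matching is omitted), and the table contents and function names are hypothetical. The second pass only runs when the first one fails, which is why it is not expected to be engaged very often.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified template: a suffix-less base mnemonic
   plus the size-suffix letters it permits. */
struct insn_template {
    const char *name;
    const char *suffixes;
};

static const struct insn_template templates[] = {
    { "add",      "bwlq" },
    { "cvtss2si", "lq"   },
    { "cpuid",    ""     },
};

static const struct insn_template *lookup(const char *name)
{
    for (size_t i = 0; i < sizeof templates / sizeof templates[0]; i++)
        if (strcmp(templates[i].name, name) == 0)
            return &templates[i];
    return NULL;
}

/* Returns the matching template (NULL if neither pass succeeds) and stores
   the recognised suffix, or '\0' for none, in *suffix_out. */
static const struct insn_template *match_insn(const char *mnemonic,
                                              char *suffix_out)
{
    char base[32];
    const struct insn_template *t;

    assert(mnemonic != NULL && suffix_out != NULL);
    size_t len = strlen(mnemonic);

    /* Pass 1: the mnemonic exactly as written. */
    t = lookup(mnemonic);
    if (t != NULL) {
        *suffix_out = '\0';
        return t;
    }

    /* Pass 2, reached only after pass 1 failed: treat the last character
       as a size suffix, strip it, and retry with the base mnemonic. */
    if (len < 2 || len > sizeof base)
        return NULL;
    memcpy(base, mnemonic, len - 1);
    base[len - 1] = '\0';

    t = lookup(base);
    if (t != NULL && strchr(t->suffixes, mnemonic[len - 1]) != NULL) {
        *suffix_out = mnemonic[len - 1];
        return t;
    }
    return NULL;
}
```

In this model `addl` fails the first pass and is then matched as `add` with suffix `l`, while `cpuidl` is rejected because that template permits no suffix.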

* Re: [PATCH 3/7] x86: move / quiesce pre-386 non-16-bit warning
  2022-08-19  6:13         ` Jan Beulich
@ 2022-08-19 14:18           ` H.J. Lu
  0 siblings, 0 replies; 45+ messages in thread
From: H.J. Lu @ 2022-08-19 14:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Thu, Aug 18, 2022 at 11:13 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 18.08.2022 17:30, H.J. Lu wrote:
> > On Thu, Aug 18, 2022 at 12:21 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 17.08.2022 21:21, H.J. Lu wrote:
> >>> On Tue, Aug 16, 2022 at 12:31 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> Emitting this warning for every insn, including ones having actual
> >>>> errors, is annoying. Introduce a boolean variable to emit the warning
> >>>> just once on the first insn after .arch may have changed things, and
> >>>> move the warning to output_insn(). (I didn't want to go as far as
> >>>> checking whether the .arch actually turned off the i386 bit, but doing
> >>>> so would be an option.)
> >>>> ---
> >>>> Otoh I wonder whether switching to a pre-386 architecture shouldn't
> >>>> automatically move to CODE_16BIT: Us emitting operand- or address-size
> >>>> prefixes violates the architecture specification. Alternatively we
> >>>> could outright reject such .arch directives when not already in 16-bit
> >>>> mode.
> >>
> >> Mind me asking - no opinion here?
> >
> > We shouldn't change the current behavior to avoid any surprises.
>
> And continue to emit non-working code. Recall that the warning message
> talks of only "addressing mode", which even I initially took to mean
> what it says, and hence being entirely bogus to emit for insns without
> memory operand (or anything else susceptible to address size setting).

If it is a real error, we should issue an error.

> I did actually try to find better wording for the warning, but
> couldn't come up with anything halfway sensible and not overly long.
>
> Jan



-- 
H.J.

* Re: [PATCH 1/7] x86/Intel: restrict suffix derivation
  2022-08-19  8:19         ` Jan Beulich
@ 2022-08-19 14:23           ` H.J. Lu
  2022-08-19 14:49             ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2022-08-19 14:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Fri, Aug 19, 2022 at 1:20 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 18.08.2022 16:46, H.J. Lu wrote:
> > On Wed, Aug 17, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 17.08.2022 21:19, H.J. Lu wrote:
> >>> On Tue, Aug 16, 2022 at 12:30 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> While in some cases deriving an AT&T-style suffix from an Intel syntax
> >>>> memory operand size specifier is necessary, in many cases this is not
> >>>> only pointless, but has led to the introduction of various workarounds:
> >>>> Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
> >>>> ToQword attributes. Suppress suffix derivation when we can clearly tell
> >>>> that the memory operand's size isn't going to be needed to infer the
> >>>> possible need for the low byte/word opcode bit or an operand size prefix
> >>>> (0x66 or REX.W).
> >>>>
> >>>> As a result ToDword and ToQword can be dropped entirely, plus a fair
> >>>> number of IgnoreSize and NoRex64 can also be got rid of. Note that
> >>>> IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
> >>>> operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
> >>>> 16-bit code using SIMD insns isn't well tested, clone an existing
> >>>> testcase just enough to cover a few insns which are potentially
> >>>> problematic but are being touched here.)
> >>>>
> >>>> As a side effect of folding the VCVT{,T}S{S,D,H}2SI templates,
> >>>> VCVT{,T}SH2SI will now allow L and Q suffixes, consistent with
> >>>> VCVT{,T}S{S,D}2SI. All of these remain inconsistent with their 2USI
> >>>> counterparts (which I think should also be corrected, but perhaps better
> >>>> in a separate change).
> >>>
> >>> I don't think allowing more unnecessary L and Q suffixes for AVX
> >>> instructions is desirable.   I prefer not to allow unnecessary L and
> >>> Q suffixes in folded entries.   We can add special entries to allow
> >>> the existing instructions with suffixes.
> >>
> >> I think we've been there before, and I continue to think that we should
> >> be consistent throughout the entire ISA in allowing suffixes when GPRs
> >> or their equivalent memory operands are involved. That's in the spirit
> >> of the original AT&T syntax intentions, after all. I have to admit that
> >> I find it particularly worrying that you suggest to introduce new
> >> templates, when the overall / long term goal is to reduce the set, to
> >> keep it manageable in spite of all the new additions that are yet to
> >> come.
> >>
> >> As pointed out elsewhere, any inconsistencies here make it harder for
> >> people to write e.g. heavily macro-ized code. Similarly it can result
> >> in surprises when cloning existing code to deal with new extensions.
> >>
> >
> > Will it work without unnecessary suffixes?
>
> I'm afraid I can only guess at what "it" means in your reply. Of course
> things will work for people who have never used what you call
> "unnecessary" suffixes. But there are other people who believe that the
> spirit of AT&T syntax is to put suffixes everywhere where multiple
> operand sizes are possible, and where the suffix allows to distinguish

In glibc, integer instructions without suffixes are used to support different
vector sizes.

> them. One possible reason for that could be to have the re-assurance of
> the assembler pointing out mismatches between suffix and operand(s).



-- 
H.J.

* Re: [PATCH 1/7] x86/Intel: restrict suffix derivation
  2022-08-19 14:23           ` H.J. Lu
@ 2022-08-19 14:49             ` Jan Beulich
  2022-08-19 17:00               ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-19 14:49 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 19.08.2022 16:23, H.J. Lu wrote:
> On Fri, Aug 19, 2022 at 1:20 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 18.08.2022 16:46, H.J. Lu wrote:
>>> On Wed, Aug 17, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 17.08.2022 21:19, H.J. Lu wrote:
>>>>> On Tue, Aug 16, 2022 at 12:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> While in some cases deriving an AT&T-style suffix from an Intel syntax
>>>>>> memory operand size specifier is necessary, in many cases this is not
>>>>>> only pointless, but has led to the introduction of various workarounds:
>>>>>> Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
>>>>>> ToQword attributes. Suppress suffix derivation when we can clearly tell
>>>>>> that the memory operand's size isn't going to be needed to infer the
>>>>>> possible need for the low byte/word opcode bit or an operand size prefix
>>>>>> (0x66 or REX.W).
>>>>>>
>>>>>> As a result ToDword and ToQword can be dropped entirely, plus a fair
>>>>>> number of IgnoreSize and NoRex64 can also be got rid of. Note that
>>>>>> IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
>>>>>> operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
>>>>>> 16-bit code using SIMD insns isn't well tested, clone an existing
>>>>>> testcase just enough to cover a few insns which are potentially
>>>>>> problematic but are being touched here.)
>>>>>>
>>>>>> As a side effect of folding the VCVT{,T}S{S,D,H}2SI templates,
>>>>>> VCVT{,T}SH2SI will now allow L and Q suffixes, consistent with
>>>>>> VCVT{,T}S{S,D}2SI. All of these remain inconsistent with their 2USI
>>>>>> counterparts (which I think should also be corrected, but perhaps better
>>>>>> in a separate change).
>>>>>
>>>>> I don't think allowing more unnecessary L and Q suffixes for AVX
>>>>> instructions is desirable.   I prefer not to allow unnecessary L and
>>>>> Q suffixes in folded entries.   We can add special entries to allow
>>>>> the existing instructions with suffixes.
>>>>
>>>> I think we've been there before, and I continue to think that we should
>>>> be consistent throughout the entire ISA in allowing suffixes when GPRs
>>>> or their equivalent memory operands are involved. That's in the spirit
>>>> of the original AT&T syntax intentions, after all. I have to admit that
>>>> I find it particularly worrying that you suggest to introduce new
>>>> templates, when the overall / long term goal is to reduce the set, to
>>>> keep it manageable in spite of all the new additions that are yet to
>>>> come.
>>>>
>>>> As pointed out elsewhere, any inconsistencies here make it harder for
>>>> people to write e.g. heavily macro-ized code. Similarly it can result
>>>> in surprises when cloning existing code to deal with new extensions.
>>>>
>>>
>>> Will it work without unnecessary suffixes?
>>
>> I'm afraid I can only guess at what "it" means in your reply. Of course
>> things will work for people who have never used what you call
>> "unnecessary" suffixes. But there are other people who believe that the
>> spirit of AT&T syntax is to put suffixes everywhere where multiple
>> operand sizes are possible, and where the suffix allows to distinguish
> 
> In glibc, integer instructions without suffixes are used to support different
> vector sizes.

1) Could you please point me at an example?

2) How is this related? We wouldn't require suffixes all of a sudden,
we'd only permit their use.

Jan

* Re: [PATCH 1/7] x86/Intel: restrict suffix derivation
  2022-08-19 14:49             ` Jan Beulich
@ 2022-08-19 17:00               ` H.J. Lu
  2022-08-22  9:34                 ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2022-08-19 17:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Fri, Aug 19, 2022 at 7:49 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 19.08.2022 16:23, H.J. Lu wrote:
> > On Fri, Aug 19, 2022 at 1:20 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 18.08.2022 16:46, H.J. Lu wrote:
> >>> On Wed, Aug 17, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 17.08.2022 21:19, H.J. Lu wrote:
> >>>>> On Tue, Aug 16, 2022 at 12:30 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> While in some cases deriving an AT&T-style suffix from an Intel syntax
> >>>>>> memory operand size specifier is necessary, in many cases this is not
> >>>>>> only pointless, but has led to the introduction of various workarounds:
> >>>>>> Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
> >>>>>> ToQword attributes. Suppress suffix derivation when we can clearly tell
> >>>>>> that the memory operand's size isn't going to be needed to infer the
> >>>>>> possible need for the low byte/word opcode bit or an operand size prefix
> >>>>>> (0x66 or REX.W).
> >>>>>>
> >>>>>> As a result ToDword and ToQword can be dropped entirely, plus a fair
> >>>>>> number of IgnoreSize and NoRex64 can also be got rid of. Note that
> >>>>>> IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
> >>>>>> operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
> >>>>>> 16-bit code using SIMD insns isn't well tested, clone an existing
> >>>>>> testcase just enough to cover a few insns which are potentially
> >>>>>> problematic but are being touched here.)
> >>>>>>
> >>>>>> As a side effect of folding the VCVT{,T}S{S,D,H}2SI templates,
> >>>>>> VCVT{,T}SH2SI will now allow L and Q suffixes, consistent with
> >>>>>> VCVT{,T}S{S,D}2SI. All of these remain inconsistent with their 2USI
> >>>>>> counterparts (which I think should also be corrected, but perhaps better
> >>>>>> in a separate change).
> >>>>>
> >>>>> I don't think allowing more unnecessary L and Q suffixes for AVX
> >>>>> instructions is desirable.   I prefer not to allow unnecessary L and
> >>>>> Q suffixes in folded entries.   We can add special entries to allow
> >>>>> the existing instructions with suffixes.
> >>>>
> >>>> I think we've been there before, and I continue to think that we should
> >>>> be consistent throughout the entire ISA in allowing suffixes when GPRs
> >>>> or their equivalent memory operands are involved. That's in the spirit
> >>>> of the original AT&T syntax intentions, after all. I have to admit that
> >>>> I find it particularly worrying that you suggest to introduce new
> >>>> templates, when the overall / long term goal is to reduce the set, to
> >>>> keep it manageable in spite of all the new additions that are yet to
> >>>> come.
> >>>>
> >>>> As pointed out elsewhere, any inconsistencies here make it harder for
> >>>> people to write e.g. heavily macro-ized code. Similarly it can result
> >>>> in surprises when cloning existing code to deal with new extensions.
> >>>>
> >>>
> >>> Will it work without unnecessary suffixes?
> >>
> >> I'm afraid I can only guess at what "it" means in your reply. Of course
> >> things will work for people who have never used what you call
> >> "unnecessary" suffixes. But there are other people who believe that the
> >> spirit of AT&T syntax is to put suffixes everywhere where multiple
> >> operand sizes are possible, and where the suffix allows to distinguish
> >
> > In glibc, integer instructions without suffixes are used to support different
> > vector sizes.

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/strlen-evex-base.S

> 1) Could you please point me at an example?
>
> 2) How is this related? We wouldn't require suffixes all of a sudden,
> we'd only permit their use.

We shouldn't add suffixes when they aren't needed.   Suffixes aren't
required.

-- 
H.J.

* Re: [PATCH 1/7] x86/Intel: restrict suffix derivation
  2022-08-19 17:00               ` H.J. Lu
@ 2022-08-22  9:34                 ` Jan Beulich
  2022-08-22 14:38                   ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2022-08-22  9:34 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 19.08.2022 19:00, H.J. Lu wrote:
> On Fri, Aug 19, 2022 at 7:49 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 19.08.2022 16:23, H.J. Lu wrote:
>>> On Fri, Aug 19, 2022 at 1:20 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 18.08.2022 16:46, H.J. Lu wrote:
>>>>> On Wed, Aug 17, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 17.08.2022 21:19, H.J. Lu wrote:
>>>>>>> On Tue, Aug 16, 2022 at 12:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> While in some cases deriving an AT&T-style suffix from an Intel syntax
>>>>>>>> memory operand size specifier is necessary, in many cases this is not
>>>>>>>> only pointless, but has led to the introduction of various workarounds:
>>>>>>>> Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
>>>>>>>> ToQword attributes. Suppress suffix derivation when we can clearly tell
>>>>>>>> that the memory operand's size isn't going to be needed to infer the
>>>>>>>> possible need for the low byte/word opcode bit or an operand size prefix
>>>>>>>> (0x66 or REX.W).
>>>>>>>>
>>>>>>>> As a result ToDword and ToQword can be dropped entirely, plus a fair
>>>>>>>> number of IgnoreSize and NoRex64 can also be got rid of. Note that
>>>>>>>> IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
>>>>>>>> operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
>>>>>>>> 16-bit code using SIMD insns isn't well tested, clone an existing
>>>>>>>> testcase just enough to cover a few insns which are potentially
>>>>>>>> problematic but are being touched here.)
>>>>>>>>
>>>>>>>> As a side effect of folding the VCVT{,T}S{S,D,H}2SI templates,
>>>>>>>> VCVT{,T}SH2SI will now allow L and Q suffixes, consistent with
>>>>>>>> VCVT{,T}S{S,D}2SI. All of these remain inconsistent with their 2USI
>>>>>>>> counterparts (which I think should also be corrected, but perhaps better
>>>>>>>> in a separate change).
>>>>>>>
>>>>>>> I don't think allowing more unnecessary L and Q suffixes for AVX
>>>>>>> instructions is desirable.   I prefer not to allow unnecessary L and
>>>>>>> Q suffixes in folded entries.   We can add special entries to allow
>>>>>>> the existing instructions with suffixes.
>>>>>>
>>>>>> I think we've been there before, and I continue to think that we should
>>>>>> be consistent throughout the entire ISA in allowing suffixes when GPRs
>>>>>> or their equivalent memory operands are involved. That's in the spirit
>>>>>> of the original AT&T syntax intentions, after all. I have to admit that
>>>>>> I find it particularly worrying that you suggest to introduce new
>>>>>> templates, when the overall / long term goal is to reduce the set, to
>>>>>> keep it manageable in spite of all the new additions that are yet to
>>>>>> come.
>>>>>>
>>>>>> As pointed out elsewhere, any inconsistencies here make it harder for
>>>>>> people to write e.g. heavily macro-ized code. Similarly it can result
>>>>>> in surprises when cloning existing code to deal with new extensions.
>>>>>>
>>>>>
>>>>> Will it work without unnecessary suffixes?
>>>>
>>>> I'm afraid I can only guess at what "it" means in your reply. Of course
>>>> things will work for people who have never used what you call
>>>> "unnecessary" prefixes. But there are other people who believe that the
>>>> spirit of AT&T syntax is to put suffixes everywhere where multiple
>>>> operand sizes are possible, and where the suffix allows to distinguish
>>>
>>> In glibc, integer instructions without suffixes are used to support different
>>> vector sizes.
> 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/strlen-evex-base.S

I'm afraid I don't see how this is related to the topic. Yes, that's one
way to do such programming. But it doesn't mean others shouldn't be
allowed to do things differently, to their liking. Plus - integer
instructions aren't relevant here at all; we permit suffixes for all of
them anyway. Vector / scalar instructions are what matters, and I see
they actually abstract kmov{q,d} via a KMOV pre-processor macro, for
example. (Not relevant here: I actually view it as a shortcoming of
the assembler that they need to do that, rather than us allowing use of
"kmov" without the Intel-mandated suffix. Obviously this would extend
to other insns as well.)

>> 1) Could you please point me at an example?
>>
>> 2) How is this related? We wouldn't require suffixes all of a sudden,
>> we'd only permit their use.
> 
> We shouldn't add suffixes when they aren't needed.   Suffixes aren't
> required.

You're re-stating what I've said. What you seem unwilling to
accept is that we ought to _allow_ use of suffixes in _all_ places
where they might matter, _irrespective_ of them being required.

Jan
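The suffix derivation that the quoted PATCH 1/7 commit message restricts can be pictured as follows. This C sketch is hypothetical throughout: the mapping covers only the four GPR-sized Intel size specifiers, and the two per-template flags stand in for whatever the real code uses to tell whether the memory operand's size could select the low byte/word opcode bit or require an operand size prefix (0x66 or REX.W). When neither can apply, no suffix is derived.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mapping from an Intel-syntax memory size specifier to the
   AT&T-style suffix it implies; '\0' if there is no such suffix. */
static char intel_ptr_to_suffix(const char *spec)
{
    if (strcmp(spec, "byte") == 0)  return 'b';
    if (strcmp(spec, "word") == 0)  return 'w';
    if (strcmp(spec, "dword") == 0) return 'l';
    if (strcmp(spec, "qword") == 0) return 'q';
    return '\0';
}

/* Hypothetical per-template properties relevant to the decision. */
struct tmpl_info {
    bool has_w_bit;          /* low opcode bit selects byte vs. word/dword */
    bool size_needs_prefix;  /* operand size may need 0x66 or REX.W */
};

/* Derive a suffix only when the memory operand's size can influence the
   encoding; otherwise leave the suffix unset. */
static char derive_suffix(const char *spec, const struct tmpl_info *t)
{
    assert(spec != NULL && t != NULL);
    if (!t->has_w_bit && !t->size_needs_prefix)
        return '\0';
    return intel_ptr_to_suffix(spec);
}
```

Skipping derivation in the second case is what makes workarounds such as IgnoreSize or NoRex64 unnecessary in this model: no suffix is produced, so nothing needs suppressing later.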

* Re: [PATCH 1/7] x86/Intel: restrict suffix derivation
  2022-08-22  9:34                 ` Jan Beulich
@ 2022-08-22 14:38                   ` H.J. Lu
  0 siblings, 0 replies; 45+ messages in thread
From: H.J. Lu @ 2022-08-22 14:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Mon, Aug 22, 2022 at 2:34 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 19.08.2022 19:00, H.J. Lu wrote:
> > On Fri, Aug 19, 2022 at 7:49 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 19.08.2022 16:23, H.J. Lu wrote:
> >>> On Fri, Aug 19, 2022 at 1:20 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 18.08.2022 16:46, H.J. Lu wrote:
> >>>>> On Wed, Aug 17, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> On 17.08.2022 21:19, H.J. Lu wrote:
> >>>>>>> On Tue, Aug 16, 2022 at 12:30 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>
> >>>>>>>> While in some cases deriving an AT&T-style suffix from an Intel syntax
> >>>>>>>> memory operand size specifier is necessary, in many cases this is not
> >>>>>>>> only pointless, but has led to the introduction of various workarounds:
> >>>>>>>> Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
> >>>>>>>> ToQword attributes. Suppress suffix derivation when we can clearly tell
> >>>>>>>> that the memory operand's size isn't going to be needed to infer the
> >>>>>>>> possible need for the low byte/word opcode bit or an operand size prefix
> >>>>>>>> (0x66 or REX.W).
> >>>>>>>>
> >>>>>>>> As a result ToDword and ToQword can be dropped entirely, plus a fair
> >>>>>>>> number of IgnoreSize and NoRex64 can also be got rid of. Note that
> >>>>>>>> IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
> >>>>>>>> operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
> >>>>>>>> 16-bit code using SIMD insns isn't well tested, clone an existing
> >>>>>>>> testcase just enough to cover a few insns which are potentially
> >>>>>>>> problematic but are being touched here.)
> >>>>>>>>
> >>>>>>>> As a side effect of folding the VCVT{,T}S{S,D,H}2SI templates,
> >>>>>>>> VCVT{,T}SH2SI will now allow L and Q suffixes, consistent with
> >>>>>>>> VCVT{,T}S{S,D}2SI. All of these remain inconsistent with their 2USI
> >>>>>>>> counterparts (which I think should also be corrected, but perhaps better
> >>>>>>>> in a separate change).
> >>>>>>>
> >>>>>>> I don't think allowing more unnecessary L and Q suffixes for AVX
> >>>>>>> instructions is desirable.   I prefer not to allow unnecessary L and
> >>>>>>> Q suffixes in folded entries.   We can add special entries to allow
> >>>>>>> the existing instructions with suffixes.
> >>>>>>
> >>>>>> I think we've been there before, and I continue to think that we should
> >>>>>> be consistent throughout the entire ISA in allowing suffixes when GPRs
> >>>>>> or their equivalent memory operands are involved. That's in the spirit
> >>>>>> of the original AT&T syntax intentions, after all. I have to admit that
> >>>>>> I find it particularly worrying that you suggest to introduce new
> >>>>>> templates, when the overall / long term goal is to reduce the set, to
> >>>>>> keep it manageable in spite of all the new additions that are yet to
> >>>>>> come.
> >>>>>>
> >>>>>> As pointed out elsewhere, any inconsistencies here make it harder for
> >>>>>> people to write e.g. heavily macro-ized code. Similarly it can result
> >>>>>> in surprises when cloning existing code to deal with new extensions.
> >>>>>>
> >>>>>
> >>>>> Will it work without unnecessary suffixes?
> >>>>
> >>>> I'm afraid I can only guess at what "it" means in your reply. Of course
> >>>> things will work for people who have never used what you call
> >>>> "unnecessary" suffixes. But there are other people who believe that the
> >>>> spirit of AT&T syntax is to put suffixes everywhere where multiple
> >>>> operand sizes are possible, and where the suffix allows to distinguish
> >>>
> >>> In glibc, integer instructions without suffixes are used to support different
> >>> vector sizes.
> >
> > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/strlen-evex-base.S
>
> I'm afraid I don't see how this is related to the topic. Yes, that's one
> way to do such programming. But it doesn't mean others shouldn't be
> allowed to do things differently, to their liking. Plus - integer
> instructions aren't relevant here at all; we permit suffixes for all of
> them anyway. Vector / scalar instructions are what matters, and I see
> they actually abstract kmov{q,d} via a KMOV pre-processor macro, for
> example. (Not relevant here: I actually view it as a shortcoming of
> the assembler that they need to do that, rather than us allowing use of
> "kmov" without the Intel-mandated suffix. Obviously this would extend
> to other insns as well.)
>
> >> 1) Could you please point me at an example?
> >>
> >> 2) How is this related? We wouldn't require suffixes all of the sudden,
> >> we'd only permit their use.
> >
> > We shouldn't add suffixes when they aren't needed.   Suffixes aren't
> > required.
>
> You're re-stating what I've said. What you look to not be willing to
> accept is that we ought to _allow_ use of suffixes in _all_ places
> where they might matter, _irrespective_ of them being required.
>

That is correct.

-- 
H.J.


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: H.J. Lu @ 2022-08-23  2:00 UTC
  To: Jan Beulich; +Cc: Binutils

On Fri, Aug 19, 2022 at 1:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 18.08.2022 17:14, H.J. Lu wrote:
> > On Wed, Aug 17, 2022 at 11:24 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 17.08.2022 22:29, H.J. Lu wrote:
> >>> On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> x86: re-work insn/suffix recognition
> >>>>
> >>>> Having templates with a suffix explicitly present has always been
> >>>> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
> >>>
> >>> I don't like the second pass.   What problem does it solve?
> >>
> >> It addresses the reasons we have various pretty odd (and confusing by
> >> their mere presence) insn templates which better would never have been
> >> there. If you have a better suggestion to eliminate those, I'm all ears.
> >>
> >> You can also easily see the issues this solves by looking at the
> >> testsuite changes. Among other things this once again is a matter of
> >> providing consistent and hence predictable behavior.
> >
> > Did you mean the error reporting behavior?  I don't think we should add
> > a second pass just for it.
>
> No. Certain insns simply were not accepted previously (this is actually
> what finally made me think of a solution here; prior observations
> weren't severe enough to try to get past your possible opposition which
> was to be expected based on past discussions). And certain other ones
> were wrongly accepted.

Please open bug reports for these cases.

> >> Further this sets the stage for the subsequent two changes, which I
> >> don't think are easily possible without this 2nd pass.
> >
> > Does it indicate that the second pass is used quite often?
>
> No, what I did say ...
>
> >> And finally you've likely spotted that this is actually a reduction in
> >> code size, first and foremost because the odd maybe_adjust_templates()
> >> can now go away. Plus I think you realize that the 2nd pass wouldn't
> >> be engaged in many cases - it requires a template match failure in the
> >> 1st pass, after all, which isn't going to happen very often.

There is a fixed cost to prepare for the second pass.

> ... here will continue to be the case with those later changes.
>
> Jan



-- 
H.J.


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: Jan Beulich @ 2022-08-26  9:26 UTC
  To: H.J. Lu; +Cc: Binutils

On 23.08.2022 04:00, H.J. Lu wrote:
> On Fri, Aug 19, 2022 at 1:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 18.08.2022 17:14, H.J. Lu wrote:
>>> On Wed, Aug 17, 2022 at 11:24 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 17.08.2022 22:29, H.J. Lu wrote:
>>>>> On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> x86: re-work insn/suffix recognition
>>>>>>
>>>>>> Having templates with a suffix explicitly present has always been
>>>>>> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
>>>>>
>>>>> I don't like the second pass.   What problem does it solve?
>>>>
>>>> It addresses the reasons we have various pretty odd (and confusing by
>>>> their mere presence) insn templates which better would never have been
>>>> there. If you have a better suggestion to eliminate those, I'm all ears.
>>>>
>>>> You can also easily see the issues this solves by looking at the
>>>> testsuite changes. Among other things this once again is a matter of
>>>> providing consistent and hence predictable behavior.
>>>
>>> Did you mean the error reporting behavior?  I don't think we should add
>>> a second pass just for it.
>>
>> No. Certain insns simply were not accepted previously (this is actually
>> what finally made me think of a solution here; prior observations
>> weren't severe enough to try to get past your possible opposition which
>> was to be expected based on past discussions). And certain other ones
>> were wrongly accepted.
> 
> Please open bug reports for these cases.

PR gas/29524
PR gas/29525
PR gas/29526

But really - what's the point of making me waste time on creating bug
reports when fixes are already available?

In any event, I'll be shortly submitting v2 of the series addressing
these.

>>>> Further this sets the stage for the subsequent two changes, which I
>>>> don't think are easily possible without this 2nd pass.
>>>
>>> Does it indicate that the second pass is used quite often?
>>
>> No, what I did say ...
>>
>>>> And finally you've likely spotted that this is actually a reduction in
>>>> code size, first and foremost because the odd maybe_adjust_templates()
>>>> can now go away. Plus I think you realize that the 2nd pass wouldn't
>>>> be engaged in many cases - it requires a template match failure in the
>>>> 1st pass, after all, which isn't going to happen very often.
> 
> There is a fixed cost to prepare for the second pass.

A very limited one (I suppose there are enough other things which are
more of an overhead). Plus part of the preparation happens only when
the first pass didn't find any match.

Jan
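To illustrate the cost argument above, here is a hypothetical sketch, not the actual gas code, of a two-pass lookup in which the second pass, and everything it needs, runs only after the first pass has failed. Matching is reduced to a mnemonic lookup here; in gas a 1st-pass failure would mean that no template matched the operands. The `struct tmpl` table, `lookup()` and `match_mnemonic()` are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified template: a base mnemonic plus the
   AT&T suffix letters it accepts.  Not gas's insn_template.  */
struct tmpl
{
  const char *name;
  const char *suffixes;   /* e.g. "bwlq"; "" if no suffix is allowed */
};

static const struct tmpl table[] =
{
  { "add",   "bwlq" },
  { "kmovd", "" },        /* the 'd' belongs to the Intel-defined name */
};

/* Exact lookup of the first LEN characters of NAME; no copy is made.  */
static const struct tmpl *
lookup (const char *name, size_t len)
{
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strlen (table[i].name) == len
        && memcmp (table[i].name, name, len) == 0)
      return &table[i];
  return NULL;
}

/* Pass 1 matches MNEM as written.  Only if that fails does pass 2 treat
   the last character as a suffix and retry on the remainder.  *SUFFIX
   receives the split-off suffix letter, or '\0' if there was none.  */
static const struct tmpl *
match_mnemonic (const char *mnem, char *suffix)
{
  size_t len = strlen (mnem);
  const struct tmpl *t = lookup (mnem, len);

  *suffix = '\0';
  if (t != NULL)
    return t;                     /* common case: pass 2 never runs */

  if (len < 2)
    return NULL;
  t = lookup (mnem, len - 1);     /* pass 2 */
  if (t != NULL && strchr (t->suffixes, mnem[len - 1]) != NULL)
    {
      *suffix = mnem[len - 1];
      return t;
    }
  return NULL;
}
```

With this shape, "add" and "kmovd" never reach pass 2, "addl" is found on the retry with suffix 'l', and "kmovdl" is rejected because that template accepts no suffix. Because lookup() compares by length, the retry needs no copy of the input; that relies on the input staying untouched, the property discussed for parse_insn() further down the thread.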


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: H.J. Lu @ 2022-08-26 18:46 UTC
  To: Jan Beulich; +Cc: Binutils

On Fri, Aug 26, 2022 at 2:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 23.08.2022 04:00, H.J. Lu wrote:
> > On Fri, Aug 19, 2022 at 1:28 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 18.08.2022 17:14, H.J. Lu wrote:
> >>> On Wed, Aug 17, 2022 at 11:24 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 17.08.2022 22:29, H.J. Lu wrote:
> >>>>> On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> x86: re-work insn/suffix recognition
> >>>>>>
> >>>>>> Having templates with a suffix explicitly present has always been
> >>>>>> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
> >>>>>
> >>>>> I don't like the second pass.   What problem does it solve?
> >>>>
> >>>> It addresses the reasons we have various pretty odd (and confusing by
> >>>> their mere presence) insn templates which better would never have been
> >>>> there. If you have a better suggestion to eliminate those, I'm all ears.
> >>>>
> >>>> You can also easily see the issues this solves by looking at the
> >>>> testsuite changes. Among other things this once again is a matter of
> >>>> providing consistent and hence predictable behavior.
> >>>
> >>> Did you mean the error reporting behavior?  I don't think we should add
> >>> a second pass just for it.
> >>
> >> No. Certain insns simply were not accepted previously (this is actually
> >> what finally made me think of a solution here; prior observations
> >> weren't severe enough to try to get past your possible opposition which
> >> was to be expected based on past discussions). And certain other ones
> >> were wrongly accepted.
> >
> > Please open bug reports for these cases.
>
> PR gas/29524
> PR gas/29525
> PR gas/29526
>
> But really - what's the point of making me waste time on creating bug
> reports when fixes are already available?

I don't see them as real issues and we shouldn't make assembler
more complex because of them.

> In any event, I'll be shortly submitting v2 of the series addressing
> these.
>
> >>>> Further this sets the stage for the subsequent two changes, which I
> >>>> don't think are easily possible without this 2nd pass.
> >>>
> >>> Does it indicate that the second pass is used quite often?
> >>
> >> No, what I did say ...
> >>
> >>>> And finally you've likely spotted that this is actually a reduction in
> >>>> code size, first and foremost because the odd maybe_adjust_templates()
> >>>> can now go away. Plus I think you realize that the 2nd pass wouldn't
> >>>> be engaged in many cases - it requires a template match failure in the
> >>>> 1st pass, after all, which isn't going to happen very often.
> >
> > There is a fixed cost to prepare for the second pass.
>
> A very limited one (I suppose there are enough other things which are
> more of an overhead). Plus part of the preparation happens only when
> the first pass didn't find any match.
>
> Jan



-- 
H.J.


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: Jan Beulich @ 2022-09-06  6:40 UTC
  To: H.J. Lu; +Cc: Binutils, Nick Clifton

On 26.08.2022 20:46, H.J. Lu wrote:
> On Fri, Aug 26, 2022 at 2:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 23.08.2022 04:00, H.J. Lu wrote:
>>> On Fri, Aug 19, 2022 at 1:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 18.08.2022 17:14, H.J. Lu wrote:
>>>>> On Wed, Aug 17, 2022 at 11:24 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 17.08.2022 22:29, H.J. Lu wrote:
>>>>>>> On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> x86: re-work insn/suffix recognition
>>>>>>>>
>>>>>>>> Having templates with a suffix explicitly present has always been
>>>>>>>> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
>>>>>>>
>>>>>>> I don't like the second pass.   What problem does it solve?
>>>>>>
>>>>>> It addresses the reasons we have various pretty odd (and confusing by
>>>>>> their mere presence) insn templates which better would never have been
>>>>>> there. If you have a better suggestion to eliminate those, I'm all ears.
>>>>>>
>>>>>> You can also easily see the issues this solves by looking at the
>>>>>> testsuite changes. Among other things this once again is a matter of
>>>>>> providing consistent and hence predictable behavior.
>>>>>
>>>>> Did you mean the error reporting behavior?  I don't think we should add
>>>>> a second pass just for it.
>>>>
>>>> No. Certain insns simply were not accepted previously (this is actually
>>>> what finally made me think of a solution here; prior observations
>>>> weren't severe enough to try to get past your possible opposition which
>>>> was to be expected based on past discussions). And certain other ones
>>>> were wrongly accepted.
>>>
>>> Please open bug reports for these cases.
>>
>> PR gas/29524
>> PR gas/29525
>> PR gas/29526
>>
>> But really - what's the point of making me waste time on creating bug
>> reports when fixes are already available?
> 
> I don't see them as real issues and we shouldn't make assembler
> more complex because of them.

I sincerely disagree. As said many times - first and foremost the assembler
should behave _consistently_. People should be able to predict behavior for
one insn by knowing what the behavior is for sufficiently similar insns,
without - as is the case twice here - having to further consider anomalies
resulting from _dissimilar_ insns.

I therefore also disagree with you having closed some of the entered bugs
as WONTFIX. I have to admit that I really wonder in how far binutils is an
open source project if (for x86) you alone take such decisions.

Jan


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: H.J. Lu @ 2022-09-06 21:53 UTC
  To: Jan Beulich; +Cc: Binutils, Nick Clifton

On Mon, Sep 5, 2022 at 11:40 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 26.08.2022 20:46, H.J. Lu wrote:
> > On Fri, Aug 26, 2022 at 2:26 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 23.08.2022 04:00, H.J. Lu wrote:
> >>> On Fri, Aug 19, 2022 at 1:28 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 18.08.2022 17:14, H.J. Lu wrote:
> >>>>> On Wed, Aug 17, 2022 at 11:24 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> On 17.08.2022 22:29, H.J. Lu wrote:
> >>>>>>> On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>
> >>>>>>>> x86: re-work insn/suffix recognition
> >>>>>>>>
> >>>>>>>> Having templates with a suffix explicitly present has always been
> >>>>>>>> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
> >>>>>>>
> >>>>>>> I don't like the second pass.   What problem does it solve?
> >>>>>>
> >>>>>> It addresses the reasons we have various pretty odd (and confusing by
> >>>>>> their mere presence) insn templates which better would never have been
> >>>>>> there. If you have a better suggestion to eliminate those, I'm all ears.
> >>>>>>
> >>>>>> You can also easily see the issues this solves by looking at the
> >>>>>> testsuite changes. Among other things this once again is a matter of
> >>>>>> providing consistent and hence predictable behavior.
> >>>>>
> >>>>> Did you mean the error reporting behavior?  I don't think we should add
> >>>>> a second pass just for it.
> >>>>
> >>>> No. Certain insns simply were not accepted previously (this is actually
> >>>> what finally made me think of a solution here; prior observations
> >>>> weren't severe enough to try to get past your possible opposition which
> >>>> was to be expected based on past discussions). And certain other ones
> >>>> were wrongly accepted.
> >>>
> >>> Please open bug reports for these cases.
> >>
> >> PR gas/29524
> >> PR gas/29525
> >> PR gas/29526
> >>
> >> But really - what's the point of making me waste time on creating bug
> >> reports when fixes are already available?
> >
> > I don't see them as real issues and we shouldn't make assembler
> > more complex because of them.
>
> I sincerely disagree. As said many times - first and foremost the assembler
> should behave _consistently_. People should be able to predict behavior for
> one insn by knowing what the behavior is for sufficiently similar insns,
> without - as is the case twice here - having to further consider anomalies
> resulting from _dissimilar_ insns.

Assembler should be consistent with x86 SDM and the existing usage.
Due to the history/nature of AT&T syntax and x86 instructions, there are
existing inconsistencies.  I don't think we should issue a warning for cmpsd.
It is inconsistent with the 'd' suffix instead of 'l'.  But it is
consistent with SDM.
What I'd like to see in mnemonics:

1. They are as close to SDM as possible.
2. Allow prefix when there is an ambiguity.
3. Intel syntax shouldn't depend on prefixes

BTW, there is one inconsistency:

https://sourceware.org/bugzilla/show_bug.cgi?id=29551

I'd like to resolve.

>
> I therefore also disagree with you having closed some of the entered bugs
> as WONTFIX. I have to admit that I really wonder in how far binutils is an
> open source project if (for x86) you alone take such decisions.
>
> Jan



-- 
H.J.


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: Jan Beulich @ 2022-09-07  7:17 UTC
  To: H.J. Lu; +Cc: Binutils, Nick Clifton

On 06.09.2022 23:53, H.J. Lu wrote:
> On Mon, Sep 5, 2022 at 11:40 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 26.08.2022 20:46, H.J. Lu wrote:
>>> On Fri, Aug 26, 2022 at 2:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 23.08.2022 04:00, H.J. Lu wrote:
>>>>> On Fri, Aug 19, 2022 at 1:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 18.08.2022 17:14, H.J. Lu wrote:
>>>>>>> On Wed, Aug 17, 2022 at 11:24 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> On 17.08.2022 22:29, H.J. Lu wrote:
>>>>>>>>> On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>
>>>>>>>>>> x86: re-work insn/suffix recognition
>>>>>>>>>>
>>>>>>>>>> Having templates with a suffix explicitly present has always been
>>>>>>>>>> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
>>>>>>>>>
>>>>>>>>> I don't like the second pass.   What problem does it solve?
>>>>>>>>
>>>>>>>> It addresses the reasons we have various pretty odd (and confusing by
>>>>>>>> their mere presence) insn templates which better would never have been
>>>>>>>> there. If you have a better suggestion to eliminate those, I'm all ears.
>>>>>>>>
>>>>>>>> You can also easily see the issues this solves by looking at the
>>>>>>>> testsuite changes. Among other things this once again is a matter of
>>>>>>>> providing consistent and hence predictable behavior.
>>>>>>>
>>>>>>> Did you mean the error reporting behavior?  I don't think we should add
>>>>>>> a second pass just for it.
>>>>>>
>>>>>> No. Certain insns simply were not accepted previously (this is actually
>>>>>> what finally made me think of a solution here; prior observations
>>>>>> weren't severe enough to try to get past your possible opposition which
>>>>>> was to be expected based on past discussions). And certain other ones
>>>>>> were wrongly accepted.
>>>>>
>>>>> Please open bug reports for these cases.
>>>>
>>>> PR gas/29524
>>>> PR gas/29525
>>>> PR gas/29526
>>>>
>>>> But really - what's the point of making me waste time on creating bug
>>>> reports when fixes are already available?
>>>
>>> I don't see them as real issues and we shouldn't make assembler
>>> more complex because of them.
>>
>> I sincerely disagree. As said many times - first and foremost the assembler
>> should behave _consistently_. People should be able to predict behavior for
>> one insn by knowing what the behavior is for sufficiently similar insns,
>> without - as is the case twice here - having to further consider anomalies
>> resulting from _dissimilar_ insns.
> 
> Assembler should be consistent with x86 SDM and the existing usage.
> Due to the history/nature of AT&T syntax and x86 instructions, there are
> existing inconsistencies.  I don't think we should issue a warning for cmpsd.
> It is inconsistent with the 'd' suffix instead of 'l'.  But it is
> consistent with SDM.
> What I'd like to see in mnemonics:
> 
> 1. They are as close to SDM as possible.

I.e. you want to add support for SCASD and alike? I doubt that was ever
intended with AT&T syntax; instead divergence from Intel documentation
was intentional from all I can tell.

> 2. Allow prefix when there is an ambiguity.
> 3. Intel syntax shouldn't depend on prefixes

DYM "suffix" instead of "prefix" in these two? Assuming so, for 2) it
ought to be "require", not "allow", and for 3) you certainly mean to
allow for exceptions where the SDM has such (e.g. IRETD) or implies
such (e.g. RETD) for there not being other ways to express operand
size (within SDM nomenclature).

As to IRETD - note how the SDM unhelpfully implies IRET to mean IRETW.
We can't follow that doc to the word because of its own inconsistencies
and/or shortcomings.

Jan


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: H.J. Lu @ 2022-09-26 23:52 UTC
  To: Jan Beulich; +Cc: Binutils, Nick Clifton

On Wed, Sep 7, 2022 at 12:17 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.09.2022 23:53, H.J. Lu wrote:
> > On Mon, Sep 5, 2022 at 11:40 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 26.08.2022 20:46, H.J. Lu wrote:
> >>> On Fri, Aug 26, 2022 at 2:26 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 23.08.2022 04:00, H.J. Lu wrote:
> >>>>> On Fri, Aug 19, 2022 at 1:28 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> On 18.08.2022 17:14, H.J. Lu wrote:
> >>>>>>> On Wed, Aug 17, 2022 at 11:24 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>
> >>>>>>>> On 17.08.2022 22:29, H.J. Lu wrote:
> >>>>>>>>> On Tue, Aug 16, 2022 at 12:32 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> x86: re-work insn/suffix recognition
> >>>>>>>>>>
> >>>>>>>>>> Having templates with a suffix explicitly present has always been
> >>>>>>>>>> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
> >>>>>>>>>
> >>>>>>>>> I don't like the second pass.   What problem does it solve?
> >>>>>>>>
> >>>>>>>> It addresses the reasons we have various pretty odd (and confusing by
> >>>>>>>> their mere presence) insn templates which better would never have been
> >>>>>>>> there. If you have a better suggestion to eliminate those, I'm all ears.
> >>>>>>>>
> >>>>>>>> You can also easily see the issues this solves by looking at the
> >>>>>>>> testsuite changes. Among other things this once again is a matter of
> >>>>>>>> providing consistent and hence predictable behavior.
> >>>>>>>
> >>>>>>> Did you mean the error reporting behavior?  I don't think we should add
> >>>>>>> a second pass just for it.
> >>>>>>
> >>>>>> No. Certain insns simply were not accepted previously (this is actually
> >>>>>> what finally made me think of a solution here; prior observations
> >>>>>> weren't severe enough to try to get past your possible opposition which
> >>>>>> was to be expected based on past discussions). And certain other ones
> >>>>>> were wrongly accepted.
> >>>>>
> >>>>> Please open bug reports for these cases.
> >>>>
> >>>> PR gas/29524
> >>>> PR gas/29525
> >>>> PR gas/29526
> >>>>
> >>>> But really - what's the point of making me waste time on creating bug
> >>>> reports when fixes are already available?
> >>>
> >>> I don't see them as real issues and we shouldn't make assembler
> >>> more complex because of them.
> >>
> >> I sincerely disagree. As said many times - first and foremost the assembler
> >> should behave _consistently_. People should be able to predict behavior for
> >> one insn by knowing what the behavior is for sufficiently similar insns,
> >> without - as is the case twice here - having to further consider anomalies
> >> resulting from _dissimilar_ insns.
> >
> > Assembler should be consistent with x86 SDM and the existing usage.
> > Due to the history/nature of AT&T syntax and x86 instructions, there are
> > existing inconsistencies.  I don't think we should issue a warning for cmpsd.
> > It is inconsistent with the 'd' suffix instead of 'l'.  But it is
> > consistent with SDM.
> > What I'd like to see in mnemonics:
> >
> > 1. They are as close to SDM as possible.
>
> I.e. you want to add support for SCASD and alike? I doubt that was ever
> intended with AT&T syntax; instead divergence from Intel documentation
> was intentional from all I can tell.
>
> > 2. Allow prefix when there is an ambiguity.
> > 3. Intel syntax shouldn't depend on prefixes
>
> DYM "suffix" instead of "prefix" in these two? Assuming so, for 2) it
> ought to be "require", not "allow", and for 3) you certainly mean to
> allow for exceptions where the SDM has such (e.g. IRETD) or implies
> such (e.g. RETD) for there not being other ways to express operand
> size (within SDM nomenclature).
>
> As to IRETD - note how the SDM unhelpfully implies IRET to mean IRETW.
> We can't follow that doc to the word because of its own inconsistencies
> and/or shortcomings.
>

Sorry for the delay.  I was on vacation.  My main concern is to call
strdup and free for each instruction.   I prefer to add new entries to
deal with rare cases instead of penalizing all instructions.

-- 
H.J.


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: Jan Beulich @ 2022-09-28 12:49 UTC
  To: H.J. Lu; +Cc: Binutils

On 27.09.2022 01:52, H.J. Lu wrote:
> Sorry for the delay.  I was on vacation.  My main concern is to call
> strdup and free for each instruction.   I prefer to add new entries to
> deal with rare cases instead of penalizing all instructions.

Hmm, I think I can take care of this concern: As it looks, at least
parse_insn() leaves the input buffer undisturbed, so minimally I ought
to be able to limit the strdup() to just a very small set of
mnemonics. I'm not sure yet if I may even be able to avoid the copying
altogether; I'll have to check quite carefully in particular
parse_operands() and the functions it calls. But perhaps relying on
this would be risky looking forward, so I guess we had better not make
assumptions here and instead flag mnemonics (in the templates) where
retrying may be necessary when no match was found during the 1st pass.

FTAOD - I take it calling free() with a NULL argument is not a concern?

I guess to prove (and going forward guarantee) the apparent behavior of
parse_insn() I'd like to constify its first parameter. This might
involve adding a cast (to drop const-ness again after the call), which
I generally would like to avoid, or some "interesting" pointer
arithmetic. If you have any opinion here up front, please let me know.

Jan
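A minimal sketch of the scheme described above, under stated assumptions: the `may_retry` flag, `assemble_one()` and the two toy passes are hypothetical and are not gas's template attributes or functions. Only flagged mnemonics pay for strdup(); for all others `copy` stays NULL, and the unconditional free (copy) is fine because ISO C defines free (NULL) as a no-op.

```c
#define _POSIX_C_SOURCE 200809L  /* strdup() is POSIX (ISO C only since C23) */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-template flag; not an actual gas opcode attribute.  */
struct flagged_tmpl
{
  const char *name;
  bool may_retry;   /* a failed 1st pass may be retried for this mnemonic */
};

/* A matching pass; it may modify LINE, as operand parsing can.  */
typedef bool (*match_pass) (char *line);

static int copies_made;   /* counts how often strdup() is paid for */

static bool
assemble_one (const struct flagged_tmpl *t, char *line,
              match_pass pass1, match_pass pass2)
{
  /* Only flagged mnemonics get a pristine copy of the input.  */
  char *copy = NULL;
  if (t->may_retry)
    {
      copy = strdup (line);
      if (copy == NULL)
        return false;
      copies_made++;
    }

  bool ok = pass1 (line);
  if (!ok && copy != NULL)
    ok = pass2 (copy);   /* retry on the untouched copy */

  free (copy);           /* free (NULL) is a no-op in ISO C */
  return ok;
}

/* Toy passes: pass 1 clobbers its buffer and fails; pass 2 only
   succeeds on an intact buffer.  */
static bool
clobber_and_fail (char *line)
{
  line[0] = '\0';
  return false;
}

static bool
needs_intact (char *line)
{
  return line[0] != '\0';
}
```

An unflagged mnemonic makes no copy and simply fails when pass 1 fails; a flagged one costs exactly one strdup() and can be retried on intact input.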


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: H.J. Lu @ 2022-09-28 19:33 UTC
  To: Jan Beulich; +Cc: Binutils

On Wed, Sep 28, 2022 at 5:49 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 27.09.2022 01:52, H.J. Lu wrote:
> > Sorry for the delay.  I was on vacation.  My main concern is to call
> > strdup and free for each instruction.   I prefer to add new entries to
> > deal with rare cases instead of penalizing all instructions.
>
> Hmm, I think I can take care of this concern: As it looks, at least
> parse_insn() leaves the input buffer undisturbed, so minimally I ought
> to be able to limit the strdup() to just a very small set of
> mnemonics. I'm not sure yet if I may even be able to avoid the copying
> altogether; I'll have to check quite carefully in particular
> parse_operands() and the functions it calls. But perhaps relying on
> this would be risky looking forward, so I guess we had better not make
> assumptions here and instead flag mnemonics (in the templates) where
> retrying may be necessary when no match was found during the 1st pass.

This sounds reasonable.

> FTAOD - I take it calling free() with a NULL argument is not a concern?

We can use free (NULL).

> I guess to prove (and going forward guarantee) the apparent behavior of
> parse_insn() I'd like to constify its first parameter. This might
> involve adding a cast (to drop const-ness again after the call), which
> I generally would like to avoid, or some "interesting" pointer
> arithmetic. If you have any opinion here up front, please let me know.

Can we avoid it by adding some new entries to the opcode table?
I don't think we need many such entries.


-- 
H.J.


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: Jan Beulich @ 2022-09-29  8:08 UTC
  To: H.J. Lu; +Cc: Binutils

On 28.09.2022 21:33, H.J. Lu wrote:
> On Wed, Sep 28, 2022 at 5:49 AM Jan Beulich <jbeulich@suse.com> wrote:
>> I guess to prove (and going forward guarantee) the apparent behavior of
>> parse_insn() I'd like to constify its first parameter. This might
>> involve adding a cast (to drop const-ness again after the call), which
>> I generally would like to avoid, or some "interesting" pointer
>> arithmetic. If you have any opinion here up front, please let me know.
> 
> Can we avoid it by adding some new entries to the opcode table?
> I don't think we need many such entries.

I'm afraid I don't see the connection between the intended constification
and what entries there are (or not) in the opcode table. I view it as a
desirable property of the function in the first place to express its
behavior (of not altering the input string) by a pointer-to-const
parameter. In fact I guess I would make such an adjustment a standalone
(prereq for the larger change) patch.

Jan
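To make the cast-versus-pointer-arithmetic question concrete, here is a hypothetical sketch; `parse_mnemonic()` and `operands_of()` are invented names, and the real parse_insn() has a different signature. The parser takes `const char *` and returns a pointer into its input; a caller holding a modifiable buffer can recover a `char *` by re-deriving it from its own buffer rather than casting away const.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mnemonic parser: copies the leading mnemonic of LINE into
   MNEM (at most SIZE-1 chars, SIZE must be >= 1) and returns a pointer
   just past it.  The const parameter documents, and lets the compiler
   enforce, that LINE itself is never written.  */
static const char *
parse_mnemonic (const char *line, char *mnem, size_t size)
{
  size_t n = 0;
  while (line[n] != '\0' && !isspace ((unsigned char) line[n]))
    n++;

  size_t copy = n < size ? n : size - 1;
  memcpy (mnem, line, copy);
  mnem[copy] = '\0';
  return line + n;
}

/* The caller owns a modifiable buffer and wants a char * to the operands.
   Re-deriving the pointer from its own buffer avoids casting away const.  */
static char *
operands_of (char *line, char *mnem, size_t size)
{
  const char *end = parse_mnemonic (line, mnem, size);
  return line + (end - line);   /* instead of (char *) end */
}
```

`line + (end - line)` is well defined because both pointers point into the same array; it yields the same address as `(char *) end` without a cast that could mask other type errors.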


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: H.J. Lu @ 2022-09-29 16:00 UTC
  To: Jan Beulich; +Cc: Binutils

On Thu, Sep 29, 2022 at 1:08 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.09.2022 21:33, H.J. Lu wrote:
> > On Wed, Sep 28, 2022 at 5:49 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> I guess to prove (and going forward guarantee) the apparent behavior of
> >> parse_insn() I'd like to constify its first parameter. This might
> >> involve adding a cast (to drop const-ness again after the call), which
> >> I generally would like to avoid, or some "interesting" pointer
> >> arithmetic. If you have any opinion here up front, please let me know.
> >
> > Can we avoid it by adding some new entries to the opcode table?
> > I don't think we need many such entries.
>
> I'm afraid I don't see the connection between the intended constification
> and what entries there are (or not) in the opcode table. I view it as a
> desirable property of the function in the first place to express its
> behavior (of not altering the input string) by a pointer-to-const
> parameter. In fact I guess I would make such an adjustment a standalone
> (prereq for the larger change) patch.
>

Rescan means that the first scan fails.  Can we add new entries which only
do the second scan?


-- 
H.J.


* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
From: Jan Beulich @ 2022-09-29 16:06 UTC
  To: H.J. Lu; +Cc: Binutils

On 29.09.2022 18:00, H.J. Lu wrote:
> On Thu, Sep 29, 2022 at 1:08 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 28.09.2022 21:33, H.J. Lu wrote:
>>> On Wed, Sep 28, 2022 at 5:49 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>> I guess to prove (and going forward guarantee) the apparent behavior of
>>>> parse_insn() I'd like to constify its first parameter. This might
>>>> involve adding a cast (to drop const-ness again after the call), which
>>>> I generally would like to avoid, or some "interesting" pointer
>>>> arithmetic. If you have any opinion here up front, please let me know.
>>>
>>> Can we avoid it by adding some new entries to the opcode table?
>>> I don't think we need many such entries.
>>
>> I'm afraid I don't see the connection between the intended constification
>> and what entries there are (or not) in the opcode table. I view it as a
>> desirable property of the function in the first place to express its
>> behavior (of not altering the input string) by a pointer-to-const
>> parameter. In fact I guess I would make such an adjustment a standalone
>> (prereq for the larger change) patch.
>>
> 
> Rescan means that the first scan fails.  Can we add new entries which only
> do the second scan?

Why would we add such redundant entries? All that could happen is them
going out of sync with their counterparts processable on the 1st pass.
The overall goal has been to reduce redundancy and hence the risk of
inconsistencies.

Jan

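The trade-off between a rescan and dedicated table entries can be shown with a toy model. It assumes, without taking this from the patch, that the second pass strips one trailing AT&T size suffix (`b`, `w`, `l`, `q`) and looks the stem up again. `toy_table`, `toy_lookup` and `recognize` are hypothetical and much simpler than gas's real opcode lookup. In this scheme every instruction has exactly one table entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy opcode table holding only base mnemonics, no suffixed duplicates.
   Hypothetical: gas's real table and lookup work differently. */
static const char *const toy_table[] = { "mov", "add", "push", "bswap" };

static bool toy_lookup(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++)
        if (strlen(toy_table[i]) == len && strncmp(toy_table[i], name, len) == 0)
            return true;
    return false;
}

/* Two-pass recognition: try the full mnemonic first; only if that fails,
   strip one trailing size suffix and rescan.  Returns the suffix found,
   '\0' when the full name matched, or -1 when both passes fail. */
static int recognize(const char *mnem)
{
    assert(mnem != NULL);
    size_t len = strlen(mnem);

    if (toy_lookup(mnem, len))
        return '\0';                     /* 1st pass hit: no suffix */
    if (len > 1 && strchr("bwlq", mnem[len - 1]) != NULL
        && toy_lookup(mnem, len - 1))
        return mnem[len - 1];            /* 2nd pass hit on the stem */
    return -1;
}
```

Adding an explicit `"movl"` entry to `toy_table` would let the first pass succeed without a rescan, but `recognize("movl")` would then report no suffix at all, and that entry would have to be kept consistent with `"mov"` by hand.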
* Re: [PATCH 5/7] x86: re-work insn/suffix recognition
  2022-09-29 16:06                                 ` Jan Beulich
@ 2022-09-29 16:20                                   ` H.J. Lu
  0 siblings, 0 replies; 45+ messages in thread
From: H.J. Lu @ 2022-09-29 16:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Thu, Sep 29, 2022 at 9:06 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 29.09.2022 18:00, H.J. Lu wrote:
> > On Thu, Sep 29, 2022 at 1:08 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 28.09.2022 21:33, H.J. Lu wrote:
> >>> On Wed, Sep 28, 2022 at 5:49 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> I guess to prove (and going forward guarantee) the apparent behavior of
> >>>> parse_insn() I'd like to constify its first parameter. This might
> >>>> involve adding a cast (to drop const-ness again after the call), which
> >>>> I generally would like to avoid, or some "interesting" pointer
> >>>> arithmetic. If you have any opinion here up front, please let me know.
> >>>
> >>> Can we avoid it by adding some new entries to the opcode table?
> >>> I don't think we need many such entries.
> >>
> >> I'm afraid I don't see the connection between the intended constification
> >> and what entries there are (or not) in the opcode table. I view it as a
> >> desirable property of the function in the first place to express its
> >> behavior (of not altering the input string) by a pointer-to-const
> >> parameter. In fact I guess I would make such an adjustment a standalone
> >> (prereq for the larger change) patch.
> >>
> >
> > Rescan means that the first scan fails.  Can we add new entries which only
> > do the second scan?
>
> Why would we add such redundant entries? All that could happen is them
> going out of sync with their counterparts processable on the 1st pass.
> The overall goal has been to reduce redundancy and hence the risk of
> inconsistencies.
>

These new entries should be rare and only for existing instructions.  We won't
add more of them.

-- 
H.J.

end of thread, other threads:[~2022-09-29 16:21 UTC | newest]

Thread overview: 45+ messages
2022-08-16  7:27 [PATCH 0/7] x86: suffix handling changes Jan Beulich
2022-08-16  7:30 ` [PATCH 1/7] x86/Intel: restrict suffix derivation Jan Beulich
2022-08-17 19:19   ` H.J. Lu
2022-08-18  6:07     ` Jan Beulich
2022-08-18 14:46       ` H.J. Lu
2022-08-19  8:19         ` Jan Beulich
2022-08-19 14:23           ` H.J. Lu
2022-08-19 14:49             ` Jan Beulich
2022-08-19 17:00               ` H.J. Lu
2022-08-22  9:34                 ` Jan Beulich
2022-08-22 14:38                   ` H.J. Lu
2022-08-16  7:30 ` [PATCH 2/7] x86: insert "no error" enumerator in i386_error enumeration Jan Beulich
2022-08-17 19:19   ` H.J. Lu
2022-08-16  7:31 ` [PATCH 3/7] x86: move / quiesce pre-386 non-16-bit warning Jan Beulich
2022-08-17 19:21   ` H.J. Lu
2022-08-18  7:21     ` Jan Beulich
2022-08-18 15:30       ` H.J. Lu
2022-08-19  6:13         ` Jan Beulich
2022-08-19 14:18           ` H.J. Lu
2022-08-16  7:32 ` [PATCH 4/7] x86: improve match_template()'s diagnostics Jan Beulich
2022-08-17 20:24   ` H.J. Lu
2022-08-18  6:14     ` Jan Beulich
2022-08-18 14:51       ` H.J. Lu
2022-08-16  7:32 ` [PATCH 5/7] x86: re-work insn/suffix recognition Jan Beulich
2022-08-17 20:29   ` H.J. Lu
2022-08-18  6:24     ` Jan Beulich
2022-08-18 15:14       ` H.J. Lu
2022-08-19  8:28         ` Jan Beulich
2022-08-23  2:00           ` H.J. Lu
2022-08-26  9:26             ` Jan Beulich
2022-08-26 18:46               ` H.J. Lu
2022-09-06  6:40                 ` Jan Beulich
2022-09-06 21:53                   ` H.J. Lu
2022-09-07  7:17                     ` Jan Beulich
2022-09-26 23:52                       ` H.J. Lu
2022-09-28 12:49                         ` Jan Beulich
2022-09-28 19:33                           ` H.J. Lu
2022-09-29  8:08                             ` Jan Beulich
2022-09-29 16:00                               ` H.J. Lu
2022-09-29 16:06                                 ` Jan Beulich
2022-09-29 16:20                                   ` H.J. Lu
2022-08-16  7:33 ` [PATCH 6/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
2022-08-16  7:34 ` [PATCH 7/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
2022-08-17 20:36   ` H.J. Lu
2022-08-18  6:29     ` Jan Beulich