public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] x86: suffix handling changes
@ 2022-08-26 10:28 Jan Beulich
  2022-08-26 10:28 ` Jan Beulich
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:28 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

... accompanied by a few other improvements (or so I hope) found
along the way.

1: Intel: restrict suffix derivation
2: improve match_template()'s diagnostics
3: re-work insn/suffix recognition
4: further re-work insn/suffix recognition to also cover MOVSL
5: don't recognize/derive Q suffix in the common case
6: allow HLE store of accumulator to absolute 32-bit address
7: move bad-use-of-TLS-reloc check

Jan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 0/7] x86: suffix handling changes
  2022-08-26 10:28 [PATCH v2 0/7] x86: suffix handling changes Jan Beulich
@ 2022-08-26 10:28 ` Jan Beulich
  2022-08-26 10:30 ` [PATCH v2 1/7] x86/Intel: restrict suffix derivation Jan Beulich
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:28 UTC (permalink / raw)
  To: Binutils

... accompanied by a few other improvements (or so I hope) found
along the way.

1: Intel: restrict suffix derivation
2: improve match_template()'s diagnostics
3: re-work insn/suffix recognition
4: further re-work insn/suffix recognition to also cover MOVSL
5: don't recognize/derive Q suffix in the common case
6: allow HLE store of accumulator to absolute 32-bit address
7: move bad-use-of-TLS-reloc check

Jan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/7] x86/Intel: restrict suffix derivation
  2022-08-26 10:28 [PATCH v2 0/7] x86: suffix handling changes Jan Beulich
  2022-08-26 10:28 ` Jan Beulich
@ 2022-08-26 10:30 ` Jan Beulich
  2022-08-26 10:30   ` Jan Beulich
  2022-09-13 14:20   ` Ping: " Jan Beulich
  2022-08-26 10:30 ` [PATCH v2 2/7] x86: improve match_template()'s diagnostics Jan Beulich
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:30 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

While in some cases deriving an AT&T-style suffix from an Intel syntax
memory operand size specifier is necessary, in many cases this is not
only pointless, but has led to the introduction of various workarounds:
Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
ToQword attributes. Suppress suffix derivation when we can clearly tell
that the memory operand's size isn't going to be needed to infer the
possible need for the low byte/word opcode bit or an operand size prefix
(0x66 or REX.W).

As a result ToDword and ToQword can be dropped entirely, plus a fair
number of IgnoreSize and NoRex64 can also be got rid of. Note that
IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
16-bit code using SIMD insns isn't well tested, clone an existing
testcase just enough to cover a few insns which are potentially
problematic but are being touched here.)

Note that while folding the VCVT{,T}S{S,D}2SI templates, VCVT{,T}SH2SI
isn't included there. This is to fulfill the request of not allowing L
and Q suffixes there, despite the inconsistency with VCVT{,T}S{S,D}2SI.
---
Long term suffix derivation should be dropped altogether, not the least
such that bogus error messages like "incorrect register `...' used with
`...' suffix" don't misguid people anymore when no suffix was used at
all.
---
v2: Don't cover VCVT{,T}SH2SI with the templatization.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -7071,42 +7071,22 @@ process_suffix (void)
 	}
       else if (i.suffix == BYTE_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_bsuf)
-	    i.suffix = 0;
-	  else if (!check_byte_reg ())
+	  if (!check_byte_reg ())
 	    return 0;
 	}
       else if (i.suffix == LONG_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_lsuf
-	      && !i.tm.opcode_modifier.todword
-	      && !i.tm.opcode_modifier.toqword)
-	    i.suffix = 0;
-	  else if (!check_long_reg ())
+	  if (!check_long_reg ())
 	    return 0;
 	}
       else if (i.suffix == QWORD_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_qsuf
-	      && !i.tm.opcode_modifier.todword
-	      && !i.tm.opcode_modifier.toqword)
-	    i.suffix = 0;
-	  else if (!check_qword_reg ())
+	  if (!check_qword_reg ())
 	    return 0;
 	}
       else if (i.suffix == WORD_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_wsuf)
-	    i.suffix = 0;
-	  else if (!check_word_reg ())
+	  if (!check_word_reg ())
 	    return 0;
 	}
       else if (intel_syntax
@@ -7566,20 +7546,9 @@ check_long_reg (void)
 		 || i.tm.operand_types[op].bitfield.instance == Accum)
 	     && i.tm.operand_types[op].bitfield.dword)
       {
-	if (intel_syntax
-	    && i.tm.opcode_modifier.toqword
-	    && i.types[0].bitfield.class != RegSIMD)
-	  {
-	    /* Convert to QWORD.  We want REX byte. */
-	    i.suffix = QWORD_MNEM_SUFFIX;
-	  }
-	else
-	  {
-	    as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
-		    register_prefix, i.op[op].regs->reg_name,
-		    i.suffix);
-	    return 0;
-	  }
+	as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
+		register_prefix, i.op[op].regs->reg_name, i.suffix);
+	return 0;
       }
   return 1;
 }
@@ -7617,20 +7586,9 @@ check_qword_reg (void)
       {
 	/* Prohibit these changes in the 64bit mode, since the
 	   lowering is more complicated.  */
-	if (intel_syntax
-	    && i.tm.opcode_modifier.todword
-	    && i.types[0].bitfield.class != RegSIMD)
-	  {
-	    /* Convert to DWORD.  We don't want REX byte. */
-	    i.suffix = LONG_MNEM_SUFFIX;
-	  }
-	else
-	  {
-	    as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
-		    register_prefix, i.op[op].regs->reg_name,
-		    i.suffix);
-	    return 0;
-	  }
+	as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
+		register_prefix, i.op[op].regs->reg_name, i.suffix);
+	return 0;
       }
   return 1;
 }
@@ -7670,14 +7628,6 @@ check_word_reg (void)
 		i.suffix);
 	return 0;
       }
-    /* For some instructions need encode as EVEX.W=1 without explicit VexW1. */
-    else if (i.types[op].bitfield.qword
-	     && intel_syntax
-	     && i.tm.opcode_modifier.toqword)
-      {
-	  /* Convert to QWORD.  We want EVEX.W byte. */
-	  i.suffix = QWORD_MNEM_SUFFIX;
-      }
   return 1;
 }
 
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -790,9 +790,83 @@ i386_intel_operand (char *operand_string
 	  break;
 	}
 
+      /* Now check whether we actually want to infer an AT&T-like suffix.
+	 We really only need to do this when operand size determination (incl.
+	 REX.W) is going to be derived from it.  For this we check whether the
+	 given suffix is valid for any of the candidate templates.  */
+      if (suffix && suffix != i.suffix
+	  && (current_templates->start->opcode_modifier.opcodespace != SPACE_BASE
+	      || current_templates->start->base_opcode != 0x62 /* bound */))
+	{
+	  const insn_template *t;
+
+	  for (t = current_templates->start; t < current_templates->end; ++t)
+	    {
+	      /* Operands haven't been swapped yet.  */
+	      unsigned int op = t->operands - 1 - this_operand;
+
+	      /* Easy checks to skip templates which won't match anyway.  */
+	      if (this_operand >= t->operands || t->opcode_modifier.attsyntax)
+		continue;
+
+	      switch (suffix)
+		{
+		case BYTE_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_bsuf)
+		    continue;
+		  break;
+		case WORD_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_wsuf)
+		    continue;
+		  break;
+		case LONG_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_lsuf)
+		    continue;
+		  break;
+		case QWORD_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_qsuf)
+		    continue;
+		  break;
+		case SHORT_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_ssuf)
+		    continue;
+		  break;
+		case LONG_DOUBLE_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_ldsuf)
+		    continue;
+		  break;
+		default:
+		  abort ();
+		}
+
+	      /* In a few cases suffixes are permitted, but we can nevertheless
+		 derive that these aren't going to be needed.  This is only of
+		 interest for insns using ModR/M, plus we can skip templates with
+		 swappable operands here (simplifying subsequent logic).  */
+	      if (!t->opcode_modifier.modrm || t->opcode_modifier.d)
+		break;
+
+	      if (!t->operand_types[op].bitfield.baseindex)
+		continue;
+
+	      switch (t->operand_types[op].bitfield.class)
+		{
+		case RegMMX:
+		case RegSIMD:
+		case RegMask:
+		  continue;
+		}
+
+	      break;
+	    }
+
+	  if (t == current_templates->end)
+	    suffix = 0;
+	}
+
       if (!i.suffix)
 	i.suffix = suffix;
-      else if (i.suffix != suffix)
+      else if (suffix && i.suffix != suffix)
 	{
 	  as_bad (_("conflicting operand size modifiers"));
 	  return 0;
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -169,6 +169,7 @@ if [gas_32_check] then {
     run_dump_test "simd"
     run_dump_test "simd-intel"
     run_dump_test "simd-suffix"
+    run_dump_test "simd16"
     run_dump_test "mem"
     run_dump_test "mem-intel"
     run_dump_test "reg"
--- a/gas/testsuite/gas/i386/simd.s
+++ b/gas/testsuite/gas/i386/simd.s
@@ -1,5 +1,6 @@
 	.text
 _start:
+	.ifndef use16
 	addsubps 0x12345678,%xmm1
 	comisd 0x12345678,%xmm1
 	comiss 0x12345678,%xmm1
@@ -31,6 +32,7 @@ _start:
 	punpcklqdq 0x12345678,%xmm1
 	ucomisd 0x12345678,%xmm1
 	ucomiss 0x12345678,%xmm1
+	.endif
 
 	cmpeqsd (%eax),%xmm0
 	cmpeqss (%eax),%xmm0
@@ -101,6 +103,7 @@ cmpsd	$0x10,(%eax),%xmm7
 
 	.intel_syntax noprefix
 
+	.ifndef use16
 addsubps xmm1,XMMWORD PTR ds:0x12345678
 comisd xmm1,QWORD PTR ds:0x12345678
 comiss xmm1,DWORD PTR ds:0x12345678
@@ -132,6 +135,8 @@ punpcklwd xmm1,XMMWORD PTR ds:0x12345678
 punpcklqdq xmm1,XMMWORD PTR ds:0x12345678
 ucomisd xmm1,QWORD PTR ds:0x12345678
 ucomiss xmm1,DWORD PTR ds:0x12345678
+	.endif
+
 cmpeqsd xmm0,QWORD PTR [eax]
 cmpeqss xmm0,DWORD PTR [eax]
 cvtpi2pd xmm0,QWORD PTR [eax]
--- /dev/null
+++ b/gas/testsuite/gas/i386/simd16.d
@@ -0,0 +1,137 @@
+#as: --defsym use16=1 -I${srcdir}/$subdir
+#objdump: -dw -Mi8086
+#name: i386 SIMD (16-bit)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:	67 f2 0f c2 00 00    	cmpeqsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f c2 00 00    	cmpeqss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 2a 00       	cvtpi2pd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2a 00          	cvtpi2ps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2d 00          	cvtps2pi \(%eax\),%mm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2d 00       	cvtsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 2c 00       	cvttsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5a 00       	cvtsd2ss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5a 00       	cvtss2sd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 2d 00       	cvtss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f3 0f 2c 00       	cvttss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5e 00       	divsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5e 00       	divss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5f 00       	maxsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5f 00       	maxss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2b 00       	movntsd %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 2b 00       	movntss %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 10 00       	movsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 11 00       	movsd  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 10 00       	movss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 11 00       	movss  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 59 00       	mulsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 59 00       	mulss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 53 00       	rcpss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0b 00 00 	roundsd \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0a 00 00 	roundss \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 52 00       	rsqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 51 00       	sqrtsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 51 00       	sqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5c 00       	subsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5c 00       	subss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 20 00    	pmovsxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 21 00    	pmovsxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 22 00    	pmovsxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 23 00    	pmovsxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 24 00    	pmovsxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 25 00    	pmovsxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 30 00    	pmovzxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 31 00    	pmovzxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 32 00    	pmovzxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 33 00    	pmovzxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 34 00    	pmovzxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 35 00    	pmovzxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 21 00 00 	insertps \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 15 08       	unpckhpd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 0f 15 08          	unpckhps \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 66 0f 14 08       	unpcklpd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 0f 14 08          	unpcklps \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	f3 0f c2 f7 10       	cmpss  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f3 0f c2 38 10    	cmpss  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f2 0f c2 f7 10       	cmpsd  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f2 0f c2 38 10    	cmpsd  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f c2 00 00    	cmpeqsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f c2 00 00    	cmpeqss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 2a 00       	cvtpi2pd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2a 00          	cvtpi2ps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2d 00          	cvtps2pi \(%eax\),%mm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2d 00       	cvtsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 2c 00       	cvttsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5a 00       	cvtsd2ss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5a 00       	cvtss2sd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 2d 00       	cvtss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f3 0f 2c 00       	cvttss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5e 00       	divsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5e 00       	divss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5f 00       	maxsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5f 00       	maxss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2b 00       	movntsd %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 2b 00       	movntss %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 10 00       	movsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 11 00       	movsd  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 10 00       	movss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 11 00       	movss  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 59 00       	mulsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 59 00       	mulss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 53 00       	rcpss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0b 00 00 	roundsd \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0a 00 00 	roundss \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 52 00       	rsqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 51 00       	sqrtsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 51 00       	sqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5c 00       	subsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5c 00       	subss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 20 00    	pmovsxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 21 00    	pmovsxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 22 00    	pmovsxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 23 00    	pmovsxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 24 00    	pmovsxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 25 00    	pmovsxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 30 00    	pmovzxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 31 00    	pmovzxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 32 00    	pmovzxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 33 00    	pmovzxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 34 00    	pmovzxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 35 00    	pmovzxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 21 00 00 	insertps \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 15 00       	unpckhpd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 15 00          	unpckhps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 14 00       	unpcklpd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 14 00          	unpcklps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	f3 0f c2 f7 10       	cmpss  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f3 0f c2 38 10    	cmpss  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f2 0f c2 f7 10       	cmpsd  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f2 0f c2 38 10    	cmpsd  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 0f 2c 00          	cvttps2pi \(%eax\),%mm0
+#pass
--- /dev/null
+++ b/gas/testsuite/gas/i386/simd16.s
@@ -0,0 +1,2 @@
+	.code16
+	.include "simd.s"
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -706,8 +706,6 @@ static bitfield opcode_modifiers[] =
   BITFIELD (RegKludge),
   BITFIELD (Implicit1stXmm0),
   BITFIELD (PrefixOk),
-  BITFIELD (ToDword),
-  BITFIELD (ToQword),
   BITFIELD (AddrPrefixOpReg),
   BITFIELD (IsPrefix),
   BITFIELD (ImmExt),
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -521,10 +521,6 @@ enum
 #define PrefixHLELock		5 /* Okay with a LOCK prefix.  */
 #define PrefixHLEAny		6 /* Okay with or without a LOCK prefix.  */
   PrefixOk,
-  /* Convert to DWORD */
-  ToDword,
-  /* Convert to QWORD */
-  ToQword,
   /* Address prefix changes register operand */
   AddrPrefixOpReg,
   /* opcode is a prefix */
@@ -740,8 +736,6 @@ typedef struct i386_opcode_modifier
   unsigned int regkludge:1;
   unsigned int implicit1stxmm0:1;
   unsigned int prefixok:3;
-  unsigned int todword:1;
-  unsigned int toqword:1;
   unsigned int addrprefixopreg:1;
   unsigned int isprefix:1;
   unsigned int immext:1;
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -970,18 +970,18 @@ pause, 0xf390, None, Cpu186, No_bSuf|No_
 <mmx:cpu:pfx:attr:shimm:reg:mem, +
     $avx:CpuAVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:Vex128|VexVVVV=2|VexW0|SSE2AVX:RegXMM:Xmmword, +
     $sse:CpuSSE2:66:::RegXMM:Xmmword, +
-    $mmx:CpuMMX::NoRex64::RegMMX:Qword>
+    $mmx:CpuMMX::::RegMMX:Qword>
 
 <sse2:cpu:attr:scal:vvvv:shimm, +
     $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV:Vex128|VexVVVV=2|VexW0|SSE2AVX, +
-    $sse:CpuSSE2::NoRex64::>
+    $sse:CpuSSE2::::>
 
 <bw:opc:vexw:elem:kcpu:kpfx:cpubmi, +
     b:0:VexW0:Byte:CpuAVX512DQ:66:CpuAVX512VBMI, +
     w:1:VexW1:Word:CpuAVX512F::CpuAVX512BW>
 
 <dq:opc:vexw:vexw64:elem:cpu64:gpr:kpfx, +
-    d:0:VexW0:IgnoreSize:Dword::Reg32:66, +
+    d:0:VexW0::Dword::Reg32:66, +
     q:1:VexW1:VexW1:Qword:Cpu64:Reg64:>
 
 emms, 0xf77, None, CpuMMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
@@ -989,7 +989,7 @@ emms, 0xf77, None, CpuMMX, No_bSuf|No_wS
 // copying between Reg64/Mem64 and RegXMM/RegMMX, as is mandated by Intel's
 // spec). AMD's spec, having been in existence for much longer, failed to
 // recognize that and specified movd for 32- and 64-bit operations.
-movd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Reg32|Unspecified|BaseIndex, RegXMM }
+movd, 0x666e, None, CpuAVX, D|Modrm|Vex128|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Reg32|Unspecified|BaseIndex, RegXMM }
 movd, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|BaseIndex, RegXMM }
 movd, 0x660f6e, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 movd, 0x660f6e, None, CpuSSE2|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|BaseIndex, RegXMM }
@@ -998,10 +998,10 @@ movd, 0xf6e, None, CpuMMX|Cpu64, D|Modrm
 movq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movq, 0x66d6, None, CpuAVX, Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
 movq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|Unspecified|BaseIndex, RegXMM }
-movq, 0xf30f7e, None, CpuSSE2, Load|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Unspecified|Qword|BaseIndex|RegXMM, RegXMM }
-movq, 0x660fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { RegXMM, Unspecified|Qword|BaseIndex|RegXMM }
+movq, 0xf30f7e, None, CpuSSE2, Load|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|Qword|BaseIndex|RegXMM, RegXMM }
+movq, 0x660fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Unspecified|Qword|BaseIndex|RegXMM }
 movq, 0x660f6e, None, CpuSSE2|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
-movq, 0xf6f, None, CpuMMX, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Unspecified|Qword|BaseIndex|RegMMX, RegMMX }
+movq, 0xf6f, None, CpuMMX, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|Qword|BaseIndex|RegMMX, RegMMX }
 movq, 0xf6e, None, CpuMMX|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegMMX }
 packssdw<mmx>, 0x<mmx:pfx>0f6b, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 packsswb<mmx>, 0x<mmx:pfx>0f63, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
@@ -1009,7 +1009,7 @@ packuswb<mmx>, 0x<mmx:pfx>0f67, None, <m
 padd<bw><mmx>, 0x<mmx:pfx>0ffc | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 paddd<mmx>, 0x<mmx:pfx>0ffe, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 paddq<sse2>, 0x660fd4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-paddq, 0xfd4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+paddq, 0xfd4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 padds<bw><mmx>, 0x<mmx:pfx>0fec | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 paddus<bw><mmx>, 0x<mmx:pfx>0fdc | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 pand<mmx>, 0x<mmx:pfx>0fdb, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
@@ -1037,25 +1037,25 @@ psrl<dq><mmx>, 0x<mmx:pfx>0f72 | <dq:opc
 psub<bw><mmx>, 0x<mmx:pfx>0ff8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubd<mmx>, 0x<mmx:pfx>0ffa, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubq<sse2>, 0x660ffb, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-psubq, 0xffb, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+psubq, 0xffb, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 psubs<bw><mmx>, 0x<mmx:pfx>0fe8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubus<bw><mmx>, 0x<mmx:pfx>0fd8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpckhbw<mmx>, 0x<mmx:pfx>0f68, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpckhwd<mmx>, 0x<mmx:pfx>0f69, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpckhdq<mmx>, 0x<mmx:pfx>0f6a, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpcklbw<sse2>, 0x660f60, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-punpcklbw, 0xf60, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
+punpcklbw, 0xf60, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
 punpcklwd<sse2>, 0x660f61, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-punpcklwd, 0xf61, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
+punpcklwd, 0xf61, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
 punpckldq<sse2>, 0x660f62, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-punpckldq, 0xf62, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
+punpckldq, 0xf62, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pxor<mmx>, 0x<mmx:pfx>0fef, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 
 // SSE instructions.
 
 <sse:cpu:attr:scal:vvvv, +
     $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, +
-    $sse:CpuSSE::IgnoreSize:>
+    $sse:CpuSSE:::>
 <frel:imm:comm, eq:0:C, lt:1:, le:2:, unord:3:C, neq:4:C, nlt:5:, nle:6:, ord:7:C>
 
 addps<sse>, 0x0f58, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1067,21 +1067,21 @@ cmp<frel>ss<sse>, 0xf30fc2, <frel:imm>,
 cmpps<sse>, 0x0fc2, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 cmpss<sse>, 0xf30fc2, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 comiss<sse>, 0x0f2f, None, <sse:cpu>, Modrm|<sse:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-cvtpi2ps, 0xf2a, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegXMM }
-cvtps2pi, 0xf2d, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
+cvtpi2ps, 0xf2a, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegXMM }
+cvtps2pi, 0xf2d, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
 cvtsi2ss<sse>, 0xf30f2a, None, <sse:cpu>|CpuNo64, Modrm|<sse:scal>|<sse:vvvv>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf32a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf32a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf30f2a, None, CpuSSE|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf30f2a, None, CpuSSE|Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
-cvtss2si, 0xf32d, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvtss2si, 0xf30f2d, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvttps2pi, 0xf2c, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
-cvttss2si, 0xf32c, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvttss2si, 0xf30f2c, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtss2si, 0xf32d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtss2si, 0xf30f2d, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttps2pi, 0xf2c, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
+cvttss2si, 0xf32c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttss2si, 0xf30f2c, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 divps<sse>, 0x0f5e, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 divss<sse>, 0xf30f5e, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-ldmxcsr<sse>, 0x0fae, 2, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+ldmxcsr<sse>, 0x0fae, 2, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 maskmovq, 0xff7, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegMMX }
 maxps<sse>, 0x0f5f, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 maxss<sse>, 0xf30f5f, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1089,51 +1089,51 @@ minps<sse>, 0x0f5d, None, <sse:cpu>, Mod
 minss<sse>, 0xf30f5d, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movaps<sse>, 0x0f28, None, <sse:cpu>, D|Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movhlps<sse>, 0x0f12, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
-movhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movhps, 0x17, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movhps, 0xf16, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movhps, 0x17, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movhps, 0xf16, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movlhps<sse>, 0x0f16, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
-movlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movlps, 0x13, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movlps, 0xf12, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movlps, 0x13, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movlps, 0xf12, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movmskps<sse>, 0x0f50, None, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 movntps<sse>, 0x0f2b, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
-movntq, 0xfe7, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, Qword|Unspecified|BaseIndex }
+movntq, 0xfe7, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, Qword|Unspecified|BaseIndex }
 movntdq<sse2>, 0x660fe7, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
-movss, 0xf310, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex, RegXMM }
+movss, 0xf310, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex, RegXMM }
 movss, 0xf310, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, RegXMM }
-movss, 0xf30f10, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movss, 0xf30f10, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movups<sse>, 0x0f10, None, <sse:cpu>, D|Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulps<sse>, 0x0f59, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulss<sse>, 0xf30f59, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 orps<sse>, 0x0f56, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pavg<bw>, 0xfe0 | (3 * <bw:opc>), None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pavg<bw>, 0xfe0 | (3 * <bw:opc>), None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pavg<bw><sse2>, 0x660fe0 | (3 * <bw:opc>), None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pextrw<sse2>, 0x660fc5, None, <sse2:cpu>, Load|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, RegXMM, Reg32|Reg64 }
 pextrw, 0xfc5, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }
 pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, Reg32|Reg64, RegXMM }
-pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, Word|Unspecified|BaseIndex, RegXMM }
+pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegXMM }
 pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { Imm8, Reg32|Reg64, RegMMX }
-pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegMMX }
+pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegMMX }
 pmaxsw<sse2>, 0x660fee, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmaxsw, 0xfee, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmaxsw, 0xfee, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pmaxub<sse2>, 0x660fde, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmaxub, 0xfde, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmaxub, 0xfde, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pminsw<sse2>, 0x660fea, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pminsw, 0xfea, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pminsw, 0xfea, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pminub<sse2>, 0x660fda, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pminub, 0xfda, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pminub, 0xfda, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pmovmskb<sse2>, 0x660fd7, None, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 pmovmskb, 0xfd7, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegMMX, Reg32|Reg64 }
 pmulhuw<sse2>, 0x660fe4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmulhuw, 0xfe4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmulhuw, 0xfe4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 prefetchnta, 0xf18, 0, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetcht0, 0xf18, 1, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetcht1, 0xf18, 2, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetcht2, 0xf18, 3, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
-psadbw, 0xff6, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+psadbw, 0xff6, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 psadbw<sse2>, 0x660ff6, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pshufw, 0xf70, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pshufw, 0xf70, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 rcpps<sse>, 0x0f53, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 rcpss<sse>, 0xf30f53, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 rsqrtps<sse>, 0x0f52, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1142,7 +1142,7 @@ sfence, 0xfaef8, None, CpuSSE|Cpu3dnowA,
 shufps<sse>, 0x0fc6, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 sqrtps<sse>, 0x0f51, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sqrtss<sse>, 0xf30f51, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-stmxcsr<sse>, 0x0fae, 3, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+stmxcsr<sse>, 0x0fae, 3, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 subps<sse>, 0x0f5c, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 subss<sse>, 0xf30f5c, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 ucomiss<sse>, 0x0f2e, None, <sse:cpu>, Modrm|<sse:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1161,9 +1161,9 @@ cmp<frel>sd<sse2>, 0xf20fc2, <frel:imm>,
 cmppd<sse2>, 0x660fc2, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 cmpsd<sse2>, 0xf20fc2, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 comisd<sse2>, 0x660f2f, None, <sse2:cpu>, Modrm|<sse2:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { RegMMX, RegXMM }
-cvtpi2pd, 0xf3e6, None, CpuAVX, Modrm|Vex|Space0F|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex, RegXMM }
+cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegXMM }
+cvtpi2pd, 0xf3e6, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd<sse2>, 0xf20f2a, None, <sse2:cpu>|CpuNo64, Modrm|IgnoreSize|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd, 0xf22a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd, 0xf22a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
@@ -1176,17 +1176,17 @@ maxsd<sse2>, 0xf20f5f, None, <sse2:cpu>,
 minpd<sse2>, 0x660f5d, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 minsd<sse2>, 0xf20f5d, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movapd<sse2>, 0x660f28, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-movhpd, 0x6616, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movhpd, 0x6617, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movhpd, 0x660f16, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
-movlpd, 0x6612, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movlpd, 0x6613, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movlpd, 0x660f12, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movhpd, 0x6616, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movhpd, 0x6617, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movhpd, 0x660f16, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movlpd, 0x6612, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movlpd, 0x6613, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movlpd, 0x660f12, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movmskpd<sse2>, 0x660f50, None, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 movntpd<sse2>, 0x660f2b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
-movsd, 0xf210, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movsd, 0xf210, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
 movsd, 0xf210, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, RegXMM }
-movsd, 0xf20f10, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movsd, 0xf20f10, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movupd<sse2>, 0x660f10, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulpd<sse2>, 0x660f59, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulsd<sse2>, 0xf20f59, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1200,21 +1200,21 @@ ucomisd<sse2>, 0x660f2e, None, <sse2:cpu
 unpckhpd<sse2>, 0x660f15, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 unpcklpd<sse2>, 0x660f14, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 xorpd<sse2>, 0x660f57, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-cvtdq2pd<sse2>, 0xf30fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtdq2pd<sse2>, 0xf30fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 cvtpd2dq<sse2>, 0xf20fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cvtdq2ps<sse2>, 0x0f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cvtpd2pi, 0x660f2d, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegMMX }
 cvtpd2ps<sse2>, 0x660f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-cvtps2pd<sse2>, 0x0f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtps2pd<sse2>, 0x0f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 cvtps2dq<sse2>, 0x660f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-cvtsd2si, 0xf22d, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvtsd2si, 0xf20f2d, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvtsd2ss<sse2>, 0xf20f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-cvtss2sd<sse2>, 0xf30f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtsd2si, 0xf22d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtsd2si, 0xf20f2d, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtsd2ss<sse2>, 0xf20f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtss2sd<sse2>, 0xf30f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 
 cvttpd2pi, 0x660f2c, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegMMX }
-cvttsd2si, 0xf22c, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvttsd2si, 0xf20f2c, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttsd2si, 0xf22c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttsd2si, 0xf20f2c, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 cvttpd2dq<sse2>, 0x660fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cvttps2dq<sse2>, 0xf30f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 maskmovdqu<sse2>, 0x660ff7, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
@@ -1223,7 +1223,7 @@ movdqu<sse2>, 0xf30f6f, None, <sse2:cpu>
 movdq2q, 0xf20fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegMMX }
 movq2dq, 0xf30fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegXMM }
 pmuludq<sse2>, 0x660ff4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmuludq, 0xff4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmuludq, 0xff4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pshufd<sse2>, 0x660f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pshufhw<sse2>, 0xf30f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pshuflw<sse2>, 0xf20f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1245,7 +1245,7 @@ haddps<sse3>, 0xf20f7c, None, <sse3:cpu>
 hsubpd<sse3>, 0x660f7d, None, <sse3:cpu>, Modrm|<sse3:attr>|<sse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 hsubps<sse3>, 0xf20f7d, None, <sse3:cpu>, Modrm|<sse3:attr>|<sse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 lddqu<sse3>, 0xf20ff0, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegXMM }
-movddup<sse3>, 0xf20f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movddup<sse3>, 0xf20f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movshdup<sse3>, 0xf30f16, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movsldup<sse3>, 0xf30f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
@@ -1276,17 +1276,17 @@ mwait, 0xf01c9, None, CpuSSE3, CheckRegS
 // VMX instructions.
 
 vmcall, 0xf01c1, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-vmclear, 0x660fc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+vmclear, 0x660fc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 vmlaunch, 0xf01c2, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
 vmresume, 0xf01c3, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-vmptrld, 0xfc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
-vmptrst, 0xfc7, 7, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+vmptrld, 0xfc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
+vmptrst, 0xfc7, 7, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 vmread, 0xf78, None, CpuVMX|CpuNo64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Reg32|Unspecified|BaseIndex }
 vmread, 0xf78, None, CpuVMX|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Reg64, Reg64|Qword|Unspecified|BaseIndex }
 vmwrite, 0xf79, None, CpuVMX|CpuNo64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32 }
 vmwrite, 0xf79, None, CpuVMX|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Reg64|Qword|Unspecified|BaseIndex, Reg64 }
 vmxoff, 0xf01c4, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-vmxon, 0xf30fc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+vmxon, 0xf30fc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 
 // VMFUNC instruction
 
@@ -1313,7 +1313,7 @@ invpcid, 0x660f3882, None, CpuINVPCID|Cp
 <ssse3:cpu:pfx:attr:vvvv:reg:mem, +
     $avx:CpuAVX:66:Vex128|VexW0|SSE2AVX:VexVVVV:RegXMM:Xmmword, +
     $sse:CpuSSSE3:66:::RegXMM:Xmmword, +
-    $mmx:CpuSSSE3::NoRex64::RegMMX:Qword>
+    $mmx:CpuSSSE3::::RegMMX:Qword>
 
 phaddw<ssse3>, 0x<ssse3:pfx>0f3801, None, <ssse3:cpu>, Modrm|<ssse3:attr>|<ssse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <ssse3:reg>|<ssse3:mem>|Unspecified|BaseIndex, <ssse3:reg> }
 phaddd<ssse3>, 0x<ssse3:pfx>0f3802, None, <ssse3:cpu>, Modrm|<ssse3:attr>|<ssse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <ssse3:reg>|<ssse3:mem>|Unspecified|BaseIndex, <ssse3:reg> }
@@ -1333,7 +1333,7 @@ pabsd<ssse3>, 0x<ssse3:pfx>0f381e, None,
 // SSE4.1 instructions.
 
 <sse41:cpu:attr:scal:vvvv, $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, $sse:CpuSSE4_1:::>
-<sd:ppfx:spfx:opc:vexw:elem:scal, s::f3:0:VexW0:Dword:IgnoreSize, d:66:f2:1:VexW1:Qword:NoRex64>
+<sd:ppfx:spfx:opc:vexw:elem, s::f3:0:VexW0:Dword, d:66:f2:1:VexW1:Qword>
 
 blendp<sd><sse41>, 0x660f3a0c | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 blendvp<sd>, 0x664a | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexW=1|VexSources=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1341,11 +1341,11 @@ blendvp<sd>, 0x664a | <sd:opc>, None, Cp
 blendvp<sd>, 0x660f3814 | <sd:opc>, None, CpuSSE4_1, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 blendvp<sd>, 0x660f3814 | <sd:opc>, None, CpuSSE4_1, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 dpp<sd><sse41>, 0x660f3a40 | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
-extractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
+extractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 extractps, 0x6617, None, CpuAVX|Cpu64, RegMem|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg64 }
 extractps, 0x660f3a17, None, CpuSSE4_1, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 extractps, 0x660f3a17, None, CpuSSE4_1|Cpu64, RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, RegXMM, Reg64 }
-insertps<sse41>, 0x660f3a21, None, <sse41:cpu>, Modrm|IgnoreSize|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+insertps<sse41>, 0x660f3a21, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movntdqa<sse41>, 0x660f382a, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegXMM }
 mpsadbw<sse41>, 0x660f3a42, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 packusdw<sse41>, 0x660f382b, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1356,7 +1356,7 @@ pblendvb, 0x660f3810, None, CpuSSE4_1, M
 pblendw<sse41>, 0x660f3a0e, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pcmpeqq<sse41>, 0x660f3829, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, RegMem|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, RegXMM, Reg32|Reg64 }
-pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
+pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
 pextrd<sse41>, 0x660f3a16, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, RegXMM, Reg32|Unspecified|BaseIndex }
 pextrq, 0x6616, None, CpuAVX|Cpu64, Modrm|Vex|Space0F3A|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg64|Unspecified|BaseIndex }
 pextrq, 0x660f3a16, None, CpuSSE4_1|Cpu64, Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64|Unspecified|BaseIndex }
@@ -1374,23 +1374,23 @@ pminsb<sse41>, 0x660f3838, None, <sse41:
 pminsd<sse41>, 0x660f3839, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pminud<sse41>, 0x660f383b, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pminuw<sse41>, 0x660f383a, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmovsxbw<sse41>, 0x660f3820, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxbd<sse41>, 0x660f3821, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxbq<sse41>, 0x660f3822, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxwd<sse41>, 0x660f3823, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxwq<sse41>, 0x660f3824, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxdq<sse41>, 0x660f3825, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxbw<sse41>, 0x660f3830, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxbd<sse41>, 0x660f3831, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxbq<sse41>, 0x660f3832, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxwd<sse41>, 0x660f3833, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxwq<sse41>, 0x660f3834, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxdq<sse41>, 0x660f3835, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxbw<sse41>, 0x660f3820, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxbd<sse41>, 0x660f3821, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxbq<sse41>, 0x660f3822, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxwd<sse41>, 0x660f3823, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxwq<sse41>, 0x660f3824, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxdq<sse41>, 0x660f3825, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxbw<sse41>, 0x660f3830, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxbd<sse41>, 0x660f3831, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxbq<sse41>, 0x660f3832, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxwd<sse41>, 0x660f3833, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxwq<sse41>, 0x660f3834, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxdq<sse41>, 0x660f3835, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 pmuldq<sse41>, 0x660f3828, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pmulld<sse41>, 0x660f3840, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 ptest<sse41>, 0x660f3817, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 roundp<sd><sse41>, 0x660f3a08 | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
-rounds<sd><sse41>, 0x660f3a0a | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:scal>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<sd:scal>, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM }
+rounds<sd><sse41>, 0x660f3a0a | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:scal>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM }
 
 // SSE4.2 instructions.
 
@@ -1484,8 +1484,8 @@ vandp<sd>, 0x<sd:ppfx>54, None, CpuAVX,
 vblendp<sd>, 0x660c | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vblendvp<sd>, 0x664a | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexW0|VexSources=2|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vbroadcastf128, 0x661a, None, CpuAVX, Modrm|Vex=2|Space0F38|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegYMM }
-vbroadcastsd, 0x6619, None, CpuAVX, Modrm|Vex=2|Space0F38|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegYMM }
-vbroadcastss, 0x6618, None, CpuAVX, Modrm|Vex|Space0F38|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex, RegXMM|RegYMM }
+vbroadcastsd, 0x6619, None, CpuAVX, Modrm|Vex256|Space0F38|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegYMM }
+vbroadcastss, 0x6618, None, CpuAVX, Modrm|Vex128|Space0F38|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex, RegXMM|RegYMM }
 vcmp<frel>p<sd>, 0x<sd:ppfx>c2, 0x<frel:imm>, CpuAVX, Modrm|<frel:comm>|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 vcmp<frel>s<sd>, 0x<sd:spfx>c2, 0x<frel:imm>, CpuAVX, Modrm|<frel:comm>|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcmpp<sd>, 0x<sd:ppfx>c2, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1499,22 +1499,20 @@ vcvtpd2ps<xy>, 0x665a, None, CpuAVX, Mod
 vcvtps2dq, 0x665b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vcvtps2pd, 0x5a, None, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
 vcvtps2pd, 0x5a, None, CpuAVX, Modrm|Vex256|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegYMM }
-vcvtsd2si, 0xf22d, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+vcvts<sd>2si, 0x<sd:spfx>2d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 vcvtsd2ss, 0xf25a, None, CpuAVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vcvtsi2s<sd>, 0x<sd:spfx>2a, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcvtsi2s<sd>, 0x<sd:spfx>2a, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcvtss2sd, 0xf35a, None, CpuAVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vcvtss2si, 0xf32d, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 vcvttpd2dq<xy>, 0x66e6, None, CpuAVX, Modrm|<xy:vex>|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:syntax>, { <xy:dst>, RegXMM }
 vcvttps2dq, 0xf35b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vcvttsd2si, 0xf22c, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-vcvttss2si, 0xf32c, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+vcvtts<sd>2si, 0x<sd:spfx>2c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 vdivp<sd>, 0x<sd:ppfx>5e, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vdivs<sd>, 0x<sd:spfx>5e, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vdppd, 0x6641, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vdpps, 0x6640, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vextractf128, 0x6619, None, CpuAVX, Modrm|Vex=2|Space0F3A|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegYMM, Unspecified|BaseIndex|RegXMM }
-vextractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
+vextractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 vextractps, 0x6617, None, CpuAVX|Cpu64, RegMem|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64 }
 vhaddpd, 0x667c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vhaddps, 0xf27c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1523,7 +1521,7 @@ vhsubps, 0xf27d, None, CpuAVX, Modrm|Vex
 vinsertf128, 0x6618, None, CpuAVX, Modrm|Vex=2|Space0F3A|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
 vinsertps, 0x6621, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vlddqu, 0xf2f0, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM }
-vldmxcsr, 0xae, 2, CpuAVX, Modrm|Vex128|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+vldmxcsr, 0xae, 2, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 vmaskmovdqu, 0x66f7, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
 vmaskmovp<sd>, 0x662e | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
 vmaskmovp<sd>, 0x662c | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1537,26 +1535,26 @@ vmovap<sd>, 0x<sd:ppfx>28, None, CpuAVX,
 // by Intel AVX spec).  To avoid extra template in gcc x86 backend and
 // support assembler for AMD64, we accept 64bit operand on vmovd so
 // that we can use one template for both SSE and AVX instructions.
-vmovd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
+vmovd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 vmovd, 0x667e, None, CpuAVX|Cpu64, D|RegMem|Vex=1|Space0F|VexW=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { RegXMM, Reg64 }
 vmovddup, 0xf212, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovddup, 0xf212, None, CpuAVX, Modrm|Vex=2|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM }
 vmovdqa, 0x666f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovdqu, 0xf36f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovhlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
-vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 vmovlhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
-vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 vmovmskp<sd>, 0x<sd:ppfx>50, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { RegXMM|RegYMM, Reg32|Reg64 }
 vmovntdq, 0x66e7, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
 vmovntdqa, 0x662a, None, CpuAVX|CpuAVX2, Modrm|Vex|Space0F38|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM }
 vmovntp<sd>, 0x<sd:ppfx>2b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
 vmovq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovq, 0x66d6, None, CpuAVX, Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
-vmovq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
-vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex, RegXMM }
+vmovq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
+vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex, RegXMM }
 vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 vmovshdup, 0xf316, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovsldup, 0xf312, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1692,7 +1690,7 @@ vrsqrtss, 0xf352, None, CpuAVX, Modrm|Ve
 vshufp<sd>, 0x<sd:ppfx>c6, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vsqrtp<sd>, 0x<sd:ppfx>51, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vsqrts<sd>, 0x<sd:spfx>51, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vstmxcsr, 0xae, 3, CpuAVX, Modrm|Vex128|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+vstmxcsr, 0xae, 3, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 vsubp<sd>, 0x<sd:ppfx>5c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vsubs<sd>, 0x<sd:spfx>5c, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vtestp<sd>, 0x660e | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1889,8 +1887,8 @@ vpshl<xop>, 0x94 | <xop:opc>, None, CpuX
 
 llwpcb, 0x12, 0, CpuLWP, Modrm|SpaceXOP09|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Vex, { Reg32|Reg64 }
 slwpcb, 0x12, 1, CpuLWP, Modrm|SpaceXOP09|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Vex, { Reg32|Reg64 }
-lwpval, 0x12, 1, CpuLWP, Modrm|SpaceXOP0A|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
-lwpins, 0x12, 0, CpuLWP, Modrm|SpaceXOP0A|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
+lwpval, 0x12, 1, CpuLWP, Modrm|SpaceXOP0A|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
+lwpins, 0x12, 0, CpuLWP, Modrm|SpaceXOP0A|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // BMI instructions
 
@@ -1918,30 +1916,30 @@ tzmsk, 0x01, 4, CpuTBM, Modrm|CheckRegSi
 prefetch, 0xf0d, 0, Cpu3dnow|CpuPRFCHW, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetchw, 0xf0d, 1, Cpu3dnow|CpuPRFCHW, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 femms, 0xf0e, None, Cpu3dnow, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-pavgusb, 0xf0f, 0xbf, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pf2id, 0xf0f, 0x1d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pf2iw, 0xf0f, 0x1c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfacc, 0xf0f, 0xae, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfadd, 0xf0f, 0x9e, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfcmpeq, 0xf0f, 0xb0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfcmpge, 0xf0f, 0x90, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfcmpgt, 0xf0f, 0xa0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfmax, 0xf0f, 0xa4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfmin, 0xf0f, 0x94, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfmul, 0xf0f, 0xb4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfnacc, 0xf0f, 0x8a, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfpnacc, 0xf0f, 0x8e, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrcp, 0xf0f, 0x96, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrcpit1, 0xf0f, 0xa6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrcpit2, 0xf0f, 0xb6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrsqit1, 0xf0f, 0xa7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrsqrt, 0xf0f, 0x97, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfsub, 0xf0f, 0x9a, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfsubr, 0xf0f, 0xaa, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pi2fd, 0xf0f, 0x0d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pi2fw, 0xf0f, 0x0c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pmulhrw, 0xf0f, 0xb7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pswapd, 0xf0f, 0xbb, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pavgusb, 0xf0f, 0xbf, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pf2id, 0xf0f, 0x1d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pf2iw, 0xf0f, 0x1c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfacc, 0xf0f, 0xae, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfadd, 0xf0f, 0x9e, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfcmpeq, 0xf0f, 0xb0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfcmpge, 0xf0f, 0x90, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfcmpgt, 0xf0f, 0xa0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfmax, 0xf0f, 0xa4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfmin, 0xf0f, 0x94, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfmul, 0xf0f, 0xb4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfnacc, 0xf0f, 0x8a, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfpnacc, 0xf0f, 0x8e, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrcp, 0xf0f, 0x96, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrcpit1, 0xf0f, 0xa6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrcpit2, 0xf0f, 0xb6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrsqit1, 0xf0f, 0xa7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrsqrt, 0xf0f, 0x97, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfsub, 0xf0f, 0x9a, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfsubr, 0xf0f, 0xaa, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pi2fd, 0xf0f, 0x0d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pi2fw, 0xf0f, 0x0c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmulhrw, 0xf0f, 0xb7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pswapd, 0xf0f, 0xbb, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 
 // AMD extensions.
 syscall, 0xf05, None, CpuSYSCALL, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
@@ -1967,8 +1965,8 @@ vmsave, 0xf01db, None, CpuSVME, AddrPref
 
 
 // SSE4a instructions
-movntsd, 0xf20f2b, None, CpuSSE4a, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
-movntss, 0xf30f2b, None, CpuSSE4a, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Dword|Unspecified|BaseIndex }
+movntsd, 0xf20f2b, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+movntss, 0xf30f2b, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Dword|Unspecified|BaseIndex }
 extrq, 0x660f78, 0, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Imm8, RegXMM }
 extrq, 0x660f79, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
 insertq, 0xf20f79, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
@@ -2166,8 +2164,8 @@ vcvtps2pd, 0x5A, None, CpuAVX512F, Modrm
 
 vcvtps2ph, 0x661D, None, CpuAVX512F, Modrm|EVex512|MaskingMorZ|Space0F3A|VexW0|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 
-vcvtsd2si, 0xF22D, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvtsd2usi, 0xF279, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToDword|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvts<sd>2si, 0x<sd:spfx>2d, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|<sd:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvts<sdh>2usi, 0x<sdh:spfx>79, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
 
 vcvtsd2ss, 0xF25A, None, CpuAVX512F, Modrm|EVexLIG|Masking=3|Space0F|VexVVVV|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
@@ -2187,20 +2185,14 @@ vcvtusi2ss, 0xF37B, None, CpuAVX512F, Mo
 
 vcvtss2sd, 0xF35A, None, CpuAVX512F, Modrm|EVexLIG|Masking=3|Space0F|VexVVVV|VexW0|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtss2si, 0xF32D, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvtss2usi, 0xF379, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
-
 vcvttpd2dq<xy>, 0x66e6, None, CpuAVX512F|<xy:vl>, Modrm|<xy:attr>|Masking=3|Space0F|VexW1|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:sae>, { <xy:src>|Qword, <xy:dst> }
 vcvttpd2udq<xy>, 0x78, None, CpuAVX512F|<xy:vl>, Modrm|<xy:attr>|Masking=3|Space0F|VexW1|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:sae>, { <xy:src>|Qword, <xy:dst> }
 
 vcvttps2dq, 0xF35B, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vcvttps2udq, 0x78, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vcvttsd2si, 0xF22C, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvttsd2usi, 0xF278, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToDword|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-
-vcvttss2si, 0xF32C, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvttss2usi, 0xF378, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvtts<sd>2si, 0x<sd:spfx>2c, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SAE, { RegXMM|<sd:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvtts<sdh>2usi, 0x<sdh:spfx>78, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
 
 vcvtudq2ps, 0xF27A, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
@@ -2216,7 +2208,7 @@ vextracti32x4, 0x6639, None, CpuAVX512F,
 vextractf64x4, 0x661B, None, CpuAVX512F, Modrm|EVex=1|MaskingMorZ|Space0F3A|VexW=2|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 vextracti64x4, 0x663B, None, CpuAVX512F, Modrm|EVex=1|MaskingMorZ|Space0F3A|VexW=2|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 
-vextractps, 0x6617, None, CpuAVX512F, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
+vextractps, 0x6617, None, CpuAVX512F, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 vextractps, 0x6617, None, CpuAVX512F|Cpu64, RegMem|EVex128|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64 }
 
 vfixupimmp<sd>, 0x6654, None, CpuAVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
@@ -2274,7 +2266,7 @@ vmovap<sd>, 0x<sd:ppfx>28, None, CpuAVX5
 vmovntp<sd>, 0x<sd:ppfx>2B, None, CpuAVX512F, Modrm|Space0F|<sd:vexw>|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex }
 vmovup<sd>, 0x<sd:ppfx>10, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|<sd:vexw>|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vmovd, 0x666E, None, CpuAVX512F, D|Modrm|EVex=2|Space0F|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
+vmovd, 0x666E, None, CpuAVX512F, D|Modrm|EVex=2|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 
 vmovddup, 0xF212, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM|RegZMM|Unspecified|BaseIndex, RegYMM|RegZMM }
 
@@ -2287,16 +2279,16 @@ vmovdqu64, 0xF36F, None, CpuAVX512F, D|M
 vmovhlps, 0x12, None, CpuAVX512F, Modrm|EVex=4|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 vmovlhps, 0x16, None, CpuAVX512F, Modrm|EVex=4|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 
-vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
-vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 
-vmovq, 0x666E, None, CpuAVX512F|Cpu64, D|Modrm|EVex=2|Space0F|VexW=2|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64|Unspecified|BaseIndex, RegXMM }
+vmovq, 0x666E, None, CpuAVX512F|Cpu64, D|Modrm|EVex128|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64|Unspecified|BaseIndex, RegXMM }
 vmovq, 0xF37E, None, CpuAVX512F, Load|Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovq, 0x66D6, None, CpuAVX512F, Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
 
-vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|MaskingMorZ|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sdh:elem>|Unspecified|BaseIndex, RegXMM }
+vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|MaskingMorZ|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sdh:elem>|Unspecified|BaseIndex, RegXMM }
 vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|Masking=3|<sdh:spc1>|VexVVVV|<sdh:vexw>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 
 vmovshdup, 0xF316, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
@@ -2596,7 +2588,7 @@ kadd<dq>, 0x<dq:kpfx>4a, None, CpuAVX512
 kand<dq>, 0x<dq:kpfx>41, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask, RegMask }
 kandn<dq>, 0x<dq:kpfx>42, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegMask, RegMask, RegMask }
 kmov<dq>, 0x<dq:kpfx>90, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
-kmov<dq>, 0x<dq:kpfx>91, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
+kmov<dq>, 0x<dq:kpfx>91, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
 kmov<dq>, 0xf292, None, CpuAVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <dq:gpr>, RegMask }
 knot<dq>, 0x<dq:kpfx>44, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask }
 kor<dq>, 0x<dq:kpfx>45, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask, RegMask }
@@ -2985,13 +2977,13 @@ incsspq, 0xf30fae, 5, CpuSHSTK|Cpu64, Mo
 rdsspd, 0xf30f1e, 1, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32 }
 rdsspq, 0xf30f1e, 1, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64 }
 saveprevssp, 0xf30f01ea, None, CpuSHSTK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-rstorssp, 0xf30f01, 5, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+rstorssp, 0xf30f01, 5, CpuSHSTK, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 wrssd, 0x0f38f6, None, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Dword|Unspecified|BaseIndex }
-wrssq, 0x0f38f6, None, CpuSHSTK|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrssq, 0x0f38f6, None, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
 wrussd, 0x660f38f5, None, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Dword|Unspecified|BaseIndex }
-wrussq, 0x660f38f5, None, CpuSHSTK|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrussq, 0x660f38f5, None, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64, Qword|Unspecified|BaseIndex }
 setssbsy, 0xf30f01e8, None, CpuSHSTK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-clrssbsy, 0xf30fae, 6, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+clrssbsy, 0xf30fae, 6, CpuSHSTK, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 endbr64, 0xf30f1efa, None, CpuIBT, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
 endbr32, 0xf30f1efb, None, CpuIBT, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
 
@@ -3230,8 +3222,7 @@ vcvtusi2sh, 0xf37b, None, CpuAVX512_FP16
 vcvtsh2sd, 0xf35a, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|EVexMap5|VexVVVV|VexW0|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcvtsh2ss, 0x13, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|EVexMap6|VexVVVV|VexW0|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtsh2si, 0xf32d, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvtsh2usi, 0xf379, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvtsh2si, 0xf32d, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
 
 vcvttph2dq, 0xf35b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|EVexMap5|VexW0|Broadcast|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegXMM }
 vcvttph2dq, 0xf35b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|EVexMap5|VexW0|Broadcast|Disp8MemShift=4|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegYMM }
@@ -3256,8 +3247,7 @@ vcvtph2psx, 0x6613, None, CpuAVX512_FP16
 vcvttph2w, 0x667c, None, CpuAVX512_FP16, Modrm|Masking=3|EVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vcvttph2uw, 0x7c, None, CpuAVX512_FP16, Modrm|Masking=3|EVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vcvttsh2si, 0xf32c, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvttsh2usi, 0xf378, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvttsh2si, 0xf32c, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
 
 vfpclassph<xyz>, 0x66, None, CpuAVX512_FP16|<xyz:vl>, Modrm|<xyz:attr>|Masking=2|Space0F3A|VexW0|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xyz:att>, { Imm8, <xyz:src>|Word, RegMask }
 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/7] x86/Intel: restrict suffix derivation
  2022-08-26 10:30 ` [PATCH v2 1/7] x86/Intel: restrict suffix derivation Jan Beulich
@ 2022-08-26 10:30   ` Jan Beulich
  2022-09-13 14:20   ` Ping: " Jan Beulich
  1 sibling, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:30 UTC (permalink / raw)
  To: Binutils

While in some cases deriving an AT&T-style suffix from an Intel syntax
memory operand size specifier is necessary, in many cases this is not
only pointless, but has led to the introduction of various workarounds:
Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
ToQword attributes. Suppress suffix derivation when we can clearly tell
that the memory operand's size isn't going to be needed to infer the
possible need for the low byte/word opcode bit or an operand size prefix
(0x66 or REX.W).

As a result ToDword and ToQword can be dropped entirely, plus a fair
number of IgnoreSize and NoRex64 can also be got rid of. Note that
IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
16-bit code using SIMD insns isn't well tested, clone an existing
testcase just enough to cover a few insns which are potentially
problematic but are being touched here.)

Note that while folding the VCVT{,T}S{S,D}2SI templates, VCVT{,T}SH2SI
isn't included there. This is to fulfill the request of not allowing L
and Q suffixes there, despite the inconsistency with VCVT{,T}S{S,D}2SI.
---
Long term suffix derivation should be dropped altogether, not the least
such that bogus error messages like "incorrect register `...' used with
`...' suffix" don't misguid people anymore when no suffix was used at
all.
---
v2: Don't cover VCVT{,T}SH2SI with the templatization.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -7071,42 +7071,22 @@ process_suffix (void)
 	}
       else if (i.suffix == BYTE_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_bsuf)
-	    i.suffix = 0;
-	  else if (!check_byte_reg ())
+	  if (!check_byte_reg ())
 	    return 0;
 	}
       else if (i.suffix == LONG_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_lsuf
-	      && !i.tm.opcode_modifier.todword
-	      && !i.tm.opcode_modifier.toqword)
-	    i.suffix = 0;
-	  else if (!check_long_reg ())
+	  if (!check_long_reg ())
 	    return 0;
 	}
       else if (i.suffix == QWORD_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_qsuf
-	      && !i.tm.opcode_modifier.todword
-	      && !i.tm.opcode_modifier.toqword)
-	    i.suffix = 0;
-	  else if (!check_qword_reg ())
+	  if (!check_qword_reg ())
 	    return 0;
 	}
       else if (i.suffix == WORD_MNEM_SUFFIX)
 	{
-	  if (intel_syntax
-	      && i.tm.opcode_modifier.mnemonicsize == IGNORESIZE
-	      && i.tm.opcode_modifier.no_wsuf)
-	    i.suffix = 0;
-	  else if (!check_word_reg ())
+	  if (!check_word_reg ())
 	    return 0;
 	}
       else if (intel_syntax
@@ -7566,20 +7546,9 @@ check_long_reg (void)
 		 || i.tm.operand_types[op].bitfield.instance == Accum)
 	     && i.tm.operand_types[op].bitfield.dword)
       {
-	if (intel_syntax
-	    && i.tm.opcode_modifier.toqword
-	    && i.types[0].bitfield.class != RegSIMD)
-	  {
-	    /* Convert to QWORD.  We want REX byte. */
-	    i.suffix = QWORD_MNEM_SUFFIX;
-	  }
-	else
-	  {
-	    as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
-		    register_prefix, i.op[op].regs->reg_name,
-		    i.suffix);
-	    return 0;
-	  }
+	as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
+		register_prefix, i.op[op].regs->reg_name, i.suffix);
+	return 0;
       }
   return 1;
 }
@@ -7617,20 +7586,9 @@ check_qword_reg (void)
       {
 	/* Prohibit these changes in the 64bit mode, since the
 	   lowering is more complicated.  */
-	if (intel_syntax
-	    && i.tm.opcode_modifier.todword
-	    && i.types[0].bitfield.class != RegSIMD)
-	  {
-	    /* Convert to DWORD.  We don't want REX byte. */
-	    i.suffix = LONG_MNEM_SUFFIX;
-	  }
-	else
-	  {
-	    as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
-		    register_prefix, i.op[op].regs->reg_name,
-		    i.suffix);
-	    return 0;
-	  }
+	as_bad (_("incorrect register `%s%s' used with `%c' suffix"),
+		register_prefix, i.op[op].regs->reg_name, i.suffix);
+	return 0;
       }
   return 1;
 }
@@ -7670,14 +7628,6 @@ check_word_reg (void)
 		i.suffix);
 	return 0;
       }
-    /* For some instructions need encode as EVEX.W=1 without explicit VexW1. */
-    else if (i.types[op].bitfield.qword
-	     && intel_syntax
-	     && i.tm.opcode_modifier.toqword)
-      {
-	  /* Convert to QWORD.  We want EVEX.W byte. */
-	  i.suffix = QWORD_MNEM_SUFFIX;
-      }
   return 1;
 }
 
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -790,9 +790,83 @@ i386_intel_operand (char *operand_string
 	  break;
 	}
 
+      /* Now check whether we actually want to infer an AT&T-like suffix.
+	 We really only need to do this when operand size determination (incl.
+	 REX.W) is going to be derived from it.  For this we check whether the
+	 given suffix is valid for any of the candidate templates.  */
+      if (suffix && suffix != i.suffix
+	  && (current_templates->start->opcode_modifier.opcodespace != SPACE_BASE
+	      || current_templates->start->base_opcode != 0x62 /* bound */))
+	{
+	  const insn_template *t;
+
+	  for (t = current_templates->start; t < current_templates->end; ++t)
+	    {
+	      /* Operands haven't been swapped yet.  */
+	      unsigned int op = t->operands - 1 - this_operand;
+
+	      /* Easy checks to skip templates which won't match anyway.  */
+	      if (this_operand >= t->operands || t->opcode_modifier.attsyntax)
+		continue;
+
+	      switch (suffix)
+		{
+		case BYTE_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_bsuf)
+		    continue;
+		  break;
+		case WORD_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_wsuf)
+		    continue;
+		  break;
+		case LONG_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_lsuf)
+		    continue;
+		  break;
+		case QWORD_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_qsuf)
+		    continue;
+		  break;
+		case SHORT_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_ssuf)
+		    continue;
+		  break;
+		case LONG_DOUBLE_MNEM_SUFFIX:
+		  if (t->opcode_modifier.no_ldsuf)
+		    continue;
+		  break;
+		default:
+		  abort ();
+		}
+
+	      /* In a few cases suffixes are permitted, but we can nevertheless
+		 derive that these aren't going to be needed.  This is only of
+		 interest for insns using ModR/M, plus we can skip templates with
+		 swappable operands here (simplifying subsequent logic).  */
+	      if (!t->opcode_modifier.modrm || t->opcode_modifier.d)
+		break;
+
+	      if (!t->operand_types[op].bitfield.baseindex)
+		continue;
+
+	      switch (t->operand_types[op].bitfield.class)
+		{
+		case RegMMX:
+		case RegSIMD:
+		case RegMask:
+		  continue;
+		}
+
+	      break;
+	    }
+
+	  if (t == current_templates->end)
+	    suffix = 0;
+	}
+
       if (!i.suffix)
 	i.suffix = suffix;
-      else if (i.suffix != suffix)
+      else if (suffix && i.suffix != suffix)
 	{
 	  as_bad (_("conflicting operand size modifiers"));
 	  return 0;
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -169,6 +169,7 @@ if [gas_32_check] then {
     run_dump_test "simd"
     run_dump_test "simd-intel"
     run_dump_test "simd-suffix"
+    run_dump_test "simd16"
     run_dump_test "mem"
     run_dump_test "mem-intel"
     run_dump_test "reg"
--- a/gas/testsuite/gas/i386/simd.s
+++ b/gas/testsuite/gas/i386/simd.s
@@ -1,5 +1,6 @@
 	.text
 _start:
+	.ifndef use16
 	addsubps 0x12345678,%xmm1
 	comisd 0x12345678,%xmm1
 	comiss 0x12345678,%xmm1
@@ -31,6 +32,7 @@ _start:
 	punpcklqdq 0x12345678,%xmm1
 	ucomisd 0x12345678,%xmm1
 	ucomiss 0x12345678,%xmm1
+	.endif
 
 	cmpeqsd (%eax),%xmm0
 	cmpeqss (%eax),%xmm0
@@ -101,6 +103,7 @@ cmpsd	$0x10,(%eax),%xmm7
 
 	.intel_syntax noprefix
 
+	.ifndef use16
 addsubps xmm1,XMMWORD PTR ds:0x12345678
 comisd xmm1,QWORD PTR ds:0x12345678
 comiss xmm1,DWORD PTR ds:0x12345678
@@ -132,6 +135,8 @@ punpcklwd xmm1,XMMWORD PTR ds:0x12345678
 punpcklqdq xmm1,XMMWORD PTR ds:0x12345678
 ucomisd xmm1,QWORD PTR ds:0x12345678
 ucomiss xmm1,DWORD PTR ds:0x12345678
+	.endif
+
 cmpeqsd xmm0,QWORD PTR [eax]
 cmpeqss xmm0,DWORD PTR [eax]
 cvtpi2pd xmm0,QWORD PTR [eax]
--- /dev/null
+++ b/gas/testsuite/gas/i386/simd16.d
@@ -0,0 +1,137 @@
+#as: --defsym use16=1 -I${srcdir}/$subdir
+#objdump: -dw -Mi8086
+#name: i386 SIMD (16-bit)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:	67 f2 0f c2 00 00    	cmpeqsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f c2 00 00    	cmpeqss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 2a 00       	cvtpi2pd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2a 00          	cvtpi2ps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2d 00          	cvtps2pi \(%eax\),%mm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2d 00       	cvtsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 2c 00       	cvttsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5a 00       	cvtsd2ss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5a 00       	cvtss2sd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 2d 00       	cvtss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f3 0f 2c 00       	cvttss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5e 00       	divsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5e 00       	divss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5f 00       	maxsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5f 00       	maxss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2b 00       	movntsd %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 2b 00       	movntss %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 10 00       	movsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 11 00       	movsd  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 10 00       	movss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 11 00       	movss  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 59 00       	mulsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 59 00       	mulss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 53 00       	rcpss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0b 00 00 	roundsd \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0a 00 00 	roundss \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 52 00       	rsqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 51 00       	sqrtsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 51 00       	sqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5c 00       	subsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5c 00       	subss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 20 00    	pmovsxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 21 00    	pmovsxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 22 00    	pmovsxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 23 00    	pmovsxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 24 00    	pmovsxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 25 00    	pmovsxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 30 00    	pmovzxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 31 00    	pmovzxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 32 00    	pmovzxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 33 00    	pmovzxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 34 00    	pmovzxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 35 00    	pmovzxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 21 00 00 	insertps \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 15 08       	unpckhpd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 0f 15 08          	unpckhps \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 66 0f 14 08       	unpcklpd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 0f 14 08          	unpcklps \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	f3 0f c2 f7 10       	cmpss  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f3 0f c2 38 10    	cmpss  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f2 0f c2 f7 10       	cmpsd  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f2 0f c2 38 10    	cmpsd  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f c2 00 00    	cmpeqsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f c2 00 00    	cmpeqss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 2a 00       	cvtpi2pd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2a 00          	cvtpi2ps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 2d 00          	cvtps2pi \(%eax\),%mm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2d 00       	cvtsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 2c 00       	cvttsd2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5a 00       	cvtsd2ss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5a 00       	cvtss2sd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 2d 00       	cvtss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f3 0f 2c 00       	cvttss2si \(%eax\),%eax
+[ 	]*[a-f0-9]+:	67 f2 0f 5e 00       	divsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5e 00       	divss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5f 00       	maxsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5f 00       	maxss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5d 00       	minss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 2b 00       	movntsd %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 2b 00       	movntss %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 10 00       	movsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 11 00       	movsd  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f3 0f 10 00       	movss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 11 00       	movss  %xmm0,\(%eax\)
+[ 	]*[a-f0-9]+:	67 f2 0f 59 00       	mulsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 59 00       	mulss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 53 00       	rcpss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0b 00 00 	roundsd \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 0a 00 00 	roundss \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 52 00       	rsqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 51 00       	sqrtsd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 51 00       	sqrtss \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f2 0f 5c 00       	subsd  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 f3 0f 5c 00       	subss  \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 20 00    	pmovsxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 21 00    	pmovsxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 22 00    	pmovsxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 23 00    	pmovsxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 24 00    	pmovsxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 25 00    	pmovsxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 30 00    	pmovzxbw \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 31 00    	pmovzxbd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 32 00    	pmovzxbq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 33 00    	pmovzxwd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 34 00    	pmovzxwq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 38 35 00    	pmovzxdq \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 3a 21 00 00 	insertps \$0x0,\(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 15 00       	unpckhpd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 15 00          	unpckhps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 66 0f 14 00       	unpcklpd \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	67 0f 14 00          	unpcklps \(%eax\),%xmm0
+[ 	]*[a-f0-9]+:	f3 0f c2 f7 10       	cmpss  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f3 0f c2 38 10    	cmpss  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f2 0f c2 f7 10       	cmpsd  \$0x10,%xmm7,%xmm6
+[ 	]*[a-f0-9]+:	67 f2 0f c2 38 10    	cmpsd  \$0x10,\(%eax\),%xmm7
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	f3 0f 2a c8          	cvtsi2ss %eax,%xmm1
+[ 	]*[a-f0-9]+:	f2 0f 2a c8          	cvtsi2sd %eax,%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f3 0f 2a 08       	cvtsi2ss \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 f2 0f 2a 08       	cvtsi2sd \(%eax\),%xmm1
+[ 	]*[a-f0-9]+:	67 0f 2c 00          	cvttps2pi \(%eax\),%mm0
+#pass
--- /dev/null
+++ b/gas/testsuite/gas/i386/simd16.s
@@ -0,0 +1,2 @@
+	.code16
+	.include "simd.s"
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -706,8 +706,6 @@ static bitfield opcode_modifiers[] =
   BITFIELD (RegKludge),
   BITFIELD (Implicit1stXmm0),
   BITFIELD (PrefixOk),
-  BITFIELD (ToDword),
-  BITFIELD (ToQword),
   BITFIELD (AddrPrefixOpReg),
   BITFIELD (IsPrefix),
   BITFIELD (ImmExt),
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -521,10 +521,6 @@ enum
 #define PrefixHLELock		5 /* Okay with a LOCK prefix.  */
 #define PrefixHLEAny		6 /* Okay with or without a LOCK prefix.  */
   PrefixOk,
-  /* Convert to DWORD */
-  ToDword,
-  /* Convert to QWORD */
-  ToQword,
   /* Address prefix changes register operand */
   AddrPrefixOpReg,
   /* opcode is a prefix */
@@ -740,8 +736,6 @@ typedef struct i386_opcode_modifier
   unsigned int regkludge:1;
   unsigned int implicit1stxmm0:1;
   unsigned int prefixok:3;
-  unsigned int todword:1;
-  unsigned int toqword:1;
   unsigned int addrprefixopreg:1;
   unsigned int isprefix:1;
   unsigned int immext:1;
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -970,18 +970,18 @@ pause, 0xf390, None, Cpu186, No_bSuf|No_
 <mmx:cpu:pfx:attr:shimm:reg:mem, +
     $avx:CpuAVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:Vex128|VexVVVV=2|VexW0|SSE2AVX:RegXMM:Xmmword, +
     $sse:CpuSSE2:66:::RegXMM:Xmmword, +
-    $mmx:CpuMMX::NoRex64::RegMMX:Qword>
+    $mmx:CpuMMX::::RegMMX:Qword>
 
 <sse2:cpu:attr:scal:vvvv:shimm, +
     $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV:Vex128|VexVVVV=2|VexW0|SSE2AVX, +
-    $sse:CpuSSE2::NoRex64::>
+    $sse:CpuSSE2::::>
 
 <bw:opc:vexw:elem:kcpu:kpfx:cpubmi, +
     b:0:VexW0:Byte:CpuAVX512DQ:66:CpuAVX512VBMI, +
     w:1:VexW1:Word:CpuAVX512F::CpuAVX512BW>
 
 <dq:opc:vexw:vexw64:elem:cpu64:gpr:kpfx, +
-    d:0:VexW0:IgnoreSize:Dword::Reg32:66, +
+    d:0:VexW0::Dword::Reg32:66, +
     q:1:VexW1:VexW1:Qword:Cpu64:Reg64:>
 
 emms, 0xf77, None, CpuMMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
@@ -989,7 +989,7 @@ emms, 0xf77, None, CpuMMX, No_bSuf|No_wS
 // copying between Reg64/Mem64 and RegXMM/RegMMX, as is mandated by Intel's
 // spec). AMD's spec, having been in existence for much longer, failed to
 // recognize that and specified movd for 32- and 64-bit operations.
-movd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Reg32|Unspecified|BaseIndex, RegXMM }
+movd, 0x666e, None, CpuAVX, D|Modrm|Vex128|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Reg32|Unspecified|BaseIndex, RegXMM }
 movd, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|BaseIndex, RegXMM }
 movd, 0x660f6e, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 movd, 0x660f6e, None, CpuSSE2|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|BaseIndex, RegXMM }
@@ -998,10 +998,10 @@ movd, 0xf6e, None, CpuMMX|Cpu64, D|Modrm
 movq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movq, 0x66d6, None, CpuAVX, Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
 movq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|Unspecified|BaseIndex, RegXMM }
-movq, 0xf30f7e, None, CpuSSE2, Load|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Unspecified|Qword|BaseIndex|RegXMM, RegXMM }
-movq, 0x660fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { RegXMM, Unspecified|Qword|BaseIndex|RegXMM }
+movq, 0xf30f7e, None, CpuSSE2, Load|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|Qword|BaseIndex|RegXMM, RegXMM }
+movq, 0x660fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Unspecified|Qword|BaseIndex|RegXMM }
 movq, 0x660f6e, None, CpuSSE2|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
-movq, 0xf6f, None, CpuMMX, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Unspecified|Qword|BaseIndex|RegMMX, RegMMX }
+movq, 0xf6f, None, CpuMMX, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|Qword|BaseIndex|RegMMX, RegMMX }
 movq, 0xf6e, None, CpuMMX|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegMMX }
 packssdw<mmx>, 0x<mmx:pfx>0f6b, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 packsswb<mmx>, 0x<mmx:pfx>0f63, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
@@ -1009,7 +1009,7 @@ packuswb<mmx>, 0x<mmx:pfx>0f67, None, <m
 padd<bw><mmx>, 0x<mmx:pfx>0ffc | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 paddd<mmx>, 0x<mmx:pfx>0ffe, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 paddq<sse2>, 0x660fd4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-paddq, 0xfd4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+paddq, 0xfd4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 padds<bw><mmx>, 0x<mmx:pfx>0fec | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 paddus<bw><mmx>, 0x<mmx:pfx>0fdc | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 pand<mmx>, 0x<mmx:pfx>0fdb, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
@@ -1037,25 +1037,25 @@ psrl<dq><mmx>, 0x<mmx:pfx>0f72 | <dq:opc
 psub<bw><mmx>, 0x<mmx:pfx>0ff8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubd<mmx>, 0x<mmx:pfx>0ffa, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubq<sse2>, 0x660ffb, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-psubq, 0xffb, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+psubq, 0xffb, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 psubs<bw><mmx>, 0x<mmx:pfx>0fe8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 psubus<bw><mmx>, 0x<mmx:pfx>0fd8 | <bw:opc>, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpckhbw<mmx>, 0x<mmx:pfx>0f68, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpckhwd<mmx>, 0x<mmx:pfx>0f69, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpckhdq<mmx>, 0x<mmx:pfx>0f6a, None, <mmx:cpu>, Modrm|<mmx:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 punpcklbw<sse2>, 0x660f60, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-punpcklbw, 0xf60, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
+punpcklbw, 0xf60, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
 punpcklwd<sse2>, 0x660f61, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-punpcklwd, 0xf61, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
+punpcklwd, 0xf61, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
 punpckldq<sse2>, 0x660f62, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-punpckldq, 0xf62, None, CpuMMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
+punpckldq, 0xf62, None, CpuMMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pxor<mmx>, 0x<mmx:pfx>0fef, None, <mmx:cpu>, Modrm|<mmx:attr>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex, <mmx:reg> }
 
 // SSE instructions.
 
 <sse:cpu:attr:scal:vvvv, +
     $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, +
-    $sse:CpuSSE::IgnoreSize:>
+    $sse:CpuSSE:::>
 <frel:imm:comm, eq:0:C, lt:1:, le:2:, unord:3:C, neq:4:C, nlt:5:, nle:6:, ord:7:C>
 
 addps<sse>, 0x0f58, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1067,21 +1067,21 @@ cmp<frel>ss<sse>, 0xf30fc2, <frel:imm>,
 cmpps<sse>, 0x0fc2, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 cmpss<sse>, 0xf30fc2, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 comiss<sse>, 0x0f2f, None, <sse:cpu>, Modrm|<sse:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-cvtpi2ps, 0xf2a, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegXMM }
-cvtps2pi, 0xf2d, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
+cvtpi2ps, 0xf2a, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegXMM }
+cvtps2pi, 0xf2d, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
 cvtsi2ss<sse>, 0xf30f2a, None, <sse:cpu>|CpuNo64, Modrm|<sse:scal>|<sse:vvvv>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf32a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf32a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf30f2a, None, CpuSSE|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2ss, 0xf30f2a, None, CpuSSE|Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
-cvtss2si, 0xf32d, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvtss2si, 0xf30f2d, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvttps2pi, 0xf2c, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
-cvttss2si, 0xf32c, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvttss2si, 0xf30f2c, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtss2si, 0xf32d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtss2si, 0xf30f2d, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttps2pi, 0xf2c, None, CpuSSE, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegMMX }
+cvttss2si, 0xf32c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttss2si, 0xf30f2c, None, CpuSSE, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 divps<sse>, 0x0f5e, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 divss<sse>, 0xf30f5e, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-ldmxcsr<sse>, 0x0fae, 2, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+ldmxcsr<sse>, 0x0fae, 2, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 maskmovq, 0xff7, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegMMX }
 maxps<sse>, 0x0f5f, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 maxss<sse>, 0xf30f5f, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1089,51 +1089,51 @@ minps<sse>, 0x0f5d, None, <sse:cpu>, Mod
 minss<sse>, 0xf30f5d, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movaps<sse>, 0x0f28, None, <sse:cpu>, D|Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movhlps<sse>, 0x0f12, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
-movhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movhps, 0x17, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movhps, 0xf16, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movhps, 0x17, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movhps, 0xf16, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movlhps<sse>, 0x0f16, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
-movlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movlps, 0x13, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movlps, 0xf12, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movlps, 0x13, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movlps, 0xf12, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movmskps<sse>, 0x0f50, None, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 movntps<sse>, 0x0f2b, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
-movntq, 0xfe7, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, Qword|Unspecified|BaseIndex }
+movntq, 0xfe7, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, Qword|Unspecified|BaseIndex }
 movntdq<sse2>, 0x660fe7, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
-movss, 0xf310, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex, RegXMM }
+movss, 0xf310, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Dword|Unspecified|BaseIndex, RegXMM }
 movss, 0xf310, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, RegXMM }
-movss, 0xf30f10, None, CpuSSE, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movss, 0xf30f10, None, CpuSSE, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movups<sse>, 0x0f10, None, <sse:cpu>, D|Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulps<sse>, 0x0f59, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulss<sse>, 0xf30f59, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 orps<sse>, 0x0f56, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pavg<bw>, 0xfe0 | (3 * <bw:opc>), None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pavg<bw>, 0xfe0 | (3 * <bw:opc>), None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pavg<bw><sse2>, 0x660fe0 | (3 * <bw:opc>), None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pextrw<sse2>, 0x660fc5, None, <sse2:cpu>, Load|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, RegXMM, Reg32|Reg64 }
 pextrw, 0xfc5, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }
 pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, Reg32|Reg64, RegXMM }
-pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, Word|Unspecified|BaseIndex, RegXMM }
+pinsrw<sse2>, 0x660fc4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegXMM }
 pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { Imm8, Reg32|Reg64, RegMMX }
-pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegMMX }
+pinsrw, 0xfc4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|Unspecified|BaseIndex, RegMMX }
 pmaxsw<sse2>, 0x660fee, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmaxsw, 0xfee, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmaxsw, 0xfee, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pmaxub<sse2>, 0x660fde, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmaxub, 0xfde, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmaxub, 0xfde, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pminsw<sse2>, 0x660fea, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pminsw, 0xfea, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pminsw, 0xfea, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pminub<sse2>, 0x660fda, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pminub, 0xfda, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pminub, 0xfda, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pmovmskb<sse2>, 0x660fd7, None, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 pmovmskb, 0xfd7, None, CpuSSE|Cpu3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegMMX, Reg32|Reg64 }
 pmulhuw<sse2>, 0x660fe4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmulhuw, 0xfe4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmulhuw, 0xfe4, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 prefetchnta, 0xf18, 0, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetcht0, 0xf18, 1, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetcht1, 0xf18, 2, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetcht2, 0xf18, 3, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
-psadbw, 0xff6, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+psadbw, 0xff6, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 psadbw<sse2>, 0x660ff6, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pshufw, 0xf70, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pshufw, 0xf70, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 rcpps<sse>, 0x0f53, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 rcpss<sse>, 0xf30f53, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 rsqrtps<sse>, 0x0f52, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1142,7 +1142,7 @@ sfence, 0xfaef8, None, CpuSSE|Cpu3dnowA,
 shufps<sse>, 0x0fc6, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 sqrtps<sse>, 0x0f51, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sqrtss<sse>, 0xf30f51, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-stmxcsr<sse>, 0x0fae, 3, <sse:cpu>, Modrm|<sse:attr>|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+stmxcsr<sse>, 0x0fae, 3, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 subps<sse>, 0x0f5c, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 subss<sse>, 0xf30f5c, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 ucomiss<sse>, 0x0f2e, None, <sse:cpu>, Modrm|<sse:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1161,9 +1161,9 @@ cmp<frel>sd<sse2>, 0xf20fc2, <frel:imm>,
 cmppd<sse2>, 0x660fc2, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 cmpsd<sse2>, 0xf20fc2, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 comisd<sse2>, 0x660f2f, None, <sse2:cpu>, Modrm|<sse2:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { RegMMX, RegXMM }
-cvtpi2pd, 0xf3e6, None, CpuAVX, Modrm|Vex|Space0F|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex, RegXMM }
+cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegXMM }
+cvtpi2pd, 0xf3e6, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd<sse2>, 0xf20f2a, None, <sse2:cpu>|CpuNo64, Modrm|IgnoreSize|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd, 0xf22a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
 cvtsi2sd, 0xf22a, None, CpuAVX|Cpu64, Modrm|Vex=3|Space0F|VexVVVV=1|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM }
@@ -1176,17 +1176,17 @@ maxsd<sse2>, 0xf20f5f, None, <sse2:cpu>,
 minpd<sse2>, 0x660f5d, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 minsd<sse2>, 0xf20f5d, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movapd<sse2>, 0x660f28, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-movhpd, 0x6616, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movhpd, 0x6617, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movhpd, 0x660f16, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
-movlpd, 0x6612, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
-movlpd, 0x6613, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
-movlpd, 0x660f12, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movhpd, 0x6616, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movhpd, 0x6617, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movhpd, 0x660f16, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
+movlpd, 0x6612, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movlpd, 0x6613, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
+movlpd, 0x660f12, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movmskpd<sse2>, 0x660f50, None, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 movntpd<sse2>, 0x660f2b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
-movsd, 0xf210, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movsd, 0xf210, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
 movsd, 0xf210, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, RegXMM }
-movsd, 0xf20f10, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movsd, 0xf20f10, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movupd<sse2>, 0x660f10, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulpd<sse2>, 0x660f59, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 mulsd<sse2>, 0xf20f59, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1200,21 +1200,21 @@ ucomisd<sse2>, 0x660f2e, None, <sse2:cpu
 unpckhpd<sse2>, 0x660f15, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 unpcklpd<sse2>, 0x660f14, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 xorpd<sse2>, 0x660f57, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-cvtdq2pd<sse2>, 0xf30fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtdq2pd<sse2>, 0xf30fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 cvtpd2dq<sse2>, 0xf20fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cvtdq2ps<sse2>, 0x0f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cvtpd2pi, 0x660f2d, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegMMX }
 cvtpd2ps<sse2>, 0x660f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-cvtps2pd<sse2>, 0x0f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtps2pd<sse2>, 0x0f5a, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 cvtps2dq<sse2>, 0x660f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-cvtsd2si, 0xf22d, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvtsd2si, 0xf20f2d, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvtsd2ss<sse2>, 0xf20f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-cvtss2sd<sse2>, 0xf30f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtsd2si, 0xf22d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtsd2si, 0xf20f2d, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvtsd2ss<sse2>, 0xf20f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cvtss2sd<sse2>, 0xf30f5a, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 
 cvttpd2pi, 0x660f2c, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegMMX }
-cvttsd2si, 0xf22c, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-cvttsd2si, 0xf20f2c, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttsd2si, 0xf22c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+cvttsd2si, 0xf20f2c, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 cvttpd2dq<sse2>, 0x660fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cvttps2dq<sse2>, 0xf30f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 maskmovdqu<sse2>, 0x660ff7, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
@@ -1223,7 +1223,7 @@ movdqu<sse2>, 0xf30f6f, None, <sse2:cpu>
 movdq2q, 0xf20fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegMMX }
 movq2dq, 0xf30fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegXMM }
 pmuludq<sse2>, 0x660ff4, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmuludq, 0xff4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmuludq, 0xff4, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 pshufd<sse2>, 0x660f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pshufhw<sse2>, 0xf30f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pshuflw<sse2>, 0xf20f70, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1245,7 +1245,7 @@ haddps<sse3>, 0xf20f7c, None, <sse3:cpu>
 hsubpd<sse3>, 0x660f7d, None, <sse3:cpu>, Modrm|<sse3:attr>|<sse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 hsubps<sse3>, 0xf20f7d, None, <sse3:cpu>, Modrm|<sse3:attr>|<sse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 lddqu<sse3>, 0xf20ff0, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegXMM }
-movddup<sse3>, 0xf20f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movddup<sse3>, 0xf20f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movshdup<sse3>, 0xf30f16, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movsldup<sse3>, 0xf30f12, None, <sse3:cpu>, Modrm|<sse3:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
@@ -1276,17 +1276,17 @@ mwait, 0xf01c9, None, CpuSSE3, CheckRegS
 // VMX instructions.
 
 vmcall, 0xf01c1, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-vmclear, 0x660fc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+vmclear, 0x660fc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 vmlaunch, 0xf01c2, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
 vmresume, 0xf01c3, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-vmptrld, 0xfc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
-vmptrst, 0xfc7, 7, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+vmptrld, 0xfc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
+vmptrst, 0xfc7, 7, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 vmread, 0xf78, None, CpuVMX|CpuNo64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Reg32|Unspecified|BaseIndex }
 vmread, 0xf78, None, CpuVMX|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Reg64, Reg64|Qword|Unspecified|BaseIndex }
 vmwrite, 0xf79, None, CpuVMX|CpuNo64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32 }
 vmwrite, 0xf79, None, CpuVMX|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Reg64|Qword|Unspecified|BaseIndex, Reg64 }
 vmxoff, 0xf01c4, None, CpuVMX, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-vmxon, 0xf30fc7, 6, CpuVMX, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+vmxon, 0xf30fc7, 6, CpuVMX, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 
 // VMFUNC instruction
 
@@ -1313,7 +1313,7 @@ invpcid, 0x660f3882, None, CpuINVPCID|Cp
 <ssse3:cpu:pfx:attr:vvvv:reg:mem, +
     $avx:CpuAVX:66:Vex128|VexW0|SSE2AVX:VexVVVV:RegXMM:Xmmword, +
     $sse:CpuSSSE3:66:::RegXMM:Xmmword, +
-    $mmx:CpuSSSE3::NoRex64::RegMMX:Qword>
+    $mmx:CpuSSSE3::::RegMMX:Qword>
 
 phaddw<ssse3>, 0x<ssse3:pfx>0f3801, None, <ssse3:cpu>, Modrm|<ssse3:attr>|<ssse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <ssse3:reg>|<ssse3:mem>|Unspecified|BaseIndex, <ssse3:reg> }
 phaddd<ssse3>, 0x<ssse3:pfx>0f3802, None, <ssse3:cpu>, Modrm|<ssse3:attr>|<ssse3:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <ssse3:reg>|<ssse3:mem>|Unspecified|BaseIndex, <ssse3:reg> }
@@ -1333,7 +1333,7 @@ pabsd<ssse3>, 0x<ssse3:pfx>0f381e, None,
 // SSE4.1 instructions.
 
 <sse41:cpu:attr:scal:vvvv, $avx:CpuAVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, $sse:CpuSSE4_1:::>
-<sd:ppfx:spfx:opc:vexw:elem:scal, s::f3:0:VexW0:Dword:IgnoreSize, d:66:f2:1:VexW1:Qword:NoRex64>
+<sd:ppfx:spfx:opc:vexw:elem, s::f3:0:VexW0:Dword, d:66:f2:1:VexW1:Qword>
 
 blendp<sd><sse41>, 0x660f3a0c | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 blendvp<sd>, 0x664a | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexW=1|VexSources=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1341,11 +1341,11 @@ blendvp<sd>, 0x664a | <sd:opc>, None, Cp
 blendvp<sd>, 0x660f3814 | <sd:opc>, None, CpuSSE4_1, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 blendvp<sd>, 0x660f3814 | <sd:opc>, None, CpuSSE4_1, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 dpp<sd><sse41>, 0x660f3a40 | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
-extractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
+extractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 extractps, 0x6617, None, CpuAVX|Cpu64, RegMem|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg64 }
 extractps, 0x660f3a17, None, CpuSSE4_1, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 extractps, 0x660f3a17, None, CpuSSE4_1|Cpu64, RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, RegXMM, Reg64 }
-insertps<sse41>, 0x660f3a21, None, <sse41:cpu>, Modrm|IgnoreSize|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+insertps<sse41>, 0x660f3a21, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movntdqa<sse41>, 0x660f382a, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegXMM }
 mpsadbw<sse41>, 0x660f3a42, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 packusdw<sse41>, 0x660f382b, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
@@ -1356,7 +1356,7 @@ pblendvb, 0x660f3810, None, CpuSSE4_1, M
 pblendw<sse41>, 0x660f3a0e, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pcmpeqq<sse41>, 0x660f3829, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, RegMem|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize|NoRex64, { Imm8, RegXMM, Reg32|Reg64 }
-pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
+pextr<bw><sse41>, 0x660f3a14 | <bw:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, <bw:elem>|Unspecified|BaseIndex }
 pextrd<sse41>, 0x660f3a16, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Imm8, RegXMM, Reg32|Unspecified|BaseIndex }
 pextrq, 0x6616, None, CpuAVX|Cpu64, Modrm|Vex|Space0F3A|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Imm8, RegXMM, Reg64|Unspecified|BaseIndex }
 pextrq, 0x660f3a16, None, CpuSSE4_1|Cpu64, Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64|Unspecified|BaseIndex }
@@ -1374,23 +1374,23 @@ pminsb<sse41>, 0x660f3838, None, <sse41:
 pminsd<sse41>, 0x660f3839, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pminud<sse41>, 0x660f383b, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pminuw<sse41>, 0x660f383a, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
-pmovsxbw<sse41>, 0x660f3820, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxbd<sse41>, 0x660f3821, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxbq<sse41>, 0x660f3822, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxwd<sse41>, 0x660f3823, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxwq<sse41>, 0x660f3824, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovsxdq<sse41>, 0x660f3825, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxbw<sse41>, 0x660f3830, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxbd<sse41>, 0x660f3831, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxbq<sse41>, 0x660f3832, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxwd<sse41>, 0x660f3833, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxwq<sse41>, 0x660f3834, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IgnoreSize, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
-pmovzxdq<sse41>, 0x660f3835, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxbw<sse41>, 0x660f3820, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxbd<sse41>, 0x660f3821, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxbq<sse41>, 0x660f3822, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxwd<sse41>, 0x660f3823, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxwq<sse41>, 0x660f3824, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovsxdq<sse41>, 0x660f3825, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxbw<sse41>, 0x660f3830, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxbd<sse41>, 0x660f3831, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxbq<sse41>, 0x660f3832, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxwd<sse41>, 0x660f3833, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxwq<sse41>, 0x660f3834, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+pmovzxdq<sse41>, 0x660f3835, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 pmuldq<sse41>, 0x660f3828, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pmulld<sse41>, 0x660f3840, None, <sse41:cpu>, Modrm|<sse41:attr>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 ptest<sse41>, 0x660f3817, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 roundp<sd><sse41>, 0x660f3a08 | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
-rounds<sd><sse41>, 0x660f3a0a | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:scal>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<sd:scal>, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM }
+rounds<sd><sse41>, 0x660f3a0a | <sd:opc>, None, <sse41:cpu>, Modrm|<sse41:scal>|<sse41:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM }
 
 // SSE4.2 instructions.
 
@@ -1484,8 +1484,8 @@ vandp<sd>, 0x<sd:ppfx>54, None, CpuAVX,
 vblendp<sd>, 0x660c | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vblendvp<sd>, 0x664a | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexW0|VexSources=2|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vbroadcastf128, 0x661a, None, CpuAVX, Modrm|Vex=2|Space0F38|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Unspecified|BaseIndex, RegYMM }
-vbroadcastsd, 0x6619, None, CpuAVX, Modrm|Vex=2|Space0F38|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegYMM }
-vbroadcastss, 0x6618, None, CpuAVX, Modrm|Vex|Space0F38|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex, RegXMM|RegYMM }
+vbroadcastsd, 0x6619, None, CpuAVX, Modrm|Vex256|Space0F38|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegYMM }
+vbroadcastss, 0x6618, None, CpuAVX, Modrm|Vex128|Space0F38|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex, RegXMM|RegYMM }
 vcmp<frel>p<sd>, 0x<sd:ppfx>c2, 0x<frel:imm>, CpuAVX, Modrm|<frel:comm>|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 vcmp<frel>s<sd>, 0x<sd:spfx>c2, 0x<frel:imm>, CpuAVX, Modrm|<frel:comm>|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|<sd:elem>|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcmpp<sd>, 0x<sd:ppfx>c2, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1499,22 +1499,20 @@ vcvtpd2ps<xy>, 0x665a, None, CpuAVX, Mod
 vcvtps2dq, 0x665b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vcvtps2pd, 0x5a, None, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
 vcvtps2pd, 0x5a, None, CpuAVX, Modrm|Vex256|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegYMM }
-vcvtsd2si, 0xf22d, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+vcvts<sd>2si, 0x<sd:spfx>2d, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 vcvtsd2ss, 0xf25a, None, CpuAVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vcvtsi2s<sd>, 0x<sd:spfx>2a, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ATTSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcvtsi2s<sd>, 0x<sd:spfx>2a, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IntelSyntax, { Reg32|Reg64|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcvtss2sd, 0xf35a, None, CpuAVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vcvtss2si, 0xf32d, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 vcvttpd2dq<xy>, 0x66e6, None, CpuAVX, Modrm|<xy:vex>|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:syntax>, { <xy:dst>, RegXMM }
 vcvttps2dq, 0xf35b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
-vcvttsd2si, 0xf22c, None, CpuAVX, Modrm|Vex=3|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword, { Qword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
-vcvttss2si, 0xf32c, None, CpuAVX, Modrm|Vex=3|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword, { Dword|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
+vcvtts<sd>2si, 0x<sd:spfx>2c, None, CpuAVX, Modrm|VexLIG|Space0F|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, Reg32|Reg64 }
 vdivp<sd>, 0x<sd:ppfx>5e, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vdivs<sd>, 0x<sd:spfx>5e, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vdppd, 0x6641, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vdpps, 0x6640, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vextractf128, 0x6619, None, CpuAVX, Modrm|Vex=2|Space0F3A|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegYMM, Unspecified|BaseIndex|RegXMM }
-vextractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
+vextractps, 0x6617, None, CpuAVX, Modrm|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 vextractps, 0x6617, None, CpuAVX|Cpu64, RegMem|Vex|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64 }
 vhaddpd, 0x667c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vhaddps, 0xf27c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1523,7 +1521,7 @@ vhsubps, 0xf27d, None, CpuAVX, Modrm|Vex
 vinsertf128, 0x6618, None, CpuAVX, Modrm|Vex=2|Space0F3A|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM }
 vinsertps, 0x6621, None, CpuAVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vlddqu, 0xf2f0, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM }
-vldmxcsr, 0xae, 2, CpuAVX, Modrm|Vex128|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+vldmxcsr, 0xae, 2, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 vmaskmovdqu, 0x66f7, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
 vmaskmovp<sd>, 0x662e | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
 vmaskmovp<sd>, 0x662c | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1537,26 +1535,26 @@ vmovap<sd>, 0x<sd:ppfx>28, None, CpuAVX,
 // by Intel AVX spec).  To avoid extra template in gcc x86 backend and
 // support assembler for AMD64, we accept 64bit operand on vmovd so
 // that we can use one template for both SSE and AVX instructions.
-vmovd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
+vmovd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 vmovd, 0x667e, None, CpuAVX|Cpu64, D|RegMem|Vex=1|Space0F|VexW=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { RegXMM, Reg64 }
 vmovddup, 0xf212, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovddup, 0xf212, None, CpuAVX, Modrm|Vex=2|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM }
 vmovdqa, 0x666f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovdqu, 0xf36f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovhlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
-vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 vmovlhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
-vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 vmovmskp<sd>, 0x<sd:ppfx>50, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { RegXMM|RegYMM, Reg32|Reg64 }
 vmovntdq, 0x66e7, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
 vmovntdqa, 0x662a, None, CpuAVX|CpuAVX2, Modrm|Vex|Space0F38|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM }
 vmovntp<sd>, 0x<sd:ppfx>2b, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex }
 vmovq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovq, 0x66d6, None, CpuAVX, Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
-vmovq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
-vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex, RegXMM }
+vmovq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|Unspecified|BaseIndex, RegXMM }
+vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex, RegXMM }
 vmovs<sd>, 0x<sd:spfx>10, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 vmovshdup, 0xf316, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovsldup, 0xf312, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1692,7 +1690,7 @@ vrsqrtss, 0xf352, None, CpuAVX, Modrm|Ve
 vshufp<sd>, 0x<sd:ppfx>c6, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vsqrtp<sd>, 0x<sd:ppfx>51, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vsqrts<sd>, 0x<sd:spfx>51, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vstmxcsr, 0xae, 3, CpuAVX, Modrm|Vex128|Space0F|VexWIG|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
+vstmxcsr, 0xae, 3, CpuAVX, Modrm|Vex128|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex }
 vsubp<sd>, 0x<sd:ppfx>5c, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vsubs<sd>, 0x<sd:spfx>5c, None, CpuAVX, Modrm|VexLIG|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sd:elem>|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vtestp<sd>, 0x660e | <sd:opc>, None, CpuAVX, Modrm|Vex|Space0F38|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
@@ -1889,8 +1887,8 @@ vpshl<xop>, 0x94 | <xop:opc>, None, CpuX
 
 llwpcb, 0x12, 0, CpuLWP, Modrm|SpaceXOP09|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Vex, { Reg32|Reg64 }
 slwpcb, 0x12, 1, CpuLWP, Modrm|SpaceXOP09|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Vex, { Reg32|Reg64 }
-lwpval, 0x12, 1, CpuLWP, Modrm|SpaceXOP0A|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
-lwpins, 0x12, 0, CpuLWP, Modrm|SpaceXOP0A|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
+lwpval, 0x12, 1, CpuLWP, Modrm|SpaceXOP0A|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
+lwpins, 0x12, 0, CpuLWP, Modrm|SpaceXOP0A|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|VexVVVV=3|Vex, { Imm32|Imm32S, Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // BMI instructions
 
@@ -1918,30 +1916,30 @@ tzmsk, 0x01, 4, CpuTBM, Modrm|CheckRegSi
 prefetch, 0xf0d, 0, Cpu3dnow|CpuPRFCHW, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 prefetchw, 0xf0d, 1, Cpu3dnow|CpuPRFCHW, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex }
 femms, 0xf0e, None, Cpu3dnow, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-pavgusb, 0xf0f, 0xbf, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pf2id, 0xf0f, 0x1d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pf2iw, 0xf0f, 0x1c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfacc, 0xf0f, 0xae, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfadd, 0xf0f, 0x9e, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfcmpeq, 0xf0f, 0xb0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfcmpge, 0xf0f, 0x90, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfcmpgt, 0xf0f, 0xa0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfmax, 0xf0f, 0xa4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfmin, 0xf0f, 0x94, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfmul, 0xf0f, 0xb4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfnacc, 0xf0f, 0x8a, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfpnacc, 0xf0f, 0x8e, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrcp, 0xf0f, 0x96, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrcpit1, 0xf0f, 0xa6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrcpit2, 0xf0f, 0xb6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrsqit1, 0xf0f, 0xa7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfrsqrt, 0xf0f, 0x97, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfsub, 0xf0f, 0x9a, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pfsubr, 0xf0f, 0xaa, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pi2fd, 0xf0f, 0x0d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pi2fw, 0xf0f, 0x0c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pmulhrw, 0xf0f, 0xb7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-pswapd, 0xf0f, 0xbb, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pavgusb, 0xf0f, 0xbf, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pf2id, 0xf0f, 0x1d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pf2iw, 0xf0f, 0x1c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfacc, 0xf0f, 0xae, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfadd, 0xf0f, 0x9e, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfcmpeq, 0xf0f, 0xb0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfcmpge, 0xf0f, 0x90, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfcmpgt, 0xf0f, 0xa0, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfmax, 0xf0f, 0xa4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfmin, 0xf0f, 0x94, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfmul, 0xf0f, 0xb4, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfnacc, 0xf0f, 0x8a, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfpnacc, 0xf0f, 0x8e, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrcp, 0xf0f, 0x96, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrcpit1, 0xf0f, 0xa6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrcpit2, 0xf0f, 0xb6, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrsqit1, 0xf0f, 0xa7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfrsqrt, 0xf0f, 0x97, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfsub, 0xf0f, 0x9a, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pfsubr, 0xf0f, 0xaa, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pi2fd, 0xf0f, 0x0d, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pi2fw, 0xf0f, 0x0c, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pmulhrw, 0xf0f, 0xb7, Cpu3dnow, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
+pswapd, 0xf0f, 0xbb, Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 
 // AMD extensions.
 syscall, 0xf05, None, CpuSYSCALL, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
@@ -1967,8 +1965,8 @@ vmsave, 0xf01db, None, CpuSVME, AddrPref
 
 
 // SSE4a instructions
-movntsd, 0xf20f2b, None, CpuSSE4a, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
-movntss, 0xf30f2b, None, CpuSSE4a, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Dword|Unspecified|BaseIndex }
+movntsd, 0xf20f2b, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+movntss, 0xf30f2b, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Dword|Unspecified|BaseIndex }
 extrq, 0x660f78, 0, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Imm8, RegXMM }
 extrq, 0x660f79, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
 insertq, 0xf20f79, None, CpuSSE4a, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
@@ -2166,8 +2164,8 @@ vcvtps2pd, 0x5A, None, CpuAVX512F, Modrm
 
 vcvtps2ph, 0x661D, None, CpuAVX512F, Modrm|EVex512|MaskingMorZ|Space0F3A|VexW0|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 
-vcvtsd2si, 0xF22D, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvtsd2usi, 0xF279, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToDword|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvts<sd>2si, 0x<sd:spfx>2d, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|<sd:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvts<sdh>2usi, 0x<sdh:spfx>79, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
 
 vcvtsd2ss, 0xF25A, None, CpuAVX512F, Modrm|EVexLIG|Masking=3|Space0F|VexVVVV|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
@@ -2187,20 +2185,14 @@ vcvtusi2ss, 0xF37B, None, CpuAVX512F, Mo
 
 vcvtss2sd, 0xF35A, None, CpuAVX512F, Modrm|EVexLIG|Masking=3|Space0F|VexVVVV|VexW0|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtss2si, 0xF32D, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvtss2usi, 0xF379, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
-
 vcvttpd2dq<xy>, 0x66e6, None, CpuAVX512F|<xy:vl>, Modrm|<xy:attr>|Masking=3|Space0F|VexW1|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:sae>, { <xy:src>|Qword, <xy:dst> }
 vcvttpd2udq<xy>, 0x78, None, CpuAVX512F|<xy:vl>, Modrm|<xy:attr>|Masking=3|Space0F|VexW1|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xy:sae>, { <xy:src>|Qword, <xy:dst> }
 
 vcvttps2dq, 0xF35B, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vcvttps2udq, 0x78, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vcvttsd2si, 0xF22C, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToDword|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvttsd2usi, 0xF278, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToDword|SAE, { RegXMM|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-
-vcvttss2si, 0xF32C, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ToQword|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvttss2usi, 0xF378, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Dword|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvtts<sd>2si, 0x<sd:spfx>2c, None, CpuAVX512F, Modrm|EVexLIG|Space0F|Disp8MemShift|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|SAE, { RegXMM|<sd:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvtts<sdh>2usi, 0x<sdh:spfx>78, None, <sdh:cpu>, Modrm|EVexLIG|<sdh:spc1>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|<sdh:elem>|Unspecified|BaseIndex, Reg32|Reg64 }
 
 vcvtudq2ps, 0xF27A, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
@@ -2216,7 +2208,7 @@ vextracti32x4, 0x6639, None, CpuAVX512F,
 vextractf64x4, 0x661B, None, CpuAVX512F, Modrm|EVex=1|MaskingMorZ|Space0F3A|VexW=2|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 vextracti64x4, 0x663B, None, CpuAVX512F, Modrm|EVex=1|MaskingMorZ|Space0F3A|VexW=2|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex }
 
-vextractps, 0x6617, None, CpuAVX512F, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
+vextractps, 0x6617, None, CpuAVX512F, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex }
 vextractps, 0x6617, None, CpuAVX512F|Cpu64, RegMem|EVex128|Space0F3A|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM, Reg64 }
 
 vfixupimmp<sd>, 0x6654, None, CpuAVX512F, Modrm|Masking=3|Space0F3A|VexVVVV|<sd:vexw>|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM|RegYMM|RegZMM|<sd:elem>|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
@@ -2274,7 +2266,7 @@ vmovap<sd>, 0x<sd:ppfx>28, None, CpuAVX5
 vmovntp<sd>, 0x<sd:ppfx>2B, None, CpuAVX512F, Modrm|Space0F|<sd:vexw>|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex }
 vmovup<sd>, 0x<sd:ppfx>10, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|<sd:vexw>|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vmovd, 0x666E, None, CpuAVX512F, D|Modrm|EVex=2|Space0F|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
+vmovd, 0x666E, None, CpuAVX512F, D|Modrm|EVex=2|Space0F|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 
 vmovddup, 0xF212, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM|RegZMM|Unspecified|BaseIndex, RegYMM|RegZMM }
 
@@ -2287,16 +2279,16 @@ vmovdqu64, 0xF36F, None, CpuAVX512F, D|M
 vmovhlps, 0x12, None, CpuAVX512F, Modrm|EVex=4|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 vmovlhps, 0x16, None, CpuAVX512F, Modrm|EVex=4|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 
-vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
-vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovhp<sd>, 0x<sd:ppfx>16, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovhp<sd>, 0x<sd:ppfx>17, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
+vmovlp<sd>, 0x<sd:ppfx>12, None, CpuAVX512F, Modrm|EVexLIG|Space0F|VexVVVV|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmovlp<sd>, 0x<sd:ppfx>13, None, CpuAVX512F, Modrm|EVexLIG|Space0F|<sd:vexw>|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex }
 
-vmovq, 0x666E, None, CpuAVX512F|Cpu64, D|Modrm|EVex=2|Space0F|VexW=2|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64|Unspecified|BaseIndex, RegXMM }
+vmovq, 0x666E, None, CpuAVX512F|Cpu64, D|Modrm|EVex128|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64|Unspecified|BaseIndex, RegXMM }
 vmovq, 0xF37E, None, CpuAVX512F, Load|Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovq, 0x66D6, None, CpuAVX512F, Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
 
-vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|MaskingMorZ|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sdh:elem>|Unspecified|BaseIndex, RegXMM }
+vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|MaskingMorZ|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <sdh:elem>|Unspecified|BaseIndex, RegXMM }
 vmovs<sdh>, 0x<sdh:spfx>10, None, <sdh:cpu>, D|Modrm|EVexLIG|Masking=3|<sdh:spc1>|VexVVVV|<sdh:vexw>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
 
 vmovshdup, 0xF316, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
@@ -2596,7 +2588,7 @@ kadd<dq>, 0x<dq:kpfx>4a, None, CpuAVX512
 kand<dq>, 0x<dq:kpfx>41, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask, RegMask }
 kandn<dq>, 0x<dq:kpfx>42, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegMask, RegMask, RegMask }
 kmov<dq>, 0x<dq:kpfx>90, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
-kmov<dq>, 0x<dq:kpfx>91, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
+kmov<dq>, 0x<dq:kpfx>91, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
 kmov<dq>, 0xf292, None, CpuAVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { <dq:gpr>, RegMask }
 knot<dq>, 0x<dq:kpfx>44, None, CpuAVX512BW, Modrm|Vex128|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask }
 kor<dq>, 0x<dq:kpfx>45, None, CpuAVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMask, RegMask, RegMask }
@@ -2985,13 +2977,13 @@ incsspq, 0xf30fae, 5, CpuSHSTK|Cpu64, Mo
 rdsspd, 0xf30f1e, 1, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32 }
 rdsspq, 0xf30f1e, 1, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64 }
 saveprevssp, 0xf30f01ea, None, CpuSHSTK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-rstorssp, 0xf30f01, 5, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+rstorssp, 0xf30f01, 5, CpuSHSTK, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 wrssd, 0x0f38f6, None, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Dword|Unspecified|BaseIndex }
-wrssq, 0x0f38f6, None, CpuSHSTK|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrssq, 0x0f38f6, None, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
 wrussd, 0x660f38f5, None, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32, Dword|Unspecified|BaseIndex }
-wrussq, 0x660f38f5, None, CpuSHSTK|Cpu64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrussq, 0x660f38f5, None, CpuSHSTK|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg64, Qword|Unspecified|BaseIndex }
 setssbsy, 0xf30f01e8, None, CpuSHSTK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
-clrssbsy, 0xf30fae, 6, CpuSHSTK, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Qword|Unspecified|BaseIndex }
+clrssbsy, 0xf30fae, 6, CpuSHSTK, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex }
 endbr64, 0xf30f1efa, None, CpuIBT, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
 endbr32, 0xf30f1efb, None, CpuIBT, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, {}
 
@@ -3230,8 +3222,7 @@ vcvtusi2sh, 0xf37b, None, CpuAVX512_FP16
 vcvtsh2sd, 0xf35a, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|EVexMap5|VexVVVV|VexW0|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
 vcvtsh2ss, 0x13, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|EVexMap6|VexVVVV|VexW0|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
 
-vcvtsh2si, 0xf32d, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvtsh2usi, 0xf379, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvtsh2si, 0xf32d, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
 
 vcvttph2dq, 0xf35b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|EVexMap5|VexW0|Broadcast|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegXMM }
 vcvttph2dq, 0xf35b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|EVexMap5|VexW0|Broadcast|Disp8MemShift=4|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegYMM }
@@ -3256,8 +3247,7 @@ vcvtph2psx, 0x6613, None, CpuAVX512_FP16
 vcvttph2w, 0x667c, None, CpuAVX512_FP16, Modrm|Masking=3|EVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vcvttph2uw, 0x7c, None, CpuAVX512_FP16, Modrm|Masking=3|EVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
-vcvttsh2si, 0xf32c, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
-vcvttsh2usi, 0xf378, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ToQword|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
+vcvttsh2si, 0xf32c, None, CpuAVX512_FP16, Modrm|EVexLIG|EVexMap5|Disp8MemShift=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { RegXMM|Word|Unspecified|BaseIndex, Reg32|Reg64 }
 
 vfpclassph<xyz>, 0x66, None, CpuAVX512_FP16|<xyz:vl>, Modrm|<xyz:attr>|Masking=2|Space0F3A|VexW0|Broadcast|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|<xyz:att>, { Imm8, <xyz:src>|Word, RegMask }
 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 2/7] x86: improve match_template()'s diagnostics
  2022-08-26 10:28 [PATCH v2 0/7] x86: suffix handling changes Jan Beulich
  2022-08-26 10:28 ` Jan Beulich
  2022-08-26 10:30 ` [PATCH v2 1/7] x86/Intel: restrict suffix derivation Jan Beulich
@ 2022-08-26 10:30 ` Jan Beulich
  2022-08-26 10:30   ` Jan Beulich
  2022-09-13 14:24   ` Ping: " Jan Beulich
  2022-08-26 10:31 ` [PATCH v2 3/7] x86: re-work insn/suffix recognition Jan Beulich
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:30 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

At the example of

	extractps $0, %xmm0, %xmm0
	insertps $0, %xmm0, %eax

(both having respectively the same mistake of using the wrong kind of
destination register) it is easy to see that current behavior is far
from ideal: The former results in "unsupported instruction" for 32-bit
code simply because the 2nd template we have is a Cpu64 one. Instead we
should aim at emitting the "best" possible error, which will typically
be the one where we passed the largest number of checks. Generalize the
original "specific_error" approach by making it apply to the entire
matching loop, utilizing that line numbers increase as we pass further
checks.
---
v2: Style correction.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -2083,12 +2083,7 @@ operand_size_match (const insn_template
     }
 
   if (!t->opcode_modifier.d)
-    {
-    mismatch:
-      if (!match)
-	i.error = operand_size_mismatch;
-      return match;
-    }
+    return match;
 
   /* Check reverse.  */
   gas_assert ((i.operands >= 2 && i.operands <= 3)
@@ -2105,19 +2100,19 @@ operand_size_match (const insn_template
 
       if (t->operand_types[j].bitfield.class == Reg
 	  && !match_operand_size (t, j, given))
-	goto mismatch;
+	return match;
 
       if (t->operand_types[j].bitfield.class == RegSIMD
 	  && !match_simd_size (t, j, given))
-	goto mismatch;
+	return match;
 
       if (t->operand_types[j].bitfield.instance == Accum
 	  && (!match_operand_size (t, j, given)
 	      || !match_simd_size (t, j, given)))
-	goto mismatch;
+	return match;
 
       if ((i.flags[given] & Operand_Mem) && !match_mem_size (t, j, given))
-	goto mismatch;
+	return match;
     }
 
   return match | MATCH_REVERSE;
@@ -6386,6 +6381,17 @@ VEX_check_encoding (const insn_template
   return 0;
 }
 
+/* Helper function for the progress() macro in match_template().  */
+static INLINE enum i386_error progress (enum i386_error new,
+					enum i386_error last,
+					unsigned int line, unsigned int *line_p)
+{
+  if (line <= *line_p)
+    return last;
+  *line_p = line;
+  return new;
+}
+
 static const insn_template *
 match_template (char mnem_suffix)
 {
@@ -6397,8 +6403,9 @@ match_template (char mnem_suffix)
   i386_opcode_modifier suffix_check;
   i386_operand_type operand_types [MAX_OPERANDS];
   int addr_prefix_disp;
-  unsigned int j, size_match, check_register;
-  enum i386_error specific_error = 0;
+  unsigned int j, size_match, check_register, errline = __LINE__;
+  enum i386_error specific_error = number_of_operands_mismatch;
+#define progress(err) progress (err, specific_error, __LINE__, &errline)
 
 #if MAX_OPERANDS != 5
 # error "MAX_OPERANDS must be 5."
@@ -6436,36 +6443,33 @@ match_template (char mnem_suffix)
 	suffix_check.no_ldsuf = 1;
     }
 
-  /* Must have right number of operands.  */
-  i.error = number_of_operands_mismatch;
-
   for (t = current_templates->start; t < current_templates->end; t++)
     {
       addr_prefix_disp = -1;
       found_reverse_match = 0;
 
+      /* Must have right number of operands.  */
       if (i.operands != t->operands)
 	continue;
 
       /* Check processor support.  */
-      i.error = unsupported;
+      specific_error = progress (unsupported);
       if (cpu_flags_match (t) != CPU_FLAGS_PERFECT_MATCH)
 	continue;
 
       /* Check Pseudo Prefix.  */
-      i.error = unsupported;
       if (t->opcode_modifier.pseudovexprefix
 	  && !(i.vec_encoding == vex_encoding_vex
 	      || i.vec_encoding == vex_encoding_vex3))
 	continue;
 
       /* Check AT&T mnemonic.   */
-      i.error = unsupported_with_intel_mnemonic;
+      specific_error = progress (unsupported_with_intel_mnemonic);
       if (intel_mnemonic && t->opcode_modifier.attmnemonic)
 	continue;
 
       /* Check AT&T/Intel syntax.  */
-      i.error = unsupported_syntax;
+      specific_error = progress (unsupported_syntax);
       if ((intel_syntax && t->opcode_modifier.attsyntax)
 	  || (!intel_syntax && t->opcode_modifier.intelsyntax))
 	continue;
@@ -6491,7 +6495,7 @@ match_template (char mnem_suffix)
 	}
 
       /* Check the suffix.  */
-      i.error = invalid_instruction_suffix;
+      specific_error = progress (invalid_instruction_suffix);
       if ((t->opcode_modifier.no_bsuf && suffix_check.no_bsuf)
 	  || (t->opcode_modifier.no_wsuf && suffix_check.no_wsuf)
 	  || (t->opcode_modifier.no_lsuf && suffix_check.no_lsuf)
@@ -6500,6 +6504,7 @@ match_template (char mnem_suffix)
 	  || (t->opcode_modifier.no_ldsuf && suffix_check.no_ldsuf))
 	continue;
 
+      specific_error = progress (operand_size_mismatch);
       size_match = operand_size_match (t);
       if (!size_match)
 	continue;
@@ -6510,11 +6515,9 @@ match_template (char mnem_suffix)
 
 	 as the case of a missing * on the operand is accepted (perhaps with
 	 a warning, issued further down).  */
+      specific_error = progress (operand_type_mismatch);
       if (i.jumpabsolute && t->opcode_modifier.jump != JUMP_ABSOLUTE)
-	{
-	  i.error = operand_type_mismatch;
-	  continue;
-	}
+	continue;
 
       for (j = 0; j < MAX_OPERANDS; j++)
 	operand_types[j] = t->operand_types[j];
@@ -6522,6 +6525,8 @@ match_template (char mnem_suffix)
       /* In general, don't allow
 	 - 64-bit operands outside of 64-bit mode,
 	 - 32-bit operands on pre-386.  */
+      specific_error = progress (mnem_suffix ? invalid_instruction_suffix
+					     : operand_size_mismatch);
       j = i.imm_operands + (t->operands > i.imm_operands + 1);
       if (((i.suffix == QWORD_MNEM_SUFFIX
 	    && flag_code != CODE_64BIT
@@ -6550,7 +6555,7 @@ match_template (char mnem_suffix)
 	{
 	  if (VEX_check_encoding (t))
 	    {
-	      specific_error = i.error;
+	      specific_error = progress (i.error);
 	      continue;
 	    }
 
@@ -6711,6 +6716,8 @@ match_template (char mnem_suffix)
 						   i.types[1],
 						   operand_types[1])))
 	    {
+	      specific_error = progress (i.error);
+
 	      /* Check if other direction is valid ...  */
 	      if (!t->opcode_modifier.d)
 		continue;
@@ -6735,6 +6742,7 @@ match_template (char mnem_suffix)
 						       operand_types[0])))
 		{
 		  /* Does not match either direction.  */
+		  specific_error = progress (i.error);
 		  continue;
 		}
 	      /* found_reverse_match holds which of D or FloatR
@@ -6773,7 +6781,10 @@ match_template (char mnem_suffix)
 						       operand_types[3],
 						       i.types[4],
 						       operand_types[4]))
-		    continue;
+		    {
+		      specific_error = progress (i.error);
+		      continue;
+		    }
 		  /* Fall through.  */
 		case 4:
 		  overlap3 = operand_type_and (i.types[3], operand_types[3]);
@@ -6788,7 +6799,10 @@ match_template (char mnem_suffix)
 							    operand_types[2],
 							    i.types[3],
 							    operand_types[3])))
-		    continue;
+		    {
+		      specific_error = progress (i.error);
+		      continue;
+		    }
 		  /* Fall through.  */
 		case 3:
 		  overlap2 = operand_type_and (i.types[2], operand_types[2]);
@@ -6803,7 +6817,10 @@ match_template (char mnem_suffix)
 							    operand_types[1],
 							    i.types[2],
 							    operand_types[2])))
-		    continue;
+		    {
+		      specific_error = progress (i.error);
+		      continue;
+		    }
 		  break;
 		}
 	    }
@@ -6814,14 +6831,14 @@ match_template (char mnem_suffix)
       /* Check if vector operands are valid.  */
       if (check_VecOperands (t))
 	{
-	  specific_error = i.error;
+	  specific_error = progress (i.error);
 	  continue;
 	}
 
       /* Check if VEX/EVEX encoding requirements can be satisfied.  */
       if (VEX_check_encoding (t))
 	{
-	  specific_error = i.error;
+	  specific_error = progress (i.error);
 	  continue;
 	}
 
@@ -6829,11 +6846,13 @@ match_template (char mnem_suffix)
       break;
     }
 
+#undef progress
+
   if (t == current_templates->end)
     {
       /* We found no match.  */
       const char *err_msg;
-      switch (specific_error ? specific_error : i.error)
+      switch (specific_error)
 	{
 	default:
 	  abort ();
--- a/gas/testsuite/gas/i386/inval-tls.l
+++ b/gas/testsuite/gas/i386/inval-tls.l
@@ -1,3 +1,3 @@
 .*: Assembler messages:
-.*:3: Error: operand size mismatch for `kmovd'
-.*:4: Error: operand size mismatch for `kmovd'
+.*:3: Error: .* `kmovd'
+.*:4: Error: .* `kmovd'
--- a/gas/testsuite/gas/i386/noavx512-1.l
+++ b/gas/testsuite/gas/i386/noavx512-1.l
@@ -1,14 +1,14 @@
 .*: Assembler messages:
-.*:25: Error: .*unsupported instruction.*
+.*:25: Error: .*operand size mismatch.*
 .*:26: Error: .*unsupported masking.*
 .*:27: Error: .*unsupported masking.*
-.*:47: Error: .*unsupported instruction.*
+.*:47: Error: .*operand size mismatch.*
 .*:48: Error: .*unsupported masking.*
 .*:49: Error: .*unsupported masking.*
 .*:50: Error: .*not supported.*
 .*:51: Error: .*not supported.*
 .*:52: Error: .*not supported.*
-.*:69: Error: .*unsupported instruction.*
+.*:69: Error: .*operand size mismatch.*
 .*:70: Error: .*unsupported masking.*
 .*:71: Error: .*unsupported masking.*
 .*:72: Error: .*not supported.*
@@ -17,7 +17,7 @@
 .*:75: Error: .*not supported.*
 .*:76: Error: .*not supported.*
 .*:77: Error: .*not supported.*
-.*:91: Error: .*unsupported instruction.*
+.*:91: Error: .*operand size mismatch.*
 .*:92: Error: .*unsupported masking.*
 .*:93: Error: .*unsupported masking.*
 .*:94: Error: .*not supported.*
@@ -27,7 +27,7 @@
 .*:98: Error: .*not supported.*
 .*:99: Error: .*not supported.*
 .*:100: Error: .*not supported.*
-.*:113: Error: .*unsupported instruction.*
+.*:113: Error: .*operand size mismatch.*
 .*:114: Error: .*unsupported masking.*
 .*:115: Error: .*unsupported masking.*
 .*:116: Error: .*not supported.*
@@ -40,7 +40,7 @@
 .*:126: Error: .*not supported.*
 .*:127: Error: .*not supported.*
 .*:128: Error: .*not supported.*
-.*:135: Error: .*unsupported instruction.*
+.*:135: Error: .*operand size mismatch.*
 .*:136: Error: .*unsupported masking.*
 .*:137: Error: .*unsupported masking.*
 .*:138: Error: .*not supported.*
@@ -54,7 +54,7 @@
 .*:149: Error: .*not supported.*
 .*:150: Error: .*not supported.*
 .*:151: Error: .*not supported.*
-.*:157: Error: .*unsupported instruction.*
+.*:157: Error: .*operand size mismatch.*
 .*:158: Error: .*unsupported masking.*
 .*:159: Error: .*unsupported masking.*
 .*:160: Error: .*not supported.*
--- a/gas/testsuite/gas/i386/noavx512-2.l
+++ b/gas/testsuite/gas/i386/noavx512-2.l
@@ -1,12 +1,12 @@
 .*: Assembler messages:
-.*:26: Error: .*unsupported instruction.*
-.*:27: Error: .*unsupported instruction.*
+.*:26: Error: .*unsupported masking.*
+.*:27: Error: .*unsupported masking.*
 .*:29: Error: .*unsupported instruction.*
 .*:30: Error: .*unsupported instruction.*
 .*:32: Error: .*unsupported instruction.*
 .*:33: Error: .*unsupported instruction.*
-.*:36: Error: .*unsupported instruction.*
-.*:37: Error: .*unsupported instruction.*
+.*:36: Error: .*unsupported masking.*
+.*:37: Error: .*unsupported masking.*
 .*:39: Error: .*unsupported instruction.*
 .*:40: Error: .*unsupported instruction.*
 .*:43: Error: .*unsupported instruction.*
--- a/gas/testsuite/gas/i386/x86-64-branch-4.l
+++ b/gas/testsuite/gas/i386/x86-64-branch-4.l
@@ -1,19 +1,19 @@
 .*: Assembler messages:
 .*:2: Error: invalid instruction suffix for `call'
 .*:3: Error: invalid instruction suffix for `call'
-.*:4: Error: operand type mismatch for `jmp'
+.*:4: Error: operand (size|type) mismatch for `jmp'
 .*:5: Error: invalid instruction suffix for `jmp'
 .*:6: Error: invalid instruction suffix for `jmp'
 .*:7: Error: invalid instruction suffix for `ret'
 .*:8: Error: invalid instruction suffix for `ret'
-.*:11: Error: operand type mismatch for `call'
+.*:11: Error: operand (size|type) mismatch for `call'
 .*:12: Error: invalid instruction suffix for `call'
 .*:13: Error: invalid instruction suffix for `call'
-.*:14: Error: operand size mismatch for `call'
-.*:15: Error: operand type mismatch for `jmp'
+.*:14: Error: operand (size|type) mismatch for `call'
+.*:15: Error: operand (size|type) mismatch for `jmp'
 .*:16: Error: invalid instruction suffix for `jmp'
 .*:17: Error: invalid instruction suffix for `jmp'
-.*:18: Error: operand size mismatch for `jmp'
+.*:18: Error: operand (size|type) mismatch for `jmp'
 .*:19: Error: invalid instruction suffix for `ret'
 .*:20: Error: invalid instruction suffix for `ret'
 GAS LISTING .*
--- a/gas/testsuite/gas/i386/x86-64-branch-5.l
+++ b/gas/testsuite/gas/i386/x86-64-branch-5.l
@@ -1,19 +1,19 @@
 .*: Assembler messages:
-.*:2: Error: unsupported syntax for `lcall'
-.*:3: Error: unsupported syntax for `lfs'
-.*:4: Error: unsupported syntax for `lfs'
-.*:5: Error: unsupported syntax for `lgs'
-.*:6: Error: unsupported syntax for `lgs'
-.*:7: Error: unsupported syntax for `ljmp'
-.*:8: Error: unsupported syntax for `lss'
-.*:9: Error: unsupported syntax for `lss'
-.*:12: Error: unsupported syntax for `call'
-.*:13: Error: unsupported syntax for `lfs'
-.*:14: Error: unsupported syntax for `lfs'
-.*:15: Error: unsupported syntax for `lgs'
-.*:16: Error: unsupported syntax for `lgs'
-.*:17: Error: unsupported syntax for `jmp'
-.*:18: Error: unsupported syntax for `lss'
-.*:19: Error: unsupported syntax for `lss'
+.*:2: Error: invalid instruction suffix for `lcall'
+.*:3: Error: operand size mismatch for `lfs'
+.*:4: Error: invalid instruction suffix for `lfs'
+.*:5: Error: operand size mismatch for `lgs'
+.*:6: Error: invalid instruction suffix for `lgs'
+.*:7: Error: invalid instruction suffix for `ljmp'
+.*:8: Error: operand size mismatch for `lss'
+.*:9: Error: invalid instruction suffix for `lss'
+.*:12: Error: operand (size|type) mismatch for `call'
+.*:13: Error: operand size mismatch for `lfs'
+.*:14: Error: operand size mismatch for `lfs'
+.*:15: Error: operand size mismatch for `lgs'
+.*:16: Error: operand size mismatch for `lgs'
+.*:17: Error: operand (size|type) mismatch for `jmp'
+.*:18: Error: operand size mismatch for `lss'
+.*:19: Error: operand size mismatch for `lss'
 GAS LISTING .*
 #pass
--- a/gas/testsuite/gas/i386/x86-64-inval-tls.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-tls.l
@@ -1,3 +1,3 @@
 .*: Assembler messages:
-.*:3: Error: operand size mismatch for `kmovq'
-.*:4: Error: operand size mismatch for `kmovq'
+.*:3: Error: .* `kmovq'
+.*:4: Error: .* `kmovq'


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 2/7] x86: improve match_template()'s diagnostics
  2022-08-26 10:30 ` [PATCH v2 2/7] x86: improve match_template()'s diagnostics Jan Beulich
@ 2022-08-26 10:30   ` Jan Beulich
  2022-09-13 14:24   ` Ping: " Jan Beulich
  1 sibling, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:30 UTC (permalink / raw)
  To: Binutils

At the example of

	extractps $0, %xmm0, %xmm0
	insertps $0, %xmm0, %eax

(both having respectively the same mistake of using the wrong kind of
destination register) it is easy to see that current behavior is far
from ideal: The former results in "unsupported instruction" for 32-bit
code simply because the 2nd template we have is a Cpu64 one. Instead we
should aim at emitting the "best" possible error, which will typically
be the one where we passed the largest number of checks. Generalize the
original "specific_error" approach by making it apply to the entire
matching loop, utilizing that line numbers increase as we pass further
checks.
---
v2: Style correction.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -2083,12 +2083,7 @@ operand_size_match (const insn_template
     }
 
   if (!t->opcode_modifier.d)
-    {
-    mismatch:
-      if (!match)
-	i.error = operand_size_mismatch;
-      return match;
-    }
+    return match;
 
   /* Check reverse.  */
   gas_assert ((i.operands >= 2 && i.operands <= 3)
@@ -2105,19 +2100,19 @@ operand_size_match (const insn_template
 
       if (t->operand_types[j].bitfield.class == Reg
 	  && !match_operand_size (t, j, given))
-	goto mismatch;
+	return match;
 
       if (t->operand_types[j].bitfield.class == RegSIMD
 	  && !match_simd_size (t, j, given))
-	goto mismatch;
+	return match;
 
       if (t->operand_types[j].bitfield.instance == Accum
 	  && (!match_operand_size (t, j, given)
 	      || !match_simd_size (t, j, given)))
-	goto mismatch;
+	return match;
 
       if ((i.flags[given] & Operand_Mem) && !match_mem_size (t, j, given))
-	goto mismatch;
+	return match;
     }
 
   return match | MATCH_REVERSE;
@@ -6386,6 +6381,17 @@ VEX_check_encoding (const insn_template
   return 0;
 }
 
+/* Helper function for the progress() macro in match_template().  */
+static INLINE enum i386_error progress (enum i386_error new,
+					enum i386_error last,
+					unsigned int line, unsigned int *line_p)
+{
+  if (line <= *line_p)
+    return last;
+  *line_p = line;
+  return new;
+}
+
 static const insn_template *
 match_template (char mnem_suffix)
 {
@@ -6397,8 +6403,9 @@ match_template (char mnem_suffix)
   i386_opcode_modifier suffix_check;
   i386_operand_type operand_types [MAX_OPERANDS];
   int addr_prefix_disp;
-  unsigned int j, size_match, check_register;
-  enum i386_error specific_error = 0;
+  unsigned int j, size_match, check_register, errline = __LINE__;
+  enum i386_error specific_error = number_of_operands_mismatch;
+#define progress(err) progress (err, specific_error, __LINE__, &errline)
 
 #if MAX_OPERANDS != 5
 # error "MAX_OPERANDS must be 5."
@@ -6436,36 +6443,33 @@ match_template (char mnem_suffix)
 	suffix_check.no_ldsuf = 1;
     }
 
-  /* Must have right number of operands.  */
-  i.error = number_of_operands_mismatch;
-
   for (t = current_templates->start; t < current_templates->end; t++)
     {
       addr_prefix_disp = -1;
       found_reverse_match = 0;
 
+      /* Must have right number of operands.  */
       if (i.operands != t->operands)
 	continue;
 
       /* Check processor support.  */
-      i.error = unsupported;
+      specific_error = progress (unsupported);
       if (cpu_flags_match (t) != CPU_FLAGS_PERFECT_MATCH)
 	continue;
 
       /* Check Pseudo Prefix.  */
-      i.error = unsupported;
       if (t->opcode_modifier.pseudovexprefix
 	  && !(i.vec_encoding == vex_encoding_vex
 	      || i.vec_encoding == vex_encoding_vex3))
 	continue;
 
       /* Check AT&T mnemonic.   */
-      i.error = unsupported_with_intel_mnemonic;
+      specific_error = progress (unsupported_with_intel_mnemonic);
       if (intel_mnemonic && t->opcode_modifier.attmnemonic)
 	continue;
 
       /* Check AT&T/Intel syntax.  */
-      i.error = unsupported_syntax;
+      specific_error = progress (unsupported_syntax);
       if ((intel_syntax && t->opcode_modifier.attsyntax)
 	  || (!intel_syntax && t->opcode_modifier.intelsyntax))
 	continue;
@@ -6491,7 +6495,7 @@ match_template (char mnem_suffix)
 	}
 
       /* Check the suffix.  */
-      i.error = invalid_instruction_suffix;
+      specific_error = progress (invalid_instruction_suffix);
       if ((t->opcode_modifier.no_bsuf && suffix_check.no_bsuf)
 	  || (t->opcode_modifier.no_wsuf && suffix_check.no_wsuf)
 	  || (t->opcode_modifier.no_lsuf && suffix_check.no_lsuf)
@@ -6500,6 +6504,7 @@ match_template (char mnem_suffix)
 	  || (t->opcode_modifier.no_ldsuf && suffix_check.no_ldsuf))
 	continue;
 
+      specific_error = progress (operand_size_mismatch);
       size_match = operand_size_match (t);
       if (!size_match)
 	continue;
@@ -6510,11 +6515,9 @@ match_template (char mnem_suffix)
 
 	 as the case of a missing * on the operand is accepted (perhaps with
 	 a warning, issued further down).  */
+      specific_error = progress (operand_type_mismatch);
       if (i.jumpabsolute && t->opcode_modifier.jump != JUMP_ABSOLUTE)
-	{
-	  i.error = operand_type_mismatch;
-	  continue;
-	}
+	continue;
 
       for (j = 0; j < MAX_OPERANDS; j++)
 	operand_types[j] = t->operand_types[j];
@@ -6522,6 +6525,8 @@ match_template (char mnem_suffix)
       /* In general, don't allow
 	 - 64-bit operands outside of 64-bit mode,
 	 - 32-bit operands on pre-386.  */
+      specific_error = progress (mnem_suffix ? invalid_instruction_suffix
+					     : operand_size_mismatch);
       j = i.imm_operands + (t->operands > i.imm_operands + 1);
       if (((i.suffix == QWORD_MNEM_SUFFIX
 	    && flag_code != CODE_64BIT
@@ -6550,7 +6555,7 @@ match_template (char mnem_suffix)
 	{
 	  if (VEX_check_encoding (t))
 	    {
-	      specific_error = i.error;
+	      specific_error = progress (i.error);
 	      continue;
 	    }
 
@@ -6711,6 +6716,8 @@ match_template (char mnem_suffix)
 						   i.types[1],
 						   operand_types[1])))
 	    {
+	      specific_error = progress (i.error);
+
 	      /* Check if other direction is valid ...  */
 	      if (!t->opcode_modifier.d)
 		continue;
@@ -6735,6 +6742,7 @@ match_template (char mnem_suffix)
 						       operand_types[0])))
 		{
 		  /* Does not match either direction.  */
+		  specific_error = progress (i.error);
 		  continue;
 		}
 	      /* found_reverse_match holds which of D or FloatR
@@ -6773,7 +6781,10 @@ match_template (char mnem_suffix)
 						       operand_types[3],
 						       i.types[4],
 						       operand_types[4]))
-		    continue;
+		    {
+		      specific_error = progress (i.error);
+		      continue;
+		    }
 		  /* Fall through.  */
 		case 4:
 		  overlap3 = operand_type_and (i.types[3], operand_types[3]);
@@ -6788,7 +6799,10 @@ match_template (char mnem_suffix)
 							    operand_types[2],
 							    i.types[3],
 							    operand_types[3])))
-		    continue;
+		    {
+		      specific_error = progress (i.error);
+		      continue;
+		    }
 		  /* Fall through.  */
 		case 3:
 		  overlap2 = operand_type_and (i.types[2], operand_types[2]);
@@ -6803,7 +6817,10 @@ match_template (char mnem_suffix)
 							    operand_types[1],
 							    i.types[2],
 							    operand_types[2])))
-		    continue;
+		    {
+		      specific_error = progress (i.error);
+		      continue;
+		    }
 		  break;
 		}
 	    }
@@ -6814,14 +6831,14 @@ match_template (char mnem_suffix)
       /* Check if vector operands are valid.  */
       if (check_VecOperands (t))
 	{
-	  specific_error = i.error;
+	  specific_error = progress (i.error);
 	  continue;
 	}
 
       /* Check if VEX/EVEX encoding requirements can be satisfied.  */
       if (VEX_check_encoding (t))
 	{
-	  specific_error = i.error;
+	  specific_error = progress (i.error);
 	  continue;
 	}
 
@@ -6829,11 +6846,13 @@ match_template (char mnem_suffix)
       break;
     }
 
+#undef progress
+
   if (t == current_templates->end)
     {
       /* We found no match.  */
       const char *err_msg;
-      switch (specific_error ? specific_error : i.error)
+      switch (specific_error)
 	{
 	default:
 	  abort ();
--- a/gas/testsuite/gas/i386/inval-tls.l
+++ b/gas/testsuite/gas/i386/inval-tls.l
@@ -1,3 +1,3 @@
 .*: Assembler messages:
-.*:3: Error: operand size mismatch for `kmovd'
-.*:4: Error: operand size mismatch for `kmovd'
+.*:3: Error: .* `kmovd'
+.*:4: Error: .* `kmovd'
--- a/gas/testsuite/gas/i386/noavx512-1.l
+++ b/gas/testsuite/gas/i386/noavx512-1.l
@@ -1,14 +1,14 @@
 .*: Assembler messages:
-.*:25: Error: .*unsupported instruction.*
+.*:25: Error: .*operand size mismatch.*
 .*:26: Error: .*unsupported masking.*
 .*:27: Error: .*unsupported masking.*
-.*:47: Error: .*unsupported instruction.*
+.*:47: Error: .*operand size mismatch.*
 .*:48: Error: .*unsupported masking.*
 .*:49: Error: .*unsupported masking.*
 .*:50: Error: .*not supported.*
 .*:51: Error: .*not supported.*
 .*:52: Error: .*not supported.*
-.*:69: Error: .*unsupported instruction.*
+.*:69: Error: .*operand size mismatch.*
 .*:70: Error: .*unsupported masking.*
 .*:71: Error: .*unsupported masking.*
 .*:72: Error: .*not supported.*
@@ -17,7 +17,7 @@
 .*:75: Error: .*not supported.*
 .*:76: Error: .*not supported.*
 .*:77: Error: .*not supported.*
-.*:91: Error: .*unsupported instruction.*
+.*:91: Error: .*operand size mismatch.*
 .*:92: Error: .*unsupported masking.*
 .*:93: Error: .*unsupported masking.*
 .*:94: Error: .*not supported.*
@@ -27,7 +27,7 @@
 .*:98: Error: .*not supported.*
 .*:99: Error: .*not supported.*
 .*:100: Error: .*not supported.*
-.*:113: Error: .*unsupported instruction.*
+.*:113: Error: .*operand size mismatch.*
 .*:114: Error: .*unsupported masking.*
 .*:115: Error: .*unsupported masking.*
 .*:116: Error: .*not supported.*
@@ -40,7 +40,7 @@
 .*:126: Error: .*not supported.*
 .*:127: Error: .*not supported.*
 .*:128: Error: .*not supported.*
-.*:135: Error: .*unsupported instruction.*
+.*:135: Error: .*operand size mismatch.*
 .*:136: Error: .*unsupported masking.*
 .*:137: Error: .*unsupported masking.*
 .*:138: Error: .*not supported.*
@@ -54,7 +54,7 @@
 .*:149: Error: .*not supported.*
 .*:150: Error: .*not supported.*
 .*:151: Error: .*not supported.*
-.*:157: Error: .*unsupported instruction.*
+.*:157: Error: .*operand size mismatch.*
 .*:158: Error: .*unsupported masking.*
 .*:159: Error: .*unsupported masking.*
 .*:160: Error: .*not supported.*
--- a/gas/testsuite/gas/i386/noavx512-2.l
+++ b/gas/testsuite/gas/i386/noavx512-2.l
@@ -1,12 +1,12 @@
 .*: Assembler messages:
-.*:26: Error: .*unsupported instruction.*
-.*:27: Error: .*unsupported instruction.*
+.*:26: Error: .*unsupported masking.*
+.*:27: Error: .*unsupported masking.*
 .*:29: Error: .*unsupported instruction.*
 .*:30: Error: .*unsupported instruction.*
 .*:32: Error: .*unsupported instruction.*
 .*:33: Error: .*unsupported instruction.*
-.*:36: Error: .*unsupported instruction.*
-.*:37: Error: .*unsupported instruction.*
+.*:36: Error: .*unsupported masking.*
+.*:37: Error: .*unsupported masking.*
 .*:39: Error: .*unsupported instruction.*
 .*:40: Error: .*unsupported instruction.*
 .*:43: Error: .*unsupported instruction.*
--- a/gas/testsuite/gas/i386/x86-64-branch-4.l
+++ b/gas/testsuite/gas/i386/x86-64-branch-4.l
@@ -1,19 +1,19 @@
 .*: Assembler messages:
 .*:2: Error: invalid instruction suffix for `call'
 .*:3: Error: invalid instruction suffix for `call'
-.*:4: Error: operand type mismatch for `jmp'
+.*:4: Error: operand (size|type) mismatch for `jmp'
 .*:5: Error: invalid instruction suffix for `jmp'
 .*:6: Error: invalid instruction suffix for `jmp'
 .*:7: Error: invalid instruction suffix for `ret'
 .*:8: Error: invalid instruction suffix for `ret'
-.*:11: Error: operand type mismatch for `call'
+.*:11: Error: operand (size|type) mismatch for `call'
 .*:12: Error: invalid instruction suffix for `call'
 .*:13: Error: invalid instruction suffix for `call'
-.*:14: Error: operand size mismatch for `call'
-.*:15: Error: operand type mismatch for `jmp'
+.*:14: Error: operand (size|type) mismatch for `call'
+.*:15: Error: operand (size|type) mismatch for `jmp'
 .*:16: Error: invalid instruction suffix for `jmp'
 .*:17: Error: invalid instruction suffix for `jmp'
-.*:18: Error: operand size mismatch for `jmp'
+.*:18: Error: operand (size|type) mismatch for `jmp'
 .*:19: Error: invalid instruction suffix for `ret'
 .*:20: Error: invalid instruction suffix for `ret'
 GAS LISTING .*
--- a/gas/testsuite/gas/i386/x86-64-branch-5.l
+++ b/gas/testsuite/gas/i386/x86-64-branch-5.l
@@ -1,19 +1,19 @@
 .*: Assembler messages:
-.*:2: Error: unsupported syntax for `lcall'
-.*:3: Error: unsupported syntax for `lfs'
-.*:4: Error: unsupported syntax for `lfs'
-.*:5: Error: unsupported syntax for `lgs'
-.*:6: Error: unsupported syntax for `lgs'
-.*:7: Error: unsupported syntax for `ljmp'
-.*:8: Error: unsupported syntax for `lss'
-.*:9: Error: unsupported syntax for `lss'
-.*:12: Error: unsupported syntax for `call'
-.*:13: Error: unsupported syntax for `lfs'
-.*:14: Error: unsupported syntax for `lfs'
-.*:15: Error: unsupported syntax for `lgs'
-.*:16: Error: unsupported syntax for `lgs'
-.*:17: Error: unsupported syntax for `jmp'
-.*:18: Error: unsupported syntax for `lss'
-.*:19: Error: unsupported syntax for `lss'
+.*:2: Error: invalid instruction suffix for `lcall'
+.*:3: Error: operand size mismatch for `lfs'
+.*:4: Error: invalid instruction suffix for `lfs'
+.*:5: Error: operand size mismatch for `lgs'
+.*:6: Error: invalid instruction suffix for `lgs'
+.*:7: Error: invalid instruction suffix for `ljmp'
+.*:8: Error: operand size mismatch for `lss'
+.*:9: Error: invalid instruction suffix for `lss'
+.*:12: Error: operand (size|type) mismatch for `call'
+.*:13: Error: operand size mismatch for `lfs'
+.*:14: Error: operand size mismatch for `lfs'
+.*:15: Error: operand size mismatch for `lgs'
+.*:16: Error: operand size mismatch for `lgs'
+.*:17: Error: operand (size|type) mismatch for `jmp'
+.*:18: Error: operand size mismatch for `lss'
+.*:19: Error: operand size mismatch for `lss'
 GAS LISTING .*
 #pass
--- a/gas/testsuite/gas/i386/x86-64-inval-tls.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-tls.l
@@ -1,3 +1,3 @@
 .*: Assembler messages:
-.*:3: Error: operand size mismatch for `kmovq'
-.*:4: Error: operand size mismatch for `kmovq'
+.*:3: Error: .* `kmovq'
+.*:4: Error: .* `kmovq'


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 3/7] x86: re-work insn/suffix recognition
  2022-08-26 10:28 [PATCH v2 0/7] x86: suffix handling changes Jan Beulich
                   ` (2 preceding siblings ...)
  2022-08-26 10:30 ` [PATCH v2 2/7] x86: improve match_template()'s diagnostics Jan Beulich
@ 2022-08-26 10:31 ` Jan Beulich
  2022-08-26 10:31   ` Jan Beulich
  2022-08-26 10:32 ` [PATCH v2 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:31 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

x86: re-work insn/suffix recognition

PR gas/29524
Having templates with a suffix explicitly present has always been
quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
a suitable template _and_ didn't itself already need to trim off a
suffix to find a match at all. This requires error reporting adjustments
(albeit luckily fewer than I was afraid might be necessary), as errors
previously reported during matching now need deferring until after the
2nd pass (because, obviously, we must not emit any error if the 2nd pass
succeeds).

PR gas/29525
Note that with the dropped CMPSD and MOVSD Intel Syntax string insn
templates, mixed IsString/non-IsString template groups cannot occur
anymore. With that maybe_adjust_templates() becomes unnecessary (and is
hence being removed).

PR gas/29526
Note further that while the additions to the intel16 testcase aren't
really proper Intel syntax, we've been permitting all of those except
for the MOVD variant. The test therefore is to avoid re-introducing such
an inconsistency.
---
To limit code churn I'm using "goto" for the retry loop, but I'd be
happy to make this a proper loop either right here or in a follow-on
change doing just the necessary re-indentation.

The "too many memory references" errors which are being deleted weren't
fully consistent anyway - even the majority of IsString insns accepts
only a single memory operand. If we want to retain that, it would need
re-introducing in md_assemble(), latching the error into i.error just
like match_template() does.

Why is "MOVQ $imm64, %reg64" being optimized but "MOVABS $imm64, %reg64"
is not?

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -297,9 +297,6 @@ struct _i386_insn
        explicit segment overrides are given.  */
     const reg_entry *seg[2];
 
-    /* Copied first memory operand string, for re-checking.  */
-    char *memop1_string;
-
     /* PREFIX holds all the given prefix opcodes (usually null).
        PREFIXES is the number of prefix opcodes.  */
     unsigned int prefixes;
@@ -4273,7 +4270,20 @@ optimize_encoding (void)
 	   movq $imm31, %r64   -> movl $imm31, %r32
 	   movq $imm32, %r64   -> movl $imm32, %r32
         */
-      i.tm.opcode_modifier.norex64 = 1;
+      i.tm.opcode_modifier.size = SIZE32;
+      if (i.imm_operands)
+	{
+	  i.types[0].bitfield.imm32 = 1;
+	  i.types[0].bitfield.imm32s = 0;
+	  i.types[0].bitfield.imm64 = 0;
+	}
+      else
+	{
+	  i.types[0].bitfield.dword = 1;
+	  i.types[0].bitfield.qword = 0;
+	}
+      i.types[1].bitfield.dword = 1;
+      i.types[1].bitfield.qword = 0;
       if (i.tm.base_opcode == 0xb8 || (i.tm.base_opcode | 1) == 0xc7)
 	{
 	  /* Handle
@@ -4283,11 +4293,6 @@ optimize_encoding (void)
 	  i.tm.operand_types[0].bitfield.imm32 = 1;
 	  i.tm.operand_types[0].bitfield.imm32s = 0;
 	  i.tm.operand_types[0].bitfield.imm64 = 0;
-	  i.types[0].bitfield.imm32 = 1;
-	  i.types[0].bitfield.imm32s = 0;
-	  i.types[0].bitfield.imm64 = 0;
-	  i.types[1].bitfield.dword = 1;
-	  i.types[1].bitfield.qword = 0;
 	  if ((i.tm.base_opcode | 1) == 0xc7)
 	    {
 	      /* Handle
@@ -4819,10 +4824,17 @@ void
 md_assemble (char *line)
 {
   unsigned int j;
-  char mnemonic[MAX_MNEM_SIZE], mnem_suffix;
+  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy;
+  const char *pass1_mnem = NULL;
+  enum i386_error pass1_err = 0;
   const insn_template *t;
 
+  /* Make a copy of the full line in case we need to retry.  */
+  copy = xstrdup (line);
+
   /* Initialize globals.  */
+  current_templates = NULL;
+ retry:
   memset (&i, '\0', sizeof (i));
   i.rounding.type = rc_none;
   for (j = 0; j < MAX_OPERANDS; j++)
@@ -4837,15 +4849,21 @@ md_assemble (char *line)
 
   line = parse_insn (line, mnemonic);
   if (line == NULL)
-    return;
+    {
+      if (!copy)
+	goto match_error;
+      free (copy);
+      return;
+    }
   mnem_suffix = i.suffix;
 
   line = parse_operands (line, mnemonic);
   this_operand = -1;
-  xfree (i.memop1_string);
-  i.memop1_string = NULL;
   if (line == NULL)
-    return;
+    {
+      free (copy);
+      return;
+    }
 
   /* Now we've parsed the mnemonic into a set of templates, and have the
      operands at hand.  */
@@ -4921,7 +4939,97 @@ md_assemble (char *line)
      with the template operand types.  */
 
   if (!(t = match_template (mnem_suffix)))
-    return;
+    {
+      const char *err_msg;
+
+      if (!mnem_suffix)
+	{
+	  pass1_err = i.error;
+	  pass1_mnem = current_templates->start->name;
+	  line = copy;
+	  copy = NULL;
+	  goto retry;
+	}
+      free (copy);
+  match_error:
+      switch (pass1_mnem ? pass1_err : i.error)
+	{
+	default:
+	  abort ();
+	case operand_size_mismatch:
+	  err_msg = _("operand size mismatch");
+	  break;
+	case operand_type_mismatch:
+	  err_msg = _("operand type mismatch");
+	  break;
+	case register_type_mismatch:
+	  err_msg = _("register type mismatch");
+	  break;
+	case number_of_operands_mismatch:
+	  err_msg = _("number of operands mismatch");
+	  break;
+	case invalid_instruction_suffix:
+	  err_msg = _("invalid instruction suffix");
+	  break;
+	case bad_imm4:
+	  err_msg = _("constant doesn't fit in 4 bits");
+	  break;
+	case unsupported_with_intel_mnemonic:
+	  err_msg = _("unsupported with Intel mnemonic");
+	  break;
+	case unsupported_syntax:
+	  err_msg = _("unsupported syntax");
+	  break;
+	case unsupported:
+	  as_bad (_("unsupported instruction `%s'"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name);
+	  return;
+	case invalid_sib_address:
+	  err_msg = _("invalid SIB address");
+	  break;
+	case invalid_vsib_address:
+	  err_msg = _("invalid VSIB address");
+	  break;
+	case invalid_vector_register_set:
+	  err_msg = _("mask, index, and destination registers must be distinct");
+	  break;
+	case invalid_tmm_register_set:
+	  err_msg = _("all tmm registers must be distinct");
+	  break;
+	case invalid_dest_and_src_register_set:
+	  err_msg = _("destination and source registers must be distinct");
+	  break;
+	case unsupported_vector_index_register:
+	  err_msg = _("unsupported vector index register");
+	  break;
+	case unsupported_broadcast:
+	  err_msg = _("unsupported broadcast");
+	  break;
+	case broadcast_needed:
+	  err_msg = _("broadcast is needed for operand of such type");
+	  break;
+	case unsupported_masking:
+	  err_msg = _("unsupported masking");
+	  break;
+	case mask_not_on_destination:
+	  err_msg = _("mask not on destination operand");
+	  break;
+	case no_default_mask:
+	  err_msg = _("default mask isn't allowed");
+	  break;
+	case unsupported_rc_sae:
+	  err_msg = _("unsupported static rounding/sae");
+	  break;
+	case invalid_register_operand:
+	  err_msg = _("invalid register operand");
+	  break;
+	}
+      as_bad (_("%s for `%s'"), err_msg,
+	      pass1_mnem ? pass1_mnem : current_templates->start->name);
+      return;
+    }
+
+  free (copy);
 
   if (sse_check != check_none
       /* The opcode space check isn't strictly needed; it's there only to
@@ -5223,6 +5331,7 @@ parse_insn (char *line, char *mnemonic)
   char *l = line;
   char *token_start = l;
   char *mnem_p;
+  bool pass1 = !current_templates;
   int supported;
   const insn_template *t;
   char *dot_p = NULL;
@@ -5392,8 +5501,10 @@ parse_insn (char *line, char *mnemonic)
       current_templates = (const templates *) str_hash_find (op_hash, mnemonic);
     }
 
-  if (!current_templates)
+  if (!current_templates || !pass1)
     {
+      current_templates = NULL;
+
     check_suffix:
       if (mnem_p > mnemonic)
 	{
@@ -5441,7 +5552,8 @@ parse_insn (char *line, char *mnemonic)
 
       if (!current_templates)
 	{
-	  as_bad (_("no such instruction: `%s'"), token_start);
+	  if (pass1)
+	    as_bad (_("no such instruction: `%s'"), token_start);
 	  return NULL;
 	}
     }
@@ -6851,81 +6963,7 @@ match_template (char mnem_suffix)
   if (t == current_templates->end)
     {
       /* We found no match.  */
-      const char *err_msg;
-      switch (specific_error)
-	{
-	default:
-	  abort ();
-	case operand_size_mismatch:
-	  err_msg = _("operand size mismatch");
-	  break;
-	case operand_type_mismatch:
-	  err_msg = _("operand type mismatch");
-	  break;
-	case register_type_mismatch:
-	  err_msg = _("register type mismatch");
-	  break;
-	case number_of_operands_mismatch:
-	  err_msg = _("number of operands mismatch");
-	  break;
-	case invalid_instruction_suffix:
-	  err_msg = _("invalid instruction suffix");
-	  break;
-	case bad_imm4:
-	  err_msg = _("constant doesn't fit in 4 bits");
-	  break;
-	case unsupported_with_intel_mnemonic:
-	  err_msg = _("unsupported with Intel mnemonic");
-	  break;
-	case unsupported_syntax:
-	  err_msg = _("unsupported syntax");
-	  break;
-	case unsupported:
-	  as_bad (_("unsupported instruction `%s'"),
-		  current_templates->start->name);
-	  return NULL;
-	case invalid_sib_address:
-	  err_msg = _("invalid SIB address");
-	  break;
-	case invalid_vsib_address:
-	  err_msg = _("invalid VSIB address");
-	  break;
-	case invalid_vector_register_set:
-	  err_msg = _("mask, index, and destination registers must be distinct");
-	  break;
-	case invalid_tmm_register_set:
-	  err_msg = _("all tmm registers must be distinct");
-	  break;
-	case invalid_dest_and_src_register_set:
-	  err_msg = _("destination and source registers must be distinct");
-	  break;
-	case unsupported_vector_index_register:
-	  err_msg = _("unsupported vector index register");
-	  break;
-	case unsupported_broadcast:
-	  err_msg = _("unsupported broadcast");
-	  break;
-	case broadcast_needed:
-	  err_msg = _("broadcast is needed for operand of such type");
-	  break;
-	case unsupported_masking:
-	  err_msg = _("unsupported masking");
-	  break;
-	case mask_not_on_destination:
-	  err_msg = _("mask not on destination operand");
-	  break;
-	case no_default_mask:
-	  err_msg = _("default mask isn't allowed");
-	  break;
-	case unsupported_rc_sae:
-	  err_msg = _("unsupported static rounding/sae");
-	  break;
-	case invalid_register_operand:
-	  err_msg = _("invalid register operand");
-	  break;
-	}
-      as_bad (_("%s for `%s'"), err_msg,
-	      current_templates->start->name);
+      i.error = specific_error;
       return NULL;
     }
 
@@ -11334,49 +11372,6 @@ RC_SAE_immediate (const char *imm_start)
   return 1;
 }
 
-/* Only string instructions can have a second memory operand, so
-   reduce current_templates to just those if it contains any.  */
-static int
-maybe_adjust_templates (void)
-{
-  const insn_template *t;
-
-  gas_assert (i.mem_operands == 1);
-
-  for (t = current_templates->start; t < current_templates->end; ++t)
-    if (t->opcode_modifier.isstring)
-      break;
-
-  if (t < current_templates->end)
-    {
-      static templates aux_templates;
-      bool recheck;
-
-      aux_templates.start = t;
-      for (; t < current_templates->end; ++t)
-	if (!t->opcode_modifier.isstring)
-	  break;
-      aux_templates.end = t;
-
-      /* Determine whether to re-check the first memory operand.  */
-      recheck = (aux_templates.start != current_templates->start
-		 || t != current_templates->end);
-
-      current_templates = &aux_templates;
-
-      if (recheck)
-	{
-	  i.mem_operands = 0;
-	  if (i.memop1_string != NULL
-	      && i386_index_check (i.memop1_string) == 0)
-	    return 0;
-	  i.mem_operands = 1;
-	}
-    }
-
-  return 1;
-}
-
 static INLINE bool starts_memory_operand (char c)
 {
   return ISDIGIT (c)
@@ -11527,17 +11522,6 @@ i386_att_operand (char *operand_string)
       char *displacement_string_end;
 
     do_memory_reference:
-      if (i.mem_operands == 1 && !maybe_adjust_templates ())
-	return 0;
-      if ((i.mem_operands == 1
-	   && !current_templates->start->opcode_modifier.isstring)
-	  || i.mem_operands == 2)
-	{
-	  as_bad (_("too many memory references for `%s'"),
-		  current_templates->start->name);
-	  return 0;
-	}
-
       /* Check for base index form.  We detect the base index form by
 	 looking for an ')' at the end of the operand, searching
 	 for the '(' matching it, and finding a REGISTER_PREFIX or ','
@@ -11737,8 +11721,6 @@ i386_att_operand (char *operand_string)
       if (i386_index_check (operand_string) == 0)
 	return 0;
       i.flags[this_operand] |= Operand_Mem;
-      if (i.mem_operands == 0)
-	i.memop1_string = xstrdup (operand_string);
       i.mem_operands++;
     }
   else
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -993,10 +993,7 @@ i386_intel_operand (char *operand_string
 	   || intel_state.is_mem)
     {
       /* Memory operand.  */
-      if (i.mem_operands == 1 && !maybe_adjust_templates ())
-	return 0;
-      if ((int) i.mem_operands
-	  >= 2 - !current_templates->start->opcode_modifier.isstring)
+      if (i.mem_operands)
 	{
 	  /* Handle
 
@@ -1041,10 +1038,6 @@ i386_intel_operand (char *operand_string
 		    }
 		}
 	    }
-
-	  as_bad (_("too many memory references for `%s'"),
-		  current_templates->start->name);
-	  return 0;
 	}
 
       /* Swap base and index in 16-bit memory operands like
@@ -1158,8 +1151,6 @@ i386_intel_operand (char *operand_string
 	return 0;
 
       i.flags[this_operand] |= Operand_Mem;
-      if (i.mem_operands == 0)
-	i.memop1_string = xstrdup (operand_string);
       ++i.mem_operands;
     }
   else
--- a/gas/testsuite/gas/i386/code16.s
+++ b/gas/testsuite/gas/i386/code16.s
@@ -1,9 +1,9 @@
 	.text
 	.code16
-	rep; movsd
-	rep; cmpsd
-	rep movsd %ds:(%si),%es:(%di)
-	rep cmpsd %es:(%di),%ds:(%si)
+	rep; movsl
+	rep; cmpsl
+	rep movsl %ds:(%si),%es:(%di)
+	rep cmpsl %es:(%di),%ds:(%si)
 
 	mov	%cr2, %ecx
 	mov	%ecx, %cr2
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -73,6 +73,7 @@ if [gas_32_check] then {
     run_dump_test "amd"
     run_dump_test "katmai"
     run_dump_test "jump"
+    run_dump_test "movs32"
     run_dump_test "movz32"
     run_dump_test "relax-1"
     run_dump_test "relax-2"
@@ -806,6 +807,7 @@ if [gas_64_check] then {
     run_dump_test "x86-64-segovr"
     run_list_test "x86-64-inval-seg" "-al"
     run_dump_test "x86-64-branch"
+    run_dump_test "movs64"
     run_dump_test "movz64"
     run_dump_test "x86-64-relax-1"
     run_dump_test "svme64"
--- a/gas/testsuite/gas/i386/intel16.d
+++ b/gas/testsuite/gas/i386/intel16.d
@@ -20,4 +20,12 @@ Disassembly of section .text:
   2c:	8d 02 [ 	]*lea    \(%bp,%si\),%ax
   2e:	8d 01 [ 	]*lea    \(%bx,%di\),%ax
   30:	8d 03 [ 	]*lea    \(%bp,%di\),%ax
-	...
+[ 	]*[0-9a-f]+:	67 f7 13[ 	]+notw[ 	]+\(%ebx\)
+[ 	]*[0-9a-f]+:	66 f7 17[ 	]+notl[ 	]+\(%bx\)
+[ 	]*[0-9a-f]+:	67 0f 1f 03[ 	]+nopw[ 	]+\(%ebx\)
+[ 	]*[0-9a-f]+:	66 0f 1f 07[ 	]+nopl[ 	]+\(%bx\)
+[ 	]*[0-9a-f]+:	67 83 03 05[ 	]+addw[ 	]+\$0x5,\(%ebx\)
+[ 	]*[0-9a-f]+:	66 83 07 05[ 	]+addl[ 	]+\$0x5,\(%bx\)
+[ 	]*[0-9a-f]+:	67 c7 03 05 00[ 	]+movw[ 	]+\$0x5,\(%ebx\)
+[ 	]*[0-9a-f]+:	66 c7 07 05 00 00 00[ 	]+movl[ 	]+\$0x5,\(%bx\)
+#pass
--- a/gas/testsuite/gas/i386/intel16.s
+++ b/gas/testsuite/gas/i386/intel16.s
@@ -18,4 +18,14 @@
  lea	ax, [di][bx]
  lea	ax, [di][bp]
 
- .p2align 4,0
+ notw	[ebx]
+ notd	[bx]
+
+ nopw	[ebx]
+ nopd	[bx]
+
+ addw	[ebx], 5
+ addd	[bx], 5
+
+ movw	[ebx], 5
+ movd	[bx], 5
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs.s
@@ -0,0 +1,33 @@
+	.text
+movs:
+	movsb	%al,%ax
+	movsb	(%eax),%ax
+	movsb	%al,%eax
+	movsb	(%eax),%eax
+.ifdef x86_64
+	movsb	%al,%rax
+	movsb	(%rax),%rax
+.endif
+
+	movsbw	%al,%ax
+	movsbw	(%eax),%ax
+	movsbl	%al,%eax
+	movsbl	(%eax),%eax
+.ifdef x86_64
+	movsbq	%al,%rax
+	movsbq	(%rax),%rax
+.endif
+
+	movsw	%ax,%eax
+	movsw	(%eax),%eax
+.ifdef x86_64
+	movsw	%ax,%rax
+	movsw	(%rax),%rax
+.endif
+
+	movswl	%ax,%eax
+	movswl	(%eax),%eax
+.ifdef x86_64
+	movswq	%ax,%rax
+	movswq	(%rax),%rax
+.endif
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs32.d
@@ -0,0 +1,22 @@
+#objdump: -dw
+#source: movs.s
+#name: x86 mov with sign-extend (32-bit object)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <movs>:
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	0f bf 00 *	movswl \(%eax\),%eax
+#pass
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs64.d
@@ -0,0 +1,30 @@
+#objdump: -dw
+#source: movs.s
+#name: x86 mov with sign-extend (64-bit object)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <movs>:
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	67 66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	67 0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f be c0 *	movsbq %al,%rax
+[ 	]*[a-f0-9]+:	48 0f be 00 *	movsbq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	67 66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	67 0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f be c0 *	movsbq %al,%rax
+[ 	]*[a-f0-9]+:	48 0f be 00 *	movsbq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	67 0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f bf c0 *	movswq %ax,%rax
+[ 	]*[a-f0-9]+:	48 0f bf 00 *	movswq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	67 0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f bf c0 *	movswq %ax,%rax
+[ 	]*[a-f0-9]+:	48 0f bf 00 *	movswq \(%rax\),%rax
+#pass
--- a/gas/testsuite/gas/i386/movx16.l
+++ b/gas/testsuite/gas/i386/movx16.l
@@ -41,11 +41,11 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
@@ -82,7 +82,7 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movswl	%al, %cl
--- a/gas/testsuite/gas/i386/movx32.l
+++ b/gas/testsuite/gas/i386/movx32.l
@@ -41,11 +41,11 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
@@ -82,7 +82,7 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movswl	%al, %cl
--- a/gas/testsuite/gas/i386/movx64.l
+++ b/gas/testsuite/gas/i386/movx64.l
@@ -106,17 +106,17 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 480FBEC8[ 	]+movsb	%al, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %rcx
@@ -192,12 +192,12 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %rcx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 480FBFC8[ 	]+movsw	%ax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %rcx
 [ 	]*[1-9][0-9]*[ 	]*
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -135,47 +135,37 @@
 mov, 0xa0, None, CpuNo64, D|W|No_sSuf|No_qSuf|No_ldSuf, { Disp16|Disp32|Unspecified|Byte|Word|Dword, Acc|Byte|Word|Dword }
 mov, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
 movabs, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
-movq, 0xa1, None, Cpu64, D|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Disp64|Unspecified|Qword, Acc|Qword }
 mov, 0x88, None, 0, D|W|CheckRegSize|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-movq, 0x89, None, Cpu64, D|Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease, { Reg64, Reg64|Unspecified|Qword|BaseIndex }
 // In the 64bit mode the short form mov immediate is redefined to have
 // 64bit value.
 mov, 0xb0, None, 0, W|No_sSuf|No_qSuf|No_ldSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32 }
 mov, 0xc6, 0, 0, W|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-movq, 0xc7, 0, Cpu64, Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm32S, Reg64|Qword|Unspecified|BaseIndex }
 mov, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
 movabs, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf, { Imm64, Reg64 }
-movq, 0xb8, None, Cpu64, Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
 // The segment register moves accept WordReg so that a segment register
 // can be copied to a 32 bit register, and vice versa, without using a
 // size prefix.  When moving to a 32 bit register, the upper 16 bits
 // are set to an implementation defined value (on the Pentium Pro, the
 // implementation defined value is zero).
-mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
+mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
 mov, 0x8c, None, 0, D|Modrm|IgnoreSize|No_bSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { SReg, Word|Unspecified|BaseIndex }
-movq, 0x8c, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg64 }
-mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
+mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
 // Move to/from control debug registers.  In the 16 or 32bit modes
 // they are 32bit.  In the 64bit mode they are 64bit.
 mov, 0xf20, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Control, Reg32 }
 mov, 0xf20, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Control, Reg64 }
-movq, 0xf20, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Control, Reg64 }
 mov, 0xf21, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Debug, Reg32 }
 mov, 0xf21, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
-movq, 0xf21, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
 mov, 0xf24, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Test, Reg32 }
 
 // Move after swapping the bytes
 movbe, 0x0f38f0, None, CpuMovbe, D|Modrm|No_bSuf|No_sSuf|No_ldSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Move with sign extend.
-// "movsbl" & "movsbw" must not be unified into "movsb" to avoid
-// conflict with the "movs" string move instruction.
-movsbl, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg32 }
-movsbw, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg16 }
-movswl, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg16|Word|Unspecified|BaseIndex, Reg32 }
-movsbq, 0xfbe, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg8|Byte|Unspecified|BaseIndex, Reg64 }
-movswq, 0xfbf, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg16|Word|Unspecified|BaseIndex, Reg64 }
+movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
+// "movslq" must not be converted into "movsl" to avoid conflict with the
+// "movsl" string move instruction.
 movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
 movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
@@ -492,9 +482,6 @@ set<cc>, 0xf9<cc:opc>, 0, Cpu386, Modrm|
 // String manipulation.
 cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-// Intel mode string compare.
-cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
-cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
 scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ins, 0x6c, None, Cpu186, W|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
@@ -509,9 +496,6 @@ slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|
 slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
 movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-// Intel mode string move.
-movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
-movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
 smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 scas, 0xae, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 3/7] x86: re-work insn/suffix recognition
  2022-08-26 10:31 ` [PATCH v2 3/7] x86: re-work insn/suffix recognition Jan Beulich
@ 2022-08-26 10:31   ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:31 UTC (permalink / raw)
  To: Binutils

x86: re-work insn/suffix recognition

PR gas/29524
Having templates with a suffix explicitly present has always been
quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
a suitable template _and_ didn't itself already need to trim off a
suffix to find a match at all. This requires error reporting adjustments
(albeit luckily fewer than I was afraid might be necessary), as errors
previously reported during matching now need deferring until after the
2nd pass (because, obviously, we must not emit any error if the 2nd pass
succeeds).

PR gas/29525
Note that with the dropped CMPSD and MOVSD Intel Syntax string insn
templates, mixed IsString/non-IsString template groups cannot occur
anymore. With that maybe_adjust_templates() becomes unnecessary (and is
hence being removed).

PR gas/29526
Note further that while the additions to the intel16 testcase aren't
really proper Intel syntax, we've been permitting all of those except
for the MOVD variant. The test therefore is to avoid re-introducing such
an inconsistency.
---
To limit code churn I'm using "goto" for the retry loop, but I'd be
happy to make this a proper loop either right here or in a follow-on
change doing just the necessary re-indentation.

The "too many memory references" errors which are being deleted weren't
fully consistent anyway - even the majority of IsString insns accepts
only a single memory operand. If we want to retain that, it would need
re-introducing in md_assemble(), latching the error into i.error just
like match_template() does.

Why is "MOVQ $imm64, %reg64" being optimized but "MOVABS $imm64, %reg64"
is not?

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -297,9 +297,6 @@ struct _i386_insn
        explicit segment overrides are given.  */
     const reg_entry *seg[2];
 
-    /* Copied first memory operand string, for re-checking.  */
-    char *memop1_string;
-
     /* PREFIX holds all the given prefix opcodes (usually null).
        PREFIXES is the number of prefix opcodes.  */
     unsigned int prefixes;
@@ -4273,7 +4270,20 @@ optimize_encoding (void)
 	   movq $imm31, %r64   -> movl $imm31, %r32
 	   movq $imm32, %r64   -> movl $imm32, %r32
         */
-      i.tm.opcode_modifier.norex64 = 1;
+      i.tm.opcode_modifier.size = SIZE32;
+      if (i.imm_operands)
+	{
+	  i.types[0].bitfield.imm32 = 1;
+	  i.types[0].bitfield.imm32s = 0;
+	  i.types[0].bitfield.imm64 = 0;
+	}
+      else
+	{
+	  i.types[0].bitfield.dword = 1;
+	  i.types[0].bitfield.qword = 0;
+	}
+      i.types[1].bitfield.dword = 1;
+      i.types[1].bitfield.qword = 0;
       if (i.tm.base_opcode == 0xb8 || (i.tm.base_opcode | 1) == 0xc7)
 	{
 	  /* Handle
@@ -4283,11 +4293,6 @@ optimize_encoding (void)
 	  i.tm.operand_types[0].bitfield.imm32 = 1;
 	  i.tm.operand_types[0].bitfield.imm32s = 0;
 	  i.tm.operand_types[0].bitfield.imm64 = 0;
-	  i.types[0].bitfield.imm32 = 1;
-	  i.types[0].bitfield.imm32s = 0;
-	  i.types[0].bitfield.imm64 = 0;
-	  i.types[1].bitfield.dword = 1;
-	  i.types[1].bitfield.qword = 0;
 	  if ((i.tm.base_opcode | 1) == 0xc7)
 	    {
 	      /* Handle
@@ -4819,10 +4824,17 @@ void
 md_assemble (char *line)
 {
   unsigned int j;
-  char mnemonic[MAX_MNEM_SIZE], mnem_suffix;
+  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy;
+  const char *pass1_mnem = NULL;
+  enum i386_error pass1_err = 0;
   const insn_template *t;
 
+  /* Make a copy of the full line in case we need to retry.  */
+  copy = xstrdup (line);
+
   /* Initialize globals.  */
+  current_templates = NULL;
+ retry:
   memset (&i, '\0', sizeof (i));
   i.rounding.type = rc_none;
   for (j = 0; j < MAX_OPERANDS; j++)
@@ -4837,15 +4849,21 @@ md_assemble (char *line)
 
   line = parse_insn (line, mnemonic);
   if (line == NULL)
-    return;
+    {
+      if (!copy)
+	goto match_error;
+      free (copy);
+      return;
+    }
   mnem_suffix = i.suffix;
 
   line = parse_operands (line, mnemonic);
   this_operand = -1;
-  xfree (i.memop1_string);
-  i.memop1_string = NULL;
   if (line == NULL)
-    return;
+    {
+      free (copy);
+      return;
+    }
 
   /* Now we've parsed the mnemonic into a set of templates, and have the
      operands at hand.  */
@@ -4921,7 +4939,97 @@ md_assemble (char *line)
      with the template operand types.  */
 
   if (!(t = match_template (mnem_suffix)))
-    return;
+    {
+      const char *err_msg;
+
+      if (!mnem_suffix)
+	{
+	  pass1_err = i.error;
+	  pass1_mnem = current_templates->start->name;
+	  line = copy;
+	  copy = NULL;
+	  goto retry;
+	}
+      free (copy);
+  match_error:
+      switch (pass1_mnem ? pass1_err : i.error)
+	{
+	default:
+	  abort ();
+	case operand_size_mismatch:
+	  err_msg = _("operand size mismatch");
+	  break;
+	case operand_type_mismatch:
+	  err_msg = _("operand type mismatch");
+	  break;
+	case register_type_mismatch:
+	  err_msg = _("register type mismatch");
+	  break;
+	case number_of_operands_mismatch:
+	  err_msg = _("number of operands mismatch");
+	  break;
+	case invalid_instruction_suffix:
+	  err_msg = _("invalid instruction suffix");
+	  break;
+	case bad_imm4:
+	  err_msg = _("constant doesn't fit in 4 bits");
+	  break;
+	case unsupported_with_intel_mnemonic:
+	  err_msg = _("unsupported with Intel mnemonic");
+	  break;
+	case unsupported_syntax:
+	  err_msg = _("unsupported syntax");
+	  break;
+	case unsupported:
+	  as_bad (_("unsupported instruction `%s'"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name);
+	  return;
+	case invalid_sib_address:
+	  err_msg = _("invalid SIB address");
+	  break;
+	case invalid_vsib_address:
+	  err_msg = _("invalid VSIB address");
+	  break;
+	case invalid_vector_register_set:
+	  err_msg = _("mask, index, and destination registers must be distinct");
+	  break;
+	case invalid_tmm_register_set:
+	  err_msg = _("all tmm registers must be distinct");
+	  break;
+	case invalid_dest_and_src_register_set:
+	  err_msg = _("destination and source registers must be distinct");
+	  break;
+	case unsupported_vector_index_register:
+	  err_msg = _("unsupported vector index register");
+	  break;
+	case unsupported_broadcast:
+	  err_msg = _("unsupported broadcast");
+	  break;
+	case broadcast_needed:
+	  err_msg = _("broadcast is needed for operand of such type");
+	  break;
+	case unsupported_masking:
+	  err_msg = _("unsupported masking");
+	  break;
+	case mask_not_on_destination:
+	  err_msg = _("mask not on destination operand");
+	  break;
+	case no_default_mask:
+	  err_msg = _("default mask isn't allowed");
+	  break;
+	case unsupported_rc_sae:
+	  err_msg = _("unsupported static rounding/sae");
+	  break;
+	case invalid_register_operand:
+	  err_msg = _("invalid register operand");
+	  break;
+	}
+      as_bad (_("%s for `%s'"), err_msg,
+	      pass1_mnem ? pass1_mnem : current_templates->start->name);
+      return;
+    }
+
+  free (copy);
 
   if (sse_check != check_none
       /* The opcode space check isn't strictly needed; it's there only to
@@ -5223,6 +5331,7 @@ parse_insn (char *line, char *mnemonic)
   char *l = line;
   char *token_start = l;
   char *mnem_p;
+  bool pass1 = !current_templates;
   int supported;
   const insn_template *t;
   char *dot_p = NULL;
@@ -5392,8 +5501,10 @@ parse_insn (char *line, char *mnemonic)
       current_templates = (const templates *) str_hash_find (op_hash, mnemonic);
     }
 
-  if (!current_templates)
+  if (!current_templates || !pass1)
     {
+      current_templates = NULL;
+
     check_suffix:
       if (mnem_p > mnemonic)
 	{
@@ -5441,7 +5552,8 @@ parse_insn (char *line, char *mnemonic)
 
       if (!current_templates)
 	{
-	  as_bad (_("no such instruction: `%s'"), token_start);
+	  if (pass1)
+	    as_bad (_("no such instruction: `%s'"), token_start);
 	  return NULL;
 	}
     }
@@ -6851,81 +6963,7 @@ match_template (char mnem_suffix)
   if (t == current_templates->end)
     {
       /* We found no match.  */
-      const char *err_msg;
-      switch (specific_error)
-	{
-	default:
-	  abort ();
-	case operand_size_mismatch:
-	  err_msg = _("operand size mismatch");
-	  break;
-	case operand_type_mismatch:
-	  err_msg = _("operand type mismatch");
-	  break;
-	case register_type_mismatch:
-	  err_msg = _("register type mismatch");
-	  break;
-	case number_of_operands_mismatch:
-	  err_msg = _("number of operands mismatch");
-	  break;
-	case invalid_instruction_suffix:
-	  err_msg = _("invalid instruction suffix");
-	  break;
-	case bad_imm4:
-	  err_msg = _("constant doesn't fit in 4 bits");
-	  break;
-	case unsupported_with_intel_mnemonic:
-	  err_msg = _("unsupported with Intel mnemonic");
-	  break;
-	case unsupported_syntax:
-	  err_msg = _("unsupported syntax");
-	  break;
-	case unsupported:
-	  as_bad (_("unsupported instruction `%s'"),
-		  current_templates->start->name);
-	  return NULL;
-	case invalid_sib_address:
-	  err_msg = _("invalid SIB address");
-	  break;
-	case invalid_vsib_address:
-	  err_msg = _("invalid VSIB address");
-	  break;
-	case invalid_vector_register_set:
-	  err_msg = _("mask, index, and destination registers must be distinct");
-	  break;
-	case invalid_tmm_register_set:
-	  err_msg = _("all tmm registers must be distinct");
-	  break;
-	case invalid_dest_and_src_register_set:
-	  err_msg = _("destination and source registers must be distinct");
-	  break;
-	case unsupported_vector_index_register:
-	  err_msg = _("unsupported vector index register");
-	  break;
-	case unsupported_broadcast:
-	  err_msg = _("unsupported broadcast");
-	  break;
-	case broadcast_needed:
-	  err_msg = _("broadcast is needed for operand of such type");
-	  break;
-	case unsupported_masking:
-	  err_msg = _("unsupported masking");
-	  break;
-	case mask_not_on_destination:
-	  err_msg = _("mask not on destination operand");
-	  break;
-	case no_default_mask:
-	  err_msg = _("default mask isn't allowed");
-	  break;
-	case unsupported_rc_sae:
-	  err_msg = _("unsupported static rounding/sae");
-	  break;
-	case invalid_register_operand:
-	  err_msg = _("invalid register operand");
-	  break;
-	}
-      as_bad (_("%s for `%s'"), err_msg,
-	      current_templates->start->name);
+      i.error = specific_error;
       return NULL;
     }
 
@@ -11334,49 +11372,6 @@ RC_SAE_immediate (const char *imm_start)
   return 1;
 }
 
-/* Only string instructions can have a second memory operand, so
-   reduce current_templates to just those if it contains any.  */
-static int
-maybe_adjust_templates (void)
-{
-  const insn_template *t;
-
-  gas_assert (i.mem_operands == 1);
-
-  for (t = current_templates->start; t < current_templates->end; ++t)
-    if (t->opcode_modifier.isstring)
-      break;
-
-  if (t < current_templates->end)
-    {
-      static templates aux_templates;
-      bool recheck;
-
-      aux_templates.start = t;
-      for (; t < current_templates->end; ++t)
-	if (!t->opcode_modifier.isstring)
-	  break;
-      aux_templates.end = t;
-
-      /* Determine whether to re-check the first memory operand.  */
-      recheck = (aux_templates.start != current_templates->start
-		 || t != current_templates->end);
-
-      current_templates = &aux_templates;
-
-      if (recheck)
-	{
-	  i.mem_operands = 0;
-	  if (i.memop1_string != NULL
-	      && i386_index_check (i.memop1_string) == 0)
-	    return 0;
-	  i.mem_operands = 1;
-	}
-    }
-
-  return 1;
-}
-
 static INLINE bool starts_memory_operand (char c)
 {
   return ISDIGIT (c)
@@ -11527,17 +11522,6 @@ i386_att_operand (char *operand_string)
       char *displacement_string_end;
 
     do_memory_reference:
-      if (i.mem_operands == 1 && !maybe_adjust_templates ())
-	return 0;
-      if ((i.mem_operands == 1
-	   && !current_templates->start->opcode_modifier.isstring)
-	  || i.mem_operands == 2)
-	{
-	  as_bad (_("too many memory references for `%s'"),
-		  current_templates->start->name);
-	  return 0;
-	}
-
       /* Check for base index form.  We detect the base index form by
 	 looking for an ')' at the end of the operand, searching
 	 for the '(' matching it, and finding a REGISTER_PREFIX or ','
@@ -11737,8 +11721,6 @@ i386_att_operand (char *operand_string)
       if (i386_index_check (operand_string) == 0)
 	return 0;
       i.flags[this_operand] |= Operand_Mem;
-      if (i.mem_operands == 0)
-	i.memop1_string = xstrdup (operand_string);
       i.mem_operands++;
     }
   else
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -993,10 +993,7 @@ i386_intel_operand (char *operand_string
 	   || intel_state.is_mem)
     {
       /* Memory operand.  */
-      if (i.mem_operands == 1 && !maybe_adjust_templates ())
-	return 0;
-      if ((int) i.mem_operands
-	  >= 2 - !current_templates->start->opcode_modifier.isstring)
+      if (i.mem_operands)
 	{
 	  /* Handle
 
@@ -1041,10 +1038,6 @@ i386_intel_operand (char *operand_string
 		    }
 		}
 	    }
-
-	  as_bad (_("too many memory references for `%s'"),
-		  current_templates->start->name);
-	  return 0;
 	}
 
       /* Swap base and index in 16-bit memory operands like
@@ -1158,8 +1151,6 @@ i386_intel_operand (char *operand_string
 	return 0;
 
       i.flags[this_operand] |= Operand_Mem;
-      if (i.mem_operands == 0)
-	i.memop1_string = xstrdup (operand_string);
       ++i.mem_operands;
     }
   else
--- a/gas/testsuite/gas/i386/code16.s
+++ b/gas/testsuite/gas/i386/code16.s
@@ -1,9 +1,9 @@
 	.text
 	.code16
-	rep; movsd
-	rep; cmpsd
-	rep movsd %ds:(%si),%es:(%di)
-	rep cmpsd %es:(%di),%ds:(%si)
+	rep; movsl
+	rep; cmpsl
+	rep movsl %ds:(%si),%es:(%di)
+	rep cmpsl %es:(%di),%ds:(%si)
 
 	mov	%cr2, %ecx
 	mov	%ecx, %cr2
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -73,6 +73,7 @@ if [gas_32_check] then {
     run_dump_test "amd"
     run_dump_test "katmai"
     run_dump_test "jump"
+    run_dump_test "movs32"
     run_dump_test "movz32"
     run_dump_test "relax-1"
     run_dump_test "relax-2"
@@ -806,6 +807,7 @@ if [gas_64_check] then {
     run_dump_test "x86-64-segovr"
     run_list_test "x86-64-inval-seg" "-al"
     run_dump_test "x86-64-branch"
+    run_dump_test "movs64"
     run_dump_test "movz64"
     run_dump_test "x86-64-relax-1"
     run_dump_test "svme64"
--- a/gas/testsuite/gas/i386/intel16.d
+++ b/gas/testsuite/gas/i386/intel16.d
@@ -20,4 +20,12 @@ Disassembly of section .text:
   2c:	8d 02 [ 	]*lea    \(%bp,%si\),%ax
   2e:	8d 01 [ 	]*lea    \(%bx,%di\),%ax
   30:	8d 03 [ 	]*lea    \(%bp,%di\),%ax
-	...
+[ 	]*[0-9a-f]+:	67 f7 13[ 	]+notw[ 	]+\(%ebx\)
+[ 	]*[0-9a-f]+:	66 f7 17[ 	]+notl[ 	]+\(%bx\)
+[ 	]*[0-9a-f]+:	67 0f 1f 03[ 	]+nopw[ 	]+\(%ebx\)
+[ 	]*[0-9a-f]+:	66 0f 1f 07[ 	]+nopl[ 	]+\(%bx\)
+[ 	]*[0-9a-f]+:	67 83 03 05[ 	]+addw[ 	]+\$0x5,\(%ebx\)
+[ 	]*[0-9a-f]+:	66 83 07 05[ 	]+addl[ 	]+\$0x5,\(%bx\)
+[ 	]*[0-9a-f]+:	67 c7 03 05 00[ 	]+movw[ 	]+\$0x5,\(%ebx\)
+[ 	]*[0-9a-f]+:	66 c7 07 05 00 00 00[ 	]+movl[ 	]+\$0x5,\(%bx\)
+#pass
--- a/gas/testsuite/gas/i386/intel16.s
+++ b/gas/testsuite/gas/i386/intel16.s
@@ -18,4 +18,14 @@
  lea	ax, [di][bx]
  lea	ax, [di][bp]
 
- .p2align 4,0
+ notw	[ebx]
+ notd	[bx]
+
+ nopw	[ebx]
+ nopd	[bx]
+
+ addw	[ebx], 5
+ addd	[bx], 5
+
+ movw	[ebx], 5
+ movd	[bx], 5
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs.s
@@ -0,0 +1,33 @@
+	.text
+movs:
+	movsb	%al,%ax
+	movsb	(%eax),%ax
+	movsb	%al,%eax
+	movsb	(%eax),%eax
+.ifdef x86_64
+	movsb	%al,%rax
+	movsb	(%rax),%rax
+.endif
+
+	movsbw	%al,%ax
+	movsbw	(%eax),%ax
+	movsbl	%al,%eax
+	movsbl	(%eax),%eax
+.ifdef x86_64
+	movsbq	%al,%rax
+	movsbq	(%rax),%rax
+.endif
+
+	movsw	%ax,%eax
+	movsw	(%eax),%eax
+.ifdef x86_64
+	movsw	%ax,%rax
+	movsw	(%rax),%rax
+.endif
+
+	movswl	%ax,%eax
+	movswl	(%eax),%eax
+.ifdef x86_64
+	movswq	%ax,%rax
+	movswq	(%rax),%rax
+.endif
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs32.d
@@ -0,0 +1,22 @@
+#objdump: -dw
+#source: movs.s
+#name: x86 mov with sign-extend (32-bit object)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <movs>:
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	0f bf 00 *	movswl \(%eax\),%eax
+#pass
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs64.d
@@ -0,0 +1,30 @@
+#objdump: -dw
+#source: movs.s
+#name: x86 mov with sign-extend (64-bit object)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <movs>:
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	67 66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	67 0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f be c0 *	movsbq %al,%rax
+[ 	]*[a-f0-9]+:	48 0f be 00 *	movsbq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	67 66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	67 0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f be c0 *	movsbq %al,%rax
+[ 	]*[a-f0-9]+:	48 0f be 00 *	movsbq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	67 0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f bf c0 *	movswq %ax,%rax
+[ 	]*[a-f0-9]+:	48 0f bf 00 *	movswq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	67 0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f bf c0 *	movswq %ax,%rax
+[ 	]*[a-f0-9]+:	48 0f bf 00 *	movswq \(%rax\),%rax
+#pass
--- a/gas/testsuite/gas/i386/movx16.l
+++ b/gas/testsuite/gas/i386/movx16.l
@@ -41,11 +41,11 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
@@ -82,7 +82,7 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movswl	%al, %cl
--- a/gas/testsuite/gas/i386/movx32.l
+++ b/gas/testsuite/gas/i386/movx32.l
@@ -41,11 +41,11 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
@@ -82,7 +82,7 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movswl	%al, %cl
--- a/gas/testsuite/gas/i386/movx64.l
+++ b/gas/testsuite/gas/i386/movx64.l
@@ -106,17 +106,17 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 480FBEC8[ 	]+movsb	%al, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %rcx
@@ -192,12 +192,12 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %rcx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 480FBFC8[ 	]+movsw	%ax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %rcx
 [ 	]*[1-9][0-9]*[ 	]*
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -135,47 +135,37 @@
 mov, 0xa0, None, CpuNo64, D|W|No_sSuf|No_qSuf|No_ldSuf, { Disp16|Disp32|Unspecified|Byte|Word|Dword, Acc|Byte|Word|Dword }
 mov, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
 movabs, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
-movq, 0xa1, None, Cpu64, D|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Disp64|Unspecified|Qword, Acc|Qword }
 mov, 0x88, None, 0, D|W|CheckRegSize|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-movq, 0x89, None, Cpu64, D|Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease, { Reg64, Reg64|Unspecified|Qword|BaseIndex }
 // In the 64bit mode the short form mov immediate is redefined to have
 // 64bit value.
 mov, 0xb0, None, 0, W|No_sSuf|No_qSuf|No_ldSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32 }
 mov, 0xc6, 0, 0, W|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-movq, 0xc7, 0, Cpu64, Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm32S, Reg64|Qword|Unspecified|BaseIndex }
 mov, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
 movabs, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf, { Imm64, Reg64 }
-movq, 0xb8, None, Cpu64, Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
 // The segment register moves accept WordReg so that a segment register
 // can be copied to a 32 bit register, and vice versa, without using a
 // size prefix.  When moving to a 32 bit register, the upper 16 bits
 // are set to an implementation defined value (on the Pentium Pro, the
 // implementation defined value is zero).
-mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
+mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
 mov, 0x8c, None, 0, D|Modrm|IgnoreSize|No_bSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { SReg, Word|Unspecified|BaseIndex }
-movq, 0x8c, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg64 }
-mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
+mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
 // Move to/from control debug registers.  In the 16 or 32bit modes
 // they are 32bit.  In the 64bit mode they are 64bit.
 mov, 0xf20, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Control, Reg32 }
 mov, 0xf20, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Control, Reg64 }
-movq, 0xf20, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Control, Reg64 }
 mov, 0xf21, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Debug, Reg32 }
 mov, 0xf21, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
-movq, 0xf21, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
 mov, 0xf24, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Test, Reg32 }
 
 // Move after swapping the bytes
 movbe, 0x0f38f0, None, CpuMovbe, D|Modrm|No_bSuf|No_sSuf|No_ldSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Move with sign extend.
-// "movsbl" & "movsbw" must not be unified into "movsb" to avoid
-// conflict with the "movs" string move instruction.
-movsbl, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg32 }
-movsbw, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg16 }
-movswl, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg16|Word|Unspecified|BaseIndex, Reg32 }
-movsbq, 0xfbe, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg8|Byte|Unspecified|BaseIndex, Reg64 }
-movswq, 0xfbf, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg16|Word|Unspecified|BaseIndex, Reg64 }
+movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
+// "movslq" must not be converted into "movsl" to avoid conflict with the
+// "movsl" string move instruction.
 movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
 movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
@@ -492,9 +482,6 @@ set<cc>, 0xf9<cc:opc>, 0, Cpu386, Modrm|
 // String manipulation.
 cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-// Intel mode string compare.
-cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
-cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
 scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ins, 0x6c, None, Cpu186, W|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
@@ -509,9 +496,6 @@ slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|
 slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
 movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-// Intel mode string move.
-movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
-movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
 smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 scas, 0xae, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-08-26 10:28 [PATCH v2 0/7] x86: suffix handling changes Jan Beulich
                   ` (3 preceding siblings ...)
  2022-08-26 10:31 ` [PATCH v2 3/7] x86: re-work insn/suffix recognition Jan Beulich
@ 2022-08-26 10:32 ` Jan Beulich
  2022-08-26 10:32   ` Jan Beulich
  2022-08-26 10:32 ` [PATCH v2 5/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:32 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

PR gas/29524
In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
to prefix parsing. Utilize i.error just like match_template() does.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -236,6 +236,8 @@ enum i386_error
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
     unsupported,
+    unsupported_on_arch,
+    unsupported_64bit,
     invalid_sib_address,
     invalid_vsib_address,
     invalid_vector_register_set,
@@ -4852,6 +4854,14 @@ md_assemble (char *line)
     {
       if (!copy)
 	goto match_error;
+      if (i.error != no_error)
+	{
+	  if (!i.suffix)
+	    goto no_match;
+	  /* No point in trying a 2nd pass - it'll only find the same suffix
+	     again.  */
+	  goto match_error;
+	}
       free (copy);
       return;
     }
@@ -4944,14 +4954,23 @@ md_assemble (char *line)
 
       if (!mnem_suffix)
 	{
+  no_match:
 	  pass1_err = i.error;
 	  pass1_mnem = current_templates->start->name;
 	  line = copy;
 	  copy = NULL;
 	  goto retry;
 	}
-      free (copy);
+
+      /* If a non-/only-64bit template (group) was found in pass 1, and if
+	 _some_ template (group) was found in pass 2, squash pass 1's
+	 error.  */
+      if (pass1_err == unsupported_64bit)
+	pass1_mnem = NULL;
+
   match_error:
+      free (copy);
+
       switch (pass1_mnem ? pass1_err : i.error)
 	{
 	default:
@@ -4984,6 +5003,17 @@ md_assemble (char *line)
 	  as_bad (_("unsupported instruction `%s'"),
 		  pass1_mnem ? pass1_mnem : current_templates->start->name);
 	  return;
+	case unsupported_on_arch:
+	  as_bad (_("`%s' is not supported on `%s%s'"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name,
+		  cpu_arch_name ? cpu_arch_name : default_arch,
+		  cpu_sub_arch_name ? cpu_sub_arch_name : "");
+	  return;
+	case unsupported_64bit:
+	  as_bad (_("`%s' is %s supported in 64-bit mode"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name,
+		  flag_code == CODE_64BIT ? _("not") : _("only"));
+	  return;
 	case invalid_sib_address:
 	  err_msg = _("invalid SIB address");
 	  break;
@@ -5600,16 +5630,13 @@ parse_insn (char *line, char *mnemonic)
 	return l;
     }
 
-  if (!(supported & CPU_FLAGS_64BIT_MATCH))
-    as_bad (flag_code == CODE_64BIT
-	    ? _("`%s' is not supported in 64-bit mode")
-	    : _("`%s' is only supported in 64-bit mode"),
-	    current_templates->start->name);
-  else
-    as_bad (_("`%s' is not supported on `%s%s'"),
-	    current_templates->start->name,
-	    cpu_arch_name ? cpu_arch_name : default_arch,
-	    cpu_sub_arch_name ? cpu_sub_arch_name : "");
+  if (pass1)
+    {
+      if (supported & CPU_FLAGS_64BIT_MATCH)
+        i.error = unsupported_on_arch;
+      else
+        i.error = unsupported_64bit;
+    }
 
   return NULL;
 }
--- a/gas/testsuite/gas/i386/movs.s
+++ b/gas/testsuite/gas/i386/movs.s
@@ -30,4 +30,10 @@ movs:
 .ifdef x86_64
 	movswq	%ax,%rax
 	movswq	(%rax),%rax
+
+	movsl	%eax,%rax
+	movsl	(%rax),%rax
+
+	movslq	%eax,%rax
+	movslq	(%rax),%rax
 .endif
--- a/gas/testsuite/gas/i386/movx64.l
+++ b/gas/testsuite/gas/i386/movx64.l
@@ -241,6 +241,46 @@
 [ 	]*[1-9][0-9]*[ 	]+movswq	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movswq	%rax, %rcx
 [ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %cl
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %cx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %ecx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 4863C8[ 	]+movsl	%eax, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %rcx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %cl
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %cx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %ecx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 4863C8[ 	]+movslq	%eax, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %rcx
+[ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movzx:
 [ 	]*[1-9][0-9]*[ 	]+movzx	%al, %cl
 [ 	]*[1-9][0-9]*[ 	]+movzx	%ax, %cl
--- a/gas/testsuite/gas/i386/movx64.s
+++ b/gas/testsuite/gas/i386/movx64.s
@@ -241,6 +241,46 @@ movsx:
 	movswq	%eax, %rcx
 	movswq	%rax, %rcx
 
+	movsl	%al, %cl
+	movsl	%ax, %cl
+	movsl	%eax, %cl
+	movsl	%rax, %cl
+
+	movsl	%al, %cx
+	movsl	%ax, %cx
+	movsl	%eax, %cx
+	movsl	%rax, %cx
+
+	movsl	%al, %ecx
+	movsl	%ax, %ecx
+	movsl	%eax, %ecx
+	movsl	%rax, %ecx
+
+	movsl	%al, %rcx
+	movsl	%ax, %rcx
+	movsl	%eax, %rcx
+	movsl	%rax, %rcx
+
+	movslq	%al, %cl
+	movslq	%ax, %cl
+	movslq	%eax, %cl
+	movslq	%rax, %cl
+
+	movslq	%al, %cx
+	movslq	%ax, %cx
+	movslq	%eax, %cx
+	movslq	%rax, %cx
+
+	movslq	%al, %ecx
+	movslq	%ax, %ecx
+	movslq	%eax, %ecx
+	movslq	%rax, %ecx
+
+	movslq	%al, %rcx
+	movslq	%ax, %rcx
+	movslq	%eax, %rcx
+	movslq	%rax, %rcx
+
 movzx:
 	movzx	%al, %cl
 	movzx	%ax, %cl
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -164,9 +164,7 @@ movbe, 0x0f38f0, None, CpuMovbe, D|Modrm
 // Move with sign extend.
 movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
-// "movslq" must not be converted into "movsl" to avoid conflict with the
-// "movsl" string move instruction.
-movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
+movsl, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg64 }
 movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
 movsxd, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-08-26 10:32 ` [PATCH v2 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
@ 2022-08-26 10:32   ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:32 UTC (permalink / raw)
  To: Binutils

PR gas/29524
In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
to prefix parsing. Utilize i.error just like match_template() does.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -236,6 +236,8 @@ enum i386_error
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
     unsupported,
+    unsupported_on_arch,
+    unsupported_64bit,
     invalid_sib_address,
     invalid_vsib_address,
     invalid_vector_register_set,
@@ -4852,6 +4854,14 @@ md_assemble (char *line)
     {
       if (!copy)
 	goto match_error;
+      if (i.error != no_error)
+	{
+	  if (!i.suffix)
+	    goto no_match;
+	  /* No point in trying a 2nd pass - it'll only find the same suffix
+	     again.  */
+	  goto match_error;
+	}
       free (copy);
       return;
     }
@@ -4944,14 +4954,23 @@ md_assemble (char *line)
 
       if (!mnem_suffix)
 	{
+  no_match:
 	  pass1_err = i.error;
 	  pass1_mnem = current_templates->start->name;
 	  line = copy;
 	  copy = NULL;
 	  goto retry;
 	}
-      free (copy);
+
+      /* If a non-/only-64bit template (group) was found in pass 1, and if
+	 _some_ template (group) was found in pass 2, squash pass 1's
+	 error.  */
+      if (pass1_err == unsupported_64bit)
+	pass1_mnem = NULL;
+
   match_error:
+      free (copy);
+
       switch (pass1_mnem ? pass1_err : i.error)
 	{
 	default:
@@ -4984,6 +5003,17 @@ md_assemble (char *line)
 	  as_bad (_("unsupported instruction `%s'"),
 		  pass1_mnem ? pass1_mnem : current_templates->start->name);
 	  return;
+	case unsupported_on_arch:
+	  as_bad (_("`%s' is not supported on `%s%s'"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name,
+		  cpu_arch_name ? cpu_arch_name : default_arch,
+		  cpu_sub_arch_name ? cpu_sub_arch_name : "");
+	  return;
+	case unsupported_64bit:
+	  as_bad (_("`%s' is %s supported in 64-bit mode"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name,
+		  flag_code == CODE_64BIT ? _("not") : _("only"));
+	  return;
 	case invalid_sib_address:
 	  err_msg = _("invalid SIB address");
 	  break;
@@ -5600,16 +5630,13 @@ parse_insn (char *line, char *mnemonic)
 	return l;
     }
 
-  if (!(supported & CPU_FLAGS_64BIT_MATCH))
-    as_bad (flag_code == CODE_64BIT
-	    ? _("`%s' is not supported in 64-bit mode")
-	    : _("`%s' is only supported in 64-bit mode"),
-	    current_templates->start->name);
-  else
-    as_bad (_("`%s' is not supported on `%s%s'"),
-	    current_templates->start->name,
-	    cpu_arch_name ? cpu_arch_name : default_arch,
-	    cpu_sub_arch_name ? cpu_sub_arch_name : "");
+  if (pass1)
+    {
+      if (supported & CPU_FLAGS_64BIT_MATCH)
+        i.error = unsupported_on_arch;
+      else
+        i.error = unsupported_64bit;
+    }
 
   return NULL;
 }
--- a/gas/testsuite/gas/i386/movs.s
+++ b/gas/testsuite/gas/i386/movs.s
@@ -30,4 +30,10 @@ movs:
 .ifdef x86_64
 	movswq	%ax,%rax
 	movswq	(%rax),%rax
+
+	movsl	%eax,%rax
+	movsl	(%rax),%rax
+
+	movslq	%eax,%rax
+	movslq	(%rax),%rax
 .endif
--- a/gas/testsuite/gas/i386/movx64.l
+++ b/gas/testsuite/gas/i386/movx64.l
@@ -241,6 +241,46 @@
 [ 	]*[1-9][0-9]*[ 	]+movswq	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movswq	%rax, %rcx
 [ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %cl
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %cx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %ecx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 4863C8[ 	]+movsl	%eax, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %rcx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %cl
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %cx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %ecx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 4863C8[ 	]+movslq	%eax, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %rcx
+[ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movzx:
 [ 	]*[1-9][0-9]*[ 	]+movzx	%al, %cl
 [ 	]*[1-9][0-9]*[ 	]+movzx	%ax, %cl
--- a/gas/testsuite/gas/i386/movx64.s
+++ b/gas/testsuite/gas/i386/movx64.s
@@ -241,6 +241,46 @@ movsx:
 	movswq	%eax, %rcx
 	movswq	%rax, %rcx
 
+	movsl	%al, %cl
+	movsl	%ax, %cl
+	movsl	%eax, %cl
+	movsl	%rax, %cl
+
+	movsl	%al, %cx
+	movsl	%ax, %cx
+	movsl	%eax, %cx
+	movsl	%rax, %cx
+
+	movsl	%al, %ecx
+	movsl	%ax, %ecx
+	movsl	%eax, %ecx
+	movsl	%rax, %ecx
+
+	movsl	%al, %rcx
+	movsl	%ax, %rcx
+	movsl	%eax, %rcx
+	movsl	%rax, %rcx
+
+	movslq	%al, %cl
+	movslq	%ax, %cl
+	movslq	%eax, %cl
+	movslq	%rax, %cl
+
+	movslq	%al, %cx
+	movslq	%ax, %cx
+	movslq	%eax, %cx
+	movslq	%rax, %cx
+
+	movslq	%al, %ecx
+	movslq	%ax, %ecx
+	movslq	%eax, %ecx
+	movslq	%rax, %ecx
+
+	movslq	%al, %rcx
+	movslq	%ax, %rcx
+	movslq	%eax, %rcx
+	movslq	%rax, %rcx
+
 movzx:
 	movzx	%al, %cl
 	movzx	%ax, %cl
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -164,9 +164,7 @@ movbe, 0x0f38f0, None, CpuMovbe, D|Modrm
 // Move with sign extend.
 movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
-// "movslq" must not be converted into "movsl" to avoid conflict with the
-// "movsl" string move instruction.
-movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
+movsl, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg64 }
 movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
 movsxd, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 5/7] ix86: don't recognize/derive Q suffix in the common case
  2022-08-26 10:28 [PATCH v2 0/7] x86: suffix handling changes Jan Beulich
                   ` (4 preceding siblings ...)
  2022-08-26 10:32 ` [PATCH v2 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
@ 2022-08-26 10:32 ` Jan Beulich
  2022-08-26 10:32   ` Jan Beulich
  2022-08-26 10:33 ` [PATCH v2 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address Jan Beulich
  2022-08-26 10:33 ` [PATCH v2 7/7] x86: move bad-use-of-TLS-reloc check Jan Beulich
  7 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:32 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

Have its use, except where actually legitimate, result in the same "only
supported in 64-bit mode" diagnostic as emitted for other 64-bit only
insns. Also suppress deriving of the suffix in Intel mode except in the
legitimate cases. This in exchange allows dropping the respective code
from match_template().

Oddly enough despite gcc's preference towards FILDQ and FIST{,T}Q we
had no testcase whatsoever for these. Therefore such tests are being
added. Note that the removed line in the x86-64-lfence-load testcase
was redundant with the exact same one a few lines up.
---
With gcc's preference towards FILDQ / FIST{,T}Q I wonder whether the
disassembler wouldn't better emit a Q suffix instead of the LL one.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -4826,7 +4826,7 @@ void
 md_assemble (char *line)
 {
   unsigned int j;
-  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy;
+  char mnemonic[MAX_MNEM_SIZE], mnem_suffix = 0, *copy;
   const char *pass1_mnem = NULL;
   enum i386_error pass1_err = 0;
   const insn_template *t;
@@ -4860,6 +4860,7 @@ md_assemble (char *line)
 	    goto no_match;
 	  /* No point in trying a 2nd pass - it'll only find the same suffix
 	     again.  */
+	  mnem_suffix = i.suffix;
 	  goto match_error;
 	}
       free (copy);
@@ -5010,9 +5011,15 @@ md_assemble (char *line)
 		  cpu_sub_arch_name ? cpu_sub_arch_name : "");
 	  return;
 	case unsupported_64bit:
-	  as_bad (_("`%s' is %s supported in 64-bit mode"),
-		  pass1_mnem ? pass1_mnem : current_templates->start->name,
-		  flag_code == CODE_64BIT ? _("not") : _("only"));
+	  if (ISLOWER (mnem_suffix))
+	    as_bad (_("`%s%c' is %s supported in 64-bit mode"),
+		    pass1_mnem ? pass1_mnem : current_templates->start->name,
+		    mnem_suffix,
+		    flag_code == CODE_64BIT ? _("not") : _("only"));
+	  else
+	    as_bad (_("`%s' is %s supported in 64-bit mode"),
+		    pass1_mnem ? pass1_mnem : current_templates->start->name,
+		    flag_code == CODE_64BIT ? _("not") : _("only"));
 	  return;
 	case invalid_sib_address:
 	  err_msg = _("invalid SIB address");
@@ -5355,6 +5362,23 @@ md_assemble (char *line)
     last_insn.kind = last_insn_other;
 }
 
+/* The Q suffix is generally valid only in 64-bit mode, with very few
+   exceptions: fild, fistp, fisttp, and cmpxchg8b.  Note that for fild
+   and fisttp only one of their two templates is matched below: That's
+   sufficient since other relevant attributes are the same between both
+   respective templates.  */
+static INLINE bool q_suffix_allowed(const insn_template *t)
+{
+  return flag_code == CODE_64BIT
+	 || (t->opcode_modifier.opcodespace == SPACE_BASE
+	     && t->base_opcode == 0xdf
+	     && (t->extension_opcode & 1)) /* fild / fistp / fisttp */
+	 || (t->opcode_modifier.opcodespace == SPACE_0F
+	     && t->base_opcode == 0xc7
+	     && t->opcode_modifier.opcodeprefix == PREFIX_NONE
+	     && t->extension_opcode == 1) /* cmpxchg8b */;
+}
+
 static char *
 parse_insn (char *line, char *mnemonic)
 {
@@ -5626,6 +5650,10 @@ parse_insn (char *line, char *mnemonic)
   for (t = current_templates->start; t < current_templates->end; ++t)
     {
       supported |= cpu_flags_match (t);
+
+      if (i.suffix == QWORD_MNEM_SUFFIX && !q_suffix_allowed (t))
+	supported &= ~CPU_FLAGS_64BIT_MATCH;
+
       if (supported == CPU_FLAGS_PERFECT_MATCH)
 	return l;
     }
@@ -6661,20 +6689,12 @@ match_template (char mnem_suffix)
       for (j = 0; j < MAX_OPERANDS; j++)
 	operand_types[j] = t->operand_types[j];
 
-      /* In general, don't allow
-	 - 64-bit operands outside of 64-bit mode,
-	 - 32-bit operands on pre-386.  */
+      /* In general, don't allow 32-bit operands on pre-386.  */
       specific_error = progress (mnem_suffix ? invalid_instruction_suffix
 					     : operand_size_mismatch);
       j = i.imm_operands + (t->operands > i.imm_operands + 1);
-      if (((i.suffix == QWORD_MNEM_SUFFIX
-	    && flag_code != CODE_64BIT
-	    && !(t->opcode_modifier.opcodespace == SPACE_0F
-		 && t->base_opcode == 0xc7
-		 && t->opcode_modifier.opcodeprefix == PREFIX_NONE
-		 && t->extension_opcode == 1) /* cmpxchg8b */)
-	   || (i.suffix == LONG_MNEM_SUFFIX
-	       && !cpu_arch_flags.bitfield.cpui386))
+      if (i.suffix == LONG_MNEM_SUFFIX
+	  && !cpu_arch_flags.bitfield.cpui386
 	  && (intel_syntax
 	      ? (t->opcode_modifier.mnemonicsize != IGNORESIZE
 		 && !intel_float_operand (t->name))
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -824,7 +824,7 @@ i386_intel_operand (char *operand_string
 		    continue;
 		  break;
 		case QWORD_MNEM_SUFFIX:
-		  if (t->opcode_modifier.no_qsuf)
+		  if (t->opcode_modifier.no_qsuf || !q_suffix_allowed (t))
 		    continue;
 		  break;
 		case SHORT_MNEM_SUFFIX:
--- a/gas/testsuite/gas/i386/opcode.d
+++ b/gas/testsuite/gas/i386/opcode.d
@@ -592,6 +592,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%dx
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%dx
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
  +[a-f0-9]+:	82 c3 01             	add    \$0x1,%bl
  +[a-f0-9]+:	82 f3 01             	xor    \$0x1,%bl
  +[a-f0-9]+:	82 d3 01             	adc    \$0x1,%bl
--- a/gas/testsuite/gas/i386/opcode.s
+++ b/gas/testsuite/gas/i386/opcode.s
@@ -592,6 +592,11 @@ foo:
  cmovpe  0x90909090(%eax),%dx
  cmovpo 0x90909090(%eax),%dx
 
+ fildq  (%eax)
+ fildll (%eax)
+ fistpq (%eax)
+ fistpll (%eax)
+
 	.byte 0x82, 0xc3, 0x01
 	.byte 0x82, 0xf3, 0x01
 	.byte 0x82, 0xd3, 0x01
--- a/gas/testsuite/gas/i386/opcode-intel.d
+++ b/gas/testsuite/gas/i386/opcode-intel.d
@@ -593,6 +593,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnp edx,DWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovp  dx,WORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnp dx,WORD PTR \[eax-0x6f6f6f70\]
+[ 	]*[a-f0-9]+:	df 28                	fild   QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 28                	fild   QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 38                	fistp  QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 38                	fistp  QWORD PTR \[eax\]
  +[a-f0-9]+:	82 c3 01             	add    bl,0x1
  +[a-f0-9]+:	82 f3 01             	xor    bl,0x1
  +[a-f0-9]+:	82 d3 01             	adc    bl,0x1
--- a/gas/testsuite/gas/i386/opcode-suffix.d
+++ b/gas/testsuite/gas/i386/opcode-suffix.d
@@ -593,6 +593,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnpl -0x6f6f6f70\(%eax\),%edx
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovpw -0x6f6f6f70\(%eax\),%dx
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnpw -0x6f6f6f70\(%eax\),%dx
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
  +[a-f0-9]+:	82 c3 01             	addb   \$0x1,%bl
  +[a-f0-9]+:	82 f3 01             	xorb   \$0x1,%bl
  +[a-f0-9]+:	82 d3 01             	adcb   \$0x1,%bl
--- a/gas/testsuite/gas/i386/sse3.d
+++ b/gas/testsuite/gas/i386/sse3.d
@@ -13,29 +13,30 @@ Disassembly of section .text:
   10:	df 88 90 90 90 90 [ 	]*fisttps -0x6f6f6f70\(%eax\)
   16:	db 88 90 90 90 90 [ 	]*fisttpl -0x6f6f6f70\(%eax\)
   1c:	dd 88 90 90 90 90 [ 	]*fisttpll -0x6f6f6f70\(%eax\)
-  22:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%ebp\),%xmm4
-  27:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
-  2b:	f2 0f 7c 37 [ 	]*haddps \(%edi\),%xmm6
-  2f:	f2 0f 7c f8 [ 	]*haddps %xmm0,%xmm7
-  33:	66 0f 7d c1 [ 	]*hsubpd %xmm1,%xmm0
-  37:	66 0f 7d 0a [ 	]*hsubpd \(%edx\),%xmm1
-  3b:	f2 0f 7d d2 [ 	]*hsubps %xmm2,%xmm2
-  3f:	f2 0f 7d 1c 24 [ 	]*hsubps \(%esp\),%xmm3
-  44:	f2 0f f0 2e [ 	]*lddqu  \(%esi\),%xmm5
-  48:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
-  4b:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
-  4e:	f2 0f 12 f7 [ 	]*movddup %xmm7,%xmm6
-  52:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
-  56:	f3 0f 16 01 [ 	]*movshdup \(%ecx\),%xmm0
-  5a:	f3 0f 16 ca [ 	]*movshdup %xmm2,%xmm1
-  5e:	f3 0f 12 13 [ 	]*movsldup \(%ebx\),%xmm2
-  62:	f3 0f 12 dc [ 	]*movsldup %xmm4,%xmm3
-  66:	0f 01 c9 [ 	]*mwait  %eax,%ecx
-  69:	0f 01 c9 [ 	]*mwait  %eax,%ecx
-  6c:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
-  70:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
-  74:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
-  78:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	dd 88 90 90 90 90 [ 	]*fisttpll -0x6f6f6f70\(%eax\)
+[ 	]*[0-9a-f]+:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%ebp\),%xmm4
+[ 	]*[0-9a-f]+:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
+[ 	]*[0-9a-f]+:	f2 0f 7c 37 [ 	]*haddps \(%edi\),%xmm6
+[ 	]*[0-9a-f]+:	f2 0f 7c f8 [ 	]*haddps %xmm0,%xmm7
+[ 	]*[0-9a-f]+:	66 0f 7d c1 [ 	]*hsubpd %xmm1,%xmm0
+[ 	]*[0-9a-f]+:	66 0f 7d 0a [ 	]*hsubpd \(%edx\),%xmm1
+[ 	]*[0-9a-f]+:	f2 0f 7d d2 [ 	]*hsubps %xmm2,%xmm2
+[ 	]*[0-9a-f]+:	f2 0f 7d 1c 24 [ 	]*hsubps \(%esp\),%xmm3
+[ 	]*[0-9a-f]+:	f2 0f f0 2e [ 	]*lddqu  \(%esi\),%xmm5
+[ 	]*[0-9a-f]+:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
+[ 	]*[0-9a-f]+:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
+[ 	]*[0-9a-f]+:	f2 0f 12 f7 [ 	]*movddup %xmm7,%xmm6
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	f3 0f 16 01 [ 	]*movshdup \(%ecx\),%xmm0
+[ 	]*[0-9a-f]+:	f3 0f 16 ca [ 	]*movshdup %xmm2,%xmm1
+[ 	]*[0-9a-f]+:	f3 0f 12 13 [ 	]*movsldup \(%ebx\),%xmm2
+[ 	]*[0-9a-f]+:	f3 0f 12 dc [ 	]*movsldup %xmm4,%xmm3
+[ 	]*[0-9a-f]+:	0f 01 c9 [ 	]*mwait  %eax,%ecx
+[ 	]*[0-9a-f]+:	0f 01 c9 [ 	]*mwait  %eax,%ecx
+[ 	]*[0-9a-f]+:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
+[ 	]*[0-9a-f]+:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
 [ 	]*[0-9a-f]+:	0f 01 c8[ 	]+monitor %eax,%ecx,%edx
 [ 	]*[0-9a-f]+:	67 0f 01 c8[ 	]+monitor %ax,%ecx,%edx
 [ 	]*[0-9a-f]+:	0f 01 c9[ 	]+mwait  %eax,%ecx
--- a/gas/testsuite/gas/i386/sse3.s
+++ b/gas/testsuite/gas/i386/sse3.s
@@ -8,6 +8,7 @@ foo:
 	addsubps	%xmm4,%xmm3
 	fisttps		0x90909090(%eax)
 	fisttpl		0x90909090(%eax)
+	fisttpq		0x90909090(%eax)
 	fisttpll	0x90909090(%eax)
 	haddpd		0x0(%ebp),%xmm4
 	haddpd		%xmm6,%xmm5
--- a/gas/testsuite/gas/i386/sse3-intel.d
+++ b/gas/testsuite/gas/i386/sse3-intel.d
@@ -14,6 +14,7 @@ Disassembly of section .text:
 [ 	]*[0-9a-f]+:	df 88 90 90 90 90[ 	]+fisttp WORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	db 88 90 90 90 90[ 	]+fisttp DWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	dd 88 90 90 90 90[ 	]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
+[ 	]*[0-9a-f]+:	dd 88 90 90 90 90[ 	]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	66 0f 7c 65 00[ 	]+haddpd xmm4,(XMMWORD PTR )?\[ebp(\+0x0)\]
 [ 	]*[0-9a-f]+:	66 0f 7c ee[ 	]+haddpd xmm5,xmm6
 [ 	]*[0-9a-f]+:	f2 0f 7c 37[ 	]+haddps xmm6,(XMMWORD PTR )?\[edi\]
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -44,16 +44,21 @@ Disassembly of section .text:
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	db 55 00             	fistl  0x0\(%rbp\)
  +[a-f0-9]+:	df 55 00             	fists  0x0\(%rbp\)
+ +[a-f0-9]+:	db 5d 00             	fistpl 0x0\(%rbp\)
+ +[a-f0-9]+:	df 5d 00             	fistps 0x0\(%rbp\)
+ +[a-f0-9]+:	df 7d 00             	fistpll 0x0\(%rbp\)
  +[a-f0-9]+:	db 45 00             	fildl  0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	df 45 00             	filds  0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
+ +[a-f0-9]+:	df 6d 00             	fildll 0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	9b dd 75 00          	fsave  0x0\(%rbp\)
  +[a-f0-9]+:	dd 65 00             	frstor 0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
- +[a-f0-9]+:	df 45 00             	filds  0x0\(%rbp\)
- +[a-f0-9]+:	0f ae e8             	lfence
+ +[a-f0-9]+:	db 4d 00             	fisttpl 0x0\(%rbp\)
  +[a-f0-9]+:	df 4d 00             	fisttps 0x0\(%rbp\)
+ +[a-f0-9]+:	dd 4d 00             	fisttpll 0x0\(%rbp\)
  +[a-f0-9]+:	d9 65 00             	fldenv 0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	9b d9 75 00          	fstenv 0x0\(%rbp\)
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -27,12 +27,17 @@ _start:
 	flds (%rbp)
 	fistl (%rbp)
 	fists (%rbp)
+	fistpl (%rbp)
+	fistps (%rbp)
+	fistpq (%rbp)
 	fildl (%rbp)
 	filds (%rbp)
+	fildq (%rbp)
 	fsave (%rbp)
 	frstor (%rbp)
-	filds (%rbp)
+	fisttpl (%rbp)
 	fisttps (%rbp)
+	fisttpq (%rbp)
 	fldenv (%rbp)
 	fstenv (%rbp)
 	fadds  (%rbp)
--- a/gas/testsuite/gas/i386/x86-64-sse3.d
+++ b/gas/testsuite/gas/i386/x86-64-sse3.d
@@ -13,6 +13,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 88 90 90 90 00 [ 	]*fisttps 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	db 88 90 90 90 00 [ 	]*fisttpl 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	dd 88 90 90 90 00 [ 	]*fisttpll 0x909090\(%rax\)
+[ 	]*[a-f0-9]+:	dd 88 90 90 90 00 [ 	]*fisttpll 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%rbp\),%xmm4
 [ 	]*[a-f0-9]+:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
 [ 	]*[a-f0-9]+:	f2 0f 7c 37 [ 	]*haddps \(%rdi\),%xmm6
--- a/gas/testsuite/gas/i386/x86-64-sse3.s
+++ b/gas/testsuite/gas/i386/x86-64-sse3.s
@@ -8,6 +8,7 @@ foo:
 	addsubps	%xmm4,%xmm3
 	fisttps		0x909090(%rax)
 	fisttpl		0x909090(%rax)
+	fisttpq		0x909090(%rax)
 	fisttpll	0x909090(%rax)
 	haddpd		0x0(%rbp),%xmm4
 	haddpd		%xmm6,%xmm5
--- a/gas/testsuite/gas/i386/x86-64-sse3-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-sse3-intel.d
@@ -14,6 +14,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 88 90 90 90 00[ 	]+fisttp WORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	db 88 90 90 90 00[ 	]+fisttp DWORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	dd 88 90 90 90 00[ 	]+fisttp QWORD PTR \[rax\+0x909090\]
+[ 	]*[a-f0-9]+:	dd 88 90 90 90 00[ 	]+fisttp QWORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	66 0f 7c 65 00[ 	]+haddpd xmm4,(XMMWORD PTR )?\[rbp(\+0x0)\]
 [ 	]*[a-f0-9]+:	66 0f 7c ee[ 	]+haddpd xmm5,xmm6
 [ 	]*[a-f0-9]+:	f2 0f 7c 37[ 	]+haddps xmm6,(XMMWORD PTR )?\[rdi\]


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 5/7] ix86: don't recognize/derive Q suffix in the common case
  2022-08-26 10:32 ` [PATCH v2 5/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
@ 2022-08-26 10:32   ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:32 UTC (permalink / raw)
  To: Binutils

Have its use, except where actually legitimate, result in the same "only
supported in 64-bit mode" diagnostic as emitted for other 64-bit only
insns. Also suppress deriving of the suffix in Intel mode except in the
legitimate cases. This in exchange allows dropping the respective code
from match_template().

Oddly enough despite gcc's preference towards FILDQ and FIST{,T}Q we
had no testcase whatsoever for these. Therefore such tests are being
added. Note that the removed line in the x86-64-lfence-load testcase
was redundant with the exact same one a few lines up.
---
With gcc's preference towards FILDQ / FIST{,T}Q I wonder whether the
disassembler wouldn't better emit a Q suffix instead of the LL one.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -4826,7 +4826,7 @@ void
 md_assemble (char *line)
 {
   unsigned int j;
-  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy;
+  char mnemonic[MAX_MNEM_SIZE], mnem_suffix = 0, *copy;
   const char *pass1_mnem = NULL;
   enum i386_error pass1_err = 0;
   const insn_template *t;
@@ -4860,6 +4860,7 @@ md_assemble (char *line)
 	    goto no_match;
 	  /* No point in trying a 2nd pass - it'll only find the same suffix
 	     again.  */
+	  mnem_suffix = i.suffix;
 	  goto match_error;
 	}
       free (copy);
@@ -5010,9 +5011,15 @@ md_assemble (char *line)
 		  cpu_sub_arch_name ? cpu_sub_arch_name : "");
 	  return;
 	case unsupported_64bit:
-	  as_bad (_("`%s' is %s supported in 64-bit mode"),
-		  pass1_mnem ? pass1_mnem : current_templates->start->name,
-		  flag_code == CODE_64BIT ? _("not") : _("only"));
+	  if (ISLOWER (mnem_suffix))
+	    as_bad (_("`%s%c' is %s supported in 64-bit mode"),
+		    pass1_mnem ? pass1_mnem : current_templates->start->name,
+		    mnem_suffix,
+		    flag_code == CODE_64BIT ? _("not") : _("only"));
+	  else
+	    as_bad (_("`%s' is %s supported in 64-bit mode"),
+		    pass1_mnem ? pass1_mnem : current_templates->start->name,
+		    flag_code == CODE_64BIT ? _("not") : _("only"));
 	  return;
 	case invalid_sib_address:
 	  err_msg = _("invalid SIB address");
@@ -5355,6 +5362,23 @@ md_assemble (char *line)
     last_insn.kind = last_insn_other;
 }
 
+/* The Q suffix is generally valid only in 64-bit mode, with very few
+   exceptions: fild, fistp, fisttp, and cmpxchg8b.  Note that for fild
+   and fisttp only one of their two templates is matched below: That's
+   sufficient since other relevant attributes are the same between both
+   respective templates.  */
+static INLINE bool q_suffix_allowed(const insn_template *t)
+{
+  return flag_code == CODE_64BIT
+	 || (t->opcode_modifier.opcodespace == SPACE_BASE
+	     && t->base_opcode == 0xdf
+	     && (t->extension_opcode & 1)) /* fild / fistp / fisttp */
+	 || (t->opcode_modifier.opcodespace == SPACE_0F
+	     && t->base_opcode == 0xc7
+	     && t->opcode_modifier.opcodeprefix == PREFIX_NONE
+	     && t->extension_opcode == 1) /* cmpxchg8b */;
+}
+
 static char *
 parse_insn (char *line, char *mnemonic)
 {
@@ -5626,6 +5650,10 @@ parse_insn (char *line, char *mnemonic)
   for (t = current_templates->start; t < current_templates->end; ++t)
     {
       supported |= cpu_flags_match (t);
+
+      if (i.suffix == QWORD_MNEM_SUFFIX && !q_suffix_allowed (t))
+	supported &= ~CPU_FLAGS_64BIT_MATCH;
+
       if (supported == CPU_FLAGS_PERFECT_MATCH)
 	return l;
     }
@@ -6661,20 +6689,12 @@ match_template (char mnem_suffix)
       for (j = 0; j < MAX_OPERANDS; j++)
 	operand_types[j] = t->operand_types[j];
 
-      /* In general, don't allow
-	 - 64-bit operands outside of 64-bit mode,
-	 - 32-bit operands on pre-386.  */
+      /* In general, don't allow 32-bit operands on pre-386.  */
       specific_error = progress (mnem_suffix ? invalid_instruction_suffix
 					     : operand_size_mismatch);
       j = i.imm_operands + (t->operands > i.imm_operands + 1);
-      if (((i.suffix == QWORD_MNEM_SUFFIX
-	    && flag_code != CODE_64BIT
-	    && !(t->opcode_modifier.opcodespace == SPACE_0F
-		 && t->base_opcode == 0xc7
-		 && t->opcode_modifier.opcodeprefix == PREFIX_NONE
-		 && t->extension_opcode == 1) /* cmpxchg8b */)
-	   || (i.suffix == LONG_MNEM_SUFFIX
-	       && !cpu_arch_flags.bitfield.cpui386))
+      if (i.suffix == LONG_MNEM_SUFFIX
+	  && !cpu_arch_flags.bitfield.cpui386
 	  && (intel_syntax
 	      ? (t->opcode_modifier.mnemonicsize != IGNORESIZE
 		 && !intel_float_operand (t->name))
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -824,7 +824,7 @@ i386_intel_operand (char *operand_string
 		    continue;
 		  break;
 		case QWORD_MNEM_SUFFIX:
-		  if (t->opcode_modifier.no_qsuf)
+		  if (t->opcode_modifier.no_qsuf || !q_suffix_allowed (t))
 		    continue;
 		  break;
 		case SHORT_MNEM_SUFFIX:
--- a/gas/testsuite/gas/i386/opcode.d
+++ b/gas/testsuite/gas/i386/opcode.d
@@ -592,6 +592,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%dx
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%dx
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
  +[a-f0-9]+:	82 c3 01             	add    \$0x1,%bl
  +[a-f0-9]+:	82 f3 01             	xor    \$0x1,%bl
  +[a-f0-9]+:	82 d3 01             	adc    \$0x1,%bl
--- a/gas/testsuite/gas/i386/opcode.s
+++ b/gas/testsuite/gas/i386/opcode.s
@@ -592,6 +592,11 @@ foo:
  cmovpe  0x90909090(%eax),%dx
  cmovpo 0x90909090(%eax),%dx
 
+ fildq  (%eax)
+ fildll (%eax)
+ fistpq (%eax)
+ fistpll (%eax)
+
 	.byte 0x82, 0xc3, 0x01
 	.byte 0x82, 0xf3, 0x01
 	.byte 0x82, 0xd3, 0x01
--- a/gas/testsuite/gas/i386/opcode-intel.d
+++ b/gas/testsuite/gas/i386/opcode-intel.d
@@ -593,6 +593,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnp edx,DWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovp  dx,WORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnp dx,WORD PTR \[eax-0x6f6f6f70\]
+[ 	]*[a-f0-9]+:	df 28                	fild   QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 28                	fild   QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 38                	fistp  QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 38                	fistp  QWORD PTR \[eax\]
  +[a-f0-9]+:	82 c3 01             	add    bl,0x1
  +[a-f0-9]+:	82 f3 01             	xor    bl,0x1
  +[a-f0-9]+:	82 d3 01             	adc    bl,0x1
--- a/gas/testsuite/gas/i386/opcode-suffix.d
+++ b/gas/testsuite/gas/i386/opcode-suffix.d
@@ -593,6 +593,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnpl -0x6f6f6f70\(%eax\),%edx
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovpw -0x6f6f6f70\(%eax\),%dx
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnpw -0x6f6f6f70\(%eax\),%dx
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
  +[a-f0-9]+:	82 c3 01             	addb   \$0x1,%bl
  +[a-f0-9]+:	82 f3 01             	xorb   \$0x1,%bl
  +[a-f0-9]+:	82 d3 01             	adcb   \$0x1,%bl
--- a/gas/testsuite/gas/i386/sse3.d
+++ b/gas/testsuite/gas/i386/sse3.d
@@ -13,29 +13,30 @@ Disassembly of section .text:
   10:	df 88 90 90 90 90 [ 	]*fisttps -0x6f6f6f70\(%eax\)
   16:	db 88 90 90 90 90 [ 	]*fisttpl -0x6f6f6f70\(%eax\)
   1c:	dd 88 90 90 90 90 [ 	]*fisttpll -0x6f6f6f70\(%eax\)
-  22:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%ebp\),%xmm4
-  27:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
-  2b:	f2 0f 7c 37 [ 	]*haddps \(%edi\),%xmm6
-  2f:	f2 0f 7c f8 [ 	]*haddps %xmm0,%xmm7
-  33:	66 0f 7d c1 [ 	]*hsubpd %xmm1,%xmm0
-  37:	66 0f 7d 0a [ 	]*hsubpd \(%edx\),%xmm1
-  3b:	f2 0f 7d d2 [ 	]*hsubps %xmm2,%xmm2
-  3f:	f2 0f 7d 1c 24 [ 	]*hsubps \(%esp\),%xmm3
-  44:	f2 0f f0 2e [ 	]*lddqu  \(%esi\),%xmm5
-  48:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
-  4b:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
-  4e:	f2 0f 12 f7 [ 	]*movddup %xmm7,%xmm6
-  52:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
-  56:	f3 0f 16 01 [ 	]*movshdup \(%ecx\),%xmm0
-  5a:	f3 0f 16 ca [ 	]*movshdup %xmm2,%xmm1
-  5e:	f3 0f 12 13 [ 	]*movsldup \(%ebx\),%xmm2
-  62:	f3 0f 12 dc [ 	]*movsldup %xmm4,%xmm3
-  66:	0f 01 c9 [ 	]*mwait  %eax,%ecx
-  69:	0f 01 c9 [ 	]*mwait  %eax,%ecx
-  6c:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
-  70:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
-  74:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
-  78:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	dd 88 90 90 90 90 [ 	]*fisttpll -0x6f6f6f70\(%eax\)
+[ 	]*[0-9a-f]+:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%ebp\),%xmm4
+[ 	]*[0-9a-f]+:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
+[ 	]*[0-9a-f]+:	f2 0f 7c 37 [ 	]*haddps \(%edi\),%xmm6
+[ 	]*[0-9a-f]+:	f2 0f 7c f8 [ 	]*haddps %xmm0,%xmm7
+[ 	]*[0-9a-f]+:	66 0f 7d c1 [ 	]*hsubpd %xmm1,%xmm0
+[ 	]*[0-9a-f]+:	66 0f 7d 0a [ 	]*hsubpd \(%edx\),%xmm1
+[ 	]*[0-9a-f]+:	f2 0f 7d d2 [ 	]*hsubps %xmm2,%xmm2
+[ 	]*[0-9a-f]+:	f2 0f 7d 1c 24 [ 	]*hsubps \(%esp\),%xmm3
+[ 	]*[0-9a-f]+:	f2 0f f0 2e [ 	]*lddqu  \(%esi\),%xmm5
+[ 	]*[0-9a-f]+:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
+[ 	]*[0-9a-f]+:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
+[ 	]*[0-9a-f]+:	f2 0f 12 f7 [ 	]*movddup %xmm7,%xmm6
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	f3 0f 16 01 [ 	]*movshdup \(%ecx\),%xmm0
+[ 	]*[0-9a-f]+:	f3 0f 16 ca [ 	]*movshdup %xmm2,%xmm1
+[ 	]*[0-9a-f]+:	f3 0f 12 13 [ 	]*movsldup \(%ebx\),%xmm2
+[ 	]*[0-9a-f]+:	f3 0f 12 dc [ 	]*movsldup %xmm4,%xmm3
+[ 	]*[0-9a-f]+:	0f 01 c9 [ 	]*mwait  %eax,%ecx
+[ 	]*[0-9a-f]+:	0f 01 c9 [ 	]*mwait  %eax,%ecx
+[ 	]*[0-9a-f]+:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
+[ 	]*[0-9a-f]+:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
 [ 	]*[0-9a-f]+:	0f 01 c8[ 	]+monitor %eax,%ecx,%edx
 [ 	]*[0-9a-f]+:	67 0f 01 c8[ 	]+monitor %ax,%ecx,%edx
 [ 	]*[0-9a-f]+:	0f 01 c9[ 	]+mwait  %eax,%ecx
--- a/gas/testsuite/gas/i386/sse3.s
+++ b/gas/testsuite/gas/i386/sse3.s
@@ -8,6 +8,7 @@ foo:
 	addsubps	%xmm4,%xmm3
 	fisttps		0x90909090(%eax)
 	fisttpl		0x90909090(%eax)
+	fisttpq		0x90909090(%eax)
 	fisttpll	0x90909090(%eax)
 	haddpd		0x0(%ebp),%xmm4
 	haddpd		%xmm6,%xmm5
--- a/gas/testsuite/gas/i386/sse3-intel.d
+++ b/gas/testsuite/gas/i386/sse3-intel.d
@@ -14,6 +14,7 @@ Disassembly of section .text:
 [ 	]*[0-9a-f]+:	df 88 90 90 90 90[ 	]+fisttp WORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	db 88 90 90 90 90[ 	]+fisttp DWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	dd 88 90 90 90 90[ 	]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
+[ 	]*[0-9a-f]+:	dd 88 90 90 90 90[ 	]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	66 0f 7c 65 00[ 	]+haddpd xmm4,(XMMWORD PTR )?\[ebp(\+0x0)\]
 [ 	]*[0-9a-f]+:	66 0f 7c ee[ 	]+haddpd xmm5,xmm6
 [ 	]*[0-9a-f]+:	f2 0f 7c 37[ 	]+haddps xmm6,(XMMWORD PTR )?\[edi\]
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -44,16 +44,21 @@ Disassembly of section .text:
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	db 55 00             	fistl  0x0\(%rbp\)
  +[a-f0-9]+:	df 55 00             	fists  0x0\(%rbp\)
+ +[a-f0-9]+:	db 5d 00             	fistpl 0x0\(%rbp\)
+ +[a-f0-9]+:	df 5d 00             	fistps 0x0\(%rbp\)
+ +[a-f0-9]+:	df 7d 00             	fistpll 0x0\(%rbp\)
  +[a-f0-9]+:	db 45 00             	fildl  0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	df 45 00             	filds  0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
+ +[a-f0-9]+:	df 6d 00             	fildll 0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	9b dd 75 00          	fsave  0x0\(%rbp\)
  +[a-f0-9]+:	dd 65 00             	frstor 0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
- +[a-f0-9]+:	df 45 00             	filds  0x0\(%rbp\)
- +[a-f0-9]+:	0f ae e8             	lfence
+ +[a-f0-9]+:	db 4d 00             	fisttpl 0x0\(%rbp\)
  +[a-f0-9]+:	df 4d 00             	fisttps 0x0\(%rbp\)
+ +[a-f0-9]+:	dd 4d 00             	fisttpll 0x0\(%rbp\)
  +[a-f0-9]+:	d9 65 00             	fldenv 0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	9b d9 75 00          	fstenv 0x0\(%rbp\)
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -27,12 +27,17 @@ _start:
 	flds (%rbp)
 	fistl (%rbp)
 	fists (%rbp)
+	fistpl (%rbp)
+	fistps (%rbp)
+	fistpq (%rbp)
 	fildl (%rbp)
 	filds (%rbp)
+	fildq (%rbp)
 	fsave (%rbp)
 	frstor (%rbp)
-	filds (%rbp)
+	fisttpl (%rbp)
 	fisttps (%rbp)
+	fisttpq (%rbp)
 	fldenv (%rbp)
 	fstenv (%rbp)
 	fadds  (%rbp)
--- a/gas/testsuite/gas/i386/x86-64-sse3.d
+++ b/gas/testsuite/gas/i386/x86-64-sse3.d
@@ -13,6 +13,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 88 90 90 90 00 [ 	]*fisttps 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	db 88 90 90 90 00 [ 	]*fisttpl 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	dd 88 90 90 90 00 [ 	]*fisttpll 0x909090\(%rax\)
+[ 	]*[a-f0-9]+:	dd 88 90 90 90 00 [ 	]*fisttpll 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%rbp\),%xmm4
 [ 	]*[a-f0-9]+:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
 [ 	]*[a-f0-9]+:	f2 0f 7c 37 [ 	]*haddps \(%rdi\),%xmm6
--- a/gas/testsuite/gas/i386/x86-64-sse3.s
+++ b/gas/testsuite/gas/i386/x86-64-sse3.s
@@ -8,6 +8,7 @@ foo:
 	addsubps	%xmm4,%xmm3
 	fisttps		0x909090(%rax)
 	fisttpl		0x909090(%rax)
+	fisttpq		0x909090(%rax)
 	fisttpll	0x909090(%rax)
 	haddpd		0x0(%rbp),%xmm4
 	haddpd		%xmm6,%xmm5
--- a/gas/testsuite/gas/i386/x86-64-sse3-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-sse3-intel.d
@@ -14,6 +14,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 88 90 90 90 00[ 	]+fisttp WORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	db 88 90 90 90 00[ 	]+fisttp DWORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	dd 88 90 90 90 00[ 	]+fisttp QWORD PTR \[rax\+0x909090\]
+[ 	]*[a-f0-9]+:	dd 88 90 90 90 00[ 	]+fisttp QWORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	66 0f 7c 65 00[ 	]+haddpd xmm4,(XMMWORD PTR )?\[rbp(\+0x0)\]
 [ 	]*[a-f0-9]+:	66 0f 7c ee[ 	]+haddpd xmm5,xmm6
 [ 	]*[a-f0-9]+:	f2 0f 7c 37[ 	]+haddps xmm6,(XMMWORD PTR )?\[rdi\]


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address
  2022-08-26 10:28 [PATCH v2 0/7] x86: suffix handling changes Jan Beulich
                   ` (5 preceding siblings ...)
  2022-08-26 10:32 ` [PATCH v2 5/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
@ 2022-08-26 10:33 ` Jan Beulich
  2022-08-26 10:33   ` Jan Beulich
  2022-08-26 10:33 ` [PATCH v2 7/7] x86: move bad-use-of-TLS-reloc check Jan Beulich
  7 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:33 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

In commit 1212781b35c9 ("ix86: allow HLE store of accumulator to
absolute address") I was wrong to exclude 64-bit code. Dropping the
check also leads to better diagnostics in 64-bit code ("MOV", after
all, isn't invalid with "XRELEASE").

While there also limit the amount of further checks done: The operand
type checks that were there were effectively redundant with other ones
anyway, plus it's quite fine to also have "mov <disp>, %eax" look for
the next MOV template (in fact again also improving diagnostics).
---
v2: New.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -6817,12 +6817,9 @@ match_template (char mnem_suffix)
 	    continue;
 	  /* xrelease mov %eax, <disp> is another special case. It must not
 	     match the accumulator-only encoding of mov.  */
-	  if (flag_code != CODE_64BIT
-	      && i.hle_prefix
+	  if (i.hle_prefix
 	      && t->base_opcode == 0xa0
-	      && t->opcode_modifier.opcodespace == SPACE_BASE
-	      && i.types[0].bitfield.instance == Accum
-	      && (i.flags[1] & Operand_Mem))
+	      && t->opcode_modifier.opcodespace == SPACE_BASE)
 	    continue;
 	  /* Fall through.  */
 
--- a/gas/testsuite/gas/i386/x86-64-hle-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-hle-intel.d
@@ -425,6 +425,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 20 01          	lock xacquire and BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f0 f3 20 01          	lock xrelease and BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f3 88 01             	xrelease mov BYTE PTR \[rcx\],al
+[ 	]*[a-f0-9]+:	f3 88 04 25 78 56 34 12 	xrelease mov BYTE PTR (ds:)?0x12345678,al
+[ 	]*[a-f0-9]+:	67 f3 88 04 25 21 43 65 87 	xrelease mov BYTE PTR \[eiz\*1\+0x87654321\],al
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f3 f0 08 01          	xrelease lock or BYTE PTR \[rcx\],al
@@ -476,6 +478,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 66 21 01       	lock xacquire and WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	f0 f3 66 21 01       	lock xrelease and WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	66 f3 89 01          	xrelease mov WORD PTR \[rcx\],ax
+[ 	]*[a-f0-9]+:	66 f3 89 04 25 78 56 34 12 	xrelease mov WORD PTR (ds:)?0x12345678,ax
+[ 	]*[a-f0-9]+:	67 66 f3 89 04 25 21 43 65 87 	xrelease mov WORD PTR \[eiz\*1\+0x87654321\],ax
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	66 f3 f0 09 01       	xrelease lock or WORD PTR \[rcx\],ax
@@ -527,6 +531,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 21 01          	lock xacquire and DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f0 f3 21 01          	lock xrelease and DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f3 89 01             	xrelease mov DWORD PTR \[rcx\],eax
+[ 	]*[a-f0-9]+:	f3 89 04 25 78 56 34 12 	xrelease mov DWORD PTR (ds:)?0x12345678,eax
+[ 	]*[a-f0-9]+:	67 f3 89 04 25 21 43 65 87 	xrelease mov DWORD PTR \[eiz\*1\+0x87654321\],eax
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f3 f0 09 01          	xrelease lock or DWORD PTR \[rcx\],eax
@@ -578,6 +584,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 48 21 01       	lock xacquire and QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f0 f3 48 21 01       	lock xrelease and QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f3 48 89 01          	xrelease mov QWORD PTR \[rcx\],rax
+[ 	]*[a-f0-9]+:	f3 48 89 04 25 78 56 34 12 	xrelease mov QWORD PTR (ds:)?0x12345678,rax
+[ 	]*[a-f0-9]+:	67 f3 48 89 04 25 21 43 65 87 	xrelease mov QWORD PTR \[eiz\*1\+0x87654321\],rax
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f3 f0 48 09 01       	xrelease lock or QWORD PTR \[rcx\],rax
--- a/gas/testsuite/gas/i386/x86-64-hle.d
+++ b/gas/testsuite/gas/i386/x86-64-hle.d
@@ -424,6 +424,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 20 01          	lock xacquire and %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 20 01          	lock xrelease and %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 88 01             	xrelease mov %al,\(%rcx\)
+[ 	]*[a-f0-9]+:	f3 88 04 25 78 56 34 12 	xrelease mov %al,0x12345678
+[ 	]*[a-f0-9]+:	67 f3 88 04 25 21 43 65 87 	xrelease mov %al,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 f0 08 01          	xrelease lock or %al,\(%rcx\)
@@ -475,6 +477,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 66 21 01       	lock xacquire and %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 66 21 01       	lock xrelease and %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	66 f3 89 01          	xrelease mov %ax,\(%rcx\)
+[ 	]*[a-f0-9]+:	66 f3 89 04 25 78 56 34 12 	xrelease mov %ax,0x12345678
+[ 	]*[a-f0-9]+:	67 66 f3 89 04 25 21 43 65 87 	xrelease mov %ax,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	66 f3 f0 09 01       	xrelease lock or %ax,\(%rcx\)
@@ -526,6 +530,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 21 01          	lock xacquire and %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 21 01          	lock xrelease and %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 89 01             	xrelease mov %eax,\(%rcx\)
+[ 	]*[a-f0-9]+:	f3 89 04 25 78 56 34 12 	xrelease mov %eax,0x12345678
+[ 	]*[a-f0-9]+:	67 f3 89 04 25 21 43 65 87 	xrelease mov %eax,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 f0 09 01          	xrelease lock or %eax,\(%rcx\)
@@ -577,6 +583,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 48 21 01       	lock xacquire and %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 48 21 01       	lock xrelease and %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 48 89 01          	xrelease mov %rax,\(%rcx\)
+[ 	]*[a-f0-9]+:	f3 48 89 04 25 78 56 34 12 	xrelease mov %rax,0x12345678
+[ 	]*[a-f0-9]+:	67 f3 48 89 04 25 21 43 65 87 	xrelease mov %rax,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 f0 48 09 01       	xrelease lock or %rax,\(%rcx\)
--- a/gas/testsuite/gas/i386/x86-64-hle.s
+++ b/gas/testsuite/gas/i386/x86-64-hle.s
@@ -442,6 +442,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andb %al,(%rcx)
 	.byte 0xf0; .byte 0xf3; andb %al,(%rcx)
 	xrelease movb %al,(%rcx)
+	xrelease movb %al,0x12345678
+	xrelease addr32 movb %al,0x87654321
 	xacquire lock orb %al,(%rcx)
 	lock xacquire orb %al,(%rcx)
 	xrelease lock orb %al,(%rcx)
@@ -496,6 +498,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andw %ax,(%rcx)
 	.byte 0xf0; .byte 0xf3; andw %ax,(%rcx)
 	xrelease movw %ax,(%rcx)
+	xrelease movw %ax,0x12345678
+	xrelease addr32 movw %ax,0x87654321
 	xacquire lock orw %ax,(%rcx)
 	lock xacquire orw %ax,(%rcx)
 	xrelease lock orw %ax,(%rcx)
@@ -550,6 +554,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andl %eax,(%rcx)
 	.byte 0xf0; .byte 0xf3; andl %eax,(%rcx)
 	xrelease movl %eax,(%rcx)
+	xrelease movl %eax,0x12345678
+	xrelease addr32 movl %eax,0x87654321
 	xacquire lock orl %eax,(%rcx)
 	lock xacquire orl %eax,(%rcx)
 	xrelease lock orl %eax,(%rcx)
@@ -604,6 +610,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andq %rax,(%rcx)
 	.byte 0xf0; .byte 0xf3; andq %rax,(%rcx)
 	xrelease movq %rax,(%rcx)
+	xrelease movq %rax,0x12345678
+	xrelease addr32 movq %rax,0x87654321
 	xacquire lock orq %rax,(%rcx)
 	lock xacquire orq %rax,(%rcx)
 	xrelease lock orq %rax,(%rcx)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address
  2022-08-26 10:33 ` [PATCH v2 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address Jan Beulich
@ 2022-08-26 10:33   ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:33 UTC (permalink / raw)
  To: Binutils

In commit 1212781b35c9 ("ix86: allow HLE store of accumulator to
absolute address") I was wrong to exclude 64-bit code. Dropping the
check also leads to better diagnostics in 64-bit code ("MOV", after
all, isn't invalid with "XRELEASE").

While there also limit the amount of further checks done: The operand
type checks that were there were effectively redundant with other ones
anyway, plus it's quite fine to also have "mov <disp>, %eax" look for
the next MOV template (in fact again also improving diagnostics).
---
v2: New.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -6817,12 +6817,9 @@ match_template (char mnem_suffix)
 	    continue;
 	  /* xrelease mov %eax, <disp> is another special case. It must not
 	     match the accumulator-only encoding of mov.  */
-	  if (flag_code != CODE_64BIT
-	      && i.hle_prefix
+	  if (i.hle_prefix
 	      && t->base_opcode == 0xa0
-	      && t->opcode_modifier.opcodespace == SPACE_BASE
-	      && i.types[0].bitfield.instance == Accum
-	      && (i.flags[1] & Operand_Mem))
+	      && t->opcode_modifier.opcodespace == SPACE_BASE)
 	    continue;
 	  /* Fall through.  */
 
--- a/gas/testsuite/gas/i386/x86-64-hle-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-hle-intel.d
@@ -425,6 +425,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 20 01          	lock xacquire and BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f0 f3 20 01          	lock xrelease and BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f3 88 01             	xrelease mov BYTE PTR \[rcx\],al
+[ 	]*[a-f0-9]+:	f3 88 04 25 78 56 34 12 	xrelease mov BYTE PTR (ds:)?0x12345678,al
+[ 	]*[a-f0-9]+:	67 f3 88 04 25 21 43 65 87 	xrelease mov BYTE PTR \[eiz\*1\+0x87654321\],al
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f3 f0 08 01          	xrelease lock or BYTE PTR \[rcx\],al
@@ -476,6 +478,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 66 21 01       	lock xacquire and WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	f0 f3 66 21 01       	lock xrelease and WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	66 f3 89 01          	xrelease mov WORD PTR \[rcx\],ax
+[ 	]*[a-f0-9]+:	66 f3 89 04 25 78 56 34 12 	xrelease mov WORD PTR (ds:)?0x12345678,ax
+[ 	]*[a-f0-9]+:	67 66 f3 89 04 25 21 43 65 87 	xrelease mov WORD PTR \[eiz\*1\+0x87654321\],ax
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	66 f3 f0 09 01       	xrelease lock or WORD PTR \[rcx\],ax
@@ -527,6 +531,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 21 01          	lock xacquire and DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f0 f3 21 01          	lock xrelease and DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f3 89 01             	xrelease mov DWORD PTR \[rcx\],eax
+[ 	]*[a-f0-9]+:	f3 89 04 25 78 56 34 12 	xrelease mov DWORD PTR (ds:)?0x12345678,eax
+[ 	]*[a-f0-9]+:	67 f3 89 04 25 21 43 65 87 	xrelease mov DWORD PTR \[eiz\*1\+0x87654321\],eax
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f3 f0 09 01          	xrelease lock or DWORD PTR \[rcx\],eax
@@ -578,6 +584,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 48 21 01       	lock xacquire and QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f0 f3 48 21 01       	lock xrelease and QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f3 48 89 01          	xrelease mov QWORD PTR \[rcx\],rax
+[ 	]*[a-f0-9]+:	f3 48 89 04 25 78 56 34 12 	xrelease mov QWORD PTR (ds:)?0x12345678,rax
+[ 	]*[a-f0-9]+:	67 f3 48 89 04 25 21 43 65 87 	xrelease mov QWORD PTR \[eiz\*1\+0x87654321\],rax
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f3 f0 48 09 01       	xrelease lock or QWORD PTR \[rcx\],rax
--- a/gas/testsuite/gas/i386/x86-64-hle.d
+++ b/gas/testsuite/gas/i386/x86-64-hle.d
@@ -424,6 +424,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 20 01          	lock xacquire and %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 20 01          	lock xrelease and %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 88 01             	xrelease mov %al,\(%rcx\)
+[ 	]*[a-f0-9]+:	f3 88 04 25 78 56 34 12 	xrelease mov %al,0x12345678
+[ 	]*[a-f0-9]+:	67 f3 88 04 25 21 43 65 87 	xrelease mov %al,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 f0 08 01          	xrelease lock or %al,\(%rcx\)
@@ -475,6 +477,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 66 21 01       	lock xacquire and %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 66 21 01       	lock xrelease and %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	66 f3 89 01          	xrelease mov %ax,\(%rcx\)
+[ 	]*[a-f0-9]+:	66 f3 89 04 25 78 56 34 12 	xrelease mov %ax,0x12345678
+[ 	]*[a-f0-9]+:	67 66 f3 89 04 25 21 43 65 87 	xrelease mov %ax,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	66 f3 f0 09 01       	xrelease lock or %ax,\(%rcx\)
@@ -526,6 +530,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 21 01          	lock xacquire and %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 21 01          	lock xrelease and %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 89 01             	xrelease mov %eax,\(%rcx\)
+[ 	]*[a-f0-9]+:	f3 89 04 25 78 56 34 12 	xrelease mov %eax,0x12345678
+[ 	]*[a-f0-9]+:	67 f3 89 04 25 21 43 65 87 	xrelease mov %eax,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 f0 09 01          	xrelease lock or %eax,\(%rcx\)
@@ -577,6 +583,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 48 21 01       	lock xacquire and %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 48 21 01       	lock xrelease and %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 48 89 01          	xrelease mov %rax,\(%rcx\)
+[ 	]*[a-f0-9]+:	f3 48 89 04 25 78 56 34 12 	xrelease mov %rax,0x12345678
+[ 	]*[a-f0-9]+:	67 f3 48 89 04 25 21 43 65 87 	xrelease mov %rax,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 f0 48 09 01       	xrelease lock or %rax,\(%rcx\)
--- a/gas/testsuite/gas/i386/x86-64-hle.s
+++ b/gas/testsuite/gas/i386/x86-64-hle.s
@@ -442,6 +442,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andb %al,(%rcx)
 	.byte 0xf0; .byte 0xf3; andb %al,(%rcx)
 	xrelease movb %al,(%rcx)
+	xrelease movb %al,0x12345678
+	xrelease addr32 movb %al,0x87654321
 	xacquire lock orb %al,(%rcx)
 	lock xacquire orb %al,(%rcx)
 	xrelease lock orb %al,(%rcx)
@@ -496,6 +498,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andw %ax,(%rcx)
 	.byte 0xf0; .byte 0xf3; andw %ax,(%rcx)
 	xrelease movw %ax,(%rcx)
+	xrelease movw %ax,0x12345678
+	xrelease addr32 movw %ax,0x87654321
 	xacquire lock orw %ax,(%rcx)
 	lock xacquire orw %ax,(%rcx)
 	xrelease lock orw %ax,(%rcx)
@@ -550,6 +554,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andl %eax,(%rcx)
 	.byte 0xf0; .byte 0xf3; andl %eax,(%rcx)
 	xrelease movl %eax,(%rcx)
+	xrelease movl %eax,0x12345678
+	xrelease addr32 movl %eax,0x87654321
 	xacquire lock orl %eax,(%rcx)
 	lock xacquire orl %eax,(%rcx)
 	xrelease lock orl %eax,(%rcx)
@@ -604,6 +610,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andq %rax,(%rcx)
 	.byte 0xf0; .byte 0xf3; andq %rax,(%rcx)
 	xrelease movq %rax,(%rcx)
+	xrelease movq %rax,0x12345678
+	xrelease addr32 movq %rax,0x87654321
 	xacquire lock orq %rax,(%rcx)
 	lock xacquire orq %rax,(%rcx)
 	xrelease lock orq %rax,(%rcx)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 7/7] x86: move bad-use-of-TLS-reloc check
  2022-08-26 10:28 [PATCH v2 0/7] x86: suffix handling changes Jan Beulich
                   ` (6 preceding siblings ...)
  2022-08-26 10:33 ` [PATCH v2 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address Jan Beulich
@ 2022-08-26 10:33 ` Jan Beulich
  2022-08-26 10:33   ` Jan Beulich
  7 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:33 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

Having it in match_template() is unhelpful. Neither does looking for the
next template to possibly match make any sense in that case, nor is the
resulting diagnostic making clear what the problem is.

While moving the check, also generalize it to include all SIMD and VEX-
encoded insns. This way an existing conditional can be re-used in
md_assemble(). Note though that this still leaves a lof of insns which
are also wrong to use with these relocations.

Further fold the remaining check (BFD_RELOC_386_GOT32) with the XRELEASE
related one a few lines down. This again allows re-using an existing
conditional.
---
Is the set of relocations enumerated here actually complete?
---
v2: New.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -5116,14 +5116,30 @@ md_assemble (char *line)
       return;
     }
 
-  /* Check for data size prefix on VEX/XOP/EVEX encoded and SIMD insns.  */
-  if (i.prefix[DATA_PREFIX]
-      && (is_any_vex_encoding (&i.tm)
-	  || i.tm.operand_types[i.imm_operands].bitfield.class >= RegMMX
-	  || i.tm.operand_types[i.imm_operands + 1].bitfield.class >= RegMMX))
+  if (is_any_vex_encoding (&i.tm)
+      || i.tm.operand_types[i.imm_operands].bitfield.class >= RegMMX
+      || i.tm.operand_types[i.imm_operands + 1].bitfield.class >= RegMMX)
     {
-      as_bad (_("data size prefix invalid with `%s'"), i.tm.name);
-      return;
+      /* Check for data size prefix on VEX/XOP/EVEX encoded and SIMD insns.  */
+      if (i.prefix[DATA_PREFIX])
+	{
+	  as_bad (_("data size prefix invalid with `%s'"), i.tm.name);
+	  return;
+	}
+
+      /* Don't allow e.g. KMOV in TLS code sequences.  */
+      for (j = i.imm_operands; j < i.operands; ++j)
+	switch (i.reloc[j])
+	  {
+	  case BFD_RELOC_386_TLS_GOTIE:
+	  case BFD_RELOC_386_TLS_LE_32:
+	  case BFD_RELOC_X86_64_GOTTPOFF:
+	  case BFD_RELOC_X86_64_TLSLD:
+	    as_bad (_("TLS relocation cannot be used with `%s'"), i.tm.name);
+	    return;
+	  default:
+	    break;
+	  }
     }
 
   /* Check if HLE prefix is OK.  */
@@ -6765,26 +6781,6 @@ match_template (char mnem_suffix)
 	    }
 	}
 
-      switch (i.reloc[0])
-	{
-	case BFD_RELOC_386_GOT32:
-	  /* Force 0x8b encoding for "mov foo@GOT, %eax".  */
-	  if (t->base_opcode == 0xa0
-	      && t->opcode_modifier.opcodespace == SPACE_BASE)
-	    continue;
-	  break;
-	case BFD_RELOC_386_TLS_GOTIE:
-	case BFD_RELOC_386_TLS_LE_32:
-	case BFD_RELOC_X86_64_GOTTPOFF:
-	case BFD_RELOC_X86_64_TLSLD:
-	  /* Don't allow KMOV in TLS code sequences.  */
-	  if (t->opcode_modifier.vex)
-	    continue;
-	  break;
-	default:
-	  break;
-	}
-
       /* We check register size if needed.  */
       if (t->opcode_modifier.checkregsize)
 	{
@@ -6815,12 +6811,19 @@ match_template (char mnem_suffix)
 	      && i.types[1].bitfield.instance == Accum
 	      && i.types[1].bitfield.dword)
 	    continue;
-	  /* xrelease mov %eax, <disp> is another special case. It must not
-	     match the accumulator-only encoding of mov.  */
-	  if (i.hle_prefix
-	      && t->base_opcode == 0xa0
+
+	  if (t->base_opcode == MOV_AX_DISP32
 	      && t->opcode_modifier.opcodespace == SPACE_BASE)
-	    continue;
+	    {
+	      /* Force 0x8b encoding for "mov foo@GOT, %eax".  */
+	      if (i.reloc[0] == BFD_RELOC_386_GOT32)
+		continue;
+
+	      /* xrelease mov %eax, <disp> is another special case. It must not
+		 match the accumulator-only encoding of mov.  */
+	      if (i.hle_prefix)
+		continue;
+	    }
 	  /* Fall through.  */
 
 	case 3:


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 7/7] x86: move bad-use-of-TLS-reloc check
  2022-08-26 10:33 ` [PATCH v2 7/7] x86: move bad-use-of-TLS-reloc check Jan Beulich
@ 2022-08-26 10:33   ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-26 10:33 UTC (permalink / raw)
  To: Binutils

Having it in match_template() is unhelpful. Neither does looking for the
next template to possibly match make any sense in that case, nor is the
resulting diagnostic making clear what the problem is.

While moving the check, also generalize it to include all SIMD and VEX-
encoded insns. This way an existing conditional can be re-used in
md_assemble(). Note though that this still leaves a lof of insns which
are also wrong to use with these relocations.

Further fold the remaining check (BFD_RELOC_386_GOT32) with the XRELEASE
related one a few lines down. This again allows re-using an existing
conditional.
---
Is the set of relocations enumerated here actually complete?
---
v2: New.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -5116,14 +5116,30 @@ md_assemble (char *line)
       return;
     }
 
-  /* Check for data size prefix on VEX/XOP/EVEX encoded and SIMD insns.  */
-  if (i.prefix[DATA_PREFIX]
-      && (is_any_vex_encoding (&i.tm)
-	  || i.tm.operand_types[i.imm_operands].bitfield.class >= RegMMX
-	  || i.tm.operand_types[i.imm_operands + 1].bitfield.class >= RegMMX))
+  if (is_any_vex_encoding (&i.tm)
+      || i.tm.operand_types[i.imm_operands].bitfield.class >= RegMMX
+      || i.tm.operand_types[i.imm_operands + 1].bitfield.class >= RegMMX)
     {
-      as_bad (_("data size prefix invalid with `%s'"), i.tm.name);
-      return;
+      /* Check for data size prefix on VEX/XOP/EVEX encoded and SIMD insns.  */
+      if (i.prefix[DATA_PREFIX])
+	{
+	  as_bad (_("data size prefix invalid with `%s'"), i.tm.name);
+	  return;
+	}
+
+      /* Don't allow e.g. KMOV in TLS code sequences.  */
+      for (j = i.imm_operands; j < i.operands; ++j)
+	switch (i.reloc[j])
+	  {
+	  case BFD_RELOC_386_TLS_GOTIE:
+	  case BFD_RELOC_386_TLS_LE_32:
+	  case BFD_RELOC_X86_64_GOTTPOFF:
+	  case BFD_RELOC_X86_64_TLSLD:
+	    as_bad (_("TLS relocation cannot be used with `%s'"), i.tm.name);
+	    return;
+	  default:
+	    break;
+	  }
     }
 
   /* Check if HLE prefix is OK.  */
@@ -6765,26 +6781,6 @@ match_template (char mnem_suffix)
 	    }
 	}
 
-      switch (i.reloc[0])
-	{
-	case BFD_RELOC_386_GOT32:
-	  /* Force 0x8b encoding for "mov foo@GOT, %eax".  */
-	  if (t->base_opcode == 0xa0
-	      && t->opcode_modifier.opcodespace == SPACE_BASE)
-	    continue;
-	  break;
-	case BFD_RELOC_386_TLS_GOTIE:
-	case BFD_RELOC_386_TLS_LE_32:
-	case BFD_RELOC_X86_64_GOTTPOFF:
-	case BFD_RELOC_X86_64_TLSLD:
-	  /* Don't allow KMOV in TLS code sequences.  */
-	  if (t->opcode_modifier.vex)
-	    continue;
-	  break;
-	default:
-	  break;
-	}
-
       /* We check register size if needed.  */
       if (t->opcode_modifier.checkregsize)
 	{
@@ -6815,12 +6811,19 @@ match_template (char mnem_suffix)
 	      && i.types[1].bitfield.instance == Accum
 	      && i.types[1].bitfield.dword)
 	    continue;
-	  /* xrelease mov %eax, <disp> is another special case. It must not
-	     match the accumulator-only encoding of mov.  */
-	  if (i.hle_prefix
-	      && t->base_opcode == 0xa0
+
+	  if (t->base_opcode == MOV_AX_DISP32
 	      && t->opcode_modifier.opcodespace == SPACE_BASE)
-	    continue;
+	    {
+	      /* Force 0x8b encoding for "mov foo@GOT, %eax".  */
+	      if (i.reloc[0] == BFD_RELOC_386_GOT32)
+		continue;
+
+	      /* xrelease mov %eax, <disp> is another special case. It must not
+		 match the accumulator-only encoding of mov.  */
+	      if (i.hle_prefix)
+		continue;
+	    }
 	  /* Fall through.  */
 
 	case 3:


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Ping: [PATCH v2 1/7] x86/Intel: restrict suffix derivation
  2022-08-26 10:30 ` [PATCH v2 1/7] x86/Intel: restrict suffix derivation Jan Beulich
  2022-08-26 10:30   ` Jan Beulich
@ 2022-09-13 14:20   ` Jan Beulich
  1 sibling, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-09-13 14:20 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 26.08.2022 12:30, Jan Beulich via Binutils wrote:
> While in some cases deriving an AT&T-style suffix from an Intel syntax
> memory operand size specifier is necessary, in many cases this is not
> only pointless, but has led to the introduction of various workarounds:
> Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
> ToQword attributes. Suppress suffix derivation when we can clearly tell
> that the memory operand's size isn't going to be needed to infer the
> possible need for the low byte/word opcode bit or an operand size prefix
> (0x66 or REX.W).
> 
> As a result ToDword and ToQword can be dropped entirely, plus a fair
> number of IgnoreSize and NoRex64 can also be got rid of. Note that
> IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
> operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
> 16-bit code using SIMD insns isn't well tested, clone an existing
> testcase just enough to cover a few insns which are potentially
> problematic but are being touched here.)
> 
> Note that while folding the VCVT{,T}S{S,D}2SI templates, VCVT{,T}SH2SI
> isn't included there. This is to fulfill the request of not allowing L
> and Q suffixes there, despite the inconsistency with VCVT{,T}S{S,D}2SI.
> ---
> Long term suffix derivation should be dropped altogether, not the least
> such that bogus error messages like "incorrect register `...' used with
> `...' suffix" don't misguid people anymore when no suffix was used at
> all.
> ---
> v2: Don't cover VCVT{,T}SH2SI with the templatization.

Since you didn't like the respective aspect of v1, may I ask for explicit
feedback on v2? If I don't hear back by the end of next week, I guess I'll
commit this change.

Jan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Ping: [PATCH v2 2/7] x86: improve match_template()'s diagnostics
  2022-08-26 10:30 ` [PATCH v2 2/7] x86: improve match_template()'s diagnostics Jan Beulich
  2022-08-26 10:30   ` Jan Beulich
@ 2022-09-13 14:24   ` Jan Beulich
  1 sibling, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-09-13 14:24 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 26.08.2022 12:30, Jan Beulich via Binutils wrote:
> At the example of
> 
> 	extractps $0, %xmm0, %xmm0
> 	insertps $0, %xmm0, %eax
> 
> (both having respectively the same mistake of using the wrong kind of
> destination register) it is easy to see that current behavior is far
> from ideal: The former results in "unsupported instruction" for 32-bit
> code simply because the 2nd template we have is a Cpu64 one. Instead we
> should aim at emitting the "best" possible error, which will typically
> be the one where we passed the largest number of checks. Generalize the
> original "specific_error" approach by making it apply to the entire
> matching loop, utilizing that line numbers increase as we pass further
> checks.
> ---
> v2: Style correction.

Like for patch 1 - since you did comment on v1, may I ask for explicit
feedback? Otherwise again: If I don't hear back by the end of next week,
I guess I'll commit this change.

Jan


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-09-13 14:24 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-26 10:28 [PATCH v2 0/7] x86: suffix handling changes Jan Beulich
2022-08-26 10:28 ` Jan Beulich
2022-08-26 10:30 ` [PATCH v2 1/7] x86/Intel: restrict suffix derivation Jan Beulich
2022-08-26 10:30   ` Jan Beulich
2022-09-13 14:20   ` Ping: " Jan Beulich
2022-08-26 10:30 ` [PATCH v2 2/7] x86: improve match_template()'s diagnostics Jan Beulich
2022-08-26 10:30   ` Jan Beulich
2022-09-13 14:24   ` Ping: " Jan Beulich
2022-08-26 10:31 ` [PATCH v2 3/7] x86: re-work insn/suffix recognition Jan Beulich
2022-08-26 10:31   ` Jan Beulich
2022-08-26 10:32 ` [PATCH v2 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
2022-08-26 10:32   ` Jan Beulich
2022-08-26 10:32 ` [PATCH v2 5/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
2022-08-26 10:32   ` Jan Beulich
2022-08-26 10:33 ` [PATCH v2 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address Jan Beulich
2022-08-26 10:33   ` Jan Beulich
2022-08-26 10:33 ` [PATCH v2 7/7] x86: move bad-use-of-TLS-reloc check Jan Beulich
2022-08-26 10:33   ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).