public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
@ 2023-11-03 16:50 Cui, Lili
  2023-11-06 17:07 ` Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Cui, Lili @ 2023-11-03 16:50 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

Hi Jan,

This patch is based on patch " x86: split insn templates' CPU field' ".

Regards,
Lili.

This patch adds non-ND, non-NF forms of EVEX promotion insn.

EVEX extension of legacy instructions:
  All promoted legacy instructions are placed in EVEX map 4, which is
  currently reserved.
EVEX extension of EVEX instructions:
  All existing EVEX instructions are extended by APX using the extended
  EVEX prefix, so that they can access all 32 GPRs.
EVEX extension of VEX instructions:
  Promoting a VEX instruction into the EVEX space does not change the map
  id, the opcode, or the operand encoding of the VEX instruction.

Note: The promoted versions of MOVBE will be extended to include the "MOVBE
  reg1, reg2".

  gas/ChangeLog:

	* config/tc-i386.c (cpu_flags_not_or_check): Add a new
	function for APX cpu flag checking.
	(cpu_flags_match): handle cpu_flags_not_or_check.
	(install_template): Add AMX_TILE and APX combine.
	(is_any_apx_evex_encoding): Test apx evex encoding.
	(build_apx_evex_prefix): Enabe APX evex prefix.
	(md_assemble): Handle apx with evex encoding.
	(check_EgprOperands): Add nodgpr check for apx.
	(process_suffix): Handle apx map4 prefix.
	(check_register): Assign i.vec_encoding for APX evex instructions.
	* testsuite/gas/i386/x86-64-evex.d: Adjust test cases.
	* gas/testsuite/gas/i386/x86-64-inval-movbe.s: Ditto.
	* gas/testsuite/gas/i386/x86-64-inval-movbe.l: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-len.h: Handle EVEX_LEN_0F38F2, EVEX_LEN_0F38F3.
	* i386-dis-evex-mod.h: Handle MOD_EVEX_MAP4_65,
	MOD_EVEX_MAP4_66_PREFIX_0, MOD_EVEX_MAP4_8A_W_0,
	MOD_EVEX_MAP4_DA_PREFIX_1, MOD_EVEX_MAP4_DB_PREFIX_1,
	MOD_EVEX_MAP4_DC_PREFIX_1, MOD_EVEX_MAP4_DD_PREFIX_1,
	MOD_EVEX_MAP4_DE_PREFIX_1, MOD_EVEX_MAP4_DF_PREFIX_1,
	MOD_EVEX_MAP4_F8_PREFIX_1, MOD_EVEX_MAP4_F8_PREFIX_2,
	MOD_EVEX_MAP4_F8_PREFIX_3, MOD_EVEX_MAP4_F9,
	MOD_EVEX_MAP4_8B.
	* i386-dis-evex-prefix.h: Handle PREFIX_EVEX_MAP4_60,
	PREFIX_EVEX_MAP4_61, PREFIX_EVEX_MAP4_66,
	PREFIX_EVEX_MAP4_8B_M_0, PREFIX_EVEX_MAP4_D8,
	PREFIX_EVEX_MAP4_DA, PREFIX_EVEX_MAP4_DB,
	PREFIX_EVEX_MAP4_DC, PREFIX_EVEX_MAP4_DD,
	PREFIX_EVEX_MAP4_DE, PREFIX_EVEX_MAP4_DF,
	PREFIX_EVEX_MAP4_F0, PREFIX_EVEX_MAP4_F1,
	PREFIX_EVEX_MAP4_F2, PREFIX_EVEX_MAP4_F8,
	PREFIX_EVEX_MAP4_FC.
	* i386-dis-evex-reg.h: Handle REG_EVEX_MAP4_D8_PREFIX_1,
	REG_EVEX_0F38F3_L_0.
	* i386-dis-evex.h: Add EVEX_MAP4_ for legacy insn
	promote to apx to use gpr32
	* i386-dis.c (REG enum): Add REG_EVEX_MAP4_D8_PREFIX_1.
	(MOD enum): Add MOD_EVEX_MAP4_65, MOD_EVEX_MAP4_66_PREFIX_0,
	MOD_EVEX_MAP4_8A_W_0, MOD_EVEX_MAP4_8B,
	MOD_EVEX_MAP4_DA_PREFIX_1, MOD_EVEX_MAP4_DB_PREFIX_1,
	MOD_EVEX_MAP4_DC_PREFIX_1, MOD_EVEX_MAP4_DD_PREFIX_1,
	MOD_EVEX_MAP4_DE_PREFIX_1, MOD_EVEX_MAP4_DF_PREFIX_1,
	MOD_EVEX_MAP4_F8_PREFIX_1, MOD_EVEX_MAP4_F8_PREFIX_2,
	MOD_EVEX_MAP4_F8_PREFIX_3, MOD_EVEX_MAP4_F9,
	REG_EVEX_0F38F3_L_0.
	(PREFIX enum): Add PREFIX_EVEX_MAP4_60, PREFIX_EVEX_MAP4_61,
	PREFIX_EVEX_MAP4_66, PREFIX_EVEX_MAP4_8B_M_0,
	PREFIX_EVEX_MAP4_D8, PREFIX_EVEX_MAP4_DA,
	PREFIX_EVEX_MAP4_DB, PREFIX_EVEX_MAP4_DC,
	PREFIX_EVEX_MAP4_DD, PREFIX_EVEX_MAP4_DE,
	PREFIX_EVEX_MAP4_DF, PREFIX_EVEX_MAP4_F0,
	PREFIX_EVEX_MAP4_F1, PREFIX_EVEX_MAP4_F2,
	PREFIX_EVEX_MAP4_F8, PREFIX_EVEX_MAP4_FC.
	(EVEX_LEN_enum): Add EVEX_LEN_0F38F2, EVEX_LEN_0F38F3.
	(EVEX_X86_enum): Add X86_64_EVEX_0F90, X86_64_EVEX_0F91,
	X86_64_EVEX_0F92, X86_64_EVEX_0F93, X86_64_EVEX_0F3849,
	X86_64_EVEX_0F384B, X86_64_EVEX_0F38E0, X86_64_EVEX_0F38E1,
	X86_64_EVEX_0F38E2, X86_64_EVEX_0F38E3, X86_64_EVEX_0F38E4,
	X86_64_EVEX_0F38E5, X86_64_EVEX_0F38E6, X86_64_EVEX_0F38E7,
	X86_64_EVEX_0F38E8, X86_64_EVEX_0F38E9, X86_64_EVEX_0F38EA,
	X86_64_EVEX_0F38EB, X86_64_EVEX_0F38EC, X86_64_EVEX_0F38ED,
	X86_64_EVEX_0F38EE, X86_64_EVEX_0F38EF, X86_64_EVEX_0F38F2,
	X86_64_EVEX_0F38F3, X86_64_EVEX_0F38F5, X86_64_EVEX_0F38F6,
	X86_64_EVEX_0F38F7, X86_64_EVEX_0F3AF0.
	(struct instr_info): Deleted bool r.
	(putop): Ditto.
	(PREFIX_DATA_AND_NP_ONLY): New define.
	(X86_64_EVEX_FROM_VEX_TABLE): Diito.
	(get_valid_dis386): Decode insn erex in extend evex prefix.
	Handle EVEX_MAP4
	(print_insn): Handle PREFIX_DATA_AND_NP_ONLY.
	(print_register): Handle apx instructions decode.
	(OP_E_memory): Diito.
	(OP_G): Diito.
	(OP_XMM): Diito.
	(DistinctDest_Fixup): Diito.
	* i386-gen.c (process_i386_opcode_modifier):
	* i386-opc.h (SPACE_EVEXMAP4): Add legacy insn
	promote to evex.
	* i386-opc.tbl: Handle some legacy and vex insns don't
	support gpr32. And add some legacy insn (map2 / 3) promote
	to evex.
---
 gas/config/tc-i386.c                        |  91 +++++++++--
 gas/testsuite/gas/i386/x86-64-evex.d        |   2 +-
 gas/testsuite/gas/i386/x86-64-inval-movbe.l |  31 ++--
 gas/testsuite/gas/i386/x86-64-inval-movbe.s |   1 +
 opcodes/i386-dis-evex-len.h                 |  10 ++
 opcodes/i386-dis-evex-mod.h                 |  42 +++++
 opcodes/i386-dis-evex-prefix.h              |  69 ++++++++
 opcodes/i386-dis-evex-reg.h                 |  14 ++
 opcodes/i386-dis-evex-x86-64.h              | 140 +++++++++++++++++
 opcodes/i386-dis-evex.h                     |  94 +++++------
 opcodes/i386-dis.c                          | 165 +++++++++++++++++---
 opcodes/i386-gen.c                          |   1 +
 opcodes/i386-opc.h                          |   2 +
 opcodes/i386-opc.tbl                        |  69 +++++++-
 14 files changed, 628 insertions(+), 103 deletions(-)
 create mode 100644 opcodes/i386-dis-evex-x86-64.h

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 62d02801f16..df3d6bcd96b 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -3672,9 +3672,10 @@ install_template (const insn_template *t)
   /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
   {
-      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
-	   || maybe_cpu (t, CpuFMA))
-	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
+    if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
+	 || maybe_cpu (t, CpuFMA) ||  maybe_cpu (t, CpuAMX_TILE))
+	&& (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)
+	    || maybe_cpu (t, CpuAPX_F)))
 	{
 	  if (need_evex_encoding ())
 	    {
@@ -3689,12 +3690,11 @@ install_template (const insn_template *t)
 		i.tm.cpu.bitfield.cpuavx = 1;
 	      else
 		{
-		  gas_assert (!i.tm.cpu.bitfield.isa);
 		  i.tm.cpu.bitfield.isa = i.tm.cpu_any.bitfield.isa;
 		}
 	    }
 	}
-  }
+    }
 
   /* Note that for pseudo prefixes this produces a length of 1. But for them
      the length isn't interesting at all.  */
@@ -3885,6 +3885,14 @@ is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || is_evex_encoding (t);
 }
 
+static INLINE bool
+is_any_apx_evex_encoding (void)
+{
+  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4 
+    || (i.vex.register_specifier
+	&& i.vex.register_specifier->reg_flags & RegRex2);
+}
+
 static INLINE bool
 is_any_apx_rex2_encoding (void)
 {
@@ -4161,6 +4169,27 @@ build_rex2_prefix (void)
 		    | (i.rex2 << 4) | i.rex);
 }
 
+/* Build the EVEX prefix (4-byte) for evex insn
+   | 62h |
+   | `R`X`B`R' | B'mmm |
+   | W | v`v`v`v | `x' | pp |
+   | z| L'L | b | `v | aaa |
+*/
+static void
+build_apx_evex_prefix (void)
+{
+  build_evex_prefix ();
+  if (i.rex2 & REX_R)
+    i.vex.bytes[1] &= 0xef;
+  if (i.vex.register_specifier
+      && register_number (i.vex.register_specifier) > 0xf)
+    i.vex.bytes[3] &= 0xf7;
+  if (i.rex2 & REX_B)
+    i.vex.bytes[1] |= 0x08;
+  if (i.rex2 & REX_X)
+    i.vex.bytes[2] &= 0xfb;
+}
+
 static void
 process_immext (void)
 {
@@ -5624,19 +5653,42 @@ md_assemble (char *line)
 	}
 
       /* Check for explicit REX2 prefix.  */
-      if (i.rex2 || i.rex2_encoding)
+      if (i.rex2_encoding)
 	{
 	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
 	  return;
 	}
 
-      if (i.tm.opcode_modifier.vex)
+      if (is_any_apx_evex_encoding ())
+	{
+	  if (i.tm.opcode_space == SPACE_EVEXMAP4 && (i.prefix[DATA_PREFIX] != 0))
+	    {
+	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
+	      i.prefix[DATA_PREFIX] = 0;
+	    }
+
+	  build_apx_evex_prefix ();
+
+	  /* Encode the NDD bit of the instruction promoted from the legacy
+	     space.  */
+	  if (i.vex.register_specifier && i.tm.opcode_space == SPACE_EVEXMAP4)
+	    i.vex.bytes[3] |= 0x10;
+
+	  /* Encode the NF bit of the instruction promoted from legacy and vex
+	     space.  */
+	  if (i.has_nf)
+	    i.vex.bytes[3] |= 0x04;
+	}
+      else if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
 	build_evex_prefix ();
 
       /* The individual REX.RXBW bits got consumed.  */
       i.rex &= REX_OPCODE;
+
+      /* The rex2 bits got consumed.  */
+      i.rex2 = 0;
     }
 
   /* Handle conversion of 'int $3' --> special int3 insn.  */
@@ -5663,16 +5715,17 @@ md_assemble (char *line)
      instruction already has a prefix, we need to convert old
      registers to new ones.  */
 
-  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
-       && (i.op[0].regs->reg_flags & RegRex64) != 0)
-      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
-	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
-      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
-	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
-	  && (i.rex != 0 || i.rex2 != 0)))
+  if (!is_any_vex_encoding (&i.tm)
+      && ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
+	   && (i.op[0].regs->reg_flags & RegRex64) != 0)
+	  || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
+	      && (i.op[1].regs->reg_flags & RegRex64) != 0)
+	  || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
+	       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
+	      && (i.rex != 0 || i.rex2 != 0))))
     {
       int x;
-      if (!i.rex2)
+      if (!is_any_apx_rex2_encoding ())
 	i.rex |= REX_OPCODE;
       for (x = 0; x < 2; x++)
 	{
@@ -7043,7 +7096,7 @@ VEX_check_encoding (const insn_template *t)
 static int
 check_EgprOperands (const insn_template *t)
 {
-  if (t->opcode_modifier.noegpr)
+  if (t->opcode_modifier.noegpr && !need_evex_encoding())
     {
       for (unsigned int op = 0; op < i.operands; op++)
 	{
@@ -8042,7 +8095,8 @@ process_suffix (void)
       if (i.suffix != QWORD_MNEM_SUFFIX
 	  && i.tm.opcode_modifier.mnemonicsize != IGNORESIZE
 	  && !i.tm.opcode_modifier.floatmf
-	  && !is_any_vex_encoding (&i.tm)
+	  && (!is_any_vex_encoding (&i.tm)
+	      || i.tm.opcode_space == SPACE_EVEXMAP4)
 	  && ((i.suffix == LONG_MNEM_SUFFIX) == (flag_code == CODE_16BIT)
 	      || (flag_code == CODE_64BIT
 		  && i.tm.opcode_modifier.jump == JUMP_BYTE)))
@@ -14252,6 +14306,9 @@ static bool check_register (const reg_entry *r)
 
   if (r->reg_flags & RegRex2)
     {
+      if (is_evex_encoding (current_templates->start))
+	i.vec_encoding = vex_encoding_evex;
+
       if (!cpu_arch_flags.bitfield.cpuapx_f
 	  || flag_code != CODE_64BIT)
 	return false;
diff --git a/gas/testsuite/gas/i386/x86-64-evex.d b/gas/testsuite/gas/i386/x86-64-evex.d
index 041747db892..5d974c312da 100644
--- a/gas/testsuite/gas/i386/x86-64-evex.d
+++ b/gas/testsuite/gas/i386/x86-64-evex.d
@@ -17,6 +17,6 @@ Disassembly of section .text:
  +[a-f0-9]+:	62 f1 d6 38 7b f0    	vcvtusi2ss %rax,\{rd-sae\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 57 38 7b f0    	vcvtusi2sd %eax,\{rd-bad\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 d7 38 7b f0    	vcvtusi2sd %rax,\{rd-sae\},%xmm5,%xmm6
- +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,\(bad\)
+ +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,%r16d
  +[a-f0-9]+:	62 e1 7c 08 c2 c0 00 	vcmpeqps %xmm0,%xmm0,\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-inval-movbe.l b/gas/testsuite/gas/i386/x86-64-inval-movbe.l
index 1c8ceb55c11..44ddfe4f034 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-movbe.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-movbe.l
@@ -1,29 +1,30 @@
 .*: Assembler messages:
-.*:4: Error: .*
 .*:5: Error: .*
 .*:6: Error: .*
 .*:7: Error: .*
 .*:8: Error: .*
-.*:11: Error: .*
+.*:9: Error: .*
 .*:12: Error: .*
 .*:13: Error: .*
 .*:14: Error: .*
 .*:15: Error: .*
+.*:16: Error: .*
 GAS LISTING .*
 
 
 [ 	]*1[ 	]+\# Check illegal movbe in 64bit mode\.
 [ 	]*2[ 	]+\.text
-[ 	]*3[ 	]+foo:
-[ 	]*4[ 	]+movbe	\(%rcx\),%bl
-[ 	]*5[ 	]+movbe	%ecx,%ebx
-[ 	]*6[ 	]+movbe	%bx,%rcx
-[ 	]*7[ 	]+movbe	%rbx,%rcx
-[ 	]*8[ 	]+movbe	%bl,\(%rcx\)
-[ 	]*9[ 	]+
-[ 	]*10[ 	]+\.intel_syntax noprefix
-[ 	]*11[ 	]+movbe bl, byte ptr \[rcx\]
-[ 	]*12[ 	]+movbe ebx, ecx
-[ 	]*13[ 	]+movbe rcx, bx
-[ 	]*14[ 	]+movbe rcx, rbx
-[ 	]*15[ 	]+movbe byte ptr \[rcx\], bl
+[ 	]*3[ 	]+\.arch \.noapx_f
+[ 	]*4[ 	]+foo:
+[ 	]*5[ 	]+movbe	\(%rcx\),%bl
+[ 	]*6[ 	]+movbe	%ecx,%ebx
+[ 	]*7[ 	]+movbe	%bx,%rcx
+[ 	]*8[ 	]+movbe	%rbx,%rcx
+[ 	]*9[ 	]+movbe	%bl,\(%rcx\)
+[ 	]*10[ 	]+
+[ 	]*11[ 	]+\.intel_syntax noprefix
+[ 	]*12[ 	]+movbe bl, byte ptr \[rcx\]
+[ 	]*13[ 	]+movbe ebx, ecx
+[ 	]*14[ 	]+movbe rcx, bx
+[ 	]*15[ 	]+movbe rcx, rbx
+[ 	]*16[ 	]+movbe byte ptr \[rcx\], bl
diff --git a/gas/testsuite/gas/i386/x86-64-inval-movbe.s b/gas/testsuite/gas/i386/x86-64-inval-movbe.s
index 38f09b14d64..380a9191b6a 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-movbe.s
+++ b/gas/testsuite/gas/i386/x86-64-inval-movbe.s
@@ -1,5 +1,6 @@
 # Check illegal movbe in 64bit mode.
 	.text
+	.arch .noapx_f
 foo:
 	movbe	(%rcx),%bl
 	movbe	%ecx,%ebx
diff --git a/opcodes/i386-dis-evex-len.h b/opcodes/i386-dis-evex-len.h
index a02609c50f2..1933a045822 100644
--- a/opcodes/i386-dis-evex-len.h
+++ b/opcodes/i386-dis-evex-len.h
@@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
     { REG_TABLE (REG_EVEX_0F38C7_L_2) },
   },
 
+  /* EVEX_LEN_0F38F2 */
+  {
+    { "andnS",		{ Gdq, VexGdq, Edq }, 0 },
+  },
+
+  /* EVEX_LEN_0F38F3 */
+  {
+    { REG_TABLE(REG_EVEX_0F38F3_L_0) },
+  },
+
   /* EVEX_LEN_0F3A00 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-mod.h b/opcodes/i386-dis-evex-mod.h
index f9f912c5094..a60c19add3c 100644
--- a/opcodes/i386-dis-evex-mod.h
+++ b/opcodes/i386-dis-evex-mod.h
@@ -1 +1,43 @@
 /* Nothing at present.  */
+  /* MOD_EVEX_MAP4_DA_PREFIX_1 */
+  {
+    { Bad_Opcode },
+    { "encodekey128", { Gd, Ed }, 0 },
+  },
+  /* MOD_EVEX_MAP4_DB_PREFIX_1 */
+  {
+    { Bad_Opcode },
+    { "encodekey256", { Gd, Ed }, 0 },
+  },
+  /* MOD_EVEX_MAP4_DC_PREFIX_1 */
+  {
+    { "aesenc128kl",    { XM, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_DD_PREFIX_1 */
+  {
+    { "aesdec128kl",    { XM, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_DE_PREFIX_1 */
+  {
+    { "aesenc256kl",    { XM, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_DF_PREFIX_1 */
+  {
+    { "aesdec256kl",    { XM, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_F8_PREFIX_1 */
+  {
+    { "enqcmds",	{ Gva, M },  0 },
+  },
+  /* MOD_EVEX_MAP4_F8_PREFIX_2 */
+  {
+    { "movdir64b",	{ Gva, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_F8_PREFIX_3 */
+  {
+    { "enqcmd",		{ Gva, M }, 0 },
+  },
+  /* MOD_EVEX_MAP4_F9 */
+  {
+    { "movdiri",	{ Edq, Gdq }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index 28da54922c7..e8f32324ade 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -338,6 +338,75 @@
     { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
     { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
   },
+  /* PREFIX_EVEX_MAP4_66 */
+  {
+    { "wrssK",	{ M, Gdq }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_D8 */
+  {
+    { "sha1nexte", { XM, EXxmm }, 0 },
+    { REG_TABLE (REG_EVEX_MAP4_D8_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DA */
+  {
+    { "sha1msg2", { XM, EXxmm }, 0 },
+    { MOD_TABLE (MOD_EVEX_MAP4_DA_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DB */
+  {
+    { "sha256rnds2", { XM, EXxmm, XMM0 }, 0 },
+    { MOD_TABLE (MOD_EVEX_MAP4_DB_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DC */
+  {
+    { "sha256msg1", { XM, EXxmm }, 0 },
+    { MOD_TABLE (MOD_EVEX_MAP4_DC_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DD */
+  {
+    { "sha256msg2", { XM, EXxmm }, 0 },
+    { MOD_TABLE (MOD_EVEX_MAP4_DD_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DE */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_EVEX_MAP4_DE_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DF */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_EVEX_MAP4_DF_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_F0 */
+  {
+    { "crc32A",	{ Gdq, Eb }, 0 },
+    { "invept",	{ Gm, Mo }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F1 */
+  {
+    { "crc32Q",	{ Gdq, Ev }, 0 },
+    { "invvpid", { Gm, Mo }, 0 },
+    { "crc32Q",	{ Gdq, Ev }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F2 */
+  {
+    { Bad_Opcode },
+    { "invpcid", { Gm, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F8 */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_1) },
+    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_2) },
+    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_3) },
+  },
+  /* PREFIX_EVEX_MAP4_FC */
+  {
+    { "aadd",	{ Mdq, Gdq }, 0 },
+    { "axor",	{ Mdq, Gdq }, 0 },
+    { "aand",	{ Mdq, Gdq }, 0 },
+    { "aor",	{ Mdq, Gdq }, 0 },
+  },
   /* PREFIX_EVEX_MAP5_10 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index 2885063628b..c3b4f083346 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -49,3 +49,17 @@
     { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
     { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
   },
+  /* REG_EVEX_0F38F3_L_0 */
+  {
+    { Bad_Opcode },
+    { "blsrS",		{ VexGdq, Edq }, 0 },
+    { "blsmskS",	{ VexGdq, Edq }, 0 },
+    { "blsiS",		{ VexGdq, Edq }, 0 },
+  },
+  /* REG_EVEX_MAP4_D8_PREFIX_1 */
+  {
+    { "aesencwide128kl",	{ M }, 0 },
+    { "aesdecwide128kl",	{ M }, 0 },
+    { "aesencwide256kl",	{ M }, 0 },
+    { "aesdecwide256kl",	{ M }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex-x86-64.h b/opcodes/i386-dis-evex-x86-64.h
new file mode 100644
index 00000000000..1121223d877
--- /dev/null
+++ b/opcodes/i386-dis-evex-x86-64.h
@@ -0,0 +1,140 @@
+  /* X86_64_EVEX_0F90 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F90) },
+  },
+  /* X86_64_EVEX_0F91 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F91) },
+  },
+  /* X86_64_EVEX_0F92 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F92) },
+  },
+  /* X86_64_EVEX_0F93 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F93) },
+  },
+  /* X86_64_EVEX_0F3849 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },
+  },
+  /* X86_64_EVEX_0F384B */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },
+  },
+  /* X86_64_EVEX_0F38E0 */
+  {
+    { Bad_Opcode },
+    { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E1 */
+  {
+    { Bad_Opcode },
+    { "cmpnoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E2 */
+  {
+    { Bad_Opcode },
+    { "cmpbxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E3 */
+  {
+    { Bad_Opcode },
+    { "cmpnbxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E4 */
+  {
+    { Bad_Opcode },
+    { "cmpzxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E5 */
+  {
+    { Bad_Opcode },
+    { "cmpnzxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E6 */
+  {
+    { Bad_Opcode },
+    { "cmpbexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E7 */
+  {
+    { Bad_Opcode },
+    { "cmpnbexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E8 */
+  {
+    { Bad_Opcode },
+    { "cmpsxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38E9 */
+  {
+    { Bad_Opcode },
+    { "cmpnsxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38EA */
+  {
+    { Bad_Opcode },
+    { "cmppxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38EB */
+  {
+    { Bad_Opcode },
+    { "cmpnpxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38EC */
+  {
+    { Bad_Opcode },
+    { "cmplxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38ED */
+  {
+    { Bad_Opcode },
+    { "cmpnlxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38EE */
+  {
+    { Bad_Opcode },
+    { "cmplexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38EF */
+  {
+    { Bad_Opcode },
+    { "cmpnlexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
+  },
+  /* X86_64_EVEX_0F38F2 */
+  {
+    { Bad_Opcode },
+    { EVEX_LEN_TABLE (EVEX_LEN_0F38F2) },
+  },
+  /* X86_64_EVEX_0F38F3 */
+  {
+    { Bad_Opcode },
+    { EVEX_LEN_TABLE (EVEX_LEN_0F38F3) },
+  },
+  /* X86_64_EVEX_0F38F5 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F5) },
+  },
+  /* X86_64_EVEX_0F38F6 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F6) },
+  },
+  /* X86_64_EVEX_0F38F7 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F7) },
+  },
+  /* X86_64_EVEX_0F3AF0 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3AF0) },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 7ad1edbe72d..65a2cbeaeb2 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -164,10 +164,10 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 90 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F90) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F91) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F92) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F93) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -375,9 +375,9 @@ static const struct dis386 evex_table[][256] = {
     { "vpsllv%DQ",	{ XM, Vex, EXx }, PREFIX_DATA },
     /* 48 */
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F384B) },
     { "vrcp14p%XW",	{ XM, EXx }, PREFIX_DATA },
     { "vrcp14s%XW",	{ XMScalar, VexScalar, EXdq }, PREFIX_DATA },
     { "vrsqrt14p%XW",	{ XM, EXx }, 0 },
@@ -545,32 +545,32 @@ static const struct dis386 evex_table[][256] = {
     { "%XEvaesdecY",	{ XM, Vex, EXx }, PREFIX_DATA },
     { "%XEvaesdeclastY", { XM, Vex, EXx }, PREFIX_DATA },
     /* E0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E0) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E1) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E3) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E4) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E7) },
     /* E8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E8) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E9) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38EA) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38EB) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38EC) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38ED) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38EE) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38EF) },
     /* F0 */
     { Bad_Opcode },
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F3) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F7) },
     /* F8 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -854,7 +854,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F3AF0) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -983,13 +983,13 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 60 */
+    { "movbeS",	{ Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "movbeS",	{ Ev, Gv }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "wrussK",	{ M, Gdq }, PREFIX_DATA },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_66) },
     { Bad_Opcode },
     /* 68 */
     { Bad_Opcode },
@@ -1113,19 +1113,19 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { "sha1rnds4", { XM, EXxmm, Ib }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* D8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_D8) },
+    { "sha1msg1", { XM, EXxmm }, 0 },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DA) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DB) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DC) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DD) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DE) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DF) },
     /* E0 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1145,20 +1145,20 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F0) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F1) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F2) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* F8 */
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F8) },
+    { MOD_TABLE (MOD_EVEX_MAP4_F9) },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_FC) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index a34bb6fe343..54f13a35bf0 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -132,6 +132,13 @@ enum x86_64_isa
   intel64
 };
 
+enum evex_type
+{
+  evex_default = 0,
+  evex_from_legacy,
+  evex_from_vex,
+};
+
 struct instr_info
 {
   enum address_mode address_mode;
@@ -211,7 +218,6 @@ struct instr_info
     int ll;
     bool w;
     bool evex;
-    bool r;
     bool v;
     bool zeroing;
     bool b;
@@ -219,6 +225,8 @@ struct instr_info
   }
   vex;
 
+  enum evex_type evex_type;
+
   /* Remember if the current op is a jump instruction.  */
   bool op_is_jump;
 
@@ -304,6 +312,7 @@ struct dis_private {
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
 #define PREFIX_REX2 0x1000
+#define PREFIX_NP_OR_DATA 0x2000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -801,6 +810,7 @@ enum
   USE_RM_TABLE,
   USE_PREFIX_TABLE,
   USE_X86_64_TABLE,
+  USE_X86_64_EVEX_FROM_VEX_TABLE,
   USE_3BYTE_TABLE,
   USE_XOP_8F_TABLE,
   USE_VEX_C4_TABLE,
@@ -819,6 +829,8 @@ enum
 #define RM_TABLE(I)		DIS386 (USE_RM_TABLE, (I))
 #define PREFIX_TABLE(I)		DIS386 (USE_PREFIX_TABLE, (I))
 #define X86_64_TABLE(I)		DIS386 (USE_X86_64_TABLE, (I))
+#define X86_64_EVEX_FROM_VEX_TABLE(I) \
+  DIS386 (USE_X86_64_EVEX_FROM_VEX_TABLE, (I))
 #define THREE_BYTE_TABLE(I)	DIS386 (USE_3BYTE_TABLE, (I))
 #define XOP_8F_TABLE()		DIS386 (USE_XOP_8F_TABLE, 0)
 #define VEX_C4_TABLE()		DIS386 (USE_VEX_C4_TABLE, 0)
@@ -879,7 +891,9 @@ enum
   REG_EVEX_0F72,
   REG_EVEX_0F73,
   REG_EVEX_0F38C6_L_2,
-  REG_EVEX_0F38C7_L_2
+  REG_EVEX_0F38C7_L_2,
+  REG_EVEX_0F38F3_L_0,
+  REG_EVEX_MAP4_D8_PREFIX_1
 };
 
 enum
@@ -920,6 +934,17 @@ enum
   MOD_0F38F8,
 
   MOD_VEX_0F3849_X86_64_L_0_W_0,
+
+  MOD_EVEX_MAP4_DA_PREFIX_1,
+  MOD_EVEX_MAP4_DB_PREFIX_1,
+  MOD_EVEX_MAP4_DC_PREFIX_1,
+  MOD_EVEX_MAP4_DD_PREFIX_1,
+  MOD_EVEX_MAP4_DE_PREFIX_1,
+  MOD_EVEX_MAP4_DF_PREFIX_1,
+  MOD_EVEX_MAP4_F8_PREFIX_1,
+  MOD_EVEX_MAP4_F8_PREFIX_2,
+  MOD_EVEX_MAP4_F8_PREFIX_3,
+  MOD_EVEX_MAP4_F9,
 };
 
 enum
@@ -1157,6 +1182,20 @@ enum
   PREFIX_EVEX_0F3A67,
   PREFIX_EVEX_0F3AC2,
 
+  PREFIX_EVEX_MAP4_66,
+  PREFIX_EVEX_MAP4_D8,
+  PREFIX_EVEX_MAP4_DA,
+  PREFIX_EVEX_MAP4_DB,
+  PREFIX_EVEX_MAP4_DC,
+  PREFIX_EVEX_MAP4_DD,
+  PREFIX_EVEX_MAP4_DE,
+  PREFIX_EVEX_MAP4_DF,
+  PREFIX_EVEX_MAP4_F0,
+  PREFIX_EVEX_MAP4_F1,
+  PREFIX_EVEX_MAP4_F2,
+  PREFIX_EVEX_MAP4_F8,
+  PREFIX_EVEX_MAP4_FC,
+
   PREFIX_EVEX_MAP5_10,
   PREFIX_EVEX_MAP5_11,
   PREFIX_EVEX_MAP5_1D,
@@ -1268,6 +1307,36 @@ enum
   X86_64_VEX_0F38ED,
   X86_64_VEX_0F38EE,
   X86_64_VEX_0F38EF,
+
+  X86_64_EVEX_0F90,
+  X86_64_EVEX_0F91,
+  X86_64_EVEX_0F92,
+  X86_64_EVEX_0F93,
+  X86_64_EVEX_0F3849,
+  X86_64_EVEX_0F384B,
+  X86_64_EVEX_0F38E0,
+  X86_64_EVEX_0F38E1,
+  X86_64_EVEX_0F38E2,
+  X86_64_EVEX_0F38E3,
+  X86_64_EVEX_0F38E4,
+  X86_64_EVEX_0F38E5,
+  X86_64_EVEX_0F38E6,
+  X86_64_EVEX_0F38E7,
+  X86_64_EVEX_0F38E8,
+  X86_64_EVEX_0F38E9,
+  X86_64_EVEX_0F38EA,
+  X86_64_EVEX_0F38EB,
+  X86_64_EVEX_0F38EC,
+  X86_64_EVEX_0F38ED,
+  X86_64_EVEX_0F38EE,
+  X86_64_EVEX_0F38EF,
+  X86_64_EVEX_0F38F2,
+  X86_64_EVEX_0F38F3,
+  X86_64_EVEX_0F38F5,
+  X86_64_EVEX_0F38F6,
+  X86_64_EVEX_0F38F7,
+  X86_64_EVEX_0F3AF0,
+
   X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
 };
 
@@ -1300,6 +1369,7 @@ enum
   EVEX_MAP4,
   EVEX_MAP5,
   EVEX_MAP6,
+  EVEX_MAP7,
 };
 
 enum
@@ -1453,6 +1523,8 @@ enum
   EVEX_LEN_0F385B,
   EVEX_LEN_0F38C6,
   EVEX_LEN_0F38C7,
+  EVEX_LEN_0F38F2,
+  EVEX_LEN_0F38F3,
   EVEX_LEN_0F3A00,
   EVEX_LEN_0F3A01,
   EVEX_LEN_0F3A18,
@@ -4522,12 +4594,13 @@ static const struct dis386 x86_64_table[][2] = {
     { "cmpnlexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
   },
 
+#include "i386-dis-evex-x86-64.h"
+
   /* X86_64_VEX_MAP7_F8_L_0_W_0_R_0 */
   {
     { Bad_Opcode },
     { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
   },
-
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -8734,6 +8807,9 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       dp = &prefix_table[dp->op[1].bytemode][vindex];
       break;
 
+    case USE_X86_64_EVEX_FROM_VEX_TABLE:
+      ins->evex_type = evex_from_vex;
+      /* Fall through.  */
     case USE_X86_64_TABLE:
       vindex = ins->address_mode == mode_64bit ? 1 : 0;
       dp = &x86_64_table[dp->op[1].bytemode][vindex];
@@ -8979,9 +9055,13 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       if (!fetch_code (ins->info, ins->codep + 4))
 	return &err_opcode;
       /* The first byte after 0x62.  */
+      if (*ins->codep & 0x8)
+	ins->rex2 |= REX_B;
+      if (!(*ins->codep & 0x10))
+	ins->rex2 |= REX_R;
+
       ins->rex = ~(*ins->codep >> 5) & 0x7;
-      ins->vex.r = *ins->codep & 0x10;
-      switch ((*ins->codep & 0xf))
+      switch ((*ins->codep & 0x7))
 	{
 	default:
 	  return &bad_opcode;
@@ -8994,12 +9074,19 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	case 0x3:
 	  vex_table_index = EVEX_0F3A;
 	  break;
+	case 0x4:
+	  vex_table_index = EVEX_MAP4;
+	  ins->evex_type = evex_from_legacy;
+	  break;
 	case 0x5:
 	  vex_table_index = EVEX_MAP5;
 	  break;
 	case 0x6:
 	  vex_table_index = EVEX_MAP6;
 	  break;
+	case 0x7:
+	  vex_table_index = EVEX_MAP7;
+	  break;
 	}
 
       /* The second byte after 0x62.  */
@@ -9010,9 +9097,8 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       ins->vex.register_specifier = (~(*ins->codep >> 3)) & 0xf;
 
-      /* The U bit.  */
       if (!(*ins->codep & 0x4))
-	return &bad_opcode;
+	ins->rex2 |= REX_X;
 
       switch ((*ins->codep & 0x3))
 	{
@@ -9042,12 +9128,31 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       if (ins->address_mode != mode_64bit)
 	{
+	  if (ins->evex_type != evex_default
+	      || (ins->rex2 & (REX_B | REX_X)))
+	    return &bad_opcode;
 	  /* In 16/32-bit mode silently ignore following bits.  */
 	  ins->rex &= ~REX_B;
-	  ins->vex.r = true;
+	  ins->rex2 &= ~REX_R;
 	}
 
       ins->need_vex = 4;
+
+      /* EVEX from legacy instructions require that EVEX.L'L, EVEX.z and the
+	 lower 2 bits of EVEX.aaa must be 0.
+	 EVEX from evex instrucions require that EVEX.L'L and the lower 2 bits of
+	 EVEX.aaa must be 0.  */
+      if (ins->evex_type == evex_from_legacy || ins->evex_type == evex_from_vex)
+	{
+	  if ((*ins->codep & 0x3) != 0
+	      || (*ins->codep >> 6 & 0x3) != 0
+	      || (ins->evex_type == evex_from_legacy
+		  && (*ins->codep >> 5 & 0x1) != 0)
+	      || (ins->evex_type == evex_from_vex
+		  && !ins->vex.b))
+	    return &bad_opcode;
+	}
+
       ins->codep++;
       vindex = *ins->codep++;
       dp = &evex_table[vex_table_index][vindex];
@@ -9460,6 +9565,13 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       dp = get_valid_dis386 (dp, &ins);
       if (dp == &err_opcode)
 	goto fetch_error_out;
+
+      /* For APX instructions promoted from legacy maps 0/1, prefix
+	 0x66 is interpreted as the operand size override.  */
+      if (ins.evex_type == evex_from_legacy
+	  && ins.vex.prefix == DATA_PREFIX_OPCODE)
+	sizeflag ^= DFLAG;
+
       if (dp != NULL && putop (&ins, dp->name, sizeflag) == 0)
 	{
 	  if (!get_sib (&ins, sizeflag))
@@ -9640,6 +9752,19 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       if (ins.last_repnz_prefix >= 0)
 	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
       break;
+
+    case PREFIX_NP_OR_DATA:
+      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)
+	{
+	  i386_dis_printf (info, dis_style_text, "(bad)");
+	  ret = ins.end_codep - priv.the_buffer;
+	  goto out;
+	}
+      break;
+
+    default:
+      break;
+
     }
 
   /* Check if the REX prefix is used.  */
@@ -10348,7 +10473,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
 		{
 		case 'X':
 		  if (!ins->vex.evex || ins->vex.b || ins->vex.ll >= 2
-		      || !ins->vex.r
+		      || (ins->rex2 & REX_R)
 		      || (ins->modrm.mod == 3 && (ins->rex & REX_X))
 		      || !ins->vex.v || ins->vex.mask_register_specifier)
 		    break;
@@ -11242,7 +11367,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     case b_swap_mode:
       if (reg & 4)
 	USED_REX (0);
-      if (ins->rex)
+      if (ins->rex || ins->rex2)
 	names = att_names8rex;
       else
 	names = att_names8;
@@ -11460,7 +11585,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
   add += (ins->rex2 & REX_B) ? 16 : 0;
 
-  if (ins->vex.evex)
+  if (ins->vex.evex && ins->evex_type == evex_default)
     {
 
       /* Zeroing-masking is invalid for memory destinations. Set the flag
@@ -11608,6 +11733,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		abort ();
 	      if (ins->vex.evex)
 		{
+		  /* S/G EVEX insns require REX_X not to be set.  */
+		  if (ins->rex2 & REX_X)
+		    {
+		      oappend (ins, "(bad)");
+		      return true;
+		    }
+
 		  if (!ins->vex.v)
 		    vindex += 16;
 		  check_gather = ins->obufp == ins->op_out[1];
@@ -11807,7 +11939,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
 	      if (ins->rex & REX_R)
 	        modrm_reg += 8;
-	      if (!ins->vex.r)
+	      if (ins->rex2 & REX_R)
 	        modrm_reg += 16;
 	      if (vindex == modrm_reg)
 		oappend (ins, "/(bad)");
@@ -12013,10 +12145,7 @@ OP_indirE (instr_info *ins, int bytemode, int sizeflag)
 static bool
 OP_G (instr_info *ins, int bytemode, int sizeflag)
 {
-  if (ins->vex.evex && !ins->vex.r && ins->address_mode == mode_64bit)
-    oappend (ins, "(bad)");
-  else
-    print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
+  print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
   return true;
 }
 
@@ -12646,7 +12775,7 @@ OP_XMM (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
     reg += 8;
   if (ins->vex.evex)
     {
-      if (!ins->vex.r)
+      if (ins->rex2 & REX_R)
 	reg += 16;
     }
 
@@ -13654,7 +13783,7 @@ DistinctDest_Fixup (instr_info *ins, int bytemode, int sizeflag)
   /* Calc destination register number.  */
   if (ins->rex & REX_R)
     modrm_reg += 8;
-  if (!ins->vex.r)
+  if (ins->rex2 & REX_R)
     modrm_reg += 16;
 
   /* Calc src1 register number.  */
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 9599ac22b0c..25e852b5ec5 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -1094,6 +1094,7 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
     SPACE(0F),
     SPACE(0F38),
     SPACE(0F3A),
+    SPACE(EVEXMAP4),
     SPACE(EVEXMAP5),
     SPACE(EVEXMAP6),
     SPACE(VEXMAP7),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index f211c218965..0c629fe58ba 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -975,6 +975,7 @@ typedef struct insn_template
      1: 0F opcode prefix / space.
      2: 0F38 opcode prefix / space.
      3: 0F3A opcode prefix / space.
+     4: EVEXMAP4 opcode prefix / space.
      5: EVEXMAP5 opcode prefix / space.
      6: EVEXMAP6 opcode prefix / space.
      7: VEXMAP7 opcode prefix / space.
@@ -986,6 +987,7 @@ typedef struct insn_template
 #define SPACE_0F	1
 #define SPACE_0F38	2
 #define SPACE_0F3A	3
+#define SPACE_EVEXMAP4  4
 #define SPACE_EVEXMAP5	5
 #define SPACE_EVEXMAP6	6
 #define SPACE_VEXMAP7	7
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index a90279265bc..f6ae9b80288 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -109,6 +109,7 @@
 #define SpaceXOP09 OpcodeSpace=SPACE_XOP09
 #define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
 
+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4
 #define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
 #define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
 
@@ -138,6 +139,8 @@
 #define Vsz256 Vsz=VSZ256
 #define Vsz512 Vsz=VSZ512
 
+#define APX_F_64 APX_F&x64
+
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
@@ -191,6 +194,7 @@ mov, 0xf24, i386&No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Te
 
 // Move after swapping the bytes
 movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movbe, 0x60, Movbe&APX_F_64, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Move with sign extend.
 movsb, 0xfbe, i386, Modrm|No_bSuf|No_sSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
@@ -338,6 +342,7 @@ adc, 0x14, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+
 not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 aaa, 0x37, No64, NoSuf, {}
@@ -1316,13 +1321,16 @@ getsec, 0xf37, SMX, NoSuf, {}
 
 invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invept, 0xf3f0, APX_F_64&EPT, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invvpid, 0xf3f1, APX_F_64&EPT, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // INVPCID instruction
 
 invpcid, 0x660f3882, INVPCID&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invpcid, 0x660f3882, INVPCID&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invpcid, 0xf3f2, APX_F_64&INVPCID, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // SSSE3 instructions.
 
@@ -1422,7 +1430,9 @@ pcmpestrm, 0x660f3a60, SSE4_2&x64, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { I
 pcmpistri<sse42>, 0x660f3a63, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 pcmpistrm<sse42>, 0x660f3a62, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 crc32, 0xf20f38f0, SSE4_2, W|Modrm|No_sSuf|No_qSuf, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
+crc32, 0xf0, APX_F_64, W|Modrm|No_sSuf|No_qSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
 crc32, 0xf20f38f0, SSE4_2&x64, W|Modrm|No_wSuf|No_lSuf|No_sSuf, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
+crc32, 0xf0, APX_F_64, W|Modrm|No_wSuf|No_lSuf|No_sSuf|EVex128|EVexMap4, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
 
 // xsave/xrstor New Instructions.
 
@@ -1834,13 +1844,21 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
 // BMI2 instructions.
 
 bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+bzhi, 0xf5, BMI2&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+mulx, 0xf2f6, BMI2&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pdep, 0xf2f5, BMI2&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pext, 0xf3f5, BMI2&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
+rorx, 0xf2f0, BMI2&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
 sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+sarx, 0xf3f7, BMI2&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shlx, 0x66f7, BMI2&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shrx, 0xf2f7, BMI2&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // FMA4 instructions
 
@@ -1911,10 +1929,15 @@ lwpins, 0x12/0, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV|Vex, { Imm32|Imm32S, Reg32|U
 // BMI instructions
 
 andn, 0xf2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+andn, 0xf2, BMI&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+bextr, 0xf7, BMI&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 blsi, 0xf3/3, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsi, 0xf3/3, BMI&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 blsmsk, 0xf3/2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsmsk, 0xf3/2, BMI&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 blsr, 0xf3/1, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsr, 0xf3/1, BMI&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 tzcnt, 0xf30fbc, BMI, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // TBM instructions
@@ -2043,13 +2066,20 @@ bndldx, 0x0f1a, MPX, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex, RegBND }
 
 // SHA instructions.
 sha1rnds4, 0xf3acc, SHA, Modrm|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1rnds4, 0xd4, SHA&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1nexte, 0xf38c8, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1nexte, 0xd8, SHA&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg1, 0xf38c9, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg1, 0xd9, SHA&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg2, 0xf38ca, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg2, 0xda, SHA&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256rnds2, 0xdb, SHA&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg1, 0xf38cc, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg1, 0xdc, SHA&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg2, 0xf38cd, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg2, 0xdd, SHA&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
 // SHA512 instructions.
 
@@ -2109,8 +2139,11 @@ kxnor<bw>, 0x<bw:kpfx>46, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, {
 kxor<bw>, 0x<bw:kpfx>47, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 
 kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
+kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>&APX_F_64, Modrm|EVex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
 kmov<bw>, 0x<bw:kpfx>91, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
+kmov<bw>, 0x<bw:kpfx>91, <bw:kcpu>&APX_F_64, Modrm|EVex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
 kmov<bw>, 0x<bw:kpfx>92, <bw:kcpu>, D|Modrm|Vex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
+kmov<bw>, 0x<bw:kpfx>92, <bw:kcpu>&APX_F_64, D|Modrm|EVex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
 
 knot<bw>, 0x<bw:kpfx>44, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
 kortest<bw>, 0x<bw:kpfx>98, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
@@ -2586,8 +2619,11 @@ kadd<dq>, 0x<dq:kpfx>4a, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|
 kand<dq>, 0x<dq:kpfx>41, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kandn<dq>, 0x<dq:kpfx>42, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf|Optimize, { RegMask, RegMask, RegMask }
 kmov<dq>, 0x<dq:kpfx>90, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
+kmov<dq>, 0x<dq:kpfx>90, AVX512BW&APX_F_64, Modrm|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
 kmov<dq>, 0x<dq:kpfx>91, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
+kmov<dq>, 0x<dq:kpfx>91, AVX512BW&APX_F_64, Modrm|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
 kmov<dq>, 0xf292, AVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
+kmov<dq>, 0xf292, AVX512BW&APX_F_64, D|Modrm|EVex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
 knot<dq>, 0x<dq:kpfx>44, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
 kor<dq>, 0x<dq:kpfx>45, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kortest<dq>, 0x<dq:kpfx>98, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
@@ -2986,9 +3022,13 @@ rdsspq, 0xf30f1e/1, SHSTK&x64, Modrm|NoSuf, { Reg64 }
 saveprevssp, 0xf30f01ea, SHSTK, NoSuf, {}
 rstorssp, 0xf30f01/5, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 wrssd, 0x0f38f6, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrssd, 0x66, SHSTK&APX_F_64, Modrm|IgnoreSize|NoSuf|EVex128|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrssq, 0x0f38f6, SHSTK&x64, Modrm|NoSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrssq, 0x66, APX_F_64&SHSTK, Modrm|NoSuf|Size64|EVex128|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 wrussd, 0x660f38f5, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrussd, 0x6665, SHSTK&APX_F_64, Modrm|IgnoreSize|NoSuf|EVex128|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrussq, 0x660f38f5, SHSTK&x64, Modrm|NoSuf, { Reg64, Qword|Unspecified|BaseIndex }
+wrussq, 0x6665, SHSTK&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 setssbsy, 0xf30f01e8, SHSTK, NoSuf, {}
 clrssbsy, 0xf30fae/6, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 endbr64, 0xf30f1efa, IBT, NoSuf, {}
@@ -3036,7 +3076,9 @@ cldemote, 0x0f1c/0, CLDEMOTE, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 // MOVDIR[I,64B] instructions.
 
 movdiri, 0xf38f9, MOVDIRI, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+movdiri, 0xf9, MOVDIRI&APX_F_64, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 movdir64b, 0x660f38f8, MOVDIR64B, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movdir64b, 0x66f8, MOVDIR64B&APX_F_64, Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // MOVEDIR instructions end.
 
@@ -3065,7 +3107,9 @@ vcvtneps2bf16<Vxy>, 0xf372, AVX_NE_CONVERT, Modrm|<Vxy:vex>|Space0F38|VexW0|NoSu
 // ENQCMD instructions.
 
 enqcmd, 0xf20f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmd, 0xf2f8, ENQCMD&APX_F_64, Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 enqcmds, 0xf30f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmds, 0xf3f8, ENQCMD&APX_F_64, Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // ENQCMD instructions end.
 
@@ -3126,8 +3170,8 @@ xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf, {}
 
 // AMX instructions.
 
-ldtilecfg, 0x49/0, AMX_TILE&x64, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
-sttilecfg, 0x6649/0, AMX_TILE&x64, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+ldtilecfg, 0x49/0, AMX_TILE&x64&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+sttilecfg, 0x6649/0, AMX_TILE&x64&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
 
 tcmmimfp16ps, 0x666c, AMX_COMPLEX&x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tcmmrlfp16ps, 0x6c, AMX_COMPLEX&x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
@@ -3139,9 +3183,9 @@ tdpbuud, 0x5e, AMX_INT8&x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|No
 tdpbusd, 0x665e, AMX_INT8&x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tdpbsud, 0xf35e, AMX_INT8&x64, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 
-tileloadd, 0xf24b, AMX_TILE&x64, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tileloaddt1, 0x664b, AMX_TILE&x64, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tilestored, 0xf34b, AMX_TILE&x64, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
+tileloadd, 0xf24b, AMX_TILE&x64&(AMX_TILE|APX_F), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 0x664b, AMX_TILE&x64&(AMX_TILE|APX_F), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 0xf34b, AMX_TILE&x64&(AMX_TILE|APX_F), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
 
 tilerelease, 0x49c0, AMX_TILE&x64, Vex128|Space0F38|VexW0|NoSuf, {}
 
@@ -3153,15 +3197,25 @@ tilezero, 0xf249, AMX_TILE&x64, Modrm|Vex128|Space0F38|VexW0|NoSuf, { RegTMM }
 
 loadiwkey, 0xf30f38dc, KL, Load|Modrm|NoSuf, { RegXMM, RegXMM }
 encodekey128, 0xf30f38fa, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey128, 0xf3da, KL&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Reg32, Reg32 }
 encodekey256, 0xf30f38fb, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey256, 0xf3db, KL&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Reg32, Reg32 }
 aesenc128kl, 0xf30f38dc, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc128kl, 0xf3dc, KL&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec128kl, 0xf30f38dd, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec128kl, 0xf3dd, KL&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesenc256kl, 0xf30f38de, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc256kl, 0xf3de, KL&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec256kl, 0xf30f38df, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec256kl, 0xf3df, KL&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesencwide128kl, 0xf30f38d8/0, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide128kl, 0xf3d8/0, WideKL&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide128kl, 0xf30f38d8/1, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide128kl, 0xf3d8/1, WideKL&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesencwide256kl, 0xf30f38d8/2, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide256kl, 0xf3d8/2, WideKL&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide256kl, 0xf30f38d8/3, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide256kl, 0xf3d8/3, WideKL&APX_F_64, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 
 // KEYLOCKER instructions end.
 
@@ -3310,6 +3364,7 @@ prefetchit1, 0xf18/6, PREFETCHI&x64, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex
 // CMPCCXADD instructions.
 
 cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD&x64, Modrm|Vex|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD&x64&APX_F_64, Modrm|EVex128|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // CMPCCXADD instructions end.
 
@@ -3329,9 +3384,13 @@ wrmsrlist, 0xf30f01c6, MSRLIST&x64, NoSuf, {}
 // RAO-INT instructions.
 
 aadd, 0xf38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aadd, 0xfc, RAO_INT&APX_F_64, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aand, 0x660f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aand, 0x66fc, RAO_INT&APX_F_64, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aor, 0xf20f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aor, 0xf2fc, RAO_INT&APX_F_64, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 axor, 0xf30f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+axor, 0xf3fc, RAO_INT&APX_F_64, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // RAO-INT instructions end.
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-03 16:50 [PATCH V2 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
@ 2023-11-06 17:07 ` Jan Beulich
  2023-11-13  5:53   ` Cui, Lili
  2023-11-07 13:29 ` Jan Beulich
  2023-11-07 14:53 ` Jan Beulich
  2 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-11-06 17:07 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 03.11.2023 17:50, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -3672,9 +3672,10 @@ install_template (const insn_template *t)
>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>    {
> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> -	   || maybe_cpu (t, CpuFMA))
> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> +    if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> +	 || maybe_cpu (t, CpuFMA) ||  maybe_cpu (t, CpuAMX_TILE))
> +	&& (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)
> +	    || maybe_cpu (t, CpuAPX_F)))

Something's odd with formatting (indentation) here. I first thought I
had screwed up in my patch, but looking back things appear to be
correct there.

> @@ -3689,12 +3690,11 @@ install_template (const insn_template *t)
>  		i.tm.cpu.bitfield.cpuavx = 1;
>  	      else
>  		{
> -		  gas_assert (!i.tm.cpu.bitfield.isa);
>  		  i.tm.cpu.bitfield.isa = i.tm.cpu_any.bitfield.isa;

You may not remove the assertion here, or else the assignment could
silently overwrite a value. Can you explain why you did the removal?

> @@ -3885,6 +3885,14 @@ is_any_vex_encoding (const insn_template *t)
>    return t->opcode_modifier.vex || is_evex_encoding (t);
>  }
>  
> +static INLINE bool
> +is_any_apx_evex_encoding (void)
> +{
> +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4 
> +    || (i.vex.register_specifier
> +	&& i.vex.register_specifier->reg_flags & RegRex2);
> +}

The use of i.rex2 here doesn't fit the name; the sole user has first
checked that no legacy encoding is going to be used, and that's a
prereq here. Such a prereq needs spelling out, such that one can be
easily aware when possibly adding another caller.

Also, what does "any" stand for in the name here ...

>  static INLINE bool
>  is_any_apx_rex2_encoding (void)

... and also the one here? In is_any_vex_encoding() it refers to
VEX/XOP/EVEX.

> @@ -4161,6 +4169,27 @@ build_rex2_prefix (void)
>  		    | (i.rex2 << 4) | i.rex);
>  }
>  
> +/* Build the EVEX prefix (4-byte) for evex insn
> +   | 62h |
> +   | `R`X`B`R' | B'mmm |
> +   | W | v`v`v`v | `x' | pp |
> +   | z| L'L | b | `v | aaa |
> +*/
> +static void
> +build_apx_evex_prefix (void)
> +{
> +  build_evex_prefix ();
> +  if (i.rex2 & REX_R)
> +    i.vex.bytes[1] &= 0xef;

I think this would read easier as ~0x10 (similarly below then). I also
think ...

> +  if (i.vex.register_specifier
> +      && register_number (i.vex.register_specifier) > 0xf)
> +    i.vex.bytes[3] &= 0xf7;
> +  if (i.rex2 & REX_B)
> +    i.vex.bytes[1] |= 0x08;

... it would help if the byte 1 updates were kept together (the compiler
may even produce better code then), and maybe ...

> +  if (i.rex2 & REX_X)
> +    i.vex.bytes[2] &= 0xfb;

... the byte 2 update ahead of the byte 3 one.

Also why do you use register_number() here, and not the RegRex2 flag?

> @@ -5624,19 +5653,42 @@ md_assemble (char *line)
>  	}
>  
>        /* Check for explicit REX2 prefix.  */
> -      if (i.rex2 || i.rex2_encoding)
> +      if (i.rex2_encoding)

This change ...

>  	{
>  	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));

... further invalidates this message. Yet iirc you said you removed the
i.rex2 part of the check anyway from patch 1.

>  	  return;
>  	}
>  
> -      if (i.tm.opcode_modifier.vex)
> +      if (is_any_apx_evex_encoding ())
> +	{
> +	  if (i.tm.opcode_space == SPACE_EVEXMAP4 && (i.prefix[DATA_PREFIX] != 0))
> +	    {
> +	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;

Perhaps better assert that no other embedded prefix was already recorded
here?

> +	      i.prefix[DATA_PREFIX] = 0;
> +	    }
> +
> +	  build_apx_evex_prefix ();
> +
> +	  /* Encode the NDD bit of the instruction promoted from the legacy
> +	     space.  */
> +	  if (i.vex.register_specifier && i.tm.opcode_space == SPACE_EVEXMAP4)
> +	    i.vex.bytes[3] |= 0x10;

Why the restriction to map 3? And why is this not part of
build_apx_evex_prefix()?

> +	  /* Encode the NF bit of the instruction promoted from legacy and vex
> +	     space.  */
> +	  if (i.has_nf)
> +	    i.vex.bytes[3] |= 0x04;

This wants to move to the patch actually introducing NF handling.

> @@ -5663,16 +5715,17 @@ md_assemble (char *line)
>       instruction already has a prefix, we need to convert old
>       registers to new ones.  */
>  
> -  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> -       && (i.op[0].regs->reg_flags & RegRex64) != 0)
> -      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> -	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> -      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> -	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && (i.rex != 0 || i.rex2 != 0)))
> +  if (!is_any_vex_encoding (&i.tm)
> +      && ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> +	   && (i.op[0].regs->reg_flags & RegRex64) != 0)
> +	  || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> +	      && (i.op[1].regs->reg_flags & RegRex64) != 0)
> +	  || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> +	       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> +	      && (i.rex != 0 || i.rex2 != 0))))

I'm a little puzzled by this (hard to read) adjustment:
is_any_vex_encoding() basically excludes most new templates added here,
yet for those permitting Reg8 you still need to enforce the EVEX-imposed
restriction.

> @@ -7043,7 +7096,7 @@ VEX_check_encoding (const insn_template *t)
>  static int
>  check_EgprOperands (const insn_template *t)
>  {
> -  if (t->opcode_modifier.noegpr)
> +  if (t->opcode_modifier.noegpr && !need_evex_encoding())
>      {
>        for (unsigned int op = 0; op < i.operands; op++)
>  	{

What is this change about?

> @@ -14252,6 +14306,9 @@ static bool check_register (const reg_entry *r)
>  
>    if (r->reg_flags & RegRex2)
>      {
> +      if (is_evex_encoding (current_templates->start))
> +	i.vec_encoding = vex_encoding_evex;

What if the APX template isn't first in the group?

> --- a/opcodes/i386-opc.h
> +++ b/opcodes/i386-opc.h
> @@ -975,6 +975,7 @@ typedef struct insn_template
>       1: 0F opcode prefix / space.
>       2: 0F38 opcode prefix / space.
>       3: 0F3A opcode prefix / space.
> +     4: EVEXMAP4 opcode prefix / space.
>       5: EVEXMAP5 opcode prefix / space.
>       6: EVEXMAP6 opcode prefix / space.
>       7: VEXMAP7 opcode prefix / space.
> @@ -986,6 +987,7 @@ typedef struct insn_template
>  #define SPACE_0F	1
>  #define SPACE_0F38	2
>  #define SPACE_0F3A	3
> +#define SPACE_EVEXMAP4  4
>  #define SPACE_EVEXMAP5	5
>  #define SPACE_EVEXMAP6	6
>  #define SPACE_VEXMAP7	7

Nit: Please can padding here match surrounding code?

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -109,6 +109,7 @@
>  #define SpaceXOP09 OpcodeSpace=SPACE_XOP09
>  #define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
>  
> +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4
>  #define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
>  #define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
>  
> @@ -138,6 +139,8 @@
>  #define Vsz256 Vsz=VSZ256
>  #define Vsz512 Vsz=VSZ512
>  
> +#define APX_F_64 APX_F&x64

What's wrong with

#define APX_F APX_F&x64

? Oh, I see - this wouldn't build with the uses in the AMX-TILE insns.

Yet as said elsewhere, it may be better if we make the 64-bit-mode-
only-ISAs' dependencies (not just for APX) implicit as far as the
opcode table goes.

> @@ -338,6 +342,7 @@ adc, 0x14, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
>  adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
>  neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +
>  not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
>  aaa, 0x37, No64, NoSuf, {}

Nit: In an already overlarge patch it is extremely helpful if any
unrelated changed could be omitted.

> @@ -1316,13 +1321,16 @@ getsec, 0xf37, SMX, NoSuf, {}
>  
>  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>  invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> +invept, 0xf3f0, APX_F_64&EPT, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
>  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>  invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> +invvpid, 0xf3f1, APX_F_64&EPT, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
>  
>  // INVPCID instruction
>  
>  invpcid, 0x660f3882, INVPCID&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>  invpcid, 0x660f3882, INVPCID&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> +invpcid, 0xf3f2, APX_F_64&INVPCID, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }

While ordering doesn't matter functionality wise, overall readability
imo would end up improved if you had EPT&APX_F and INVPCID&APX_f here
(and the likewise elsewhere, albeit it looks in other cases you already
have things the other way round).

> @@ -3310,6 +3364,7 @@ prefetchit1, 0xf18/6, PREFETCHI&x64, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex
>  // CMPCCXADD instructions.
>  
>  cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD&x64, Modrm|Vex|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
> +cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD&x64&APX_F_64, Modrm|EVex128|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }

Nit: Redundant x64.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-03 16:50 [PATCH V2 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
  2023-11-06 17:07 ` Jan Beulich
@ 2023-11-07 13:29 ` Jan Beulich
  2023-11-09  8:38   ` Cui, Lili
  2023-11-07 14:53 ` Jan Beulich
  2 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-11-07 13:29 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 03.11.2023 17:50, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -3672,9 +3672,10 @@ install_template (const insn_template *t)

What I'm surprised by is that you don't have any change to cpu_flags_match().
I don't think you can get away without for dual VEX/EVEX templates. I hope
further down you'll find a sufficient explanation of what I think is going
to be needed.

> @@ -1834,13 +1844,21 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
>  // BMI2 instructions.
>  
>  bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> +bzhi, 0xf5, BMI2&APX_F_64, Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }

With the now two prereq patches sent a few minutes ago, this wants to follow
what you have ...

> @@ -3126,8 +3170,8 @@ xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf, {}
>  
>  // AMX instructions.
>  
> -ldtilecfg, 0x49/0, AMX_TILE&x64, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
> -sttilecfg, 0x6649/0, AMX_TILE&x64, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
> +ldtilecfg, 0x49/0, AMX_TILE&x64&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
> +sttilecfg, 0x6649/0, AMX_TILE&x64&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }

... here, simplified to <feat>&(<feat>|APF_F). I would strongly recommend
putting this in a macro, accompanied by a clarifying comment (not the least
because (a && (a || b)) == a):

/* Many APX_F instructions require a 2nd feature to also be available.  When
   the respective instructions were originally VEX-encoded, a single template
   can cover both encodings.  While the expression below may look odd on the
   surface (first of all we really mean "feat || (feat && APX_F)", but that is
   nothing else than just "feat", and in turn the same as the expression below,
   which is what i386-gen can grok), gas/config/tc-i386.c:cpu_flags_match()
   will zap either of the inner two features depending on whether EVEX
   encoding was determined to be necessary.  Only then will the actual feature
   checking be carried out.  */
#define APX_F(feat) feat&(feat|APF_F)

Since the list of bits we want to clear in cpu_flags_match()'s "any" is going
to grow, we may want to consider using cpu_flags_and_not() there as well. But
that can likely be a follow-on cleanup patch (if deemed desirable in the first
place), so I would recommend not worrying about that right away.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-03 16:50 [PATCH V2 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
  2023-11-06 17:07 ` Jan Beulich
  2023-11-07 13:29 ` Jan Beulich
@ 2023-11-07 14:53 ` Jan Beulich
  2023-11-09 12:31   ` Cui, Lili
                     ` (2 more replies)
  2 siblings, 3 replies; 26+ messages in thread
From: Jan Beulich @ 2023-11-07 14:53 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 03.11.2023 17:50, Cui, Lili wrote:
> --- a/gas/testsuite/gas/i386/x86-64-inval-movbe.l
> +++ b/gas/testsuite/gas/i386/x86-64-inval-movbe.l
> @@ -1,29 +1,30 @@
>  .*: Assembler messages:
> -.*:4: Error: .*
>  .*:5: Error: .*
>  .*:6: Error: .*
>  .*:7: Error: .*
>  .*:8: Error: .*
> -.*:11: Error: .*
> +.*:9: Error: .*
>  .*:12: Error: .*
>  .*:13: Error: .*
>  .*:14: Error: .*
>  .*:15: Error: .*
> +.*:16: Error: .*
>  GAS LISTING .*
>  
>  
>  [ 	]*1[ 	]+\# Check illegal movbe in 64bit mode\.
>  [ 	]*2[ 	]+\.text
> -[ 	]*3[ 	]+foo:
> -[ 	]*4[ 	]+movbe	\(%rcx\),%bl
> -[ 	]*5[ 	]+movbe	%ecx,%ebx
> -[ 	]*6[ 	]+movbe	%bx,%rcx
> -[ 	]*7[ 	]+movbe	%rbx,%rcx
> -[ 	]*8[ 	]+movbe	%bl,\(%rcx\)
> -[ 	]*9[ 	]+
> -[ 	]*10[ 	]+\.intel_syntax noprefix
> -[ 	]*11[ 	]+movbe bl, byte ptr \[rcx\]
> -[ 	]*12[ 	]+movbe ebx, ecx
> -[ 	]*13[ 	]+movbe rcx, bx
> -[ 	]*14[ 	]+movbe rcx, rbx
> -[ 	]*15[ 	]+movbe byte ptr \[rcx\], bl
> +[ 	]*3[ 	]+\.arch \.noapx_f
> +[ 	]*4[ 	]+foo:
> +[ 	]*5[ 	]+movbe	\(%rcx\),%bl
> +[ 	]*6[ 	]+movbe	%ecx,%ebx
> +[ 	]*7[ 	]+movbe	%bx,%rcx
> +[ 	]*8[ 	]+movbe	%rbx,%rcx
> +[ 	]*9[ 	]+movbe	%bl,\(%rcx\)
> +[ 	]*10[ 	]+
> +[ 	]*11[ 	]+\.intel_syntax noprefix
> +[ 	]*12[ 	]+movbe bl, byte ptr \[rcx\]
> +[ 	]*13[ 	]+movbe ebx, ecx
> +[ 	]*14[ 	]+movbe rcx, bx
> +[ 	]*15[ 	]+movbe rcx, rbx
> +[ 	]*16[ 	]+movbe byte ptr \[rcx\], bl

To avoid the need to fiddle with this file, did you consider changing
the test's command line options instead? In any event ...

> --- a/gas/testsuite/gas/i386/x86-64-inval-movbe.s
> +++ b/gas/testsuite/gas/i386/x86-64-inval-movbe.s
> @@ -1,5 +1,6 @@
>  # Check illegal movbe in 64bit mode.
>  	.text
> +	.arch .noapx_f
>  foo:
>  	movbe	(%rcx),%bl
>  	movbe	%ecx,%ebx

... the comment here wants expanding, to (briefly) mention the deliberate
exclusion of APX.

> --- a/opcodes/i386-dis-evex-len.h
> +++ b/opcodes/i386-dis-evex-len.h
> @@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
>      { REG_TABLE (REG_EVEX_0F38C7_L_2) },
>    },
>  
> +  /* EVEX_LEN_0F38F2 */
> +  {
> +    { "andnS",		{ Gdq, VexGdq, Edq }, 0 },
> +  },

There's no sign of a prefix decode step here.

> --- a/opcodes/i386-dis-evex-mod.h
> +++ b/opcodes/i386-dis-evex-mod.h
> @@ -1 +1,43 @@
>  /* Nothing at present.  */

This comment needs to go away when new stuff is added here. However, I'm
sure I requested before that no new entries be put here which have only
one of their two slots populated. The reg-only and mem-only aspects can
be expressed via proper choice of operand specifiers, at least in almost
all cases. Note how you already use ...

> +  /* MOD_EVEX_MAP4_DA_PREFIX_1 */
> +  {
> +    { Bad_Opcode },
> +    { "encodekey128", { Gd, Ed }, 0 },
> +  },
> +  /* MOD_EVEX_MAP4_DB_PREFIX_1 */
> +  {
> +    { Bad_Opcode },
> +    { "encodekey256", { Gd, Ed }, 0 },
> +  },
> +  /* MOD_EVEX_MAP4_DC_PREFIX_1 */
> +  {
> +    { "aesenc128kl",    { XM, M }, 0 },
> +  },
> +  /* MOD_EVEX_MAP4_DD_PREFIX_1 */
> +  {
> +    { "aesdec128kl",    { XM, M }, 0 },
> +  },
> +  /* MOD_EVEX_MAP4_DE_PREFIX_1 */
> +  {
> +    { "aesenc256kl",    { XM, M }, 0 },
> +  },
> +  /* MOD_EVEX_MAP4_DF_PREFIX_1 */
> +  {
> +    { "aesdec256kl",    { XM, M }, 0 },
> +  },
> +  /* MOD_EVEX_MAP4_F8_PREFIX_1 */
> +  {
> +    { "enqcmds",	{ Gva, M },  0 },
> +  },
> +  /* MOD_EVEX_MAP4_F8_PREFIX_2 */
> +  {
> +    { "movdir64b",	{ Gva, M }, 0 },
> +  },
> +  /* MOD_EVEX_MAP4_F8_PREFIX_3 */
> +  {
> +    { "enqcmd",		{ Gva, M }, 0 },
> +  },

... M in many entries anyway. These can move one level up without needing
further adjustment.

For all of the above, however, an EVEX.W decode step looks to be missing.
Interestingly the doc consistently omits the (presumably) .W0 suffix for
these, having merely a trailing dot there. The issue (doc and/or code) is
present elsewhere as well.

> +  /* MOD_EVEX_MAP4_F9 */
> +  {
> +    { "movdiri",	{ Edq, Gdq }, 0 },
> +  },

Missing PREFIX_OPCODE?

> --- a/opcodes/i386-dis-evex-prefix.h
> +++ b/opcodes/i386-dis-evex-prefix.h
> @@ -338,6 +338,75 @@
>      { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
>      { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
>    },
> +  /* PREFIX_EVEX_MAP4_66 */
> +  {
> +    { "wrssK",	{ M, Gdq }, 0 },
> +  },
> +  /* PREFIX_EVEX_MAP4_D8 */
> +  {
> +    { "sha1nexte", { XM, EXxmm }, 0 },
> +    { REG_TABLE (REG_EVEX_MAP4_D8_PREFIX_1) },
> +  },
> +  /* PREFIX_EVEX_MAP4_DA */
> +  {
> +    { "sha1msg2", { XM, EXxmm }, 0 },
> +    { MOD_TABLE (MOD_EVEX_MAP4_DA_PREFIX_1) },
> +  },
> +  /* PREFIX_EVEX_MAP4_DB */
> +  {
> +    { "sha256rnds2", { XM, EXxmm, XMM0 }, 0 },
> +    { MOD_TABLE (MOD_EVEX_MAP4_DB_PREFIX_1) },
> +  },
> +  /* PREFIX_EVEX_MAP4_DC */
> +  {
> +    { "sha256msg1", { XM, EXxmm }, 0 },
> +    { MOD_TABLE (MOD_EVEX_MAP4_DC_PREFIX_1) },
> +  },
> +  /* PREFIX_EVEX_MAP4_DD */
> +  {
> +    { "sha256msg2", { XM, EXxmm }, 0 },
> +    { MOD_TABLE (MOD_EVEX_MAP4_DD_PREFIX_1) },
> +  },
> +  /* PREFIX_EVEX_MAP4_DE */
> +  {
> +    { Bad_Opcode },
> +    { MOD_TABLE (MOD_EVEX_MAP4_DE_PREFIX_1) },
> +  },
> +  /* PREFIX_EVEX_MAP4_DF */
> +  {
> +    { Bad_Opcode },
> +    { MOD_TABLE (MOD_EVEX_MAP4_DF_PREFIX_1) },
> +  },
> +  /* PREFIX_EVEX_MAP4_F0 */
> +  {
> +    { "crc32A",	{ Gdq, Eb }, 0 },
> +    { "invept",	{ Gm, Mo }, 0 },
> +  },
> +  /* PREFIX_EVEX_MAP4_F1 */
> +  {
> +    { "crc32Q",	{ Gdq, Ev }, 0 },
> +    { "invvpid", { Gm, Mo }, 0 },
> +    { "crc32Q",	{ Gdq, Ev }, 0 },
> +  },
> +  /* PREFIX_EVEX_MAP4_F2 */
> +  {
> +    { Bad_Opcode },
> +    { "invpcid", { Gm, M }, 0 },
> +  },

As opposed to the entries further up, INV* are indeed WIG, so validly
don't decode EVEX.W.

> +  /* PREFIX_EVEX_MAP4_F8 */
> +  {
> +    { Bad_Opcode },
> +    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_1) },
> +    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_2) },
> +    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_3) },
> +  },
> +  /* PREFIX_EVEX_MAP4_FC */
> +  {
> +    { "aadd",	{ Mdq, Gdq }, 0 },
> +    { "axor",	{ Mdq, Gdq }, 0 },
> +    { "aand",	{ Mdq, Gdq }, 0 },
> +    { "aor",	{ Mdq, Gdq }, 0 },
> +  },

This looks to match the PREFIX_0F38FC entry.

> --- a/opcodes/i386-dis-evex-reg.h
> +++ b/opcodes/i386-dis-evex-reg.h
> @@ -49,3 +49,17 @@
>      { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
>      { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
>    },
> +  /* REG_EVEX_0F38F3_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "blsrS",		{ VexGdq, Edq }, 0 },
> +    { "blsmskS",	{ VexGdq, Edq }, 0 },
> +    { "blsiS",		{ VexGdq, Edq }, 0 },
> +  },

Compared to the REG_VEX_0F38F3_L_0 entry this lacks PREFIX_OPCODE, but
then the question is why you do not re-use that entry in the first
place.

> +  /* REG_EVEX_MAP4_D8_PREFIX_1 */
> +  {
> +    { "aesencwide128kl",	{ M }, 0 },
> +    { "aesdecwide128kl",	{ M }, 0 },
> +    { "aesencwide256kl",	{ M }, 0 },
> +    { "aesdecwide256kl",	{ M }, 0 },
> +  },

How is this different from the REG_0F38D8_PREFIX_1 entry?

> --- /dev/null
> +++ b/opcodes/i386-dis-evex-x86-64.h
> @@ -0,0 +1,140 @@
> +  /* X86_64_EVEX_0F90 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F90) },
> +  },
> +  /* X86_64_EVEX_0F91 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F91) },
> +  },
> +  /* X86_64_EVEX_0F92 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F92) },
> +  },
> +  /* X86_64_EVEX_0F93 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F93) },
> +  },
> +  /* X86_64_EVEX_0F3849 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },
> +  },
> +  /* X86_64_EVEX_0F384B */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },
> +  },
> +  /* X86_64_EVEX_0F38E0 */
> +  {
> +    { Bad_Opcode },
> +    { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
> +  },

This and its sibling entries look to again fully match X86_64_VEX_0F38E<n>.

> --- a/opcodes/i386-dis-evex.h
> +++ b/opcodes/i386-dis-evex.h
>[...]
> @@ -1113,19 +1113,19 @@ static const struct dis386 evex_table[][256] = {
>      { Bad_Opcode },
>      { Bad_Opcode },
>      { Bad_Opcode },
> -    { Bad_Opcode },
> +    { "sha1rnds4", { XM, EXxmm, Ib }, 0 },

PREFIX_OPCODE? EVEX.W?

>      { Bad_Opcode },
>      { Bad_Opcode },
>      { Bad_Opcode },
>      /* D8 */
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> -    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_EVEX_MAP4_D8) },
> +    { "sha1msg1", { XM, EXxmm }, 0 },

Again.

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
>[...]
> @@ -4522,12 +4594,13 @@ static const struct dis386 x86_64_table[][2] = {
>      { "cmpnlexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
>    },
>  
> +#include "i386-dis-evex-x86-64.h"
> +
>    /* X86_64_VEX_MAP7_F8_L_0_W_0_R_0 */
>    {
>      { Bad_Opcode },
>      { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
>    },
> -
>  };

Please can EVEX entries come after VEX ones?

> @@ -8979,9 +9055,13 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>        if (!fetch_code (ins->info, ins->codep + 4))
>  	return &err_opcode;
>        /* The first byte after 0x62.  */
> +      if (*ins->codep & 0x8)
> +	ins->rex2 |= REX_B;
> +      if (!(*ins->codep & 0x10))
> +	ins->rex2 |= REX_R;

In patch 1 you check whether REX2 bits are properly consumed. While
in principle I agree with re-using ->rex2 here, don't you need to take
precautions to not print a random {rex2} (or whichever way it was
expressed) when one of these bits ends up not contributing to any
operand?

> @@ -8994,12 +9074,19 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  	case 0x3:
>  	  vex_table_index = EVEX_0F3A;
>  	  break;
> +	case 0x4:
> +	  vex_table_index = EVEX_MAP4;
> +	  ins->evex_type = evex_from_legacy;
> +	  break;
>  	case 0x5:
>  	  vex_table_index = EVEX_MAP5;
>  	  break;
>  	case 0x6:
>  	  vex_table_index = EVEX_MAP6;
>  	  break;
> +	case 0x7:
> +	  vex_table_index = EVEX_MAP7;
> +	  break;
>  	}

How is map7 coming into play here?

> @@ -9042,12 +9128,31 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  
>        if (ins->address_mode != mode_64bit)
>  	{
> +	  if (ins->evex_type != evex_default
> +	      || (ins->rex2 & (REX_B | REX_X)))
> +	    return &bad_opcode;
>  	  /* In 16/32-bit mode silently ignore following bits.  */
>  	  ins->rex &= ~REX_B;
> -	  ins->vex.r = true;
> +	  ins->rex2 &= ~REX_R;
>  	}
>  
>        ins->need_vex = 4;
> +
> +      /* EVEX from legacy instructions require that EVEX.L'L, EVEX.z and the
> +	 lower 2 bits of EVEX.aaa must be 0.
> +	 EVEX from evex instrucions require that EVEX.L'L and the lower 2 bits of
> +	 EVEX.aaa must be 0.  */

EVEX from VEX you mean? Also, what about EVEX.z for those? Really the sole
difference between the two forms is EVEX.nd. The doc mentions EVEX.l for
the from-VEX case, but afaics there's no insn actually having that bit set
(which also matches what you say above).

> +      if (ins->evex_type == evex_from_legacy || ins->evex_type == evex_from_vex)
> +	{
> +	  if ((*ins->codep & 0x3) != 0
> +	      || (*ins->codep >> 6 & 0x3) != 0
> +	      || (ins->evex_type == evex_from_legacy
> +		  && (*ins->codep >> 5 & 0x1) != 0)

Please can the shifts be parenthesized agaist the &-s? Albeit - this is
perhaps easier to read as just &-s anyway.

> +	      || (ins->evex_type == evex_from_vex
> +		  && !ins->vex.b))

This doesn't look to match any part of the comment.

> @@ -9640,6 +9752,19 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        if (ins.last_repnz_prefix >= 0)
>  	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
>        break;
> +
> +    case PREFIX_NP_OR_DATA:
> +      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)
> +	{
> +	  i386_dis_printf (info, dis_style_text, "(bad)");
> +	  ret = ins.end_codep - priv.the_buffer;
> +	  goto out;
> +	}
> +      break;
> +
> +    default:
> +      break;
> +
>      }

Why suddenly a default case? And why an extra blank line below it?

> @@ -11242,7 +11367,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
>      case b_swap_mode:
>        if (reg & 4)
>  	USED_REX (0);
> -      if (ins->rex)
> +      if (ins->rex || ins->rex2)
>  	names = att_names8rex;
>        else
>  	names = att_names8;

Isn't this already needed in the REX2 patch?

> @@ -11460,7 +11585,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>  
>    add += (ins->rex2 & REX_B) ? 16 : 0;
>  
> -  if (ins->vex.evex)
> +  if (ins->vex.evex && ins->evex_type == evex_default)
>      {
>  
>        /* Zeroing-masking is invalid for memory destinations. Set the flag
> @@ -11608,6 +11733,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>  		abort ();
>  	      if (ins->vex.evex)
>  		{
> +		  /* S/G EVEX insns require REX_X not to be set.  */
> +		  if (ins->rex2 & REX_X)

REX_X in a comment can only mean REX.X. You mean EVEX.X4 though, I think
(which internally is latched into the REX_X bit of ->rex2), and that also
needs to be set, not clear.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-07 13:29 ` Jan Beulich
@ 2023-11-09  8:38   ` Cui, Lili
  2023-11-09 11:07     ` Jan Beulich
  2023-11-09 11:12     ` Jan Beulich
  0 siblings, 2 replies; 26+ messages in thread
From: Cui, Lili @ 2023-11-09  8:38 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
> 
> On 03.11.2023 17:50, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -3672,9 +3672,10 @@ install_template (const insn_template *t)
> 
> What I'm surprised by is that you don't have any change to
> cpu_flags_match().
> I don't think you can get away without for dual VEX/EVEX templates. I hope
> further down you'll find a sufficient explanation of what I think is going to be
> needed.
> 

I think it is ok. Or am I missing something?

 $ cat a.s
       .allow_index_reg
        .text
_start:
        .text
        .arch .noapx_f
ldtilecfg  (%r31,%rdx,2)
        .arch .noamx_tile
ldtilecfg  (%rax,%rdx,2)

$ as --64 a.s -o a.o
a.s: Assembler messages:
a.s:9: Error: `ldtilecfg' is not supported on `x86_64.noapx_f'
a.s:11: Error: `ldtilecfg' is not supported on `x86_64.noapx_f.noamx_tile'

> > @@ -1834,13 +1844,21 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}  // BMI2
> > instructions.
> >
> >  bzhi, 0xf5, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_
> bSuf|No
> > _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> > Reg32|Reg64 }
> > +bzhi, 0xf5, BMI2&APX_F_64,
> >
> +Modrm|CheckOperandSize|EVex128|Space0F38|VexVVVV|SwapSources|N
> o_bSuf|
> > +No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> > +Reg32|Reg64 }
> 
> With the now two prereq patches sent a few minutes ago, this wants to follow
> what you have ...
> 

I changed them locally and it worked. I think the prereq patches are almost done, I'll tweak the apx patches after you check them in.

> > @@ -3126,8 +3170,8 @@ xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf, {}
> >
> >  // AMX instructions.
> >
> > -ldtilecfg, 0x49/0, AMX_TILE&x64,
> Modrm|Vex128|Space0F38|VexW0|NoSuf,
> > { Unspecified|BaseIndex } -sttilecfg, 0x6649/0, AMX_TILE&x64,
> > Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
> > +ldtilecfg, 0x49/0, AMX_TILE&x64&(AMX_TILE|APX_F),
> > +Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf,
> { Unspecified|BaseIndex }
> > +sttilecfg, 0x6649/0, AMX_TILE&x64&(AMX_TILE|APX_F),
> > +Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf,
> { Unspecified|BaseIndex }
> 
> ... here, simplified to <feat>&(<feat>|APF_F). I would strongly recommend
> putting this in a macro, accompanied by a clarifying comment (not the least
> because (a && (a || b)) == a):
> 
> /* Many APX_F instructions require a 2nd feature to also be available.  When
>    the respective instructions were originally VEX-encoded, a single template
>    can cover both encodings.  While the expression below may look odd on
> the
>    surface (first of all we really mean "feat || (feat && APX_F)", but that is
>    nothing else than just "feat", and in turn the same as the expression below,
>    which is what i386-gen can grok), gas/config/tc-i386.c:cpu_flags_match()
>    will zap either of the inner two features depending on whether EVEX
>    encoding was determined to be necessary.  Only then will the actual feature
>    checking be carried out.  */
> #define APX_F(feat) feat&(feat|APF_F)
> 
Ok , I'll put them in opcodes/i386-opc.tbl.

> Since the list of bits we want to clear in cpu_flags_match()'s "any" is going to
> grow, we may want to consider using cpu_flags_and_not() there as well. But
> that can likely be a follow-on cleanup patch (if deemed desirable in the first
> place), so I would recommend not worrying about that right away.
> 
Sure.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-09  8:38   ` Cui, Lili
@ 2023-11-09 11:07     ` Jan Beulich
  2023-11-09 11:12     ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-11-09 11:07 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 09.11.2023 09:38, Cui, Lili wrote:
>> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
>>
>> On 03.11.2023 17:50, Cui, Lili wrote:
>>> --- a/gas/config/tc-i386.c
>>> +++ b/gas/config/tc-i386.c
>>> @@ -3672,9 +3672,10 @@ install_template (const insn_template *t)
>>
>> What I'm surprised by is that you don't have any change to
>> cpu_flags_match().
>> I don't think you can get away without for dual VEX/EVEX templates. I hope
>> further down you'll find a sufficient explanation of what I think is going to be
>> needed.
>>
> 
> I think it is ok. Or am I missing something?
> 
>  $ cat a.s
>        .allow_index_reg
>         .text
> _start:
>         .text
>         .arch .noapx_f
> ldtilecfg  (%r31,%rdx,2)
>         .arch .noamx_tile
> ldtilecfg  (%rax,%rdx,2)
> 
> $ as --64 a.s -o a.o
> a.s: Assembler messages:
> a.s:9: Error: `ldtilecfg' is not supported on `x86_64.noapx_f'

Hmm, the error here ought to be about %r31 not being a valid register name.
In any event prior to .noamx_tile "is not supported" is wrong: The
instruction itself is valid, it's merely used with the wrong operand. What
you also want to check is "{evex} ldtilecfg (%rax,%rdx,2)".

> a.s:11: Error: `ldtilecfg' is not supported on `x86_64.noapx_f.noamx_tile'

This error message is correct.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-09  8:38   ` Cui, Lili
  2023-11-09 11:07     ` Jan Beulich
@ 2023-11-09 11:12     ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-11-09 11:12 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 09.11.2023 09:38, Cui, Lili wrote:
>> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
>>
>> On 03.11.2023 17:50, Cui, Lili wrote:
>>> --- a/gas/config/tc-i386.c
>>> +++ b/gas/config/tc-i386.c
>>> @@ -3672,9 +3672,10 @@ install_template (const insn_template *t)
>>
>> What I'm surprised by is that you don't have any change to
>> cpu_flags_match().
>> I don't think you can get away without for dual VEX/EVEX templates. I hope
>> further down you'll find a sufficient explanation of what I think is going to be
>> needed.
>>
> 
> I think it is ok. Or am I missing something?
> 
>  $ cat a.s
>        .allow_index_reg
>         .text
> _start:
>         .text
>         .arch .noapx_f
> ldtilecfg  (%r31,%rdx,2)
>         .arch .noamx_tile
> ldtilecfg  (%rax,%rdx,2)
> 
> $ as --64 a.s -o a.o
> a.s: Assembler messages:
> a.s:9: Error: `ldtilecfg' is not supported on `x86_64.noapx_f'
> a.s:11: Error: `ldtilecfg' is not supported on `x86_64.noapx_f.noamx_tile'

Oh, in case my earlier reply to this turns out incorrect (for me
overlooking something), ...

>>> @@ -3126,8 +3170,8 @@ xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf, {}
>>>
>>>  // AMX instructions.
>>>
>>> -ldtilecfg, 0x49/0, AMX_TILE&x64,
>> Modrm|Vex128|Space0F38|VexW0|NoSuf,
>>> { Unspecified|BaseIndex } -sttilecfg, 0x6649/0, AMX_TILE&x64,
>>> Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
>>> +ldtilecfg, 0x49/0, AMX_TILE&x64&(AMX_TILE|APX_F),
>>> +Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf,
>> { Unspecified|BaseIndex }
>>> +sttilecfg, 0x6649/0, AMX_TILE&x64&(AMX_TILE|APX_F),
>>> +Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf,
>> { Unspecified|BaseIndex }
>>
>> ... here, simplified to <feat>&(<feat>|APF_F). I would strongly recommend
>> putting this in a macro, accompanied by a clarifying comment (not the least
>> because (a && (a || b)) == a):
>>
>> /* Many APX_F instructions require a 2nd feature to also be available.  When
>>    the respective instructions were originally VEX-encoded, a single template
>>    can cover both encodings.  While the expression below may look odd on
>> the
>>    surface (first of all we really mean "feat || (feat && APX_F)", but that is
>>    nothing else than just "feat", and in turn the same as the expression below,
>>    which is what i386-gen can grok), gas/config/tc-i386.c:cpu_flags_match()
>>    will zap either of the inner two features depending on whether EVEX
>>    encoding was determined to be necessary.  Only then will the actual feature
>>    checking be carried out.  */
>> #define APX_F(feat) feat&(feat|APF_F)
>>
> Ok , I'll put them in opcodes/i386-opc.tbl.

... the comment here may need adapting: Depending on where such clearing
happens, one or both of cpu_flags_match() and install_template() will want
referring to.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-07 14:53 ` Jan Beulich
@ 2023-11-09 12:31   ` Cui, Lili
  2023-11-09 13:05     ` Jan Beulich
  2023-11-14  7:42   ` Cui, Lili
  2023-11-15  6:03   ` Cui, Lili
  2 siblings, 1 reply; 26+ messages in thread
From: Cui, Lili @ 2023-11-09 12:31 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
> 
> On 03.11.2023 17:50, Cui, Lili wrote:
> > --- a/gas/testsuite/gas/i386/x86-64-inval-movbe.l
> > +++ b/gas/testsuite/gas/i386/x86-64-inval-movbe.l
> > @@ -1,29 +1,30 @@
> >  .*: Assembler messages:
> > -.*:4: Error: .*
> >  .*:5: Error: .*
> >  .*:6: Error: .*
> >  .*:7: Error: .*
> >  .*:8: Error: .*
> > -.*:11: Error: .*
> > +.*:9: Error: .*
> >  .*:12: Error: .*
> >  .*:13: Error: .*
> >  .*:14: Error: .*
> >  .*:15: Error: .*
> > +.*:16: Error: .*
> >  GAS LISTING .*
> >
> >
> >  [ 	]*1[ 	]+\# Check illegal movbe in 64bit mode\.
> >  [ 	]*2[ 	]+\.text
> > -[ 	]*3[ 	]+foo:
> > -[ 	]*4[ 	]+movbe	\(%rcx\),%bl
> > -[ 	]*5[ 	]+movbe	%ecx,%ebx
> > -[ 	]*6[ 	]+movbe	%bx,%rcx
> > -[ 	]*7[ 	]+movbe	%rbx,%rcx
> > -[ 	]*8[ 	]+movbe	%bl,\(%rcx\)
> > -[ 	]*9[ 	]+
> > -[ 	]*10[ 	]+\.intel_syntax noprefix
> > -[ 	]*11[ 	]+movbe bl, byte ptr \[rcx\]
> > -[ 	]*12[ 	]+movbe ebx, ecx
> > -[ 	]*13[ 	]+movbe rcx, bx
> > -[ 	]*14[ 	]+movbe rcx, rbx
> > -[ 	]*15[ 	]+movbe byte ptr \[rcx\], bl
> > +[ 	]*3[ 	]+\.arch \.noapx_f
> > +[ 	]*4[ 	]+foo:
> > +[ 	]*5[ 	]+movbe	\(%rcx\),%bl
> > +[ 	]*6[ 	]+movbe	%ecx,%ebx
> > +[ 	]*7[ 	]+movbe	%bx,%rcx
> > +[ 	]*8[ 	]+movbe	%rbx,%rcx
> > +[ 	]*9[ 	]+movbe	%bl,\(%rcx\)
> > +[ 	]*10[ 	]+
> > +[ 	]*11[ 	]+\.intel_syntax noprefix
> > +[ 	]*12[ 	]+movbe bl, byte ptr \[rcx\]
> > +[ 	]*13[ 	]+movbe ebx, ecx
> > +[ 	]*14[ 	]+movbe rcx, bx
> > +[ 	]*15[ 	]+movbe rcx, rbx
> > +[ 	]*16[ 	]+movbe byte ptr \[rcx\], bl
> 
> To avoid the need to fiddle with this file, did you consider changing the test's
> command line options instead? In any event ...
> 

Do you mean disabling apx_f with the command line "#as..."?  I tried "#as -march=+noapx_f" , but not worked.

> > --- a/gas/testsuite/gas/i386/x86-64-inval-movbe.s
> > +++ b/gas/testsuite/gas/i386/x86-64-inval-movbe.s
> > @@ -1,5 +1,6 @@
> >  # Check illegal movbe in 64bit mode.
> >  	.text
> > +	.arch .noapx_f
> >  foo:
> >  	movbe	(%rcx),%bl
> >  	movbe	%ecx,%ebx
> 
> ... the comment here wants expanding, to (briefly) mention the deliberate
> exclusion of APX.
> 

Done. 

> > --- a/opcodes/i386-dis-evex-len.h
> > +++ b/opcodes/i386-dis-evex-len.h
> > @@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
> >      { REG_TABLE (REG_EVEX_0F38C7_L_2) },
> >    },
> >
> > +  /* EVEX_LEN_0F38F2 */
> > +  {
> > +    { "andnS",		{ Gdq, VexGdq, Edq }, 0 },
> > +  },
> 
> There's no sign of a prefix decode step here.
> 

The prefix decoding step is in the NF patch and its dependent patches (Part II 2/6). Both are suspended currently.

> > --- a/opcodes/i386-dis-evex-mod.h
> > +++ b/opcodes/i386-dis-evex-mod.h
> > @@ -1 +1,43 @@
> >  /* Nothing at present.  */
> 
> This comment needs to go away when new stuff is added here. However, I'm
> sure I requested before that no new entries be put here which have only one
> of their two slots populated. The reg-only and mem-only aspects can be
> expressed via proper choice of operand specifiers, at least in almost all cases.
> Note how you already use ...
> 
> > +  /* MOD_EVEX_MAP4_DA_PREFIX_1 */
> > +  {
> > +    { Bad_Opcode },
> > +    { "encodekey128", { Gd, Ed }, 0 },  },
> > +  /* MOD_EVEX_MAP4_DB_PREFIX_1 */
> > +  {
> > +    { Bad_Opcode },
> > +    { "encodekey256", { Gd, Ed }, 0 },  },
> > +  /* MOD_EVEX_MAP4_DC_PREFIX_1 */
> > +  {
> > +    { "aesenc128kl",    { XM, M }, 0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_DD_PREFIX_1 */
> > +  {
> > +    { "aesdec128kl",    { XM, M }, 0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_DE_PREFIX_1 */
> > +  {
> > +    { "aesenc256kl",    { XM, M }, 0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_DF_PREFIX_1 */
> > +  {
> > +    { "aesdec256kl",    { XM, M }, 0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_F8_PREFIX_1 */
> > +  {
> > +    { "enqcmds",	{ Gva, M },  0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_F8_PREFIX_2 */
> > +  {
> > +    { "movdir64b",	{ Gva, M }, 0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_F8_PREFIX_3 */
> > +  {
> > +    { "enqcmd",		{ Gva, M }, 0 },
> > +  },
> 
> ... M in many entries anyway. These can move one level up without needing
> further adjustment.
> 
> For all of the above, however, an EVEX.W decode step looks to be missing.
> Interestingly the doc consistently omits the (presumably) .W0 suffix for these,
> having merely a trailing dot there. The issue (doc and/or code) is present
> elsewhere as well.
> 

Do you mean that all EVEX promoted instructions need to be explicitly .W0 in the doc? in the case of, I'll handle it uniformly in get_valid_dis386.

Thanks,
Lili.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-09 12:31   ` Cui, Lili
@ 2023-11-09 13:05     ` Jan Beulich
  2023-11-09 14:57       ` Cui, Lili
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-11-09 13:05 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 09.11.2023 13:31, Cui, Lili wrote:
>> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
>>
>> On 03.11.2023 17:50, Cui, Lili wrote:
>>> --- a/gas/testsuite/gas/i386/x86-64-inval-movbe.l
>>> +++ b/gas/testsuite/gas/i386/x86-64-inval-movbe.l
>>> @@ -1,29 +1,30 @@
>>>  .*: Assembler messages:
>>> -.*:4: Error: .*
>>>  .*:5: Error: .*
>>>  .*:6: Error: .*
>>>  .*:7: Error: .*
>>>  .*:8: Error: .*
>>> -.*:11: Error: .*
>>> +.*:9: Error: .*
>>>  .*:12: Error: .*
>>>  .*:13: Error: .*
>>>  .*:14: Error: .*
>>>  .*:15: Error: .*
>>> +.*:16: Error: .*
>>>  GAS LISTING .*
>>>
>>>
>>>  [ 	]*1[ 	]+\# Check illegal movbe in 64bit mode\.
>>>  [ 	]*2[ 	]+\.text
>>> -[ 	]*3[ 	]+foo:
>>> -[ 	]*4[ 	]+movbe	\(%rcx\),%bl
>>> -[ 	]*5[ 	]+movbe	%ecx,%ebx
>>> -[ 	]*6[ 	]+movbe	%bx,%rcx
>>> -[ 	]*7[ 	]+movbe	%rbx,%rcx
>>> -[ 	]*8[ 	]+movbe	%bl,\(%rcx\)
>>> -[ 	]*9[ 	]+
>>> -[ 	]*10[ 	]+\.intel_syntax noprefix
>>> -[ 	]*11[ 	]+movbe bl, byte ptr \[rcx\]
>>> -[ 	]*12[ 	]+movbe ebx, ecx
>>> -[ 	]*13[ 	]+movbe rcx, bx
>>> -[ 	]*14[ 	]+movbe rcx, rbx
>>> -[ 	]*15[ 	]+movbe byte ptr \[rcx\], bl
>>> +[ 	]*3[ 	]+\.arch \.noapx_f
>>> +[ 	]*4[ 	]+foo:
>>> +[ 	]*5[ 	]+movbe	\(%rcx\),%bl
>>> +[ 	]*6[ 	]+movbe	%ecx,%ebx
>>> +[ 	]*7[ 	]+movbe	%bx,%rcx
>>> +[ 	]*8[ 	]+movbe	%rbx,%rcx
>>> +[ 	]*9[ 	]+movbe	%bl,\(%rcx\)
>>> +[ 	]*10[ 	]+
>>> +[ 	]*11[ 	]+\.intel_syntax noprefix
>>> +[ 	]*12[ 	]+movbe bl, byte ptr \[rcx\]
>>> +[ 	]*13[ 	]+movbe ebx, ecx
>>> +[ 	]*14[ 	]+movbe rcx, bx
>>> +[ 	]*15[ 	]+movbe rcx, rbx
>>> +[ 	]*16[ 	]+movbe byte ptr \[rcx\], bl
>>
>> To avoid the need to fiddle with this file, did you consider changing the test's
>> command line options instead? In any event ...
>>
> 
> Do you mean disabling apx_f with the command line "#as..."?

Yes.

>  I tried "#as -march=+noapx_f" , but not worked.

Listing tests have their command line options passed directly from the
.exp file.

>>> --- a/opcodes/i386-dis-evex-len.h
>>> +++ b/opcodes/i386-dis-evex-len.h
>>> @@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
>>>      { REG_TABLE (REG_EVEX_0F38C7_L_2) },
>>>    },
>>>
>>> +  /* EVEX_LEN_0F38F2 */
>>> +  {
>>> +    { "andnS",		{ Gdq, VexGdq, Edq }, 0 },
>>> +  },
>>
>> There's no sign of a prefix decode step here.
>>
> 
> The prefix decoding step is in the NF patch and its dependent patches (Part II 2/6). Both are suspended currently.

But prefix decoding is orthogonal to NF handling. Why would that
step be added only there?

>>> --- a/opcodes/i386-dis-evex-mod.h
>>> +++ b/opcodes/i386-dis-evex-mod.h
>>> @@ -1 +1,43 @@
>>>  /* Nothing at present.  */
>>
>> This comment needs to go away when new stuff is added here. However, I'm
>> sure I requested before that no new entries be put here which have only one
>> of their two slots populated. The reg-only and mem-only aspects can be
>> expressed via proper choice of operand specifiers, at least in almost all cases.
>> Note how you already use ...
>>
>>> +  /* MOD_EVEX_MAP4_DA_PREFIX_1 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { "encodekey128", { Gd, Ed }, 0 },  },
>>> +  /* MOD_EVEX_MAP4_DB_PREFIX_1 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { "encodekey256", { Gd, Ed }, 0 },  },
>>> +  /* MOD_EVEX_MAP4_DC_PREFIX_1 */
>>> +  {
>>> +    { "aesenc128kl",    { XM, M }, 0 },
>>> +  },
>>> +  /* MOD_EVEX_MAP4_DD_PREFIX_1 */
>>> +  {
>>> +    { "aesdec128kl",    { XM, M }, 0 },
>>> +  },
>>> +  /* MOD_EVEX_MAP4_DE_PREFIX_1 */
>>> +  {
>>> +    { "aesenc256kl",    { XM, M }, 0 },
>>> +  },
>>> +  /* MOD_EVEX_MAP4_DF_PREFIX_1 */
>>> +  {
>>> +    { "aesdec256kl",    { XM, M }, 0 },
>>> +  },
>>> +  /* MOD_EVEX_MAP4_F8_PREFIX_1 */
>>> +  {
>>> +    { "enqcmds",	{ Gva, M },  0 },
>>> +  },
>>> +  /* MOD_EVEX_MAP4_F8_PREFIX_2 */
>>> +  {
>>> +    { "movdir64b",	{ Gva, M }, 0 },
>>> +  },
>>> +  /* MOD_EVEX_MAP4_F8_PREFIX_3 */
>>> +  {
>>> +    { "enqcmd",		{ Gva, M }, 0 },
>>> +  },
>>
>> ... M in many entries anyway. These can move one level up without needing
>> further adjustment.
>>
>> For all of the above, however, an EVEX.W decode step looks to be missing.
>> Interestingly the doc consistently omits the (presumably) .W0 suffix for these,
>> having merely a trailing dot there. The issue (doc and/or code) is present
>> elsewhere as well.
>>
> 
> Do you mean that all EVEX promoted instructions need to be explicitly .W0 in
> the doc?

Not exactly, no. The doc should explicitly state what the behavior wrt
EVEX.W is. I'm merely guessing that for the insns where it's missing
W0 might be meant (on the assumption that a common goal ought to be to
waste as little encoding space as possible). If it's WIG (in the present
doc typically spelled IGNORED) instead, ...

> in the case of, I'll handle it uniformly in get_valid_dis386.

... no W decoding step would be needed of course.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-09 13:05     ` Jan Beulich
@ 2023-11-09 14:57       ` Cui, Lili
  2023-11-09 15:39         ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Cui, Lili @ 2023-11-09 14:57 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> >> To avoid the need to fiddle with this file, did you consider changing
> >> the test's command line options instead? In any event ...
> >>
> >
> > Do you mean disabling apx_f with the command line "#as..."?
> 
> Yes.
> 
> >  I tried "#as -march=+noapx_f" , but not worked.
> 
> Listing tests have their command line options passed directly from the .exp
> file.
> 

It worked, thanks.

+run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -al"

> >>> --- a/opcodes/i386-dis-evex-len.h
> >>> +++ b/opcodes/i386-dis-evex-len.h
> >>> @@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
> >>>      { REG_TABLE (REG_EVEX_0F38C7_L_2) },
> >>>    },
> >>>
> >>> +  /* EVEX_LEN_0F38F2 */
> >>> +  {
> >>> +    { "andnS",		{ Gdq, VexGdq, Edq }, 0 },
> >>> +  },
> >>
> >> There's no sign of a prefix decode step here.
> >>
> >
> > The prefix decoding step is in the NF patch and its dependent patches (Part
> II 2/6). Both are suspended currently.
> 
> But prefix decoding is orthogonal to NF handling. Why would that step be
> added only there?
>

In V1 NF patch, we treated {nf} as prefix, {evex} and {nf} need to be selected or empty.

"XN" => print "{nf} " pseudo prefix when EVEX.NF = 1. When EVEX.NF = 0 and under certain conditions, need to print {evex}

  For XN:
              if (ins->vex.nf == true)
                {
                  *ins->obufp++ = '{';
                  *ins->obufp++ = 'n';
                  *ins->obufp++ = 'f';
                  *ins->obufp++ = '}';
                  *ins->obufp++ = ' ';
                }
              else if (ins->evex_type == evex_from_legacy && !ins->vex.b)
                {
                  *ins->obufp++ = '{';
                  *ins->obufp++ = 'e';
                  *ins->obufp++ = 'v';
                  *ins->obufp++ = 'e';
                  *ins->obufp++ = 'x';
                  *ins->obufp++ = '}';
                  *ins->obufp++ = ' ';
                }

Now there is a high probability that nf will be treated as suffix, the specific expression is still uncertain. Maybe we should add %XE to them in this patch. make sure its output is correct.

> >>> --- a/opcodes/i386-dis-evex-mod.h
> >>> +++ b/opcodes/i386-dis-evex-mod.h
> >>> @@ -1 +1,43 @@
> >>>  /* Nothing at present.  */
> >>
> >> This comment needs to go away when new stuff is added here. However,
> >> I'm sure I requested before that no new entries be put here which
> >> have only one of their two slots populated. The reg-only and mem-only
> >> aspects can be expressed via proper choice of operand specifiers, at least
> in almost all cases.
> >> Note how you already use ...
> >>
> >>> +  /* MOD_EVEX_MAP4_DA_PREFIX_1 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { "encodekey128", { Gd, Ed }, 0 },  },
> >>> +  /* MOD_EVEX_MAP4_DB_PREFIX_1 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { "encodekey256", { Gd, Ed }, 0 },  },
> >>> +  /* MOD_EVEX_MAP4_DC_PREFIX_1 */
> >>> +  {
> >>> +    { "aesenc128kl",    { XM, M }, 0 },
> >>> +  },
> >>> +  /* MOD_EVEX_MAP4_DD_PREFIX_1 */
> >>> +  {
> >>> +    { "aesdec128kl",    { XM, M }, 0 },
> >>> +  },
> >>> +  /* MOD_EVEX_MAP4_DE_PREFIX_1 */
> >>> +  {
> >>> +    { "aesenc256kl",    { XM, M }, 0 },
> >>> +  },
> >>> +  /* MOD_EVEX_MAP4_DF_PREFIX_1 */
> >>> +  {
> >>> +    { "aesdec256kl",    { XM, M }, 0 },
> >>> +  },
> >>> +  /* MOD_EVEX_MAP4_F8_PREFIX_1 */
> >>> +  {
> >>> +    { "enqcmds",	{ Gva, M },  0 },
> >>> +  },
> >>> +  /* MOD_EVEX_MAP4_F8_PREFIX_2 */
> >>> +  {
> >>> +    { "movdir64b",	{ Gva, M }, 0 },
> >>> +  },
> >>> +  /* MOD_EVEX_MAP4_F8_PREFIX_3 */
> >>> +  {
> >>> +    { "enqcmd",		{ Gva, M }, 0 },
> >>> +  },
> >>
> >> ... M in many entries anyway. These can move one level up without
> >> needing further adjustment.
> >>
> >> For all of the above, however, an EVEX.W decode step looks to be missing.
> >> Interestingly the doc consistently omits the (presumably) .W0 suffix
> >> for these, having merely a trailing dot there. The issue (doc and/or
> >> code) is present elsewhere as well.
> >>
> >
> > Do you mean that all EVEX promoted instructions need to be explicitly
> > .W0 in the doc?
> 
> Not exactly, no. The doc should explicitly state what the behavior wrt EVEX.W
> is. I'm merely guessing that for the insns where it's missing
> W0 might be meant (on the assumption that a common goal ought to be to
> waste as little encoding space as possible). If it's WIG (in the present doc
> typically spelled IGNORED) instead, ...
> 
> > in the case of, I'll handle it uniformly in get_valid_dis386.
> 
> ... no W decoding step would be needed of course.
> 
Ok

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-09 14:57       ` Cui, Lili
@ 2023-11-09 15:39         ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-11-09 15:39 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 09.11.2023 15:57, Cui, Lili wrote:
>>>>> --- a/opcodes/i386-dis-evex-len.h
>>>>> +++ b/opcodes/i386-dis-evex-len.h
>>>>> @@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
>>>>>      { REG_TABLE (REG_EVEX_0F38C7_L_2) },
>>>>>    },
>>>>>
>>>>> +  /* EVEX_LEN_0F38F2 */
>>>>> +  {
>>>>> +    { "andnS",		{ Gdq, VexGdq, Edq }, 0 },
>>>>> +  },
>>>>
>>>> There's no sign of a prefix decode step here.
>>>>
>>>
>>> The prefix decoding step is in the NF patch and its dependent patches (Part
>> II 2/6). Both are suspended currently.
>>
>> But prefix decoding is orthogonal to NF handling. Why would that step be
>> added only there?
>>
> 
> In V1 NF patch, we treated {nf} as prefix, {evex} and {nf} need to be selected or empty.
> 
> "XN" => print "{nf} " pseudo prefix when EVEX.NF = 1. When EVEX.NF = 0 and under certain conditions, need to print {evex}
> 
>   For XN:
>               if (ins->vex.nf == true)
>                 {
>                   *ins->obufp++ = '{';
>                   *ins->obufp++ = 'n';
>                   *ins->obufp++ = 'f';
>                   *ins->obufp++ = '}';
>                   *ins->obufp++ = ' ';
>                 }
>               else if (ins->evex_type == evex_from_legacy && !ins->vex.b)
>                 {
>                   *ins->obufp++ = '{';
>                   *ins->obufp++ = 'e';
>                   *ins->obufp++ = 'v';
>                   *ins->obufp++ = 'e';
>                   *ins->obufp++ = 'x';
>                   *ins->obufp++ = '}';
>                   *ins->obufp++ = ' ';
>                 }
> 
> Now there is a high probability that nf will be treated as suffix, the specific expression is still uncertain. Maybe we should add %XE to them in this patch. make sure its output is correct.

I'm afraid there's a terminology issue here: When I say "prefix decode step"
in the context of the binutils disassembler, then that's the decoding step
involving prefix_table[]. I.e. either 66, f2, f3 (with REX2) or EVEX.pp.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-06 17:07 ` Jan Beulich
@ 2023-11-13  5:53   ` Cui, Lili
  2023-11-13  8:34     ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Cui, Lili @ 2023-11-13  5:53 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
> 
> On 03.11.2023 17:50, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -3672,9 +3672,10 @@ install_template (const insn_template *t)
> >    /* Dual VEX/EVEX templates need stripping one of the possible variants.
> */
> >    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >    {
> > -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> > -	   || maybe_cpu (t, CpuFMA))
> > -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> > +    if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> > +	 || maybe_cpu (t, CpuFMA) ||  maybe_cpu (t, CpuAMX_TILE))
> > +	&& (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)
> > +	    || maybe_cpu (t, CpuAPX_F)))
> 
> Something's odd with formatting (indentation) here. I first thought I had
> screwed up in my patch, but looking back things appear to be correct there.
> 

Rebase  with master. 

> > @@ -3689,12 +3690,11 @@ install_template (const insn_template *t)
> >  		i.tm.cpu.bitfield.cpuavx = 1;
> >  	      else
> >  		{
> > -		  gas_assert (!i.tm.cpu.bitfield.isa);
> >  		  i.tm.cpu.bitfield.isa = i.tm.cpu_any.bitfield.isa;
> 
> You may not remove the assertion here, or else the assignment could silently
> overwrite a value. Can you explain why you did the removal?
>

Rebased with master. Sorry I just saw the comments here.

> > @@ -3885,6 +3885,14 @@ is_any_vex_encoding (const insn_template *t)
> >    return t->opcode_modifier.vex || is_evex_encoding (t);  }
> >
> > +static INLINE bool
> > +is_any_apx_evex_encoding (void)
> > +{
> > +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> > +    || (i.vex.register_specifier
> > +	&& i.vex.register_specifier->reg_flags & RegRex2); }
> 
> The use of i.rex2 here doesn't fit the name; the sole user has first checked
> that no legacy encoding is going to be used, and that's a prereq here. Such a
> prereq needs spelling out, such that one can be easily aware when possibly
> adding another caller.
> 
> Also, what does "any" stand for in the name here ...
> 

How about "check_if_any_vex_is_evex_apx_encoding ()" ?

> >  static INLINE bool
> >  is_any_apx_rex2_encoding (void)
> 
> ... and also the one here? In is_any_vex_encoding() it refers to VEX/XOP/EVEX.
> 

Changed it to  is_apx_rex2_encoding.

> > @@ -4161,6 +4169,27 @@ build_rex2_prefix (void)
> >  		    | (i.rex2 << 4) | i.rex);
> >  }
> >
> > +/* Build the EVEX prefix (4-byte) for evex insn
> > +   | 62h |
> > +   | `R`X`B`R' | B'mmm |
> > +   | W | v`v`v`v | `x' | pp |
> > +   | z| L'L | b | `v | aaa |
> > +*/
> > +static void
> > +build_apx_evex_prefix (void)
> > +{
> > +  build_evex_prefix ();
> > +  if (i.rex2 & REX_R)
> > +    i.vex.bytes[1] &= 0xef;
> 
> I think this would read easier as ~0x10 (similarly below then). I also think ...
> 

Done.

> > +  if (i.vex.register_specifier
> > +      && register_number (i.vex.register_specifier) > 0xf)
> > +    i.vex.bytes[3] &= 0xf7;
> > +  if (i.rex2 & REX_B)
> > +    i.vex.bytes[1] |= 0x08;
> 
> ... it would help if the byte 1 updates were kept together (the compiler may
> even produce better code then), and maybe ...
> 
> > +  if (i.rex2 & REX_X)
> > +    i.vex.bytes[2] &= 0xfb;
> 
> ... the byte 2 update ahead of the byte 3 one.
> 

Done.

> Also why do you use register_number() here, and not the RegRex2 flag?
> 

RegRex2 flag is better, changed it.

> > @@ -5624,19 +5653,42 @@ md_assemble (char *line)
> >  	}
> >
> >        /* Check for explicit REX2 prefix.  */
> > -      if (i.rex2 || i.rex2_encoding)
> > +      if (i.rex2_encoding)
> 
> This change ...
> 
> >  	{
> >  	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> 
> ... further invalidates this message. Yet iirc you said you removed the
> i.rex2 part of the check anyway from patch 1.
> 
> >  	  return;
> >  	}
> >
> > -      if (i.tm.opcode_modifier.vex)
> > +      if (is_any_apx_evex_encoding ())
> > +	{
> > +	  if (i.tm.opcode_space == SPACE_EVEXMAP4 &&
> (i.prefix[DATA_PREFIX] != 0))
> > +	    {
> > +	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
> 
> Perhaps better assert that no other embedded prefix was already recorded
> here?

Added the code as below, I added as_bad instead of assert, I think this is a input error and not a gas internal error, right?  Besides REX_PREFIX?

      if (check_if_any_vex_is_evex_apx_encoding ())
        {
          if (i.tm.opcode_space == SPACE_EVEXMAP4 && (i.prefix[DATA_PREFIX] != 0))
            {

              i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
              i.prefix[DATA_PREFIX] = 0;

              /*  Prefixes other than the rex prefix cannot be used with the data prefix.  */
              const unsigned char *p = i.prefix;

              for (j = 0; j < ARRAY_SIZE (i.prefix); ++j, ++p)
                {
                  if (!*p)
                    continue;

                  switch (j)
                    {
                    case DATA_PREFIX:
                    case REX_PREFIX:
                      break;
                    default:
                      as_bad (_("unexpecting prefix %x together with DATA "
                                "prefix in front of evex-promoted apx "
                                "instruction "), *p);
                      return;
                    }
                }
            }

          build_apx_evex_prefix ();
      }

> 
> > +	      i.prefix[DATA_PREFIX] = 0;
> > +	    }
> > +
> > +	  build_apx_evex_prefix ();
> > +
> > +	  /* Encode the NDD bit of the instruction promoted from the legacy
> > +	     space.  */
> > +	  if (i.vex.register_specifier && i.tm.opcode_space ==
> SPACE_EVEXMAP4)
> > +	    i.vex.bytes[3] |= 0x10;
> 
> Why the restriction to map 3? And why is this not part of
> build_apx_evex_prefix()?

It is the ND bit, which is added in advance here. After reading the comments below, I'll move this to an NDD patch.

> 
> > +	  /* Encode the NF bit of the instruction promoted from legacy and
> vex
> > +	     space.  */
> > +	  if (i.has_nf)
> > +	    i.vex.bytes[3] |= 0x04;
> 
> This wants to move to the patch actually introducing NF handling.
> 

Done.

> > @@ -5663,16 +5715,17 @@ md_assemble (char *line)
> >       instruction already has a prefix, we need to convert old
> >       registers to new ones.  */
> >
> > -  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> > -       && (i.op[0].regs->reg_flags & RegRex64) != 0)
> > -      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> > -	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > -      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > -	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && (i.rex != 0 || i.rex2 != 0)))
> > +  if (!is_any_vex_encoding (&i.tm)
> > +      && ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> > +	   && (i.op[0].regs->reg_flags & RegRex64) != 0)
> > +	  || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> > +	      && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > +	  || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > +	       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > +	      && (i.rex != 0 || i.rex2 != 0))))
> 
> I'm a little puzzled by this (hard to read) adjustment:
> is_any_vex_encoding() basically excludes most new templates added here,
> yet for those permitting Reg8 you still need to enforce the EVEX-imposed
> restriction.
> 

Moved is_any_vex_encoding() into if body to constrain i.rex |= REX_OPCODE.

if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
        && (i.op[0].regs->reg_flags & RegRex64) != 0)
       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
           && (i.op[1].regs->reg_flags & RegRex64) != 0)
       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
            || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
           && (i.rex != 0 || i.rex2 != 0))))
    {
      int x;

      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
        i.rex |= REX_OPCODE;

> > @@ -7043,7 +7096,7 @@ VEX_check_encoding (const insn_template *t)
> > static int  check_EgprOperands (const insn_template *t)  {
> > -  if (t->opcode_modifier.noegpr)
> > +  if (t->opcode_modifier.noegpr && !need_evex_encoding())
> >      {
> >        for (unsigned int op = 0; op < i.operands; op++)
> >  	{
> 
> What is this change about?
> 

After merging vex and evex, evex exits here early and all evex supports egpr.

> > @@ -14252,6 +14306,9 @@ static bool check_register (const reg_entry
> > *r)
> >
> >    if (r->reg_flags & RegRex2)
> >      {
> > +      if (is_evex_encoding (current_templates->start))
> > +	i.vec_encoding = vex_encoding_evex;
> 
> What if the APX template isn't first in the group?
> 

If apx_f is not supported, it will return false, just after this code. Oh, better to move it to the back. Done.

  if (r->reg_flags & RegRex2)
    {
      if (current_templates->start->opcode_modifier.evex)
        i.vec_encoding = vex_encoding_evex;

      if (!cpu_arch_flags.bitfield.cpuapx_f
          || flag_code != CODE_64BIT)
        return false;
    }

> > --- a/opcodes/i386-opc.h
> > +++ b/opcodes/i386-opc.h
> > @@ -975,6 +975,7 @@ typedef struct insn_template
> >       1: 0F opcode prefix / space.
> >       2: 0F38 opcode prefix / space.
> >       3: 0F3A opcode prefix / space.
> > +     4: EVEXMAP4 opcode prefix / space.
> >       5: EVEXMAP5 opcode prefix / space.
> >       6: EVEXMAP6 opcode prefix / space.
> >       7: VEXMAP7 opcode prefix / space.
> > @@ -986,6 +987,7 @@ typedef struct insn_template
> >  #define SPACE_0F	1
> >  #define SPACE_0F38	2
> >  #define SPACE_0F3A	3
> > +#define SPACE_EVEXMAP4  4
> >  #define SPACE_EVEXMAP5	5
> >  #define SPACE_EVEXMAP6	6
> >  #define SPACE_VEXMAP7	7
> 
> Nit: Please can padding here match surrounding code?
> 

Done.

> > @@ -338,6 +342,7 @@ adc, 0x14, 0, W|No_sSuf, {
> > Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }  adc,
> 0x80/2, 0,
> > W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x }
> >
> >  neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x }
> > +
> >  not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x }
> >
> >  aaa, 0x37, No64, NoSuf, {}
> 
> Nit: In an already overlarge patch it is extremely helpful if any unrelated
> changed could be omitted.
> 

Done.

> > @@ -1316,13 +1321,16 @@ getsec, 0xf37, SMX, NoSuf, {}
> >
> >  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, {
> > Oword|Unspecified|BaseIndex, Reg32 }  invept, 0x660f3880, EPT&x64,
> > Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> > +invept, 0xf3f0, APX_F_64&EPT, Modrm|NoSuf|EVex128|EVexMap4, {
> > +Oword|Unspecified|BaseIndex, Reg64 }
> >  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, {
> > Oword|Unspecified|BaseIndex, Reg32 }  invvpid, 0x660f3881, EPT&x64,
> > Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> > +invvpid, 0xf3f1, APX_F_64&EPT, Modrm|NoSuf|EVex128|EVexMap4, {
> > +Oword|Unspecified|BaseIndex, Reg64 }
> >
> >  // INVPCID instruction
> >
> >  invpcid, 0x660f3882, INVPCID&No64, Modrm|IgnoreSize|NoSuf, {
> > Oword|Unspecified|BaseIndex, Reg32 }  invpcid, 0x660f3882,
> > INVPCID&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex,
> Reg64
> > }
> > +invpcid, 0xf3f2, APX_F_64&INVPCID, Modrm|NoSuf|EVex128|EVexMap4, {
> > +Oword|Unspecified|BaseIndex, Reg64 }
> 
> While ordering doesn't matter functionality wise, overall readability imo
> would end up improved if you had EPT&APX_F and INVPCID&APX_f here (and
> the likewise elsewhere, albeit it looks in other cases you already have things
> the other way round).
> 

Done,  it should be consistent.

> > @@ -3310,6 +3364,7 @@ prefetchit1, 0xf18/6, PREFETCHI&x64,
> > Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex  // CMPCCXADD
> instructions.
> >
> >  cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD&x64,
> > Modrm|Vex|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf,
> {
> > Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
> > +cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD&x64&APX_F_64,
> >
> +Modrm|EVex128|Space0F38|VexVVVV|SwapSources|CheckOperandSize|N
> oSuf, {
> > +Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
> 
> Nit: Redundant x64.
> 

Done, thanks!

Lili.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-13  5:53   ` Cui, Lili
@ 2023-11-13  8:34     ` Jan Beulich
  2023-11-14  3:12       ` Cui, Lili
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-11-13  8:34 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 13.11.2023 06:53, Cui, Lili wrote:
>> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
>>
>> On 03.11.2023 17:50, Cui, Lili wrote:
>>> @@ -3885,6 +3885,14 @@ is_any_vex_encoding (const insn_template *t)
>>>    return t->opcode_modifier.vex || is_evex_encoding (t);  }
>>>
>>> +static INLINE bool
>>> +is_any_apx_evex_encoding (void)
>>> +{
>>> +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
>>> +    || (i.vex.register_specifier
>>> +	&& i.vex.register_specifier->reg_flags & RegRex2); }
>>
>> The use of i.rex2 here doesn't fit the name; the sole user has first checked
>> that no legacy encoding is going to be used, and that's a prereq here. Such a
>> prereq needs spelling out, such that one can be easily aware when possibly
>> adding another caller.
>>
>> Also, what does "any" stand for in the name here ...
>>
> 
> How about "check_if_any_vex_is_evex_apx_encoding ()" ?

That still has "any" in the name for an unexplained reason, and the
new name is yet longer and hence yet more difficult to follow. My
question was rather towards simply dropping the "any" from the name.
Unless of course you can clarify what "any" means there.

>>> @@ -5624,19 +5653,42 @@ md_assemble (char *line)
>>>[...]
>>> -      if (i.tm.opcode_modifier.vex)
>>> +      if (is_any_apx_evex_encoding ())
>>> +	{
>>> +	  if (i.tm.opcode_space == SPACE_EVEXMAP4 &&
>> (i.prefix[DATA_PREFIX] != 0))
>>> +	    {
>>> +	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
>>
>> Perhaps better assert that no other embedded prefix was already recorded
>> here?
> 
> Added the code as below, I added as_bad instead of assert, I think this is a input error and not a gas internal error, right?  Besides REX_PREFIX?
> 
>       if (check_if_any_vex_is_evex_apx_encoding ())
>         {
>           if (i.tm.opcode_space == SPACE_EVEXMAP4 && (i.prefix[DATA_PREFIX] != 0))
>             {
> 
>               i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
>               i.prefix[DATA_PREFIX] = 0;
> 
>               /*  Prefixes other than the rex prefix cannot be used with the data prefix.  */
>               const unsigned char *p = i.prefix;
> 
>               for (j = 0; j < ARRAY_SIZE (i.prefix); ++j, ++p)
>                 {
>                   if (!*p)
>                     continue;
> 
>                   switch (j)
>                     {
>                     case DATA_PREFIX:
>                     case REX_PREFIX:
>                       break;
>                     default:
>                       as_bad (_("unexpecting prefix %x together with DATA "
>                                 "prefix in front of evex-promoted apx "
>                                 "instruction "), *p);
>                       return;
>                     }
>                 }
>             }
> 
>           build_apx_evex_prefix ();
>       }

We already have code refusing most legacy prefixes when ahead of VEX/EVEX,
don't we? I'd expect that code to take care of bogus DATA prefixes here as
well. Possibly the conversion needs dealing with differently if here we
can't tell a user-supplied i.prefix[DATA_PREFIX] from one internally
derived from operands which were supplied? E.g. by not having
process_suffix() invoke add_prefix() in this particular case at all, but
instead modify i.tm accordingly?

>>> @@ -7043,7 +7096,7 @@ VEX_check_encoding (const insn_template *t)
>>> static int  check_EgprOperands (const insn_template *t)  {
>>> -  if (t->opcode_modifier.noegpr)
>>> +  if (t->opcode_modifier.noegpr && !need_evex_encoding())
>>>      {
>>>        for (unsigned int op = 0; op < i.operands; op++)
>>>  	{
>>
>> What is this change about?
>>
> 
> After merging vex and evex, evex exits here early and all evex supports egpr.

Yet / hence no EVEX template should ever have NoEgpr set. IOW I still
don't follow why this change would be needed.

>>> @@ -14252,6 +14306,9 @@ static bool check_register (const reg_entry
>>> *r)
>>>
>>>    if (r->reg_flags & RegRex2)
>>>      {
>>> +      if (is_evex_encoding (current_templates->start))
>>> +	i.vec_encoding = vex_encoding_evex;
>>
>> What if the APX template isn't first in the group?
>>
> 
> If apx_f is not supported, it will return false, just after this code. Oh, better to move it to the back. Done.
> 
>   if (r->reg_flags & RegRex2)
>     {
>       if (current_templates->start->opcode_modifier.evex)
>         i.vec_encoding = vex_encoding_evex;
> 
>       if (!cpu_arch_flags.bitfield.cpuapx_f
>           || flag_code != CODE_64BIT)
>         return false;
>     }

Hmm, as before there's a use of current_templates here, which I'm afraid
isn't appropriate. Whether a register is legitimate to use depends on only
the present mode we're assembling in (flag_code + cpu_arch_flags, plus a
few other globals for certain special cases). There may not be dependencies
on the insn we're processing. Or if at all (yet even that would need a
pretty good justification), then only on the collective set of all
templates in the chosen template group.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-13  8:34     ` Jan Beulich
@ 2023-11-14  3:12       ` Cui, Lili
  2023-11-14 10:29         ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Cui, Lili @ 2023-11-14  3:12 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
> 
> On 13.11.2023 06:53, Cui, Lili wrote:
> >> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
> >>
> >> On 03.11.2023 17:50, Cui, Lili wrote:
> >>> @@ -3885,6 +3885,14 @@ is_any_vex_encoding (const insn_template *t)
> >>>    return t->opcode_modifier.vex || is_evex_encoding (t);  }
> >>>
> >>> +static INLINE bool
> >>> +is_any_apx_evex_encoding (void)
> >>> +{
> >>> +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> >>> +    || (i.vex.register_specifier
> >>> +	&& i.vex.register_specifier->reg_flags & RegRex2); }
> >>
> >> The use of i.rex2 here doesn't fit the name; the sole user has first
> >> checked that no legacy encoding is going to be used, and that's a
> >> prereq here. Such a prereq needs spelling out, such that one can be
> >> easily aware when possibly adding another caller.
> >>
> >> Also, what does "any" stand for in the name here ...
> >>
> >
> > How about "check_if_any_vex_is_evex_apx_encoding ()" ?
> 
> That still has "any" in the name for an unexplained reason, and the new
> name is yet longer and hence yet more difficult to follow. My question was
> rather towards simply dropping the "any" from the name.
> Unless of course you can clarify what "any" means there.
> 

Done.

> >>> @@ -5624,19 +5653,42 @@ md_assemble (char *line) [...]
> >>> -      if (i.tm.opcode_modifier.vex)
> >>> +      if (is_any_apx_evex_encoding ())
> >>> +	{
> >>> +	  if (i.tm.opcode_space == SPACE_EVEXMAP4 &&
> >> (i.prefix[DATA_PREFIX] != 0))
> >>> +	    {
> >>> +	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
> >>
> >> Perhaps better assert that no other embedded prefix was already
> >> recorded here?
> >
> > Added the code as below, I added as_bad instead of assert, I think this is a
> input error and not a gas internal error, right?  Besides REX_PREFIX?
> >
> >       if (check_if_any_vex_is_evex_apx_encoding ())
> >         {
> >           if (i.tm.opcode_space == SPACE_EVEXMAP4 &&
> (i.prefix[DATA_PREFIX] != 0))
> >             {
> >
> >               i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
> >               i.prefix[DATA_PREFIX] = 0;
> >
> >               /*  Prefixes other than the rex prefix cannot be used with the data
> prefix.  */
> >               const unsigned char *p = i.prefix;
> >
> >               for (j = 0; j < ARRAY_SIZE (i.prefix); ++j, ++p)
> >                 {
> >                   if (!*p)
> >                     continue;
> >
> >                   switch (j)
> >                     {
> >                     case DATA_PREFIX:
> >                     case REX_PREFIX:
> >                       break;
> >                     default:
> >                       as_bad (_("unexpecting prefix %x together with DATA "
> >                                 "prefix in front of evex-promoted apx "
> >                                 "instruction "), *p);
> >                       return;
> >                     }
> >                 }
> >             }
> >
> >           build_apx_evex_prefix ();
> >       }
> 
> We already have code refusing most legacy prefixes when ahead of VEX/EVEX,
> don't we? I'd expect that code to take care of bogus DATA prefixes here as
> well. Possibly the conversion needs dealing with differently if here we can't
> tell a user-supplied i.prefix[DATA_PREFIX] from one internally derived from
> operands which were supplied? E.g. by not having
> process_suffix() invoke add_prefix() in this particular case at all, but instead
> modify i.tm accordingly?
> 

Done.

-         if (!add_prefix (prefix))
-           return 0;
+         /* The DATA PREFIX of EVEX promoted from legacy APX instructions
+            needs to be adjusted.  */
+         if (i.tm.opcode_space == SPACE_EVEXMAP4)
+           i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
+         else
+           if (!add_prefix (prefix))
+             return 0;

> >>> @@ -7043,7 +7096,7 @@ VEX_check_encoding (const insn_template *t)
> >>> static int  check_EgprOperands (const insn_template *t)  {
> >>> -  if (t->opcode_modifier.noegpr)
> >>> +  if (t->opcode_modifier.noegpr && !need_evex_encoding())
> >>>      {
> >>>        for (unsigned int op = 0; op < i.operands; op++)
> >>>  	{
> >>
> >> What is this change about?
> >>
> >
> > After merging vex and evex, evex exits here early and all evex supports egpr.
> 
> Yet / hence no EVEX template should ever have NoEgpr set. IOW I still don't
> follow why this change would be needed.
>

After merging VEX and EVEX templates, they shared NoEgpr ==1 , and if !EVEX is not added, It will check for NoEgpr for EVEX instructions, but EVEX supports egpr.  That's why I added the code.

ldtilecfg, 0x49/0, AMX_TILE&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf|NoEgpr, { Unspecified|BaseIndex }

When merging VEX and EVEX templates, it means that VEX +Egpr (illegal) has a corresponding EVEX + Egpr, and we don't need to check whether VEX is illegal (just set NoEgpr =0 ). Then we can remove "!need_evex_encoding()" and the following comments code. IOW merging templates means we cannot differentiate their NoEgpr.

> >>> @@ -14252,6 +14306,9 @@ static bool check_register (const reg_entry
> >>> *r)
> >>>
> >>>    if (r->reg_flags & RegRex2)
> >>>      {
> >>> +      if (is_evex_encoding (current_templates->start))
> >>> +	i.vec_encoding = vex_encoding_evex;
> >>
> >> What if the APX template isn't first in the group?
> >>
> >
> > If apx_f is not supported, it will return false, just after this code. Oh, better
> to move it to the back. Done.
> >
> >   if (r->reg_flags & RegRex2)
> >     {
> >       if (current_templates->start->opcode_modifier.evex)
> >         i.vec_encoding = vex_encoding_evex;
> >
> >       if (!cpu_arch_flags.bitfield.cpuapx_f
> >           || flag_code != CODE_64BIT)
> >         return false;
> >     }
> 
> Hmm, as before there's a use of current_templates here, which I'm afraid isn't
> appropriate. Whether a register is legitimate to use depends on only the
> present mode we're assembling in (flag_code + cpu_arch_flags, plus a few
> other globals for certain special cases). There may not be dependencies on
> the insn we're processing. Or if at all (yet even that would need a pretty good
> justification), then only on the collective set of all templates in the chosen
> template group.
> 

The code here is need by the upper comments.

Thanks, 
Lili.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-07 14:53 ` Jan Beulich
  2023-11-09 12:31   ` Cui, Lili
@ 2023-11-14  7:42   ` Cui, Lili
  2023-11-14 10:40     ` Jan Beulich
  2023-11-15  6:03   ` Cui, Lili
  2 siblings, 1 reply; 26+ messages in thread
From: Cui, Lili @ 2023-11-14  7:42 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> 
> > --- a/opcodes/i386-dis-evex-len.h
> > +++ b/opcodes/i386-dis-evex-len.h
> > @@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
> >      { REG_TABLE (REG_EVEX_0F38C7_L_2) },
> >    },
> >
> > +  /* EVEX_LEN_0F38F2 */
> > +  {
> > +    { "andnS",		{ Gdq, VexGdq, Edq }, 0 },
> > +  },
> 
> There's no sign of a prefix decode step here.
> > +  /* MOD_EVEX_MAP4_F9 */
> > +  {
> > +    { "movdiri",	{ Edq, Gdq }, 0 },
> > +  },
> 

> Missing PREFIX_OPCODE?

Legacy both have PREFIX_OPCODE, but currently EVEX seems to only use the vex.w bit to check the operand size. I'm confused whether we should add PREFIX_OPCODE ? Currently it reports bad.
case PREFIX_OPCODE: 
...
         (ins.vex.evex && dp->prefix_requirement != PREFIX_DATA
              && !ins.vex.w != !(ins.used_prefixes & PREFIX_DATA))
...


Thanks,
Lili.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-14  3:12       ` Cui, Lili
@ 2023-11-14 10:29         ` Jan Beulich
  2023-11-15  8:39           ` Cui, Lili
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-11-14 10:29 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 14.11.2023 04:12, Cui, Lili wrote:
>> On 13.11.2023 06:53, Cui, Lili wrote:
>>>>> @@ -5624,19 +5653,42 @@ md_assemble (char *line) [...]
>>>>> -      if (i.tm.opcode_modifier.vex)
>>>>> +      if (is_any_apx_evex_encoding ())
>>>>> +	{
>>>>> +	  if (i.tm.opcode_space == SPACE_EVEXMAP4 &&
>>>> (i.prefix[DATA_PREFIX] != 0))
>>>>> +	    {
>>>>> +	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
>>>>
>>>> Perhaps better assert that no other embedded prefix was already
>>>> recorded here?
>>>
>>> Added the code as below, I added as_bad instead of assert, I think this is a
>> input error and not a gas internal error, right?  Besides REX_PREFIX?
>>>
>>>       if (check_if_any_vex_is_evex_apx_encoding ())
>>>         {
>>>           if (i.tm.opcode_space == SPACE_EVEXMAP4 &&
>> (i.prefix[DATA_PREFIX] != 0))
>>>             {
>>>
>>>               i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
>>>               i.prefix[DATA_PREFIX] = 0;
>>>
>>>               /*  Prefixes other than the rex prefix cannot be used with the data
>> prefix.  */
>>>               const unsigned char *p = i.prefix;
>>>
>>>               for (j = 0; j < ARRAY_SIZE (i.prefix); ++j, ++p)
>>>                 {
>>>                   if (!*p)
>>>                     continue;
>>>
>>>                   switch (j)
>>>                     {
>>>                     case DATA_PREFIX:
>>>                     case REX_PREFIX:
>>>                       break;
>>>                     default:
>>>                       as_bad (_("unexpecting prefix %x together with DATA "
>>>                                 "prefix in front of evex-promoted apx "
>>>                                 "instruction "), *p);
>>>                       return;
>>>                     }
>>>                 }
>>>             }
>>>
>>>           build_apx_evex_prefix ();
>>>       }
>>
>> We already have code refusing most legacy prefixes when ahead of VEX/EVEX,
>> don't we? I'd expect that code to take care of bogus DATA prefixes here as
>> well. Possibly the conversion needs dealing with differently if here we can't
>> tell a user-supplied i.prefix[DATA_PREFIX] from one internally derived from
>> operands which were supplied? E.g. by not having
>> process_suffix() invoke add_prefix() in this particular case at all, but instead
>> modify i.tm accordingly?
>>
> 
> Done.
> 
> -         if (!add_prefix (prefix))
> -           return 0;
> +         /* The DATA PREFIX of EVEX promoted from legacy APX instructions
> +            needs to be adjusted.  */
> +         if (i.tm.opcode_space == SPACE_EVEXMAP4)
> +           i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
> +         else
> +           if (!add_prefix (prefix))
> +             return 0;

Except that you want "else" and "if" on the same line, please.

>>>>> @@ -7043,7 +7096,7 @@ VEX_check_encoding (const insn_template *t)
>>>>> static int  check_EgprOperands (const insn_template *t)  {
>>>>> -  if (t->opcode_modifier.noegpr)
>>>>> +  if (t->opcode_modifier.noegpr && !need_evex_encoding())
>>>>>      {
>>>>>        for (unsigned int op = 0; op < i.operands; op++)
>>>>>  	{
>>>>
>>>> What is this change about?
>>>>
>>>
>>> After merging vex and evex, evex exits here early and all evex supports egpr.
>>
>> Yet / hence no EVEX template should ever have NoEgpr set. IOW I still don't
>> follow why this change would be needed.
>>
> 
> After merging VEX and EVEX templates, they shared NoEgpr ==1 , and if !EVEX is not added, It will check for NoEgpr for EVEX instructions, but EVEX supports egpr.  That's why I added the code.
> 
> ldtilecfg, 0x49/0, AMX_TILE&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf|NoEgpr, { Unspecified|BaseIndex }
> 
> When merging VEX and EVEX templates, it means that VEX +Egpr (illegal) has a corresponding EVEX + Egpr, and we don't need to check whether VEX is illegal (just set NoEgpr =0 ). Then we can remove "!need_evex_encoding()" and the following comments code. IOW merging templates means we cannot differentiate their NoEgpr.

Well, such merged templates necessarily allow Egpr, so they must not have NoEgpr.
For them, use of an extended register simply excludes use of the VEX form of the
insn. This may or may not be what the last paragraph of your reply is intended to
be telling me.

>>>>> @@ -14252,6 +14306,9 @@ static bool check_register (const reg_entry
>>>>> *r)
>>>>>
>>>>>    if (r->reg_flags & RegRex2)
>>>>>      {
>>>>> +      if (is_evex_encoding (current_templates->start))
>>>>> +	i.vec_encoding = vex_encoding_evex;
>>>>
>>>> What if the APX template isn't first in the group?
>>>>
>>>
>>> If apx_f is not supported, it will return false, just after this code. Oh, better
>> to move it to the back. Done.
>>>
>>>   if (r->reg_flags & RegRex2)
>>>     {
>>>       if (current_templates->start->opcode_modifier.evex)
>>>         i.vec_encoding = vex_encoding_evex;
>>>
>>>       if (!cpu_arch_flags.bitfield.cpuapx_f
>>>           || flag_code != CODE_64BIT)
>>>         return false;
>>>     }
>>
>> Hmm, as before there's a use of current_templates here, which I'm afraid isn't
>> appropriate. Whether a register is legitimate to use depends on only the
>> present mode we're assembling in (flag_code + cpu_arch_flags, plus a few
>> other globals for certain special cases). There may not be dependencies on
>> the insn we're processing. Or if at all (yet even that would need a pretty good
>> justification), then only on the collective set of all templates in the chosen
>> template group.
> 
> The code here is need by the upper comments.

I'm afraid I don't understand your reply here. It certainly doesn't address my
comment. Why would setting i.vec_encoding depend on the first template in a
group? I indicated earlier that using vex_encoding_evex may not be feasible
here, because you don't mean to enforce EVEX encoding. You only want to record
the fact that an encoding needs using which is eGPR-capable (i.e. either one
with NoEgpr clear, or an EVEX one). Hence why I further suggested to possibly
introduce vex_encoding_egpr. (The "vex" in the names and the "vec" in the
field name are increasingly misleading, but we can address that later on.)

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-14  7:42   ` Cui, Lili
@ 2023-11-14 10:40     ` Jan Beulich
  2023-11-14 14:46       ` Cui, Lili
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-11-14 10:40 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 14.11.2023 08:42, Cui, Lili wrote:
>>
>>> --- a/opcodes/i386-dis-evex-len.h
>>> +++ b/opcodes/i386-dis-evex-len.h
>>> @@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
>>>      { REG_TABLE (REG_EVEX_0F38C7_L_2) },
>>>    },
>>>
>>> +  /* EVEX_LEN_0F38F2 */
>>> +  {
>>> +    { "andnS",		{ Gdq, VexGdq, Edq }, 0 },
>>> +  },
>>
>> There's no sign of a prefix decode step here.
>>> +  /* MOD_EVEX_MAP4_F9 */
>>> +  {
>>> +    { "movdiri",	{ Edq, Gdq }, 0 },
>>> +  },
>>
> 
>> Missing PREFIX_OPCODE?
> 
> Legacy both have PREFIX_OPCODE, but currently EVEX seems to only use the vex.w bit to check the operand size. I'm confused whether we should add PREFIX_OPCODE ? Currently it reports bad.
> case PREFIX_OPCODE: 
> ...
>          (ins.vex.evex && dp->prefix_requirement != PREFIX_DATA
>               && !ins.vex.w != !(ins.used_prefixes & PREFIX_DATA))

Considering the context where this appears, this could equally be
written as

         (ins.vex.evex && dp->prefix_requirement == PREFIX_OPCODE
              && !ins.vex.w != !(ins.used_prefixes & PREFIX_DATA))

and is about many vector-FP AVX512 insns needing EVEX.W == EVEX.pp. From
that angle using PREFIX_OPCODE in the entries above doesn't look to be
right; I'm sorry for the wrong comment. Yet still EVEX.pp needs dealing
with everywhere, including for the entries above. Whether that's done by
going through prefix_table[] or by introducing a new PREFIX_* qualifier
or by yet other means is secondary (and a question of what's most
efficient).

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-14 10:40     ` Jan Beulich
@ 2023-11-14 14:46       ` Cui, Lili
  0 siblings, 0 replies; 26+ messages in thread
From: Cui, Lili @ 2023-11-14 14:46 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 14.11.2023 08:42, Cui, Lili wrote:
> >>
> >>> --- a/opcodes/i386-dis-evex-len.h
> >>> +++ b/opcodes/i386-dis-evex-len.h
> >>> @@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
> >>>      { REG_TABLE (REG_EVEX_0F38C7_L_2) },
> >>>    },
> >>>
> >>> +  /* EVEX_LEN_0F38F2 */
> >>> +  {
> >>> +    { "andnS",		{ Gdq, VexGdq, Edq }, 0 },
> >>> +  },
> >>
> >> There's no sign of a prefix decode step here.
> >>> +  /* MOD_EVEX_MAP4_F9 */
> >>> +  {
> >>> +    { "movdiri",	{ Edq, Gdq }, 0 },
> >>> +  },
> >>
> >
> >> Missing PREFIX_OPCODE?
> >
> > Legacy both have PREFIX_OPCODE, but currently EVEX seems to only use
> the vex.w bit to check the operand size. I'm confused whether we should add
> PREFIX_OPCODE ? Currently it reports bad.
> > case PREFIX_OPCODE:
> > ...
> >          (ins.vex.evex && dp->prefix_requirement != PREFIX_DATA
> >               && !ins.vex.w != !(ins.used_prefixes & PREFIX_DATA))
> 
> Considering the context where this appears, this could equally be written as
> 
>          (ins.vex.evex && dp->prefix_requirement == PREFIX_OPCODE
>               && !ins.vex.w != !(ins.used_prefixes & PREFIX_DATA))
> 
> and is about many vector-FP AVX512 insns needing EVEX.W == EVEX.pp. From
> that angle using PREFIX_OPCODE in the entries above doesn't look to be right;
> I'm sorry for the wrong comment. Yet still EVEX.pp needs dealing with
> everywhere, including for the entries above. Whether that's done by going
> through prefix_table[] or by introducing a new PREFIX_* qualifier or by yet
> other means is secondary (and a question of what's most efficient).
> 

I would let them go through prefix_table[].

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-07 14:53 ` Jan Beulich
  2023-11-09 12:31   ` Cui, Lili
  2023-11-14  7:42   ` Cui, Lili
@ 2023-11-15  6:03   ` Cui, Lili
  2023-11-15  9:11     ` Jan Beulich
  2 siblings, 1 reply; 26+ messages in thread
From: Cui, Lili @ 2023-11-15  6:03 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> > --- a/opcodes/i386-dis-evex-mod.h
> > +++ b/opcodes/i386-dis-evex-mod.h
> > @@ -1 +1,43 @@
> >  /* Nothing at present.  */
> 
> This comment needs to go away when new stuff is added here. However, I'm
> sure I requested before that no new entries be put here which have only one
> of their two slots populated. The reg-only and mem-only aspects can be
> expressed via proper choice of operand specifiers, at least in almost all cases.
> Note how you already use ...
> 
> > +  /* MOD_EVEX_MAP4_DA_PREFIX_1 */
> > +  {
> > +    { Bad_Opcode },
> > +    { "encodekey128", { Gd, Ed }, 0 },  },
> > +  /* MOD_EVEX_MAP4_DB_PREFIX_1 */
> > +  {
> > +    { Bad_Opcode },
> > +    { "encodekey256", { Gd, Ed }, 0 },  },
> > +  /* MOD_EVEX_MAP4_DC_PREFIX_1 */
> > +  {
> > +    { "aesenc128kl",    { XM, M }, 0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_DD_PREFIX_1 */
> > +  {
> > +    { "aesdec128kl",    { XM, M }, 0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_DE_PREFIX_1 */
> > +  {
> > +    { "aesenc256kl",    { XM, M }, 0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_DF_PREFIX_1 */
> > +  {
> > +    { "aesdec256kl",    { XM, M }, 0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_F8_PREFIX_1 */
> > +  {
> > +    { "enqcmds",	{ Gva, M },  0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_F8_PREFIX_2 */
> > +  {
> > +    { "movdir64b",	{ Gva, M }, 0 },
> > +  },
> > +  /* MOD_EVEX_MAP4_F8_PREFIX_3 */
> > +  {
> > +    { "enqcmd",		{ Gva, M }, 0 },
> > +  },
> 
> ... M in many entries anyway. These can move one level up without needing
> further adjustment.

Done.

> +  /* PREFIX_EVEX_MAP4_F8 */
> +  {
> +    { Bad_Opcode },
> +    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_1) },
> +    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_2) },
> +    { MOD_TABLE (MOD_EVEX_MAP4_F8_PREFIX_3) },  },
> +  /* PREFIX_EVEX_MAP4_FC */
> +  {
> +    { "aadd",	{ Mdq, Gdq }, 0 },
> +    { "axor",	{ Mdq, Gdq }, 0 },
> +    { "aand",	{ Mdq, Gdq }, 0 },
> +    { "aor",	{ Mdq, Gdq }, 0 },
> +  },

>This looks to match the PREFIX_0F38FC entry.

Done.

> > --- a/opcodes/i386-dis-evex-reg.h
> > +++ b/opcodes/i386-dis-evex-reg.h
> > @@ -49,3 +49,17 @@
> >      { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
> >      { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
> >    },
> > +  /* REG_EVEX_0F38F3_L_0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { "blsrS",		{ VexGdq, Edq }, 0 },
> > +    { "blsmskS",	{ VexGdq, Edq }, 0 },
> > +    { "blsiS",		{ VexGdq, Edq }, 0 },
> > +  },
> 
> Compared to the REG_VEX_0F38F3_L_0 entry this lacks PREFIX_OPCODE, but
> then the question is why you do not re-use that entry in the first place.
> 

Added PREFIX_TABLE for them.

> > +  /* REG_EVEX_MAP4_D8_PREFIX_1 */
> > +  {
> > +    { "aesencwide128kl",	{ M }, 0 },
> > +    { "aesdecwide128kl",	{ M }, 0 },
> > +    { "aesencwide256kl",	{ M }, 0 },
> > +    { "aesdecwide256kl",	{ M }, 0 },
> > +  },
> 
> How is this different from the REG_0F38D8_PREFIX_1 entry?
> 

They are same, deleted REG_EVEX_MAP4_D8_PREFIX_1 and share REG_0F38D8_PREFIX_1.

> > --- /dev/null
> > +++ b/opcodes/i386-dis-evex-x86-64.h
> > @@ -0,0 +1,140 @@
> > +  /* X86_64_EVEX_0F3849 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },  },
> > +  /* X86_64_EVEX_0F384B */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },  },
> > +  /* X86_64_EVEX_0F38E0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },  },
> 
> This and its sibling entries look to again fully match X86_64_VEX_0F38E<n>.

It cannot be moved to up level, it needs to go through i386-dis-evex-x86-64.h

> > --- a/opcodes/i386-dis-evex.h
> > +++ b/opcodes/i386-dis-evex.h
> >[...]
> > @@ -1113,19 +1113,19 @@ static const struct dis386 evex_table[][256] = {
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> > -    { Bad_Opcode },
> > +    { "sha1rnds4", { XM, EXxmm, Ib }, 0 },
> 
> PREFIX_OPCODE? EVEX.W?
> 

Added PREFIX_TABLE , I think this instruction doesn't need to go through W_TABLE.

EVEX.LLZ.NP.MAP4. D4 /r ib
SHA1RNDS4 {NF=0} {ND=0} xmm1, xmm2/m128, imm8

> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      { Bad_Opcode },
> >      /* D8 */
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > -    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_EVEX_MAP4_D8) },
> > +    { "sha1msg1", { XM, EXxmm }, 0 },
> 
> Again.
> 

Done.

> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> >[...]
> > @@ -4522,12 +4594,13 @@ static const struct dis386 x86_64_table[][2] = {
> >      { "cmpnlexadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
> >    },
> >
> > +#include "i386-dis-evex-x86-64.h"
> > +
> >    /* X86_64_VEX_MAP7_F8_L_0_W_0_R_0 */
> >    {
> >      { Bad_Opcode },
> >      { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
> >    },
> > -
> >  };
> 
> Please can EVEX entries come after VEX ones?
> 

Done.

> > @@ -8979,9 +9055,13 @@ get_valid_dis386 (const struct dis386 *dp,
> instr_info *ins)
> >        if (!fetch_code (ins->info, ins->codep + 4))
> >  	return &err_opcode;
> >        /* The first byte after 0x62.  */
> > +      if (*ins->codep & 0x8)
> > +	ins->rex2 |= REX_B;
> > +      if (!(*ins->codep & 0x10))
> > +	ins->rex2 |= REX_R;
> 
> In patch 1 you check whether REX2 bits are properly consumed. While in
> principle I agree with re-using ->rex2 here, don't you need to take precautions
> to not print a random {rex2} (or whichever way it was
> expressed) when one of these bits ends up not contributing to any operand?
> 

We have discussed this issue in other email.

> > @@ -8994,12 +9074,19 @@ get_valid_dis386 (const struct dis386 *dp,
> instr_info *ins)
> >  	case 0x3:
> >  	  vex_table_index = EVEX_0F3A;
> >  	  break;
> > +	case 0x4:
> > +	  vex_table_index = EVEX_MAP4;
> > +	  ins->evex_type = evex_from_legacy;
> > +	  break;
> >  	case 0x5:
> >  	  vex_table_index = EVEX_MAP5;
> >  	  break;
> >  	case 0x6:
> >  	  vex_table_index = EVEX_MAP6;
> >  	  break;
> > +	case 0x7:
> > +	  vex_table_index = EVEX_MAP7;
> > +	  break;
> >  	}
> 
> How is map7 coming into play here?
> 

Removed.

> > @@ -9042,12 +9128,31 @@ get_valid_dis386 (const struct dis386 *dp,
> > instr_info *ins)
> >
> >        if (ins->address_mode != mode_64bit)
> >  	{
> > +	  if (ins->evex_type != evex_default
> > +	      || (ins->rex2 & (REX_B | REX_X)))
> > +	    return &bad_opcode;
> >  	  /* In 16/32-bit mode silently ignore following bits.  */
> >  	  ins->rex &= ~REX_B;
> > -	  ins->vex.r = true;
> > +	  ins->rex2 &= ~REX_R;
> >  	}
> >
> >        ins->need_vex = 4;
> > +
> > +      /* EVEX from legacy instructions require that EVEX.L'L, EVEX.z and the
> > +	 lower 2 bits of EVEX.aaa must be 0.
> > +	 EVEX from evex instrucions require that EVEX.L'L and the lower 2 bits
> of
> > +	 EVEX.aaa must be 0.  */
> 
> EVEX from VEX you mean? Also, what about EVEX.z for those? Really the sole
> difference between the two forms is EVEX.nd. The doc mentions EVEX.l for
> the from-VEX case, but afaics there's no insn actually having that bit set
> (which also matches what you say above).
> 

EVEX.z  is "ins->vex.zeroing = *ins->codep & 0x80;" The old code was wrong , changed it to:

     case USE_X86_64_EVEX_FROM_VEX_TABLE:
       ins->evex_type = evex_from_vex;
+      /* EVEX from evex instrucions require that EVEX.z, EVEX.L’L, EVEX.b and
+        the lower 2 bits of EVEX.aaa must be 0.  */
+      if ((ins->vex.mask_register_specifier & 0x3) != 0
+         || ins->vex.ll != 0
+         || ins->vex.zeroing != 0
+         || ins->vex.b)
+       return &bad_opcode;

     case USE_EVEX_TABLE:
+      /* EVEX from legacy instructions require that EVEX.z, EVEX.L’L and the
+        lower 2 bits of EVEX.aaa must be 0.  */
+      if (ins->evex_type == evex_from_legacy
+         && ((ins->vex.mask_register_specifier & 0x3) != 0
+             || ins->vex.ll != 0
+             || ins->vex.zeroing != 0))
+       return &bad_opcode;


> > +      if (ins->evex_type == evex_from_legacy || ins->evex_type ==
> evex_from_vex)
> > +	{
> > +	  if ((*ins->codep & 0x3) != 0
> > +	      || (*ins->codep >> 6 & 0x3) != 0
> > +	      || (ins->evex_type == evex_from_legacy
> > +		  && (*ins->codep >> 5 & 0x1) != 0)
> 
> Please can the shifts be parenthesized agaist the &-s? Albeit - this is perhaps
> easier to read as just &-s anyway.
> 

 Code changed.

> > +	      || (ins->evex_type == evex_from_vex
> > +		  && !ins->vex.b))
> 
> This doesn't look to match any part of the comment.
> 

Code changed and added a test case for it.

> > @@ -9640,6 +9752,19 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        if (ins.last_repnz_prefix >= 0)
> >  	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
> >        break;
> > +
> > +    case PREFIX_NP_OR_DATA:
> > +      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)
> > +	{
> > +	  i386_dis_printf (info, dis_style_text, "(bad)");
> > +	  ret = ins.end_codep - priv.the_buffer;
> > +	  goto out;
> > +	}
> > +      break;
> > +
> > +    default:
> > +      break;
> > +
> >      }
> 
> Why suddenly a default case? And why an extra blank line below it?
> 

Removed.

> > @@ -11242,7 +11367,7 @@ print_register (instr_info *ins, unsigned int reg,
> unsigned int rexmask,
           case b_mode:
> >      case b_swap_mode:
> >        if (reg & 4)
> >  	USED_REX (0);
> > -      if (ins->rex)
> > +      if (ins->rex || ins->rex2)
> >  	names = att_names8rex;
> >        else
> >  	names = att_names8;
> 
> Isn't this already needed in the REX2 patch?
> 

CRC32  --> Eb ---> b_mode

> > @@ -11460,7 +11585,7 @@ OP_E_memory (instr_info *ins, int bytemode,
> > int sizeflag)
> >
> >    add += (ins->rex2 & REX_B) ? 16 : 0;
> >
> > -  if (ins->vex.evex)
> > +  if (ins->vex.evex && ins->evex_type == evex_default)
> >      {
> >
> >        /* Zeroing-masking is invalid for memory destinations. Set the
> > flag @@ -11608,6 +11733,13 @@ OP_E_memory (instr_info *ins, int
> bytemode, int sizeflag)
> >  		abort ();
> >  	      if (ins->vex.evex)
> >  		{
> > +		  /* S/G EVEX insns require REX_X not to be set.  */
> > +		  if (ins->rex2 & REX_X)
> 
> REX_X in a comment can only mean REX.X. You mean EVEX.X4 though, I think
> (which internally is latched into the REX_X bit of ->rex2), and that also needs
> to be set, not clear.
> 
Done.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-14 10:29         ` Jan Beulich
@ 2023-11-15  8:39           ` Cui, Lili
  0 siblings, 0 replies; 26+ messages in thread
From: Cui, Lili @ 2023-11-15  8:39 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
> 
> On 14.11.2023 04:12, Cui, Lili wrote:
> >> On 13.11.2023 06:53, Cui, Lili wrote:
> >>>>> @@ -5624,19 +5653,42 @@ md_assemble (char *line) [...]
> >>>>> -      if (i.tm.opcode_modifier.vex)
> >>>>> +      if (is_any_apx_evex_encoding ())
> >>>>> +	{
> >>>>> +	  if (i.tm.opcode_space == SPACE_EVEXMAP4 &&
> >>>> (i.prefix[DATA_PREFIX] != 0))
> >>>>> +	    {
> >>>>> +	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
> >>>>
> >>>> Perhaps better assert that no other embedded prefix was already
> >>>> recorded here?
> >>>
> >>> Added the code as below, I added as_bad instead of assert, I think
> >>> this is a
> >> input error and not a gas internal error, right?  Besides REX_PREFIX?
> >>>
> >>>       if (check_if_any_vex_is_evex_apx_encoding ())
> >>>         {
> >>>           if (i.tm.opcode_space == SPACE_EVEXMAP4 &&
> >> (i.prefix[DATA_PREFIX] != 0))
> >>>             {
> >>>
> >>>               i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
> >>>               i.prefix[DATA_PREFIX] = 0;
> >>>
> >>>               /*  Prefixes other than the rex prefix cannot be used
> >>> with the data
> >> prefix.  */
> >>>               const unsigned char *p = i.prefix;
> >>>
> >>>               for (j = 0; j < ARRAY_SIZE (i.prefix); ++j, ++p)
> >>>                 {
> >>>                   if (!*p)
> >>>                     continue;
> >>>
> >>>                   switch (j)
> >>>                     {
> >>>                     case DATA_PREFIX:
> >>>                     case REX_PREFIX:
> >>>                       break;
> >>>                     default:
> >>>                       as_bad (_("unexpecting prefix %x together with DATA "
> >>>                                 "prefix in front of evex-promoted apx "
> >>>                                 "instruction "), *p);
> >>>                       return;
> >>>                     }
> >>>                 }
> >>>             }
> >>>
> >>>           build_apx_evex_prefix ();
> >>>       }
> >>
> >> We already have code refusing most legacy prefixes when ahead of
> >> VEX/EVEX, don't we? I'd expect that code to take care of bogus DATA
> >> prefixes here as well. Possibly the conversion needs dealing with
> >> differently if here we can't tell a user-supplied
> >> i.prefix[DATA_PREFIX] from one internally derived from operands which
> >> were supplied? E.g. by not having
> >> process_suffix() invoke add_prefix() in this particular case at all,
> >> but instead modify i.tm accordingly?
> >>
> >
> > Done.
> >
> > -         if (!add_prefix (prefix))
> > -           return 0;
> > +         /* The DATA PREFIX of EVEX promoted from legacy APX instructions
> > +            needs to be adjusted.  */
> > +         if (i.tm.opcode_space == SPACE_EVEXMAP4)
> > +           i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
> > +         else
> > +           if (!add_prefix (prefix))
> > +             return 0;
> 
> Except that you want "else" and "if" on the same line, please.
> 
> >>>>> @@ -7043,7 +7096,7 @@ VEX_check_encoding (const insn_template
> *t)
> >>>>> static int  check_EgprOperands (const insn_template *t)  {
> >>>>> -  if (t->opcode_modifier.noegpr)
> >>>>> +  if (t->opcode_modifier.noegpr && !need_evex_encoding())
> >>>>>      {
> >>>>>        for (unsigned int op = 0; op < i.operands; op++)
> >>>>>  	{
> >>>>
> >>>> What is this change about?
> >>>>
> >>>
> >>> After merging vex and evex, evex exits here early and all evex supports
> egpr.
> >>
> >> Yet / hence no EVEX template should ever have NoEgpr set. IOW I still
> >> don't follow why this change would be needed.
> >>
> >
> > After merging VEX and EVEX templates, they shared NoEgpr ==1 , and
> if !EVEX is not added, It will check for NoEgpr for EVEX instructions, but EVEX
> supports egpr.  That's why I added the code.
> >
> > ldtilecfg, 0x49/0, AMX_TILE&(AMX_TILE|APX_F),
> > Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf|NoEgpr, {
> > Unspecified|BaseIndex }
> >
> > When merging VEX and EVEX templates, it means that VEX +Egpr (illegal)
> has a corresponding EVEX + Egpr, and we don't need to check whether VEX is
> illegal (just set NoEgpr =0 ). Then we can remove "!need_evex_encoding()"
> and the following comments code. IOW merging templates means we cannot
> differentiate their NoEgpr.
> 
> Well, such merged templates necessarily allow Egpr, so they must not have
> NoEgpr.
> For them, use of an extended register simply excludes use of the VEX form of
> the insn. This may or may not be what the last paragraph of your reply is
> intended to be telling me.
> 
Changed the code in gen.c,  removed the !need_evex_encoding()  and the following comments' code.

Code changed. And removed !need_evex_encoding() and the following commented code.

+  /* Vex, legacy map2 and map3 and rex2_disallowed do not support EGPR.
+     For template supports both Vex and EVex allowing EGPR.  */
+  if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
+      && !modifiers[EVex].value)
+    modifiers[NoEgpr].value = 1;
+
> >>>>> @@ -14252,6 +14306,9 @@ static bool check_register (const
> >>>>> reg_entry
> >>>>> *r)
> >>>>>
> >>>>>    if (r->reg_flags & RegRex2)
> >>>>>      {
> >>>>> +      if (is_evex_encoding (current_templates->start))
> >>>>> +	i.vec_encoding = vex_encoding_evex;
> >>>>
> >>>> What if the APX template isn't first in the group?
> >>>>
> >>>
> >>> If apx_f is not supported, it will return false, just after this
> >>> code. Oh, better
> >> to move it to the back. Done.
> >>>
> >>>   if (r->reg_flags & RegRex2)
> >>>     {
> >>>       if (current_templates->start->opcode_modifier.evex)
> >>>         i.vec_encoding = vex_encoding_evex;
> >>>
> >>>       if (!cpu_arch_flags.bitfield.cpuapx_f
> >>>           || flag_code != CODE_64BIT)
> >>>         return false;
> >>>     }
> >>
> >> Hmm, as before there's a use of current_templates here, which I'm
> >> afraid isn't appropriate. Whether a register is legitimate to use
> >> depends on only the present mode we're assembling in (flag_code +
> >> cpu_arch_flags, plus a few other globals for certain special cases).
> >> There may not be dependencies on the insn we're processing. Or if at
> >> all (yet even that would need a pretty good justification), then only
> >> on the collective set of all templates in the chosen template group.
> >
> > The code here is need by the upper comments.
> 
> I'm afraid I don't understand your reply here. It certainly doesn't address my
> comment. Why would setting i.vec_encoding depend on the first template in
> a group? I indicated earlier that using vex_encoding_evex may not be feasible
> here, because you don't mean to enforce EVEX encoding. You only want to
> record the fact that an encoding needs using which is eGPR-capable (i.e.
> either one with NoEgpr clear, or an EVEX one). Hence why I further suggested
> to possibly introduce vex_encoding_egpr. (The "vex" in the names and the
> "vec" in the field name are increasingly misleading, but we can address that
> later on.)
> 

I used !need_evex_encoding in upper comments, and for some VEX and EVEX merged templates,it's i.vec_encoding == vex_encoding_default, I need to assign a correct i.vec_encoding value for merged templates.

static INLINE bool need_evex_encoding (void)
{
  return i.vec_encoding == vex_encoding_evex
        || i.vec_encoding == vex_encoding_evex512
        || i.mask.reg;
}

Currently, I changed the code in gen.c, so we don't need it.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-15  6:03   ` Cui, Lili
@ 2023-11-15  9:11     ` Jan Beulich
  2023-11-15 11:43       ` Cui, Lili
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-11-15  9:11 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 15.11.2023 07:03, Cui, Lili wrote:
>>> --- a/opcodes/i386-dis-evex-reg.h
>>> +++ b/opcodes/i386-dis-evex-reg.h
>>> @@ -49,3 +49,17 @@
>>>      { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
>>>      { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
>>>    },
>>> +  /* REG_EVEX_0F38F3_L_0 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { "blsrS",		{ VexGdq, Edq }, 0 },
>>> +    { "blsmskS",	{ VexGdq, Edq }, 0 },
>>> +    { "blsiS",		{ VexGdq, Edq }, 0 },
>>> +  },
>>
>> Compared to the REG_VEX_0F38F3_L_0 entry this lacks PREFIX_OPCODE, but
>> then the question is why you do not re-use that entry in the first place.
> 
> Added PREFIX_TABLE for them.

PREFIX_OPCODE you mean? Or a decoding step through prefix_table[]? (The
former would be contrary to what we discussed elsewhere for use of
PREFIX_OPCODE with EVEX encodings. Albeit I still think the goal should
be to avoid going through prefix_table[] when we have a large set of
insns all with similar properties.)

>>> --- /dev/null
>>> +++ b/opcodes/i386-dis-evex-x86-64.h
>>> @@ -0,0 +1,140 @@
>>> +  /* X86_64_EVEX_0F3849 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },  },
>>> +  /* X86_64_EVEX_0F384B */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },  },
>>> +  /* X86_64_EVEX_0F38E0 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },  },
>>
>> This and its sibling entries look to again fully match X86_64_VEX_0F38E<n>.
> 
> It cannot be moved to up level, it needs to go through i386-dis-evex-x86-64.h

I didn't suggest to move up a level. I suggested to re-use the existing
VEX entries.

>>> --- a/opcodes/i386-dis-evex.h
>>> +++ b/opcodes/i386-dis-evex.h
>>> [...]
>>> @@ -1113,19 +1113,19 @@ static const struct dis386 evex_table[][256] = {
>>>      { Bad_Opcode },
>>>      { Bad_Opcode },
>>>      { Bad_Opcode },
>>> -    { Bad_Opcode },
>>> +    { "sha1rnds4", { XM, EXxmm, Ib }, 0 },
>>
>> PREFIX_OPCODE? EVEX.W?
>>
> 
> Added PREFIX_TABLE , I think this instruction doesn't need to go through W_TABLE.
> 
> EVEX.LLZ.NP.MAP4. D4 /r ib
> SHA1RNDS4 {NF=0} {ND=0} xmm1, xmm2/m128, imm8

Well, this is again the curious / ambiguous absence of a respective word
past the last dot in the first line. It might be W0, and it also might be
IGNORED (or WIG in SDM terms).

>>> @@ -11242,7 +11367,7 @@ print_register (instr_info *ins, unsigned int reg,
>> unsigned int rexmask,
>            case b_mode:
>>>      case b_swap_mode:
>>>        if (reg & 4)
>>>  	USED_REX (0);
>>> -      if (ins->rex)
>>> +      if (ins->rex || ins->rex2)
>>>  	names = att_names8rex;
>>>        else
>>>  	names = att_names8;
>>
>> Isn't this already needed in the REX2 patch?
>>
> 
> CRC32  --> Eb ---> b_mode

I'm afraid I don't understand this reply. b_swap_mode is used by EbS, which
in turn is used by ADD, SUB, etc. Similarly b_mode / Eb are used by more
than just CRC32.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-15  9:11     ` Jan Beulich
@ 2023-11-15 11:43       ` Cui, Lili
  2023-11-16 13:57         ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Cui, Lili @ 2023-11-15 11:43 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, November 15, 2023 5:12 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
> 
> On 15.11.2023 07:03, Cui, Lili wrote:
> >>> --- a/opcodes/i386-dis-evex-reg.h
> >>> +++ b/opcodes/i386-dis-evex-reg.h
> >>> @@ -49,3 +49,17 @@
> >>>      { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
> >>>      { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
> >>>    },
> >>> +  /* REG_EVEX_0F38F3_L_0 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { "blsrS",		{ VexGdq, Edq }, 0 },
> >>> +    { "blsmskS",	{ VexGdq, Edq }, 0 },
> >>> +    { "blsiS",		{ VexGdq, Edq }, 0 },
> >>> +  },
> >>
> >> Compared to the REG_VEX_0F38F3_L_0 entry this lacks PREFIX_OPCODE,
> >> but then the question is why you do not re-use that entry in the first place.
> >
> > Added PREFIX_TABLE for them.
> 
> PREFIX_OPCODE you mean? Or a decoding step through prefix_table[]? (The
> former would be contrary to what we discussed elsewhere for use of
> PREFIX_OPCODE with EVEX encodings. Albeit I still think the goal should be to
> avoid going through prefix_table[] when we have a large set of insns all with
> similar properties.)
> 

Haha, I let them pass prefix_table[].

> >>> --- /dev/null
> >>> +++ b/opcodes/i386-dis-evex-x86-64.h
> >>> @@ -0,0 +1,140 @@
> >>> +  /* X86_64_EVEX_0F3849 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },  },
> >>> +  /* X86_64_EVEX_0F384B */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },  },
> >>> +  /* X86_64_EVEX_0F38E0 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },  },
> >>
> >> This and its sibling entries look to again fully match
> X86_64_VEX_0F38E<n>.
> >
> > It cannot be moved to up level, it needs to go through
> > i386-dis-evex-x86-64.h
> 
> I didn't suggest to move up a level. I suggested to re-use the existing VEX
> entries.
> 

It's currently in the x86-64 table, it's a bit weird to revisit the x86-64 table, and moving up one level won't work.

{ X86_64_TABLE (X86_64_VEX_0F38E0) }

  /* X86_64_VEX_0F38E0 */
  {
    { Bad_Opcode },
    { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
  },

> >>> @@ -11242,7 +11367,7 @@ print_register (instr_info *ins, unsigned
> >>> int reg,
> >> unsigned int rexmask,
> >            case b_mode:
> >>>      case b_swap_mode:
> >>>        if (reg & 4)
> >>>  	USED_REX (0);
> >>> -      if (ins->rex)
> >>> +      if (ins->rex || ins->rex2)
> >>>  	names = att_names8rex;
> >>>        else
> >>>  	names = att_names8;
> >>
> >> Isn't this already needed in the REX2 patch?
> >>
> >
> > CRC32  --> Eb ---> b_mode
> 
> I'm afraid I don't understand this reply. b_swap_mode is used by EbS, which
> in turn is used by ADD, SUB, etc. Similarly b_mode / Eb are used by more than
> just CRC32.
> 

I misunderstood, will move it to the patch 1/8.

Thanks,
Lili.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-15 11:43       ` Cui, Lili
@ 2023-11-16 13:57         ` Jan Beulich
  2023-11-16 15:10           ` Cui, Lili
  2023-11-16 16:12           ` Cui, Lili
  0 siblings, 2 replies; 26+ messages in thread
From: Jan Beulich @ 2023-11-16 13:57 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 15.11.2023 12:43, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, November 15, 2023 5:12 PM
>>
>> On 15.11.2023 07:03, Cui, Lili wrote:
>>>>> --- /dev/null
>>>>> +++ b/opcodes/i386-dis-evex-x86-64.h
>>>>> @@ -0,0 +1,140 @@
>>>>> +  /* X86_64_EVEX_0F3849 */
>>>>> +  {
>>>>> +    { Bad_Opcode },
>>>>> +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },  },
>>>>> +  /* X86_64_EVEX_0F384B */
>>>>> +  {
>>>>> +    { Bad_Opcode },
>>>>> +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },  },
>>>>> +  /* X86_64_EVEX_0F38E0 */
>>>>> +  {
>>>>> +    { Bad_Opcode },
>>>>> +    { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },  },
>>>>
>>>> This and its sibling entries look to again fully match
>> X86_64_VEX_0F38E<n>.
>>>
>>> It cannot be moved to up level, it needs to go through
>>> i386-dis-evex-x86-64.h
>>
>> I didn't suggest to move up a level. I suggested to re-use the existing VEX
>> entries.
>>
> 
> It's currently in the x86-64 table, it's a bit weird to revisit the x86-64 table, and moving up one level won't work.
> 
> { X86_64_TABLE (X86_64_VEX_0F38E0) }
> 
>   /* X86_64_VEX_0F38E0 */
>   {
>     { Bad_Opcode },
>     { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
>   },

Again - I never suggested to move up a level. I also didn't suggest to
re-visit x86_64_table[]. I merely requested to re-use X86_64_VEX_0F38Ex
in the evex_table[] entries, instead of adding 16 (each) identical new
entries to x86_64_table[] with 16 new X86_64_EVEX_0F38Ex enumerators.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-16 13:57         ` Jan Beulich
@ 2023-11-16 15:10           ` Cui, Lili
  2023-11-16 15:15             ` Jan Beulich
  2023-11-16 16:12           ` Cui, Lili
  1 sibling, 1 reply; 26+ messages in thread
From: Cui, Lili @ 2023-11-16 15:10 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, November 16, 2023 9:58 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
> 
> On 15.11.2023 12:43, Cui, Lili wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Wednesday, November 15, 2023 5:12 PM
> >>
> >> On 15.11.2023 07:03, Cui, Lili wrote:
> >>>>> --- /dev/null
> >>>>> +++ b/opcodes/i386-dis-evex-x86-64.h
> >>>>> @@ -0,0 +1,140 @@
> >>>>> +  /* X86_64_EVEX_0F3849 */
> >>>>> +  {
> >>>>> +    { Bad_Opcode },
> >>>>> +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },  },
> >>>>> +  /* X86_64_EVEX_0F384B */
> >>>>> +  {
> >>>>> +    { Bad_Opcode },
> >>>>> +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },  },
> >>>>> +  /* X86_64_EVEX_0F38E0 */
> >>>>> +  {
> >>>>> +    { Bad_Opcode },
> >>>>> +    { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },  },
> >>>>
> >>>> This and its sibling entries look to again fully match
> >> X86_64_VEX_0F38E<n>.
> >>>
> >>> It cannot be moved to up level, it needs to go through
> >>> i386-dis-evex-x86-64.h
> >>
> >> I didn't suggest to move up a level. I suggested to re-use the
> >> existing VEX entries.
> >>
> >
> > It's currently in the x86-64 table, it's a bit weird to revisit the x86-64 table,
> and moving up one level won't work.
> >
> > { X86_64_TABLE (X86_64_VEX_0F38E0) }
> >
> >   /* X86_64_VEX_0F38E0 */
> >   {
> >     { Bad_Opcode },
> >     { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
> >   },
> 
> Again - I never suggested to move up a level. I also didn't suggest to re-visit
> x86_64_table[]. I merely requested to re-use X86_64_VEX_0F38Ex in the
> evex_table[] entries, instead of adding 16 (each) identical new entries to
> x86_64_table[] with 16 new X86_64_EVEX_0F38Ex enumerators.
> 

We need to go through this virtual table USE_X86_64_EVEX_FROM_VEX_TABLE to mark event_type = evex_from_vex;

Lili

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-16 15:10           ` Cui, Lili
@ 2023-11-16 15:15             ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-11-16 15:15 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 16.11.2023 16:10, Cui, Lili wrote:
> 
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, November 16, 2023 9:58 PM
>> To: Cui, Lili <lili.cui@intel.com>
>> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
>> Subject: Re: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
>>
>> On 15.11.2023 12:43, Cui, Lili wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Wednesday, November 15, 2023 5:12 PM
>>>>
>>>> On 15.11.2023 07:03, Cui, Lili wrote:
>>>>>>> --- /dev/null
>>>>>>> +++ b/opcodes/i386-dis-evex-x86-64.h
>>>>>>> @@ -0,0 +1,140 @@
>>>>>>> +  /* X86_64_EVEX_0F3849 */
>>>>>>> +  {
>>>>>>> +    { Bad_Opcode },
>>>>>>> +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },  },
>>>>>>> +  /* X86_64_EVEX_0F384B */
>>>>>>> +  {
>>>>>>> +    { Bad_Opcode },
>>>>>>> +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },  },
>>>>>>> +  /* X86_64_EVEX_0F38E0 */
>>>>>>> +  {
>>>>>>> +    { Bad_Opcode },
>>>>>>> +    { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },  },
>>>>>>
>>>>>> This and its sibling entries look to again fully match
>>>> X86_64_VEX_0F38E<n>.
>>>>>
>>>>> It cannot be moved to up level, it needs to go through
>>>>> i386-dis-evex-x86-64.h
>>>>
>>>> I didn't suggest to move up a level. I suggested to re-use the
>>>> existing VEX entries.
>>>>
>>>
>>> It's currently in the x86-64 table, it's a bit weird to revisit the x86-64 table,
>> and moving up one level won't work.
>>>
>>> { X86_64_TABLE (X86_64_VEX_0F38E0) }
>>>
>>>   /* X86_64_VEX_0F38E0 */
>>>   {
>>>     { Bad_Opcode },
>>>     { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
>>>   },
>>
>> Again - I never suggested to move up a level. I also didn't suggest to re-visit
>> x86_64_table[]. I merely requested to re-use X86_64_VEX_0F38Ex in the
>> evex_table[] entries, instead of adding 16 (each) identical new entries to
>> x86_64_table[] with 16 new X86_64_EVEX_0F38Ex enumerators.
>>
> 
> We need to go through this virtual table USE_X86_64_EVEX_FROM_VEX_TABLE to mark event_type = evex_from_vex;

Sure, but instead of

    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38E0) },

why can't you use

    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E0) },

? The handling of USE_X86_64_EVEX_FROM_VEX_TABLE simply falls through
to that of USE_X86_64_TABLE.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH V2 3/8] Support APX GPR32 with extend evex prefix
  2023-11-16 13:57         ` Jan Beulich
  2023-11-16 15:10           ` Cui, Lili
@ 2023-11-16 16:12           ` Cui, Lili
  1 sibling, 0 replies; 26+ messages in thread
From: Cui, Lili @ 2023-11-16 16:12 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> > It's currently in the x86-64 table, it's a bit weird to revisit the x86-64 table,
> and moving up one level won't work.
> >
> > { X86_64_TABLE (X86_64_VEX_0F38E0) }
> >
> >   /* X86_64_VEX_0F38E0 */
> >   {
> >     { Bad_Opcode },
> >     { "cmpoxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
> >   },
> 
> Again - I never suggested to move up a level. I also didn't suggest to re-visit
> x86_64_table[]. I merely requested to re-use X86_64_VEX_0F38Ex in the
> evex_table[] entries, instead of adding 16 (each) identical new entries to
> x86_64_table[] with 16 new X86_64_EVEX_0F38Ex enumerators.
> 

Really good.

Thanks,
Lili

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2023-11-16 16:12 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-03 16:50 [PATCH V2 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
2023-11-06 17:07 ` Jan Beulich
2023-11-13  5:53   ` Cui, Lili
2023-11-13  8:34     ` Jan Beulich
2023-11-14  3:12       ` Cui, Lili
2023-11-14 10:29         ` Jan Beulich
2023-11-15  8:39           ` Cui, Lili
2023-11-07 13:29 ` Jan Beulich
2023-11-09  8:38   ` Cui, Lili
2023-11-09 11:07     ` Jan Beulich
2023-11-09 11:12     ` Jan Beulich
2023-11-07 14:53 ` Jan Beulich
2023-11-09 12:31   ` Cui, Lili
2023-11-09 13:05     ` Jan Beulich
2023-11-09 14:57       ` Cui, Lili
2023-11-09 15:39         ` Jan Beulich
2023-11-14  7:42   ` Cui, Lili
2023-11-14 10:40     ` Jan Beulich
2023-11-14 14:46       ` Cui, Lili
2023-11-15  6:03   ` Cui, Lili
2023-11-15  9:11     ` Jan Beulich
2023-11-15 11:43       ` Cui, Lili
2023-11-16 13:57         ` Jan Beulich
2023-11-16 15:10           ` Cui, Lili
2023-11-16 15:15             ` Jan Beulich
2023-11-16 16:12           ` Cui, Lili

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).