[PATCH] Support Intel AVX10.1

public inbox for binutils@sourceware.org

From: Haochen Jiang @ 2023-07-27  7:15 UTC (permalink / raw)
  To: binutils; +Cc: hjl.tools, jbeulich

Hi all,

This patch adds Intel AVX10.1 support.

It is based on the following two documents:

Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification 
It describes the Intel Advanced Vector Extensions 10 Instruction Set
Architecture.
https://cdrdv2.intel.com/v1/dl/getContent/784267

The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
It provides introductory information regarding the converged vector ISA: Intel
Advanced Vector Extensions 10.
https://cdrdv2.intel.com/v1/dl/getContent/784343

Regtested on x86_64-pc-linux-gnu.

To avoid adding AVX10.1 to every affected entry in i386-opc.tbl, I chose to
handle it in i386-gen.c, which adds AVX10_1 to the instructions that AVX10.1
includes when the tables are generated.
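
As a rough illustration of that idea (a simplified sketch, not the code in
i386-gen.c): the generator can look at the feature set of each template and
add AVX10_1 when the template is an AVX512 one that AVX10.1 covers. The flag
names and the helper add_avx10_1 below are hypothetical. The exclusion of
AVX512ER and AVX512PF is taken from the avx10_1-inval tests in this patch and
is only a subset of what the real code has to exclude.

```c
#include <stdbool.h>

/* Hypothetical feature bits for illustration only; these are not the
   CpuXXX enumerators from i386-opc.h.  */
enum
{
  F_AVX512F  = 1 << 0,
  F_AVX512BW = 1 << 1,
  F_AVX512VL = 1 << 2,
  F_AVX512ER = 1 << 3,	/* Not part of AVX10.1 (see avx10_1-inval.s).  */
  F_AVX512PF = 1 << 4,	/* Not part of AVX10.1 (see avx10_1-inval.s).  */
  F_AVX10_1  = 1 << 5
};

#define F_AVX512_ANY \
  (F_AVX512F | F_AVX512BW | F_AVX512VL | F_AVX512ER | F_AVX512PF)
#define F_AVX10_1_EXCLUDED (F_AVX512ER | F_AVX512PF)

/* Return FLAGS with F_AVX10_1 added when the template uses some AVX512
   feature and none of the features that AVX10.1 leaves out.  */
static unsigned int
add_avx10_1 (unsigned int flags)
{
  bool is_avx512 = (flags & F_AVX512_ANY) != 0;
  bool excluded = (flags & F_AVX10_1_EXCLUDED) != 0;

  if (is_avx512 && !excluded)
    flags |= F_AVX10_1;
  return flags;
}
```

With this model, a template tagged only F_AVX512F gains F_AVX10_1, while one
tagged F_AVX512F | F_AVX512ER is left unchanged, so i386-opc.tbl itself does
not need to mention AVX10.1.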

Thx,
Haochen

gas/ChangeLog:

	* NEWS: Support Intel AVX10.1.
	* config/tc-i386.c (cpu_arch): Add avx10.1.
	(cpu_flags_match): Handle AVX10.1.
	(check_VecOperands): Ditto.
	(check_register): Allow zmm and mask register for avx10.1.
	* doc/c-i386.texi: Document .avx10.1.
	* testsuite/gas/i386/i386.exp: Run AVX10.1 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/avx-ifma-inval.l: Add .noavx10.1.
	* testsuite/gas/i386/avx-ifma-inval.s: Ditto.
	* testsuite/gas/i386/avx-ifma.s: Ditto.
	* testsuite/gas/i386/avx-vnni-inval.l: Ditto.
	* testsuite/gas/i386/avx-vnni-inval.s: Ditto.
	* testsuite/gas/i386/avx-vnni.s: Ditto.
	* testsuite/gas/i386/noavx512-1.l: Ditto.
	* testsuite/gas/i386/noavx512-1.s: Ditto.
	* testsuite/gas/i386/noavx512-2.l: Ditto.
	* testsuite/gas/i386/noavx512-2.s: Ditto.
	* testsuite/gas/i386/x86-64-avx-ifma-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-avx-ifma-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-avx-vnni-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-avx-vnni-inval.s: Ditto.
	* testsuite/gas/i386/xmmhi32.s: Ditto.
	* testsuite/gas/i386/avx10_1-inval.l: New test.
	* testsuite/gas/i386/avx10_1-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1.s: Ditto.

opcodes/ChangeLog:

	* i386-gen.c (isa_dependencies): Add AVX10_1.
	(cpu_flags): Ditto.
	(output_i386_opcode): Add AVX10_1 in table for allowed
	instructions.
	* i386-opc.h (CpuAVX10_1): New.
	(i386_cpu_flags): Add cpuavx10_1.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Ditto.
---
 gas/NEWS                                      |     2 +
 gas/config/tc-i386.c                          |    21 +-
 gas/doc/c-i386.texi                           |     3 +-
 gas/testsuite/gas/i386/avx-ifma-inval.l       |     4 +-
 gas/testsuite/gas/i386/avx-ifma-inval.s       |     1 +
 gas/testsuite/gas/i386/avx-ifma.s             |     3 +
 gas/testsuite/gas/i386/avx-vnni-inval.l       |     4 +-
 gas/testsuite/gas/i386/avx-vnni-inval.s       |     1 +
 gas/testsuite/gas/i386/avx-vnni.s             |     3 +
 gas/testsuite/gas/i386/avx10_1-inval.l        |     6 +
 gas/testsuite/gas/i386/avx10_1-inval.s        |    10 +
 gas/testsuite/gas/i386/i386.exp               |     1 +
 gas/testsuite/gas/i386/noavx512-1.l           |    39 +-
 gas/testsuite/gas/i386/noavx512-1.s           |     1 +
 gas/testsuite/gas/i386/noavx512-2.l           |   153 +-
 gas/testsuite/gas/i386/noavx512-2.s           |     1 +
 .../gas/i386/x86-64-avx-ifma-inval.l          |     4 +-
 .../gas/i386/x86-64-avx-ifma-inval.s          |     1 +
 .../gas/i386/x86-64-avx-vnni-inval.l          |     4 +-
 .../gas/i386/x86-64-avx-vnni-inval.s          |     1 +
 gas/testsuite/gas/i386/x86-64-avx10_1-inval.l |     6 +
 gas/testsuite/gas/i386/x86-64-avx10_1-inval.s |    10 +
 gas/testsuite/gas/i386/x86-64-avx10_1.d       |    97 +
 gas/testsuite/gas/i386/x86-64-avx10_1.s       |    97 +
 gas/testsuite/gas/i386/x86-64.exp             |     2 +
 gas/testsuite/gas/i386/xmmhi32.s              |     1 +
 opcodes/i386-gen.c                            |    22 +-
 opcodes/i386-init.h                           |   664 +-
 opcodes/i386-opc.h                            |     3 +
 opcodes/i386-tbl.h                            | 10430 ++++++++--------
 30 files changed, 5951 insertions(+), 5644 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/avx10_1-inval.l
 create mode 100644 gas/testsuite/gas/i386/avx10_1-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1.s
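
For review, the two simplest rules that the tc-i386.c hunks in the diff add
can be modelled as below. This is a simplified sketch, not the assembler
code: struct arch_state and both helper names are hypothetical, and the real
checks in cpu_flags_match and check_register consider many more flags.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of cpu_arch_flags.  */
struct arch_state
{
  bool avx512f;
  bool avx512vl;
  bool avx10_1;
};

/* cpu_flags_match: a template that needs AVX512VL is acceptable when
   either AVX512VL or AVX10.1 is enabled.  */
static bool
vl_requirement_met (bool tmpl_needs_vl, const struct arch_state *arch)
{
  return !tmpl_needs_vl || arch->avx512vl || arch->avx10_1;
}

/* check_register: zmm and mask registers are accepted when either
   AVX512F or AVX10.1 is enabled.  */
static bool
wide_regs_allowed (const struct arch_state *arch)
{
  return arch->avx512f || arch->avx10_1;
}
```

In this model, enabling only AVX10.1 satisfies both checks, which is why the
existing AVX512 tests in this patch gain an extra ".arch .noavx10.1" to keep
their expected errors.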

diff --git a/gas/NEWS b/gas/NEWS
index 1ed043511eb..4f3cc01d66a 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel AVX10.1 instructions.
+
 * Add support for Intel PBNDKB instructions.
 
 * Add support for Intel SM4 instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index e35e2660ed5..c948b993520 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1156,6 +1156,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (sm3, SM3, ANY_SM3, false),
   SUBARCH (sm4, SM4, ANY_SM4, false),
   SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
+  SUBARCH (avx10.1, AVX10_1, ANY_AVX10_1, false),
 };
 
 #undef SUBARCH
@@ -1845,7 +1846,9 @@ cpu_flags_match (const insn_template *t)
       i386_cpu_flags cpu = cpu_arch_flags;
 
       /* AVX512VL is no standalone feature - match it and then strip it.  */
-      if (x.bitfield.cpuavx512vl && !cpu.bitfield.cpuavx512vl)
+      if (x.bitfield.cpuavx512vl
+	  && !cpu.bitfield.cpuavx512vl
+	  && !cpu.bitfield.cpuavx10_1)
 	return match;
       x.bitfield.cpuavx512vl = 0;
 
@@ -1871,8 +1874,9 @@ cpu_flags_match (const insn_template *t)
 	    }
 	  else if (x.bitfield.cpuavx512f)
 	    {
-	      /* We need to check a few extra flags with AVX512F.  */
-	      if (cpu.bitfield.cpuavx512f
+	      /* We need to check a few extra flags with AVX512F
+		 or AVX10.1.  */
+	      if ((cpu.bitfield.cpuavx512f || cpu.bitfield.cpuavx10_1)
 		  && (!x.bitfield.cpugfni || cpu.bitfield.cpugfni)
 		  && (!x.bitfield.cpuvaes || cpu.bitfield.cpuvaes)
 		  && (!x.bitfield.cpuvpclmulqdq || cpu.bitfield.cpuvpclmulqdq))
@@ -6382,7 +6386,10 @@ check_VecOperands (const insn_template *t)
   cpu = cpu_flags_and (t->cpu_flags, avx512);
   if (!cpu_flags_all_zero (&cpu)
       && !t->cpu_flags.bitfield.cpuavx512vl
-      && !cpu_arch_flags.bitfield.cpuavx512vl)
+      && !cpu_arch_flags.bitfield.cpuavx512vl
+      && (!t->cpu_flags.bitfield.cpuavx10_1
+	  || (t->cpu_flags.bitfield.cpuavx10_1
+	      && !cpu_arch_flags.bitfield.cpuavx10_1)))
     {
       for (op = 0; op < t->operands; ++op)
 	{
@@ -13794,7 +13801,8 @@ static bool check_register (const reg_entry *r)
   if (r->reg_type.bitfield.class == RegMMX && !cpu_arch_flags.bitfield.cpummx)
     return false;
 
-  if (!cpu_arch_flags.bitfield.cpuavx512f)
+  if (!cpu_arch_flags.bitfield.cpuavx512f
+      && !cpu_arch_flags.bitfield.cpuavx10_1)
     {
       if (r->reg_type.bitfield.zmmword
 	  || r->reg_type.bitfield.class == RegMask)
@@ -13826,7 +13834,8 @@ static bool check_register (const reg_entry *r)
      mode, and require EVEX encoding.  */
   if (r->reg_flags & RegVRex)
     {
-      if (!cpu_arch_flags.bitfield.cpuavx512f
+      if ((!cpu_arch_flags.bitfield.cpuavx512f
+	   && !cpu_arch_flags.bitfield.cpuavx10_1)
 	  || flag_code != CODE_64BIT)
 	return false;
 
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index dd06282a5a3..3223c452e62 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -212,6 +212,7 @@ accept various extension mnemonics.  For example,
 @code{sm3},
 @code{sm4},
 @code{pbndkb},
+@code{avx10.1},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1642,7 +1643,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
 @item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3} @tab @samp{.sm4}
-@item @samp{.pbndkb}
+@item @samp{.pbndkb} @tab @samp{.avx10.1}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/avx-ifma-inval.l b/gas/testsuite/gas/i386/avx-ifma-inval.l
index 5294c2ca73d..d2f1cf1d544 100644
--- a/gas/testsuite/gas/i386/avx-ifma-inval.l
+++ b/gas/testsuite/gas/i386/avx-ifma-inval.l
@@ -1,3 +1,3 @@
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpmadd52huq'
-.*:7: Error: operand .* `vpmadd52huq'
+.*:7: Error: unsupported .* `vpmadd52huq'
+.*:8: Error: operand .* `vpmadd52huq'
diff --git a/gas/testsuite/gas/i386/avx-ifma-inval.s b/gas/testsuite/gas/i386/avx-ifma-inval.s
index 4b763b6e450..a1a50dcacc7 100644
--- a/gas/testsuite/gas/i386/avx-ifma-inval.s
+++ b/gas/testsuite/gas/i386/avx-ifma-inval.s
@@ -2,6 +2,7 @@
 
 	.text
 	.arch .noavx512ifma
+	.arch .noavx10.1
 _start:
 	vpmadd52huq %xmm2, %xmm4, %xmm2{%k6}
 	vpmadd52huq %zmm2, %zmm4, %zmm2
diff --git a/gas/testsuite/gas/i386/avx-ifma.s b/gas/testsuite/gas/i386/avx-ifma.s
index 81046966d70..8c1b3133a19 100644
--- a/gas/testsuite/gas/i386/avx-ifma.s
+++ b/gas/testsuite/gas/i386/avx-ifma.s
@@ -17,6 +17,7 @@ _start:
        test_insn vpmadd52luq
 
        .arch .noavx512vl
+       .arch .noavx10.1
 
        vpmadd52huq	  %zmm0, %zmm0, %zmm0
        vpmadd52huq	  %ymm0, %ymm0, %ymm0
@@ -24,12 +25,14 @@ _start:
 
        .arch default
        .arch .noavx512ifma
+       .arch .noavx10.1
        
        vpmadd52huq	  %ymm0, %ymm0, %ymm0
        vpmadd52huq	  %xmm0, %xmm0, %xmm0
 
        .arch default
        .arch .noavx512f
+       .arch .noavx10.1
 
        vpmadd52huq	  %ymm0, %ymm0, %ymm0
        vpmadd52huq	  %xmm0, %xmm0, %xmm0
diff --git a/gas/testsuite/gas/i386/avx-vnni-inval.l b/gas/testsuite/gas/i386/avx-vnni-inval.l
index 58535cf8deb..5b9b1a514f4 100644
--- a/gas/testsuite/gas/i386/avx-vnni-inval.l
+++ b/gas/testsuite/gas/i386/avx-vnni-inval.l
@@ -1,3 +1,3 @@
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpdpbusd'
-.*:7: Error: operand .* `vpdpbusd'
+.*:7: Error: unsupported .* `vpdpbusd'
+.*:8: Error: operand .* `vpdpbusd'
diff --git a/gas/testsuite/gas/i386/avx-vnni-inval.s b/gas/testsuite/gas/i386/avx-vnni-inval.s
index 28366f1e6d2..a2b07957e1e 100644
--- a/gas/testsuite/gas/i386/avx-vnni-inval.s
+++ b/gas/testsuite/gas/i386/avx-vnni-inval.s
@@ -2,6 +2,7 @@
 
 	.text
 	.arch .noavx512_vnni
+	.arch .noavx10.1
 _start:
 	vpdpbusd %xmm2, %xmm4, %xmm2{%k6}
 	vpdpbusd %zmm2, %zmm4, %zmm2
diff --git a/gas/testsuite/gas/i386/avx-vnni.s b/gas/testsuite/gas/i386/avx-vnni.s
index 6260330cca4..a31af4c4376 100644
--- a/gas/testsuite/gas/i386/avx-vnni.s
+++ b/gas/testsuite/gas/i386/avx-vnni.s
@@ -17,6 +17,7 @@ _start:
 	test_insn vpdpwssds
 
 	.arch .noavx512vl
+	.arch .noavx10.1
 
 	vpdpbusd	%zmm0, %zmm0, %zmm0
 	vpdpbusd	%ymm0, %ymm0, %ymm0
@@ -24,12 +25,14 @@ _start:
 
 	.arch default
 	.arch .noavx512_vnni
+	.arch .noavx10.1
 
 	vpdpbusd	%ymm0, %ymm0, %ymm0
 	vpdpbusd	%xmm0, %xmm0, %xmm0
 
 	.arch default
 	.arch .noavx512f
+	.arch .noavx10.1
 
 	vpdpbusd	%ymm0, %ymm0, %ymm0
 	vpdpbusd	%xmm0, %xmm0, %xmm0
diff --git a/gas/testsuite/gas/i386/avx10_1-inval.l b/gas/testsuite/gas/i386/avx10_1-inval.l
new file mode 100644
index 00000000000..533271e9ef4
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx10_1-inval.l
@@ -0,0 +1,6 @@
+.* Assembler messages:
+.*:6: Error: `vp2intersectq' is not supported on `i386.noavx512f'
+.*:7: Error: `vgatherpf0dpd' is not supported on `i386.noavx512f'
+.*:8: Error: `vrcp28ss' is not supported on `i386.noavx512f'
+.*:9: Error: `vp4dpwssd' is not supported on `i386.noavx512f'
+.*:10: Error: `v4fnmaddss' is not supported on `i386.noavx512f'
diff --git a/gas/testsuite/gas/i386/avx10_1-inval.s b/gas/testsuite/gas/i386/avx10_1-inval.s
new file mode 100644
index 00000000000..6de248aa808
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx10_1-inval.s
@@ -0,0 +1,10 @@
+# Check invalid AVX10.1 instructions
+
+	.text
+	.arch .noavx512f
+__start:
+	vp2intersectq	%xmm1, %xmm2, %k3
+	vgatherpf0dpd	123(%ebp,%ymm7,8){%k1}
+	vrcp28ss	%xmm4, %xmm5, %xmm6{%k7}
+	vp4dpwssd	(%ecx), %zmm4, %zmm1
+	v4fnmaddss	(%ecx), %xmm4, %xmm1
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 90819d80f60..37c0058827a 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -506,6 +506,7 @@ if [gas_32_check] then {
     run_dump_test "sm4"
     run_dump_test "sm4-intel"
     run_list_test "pbndkb-inval"
+    run_list_test "avx10_1-inval"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/noavx512-1.l b/gas/testsuite/gas/i386/noavx512-1.l
index 655a90de2ce..c636717086a 100644
--- a/gas/testsuite/gas/i386/noavx512-1.l
+++ b/gas/testsuite/gas/i386/noavx512-1.l
@@ -1,44 +1,44 @@
 .*: Assembler messages:
-.*:8: Error: .*operand size mismatch.*
-.*:9: Error: .*unsupported masking.*
+.*:9: Error: .*operand size mismatch.*
 .*:10: Error: .*unsupported masking.*
-.*:25: Error: .*not supported.*
+.*:11: Error: .*unsupported masking.*
 .*:26: Error: .*not supported.*
 .*:27: Error: .*not supported.*
-.*:11: Error: .*not supported.*
+.*:28: Error: .*not supported.*
 .*:12: Error: .*not supported.*
 .*:13: Error: .*not supported.*
 .*:14: Error: .*not supported.*
 .*:15: Error: .*not supported.*
 .*:16: Error: .*not supported.*
 .*:17: Error: .*not supported.*
-.*:21: Error: .*operand.*mismatch.*
-.*:22: Error: .*unsupported masking.*
+.*:18: Error: .*not supported.*
+.*:22: Error: .*operand.*mismatch.*
 .*:23: Error: .*unsupported masking.*
-.*:24: Error: .*not supported.*
+.*:24: Error: .*unsupported masking.*
 .*:25: Error: .*not supported.*
 .*:26: Error: .*not supported.*
 .*:27: Error: .*not supported.*
-.*:8: Error: .*bad register name.*
-.*:9: Error: .*unknown vector operation.*
+.*:28: Error: .*not supported.*
+.*:9: Error: .*bad register name.*
 .*:10: Error: .*unknown vector operation.*
-.*:11: Error: .*not supported.*
+.*:11: Error: .*unknown vector operation.*
 .*:12: Error: .*not supported.*
 .*:13: Error: .*not supported.*
 .*:14: Error: .*not supported.*
 .*:15: Error: .*not supported.*
 .*:16: Error: .*not supported.*
 .*:17: Error: .*not supported.*
-.*:18: Error: .*bad register name.*
-.*:19: Error: .*unknown vector operation.*
+.*:18: Error: .*not supported.*
+.*:19: Error: .*bad register name.*
 .*:20: Error: .*unknown vector operation.*
-.*:21: Error: .*bad register name.*
-.*:22: Error: .*unknown vector operation.*
+.*:21: Error: .*unknown vector operation.*
+.*:22: Error: .*bad register name.*
 .*:23: Error: .*unknown vector operation.*
-.*:24: Error: .*not supported.*
+.*:24: Error: .*unknown vector operation.*
 .*:25: Error: .*not supported.*
 .*:26: Error: .*not supported.*
 .*:27: Error: .*not supported.*
+.*:28: Error: .*not supported.*
 #...
 [ 	]*[0-9]+[ 	]+\# Test \.arch \.noavx512XX
 [ 	]*[0-9]+[ 	]+\.text
@@ -49,6 +49,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch default
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -93,6 +94,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512bw
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %xmm5,%xmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %ymm5,%ymm6\{%k7\}
@@ -131,6 +133,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512cd
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -172,6 +175,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512dq
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -213,6 +217,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512er
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -256,6 +261,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512ifma
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -297,6 +303,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512pf
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -339,6 +346,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512vbmi
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -380,6 +388,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512f
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %xmm5,%xmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %ymm5,%ymm6\{%k7\}
diff --git a/gas/testsuite/gas/i386/noavx512-1.s b/gas/testsuite/gas/i386/noavx512-1.s
index ab3abdc5ceb..8f579474fdb 100644
--- a/gas/testsuite/gas/i386/noavx512-1.s
+++ b/gas/testsuite/gas/i386/noavx512-1.s
@@ -5,6 +5,7 @@
 
 	.arch default
 	.arch \isa
+	.arch .noavx10.1
 	vpabsb %zmm5, %zmm6{%k7}		# AVX512BW
 	vpabsb %xmm5, %xmm6{%k7}		# AVX512BW + AVX512VL
 	vpabsb %ymm5, %ymm6{%k7}		# AVX512BW + AVX512VL
diff --git a/gas/testsuite/gas/i386/noavx512-2.l b/gas/testsuite/gas/i386/noavx512-2.l
index 02c92e0d8db..1a73eb0613a 100644
--- a/gas/testsuite/gas/i386/noavx512-2.l
+++ b/gas/testsuite/gas/i386/noavx512-2.l
@@ -1,106 +1,107 @@
 .*: Assembler messages:
-.*:26: Error: .*unsupported masking.*
 .*:27: Error: .*unsupported masking.*
-.*:29: Error: .*unsupported instruction.*
+.*:28: Error: .*unsupported masking.*
 .*:30: Error: .*unsupported instruction.*
-.*:32: Error: .*unsupported instruction.*
+.*:31: Error: .*unsupported instruction.*
 .*:33: Error: .*unsupported instruction.*
-.*:36: Error: .*unsupported masking.*
+.*:34: Error: .*unsupported instruction.*
 .*:37: Error: .*unsupported masking.*
-.*:39: Error: .*unsupported instruction.*
+.*:38: Error: .*unsupported masking.*
 .*:40: Error: .*unsupported instruction.*
-.*:43: Error: .*unsupported instruction.*
+.*:41: Error: .*unsupported instruction.*
 .*:44: Error: .*unsupported instruction.*
+.*:45: Error: .*unsupported instruction.*
 GAS LISTING .*
 #...
 [ 	]*1[ 	]+\# Test \.arch \.noavx512vl
 [ 	]*2[ 	]+\.text
-[ 	]*3[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
-[ 	]*3[ 	]+1CF5
-[ 	]*4[ 	]+\?\?\?\? 62F27D0F 		vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*3[ 	]+\.arch \.noavx10.1
+[ 	]*4[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
 [ 	]*4[ 	]+1CF5
-[ 	]*5[ 	]+\?\?\?\? 62F27D2F 		vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*5[ 	]+\?\?\?\? 62F27D0F 		vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
 [ 	]*5[ 	]+1CF5
-[ 	]*6[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
-[ 	]*6[ 	]+C4F5
-[ 	]*7[ 	]+\?\?\?\? 62F27D08 		vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
+[ 	]*6[ 	]+\?\?\?\? 62F27D2F 		vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*6[ 	]+1CF5
+[ 	]*7[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
 [ 	]*7[ 	]+C4F5
-[ 	]*8[ 	]+\?\?\?\? 62F27D28 		vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
+[ 	]*8[ 	]+\?\?\?\? 62F27D08 		vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
 [ 	]*8[ 	]+C4F5
-[ 	]*9[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
-[ 	]*9[ 	]+7B31
-[ 	]*10[ 	]+\?\?\?\? 62F1FD0F 		vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*9[ 	]+\?\?\?\? 62F27D28 		vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
+[ 	]*9[ 	]+C4F5
+[ 	]*10[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
 [ 	]*10[ 	]+7B31
-[ 	]*11[ 	]+\?\?\?\? 62F1FD2F 		vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*11[ 	]+\?\?\?\? 62F1FD0F 		vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
 [ 	]*11[ 	]+7B31
-[ 	]*12[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
-[ 	]*12[ 	]+C8F5
-[ 	]*13[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
-[ 	]*13[ 	]+58F4
-[ 	]*14[ 	]+\?\?\?\? 62F1D50F 		vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*12[ 	]+\?\?\?\? 62F1FD2F 		vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*12[ 	]+7B31
+[ 	]*13[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
+[ 	]*13[ 	]+C8F5
+[ 	]*14[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
 [ 	]*14[ 	]+58F4
-[ 	]*15[ 	]+\?\?\?\? 62F1D52F 		vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*15[ 	]+\?\?\?\? 62F1D50F 		vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
 [ 	]*15[ 	]+58F4
-[ 	]*16[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
-[ 	]*16[ 	]+B4F4
-[ 	]*17[ 	]+\?\?\?\? 62F2D50F 		vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*16[ 	]+\?\?\?\? 62F1D52F 		vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*16[ 	]+58F4
+[ 	]*17[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
 [ 	]*17[ 	]+B4F4
-[ 	]*18[ 	]+\?\?\?\? 62F2D52F 		vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*18[ 	]+\?\?\?\? 62F2D50F 		vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
 [ 	]*18[ 	]+B4F4
-[ 	]*19[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
-[ 	]*19[ 	]+C68CFD17 
-[ 	]*19[ 	]+000000
-[ 	]*20[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
-[ 	]*20[ 	]+8DF4
-[ 	]*21[ 	]+\?\?\?\? 62F2550F 		vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*19[ 	]+\?\?\?\? 62F2D52F 		vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*19[ 	]+B4F4
+[ 	]*20[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
+[ 	]*20[ 	]+C68CFD17 
+[ 	]*20[ 	]+000000
+[ 	]*21[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
 [ 	]*21[ 	]+8DF4
-[ 	]*22[ 	]+\?\?\?\? 62F2552F 		vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*22[ 	]+\?\?\?\? 62F2550F 		vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
 [ 	]*22[ 	]+8DF4
-[ 	]*23[ 	]+
-[ 	]*24[ 	]+\.arch \.noavx512vl
-[ 	]*25[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
-[ 	]*25[ 	]+1CF5
-[ 	]*26[ 	]+vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
-[ 	]*27[ 	]+vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
-[ 	]*28[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
-[ 	]*28[ 	]+C4F5
-[ 	]*29[ 	]+vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
-[ 	]*30[ 	]+vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
-[ 	]*31[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
-[ 	]*31[ 	]+7B31
-[ 	]*32[ 	]+vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
-[ 	]*33[ 	]+vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*23[ 	]+\?\?\?\? 62F2552F 		vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*23[ 	]+8DF4
+[ 	]*24[ 	]+
+[ 	]*25[ 	]+\.arch \.noavx512vl
+[ 	]*26[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
+[ 	]*26[ 	]+1CF5
+[ 	]*27[ 	]+vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*28[ 	]+vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*29[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
+[ 	]*29[ 	]+C4F5
+[ 	]*30[ 	]+vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
+[ 	]*31[ 	]+vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
+[ 	]*32[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
+[ 	]*32[ 	]+7B31
+[ 	]*33[ 	]+vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
 \fGAS LISTING .*
 
 
-[ 	]*34[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
-[ 	]*34[ 	]+C8F5
-[ 	]*35[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
-[ 	]*35[ 	]+58F4
-[ 	]*36[ 	]+vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
-[ 	]*37[ 	]+vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
-[ 	]*38[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
-[ 	]*38[ 	]+B4F4
-[ 	]*39[ 	]+vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
-[ 	]*40[ 	]+vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
-[ 	]*41[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
-[ 	]*41[ 	]+C68CFD17 
-[ 	]*41[ 	]+000000
-[ 	]*42[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
-[ 	]*42[ 	]+8DF4
-[ 	]*43[ 	]+vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
-[ 	]*44[ 	]+vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
-[ 	]*45[ 	]+
-[ 	]*46[ 	]+\?\?\?\? C4E2791C 		vpabsb %xmm5, %xmm6
-[ 	]*46[ 	]+F5
-[ 	]*47[ 	]+\?\?\?\? C4E27D1C 		vpabsb %ymm5, %ymm6
+[ 	]*34[ 	]+vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*35[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
+[ 	]*35[ 	]+C8F5
+[ 	]*36[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
+[ 	]*36[ 	]+58F4
+[ 	]*37[ 	]+vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*38[ 	]+vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*39[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
+[ 	]*39[ 	]+B4F4
+[ 	]*40[ 	]+vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*41[ 	]+vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*42[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
+[ 	]*42[ 	]+C68CFD17 
+[ 	]*42[ 	]+000000
+[ 	]*43[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
+[ 	]*43[ 	]+8DF4
+[ 	]*44[ 	]+vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*45[ 	]+vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*46[ 	]+
+[ 	]*47[ 	]+\?\?\?\? C4E2791C 		vpabsb %xmm5, %xmm6
 [ 	]*47[ 	]+F5
-[ 	]*48[ 	]+\?\?\?\? C5D158F4 		vaddpd %xmm4, %xmm5, %xmm6
-[ 	]*49[ 	]+\?\?\?\? C5D558F4 		vaddpd %ymm4, %ymm5, %ymm6
-[ 	]*50[ 	]+\?\?\?\? 660F381C 		pabsb %xmm5, %xmm6
-[ 	]*50[ 	]+F5
-[ 	]*51[ 	]+\?\?\?\? 660F58F4 		addpd %xmm4, %xmm6
-[ 	]*52[ 	]+
+[ 	]*48[ 	]+\?\?\?\? C4E27D1C 		vpabsb %ymm5, %ymm6
+[ 	]*48[ 	]+F5
+[ 	]*49[ 	]+\?\?\?\? C5D158F4 		vaddpd %xmm4, %xmm5, %xmm6
+[ 	]*50[ 	]+\?\?\?\? C5D558F4 		vaddpd %ymm4, %ymm5, %ymm6
+[ 	]*51[ 	]+\?\?\?\? 660F381C 		pabsb %xmm5, %xmm6
+[ 	]*51[ 	]+F5
+[ 	]*52[ 	]+\?\?\?\? 660F58F4 		addpd %xmm4, %xmm6
+[ 	]*53[ 	]+
 [ 	]*[1-9][0-9]*[ 	]+\.intel_syntax noprefix
 [ 	]*[1-9][0-9]*[ 	]+\?\?\?\? 62F3FD48 		vfpclasspd k0, \[eax], 0
 [ 	]*[1-9][0-9]*[ 	]+660000
diff --git a/gas/testsuite/gas/i386/noavx512-2.s b/gas/testsuite/gas/i386/noavx512-2.s
index d974bcf9df5..a63d0484c61 100644
--- a/gas/testsuite/gas/i386/noavx512-2.s
+++ b/gas/testsuite/gas/i386/noavx512-2.s
@@ -1,5 +1,6 @@
 # Test .arch .noavx512vl
 	.text
+	.arch .noavx10.1
 	vpabsb %zmm5, %zmm6{%k7}		# AVX512BW
 	vpabsb %xmm5, %xmm6{%k7}		# AVX512BW + AVX512VL
 	vpabsb %ymm5, %ymm6{%k7}		# AVX512BW + AVX512VL
diff --git a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l
index fad43f6768c..0046cbcb5d1 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l
+++ b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l
@@ -1,4 +1,4 @@
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpmadd52huq'
 .*:7: Error: unsupported .* `vpmadd52huq'
-.*:8: Error: operand .* `vpmadd52huq'
+.*:8: Error: unsupported .* `vpmadd52huq'
+.*:9: Error: operand .* `vpmadd52huq'
diff --git a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s
index 76da0f1a37d..b2175e8d066 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s
@@ -2,6 +2,7 @@
 
 	.text
 	.arch .noavx512ifma
+	.arch .noavx10.1
 _start:
 	vpmadd52huq %xmm2, %xmm4, %xmm2{%k6}
 	vpmadd52huq %xmm22, %xmm4, %xmm2{%k1}
diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l
index 61808668a8d..81aedddf4e2 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l
+++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l
@@ -1,4 +1,4 @@
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpdpbusds'
 .*:7: Error: unsupported .* `vpdpbusds'
-.*:8: Error: operand .* `vpdpbusds'
+.*:8: Error: unsupported .* `vpdpbusds'
+.*:9: Error: operand .* `vpdpbusds'
diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s
index 8b1b80cac5d..78284546650 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s
@@ -2,6 +2,7 @@
 
 	.text
 	.arch .noavx512_vnni
+	.arch .noavx10.1
 _start:
 	vpdpbusds %xmm2, %xmm4, %xmm2{%k6}
 	vpdpbusds %xmm22, %xmm4, %xmm2{%k1}
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
new file mode 100644
index 00000000000..8a7ebdccad9
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
@@ -0,0 +1,6 @@
+.* Assembler messages:
+.*:6: Error: `vp2intersectq' is not supported on `x86_64.noavx512f'
+.*:7: Error: `vgatherpf0dpd' is not supported on `x86_64.noavx512f'
+.*:8: Error: `vrcp28ss' is not supported on `x86_64.noavx512f'
+.*:9: Error: `vp4dpwssd' is not supported on `x86_64.noavx512f'
+.*:10: Error: `v4fnmaddss' is not supported on `x86_64.noavx512f'
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
new file mode 100644
index 00000000000..6de248aa808
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
@@ -0,0 +1,10 @@
+# Check invalid AVX10.1 instructions
+
+	.text
+	.arch .noavx512f
+__start:
+	vp2intersectq	%xmm1, %xmm2, %k3
+	vgatherpf0dpd	123(%ebp,%ymm7,8){%k1}
+	vrcp28ss	%xmm4, %xmm5, %xmm6{%k7}
+	vp4dpwssd	(%ecx), %zmm4, %zmm1
+	v4fnmaddss	(%ecx), %xmm4, %xmm1
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1.d b/gas/testsuite/gas/i386/x86-64-avx10_1.d
new file mode 100644
index 00000000000..6d721e440fd
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1.d
@@ -0,0 +1,97 @@
+#objdump: -dw
+#name: x86_64 AVX10.1 instructions
+#source: x86-64-avx10_1.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e1 ed 4a d9\s+kaddd  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c5 ed 4a d9\s+kaddb  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c5 ec 4a d9\s+kaddw  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c4 e1 ec 4a d9\s+kaddq  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*67 c5 f9 90 29\s+kmovb  \(%ecx\),%k5
+\s*[a-f0-9]+:\s*67 c5 f9 91 ac f4 c0 1d fe ff\s+kmovb  %k5,-0x1e240\(%esp,%esi,8\)
+\s*[a-f0-9]+:\s*67 c4 e1 f9 90 ac f4 c0 1d fe ff\s+kmovd  -0x1e240\(%esp,%esi,8\),%k5
+\s*[a-f0-9]+:\s*c5 fb 92 ed\s+kmovd  %ebp,%k5
+\s*[a-f0-9]+:\s*67 c5 f8 91 29\s+kmovw  %k5,\(%ecx\)
+\s*[a-f0-9]+:\s*c5 f8 93 ed\s+kmovw  %k5,%ebp
+\s*[a-f0-9]+:\s*62 f1 d5 0f 58 f4\s+vaddpd %xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 31\s+vaddpd \(%ecx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 30\s+vaddpd \(%eax\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 b2 00 08 00 00\s+vaddpd 0x800\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 b2 f0 f7 ff ff\s+vaddpd -0x810\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 b2 00 04 00 00\s+vaddpd 0x400\(%edx\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 b2 f8 fb ff ff\s+vaddpd -0x408\(%edx\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f1 d5 cf 58 f4\s+vaddpd %zmm4,%zmm5,%zmm6\{%k7\}\{z\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 b4 f4 c0 1d fe ff\s+vaddpd -0x1e240\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 4f 58 b2 00 20 00 00\s+vaddpd 0x2000\(%edx\),%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 72 80\s+vaddpd -0x1000\(%edx\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 3f 58 72 7f\s+vaddpd 0x3f8\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 5f 58 b2 00 f8 ff ff\s+vaddpd -0x800\(%edx\)\{1to8\},%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f3 d5 0f ce f4 ab\s+vgf2p8affineqb \$0xab,%xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 2f ce b4 f4 c0 1d fe ff 7b\s+vgf2p8affineqb \$0x7b,-0x1e240\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 3f ce 72 7f 7b\s+vgf2p8affineqb \$0x7b,0x3f8\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 0f cf 72 7f 7b\s+vgf2p8affineinvqb \$0x7b,0x7f0\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f3 d5 af cf f4 ab\s+vgf2p8affineinvqb \$0xab,%ymm4,%ymm5,%ymm6\{%k7\}\{z\}
+\s*[a-f0-9]+:\s*62 f2 55 4f cf f4\s+vgf2p8mulb %zmm4,%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f2 55 0f cf b4 f4 c0 1d fe ff\s+vgf2p8mulb -0x1e240\(%esp,%esi,8\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f2 55 4f cf b2 00 20 00 00\s+vgf2p8mulb 0x2000\(%edx\),%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 82 2d 20 dc f0\s+vaesenc %ymm24,%ymm26,%ymm22
+\s*[a-f0-9]+:\s*67 62 e2 05 08 de 84 f4 c0 1d fe ff\s+vaesdec -0x1e240\(%esp,%esi,8\),%xmm15,%xmm16
+\s*[a-f0-9]+:\s*62 02 2d 00 dd d8\s+vaesenclast %xmm24,%xmm26,%xmm27
+\s*[a-f0-9]+:\s*67 62 62 35 20 df 52 7f\s+vaesdeclast 0xfe0\(%edx\),%ymm25,%ymm26
+\s*[a-f0-9]+:\s*62 82 2d 40 de f0\s+vaesdec %zmm24,%zmm26,%zmm22
+\s*[a-f0-9]+:\s*67 62 62 2d 40 df 19\s+vaesdeclast \(%ecx\),%zmm26,%zmm27
+\s*[a-f0-9]+:\s*62 a3 4d 00 44 fe ab\s+vpclmulqdq \$0xab,%xmm22,%xmm22,%xmm23
+\s*[a-f0-9]+:\s*67 62 e3 4d 00 44 7a 7f 7b\s+vpclmulqdq \$0x7b,0x7f0\(%edx\),%xmm22,%xmm23
+\s*[a-f0-9]+:\s*67 62 73 7d 20 44 b4 f4 c0 1d fe ff 7b\s+vpclmulqdq \$0x7b,-0x1e240\(%esp,%esi,8\),%ymm16,%ymm14
+\s*[a-f0-9]+:\s*62 23 45 00 44 c6 11\s+vpclmulhqhqdq %xmm22,%xmm23,%xmm24
+\s*[a-f0-9]+:\s*62 c3 05 08 44 c6 10\s+vpclmullqhqdq %xmm14,%xmm15,%xmm16
+\s*[a-f0-9]+:\s*62 23 45 20 44 c6 01\s+vpclmulhqlqdq %ymm22,%ymm23,%ymm24
+\s*[a-f0-9]+:\s*62 c3 05 48 44 c6 00\s+vpclmullqlqdq %zmm14,%zmm15,%zmm16
+\s*[a-f0-9]+:\s*c4 e1 ed 4a d9\s+kaddd  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c5 ed 4a d9\s+kaddb  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c5 ec 4a d9\s+kaddw  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c4 e1 ec 4a d9\s+kaddq  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*67 c5 f9 90 29\s+kmovb  \(%ecx\),%k5
+\s*[a-f0-9]+:\s*67 c5 f9 91 ac f4 c0 1d fe ff\s+kmovb  %k5,-0x1e240\(%esp,%esi,8\)
+\s*[a-f0-9]+:\s*67 c4 e1 f9 90 ac f4 c0 1d fe ff\s+kmovd  -0x1e240\(%esp,%esi,8\),%k5
+\s*[a-f0-9]+:\s*c5 fb 92 ed\s+kmovd  %ebp,%k5
+\s*[a-f0-9]+:\s*67 c5 f8 91 29\s+kmovw  %k5,\(%ecx\)
+\s*[a-f0-9]+:\s*c5 f8 93 ed\s+kmovw  %k5,%ebp
+\s*[a-f0-9]+:\s*62 f1 d5 0f 58 f4\s+vaddpd %xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 31\s+vaddpd \(%ecx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 30\s+vaddpd \(%eax\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 b2 00 08 00 00\s+vaddpd 0x800\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 b2 f0 f7 ff ff\s+vaddpd -0x810\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 b2 00 04 00 00\s+vaddpd 0x400\(%edx\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 b2 f8 fb ff ff\s+vaddpd -0x408\(%edx\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f1 d5 af 58 f4\s+vaddpd %ymm4,%ymm5,%ymm6\{%k7\}\{z\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 b4 f4 c0 1d fe ff\s+vaddpd -0x1e240\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 72 7f\s+vaddpd 0xfe0\(%edx\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 72 80\s+vaddpd -0x1000\(%edx\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 3f 58 72 7f\s+vaddpd 0x3f8\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 3f 58 72 80\s+vaddpd -0x400\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f3 d5 0f ce f4 ab\s+vgf2p8affineqb \$0xab,%xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 2f ce b4 f4 c0 1d fe ff 7b\s+vgf2p8affineqb \$0x7b,-0x1e240\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 3f ce 72 7f 7b\s+vgf2p8affineqb \$0x7b,0x3f8\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 0f cf 72 7f 7b\s+vgf2p8affineinvqb \$0x7b,0x7f0\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f3 d5 af cf f4 ab\s+vgf2p8affineinvqb \$0xab,%ymm4,%ymm5,%ymm6\{%k7\}\{z\}
+\s*[a-f0-9]+:\s*62 f2 55 0f cf f4\s+vgf2p8mulb %xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f2 55 0f cf b4 f4 c0 1d fe ff\s+vgf2p8mulb -0x1e240\(%esp,%esi,8\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f2 55 2f cf 72 7f\s+vgf2p8mulb 0xfe0\(%edx\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*62 82 2d 20 dc f0\s+vaesenc %ymm24,%ymm26,%ymm22
+\s*[a-f0-9]+:\s*67 62 e2 05 08 de 84 f4 c0 1d fe ff\s+vaesdec -0x1e240\(%esp,%esi,8\),%xmm15,%xmm16
+\s*[a-f0-9]+:\s*62 02 2d 00 dd d8\s+vaesenclast %xmm24,%xmm26,%xmm27
+\s*[a-f0-9]+:\s*67 62 62 35 20 df 52 7f\s+vaesdeclast 0xfe0\(%edx\),%ymm25,%ymm26
+\s*[a-f0-9]+:\s*62 82 2d 00 de f0\s+vaesdec %xmm24,%xmm26,%xmm22
+\s*[a-f0-9]+:\s*67 62 62 2d 00 df 19\s+vaesdeclast \(%ecx\),%xmm26,%xmm27
+\s*[a-f0-9]+:\s*62 a3 4d 00 44 fe ab\s+vpclmulqdq \$0xab,%xmm22,%xmm22,%xmm23
+\s*[a-f0-9]+:\s*67 62 e3 4d 00 44 7a 7f 7b\s+vpclmulqdq \$0x7b,0x7f0\(%edx\),%xmm22,%xmm23
+\s*[a-f0-9]+:\s*67 62 73 7d 20 44 b4 f4 c0 1d fe ff 7b\s+vpclmulqdq \$0x7b,-0x1e240\(%esp,%esi,8\),%ymm16,%ymm14
+\s*[a-f0-9]+:\s*62 23 45 00 44 c6 11\s+vpclmulhqhqdq %xmm22,%xmm23,%xmm24
+\s*[a-f0-9]+:\s*62 c3 05 08 44 c6 10\s+vpclmullqhqdq %xmm14,%xmm15,%xmm16
+\s*[a-f0-9]+:\s*62 23 45 20 44 c6 01\s+vpclmulhqlqdq %ymm22,%ymm23,%ymm24
+\s*[a-f0-9]+:\s*62 c3 05 28 44 c6 00\s+vpclmullqlqdq %ymm14,%ymm15,%ymm16
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1.s b/gas/testsuite/gas/i386/x86-64-avx10_1.s
new file mode 100644
index 00000000000..4c24b057f27
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1.s
@@ -0,0 +1,97 @@
+# Check AVX10.1 instructions
+
+	.text
+_start:
+	.arch .noavx512f
+
+	kaddd	%k1, %k2, %k3
+	kaddb	%k1, %k2, %k3
+	kaddw	%k1, %k2, %k3
+	kaddq	%k1, %k2, %k3
+	kmovb   (%ecx), %k5
+	kmovb   %k5, -123456(%esp,%esi,8)
+	kmovd   -123456(%esp,%esi,8), %k5
+	kmovd   %ebp, %k5
+	kmovw   %k5, (%ecx)
+	kmovw   %k5, %ebp
+	vaddpd  %xmm4, %xmm5, %xmm6{%k7}
+	vaddpd  (%ecx), %xmm5, %xmm6{%k7}
+	vaddpd  (%eax){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  2048(%edx), %xmm5, %xmm6{%k7}
+	vaddpd  -2064(%edx), %xmm5, %xmm6{%k7}
+	vaddpd  1024(%edx){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  -1032(%edx){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  %zmm4, %zmm5, %zmm6{%k7}{z}
+	vaddpd  -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
+	vaddpd  8192(%edx), %zmm5, %zmm6{%k7}
+	vaddpd  -4096(%edx), %ymm5, %ymm6{%k7}
+	vaddpd  1016(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vaddpd  -2048(%edx){1to8}, %zmm5, %zmm6{%k7}
+	vgf2p8affineqb	$0xab, %xmm4, %xmm5, %xmm6{%k7}
+	vgf2p8affineqb	$123, -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
+	vgf2p8affineqb	$123, 1016(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vgf2p8affineinvqb	$123, 2032(%edx), %xmm5, %xmm6{%k7}
+	vgf2p8affineinvqb	$0xab, %ymm4, %ymm5, %ymm6{%k7}{z}
+	vgf2p8mulb	%zmm4, %zmm5, %zmm6{%k7}
+	vgf2p8mulb	-123456(%esp,%esi,8), %xmm5, %xmm6{%k7}
+	vgf2p8mulb	8192(%edx), %zmm5, %zmm6{%k7}
+	vaesenc	%ymm24, %ymm26, %ymm22
+	vaesdec	-123456(%esp,%esi,8), %xmm15, %xmm16
+	vaesenclast	%xmm24, %xmm26, %xmm27
+	vaesdeclast     4064(%edx), %ymm25, %ymm26
+	vaesdec		%zmm24, %zmm26, %zmm22
+	vaesdeclast	(%ecx), %zmm26, %zmm27
+	vpclmulqdq	$0xab, %xmm22, %xmm22, %xmm23
+	vpclmulqdq	$123, 2032(%edx), %xmm22, %xmm23
+	vpclmulqdq	$123, -123456(%esp,%esi,8), %ymm16, %ymm14
+	vpclmulhqhqdq	%xmm22, %xmm23, %xmm24
+	vpclmullqhqdq	%xmm14, %xmm15, %xmm16
+	vpclmulhqlqdq	%ymm22, %ymm23, %ymm24
+	vpclmullqlqdq	%zmm14, %zmm15, %zmm16
+
+	.arch .noavx512vl
+
+	kaddd	%k1, %k2, %k3
+	kaddb	%k1, %k2, %k3
+	kaddw	%k1, %k2, %k3
+	kaddq	%k1, %k2, %k3
+	kmovb   (%ecx), %k5
+	kmovb   %k5, -123456(%esp,%esi,8)
+	kmovd   -123456(%esp,%esi,8), %k5
+	kmovd   %ebp, %k5
+	kmovw   %k5, (%ecx)
+	kmovw   %k5, %ebp
+	vaddpd  %xmm4, %xmm5, %xmm6{%k7}
+	vaddpd  (%ecx), %xmm5, %xmm6{%k7}
+	vaddpd  (%eax){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  2048(%edx), %xmm5, %xmm6{%k7}
+	vaddpd  -2064(%edx), %xmm5, %xmm6{%k7}
+	vaddpd  1024(%edx){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  -1032(%edx){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  %ymm4, %ymm5, %ymm6{%k7}{z}
+	vaddpd  -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
+	vaddpd  4064(%edx), %ymm5, %ymm6{%k7}
+	vaddpd  -4096(%edx), %ymm5, %ymm6{%k7}
+	vaddpd  1016(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vaddpd  -1024(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vgf2p8affineqb	$0xab, %xmm4, %xmm5, %xmm6{%k7}
+	vgf2p8affineqb	$123, -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
+	vgf2p8affineqb	$123, 1016(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vgf2p8affineinvqb	$123, 2032(%edx), %xmm5, %xmm6{%k7}
+	vgf2p8affineinvqb	$0xab, %ymm4, %ymm5, %ymm6{%k7}{z}
+	vgf2p8mulb	%xmm4, %xmm5, %xmm6{%k7}
+	vgf2p8mulb	-123456(%esp,%esi,8), %xmm5, %xmm6{%k7}
+	vgf2p8mulb	4064(%edx), %ymm5, %ymm6{%k7}
+	vaesenc	%ymm24, %ymm26, %ymm22
+	vaesdec	-123456(%esp,%esi,8), %xmm15, %xmm16
+	vaesenclast	%xmm24, %xmm26, %xmm27
+	vaesdeclast     4064(%edx), %ymm25, %ymm26
+	vaesdec		%xmm24, %xmm26, %xmm22
+	vaesdeclast	(%ecx), %xmm26, %xmm27
+	vpclmulqdq	$0xab, %xmm22, %xmm22, %xmm23
+	vpclmulqdq	$123, 2032(%edx), %xmm22, %xmm23
+	vpclmulqdq	$123, -123456(%esp,%esi,8), %ymm16, %ymm14
+	vpclmulhqhqdq	%xmm22, %xmm23, %xmm24
+	vpclmullqhqdq	%xmm14, %xmm15, %xmm16
+	vpclmulhqlqdq	%ymm22, %ymm23, %ymm24
+	vpclmullqlqdq	%ymm14, %ymm15, %ymm16
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index b83deebe88f..89eb78aaf17 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -449,6 +449,8 @@ run_dump_test "x86-64-sm4"
 run_dump_test "x86-64-sm4-intel"
 run_dump_test "x86-64-pbndkb"
 run_dump_test "x86-64-pbndkb-intel"
+run_dump_test "x86-64-avx10_1"
+run_list_test "x86-64-avx10_1-inval"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/gas/testsuite/gas/i386/xmmhi32.s b/gas/testsuite/gas/i386/xmmhi32.s
index 8e8767ac37d..f562711714a 100644
--- a/gas/testsuite/gas/i386/xmmhi32.s
+++ b/gas/testsuite/gas/i386/xmmhi32.s
@@ -26,6 +26,7 @@ xmm:
 	vmovdqa	ymm24, ymm0
 
 	.arch .noavx512f
+	.arch .noavx10.1
 	vaddps	xmm0, xmm1, xmm8
 	vaddps	xmm0, xmm1, xmm16
 	vaddps	xmm0, xmm1, xmm24
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 91c22c9e873..974a1375bf0 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -168,6 +168,8 @@ static const dependency isa_dependencies[] =
     "AVX2" },
   { "FRED",
     "LKGS" },
+  { "AVX10_1",
+    "AVX2" },
   { "AVX512F",
     "AVX2" },
   { "AVX512CD",
@@ -378,6 +380,7 @@ static bitfield cpu_flags[] =
   BITFIELD (RAO_INT),
   BITFIELD (FRED),
   BITFIELD (LKGS),
+  BITFIELD (AVX10_1),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
@@ -1217,7 +1220,7 @@ static void
 output_i386_opcode (FILE *table, const char *name, char *str,
 		    char *last, int lineno)
 {
-  unsigned int i, length, prefix = 0, space = 0;
+  unsigned int i, j, length, prefix = 0, space = 0, k = 0;
   char *base_opcode, *extension_opcode, *end, *ident;
   char *cpu_flags, *opcode_modifier, *operand_types [MAX_OPERANDS];
   unsigned long long opcode;
@@ -1315,6 +1318,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
   ident = mkident (name);
   fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
 	   ident, 2 * (int)length, opcode, end, i);
+
+  j = strlen(ident);
+  /* All AVX512F based instructions are usable for AVX10.1 except
+     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
+  if (strstr (cpu_flags, "AVX512")
+      && !strstr (cpu_flags, "AVX512PF")
+      && !strstr (cpu_flags, "AVX512ER")
+      && !strstr (cpu_flags, "4FMAPS")
+      && !strstr (cpu_flags, "4VNNIW")
+      && !strstr (cpu_flags, "VP2INTERSECT"))
+    {
+      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
+      k = 1;
+    }
   free (ident);
 
   process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
@@ -1322,6 +1339,9 @@ output_i386_opcode (FILE *table, const char *name, char *str,
 
   process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
 
+  if (k)
+    free (cpu_flags);
+
   fprintf (table, "    { ");
 
   for (i = 0; i < ARRAY_SIZE (operand_types); i++)
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 284475076a1..9a4b7d28feb 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -241,6 +241,8 @@ enum
   CpuFRED,
   /* lkgs instruction required */
   CpuLKGS,
+  /* Intel AVX10.1 Instructions support required.  */
+  CpuAVX10_1,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -444,6 +446,7 @@ typedef union i386_cpu_flags
       unsigned int cpurao_int:1;
       unsigned int cpufred:1;
       unsigned int cpulkgs:1;
+      unsigned int cpuavx10_1:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Support Intel AVX10.1
  2023-07-27  7:15 [PATCH] Support Intel AVX10.1 Haochen Jiang
@ 2023-07-27 11:23 ` Jan Beulich
  2023-07-28  2:50   ` Jiang, Haochen
  2023-07-28  6:53 ` Jan Beulich
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-07-27 11:23 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, binutils

On 27.07.2023 09:15, Haochen Jiang wrote:
> @@ -1845,7 +1846,9 @@ cpu_flags_match (const insn_template *t)
>        i386_cpu_flags cpu = cpu_arch_flags;
>  
>        /* AVX512VL is no standalone feature - match it and then strip it.  */
> -      if (x.bitfield.cpuavx512vl && !cpu.bitfield.cpuavx512vl)
> +      if (x.bitfield.cpuavx512vl
> +	  && !cpu.bitfield.cpuavx512vl
> +	  && !cpu.bitfield.cpuavx10_1)
>  	return match;
>        x.bitfield.cpuavx512vl = 0;

I _think_ the code change is correct, but the comment needs updating
(then also clarifying what the intention here is).

> @@ -6382,7 +6386,10 @@ check_VecOperands (const insn_template *t)
>    cpu = cpu_flags_and (t->cpu_flags, avx512);
>    if (!cpu_flags_all_zero (&cpu)
>        && !t->cpu_flags.bitfield.cpuavx512vl
> -      && !cpu_arch_flags.bitfield.cpuavx512vl)
> +      && !cpu_arch_flags.bitfield.cpuavx512vl
> +      && (!t->cpu_flags.bitfield.cpuavx10_1
> +	  || (t->cpu_flags.bitfield.cpuavx10_1
> +	      && !cpu_arch_flags.bitfield.cpuavx10_1)))

This first of all can be simplified to

  if (!cpu_flags_all_zero (&cpu)
      && !t->cpu_flags.bitfield.cpuavx512vl
      && !cpu_arch_flags.bitfield.cpuavx512vl
      && (!t->cpu_flags.bitfield.cpuavx10_1
	  || !cpu_arch_flags.bitfield.cpuavx10_1))

which doesn't look quite right. But of course the two features also
aren't symmetric, so I may well be wrong. First of all the remark at
the very bottom of this mail needs resolving, though. Also for ...

> @@ -13794,7 +13801,8 @@ static bool check_register (const reg_entry *r)
>    if (r->reg_type.bitfield.class == RegMMX && !cpu_arch_flags.bitfield.cpummx)
>      return false;
>  
> -  if (!cpu_arch_flags.bitfield.cpuavx512f)
> +  if (!cpu_arch_flags.bitfield.cpuavx512f
> +      && !cpu_arch_flags.bitfield.cpuavx10_1)
>      {
>        if (r->reg_type.bitfield.zmmword
>  	  || r->reg_type.bitfield.class == RegMask)
> @@ -13826,7 +13834,8 @@ static bool check_register (const reg_entry *r)
>       mode, and require EVEX encoding.  */
>    if (r->reg_flags & RegVRex)
>      {
> -      if (!cpu_arch_flags.bitfield.cpuavx512f
> +      if ((!cpu_arch_flags.bitfield.cpuavx512f
> +	   && !cpu_arch_flags.bitfield.cpuavx10_1)
>  	  || flag_code != CODE_64BIT)
>  	return false;

... the changes to make here.

> --- a/gas/testsuite/gas/i386/i386.exp
> +++ b/gas/testsuite/gas/i386/i386.exp
> @@ -506,6 +506,7 @@ if [gas_32_check] then {
>      run_dump_test "sm4"
>      run_dump_test "sm4-intel"
>      run_list_test "pbndkb-inval"
> +    run_list_test "avx10_1-inval"
>      run_list_test "sg"
>      run_dump_test "clzero"
>      run_dump_test "invlpgb"

Only an inval test? I'm inclined to say you either want both here, or
leave it to just the 64-bit testing.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-avx10_1.s
> @@ -0,0 +1,97 @@
> +# Check AVX10.1 instructions
> +
> +	.text
> +_start:
> +	.arch .noavx512f

This implies ...

> +	kaddd	%k1, %k2, %k3
> +	kaddb	%k1, %k2, %k3
> +	kaddw	%k1, %k2, %k3
> +	kaddq	%k1, %k2, %k3
> +	kmovb   (%ecx), %k5
> +	kmovb   %k5, -123456(%esp,%esi,8)
> +	kmovd   -123456(%esp,%esi,8), %k5
> +	kmovd   %ebp, %k5
> +	kmovw   %k5, (%ecx)
> +	kmovw   %k5, %ebp
> +	vaddpd  %xmm4, %xmm5, %xmm6{%k7}
> +	vaddpd  (%ecx), %xmm5, %xmm6{%k7}
> +	vaddpd  (%eax){1to2}, %xmm5, %xmm6{%k7}
> +	vaddpd  2048(%edx), %xmm5, %xmm6{%k7}
> +	vaddpd  -2064(%edx), %xmm5, %xmm6{%k7}
> +	vaddpd  1024(%edx){1to2}, %xmm5, %xmm6{%k7}
> +	vaddpd  -1032(%edx){1to2}, %xmm5, %xmm6{%k7}
> +	vaddpd  %zmm4, %zmm5, %zmm6{%k7}{z}
> +	vaddpd  -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
> +	vaddpd  8192(%edx), %zmm5, %zmm6{%k7}
> +	vaddpd  -4096(%edx), %ymm5, %ymm6{%k7}
> +	vaddpd  1016(%edx){1to4}, %ymm5, %ymm6{%k7}
> +	vaddpd  -2048(%edx){1to8}, %zmm5, %zmm6{%k7}
> +	vgf2p8affineqb	$0xab, %xmm4, %xmm5, %xmm6{%k7}
> +	vgf2p8affineqb	$123, -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
> +	vgf2p8affineqb	$123, 1016(%edx){1to4}, %ymm5, %ymm6{%k7}
> +	vgf2p8affineinvqb	$123, 2032(%edx), %xmm5, %xmm6{%k7}
> +	vgf2p8affineinvqb	$0xab, %ymm4, %ymm5, %ymm6{%k7}{z}
> +	vgf2p8mulb	%zmm4, %zmm5, %zmm6{%k7}
> +	vgf2p8mulb	-123456(%esp,%esi,8), %xmm5, %xmm6{%k7}
> +	vgf2p8mulb	8192(%edx), %zmm5, %zmm6{%k7}
> +	vaesenc	%ymm24, %ymm26, %ymm22
> +	vaesdec	-123456(%esp,%esi,8), %xmm15, %xmm16
> +	vaesenclast	%xmm24, %xmm26, %xmm27
> +	vaesdeclast     4064(%edx), %ymm25, %ymm26
> +	vaesdec		%zmm24, %zmm26, %zmm22
> +	vaesdeclast	(%ecx), %zmm26, %zmm27
> +	vpclmulqdq	$0xab, %xmm22, %xmm22, %xmm23
> +	vpclmulqdq	$123, 2032(%edx), %xmm22, %xmm23
> +	vpclmulqdq	$123, -123456(%esp,%esi,8), %ymm16, %ymm14
> +	vpclmulhqhqdq	%xmm22, %xmm23, %xmm24
> +	vpclmullqhqdq	%xmm14, %xmm15, %xmm16
> +	vpclmulhqlqdq	%ymm22, %ymm23, %ymm24
> +	vpclmullqlqdq	%zmm14, %zmm15, %zmm16
> +
> +	.arch .noavx512vl

... this, so for the test to be useful I think the two parts of the
test need to be swapped.

> +	kaddd	%k1, %k2, %k3
> +	kaddb	%k1, %k2, %k3
> +	kaddw	%k1, %k2, %k3
> +	kaddq	%k1, %k2, %k3
> +	kmovb   (%ecx), %k5
> +	kmovb   %k5, -123456(%esp,%esi,8)
> +	kmovd   -123456(%esp,%esi,8), %k5
> +	kmovd   %ebp, %k5
> +	kmovw   %k5, (%ecx)
> +	kmovw   %k5, %ebp

There's also little point in having these twice. Having them once in
the more restricted case (noavx512f) ought to suffice.

> --- a/gas/testsuite/gas/i386/xmmhi32.s
> +++ b/gas/testsuite/gas/i386/xmmhi32.s
> @@ -26,6 +26,7 @@ xmm:
>  	vmovdqa	ymm24, ymm0
>  
>  	.arch .noavx512f
> +	.arch .noavx10.1
>  	vaddps	xmm0, xmm1, xmm8
>  	vaddps	xmm0, xmm1, xmm16
>  	vaddps	xmm0, xmm1, xmm24

This (and similar) addition(s) point out another issue: People may be
using .noavx512{f,vl} to make sure they'll know if they wrongly use
certain insns. That protection becomes void with the additions as
you presently make them. This also relates to the first comment below
on i386-gen.c.

> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -168,6 +168,8 @@ static const dependency isa_dependencies[] =
>      "AVX2" },
>    { "FRED",
>      "LKGS" },
> +  { "AVX10_1",
> +    "AVX2" },

This can't be quite right (as in: is insufficient): There's no
restriction to the low 16 XMM/YMM registers in AVX10.1, so some of
AVX512 is also a prereq.

To also address the earlier comment, maybe we need an artificial (i.e.
not user selectable) feature underlying both AVX10 and AVX512? (But I
haven't properly thought this through, so there may be issues with
such an approach as well.)

> @@ -1217,7 +1220,7 @@ static void
>  output_i386_opcode (FILE *table, const char *name, char *str,
>  		    char *last, int lineno)
>  {
> -  unsigned int i, length, prefix = 0, space = 0;
> +  unsigned int i, j, length, prefix = 0, space = 0, k = 0;
>    char *base_opcode, *extension_opcode, *end, *ident;
>    char *cpu_flags, *opcode_modifier, *operand_types [MAX_OPERANDS];
>    unsigned long long opcode;
> @@ -1315,6 +1318,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
>    ident = mkident (name);
>    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
>  	   ident, 2 * (int)length, opcode, end, i);
> +
> +  j = strlen(ident);
> +  /* All AVX512F based instructions are usable for AVX10.1 except
> +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
> +  if (strstr (cpu_flags, "AVX512")
> +      && !strstr (cpu_flags, "AVX512PF")
> +      && !strstr (cpu_flags, "AVX512ER")
> +      && !strstr (cpu_flags, "4FMAPS")
> +      && !strstr (cpu_flags, "4VNNIW")
> +      && !strstr (cpu_flags, "VP2INTERSECT"))
> +    {
> +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
> +      k = 1;
> +    }
>    free (ident);

Unless you know for sure that there aren't going to be further AVX512
sub-features, this looks pretty fragile.

The doc also lists AVX10.1/256 as a possible mode (see e.g. table 1-3),
which isn't reflected throughout the patch at all.

Jan


* RE: [PATCH] Support Intel AVX10.1
  2023-07-27 11:23 ` Jan Beulich
@ 2023-07-28  2:50   ` Jiang, Haochen
  0 siblings, 0 replies; 28+ messages in thread
From: Jiang, Haochen @ 2023-07-28  2:50 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, binutils

Let's start from the end of the patch to make sure we are on the
same page.

> > --- a/gas/testsuite/gas/i386/xmmhi32.s
> > +++ b/gas/testsuite/gas/i386/xmmhi32.s
> > @@ -26,6 +26,7 @@ xmm:
> >  	vmovdqa	ymm24, ymm0
> >
> >  	.arch .noavx512f
> > +	.arch .noavx10.1
> >  	vaddps	xmm0, xmm1, xmm8
> >  	vaddps	xmm0, xmm1, xmm16
> >  	vaddps	xmm0, xmm1, xmm24
> 
> This (and similar) addition(s) point out another issue: People may be
> using .noavx512{f,vl} to make sure they'll know if they wrongly use
> certain insns. That protection becomes void with the additions as
> you presently make them. This also relates to the first comment below
> on i386-gen.c.
> 
> > --- a/opcodes/i386-gen.c
> > +++ b/opcodes/i386-gen.c
> > @@ -168,6 +168,8 @@ static const dependency isa_dependencies[] =
> >      "AVX2" },
> >    { "FRED",
> >      "LKGS" },
> > +  { "AVX10_1",
> > +    "AVX2" },
> 
> This can't be quite right (as in: is insufficient): There's no
> restriction to the low 16 XMM/YMM registers in AVX10.1, so some of
> AVX512 is also a prereq.
> 
> To also address the earlier comment, maybe we need an artificial (i.e.
> not user selectable) feature underlying both AVX10 and AVX512? (But I
> haven't properly thought this through, so there may be issues with
> such an approach as well.)
> 

The intention is that the AVX10 and AVX512 features are orthogonal and both
based on AVX2. Therefore, if either AVX10 or AVX512 is enabled, the
instructions can be used. So when it comes to the .no* directives, we need
to disable both of them.

> > @@ -1217,7 +1220,7 @@ static void
> >  output_i386_opcode (FILE *table, const char *name, char *str,
> >  		    char *last, int lineno)
> >  {
> > -  unsigned int i, length, prefix = 0, space = 0;
> > +  unsigned int i, j, length, prefix = 0, space = 0, k = 0;
> >    char *base_opcode, *extension_opcode, *end, *ident;
> >    char *cpu_flags, *opcode_modifier, *operand_types [MAX_OPERANDS];
> >    unsigned long long opcode;
> > @@ -1315,6 +1318,20 @@ output_i386_opcode (FILE *table, const char
> *name, char *str,
> >    ident = mkident (name);
> >    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
> >  	   ident, 2 * (int)length, opcode, end, i);
> > +
> > +  j = strlen(ident);
> > +  /* All AVX512F based instructions are usable for AVX10.1 except
> > +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
> > +  if (strstr (cpu_flags, "AVX512")
> > +      && !strstr (cpu_flags, "AVX512PF")
> > +      && !strstr (cpu_flags, "AVX512ER")
> > +      && !strstr (cpu_flags, "4FMAPS")
> > +      && !strstr (cpu_flags, "4VNNIW")
> > +      && !strstr (cpu_flags, "VP2INTERSECT"))
> > +    {
> > +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
> > +      k = 1;
> > +    }
> >    free (ident);
> 
> Unless you know for sure that there aren't going to be further AVX512
> sub-features, this looks pretty fragile.

Yes, AVX512 will be frozen. All vector instructions in the future will be
under AVX10.

> 
> The doc also lists AVX10.1/256 as a possible mode (see e.g. table 1-3),
> which isn't reflected throughout the patch at all.

In the AVX10 series, the vector width will be set 'globally', which means we
cannot enable 512-bit for AVX10.1 while disabling 512-bit for AVX10.2.

For the current implementation, we chose to enable all the sizes in one
version for convenience and leave it to the compiler (e.g. GCC) to emit the
correct instructions. I suppose the compiler, not gas or the disassembler,
should take responsibility for checking whether they are correct.

Alternatively, we could have another bit like avx10_512bit to enable and
disable the AVX10 512-bit vector size. Whether that is needed is open for
discussion.

Thx,
Haochen

> 
> Jan


* Re: [PATCH] Support Intel AVX10.1
  2023-07-27  7:15 [PATCH] Support Intel AVX10.1 Haochen Jiang
  2023-07-27 11:23 ` Jan Beulich
@ 2023-07-28  6:53 ` Jan Beulich
  2023-08-01  2:18   ` Jiang, Haochen
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-07-28  6:53 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, binutils

On 27.07.2023 09:15, Haochen Jiang wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -1156,6 +1156,7 @@ static const arch_entry cpu_arch[] =
>    SUBARCH (sm3, SM3, ANY_SM3, false),
>    SUBARCH (sm4, SM4, ANY_SM4, false),
>    SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
> +  SUBARCH (avx10.1, AVX10_1, ANY_AVX10_1, false),
>  };

Alternative proposal: No addition to i386-opc.h at all, and here
simply set/clear the combination of all covered AVX512 (sub)features
(for clearing that'll imply clearing all others as well, obviously).
From my earlier comments I think that'll leave only the /256 (and
the perhaps theoretical only /128) AVX10.x sub-feature handling.
That, I think, wants dealing with by merely disallowing use of the
ZMM registers (when disabling /512) and the high YMM ones (when
disabling /256, assuming we want to allow for the AVX10.1/128
feature, which I think we should even if the doc says nothing like
that is planned right now for hardware). How to neatly express that
is an open question, because we may want this to remain orthogonal
to the actual AVX10.x features. Maybe something like .noavx10.x/512
(with 'x' meaning literal 'x', i.e. not as kind of a "wildcard"
covering multiple such directives), except I think the slash is
going to be a problem (yet it would be nice to stick to doc naming).
In any event this wouldn't be possible to express by another table
entry, but would require handling "manually".

Jan


* RE: [PATCH] Support Intel AVX10.1
  2023-07-28  6:53 ` Jan Beulich
@ 2023-08-01  2:18   ` Jiang, Haochen
  2023-08-01  6:49     ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Jiang, Haochen @ 2023-08-01  2:18 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, binutils

> Alternative proposal: No addition to i386-opc.h at all, and here
> simply set/clear the combination of all covered AVX512 (sub)features
> (for clearing that'll imply clearing all others as well, obviously).
> From my earlier comments I think that'll leave only the /256 (and
> the perhaps theoretical only /128) AVX10.x sub-feature handling.
> That, I think, wants dealing with by merely disallowing use of the
> ZMM registers (when disabling /512) and the high YMM ones (when
> disabling /256, assuming we want to allow for the AVX10.1/128
> feature, which I think we should even if the doc says nothing like
> that is planned right now for hardware). How to neatly express that
> is an open question, because we may want this to remain orthogonal
> to the actual AVX10.x features. Maybe something like .noavx10.x/512
> (with 'x' meaning literal 'x', i.e. not as kind of a "wildcard"
> covering multiple such directives), except I think the slash is
> going to be a problem (yet it would be nice to stick to doc naming).
> In any event this wouldn't be possible to express by another table
> entry, but would require handling "manually".

I will give that a try; it should not be too complex.

The biggest "manual" part might be the mask instructions (e.g. kaddq):
without vector register operands, we cannot tell the size of the mask
registers from the operand properties. The only way to know it is
from the instruction name.

Thx,
Haochen

> 
> Jan


* Re: [PATCH] Support Intel AVX10.1
  2023-08-01  2:18   ` Jiang, Haochen
@ 2023-08-01  6:49     ` Jan Beulich
  2023-08-04  7:45       ` Jiang, Haochen
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-08-01  6:49 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: hjl.tools, binutils

On 01.08.2023 04:18, Jiang, Haochen wrote:
>> Alternative proposal: No addition to i386-opc.h at all, and here
>> simply set/clear the combination of all covered AVX512 (sub)features
>> (for clearing that'll imply clearing all others as well, obviously).
>> From my earlier comments I think that'll leave only the /256 (and
>> the perhaps theoretical only /128) AVX10.x sub-feature handling.
>> That, I think, wants dealing with by merely disallowing use of the
>> ZMM registers (when disabling /512) and the high YMM ones (when
>> disabling /256, assuming we want to allow for the AVX10.1/128
>> feature, which I think we should even if the doc says nothing like
>> that is planned right now for hardware). How to neatly express that
>> is an open question, because we may want this to remain orthogonal
>> to the actual AVX10.x features. Maybe something like .noavx10.x/512
>> (with 'x' meaning literal 'x', i.e. not as kind of a "wildcard"
>> covering multiple such directives), except I think the slash is
>> going to be a problem (yet it would be nice to stick to doc naming).
>> In any event this wouldn't be possible to express by another table
>> entry, but would require handling "manually".
> 
> I will give that a try; it should not be too complex.
> 
> The biggest "manual" part might be the mask instructions (e.g. kaddq):
> without vector register operands, we cannot tell the size of the mask
> registers from the operand properties. The only way to know it is
> from the instruction name.

Most k...q insns are VEX.W1 NP ones, aren't they? That should limit
the amount of special casing based on mnemonic.

Jan


* RE: [PATCH] Support Intel AVX10.1
  2023-08-01  6:49     ` Jan Beulich
@ 2023-08-04  7:45       ` Jiang, Haochen
  2023-08-04  7:57         ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Jiang, Haochen @ 2023-08-04  7:45 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, binutils

> Most k...q insns are VEX.W1 NP ones, aren't they? That should limit the
> amount of special casing based on mnemonic.

Yes, but not all of them.

Also, a quick update on the patch: I will send out v2 next week.

Haochen

> 
> Jan


* Re: [PATCH] Support Intel AVX10.1
  2023-08-04  7:45       ` Jiang, Haochen
@ 2023-08-04  7:57         ` Jan Beulich
  2023-08-14  6:45           ` [PATCH v2] " Haochen Jiang
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-08-04  7:57 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: hjl.tools, binutils

On 04.08.2023 09:45, Jiang, Haochen wrote:
>> Most k...q insns are VEX.W1 NP ones, aren't they? That should limit the
>> amount of special casing based on mnemonic.
> 
> Yes, but not all of them.

Right, and meanwhile I came to the conclusion that a new attribute may be
better anyway. Something along the lines of EVex that we already have,
just that we can't [easily] re-use / re-purpose it here (because of
otherwise is_evex_encoding() becoming true). If I was to make such a
change, I'd first try re-purposing nevertheless ...

Jan


* [PATCH v2] Support Intel AVX10.1
  2023-08-04  7:57         ` Jan Beulich
@ 2023-08-14  6:45           ` Haochen Jiang
  2023-08-14  8:19             ` Jan Beulich
  2023-08-18 13:03             ` Jan Beulich
  0 siblings, 2 replies; 28+ messages in thread
From: Haochen Jiang @ 2023-08-14  6:45 UTC (permalink / raw)
  To: binutils; +Cc: hjl.tools, jbeulich

Hi all,

Sorry for the delay on this patch; the heated AVX10 discussion in the GCC
community last week took up a lot of my time.

I have just finished the v2 patch for AVX10.1.

Changes in v2:

1. Added a new attribute, avx10_max_512bit, to indicate 512-bit usage. The name
matches the attribute used in the GCC implementation. Since binutils enables
attributes by default, I added the check only where zmm or a 64-bit mask
register instruction is used, and not in the table.

I am open to changing the attribute name or the implementation method.

2. Removed the 32-bit invalid test; the 64-bit one is enough. Also removed
redundant tests in x86-64-avx10_1.s.

3. Added some comments and simplified the changes in gas/config/tc-i386.c.

The following change is needed for the AVX512_VP2INTERSECT table entry:

@@ -6382,7 +6400,9 @@ check_VecOperands (const insn_template *t)
   cpu = cpu_flags_and (t->cpu_flags, avx512);
   if (!cpu_flags_all_zero (&cpu)
       && !t->cpu_flags.bitfield.cpuavx512vl
-      && !cpu_arch_flags.bitfield.cpuavx512vl)
+      && !cpu_arch_flags.bitfield.cpuavx512vl
+      && (!t->cpu_flags.bitfield.cpuavx10_1
+         || !cpu_arch_flags.bitfield.cpuavx10_1))
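
For readers who want to see which combinations the condition in the hunk above selects, here is a minimal standalone sketch. It assumes a reduced three-flag model; struct feature_bits and needs_no_vl_operand_check() are hypothetical names for illustration and do not exist in gas.

```c
#include <stdbool.h>

/* Hypothetical reduced model of the flags involved; the real code
   works on i386_cpu_flags bitfields.  */
struct feature_bits {
  bool any_avx512;   /* template carries at least one AVX512 flag */
  bool avx512vl;
  bool avx10_1;
};

/* Mirrors the shape of the condition in the hunk: the operand check
   guarded by the `if' still runs unless AVX512VL is involved, or both
   the template and the enabled arch carry AVX10.1.  */
static bool
needs_no_vl_operand_check (struct feature_bits tmpl, struct feature_bits arch)
{
  return tmpl.any_avx512
	 && !tmpl.avx512vl
	 && !arch.avx512vl
	 && (!tmpl.avx10_1 || !arch.avx10_1);
}

/* Example inputs (assumed for illustration).  */
static const struct feature_bits tmpl_plain_avx512 = { true, false, false };
static const struct feature_bits tmpl_with_avx10_1 = { true, false, true };
static const struct feature_bits arch_avx10_1_only = { false, false, true };
static const struct feature_bits arch_avx512vl     = { false, true, false };
static const struct feature_bits arch_neither      = { false, false, false };
```

In this model, a template without the AVX10.1 flag (which, as I read the patch, is the AVX512_VP2INTERSECT situation) keeps the check even when the arch enables AVX10.1, while a template that carries AVX10.1 skips it.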

I hope I have not overlooked anything from the v1 review that needed changing.
Thanks for your review.

Thx,
Haochen

gas/ChangeLog:

	* NEWS: Support Intel AVX10.1.
	* config/tc-i386.c
	(cpu_arch): Add avx10.1 and avx10_max_512bit.
	(cpu_flags_match): Handle AVX10.1 related instructions.
	(check_VecOperands): Ditto.
	(check_register): Allow zmm for avx10.1-512 and mask registers
	for avx10.1.
	* doc/c-i386.texi: Document .avx10.1 and .avx10_max_512bit.
	* testsuite/gas/i386/avx-ifma-inval.l: Add .noavx10.1.
	* testsuite/gas/i386/avx-ifma-inval.s: Ditto.
	* testsuite/gas/i386/avx-ifma.s: Ditto.
	* testsuite/gas/i386/avx-vnni-inval.l: Ditto.
	* testsuite/gas/i386/avx-vnni-inval.s: Ditto.
	* testsuite/gas/i386/avx-vnni.s: Ditto.
	* testsuite/gas/i386/noavx512-1.l: Ditto.
	* testsuite/gas/i386/noavx512-1.s: Ditto.
	* testsuite/gas/i386/noavx512-2.l: Ditto.
	* testsuite/gas/i386/noavx512-2.s: Ditto.
	* testsuite/gas/i386/x86-64-avx-ifma-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-avx-ifma-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-avx-vnni-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-avx-vnni-inval.s: Ditto.
	* testsuite/gas/i386/xmmhi32.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Run AVX10.1 tests.
	* testsuite/gas/i386/x86-64-avx10_1-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1.s: Ditto.

opcodes/ChangeLog:

	* i386-gen.c (isa_dependencies): Add AVX10_1 and
	AVX10_MAX_512BIT.
	(cpu_flags): Ditto.
	(output_i386_opcode): Add AVX10_1 in table for allowed
	instructions.
	* i386-init.h: Regenerated.
	* i386-opc.h (CpuAVX10_1, CpuAVX10_MAX_512BIT): New.
	(i386_cpu_flags): Add cpuavx10_1 and cpuavx10_max_512bit.
	* i386-tbl.h: Ditto.
---
 gas/NEWS                                      |     2 +
 gas/config/tc-i386.c                          |    43 +-
 gas/doc/c-i386.texi                           |     4 +-
 gas/testsuite/gas/i386/avx-ifma-inval.l       |     4 +-
 gas/testsuite/gas/i386/avx-ifma-inval.s       |     1 +
 gas/testsuite/gas/i386/avx-ifma.s             |     3 +
 gas/testsuite/gas/i386/avx-vnni-inval.l       |     4 +-
 gas/testsuite/gas/i386/avx-vnni-inval.s       |     1 +
 gas/testsuite/gas/i386/avx-vnni.s             |     3 +
 gas/testsuite/gas/i386/noavx512-1.l           |    39 +-
 gas/testsuite/gas/i386/noavx512-1.s           |     1 +
 gas/testsuite/gas/i386/noavx512-2.l           |   153 +-
 gas/testsuite/gas/i386/noavx512-2.s           |     1 +
 .../gas/i386/x86-64-avx-ifma-inval.l          |     4 +-
 .../gas/i386/x86-64-avx-ifma-inval.s          |     1 +
 .../gas/i386/x86-64-avx-vnni-inval.l          |     4 +-
 .../gas/i386/x86-64-avx-vnni-inval.s          |     1 +
 gas/testsuite/gas/i386/x86-64-avx10_1-inval.l |    20 +
 gas/testsuite/gas/i386/x86-64-avx10_1-inval.s |    27 +
 gas/testsuite/gas/i386/x86-64-avx10_1.d       |    54 +
 gas/testsuite/gas/i386/x86-64-avx10_1.s       |    50 +
 gas/testsuite/gas/i386/x86-64.exp             |     2 +
 gas/testsuite/gas/i386/xmmhi32.s              |     1 +
 opcodes/i386-gen.c                            |    25 +-
 opcodes/i386-init.h                           |   684 +-
 opcodes/i386-opc.h                            |     6 +
 opcodes/i386-tbl.h                            | 10436 ++++++++--------
 27 files changed, 5924 insertions(+), 5650 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1.s

diff --git a/gas/NEWS b/gas/NEWS
index 1ed043511eb..4f3cc01d66a 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel AVX10.1 instructions.
+
 * Add support for Intel PBNDKB instructions.
 
 * Add support for Intel SM4 instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index e35e2660ed5..aa0941b0428 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1156,6 +1156,8 @@ static const arch_entry cpu_arch[] =
   SUBARCH (sm3, SM3, ANY_SM3, false),
   SUBARCH (sm4, SM4, ANY_SM4, false),
   SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
+  SUBARCH (avx10.1, AVX10_1, ANY_AVX10_1, false),
+  SUBARCH (avx10_max_512bit, AVX10_MAX_512BIT, ANY_AVX10_MAX_512BIT, false),
 };
 
 #undef SUBARCH
@@ -1844,8 +1846,12 @@ cpu_flags_match (const insn_template *t)
       /* This instruction is available only on some archs.  */
       i386_cpu_flags cpu = cpu_arch_flags;
 
-      /* AVX512VL is no standalone feature - match it and then strip it.  */
-      if (x.bitfield.cpuavx512vl && !cpu.bitfield.cpuavx512vl)
+      /* AVX512VL is no standalone feature - match it and then strip it.
+         AVX10.1 shares the same encoding with AVX512VL, we also need to
+	 check it is set or not.  */
+      if (x.bitfield.cpuavx512vl
+	  && !cpu.bitfield.cpuavx512vl
+	  && !cpu.bitfield.cpuavx10_1)
 	return match;
       x.bitfield.cpuavx512vl = 0;
 
@@ -1871,13 +1877,25 @@ cpu_flags_match (const insn_template *t)
 	    }
 	  else if (x.bitfield.cpuavx512f)
 	    {
-	      /* We need to check a few extra flags with AVX512F.  */
-	      if (cpu.bitfield.cpuavx512f
+	      /* We need to check a few extra flags with AVX512F
+		 or AVX10.1.  */
+	      if ((cpu.bitfield.cpuavx512f || cpu.bitfield.cpuavx10_1)
 		  && (!x.bitfield.cpugfni || cpu.bitfield.cpugfni)
 		  && (!x.bitfield.cpuvaes || cpu.bitfield.cpuvaes)
 		  && (!x.bitfield.cpuvpclmulqdq || cpu.bitfield.cpuvpclmulqdq))
 		match |= CPU_FLAGS_ARCH_MATCH;
 	    }
+	  else if (x.bitfield.cpuavx512bw)
+	    {
+	      /* We need to eliminate 64 bit mask instructions when AVX512BW
+		 and AVX10.1-512 are both disabled.  */
+	      if (cpu.bitfield.cpuavx512bw
+		  || cpu_arch_flags.bitfield.cpuavx10_max_512bit
+		  || t->opcode_modifier.evex || t->opcode_modifier.vexw != 2
+		  || (t->opcode_modifier.opcodeprefix == 1
+		      && t->opcode_space != 3))
+		match |= CPU_FLAGS_ARCH_MATCH;
+	    }
 	  else
 	    match |= CPU_FLAGS_ARCH_MATCH;
 	}
@@ -6382,7 +6400,9 @@ check_VecOperands (const insn_template *t)
   cpu = cpu_flags_and (t->cpu_flags, avx512);
   if (!cpu_flags_all_zero (&cpu)
       && !t->cpu_flags.bitfield.cpuavx512vl
-      && !cpu_arch_flags.bitfield.cpuavx512vl)
+      && !cpu_arch_flags.bitfield.cpuavx512vl
+      && (!t->cpu_flags.bitfield.cpuavx10_1
+	  || !cpu_arch_flags.bitfield.cpuavx10_1))
     {
       for (op = 0; op < t->operands; ++op)
 	{
@@ -13794,10 +13814,14 @@ static bool check_register (const reg_entry *r)
   if (r->reg_type.bitfield.class == RegMMX && !cpu_arch_flags.bitfield.cpummx)
     return false;
 
-  if (!cpu_arch_flags.bitfield.cpuavx512f)
+  if (!cpu_arch_flags.bitfield.cpuavx512f
+      && !cpu_arch_flags.bitfield.cpuavx10_max_512bit)
     {
-      if (r->reg_type.bitfield.zmmword
-	  || r->reg_type.bitfield.class == RegMask)
+      if (r->reg_type.bitfield.zmmword)
+	return false;
+
+      if (!cpu_arch_flags.bitfield.cpuavx10_1
+	  && r->reg_type.bitfield.class == RegMask)
 	return false;
 
       if (!cpu_arch_flags.bitfield.cpuavx)
@@ -13826,7 +13850,8 @@ static bool check_register (const reg_entry *r)
      mode, and require EVEX encoding.  */
   if (r->reg_flags & RegVRex)
     {
-      if (!cpu_arch_flags.bitfield.cpuavx512f
+      if ((!cpu_arch_flags.bitfield.cpuavx512f
+	   && !cpu_arch_flags.bitfield.cpuavx10_1)
 	  || flag_code != CODE_64BIT)
 	return false;
 
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index dd06282a5a3..ddb6e4dec81 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -212,6 +212,8 @@ accept various extension mnemonics.  For example,
 @code{sm3},
 @code{sm4},
 @code{pbndkb},
+@code{avx10.1},
+@code{avx10_max_512bit},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1642,7 +1644,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
 @item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3} @tab @samp{.sm4}
-@item @samp{.pbndkb}
+@item @samp{.pbndkb} @tab @samp{.avx10.1} @tab @samp{.avx10_max_512bit}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/avx-ifma-inval.l b/gas/testsuite/gas/i386/avx-ifma-inval.l
index 5294c2ca73d..d2f1cf1d544 100644
--- a/gas/testsuite/gas/i386/avx-ifma-inval.l
+++ b/gas/testsuite/gas/i386/avx-ifma-inval.l
@@ -1,3 +1,3 @@
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpmadd52huq'
-.*:7: Error: operand .* `vpmadd52huq'
+.*:7: Error: unsupported .* `vpmadd52huq'
+.*:8: Error: operand .* `vpmadd52huq'
diff --git a/gas/testsuite/gas/i386/avx-ifma-inval.s b/gas/testsuite/gas/i386/avx-ifma-inval.s
index 4b763b6e450..a1a50dcacc7 100644
--- a/gas/testsuite/gas/i386/avx-ifma-inval.s
+++ b/gas/testsuite/gas/i386/avx-ifma-inval.s
@@ -2,6 +2,7 @@
 
 	.text
 	.arch .noavx512ifma
+	.arch .noavx10.1
 _start:
 	vpmadd52huq %xmm2, %xmm4, %xmm2{%k6}
 	vpmadd52huq %zmm2, %zmm4, %zmm2
diff --git a/gas/testsuite/gas/i386/avx-ifma.s b/gas/testsuite/gas/i386/avx-ifma.s
index 81046966d70..8c1b3133a19 100644
--- a/gas/testsuite/gas/i386/avx-ifma.s
+++ b/gas/testsuite/gas/i386/avx-ifma.s
@@ -17,6 +17,7 @@ _start:
        test_insn vpmadd52luq
 
        .arch .noavx512vl
+       .arch .noavx10.1
 
        vpmadd52huq	  %zmm0, %zmm0, %zmm0
        vpmadd52huq	  %ymm0, %ymm0, %ymm0
@@ -24,12 +25,14 @@ _start:
 
        .arch default
        .arch .noavx512ifma
+       .arch .noavx10.1
        
        vpmadd52huq	  %ymm0, %ymm0, %ymm0
        vpmadd52huq	  %xmm0, %xmm0, %xmm0
 
        .arch default
        .arch .noavx512f
+       .arch .noavx10.1
 
        vpmadd52huq	  %ymm0, %ymm0, %ymm0
        vpmadd52huq	  %xmm0, %xmm0, %xmm0
diff --git a/gas/testsuite/gas/i386/avx-vnni-inval.l b/gas/testsuite/gas/i386/avx-vnni-inval.l
index 58535cf8deb..5b9b1a514f4 100644
--- a/gas/testsuite/gas/i386/avx-vnni-inval.l
+++ b/gas/testsuite/gas/i386/avx-vnni-inval.l
@@ -1,3 +1,3 @@
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpdpbusd'
-.*:7: Error: operand .* `vpdpbusd'
+.*:7: Error: unsupported .* `vpdpbusd'
+.*:8: Error: operand .* `vpdpbusd'
diff --git a/gas/testsuite/gas/i386/avx-vnni-inval.s b/gas/testsuite/gas/i386/avx-vnni-inval.s
index 28366f1e6d2..a2b07957e1e 100644
--- a/gas/testsuite/gas/i386/avx-vnni-inval.s
+++ b/gas/testsuite/gas/i386/avx-vnni-inval.s
@@ -2,6 +2,7 @@
 
 	.text
 	.arch .noavx512_vnni
+	.arch .noavx10.1
 _start:
 	vpdpbusd %xmm2, %xmm4, %xmm2{%k6}
 	vpdpbusd %zmm2, %zmm4, %zmm2
diff --git a/gas/testsuite/gas/i386/avx-vnni.s b/gas/testsuite/gas/i386/avx-vnni.s
index 6260330cca4..a31af4c4376 100644
--- a/gas/testsuite/gas/i386/avx-vnni.s
+++ b/gas/testsuite/gas/i386/avx-vnni.s
@@ -17,6 +17,7 @@ _start:
 	test_insn vpdpwssds
 
 	.arch .noavx512vl
+	.arch .noavx10.1
 
 	vpdpbusd	%zmm0, %zmm0, %zmm0
 	vpdpbusd	%ymm0, %ymm0, %ymm0
@@ -24,12 +25,14 @@ _start:
 
 	.arch default
 	.arch .noavx512_vnni
+	.arch .noavx10.1
 
 	vpdpbusd	%ymm0, %ymm0, %ymm0
 	vpdpbusd	%xmm0, %xmm0, %xmm0
 
 	.arch default
 	.arch .noavx512f
+	.arch .noavx10.1
 
 	vpdpbusd	%ymm0, %ymm0, %ymm0
 	vpdpbusd	%xmm0, %xmm0, %xmm0
diff --git a/gas/testsuite/gas/i386/noavx512-1.l b/gas/testsuite/gas/i386/noavx512-1.l
index 655a90de2ce..c636717086a 100644
--- a/gas/testsuite/gas/i386/noavx512-1.l
+++ b/gas/testsuite/gas/i386/noavx512-1.l
@@ -1,44 +1,44 @@
 .*: Assembler messages:
-.*:8: Error: .*operand size mismatch.*
-.*:9: Error: .*unsupported masking.*
+.*:9: Error: .*operand size mismatch.*
 .*:10: Error: .*unsupported masking.*
-.*:25: Error: .*not supported.*
+.*:11: Error: .*unsupported masking.*
 .*:26: Error: .*not supported.*
 .*:27: Error: .*not supported.*
-.*:11: Error: .*not supported.*
+.*:28: Error: .*not supported.*
 .*:12: Error: .*not supported.*
 .*:13: Error: .*not supported.*
 .*:14: Error: .*not supported.*
 .*:15: Error: .*not supported.*
 .*:16: Error: .*not supported.*
 .*:17: Error: .*not supported.*
-.*:21: Error: .*operand.*mismatch.*
-.*:22: Error: .*unsupported masking.*
+.*:18: Error: .*not supported.*
+.*:22: Error: .*operand.*mismatch.*
 .*:23: Error: .*unsupported masking.*
-.*:24: Error: .*not supported.*
+.*:24: Error: .*unsupported masking.*
 .*:25: Error: .*not supported.*
 .*:26: Error: .*not supported.*
 .*:27: Error: .*not supported.*
-.*:8: Error: .*bad register name.*
-.*:9: Error: .*unknown vector operation.*
+.*:28: Error: .*not supported.*
+.*:9: Error: .*bad register name.*
 .*:10: Error: .*unknown vector operation.*
-.*:11: Error: .*not supported.*
+.*:11: Error: .*unknown vector operation.*
 .*:12: Error: .*not supported.*
 .*:13: Error: .*not supported.*
 .*:14: Error: .*not supported.*
 .*:15: Error: .*not supported.*
 .*:16: Error: .*not supported.*
 .*:17: Error: .*not supported.*
-.*:18: Error: .*bad register name.*
-.*:19: Error: .*unknown vector operation.*
+.*:18: Error: .*not supported.*
+.*:19: Error: .*bad register name.*
 .*:20: Error: .*unknown vector operation.*
-.*:21: Error: .*bad register name.*
-.*:22: Error: .*unknown vector operation.*
+.*:21: Error: .*unknown vector operation.*
+.*:22: Error: .*bad register name.*
 .*:23: Error: .*unknown vector operation.*
-.*:24: Error: .*not supported.*
+.*:24: Error: .*unknown vector operation.*
 .*:25: Error: .*not supported.*
 .*:26: Error: .*not supported.*
 .*:27: Error: .*not supported.*
+.*:28: Error: .*not supported.*
 #...
 [ 	]*[0-9]+[ 	]+\# Test \.arch \.noavx512XX
 [ 	]*[0-9]+[ 	]+\.text
@@ -49,6 +49,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch default
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -93,6 +94,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512bw
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %xmm5,%xmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %ymm5,%ymm6\{%k7\}
@@ -131,6 +133,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512cd
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -172,6 +175,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512dq
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -213,6 +217,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512er
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -256,6 +261,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512ifma
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -297,6 +303,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512pf
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -339,6 +346,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512vbmi
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -380,6 +388,7 @@
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512f
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %xmm5,%xmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %ymm5,%ymm6\{%k7\}
diff --git a/gas/testsuite/gas/i386/noavx512-1.s b/gas/testsuite/gas/i386/noavx512-1.s
index ab3abdc5ceb..8f579474fdb 100644
--- a/gas/testsuite/gas/i386/noavx512-1.s
+++ b/gas/testsuite/gas/i386/noavx512-1.s
@@ -5,6 +5,7 @@
 
 	.arch default
 	.arch \isa
+	.arch .noavx10.1
 	vpabsb %zmm5, %zmm6{%k7}		# AVX512BW
 	vpabsb %xmm5, %xmm6{%k7}		# AVX512BW + AVX512VL
 	vpabsb %ymm5, %ymm6{%k7}		# AVX512BW + AVX512VL
diff --git a/gas/testsuite/gas/i386/noavx512-2.l b/gas/testsuite/gas/i386/noavx512-2.l
index 02c92e0d8db..1a73eb0613a 100644
--- a/gas/testsuite/gas/i386/noavx512-2.l
+++ b/gas/testsuite/gas/i386/noavx512-2.l
@@ -1,106 +1,107 @@
 .*: Assembler messages:
-.*:26: Error: .*unsupported masking.*
 .*:27: Error: .*unsupported masking.*
-.*:29: Error: .*unsupported instruction.*
+.*:28: Error: .*unsupported masking.*
 .*:30: Error: .*unsupported instruction.*
-.*:32: Error: .*unsupported instruction.*
+.*:31: Error: .*unsupported instruction.*
 .*:33: Error: .*unsupported instruction.*
-.*:36: Error: .*unsupported masking.*
+.*:34: Error: .*unsupported instruction.*
 .*:37: Error: .*unsupported masking.*
-.*:39: Error: .*unsupported instruction.*
+.*:38: Error: .*unsupported masking.*
 .*:40: Error: .*unsupported instruction.*
-.*:43: Error: .*unsupported instruction.*
+.*:41: Error: .*unsupported instruction.*
 .*:44: Error: .*unsupported instruction.*
+.*:45: Error: .*unsupported instruction.*
 GAS LISTING .*
 #...
 [ 	]*1[ 	]+\# Test \.arch \.noavx512vl
 [ 	]*2[ 	]+\.text
-[ 	]*3[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
-[ 	]*3[ 	]+1CF5
-[ 	]*4[ 	]+\?\?\?\? 62F27D0F 		vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*3[ 	]+\.arch \.noavx10.1
+[ 	]*4[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
 [ 	]*4[ 	]+1CF5
-[ 	]*5[ 	]+\?\?\?\? 62F27D2F 		vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*5[ 	]+\?\?\?\? 62F27D0F 		vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
 [ 	]*5[ 	]+1CF5
-[ 	]*6[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
-[ 	]*6[ 	]+C4F5
-[ 	]*7[ 	]+\?\?\?\? 62F27D08 		vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
+[ 	]*6[ 	]+\?\?\?\? 62F27D2F 		vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*6[ 	]+1CF5
+[ 	]*7[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
 [ 	]*7[ 	]+C4F5
-[ 	]*8[ 	]+\?\?\?\? 62F27D28 		vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
+[ 	]*8[ 	]+\?\?\?\? 62F27D08 		vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
 [ 	]*8[ 	]+C4F5
-[ 	]*9[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
-[ 	]*9[ 	]+7B31
-[ 	]*10[ 	]+\?\?\?\? 62F1FD0F 		vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*9[ 	]+\?\?\?\? 62F27D28 		vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
+[ 	]*9[ 	]+C4F5
+[ 	]*10[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
 [ 	]*10[ 	]+7B31
-[ 	]*11[ 	]+\?\?\?\? 62F1FD2F 		vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*11[ 	]+\?\?\?\? 62F1FD0F 		vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
 [ 	]*11[ 	]+7B31
-[ 	]*12[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
-[ 	]*12[ 	]+C8F5
-[ 	]*13[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
-[ 	]*13[ 	]+58F4
-[ 	]*14[ 	]+\?\?\?\? 62F1D50F 		vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*12[ 	]+\?\?\?\? 62F1FD2F 		vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*12[ 	]+7B31
+[ 	]*13[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
+[ 	]*13[ 	]+C8F5
+[ 	]*14[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
 [ 	]*14[ 	]+58F4
-[ 	]*15[ 	]+\?\?\?\? 62F1D52F 		vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*15[ 	]+\?\?\?\? 62F1D50F 		vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
 [ 	]*15[ 	]+58F4
-[ 	]*16[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
-[ 	]*16[ 	]+B4F4
-[ 	]*17[ 	]+\?\?\?\? 62F2D50F 		vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*16[ 	]+\?\?\?\? 62F1D52F 		vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*16[ 	]+58F4
+[ 	]*17[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
 [ 	]*17[ 	]+B4F4
-[ 	]*18[ 	]+\?\?\?\? 62F2D52F 		vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*18[ 	]+\?\?\?\? 62F2D50F 		vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
 [ 	]*18[ 	]+B4F4
-[ 	]*19[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
-[ 	]*19[ 	]+C68CFD17 
-[ 	]*19[ 	]+000000
-[ 	]*20[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
-[ 	]*20[ 	]+8DF4
-[ 	]*21[ 	]+\?\?\?\? 62F2550F 		vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*19[ 	]+\?\?\?\? 62F2D52F 		vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*19[ 	]+B4F4
+[ 	]*20[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
+[ 	]*20[ 	]+C68CFD17 
+[ 	]*20[ 	]+000000
+[ 	]*21[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
 [ 	]*21[ 	]+8DF4
-[ 	]*22[ 	]+\?\?\?\? 62F2552F 		vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*22[ 	]+\?\?\?\? 62F2550F 		vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
 [ 	]*22[ 	]+8DF4
-[ 	]*23[ 	]+
-[ 	]*24[ 	]+\.arch \.noavx512vl
-[ 	]*25[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
-[ 	]*25[ 	]+1CF5
-[ 	]*26[ 	]+vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
-[ 	]*27[ 	]+vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
-[ 	]*28[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
-[ 	]*28[ 	]+C4F5
-[ 	]*29[ 	]+vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
-[ 	]*30[ 	]+vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
-[ 	]*31[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
-[ 	]*31[ 	]+7B31
-[ 	]*32[ 	]+vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
-[ 	]*33[ 	]+vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*23[ 	]+\?\?\?\? 62F2552F 		vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*23[ 	]+8DF4
+[ 	]*24[ 	]+
+[ 	]*25[ 	]+\.arch \.noavx512vl
+[ 	]*26[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
+[ 	]*26[ 	]+1CF5
+[ 	]*27[ 	]+vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*28[ 	]+vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*29[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
+[ 	]*29[ 	]+C4F5
+[ 	]*30[ 	]+vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
+[ 	]*31[ 	]+vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
+[ 	]*32[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
+[ 	]*32[ 	]+7B31
+[ 	]*33[ 	]+vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
 \fGAS LISTING .*
 
 
-[ 	]*34[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
-[ 	]*34[ 	]+C8F5
-[ 	]*35[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
-[ 	]*35[ 	]+58F4
-[ 	]*36[ 	]+vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
-[ 	]*37[ 	]+vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
-[ 	]*38[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
-[ 	]*38[ 	]+B4F4
-[ 	]*39[ 	]+vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
-[ 	]*40[ 	]+vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
-[ 	]*41[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
-[ 	]*41[ 	]+C68CFD17 
-[ 	]*41[ 	]+000000
-[ 	]*42[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
-[ 	]*42[ 	]+8DF4
-[ 	]*43[ 	]+vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
-[ 	]*44[ 	]+vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
-[ 	]*45[ 	]+
-[ 	]*46[ 	]+\?\?\?\? C4E2791C 		vpabsb %xmm5, %xmm6
-[ 	]*46[ 	]+F5
-[ 	]*47[ 	]+\?\?\?\? C4E27D1C 		vpabsb %ymm5, %ymm6
+[ 	]*34[ 	]+vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*35[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
+[ 	]*35[ 	]+C8F5
+[ 	]*36[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
+[ 	]*36[ 	]+58F4
+[ 	]*37[ 	]+vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*38[ 	]+vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*39[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
+[ 	]*39[ 	]+B4F4
+[ 	]*40[ 	]+vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*41[ 	]+vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*42[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
+[ 	]*42[ 	]+C68CFD17 
+[ 	]*42[ 	]+000000
+[ 	]*43[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
+[ 	]*43[ 	]+8DF4
+[ 	]*44[ 	]+vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*45[ 	]+vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*46[ 	]+
+[ 	]*47[ 	]+\?\?\?\? C4E2791C 		vpabsb %xmm5, %xmm6
 [ 	]*47[ 	]+F5
-[ 	]*48[ 	]+\?\?\?\? C5D158F4 		vaddpd %xmm4, %xmm5, %xmm6
-[ 	]*49[ 	]+\?\?\?\? C5D558F4 		vaddpd %ymm4, %ymm5, %ymm6
-[ 	]*50[ 	]+\?\?\?\? 660F381C 		pabsb %xmm5, %xmm6
-[ 	]*50[ 	]+F5
-[ 	]*51[ 	]+\?\?\?\? 660F58F4 		addpd %xmm4, %xmm6
-[ 	]*52[ 	]+
+[ 	]*48[ 	]+\?\?\?\? C4E27D1C 		vpabsb %ymm5, %ymm6
+[ 	]*48[ 	]+F5
+[ 	]*49[ 	]+\?\?\?\? C5D158F4 		vaddpd %xmm4, %xmm5, %xmm6
+[ 	]*50[ 	]+\?\?\?\? C5D558F4 		vaddpd %ymm4, %ymm5, %ymm6
+[ 	]*51[ 	]+\?\?\?\? 660F381C 		pabsb %xmm5, %xmm6
+[ 	]*51[ 	]+F5
+[ 	]*52[ 	]+\?\?\?\? 660F58F4 		addpd %xmm4, %xmm6
+[ 	]*53[ 	]+
 [ 	]*[1-9][0-9]*[ 	]+\.intel_syntax noprefix
 [ 	]*[1-9][0-9]*[ 	]+\?\?\?\? 62F3FD48 		vfpclasspd k0, \[eax], 0
 [ 	]*[1-9][0-9]*[ 	]+660000
diff --git a/gas/testsuite/gas/i386/noavx512-2.s b/gas/testsuite/gas/i386/noavx512-2.s
index d974bcf9df5..a63d0484c61 100644
--- a/gas/testsuite/gas/i386/noavx512-2.s
+++ b/gas/testsuite/gas/i386/noavx512-2.s
@@ -1,5 +1,6 @@
 # Test .arch .noavx512vl
 	.text
+	.arch .noavx10.1
 	vpabsb %zmm5, %zmm6{%k7}		# AVX512BW
 	vpabsb %xmm5, %xmm6{%k7}		# AVX512BW + AVX512VL
 	vpabsb %ymm5, %ymm6{%k7}		# AVX512BW + AVX512VL
diff --git a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l
index fad43f6768c..0046cbcb5d1 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l
+++ b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l
@@ -1,4 +1,4 @@
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpmadd52huq'
 .*:7: Error: unsupported .* `vpmadd52huq'
-.*:8: Error: operand .* `vpmadd52huq'
+.*:8: Error: unsupported .* `vpmadd52huq'
+.*:9: Error: operand .* `vpmadd52huq'
diff --git a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s
index 76da0f1a37d..b2175e8d066 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s
@@ -2,6 +2,7 @@
 
 	.text
 	.arch .noavx512ifma
+	.arch .noavx10.1
 _start:
 	vpmadd52huq %xmm2, %xmm4, %xmm2{%k6}
 	vpmadd52huq %xmm22, %xmm4, %xmm2{%k1}
diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l
index 61808668a8d..81aedddf4e2 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l
+++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l
@@ -1,4 +1,4 @@
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpdpbusds'
 .*:7: Error: unsupported .* `vpdpbusds'
-.*:8: Error: operand .* `vpdpbusds'
+.*:8: Error: unsupported .* `vpdpbusds'
+.*:9: Error: operand .* `vpdpbusds'
diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s
index 8b1b80cac5d..78284546650 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s
@@ -2,6 +2,7 @@
 
 	.text
 	.arch .noavx512_vnni
+	.arch .noavx10.1
 _start:
 	vpdpbusds %xmm2, %xmm4, %xmm2{%k6}
 	vpdpbusds %xmm22, %xmm4, %xmm2{%k1}
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
new file mode 100644
index 00000000000..0e4b9269c62
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
@@ -0,0 +1,20 @@
+.* Assembler messages:
+.*:6: Error: `vp2intersectq' is not supported on `x86_64.noavx512f'
+.*:7: Error: `vgatherpf0dpd' is not supported on `x86_64.noavx512f'
+.*:8: Error: `vrcp28ss' is not supported on `x86_64.noavx512f'
+.*:9: Error: `vp4dpwssd' is not supported on `x86_64.noavx512f'
+.*:10: Error: `v4fnmaddss' is not supported on `x86_64.noavx512f'
+.*:14: Error: `kaddq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:15: Error: `kandq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:16: Error: `kandnq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:17: Error: `kmovq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:18: Error: `knotq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:19: Error: `korq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:20: Error: `kortestq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:21: Error: `kshiftlq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:22: Error: `kshiftrq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:23: Error: `ktestq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:24: Error: `kunpckdq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:25: Error: `kxnorq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:26: Error: `kxorq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:27: Error: bad register name `%zmm4'
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
new file mode 100644
index 00000000000..1d091b83ae4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
@@ -0,0 +1,27 @@
+# Check invalid AVX10.1 instructions
+
+	.text
+__start:
+	.arch .noavx512f
+	vp2intersectq	%xmm1, %xmm2, %k3
+	vgatherpf0dpd	123(%ebp,%ymm7,8){%k1}
+	vrcp28ss	%xmm4, %xmm5, %xmm6{%k7}
+	vp4dpwssd	(%ecx), %zmm4, %zmm1
+	v4fnmaddss	(%ecx), %xmm4, %xmm1
+
+	.arch .noavx512f
+	.arch .noavx10_max_512bit
+	kaddq	%k1, %k2, %k3
+	kandq	%k1, %k2, %k3
+	kandnq	%k1, %k2, %k3
+	kmovq	%k1, %k2
+	knotq	%k1, %k2
+	korq	%k1, %k2, %k3
+	kortestq	%k1, %k2
+	kshiftlq	$1, %k1, %k2
+	kshiftrq	$1, %k1, %k2
+	ktestq	%k1, %k2
+	kunpckdq	%k1, %k2, %k3
+	kxnorq	%k1, %k2, %k3
+	kxorq	%k1, %k2, %k3
+	vaddpd  %zmm4, %zmm5, %zmm6
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1.d b/gas/testsuite/gas/i386/x86-64-avx10_1.d
new file mode 100644
index 00000000000..4225c2e2c58
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1.d
@@ -0,0 +1,54 @@
+#objdump: -dw
+#name: x86_64 AVX10.1 instructions
+#source: x86-64-avx10_1.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e1 ed 4a d9\s+kaddd  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c5 ed 4a d9\s+kaddb  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c5 ec 4a d9\s+kaddw  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c4 e1 ec 4a d9\s+kaddq  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*67 c5 f9 90 29\s+kmovb  \(%ecx\),%k5
+\s*[a-f0-9]+:\s*67 c5 f9 91 ac f4 c0 1d fe ff\s+kmovb  %k5,-0x1e240\(%esp,%esi,8\)
+\s*[a-f0-9]+:\s*67 c4 e1 f9 90 ac f4 c0 1d fe ff\s+kmovd  -0x1e240\(%esp,%esi,8\),%k5
+\s*[a-f0-9]+:\s*c5 fb 92 ed\s+kmovd  %ebp,%k5
+\s*[a-f0-9]+:\s*67 c5 f8 91 29\s+kmovw  %k5,\(%ecx\)
+\s*[a-f0-9]+:\s*c5 f8 93 ed\s+kmovw  %k5,%ebp
+\s*[a-f0-9]+:\s*62 f1 d5 0f 58 f4\s+vaddpd %xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 31\s+vaddpd \(%ecx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 30\s+vaddpd \(%eax\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 b2 00 08 00 00\s+vaddpd 0x800\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 b2 f0 f7 ff ff\s+vaddpd -0x810\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 b2 00 04 00 00\s+vaddpd 0x400\(%edx\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 b2 f8 fb ff ff\s+vaddpd -0x408\(%edx\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f1 d5 cf 58 f4\s+vaddpd %zmm4,%zmm5,%zmm6\{%k7\}\{z\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 b4 f4 c0 1d fe ff\s+vaddpd -0x1e240\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 4f 58 b2 00 20 00 00\s+vaddpd 0x2000\(%edx\),%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 72 80\s+vaddpd -0x1000\(%edx\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 3f 58 72 7f\s+vaddpd 0x3f8\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 5f 58 b2 00 f8 ff ff\s+vaddpd -0x800\(%edx\)\{1to8\},%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f3 d5 0f ce f4 ab\s+vgf2p8affineqb \$0xab,%xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 2f ce b4 f4 c0 1d fe ff 7b\s+vgf2p8affineqb \$0x7b,-0x1e240\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 3f ce 72 7f 7b\s+vgf2p8affineqb \$0x7b,0x3f8\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 0f cf 72 7f 7b\s+vgf2p8affineinvqb \$0x7b,0x7f0\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f3 d5 af cf f4 ab\s+vgf2p8affineinvqb \$0xab,%ymm4,%ymm5,%ymm6\{%k7\}\{z\}
+\s*[a-f0-9]+:\s*62 f2 55 4f cf f4\s+vgf2p8mulb %zmm4,%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f2 55 0f cf b4 f4 c0 1d fe ff\s+vgf2p8mulb -0x1e240\(%esp,%esi,8\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f2 55 4f cf b2 00 20 00 00\s+vgf2p8mulb 0x2000\(%edx\),%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 82 2d 20 dc f0\s+vaesenc %ymm24,%ymm26,%ymm22
+\s*[a-f0-9]+:\s*67 62 e2 05 08 de 84 f4 c0 1d fe ff\s+vaesdec -0x1e240\(%esp,%esi,8\),%xmm15,%xmm16
+\s*[a-f0-9]+:\s*62 02 2d 00 dd d8\s+vaesenclast %xmm24,%xmm26,%xmm27
+\s*[a-f0-9]+:\s*67 62 62 35 20 df 52 7f\s+vaesdeclast 0xfe0\(%edx\),%ymm25,%ymm26
+\s*[a-f0-9]+:\s*62 82 2d 40 de f0\s+vaesdec %zmm24,%zmm26,%zmm22
+\s*[a-f0-9]+:\s*67 62 62 2d 40 df 19\s+vaesdeclast \(%ecx\),%zmm26,%zmm27
+\s*[a-f0-9]+:\s*62 a3 4d 00 44 fe ab\s+vpclmulqdq \$0xab,%xmm22,%xmm22,%xmm23
+\s*[a-f0-9]+:\s*67 62 e3 4d 00 44 7a 7f 7b\s+vpclmulqdq \$0x7b,0x7f0\(%edx\),%xmm22,%xmm23
+\s*[a-f0-9]+:\s*67 62 73 7d 20 44 b4 f4 c0 1d fe ff 7b\s+vpclmulqdq \$0x7b,-0x1e240\(%esp,%esi,8\),%ymm16,%ymm14
+\s*[a-f0-9]+:\s*62 23 45 00 44 c6 11\s+vpclmulhqhqdq %xmm22,%xmm23,%xmm24
+\s*[a-f0-9]+:\s*62 c3 05 08 44 c6 10\s+vpclmullqhqdq %xmm14,%xmm15,%xmm16
+\s*[a-f0-9]+:\s*62 23 45 20 44 c6 01\s+vpclmulhqlqdq %ymm22,%ymm23,%ymm24
+\s*[a-f0-9]+:\s*62 c3 05 48 44 c6 00\s+vpclmullqlqdq %zmm14,%zmm15,%zmm16
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1.s b/gas/testsuite/gas/i386/x86-64-avx10_1.s
new file mode 100644
index 00000000000..5169d15ba6b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1.s
@@ -0,0 +1,50 @@
+# Check AVX10.1 instructions
+
+	.text
+_start:
+	.arch .noavx512f
+
+	kaddd	%k1, %k2, %k3
+	kaddb	%k1, %k2, %k3
+	kaddw	%k1, %k2, %k3
+	kaddq	%k1, %k2, %k3
+	kmovb   (%ecx), %k5
+	kmovb   %k5, -123456(%esp,%esi,8)
+	kmovd   -123456(%esp,%esi,8), %k5
+	kmovd   %ebp, %k5
+	kmovw   %k5, (%ecx)
+	kmovw   %k5, %ebp
+	vaddpd  %xmm4, %xmm5, %xmm6{%k7}
+	vaddpd  (%ecx), %xmm5, %xmm6{%k7}
+	vaddpd  (%eax){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  2048(%edx), %xmm5, %xmm6{%k7}
+	vaddpd  -2064(%edx), %xmm5, %xmm6{%k7}
+	vaddpd  1024(%edx){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  -1032(%edx){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  %zmm4, %zmm5, %zmm6{%k7}{z}
+	vaddpd  -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
+	vaddpd  8192(%edx), %zmm5, %zmm6{%k7}
+	vaddpd  -4096(%edx), %ymm5, %ymm6{%k7}
+	vaddpd  1016(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vaddpd  -2048(%edx){1to8}, %zmm5, %zmm6{%k7}
+	vgf2p8affineqb	$0xab, %xmm4, %xmm5, %xmm6{%k7}
+	vgf2p8affineqb	$123, -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
+	vgf2p8affineqb	$123, 1016(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vgf2p8affineinvqb	$123, 2032(%edx), %xmm5, %xmm6{%k7}
+	vgf2p8affineinvqb	$0xab, %ymm4, %ymm5, %ymm6{%k7}{z}
+	vgf2p8mulb	%zmm4, %zmm5, %zmm6{%k7}
+	vgf2p8mulb	-123456(%esp,%esi,8), %xmm5, %xmm6{%k7}
+	vgf2p8mulb	8192(%edx), %zmm5, %zmm6{%k7}
+	vaesenc	%ymm24, %ymm26, %ymm22
+	vaesdec	-123456(%esp,%esi,8), %xmm15, %xmm16
+	vaesenclast	%xmm24, %xmm26, %xmm27
+	vaesdeclast     4064(%edx), %ymm25, %ymm26
+	vaesdec		%zmm24, %zmm26, %zmm22
+	vaesdeclast	(%ecx), %zmm26, %zmm27
+	vpclmulqdq	$0xab, %xmm22, %xmm22, %xmm23
+	vpclmulqdq	$123, 2032(%edx), %xmm22, %xmm23
+	vpclmulqdq	$123, -123456(%esp,%esi,8), %ymm16, %ymm14
+	vpclmulhqhqdq	%xmm22, %xmm23, %xmm24
+	vpclmullqhqdq	%xmm14, %xmm15, %xmm16
+	vpclmulhqlqdq	%ymm22, %ymm23, %ymm24
+	vpclmullqlqdq	%zmm14, %zmm15, %zmm16
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 52711cdcf6f..07e711df559 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -450,6 +450,8 @@ run_dump_test "x86-64-sm4"
 run_dump_test "x86-64-sm4-intel"
 run_dump_test "x86-64-pbndkb"
 run_dump_test "x86-64-pbndkb-intel"
+run_dump_test "x86-64-avx10_1"
+run_list_test "x86-64-avx10_1-inval"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/gas/testsuite/gas/i386/xmmhi32.s b/gas/testsuite/gas/i386/xmmhi32.s
index 8e8767ac37d..f562711714a 100644
--- a/gas/testsuite/gas/i386/xmmhi32.s
+++ b/gas/testsuite/gas/i386/xmmhi32.s
@@ -26,6 +26,7 @@ xmm:
 	vmovdqa	ymm24, ymm0
 
 	.arch .noavx512f
+	.arch .noavx10.1
 	vaddps	xmm0, xmm1, xmm8
 	vaddps	xmm0, xmm1, xmm16
 	vaddps	xmm0, xmm1, xmm24
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 91c22c9e873..499149356b1 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -168,6 +168,10 @@ static const dependency isa_dependencies[] =
     "AVX2" },
   { "FRED",
     "LKGS" },
+  { "AVX10_1",
+    "AVX2" },
+  { "AVX10_MAX_512BIT",
+    "AVX10_1" },
   { "AVX512F",
     "AVX2" },
   { "AVX512CD",
@@ -378,6 +382,8 @@ static bitfield cpu_flags[] =
   BITFIELD (RAO_INT),
   BITFIELD (FRED),
   BITFIELD (LKGS),
+  BITFIELD (AVX10_1),
+  BITFIELD (AVX10_MAX_512BIT),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
@@ -1217,7 +1223,7 @@ static void
 output_i386_opcode (FILE *table, const char *name, char *str,
 		    char *last, int lineno)
 {
-  unsigned int i, length, prefix = 0, space = 0;
+  unsigned int i, j, length, prefix = 0, space = 0, k = 0;
   char *base_opcode, *extension_opcode, *end, *ident;
   char *cpu_flags, *opcode_modifier, *operand_types [MAX_OPERANDS];
   unsigned long long opcode;
@@ -1315,6 +1321,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
   ident = mkident (name);
   fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
 	   ident, 2 * (int)length, opcode, end, i);
+
+  j = strlen(ident);
+  /* All AVX512F based instructions are usable for AVX10.1 except
+     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
+  if (strstr (cpu_flags, "AVX512")
+      && !strstr (cpu_flags, "AVX512PF")
+      && !strstr (cpu_flags, "AVX512ER")
+      && !strstr (cpu_flags, "4FMAPS")
+      && !strstr (cpu_flags, "4VNNIW")
+      && !strstr (cpu_flags, "VP2INTERSECT"))
+    {
+      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
+      k = 1;
+    }
   free (ident);
 
   process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
@@ -1322,6 +1342,9 @@ output_i386_opcode (FILE *table, const char *name, char *str,
 
   process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
 
+  if (k)
+    free (cpu_flags);
+
   fprintf (table, "    { ");
 
   for (i = 0; i < ARRAY_SIZE (operand_types); i++)
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 284475076a1..e34cc518834 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -241,6 +241,10 @@ enum
   CpuFRED,
   /* lkgs instruction required */
   CpuLKGS,
+  /* Intel AVX10.1 Instructions support required.  */
+  CpuAVX10_1,
+  /* Intel AVX10 512 bit vector width support required.  */
+  CpuAVX10_MAX_512BIT,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -444,6 +448,8 @@ typedef union i386_cpu_flags
       unsigned int cpurao_int:1;
       unsigned int cpufred:1;
       unsigned int cpulkgs:1;
+      unsigned int cpuavx10_1:1;
+      unsigned int cpuavx10_max_512bit:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;
-- 
2.31.1



* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-14  6:45           ` [PATCH v2] " Haochen Jiang
@ 2023-08-14  8:19             ` Jan Beulich
  2023-08-14  8:46               ` Jiang, Haochen
  2023-08-18 13:03             ` Jan Beulich
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-08-14  8:19 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, binutils

On 14.08.2023 08:45, Haochen Jiang wrote:
> Changes in v2:
> 
> 1. Added a new attribute, avx10_max_512bit, to indicate 512-bit usage. The name
> is aligned with the attribute used in the GCC implementation. Since binutils
> enables features by default, I added the check only for when a zmm register or
> a 64-bit mask register instruction is used, but not in the table.
> 
> I am open to changing the attribute name or the implementation method.
> 
> 2. Removed the 32-bit invalid test; 64-bit is enough. Also removed redundant
> tests in x86-64-avx10_1.s.
> 
> 3. Added some comments and simplified the changes in gas/config/tc-i386.c.
> 
> This change is needed for AVX512_VP2INTERSECT table entry.

Before I get into any details here, I'd like to understand why there still
is a new CpuAVX10_1 bit, when I had asked to drop it. I'm also concerned
of CpuAVX10_MAX_512BIT, when I did suggest a new attribute (i.e. a new
bitfield in struct i386_opcode_modifier), and then a more general purpose
one (so that by it being / becoming not just boolean it can later also be
used to deal with the - for now only theoretical - AVX10/128 case).

Jan


* RE: [PATCH v2] Support Intel AVX10.1
  2023-08-14  8:19             ` Jan Beulich
@ 2023-08-14  8:46               ` Jiang, Haochen
  2023-08-14 10:33                 ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Jiang, Haochen @ 2023-08-14  8:46 UTC (permalink / raw)
  To: Beulich, Jan, hjl.tools; +Cc: binutils

> Before I get into any details here, I'd like to understand why there still
> is a new CpuAVX10_1 bit, when I had asked to drop it. I'm also concerned

The reason is that we would like to keep the OR logic in the toolchain, which
means that enabling AVX10.1 while disabling AVX512F should not disable the
encoding.

But I thought about it again and I see your point. GCC uses a default "off"
mode, so with OR logic no code or current behavior changes and everything
is natural and smooth. Binutils, however, uses a default "on" mode; if we
stick to OR logic just like GCC, it will eventually break the current
behavior of .noavx512xxx, which could be a problem. I am somewhat persuaded by
the proposal of setting and clearing the AVX512 bits for AVX10 in binutils.
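
To make the default "off" versus default "on" difference concrete, here is a
minimal sketch. It is not code from the patch: the FEAT_* names and all of the
helper functions are made up for illustration, and the real flags live in
opcodes/i386-opc.h. It only shows why OR logic is harmless when features start
out disabled, but lets an insn slip through after ".arch .noavx512f" when
everything starts out enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only.  */
enum {
  FEAT_AVX512F = 1u << 0,
  FEAT_AVX10_1 = 1u << 1,
  FEAT_ALL     = FEAT_AVX512F | FEAT_AVX10_1
};

/* OR logic: an insn is accepted if ANY feature it lists is enabled.  */
static bool
insn_allowed (unsigned int insn_features, unsigned int enabled)
{
  assert (insn_features != 0);	/* every template names a feature */
  return (insn_features & enabled) != 0;
}

/* Compiler-style default: nothing is enabled until requested.  */
static unsigned int
default_off (void)
{
  return 0;
}

/* Assembler-style default: everything is enabled until disabled.  */
static unsigned int
default_on (void)
{
  return FEAT_ALL;
}

/* Model of a ".arch .noavx512f" that clears only the AVX512F bit.  */
static unsigned int
disable_avx512f (unsigned int enabled)
{
  return enabled & ~(unsigned int) FEAT_AVX512F;
}
```

With a template tagged FEAT_AVX512F | FEAT_AVX10_1, the default "on" case
still accepts the insn after disable_avx512f(), because the AVX10.1 bit is
left set. That is the situation the extra ".arch .noavx10.1" lines in the
testsuite work around.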

H.J., what is your opinion?

> of CpuAVX10_MAX_512BIT, when I did suggest a new attribute (i.e. a new
> bitfield in struct i386_opcode_modifier), and then a more general purpose
> one (so that by it being / becoming not just boolean it can later also be
> used to deal with the - for now only theoretical - AVX10/128 case).

On the second point, I misunderstood what you meant by attribute. But I suppose
AVX10/128 is too theoretical to matter, so I will make it a boolean for now.

Thx,
Haochen

> 
> Jan


* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-14  8:46               ` Jiang, Haochen
@ 2023-08-14 10:33                 ` Jan Beulich
  2023-08-14 10:35                   ` Jan Beulich
  2023-08-15  8:32                   ` Jiang, Haochen
  0 siblings, 2 replies; 28+ messages in thread
From: Jan Beulich @ 2023-08-14 10:33 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: binutils, hjl.tools

On 14.08.2023 10:46, Jiang, Haochen wrote:
>> Before I get into any details here, I'd like to understand why there still
>> is a new CpuAVX10_1 bit, when I had asked to drop it. I'm also concerned
> 
> The reason is that we would like to keep the OR logic in the toolchain, which
> means opening AVX10.1 but closing AVX512F should not disable the encoding.
> 
> But I just double think on that and get your point. GCC is using a default "off"
> mode, if we are using OR logic, no code and current behavior are changed and
> everything is natural and smooth. However, binutils is using a default "on"
> mode, if we stick to OR logic just like GCC, it will eventually corrupt the current
> behavior of .noavx512xxx, which could be a problem. I am slightly persuaded on
> the proposal of setting and clearing bits of AVX512 for AVX10 in binutils.

The primary indication of things being done the wrong way is the need to
add several ".arch .noavx10.1" in the testsuite. Whatever the final
solution, this should not be necessary (because it indicates people may
also need to change their code then, if they want a guarantee that no
512-bit insns are used).

>> of CpuAVX10_MAX_512BIT, when I did suggest a new attribute (i.e. a new
>> bitfield in struct i386_opcode_modifier), and then a more general purpose
>> one (so that by it being / becoming not just boolean it can later also be
>> used to deal with the - for now only theoretical - AVX10/128 case).
> 
> For question 2, I misunderstood the meaning of attribute. But I suppose
> AVX10/128 is too theoretical to be true. I will make it a boolean for now.

Right, a boolean is fine initially, but with the spec explicitly allowing
the 128-bits-only mode, I'm pretty sure we ought to support that rather
sooner than later. After all, more artificial environments (emulators,
virtualization) may expose feature combinations not ever seen on real
hardware.

Jan


* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-14 10:33                 ` Jan Beulich
@ 2023-08-14 10:35                   ` Jan Beulich
  2023-08-15  8:32                   ` Jiang, Haochen
  1 sibling, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2023-08-14 10:35 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: binutils, hjl.tools

On 14.08.2023 12:33, Jan Beulich via Binutils wrote:
> On 14.08.2023 10:46, Jiang, Haochen wrote:
>> For question 2, I misunderstood the meaning of attribute. But I suppose
>> AVX10/128 is too theoretical to be true. I will make it a boolean for now.
> 
> Right, a boolean is fine initially, but with the spec explicitly allowing
> the 128-bits-only mode, I'm pretty sure we ought to support that rather
> sooner than later. After all, more artificial environments (emulators,
> virtualization) may expose feature combinations not ever seen on real
> hardware.

Actually, making it a boolean isn't nice, because a boolean would be named
differently than a numeric field. So I think it wants to be numeric, but
with only 0 and one other value permitted for now.

Jan


* RE: [PATCH v2] Support Intel AVX10.1
  2023-08-14 10:33                 ` Jan Beulich
  2023-08-14 10:35                   ` Jan Beulich
@ 2023-08-15  8:32                   ` Jiang, Haochen
  2023-08-15 14:10                     ` Jan Beulich
  1 sibling, 1 reply; 28+ messages in thread
From: Jiang, Haochen @ 2023-08-15  8:32 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: binutils, hjl.tools

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, August 14, 2023 6:34 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: binutils@sourceware.org; hjl.tools@gmail.com
> Subject: Re: [PATCH v2] Support Intel AVX10.1
> 
> On 14.08.2023 10:46, Jiang, Haochen wrote:
> >> Before I get into any details here, I'd like to understand why there
> >> still is a new CpuAVX10_1 bit, when I had asked to drop it. I'm also
> >> concerned
> >
> > The reason is that we would like to keep the OR logic in the toolchain, which
> > means opening AVX10.1 but closing AVX512F should not disable the encoding.
> >
> > But I just double think on that and get your point. GCC is using a default "off"
> > mode, if we are using OR logic, no code and current behavior are changed and
> > everything is natural and smooth. However, binutils is using a default "on"
> > mode, if we stick to OR logic just like GCC, it will eventually corrupt the current
> > behavior of .noavx512xxx, which could be a problem. I am slightly persuaded on
> > the proposal of setting and clearing bits of AVX512 for AVX10 in binutils.
> 
> The primary indication of things being done the wrong way is the need to add
> several ".arch .noavx10.1" in the testsuite. Whatever the final solution, this
> should not be necessary (because it indicates people may also need to change
> their code then, if they want a guarantee that no 512-bit insns are used).
> 

I have an open question after digging into the corner cases of the .arch
directives that arise if we choose to set/clear the AVX512 bits for AVX10.1.

Should a directive sequence like .noavx512f .avx10.1 enable zmm registers? For
.noavx512fp16 .avx10.1, should we enable zmm registers for AVX512FP16 insns?

> >> of CpuAVX10_MAX_512BIT, when I did suggest a new attribute (i.e. a new
> >> bitfield in struct i386_opcode_modifier), and then a more general purpose
> >> one (so that by it being / becoming not just boolean it can later also be
> >> used to deal with the - for now only theoretical - AVX10/128 case).
> >
> > For question 2, I misunderstood the meaning of attribute. But I suppose
> > AVX10/128 is too theoretical to be true. I will make it a boolean for now.
> 
> Right, a boolean is fine initially, but with the spec explicitly allowing
> the 128-bits-only mode, I'm pretty sure we ought to support that rather
> sooner than later. After all, more artificial environments (emulators,
> virtualization) may expose feature combinations not ever seen on real
> hardware.

After thinking about it again, I suppose it may not be appropriate to put it
into i386_opcode_modifier, since in AVX10 the vector width depends on the CPU.
I take i386_opcode_modifier to describe a property of instructions, not of the
CPU.

Thx,
Haochen

> 
> Jan


* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-15  8:32                   ` Jiang, Haochen
@ 2023-08-15 14:10                     ` Jan Beulich
  2023-08-16  8:21                       ` Jiang, Haochen
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-08-15 14:10 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: binutils, hjl.tools

On 15.08.2023 10:32, Jiang, Haochen wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, August 14, 2023 6:34 PM
>> To: Jiang, Haochen <haochen.jiang@intel.com>
>> Cc: binutils@sourceware.org; hjl.tools@gmail.com
>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>
>> On 14.08.2023 10:46, Jiang, Haochen wrote:
>>>> Before I get into any details here, I'd like to understand why there
>>>> still is a new CpuAVX10_1 bit, when I had asked to drop it. I'm also
>>>> concerned
>>>
>>> The reason is that we would like to keep the OR logic in the toolchain, which
>>> means opening AVX10.1 but closing AVX512F should not disable the encoding.
>>>
>>> But I just double think on that and get your point. GCC is using a default "off"
>>> mode, if we are using OR logic, no code and current behavior are changed and
>>> everything is natural and smooth. However, binutils is using a default "on"
>>> mode, if we stick to OR logic just like GCC, it will eventually corrupt the current
>>> behavior of .noavx512xxx, which could be a problem. I am slightly persuaded on
>>> the proposal of setting and clearing bits of AVX512 for AVX10 in binutils.
>>
>> The primary indication of things being done the wrong way is the need to add
>> several ".arch .noavx10.1" in the testsuite. Whatever the final solution, this
>> should not be necessary (because it indicates people may also need to change
>> their code then, if they want a guarantee that no 512-bit insns are used).
>>
> 
> I have an open after digging into .arch directives corner cases when we choose
> to set/clear bits for AVX512 in AVX10.1.
> 
> Should directives like .noavx512f .avx10.1 open zmm registers?

You mean the combination of the two, in that order? Yes, of course.

> For directive
> .noavx512fp16 .avx10.1, should we enable zmm registers for AVX512FP16 insts?

And then yes here, too.

In both cases what remains to be determined is how vector size is to
be limited. I think that wants to be independent of the .avx10.<N>.
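
The two answers above follow from treating .arch directives as a sequence of
set and clear operations, applied in order. A minimal sketch of that model
follows. It is not the gas implementation: the BIT_* names, the two-feature
stand-in for the AVX512 subset covered by AVX10.1, and apply_arch() itself are
assumptions made for illustration. The vector-size limit is deliberately left
out, since it is meant to be handled independently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature bits, for illustration only.  */
enum {
  BIT_AVX512F    = 1u << 0,
  BIT_AVX512FP16 = 1u << 1,
  /* Assumed stand-in for the AVX512 subset that AVX10.1 covers.  */
  AVX10_1_SET    = BIT_AVX512F | BIT_AVX512FP16
};

/* Apply one directive to the enabled-feature word and return the result.
   Directives are processed strictly in order, so a later ".avx10.1"
   re-enables whatever an earlier ".noavx512*" cleared.  */
static unsigned int
apply_arch (unsigned int enabled, const char *directive)
{
  assert (directive != NULL);
  if (strcmp (directive, ".avx10.1") == 0)
    return enabled | AVX10_1_SET;
  if (strcmp (directive, ".noavx512f") == 0)
    /* Assumption: disabling AVX512F also disables its dependents.  */
    return enabled & ~(unsigned int) AVX10_1_SET;
  if (strcmp (directive, ".noavx512fp16") == 0)
    return enabled & ~(unsigned int) BIT_AVX512FP16;
  return enabled;		/* unknown directives are ignored here */
}
```

Under this model ".noavx512f" followed by ".avx10.1" ends with the AVX512F bit
set again, so nothing in the feature word itself would forbid zmm registers.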

>>>> of CpuAVX10_MAX_512BIT, when I did suggest a new attribute (i.e. a new
>>>> bitfield in struct i386_opcode_modifier), and then a more general purpose
>>>> one (so that by it being / becoming not just boolean it can later also be
>>>> used to deal with the - for now only theoretical - AVX10/128 case).
>>>
>>> For question 2, I misunderstood the meaning of attribute. But I suppose
>>> AVX10/128 is too theoretical to be true. I will make it a boolean for now.
>>
>> Right, a boolean is fine initially, but with the spec explicitly allowing
>> the 128-bits-only mode, I'm pretty sure we ought to support that rather
>> sooner than later. After all, more artificial environments (emulators,
>> virtualization) may expose feature combinations not ever seen on real
>> hardware.
> 
> After I think twice on that, I suppose maybe it is not that appropriate to put it
> into i386_opcode_modifier since in AVX10, the vector width is depends on CPU.
> I suppose i386_opcode_modifier is a feature for instructions but not CPU.

I disagree. See the uses of EVex, for example. As said above, I think
maximum vector width and ISA extensions want dealing with separately,
and only the latter would generally qualify for Cpu* flags. Furthermore
recall that the attribute wants widening sooner or later, and Cpu*
flags are uniformly boolean. Only attributes may have numeric values.

Jan


* RE: [PATCH v2] Support Intel AVX10.1
  2023-08-15 14:10                     ` Jan Beulich
@ 2023-08-16  8:21                       ` Jiang, Haochen
  2023-08-16  8:59                         ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Jiang, Haochen @ 2023-08-16  8:21 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: binutils, hjl.tools

> > I have an open after digging into .arch directives corner cases when we choose
> > to set/clear bits for AVX512 in AVX10.1.
> >
> > Should directives like .noavx512f .avx10.1 open zmm registers?
> 
> You mean the combination of the two, in that order? Yes, of course.
> 
> > For directive
> > .noavx512fp16 .avx10.1, should we enable zmm registers for AVX512FP16 insts?
> 
> And then yes here, too.
> 
> In both cases what remains to be determined is how vector size is to
> be limited. I think that wants to be independent of the .avx10.<N>.
> 

That also matches my expectation, and it will make everything easy to
understand.

> >> Right, a boolean is fine initially, but with the spec explicitly allowing the 128-
> >> bits-only mode, I'm pretty sure we ought to support that rather sooner than
> >> later. After all, more artificial environments (emulators,
> >> virtualization) may expose feature combinations not ever seen on real
> >> hardware.
> >
> > After I think twice on that, I suppose maybe it is not that appropriate to put it
> > into i386_opcode_modifier since in AVX10, the vector width is depends on CPU.
> > I suppose i386_opcode_modifier is a feature for instructions but not CPU.
> 
> I disagree. See the uses of EVex, for example. As said above, I think
> maximum vector width and ISA extensions want dealing with separately,
> and only the latter would generally qualify for Cpu* flags. Furthermore
> recall that the attribute wants widening sooner or later, and Cpu*
> flags are uniformly boolean. Only attributes may have numeric values.

After checking the code, I am still missing the point here.

My concern is how to actually disable the zmm registers for AVX10/256
and the ymm registers for a theoretical AVX10/128. I suppose
i386_opcode_modifier is more about building up the whole encoding, whereas
each AVX10.X/256 is an actual arch.

Adding a field to i386_opcode_modifier can indicate the maximum vector length
an instruction is allowed on all archs, but it has nothing to do with
disabling zmm registers on a 256-bit-only arch.

I may have misunderstood what should be added to i386_opcode_modifier.
Please correct me if I have something wrong.

Thx,
Haochen

> 
> Jan


* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-16  8:21                       ` Jiang, Haochen
@ 2023-08-16  8:59                         ` Jan Beulich
  2023-08-17  9:08                           ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-08-16  8:59 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: binutils, hjl.tools

On 16.08.2023 10:21, Jiang, Haochen wrote:
>>> After I think twice on that, I suppose maybe it is not that appropriate to put it
>>> into i386_opcode_modifier since in AVX10, the vector width is depends on CPU.
>>> I suppose i386_opcode_modifier is a feature for instructions but not CPU.
>>
>> I disagree. See the uses of EVex, for example. As said above, I think
>> maximum vector width and ISA extensions want dealing with separately,
>> and only the latter would generally qualify for Cpu* flags. Furthermore
>> recall that the attribute wants widening sooner or later, and Cpu*
>> flags are uniformly boolean. Only attributes may have numeric values.
> 
> After I checked code, I still miss the point here.
> 
> My concern is how to actually disable the zmm registers for AVX10/256
> and ymm registers for theoretical AVX10/128.

That's the easy part: That'll want doing in check_register(). The issue
is with insns which do 512-bit operation despite not using zmm registers
(think of vfpclassp* with memory operand).
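
For illustration only, the register side of this could look like the sketch
below. The enum, the function name and the max_vector_bits parameter are all
hypothetical; the real check_register() in gas/config/tc-i386.c has a
different signature and does much more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register classes, valued by their width in bits.  */
enum reg_class { REG_XMM = 128, REG_YMM = 256, REG_ZMM = 512 };

/* Return whether a register of class RC may be named when vectors are
   limited to MAX_VECTOR_BITS.  */
static bool
register_allowed (enum reg_class rc, unsigned int max_vector_bits)
{
  assert (max_vector_bits == 128
	  || max_vector_bits == 256
	  || max_vector_bits == 512);
  return (unsigned int) rc <= max_vector_bits;
}
```

A check of this shape only sees register operands. An insn whose 512-bit
operation is implied by a memory operand names no zmm register, so it passes
this test untouched, which is why a per-insn attribute is needed as well.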

> I suppose i386_opcode_modifier
> is more related to building up the whole encoding. But each AVX10.X/256 is an
> actual arch.

I wouldn't agree with the last sentence, but ...

> Adding a feature in i386_opcode_modifier can indicate what is the maximum
> vector length the instruction is allowed on all archs but has nothing to do with
> disabling zmm registers on an 256-bit only arch.

... you still have a point here. Maybe it only wants to be a boolean,
indicating that an insn is vector-length sensitive. Yet re-using the
EVex attribute continues to be an option: With vector length
constrained to 256 (or 128) bits, EVEX512 (or EVEX256) simply become
unavailable for encoding, and EVEXDYN would be equally constrained.
And if re-using that attribute continues to be an option, adding a
new non-boolean attribute clearly is also possible.

So I guess there may have been a slight misunderstanding: I was
suggesting an attribute expressing permissible vector lengths (hence
the consideration of re-using EVex), which would then be checked
against the established (through whatever directive / command line
option) maximum vector length. I did not suggest a new "max vector
length" attribute.

Jan


* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-16  8:59                         ` Jan Beulich
@ 2023-08-17  9:08                           ` Jan Beulich
  2023-08-18  6:53                             ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-08-17  9:08 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: binutils, hjl.tools

On 16.08.2023 10:59, Jan Beulich via Binutils wrote:
> On 16.08.2023 10:21, Jiang, Haochen wrote:
>>>> After I think twice on that, I suppose maybe it is not that appropriate to put it
>>>> into i386_opcode_modifier since in AVX10, the vector width is depends on CPU.
>>>> I suppose i386_opcode_modifier is a feature for instructions but not CPU.
>>>
>>> I disagree. See the uses of EVex, for example. As said above, I think
>>> maximum vector width and ISA extensions want dealing with separately,
>>> and only the latter would generally qualify for Cpu* flags. Furthermore
>>> recall that the attribute wants widening sooner or later, and Cpu*
>>> flags are uniformly boolean. Only attributes may have numeric values.
>>
>> After I checked code, I still miss the point here.
>>
>> My concern is how to actually disable the zmm registers for AVX10/256
>> and ymm registers for theoretical AVX10/128.
> 
> That's the easy part: That'll want doing in check_register(). The issue
> is with insns which do 512-bit operation despite not using zmm registers
> (think of vfpclassp* with memory operand).
> 
>> I suppose i386_opcode_modifier
>> is more related to building up the whole encoding. But each AVX10.X/256 is an
>> actual arch.
> 
> I wouldn't agree with the last sentence, but ...
> 
>> Adding a feature in i386_opcode_modifier can indicate what is the maximum
>> vector length the instruction is allowed on all archs but has nothing to do with
>> disabling zmm registers on an 256-bit only arch.
> 
> ... you still have a point here. Maybe it only wants to be a boolean,
> indicating that an insn is vector-length sensitive. Yet re-using the
> EVex attribute continues to be an option: With vector length
> constrained to 256 (or 128) bits, EVEX512 (or EVEX256) simply become
> unavailable for encoding, and EVEXDYN would be equally constrained.
> And if re-using that attribute continues to be an option, adding a
> new non-boolean attribute clearly is also possible.
> 
> So I guess there may have been a slight misunderstanding: I was
> suggesting an attribute expressing permissible vector lengths (hence
> the consideration of re-using EVex), which would then be checked
> against the established (through whatever directive / command line
> option) maximum vector length. I did not suggest a new "max vector
> length" attribute.

Just to mention it: I've meanwhile realized that re-using EVex here will
collide with APX introducing EVEX-encoded KMOV*. So it'll need to be a
very similar but distinct attribute. And if it turned out that the
attribute then is really only needed on the mask insns (using EVex
elsewhere), it could equally well be a "permitted vector lengths" or a
"maximum vector length" one, as both are then equal. Question is what
AVX10/128 would mean for VEX-encoded insns. It seems likely that 256-bit
forms wouldn't be permitted there either then, in which case applicable
VEX-encoded insns would then need to gain such attributes as well. In
that case it would of course be more logical to stick to "permitted
vector lengths".

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-17  9:08                           ` Jan Beulich
@ 2023-08-18  6:53                             ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2023-08-18  6:53 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: binutils, hjl.tools

On 17.08.2023 11:08, Jan Beulich via Binutils wrote:
> On 16.08.2023 10:59, Jan Beulich via Binutils wrote:
>> On 16.08.2023 10:21, Jiang, Haochen wrote:
>>>>> After I think twice on that, I suppose maybe it is not that appropriate to put it
>>>>> into i386_opcode_modifier since in AVX10, the vector width is depends on CPU.
>>>>> I suppose i386_opcode_modifier is a feature for instructions but not CPU.
>>>>
>>>> I disagree. See the uses of EVex, for example. As said above, I think
>>>> maximum vector width and ISA extensions want dealing with separately,
>>>> and only the latter would generally qualify for Cpu* flags. Furthermore
>>>> recall that the attribute wants widening sooner or later, and Cpu*
>>>> flags are uniformly boolean. Only attributes may have numeric values.
>>>
>>> After I checked code, I still miss the point here.
>>>
>>> My concern is how to actually disable the zmm registers for AVX10/256
>>> and ymm registers for theoretical AVX10/128.
>>
>> That's the easy part: That'll want doing in check_register(). The issue
>> is with insns which do 512-bit operation despite not using zmm registers
>> (think of vfpclassp* with memory operand).
>>
>>> I suppose i386_opcode_modifier
>>> is more related to building up the whole encoding. But each AVX10.X/256 is an
>>> actual arch.
>>
>> I wouldn't agree with the last sentence, but ...
>>
>>> Adding a feature in i386_opcode_modifier can indicate what is the maximum
>>> vector length the instruction is allowed on all archs but has nothing to do with
>>> disabling zmm registers on an 256-bit only arch.
>>
>> ... you still have a point here. Maybe it only wants to be a boolean,
>> indicating that an insn is vector-length sensitive. Yet re-using the
>> EVex attribute continues to be an option: With vector length
>> constrained to 256 (or 128) bits, EVEX512 (or EVEX256) simply become
>> unavailable for encoding, and EVEXDYN would be equally constrained.
>> And if re-using that attribute continues to be an option, adding a
>> new non-boolean attribute clearly is also possible.
>>
>> So I guess there may have been a slight misunderstanding: I was
>> suggesting an attribute expressing permissible vector lengths (hence
>> the consideration of re-using EVex), which would then be checked
>> against the established (through whatever directive / command line
>> option) maximum vector length. I did not suggest a new "max vector
>> length" attribute.
> 
> Just to mention it: I've meanwhile realized that re-using EVex here will
> collide with APX introducing EVEX-encoded KMOV*. So it'll need to be a
> very similar but distinct attribute. And if it turned out that the
> attribute then is really only needed on the mask insns (using EVex
> elsewhere), it could equally well be a "permitted vector lengths" or a
> "maximum vector length" one, as both are then equal. Question is what
> AVX10/128 would mean for VEX-encoded insns. It seems likely that 256-bit
> forms wouldn't be permitted there either then, in which case applicable
> VEX-encoded insns would then need to gain such attributes as well. In
> that case it would of course be more logical to stick to "permitted
> vector lengths".

Sorry, yet another update. For one it can't be "maximum", but only
"minimum". And I meanwhile think "permitted" along the lines of EVex
won't catch it either. I guess I will want to take a stab myself ...

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-14  6:45           ` [PATCH v2] " Haochen Jiang
  2023-08-14  8:19             ` Jan Beulich
@ 2023-08-18 13:03             ` Jan Beulich
  2023-08-23  2:20               ` Jiang, Haochen
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-08-18 13:03 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, binutils

On 14.08.2023 08:45, Haochen Jiang wrote:
> @@ -1315,6 +1321,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
>    ident = mkident (name);
>    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
>  	   ident, 2 * (int)length, opcode, end, i);
> +
> +  j = strlen(ident);
> +  /* All AVX512F based instructions are usable for AVX10.1 except
> +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
> +  if (strstr (cpu_flags, "AVX512")
> +      && !strstr (cpu_flags, "AVX512PF")
> +      && !strstr (cpu_flags, "AVX512ER")
> +      && !strstr (cpu_flags, "4FMAPS")
> +      && !strstr (cpu_flags, "4VNNIW")
> +      && !strstr (cpu_flags, "VP2INTERSECT"))
> +    {
> +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
> +      k = 1;
> +    }
>    free (ident);

While making a patch myself along the lines of what I had outlined, I came
to realize that the above isn't enough. (I'm pretty sure I wouldn't have
spotted this by merely reviewing your patch.) This may be a result of the
spec being somewhat ambiguous when it comes to GFNI, VAES, and VPCLMULQDQ.
There's a note there saying something about the respective EVEX encodings.
But that still requires the VEX encodings connected to these three
features to also become suitably available. While this works fine for
GFNI, it doesn't for the other two: The 128-bit VEX encodings, which
surely are available when the 256-bit ones are, would become impossible
to use. The assembler would pick the (larger) EVEX forms instead. There
are two ways to solve this that I can see right away:
1) AES becomes a dependency of VAES (and PCLMULQDQ one of VPCLMULQDQ)
2) We put in place extra templates.
I'm wary of the first option as long as not at least informally supported
by you (Intel). Hence I went with option 2 for now.
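
As an aside, the substring filter from the quoted hunk can be tried in
isolation. This is a minimal sketch: gains_avx10_1 is a hypothetical
helper, not a function in i386-gen.c, and the flag strings in the note
below are illustrative rather than copied from i386-opc.tbl.

```c
#include <stdbool.h>
#include <string.h>

/* Mirror of the strstr chain in the quoted hunk: true when a template
   with CPU flag string CPU_FLAGS would have "|AVX10_1" appended.  */
static bool
gains_avx10_1 (const char *cpu_flags)
{
  return strstr (cpu_flags, "AVX512") != NULL
	 && strstr (cpu_flags, "AVX512PF") == NULL
	 && strstr (cpu_flags, "AVX512ER") == NULL
	 && strstr (cpu_flags, "4FMAPS") == NULL
	 && strstr (cpu_flags, "4VNNIW") == NULL
	 && strstr (cpu_flags, "VP2INTERSECT") == NULL;
}
```

A string such as "VAES|AVX512VL" passes the filter, whereas a string
naming only "VAES" contains no "AVX512" substring and is left alone,
so any template carrying no AVX512* flag is never touched by it.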

I'm only done with the /512 patch, so I won't post right away. I'm still
debating with myself whether to control maximum vector length via a new
directive, or via a special form of .arch.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2] Support Intel AVX10.1
  2023-08-18 13:03             ` Jan Beulich
@ 2023-08-23  2:20               ` Jiang, Haochen
  2023-08-23  3:34                 ` [RFC][PATCH v3] " Haochen Jiang
  2023-08-23  5:54                 ` [PATCH v2] " Jan Beulich
  0 siblings, 2 replies; 28+ messages in thread
From: Jiang, Haochen @ 2023-08-23  2:20 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, August 18, 2023 9:03 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH v2] Support Intel AVX10.1
> 
> On 14.08.2023 08:45, Haochen Jiang wrote:
> > @@ -1315,6 +1321,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
> >    ident = mkident (name);
> >    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
> >  	   ident, 2 * (int)length, opcode, end, i);
> > +
> > +  j = strlen(ident);
> > +  /* All AVX512F based instructions are usable for AVX10.1 except
> > +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
> > +  if (strstr (cpu_flags, "AVX512")
> > +      && !strstr (cpu_flags, "AVX512PF")
> > +      && !strstr (cpu_flags, "AVX512ER")
> > +      && !strstr (cpu_flags, "4FMAPS")
> > +      && !strstr (cpu_flags, "4VNNIW")
> > +      && !strstr (cpu_flags, "VP2INTERSECT"))
> > +    {
> > +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
> > +      k = 1;
> > +    }
> >    free (ident);
> 
> While making a patch myself along the lines of what I had outlined, I came to
> realize that the above isn't enough. (I'm pretty sure I wouldn't have spotted
> this by merely reviewing your patch.) This may be a result of the spec being
> somewhat ambiguous when it comes to GFNI, VAES, and VPCLMULQDQ.
> There's a note there saying something about the respective EVEX encodings.
> But that still requires the VEX encodings connected to these three features to
> also become suitably available. While this works fine for GFNI, it doesn't for
> the other two: The 128-bit VEX encodings, which surely are available when the
> 256-bit ones are, would become impossible to use. The assembler would pick
> the (larger) EVEX forms instead. There are two ways to solve this that I can see
> right away:
> 1) AES becomes a dependency of VAES (and PCLMULQDQ one of
> VPCLMULQDQ)
> 2) We put in place extra templates.
> I'm wary of the first option as long as not at least informally supported by you
> (Intel). Hence I went with option 2 for now.
> 
> I'm only done with the /512 patch, so I won't post right away. I'm still
> debating with myself whether to control maximum vector length via a new
> directive, or via a special form of .arch.

Hi Jan,

Do you think a command line option like -mavx10maxvl=256/512 with default 512
is ok for this scenario? I am working to revise the AVX10.1 patch like that.

Thx,
Haochen

> 
> Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC][PATCH v3] Support Intel AVX10.1
  2023-08-23  2:20               ` Jiang, Haochen
@ 2023-08-23  3:34                 ` Haochen Jiang
  2023-08-23  7:17                   ` Jan Beulich
  2023-08-23  5:54                 ` [PATCH v2] " Jan Beulich
  1 sibling, 1 reply; 28+ messages in thread
From: Haochen Jiang @ 2023-08-23  3:34 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hjl.tools

Hi all,

In this version of the patch, I tried a command line option named
-mavx10maxvl= to limit the maximum vector size. If we go with this
method, the option name can still be changed.

For the kmovq promoted to EVEX128 in APX, I checked the encoding and we
could just change the condition in cpu_flags_match from
t->opcode_modifier.evex to
t->opcode_modifier.evex != 0 && t->opcode_modifier.evex != 2 to make
everything work.
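
A small sketch of the difference between the two conditions, assuming
(as stated above) that an evex value of 2 denotes EVEX128 and 0 means
the template is not EVEX-encoded. The struct and function names are
made up for this example and are not the real insn_template layout.

```c
#include <stdbool.h>

/* Stand-in for the one field of opcode_modifier used here.  */
struct modifier_sketch
{
  unsigned int evex;
};

/* Current condition: any EVEX-encoded template qualifies.  */
static bool
old_condition (const struct modifier_sketch *m)
{
  return m->evex != 0;
}

/* Proposed condition: EVEX-encoded, but not the EVEX128 form that an
   APX-promoted kmovq would use (value 2 per the text above).  */
static bool
new_condition (const struct modifier_sketch *m)
{
  return m->evex != 0 && m->evex != 2;
}
```

The two only disagree for evex == 2, which is the case the promoted
kmovq would fall into.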

Thx,
Haochen

Changes in v3:

1. Removed cpu attribute avx10_1 and avx10_max_512bit.

2. Let avx10.1 set/clear avx512xxx flags.

3. Added a new command line option -mavx10maxvl=256/512 with the default
value as 512.
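
To illustrate item 2: with several table entries sharing the name
avx10.1, the lookup has to keep scanning after the first match and
accumulate every entry's flags. The following is a simplified sketch
with made-up types and flag bits, not the actual cpu_arch table or
set_cpu_arch code.

```c
#include <stddef.h>
#include <string.h>

/* Made-up flag bits standing in for the real i386_cpu_flags.  */
enum
{
  F_AVX512F = 1 << 0,
  F_AVX512CD = 1 << 1,
  F_AVX512BW = 1 << 2,
  F_SM4 = 1 << 3
};

struct arch_entry_sketch
{
  const char *name;
  unsigned int enable;
};

/* Several rows may carry the same name, as with avx10.1.  */
static const struct arch_entry_sketch table[] =
{
  { "sm4", F_SM4 },
  { "avx10.1", F_AVX512F },
  { "avx10.1", F_AVX512CD },
  { "avx10.1", F_AVX512BW },
};

/* OR in the enable bits of every row named NAME; do not stop at the
   first match.  Unknown names leave FLAGS unchanged.  */
static unsigned int
enable_by_name (unsigned int flags, const char *name)
{
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp (table[i].name, name) == 0)
      flags |= table[i].enable;
  return flags;
}
```

Here enable_by_name (0, "avx10.1") yields all three AVX512 bits, while
"sm4" behaves as a single-row lookup.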

Changes in v2:

1. Added new attribute avx10_max_512bit to indicate 512 bit usage. The name is
aligned with the attribute used for GCC implementation. Since binutils uses
default on mode for attribute, I added check only when zmm is used or 64 bit
mask register instruction is used but not in the table.

I am open for the attribute name change or the implementation method change.

2. Removed 32 bit invalid test. 64 bit is enough. Also removed redundant
tests in x86-64-avx10_1.s

3. Added some comments and simplified the changes in gas/config/tc-i386.c.

gas/ChangeLog:

	* NEWS: Support Intel AVX10.1.
	* config/tc-i386.c (avx10maxvl): New.
	(cpu_arch): Add avx10.1.
	(cpu_flags_match): Eliminate 64 bit mask register instructions
	for -mavx10maxvl=256.
	(set_cpu_arch): Handle AVX10.1.
	(check_register): Disable zmm when -mavx10maxvl=256.
	(OPTION_MAVX10MAXVL): New.
	(md_longopts): Handle -mavx10maxvl.
	(md_parse_option): Ditto.
	(md_show_usage): Ditto.
	* doc/c-i386.texi: Document .avx10.1 and -mavx10maxvl.
	* testsuite/gas/i386/x86-64.exp: Run AVX10.1 tests.
	* testsuite/gas/i386/x86-64-avx10_1-inval.l: New test.
	* testsuite/gas/i386/x86-64-avx10_1-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1.s: Ditto.
	* testsuite/gas/i386/x86-64-mavx10maxvl256-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-mavx10maxvl256-inval.s: Ditto.
---
 gas/NEWS                                      |  2 +
 gas/config/tc-i386.c                          | 72 +++++++++++++++++--
 gas/doc/c-i386.texi                           | 13 +++-
 gas/testsuite/gas/i386/x86-64-avx10_1-inval.l |  6 ++
 gas/testsuite/gas/i386/x86-64-avx10_1-inval.s | 11 +++
 gas/testsuite/gas/i386/x86-64-avx10_1.d       | 54 ++++++++++++++
 gas/testsuite/gas/i386/x86-64-avx10_1.s       | 51 +++++++++++++
 .../gas/i386/x86-64-mavx10maxvl256-inval.l    | 18 +++++
 .../gas/i386/x86-64-mavx10maxvl256-inval.s    | 20 ++++++
 gas/testsuite/gas/i386/x86-64.exp             |  3 +
 10 files changed, 244 insertions(+), 6 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-mavx10maxvl256-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-mavx10maxvl256-inval.s

diff --git a/gas/NEWS b/gas/NEWS
index 07ba7566105..7ef7d0f9814 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel AVX10.1 instructions.
+
 * Add support for Intel PBNDKB instructions.
 
 * Add support for Intel SM4 instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 3b00a1bc612..608a800bb60 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -873,6 +873,13 @@ static enum
 /* Value to encode in EVEX RC bits, for SAE-only instructions.  */
 static enum rc_type evexrcig = rne;
 
+/* Max vector length for AVX10 instructions.  */
+static enum
+  {
+    vl512 = 0,
+    vl256
+  } avx10maxvl;
+
 /* Pre-defined "_GLOBAL_OFFSET_TABLE_".  */
 static symbolS *GOT_symbol;
 
@@ -1156,6 +1163,19 @@ static const arch_entry cpu_arch[] =
   SUBARCH (sm3, SM3, ANY_SM3, false),
   SUBARCH (sm4, SM4, ANY_SM4, false),
   SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
+  SUBARCH (avx10.1, AVX512F, ANY_AVX512F, false),
+  SUBARCH (avx10.1, AVX512CD, ANY_AVX512CD, false),
+  SUBARCH (avx10.1, AVX512DQ, ANY_AVX512DQ, false),
+  SUBARCH (avx10.1, AVX512BW, ANY_AVX512BW, false),
+  SUBARCH (avx10.1, AVX512VL, ANY_AVX512VL, false),
+  SUBARCH (avx10.1, AVX512IFMA, ANY_AVX512IFMA, false),
+  SUBARCH (avx10.1, AVX512VBMI, ANY_AVX512VBMI, false),
+  SUBARCH (avx10.1, AVX512_VPOPCNTDQ, ANY_AVX512_VPOPCNTDQ, false),
+  SUBARCH (avx10.1, AVX512_VBMI2, ANY_AVX512_VBMI2, false),
+  SUBARCH (avx10.1, AVX512_VNNI, ANY_AVX512_VNNI, false),
+  SUBARCH (avx10.1, AVX512_BITALG, ANY_AVX512_BITALG, false),
+  SUBARCH (avx10.1, AVX512_BF16, ANY_AVX512_BF16, false),
+  SUBARCH (avx10.1, AVX512_FP16, ANY_AVX512_FP16, false),
 };
 
 #undef SUBARCH
@@ -1923,6 +1943,16 @@ cpu_flags_match (const insn_template *t)
 		  && (!x.bitfield.cpuvpclmulqdq || cpu.bitfield.cpuvpclmulqdq))
 		match |= CPU_FLAGS_ARCH_MATCH;
 	    }
+	  else if (x.bitfield.cpuavx512bw)
+	    {
+	      /* We need to eliminate 64 bit mask instructions when AVX10 max
+		 vector length is not 512.  */
+	      if (avx10maxvl == vl512 || t->opcode_modifier.evex
+		  || t->opcode_modifier.vexw != 2
+		  || (t->opcode_modifier.opcodeprefix == 1
+		      && t->opcode_space != 3))
+		match |= CPU_FLAGS_ARCH_MATCH;
+	    }
 	  else
 	    match |= CPU_FLAGS_ARCH_MATCH;
 	}
@@ -2801,7 +2831,7 @@ set_cpu_arch (int dummy ATTRIBUTE_UNUSED)
       char *s;
       int e = get_symbol_name (&s);
       const char *string = s;
-      unsigned int j = 0;
+      unsigned int j = 0, avx10_used = 0;
       i386_cpu_flags flags;
 
       if (strcmp (string, "default") == 0)
@@ -2941,7 +2971,12 @@ set_cpu_arch (int dummy ATTRIBUTE_UNUSED)
 
 	      if (!cpu_flags_equal (&flags, &cpu_arch_flags))
 		{
-		  extend_cpu_sub_arch_name (string + 1);
+		  if (!avx10_used)
+		    {
+		      extend_cpu_sub_arch_name (string + 1);
+		      if (strcmp (cpu_arch[j].name, "avx10.1") == 0)
+			avx10_used = 1;
+		    }
 		  cpu_arch_flags = flags;
 		  cpu_arch_isa_flags = flags;
 		}
@@ -2949,12 +2984,22 @@ set_cpu_arch (int dummy ATTRIBUTE_UNUSED)
 		cpu_arch_isa_flags
 		  = cpu_flags_or (cpu_arch_isa_flags,
 				  cpu_arch[j].enable);
-	      (void) restore_line_pointer (e);
-	      demand_empty_rest_of_line ();
-	      return;
+	      if (!avx10_used)
+		{
+		  (void) restore_line_pointer (e);
+		  demand_empty_rest_of_line ();
+		  return;
+		}
 	    }
 	}
 
+      if (avx10_used)
+	{
+	  (void) restore_line_pointer (e);
+	  demand_empty_rest_of_line ();
+	  return;
+	}
+
       if (startswith (string, ".no") && j >= ARRAY_SIZE (cpu_arch))
 	{
 	  /* Disable an ISA extension.  */
@@ -13837,6 +13882,9 @@ static bool check_register (const reg_entry *r)
   if (r->reg_type.bitfield.class == RegMMX && !cpu_arch_flags.bitfield.cpummx)
     return false;
 
+  if (avx10maxvl == vl256 && r->reg_type.bitfield.zmmword)
+    return false;
+
   if (!cpu_arch_flags.bitfield.cpuavx512f)
     {
       if (r->reg_type.bitfield.zmmword
@@ -14159,6 +14207,7 @@ const char *md_shortopts = "qnO::";
 #define OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH (OPTION_MD_BASE + 32)
 #define OPTION_MLFENCE_BEFORE_RET (OPTION_MD_BASE + 33)
 #define OPTION_MUSE_UNALIGNED_VECTOR_MOVE (OPTION_MD_BASE + 34)
+#define OPTION_MAVX10MAXVL (OPTION_MD_BASE + 35)
 
 struct option md_longopts[] =
 {
@@ -14195,6 +14244,7 @@ struct option md_longopts[] =
   {"mfence-as-lock-add", required_argument, NULL, OPTION_MFENCE_AS_LOCK_ADD},
   {"mrelax-relocations", required_argument, NULL, OPTION_MRELAX_RELOCATIONS},
   {"mevexrcig", required_argument, NULL, OPTION_MEVEXRCIG},
+  {"mavx10maxvl", required_argument, NULL, OPTION_MAVX10MAXVL},
   {"malign-branch-boundary", required_argument, NULL, OPTION_MALIGN_BRANCH_BOUNDARY},
   {"malign-branch-prefix-size", required_argument, NULL, OPTION_MALIGN_BRANCH_PREFIX_SIZE},
   {"malign-branch", required_argument, NULL, OPTION_MALIGN_BRANCH},
@@ -14552,6 +14602,15 @@ md_parse_option (int c, const char *arg)
 	as_fatal (_("invalid -mevexrcig= option: `%s'"), arg);
       break;
 
+    case OPTION_MAVX10MAXVL:
+      if (strcmp (arg, "256") == 0)
+	avx10maxvl = vl256;
+      else if (strcmp (arg, "512") == 0)
+	avx10maxvl = vl512;
+      else
+	as_fatal (_("invalid -mavx10maxvl= option: `%s'"), arg);
+      break;
+
     case OPTION_MEVEXWIG:
       if (strcmp (arg, "0") == 0)
 	evexwig = evexw0;
@@ -14940,6 +14999,9 @@ md_show_usage (FILE *stream)
                           encode EVEX instructions with specific EVEX.RC value\n\
                            for SAE-only ignored instructions\n"));
   fprintf (stream, _("\
+  -mavx10maxvl=[256|512] (default: 512)\n\
+                          max vector length AVX10 instructions can use\n"));
+  fprintf (stream, _("\
   -mmnemonic=[att|intel] "));
   if (SYSV386_COMPAT)
     fprintf (stream, _("(default: att)\n"));
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index dd06282a5a3..ce8eea1b148 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -212,6 +212,7 @@ accept various extension mnemonics.  For example,
 @code{sm3},
 @code{sm4},
 @code{pbndkb},
+@code{avx10.1},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -354,6 +355,16 @@ EVEX instructions with evex.w = 0, which is the default.
 @option{-mevexwig=@var{1}} will encode WIG EVEX instructions with
 evex.w = 1.
 
+@cindex @samp{-mavx10maxvl=} option, i386
+@cindex @samp{-mavx10maxvl=} option, x86-64
+@item -mavx10maxvl=@var{512}
+@itemx -mavx10maxvl=@var{256}
+These options control the maximum vector length the assembler should enable
+for the AVX10 vector ISA set. @option{-mavx10maxvl=@var{512}} will enable up
+to 512-bit vector registers and 64-bit mask registers, which is the default.
+@option{-mavx10maxvl=@var{256}} will enable up to 256-bit vector registers
+and 32-bit mask registers.
+
 @cindex @samp{-mmnemonic=} option, i386
 @cindex @samp{-mmnemonic=} option, x86-64
 @item -mmnemonic=@var{att}
@@ -1642,7 +1653,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
 @item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3} @tab @samp{.sm4}
-@item @samp{.pbndkb}
+@item @samp{.pbndkb} @tab @samp{.avx10.1}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
new file mode 100644
index 00000000000..5e2cb40a2d7
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
@@ -0,0 +1,6 @@
+.* Assembler messages:
+.*:7: Error: `vp2intersectq' is not supported on `x86_64.noavx512f.avx10.1'
+.*:8: Error: `vgatherpf0dpd' is not supported on `x86_64.noavx512f.avx10.1'
+.*:9: Error: `vrcp28ss' is not supported on `x86_64.noavx512f.avx10.1'
+.*:10: Error: `vp4dpwssd' is not supported on `x86_64.noavx512f.avx10.1'
+.*:11: Error: `v4fnmaddss' is not supported on `x86_64.noavx512f.avx10.1'
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
new file mode 100644
index 00000000000..86b8ac4a6e7
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
@@ -0,0 +1,11 @@
+# Check invalid AVX10.1 instructions
+
+	.text
+__start:
+	.arch .noavx512f
+	.arch .avx10.1
+	vp2intersectq	%xmm1, %xmm2, %k3
+	vgatherpf0dpd	123(%ebp,%ymm7,8){%k1}
+	vrcp28ss	%xmm4, %xmm5, %xmm6{%k7}
+	vp4dpwssd	(%ecx), %zmm4, %zmm1
+	v4fnmaddss	(%ecx), %xmm4, %xmm1
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1.d b/gas/testsuite/gas/i386/x86-64-avx10_1.d
new file mode 100644
index 00000000000..4225c2e2c58
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1.d
@@ -0,0 +1,54 @@
+#objdump: -dw
+#name: x86_64 AVX10.1 instructions
+#source: x86-64-avx10_1.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e1 ed 4a d9\s+kaddd  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c5 ed 4a d9\s+kaddb  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c5 ec 4a d9\s+kaddw  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c4 e1 ec 4a d9\s+kaddq  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*67 c5 f9 90 29\s+kmovb  \(%ecx\),%k5
+\s*[a-f0-9]+:\s*67 c5 f9 91 ac f4 c0 1d fe ff\s+kmovb  %k5,-0x1e240\(%esp,%esi,8\)
+\s*[a-f0-9]+:\s*67 c4 e1 f9 90 ac f4 c0 1d fe ff\s+kmovd  -0x1e240\(%esp,%esi,8\),%k5
+\s*[a-f0-9]+:\s*c5 fb 92 ed\s+kmovd  %ebp,%k5
+\s*[a-f0-9]+:\s*67 c5 f8 91 29\s+kmovw  %k5,\(%ecx\)
+\s*[a-f0-9]+:\s*c5 f8 93 ed\s+kmovw  %k5,%ebp
+\s*[a-f0-9]+:\s*62 f1 d5 0f 58 f4\s+vaddpd %xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 31\s+vaddpd \(%ecx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 30\s+vaddpd \(%eax\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 b2 00 08 00 00\s+vaddpd 0x800\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 b2 f0 f7 ff ff\s+vaddpd -0x810\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 b2 00 04 00 00\s+vaddpd 0x400\(%edx\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 b2 f8 fb ff ff\s+vaddpd -0x408\(%edx\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f1 d5 cf 58 f4\s+vaddpd %zmm4,%zmm5,%zmm6\{%k7\}\{z\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 b4 f4 c0 1d fe ff\s+vaddpd -0x1e240\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 4f 58 b2 00 20 00 00\s+vaddpd 0x2000\(%edx\),%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 72 80\s+vaddpd -0x1000\(%edx\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 3f 58 72 7f\s+vaddpd 0x3f8\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 5f 58 b2 00 f8 ff ff\s+vaddpd -0x800\(%edx\)\{1to8\},%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f3 d5 0f ce f4 ab\s+vgf2p8affineqb \$0xab,%xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 2f ce b4 f4 c0 1d fe ff 7b\s+vgf2p8affineqb \$0x7b,-0x1e240\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 3f ce 72 7f 7b\s+vgf2p8affineqb \$0x7b,0x3f8\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 0f cf 72 7f 7b\s+vgf2p8affineinvqb \$0x7b,0x7f0\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f3 d5 af cf f4 ab\s+vgf2p8affineinvqb \$0xab,%ymm4,%ymm5,%ymm6\{%k7\}\{z\}
+\s*[a-f0-9]+:\s*62 f2 55 4f cf f4\s+vgf2p8mulb %zmm4,%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f2 55 0f cf b4 f4 c0 1d fe ff\s+vgf2p8mulb -0x1e240\(%esp,%esi,8\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f2 55 4f cf b2 00 20 00 00\s+vgf2p8mulb 0x2000\(%edx\),%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 82 2d 20 dc f0\s+vaesenc %ymm24,%ymm26,%ymm22
+\s*[a-f0-9]+:\s*67 62 e2 05 08 de 84 f4 c0 1d fe ff\s+vaesdec -0x1e240\(%esp,%esi,8\),%xmm15,%xmm16
+\s*[a-f0-9]+:\s*62 02 2d 00 dd d8\s+vaesenclast %xmm24,%xmm26,%xmm27
+\s*[a-f0-9]+:\s*67 62 62 35 20 df 52 7f\s+vaesdeclast 0xfe0\(%edx\),%ymm25,%ymm26
+\s*[a-f0-9]+:\s*62 82 2d 40 de f0\s+vaesdec %zmm24,%zmm26,%zmm22
+\s*[a-f0-9]+:\s*67 62 62 2d 40 df 19\s+vaesdeclast \(%ecx\),%zmm26,%zmm27
+\s*[a-f0-9]+:\s*62 a3 4d 00 44 fe ab\s+vpclmulqdq \$0xab,%xmm22,%xmm22,%xmm23
+\s*[a-f0-9]+:\s*67 62 e3 4d 00 44 7a 7f 7b\s+vpclmulqdq \$0x7b,0x7f0\(%edx\),%xmm22,%xmm23
+\s*[a-f0-9]+:\s*67 62 73 7d 20 44 b4 f4 c0 1d fe ff 7b\s+vpclmulqdq \$0x7b,-0x1e240\(%esp,%esi,8\),%ymm16,%ymm14
+\s*[a-f0-9]+:\s*62 23 45 00 44 c6 11\s+vpclmulhqhqdq %xmm22,%xmm23,%xmm24
+\s*[a-f0-9]+:\s*62 c3 05 08 44 c6 10\s+vpclmullqhqdq %xmm14,%xmm15,%xmm16
+\s*[a-f0-9]+:\s*62 23 45 20 44 c6 01\s+vpclmulhqlqdq %ymm22,%ymm23,%ymm24
+\s*[a-f0-9]+:\s*62 c3 05 48 44 c6 00\s+vpclmullqlqdq %zmm14,%zmm15,%zmm16
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1.s b/gas/testsuite/gas/i386/x86-64-avx10_1.s
new file mode 100644
index 00000000000..926ea832cc7
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1.s
@@ -0,0 +1,51 @@
+# Check AVX10.1 instructions
+
+	.text
+_start:
+	.arch .noavx512f
+	.arch .avx10.1
+
+	kaddd	%k1, %k2, %k3
+	kaddb	%k1, %k2, %k3
+	kaddw	%k1, %k2, %k3
+	kaddq	%k1, %k2, %k3
+	kmovb   (%ecx), %k5
+	kmovb   %k5, -123456(%esp,%esi,8)
+	kmovd   -123456(%esp,%esi,8), %k5
+	kmovd   %ebp, %k5
+	kmovw   %k5, (%ecx)
+	kmovw   %k5, %ebp
+	vaddpd  %xmm4, %xmm5, %xmm6{%k7}
+	vaddpd  (%ecx), %xmm5, %xmm6{%k7}
+	vaddpd  (%eax){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  2048(%edx), %xmm5, %xmm6{%k7}
+	vaddpd  -2064(%edx), %xmm5, %xmm6{%k7}
+	vaddpd  1024(%edx){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  -1032(%edx){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  %zmm4, %zmm5, %zmm6{%k7}{z}
+	vaddpd  -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
+	vaddpd  8192(%edx), %zmm5, %zmm6{%k7}
+	vaddpd  -4096(%edx), %ymm5, %ymm6{%k7}
+	vaddpd  1016(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vaddpd  -2048(%edx){1to8}, %zmm5, %zmm6{%k7}
+	vgf2p8affineqb	$0xab, %xmm4, %xmm5, %xmm6{%k7}
+	vgf2p8affineqb	$123, -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
+	vgf2p8affineqb	$123, 1016(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vgf2p8affineinvqb	$123, 2032(%edx), %xmm5, %xmm6{%k7}
+	vgf2p8affineinvqb	$0xab, %ymm4, %ymm5, %ymm6{%k7}{z}
+	vgf2p8mulb	%zmm4, %zmm5, %zmm6{%k7}
+	vgf2p8mulb	-123456(%esp,%esi,8), %xmm5, %xmm6{%k7}
+	vgf2p8mulb	8192(%edx), %zmm5, %zmm6{%k7}
+	vaesenc	%ymm24, %ymm26, %ymm22
+	vaesdec	-123456(%esp,%esi,8), %xmm15, %xmm16
+	vaesenclast	%xmm24, %xmm26, %xmm27
+	vaesdeclast     4064(%edx), %ymm25, %ymm26
+	vaesdec		%zmm24, %zmm26, %zmm22
+	vaesdeclast	(%ecx), %zmm26, %zmm27
+	vpclmulqdq	$0xab, %xmm22, %xmm22, %xmm23
+	vpclmulqdq	$123, 2032(%edx), %xmm22, %xmm23
+	vpclmulqdq	$123, -123456(%esp,%esi,8), %ymm16, %ymm14
+	vpclmulhqhqdq	%xmm22, %xmm23, %xmm24
+	vpclmullqhqdq	%xmm14, %xmm15, %xmm16
+	vpclmulhqlqdq	%ymm22, %ymm23, %ymm24
+	vpclmullqlqdq	%zmm14, %zmm15, %zmm16
diff --git a/gas/testsuite/gas/i386/x86-64-mavx10maxvl256-inval.l b/gas/testsuite/gas/i386/x86-64-mavx10maxvl256-inval.l
new file mode 100644
index 00000000000..db8a301d7c3
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-mavx10maxvl256-inval.l
@@ -0,0 +1,18 @@
+.* Assembler messages:
+.*:4: Error: `kaddq' is not supported on `x86_64'
+.*:5: Error: `kandq' is not supported on `x86_64'
+.*:6: Error: `kandnq' is not supported on `x86_64'
+.*:7: Error: `kmovq' is not supported on `x86_64'
+.*:8: Error: `kmovq' is not supported on `x86_64'
+.*:9: Error: `kmovq' is not supported on `x86_64'
+.*:10: Error: `kmovq' is not supported on `x86_64'
+.*:11: Error: `knotq' is not supported on `x86_64'
+.*:12: Error: `korq' is not supported on `x86_64'
+.*:13: Error: `kortestq' is not supported on `x86_64'
+.*:14: Error: `kshiftlq' is not supported on `x86_64'
+.*:15: Error: `kshiftrq' is not supported on `x86_64'
+.*:16: Error: `ktestq' is not supported on `x86_64'
+.*:17: Error: `kunpckdq' is not supported on `x86_64'
+.*:18: Error: `kxnorq' is not supported on `x86_64'
+.*:19: Error: `kxorq' is not supported on `x86_64'
+.*:20: Error: bad register name `%zmm4'
diff --git a/gas/testsuite/gas/i386/x86-64-mavx10maxvl256-inval.s b/gas/testsuite/gas/i386/x86-64-mavx10maxvl256-inval.s
new file mode 100644
index 00000000000..7ed387115cb
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-mavx10maxvl256-inval.s
@@ -0,0 +1,20 @@
+# Check invalid AVX10.1 instructions
+	.text
+__start:
+	kaddq	%k1, %k2, %k3
+	kandq	%k1, %k2, %k3
+	kandnq  %k1, %k2, %k3
+	kmovq	(%rcx), %k1
+	kmovq	%k1, (%rcx)
+	kmovq	%rcx, %k1
+	kmovq	%k1, %rcx
+	knotq	%k1, %k2
+	korq	%k1, %k2, %k3
+	kortestq	%k1, %k2
+	kshiftlq	$1, %k1, %k2
+	kshiftrq	$1, %k1, %k2
+	ktestq	%k1, %k2
+	kunpckdq	%k1, %k2, %k3
+	kxnorq	%k1, %k2, %k3
+	kxorq	%k1, %k2, %k3
+	vaddpd	%zmm4, %zmm5, %zmm6
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 52711cdcf6f..b3155275759 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -450,6 +450,9 @@ run_dump_test "x86-64-sm4"
 run_dump_test "x86-64-sm4-intel"
 run_dump_test "x86-64-pbndkb"
 run_dump_test "x86-64-pbndkb-intel"
+run_dump_test "x86-64-avx10_1"
+run_list_test "x86-64-avx10_1-inval"
+run_list_test "x86-64-mavx10maxvl256-inval" "-mavx10maxvl=256"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
-- 
2.31.1



* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-23  2:20               ` Jiang, Haochen
  2023-08-23  3:34                 ` [RFC][PATCH v3] " Haochen Jiang
@ 2023-08-23  5:54                 ` Jan Beulich
  2023-08-23  6:21                   ` Jiang, Haochen
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-08-23  5:54 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: hjl.tools, binutils

On 23.08.2023 04:20, Jiang, Haochen wrote:
> 
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, August 18, 2023 9:03 PM
>> To: Jiang, Haochen <haochen.jiang@intel.com>
>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>
>> On 14.08.2023 08:45, Haochen Jiang wrote:
>>> @@ -1315,6 +1321,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
>>>    ident = mkident (name);
>>>    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
>>>  	   ident, 2 * (int)length, opcode, end, i);
>>> +
>>> +  j = strlen(ident);
>>> +  /* All AVX512F based instructions are usable for AVX10.1 except
>>> +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
>>> +  if (strstr (cpu_flags, "AVX512")
>>> +      && !strstr (cpu_flags, "AVX512PF")
>>> +      && !strstr (cpu_flags, "AVX512ER")
>>> +      && !strstr (cpu_flags, "4FMAPS")
>>> +      && !strstr (cpu_flags, "4VNNIW")
>>> +      && !strstr (cpu_flags, "VP2INTERSECT"))
>>> +    {
>>> +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
>>> +      k = 1;
>>> +    }
>>>    free (ident);
>>
>> While making a patch myself along the lines of what I had outlined, I came to
>> realize that the above isn't enough. (I'm pretty sure I wouldn't have spotted
>> this by merely reviewing your patch.) This may be a result of the spec being
>> somewhat ambiguous when it comes to GFNI, VAES, and VPCLMULQDQ.
>> There's a note there saying something about the respective EVEX encodings.
>> But that still requires the VEX encodings connected to these three features to
>> also become suitably available. While this works fine for GFNI, it doesn't for
>> the other two: The 128-bit VEX encodings, which surely are available when the
>> 256-bit ones are, would become impossible to use. The assembler would pick
>> the (larger) EVEX forms instead. There are two ways to solve this that I can see
>> right away:
>> 1) AES becomes a dependency of VAES (and PCLMULQDQ one of
>> VPCLMULQDQ)
>> 2) We put in place extra templates.
>> I'm wary of the first option as long as not at least informally supported by you
>> (Intel). Hence I went with option 2 for now.
>>
>> I'm only done with the /512 patch, so I won't post right away. I'm still
>> debating with myself whether to control maximum vector length via a new
>> directive, or via a special form of .arch.
> 
> Do you think a command line option like -mavx10maxvl=256/512 with default 512
> is ok for this scenario? I am working to revise the AVX10.1 patch like that.

That's certainly an option, but right now I have different plans.

Jan
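
Note for readers of the archive: the name-based filter in the hunk quoted above can be tried in isolation. What follows is only a minimal sketch, not code from the patch. `avx10_1_eligible` is a hypothetical helper, and the flag strings used with it are illustrative rather than copied from i386-opc.tbl. It mirrors the strstr chain and shows that a flag string without the "AVX512" substring (for example a plain VAES one) is left untouched, which appears to be the kind of gap the reply describes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-alone version of the check in the quoted hunk:
   a template qualifies for AVX10_1 when its CPU flag string mentions
   "AVX512" and none of the excluded sub-features.  */
bool
avx10_1_eligible (const char *cpu_flags)
{
  static const char *const excluded[] = {
    "AVX512PF", "AVX512ER", "4FMAPS", "4VNNIW", "VP2INTERSECT"
  };
  size_t i;

  if (!strstr (cpu_flags, "AVX512"))
    return false;

  for (i = 0; i < sizeof (excluded) / sizeof (excluded[0]); i++)
    if (strstr (cpu_flags, excluded[i]))
      return false;

  return true;
}
```

With illustrative inputs, "AVX512F|AVX512VL" qualifies, "AVX512ER" does not, and "VAES" on its own does not either, since the substring test never sees "AVX512" there.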


* RE: [PATCH v2] Support Intel AVX10.1
  2023-08-23  5:54                 ` [PATCH v2] " Jan Beulich
@ 2023-08-23  6:21                   ` Jiang, Haochen
  2023-08-23  6:24                     ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Jiang, Haochen @ 2023-08-23  6:21 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, binutils

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, August 23, 2023 1:54 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH v2] Support Intel AVX10.1
> 
> On 23.08.2023 04:20, Jiang, Haochen wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Friday, August 18, 2023 9:03 PM
> >> To: Jiang, Haochen <haochen.jiang@intel.com>
> >> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> >> Subject: Re: [PATCH v2] Support Intel AVX10.1
> >>
> >> On 14.08.2023 08:45, Haochen Jiang wrote:
> >>> @@ -1315,6 +1321,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
> >>>    ident = mkident (name);
> >>>    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
> >>>  	   ident, 2 * (int)length, opcode, end, i);
> >>> +
> >>> +  j = strlen(ident);
> >>> +  /* All AVX512F based instructions are usable for AVX10.1 except
> >>> +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
> >>> +  if (strstr (cpu_flags, "AVX512")
> >>> +      && !strstr (cpu_flags, "AVX512PF")
> >>> +      && !strstr (cpu_flags, "AVX512ER")
> >>> +      && !strstr (cpu_flags, "4FMAPS")
> >>> +      && !strstr (cpu_flags, "4VNNIW")
> >>> +      && !strstr (cpu_flags, "VP2INTERSECT"))
> >>> +    {
> >>> +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
> >>> +      k = 1;
> >>> +    }
> >>>    free (ident);
> >>
> >> While making a patch myself along the lines of what I had outlined, I
> >> came to realize that the above isn't enough. (I'm pretty sure I
> >> wouldn't have spotted this by merely reviewing your patch.) This may
> >> be a result of the spec being somewhat ambiguous when it comes to GFNI,
> >> VAES, and VPCLMULQDQ.
> >> There's a note there saying something about the respective EVEX
> >> encodings.
> >> But that still requires the VEX encodings connected to these three
> >> features to also become suitably available. While this works fine for
> >> GFNI, it doesn't for the other two: The 128-bit VEX encodings, which
> >> surely are available when the 256-bit ones are, would become
> >> impossible to use. The assembler would pick the (larger) EVEX forms
> >> instead. There are two ways to solve this that I can see right away:
> >> 1) AES becomes a dependency of VAES (and PCLMULQDQ one of
> >> VPCLMULQDQ)
> >> 2) We put in place extra templates.
> >> I'm wary of the first option as long as not at least informally
> >> supported by you (Intel). Hence I went with option 2 for now.
> >>
> >> I'm only done with the /512 patch, so I won't post right away. I'm
> >> still debating with myself whether to control maximum vector length
> >> via a new directive, or via a special form of .arch.
> >
> > Do you think a command line option like -mavx10maxvl=256/512 with
> > default 512 is ok for this scenario? I am working to revise the AVX10.1 patch
> > like that.
> 
> That's certainly an option, but right now I have different plans.

Actually all the three options are ok for me, they should not be that complex
based on the current part of v3 patch setting/clearing AVX512 bit for AVX10.1.

Thx,
Haochen

> 
> Jan


* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-23  6:21                   ` Jiang, Haochen
@ 2023-08-23  6:24                     ` Jan Beulich
  2023-08-23  6:25                       ` Jiang, Haochen
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2023-08-23  6:24 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: hjl.tools, binutils

On 23.08.2023 08:21, Jiang, Haochen wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, August 23, 2023 1:54 PM
>> To: Jiang, Haochen <haochen.jiang@intel.com>
>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>
>> On 23.08.2023 04:20, Jiang, Haochen wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Friday, August 18, 2023 9:03 PM
>>>> To: Jiang, Haochen <haochen.jiang@intel.com>
>>>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>>>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>>>
>>>> On 14.08.2023 08:45, Haochen Jiang wrote:
>>>>> @@ -1315,6 +1321,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
>>>>>    ident = mkident (name);
>>>>>    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
>>>>>  	   ident, 2 * (int)length, opcode, end, i);
>>>>> +
>>>>> +  j = strlen(ident);
>>>>> +  /* All AVX512F based instructions are usable for AVX10.1 except
>>>>> +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
>>>>> +  if (strstr (cpu_flags, "AVX512")
>>>>> +      && !strstr (cpu_flags, "AVX512PF")
>>>>> +      && !strstr (cpu_flags, "AVX512ER")
>>>>> +      && !strstr (cpu_flags, "4FMAPS")
>>>>> +      && !strstr (cpu_flags, "4VNNIW")
>>>>> +      && !strstr (cpu_flags, "VP2INTERSECT"))
>>>>> +    {
>>>>> +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
>>>>> +      k = 1;
>>>>> +    }
>>>>>    free (ident);
>>>>
>>>> While making a patch myself along the lines of what I had outlined, I
>>>> came to realize that the above isn't enough. (I'm pretty sure I
>>>> wouldn't have spotted this by merely reviewing your patch.) This may
>>>> be a result of the spec being somewhat ambiguous when it comes to GFNI,
>>>> VAES, and VPCLMULQDQ.
>>>> There's a note there saying something about the respective EVEX
>>>> encodings.
>>>> But that still requires the VEX encodings connected to these three
>>>> features to also become suitably available. While this works fine for
>>>> GFNI, it doesn't for the other two: The 128-bit VEX encodings, which
>>>> surely are available when the 256-bit ones are, would become
>>>> impossible to use. The assembler would pick the (larger) EVEX forms
>>>> instead. There are two ways to solve this that I can see right away:
>>>> 1) AES becomes a dependency of VAES (and PCLMULQDQ one of
>>>> VPCLMULQDQ)
>>>> 2) We put in place extra templates.
>>>> I'm wary of the first option as long as not at least informally
>>>> supported by you (Intel). Hence I went with option 2 for now.
>>>>
>>>> I'm only done with the /512 patch, so I won't post right away. I'm
>>>> still debating with myself whether to control maximum vector length
>>>> via a new directive, or via a special form of .arch.
>>>
>>> Do you think a command line option like -mavx10maxvl=256/512 with
>>> default 512 is ok for this scenario? I am working to revise the AVX10.1 patch
>>> like that.
>>
>> That's certainly an option, but right now I have different plans.
> 
> Actually all the three options are ok for me, they should not be that complex
> based on the current part of v3 patch setting/clearing AVX512 bit for AVX10.1.

Mind me asking what "all the three options" you're referring to here?

Jan


* RE: [PATCH v2] Support Intel AVX10.1
  2023-08-23  6:24                     ` Jan Beulich
@ 2023-08-23  6:25                       ` Jiang, Haochen
  2023-08-23  6:39                         ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Jiang, Haochen @ 2023-08-23  6:25 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, binutils

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, August 23, 2023 2:24 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH v2] Support Intel AVX10.1
> 
> On 23.08.2023 08:21, Jiang, Haochen wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Wednesday, August 23, 2023 1:54 PM
> >> To: Jiang, Haochen <haochen.jiang@intel.com>
> >> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> >> Subject: Re: [PATCH v2] Support Intel AVX10.1
> >>
> >> On 23.08.2023 04:20, Jiang, Haochen wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: Friday, August 18, 2023 9:03 PM
> >>>> To: Jiang, Haochen <haochen.jiang@intel.com>
> >>>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> >>>> Subject: Re: [PATCH v2] Support Intel AVX10.1
> >>>>
> >>>> On 14.08.2023 08:45, Haochen Jiang wrote:
> >>>>> @@ -1315,6 +1321,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
> >>>>>    ident = mkident (name);
> >>>>>    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
> >>>>>  	   ident, 2 * (int)length, opcode, end, i);
> >>>>> +
> >>>>> +  j = strlen(ident);
> >>>>> +  /* All AVX512F based instructions are usable for AVX10.1 except
> >>>>> +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
> >>>>> +  if (strstr (cpu_flags, "AVX512")
> >>>>> +      && !strstr (cpu_flags, "AVX512PF")
> >>>>> +      && !strstr (cpu_flags, "AVX512ER")
> >>>>> +      && !strstr (cpu_flags, "4FMAPS")
> >>>>> +      && !strstr (cpu_flags, "4VNNIW")
> >>>>> +      && !strstr (cpu_flags, "VP2INTERSECT"))
> >>>>> +    {
> >>>>> +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
> >>>>> +      k = 1;
> >>>>> +    }
> >>>>>    free (ident);
> >>>>
> >>>> While making a patch myself along the lines of what I had outlined,
> >>>> I came to realize that the above isn't enough. (I'm pretty sure I
> >>>> wouldn't have spotted this by merely reviewing your patch.) This
> >>>> may be a result of the spec being somewhat ambiguous when it comes
> >>>> to GFNI, VAES, and VPCLMULQDQ.
> >>>> There's a note there saying something about the respective EVEX
> >>>> encodings.
> >>>> But that still requires the VEX encodings connected to these three
> >>>> features to also become suitably available. While this works fine
> >>>> for GFNI, it doesn't for the other two: The 128-bit VEX encodings,
> >>>> which surely are available when the 256-bit ones are, would become
> >>>> impossible to use. The assembler would pick the (larger) EVEX forms
> >>>> instead. There are two ways to solve this that I can see right away:
> >>>> 1) AES becomes a dependency of VAES (and PCLMULQDQ one of
> >>>> VPCLMULQDQ)
> >>>> 2) We put in place extra templates.
> >>>> I'm wary of the first option as long as not at least informally
> >>>> supported by you (Intel). Hence I went with option 2 for now.
> >>>>
> >>>> I'm only done with the /512 patch, so I won't post right away. I'm
> >>>> still debating with myself whether to control maximum vector length
> >>>> via a new directive, or via a special form of .arch.
> >>>
> >>> Do you think a command line option like -mavx10maxvl=256/512 with
> >>> default 512 is ok for this scenario? I am working to revise the
> >>> AVX10.1 patch like that.
> >>
> >> That's certainly an option, but right now I have different plans.
> >
> > Actually all the three options are ok for me, they should not be that
> > complex based on the current part of v3 patch setting/clearing AVX512 bit
> > for AVX10.1.
> 
> Mind me asking what "all the three options" you're referring to here?

A new directive, a special form of .arch or the command line option.

Thx,
Haochen

> 
> Jan


* Re: [PATCH v2] Support Intel AVX10.1
  2023-08-23  6:25                       ` Jiang, Haochen
@ 2023-08-23  6:39                         ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2023-08-23  6:39 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: hjl.tools, binutils

On 23.08.2023 08:25, Jiang, Haochen wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, August 23, 2023 2:24 PM
>> To: Jiang, Haochen <haochen.jiang@intel.com>
>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>
>> On 23.08.2023 08:21, Jiang, Haochen wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Wednesday, August 23, 2023 1:54 PM
>>>> To: Jiang, Haochen <haochen.jiang@intel.com>
>>>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>>>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>>>
>>>> On 23.08.2023 04:20, Jiang, Haochen wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>> Sent: Friday, August 18, 2023 9:03 PM
>>>>>> To: Jiang, Haochen <haochen.jiang@intel.com>
>>>>>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>>>>>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>>>>>
>>>>>> On 14.08.2023 08:45, Haochen Jiang wrote:
>>>>>>> @@ -1315,6 +1321,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
>>>>>>>    ident = mkident (name);
>>>>>>>    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
>>>>>>>  	   ident, 2 * (int)length, opcode, end, i);
>>>>>>> +
>>>>>>> +  j = strlen(ident);
>>>>>>> +  /* All AVX512F based instructions are usable for AVX10.1 except
>>>>>>> +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
>>>>>>> +  if (strstr (cpu_flags, "AVX512")
>>>>>>> +      && !strstr (cpu_flags, "AVX512PF")
>>>>>>> +      && !strstr (cpu_flags, "AVX512ER")
>>>>>>> +      && !strstr (cpu_flags, "4FMAPS")
>>>>>>> +      && !strstr (cpu_flags, "4VNNIW")
>>>>>>> +      && !strstr (cpu_flags, "VP2INTERSECT"))
>>>>>>> +    {
>>>>>>> +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
>>>>>>> +      k = 1;
>>>>>>> +    }
>>>>>>>    free (ident);
>>>>>>
>>>>>> While making a patch myself along the lines of what I had outlined,
>>>>>> I came to realize that the above isn't enough. (I'm pretty sure I
>>>>>> wouldn't have spotted this by merely reviewing your patch.) This
>>>>>> may be a result of the spec being somewhat ambiguous when it comes
>>>>>> to GFNI, VAES, and VPCLMULQDQ.
>>>>>> There's a note there saying something about the respective EVEX
>>>>>> encodings.
>>>>>> But that still requires the VEX encodings connected to these three
>>>>>> features to also become suitably available. While this works fine
>>>>>> for GFNI, it doesn't for the other two: The 128-bit VEX encodings,
>>>>>> which surely are available when the 256-bit ones are, would become
>>>>>> impossible to use. The assembler would pick the (larger) EVEX forms
>>>>>> instead. There are two ways to solve this that I can see right away:
>>>>>> 1) AES becomes a dependency of VAES (and PCLMULQDQ one of
>>>>>> VPCLMULQDQ)
>>>>>> 2) We put in place extra templates.
>>>>>> I'm wary of the first option as long as not at least informally
>>>>>> supported by you (Intel). Hence I went with option 2 for now.
>>>>>>
>>>>>> I'm only done with the /512 patch, so I won't post right away. I'm
>>>>>> still debating with myself whether to control maximum vector length
>>>>>> via a new directive, or via a special form of .arch.
>>>>>
>>>>> Do you think a command line option like -mavx10maxvl=256/512 with
>>>>> default 512 is ok for this scenario? I am working to revise the
>>>>> AVX10.1 patch like that.
>>>>
>>>> That's certainly an option, but right now I have different plans.
>>>
>>> Actually all the three options are ok for me, they should not be that
>>> complex based on the current part of v3 patch setting/clearing AVX512 bit
>>> for AVX10.1.
>>
>> Mind me asking what "all the three options" you're referring to here?
> 
> A new directive, a special form of .arch or the command line option.

Oh, I see. Whatever we choose, it'll need to come in both command line
and directive form, I think. And then both want to be sufficiently
similar. As mentioned, I have a firm plan now, but of course I need to
see whether it ends up looking sensible once actually carried out.

Jan


* Re: [RFC][PATCH v3] Support Intel AVX10.1
  2023-08-23  3:34                 ` [RFC][PATCH v3] " Haochen Jiang
@ 2023-08-23  7:17                   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2023-08-23  7:17 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, binutils

On 23.08.2023 05:34, Haochen Jiang wrote:
> In this version of the patch, I tried a command-line option named
> -mavx10maxvl= to limit the maximum vector size. If we go with this
> method, the option name could still be changed.
> 
> For the kmovq promoted to EVEX128 in APX, I checked the encoding and we
> could just change the condition in cpu_flags_match from
> t->opcode_modifier.evex to
> t->opcode_modifier.evex != 0 && t->opcode_modifier.evex != 2 to make
> everything work.

I'm afraid this (including the special casing you do there right now) is
prone to cause problems later on. I also think testsuite coverage needs
to be broader. Please allow me to finish my alternative attempt.

Jan
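
Note for readers of the archive: the condition change proposed in the quoted text can be written out as two small predicates. This is a sketch under stated assumptions only. The enum below is a hypothetical stand-in for the `evex` opcode-modifier field, the message merely suggests that the value 2 stands for EVEX128, and what the condition gates inside cpu_flags_match is not shown in this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the evex opcode-modifier values.  Only
   "0 means not EVEX" and "2 is the EVEX128 case" are inferred from
   the message; the names themselves are assumptions.  */
enum evex_kind
{
  EVEX_KIND_NONE = 0,
  EVEX_KIND_OTHER = 1,
  EVEX_KIND_128 = 2
};

/* Current form of the condition: true for any EVEX template.  */
bool
evex_cond_current (enum evex_kind evex)
{
  return evex != EVEX_KIND_NONE;
}

/* Proposed form: true for EVEX templates except the EVEX128 ones.  */
bool
evex_cond_proposed (enum evex_kind evex)
{
  return evex != EVEX_KIND_NONE && evex != EVEX_KIND_128;
}
```

The two predicates differ only for the EVEX128 value, which is the special casing the reply above is wary of.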


end of thread, other threads:[~2023-08-23  7:17 UTC | newest]

Thread overview: 28+ messages
2023-07-27  7:15 [PATCH] Support Intel AVX10.1 Haochen Jiang
2023-07-27 11:23 ` Jan Beulich
2023-07-28  2:50   ` Jiang, Haochen
2023-07-28  6:53 ` Jan Beulich
2023-08-01  2:18   ` Jiang, Haochen
2023-08-01  6:49     ` Jan Beulich
2023-08-04  7:45       ` Jiang, Haochen
2023-08-04  7:57         ` Jan Beulich
2023-08-14  6:45           ` [PATCH v2] " Haochen Jiang
2023-08-14  8:19             ` Jan Beulich
2023-08-14  8:46               ` Jiang, Haochen
2023-08-14 10:33                 ` Jan Beulich
2023-08-14 10:35                   ` Jan Beulich
2023-08-15  8:32                   ` Jiang, Haochen
2023-08-15 14:10                     ` Jan Beulich
2023-08-16  8:21                       ` Jiang, Haochen
2023-08-16  8:59                         ` Jan Beulich
2023-08-17  9:08                           ` Jan Beulich
2023-08-18  6:53                             ` Jan Beulich
2023-08-18 13:03             ` Jan Beulich
2023-08-23  2:20               ` Jiang, Haochen
2023-08-23  3:34                 ` [RFC][PATCH v3] " Haochen Jiang
2023-08-23  7:17                   ` Jan Beulich
2023-08-23  5:54                 ` [PATCH v2] " Jan Beulich
2023-08-23  6:21                   ` Jiang, Haochen
2023-08-23  6:24                     ` Jan Beulich
2023-08-23  6:25                       ` Jiang, Haochen
2023-08-23  6:39                         ` Jan Beulich
