public inbox for binutils@sourceware.org
* [PATCH 0/5] Support Intel Arrow Lake/Lunar Lake ISAs
@ 2023-07-13  6:32 Haochen Jiang
  2023-07-13  6:32 ` [PATCH 1/5] Support Intel AVX-VNNI-INT16 Haochen Jiang
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Haochen Jiang @ 2023-07-13  6:32 UTC
  To: binutils; +Cc: jbeulich, hjl.tools, amodra

Hi all,

These five patches add support for the Intel Arrow Lake/Lunar Lake
instructions, including AVX-VNNI-INT16, SHA512, SM3, SM4 and
PBNDKB.

The information is based on the newly released
Intel Architecture Instruction Set Extensions and Future Features
document, which is available at:
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Regtested on x86_64-pc-linux-gnu. As with our previous new ISA
patches, the tests might fail on x86_64-w64-mingw32, but I have no
such machine to test on. Alan, could you have a look at the testcases
and see whether there are any potential problems?


BRs,
Haochen




* [PATCH 1/5] Support Intel AVX-VNNI-INT16
  2023-07-13  6:32 [PATCH 0/5] Support Intel Arrow Lake/Lunar Lake ISAs Haochen Jiang
@ 2023-07-13  6:32 ` Haochen Jiang
  2023-07-13  9:29   ` Jan Beulich
  2023-07-13  6:33 ` [PATCH 2/5] Support Intel SHA512 Haochen Jiang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Haochen Jiang @ 2023-07-13  6:32 UTC
  To: binutils; +Cc: jbeulich, hjl.tools, amodra, konglin1

From: konglin1 <lingling.kong@intel.com>

gas/ChangeLog:

	* NEWS: Support Intel AVX-VNNI-INT16.
	* config/tc-i386.c: Add avx_vnni_int16.
	* doc/c-i386.texi: Document avx_vnni_int16.
	* testsuite/gas/i386/i386.exp: Run AVX VNNI INT16 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/avx-vnni-int16-intel.d: New test.
	* testsuite/gas/i386/avx-vnni-int16.d: New test.
	* testsuite/gas/i386/avx-vnni-int16.s: New test.
	* testsuite/gas/i386/x86-64-avx-vnni-int16-intel.d: New test.
	* testsuite/gas/i386/x86-64-avx-vnni-int16.d: New test.
	* testsuite/gas/i386/x86-64-avx-vnni-int16.s: New test.

opcodes/ChangeLog:

	* i386-dis.c (PREFIX_VEX_0F38D2): New.
	(PREFIX_VEX_0F38D3): Ditto.
	(VEX_W_0F38D2_P_0): Ditto.
	(VEX_W_0F38D2_P_1): Ditto.
	(VEX_W_0F38D2_P_2): Ditto.
	(VEX_W_0F38D3_P_0): Ditto.
	(VEX_W_0F38D3_P_1): Ditto.
	(VEX_W_0F38D3_P_2): Ditto.
	(prefix_table): Add PREFIX_VEX_0F38D2 and PREFIX_VEX_0F38D3.
	(vex_table): Add PREFIX_VEX_0F38D2 and PREFIX_VEX_0F38D3,
	delete VEX_W_0F38D2 and VEX_W_0F38D3.
	(vex_w_table): Add VEX_W_0F38D2_P_0, VEX_W_0F38D2_P_1, VEX_W_0F38D2_P_2,
	VEX_W_0F38D3_P_0, VEX_W_0F38D3_P_1, VEX_W_0F38D3_P_2.
	* i386-gen.c (isa_dependencies): Add AVX_VNNI_INT16.
	(cpu_flag): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuAVX_VNNI_INT16): New.
	* i386-opc.tbl: Add Intel AVX_VNNI_INT16 instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                      |     2 +
 gas/config/tc-i386.c                          |     1 +
 gas/doc/c-i386.texi                           |     5 +-
 gas/testsuite/gas/i386/avx-vnni-int16-intel.d |   130 +
 gas/testsuite/gas/i386/avx-vnni-int16.d       |   130 +
 gas/testsuite/gas/i386/avx-vnni-int16.s       |   127 +
 gas/testsuite/gas/i386/i386.exp               |     2 +
 .../gas/i386/x86-64-avx-vnni-int16-intel.d    |   130 +
 .../gas/i386/x86-64-avx-vnni-int16.d          |   130 +
 .../gas/i386/x86-64-avx-vnni-int16.s          |   127 +
 gas/testsuite/gas/i386/x86-64.exp             |     2 +
 opcodes/i386-dis.c                            |    32 +-
 opcodes/i386-gen.c                            |     3 +
 opcodes/i386-init.h                           |   822 +-
 opcodes/i386-mnem.h                           |  2678 ++--
 opcodes/i386-opc.h                            |     3 +
 opcodes/i386-opc.tbl                          |    11 +
 opcodes/i386-tbl.h                            | 11767 +++++++++++-----
 18 files changed, 10570 insertions(+), 5532 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/avx-vnni-int16-intel.d
 create mode 100644 gas/testsuite/gas/i386/avx-vnni-int16.d
 create mode 100644 gas/testsuite/gas/i386/avx-vnni-int16.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx-vnni-int16-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx-vnni-int16.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx-vnni-int16.s
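
Note for readers of the new tests: the six mnemonics share opcode bytes
d2 and d3 in the VEX 0F38 map and appear to be told apart only by the
VEX.pp field, which is presumably why the ChangeLog adds
PREFIX_VEX_0F38D2 and PREFIX_VEX_0F38D3 entries. The Python sketch below
is not part of the patch. It decodes only the register-register
encodings listed in the .d files, and the (opcode, pp) to mnemonic table
is read off those expected outputs. The name decode_vex3 is
hypothetical, and the sketch assumes a 3-byte VEX prefix with no
register extension bits and W=0, as in the 32-bit tests.

```python
# Illustrative only: decodes the 5-byte register-register encodings
# that appear in avx-vnni-int16.d. Not a general x86 decoder.

# Keyed by (opcode byte, VEX.pp); taken from the expected disassembly
# in this patch's tests. pp: 0 = no prefix, 1 = 66, 2 = F3.
MNEMONICS = {
    (0xD2, 2): "vpdpwsud",
    (0xD3, 2): "vpdpwsuds",
    (0xD2, 1): "vpdpwusd",
    (0xD3, 1): "vpdpwusds",
    (0xD2, 0): "vpdpwuud",
    (0xD3, 0): "vpdpwuuds",
}


def decode_vex3(encoding):
    """Decode 'c4 xx xx op modrm' into Intel-syntax text."""
    b = bytes.fromhex(encoding)
    if len(b) != 5 or b[0] != 0xC4:
        raise ValueError("expected a 5-byte encoding starting with c4")

    # Byte 1: inverted R, X, B bits (all set here) and the opcode map.
    if b[1] >> 5 != 0b111:
        raise ValueError("register extension bits are not handled")
    if b[1] & 0x1F != 2:
        raise ValueError("expected opcode map 0F38 (mmmmm = 2)")

    # Byte 2: W, inverted vvvv (second source register), L, pp.
    if b[2] >> 7 != 0:
        raise ValueError("expected VEX.W = 0")
    vvvv = ~(b[2] >> 3) & 0xF
    vector_256 = (b[2] >> 2) & 1
    pp = b[2] & 3

    mnemonic = MNEMONICS.get((b[3], pp))
    if mnemonic is None:
        raise ValueError("opcode/pp pair not covered by this sketch")

    # ModRM: mod must be 3 (register operand) for these forms.
    modrm = b[4]
    if modrm >> 6 != 3:
        raise ValueError("memory operands are not handled")
    reg = (modrm >> 3) & 7
    rm = modrm & 7

    r = "ymm" if vector_256 else "xmm"
    return f"{mnemonic} {r}{reg},{r}{vvvv},{r}{rm}"


if __name__ == "__main__":
    for enc in ("c4 e2 56 d2 f4", "c4 e2 51 d3 f4", "c4 e2 50 d2 f4"):
        print(enc, "->", decode_vex3(enc))
```

For example, byte 0x56 gives L=1 and pp=2, so c4 e2 56 d2 f4 is the
256-bit vpdpwsud form, while 0x51 gives L=0 and pp=1, which selects
vpdpwusd/vpdpwusds on xmm registers.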

diff --git a/gas/NEWS b/gas/NEWS
index 59bdd30aaaa..5e9ed5ab4bc 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel AVX-VNNI-INT16 instructions.
+
 Changes in 2.41:
 
 * Add support for Intel FRED instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index bc02f8e0abf..0d3d7560efe 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1151,6 +1151,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (rmpquery, RMPQUERY, ANY_RMPQUERY, false),
   SUBARCH (fred, FRED, ANY_FRED, false),
   SUBARCH (lkgs, LKGS, ANY_LKGS, false),
+  SUBARCH (avx_vnni_int16, AVX_VNNI_INT16, ANY_AVX_VNNI_INT16, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 49b6e3b1abb..40ba942d9cb 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -207,6 +207,7 @@ accept various extension mnemonics.  For example,
 @code{rao_int},
 @code{fred},
 @code{lkgs},
+@code{avx_vnni_int16},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1635,8 +1636,8 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.clwb} @tab @samp{.rdpid} @tab @samp{.ptwrite} @tab @samp{.ibt}
 @item @samp{.prefetchi} @tab @samp{.avx_ifma} @tab @samp{.avx_vnni_int8}
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
-@item @samp{.avx_ne_convert} @tab @samp{.rao_int}
-@item @samp{.fred} @tab @samp{.lkgs}
+@item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
+@item @samp{.avx_vnni_int16}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/avx-vnni-int16-intel.d b/gas/testsuite/gas/i386/avx-vnni-int16-intel.d
new file mode 100644
index 00000000000..649e89fd4a4
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx-vnni-int16-intel.d
@@ -0,0 +1,130 @@
+#as:
+#objdump: -dw -Mintel
+#name: i386 AVX-VNNI-INT16 insns (Intel disassembly)
+#source: avx-vnni-int16.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 d2 f4\s+vpdpwsud ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 d2 f4\s+vpdpwsud xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b4 f4 00 00 00 10\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 31\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b1 e0 0f 00 00\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b2 00 f0 ff ff\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b4 f4 00 00 00 10\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 31\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b1 f0 07 00 00\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b2 00 f8 ff ff\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 f4\s+vpdpwsuds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 d3 f4\s+vpdpwsuds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b4 f4 00 00 00 10\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 31\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b1 e0 0f 00 00\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b2 00 f0 ff ff\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b4 f4 00 00 00 10\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 31\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b1 f0 07 00 00\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b2 00 f8 ff ff\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 f4\s+vpdpwusd ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 51 d2 f4\s+vpdpwusd xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b4 f4 00 00 00 10\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 31\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b1 e0 0f 00 00\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b2 00 f0 ff ff\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b4 f4 00 00 00 10\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 31\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b1 f0 07 00 00\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b2 00 f8 ff ff\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 f4\s+vpdpwusds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 51 d3 f4\s+vpdpwusds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b4 f4 00 00 00 10\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 31\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b1 e0 0f 00 00\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b2 00 f0 ff ff\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b4 f4 00 00 00 10\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 31\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b1 f0 07 00 00\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b2 00 f8 ff ff\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 f4\s+vpdpwuud ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 50 d2 f4\s+vpdpwuud xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b4 f4 00 00 00 10\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 31\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b1 e0 0f 00 00\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b2 00 f0 ff ff\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b4 f4 00 00 00 10\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 31\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b1 f0 07 00 00\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b2 00 f8 ff ff\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 f4\s+vpdpwuuds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 50 d3 f4\s+vpdpwuuds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b4 f4 00 00 00 10\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 31\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b1 e0 0f 00 00\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b2 00 f0 ff ff\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b4 f4 00 00 00 10\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 31\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b1 f0 07 00 00\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b2 00 f8 ff ff\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 f4\s+vpdpwsud ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 d2 f4\s+vpdpwsud xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b4 f4 00 00 00 10\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 31\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b1 e0 0f 00 00\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b2 00 f0 ff ff\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b4 f4 00 00 00 10\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 31\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b1 f0 07 00 00\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b2 00 f8 ff ff\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 f4\s+vpdpwsuds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 d3 f4\s+vpdpwsuds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b4 f4 00 00 00 10\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 31\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b1 e0 0f 00 00\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b2 00 f0 ff ff\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b4 f4 00 00 00 10\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 31\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b1 f0 07 00 00\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b2 00 f8 ff ff\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 f4\s+vpdpwusd ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 51 d2 f4\s+vpdpwusd xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b4 f4 00 00 00 10\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 31\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b1 e0 0f 00 00\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b2 00 f0 ff ff\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b4 f4 00 00 00 10\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 31\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b1 f0 07 00 00\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b2 00 f8 ff ff\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 f4\s+vpdpwusds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 51 d3 f4\s+vpdpwusds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b4 f4 00 00 00 10\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 31\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b1 e0 0f 00 00\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b2 00 f0 ff ff\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b4 f4 00 00 00 10\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 31\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b1 f0 07 00 00\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b2 00 f8 ff ff\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 f4\s+vpdpwuud ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 50 d2 f4\s+vpdpwuud xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b4 f4 00 00 00 10\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 31\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b1 e0 0f 00 00\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b2 00 f0 ff ff\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b4 f4 00 00 00 10\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 31\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b1 f0 07 00 00\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b2 00 f8 ff ff\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 f4\s+vpdpwuuds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 50 d3 f4\s+vpdpwuuds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b4 f4 00 00 00 10\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 31\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b1 e0 0f 00 00\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b2 00 f0 ff ff\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b4 f4 00 00 00 10\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 31\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b1 f0 07 00 00\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b2 00 f8 ff ff\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
diff --git a/gas/testsuite/gas/i386/avx-vnni-int16.d b/gas/testsuite/gas/i386/avx-vnni-int16.d
new file mode 100644
index 00000000000..01a7ad29e53
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx-vnni-int16.d
@@ -0,0 +1,130 @@
+#as:
+#objdump: -dw
+#name: i386 AVX-VNNI-INT16 insns
+#source: avx-vnni-int16.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 d2 f4\s+vpdpwsud %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 f4\s+vpdpwsud %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b4 f4 00 00 00 10\s+vpdpwsud 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 31\s+vpdpwsud \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b1 e0 0f 00 00\s+vpdpwsud 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b2 00 f0 ff ff\s+vpdpwsud -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b4 f4 00 00 00 10\s+vpdpwsud 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 31\s+vpdpwsud \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b1 f0 07 00 00\s+vpdpwsud 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b2 00 f8 ff ff\s+vpdpwsud -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 f4\s+vpdpwsuds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 f4\s+vpdpwsuds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b4 f4 00 00 00 10\s+vpdpwsuds 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 31\s+vpdpwsuds \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b1 e0 0f 00 00\s+vpdpwsuds 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b2 00 f0 ff ff\s+vpdpwsuds -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b4 f4 00 00 00 10\s+vpdpwsuds 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 31\s+vpdpwsuds \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b1 f0 07 00 00\s+vpdpwsuds 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b2 00 f8 ff ff\s+vpdpwsuds -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 f4\s+vpdpwusd %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 f4\s+vpdpwusd %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b4 f4 00 00 00 10\s+vpdpwusd 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 31\s+vpdpwusd \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b1 e0 0f 00 00\s+vpdpwusd 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b2 00 f0 ff ff\s+vpdpwusd -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b4 f4 00 00 00 10\s+vpdpwusd 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 31\s+vpdpwusd \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b1 f0 07 00 00\s+vpdpwusd 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b2 00 f8 ff ff\s+vpdpwusd -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 f4\s+vpdpwusds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 f4\s+vpdpwusds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b4 f4 00 00 00 10\s+vpdpwusds 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 31\s+vpdpwusds \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b1 e0 0f 00 00\s+vpdpwusds 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b2 00 f0 ff ff\s+vpdpwusds -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b4 f4 00 00 00 10\s+vpdpwusds 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 31\s+vpdpwusds \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b1 f0 07 00 00\s+vpdpwusds 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b2 00 f8 ff ff\s+vpdpwusds -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 f4\s+vpdpwuud %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 f4\s+vpdpwuud %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b4 f4 00 00 00 10\s+vpdpwuud 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 31\s+vpdpwuud \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b1 e0 0f 00 00\s+vpdpwuud 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b2 00 f0 ff ff\s+vpdpwuud -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b4 f4 00 00 00 10\s+vpdpwuud 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 31\s+vpdpwuud \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b1 f0 07 00 00\s+vpdpwuud 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b2 00 f8 ff ff\s+vpdpwuud -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 f4\s+vpdpwuuds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 f4\s+vpdpwuuds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b4 f4 00 00 00 10\s+vpdpwuuds 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 31\s+vpdpwuuds \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b1 e0 0f 00 00\s+vpdpwuuds 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b2 00 f0 ff ff\s+vpdpwuuds -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b4 f4 00 00 00 10\s+vpdpwuuds 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 31\s+vpdpwuuds \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b1 f0 07 00 00\s+vpdpwuuds 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b2 00 f8 ff ff\s+vpdpwuuds -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 f4\s+vpdpwsud %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 f4\s+vpdpwsud %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b4 f4 00 00 00 10\s+vpdpwsud 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 31\s+vpdpwsud \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b1 e0 0f 00 00\s+vpdpwsud 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b2 00 f0 ff ff\s+vpdpwsud -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b4 f4 00 00 00 10\s+vpdpwsud 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 31\s+vpdpwsud \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b1 f0 07 00 00\s+vpdpwsud 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b2 00 f8 ff ff\s+vpdpwsud -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 f4\s+vpdpwsuds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 f4\s+vpdpwsuds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b4 f4 00 00 00 10\s+vpdpwsuds 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 31\s+vpdpwsuds \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b1 e0 0f 00 00\s+vpdpwsuds 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b2 00 f0 ff ff\s+vpdpwsuds -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b4 f4 00 00 00 10\s+vpdpwsuds 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 31\s+vpdpwsuds \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b1 f0 07 00 00\s+vpdpwsuds 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b2 00 f8 ff ff\s+vpdpwsuds -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 f4\s+vpdpwusd %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 f4\s+vpdpwusd %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b4 f4 00 00 00 10\s+vpdpwusd 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 31\s+vpdpwusd \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b1 e0 0f 00 00\s+vpdpwusd 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b2 00 f0 ff ff\s+vpdpwusd -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b4 f4 00 00 00 10\s+vpdpwusd 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 31\s+vpdpwusd \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b1 f0 07 00 00\s+vpdpwusd 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b2 00 f8 ff ff\s+vpdpwusd -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 f4\s+vpdpwusds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 f4\s+vpdpwusds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b4 f4 00 00 00 10\s+vpdpwusds 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 31\s+vpdpwusds \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b1 e0 0f 00 00\s+vpdpwusds 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b2 00 f0 ff ff\s+vpdpwusds -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b4 f4 00 00 00 10\s+vpdpwusds 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 31\s+vpdpwusds \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b1 f0 07 00 00\s+vpdpwusds 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b2 00 f8 ff ff\s+vpdpwusds -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 f4\s+vpdpwuud %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 f4\s+vpdpwuud %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b4 f4 00 00 00 10\s+vpdpwuud 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 31\s+vpdpwuud \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b1 e0 0f 00 00\s+vpdpwuud 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b2 00 f0 ff ff\s+vpdpwuud -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b4 f4 00 00 00 10\s+vpdpwuud 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 31\s+vpdpwuud \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b1 f0 07 00 00\s+vpdpwuud 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b2 00 f8 ff ff\s+vpdpwuud -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 f4\s+vpdpwuuds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 f4\s+vpdpwuuds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b4 f4 00 00 00 10\s+vpdpwuuds 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 31\s+vpdpwuuds \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b1 e0 0f 00 00\s+vpdpwuuds 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b2 00 f0 ff ff\s+vpdpwuuds -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b4 f4 00 00 00 10\s+vpdpwuuds 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 31\s+vpdpwuuds \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b1 f0 07 00 00\s+vpdpwuuds 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b2 00 f8 ff ff\s+vpdpwuuds -0x800\(%edx\),%xmm5,%xmm6
diff --git a/gas/testsuite/gas/i386/avx-vnni-int16.s b/gas/testsuite/gas/i386/avx-vnni-int16.s
new file mode 100644
index 00000000000..4a04de3073f
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx-vnni-int16.s
@@ -0,0 +1,127 @@
+# Check 32bit AVX-VNNI-INT16 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vpdpwsud	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsud	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsud	0x10000000(%esp, %esi, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsud	(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsud	4064(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwsud	-4096(%edx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwsud	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsud	(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsud	2032(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwsud	-2048(%edx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwsuds	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsuds	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsuds	0x10000000(%esp, %esi, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsuds	(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsuds	4064(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwsuds	-4096(%edx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwsuds	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsuds	(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsuds	2032(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwsuds	-2048(%edx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwusd	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusd	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusd	0x10000000(%esp, %esi, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusd	(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusd	4064(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwusd	-4096(%edx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwusd	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusd	(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusd	2032(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwusd	-2048(%edx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwusds	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusds	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusds	0x10000000(%esp, %esi, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusds	(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusds	4064(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwusds	-4096(%edx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwusds	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusds	(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusds	2032(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwusds	-2048(%edx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwuud	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuud	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuud	0x10000000(%esp, %esi, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuud	(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuud	4064(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwuud	-4096(%edx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwuud	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuud	(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuud	2032(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwuud	-2048(%edx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwuuds	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuuds	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuuds	0x10000000(%esp, %esi, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuuds	(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuuds	4064(%ecx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwuuds	-4096(%edx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwuuds	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuuds	(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuuds	2032(%ecx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwuuds	-2048(%edx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+
+.intel_syntax noprefix
+	vpdpwsud	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwsud	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwsud	ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwsud	ymm6, ymm5, YMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwsud	ymm6, ymm5, YMMWORD PTR [ecx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwsud	ymm6, ymm5, YMMWORD PTR [edx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwsud	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwsud	xmm6, xmm5, XMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwsud	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwsud	xmm6, xmm5, XMMWORD PTR [edx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwsuds	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwsuds	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwsuds	ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwsuds	ymm6, ymm5, YMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwsuds	ymm6, ymm5, YMMWORD PTR [ecx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwsuds	ymm6, ymm5, YMMWORD PTR [edx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwsuds	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwsuds	xmm6, xmm5, XMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwsuds	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwsuds	xmm6, xmm5, XMMWORD PTR [edx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwusd	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwusd	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwusd	ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwusd	ymm6, ymm5, YMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwusd	ymm6, ymm5, YMMWORD PTR [ecx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwusd	ymm6, ymm5, YMMWORD PTR [edx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwusd	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwusd	xmm6, xmm5, XMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwusd	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwusd	xmm6, xmm5, XMMWORD PTR [edx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwusds	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwusds	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwusds	ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwusds	ymm6, ymm5, YMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwusds	ymm6, ymm5, YMMWORD PTR [ecx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwusds	ymm6, ymm5, YMMWORD PTR [edx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwusds	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwusds	xmm6, xmm5, XMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwusds	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwusds	xmm6, xmm5, XMMWORD PTR [edx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwuud	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwuud	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwuud	ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwuud	ymm6, ymm5, YMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwuud	ymm6, ymm5, YMMWORD PTR [ecx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwuud	ymm6, ymm5, YMMWORD PTR [edx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwuud	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwuud	xmm6, xmm5, XMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwuud	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwuud	xmm6, xmm5, XMMWORD PTR [edx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwuuds	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwuuds	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwuuds	ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwuuds	ymm6, ymm5, YMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwuuds	ymm6, ymm5, YMMWORD PTR [ecx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwuuds	ymm6, ymm5, YMMWORD PTR [edx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwuuds	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwuuds	xmm6, xmm5, XMMWORD PTR [ecx]	 #AVX-VNNI-INT16
+	vpdpwuuds	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwuuds	xmm6, xmm5, XMMWORD PTR [edx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index d78f1937c84..b69c692cd16 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -496,6 +496,8 @@ if [gas_32_check] then {
     run_dump_test "raoint"
     run_dump_test "raoint-intel"
     run_list_test "amx-complex-inval"
+    run_dump_test "avx-vnni-int16"
+    run_dump_test "avx-vnni-int16-intel"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-int16-intel.d b/gas/testsuite/gas/i386/x86-64-avx-vnni-int16-intel.d
new file mode 100644
index 00000000000..7b9616f7d7b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-int16-intel.d
@@ -0,0 +1,130 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 AVX-VNNI-INT16 insns (Intel disassembly)
+#source: x86-64-avx-vnni-int16.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 d2 f4\s+vpdpwsud ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 d2 f4\s+vpdpwsud xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 56 d2 b4 f5 00 00 00 10\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 56 d2 31\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b1 e0 0f 00 00\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b2 00 f0 ff ff\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 52 d2 b4 f5 00 00 00 10\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 52 d2 31\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b1 f0 07 00 00\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b2 00 f8 ff ff\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 f4\s+vpdpwsuds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 d3 f4\s+vpdpwsuds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 56 d3 b4 f5 00 00 00 10\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 56 d3 31\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b1 e0 0f 00 00\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b2 00 f0 ff ff\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 52 d3 b4 f5 00 00 00 10\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 52 d3 31\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b1 f0 07 00 00\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b2 00 f8 ff ff\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 f4\s+vpdpwusd ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 51 d2 f4\s+vpdpwusd xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 55 d2 b4 f5 00 00 00 10\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 55 d2 31\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b1 e0 0f 00 00\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b2 00 f0 ff ff\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 51 d2 b4 f5 00 00 00 10\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 51 d2 31\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b1 f0 07 00 00\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b2 00 f8 ff ff\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 f4\s+vpdpwusds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 51 d3 f4\s+vpdpwusds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 55 d3 b4 f5 00 00 00 10\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 55 d3 31\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b1 e0 0f 00 00\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b2 00 f0 ff ff\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 51 d3 b4 f5 00 00 00 10\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 51 d3 31\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b1 f0 07 00 00\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b2 00 f8 ff ff\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 f4\s+vpdpwuud ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 50 d2 f4\s+vpdpwuud xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 54 d2 b4 f5 00 00 00 10\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 54 d2 31\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b1 e0 0f 00 00\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b2 00 f0 ff ff\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 50 d2 b4 f5 00 00 00 10\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 50 d2 31\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b1 f0 07 00 00\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b2 00 f8 ff ff\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 f4\s+vpdpwuuds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 50 d3 f4\s+vpdpwuuds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 54 d3 b4 f5 00 00 00 10\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 54 d3 31\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b1 e0 0f 00 00\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b2 00 f0 ff ff\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 50 d3 b4 f5 00 00 00 10\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 50 d3 31\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b1 f0 07 00 00\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b2 00 f8 ff ff\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 f4\s+vpdpwsud ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 d2 f4\s+vpdpwsud xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 56 d2 b4 f5 00 00 00 10\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 56 d2 31\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b1 e0 0f 00 00\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b2 00 f0 ff ff\s+vpdpwsud ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 52 d2 b4 f5 00 00 00 10\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 52 d2 31\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b1 f0 07 00 00\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b2 00 f8 ff ff\s+vpdpwsud xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 f4\s+vpdpwsuds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 d3 f4\s+vpdpwsuds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 56 d3 b4 f5 00 00 00 10\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 56 d3 31\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b1 e0 0f 00 00\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b2 00 f0 ff ff\s+vpdpwsuds ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 52 d3 b4 f5 00 00 00 10\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 52 d3 31\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b1 f0 07 00 00\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b2 00 f8 ff ff\s+vpdpwsuds xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 f4\s+vpdpwusd ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 51 d2 f4\s+vpdpwusd xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 55 d2 b4 f5 00 00 00 10\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 55 d2 31\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b1 e0 0f 00 00\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b2 00 f0 ff ff\s+vpdpwusd ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 51 d2 b4 f5 00 00 00 10\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 51 d2 31\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b1 f0 07 00 00\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b2 00 f8 ff ff\s+vpdpwusd xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 f4\s+vpdpwusds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 51 d3 f4\s+vpdpwusds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 55 d3 b4 f5 00 00 00 10\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 55 d3 31\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b1 e0 0f 00 00\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b2 00 f0 ff ff\s+vpdpwusds ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 51 d3 b4 f5 00 00 00 10\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 51 d3 31\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b1 f0 07 00 00\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b2 00 f8 ff ff\s+vpdpwusds xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 f4\s+vpdpwuud ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 50 d2 f4\s+vpdpwuud xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 54 d2 b4 f5 00 00 00 10\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 54 d2 31\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b1 e0 0f 00 00\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b2 00 f0 ff ff\s+vpdpwuud ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 50 d2 b4 f5 00 00 00 10\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 50 d2 31\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b1 f0 07 00 00\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b2 00 f8 ff ff\s+vpdpwuud xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 f4\s+vpdpwuuds ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 50 d3 f4\s+vpdpwuuds xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 54 d3 b4 f5 00 00 00 10\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 54 d3 31\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b1 e0 0f 00 00\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b2 00 f0 ff ff\s+vpdpwuuds ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 50 d3 b4 f5 00 00 00 10\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 50 d3 31\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b1 f0 07 00 00\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b2 00 f8 ff ff\s+vpdpwuuds xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-int16.d b/gas/testsuite/gas/i386/x86-64-avx-vnni-int16.d
new file mode 100644
index 00000000000..8a3542f4b0e
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-int16.d
@@ -0,0 +1,130 @@
+#as:
+#objdump: -dw
+#name: x86_64 AVX-VNNI-INT16 insns
+#source: x86-64-avx-vnni-int16.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 d2 f4\s+vpdpwsud %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 f4\s+vpdpwsud %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 56 d2 b4 f5 00 00 00 10\s+vpdpwsud 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 56 d2 31\s+vpdpwsud \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b1 e0 0f 00 00\s+vpdpwsud 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b2 00 f0 ff ff\s+vpdpwsud -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 52 d2 b4 f5 00 00 00 10\s+vpdpwsud 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 52 d2 31\s+vpdpwsud \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b1 f0 07 00 00\s+vpdpwsud 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b2 00 f8 ff ff\s+vpdpwsud -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 f4\s+vpdpwsuds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 f4\s+vpdpwsuds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 56 d3 b4 f5 00 00 00 10\s+vpdpwsuds 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 56 d3 31\s+vpdpwsuds \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b1 e0 0f 00 00\s+vpdpwsuds 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b2 00 f0 ff ff\s+vpdpwsuds -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 52 d3 b4 f5 00 00 00 10\s+vpdpwsuds 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 52 d3 31\s+vpdpwsuds \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b1 f0 07 00 00\s+vpdpwsuds 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b2 00 f8 ff ff\s+vpdpwsuds -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 f4\s+vpdpwusd %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 f4\s+vpdpwusd %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 55 d2 b4 f5 00 00 00 10\s+vpdpwusd 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 55 d2 31\s+vpdpwusd \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b1 e0 0f 00 00\s+vpdpwusd 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b2 00 f0 ff ff\s+vpdpwusd -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 51 d2 b4 f5 00 00 00 10\s+vpdpwusd 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 51 d2 31\s+vpdpwusd \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b1 f0 07 00 00\s+vpdpwusd 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b2 00 f8 ff ff\s+vpdpwusd -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 f4\s+vpdpwusds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 f4\s+vpdpwusds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 55 d3 b4 f5 00 00 00 10\s+vpdpwusds 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 55 d3 31\s+vpdpwusds \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b1 e0 0f 00 00\s+vpdpwusds 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b2 00 f0 ff ff\s+vpdpwusds -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 51 d3 b4 f5 00 00 00 10\s+vpdpwusds 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 51 d3 31\s+vpdpwusds \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b1 f0 07 00 00\s+vpdpwusds 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b2 00 f8 ff ff\s+vpdpwusds -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 f4\s+vpdpwuud %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 f4\s+vpdpwuud %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 54 d2 b4 f5 00 00 00 10\s+vpdpwuud 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 54 d2 31\s+vpdpwuud \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b1 e0 0f 00 00\s+vpdpwuud 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b2 00 f0 ff ff\s+vpdpwuud -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 50 d2 b4 f5 00 00 00 10\s+vpdpwuud 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 50 d2 31\s+vpdpwuud \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b1 f0 07 00 00\s+vpdpwuud 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b2 00 f8 ff ff\s+vpdpwuud -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 f4\s+vpdpwuuds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 f4\s+vpdpwuuds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 54 d3 b4 f5 00 00 00 10\s+vpdpwuuds 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 54 d3 31\s+vpdpwuuds \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b1 e0 0f 00 00\s+vpdpwuuds 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b2 00 f0 ff ff\s+vpdpwuuds -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 50 d3 b4 f5 00 00 00 10\s+vpdpwuuds 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 50 d3 31\s+vpdpwuuds \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b1 f0 07 00 00\s+vpdpwuuds 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b2 00 f8 ff ff\s+vpdpwuuds -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 f4\s+vpdpwsud %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 f4\s+vpdpwsud %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 56 d2 b4 f5 00 00 00 10\s+vpdpwsud 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 56 d2 31\s+vpdpwsud \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b1 e0 0f 00 00\s+vpdpwsud 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d2 b2 00 f0 ff ff\s+vpdpwsud -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 52 d2 b4 f5 00 00 00 10\s+vpdpwsud 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 52 d2 31\s+vpdpwsud \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b1 f0 07 00 00\s+vpdpwsud 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d2 b2 00 f8 ff ff\s+vpdpwsud -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 f4\s+vpdpwsuds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 f4\s+vpdpwsuds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 56 d3 b4 f5 00 00 00 10\s+vpdpwsuds 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 56 d3 31\s+vpdpwsuds \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b1 e0 0f 00 00\s+vpdpwsuds 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 d3 b2 00 f0 ff ff\s+vpdpwsuds -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 52 d3 b4 f5 00 00 00 10\s+vpdpwsuds 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 52 d3 31\s+vpdpwsuds \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b1 f0 07 00 00\s+vpdpwsuds 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 d3 b2 00 f8 ff ff\s+vpdpwsuds -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 f4\s+vpdpwusd %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 f4\s+vpdpwusd %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 55 d2 b4 f5 00 00 00 10\s+vpdpwusd 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 55 d2 31\s+vpdpwusd \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b1 e0 0f 00 00\s+vpdpwusd 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d2 b2 00 f0 ff ff\s+vpdpwusd -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 51 d2 b4 f5 00 00 00 10\s+vpdpwusd 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 51 d2 31\s+vpdpwusd \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b1 f0 07 00 00\s+vpdpwusd 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d2 b2 00 f8 ff ff\s+vpdpwusd -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 f4\s+vpdpwusds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 f4\s+vpdpwusds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 55 d3 b4 f5 00 00 00 10\s+vpdpwusds 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 55 d3 31\s+vpdpwusds \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b1 e0 0f 00 00\s+vpdpwusds 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 55 d3 b2 00 f0 ff ff\s+vpdpwusds -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 51 d3 b4 f5 00 00 00 10\s+vpdpwusds 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 51 d3 31\s+vpdpwusds \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b1 f0 07 00 00\s+vpdpwusds 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 d3 b2 00 f8 ff ff\s+vpdpwusds -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 f4\s+vpdpwuud %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 f4\s+vpdpwuud %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 54 d2 b4 f5 00 00 00 10\s+vpdpwuud 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 54 d2 31\s+vpdpwuud \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b1 e0 0f 00 00\s+vpdpwuud 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d2 b2 00 f0 ff ff\s+vpdpwuud -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 50 d2 b4 f5 00 00 00 10\s+vpdpwuud 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 50 d2 31\s+vpdpwuud \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b1 f0 07 00 00\s+vpdpwuud 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d2 b2 00 f8 ff ff\s+vpdpwuud -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 f4\s+vpdpwuuds %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 f4\s+vpdpwuuds %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 54 d3 b4 f5 00 00 00 10\s+vpdpwuuds 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 54 d3 31\s+vpdpwuuds \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b1 e0 0f 00 00\s+vpdpwuuds 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 54 d3 b2 00 f0 ff ff\s+vpdpwuuds -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 50 d3 b4 f5 00 00 00 10\s+vpdpwuuds 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 50 d3 31\s+vpdpwuuds \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b1 f0 07 00 00\s+vpdpwuuds 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 d3 b2 00 f8 ff ff\s+vpdpwuuds -0x800\(%rdx\),%xmm5,%xmm6
diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-int16.s b/gas/testsuite/gas/i386/x86-64-avx-vnni-int16.s
new file mode 100644
index 00000000000..8ba0f54e45d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-int16.s
@@ -0,0 +1,127 @@
+# Check 64bit AVX-VNNI-INT16 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vpdpwsud	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsud	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsud	0x10000000(%rbp, %r14, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsud	(%r9), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsud	4064(%rcx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwsud	-4096(%rdx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwsud	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsud	(%r9), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsud	2032(%rcx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwsud	-2048(%rdx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwsuds	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsuds	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsuds	0x10000000(%rbp, %r14, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsuds	(%r9), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwsuds	4064(%rcx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwsuds	-4096(%rdx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwsuds	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsuds	(%r9), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwsuds	2032(%rcx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwsuds	-2048(%rdx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwusd	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusd	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusd	0x10000000(%rbp, %r14, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusd	(%r9), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusd	4064(%rcx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwusd	-4096(%rdx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwusd	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusd	(%r9), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusd	2032(%rcx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwusd	-2048(%rdx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwusds	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusds	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusds	0x10000000(%rbp, %r14, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusds	(%r9), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwusds	4064(%rcx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwusds	-4096(%rdx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwusds	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusds	(%r9), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwusds	2032(%rcx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwusds	-2048(%rdx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwuud	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuud	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuud	0x10000000(%rbp, %r14, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuud	(%r9), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuud	4064(%rcx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwuud	-4096(%rdx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwuud	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuud	(%r9), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuud	2032(%rcx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwuud	-2048(%rdx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwuuds	%ymm4, %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuuds	%xmm4, %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuuds	0x10000000(%rbp, %r14, 8), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuuds	(%r9), %ymm5, %ymm6	 #AVX-VNNI-INT16
+	vpdpwuuds	4064(%rcx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwuuds	-4096(%rdx), %ymm5, %ymm6	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwuuds	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuuds	(%r9), %xmm5, %xmm6	 #AVX-VNNI-INT16
+	vpdpwuuds	2032(%rcx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwuuds	-2048(%rdx), %xmm5, %xmm6	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+
+.intel_syntax noprefix
+	vpdpwsud	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwsud	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwsud	ymm6, ymm5, YMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwsud	ymm6, ymm5, YMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwsud	ymm6, ymm5, YMMWORD PTR [rcx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwsud	ymm6, ymm5, YMMWORD PTR [rdx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwsud	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwsud	xmm6, xmm5, XMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwsud	xmm6, xmm5, XMMWORD PTR [rcx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwsud	xmm6, xmm5, XMMWORD PTR [rdx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwsuds	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwsuds	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwsuds	ymm6, ymm5, YMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwsuds	ymm6, ymm5, YMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwsuds	ymm6, ymm5, YMMWORD PTR [rcx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwsuds	ymm6, ymm5, YMMWORD PTR [rdx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwsuds	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwsuds	xmm6, xmm5, XMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwsuds	xmm6, xmm5, XMMWORD PTR [rcx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwsuds	xmm6, xmm5, XMMWORD PTR [rdx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwusd	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwusd	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwusd	ymm6, ymm5, YMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwusd	ymm6, ymm5, YMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwusd	ymm6, ymm5, YMMWORD PTR [rcx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwusd	ymm6, ymm5, YMMWORD PTR [rdx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwusd	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwusd	xmm6, xmm5, XMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwusd	xmm6, xmm5, XMMWORD PTR [rcx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwusd	xmm6, xmm5, XMMWORD PTR [rdx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwusds	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwusds	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwusds	ymm6, ymm5, YMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwusds	ymm6, ymm5, YMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwusds	ymm6, ymm5, YMMWORD PTR [rcx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwusds	ymm6, ymm5, YMMWORD PTR [rdx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwusds	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwusds	xmm6, xmm5, XMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwusds	xmm6, xmm5, XMMWORD PTR [rcx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwusds	xmm6, xmm5, XMMWORD PTR [rdx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwuud	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwuud	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwuud	ymm6, ymm5, YMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwuud	ymm6, ymm5, YMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwuud	ymm6, ymm5, YMMWORD PTR [rcx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwuud	ymm6, ymm5, YMMWORD PTR [rdx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwuud	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwuud	xmm6, xmm5, XMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwuud	xmm6, xmm5, XMMWORD PTR [rcx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwuud	xmm6, xmm5, XMMWORD PTR [rdx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
+	vpdpwuuds	ymm6, ymm5, ymm4	 #AVX-VNNI-INT16
+	vpdpwuuds	xmm6, xmm5, xmm4	 #AVX-VNNI-INT16
+	vpdpwuuds	ymm6, ymm5, YMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwuuds	ymm6, ymm5, YMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwuuds	ymm6, ymm5, YMMWORD PTR [rcx+4064]	 #AVX-VNNI-INT16 Disp32(e00f0000)
+	vpdpwuuds	ymm6, ymm5, YMMWORD PTR [rdx-4096]	 #AVX-VNNI-INT16 Disp32(00f0ffff)
+	vpdpwuuds	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX-VNNI-INT16
+	vpdpwuuds	xmm6, xmm5, XMMWORD PTR [r9]	 #AVX-VNNI-INT16
+	vpdpwuuds	xmm6, xmm5, XMMWORD PTR [rcx+2032]	 #AVX-VNNI-INT16 Disp32(f0070000)
+	vpdpwuuds	xmm6, xmm5, XMMWORD PTR [rdx-2048]	 #AVX-VNNI-INT16 Disp32(00f8ffff)
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 49205f9b996..0f2903c6185 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -438,6 +438,8 @@ run_list_test "x86-64-amx-complex-inval"
 run_dump_test "x86-64-fred"
 run_dump_test "x86-64-lkgs"
 run_list_test "x86-64-lkgs-inval"
+run_dump_test "x86-64-avx-vnni-int16"
+run_dump_test "x86-64-avx-vnni-int16-intel"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 9905317c110..9311d832342 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -1062,6 +1062,8 @@ enum
   PREFIX_VEX_0F3872,
   PREFIX_VEX_0F38B0_W_0,
   PREFIX_VEX_0F38B1_W_0,
+  PREFIX_VEX_0F38D2_W_0,
+  PREFIX_VEX_0F38D3_W_0,
   PREFIX_VEX_0F38F5_L_0,
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
@@ -1472,6 +1474,8 @@ enum
   VEX_W_0F38B4,
   VEX_W_0F38B5,
   VEX_W_0F38CF,
+  VEX_W_0F38D2,
+  VEX_W_0F38D3,
   VEX_W_0F3A00_L_1,
   VEX_W_0F3A01_L_1,
   VEX_W_0F3A02,
@@ -3909,7 +3913,21 @@ static const struct dis386 prefix_table[][4] = {
     { "vbcstnebf162ps", { XM, Mw }, 0 },
     { "vbcstnesh2ps", { XM, Mw }, 0 },
   },
- 
+  
+  /* PREFIX_VEX_0F38D2_W_0 */
+  {
+    { "vpdpwuud",	{ XM, Vex, EXx }, 0 },
+    { "vpdpwsud",	{ XM, Vex, EXx }, 0 },
+    { "vpdpwusd",	{ XM, Vex, EXx }, 0 },
+  },
+
+  /* PREFIX_VEX_0F38D3_W_0 */
+  {
+    { "vpdpwuuds",	{ XM, Vex, EXx }, 0 },
+    { "vpdpwsuds",	{ XM, Vex, EXx }, 0 },
+    { "vpdpwusds",	{ XM, Vex, EXx }, 0 },
+  },
+
   /* PREFIX_VEX_0F38F5_L_0 */
   {
     { "bzhiS",		{ Gdq, Edq, VexGdq }, 0 },
@@ -6370,8 +6388,8 @@ static const struct dis386 vex_table[][256] = {
     /* d0 */
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38D2) },
+    { VEX_W_TABLE (VEX_W_0F38D3) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -7600,6 +7618,14 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F38CF */
     { "%XEvgf2p8mulb", { XM, Vex, EXx }, PREFIX_DATA },
   },
+  {
+    /* VEX_W_0F38D2 */
+    { PREFIX_TABLE (PREFIX_VEX_0F38D2_W_0) },
+  },
+  {
+    /* VEX_W_0F38D3 */
+    { PREFIX_TABLE (PREFIX_VEX_0F38D3_W_0) },
+  },
   {
     /* VEX_W_0F3A00_L_1 */
     { Bad_Opcode },
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 1db555d8615..9796977a2aa 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -166,6 +166,8 @@ static const dependency isa_dependencies[] =
     "AVX2" },
   { "FRED",
     "LKGS" },
+  { "AVX_VNNI_INT16",
+    "AVX2" },
   { "AVX512F",
     "AVX2" },
   { "AVX512CD",
@@ -366,6 +368,7 @@ static bitfield cpu_flags[] =
   BITFIELD (RAO_INT),
   BITFIELD (FRED),
   BITFIELD (LKGS),
+  BITFIELD (AVX_VNNI_INT16),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 3318bcfec33..4a225202e64 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -233,6 +233,8 @@ enum
   CpuFRED,
   /* lkgs instruction required */
   CpuLKGS,
+  /* Intel AVX VNNI-INT16 Instructions support required.  */
+  CpuAVX_VNNI_INT16,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -430,6 +432,7 @@ typedef union i386_cpu_flags
       unsigned int cpurao_int:1;
       unsigned int cpufred:1;
       unsigned int cpulkgs:1;
+      unsigned int cpuavx_vnni_int16:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index b6263f88605..4903d3b2361 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3364,3 +3364,14 @@ erets, 0xf20f01ca, FRED|x64, NoSuf, {}
 eretu, 0xf30f01ca, FRED|x64, NoSuf, {}
 
 // FRED instructions end.
+
+// AVX_VNNI_INT16 instructions.
+
+vpdpwuud, 0xd2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwuuds, 0xd3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwusd, 0x66d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwusds, 0x66d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwsud, 0xf3d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vpdpwsuds, 0xf3d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+
+// AVX_VNNI_INT16 instructions end.
-- 
2.31.1



* [PATCH 2/5] Support Intel SHA512
  2023-07-13  6:32 [PATCH 0/5] Support Intel Arrow Lake/Lunar Lake ISAs Haochen Jiang
  2023-07-13  6:32 ` [PATCH 1/5] Support Intel AVX-VNNI-INT16 Haochen Jiang
@ 2023-07-13  6:33 ` Haochen Jiang
  2023-07-13 10:02   ` Jan Beulich
  2023-07-13  6:33 ` [PATCH 3/5] Support Intel SM3 Haochen Jiang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Haochen Jiang @ 2023-07-13  6:33 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hjl.tools, amodra

Hi Jan,

In the SHA512 patch, I considered eliminating the ModR/M table pass
for vsha512msg1 and vsha512rnds2, since you just introduced OP_R with
Uxmm.

However, xmm_mode in OP_R requires VEX128 or less, and unfortunately
both instructions are VEX256. I have therefore kept the ModR/M table
pass in the patch.
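
For readers following the table changes, the lookup chain this patch adds
for opcodes 0F38 CB/CC/CD can be sketched roughly as below. This is only a
simplified model under stated assumptions, not the real i386-dis.c code:
decode_sha512, enum prefix_idx and the "(bad)" return value are hypothetical
names made up for illustration, and the register-only check for vsha512msg2
is assumed to be what the Uymm operand (OP_R) provides.

```c
#include <stddef.h>

/* Hypothetical model of the decode chain; the real disassembler walks
   prefix_table -> vex_w_table -> vex_len_table -> mod_table instead.  */
enum prefix_idx { P_NONE = 0, P_F3 = 1, P_66 = 2, P_F2 = 3 };

static const char *
decode_sha512 (unsigned char opcode, enum prefix_idx prefix,
	       int vex_w, int vex_l, unsigned char modrm)
{
  /* ModRM.mod == 3 selects the register form.  */
  int is_reg = (modrm >> 6) == 3;

  /* Prefix table: only the F2 slot (index 3) is populated.  */
  if (prefix != P_F2)
    return "(bad)";
  /* VEX.W table: only W0 is populated.  */
  if (vex_w != 0)
    return "(bad)";
  /* VEX.L table: only L1 (256-bit) is populated.  */
  if (vex_l != 1)
    return "(bad)";

  switch (opcode)
    {
    case 0xcb:
      /* Goes through a MOD table entry: memory form is invalid.  */
      return is_reg ? "vsha512rnds2" : "(bad)";
    case 0xcc:
      /* Also goes through a MOD table entry.  */
      return is_reg ? "vsha512msg1" : "(bad)";
    case 0xcd:
      /* No MOD table here; assumed to be rejected by the Uymm operand
	 handler when the operand is not a register.  */
      return is_reg ? "vsha512msg2" : "(bad)";
    default:
      return "(bad)";
    }
}
```

With the bytes from the new tests, c4 e2 7f cc f5 corresponds to
decode_sha512 (0xcc, P_F2, 0, 1, 0xf5), i.e. F2 prefix, W0, L1 and a
register-form ModRM byte.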

BRs,
Haochen

gas/ChangeLog:

	* NEWS: Support Intel SHA512.
	* config/tc-i386.c: Add sha512.
	* doc/c-i386.texi: Document .sha512.
	* testsuite/gas/i386/i386.exp: Run SHA512 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/sha512-intel.d: New test.
	* testsuite/gas/i386/sha512-inval.l: Ditto.
	* testsuite/gas/i386/sha512-inval.s: Ditto.
	* testsuite/gas/i386/sha512.d: Ditto.
	* testsuite/gas/i386/sha512.s: Ditto.
	* testsuite/gas/i386/x86-64-sha512-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-sha512-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-sha512-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-sha512.d: Ditto.
	* testsuite/gas/i386/x86-64-sha512.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (Uymm): New.
	(MOD_VEX_0F38CB_P_3_W_0_L_1): Ditto.
	(MOD_VEX_0F38CC_P_3_W_0_L_1): Ditto.
	(PREFIX_VEX_0F38CB): Ditto.
	(PREFIX_VEX_0F38CC): Ditto.
	(PREFIX_VEX_0F38CD): Ditto.
	(VEX_LEN_0F38CB_P_3_W_0): Ditto.
	(VEX_LEN_0F38CC_P_3_W_0): Ditto.
	(VEX_LEN_0F38CD_P_3_W_0): Ditto.
	(VEX_W_0F38CB_P_3): Ditto.
	(VEX_W_0F38CC_P_3): Ditto.
	(VEX_W_0F38CD_P_3): Ditto.
	(mod_table): Add MOD_VEX_0F38CB_P_3_W_0_L_1,
	MOD_VEX_0F38CC_P_3_W_0_L_1.
	(prefix_table): Add PREFIX_VEX_0F38CB, PREFIX_VEX_0F38CC,
	PREFIX_VEX_0F38CD.
	(vex_len_table): Add VEX_LEN_0F38CB_P_3_W_0,
	VEX_LEN_0F38CC_P_3_W_0, VEX_LEN_0F38CD_P_3_W_0.
	(vex_w_table): Add VEX_W_0F38CB_P_3, VEX_W_0F38CC_P_3, VEX_W_0F38CD_P_3.
	* i386-gen.c (isa_dependencies): Add SHA512.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuSHA512): New.
	(i386_cpu_flags): Add cpusha512.
	* i386-opc.tbl: Add SHA512 instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                     |    2 +
 gas/config/tc-i386.c                         |    1 +
 gas/doc/c-i386.texi                          |    3 +-
 gas/testsuite/gas/i386/i386.exp              |    2 +
 gas/testsuite/gas/i386/sha512-intel.d        |   16 +
 gas/testsuite/gas/i386/sha512.d              |   16 +
 gas/testsuite/gas/i386/sha512.s              |   13 +
 gas/testsuite/gas/i386/x86-64-sha512-intel.d |   16 +
 gas/testsuite/gas/i386/x86-64-sha512.d       |   16 +
 gas/testsuite/gas/i386/x86-64-sha512.s       |   13 +
 gas/testsuite/gas/i386/x86-64.exp            |    2 +
 opcodes/i386-dis.c                           |   82 +-
 opcodes/i386-gen.c                           |    3 +
 opcodes/i386-init.h                          |  648 +-
 opcodes/i386-mnem.h                          | 3949 ++++----
 opcodes/i386-opc.h                           |    3 +
 opcodes/i386-opc.tbl                         |    8 +
 opcodes/i386-tbl.h                           | 8555 +++++++++---------
 18 files changed, 6806 insertions(+), 6542 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/sha512-intel.d
 create mode 100644 gas/testsuite/gas/i386/sha512.d
 create mode 100644 gas/testsuite/gas/i386/sha512.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512.s

diff --git a/gas/NEWS b/gas/NEWS
index 5e9ed5ab4bc..fe2c055fa7f 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel SHA512 instructions.
+
 * Add support for Intel AVX-VNNI-INT16 instructions.
 
 Changes in 2.41:
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 0d3d7560efe..836640d9123 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1152,6 +1152,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (fred, FRED, ANY_FRED, false),
   SUBARCH (lkgs, LKGS, ANY_LKGS, false),
   SUBARCH (avx_vnni_int16, AVX_VNNI_INT16, ANY_AVX_VNNI_INT16, false),
+  SUBARCH (sha512, SHA512, ANY_SHA512, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 40ba942d9cb..21fb71e54ab 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -208,6 +208,7 @@ accept various extension mnemonics.  For example,
 @code{fred},
 @code{lkgs},
 @code{avx_vnni_int16},
+@code{sha512},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1637,7 +1638,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.prefetchi} @tab @samp{.avx_ifma} @tab @samp{.avx_vnni_int8}
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
-@item @samp{.avx_vnni_int16}
+@item @samp{.avx_vnni_int16} @tab @samp{.sha512}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index b69c692cd16..487811ad988 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -498,6 +498,8 @@ if [gas_32_check] then {
     run_list_test "amx-complex-inval"
     run_dump_test "avx-vnni-int16"
     run_dump_test "avx-vnni-int16-intel"
+    run_dump_test "sha512"
+    run_dump_test "sha512-intel"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/sha512-intel.d b/gas/testsuite/gas/i386/sha512-intel.d
new file mode 100644
index 00000000000..c1cc85b9f26
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512-intel.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw -Mintel
+#name: i386 SHA512 insns (Intel disassembly)
+#source: sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
diff --git a/gas/testsuite/gas/i386/sha512.d b/gas/testsuite/gas/i386/sha512.d
new file mode 100644
index 00000000000..b90019954ea
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw
+#name: i386 SHA512 insns
+#source: sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
diff --git a/gas/testsuite/gas/i386/sha512.s b/gas/testsuite/gas/i386/sha512.s
new file mode 100644
index 00000000000..e238c272970
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512.s
@@ -0,0 +1,13 @@
+# Check 32bit SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	%xmm5, %ymm6	 #SHA512
+	vsha512msg2	%ymm5, %ymm6	 #SHA512
+	vsha512rnds2	%xmm4, %ymm5, %ymm6	 #SHA512
+
+.intel_syntax noprefix
+	vsha512msg1	ymm6, xmm5	 #SHA512
+	vsha512msg2	ymm6, ymm5	 #SHA512
+	vsha512rnds2	ymm6, ymm5, xmm4	 #SHA512
diff --git a/gas/testsuite/gas/i386/x86-64-sha512-intel.d b/gas/testsuite/gas/i386/x86-64-sha512-intel.d
new file mode 100644
index 00000000000..e644168e311
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512-intel.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 SHA512 insns (Intel disassembly)
+#source: x86-64-sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
diff --git a/gas/testsuite/gas/i386/x86-64-sha512.d b/gas/testsuite/gas/i386/x86-64-sha512.d
new file mode 100644
index 00000000000..fcb8ae61fee
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw
+#name: x86_64 SHA512 insns
+#source: x86-64-sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
diff --git a/gas/testsuite/gas/i386/x86-64-sha512.s b/gas/testsuite/gas/i386/x86-64-sha512.s
new file mode 100644
index 00000000000..5eaadb3bade
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512.s
@@ -0,0 +1,13 @@
+# Check 64bit SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	%xmm5, %ymm6	 #SHA512
+	vsha512msg2	%ymm5, %ymm6	 #SHA512
+	vsha512rnds2	%xmm4, %ymm5, %ymm6	 #SHA512
+
+.intel_syntax noprefix
+	vsha512msg1	ymm6, xmm5	 #SHA512
+	vsha512msg2	ymm6, ymm5	 #SHA512
+	vsha512rnds2	ymm6, ymm5, xmm4	 #SHA512
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 0f2903c6185..64d8c3726d4 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -440,6 +440,8 @@ run_dump_test "x86-64-lkgs"
 run_list_test "x86-64-lkgs-inval"
 run_dump_test "x86-64-avx-vnni-int16"
 run_dump_test "x86-64-avx-vnni-int16-intel"
+run_dump_test "x86-64-sha512"
+run_dump_test "x86-64-sha512-intel"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 9311d832342..430238c3e4e 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -530,6 +530,7 @@ fetch_error (const instr_info *ins)
 #define Nq { OP_R, q_mode }
 #define Ux { OP_R, x_mode }
 #define Uxmm { OP_R, xmm_mode }
+#define Uymm { OP_R, ymm_mode }
 #define Rtmm { OP_R, tmm_mode }
 #define EMCq { OP_EMC, q_mode }
 #define MXC { OP_MXC, 0 }
@@ -895,6 +896,8 @@ enum
   MOD_0F38DC_PREFIX_1,
 
   MOD_VEX_0F3849_X86_64_L_0_W_0,
+  MOD_VEX_0F38CB_P_3_W_0_L_1,
+  MOD_VEX_0F38CC_P_3_W_0_L_1,
 };
 
 enum
@@ -1064,6 +1067,9 @@ enum
   PREFIX_VEX_0F38B1_W_0,
   PREFIX_VEX_0F38D2_W_0,
   PREFIX_VEX_0F38D3_W_0,
+  PREFIX_VEX_0F38CB,
+  PREFIX_VEX_0F38CC,
+  PREFIX_VEX_0F38CD,
   PREFIX_VEX_0F38F5_L_0,
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
@@ -1306,6 +1312,9 @@ enum
   VEX_LEN_0F385C_X86_64,
   VEX_LEN_0F385E_X86_64,
   VEX_LEN_0F386C_X86_64,
+  VEX_LEN_0F38CB_P_3_W_0,
+  VEX_LEN_0F38CC_P_3_W_0,
+  VEX_LEN_0F38CD_P_3_W_0,
   VEX_LEN_0F38DB,
   VEX_LEN_0F38F2,
   VEX_LEN_0F38F3,
@@ -1473,6 +1482,9 @@ enum
   VEX_W_0F38B1,
   VEX_W_0F38B4,
   VEX_W_0F38B5,
+  VEX_W_0F38CB_P_3,
+  VEX_W_0F38CC_P_3,
+  VEX_W_0F38CD_P_3,
   VEX_W_0F38CF,
   VEX_W_0F38D2,
   VEX_W_0F38D3,
@@ -3928,6 +3940,30 @@ static const struct dis386 prefix_table[][4] = {
     { "vpdpwusds",	{ XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F38CB */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CB_P_3) },
+  },
+
+  /* PREFIX_VEX_0F38CC */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CC_P_3) },
+  },
+
+  /* PREFIX_VEX_0F38CD */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CD_P_3) },
+  },
+
   /* PREFIX_VEX_0F38F5_L_0 */
   {
     { "bzhiS",		{ Gdq, Edq, VexGdq }, 0 },
@@ -6380,9 +6416,9 @@ static const struct dis386 vex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CB) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CC) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CD) },
     { Bad_Opcode },
     { VEX_W_TABLE (VEX_W_0F38CF) },
     /* d0 */
@@ -6944,6 +6980,24 @@ static const struct dis386 vex_len_table[][2] = {
     { VEX_W_TABLE (VEX_W_0F386C_X86_64_L_0) },
   },
 
+  /* VEX_LEN_0F38CB_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_VEX_0F38CB_P_3_W_0_L_1) },
+  },
+
+  /* VEX_LEN_0F38CC_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_VEX_0F38CC_P_3_W_0_L_1) },
+  },
+
+  /* VEX_LEN_0F38CD_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { "vsha512msg2", { XM, Uymm }, 0 },
+  },
+
   /* VEX_LEN_0F38DB */
   {
     { "vaesimc",	{ XM, EXx }, PREFIX_DATA },
@@ -7614,6 +7668,18 @@ static const struct dis386 vex_w_table[][2] = {
     { Bad_Opcode },
     { "%XVvpmadd52huq",	{ XM, Vex, EXx }, PREFIX_DATA },
   },
+  {
+    /* VEX_W_0F38CB_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CB_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F38CC_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CC_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F38CD_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CD_P_3_W_0) },
+  },
   {
     /* VEX_W_0F38CF */
     { "%XEvgf2p8mulb", { XM, Vex, EXx }, PREFIX_DATA },
@@ -8055,6 +8121,16 @@ static const struct dis386 mod_table[][2] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_0) },
     { PREFIX_TABLE (PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_1) },
   },
+  {
+    /* MOD_VEX_0F38CB_P_3_W_0_L_1 */
+    { Bad_Opcode },
+    { "vsha512rnds2", { XM, Vex, EXxmm }, 0 },
+  },
+  {
+    /* MOD_VEX_0F38CC_P_3_W_0_L_1 */
+    { Bad_Opcode },
+    { "vsha512msg1", { XM, EXxmm }, 0 },
+  },
 
 #include "i386-dis-evex-mod.h"
 };
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 9796977a2aa..8a163533eeb 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -168,6 +168,8 @@ static const dependency isa_dependencies[] =
     "LKGS" },
   { "AVX_VNNI_INT16",
     "AVX2" },
+  { "SHA512",
+    "AVX" },
   { "AVX512F",
     "AVX2" },
   { "AVX512CD",
@@ -369,6 +371,7 @@ static bitfield cpu_flags[] =
   BITFIELD (FRED),
   BITFIELD (LKGS),
   BITFIELD (AVX_VNNI_INT16),
+  BITFIELD (SHA512),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 4a225202e64..224ca04661e 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -235,6 +235,8 @@ enum
   CpuLKGS,
   /* Intel AVX VNNI-INT16 Instructions support required.  */
   CpuAVX_VNNI_INT16,
+  /* Intel SHA512 Instructions support required.  */
+  CpuSHA512,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -433,6 +435,7 @@ typedef union i386_cpu_flags
       unsigned int cpufred:1;
       unsigned int cpulkgs:1;
       unsigned int cpuavx_vnni_int16:1;
+      unsigned int cpusha512:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 4903d3b2361..18ea2f1500e 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3375,3 +3375,11 @@ vpdpwsud, 0xf3d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperand
 vpdpwsuds, 0xf3d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // AVX_VNNI_INT16 instructions end.
+
+// SHA512 instructions.
+
+vsha512rnds2, 0xf2cb, SHA512, Vex256|Space0F38|Modrm|VexVVVV|VexW0|NoSuf, { RegXMM, RegYMM, RegYMM }
+vsha512msg1, 0xf2cc, SHA512, Vex256|Space0F38|Modrm|VexW0|NoSuf, { RegXMM, RegYMM }
+vsha512msg2, 0xf2cd, SHA512, Vex256|Space0F38|Modrm|VexW0|NoSuf, { RegYMM, RegYMM }
+
+// SHA512 instructions end.
-- 
2.31.1



* [PATCH 3/5] Support Intel SM3
  2023-07-13  6:32 [PATCH 0/5] Support Intel Arrow Lake/Lunar Lake ISAs Haochen Jiang
  2023-07-13  6:32 ` [PATCH 1/5] Support Intel AVX-VNNI-INT16 Haochen Jiang
  2023-07-13  6:33 ` [PATCH 2/5] Support Intel SHA512 Haochen Jiang
@ 2023-07-13  6:33 ` Haochen Jiang
  2023-07-13 10:20   ` Jan Beulich
  2023-07-13  6:33 ` [PATCH 4/5] Support Intel SM4 Haochen Jiang
  2023-07-13  6:33 ` [PATCH 5/5] Support Intel PBNDKB Haochen Jiang
  4 siblings, 1 reply; 31+ messages in thread
From: Haochen Jiang @ 2023-07-13  6:33 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hjl.tools, amodra

gas/ChangeLog:

	* NEWS: Support Intel SM3.
	* config/tc-i386.c: Add sm3.
	* doc/c-i386.texi: Document .sm3 and nosm3.
	* testsuite/gas/i386/i386.exp: Run sm3 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/sm3-intel.d: New test.
	* testsuite/gas/i386/sm3.d: Ditto.
	* testsuite/gas/i386/sm3.s: Ditto.
	* testsuite/gas/i386/x86-64-sm3-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-sm3.d: Ditto.
	* testsuite/gas/i386/x86-64-sm3.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (PREFIX_VEX_0F38DA_W_0_L_0): New.
	(VEX_LEN_0F38DA_W_0): Ditto.
	(VEX_LEN_0F3ADE_W_0): Ditto.
	(VEX_W_0F38DA): Ditto.
	(VEX_W_0F3ADE): Ditto.
	(prefix_table): Add PREFIX_VEX_0F38DA_W_0_L_0.
	(vex_len_table): Add VEX_LEN_0F38DA_W_0, VEX_LEN_0F3ADE_W_0.
	(vex_w_table): Add VEX_W_0F38DA, VEX_W_0F3ADE.
	* i386-gen.c (isa_dependencies): Add SM3.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuSM3): New.
	(i386_cpu_flags): Add cpusm3.
	* i386-opc.tbl: Add SM3 instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                  |    2 +
 gas/config/tc-i386.c                      |    1 +
 gas/doc/c-i386.texi                       |    3 +-
 gas/testsuite/gas/i386/i386.exp           |    2 +
 gas/testsuite/gas/i386/sm3-intel.d        |   40 +
 gas/testsuite/gas/i386/sm3.d              |   40 +
 gas/testsuite/gas/i386/sm3.s              |   37 +
 gas/testsuite/gas/i386/x86-64-sm3-intel.d |   40 +
 gas/testsuite/gas/i386/x86-64-sm3.d       |   40 +
 gas/testsuite/gas/i386/x86-64-sm3.s       |   37 +
 gas/testsuite/gas/i386/x86-64.exp         |    2 +
 opcodes/i386-dis.c                        |   34 +-
 opcodes/i386-gen.c                        |    3 +
 opcodes/i386-init.h                       |  652 +-
 opcodes/i386-mnem.h                       | 3953 +++++-----
 opcodes/i386-opc.h                        |    3 +
 opcodes/i386-opc.tbl                      |    7 +
 opcodes/i386-tbl.h                        | 8401 +++++++++++----------
 18 files changed, 6832 insertions(+), 6465 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/sm3-intel.d
 create mode 100644 gas/testsuite/gas/i386/sm3.d
 create mode 100644 gas/testsuite/gas/i386/sm3.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm3-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm3.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm3.s

diff --git a/gas/NEWS b/gas/NEWS
index fe2c055fa7f..42bda657f21 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel SM3 instructions.
+
 * Add support for Intel SHA512 instructions.
 
 * Add support for Intel AVX-VNNI-INT16 instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 836640d9123..7424fa41c44 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1153,6 +1153,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (lkgs, LKGS, ANY_LKGS, false),
   SUBARCH (avx_vnni_int16, AVX_VNNI_INT16, ANY_AVX_VNNI_INT16, false),
   SUBARCH (sha512, SHA512, ANY_SHA512, false),
+  SUBARCH (sm3, SM3, ANY_SM3, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 21fb71e54ab..6ef1da21370 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -209,6 +209,7 @@ accept various extension mnemonics.  For example,
 @code{lkgs},
 @code{avx_vnni_int16},
 @code{sha512},
+@code{sm3},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1638,7 +1639,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.prefetchi} @tab @samp{.avx_ifma} @tab @samp{.avx_vnni_int8}
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
-@item @samp{.avx_vnni_int16} @tab @samp{.sha512}
+@item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 487811ad988..87ddd71be14 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -500,6 +500,8 @@ if [gas_32_check] then {
     run_dump_test "avx-vnni-int16-intel"
     run_dump_test "sha512"
     run_dump_test "sha512-intel"
+    run_dump_test "sm3"
+    run_dump_test "sm3-intel"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/sm3-intel.d b/gas/testsuite/gas/i386/sm3-intel.d
new file mode 100644
index 00000000000..4ab4ce2ddb4
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm3-intel.d
@@ -0,0 +1,40 @@
+#as:
+#objdump: -dw -Mintel
+#name: i386 SM3 insns (Intel disassembly)
+#source: sm3.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 50 da b4 f4 00 00 00 10\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 50 da 31\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 51 da b4 f4 00 00 00 10\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 51 da 31\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 xmm6,xmm5,xmm4,0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b4 f4 00 00 00 10 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de 31 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[ecx\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[edx-0x800\],0x7b
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 50 da b4 f4 00 00 00 10\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 50 da 31\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 51 da b4 f4 00 00 00 10\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 51 da 31\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 xmm6,xmm5,xmm4,0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b4 f4 00 00 00 10 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de 31 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[ecx\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[edx-0x800\],0x7b
diff --git a/gas/testsuite/gas/i386/sm3.d b/gas/testsuite/gas/i386/sm3.d
new file mode 100644
index 00000000000..7507a8b4c7f
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm3.d
@@ -0,0 +1,40 @@
+#as:
+#objdump: -dw
+#name: i386 SM3 insns
+#source: sm3.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b4 f4 00 00 00 10\s+vsm3msg1 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da 31\s+vsm3msg1 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b4 f4 00 00 00 10\s+vsm3msg2 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da 31\s+vsm3msg2 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 \$0x7b,%xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b4 f4 00 00 00 10 7b\s+vsm3rnds2 \$0x7b,0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de 31 7b\s+vsm3rnds2 \$0x7b,\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 \$0x7b,0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 \$0x7b,-0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b4 f4 00 00 00 10\s+vsm3msg1 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da 31\s+vsm3msg1 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b4 f4 00 00 00 10\s+vsm3msg2 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da 31\s+vsm3msg2 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 \$0x7b,%xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b4 f4 00 00 00 10 7b\s+vsm3rnds2 \$0x7b,0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de 31 7b\s+vsm3rnds2 \$0x7b,\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 \$0x7b,0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 \$0x7b,-0x800\(%edx\),%xmm5,%xmm6
diff --git a/gas/testsuite/gas/i386/sm3.s b/gas/testsuite/gas/i386/sm3.s
new file mode 100644
index 00000000000..d1bc967a6f3
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm3.s
@@ -0,0 +1,37 @@
+# Check 32bit SM3 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsm3msg1	%xmm4, %xmm5, %xmm6	 #SM3
+	vsm3msg1	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #SM3
+	vsm3msg1	(%ecx), %xmm5, %xmm6	 #SM3
+	vsm3msg1	2032(%ecx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3msg1	-2048(%edx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+	vsm3msg2	%xmm4, %xmm5, %xmm6	 #SM3
+	vsm3msg2	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #SM3
+	vsm3msg2	(%ecx), %xmm5, %xmm6	 #SM3
+	vsm3msg2	2032(%ecx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3msg2	-2048(%edx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+	vsm3rnds2	$123, %xmm4, %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, 0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, (%ecx), %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, 2032(%ecx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3rnds2	$123, -2048(%edx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+
+.intel_syntax noprefix
+	vsm3msg1	xmm6, xmm5, xmm4	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [ecx]	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #SM3 Disp32(f0070000)
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [edx-2048]	 #SM3 Disp32(00f8ffff)
+	vsm3msg2	xmm6, xmm5, xmm4	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [ecx]	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #SM3 Disp32(f0070000)
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [edx-2048]	 #SM3 Disp32(00f8ffff)
+	vsm3rnds2	xmm6, xmm5, xmm4, 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000], 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [ecx], 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [ecx+2032], 123	 #SM3 Disp32(f0070000)
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [edx-2048], 123	 #SM3 Disp32(00f8ffff)
diff --git a/gas/testsuite/gas/i386/x86-64-sm3-intel.d b/gas/testsuite/gas/i386/x86-64-sm3-intel.d
new file mode 100644
index 00000000000..5b533681029
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm3-intel.d
@@ -0,0 +1,40 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 SM3 insns (Intel disassembly)
+#source: x86-64-sm3.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 50 da b4 f5 00 00 00 10\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 50 da 31\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 51 da b4 f5 00 00 00 10\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 51 da 31\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 xmm6,xmm5,xmm4,0x7b
+\s*[a-f0-9]+:\s*c4 a3 51 de b4 f5 00 00 00 10 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\],0x7b
+\s*[a-f0-9]+:\s*c4 c3 51 de 31 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[r9\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\],0x7b
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 50 da b4 f5 00 00 00 10\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 50 da 31\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 51 da b4 f5 00 00 00 10\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 51 da 31\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 xmm6,xmm5,xmm4,0x7b
+\s*[a-f0-9]+:\s*c4 a3 51 de b4 f5 00 00 00 10 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\],0x7b
+\s*[a-f0-9]+:\s*c4 c3 51 de 31 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[r9\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\],0x7b
diff --git a/gas/testsuite/gas/i386/x86-64-sm3.d b/gas/testsuite/gas/i386/x86-64-sm3.d
new file mode 100644
index 00000000000..8f417de4f7f
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm3.d
@@ -0,0 +1,40 @@
+#as:
+#objdump: -dw
+#name: x86_64 SM3 insns
+#source: x86-64-sm3.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 50 da b4 f5 00 00 00 10\s+vsm3msg1 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 50 da 31\s+vsm3msg1 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 51 da b4 f5 00 00 00 10\s+vsm3msg2 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 51 da 31\s+vsm3msg2 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 \$0x7b,%xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a3 51 de b4 f5 00 00 00 10 7b\s+vsm3rnds2 \$0x7b,0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c3 51 de 31 7b\s+vsm3rnds2 \$0x7b,\(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 \$0x7b,0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 \$0x7b,-0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 50 da b4 f5 00 00 00 10\s+vsm3msg1 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 50 da 31\s+vsm3msg1 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 51 da b4 f5 00 00 00 10\s+vsm3msg2 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 51 da 31\s+vsm3msg2 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 \$0x7b,%xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a3 51 de b4 f5 00 00 00 10 7b\s+vsm3rnds2 \$0x7b,0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c3 51 de 31 7b\s+vsm3rnds2 \$0x7b,\(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 \$0x7b,0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 \$0x7b,-0x800\(%rdx\),%xmm5,%xmm6
diff --git a/gas/testsuite/gas/i386/x86-64-sm3.s b/gas/testsuite/gas/i386/x86-64-sm3.s
new file mode 100644
index 00000000000..fa80b4b15a8
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm3.s
@@ -0,0 +1,37 @@
+# Check 64bit SM3 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsm3msg1	%xmm4, %xmm5, %xmm6	 #SM3
+	vsm3msg1	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #SM3
+	vsm3msg1	(%r9), %xmm5, %xmm6	 #SM3
+	vsm3msg1	2032(%rcx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3msg1	-2048(%rdx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+	vsm3msg2	%xmm4, %xmm5, %xmm6	 #SM3
+	vsm3msg2	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #SM3
+	vsm3msg2	(%r9), %xmm5, %xmm6	 #SM3
+	vsm3msg2	2032(%rcx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3msg2	-2048(%rdx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+	vsm3rnds2	$123, %xmm4, %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, 0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, (%r9), %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, 2032(%rcx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3rnds2	$123, -2048(%rdx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+
+.intel_syntax noprefix
+	vsm3msg1	xmm6, xmm5, xmm4	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [r9]	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [rcx+2032]	 #SM3 Disp32(f0070000)
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [rdx-2048]	 #SM3 Disp32(00f8ffff)
+	vsm3msg2	xmm6, xmm5, xmm4	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [r9]	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [rcx+2032]	 #SM3 Disp32(f0070000)
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [rdx-2048]	 #SM3 Disp32(00f8ffff)
+	vsm3rnds2	xmm6, xmm5, xmm4, 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000], 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [r9], 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [rcx+2032], 123	 #SM3 Disp32(f0070000)
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [rdx-2048], 123	 #SM3 Disp32(00f8ffff)
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 64d8c3726d4..5717443d2f6 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -442,6 +442,8 @@ run_dump_test "x86-64-avx-vnni-int16"
 run_dump_test "x86-64-avx-vnni-int16-intel"
 run_dump_test "x86-64-sha512"
 run_dump_test "x86-64-sha512-intel"
+run_dump_test "x86-64-sm3"
+run_dump_test "x86-64-sm3-intel"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 430238c3e4e..da9568acdd5 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -1070,6 +1070,7 @@ enum
   PREFIX_VEX_0F38CB,
   PREFIX_VEX_0F38CC,
   PREFIX_VEX_0F38CD,
+  PREFIX_VEX_0F38DA_W_0_L_0,
   PREFIX_VEX_0F38F5_L_0,
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
@@ -1315,6 +1316,7 @@ enum
   VEX_LEN_0F38CB_P_3_W_0,
   VEX_LEN_0F38CC_P_3_W_0,
   VEX_LEN_0F38CD_P_3_W_0,
+  VEX_LEN_0F38DA_W_0,
   VEX_LEN_0F38DB,
   VEX_LEN_0F38F2,
   VEX_LEN_0F38F3,
@@ -1345,6 +1347,7 @@ enum
   VEX_LEN_0F3A61,
   VEX_LEN_0F3A62,
   VEX_LEN_0F3A63,
+  VEX_LEN_0F3ADE_W_0,
   VEX_LEN_0F3ADF,
   VEX_LEN_0F3AF0,
   VEX_LEN_XOP_08_85,
@@ -1488,6 +1491,7 @@ enum
   VEX_W_0F38CF,
   VEX_W_0F38D2,
   VEX_W_0F38D3,
+  VEX_W_0F38DA,
   VEX_W_0F3A00_L_1,
   VEX_W_0F3A01_L_1,
   VEX_W_0F3A02,
@@ -1505,6 +1509,7 @@ enum
   VEX_W_0F3A4C,
   VEX_W_0F3ACE,
   VEX_W_0F3ACF,
+  VEX_W_0F3ADE,
 
   VEX_W_XOP_08_85_L_0,
   VEX_W_XOP_08_86_L_0,
@@ -3964,6 +3969,13 @@ static const struct dis386 prefix_table[][4] = {
     { VEX_W_TABLE (VEX_W_0F38CD_P_3) },
   },
 
+  /* PREFIX_VEX_0F38DA_W_0_L_0 */
+  {
+    { "vsm3msg1", { XM, Vex, EXxmm }, 0 },
+    { Bad_Opcode },
+    { "vsm3msg2", { XM, Vex, EXxmm }, 0 },
+  },
+
   /* PREFIX_VEX_0F38F5_L_0 */
   {
     { "bzhiS",		{ Gdq, Edq, VexGdq }, 0 },
@@ -6433,7 +6445,7 @@ static const struct dis386 vex_table[][256] = {
     /* d8 */
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38DA) },
     { VEX_LEN_TABLE (VEX_LEN_0F38DB) },
     { "vaesenc",	{ XM, Vex, EXx }, PREFIX_DATA },
     { "vaesenclast",	{ XM, Vex, EXx }, PREFIX_DATA },
@@ -6728,7 +6740,7 @@ static const struct dis386 vex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F3ADE) },
     { VEX_LEN_TABLE (VEX_LEN_0F3ADF) },
     /* e0 */
     { Bad_Opcode },
@@ -6998,6 +7010,11 @@ static const struct dis386 vex_len_table[][2] = {
     { "vsha512msg2", { XM, Uymm }, 0 },
   },
 
+  /* VEX_LEN_0F38DA_W_0 */
+  {
+    { PREFIX_TABLE (PREFIX_VEX_0F38DA_W_0_L_0) },
+  },
+
   /* VEX_LEN_0F38DB */
   {
     { "vaesimc",	{ XM, EXx }, PREFIX_DATA },
@@ -7156,6 +7173,11 @@ static const struct dis386 vex_len_table[][2] = {
     { "vpcmpistri",	{ XM, EXx, Ib }, PREFIX_DATA },
   },
 
+  /* VEX_LEN_0F3ADE_W_0 */
+  {
+    { "vsm3rnds2", { XM, Vex, EXxmm, Ib }, PREFIX_DATA },
+  },
+
   /* VEX_LEN_0F3ADF */
   {
     { "vaeskeygenassist", { XM, EXx, Ib }, PREFIX_DATA },
@@ -7692,6 +7714,10 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F38D3 */
     { PREFIX_TABLE (PREFIX_VEX_0F38D3_W_0) },
   },
+  {
+    /* VEX_W_0F38DA */
+    { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0) },
+  },
   {
     /* VEX_W_0F3A00_L_1 */
     { Bad_Opcode },
@@ -7764,6 +7790,10 @@ static const struct dis386 vex_w_table[][2] = {
     { Bad_Opcode },
     { "%XEvgf2p8affineinvqb",  { XM, Vex, EXx, Ib }, PREFIX_DATA },
   },
+  {
+    /* VEX_W_0F3ADE */
+    { VEX_LEN_TABLE (VEX_LEN_0F3ADE_W_0) },
+  },
   /* VEX_W_XOP_08_85_L_0 */
   {
     { "vpmacssww", 	{ XM, Vex, EXx, XMVexI4 }, 0 },
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 8a163533eeb..a5743cd5ee5 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -170,6 +170,8 @@ static const dependency isa_dependencies[] =
     "AVX2" },
   { "SHA512",
     "AVX" },
+  { "SM3",
+    "AVX" },
   { "AVX512F",
     "AVX2" },
   { "AVX512CD",
@@ -372,6 +374,7 @@ static bitfield cpu_flags[] =
   BITFIELD (LKGS),
   BITFIELD (AVX_VNNI_INT16),
   BITFIELD (SHA512),
+  BITFIELD (SM3),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 224ca04661e..42d69167c8a 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -237,6 +237,8 @@ enum
   CpuAVX_VNNI_INT16,
   /* Intel SHA512 Instructions support required.  */
   CpuSHA512,
+  /* Intel SM3 Instructions support required.  */
+  CpuSM3,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -436,6 +438,7 @@ typedef union i386_cpu_flags
       unsigned int cpulkgs:1;
       unsigned int cpuavx_vnni_int16:1;
       unsigned int cpusha512:1;
+      unsigned int cpusm3:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 18ea2f1500e..630337b108b 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3383,3 +3383,10 @@ vsha512msg1, 0xf2cc, SHA512, Vex256|Space0F38|Modrm|VexW0|NoSuf, { RegXMM, RegYM
 vsha512msg2, 0xf2cd, SHA512, Vex256|Space0F38|Modrm|VexW0|NoSuf, { RegYMM, RegYMM }
 
 // SHA512 instructions end.
+
+// SM3 instructions.
+vsm3rnds2, 0x66de, SM3, Modrm|Space0F3A|Vex128|VexVVVV|VexW0|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsm3msg1, 0xda, SM3, Modrm|Space0F38|Vex128|VexVVVV|VexW0|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsm3msg2, 0x66da, SM3, Modrm|Space0F38|Vex128|VexVVVV|VexW0|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+
+// SM3 instructions end.
-- 
2.31.1



* [PATCH 4/5] Support Intel SM4
  2023-07-13  6:32 [PATCH 0/5] Support Intel Arrow Lake/Lunar Lake ISAs Haochen Jiang
                   ` (2 preceding siblings ...)
  2023-07-13  6:33 ` [PATCH 3/5] Support Intel SM3 Haochen Jiang
@ 2023-07-13  6:33 ` Haochen Jiang
  2023-07-13 10:25   ` Jan Beulich
  2023-07-13  6:33 ` [PATCH 5/5] Support Intel PBNDKB Haochen Jiang
  4 siblings, 1 reply; 31+ messages in thread
From: Haochen Jiang @ 2023-07-13  6:33 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hjl.tools, amodra

gas/ChangeLog:

	* NEWS: Support Intel SM4.
	* config/tc-i386.c: Add sm4.
	* doc/c-i386.texi: Document .sm4.
	* testsuite/gas/i386/i386.exp: Run SM4 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/sm4-intel.d: Add SM4 tests.
	* testsuite/gas/i386/sm4.d: Ditto.
	* testsuite/gas/i386/sm4.s: Ditto.
	* testsuite/gas/i386/x86-64-sm4-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-sm4.d: Ditto.
	* testsuite/gas/i386/x86-64-sm4.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (PREFIX_VEX_0F38DA_W_0_L_0): Remove.
	(VEX_LEN_0F38DA_W_0): Ditto.
	(PREFIX_VEX_0F38DA_W_0): New.
	(VEX_LEN_0F38DA_W_0_P_0): Ditto.
	(VEX_LEN_0F38DA_W_0_P_2): Ditto.
	(prefix_table):
	Change from PREFIX_VEX_0F38DA_W_0_L_0 to PREFIX_VEX_0F38DA_W_0.
	Add SM4 instruction table entry and adjust SM3 table entry.
	(vex_len_table): Remove VEX_LEN_0F38DA_W_0.
	Add VEX_LEN_0F38DA_W_0_P_0, VEX_LEN_0F38DA_W_0_P_2.
	(vex_w_table): Redirect the VEX_W_0F38DA entry from vex_len_table
	to prefix_table.
	* i386-gen.c (isa_dependencies): Add SM4.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuSM4): New.
	(i386_cpu_flags): Add cpusm4.
	* i386-opc.tbl: Add SM4 instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                  |    2 +
 gas/config/tc-i386.c                      |    1 +
 gas/doc/c-i386.texi                       |    3 +-
 gas/testsuite/gas/i386/i386.exp           |    2 +
 gas/testsuite/gas/i386/sm4-intel.d        |   50 +
 gas/testsuite/gas/i386/sm4.d              |   50 +
 gas/testsuite/gas/i386/sm4.s              |   47 +
 gas/testsuite/gas/i386/x86-64-sm4-intel.d |   50 +
 gas/testsuite/gas/i386/x86-64-sm4.d       |   50 +
 gas/testsuite/gas/i386/x86-64-sm4.s       |   47 +
 gas/testsuite/gas/i386/x86-64.exp         |    2 +
 opcodes/i386-dis.c                        |   25 +-
 opcodes/i386-gen.c                        |    3 +
 opcodes/i386-init.h                       |  660 +-
 opcodes/i386-mnem.h                       | 3854 +++++-----
 opcodes/i386-opc.h                        |    3 +
 opcodes/i386-opc.tbl                      |    7 +
 opcodes/i386-tbl.h                        | 7972 +++++++++++----------
 18 files changed, 6605 insertions(+), 6223 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/sm4-intel.d
 create mode 100644 gas/testsuite/gas/i386/sm4.d
 create mode 100644 gas/testsuite/gas/i386/sm4.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm4-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm4.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm4.s

diff --git a/gas/NEWS b/gas/NEWS
index 42bda657f21..26e75bde391 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel SM4 instructions.
+
 * Add support for Intel SM3 instructions.
 
 * Add support for Intel SHA512 instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 7424fa41c44..686dd4c70f4 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1154,6 +1154,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (avx_vnni_int16, AVX_VNNI_INT16, ANY_AVX_VNNI_INT16, false),
   SUBARCH (sha512, SHA512, ANY_SHA512, false),
   SUBARCH (sm3, SM3, ANY_SM3, false),
+  SUBARCH (sm4, SM4, ANY_SM4, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 6ef1da21370..54b0d7d738c 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -210,6 +210,7 @@ accept various extension mnemonics.  For example,
 @code{avx_vnni_int16},
 @code{sha512},
 @code{sm3},
+@code{sm4},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1639,7 +1640,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.prefetchi} @tab @samp{.avx_ifma} @tab @samp{.avx_vnni_int8}
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
-@item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3}
+@item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3} @tab @samp{.sm4}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 87ddd71be14..5e575660d7c 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -502,6 +502,8 @@ if [gas_32_check] then {
     run_dump_test "sha512-intel"
     run_dump_test "sm3"
     run_dump_test "sm3-intel"
+    run_dump_test "sm4"
+    run_dump_test "sm4-intel"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/sm4-intel.d b/gas/testsuite/gas/i386/sm4-intel.d
new file mode 100644
index 00000000000..03ccdb4a67b
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm4-intel.d
@@ -0,0 +1,50 @@
+#as:
+#objdump: -dw -Mintel
+#name: i386 SM4 insns (Intel disassembly)
+#source: sm4.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 56 da b4 f4 00 00 00 10\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 56 da 31\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b4 f4 00 00 00 10\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 52 da 31\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 57 da b4 f4 00 00 00 10\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 57 da 31\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b4 f4 00 00 00 10\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 53 da 31\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 56 da b4 f4 00 00 00 10\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 56 da 31\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b4 f4 00 00 00 10\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 52 da 31\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 57 da b4 f4 00 00 00 10\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 57 da 31\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b4 f4 00 00 00 10\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 53 da 31\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
diff --git a/gas/testsuite/gas/i386/sm4.d b/gas/testsuite/gas/i386/sm4.d
new file mode 100644
index 00000000000..48dcda66271
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm4.d
@@ -0,0 +1,50 @@
+#as:
+#objdump: -dw
+#name: i386 SM4 insns
+#source: sm4.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b4 f4 00 00 00 10\s+vsm4key4 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da 31\s+vsm4key4 \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b4 f4 00 00 00 10\s+vsm4key4 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da 31\s+vsm4key4 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b4 f4 00 00 00 10\s+vsm4rnds4 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da 31\s+vsm4rnds4 \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b4 f4 00 00 00 10\s+vsm4rnds4 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da 31\s+vsm4rnds4 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b4 f4 00 00 00 10\s+vsm4key4 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da 31\s+vsm4key4 \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b4 f4 00 00 00 10\s+vsm4key4 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da 31\s+vsm4key4 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b4 f4 00 00 00 10\s+vsm4rnds4 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da 31\s+vsm4rnds4 \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b4 f4 00 00 00 10\s+vsm4rnds4 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da 31\s+vsm4rnds4 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 -0x800\(%edx\),%xmm5,%xmm6
diff --git a/gas/testsuite/gas/i386/sm4.s b/gas/testsuite/gas/i386/sm4.s
new file mode 100644
index 00000000000..0eb7b2fcb7b
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm4.s
@@ -0,0 +1,47 @@
+# Check 32bit SM4 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsm4key4	%ymm4, %ymm5, %ymm6
+	vsm4key4	%xmm4, %xmm5, %xmm6
+	vsm4key4	0x10000000(%esp, %esi, 8), %ymm5, %ymm6
+	vsm4key4	(%ecx), %ymm5, %ymm6
+	vsm4key4	4064(%ecx), %ymm5, %ymm6
+	vsm4key4	-4096(%edx), %ymm5, %ymm6
+	vsm4key4	0x10000000(%esp, %esi, 8), %xmm5, %xmm6
+	vsm4key4	(%ecx), %xmm5, %xmm6
+	vsm4key4	2032(%ecx), %xmm5, %xmm6
+	vsm4key4	-2048(%edx), %xmm5, %xmm6
+	vsm4rnds4	%ymm4, %ymm5, %ymm6
+	vsm4rnds4	%xmm4, %xmm5, %xmm6
+	vsm4rnds4	0x10000000(%esp, %esi, 8), %ymm5, %ymm6
+	vsm4rnds4	(%ecx), %ymm5, %ymm6
+	vsm4rnds4	4064(%ecx), %ymm5, %ymm6
+	vsm4rnds4	-4096(%edx), %ymm5, %ymm6
+	vsm4rnds4	0x10000000(%esp, %esi, 8), %xmm5, %xmm6
+	vsm4rnds4	(%ecx), %xmm5, %xmm6
+	vsm4rnds4	2032(%ecx), %xmm5, %xmm6
+	vsm4rnds4	-2048(%edx), %xmm5, %xmm6
+
+.intel_syntax noprefix
+	vsm4key4	ymm6, ymm5, ymm4
+	vsm4key4	xmm6, xmm5, xmm4
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [ecx]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [ecx+4064]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [edx-4096]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [ecx]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [ecx+2032]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [edx-2048]
+	vsm4rnds4	ymm6, ymm5, ymm4
+	vsm4rnds4	xmm6, xmm5, xmm4
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [ecx]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [ecx+4064]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [edx-4096]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [ecx]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [ecx+2032]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [edx-2048]
diff --git a/gas/testsuite/gas/i386/x86-64-sm4-intel.d b/gas/testsuite/gas/i386/x86-64-sm4-intel.d
new file mode 100644
index 00000000000..9bfa59592ae
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm4-intel.d
@@ -0,0 +1,50 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 SM4 insns (Intel disassembly)
+#source: x86-64-sm4.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 56 da b4 f5 00 00 00 10\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 56 da 31\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 52 da b4 f5 00 00 00 10\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 52 da 31\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 57 da b4 f5 00 00 00 10\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 57 da 31\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 53 da b4 f5 00 00 00 10\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 53 da 31\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 56 da b4 f5 00 00 00 10\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 56 da 31\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 52 da b4 f5 00 00 00 10\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 52 da 31\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 57 da b4 f5 00 00 00 10\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 57 da 31\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 53 da b4 f5 00 00 00 10\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 53 da 31\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
diff --git a/gas/testsuite/gas/i386/x86-64-sm4.d b/gas/testsuite/gas/i386/x86-64-sm4.d
new file mode 100644
index 00000000000..2c1f6737d9a
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm4.d
@@ -0,0 +1,50 @@
+#as:
+#objdump: -dw
+#name: x86_64 SM4 insns
+#source: x86-64-sm4.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 56 da b4 f5 00 00 00 10\s+vsm4key4 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 56 da 31\s+vsm4key4 \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 52 da b4 f5 00 00 00 10\s+vsm4key4 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 52 da 31\s+vsm4key4 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 57 da b4 f5 00 00 00 10\s+vsm4rnds4 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 57 da 31\s+vsm4rnds4 \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 53 da b4 f5 00 00 00 10\s+vsm4rnds4 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 53 da 31\s+vsm4rnds4 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 56 da b4 f5 00 00 00 10\s+vsm4key4 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 56 da 31\s+vsm4key4 \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 52 da b4 f5 00 00 00 10\s+vsm4key4 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 52 da 31\s+vsm4key4 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 57 da b4 f5 00 00 00 10\s+vsm4rnds4 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 57 da 31\s+vsm4rnds4 \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 53 da b4 f5 00 00 00 10\s+vsm4rnds4 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 53 da 31\s+vsm4rnds4 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 -0x800\(%rdx\),%xmm5,%xmm6
diff --git a/gas/testsuite/gas/i386/x86-64-sm4.s b/gas/testsuite/gas/i386/x86-64-sm4.s
new file mode 100644
index 00000000000..cc680cb1a89
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm4.s
@@ -0,0 +1,47 @@
+# Check 64bit SM4 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsm4key4	%ymm4, %ymm5, %ymm6 
+	vsm4key4	%xmm4, %xmm5, %xmm6
+	vsm4key4	0x10000000(%rbp, %r14, 8), %ymm5, %ymm6
+	vsm4key4	(%r9), %ymm5, %ymm6
+	vsm4key4	4064(%rcx), %ymm5, %ymm6
+	vsm4key4	-4096(%rdx), %ymm5, %ymm6
+	vsm4key4	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6
+	vsm4key4	(%r9), %xmm5, %xmm6
+	vsm4key4	2032(%rcx), %xmm5, %xmm6
+	vsm4key4	-2048(%rdx), %xmm5, %xmm6
+	vsm4rnds4	%ymm4, %ymm5, %ymm6
+	vsm4rnds4	%xmm4, %xmm5, %xmm6
+	vsm4rnds4	0x10000000(%rbp, %r14, 8), %ymm5, %ymm6
+	vsm4rnds4	(%r9), %ymm5, %ymm6
+	vsm4rnds4	4064(%rcx), %ymm5, %ymm6
+	vsm4rnds4	-4096(%rdx), %ymm5, %ymm6
+	vsm4rnds4	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6
+	vsm4rnds4	(%r9), %xmm5, %xmm6
+	vsm4rnds4	2032(%rcx), %xmm5, %xmm6
+	vsm4rnds4	-2048(%rdx), %xmm5, %xmm6
+
+.intel_syntax noprefix
+	vsm4key4	ymm6, ymm5, ymm4
+	vsm4key4	xmm6, xmm5, xmm4
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [rbp+r14*8+0x10000000]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [r9]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [rcx+4064]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [rdx-4096]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [r9]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [rcx+2032]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [rdx-2048]
+	vsm4rnds4	ymm6, ymm5, ymm4
+	vsm4rnds4	xmm6, xmm5, xmm4
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [rbp+r14*8+0x10000000]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [r9]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [rcx+4064]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [rdx-4096]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [r9]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [rcx+2032]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [rdx-2048]
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 5717443d2f6..36bde0ac372 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -444,6 +444,8 @@ run_dump_test "x86-64-sha512"
 run_dump_test "x86-64-sha512-intel"
 run_dump_test "x86-64-sm3"
 run_dump_test "x86-64-sm3-intel"
+run_dump_test "x86-64-sm4"
+run_dump_test "x86-64-sm4-intel"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index da9568acdd5..e21ad7ae005 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -1070,7 +1070,7 @@ enum
   PREFIX_VEX_0F38CB,
   PREFIX_VEX_0F38CC,
   PREFIX_VEX_0F38CD,
-  PREFIX_VEX_0F38DA_W_0_L_0,
+  PREFIX_VEX_0F38DA_W_0,
   PREFIX_VEX_0F38F5_L_0,
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
@@ -1316,7 +1316,8 @@ enum
   VEX_LEN_0F38CB_P_3_W_0,
   VEX_LEN_0F38CC_P_3_W_0,
   VEX_LEN_0F38CD_P_3_W_0,
-  VEX_LEN_0F38DA_W_0,
+  VEX_LEN_0F38DA_W_0_P_0,
+  VEX_LEN_0F38DA_W_0_P_2,
   VEX_LEN_0F38DB,
   VEX_LEN_0F38F2,
   VEX_LEN_0F38F3,
@@ -3969,11 +3970,12 @@ static const struct dis386 prefix_table[][4] = {
     { VEX_W_TABLE (VEX_W_0F38CD_P_3) },
   },
 
-  /* PREFIX_VEX_0F38DA_W_0_L_0 */
+  /* PREFIX_VEX_0F38DA_W_0 */
   {
-    { "vsm3msg1", { XM, Vex, EXxmm }, 0 },
-    { Bad_Opcode },
-    { "vsm3msg2", { XM, Vex, EXxmm }, 0 },
+    { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0_P_0) },
+    { "vsm4key4", { XM, Vex, EXx }, 0 },
+    { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0_P_2) },
+    { "vsm4rnds4", { XM, Vex, EXx }, 0 },
   },
 
   /* PREFIX_VEX_0F38F5_L_0 */
@@ -7010,9 +7012,14 @@ static const struct dis386 vex_len_table[][2] = {
     { "vsha512msg2", { XM, Uymm }, 0 },
   },
 
-  /* VEX_LEN_0F38DA_W_0 */
+  /* VEX_LEN_0F38DA_W_0_P_0 */
+  {
+    { "vsm3msg1", { XM, Vex, EXxmm }, 0 },
+  },
+
+  /* VEX_LEN_0F38DA_W_0_P_2 */
   {
-    { PREFIX_TABLE (PREFIX_VEX_0F38DA_W_0_L_0) },
+    { "vsm3msg2", { XM, Vex, EXxmm }, 0 },
   },
 
   /* VEX_LEN_0F38DB */
@@ -7716,7 +7723,7 @@ static const struct dis386 vex_w_table[][2] = {
   },
   {
     /* VEX_W_0F38DA */
-    { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38DA_W_0) },
   },
   {
     /* VEX_W_0F3A00_L_1 */
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index a5743cd5ee5..24a9943da47 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -172,6 +172,8 @@ static const dependency isa_dependencies[] =
     "AVX" },
   { "SM3",
     "AVX" },
+  { "SM4",
+    "AVX" },
   { "AVX512F",
     "AVX2" },
   { "AVX512CD",
@@ -375,6 +377,7 @@ static bitfield cpu_flags[] =
   BITFIELD (AVX_VNNI_INT16),
   BITFIELD (SHA512),
   BITFIELD (SM3),
+  BITFIELD (SM4),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 42d69167c8a..fcbdb4c5f8d 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -239,6 +239,8 @@ enum
   CpuSHA512,
   /* Intel SM3 Instructions support required.  */
   CpuSM3,
+  /* Intel SM4 Instructions support required.  */
+  CpuSM4,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -439,6 +441,7 @@ typedef union i386_cpu_flags
       unsigned int cpuavx_vnni_int16:1;
       unsigned int cpusha512:1;
       unsigned int cpusm3:1;
+      unsigned int cpusm4:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 630337b108b..eca60b467f2 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3390,3 +3390,10 @@ vsm3msg1, 0xda, SM3, Modrm|Space0F38|Vex128|VexVVVV|VexW0|NoSuf, { RegXMM|Unspec
 vsm3msg2, 0x66da, SM3, Modrm|Space0F38|Vex128|VexVVVV|VexW0|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
 
 // SM3 instructions end.
+
+// SM4 instructions.
+
+vsm4key4, 0xf3da, SM4, Modrm|Space0F38|Vex|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vsm4rnds4, 0xf2da, SM4, Modrm|Space0F38|Vex|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+
+// SM4 instructions end.
-- 
2.31.1



* [PATCH 5/5] Support Intel PBNDKB
  2023-07-13  6:32 [PATCH 0/5] Support Intel Arrow Lake/Lunar Lake ISAs Haochen Jiang
                   ` (3 preceding siblings ...)
  2023-07-13  6:33 ` [PATCH 4/5] Support Intel SM4 Haochen Jiang
@ 2023-07-13  6:33 ` Haochen Jiang
  2023-07-13 10:29   ` Jan Beulich
  4 siblings, 1 reply; 31+ messages in thread
From: Haochen Jiang @ 2023-07-13  6:33 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hjl.tools, amodra, Hu, Lin1

From: "Hu, Lin1" <lin1.hu@intel.com>

gas/ChangeLog:

	* NEWS: Support Intel PBNDKB.
	* config/tc-i386.c: Add pbndkb.
	* doc/c-i386.texi: Document .pbndkb.
	* testsuite/gas/i386/i386.exp: Add PBNDKB tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/pbndkb-inval.l: New test.
	* testsuite/gas/i386/pbndkb-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-pbndkb-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-pbndkb.d: Ditto.
	* testsuite/gas/i386/x86-64-pbndkb.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (PREFIX_0F01_REG_0_MOD_3_RM_7): New.
	(X86_64_0F01_REG_0_MOD_3_RM_7_P_0): Ditto.
	(prefix_table): Add PREFIX_0F01_REG_0_MOD_3_RM_7.
	(x86_64_table): Add X86_64_0F01_REG_0_MOD_3_RM_7_P_0.
	(rm_table): New entry for pbndkb.
	* i386-gen.c (cpu_flags): Add PBNDKB.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuPBNDKB): New.
	(i386_cpu_flags): Add cpupbndkb.
	* i386-opc.tbl: Add PBNDKB instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                     |    2 +
 gas/config/tc-i386.c                         |    1 +
 gas/doc/c-i386.texi                          |    2 +
 gas/testsuite/gas/i386/i386.exp              |    1 +
 gas/testsuite/gas/i386/pbndkb-inval.l        |    2 +
 gas/testsuite/gas/i386/pbndkb-inval.s        |    6 +
 gas/testsuite/gas/i386/x86-64-pbndkb-intel.d |   12 +
 gas/testsuite/gas/i386/x86-64-pbndkb.d       |   12 +
 gas/testsuite/gas/i386/x86-64-pbndkb.s       |    9 +
 gas/testsuite/gas/i386/x86-64.exp            |    2 +
 opcodes/i386-dis.c                           |   14 +
 opcodes/i386-gen.c                           |    1 +
 opcodes/i386-init.h                          |  652 +-
 opcodes/i386-mnem.h                          | 3673 ++++----
 opcodes/i386-opc.h                           |    3 +
 opcodes/i386-opc.tbl                         |    6 +
 opcodes/i386-tbl.h                           | 7949 +++++++++---------
 17 files changed, 6223 insertions(+), 6124 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/pbndkb-inval.l
 create mode 100644 gas/testsuite/gas/i386/pbndkb-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-pbndkb-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-pbndkb.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-pbndkb.s

diff --git a/gas/NEWS b/gas/NEWS
index 26e75bde391..1ed043511eb 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel PBNDKB instructions.
+
 * Add support for Intel SM4 instructions.
 
 * Add support for Intel SM3 instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 686dd4c70f4..e35e2660ed5 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1155,6 +1155,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (sha512, SHA512, ANY_SHA512, false),
   SUBARCH (sm3, SM3, ANY_SM3, false),
   SUBARCH (sm4, SM4, ANY_SM4, false),
+  SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 54b0d7d738c..dd06282a5a3 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -211,6 +211,7 @@ accept various extension mnemonics.  For example,
 @code{sha512},
 @code{sm3},
 @code{sm4},
+@code{pbndkb},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1641,6 +1642,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
 @item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3} @tab @samp{.sm4}
+@item @samp{.pbndkb}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 5e575660d7c..90f6e20bf5a 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -504,6 +504,7 @@ if [gas_32_check] then {
     run_dump_test "sm3-intel"
     run_dump_test "sm4"
     run_dump_test "sm4-intel"
+    run_list_test "pbndkb-inval"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/pbndkb-inval.l b/gas/testsuite/gas/i386/pbndkb-inval.l
new file mode 100644
index 00000000000..673bfff67e7
--- /dev/null
+++ b/gas/testsuite/gas/i386/pbndkb-inval.l
@@ -0,0 +1,2 @@
+.* Assembler messages:
+.*:6: Error: `pbndkb' is only supported in 64-bit mode
diff --git a/gas/testsuite/gas/i386/pbndkb-inval.s b/gas/testsuite/gas/i386/pbndkb-inval.s
new file mode 100644
index 00000000000..ac8aa00f8d1
--- /dev/null
+++ b/gas/testsuite/gas/i386/pbndkb-inval.s
@@ -0,0 +1,6 @@
+# Check Illegal PBNDKB instructions
+
+	.allow_index_reg
+	.text
+_start:
+	pbndkb		 #PBNDKB 
diff --git a/gas/testsuite/gas/i386/x86-64-pbndkb-intel.d b/gas/testsuite/gas/i386/x86-64-pbndkb-intel.d
new file mode 100644
index 00000000000..0ac10b2d1ba
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-pbndkb-intel.d
@@ -0,0 +1,12 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 PBNDKB insns (Intel disassembly)
+#source: x86-64-pbndkb.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*0f 01 c7\s+pbndkb
+\s*[a-f0-9]+:\s*0f 01 c7\s+pbndkb
diff --git a/gas/testsuite/gas/i386/x86-64-pbndkb.d b/gas/testsuite/gas/i386/x86-64-pbndkb.d
new file mode 100644
index 00000000000..cff4f509194
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-pbndkb.d
@@ -0,0 +1,12 @@
+#as:
+#objdump: -dw
+#name: x86_64 PBNDKB insns
+#source: x86-64-pbndkb.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*0f 01 c7\s+pbndkb
+\s*[a-f0-9]+:\s*0f 01 c7\s+pbndkb
diff --git a/gas/testsuite/gas/i386/x86-64-pbndkb.s b/gas/testsuite/gas/i386/x86-64-pbndkb.s
new file mode 100644
index 00000000000..defcce9cb08
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-pbndkb.s
@@ -0,0 +1,9 @@
+# Check 64bit PBNDKB instructions
+
+	.allow_index_reg
+	.text
+_start:
+	pbndkb		 #PBNDKB 
+
+.intel_syntax noprefix
+	pbndkb		 #PBNDKB 
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 36bde0ac372..5e329147ffd 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -446,6 +446,8 @@ run_dump_test "x86-64-sm3"
 run_dump_test "x86-64-sm3-intel"
 run_dump_test "x86-64-sm4"
 run_dump_test "x86-64-sm4-intel"
+run_dump_test "x86-64-pbndkb"
+run_dump_test "x86-64-pbndkb-intel"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index e21ad7ae005..c28522d4bcc 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -924,6 +924,7 @@ enum
   PREFIX_90 = 0,
   PREFIX_0F00_REG_6_X86_64,
   PREFIX_0F01_REG_0_MOD_3_RM_6,
+  PREFIX_0F01_REG_0_MOD_3_RM_7,
   PREFIX_0F01_REG_1_RM_2,
   PREFIX_0F01_REG_1_RM_4,
   PREFIX_0F01_REG_1_RM_5,
@@ -1198,6 +1199,7 @@ enum
   X86_64_0F01_REG_0,
   X86_64_0F01_REG_0_MOD_3_RM_6_P_1,
   X86_64_0F01_REG_0_MOD_3_RM_6_P_3,
+  X86_64_0F01_REG_0_MOD_3_RM_7_P_0,
   X86_64_0F01_REG_1,
   X86_64_0F01_REG_1_RM_2_PREFIX_1,
   X86_64_0F01_REG_1_RM_2_PREFIX_3,
@@ -2910,6 +2912,11 @@ static const struct dis386 prefix_table[][4] = {
     { X86_64_TABLE (X86_64_0F01_REG_0_MOD_3_RM_6_P_3) },
   },
 
+  /* PREFIX_0F01_REG_0_MOD_3_RM_7 */
+  {
+    { X86_64_TABLE (X86_64_0F01_REG_0_MOD_3_RM_7_P_0) },
+  },
+
   /* PREFIX_0F01_REG_1_RM_2 */
   {
     { "clac",		{ Skip_MODRM }, 0 },
@@ -4194,6 +4201,12 @@ static const struct dis386 x86_64_table[][2] = {
     { "rdmsrlist",	{ Skip_MODRM }, 0 },
   },
 
+  /* X86_64_0F01_REG_0_MOD_3_RM_7_P_0 */
+  {
+    { Bad_Opcode },
+    { "pbndkb",		{ Skip_MODRM }, 0 },
+  },
+
   /* X86_64_0F01_REG_1 */
   {
     { "sidt{Q|Q}", { M }, 0 },
@@ -8190,6 +8203,7 @@ static const struct dis386 rm_table[][8] = {
     { "vmxoff",		{ Skip_MODRM }, 0 },
     { "pconfig",	{ Skip_MODRM }, 0 },
     { PREFIX_TABLE (PREFIX_0F01_REG_0_MOD_3_RM_6) },
+    { PREFIX_TABLE (PREFIX_0F01_REG_0_MOD_3_RM_7) },
   },
   {
     /* RM_0F01_REG_1 */
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 24a9943da47..17f17d38d35 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -378,6 +378,7 @@ static bitfield cpu_flags[] =
   BITFIELD (SHA512),
   BITFIELD (SM3),
   BITFIELD (SM4),
+  BITFIELD (PBNDKB),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index fcbdb4c5f8d..62067dde5a6 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -241,6 +241,8 @@ enum
   CpuSM3,
   /* Intel SM4 Instructions support required.  */
   CpuSM4,
+  /* Intel PBNDKB Instructions support required.  */
+  CpuPBNDKB,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -442,6 +444,7 @@ typedef union i386_cpu_flags
       unsigned int cpusha512:1;
       unsigned int cpusm3:1;
       unsigned int cpusm4:1;
+      unsigned int cpupbndkb:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index eca60b467f2..6bd0f505118 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3397,3 +3397,9 @@ vsm4key4, 0xf3da, SM4, Modrm|Space0F38|Vex|VexVVVV|VexW0|CheckOperandSize|NoSuf,
 vsm4rnds4, 0xf2da, SM4, Modrm|Space0F38|Vex|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // SM4 instructions end.
+
+// PBNDKB instruction.
+
+pbndkb, 0x0f01c7, PBNDKB|x64, NoSuf, {}
+
+// PBNDKB instruction end.
-- 
2.31.1



* Re: [PATCH 1/5] Support Intel AVX-VNNI-INT16
  2023-07-13  6:32 ` [PATCH 1/5] Support Intel AVX-VNNI-INT16 Haochen Jiang
@ 2023-07-13  9:29   ` Jan Beulich
  2023-07-14  5:51     ` Jiang, Haochen
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2023-07-13  9:29 UTC (permalink / raw)
  To: Haochen Jiang, konglin1; +Cc: hjl.tools, amodra, binutils

On 13.07.2023 08:32, Haochen Jiang wrote:
> @@ -3909,7 +3913,21 @@ static const struct dis386 prefix_table[][4] = {
>      { "vbcstnebf162ps", { XM, Mw }, 0 },
>      { "vbcstnesh2ps", { XM, Mw }, 0 },
>    },
> - 
> +  
> +  /* PREFIX_VEX_0F38D2 */

This and ...

> +  {
> +    { "vpdpwuud",	{ XM, Vex, EXx }, 0 },
> +    { "vpdpwsud",	{ XM, Vex, EXx }, 0 },
> +    { "vpdpwusd",	{ XM, Vex, EXx }, 0 },
> +  },
> +
> +  /* PREFIX_VEX_0F38D3 */

... this comment want to mention the correct enumerator names.

> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -166,6 +166,8 @@ static const dependency isa_dependencies[] =
>      "AVX2" },
>    { "FRED",
>      "LKGS" },
> +  { "AVX_VNNI_INT16",
> +    "AVX2" },

Can this please be moved up ahead of FRED, perhaps immediately after
AVX_VNNI_INT8?

> @@ -366,6 +368,7 @@ static bitfield cpu_flags[] =
>    BITFIELD (RAO_INT),
>    BITFIELD (FRED),
>    BITFIELD (LKGS),
> +  BITFIELD (AVX_VNNI_INT16),

While not as relevant here, moving up would be nice in this case as well.

> --- a/opcodes/i386-opc.h
> +++ b/opcodes/i386-opc.h
> @@ -233,6 +233,8 @@ enum
>    CpuFRED,
>    /* lkgs instruction required */
>    CpuLKGS,
> +  /* Intel AVX VNNI-INT16 Instructions support required.  */
> +  CpuAVX_VNNI_INT16,
>    /* mwaitx instruction required */
>    CpuMWAITX,
>    /* Clzero instruction required */
> @@ -430,6 +432,7 @@ typedef union i386_cpu_flags
>        unsigned int cpurao_int:1;
>        unsigned int cpufred:1;
>        unsigned int cpulkgs:1;
> +      unsigned int cpuavx_vnni_int16:1;
>        unsigned int cpumwaitx:1;
>        unsigned int cpuclzero:1;
>        unsigned int cpuospke:1;

Adjustments to this file may then also be needed.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -3364,3 +3364,14 @@ erets, 0xf20f01ca, FRED|x64, NoSuf, {}
>  eretu, 0xf30f01ca, FRED|x64, NoSuf, {}
>  
>  // FRED instructions end.
> +
> +// AVX_VNNI_INT16 instructions.
> +
> +vpdpwuud, 0xd2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +vpdpwuuds, 0xd3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +vpdpwusd, 0x66d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +vpdpwusds, 0x66d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +vpdpwsud, 0xf3d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +vpdpwsuds, 0xf3d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +
> +// AVX_VNNI_INT16 instructions end.

While purely cosmetic here, I think it is a bad habit to always add to
the bottom of the file. Not only does this result in related entries
sometimes being far apart, but it also increases the risk of conflicts
between patches. Therefore I'd like to ask to put this next to
AVX-VNNI-INT8 as well (and note the dashes used there, which you will
want to use here as well).

Okay with all of these adjustments. In case you disagree with any of
the requests, please submit a v2.

Jan


* Re: [PATCH 2/5] Support Intel SHA512
  2023-07-13  6:33 ` [PATCH 2/5] Support Intel SHA512 Haochen Jiang
@ 2023-07-13 10:02   ` Jan Beulich
  2023-07-14  3:40     ` Jiang, Haochen
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2023-07-13 10:02 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, amodra, binutils

Up-front question on title and naming in the patch: Doc indeed says just
SHA512 (same for SM3 and SM4), but are you (including those who
assigned those names) sure that's going to stay this way by the time
this is merged into the SDM? Considering other ISA names, AVX-SHA512
would seem more consistent to me.

On 13.07.2023 08:33, Haochen Jiang wrote:
> In SHA512 patch, I have considered to eliminate the ModR/M table pass
> for vsha512msg1 and vsha512rnds2 since you just introduced OP_R with
> Uxmm.
> 
> However, xmm_mode in OP_R requires VEX128 or less. But unfortunately,
> for both instructions, they are VEX256. Therefore, I still keep the
> ModR/M table pass in the patch.

I guess I don't (fully) understand. Uxmm and xmm_mode aren't well suited
here anyway. What's wrong with introducing

#define Rxmmq { OP_R, xmmq_mode }

(or Uxmmq) and using it there, rejecting VEX.L==0 just like VEX.L==1 is
rejected for xmm_mode?
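
A rough sketch of the check this would imply, written as a standalone toy
model and not as the actual OP_R code. The enum and function names below
are made up for illustration; only the rule itself (xmm_mode rejects
VEX.L==1, the new mode would reject VEX.L==0) is taken from the text above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the disassembler's operand modes.  */
enum toy_mode { TOY_XMM_MODE, TOY_XMMQ_MODE };

/* Return true if a register operand of the given mode is acceptable
   for the given VEX.L value (0 = 128-bit, 1 = 256-bit).  */
static bool
toy_vex_length_ok (enum toy_mode mode, unsigned int vex_l)
{
  switch (mode)
    {
    case TOY_XMM_MODE:
      /* Behaviour as described above: VEX.L==1 is rejected.  */
      return vex_l == 0;
    case TOY_XMMQ_MODE:
      /* Proposed counterpart: VEX.L==0 is rejected instead.  */
      return vex_l == 1;
    }
  return false;
}
```

With such a check the 256-bit-only forms would presumably no longer need a
separate table pass just to reject VEX.L==0.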

> --- a/gas/testsuite/gas/i386/i386.exp
> +++ b/gas/testsuite/gas/i386/i386.exp
> @@ -498,6 +498,8 @@ if [gas_32_check] then {
>      run_list_test "amx-complex-inval"
>      run_dump_test "avx-vnni-int16"
>      run_dump_test "avx-vnni-int16-intel"
> +    run_dump_test "sha512"
> +    run_dump_test "sha512-intel"

Perhaps worth having further tests proving that both assembler and
disassembler correctly deal with (invalid) memory operands / encodings?
(The disassembler part may not need to be a separate test; I think we
already have one which could be extended: disassem.[sd] and its 64-bit
counterpart.)

> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -168,6 +168,8 @@ static const dependency isa_dependencies[] =
>      "LKGS" },
>    { "AVX_VNNI_INT16",
>      "AVX2" },
> +  { "SHA512",
> +    "AVX" },

Like for the earlier patch this wants to move up a little. I also
question that it's AVX that's the baseline feature here. While correct
for SM3, I expect it needs to be AVX2 both here and for SM4, since AVX
offers no real 256-bit integer operations. (Obviously this wants
taking care of in the doc as well.)
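
To illustrate what the different baseline means in practice, here is a toy
model of a dependency table and its transitive lookup. This is a sketch
under assumptions, not the i386-gen.c implementation: the struct, the helper
names and the AVX2 -> AVX row are mine, and the SHA512 and SM4 rows show the
AVX2 baseline suggested above, not what the patch as posted contains.

```c
#include <stddef.h>
#include <string.h>

/* Toy name -> direct prerequisite table (hypothetical layout).  */
struct toy_dependency
{
  const char *name;
  const char *needs;
};

static const struct toy_dependency toy_deps[] =
{
  { "AVX2", "AVX" },		/* Assumed row, for the chain below.  */
  { "AVX_VNNI_INT16", "AVX2" },
  { "SHA512", "AVX2" },		/* Suggested baseline, not the posted one.  */
  { "SM3", "AVX" },
  { "SM4", "AVX2" },		/* Suggested baseline, not the posted one.  */
};

/* Return the direct prerequisite of NAME, or NULL if none is listed.  */
static const char *
toy_direct_dep (const char *name)
{
  for (size_t i = 0; i < sizeof toy_deps / sizeof toy_deps[0]; i++)
    if (strcmp (toy_deps[i].name, name) == 0)
      return toy_deps[i].needs;
  return NULL;
}

/* Return nonzero if enabling NAME implies FEATURE, following the chain.  */
static int
toy_implies (const char *name, const char *feature)
{
  for (const char *p = name; p != NULL; p = toy_direct_dep (p))
    if (strcmp (p, feature) == 0)
      return 1;
  return 0;
}
```

In this model SM4 still implies AVX through AVX2, while SM3 does not pull in
AVX2.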

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -3375,3 +3375,11 @@ vpdpwsud, 0xf3d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperand
>  vpdpwsuds, 0xf3d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
>  
>  // AVX_VNNI_INT16 instructions end.
> +
> +// SHA512 instructions.
> +
> +vsha512rnds2, 0xf2cb, SHA512, Vex256|Space0F38|Modrm|VexVVVV|VexW0|NoSuf, { RegXMM, RegYMM, RegYMM }
> +vsha512msg1, 0xf2cc, SHA512, Vex256|Space0F38|Modrm|VexW0|NoSuf, { RegXMM, RegYMM }
> +vsha512msg2, 0xf2cd, SHA512, Vex256|Space0F38|Modrm|VexW0|NoSuf, { RegYMM, RegYMM }

Can we please stick to Modrm coming first?

Jan


* Re: [PATCH 3/5] Support Intel SM3
  2023-07-13  6:33 ` [PATCH 3/5] Support Intel SM3 Haochen Jiang
@ 2023-07-13 10:20   ` Jan Beulich
  2023-07-18  8:09     ` [PATCH v2] " Haochen Jiang
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2023-07-13 10:20 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, amodra, binutils

On 13.07.2023 08:33, Haochen Jiang wrote:
> gas/ChangeLog:
> 
> 	* NEWS: Support Intel SM3.
> 	* config/tc-i386.c: Add sm3.
> 	* doc/c-i386.texi: Document .sm3 and nosm3.
> 	* testsuite/gas/i386/i386.exp: Run sm3 tests.
> 	* testsuite/gas/i386/x86-64.exp: Ditto.
> 	* testsuite/gas/i386/sm3-intel.d: New test.
> 	* testsuite/gas/i386/sm3.d: Ditto.
> 	* testsuite/gas/i386/sm3.s: Ditto.
> 	* testsuite/gas/i386/x86-64-sm3-intel.d: Ditto.
> 	* testsuite/gas/i386/x86-64-sm3.d: Ditto.
> 	* testsuite/gas/i386/x86-64-sm3.s: Ditto.
> 
> opcodes/ChangeLog:
> 
> 	* i386-dis.c (PREFIX_VEX_0F38DA_W_0_L_0): New.
> 	(VEX_LEN_0F38DA_W_0): Ditto.
> 	(VEX_LEN_0F3ADE_W_0): Ditto.
> 	(VEX_W_0F38DA): Ditto.
> 	(VEX_W_0F3ADE): Ditto.
> 	(prefix_table): Add PREFIX_VEX_0F38DA_W_0_L_0.
> 	(vex_len_table): Add VEX_LEN_0F38DA_W_0, VEX_LEN_0F3ADE_W_0.
> 	(vex_w_table): Add VEX_W_0F38DA, VEX_W_0F3ADE.
> 	* i386-gen.c (isa_dependencies): Add SM3.
> 	(cpu_flags): Ditto.
> 	* i386-init.h: Regenerated.
> 	* i386-mnem.h: Ditto.
> 	* i386-opc.h (CpuSM3): New.
> 	(i386_cpu_flags): Add cpusm3.
> 	* i386-opc.tbl: Add SM3 instructions.
> 	* i386-tbl.h: Regenerated.

Looks okay to me, assuming it'll follow suit in moving some of the
entries that you're adding, once re-based on top of the adjusted
earlier patches.

Jan


* Re: [PATCH 4/5] Support Intel SM4
  2023-07-13  6:33 ` [PATCH 4/5] Support Intel SM4 Haochen Jiang
@ 2023-07-13 10:25   ` Jan Beulich
  2023-07-18  7:21     ` Jiang, Haochen
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2023-07-13 10:25 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, amodra, binutils

On 13.07.2023 08:33, Haochen Jiang wrote:
> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -1070,7 +1070,7 @@ enum
>    PREFIX_VEX_0F38CB,
>    PREFIX_VEX_0F38CC,
>    PREFIX_VEX_0F38CD,
> -  PREFIX_VEX_0F38DA_W_0_L_0,
> +  PREFIX_VEX_0F38DA_W_0,
>    PREFIX_VEX_0F38F5_L_0,
>    PREFIX_VEX_0F38F6_L_0,
>    PREFIX_VEX_0F38F7_L_0,
> @@ -1316,7 +1316,8 @@ enum
>    VEX_LEN_0F38CB_P_3_W_0,
>    VEX_LEN_0F38CC_P_3_W_0,
>    VEX_LEN_0F38CD_P_3_W_0,
> -  VEX_LEN_0F38DA_W_0,
> +  VEX_LEN_0F38DA_W_0_P_0,
> +  VEX_LEN_0F38DA_W_0_P_2,
>    VEX_LEN_0F38DB,
>    VEX_LEN_0F38F2,
>    VEX_LEN_0F38F3,
> @@ -3969,11 +3970,12 @@ static const struct dis386 prefix_table[][4] = {
>      { VEX_W_TABLE (VEX_W_0F38CD_P_3) },
>    },
>  
> -  /* PREFIX_VEX_0F38DA_W_0_L_0 */
> +  /* PREFIX_VEX_0F38DA_W_0 */
>    {
> -    { "vsm3msg1", { XM, Vex, EXxmm }, 0 },
> -    { Bad_Opcode },
> -    { "vsm3msg2", { XM, Vex, EXxmm }, 0 },
> +    { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0_P_0) },
> +    { "vsm4key4", { XM, Vex, EXx }, 0 },
> +    { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0_P_2) },
> +    { "vsm4rnds4", { XM, Vex, EXx }, 0 },
>    },
>  
>    /* PREFIX_VEX_0F38F5_L_0 */
> @@ -7010,9 +7012,14 @@ static const struct dis386 vex_len_table[][2] = {
>      { "vsha512msg2", { XM, Uymm }, 0 },
>    },
>  
> -  /* VEX_LEN_0F38DA_W_0 */
> +  /* VEX_LEN_0F38DA_W_0_P_0 */
> +  {
> +    { "vsm3msg1", { XM, Vex, EXxmm }, 0 },
> +  },
> +
> +  /* VEX_LEN_0F38DA_W_0_P_2 */
>    {
> -    { PREFIX_TABLE (PREFIX_VEX_0F38DA_W_0_L_0) },
> +    { "vsm3msg2", { XM, Vex, EXxmm }, 0 },
>    },
>  
>    /* VEX_LEN_0F38DB */
> @@ -7716,7 +7723,7 @@ static const struct dis386 vex_w_table[][2] = {
>    },
>    {
>      /* VEX_W_0F38DA */
> -    { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0) },
> +    { PREFIX_TABLE (PREFIX_VEX_0F38DA_W_0) },
>    },
>    {
>      /* VEX_W_0F3A00_L_1 */

I think it would be nice if this patch didn't need to re-do what the
immediately preceding patch does. Can that earlier patch be adjusted
to the final intended decode order?
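
For illustration only, the decode order the hunks above end up with
(VEX.W first, then the mandatory prefix, then VEX.L) can be modelled as
below. This is a simplified sketch, not the real dis386 tables: the
names decode_0f38da, prefix_slot, len_p0 and len_p2 are hypothetical,
and treating the W=1 slot as invalid is an assumption.

```c
#include <stddef.h>

/* Hypothetical, simplified model of the nested lookup for VEX map 0F38,
   opcode 0xDA.  These are NOT the real dis386 tables.  */

/* Prefix-table slots, in the order the tables in the patch list them.  */
enum prefix_slot { P_NONE = 0, P_F3 = 1, P_66 = 2, P_F2 = 3 };

/* Innermost level: entries that still depend on VEX.L.  */
static const char *const len_p0[2] = { "vsm3msg1", NULL };  /* L=0 only */
static const char *const len_p2[2] = { "vsm3msg2", NULL };  /* L=0 only */

/* Walk W -> prefix -> L and return the mnemonic, or "(bad)".  */
const char *
decode_0f38da (unsigned int vex_w, enum prefix_slot prefix,
               unsigned int vex_l)
{
  const char *name = NULL;

  /* Assumption: the W=1 slot is treated as an invalid encoding.  */
  if (vex_w != 0 || vex_l > 1)
    return "(bad)";

  switch (prefix)
    {
    case P_NONE: name = len_p0[vex_l]; break;
    case P_F3:   name = "vsm4key4";    break;  /* L=0 and L=1 */
    case P_66:   name = len_p2[vex_l]; break;
    case P_F2:   name = "vsm4rnds4";   break;  /* L=0 and L=1 */
    }

  return name != NULL ? name : "(bad)";
}
```

With this order only the two SM3 slots need a VEX.L lookup, while the
SM4 entries are reached directly from the prefix level.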

Some of the comments given for earlier patches also apply here, ftaod.

Jan


* Re: [PATCH 5/5] Support Intel PBNDKB
  2023-07-13  6:33 ` [PATCH 5/5] Support Intel PBNDKB Haochen Jiang
@ 2023-07-13 10:29   ` Jan Beulich
  2023-07-14  7:15     ` Jiang, Haochen
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2023-07-13 10:29 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, amodra, Hu, Lin1, binutils

On 13.07.2023 08:33, Haochen Jiang wrote:
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -3397,3 +3397,9 @@ vsm4key4, 0xf3da, SM4, Modrm|Space0F38|Vex|VexVVVV|VexW0|CheckOperandSize|NoSuf,
>  vsm4rnds4, 0xf2da, SM4, Modrm|Space0F38|Vex|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
>  
>  // SM4 instructions end.
> +
> +// PBNDKB instruction.
> +
> +pbndkb, 0x0f01c7, PBNDKB|x64, NoSuf, {}
> +
> +// PBNDKB instruction end.

Aiui this is a sibling to PCONFIG. Can this addition please be put next
to it? Entries in i386-gen.c and i386-opc.h may also want to move in
the same spirit, but there I'm less fussed.

Jan


* RE: [PATCH 2/5] Support Intel SHA512
  2023-07-13 10:02   ` Jan Beulich
@ 2023-07-14  3:40     ` Jiang, Haochen
  2023-07-14  7:12       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Jiang, Haochen @ 2023-07-14  3:40 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, binutils

> Up-front question on title and naming in the patch: Doc indeed says just
> SHA512 (same for SM3 and SM4), but are you (including those who assigned
> those names) sure that's going to stay this way by the time this is merged
> into the SDM? Considering other ISA names, AVX-SHA512 would seem more
> consistent to me.

SHA512 is not an ISA under the AVX set, so AVX-SHA512 is not used.

What the SDM/ISE actually means is that both the AVX and SHA512 feature
bits need to be checked before using the instructions.

I could drop the implication in the implementation and check both ISA bits
instead. But since the instructions use xmm/ymm registers, the current
implementation has SHA512 imply AVX for convenience.

Whether it should be AVX or AVX2 is discussed below.

> On 13.07.2023 08:33, Haochen Jiang wrote:
> > In SHA512 patch, I have considered to eliminate the ModR/M table pass
> > for vsha512msg1 and vsha512rnds2 since you just introduced OP_R with
> > Uxmm.
> >
> > However, xmm_mode in OP_R requires VEX128 or less. But unfortunately,
> > for both instructions, they are VEX256. Therefore, I still keep the
> > ModR/M table pass in the patch.
> 
> I guess I don't (fully) understand. Uxmm and xmm_mode aren't well suited
> here anyway. What's wrong with introducing
> 
> #define Rxmmq { OP_R, xmmq_mode }
> 
> (or Uxmmq) and using it there, rejecting VEX.L==0 just like VEX.L==1 is
> rejected for xmm_mode?

Since xmm_mode and xmmq_mode behave the same under VEX.L==1, it could
be used here. I will change to that.
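
To make the proposal concrete, here is a rough sketch of an operand
handler that rejects VEX.L==1 for xmm_mode and VEX.L==0 for xmmq_mode.
It is not the real OP_R from i386-dis.c: format_r_operand and the
operand_mode enum are hypothetical names, and the sketch assumes that
xmmq_mode names an XMM register of a 256-bit instruction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical operand modes; the names mirror the thread, the logic is
   a simplified illustration and not the real OP_R.  */
enum operand_mode { XMM_MODE, XMMQ_MODE };

/* Write the register name into BUF and return true, or return false
   when the VEX.L value is not acceptable for MODE.  */
bool
format_r_operand (enum operand_mode mode, unsigned int vex_l,
                  unsigned int rm, char *buf, size_t len)
{
  switch (mode)
    {
    case XMM_MODE:
      /* 128-bit form only: VEX.L==1 is rejected.  */
      if (vex_l != 0)
        return false;
      break;
    case XMMQ_MODE:
      /* Assumed: XMM source of a 256-bit insn, so VEX.L==0 is rejected.  */
      if (vex_l != 1)
        return false;
      break;
    default:
      return false;
    }

  snprintf (buf, len, "%%xmm%u", rm & 7);
  return true;
}
```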

> > --- a/gas/testsuite/gas/i386/i386.exp
> > +++ b/gas/testsuite/gas/i386/i386.exp
> > @@ -498,6 +498,8 @@ if [gas_32_check] then {
> >      run_list_test "amx-complex-inval"
> >      run_dump_test "avx-vnni-int16"
> >      run_dump_test "avx-vnni-int16-intel"
> > +    run_dump_test "sha512"
> > +    run_dump_test "sha512-intel"
> 
> Perhaps worth having further tests proving that both assembler and
> disassembler correctly deal with (invalid) memory operands / encodings?
> (The disassembler part may not need to be a separate test; I think we
> already have one which could be extended: disassem.[sd] and its 64-bit
> counterpart.)

I will try to add that in the next version.

> > --- a/opcodes/i386-gen.c
> > +++ b/opcodes/i386-gen.c
> > @@ -168,6 +168,8 @@ static const dependency isa_dependencies[] =
> >      "LKGS" },
> >    { "AVX_VNNI_INT16",
> >      "AVX2" },
> > +  { "SHA512",
> > +    "AVX" },
> 
> Like for the earlier patch this wants to move up a little. I also question that it's
> AVX that's the baseline feature here. While correct for SM3, I expect it needs
> to be AVX2 both here and for SM4, for AVX offering no real 256-bit integer
> operations. (Obviously this wants taking care of in the doc as well.)

You have a point here.

I will check with the design and HW teams whether this is a misuse, since it is
actually AVX2 that introduces the 256-bit integer operations.

One reason I can think of for requiring only AVX is that SHA512 and SM4 do not
actually need other integer operations alongside them. They only need VMOV, which
is introduced by AVX. So when hardware checks XSTATE, AVX is enough.

Thx,
Haochen


* RE: [PATCH 1/5] Support Intel AVX-VNNI-INT16
  2023-07-13  9:29   ` Jan Beulich
@ 2023-07-14  5:51     ` Jiang, Haochen
  0 siblings, 0 replies; 31+ messages in thread
From: Jiang, Haochen @ 2023-07-14  5:51 UTC (permalink / raw)
  To: Beulich, Jan, Kong, Lingling; +Cc: hjl.tools, amodra, binutils

>
> Okay with all of these adjustments. In case you disagree with any of the
> requests, please submit a v2.

I have changed the patch according to the requests; they are quite reasonable.

If there are no other comments, I will commit this patch next Tuesday.

Thx,
Haochen

> 
> Jan


* Re: [PATCH 2/5] Support Intel SHA512
  2023-07-14  3:40     ` Jiang, Haochen
@ 2023-07-14  7:12       ` Jan Beulich
  2023-07-18  7:20         ` Jiang, Haochen
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2023-07-14  7:12 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: hjl.tools, binutils

On 14.07.2023 05:40, Jiang, Haochen wrote:
>> Up-front question on title and naming in the patch: Doc indeed says just
>> SHA512 (same for SM3 and SM4), but are you (including those who assigned
>> those names) sure that's going to stay this way by the time this is merged
>> into the SDM? Considering other ISA names, AVX-SHA512 would seem more
>> consistent to me.
> 
> SHA512 is not an ISA under AVX set.

I'm afraid I don't understand. How is it not? It uses YMM registers.
And conceivably there could be EVEX encodings of these (allowing the
full 32 register set to be used), which I'd then call AVX512-SHA512.

It's also not possible to potentially express the same thing in
legacy encodings (unlike e.g. GFNI). Even for SM3, where only 128-
bit operations are used, that's not possible, as the insns have 3
inputs (the destination is r/w).

> So AVX-SHA512 is not used.
> 
> The actual meaning in SDM/ISE is that we need to check both AVX and SHA512
> feature bit to use the instruction.
> 
> I could drop the imply in implementation and change to checking both ISA bit
> set. But since it will use xmm/ymm register, in current implementation, we
> choose to imply AVX for SHA512 for convenience.
> 
> Whether it should be AVX/AVX2 will be mentioned below.
>[...]
>>> --- a/opcodes/i386-gen.c
>>> +++ b/opcodes/i386-gen.c
>>> @@ -168,6 +168,8 @@ static const dependency isa_dependencies[] =
>>>      "LKGS" },
>>>    { "AVX_VNNI_INT16",
>>>      "AVX2" },
>>> +  { "SHA512",
>>> +    "AVX" },
>>
>> Like for the earlier patch this wants to move up a little. I also question that it's
>> AVX that's the baseline feature here. While correct for SM3, I expect it needs
>> to be AVX2 both here and for SM4, for AVX offering no real 256-bit integer
>> operations. (Obviously this wants taking care of in the doc as well.)
> 
> You got a point here.
> 
> I will check with the design and HW team since it is actually AVX2 introduces the
> 256-bit integer operations to see if this is a misuse.
> 
> One reason I can think of using AVX only is that SHA512 and SM4 actually do not
> need other integer operations to help with. It only needs VMOV, which is introduced
> by AVX. So when hardware checking XSTATE, AVX is enough.

So for a feature check requirement referencing just AVX may be okay. But
there's not going to be any SHA512 without AVX anyway, for there not
being any YMM registers without AVX; you wouldn't be able to fill the
register operands. Hence the extra feature check is redundant (and would
hence better be omitted).

As to implying baseline functionality, using AVX (rather than AVX2) makes
little sense, so even if the feature check remained (note that various
other extensions, including e.g. AVX-VNNI-INT<n>, don't have such a
secondary requirement), I'd still be fairly insistent on having the
base feature named here (and for SM4) be AVX2 (to be in line with other
similar baseline selections).
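
As a rough illustration of what "implying" a baseline means, the sketch
below follows a chain of dependencies until it reaches the requested
base feature. It is a simplification and not the code in i386-gen.c:
the single-parent table, the deps array and isa_implies are
hypothetical, and the SHA512/SM4 rows encode the AVX2 baseline argued
for above rather than what the submitted patch contains.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-parent dependency table.  The real
   isa_dependencies[] in i386-gen.c is richer than this.  */
struct dependency
{
  const char *isa;
  const char *implies;
};

static const struct dependency deps[] =
{
  { "SHA512", "AVX2" },  /* baseline argued for in this thread */
  { "SM4",    "AVX2" },  /* likewise */
  { "SM3",    "AVX"  },  /* 128-bit only, so AVX suffices */
  { "AVX2",   "AVX"  },
};

/* Return true if enabling ISA also enables BASE, directly or through a
   chain of dependencies.  The table is acyclic, so the walk ends.  */
bool
isa_implies (const char *isa, const char *base)
{
  while (isa != NULL)
    {
      const char *next = NULL;

      if (strcmp (isa, base) == 0)
        return true;

      for (size_t i = 0; i < sizeof deps / sizeof deps[0]; i++)
        if (strcmp (deps[i].isa, isa) == 0)
          {
            next = deps[i].implies;
            break;
          }

      isa = next;
    }

  return false;
}
```

In this model naming AVX2 as the base still reaches AVX through the
chain, so nothing that relied on AVX being implied is lost.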

Jan


* RE: [PATCH 5/5] Support Intel PBNDKB
  2023-07-13 10:29   ` Jan Beulich
@ 2023-07-14  7:15     ` Jiang, Haochen
  0 siblings, 0 replies; 31+ messages in thread
From: Jiang, Haochen @ 2023-07-14  7:15 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, amodra, Hu, Lin1, binutils

> Aiui this is a sibling to PCONFIG. Can this addition please be put next to it?
> Entries in i386-gen.c and i386-opc.h may also want to move in the same spirit,
> but there I'm less fussed.

I have moved them next to PCONFIG. If there are no other comments, I will commit
this patch next Tuesday since it has almost no dependency on the previous patches.

Thx,
Haochen

> 
> Jan


* RE: [PATCH 2/5] Support Intel SHA512
  2023-07-14  7:12       ` Jan Beulich
@ 2023-07-18  7:20         ` Jiang, Haochen
  2023-07-18  7:54           ` [PATCH v2] " Haochen Jiang
  2023-07-18  8:11           ` [PATCH 2/5] " Jan Beulich
  0 siblings, 2 replies; 31+ messages in thread
From: Jiang, Haochen @ 2023-07-18  7:20 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, binutils

> I'm afraid I don't understand. How is it not? It uses YMM registers.
> And conceivably there could be EVEX encodings of these (allowing the
> full 32 register set to be used), which I'd then call AVX512-SHA512.
> 
> It's also not possible to potentially express the same thing in
> legacy encodings (unlike e.g. GFNI). Even for SM3, where only 128-
> bit operations are used, that's not possible, as the insns have 3
> inputs (the destination is r/w).

I actually meant that it is the same kind of thing as GFNI, although it does
not have a legacy encoding.

Actually, we want to show the evolution from the previous SHA. I will
move their entries to just after SHA since they are both crypto-related
ISAs.

> [...]
> So for a feature check requirement referencing just AVX may be okay. But
> there's not going to be any SHA512 without AVX anyway, for there not
> being any YMM registers without AVX; you wouldn't be able to fill the
> register operands. Hence the extra feature check is redundant (and would
> hence better be omitted).
> 
> As to implying baseline functionality, using AVX (rather than AVX2) makes
> little sense, so even if the feature check remained (note that various
> other extensions, including e.g. AVX-VNNI-INT<n>, don't have such a
> secondary requirement), I'd still be fairly insistent on having the
> base feature named here (and for SM4) be AVX2 (to be in line with other
> similar baseline selections).

I confirmed that AVX in the doc here refers to the state of the whole AVX ISA,
which includes both AVX and AVX2.

I will change what SHA512 and SM4 imply to AVX2 since that looks much more
reasonable.

Should we also change what SM3 implies?

Thx,
Haochen

> 
> Jan


* RE: [PATCH 4/5] Support Intel SM4
  2023-07-13 10:25   ` Jan Beulich
@ 2023-07-18  7:21     ` Jiang, Haochen
  2023-07-18  8:13       ` [PATCH v2] " Haochen Jiang
  0 siblings, 1 reply; 31+ messages in thread
From: Jiang, Haochen @ 2023-07-18  7:21 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, amodra, binutils

> I think it would be nice if this patch didn't need to re-do what the
> immediately preceding patch does. Can that earlier patch be adjusted
> to the final intended decode order?

I will change that in the coming v2 patch.

Haochen

> 
> Some of the comments given for earlier patches also apply here, ftaod.
> 
> Jan


* [PATCH v2] Support Intel SHA512
  2023-07-18  7:20         ` Jiang, Haochen
@ 2023-07-18  7:54           ` Haochen Jiang
  2023-07-18  7:59             ` Jiang, Haochen
  2023-07-18  8:51             ` Jan Beulich
  2023-07-18  8:11           ` [PATCH 2/5] " Jan Beulich
  1 sibling, 2 replies; 31+ messages in thread
From: Haochen Jiang @ 2023-07-18  7:54 UTC (permalink / raw)
  To: binutils, jbeulich; +Cc: hjl.tools

Hi all,

This is the v2 patch for SHA512, with the following changes compared to
the initial patch:

1. Added invalid-operand tests in disassem.[ds] and [x86-64-]sha512-inval.[ls].

2. Changed what SHA512 implies from AVX to AVX2.

3. Moved the SHA512 entry next to SHA. Put Modrm first in the table entries.

4. Used Rxmmq instead of going through the mod table. Also renamed Uymm to Rymm.
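
As a reading aid for the byte sequences in the tests below (for example
c4 e2 7f cc f5), here is a minimal sketch that splits a 3-byte VEX
prefix and a ModRM byte into their fields. It is a standalone
illustration and not code from the patch: struct vex3, decode_vex3 and
the modrm_* helpers are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 3-byte (0xC4) VEX prefix.  */
struct vex3
{
  unsigned int map;   /* mmmmm: 1 = 0F, 2 = 0F38, 3 = 0F3A */
  unsigned int w;     /* VEX.W */
  unsigned int vvvv;  /* extra source register, already un-inverted */
  unsigned int l;     /* VEX.L: 0 = 128-bit, 1 = 256-bit */
  unsigned int pp;    /* 0 = none, 1 = 66, 2 = F3, 3 = F2 */
};

/* Decode the three prefix bytes at BYTES; return false if the first
   byte is not the 3-byte VEX escape.  */
bool
decode_vex3 (const uint8_t *bytes, struct vex3 *out)
{
  if (bytes[0] != 0xc4)
    return false;

  out->map = bytes[1] & 0x1f;
  out->w = (bytes[2] >> 7) & 1;
  out->vvvv = ((bytes[2] >> 3) & 0xf) ^ 0xf;  /* stored inverted */
  out->l = (bytes[2] >> 2) & 1;
  out->pp = bytes[2] & 3;
  return true;
}

/* ModRM.mod == 3 selects a register operand; anything else is memory,
   which the SHA512 insns do not accept.  */
bool
modrm_is_register (uint8_t modrm)
{
  return (modrm >> 6) == 3;
}

unsigned int
modrm_reg (uint8_t modrm)
{
  return (modrm >> 3) & 7;
}

unsigned int
modrm_rm (uint8_t modrm)
{
  return modrm & 7;
}
```

Applied to c4 e2 57 cb f4 this gives map 0F38, W=0, L=1, pp=F2,
vvvv=5, and a register-form ModRM with reg=6 and rm=4, which lines up
with "vsha512rnds2 %xmm4,%ymm5,%ymm6". The ModRM byte 09 used in the
disassem.s additions has mod=0, i.e. a memory operand.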

Thx,
Haochen

gas/ChangeLog:

	* NEWS: Support Intel SHA512.
	* config/tc-i386.c: Add sha512.
	* doc/c-i386.texi: Document .sha512.
	* testsuite/gas/i386/disassem.d: Add SHA512 tests.
	* testsuite/gas/i386/disassem.s: Ditto.
	* testsuite/gas/i386/i386.exp: Run SHA512 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/sha512-intel.d: New test.
	* testsuite/gas/i386/sha512-inval.l: Ditto.
	* testsuite/gas/i386/sha512-inval.s: Ditto.
	* testsuite/gas/i386/sha512.d: Ditto.
	* testsuite/gas/i386/sha512.s: Ditto.
	* testsuite/gas/i386/x86-64-sha512-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-sha512-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-sha512-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-sha512.d: Ditto.
	* testsuite/gas/i386/x86-64-sha512.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (Rxmmq): New.
	(Rymm): Ditto.
	(MOD_VEX_0F38CB_P_3_W_0_L_1): Ditto.
	(MOD_VEX_0F38CC_P_3_W_0_L_1): Ditto.
	(PREFIX_VEX_0F38CB): Ditto.
	(PREFIX_VEX_0F38CC): Ditto.
	(PREFIX_VEX_0F38CD): Ditto.
	(VEX_LEN_0F38CB_P_3_W_0): Ditto.
	(VEX_LEN_0F38CC_P_3_W_0): Ditto.
	(VEX_LEN_0F38CD_P_3_W_0): Ditto.
	(VEX_W_0F38CB_P_3): Ditto.
	(VEX_W_0F38CC_P_3): Ditto.
	(VEX_W_0F38CD_P_3): Ditto.
	(mod_table): Add MOD_VEX_0F38CB_P_3_W_0_L_1, MOD_VEX_0F38CC_P_3_W_0_L_1.
	(prefix_table): Add PREFIX_VEX_0F38CB, PREFIX_VEX_0F38CC,
	PREFIX_VEX_0F38CD.
	(vex_len_table): Add VEX_LEN_0F38CB_P_3_W_0,
	VEX_LEN_0F38CC_P_3_W_0, VEX_LEN_0F38CD_P_3_W_0.
	(vex_w_table): Add VEX_W_0F38CB_P_3, VEX_W_0F38CC_P_3, VEX_W_0F38CD_P_3.
	* i386-gen.c (isa_dependencies): Add SHA512.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuSHA512): New.
	(i386_cpu_flags): Add cpusha512.
	* i386-opc.tbl: Add SHA512 instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                     |    2 +
 gas/config/tc-i386.c                         |    1 +
 gas/doc/c-i386.texi                          |    3 +-
 gas/testsuite/gas/i386/disassem.d            |    6 +
 gas/testsuite/gas/i386/disassem.s            |    6 +
 gas/testsuite/gas/i386/i386.exp              |    3 +
 gas/testsuite/gas/i386/sha512-intel.d        |   16 +
 gas/testsuite/gas/i386/sha512-inval.l        |    4 +
 gas/testsuite/gas/i386/sha512-inval.s        |    8 +
 gas/testsuite/gas/i386/sha512.d              |   16 +
 gas/testsuite/gas/i386/sha512.s              |   13 +
 gas/testsuite/gas/i386/x86-64-sha512-intel.d |   16 +
 gas/testsuite/gas/i386/x86-64-sha512-inval.l |    4 +
 gas/testsuite/gas/i386/x86-64-sha512-inval.s |    8 +
 gas/testsuite/gas/i386/x86-64-sha512.d       |   16 +
 gas/testsuite/gas/i386/x86-64-sha512.s       |   13 +
 gas/testsuite/gas/i386/x86-64.exp            |    3 +
 opcodes/i386-dis.c                           |   79 +-
 opcodes/i386-gen.c                           |    3 +
 opcodes/i386-init.h                          |  776 +-
 opcodes/i386-mnem.h                          | 3949 ++++----
 opcodes/i386-opc.h                           |    3 +
 opcodes/i386-opc.tbl                         |    8 +
 opcodes/i386-tbl.h                           | 9447 +++++++++---------
 24 files changed, 7351 insertions(+), 7052 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/sha512-intel.d
 create mode 100644 gas/testsuite/gas/i386/sha512-inval.l
 create mode 100644 gas/testsuite/gas/i386/sha512-inval.s
 create mode 100644 gas/testsuite/gas/i386/sha512.d
 create mode 100644 gas/testsuite/gas/i386/sha512.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512.s

diff --git a/gas/NEWS b/gas/NEWS
index 5e9ed5ab4bc..fe2c055fa7f 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel SHA512 instructions.
+
 * Add support for Intel AVX-VNNI-INT16 instructions.
 
 Changes in 2.41:
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 0d3d7560efe..836640d9123 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1152,6 +1152,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (fred, FRED, ANY_FRED, false),
   SUBARCH (lkgs, LKGS, ANY_LKGS, false),
   SUBARCH (avx_vnni_int16, AVX_VNNI_INT16, ANY_AVX_VNNI_INT16, false),
+  SUBARCH (sha512, SHA512, ANY_SHA512, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 40ba942d9cb..21fb71e54ab 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -208,6 +208,7 @@ accept various extension mnemonics.  For example,
 @code{fred},
 @code{lkgs},
 @code{avx_vnni_int16},
+@code{sha512},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1637,7 +1638,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.prefetchi} @tab @samp{.avx_ifma} @tab @samp{.avx_vnni_int8}
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
-@item @samp{.avx_vnni_int16}
+@item @samp{.avx_vnni_int16} @tab @samp{.sha512}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/disassem.d b/gas/testsuite/gas/i386/disassem.d
index 8ee0a664e0b..f528d8ab169 100644
--- a/gas/testsuite/gas/i386/disassem.d
+++ b/gas/testsuite/gas/i386/disassem.d
@@ -345,6 +345,12 @@ Disassembly of section \.text:
 [ 	]*[a-f0-9]+:[ 	]*c4 e2 01 1c[ 	]*\(bad\)
 [ 	]*[a-f0-9]+:[ 	]*41[ 	]*inc[ 	]*%ecx
 [ 	]*[a-f0-9]+:[ 	]*37[ 	]*aaa
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7f cc[ 	]+vsha512msg1[ 	]*\(bad\),.*
+[ 	]*[a-f0-9]+:[ 	]*09 90 90 90 90 90[ 	]+or.*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7f cd[ 	]+vsha512msg2[ 	]*\(bad\),.*
+[ 	]*[a-f0-9]+:[ 	]*09 90 90 90 90 90[ 	]+or.*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 6f cb[ 	]+vsha512rnds2[ 	]*\(bad\),.*
+[ 	]*[a-f0-9]+:[ 	]*09 90 90 90 90 90[ 	]+or.*
 [ 	]*[a-f0-9]+:[ 	]*62 f2 ad 08 1c[ 	]*\(bad\)
 [ 	]*[a-f0-9]+:[ 	]*01 01[ 	]*add[ 	]*%eax,\(%ecx\)
 [ 	]*[a-f0-9]+:[ 	]*62 f3 7d 28 1b[ 	]*\(bad\)
diff --git a/gas/testsuite/gas/i386/disassem.s b/gas/testsuite/gas/i386/disassem.s
index c74a9353933..eeeb38974dd 100644
--- a/gas/testsuite/gas/i386/disassem.s
+++ b/gas/testsuite/gas/i386/disassem.s
@@ -168,6 +168,12 @@
 .byte 0xC4, 0xE1, 0xF9, 0x93, 0x6F
 	.insn VEX.L0.66.0f.W1 0x93, (%edi), %k7
 .byte 0xc4, 0xe2, 0x1, 0x1c, 0x41, 0x37
+	.insn VEX.L1.F2.0f38.W0 0xCC, (%ecx), %ymm1
+.fill 0x5, 0x1, 0x90
+	.insn VEX.L1.F2.0f38.W0 0xCD, (%ecx), %ymm1
+.fill 0x5, 0x1, 0x90
+	.insn VEX.L1.F2.0f38.W0 0xCB, (%ecx), %ymm2, %ymm1
+.fill 0x5, 0x1, 0x90
 .byte 0x62, 0xf2, 0xad, 0x08, 0x1c, 0x01
 .byte 0x1
 	.insn EVEX.66.0f3a.W0 0x1b, $0x25, %ymm0, %xmm1
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index b69c692cd16..1208d5372d7 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -498,6 +498,9 @@ if [gas_32_check] then {
     run_list_test "amx-complex-inval"
     run_dump_test "avx-vnni-int16"
     run_dump_test "avx-vnni-int16-intel"
+    run_dump_test "sha512"
+    run_dump_test "sha512-intel"
+    run_list_test "sha512-inval"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/sha512-intel.d b/gas/testsuite/gas/i386/sha512-intel.d
new file mode 100644
index 00000000000..c1cc85b9f26
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512-intel.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw -Mintel
+#name: i386 SHA512 insns (Intel disassembly)
+#source: sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
diff --git a/gas/testsuite/gas/i386/sha512-inval.l b/gas/testsuite/gas/i386/sha512-inval.l
new file mode 100644
index 00000000000..6d9455fd741
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512-inval.l
@@ -0,0 +1,4 @@
+.* Assembler messages:
+.*:6: Error: operand size mismatch for `vsha512msg1'
+.*:7: Error: operand size mismatch for `vsha512msg2'
+.*:8: Error: operand size mismatch for `vsha512rnds2'
diff --git a/gas/testsuite/gas/i386/sha512-inval.s b/gas/testsuite/gas/i386/sha512-inval.s
new file mode 100644
index 00000000000..d3ae819c563
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512-inval.s
@@ -0,0 +1,8 @@
+# Check Illegal SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	(%ecx), %ymm6
+	vsha512msg2	(%ecx), %ymm6
+	vsha512rnds2	(%ecx), %ymm5, %ymm6
diff --git a/gas/testsuite/gas/i386/sha512.d b/gas/testsuite/gas/i386/sha512.d
new file mode 100644
index 00000000000..b90019954ea
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw
+#name: i386 SHA512 insns
+#source: sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
diff --git a/gas/testsuite/gas/i386/sha512.s b/gas/testsuite/gas/i386/sha512.s
new file mode 100644
index 00000000000..e238c272970
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512.s
@@ -0,0 +1,13 @@
+# Check 32bit SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	%xmm5, %ymm6	 #SHA512
+	vsha512msg2	%ymm5, %ymm6	 #SHA512
+	vsha512rnds2	%xmm4, %ymm5, %ymm6	 #SHA512
+
+.intel_syntax noprefix
+	vsha512msg1	ymm6, xmm5	 #SHA512
+	vsha512msg2	ymm6, ymm5	 #SHA512
+	vsha512rnds2	ymm6, ymm5, xmm4	 #SHA512
diff --git a/gas/testsuite/gas/i386/x86-64-sha512-intel.d b/gas/testsuite/gas/i386/x86-64-sha512-intel.d
new file mode 100644
index 00000000000..e644168e311
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512-intel.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 SHA512 insns (Intel disassembly)
+#source: x86-64-sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
diff --git a/gas/testsuite/gas/i386/x86-64-sha512-inval.l b/gas/testsuite/gas/i386/x86-64-sha512-inval.l
new file mode 100644
index 00000000000..6d9455fd741
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512-inval.l
@@ -0,0 +1,4 @@
+.* Assembler messages:
+.*:6: Error: operand size mismatch for `vsha512msg1'
+.*:7: Error: operand size mismatch for `vsha512msg2'
+.*:8: Error: operand size mismatch for `vsha512rnds2'
diff --git a/gas/testsuite/gas/i386/x86-64-sha512-inval.s b/gas/testsuite/gas/i386/x86-64-sha512-inval.s
new file mode 100644
index 00000000000..d3ae819c563
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512-inval.s
@@ -0,0 +1,8 @@
+# Check Illegal SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	(%ecx), %ymm6
+	vsha512msg2	(%ecx), %ymm6
+	vsha512rnds2	(%ecx), %ymm5, %ymm6
diff --git a/gas/testsuite/gas/i386/x86-64-sha512.d b/gas/testsuite/gas/i386/x86-64-sha512.d
new file mode 100644
index 00000000000..fcb8ae61fee
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw
+#name: x86_64 SHA512 insns
+#source: x86-64-sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
diff --git a/gas/testsuite/gas/i386/x86-64-sha512.s b/gas/testsuite/gas/i386/x86-64-sha512.s
new file mode 100644
index 00000000000..5eaadb3bade
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512.s
@@ -0,0 +1,13 @@
+# Check 64bit SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	%xmm5, %ymm6	 #SHA512
+	vsha512msg2	%ymm5, %ymm6	 #SHA512
+	vsha512rnds2	%xmm4, %ymm5, %ymm6	 #SHA512
+
+.intel_syntax noprefix
+	vsha512msg1	ymm6, xmm5	 #SHA512
+	vsha512msg2	ymm6, ymm5	 #SHA512
+	vsha512rnds2	ymm6, ymm5, xmm4	 #SHA512
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 0f2903c6185..c6ec9be3d43 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -440,6 +440,9 @@ run_dump_test "x86-64-lkgs"
 run_list_test "x86-64-lkgs-inval"
 run_dump_test "x86-64-avx-vnni-int16"
 run_dump_test "x86-64-avx-vnni-int16-intel"
+run_dump_test "x86-64-sha512"
+run_dump_test "x86-64-sha512-intel"
+run_list_test "x86-64-sha512-inval"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 36a839d1652..0043b62f324 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -530,6 +530,8 @@ fetch_error (const instr_info *ins)
 #define Nq { OP_R, q_mode }
 #define Ux { OP_R, x_mode }
 #define Uxmm { OP_R, xmm_mode }
+#define Rxmmq { OP_R, xmmq_mode }
+#define Rymm { OP_R, ymm_mode }
 #define Rtmm { OP_R, tmm_mode }
 #define EMCq { OP_EMC, q_mode }
 #define MXC { OP_MXC, 0 }
@@ -1064,6 +1066,9 @@ enum
   PREFIX_VEX_0F38B1_W_0,
   PREFIX_VEX_0F38D2_W_0,
   PREFIX_VEX_0F38D3_W_0,
+  PREFIX_VEX_0F38CB,
+  PREFIX_VEX_0F38CC,
+  PREFIX_VEX_0F38CD,
   PREFIX_VEX_0F38F5_L_0,
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
@@ -1306,6 +1311,9 @@ enum
   VEX_LEN_0F385C_X86_64,
   VEX_LEN_0F385E_X86_64,
   VEX_LEN_0F386C_X86_64,
+  VEX_LEN_0F38CB_P_3_W_0,
+  VEX_LEN_0F38CC_P_3_W_0,
+  VEX_LEN_0F38CD_P_3_W_0,
   VEX_LEN_0F38DB,
   VEX_LEN_0F38F2,
   VEX_LEN_0F38F3,
@@ -1473,6 +1481,9 @@ enum
   VEX_W_0F38B1,
   VEX_W_0F38B4,
   VEX_W_0F38B5,
+  VEX_W_0F38CB_P_3,
+  VEX_W_0F38CC_P_3,
+  VEX_W_0F38CD_P_3,
   VEX_W_0F38CF,
   VEX_W_0F38D2,
   VEX_W_0F38D3,
@@ -3928,6 +3939,30 @@ static const struct dis386 prefix_table[][4] = {
     { "vpdpwusds",	{ XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F38CB */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CB_P_3) },
+  },
+
+  /* PREFIX_VEX_0F38CC */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CC_P_3) },
+  },
+
+  /* PREFIX_VEX_0F38CD */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CD_P_3) },
+  },
+
   /* PREFIX_VEX_0F38F5_L_0 */
   {
     { "bzhiS",		{ Gdq, Edq, VexGdq }, 0 },
@@ -6380,9 +6415,9 @@ static const struct dis386 vex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CB) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CC) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CD) },
     { Bad_Opcode },
     { VEX_W_TABLE (VEX_W_0F38CF) },
     /* d0 */
@@ -6944,6 +6979,24 @@ static const struct dis386 vex_len_table[][2] = {
     { VEX_W_TABLE (VEX_W_0F386C_X86_64_L_0) },
   },
 
+  /* VEX_LEN_0F38CB_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { "vsha512rnds2", { XM, Vex, Rxmmq }, 0 },
+  },
+
+  /* VEX_LEN_0F38CC_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { "vsha512msg1", { XM, Rxmmq }, 0 },
+  },
+
+  /* VEX_LEN_0F38CD_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { "vsha512msg2", { XM, Rymm }, 0 },
+  },
+
   /* VEX_LEN_0F38DB */
   {
     { "vaesimc",	{ XM, EXx }, PREFIX_DATA },
@@ -7614,6 +7667,18 @@ static const struct dis386 vex_w_table[][2] = {
     { Bad_Opcode },
     { "%XVvpmadd52huq",	{ XM, Vex, EXx }, PREFIX_DATA },
   },
+  {
+    /* VEX_W_0F38CB_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CB_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F38CC_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CC_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F38CD_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CD_P_3_W_0) },
+  },
   {
     /* VEX_W_0F38CF */
     { "%XEvgf2p8mulb", { XM, Vex, EXx }, PREFIX_DATA },
@@ -8055,6 +8120,14 @@ static const struct dis386 mod_table[][2] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_0) },
     { PREFIX_TABLE (PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_1) },
   },
+  {
+    /* MOD_VEX_0F38CB_P_3_W_0_L_1 */
+    { Bad_Opcode },
+  },
+  {
+    /* MOD_VEX_0F38CC_P_3_W_0_L_1 */
+    { Bad_Opcode },
+  },
 
 #include "i386-dis-evex-mod.h"
 };
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 6ad7d6951db..70843eb251f 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -214,6 +214,8 @@ static const dependency isa_dependencies[] =
     "XSAVE" },
   { "SHA",
     "SSE2" },
+  { "SHA512",
+    "AVX2" },
   { "XSAVES",
     "XSAVEC" },
   { "XSAVEC",
@@ -369,6 +371,7 @@ static bitfield cpu_flags[] =
   BITFIELD (RAO_INT),
   BITFIELD (FRED),
   BITFIELD (LKGS),
+  BITFIELD (SHA512),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index f9a68b4c513..b3359e47aa6 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -175,6 +175,8 @@ enum
   CpuSMAP,
   /* SHA instructions required.  */
   CpuSHA,
+  /* SHA512 instructions required.  */
+  CpuSHA512,
   /* CLFLUSHOPT instruction required */
   CpuClflushOpt,
   /* XSAVES/XRSTORS instruction required */
@@ -403,6 +405,7 @@ typedef union i386_cpu_flags
       unsigned int cpuprfchw:1;
       unsigned int cpusmap:1;
       unsigned int cpusha:1;
+      unsigned int cpusha512:1;
       unsigned int cpuclflushopt:1;
       unsigned int cpuxsaves:1;
       unsigned int cpuxsavec:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index f62e5280982..c9a5730f90a 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -2043,6 +2043,14 @@ sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg1, 0xf38cc, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg2, 0xf38cd, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
+// SHA512 instructions.
+
+vsha512rnds2, 0xf2cb, SHA512, Modrm|Vex256|Space0F38|VexVVVV|VexW0|NoSuf, { RegXMM, RegYMM, RegYMM }
+vsha512msg1, 0xf2cc, SHA512, Modrm|Vex256|Space0F38|VexW0|NoSuf, { RegXMM, RegYMM }
+vsha512msg2, 0xf2cd, SHA512, Modrm|Vex256|Space0F38|VexW0|NoSuf, { RegYMM, RegYMM }
+
+// SHA512 instructions end.
+
 // VPCLMULQDQ instructions
 
 vpclmulqdq, 0x6644, VPCLMULQDQ, Modrm|Vex256|Space0F3A|VexWIG|VexVVVV|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-- 
2.31.1



* RE: [PATCH v2] Support Intel SHA512
  2023-07-18  7:54           ` [PATCH v2] " Haochen Jiang
@ 2023-07-18  7:59             ` Jiang, Haochen
  2023-07-18  8:51             ` Jan Beulich
  1 sibling, 0 replies; 31+ messages in thread
From: Jiang, Haochen @ 2023-07-18  7:59 UTC (permalink / raw)
  To: Jiang, Haochen, binutils, Beulich, Jan; +Cc: hjl.tools

> diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
> index 6ad7d6951db..70843eb251f 100644
> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -214,6 +214,8 @@ static const dependency isa_dependencies[] =
>      "XSAVE" },
>    { "SHA",
>      "SSE2" },
> +  { "SHA512",
> +    "AVX2" },
>    { "XSAVES",
>      "XSAVEC" },
>    { "XSAVEC",
> @@ -369,6 +371,7 @@ static bitfield cpu_flags[] =
>    BITFIELD (RAO_INT),
>    BITFIELD (FRED),
>    BITFIELD (LKGS),
> +  BITFIELD (SHA512),

Oops, I sent an outdated patch here; this entry should also have been moved next to SHA.

You can see that in the SM3/SM4 patches.

Thx,
Haochen

>    BITFIELD (MWAITX),
>    BITFIELD (CLZERO),
>    BITFIELD (OSPKE),



* [PATCH v2] Support Intel SM3
  2023-07-13 10:20   ` Jan Beulich
@ 2023-07-18  8:09     ` Haochen Jiang
  2023-07-18  9:03       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Haochen Jiang @ 2023-07-18  8:09 UTC (permalink / raw)
  To: binutils, jbeulich; +Cc: hjl.tools

Hi all,

The v2 SM3 patch has the following changes:

1. Moved the entries next to SHA/SHA512.

2. Adjusted the tables in i386-dis.c to avoid redoing them in the SM4 patch.

Thx,
Haochen
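
For readers following the i386-dis.c table changes: the new entries dispatch
opcode DA in the 0F38 map on VEX.W, then on the implied prefix, then on VEX.L.
The sketch below is only an illustration of that idea, not the disassembler's
code. It decodes the three-byte VEX prefix of byte sequences taken from the
sm3.d expectations in this patch. The names vex3, decode_vex3, sm3_insn and
classify_sm3 are hypothetical, and treating W=1 or L=1 as invalid is an
assumption based on the one-entry tables added here.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a three-byte VEX prefix (first byte 0xc4). */
struct vex3 {
    unsigned map;   /* mmmmm: 1 = 0F, 2 = 0F38, 3 = 0F3A */
    unsigned w;     /* VEX.W */
    unsigned vvvv;  /* extra source register, un-inverted */
    unsigned l;     /* VEX.L: 0 = 128-bit, 1 = 256-bit */
    unsigned pp;    /* implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
};

/* Hypothetical helper: split the two payload bytes of a 0xc4 VEX prefix. */
static struct vex3 decode_vex3(const uint8_t *p)
{
    struct vex3 v;

    assert(p[0] == 0xc4);
    v.map  = p[1] & 0x1f;
    v.w    = (p[2] >> 7) & 1;
    v.vvvv = ((p[2] >> 3) & 0xf) ^ 0xf;  /* stored inverted in the encoding */
    v.l    = (p[2] >> 2) & 1;
    v.pp   = p[2] & 3;
    return v;
}

enum sm3_insn { SM3_BAD, SM3_MSG1, SM3_MSG2, SM3_RNDS2 };

/* Hypothetical classifier mirroring the W -> prefix -> L dispatch order.
   Assumption: W=1 and L=1 encodings are rejected.  */
static enum sm3_insn classify_sm3(const uint8_t *p)
{
    struct vex3 v = decode_vex3(p);
    uint8_t opcode = p[3];

    if (v.w != 0 || v.l != 0)
        return SM3_BAD;
    if (v.map == 2 && opcode == 0xda) {
        if (v.pp == 0)
            return SM3_MSG1;   /* no prefix: vsm3msg1 */
        if (v.pp == 1)
            return SM3_MSG2;   /* 66 prefix: vsm3msg2 */
    }
    if (v.map == 3 && opcode == 0xde && v.pp == 1)
        return SM3_RNDS2;      /* 66 prefix, 0F3A map: vsm3rnds2 */
    return SM3_BAD;
}
```

Note that the pp value 1 (the 66 prefix) is the third slot of a prefix_table
row in the patch, which is why vsm3msg2 sits behind the _P_2 entry.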

gas/ChangeLog:

	* NEWS: Support Intel SM3.
	* config/tc-i386.c: Add sm3.
	* doc/c-i386.texi: Document .sm3.
	* testsuite/gas/i386/i386.exp: Run sm3 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/sm3-intel.d: New test.
	* testsuite/gas/i386/sm3.d: Ditto.
	* testsuite/gas/i386/sm3.s: Ditto.
	* testsuite/gas/i386/x86-64-sm3-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-sm3.d: Ditto.
	* testsuite/gas/i386/x86-64-sm3.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (PREFIX_VEX_0F38DA_W_0): New.
	(VEX_LEN_0F38DA_W_0_P_0): Ditto.
	(VEX_LEN_0F38DA_W_0_P_2): Ditto.
	(VEX_LEN_0F3ADE_W_0): Ditto.
	(VEX_W_0F38DA): Ditto.
	(VEX_W_0F3ADE): Ditto.
	(prefix_table): Add PREFIX_VEX_0F38DA_W_0.
	(vex_len_table): Add VEX_LEN_0F38DA_W_0_P_0,
	VEX_LEN_0F38DA_W_0_P_2, VEX_LEN_0F3ADE_W_0.
	(vex_w_table): Add VEX_W_0F38DA, VEX_W_0F3ADE.
	* i386-gen.c (isa_dependencies): Add SM3.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuSM3): New.
	(i386_cpu_flags): Add cpusm3.
	* i386-opc.tbl: Add SM3 instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                  |    2 +
 gas/config/tc-i386.c                      |    1 +
 gas/doc/c-i386.texi                       |    3 +-
 gas/testsuite/gas/i386/i386.exp           |    2 +
 gas/testsuite/gas/i386/sm3-intel.d        |   40 +
 gas/testsuite/gas/i386/sm3.d              |   40 +
 gas/testsuite/gas/i386/sm3.s              |   37 +
 gas/testsuite/gas/i386/x86-64-sm3-intel.d |   40 +
 gas/testsuite/gas/i386/x86-64-sm3.d       |   40 +
 gas/testsuite/gas/i386/x86-64-sm3.s       |   37 +
 gas/testsuite/gas/i386/x86-64.exp         |    2 +
 opcodes/i386-dis.c                        |   40 +-
 opcodes/i386-gen.c                        |    3 +
 opcodes/i386-init.h                       |  780 +-
 opcodes/i386-mnem.h                       | 3953 ++++-----
 opcodes/i386-opc.h                        |    3 +
 opcodes/i386-opc.tbl                      |    7 +
 opcodes/i386-tbl.h                        | 9293 +++++++++++----------
 18 files changed, 7348 insertions(+), 6975 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/sm3-intel.d
 create mode 100644 gas/testsuite/gas/i386/sm3.d
 create mode 100644 gas/testsuite/gas/i386/sm3.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm3-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm3.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm3.s

diff --git a/gas/NEWS b/gas/NEWS
index fe2c055fa7f..42bda657f21 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel SM3 instructions.
+
 * Add support for Intel SHA512 instructions.
 
 * Add support for Intel AVX-VNNI-INT16 instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 836640d9123..7424fa41c44 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1153,6 +1153,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (lkgs, LKGS, ANY_LKGS, false),
   SUBARCH (avx_vnni_int16, AVX_VNNI_INT16, ANY_AVX_VNNI_INT16, false),
   SUBARCH (sha512, SHA512, ANY_SHA512, false),
+  SUBARCH (sm3, SM3, ANY_SM3, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 21fb71e54ab..6ef1da21370 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -209,6 +209,7 @@ accept various extension mnemonics.  For example,
 @code{lkgs},
 @code{avx_vnni_int16},
 @code{sha512},
+@code{sm3},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1638,7 +1639,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.prefetchi} @tab @samp{.avx_ifma} @tab @samp{.avx_vnni_int8}
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
-@item @samp{.avx_vnni_int16} @tab @samp{.sha512}
+@item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 1208d5372d7..2fcd3be1f98 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -501,6 +501,8 @@ if [gas_32_check] then {
     run_dump_test "sha512"
     run_dump_test "sha512-intel"
     run_list_test "sha512-inval"
+    run_dump_test "sm3"
+    run_dump_test "sm3-intel"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/sm3-intel.d b/gas/testsuite/gas/i386/sm3-intel.d
new file mode 100644
index 00000000000..4ab4ce2ddb4
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm3-intel.d
@@ -0,0 +1,40 @@
+#as:
+#objdump: -dw -Mintel
+#name: i386 SM3 insns (Intel disassembly)
+#source: sm3.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 50 da b4 f4 00 00 00 10\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 50 da 31\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 51 da b4 f4 00 00 00 10\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 51 da 31\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 xmm6,xmm5,xmm4,0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b4 f4 00 00 00 10 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de 31 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[ecx\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[edx-0x800\],0x7b
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 50 da b4 f4 00 00 00 10\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 50 da 31\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 51 da b4 f4 00 00 00 10\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 51 da 31\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 xmm6,xmm5,xmm4,0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b4 f4 00 00 00 10 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de 31 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[ecx\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[edx-0x800\],0x7b
diff --git a/gas/testsuite/gas/i386/sm3.d b/gas/testsuite/gas/i386/sm3.d
new file mode 100644
index 00000000000..7507a8b4c7f
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm3.d
@@ -0,0 +1,40 @@
+#as:
+#objdump: -dw
+#name: i386 SM3 insns
+#source: sm3.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b4 f4 00 00 00 10\s+vsm3msg1 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da 31\s+vsm3msg1 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b4 f4 00 00 00 10\s+vsm3msg2 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da 31\s+vsm3msg2 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 \$0x7b,%xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b4 f4 00 00 00 10 7b\s+vsm3rnds2 \$0x7b,0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de 31 7b\s+vsm3rnds2 \$0x7b,\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 \$0x7b,0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 \$0x7b,-0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b4 f4 00 00 00 10\s+vsm3msg1 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da 31\s+vsm3msg1 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b4 f4 00 00 00 10\s+vsm3msg2 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da 31\s+vsm3msg2 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 \$0x7b,%xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b4 f4 00 00 00 10 7b\s+vsm3rnds2 \$0x7b,0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de 31 7b\s+vsm3rnds2 \$0x7b,\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 \$0x7b,0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 \$0x7b,-0x800\(%edx\),%xmm5,%xmm6
diff --git a/gas/testsuite/gas/i386/sm3.s b/gas/testsuite/gas/i386/sm3.s
new file mode 100644
index 00000000000..d1bc967a6f3
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm3.s
@@ -0,0 +1,37 @@
+# Check 32bit SM3 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsm3msg1	%xmm4, %xmm5, %xmm6	 #SM3
+	vsm3msg1	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #SM3
+	vsm3msg1	(%ecx), %xmm5, %xmm6	 #SM3
+	vsm3msg1	2032(%ecx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3msg1	-2048(%edx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+	vsm3msg2	%xmm4, %xmm5, %xmm6	 #SM3
+	vsm3msg2	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #SM3
+	vsm3msg2	(%ecx), %xmm5, %xmm6	 #SM3
+	vsm3msg2	2032(%ecx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3msg2	-2048(%edx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+	vsm3rnds2	$123, %xmm4, %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, 0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, (%ecx), %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, 2032(%ecx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3rnds2	$123, -2048(%edx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+
+.intel_syntax noprefix
+	vsm3msg1	xmm6, xmm5, xmm4	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [ecx]	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #SM3 Disp32(f0070000)
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [edx-2048]	 #SM3 Disp32(00f8ffff)
+	vsm3msg2	xmm6, xmm5, xmm4	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [ecx]	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #SM3 Disp32(f0070000)
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [edx-2048]	 #SM3 Disp32(00f8ffff)
+	vsm3rnds2	xmm6, xmm5, xmm4, 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000], 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [ecx], 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [ecx+2032], 123	 #SM3 Disp32(f0070000)
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [edx-2048], 123	 #SM3 Disp32(00f8ffff)
diff --git a/gas/testsuite/gas/i386/x86-64-sm3-intel.d b/gas/testsuite/gas/i386/x86-64-sm3-intel.d
new file mode 100644
index 00000000000..5b533681029
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm3-intel.d
@@ -0,0 +1,40 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 SM3 insns (Intel disassembly)
+#source: x86-64-sm3.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 50 da b4 f5 00 00 00 10\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 50 da 31\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 51 da b4 f5 00 00 00 10\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 51 da 31\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 xmm6,xmm5,xmm4,0x7b
+\s*[a-f0-9]+:\s*c4 a3 51 de b4 f5 00 00 00 10 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\],0x7b
+\s*[a-f0-9]+:\s*c4 c3 51 de 31 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[r9\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\],0x7b
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 50 da b4 f5 00 00 00 10\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 50 da 31\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 51 da b4 f5 00 00 00 10\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 51 da 31\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 xmm6,xmm5,xmm4,0x7b
+\s*[a-f0-9]+:\s*c4 a3 51 de b4 f5 00 00 00 10 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\],0x7b
+\s*[a-f0-9]+:\s*c4 c3 51 de 31 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[r9\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\],0x7b
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\],0x7b
diff --git a/gas/testsuite/gas/i386/x86-64-sm3.d b/gas/testsuite/gas/i386/x86-64-sm3.d
new file mode 100644
index 00000000000..8f417de4f7f
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm3.d
@@ -0,0 +1,40 @@
+#as:
+#objdump: -dw
+#name: x86_64 SM3 insns
+#source: x86-64-sm3.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 50 da b4 f5 00 00 00 10\s+vsm3msg1 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 50 da 31\s+vsm3msg1 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 51 da b4 f5 00 00 00 10\s+vsm3msg2 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 51 da 31\s+vsm3msg2 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 \$0x7b,%xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a3 51 de b4 f5 00 00 00 10 7b\s+vsm3rnds2 \$0x7b,0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c3 51 de 31 7b\s+vsm3rnds2 \$0x7b,\(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 \$0x7b,0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 \$0x7b,-0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da f4\s+vsm3msg1 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 50 da b4 f5 00 00 00 10\s+vsm3msg1 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 50 da 31\s+vsm3msg1 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b1 f0 07 00 00\s+vsm3msg1 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 50 da b2 00 f8 ff ff\s+vsm3msg1 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da f4\s+vsm3msg2 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 51 da b4 f5 00 00 00 10\s+vsm3msg2 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 51 da 31\s+vsm3msg2 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b1 f0 07 00 00\s+vsm3msg2 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 51 da b2 00 f8 ff ff\s+vsm3msg2 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de f4 7b\s+vsm3rnds2 \$0x7b,%xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a3 51 de b4 f5 00 00 00 10 7b\s+vsm3rnds2 \$0x7b,0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c3 51 de 31 7b\s+vsm3rnds2 \$0x7b,\(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b1 f0 07 00 00 7b\s+vsm3rnds2 \$0x7b,0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e3 51 de b2 00 f8 ff ff 7b\s+vsm3rnds2 \$0x7b,-0x800\(%rdx\),%xmm5,%xmm6
diff --git a/gas/testsuite/gas/i386/x86-64-sm3.s b/gas/testsuite/gas/i386/x86-64-sm3.s
new file mode 100644
index 00000000000..fa80b4b15a8
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm3.s
@@ -0,0 +1,37 @@
+# Check 64bit SM3 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsm3msg1	%xmm4, %xmm5, %xmm6	 #SM3
+	vsm3msg1	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #SM3
+	vsm3msg1	(%r9), %xmm5, %xmm6	 #SM3
+	vsm3msg1	2032(%rcx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3msg1	-2048(%rdx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+	vsm3msg2	%xmm4, %xmm5, %xmm6	 #SM3
+	vsm3msg2	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #SM3
+	vsm3msg2	(%r9), %xmm5, %xmm6	 #SM3
+	vsm3msg2	2032(%rcx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3msg2	-2048(%rdx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+	vsm3rnds2	$123, %xmm4, %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, 0x10000000(%rbp, %r14, 8), %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, (%r9), %xmm5, %xmm6	 #SM3
+	vsm3rnds2	$123, 2032(%rcx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
+	vsm3rnds2	$123, -2048(%rdx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)
+
+.intel_syntax noprefix
+	vsm3msg1	xmm6, xmm5, xmm4	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [r9]	 #SM3
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [rcx+2032]	 #SM3 Disp32(f0070000)
+	vsm3msg1	xmm6, xmm5, XMMWORD PTR [rdx-2048]	 #SM3 Disp32(00f8ffff)
+	vsm3msg2	xmm6, xmm5, xmm4	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [r9]	 #SM3
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [rcx+2032]	 #SM3 Disp32(f0070000)
+	vsm3msg2	xmm6, xmm5, XMMWORD PTR [rdx-2048]	 #SM3 Disp32(00f8ffff)
+	vsm3rnds2	xmm6, xmm5, xmm4, 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000], 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [r9], 123	 #SM3
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [rcx+2032], 123	 #SM3 Disp32(f0070000)
+	vsm3rnds2	xmm6, xmm5, XMMWORD PTR [rdx-2048], 123	 #SM3 Disp32(00f8ffff)
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index c6ec9be3d43..d31bb40b32b 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -443,6 +443,8 @@ run_dump_test "x86-64-avx-vnni-int16-intel"
 run_dump_test "x86-64-sha512"
 run_dump_test "x86-64-sha512-intel"
 run_list_test "x86-64-sha512-inval"
+run_dump_test "x86-64-sm3"
+run_dump_test "x86-64-sm3-intel"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 0043b62f324..006e38a16a9 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -1069,6 +1069,7 @@ enum
   PREFIX_VEX_0F38CB,
   PREFIX_VEX_0F38CC,
   PREFIX_VEX_0F38CD,
+  PREFIX_VEX_0F38DA_W_0,
   PREFIX_VEX_0F38F5_L_0,
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
@@ -1314,6 +1315,8 @@ enum
   VEX_LEN_0F38CB_P_3_W_0,
   VEX_LEN_0F38CC_P_3_W_0,
   VEX_LEN_0F38CD_P_3_W_0,
+  VEX_LEN_0F38DA_W_0_P_0,
+  VEX_LEN_0F38DA_W_0_P_2,
   VEX_LEN_0F38DB,
   VEX_LEN_0F38F2,
   VEX_LEN_0F38F3,
@@ -1344,6 +1347,7 @@ enum
   VEX_LEN_0F3A61,
   VEX_LEN_0F3A62,
   VEX_LEN_0F3A63,
+  VEX_LEN_0F3ADE_W_0,
   VEX_LEN_0F3ADF,
   VEX_LEN_0F3AF0,
   VEX_LEN_XOP_08_85,
@@ -1487,6 +1491,7 @@ enum
   VEX_W_0F38CF,
   VEX_W_0F38D2,
   VEX_W_0F38D3,
+  VEX_W_0F38DA,
   VEX_W_0F3A00_L_1,
   VEX_W_0F3A01_L_1,
   VEX_W_0F3A02,
@@ -1504,6 +1509,7 @@ enum
   VEX_W_0F3A4C,
   VEX_W_0F3ACE,
   VEX_W_0F3ACF,
+  VEX_W_0F3ADE,
 
   VEX_W_XOP_08_85_L_0,
   VEX_W_XOP_08_86_L_0,
@@ -3963,6 +3969,13 @@ static const struct dis386 prefix_table[][4] = {
     { VEX_W_TABLE (VEX_W_0F38CD_P_3) },
   },
 
+  /* PREFIX_VEX_0F38DA_W_0 */
+  {
+    { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0_P_0) },
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0_P_2) },
+  },
+
   /* PREFIX_VEX_0F38F5_L_0 */
   {
     { "bzhiS",		{ Gdq, Edq, VexGdq }, 0 },
@@ -6432,7 +6445,7 @@ static const struct dis386 vex_table[][256] = {
     /* d8 */
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38DA) },
     { VEX_LEN_TABLE (VEX_LEN_0F38DB) },
     { "vaesenc",	{ XM, Vex, EXx }, PREFIX_DATA },
     { "vaesenclast",	{ XM, Vex, EXx }, PREFIX_DATA },
@@ -6727,7 +6740,7 @@ static const struct dis386 vex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F3ADE) },
     { VEX_LEN_TABLE (VEX_LEN_0F3ADF) },
     /* e0 */
     { Bad_Opcode },
@@ -6997,6 +7010,16 @@ static const struct dis386 vex_len_table[][2] = {
     { "vsha512msg2", { XM, Rymm }, 0 },
   },
 
+  /* VEX_LEN_0F38DA_W_0_P_0 */
+  {
+    { "vsm3msg1", { XM, Vex, EXxmm }, 0 },
+  },
+
+  /* VEX_LEN_0F38DA_W_0_P_2 */
+  {
+    { "vsm3msg2", { XM, Vex, EXxmm }, 0 },
+  },
+
   /* VEX_LEN_0F38DB */
   {
     { "vaesimc",	{ XM, EXx }, PREFIX_DATA },
@@ -7155,6 +7178,11 @@ static const struct dis386 vex_len_table[][2] = {
     { "vpcmpistri",	{ XM, EXx, Ib }, PREFIX_DATA },
   },
 
+  /* VEX_LEN_0F3ADE_W_0 */
+  {
+    { "vsm3rnds2", { XM, Vex, EXxmm, Ib }, PREFIX_DATA },
+  },
+
   /* VEX_LEN_0F3ADF */
   {
     { "vaeskeygenassist", { XM, EXx, Ib }, PREFIX_DATA },
@@ -7691,6 +7719,10 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F38D3 */
     { PREFIX_TABLE (PREFIX_VEX_0F38D3_W_0) },
   },
+  {
+    /* VEX_W_0F38DA */
+    { PREFIX_TABLE (PREFIX_VEX_0F38DA_W_0) },
+  },
   {
     /* VEX_W_0F3A00_L_1 */
     { Bad_Opcode },
@@ -7763,6 +7795,10 @@ static const struct dis386 vex_w_table[][2] = {
     { Bad_Opcode },
     { "%XEvgf2p8affineinvqb",  { XM, Vex, EXx, Ib }, PREFIX_DATA },
   },
+  {
+    /* VEX_W_0F3ADE */
+    { VEX_LEN_TABLE (VEX_LEN_0F3ADE_W_0) },
+  },
   /* VEX_W_XOP_08_85_L_0 */
   {
     { "vpmacssww", 	{ XM, Vex, EXx, XMVexI4 }, 0 },
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index e2528932d84..11af743ffe7 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -216,6 +216,8 @@ static const dependency isa_dependencies[] =
     "SSE2" },
   { "SHA512",
     "AVX2" },
+  { "SM3",
+    "AVX" },
   { "XSAVES",
     "XSAVEC" },
   { "XSAVEC",
@@ -341,6 +343,7 @@ static bitfield cpu_flags[] =
   BITFIELD (SMAP),
   BITFIELD (SHA),
   BITFIELD (SHA512),
+  BITFIELD (SM3),
   BITFIELD (ClflushOpt),
   BITFIELD (XSAVES),
   BITFIELD (XSAVEC),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index b3359e47aa6..256ed532211 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -177,6 +177,8 @@ enum
   CpuSHA,
   /* SHA512 instructions required.  */
   CpuSHA512,
+  /* SM3 instructions required.  */
+  CpuSM3,
   /* CLFLUSHOPT instruction required */
   CpuClflushOpt,
   /* XSAVES/XRSTORS instruction required */
@@ -406,6 +408,7 @@ typedef union i386_cpu_flags
       unsigned int cpusmap:1;
       unsigned int cpusha:1;
       unsigned int cpusha512:1;
+      unsigned int cpusm3:1;
       unsigned int cpuclflushopt:1;
       unsigned int cpuxsaves:1;
       unsigned int cpuxsavec:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index c9a5730f90a..653b1cbc587 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -2051,6 +2051,13 @@ vsha512msg2, 0xf2cd, SHA512, Modrm|Vex256|Space0F38|VexW0|NoSuf, { RegYMM, RegYM
 
 // SHA512 instructions end.
 
+// SM3 instructions.
+vsm3rnds2, 0x66de, SM3, Modrm|Space0F3A|Vex128|VexVVVV|VexW0|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsm3msg1, 0xda, SM3, Modrm|Space0F38|Vex128|VexVVVV|VexW0|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsm3msg2, 0x66da, SM3, Modrm|Space0F38|Vex128|VexVVVV|VexW0|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+
+// SM3 instructions end.
+
 // VPCLMULQDQ instructions
 
 vpclmulqdq, 0x6644, VPCLMULQDQ, Modrm|Vex256|Space0F3A|VexWIG|VexVVVV|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-- 
2.31.1



* Re: [PATCH 2/5] Support Intel SHA512
  2023-07-18  7:20         ` Jiang, Haochen
  2023-07-18  7:54           ` [PATCH v2] " Haochen Jiang
@ 2023-07-18  8:11           ` Jan Beulich
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2023-07-18  8:11 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: hjl.tools, binutils

On 18.07.2023 09:20, Jiang, Haochen wrote:
>> As to implying baseline functionality, using AVX (rather than AVX2) makes
>> little sense, so even if the feature check remained (note that various
>> other extensions, including e.g. AVX-VNNI-INT<n>, don't have such a
>> secondary requirement), I'd still be fairly insistent on having the
>> base feature named here (and for SM4) be AVX2 (to be in line with other
>> similar baseline selections).
> 
> I confirmed that AVX in the doc here refers to the state of the whole AVX ISA,
> which includes both AVX and AVX2.
> 
> I will change the implied dependency of SHA512 and SM4 to AVX2 since that
> looks much more reasonable.

Thanks.

> Should we also change the implied dependency of SM3 here?

AVX looks sufficient there, so I'd say only if you have a good justification.

Jan
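
To make the "implies" relationship discussed above concrete, here is a minimal
sketch of a feature-dependency lookup. It is not the i386-gen.c
implementation: the real isa_dependencies table is richer, while this version
assumes each feature names at most one prerequisite. The struct and function
names (dependency, deps, implies) are hypothetical, and the AVX2 -> AVX row is
an assumption not shown in this thread. The SHA, SHA512, SM3 and SM4 rows
follow the values settled on in the v2 patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row: enabling `name` also enables `needs`. */
struct dependency {
    const char *name;
    const char *needs;
};

static const struct dependency deps[] = {
    { "AVX2",   "AVX"  },  /* assumption: not part of this excerpt */
    { "SHA",    "SSE2" },
    { "SHA512", "AVX2" },
    { "SM3",    "AVX"  },
    { "SM4",    "AVX2" },
};

/* Return 1 if enabling `feature` transitively enables `target`. */
static int implies(const char *feature, const char *target)
{
    assert(target != NULL);

    while (feature != NULL) {
        const char *next = NULL;
        size_t i;

        if (strcmp(feature, target) == 0)
            return 1;
        /* Follow the single prerequisite edge, if there is one. */
        for (i = 0; i < sizeof deps / sizeof deps[0]; i++) {
            if (strcmp(deps[i].name, feature) == 0) {
                next = deps[i].needs;
                break;
            }
        }
        feature = next;
    }
    return 0;
}
```

With this table, SHA512 and SM4 reach AVX through AVX2, while SM3 stops at
AVX and never enables AVX2, which matches the choice described above.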


* [PATCH v2] Support Intel SM4
  2023-07-18  7:21     ` Jiang, Haochen
@ 2023-07-18  8:13       ` Haochen Jiang
  2023-07-18  9:11         ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Haochen Jiang @ 2023-07-18  8:13 UTC (permalink / raw)
  To: binutils, jbeulich; +Cc: hjl.tools

Hi all,

The v2 SM4 patch has the following changes:

1. Changed the implied dependency from AVX to AVX2.

2. Moved entries near SHA/SHA512/SM3.

3. Avoided redoing the i386-dis.c tables, thanks to the SM3 change.

Thx,
Haochen

gas/ChangeLog:

	* NEWS: Support Intel SM4.
	* config/tc-i386.c: Add sm4.
	* doc/c-i386.texi: Document .sm4.
	* testsuite/gas/i386/i386.exp: Run SM4 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/sm4-intel.d: Add SM4 tests.
	* testsuite/gas/i386/sm4.d: Ditto.
	* testsuite/gas/i386/sm4.s: Ditto.
	* testsuite/gas/i386/x86-64-sm4-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-sm4.d: Ditto.
	* testsuite/gas/i386/x86-64-sm4.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (prefix_table): Add SM4 instructions.
	* i386-gen.c (isa_dependencies): Add SM4.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuSM4): New.
	(i386_cpu_flags): Add cpusm4.
	* i386-opc.tbl: Add SM4 instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                  |    2 +
 gas/config/tc-i386.c                      |    1 +
 gas/doc/c-i386.texi                       |    3 +-
 gas/testsuite/gas/i386/i386.exp           |    2 +
 gas/testsuite/gas/i386/sm4-intel.d        |   50 +
 gas/testsuite/gas/i386/sm4.d              |   50 +
 gas/testsuite/gas/i386/sm4.s              |   47 +
 gas/testsuite/gas/i386/x86-64-sm4-intel.d |   50 +
 gas/testsuite/gas/i386/x86-64-sm4.d       |   50 +
 gas/testsuite/gas/i386/x86-64-sm4.s       |   47 +
 gas/testsuite/gas/i386/x86-64.exp         |    2 +
 opcodes/i386-dis.c                        |    3 +-
 opcodes/i386-gen.c                        |    3 +
 opcodes/i386-init.h                       |  790 +-
 opcodes/i386-mnem.h                       | 3854 ++++-----
 opcodes/i386-opc.h                        |    3 +
 opcodes/i386-opc.tbl                      |    7 +
 opcodes/i386-tbl.h                        | 8916 +++++++++++----------
 18 files changed, 7128 insertions(+), 6752 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/sm4-intel.d
 create mode 100644 gas/testsuite/gas/i386/sm4.d
 create mode 100644 gas/testsuite/gas/i386/sm4.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm4-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm4.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sm4.s

diff --git a/gas/NEWS b/gas/NEWS
index 42bda657f21..26e75bde391 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel SM4 instructions.
+
 * Add support for Intel SM3 instructions.
 
 * Add support for Intel SHA512 instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 7424fa41c44..686dd4c70f4 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1154,6 +1154,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (avx_vnni_int16, AVX_VNNI_INT16, ANY_AVX_VNNI_INT16, false),
   SUBARCH (sha512, SHA512, ANY_SHA512, false),
   SUBARCH (sm3, SM3, ANY_SM3, false),
+  SUBARCH (sm4, SM4, ANY_SM4, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 6ef1da21370..54b0d7d738c 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -210,6 +210,7 @@ accept various extension mnemonics.  For example,
 @code{avx_vnni_int16},
 @code{sha512},
 @code{sm3},
+@code{sm4},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1639,7 +1640,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.prefetchi} @tab @samp{.avx_ifma} @tab @samp{.avx_vnni_int8}
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
-@item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3}
+@item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3} @tab @samp{.sm4}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 2fcd3be1f98..e862d413c7e 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -503,6 +503,8 @@ if [gas_32_check] then {
     run_list_test "sha512-inval"
     run_dump_test "sm3"
     run_dump_test "sm3-intel"
+    run_dump_test "sm4"
+    run_dump_test "sm4-intel"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/sm4-intel.d b/gas/testsuite/gas/i386/sm4-intel.d
new file mode 100644
index 00000000000..03ccdb4a67b
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm4-intel.d
@@ -0,0 +1,50 @@
+#as:
+#objdump: -dw -Mintel
+#name: i386 SM4 insns (Intel disassembly)
+#source: sm4.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 56 da b4 f4 00 00 00 10\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 56 da 31\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b4 f4 00 00 00 10\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 52 da 31\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 57 da b4 f4 00 00 00 10\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 57 da 31\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b4 f4 00 00 00 10\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 53 da 31\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 56 da b4 f4 00 00 00 10\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 56 da 31\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b4 f4 00 00 00 10\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 52 da 31\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 57 da b4 f4 00 00 00 10\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 57 da 31\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b4 f4 00 00 00 10\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 e2 53 da 31\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[ecx\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
diff --git a/gas/testsuite/gas/i386/sm4.d b/gas/testsuite/gas/i386/sm4.d
new file mode 100644
index 00000000000..48dcda66271
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm4.d
@@ -0,0 +1,50 @@
+#as:
+#objdump: -dw
+#name: i386 SM4 insns
+#source: sm4.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b4 f4 00 00 00 10\s+vsm4key4 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da 31\s+vsm4key4 \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b4 f4 00 00 00 10\s+vsm4key4 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da 31\s+vsm4key4 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b4 f4 00 00 00 10\s+vsm4rnds4 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da 31\s+vsm4rnds4 \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b4 f4 00 00 00 10\s+vsm4rnds4 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da 31\s+vsm4rnds4 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b4 f4 00 00 00 10\s+vsm4key4 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da 31\s+vsm4key4 \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b4 f4 00 00 00 10\s+vsm4key4 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da 31\s+vsm4key4 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 -0x800\(%edx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b4 f4 00 00 00 10\s+vsm4rnds4 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da 31\s+vsm4rnds4 \(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 0xfe0\(%ecx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 -0x1000\(%edx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b4 f4 00 00 00 10\s+vsm4rnds4 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da 31\s+vsm4rnds4 \(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 0x7f0\(%ecx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 -0x800\(%edx\),%xmm5,%xmm6
diff --git a/gas/testsuite/gas/i386/sm4.s b/gas/testsuite/gas/i386/sm4.s
new file mode 100644
index 00000000000..0eb7b2fcb7b
--- /dev/null
+++ b/gas/testsuite/gas/i386/sm4.s
@@ -0,0 +1,47 @@
+# Check 32bit SM4 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsm4key4	%ymm4, %ymm5, %ymm6
+	vsm4key4	%xmm4, %xmm5, %xmm6
+	vsm4key4	0x10000000(%esp, %esi, 8), %ymm5, %ymm6
+	vsm4key4	(%ecx), %ymm5, %ymm6
+	vsm4key4	4064(%ecx), %ymm5, %ymm6
+	vsm4key4	-4096(%edx), %ymm5, %ymm6
+	vsm4key4	0x10000000(%esp, %esi, 8), %xmm5, %xmm6
+	vsm4key4	(%ecx), %xmm5, %xmm6
+	vsm4key4	2032(%ecx), %xmm5, %xmm6
+	vsm4key4	-2048(%edx), %xmm5, %xmm6
+	vsm4rnds4	%ymm4, %ymm5, %ymm6
+	vsm4rnds4	%xmm4, %xmm5, %xmm6
+	vsm4rnds4	0x10000000(%esp, %esi, 8), %ymm5, %ymm6
+	vsm4rnds4	(%ecx), %ymm5, %ymm6
+	vsm4rnds4	4064(%ecx), %ymm5, %ymm6
+	vsm4rnds4	-4096(%edx), %ymm5, %ymm6
+	vsm4rnds4	0x10000000(%esp, %esi, 8), %xmm5, %xmm6
+	vsm4rnds4	(%ecx), %xmm5, %xmm6
+	vsm4rnds4	2032(%ecx), %xmm5, %xmm6
+	vsm4rnds4	-2048(%edx), %xmm5, %xmm6
+
+.intel_syntax noprefix
+	vsm4key4	ymm6, ymm5, ymm4
+	vsm4key4	xmm6, xmm5, xmm4
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [ecx]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [ecx+4064]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [edx-4096]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [ecx]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [ecx+2032]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [edx-2048]
+	vsm4rnds4	ymm6, ymm5, ymm4
+	vsm4rnds4	xmm6, xmm5, xmm4
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [ecx]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [ecx+4064]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [edx-4096]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [ecx]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [ecx+2032]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [edx-2048]
diff --git a/gas/testsuite/gas/i386/x86-64-sm4-intel.d b/gas/testsuite/gas/i386/x86-64-sm4-intel.d
new file mode 100644
index 00000000000..9bfa59592ae
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm4-intel.d
@@ -0,0 +1,50 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 SM4 insns (Intel disassembly)
+#source: x86-64-sm4.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 56 da b4 f5 00 00 00 10\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 56 da 31\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 52 da b4 f5 00 00 00 10\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 52 da 31\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 57 da b4 f5 00 00 00 10\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 57 da 31\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 53 da b4 f5 00 00 00 10\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 53 da 31\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 56 da b4 f5 00 00 00 10\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 56 da 31\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 52 da b4 f5 00 00 00 10\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 52 da 31\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 ymm6,ymm5,ymm4
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 xmm6,xmm5,xmm4
+\s*[a-f0-9]+:\s*c4 a2 57 da b4 f5 00 00 00 10\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 57 da 31\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rcx\+0xfe0\]
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 ymm6,ymm5,YMMWORD PTR \[rdx-0x1000\]
+\s*[a-f0-9]+:\s*c4 a2 53 da b4 f5 00 00 00 10\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 53 da 31\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[r9\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rcx\+0x7f0\]
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 xmm6,xmm5,XMMWORD PTR \[rdx-0x800\]
diff --git a/gas/testsuite/gas/i386/x86-64-sm4.d b/gas/testsuite/gas/i386/x86-64-sm4.d
new file mode 100644
index 00000000000..2c1f6737d9a
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm4.d
@@ -0,0 +1,50 @@
+#as:
+#objdump: -dw
+#name: x86_64 SM4 insns
+#source: x86-64-sm4.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 56 da b4 f5 00 00 00 10\s+vsm4key4 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 56 da 31\s+vsm4key4 \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 52 da b4 f5 00 00 00 10\s+vsm4key4 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 52 da 31\s+vsm4key4 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 57 da b4 f5 00 00 00 10\s+vsm4rnds4 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 57 da 31\s+vsm4rnds4 \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 53 da b4 f5 00 00 00 10\s+vsm4rnds4 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 53 da 31\s+vsm4rnds4 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 56 da f4\s+vsm4key4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 52 da f4\s+vsm4key4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 56 da b4 f5 00 00 00 10\s+vsm4key4 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 56 da 31\s+vsm4key4 \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b1 e0 0f 00 00\s+vsm4key4 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 56 da b2 00 f0 ff ff\s+vsm4key4 -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 52 da b4 f5 00 00 00 10\s+vsm4key4 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 52 da 31\s+vsm4key4 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b1 f0 07 00 00\s+vsm4key4 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 52 da b2 00 f8 ff ff\s+vsm4key4 -0x800\(%rdx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 57 da f4\s+vsm4rnds4 %ymm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 53 da f4\s+vsm4rnds4 %xmm4,%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 a2 57 da b4 f5 00 00 00 10\s+vsm4rnds4 0x10000000\(%rbp,%r14,8\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 c2 57 da 31\s+vsm4rnds4 \(%r9\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b1 e0 0f 00 00\s+vsm4rnds4 0xfe0\(%rcx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 da b2 00 f0 ff ff\s+vsm4rnds4 -0x1000\(%rdx\),%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 a2 53 da b4 f5 00 00 00 10\s+vsm4rnds4 0x10000000\(%rbp,%r14,8\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 c2 53 da 31\s+vsm4rnds4 \(%r9\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b1 f0 07 00 00\s+vsm4rnds4 0x7f0\(%rcx\),%xmm5,%xmm6
+\s*[a-f0-9]+:\s*c4 e2 53 da b2 00 f8 ff ff\s+vsm4rnds4 -0x800\(%rdx\),%xmm5,%xmm6
diff --git a/gas/testsuite/gas/i386/x86-64-sm4.s b/gas/testsuite/gas/i386/x86-64-sm4.s
new file mode 100644
index 00000000000..cc680cb1a89
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sm4.s
@@ -0,0 +1,47 @@
+# Check 64bit SM4 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsm4key4	%ymm4, %ymm5, %ymm6 
+	vsm4key4	%xmm4, %xmm5, %xmm6
+	vsm4key4	0x10000000(%rbp, %r14, 8), %ymm5, %ymm6
+	vsm4key4	(%r9), %ymm5, %ymm6
+	vsm4key4	4064(%rcx), %ymm5, %ymm6
+	vsm4key4	-4096(%rdx), %ymm5, %ymm6
+	vsm4key4	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6
+	vsm4key4	(%r9), %xmm5, %xmm6
+	vsm4key4	2032(%rcx), %xmm5, %xmm6
+	vsm4key4	-2048(%rdx), %xmm5, %xmm6
+	vsm4rnds4	%ymm4, %ymm5, %ymm6
+	vsm4rnds4	%xmm4, %xmm5, %xmm6
+	vsm4rnds4	0x10000000(%rbp, %r14, 8), %ymm5, %ymm6
+	vsm4rnds4	(%r9), %ymm5, %ymm6
+	vsm4rnds4	4064(%rcx), %ymm5, %ymm6
+	vsm4rnds4	-4096(%rdx), %ymm5, %ymm6
+	vsm4rnds4	0x10000000(%rbp, %r14, 8), %xmm5, %xmm6
+	vsm4rnds4	(%r9), %xmm5, %xmm6
+	vsm4rnds4	2032(%rcx), %xmm5, %xmm6
+	vsm4rnds4	-2048(%rdx), %xmm5, %xmm6
+
+.intel_syntax noprefix
+	vsm4key4	ymm6, ymm5, ymm4
+	vsm4key4	xmm6, xmm5, xmm4
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [rbp+r14*8+0x10000000]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [r9]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [rcx+4064]
+	vsm4key4	ymm6, ymm5, YMMWORD PTR [rdx-4096]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [r9]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [rcx+2032]
+	vsm4key4	xmm6, xmm5, XMMWORD PTR [rdx-2048]
+	vsm4rnds4	ymm6, ymm5, ymm4
+	vsm4rnds4	xmm6, xmm5, xmm4
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [rbp+r14*8+0x10000000]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [r9]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [rcx+4064]
+	vsm4rnds4	ymm6, ymm5, YMMWORD PTR [rdx-4096]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [rbp+r14*8+0x10000000]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [r9]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [rcx+2032]
+	vsm4rnds4	xmm6, xmm5, XMMWORD PTR [rdx-2048]
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index d31bb40b32b..386d3dfd456 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -445,6 +445,8 @@ run_dump_test "x86-64-sha512-intel"
 run_list_test "x86-64-sha512-inval"
 run_dump_test "x86-64-sm3"
 run_dump_test "x86-64-sm3-intel"
+run_dump_test "x86-64-sm4"
+run_dump_test "x86-64-sm4-intel"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 006e38a16a9..6a5b3b7571a 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -3972,8 +3972,9 @@ static const struct dis386 prefix_table[][4] = {
   /* PREFIX_VEX_0F38DA_W_0 */
   {
     { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0_P_0) },
-    { Bad_Opcode },
+    { "vsm4key4", { XM, Vex, EXx }, 0 },
     { VEX_LEN_TABLE (VEX_LEN_0F38DA_W_0_P_2) },
+    { "vsm4rnds4", { XM, Vex, EXx }, 0 },
   },
 
   /* PREFIX_VEX_0F38F5_L_0 */
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 11af743ffe7..a0614f92cda 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -218,6 +218,8 @@ static const dependency isa_dependencies[] =
     "AVX2" },
   { "SM3",
     "AVX" },
+  { "SM4",
+    "AVX2" },
   { "XSAVES",
     "XSAVEC" },
   { "XSAVEC",
@@ -344,6 +346,7 @@ static bitfield cpu_flags[] =
   BITFIELD (SHA),
   BITFIELD (SHA512),
   BITFIELD (SM3),
+  BITFIELD (SM4),
   BITFIELD (ClflushOpt),
   BITFIELD (XSAVES),
   BITFIELD (XSAVEC),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 256ed532211..68cbfaa95e5 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -179,6 +179,8 @@ enum
   CpuSHA512,
   /* SM3 instructions required.  */
   CpuSM3,
+  /* SM4 instructions required.  */
+  CpuSM4,
   /* CLFLUSHOPT instruction required */
   CpuClflushOpt,
   /* XSAVES/XRSTORS instruction required */
@@ -409,6 +411,7 @@ typedef union i386_cpu_flags
       unsigned int cpusha:1;
       unsigned int cpusha512:1;
       unsigned int cpusm3:1;
+      unsigned int cpusm4:1;
       unsigned int cpuclflushopt:1;
       unsigned int cpuxsaves:1;
       unsigned int cpuxsavec:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 653b1cbc587..fdb436e38df 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -2058,6 +2058,13 @@ vsm3msg2, 0x66da, SM3, Modrm|Space0F38|Vex128|VexVVVV|VexW0|NoSuf, { RegXMM|Unsp
 
 // SM3 instructions end.
 
+// SM4 instructions.
+
+vsm4key4, 0xf3da, SM4, Modrm|Space0F38|Vex|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+vsm4rnds4, 0xf2da, SM4, Modrm|Space0F38|Vex|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
+
+// SM4 instructions end.
+
 // VPCLMULQDQ instructions
 
 vpclmulqdq, 0x6644, VPCLMULQDQ, Modrm|Vex256|Space0F3A|VexWIG|VexVVVV|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-- 
2.31.1



* Re: [PATCH v2] Support Intel SHA512
  2023-07-18  7:54           ` [PATCH v2] " Haochen Jiang
  2023-07-18  7:59             ` Jiang, Haochen
@ 2023-07-18  8:51             ` Jan Beulich
  2023-07-20  8:32               ` Jiang, Haochen
  2023-07-20  8:32               ` [PATCH] " Haochen Jiang
  1 sibling, 2 replies; 31+ messages in thread
From: Jan Beulich @ 2023-07-18  8:51 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, binutils

On 18.07.2023 09:54, Haochen Jiang wrote:
> --- a/gas/testsuite/gas/i386/disassem.s
> +++ b/gas/testsuite/gas/i386/disassem.s
> @@ -168,6 +168,12 @@
>  .byte 0xC4, 0xE1, 0xF9, 0x93, 0x6F
>  	.insn VEX.L0.66.0f.W1 0x93, (%edi), %k7
>  .byte 0xc4, 0xe2, 0x1, 0x1c, 0x41, 0x37
> +	.insn VEX.L1.F2.0f38.W0 0xCC, (%ecx), %ymm1
> +.fill 0x5, 0x1, 0x90
> +	.insn VEX.L1.F2.0f38.W0 0xCD, (%ecx), %ymm1
> +.fill 0x5, 0x1, 0x90
> +	.insn VEX.L1.F2.0f38.W0 0xCB, (%ecx), %ymm2, %ymm1
> +.fill 0x5, 0x1, 0x90

In new additions here (and to similar files), could you please avoid
- .fill / .byte and the like whenever possible,
- unindented directives?
The latter is purely style, I know, but strictly speaking directives
should never start in the first column. Present gas, presumably for
historical reasons, simply is overly forgiving in this regard.

To deal with the former, more careful selection of operands is all
it takes. With how the disassembler presently works, what you want
is that the nominal ModR/M byte disassembles as a single-byte opcode.
That's very easy to achieve: Opcodes 40-5f (50-5f for 64-bit) are all
single-byte, i.e. you won't need much more than ModR/M.mod = 1, i.e.
the Disp8 encoding form with a displacement that then also
disassembles as a single-byte opcode.

Alternatively (and perhaps even better) you can arrange for ModR/M
bytes of 69, 6a, 6b, or 70-7f, with a suitable displacement byte
(any will do afaict for 6a and 70-7f, while 69 and 6b would require
the top two bits to be set).
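
The first option above can be sketched as follows (Python, for illustration
only). The sketch enumerates ModR/M bytes with mod = 1 that also fall in
the single-byte opcode ranges quoted above (40-5f, or 50-5f for 64-bit).
The function names are hypothetical, and skipping rm = 4 is a
simplification of this sketch: that value calls for an extra SIB byte,
which would then also have to decode sensibly.

```python
def modrm_fields(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 3, (byte >> 3) & 7, byte & 7


def is_single_byte_opcode(byte, mode64=False):
    """Ranges taken from the review: 40-5f, or 50-5f in 64-bit mode."""
    low = 0x50 if mode64 else 0x40
    return low <= byte <= 0x5F


def disp8_modrm_candidates(mode64=False):
    """ModR/M bytes with mod == 1 that are also single-byte opcodes."""
    result = []
    for byte in range(0x40, 0x80):  # 0x40..0x7f all have mod == 1
        _mod, _reg, rm = modrm_fields(byte)
        if rm == 4:
            continue  # would need a SIB byte; left out to keep this simple
        if is_single_byte_opcode(byte, mode64):
            result.append(byte)
    return result


if __name__ == "__main__":
    print([hex(b) for b in disp8_modrm_candidates()])
    print([hex(b) for b in disp8_modrm_candidates(mode64=True)])
```

The Disp8 byte following such a ModR/M byte would be picked from the same
opcode range.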

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/sha512-inval.l
> @@ -0,0 +1,4 @@
> +.* Assembler messages:
> +.*:6: Error: operand size mismatch for `vsha512msg1'
> +.*:7: Error: operand size mismatch for `vsha512msg2'
> +.*:8: Error: operand size mismatch for `vsha512rnds2'

Just as a remark, no action expected from your side: This of course
isn't the correct error message to be emitted here. It should be
"type", not "size". You _may_ want to replace "size" by ".*" to
allow for a future assembler adjustment without the need to touch
this testcase again.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/sha512.d
> @@ -0,0 +1,16 @@
> +#as:

What purpose does this line (present in several of the tests) have?

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/sha512.s
> @@ -0,0 +1,13 @@
> +# Check 32bit SHA512 instructions
> +
> +	.allow_index_reg

This doesn't look to be needed either.

> +	.text
> +_start:
> +	vsha512msg1	%xmm5, %ymm6	 #SHA512
> +	vsha512msg2	%ymm5, %ymm6	 #SHA512
> +	vsha512rnds2	%xmm4, %ymm5, %ymm6	 #SHA512
> +
> +.intel_syntax noprefix

See remark above about indentation of directives.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-sha512.s
> @@ -0,0 +1,13 @@
> +# Check 64bit SHA512 instructions
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +	vsha512msg1	%xmm5, %ymm6	 #SHA512
> +	vsha512msg2	%ymm5, %ymm6	 #SHA512
> +	vsha512rnds2	%xmm4, %ymm5, %ymm6	 #SHA512
> +
> +.intel_syntax noprefix
> +	vsha512msg1	ymm6, xmm5	 #SHA512
> +	vsha512msg2	ymm6, ymm5	 #SHA512
> +	vsha512rnds2	ymm6, ymm5, xmm4	 #SHA512

Maybe worthwhile to use higher register numbers as well, e.g.

_start:
	vsha512msg1	%xmm14, %ymm5	 #SHA512
	vsha512msg2	%ymm4, %ymm15	 #SHA512
	vsha512rnds2	%xmm6, %ymm5, %ymm14	 #SHA512

	.intel_syntax noprefix
	vsha512msg1	ymm14, xmm5	 #SHA512
	vsha512msg2	ymm6, ymm15	 #SHA512
	vsha512rnds2	ymm6, ymm5, xmm14	 #SHA512

?
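
As a rough illustration of why higher register numbers add coverage:
registers 8-15 need the extension bits of the VEX prefix, which are stored
inverted. The sketch below (Python, for illustration only; decode_vex3 is
a hypothetical helper, not part of binutils) decodes 3-byte VEX prefixes
taken from the SM4 test expectations earlier in this thread.

```python
def decode_vex3(b0, b1, b2):
    """Decode the fields of a 3-byte (0xC4) VEX prefix."""
    if b0 != 0xC4:
        raise ValueError("not a 3-byte VEX prefix")
    return {
        # R, X and B are stored inverted; 1 here means "extended register".
        "R": ((b1 >> 7) & 1) ^ 1,
        "X": ((b1 >> 6) & 1) ^ 1,
        "B": ((b1 >> 5) & 1) ^ 1,
        "map": b1 & 0x1F,            # 2 selects the 0F38 opcode space
        "W": (b2 >> 7) & 1,
        "vvvv": ~(b2 >> 3) & 0xF,    # second source register, stored inverted
        "L": (b2 >> 2) & 1,          # 0 = 128-bit, 1 = 256-bit
        "pp": b2 & 3,                # 0 = none, 1 = 66, 2 = F3, 3 = F2
    }


if __name__ == "__main__":
    # c4 e2 56 da f4: vsm4key4 %ymm4,%ymm5,%ymm6
    print(decode_vex3(0xC4, 0xE2, 0x56))
    # c4 a2 56 ...: memory operand with %r14 as index, so X is set
    print(decode_vex3(0xC4, 0xA2, 0x56))
    # c4 c2 56 ...: memory operand with %r9 as base, so B is set
    print(decode_vex3(0xC4, 0xC2, 0x56))
```

Tests that only use registers 0-7 leave R, X and B clear throughout, so
those bits of the encoder and disassembler are never exercised.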

Jan


* Re: [PATCH v2] Support Intel SM3
  2023-07-18  8:09     ` [PATCH v2] " Haochen Jiang
@ 2023-07-18  9:03       ` Jan Beulich
  2023-07-24  2:54         ` Jiang, Haochen
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2023-07-18  9:03 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, binutils

On 18.07.2023 10:09, Haochen Jiang wrote:
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/sm3.s
> @@ -0,0 +1,37 @@
> +# Check 32bit SM3 instructions
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +	vsm3msg1	%xmm4, %xmm5, %xmm6	 #SM3
> +	vsm3msg1	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #SM3
> +	vsm3msg1	(%ecx), %xmm5, %xmm6	 #SM3
> +	vsm3msg1	2032(%ecx), %xmm5, %xmm6	 #SM3 Disp32(f0070000)
> +	vsm3msg1	-2048(%edx), %xmm5, %xmm6	 #SM3 Disp32(00f8ffff)

The numbers in parentheses are odd. I'd prefer if they were omitted,
but I'd also be okay if you flipped their byte order so they properly
correspond (as numbers) to the displacements used.

That said, I'm not sure about their usefulness: The two specific
displacement values chosen are apparently AVX512-inherited, where
they would correspond to the largest/smallest displacements still
compressible. Since these are VEX, not EVEX, encodings, I don't
think these values are of particular interest, and you have memory
forms of the insn earlier. So my suggestion (without insisting)
would be to simply drop these (and similar) lines altogether, or, if they
are kept at all, to use forms which in fact use Disp8 encoding.
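
To make the byte-order remark concrete, here is a small sketch (Python,
for illustration only; the helper names are hypothetical). It shows the
two displacements from the test as they are encoded (little-endian, which
is what the comments in parentheses reproduce) and as 32-bit numbers, and
checks whether a displacement would fit the Disp8 form.

```python
def disp32_bytes(disp):
    """Hex of the four displacement bytes in encoding (little-endian) order."""
    return disp.to_bytes(4, "little", signed=True).hex()


def disp32_as_number(disp):
    """Hex of the displacement as a 32-bit two's complement number."""
    return format(disp & 0xFFFFFFFF, "08x")


def fits_disp8(disp):
    """True if the displacement can use the one-byte (Disp8) form."""
    return -128 <= disp <= 127


if __name__ == "__main__":
    for disp in (2032, -2048, 127, -128):
        print(disp, disp32_bytes(disp), disp32_as_number(disp),
              fits_disp8(disp))
```

Neither 2032 nor -2048 fits in a signed byte, so both are encoded with
Disp32 in these VEX forms.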

Other testcase related comments given for the SHA512 patch apply here
as well.

Okay with all of these taken care of.

Jan


* Re: [PATCH v2] Support Intel SM4
  2023-07-18  8:13       ` [PATCH v2] " Haochen Jiang
@ 2023-07-18  9:11         ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2023-07-18  9:11 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, binutils

On 18.07.2023 10:13, Haochen Jiang wrote:
> Hi all,
> 
> The v2 SM4 patch has the following changes:
> 
> 1. Changed the implied dependency from AVX to AVX2.
> 
> 2. Moved entries near SHA/SHA512/SM3.
> 
> 3. Avoided redoing the table setup in i386-dis.c, since the SM3 change
>    already provides it.

Some of the testcase related comments on earlier patches apply here
as well. Okay with them taken care of.

Jan


* RE: [PATCH v2] Support Intel SHA512
  2023-07-18  8:51             ` Jan Beulich
@ 2023-07-20  8:32               ` Jiang, Haochen
  2023-07-20 10:37                 ` Jan Beulich
  2023-07-20  8:32               ` [PATCH] " Haochen Jiang
  1 sibling, 1 reply; 31+ messages in thread
From: Jiang, Haochen @ 2023-07-20  8:32 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, binutils

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/sha512.d
> > @@ -0,0 +1,16 @@
> > +#as:
> 
> What purpose does this line (present in several of the tests) have?
> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/sha512.s
> > @@ -0,0 +1,13 @@
> > +# Check 32bit SHA512 instructions
> > +
> > +	.allow_index_reg
> 
> This doesn't look to be needed either.
> 

We use a script to generate the testcases, so these two lines are there to
fit all circumstances, since the script does not know what a new ISA will
need. (The former is for passing extra options to as; the latter is for
index registers.)

We could omit them, but I should mention that the existing testcases also
contain some redundant things. Eliminating all of them may need careful
manual work, and I am not sure that changing all of them is worth the
time. Therefore, I propose not to omit them, to stay aligned with the
existing testcases, since they are not wrong.

All the other issues mentioned in the review have been fixed in my v3
patch, which will be sent out later.

Thx,
Haochen


* [PATCH] Support Intel SHA512
  2023-07-18  8:51             ` Jan Beulich
  2023-07-20  8:32               ` Jiang, Haochen
@ 2023-07-20  8:32               ` Haochen Jiang
  2023-07-20 11:07                 ` [PATCH v3] " Jan Beulich
  1 sibling, 1 reply; 31+ messages in thread
From: Haochen Jiang @ 2023-07-20  8:32 UTC (permalink / raw)
  To: binutils, jbeulich; +Cc: hjl.tools

Hi all,

This is the v3 patch for SHA512 with the following changes compared to
the v2 patch:

1. Adjusted the testcases for disassem.[ds] to avoid .fill.

2. Indented directives in [x86-64-]sha512.s.

3. Changed to higher registers for x86-64-sha512.[ds].
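
To illustrate what item 3 exercises, here is a minimal C sketch (not code
from this patch; the struct and function names are hypothetical) that
splits a three-byte VEX prefix and the following ModRM byte into fields.
It assumes the register-register form and the usual VEX layout, in which
R, B and vvvv are stored inverted. Registers 8-15 are reachable only
through the inverted R and B bits, which is what the x86-64 tests cover.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a three-byte (0xC4) VEX prefix plus the ModRM byte that
   follows the opcode.  Illustrative type, not a binutils one.  */
struct vex_fields
{
  unsigned map;   /* mmmmm: 1 = 0F, 2 = 0F38, 3 = 0F3A */
  unsigned w;     /* VEX.W */
  unsigned vvvv;  /* extra source register, already un-inverted */
  unsigned l;     /* VEX.L: 0 = 128-bit, 1 = 256-bit */
  unsigned pp;    /* 0 = none, 1 = 66, 2 = F3, 3 = F2 */
  unsigned reg;   /* ModRM.reg extended by VEX.R */
  unsigned rm;    /* ModRM.rm extended by VEX.B (register form only) */
};

/* INSN holds: 0xC4, two VEX payload bytes, the opcode, the ModRM byte.  */
static struct vex_fields
decode_vex3 (const uint8_t insn[5])
{
  struct vex_fields f;
  uint8_t p1 = insn[1], p2 = insn[2], modrm = insn[4];

  assert (insn[0] == 0xc4);
  f.map = p1 & 0x1f;
  f.w = (p2 >> 7) & 1;
  f.vvvv = ((p2 >> 3) & 0xf) ^ 0xf;   /* stored inverted */
  f.l = (p2 >> 2) & 1;
  f.pp = p2 & 3;
  /* R (bit 7) and B (bit 5) are stored inverted: a clear bit adds 8.  */
  f.reg = ((modrm >> 3) & 7) | ((p1 & 0x80) ? 0 : 8);
  f.rm = (modrm & 7) | ((p1 & 0x20) ? 0 : 8);
  return f;
}
```

For the bytes c4 c2 7f cc f7 from x86-64-sha512.d this gives map 2 (0F38),
W0, L1, pp 3 (F2), reg 6 and rm 15, i.e. vsha512msg1 %xmm15,%ymm6.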

Changes in v2:

1. Added invalid test in disassem.[ds] and [x86-64-]sha512-inval.[ls].

2. Changed the dependency implied by SHA512 from AVX to AVX2.

3. Moved the entry of SHA512 next to SHA. Put Modrm first in the table.

4. Used Rxmmq instead of going through the mod table. Also renamed Uymm to Rymm.
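
As a rough sketch of the register-only check behind item 4 (an
illustration with hypothetical names, not the OP_R code in i386-dis.c):
such an operand is valid only when the ModRM mod field is 3. The invalid
cases added to disassem.[ds] use a memory form instead, which the
expected dump shows as (bad).

```c
#include <stdbool.h>
#include <stdint.h>

/* The three fields of a ModRM byte.  Hypothetical type for illustration.  */
struct modrm_fields
{
  unsigned mod;  /* bits 7:6; 3 selects a register operand */
  unsigned reg;  /* bits 5:3 */
  unsigned rm;   /* bits 2:0 */
};

static struct modrm_fields
split_modrm (uint8_t modrm)
{
  struct modrm_fields f;

  f.mod = modrm >> 6;
  f.reg = (modrm >> 3) & 7;
  f.rm = modrm & 7;
  return f;
}

/* True when the r/m operand is a register rather than memory.  */
static bool
is_register_form (uint8_t modrm)
{
  return split_modrm (modrm).mod == 3;
}
```

With the ModRM byte f5 from sha512.d (mod 3, reg 6, rm 5) the check
passes. With 71 from the disassem.d additions (mod 1, reg 6, rm 1, i.e.
32(%ecx) with an 8-bit displacement) it fails.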

Thx,
Haochen

gas/ChangeLog:

	* NEWS: Support Intel SHA512.
	* config/tc-i386.c: Add sha512.
	* doc/c-i386.texi: Document .sha512.
	* testsuite/gas/i386/disassem.d: Add SHA512 tests.
	* testsuite/gas/i386/disassem.s: Ditto.
	* testsuite/gas/i386/i386.exp: Run SHA512 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/sha512-intel.d: New test.
	* testsuite/gas/i386/sha512-inval.l: Ditto.
	* testsuite/gas/i386/sha512-inval.s: Ditto.
	* testsuite/gas/i386/sha512.d: Ditto.
	* testsuite/gas/i386/sha512.s: Ditto.
	* testsuite/gas/i386/x86-64-sha512-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-sha512-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-sha512-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-sha512.d: Ditto.
	* testsuite/gas/i386/x86-64-sha512.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (Rxmmq): New.
	(Rymm): Ditto.
	(MOD_VEX_0F38CB_P_3_W_0_L_1): Ditto.
	(MOD_VEX_0F38CC_P_3_W_0_L_1): Ditto.
	(PREFIX_VEX_0F38CB): Ditto.
	(PREFIX_VEX_0F38CC): Ditto.
	(PREFIX_VEX_0F38CD): Ditto.
	(VEX_LEN_0F38CB_P_3_W_0): Ditto.
	(VEX_LEN_0F38CC_P_3_W_0): Ditto.
	(VEX_LEN_0F38CD_P_3_W_0): Ditto.
	(VEX_W_0F38CB_P_3): Ditto.
	(VEX_W_0F38CC_P_3): Ditto.
	(VEX_W_0F38CD_P_3): Ditto.
	(mod_table): Add MOD_VEX_0F38CB_P_3_W_0_L_1, MOD_VEX_0F38CC_P_3_W_0_L_1.
	(prefix_table): Add PREFIX_VEX_0F38CB, PREFIX_VEX_0F38CC,
	PREFIX_VEX_0F38CD.
	(vex_len_table): Add VEX_LEN_0F38CB_P_3_W_0,
	VEX_LEN_0F38CC_P_3_W_0, VEX_LEN_0F38CD_P_3_W_0.
	(vex_w_table): Add VEX_W_0F38CB_P_3, VEX_W_0F38CC_P_3, VEX_W_0F38CD_P_3.
	* i386-gen.c (isa_dependencies): Add SHA512.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuSHA512): New.
	(i386_cpu_flags): Add cpusha512.
	* i386-opc.tbl: Add SHA512 instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                     |    2 +
 gas/config/tc-i386.c                         |    1 +
 gas/doc/c-i386.texi                          |    3 +-
 gas/testsuite/gas/i386/disassem.d            |    6 +
 gas/testsuite/gas/i386/disassem.s            |    3 +
 gas/testsuite/gas/i386/i386.exp              |    3 +
 gas/testsuite/gas/i386/sha512-intel.d        |   16 +
 gas/testsuite/gas/i386/sha512-inval.l        |    4 +
 gas/testsuite/gas/i386/sha512-inval.s        |    8 +
 gas/testsuite/gas/i386/sha512.d              |   16 +
 gas/testsuite/gas/i386/sha512.s              |   13 +
 gas/testsuite/gas/i386/x86-64-sha512-intel.d |   16 +
 gas/testsuite/gas/i386/x86-64-sha512-inval.l |    4 +
 gas/testsuite/gas/i386/x86-64-sha512-inval.s |    8 +
 gas/testsuite/gas/i386/x86-64-sha512.d       |   16 +
 gas/testsuite/gas/i386/x86-64-sha512.s       |   13 +
 gas/testsuite/gas/i386/x86-64.exp            |    3 +
 opcodes/i386-dis.c                           |   79 +-
 opcodes/i386-gen.c                           |    3 +
 opcodes/i386-init.h                          |  776 +-
 opcodes/i386-mnem.h                          | 3949 ++++----
 opcodes/i386-opc.h                           |    3 +
 opcodes/i386-opc.tbl                         |    8 +
 opcodes/i386-tbl.h                           | 9447 +++++++++---------
 24 files changed, 7348 insertions(+), 7052 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/sha512-intel.d
 create mode 100644 gas/testsuite/gas/i386/sha512-inval.l
 create mode 100644 gas/testsuite/gas/i386/sha512-inval.s
 create mode 100644 gas/testsuite/gas/i386/sha512.d
 create mode 100644 gas/testsuite/gas/i386/sha512.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512.s

diff --git a/gas/NEWS b/gas/NEWS
index 5e9ed5ab4bc..fe2c055fa7f 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@
 -*- text -*-
 
+* Add support for Intel SHA512 instructions.
+
 * Add support for Intel AVX-VNNI-INT16 instructions.
 
 Changes in 2.41:
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 0d3d7560efe..836640d9123 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1152,6 +1152,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (fred, FRED, ANY_FRED, false),
   SUBARCH (lkgs, LKGS, ANY_LKGS, false),
   SUBARCH (avx_vnni_int16, AVX_VNNI_INT16, ANY_AVX_VNNI_INT16, false),
+  SUBARCH (sha512, SHA512, ANY_SHA512, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 40ba942d9cb..21fb71e54ab 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -208,6 +208,7 @@ accept various extension mnemonics.  For example,
 @code{fred},
 @code{lkgs},
 @code{avx_vnni_int16},
+@code{sha512},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1637,7 +1638,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.prefetchi} @tab @samp{.avx_ifma} @tab @samp{.avx_vnni_int8}
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
-@item @samp{.avx_vnni_int16}
+@item @samp{.avx_vnni_int16} @tab @samp{.sha512}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/disassem.d b/gas/testsuite/gas/i386/disassem.d
index 8ee0a664e0b..eae69db6553 100644
--- a/gas/testsuite/gas/i386/disassem.d
+++ b/gas/testsuite/gas/i386/disassem.d
@@ -345,6 +345,12 @@ Disassembly of section \.text:
 [ 	]*[a-f0-9]+:[ 	]*c4 e2 01 1c[ 	]*\(bad\)
 [ 	]*[a-f0-9]+:[ 	]*41[ 	]*inc[ 	]*%ecx
 [ 	]*[a-f0-9]+:[ 	]*37[ 	]*aaa
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7f cc[ 	]+vsha512msg1[ 	]*\(bad\),.*
+[ 	]*[a-f0-9]+:[ 	]*71 20[ 	]+jno.*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7f cd[ 	]+vsha512msg2[ 	]*\(bad\),.*
+[ 	]*[a-f0-9]+:[ 	]*71 20[ 	]+jno.*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 6f cb[ 	]+vsha512rnds2[ 	]*\(bad\),.*
+[ 	]*[a-f0-9]+:[ 	]*71 20[ 	]+jno.*
 [ 	]*[a-f0-9]+:[ 	]*62 f2 ad 08 1c[ 	]*\(bad\)
 [ 	]*[a-f0-9]+:[ 	]*01 01[ 	]*add[ 	]*%eax,\(%ecx\)
 [ 	]*[a-f0-9]+:[ 	]*62 f3 7d 28 1b[ 	]*\(bad\)
diff --git a/gas/testsuite/gas/i386/disassem.s b/gas/testsuite/gas/i386/disassem.s
index c74a9353933..0fb0dd48b54 100644
--- a/gas/testsuite/gas/i386/disassem.s
+++ b/gas/testsuite/gas/i386/disassem.s
@@ -168,6 +168,9 @@
 .byte 0xC4, 0xE1, 0xF9, 0x93, 0x6F
 	.insn VEX.L0.66.0f.W1 0x93, (%edi), %k7
 .byte 0xc4, 0xe2, 0x1, 0x1c, 0x41, 0x37
+	.insn VEX.L1.F2.0f38.W0 0xCC, 32(%ecx), %ymm6
+	.insn VEX.L1.F2.0f38.W0 0xCD, 32(%ecx), %ymm6
+	.insn VEX.L1.F2.0f38.W0 0xCB, 32(%ecx), %ymm2, %ymm6
 .byte 0x62, 0xf2, 0xad, 0x08, 0x1c, 0x01
 .byte 0x1
 	.insn EVEX.66.0f3a.W0 0x1b, $0x25, %ymm0, %xmm1
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index b69c692cd16..1208d5372d7 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -498,6 +498,9 @@ if [gas_32_check] then {
     run_list_test "amx-complex-inval"
     run_dump_test "avx-vnni-int16"
     run_dump_test "avx-vnni-int16-intel"
+    run_dump_test "sha512"
+    run_dump_test "sha512-intel"
+    run_list_test "sha512-inval"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/sha512-intel.d b/gas/testsuite/gas/i386/sha512-intel.d
new file mode 100644
index 00000000000..c1cc85b9f26
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512-intel.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw -Mintel
+#name: i386 SHA512 insns (Intel disassembly)
+#source: sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
diff --git a/gas/testsuite/gas/i386/sha512-inval.l b/gas/testsuite/gas/i386/sha512-inval.l
new file mode 100644
index 00000000000..cb9b81ced84
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512-inval.l
@@ -0,0 +1,4 @@
+.* Assembler messages:
+.*:6: Error: operand .* mismatch for `vsha512msg1'
+.*:7: Error: operand .* mismatch for `vsha512msg2'
+.*:8: Error: operand .* mismatch for `vsha512rnds2'
diff --git a/gas/testsuite/gas/i386/sha512-inval.s b/gas/testsuite/gas/i386/sha512-inval.s
new file mode 100644
index 00000000000..d3ae819c563
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512-inval.s
@@ -0,0 +1,8 @@
+# Check Illegal SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	(%ecx), %ymm6
+	vsha512msg2	(%ecx), %ymm6
+	vsha512rnds2	(%ecx), %ymm5, %ymm6
diff --git a/gas/testsuite/gas/i386/sha512.d b/gas/testsuite/gas/i386/sha512.d
new file mode 100644
index 00000000000..b90019954ea
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw
+#name: i386 SHA512 insns
+#source: sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
diff --git a/gas/testsuite/gas/i386/sha512.s b/gas/testsuite/gas/i386/sha512.s
new file mode 100644
index 00000000000..710dc8995ac
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512.s
@@ -0,0 +1,13 @@
+# Check 32bit SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	%xmm5, %ymm6	 #SHA512
+	vsha512msg2	%ymm5, %ymm6	 #SHA512
+	vsha512rnds2	%xmm4, %ymm5, %ymm6	 #SHA512
+
+	.intel_syntax noprefix
+	vsha512msg1	ymm6, xmm5	 #SHA512
+	vsha512msg2	ymm6, ymm5	 #SHA512
+	vsha512rnds2	ymm6, ymm5, xmm4	 #SHA512
diff --git a/gas/testsuite/gas/i386/x86-64-sha512-intel.d b/gas/testsuite/gas/i386/x86-64-sha512-intel.d
new file mode 100644
index 00000000000..9f50ff1f51f
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512-intel.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 SHA512 insns (Intel disassembly)
+#source: x86-64-sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 c2 7f cc f7\s+vsha512msg1 ymm6,xmm15
+\s*[a-f0-9]+:\s*c4 62 7f cd fd\s+vsha512msg2 ymm15,ymm5
+\s*[a-f0-9]+:\s*c4 62 57 cb f4\s+vsha512rnds2 ymm14,ymm5,xmm4
+\s*[a-f0-9]+:\s*c4 c2 7f cc f7\s+vsha512msg1 ymm6,xmm15
+\s*[a-f0-9]+:\s*c4 62 7f cd fd\s+vsha512msg2 ymm15,ymm5
+\s*[a-f0-9]+:\s*c4 62 57 cb f4\s+vsha512rnds2 ymm14,ymm5,xmm4
diff --git a/gas/testsuite/gas/i386/x86-64-sha512-inval.l b/gas/testsuite/gas/i386/x86-64-sha512-inval.l
new file mode 100644
index 00000000000..cb9b81ced84
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512-inval.l
@@ -0,0 +1,4 @@
+.* Assembler messages:
+.*:6: Error: operand .* mismatch for `vsha512msg1'
+.*:7: Error: operand .* mismatch for `vsha512msg2'
+.*:8: Error: operand .* mismatch for `vsha512rnds2'
diff --git a/gas/testsuite/gas/i386/x86-64-sha512-inval.s b/gas/testsuite/gas/i386/x86-64-sha512-inval.s
new file mode 100644
index 00000000000..d3ae819c563
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512-inval.s
@@ -0,0 +1,8 @@
+# Check Illegal SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	(%ecx), %ymm6
+	vsha512msg2	(%ecx), %ymm6
+	vsha512rnds2	(%ecx), %ymm5, %ymm6
diff --git a/gas/testsuite/gas/i386/x86-64-sha512.d b/gas/testsuite/gas/i386/x86-64-sha512.d
new file mode 100644
index 00000000000..e616ace44fc
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512.d
@@ -0,0 +1,16 @@
+#as:
+#objdump: -dw
+#name: x86_64 SHA512 insns
+#source: x86-64-sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 c2 7f cc f7\s+vsha512msg1 %xmm15,%ymm6
+\s*[a-f0-9]+:\s*c4 62 7f cd fd\s+vsha512msg2 %ymm5,%ymm15
+\s*[a-f0-9]+:\s*c4 62 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm14
+\s*[a-f0-9]+:\s*c4 c2 7f cc f7\s+vsha512msg1 %xmm15,%ymm6
+\s*[a-f0-9]+:\s*c4 62 7f cd fd\s+vsha512msg2 %ymm5,%ymm15
+\s*[a-f0-9]+:\s*c4 62 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm14
diff --git a/gas/testsuite/gas/i386/x86-64-sha512.s b/gas/testsuite/gas/i386/x86-64-sha512.s
new file mode 100644
index 00000000000..131a6f05d39
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512.s
@@ -0,0 +1,13 @@
+# Check 64bit SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	%xmm15, %ymm6	 #SHA512
+	vsha512msg2	%ymm5, %ymm15	 #SHA512
+	vsha512rnds2	%xmm4, %ymm5, %ymm14	 #SHA512
+
+	.intel_syntax noprefix
+	vsha512msg1	ymm6, xmm15	 #SHA512
+	vsha512msg2	ymm15, ymm5	 #SHA512
+	vsha512rnds2	ymm14, ymm5, xmm4	 #SHA512
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 0f2903c6185..c6ec9be3d43 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -440,6 +440,9 @@ run_dump_test "x86-64-lkgs"
 run_list_test "x86-64-lkgs-inval"
 run_dump_test "x86-64-avx-vnni-int16"
 run_dump_test "x86-64-avx-vnni-int16-intel"
+run_dump_test "x86-64-sha512"
+run_dump_test "x86-64-sha512-intel"
+run_list_test "x86-64-sha512-inval"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 36a839d1652..0043b62f324 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -530,6 +530,8 @@ fetch_error (const instr_info *ins)
 #define Nq { OP_R, q_mode }
 #define Ux { OP_R, x_mode }
 #define Uxmm { OP_R, xmm_mode }
+#define Rxmmq { OP_R, xmmq_mode }
+#define Rymm { OP_R, ymm_mode }
 #define Rtmm { OP_R, tmm_mode }
 #define EMCq { OP_EMC, q_mode }
 #define MXC { OP_MXC, 0 }
@@ -1064,6 +1066,9 @@ enum
   PREFIX_VEX_0F38B1_W_0,
   PREFIX_VEX_0F38D2_W_0,
   PREFIX_VEX_0F38D3_W_0,
+  PREFIX_VEX_0F38CB,
+  PREFIX_VEX_0F38CC,
+  PREFIX_VEX_0F38CD,
   PREFIX_VEX_0F38F5_L_0,
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
@@ -1306,6 +1311,9 @@ enum
   VEX_LEN_0F385C_X86_64,
   VEX_LEN_0F385E_X86_64,
   VEX_LEN_0F386C_X86_64,
+  VEX_LEN_0F38CB_P_3_W_0,
+  VEX_LEN_0F38CC_P_3_W_0,
+  VEX_LEN_0F38CD_P_3_W_0,
   VEX_LEN_0F38DB,
   VEX_LEN_0F38F2,
   VEX_LEN_0F38F3,
@@ -1473,6 +1481,9 @@ enum
   VEX_W_0F38B1,
   VEX_W_0F38B4,
   VEX_W_0F38B5,
+  VEX_W_0F38CB_P_3,
+  VEX_W_0F38CC_P_3,
+  VEX_W_0F38CD_P_3,
   VEX_W_0F38CF,
   VEX_W_0F38D2,
   VEX_W_0F38D3,
@@ -3928,6 +3939,30 @@ static const struct dis386 prefix_table[][4] = {
     { "vpdpwusds",	{ XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F38CB */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CB_P_3) },
+  },
+
+  /* PREFIX_VEX_0F38CC */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CC_P_3) },
+  },
+
+  /* PREFIX_VEX_0F38CD */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CD_P_3) },
+  },
+
   /* PREFIX_VEX_0F38F5_L_0 */
   {
     { "bzhiS",		{ Gdq, Edq, VexGdq }, 0 },
@@ -6380,9 +6415,9 @@ static const struct dis386 vex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CB) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CC) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CD) },
     { Bad_Opcode },
     { VEX_W_TABLE (VEX_W_0F38CF) },
     /* d0 */
@@ -6944,6 +6979,24 @@ static const struct dis386 vex_len_table[][2] = {
     { VEX_W_TABLE (VEX_W_0F386C_X86_64_L_0) },
   },
 
+  /* VEX_LEN_0F38CB_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { "vsha512rnds2", { XM, Vex, Rxmmq }, 0 },
+  },
+
+  /* VEX_LEN_0F38CC_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { "vsha512msg1", { XM, Rxmmq }, 0 },
+  },
+
+  /* VEX_LEN_0F38CD_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { "vsha512msg2", { XM, Rymm }, 0 },
+  },
+
   /* VEX_LEN_0F38DB */
   {
     { "vaesimc",	{ XM, EXx }, PREFIX_DATA },
@@ -7614,6 +7667,18 @@ static const struct dis386 vex_w_table[][2] = {
     { Bad_Opcode },
     { "%XVvpmadd52huq",	{ XM, Vex, EXx }, PREFIX_DATA },
   },
+  {
+    /* VEX_W_0F38CB_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CB_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F38CC_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CC_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F38CD_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CD_P_3_W_0) },
+  },
   {
     /* VEX_W_0F38CF */
     { "%XEvgf2p8mulb", { XM, Vex, EXx }, PREFIX_DATA },
@@ -8055,6 +8120,14 @@ static const struct dis386 mod_table[][2] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_0) },
     { PREFIX_TABLE (PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_1) },
   },
+  {
+    /* MOD_VEX_0F38CB_P_3_W_0_L_1 */
+    { Bad_Opcode },
+  },
+  {
+    /* MOD_VEX_0F38CC_P_3_W_0_L_1 */
+    { Bad_Opcode },
+  },
 
 #include "i386-dis-evex-mod.h"
 };
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 6ad7d6951db..e2528932d84 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -214,6 +214,8 @@ static const dependency isa_dependencies[] =
     "XSAVE" },
   { "SHA",
     "SSE2" },
+  { "SHA512",
+    "AVX2" },
   { "XSAVES",
     "XSAVEC" },
   { "XSAVEC",
@@ -338,6 +340,7 @@ static bitfield cpu_flags[] =
   BITFIELD (PRFCHW),
   BITFIELD (SMAP),
   BITFIELD (SHA),
+  BITFIELD (SHA512),
   BITFIELD (ClflushOpt),
   BITFIELD (XSAVES),
   BITFIELD (XSAVEC),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index f9a68b4c513..b3359e47aa6 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -175,6 +175,8 @@ enum
   CpuSMAP,
   /* SHA instructions required.  */
   CpuSHA,
+  /* SHA512 instructions required.  */
+  CpuSHA512,
   /* CLFLUSHOPT instruction required */
   CpuClflushOpt,
   /* XSAVES/XRSTORS instruction required */
@@ -403,6 +405,7 @@ typedef union i386_cpu_flags
       unsigned int cpuprfchw:1;
       unsigned int cpusmap:1;
       unsigned int cpusha:1;
+      unsigned int cpusha512:1;
       unsigned int cpuclflushopt:1;
       unsigned int cpuxsaves:1;
       unsigned int cpuxsavec:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index f62e5280982..c9a5730f90a 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -2043,6 +2043,14 @@ sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg1, 0xf38cc, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg2, 0xf38cd, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
+// SHA512 instructions.
+
+vsha512rnds2, 0xf2cb, SHA512, Modrm|Vex256|Space0F38|VexVVVV|VexW0|NoSuf, { RegXMM, RegYMM, RegYMM }
+vsha512msg1, 0xf2cc, SHA512, Modrm|Vex256|Space0F38|VexW0|NoSuf, { RegXMM, RegYMM }
+vsha512msg2, 0xf2cd, SHA512, Modrm|Vex256|Space0F38|VexW0|NoSuf, { RegYMM, RegYMM }
+
+// SHA512 instructions end.
+
 // VPCLMULQDQ instructions
 
 vpclmulqdq, 0x6644, VPCLMULQDQ, Modrm|Vex256|Space0F3A|VexWIG|VexVVVV|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-- 
2.31.1



* Re: [PATCH v2] Support Intel SHA512
  2023-07-20  8:32               ` Jiang, Haochen
@ 2023-07-20 10:37                 ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2023-07-20 10:37 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: hjl.tools, binutils

On 20.07.2023 10:32, Jiang, Haochen wrote:
>>> --- /dev/null
>>> +++ b/gas/testsuite/gas/i386/sha512.d
>>> @@ -0,0 +1,16 @@
>>> +#as:
>>
>> What purpose does this line (present in several of the tests) have?
>>
>>> --- /dev/null
>>> +++ b/gas/testsuite/gas/i386/sha512.s
>>> @@ -0,0 +1,13 @@
>>> +# Check 32bit SHA512 instructions
>>> +
>>> +	.allow_index_reg
>>
>> This doesn't look to be needed either.
>>
> 
> We use a script to generate the testcases, so these two lines are there to
> fit all circumstances, since the script does not know what a new ISA
> will need. (The former is for passing extra options to as; the latter is
> for index registers.)
> 
> We could omit them, but I should mention that there are also some
> redundant things in all the existing testcases. Eliminating all of them
> may need careful manual work, and I wonder whether changing all of them
> is worth the time. Therefore, I propose keeping these lines to stay
> aligned with the existing testcases, since they are not wrong.

I guess H.J. was more permissive in what he allowed in. I'm concerned
about pieces in testcases which aren't relevant: they easily raise
questions about why they are there. I'd be happy to clean up that aspect
of the testsuite as well over time, just like I've been cleaning up
other oddities. I'd prefer if new testcases contained just what is needed
for the test to fulfill its purpose.

> All the other issues mentioned in the review have been fixed in my v3
> patch, which will be sent out later.

Thanks; looks like it wasn't marked as being v3.

Jan


* Re: [PATCH v3] Support Intel SHA512
  2023-07-20  8:32               ` [PATCH] " Haochen Jiang
@ 2023-07-20 11:07                 ` Jan Beulich
  2023-07-27  5:52                   ` Jiang, Haochen
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2023-07-20 11:07 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: hjl.tools, binutils

On 20.07.2023 10:32, Haochen Jiang wrote:
> This is the v3 patch for SHA512 with the following changes compared to
> the v2 patch:
> 
> 1. Adjusted the testcases for disassem.[ds] to avoid .fill.
> 
> 2. Indented directives in [x86-64-]sha512.s.
> 
> 3. Changed to higher registers for x86-64-sha512.[ds].

Thanks for making those adjustments. I'm not happy to approve new testcases
going in with (apparently) superfluous content. I'll leave approving to H.J.
in that case, in the expectation that he'll weigh both perspectives.

Jan


* RE: [PATCH v2] Support Intel SM3
  2023-07-18  9:03       ` Jan Beulich
@ 2023-07-24  2:54         ` Jiang, Haochen
  0 siblings, 0 replies; 31+ messages in thread
From: Jiang, Haochen @ 2023-07-24  2:54 UTC (permalink / raw)
  To: Beulich, Jan, hjl.tools; +Cc: binutils

> On 18.07.2023 10:09, Haochen Jiang wrote:
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/sm3.s
> > @@ -0,0 +1,37 @@
> > +# Check 32bit SM3 instructions
> > +
> > +	.allow_index_reg
> > +	.text
> > +_start:
> > +	vsm3msg1	%xmm4, %xmm5, %xmm6	 #SM3
> > +	vsm3msg1	0x10000000(%esp, %esi, 8), %xmm5, %xmm6	 #SM3
> > +	vsm3msg1	(%ecx), %xmm5, %xmm6	 #SM3
> > +	vsm3msg1	2032(%ecx), %xmm5, %xmm6	 #SM3
> Disp32(f0070000)
> > +	vsm3msg1	-2048(%edx), %xmm5, %xmm6	 #SM3
> Disp32(00f8ffff)
> 

My final patch will delete these Disp32 lines, based on the SHA512 comments. The
same goes for the SM4 patch.

After H.J. gives his opinion on the "#as:" line at the beginning and on
".allow_index_reg", I will commit all the patches. However, I suppose this should
not be a blocking issue: if we want to delete those, a separate patch covering
all the testcases is needed anyway.

Thx,
Haochen

> The numbers in parentheses are odd. I'd prefer if they were omitted,
> but I'd also be okay if you flipped their byte order so they properly
> correspond (as numbers) to the displacements used.


* RE: [PATCH v3] Support Intel SHA512
  2023-07-20 11:07                 ` [PATCH v3] " Jan Beulich
@ 2023-07-27  5:52                   ` Jiang, Haochen
  0 siblings, 0 replies; 31+ messages in thread
From: Jiang, Haochen @ 2023-07-27  5:52 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: hjl.tools, binutils

> Thanks for making those adjustments. I'm not happy to approve new
> testcases going in with (apparently) superfluous content. I'll leave approving
> to H.J.

Hi Jan,

We decided to drop those lines after discussion. I will commit those patches
with "#as:" and ".allow_index_reg" removed.

Also, I will write a patch to remove those lines in other test files.

Thx,
Haochen

> in that case, in the expectation that he'll weigh both perspectives.
> 
> Jan
