public inbox for binutils@sourceware.org
From: "H.J. Lu" <hjl.tools@gmail.com>
To: Haochen Jiang <haochen.jiang@intel.com>
Cc: binutils@sourceware.org, jbeulich@suse.com,
	 "Cui,Lili" <lili.cui@intel.com>
Subject: Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8
Date: Mon, 31 Oct 2022 09:53:08 -0700
Message-ID: <CAMe9rOqEXe5O8uWJhOPsQqNd2_8S3oC6PbrUzMr-Wo_zFqkreA@mail.gmail.com>
In-Reply-To: <20221031030507.35588-3-haochen.jiang@intel.com>

On Sun, Oct 30, 2022 at 8:07 PM Haochen Jiang <haochen.jiang@intel.com> wrote:
>
> From: "Cui,Lili" <lili.cui@intel.com>
>
> gas/
>         * NEWS: Support Intel AVX-VNNI-INT8.
>         * config/tc-i386.c: Add avx_vnni_int8.
>         * doc/c-i386.texi: Document avx_vnni_int8.
>         * testsuite/gas/i386/avx-vnni-int8-intel.d: New file.
>         * testsuite/gas/i386/avx-vnni-int8.d: Likewise.
>         * testsuite/gas/i386/avx-vnni-int8.s: Likewise.
>         * testsuite/gas/i386/x86-64-avx-vnni-int8-intel.d: Likewise.
>         * testsuite/gas/i386/x86-64-avx-vnni-int8.d: Likewise.
>         * testsuite/gas/i386/x86-64-avx-vnni-int8.s: Likewise.
>         * testsuite/gas/i386/i386.exp: Run AVX VNNI INT8 tests.
>
> opcodes/
>         * i386-dis.c (PREFIX_VEX_0F3850): New.
>         (PREFIX_VEX_0F3851): Likewise.
>         (VEX_W_0F3850_P_0): Likewise.
>         (VEX_W_0F3850_P_1): Likewise.
>         (VEX_W_0F3850_P_2): Likewise.
>         (VEX_W_0F3850_P_3): Likewise.
>         (VEX_W_0F3851_P_0): Likewise.
>         (VEX_W_0F3851_P_1): Likewise.
>         (VEX_W_0F3851_P_2): Likewise.
>         (VEX_W_0F3851_P_3): Likewise.
>         (VEX_W_0F3850): Delete.
>         (VEX_W_0F3851): Likewise.
>         (prefix_table): Add PREFIX_VEX_0F3850 and PREFIX_VEX_0F3851.
>         (vex_table): Add PREFIX_VEX_0F3850 and PREFIX_VEX_0F3851,
>         delete VEX_W_0F3850 and VEX_W_0F3851.
>         (vex_w_table): Add VEX_W_0F3850_P_0, VEX_W_0F3850_P_1, VEX_W_0F3850_P_2,
>         VEX_W_0F3850_P_3, VEX_W_0F3851_P_0, VEX_W_0F3851_P_1, VEX_W_0F3851_P_2
>         and VEX_W_0F3851_P_3, delete VEX_W_0F3850 and VEX_W_0F3851.
>         * i386-gen.c (cpu_flag_init): Add CPU_AVX_VNNI_INT8_FLAGS
>         and CPU_ANY_AVX_VNNI_INT8_FLAGS.
>         (cpu_flags): Add CpuAVX_VNNI_INT8.
>         * i386-opc.h (CpuAVX_VNNI_INT8): New.
>         * i386-opc.tbl: Add Intel AVX_VNNI_INT8 instructions.
>         * i386-init.h: Regenerated.
>         * i386-tbl.h: Likewise.
> ---
>  gas/NEWS                                      |   2 +
>  gas/config/tc-i386.c                          |   1 +
>  gas/doc/c-i386.texi                           |   3 +-
>  gas/testsuite/gas/i386/avx-vnni-int8-intel.d  |  71 ++
>  gas/testsuite/gas/i386/avx-vnni-int8.d        |  71 ++
>  gas/testsuite/gas/i386/avx-vnni-int8.s        | 127 +++
>  gas/testsuite/gas/i386/i386.exp               |   4 +
>  .../gas/i386/x86-64-avx-vnni-int8-intel.d     |  71 ++
>  gas/testsuite/gas/i386/x86-64-avx-vnni-int8.d |  71 ++
>  gas/testsuite/gas/i386/x86-64-avx-vnni-int8.s | 127 +++
>  opcodes/i386-dis.c                            |  23 +-
>  opcodes/i386-gen.c                            |   7 +-
>  opcodes/i386-init.h                           | 140 +--
>  opcodes/i386-opc.h                            |   5 +-
>  opcodes/i386-opc.tbl                          |  11 +
>  opcodes/i386-tbl.h                            | 882 ++++++++++--------
>  16 files changed, 1159 insertions(+), 457 deletions(-)
>  create mode 100644 gas/testsuite/gas/i386/avx-vnni-int8-intel.d
>  create mode 100644 gas/testsuite/gas/i386/avx-vnni-int8.d
>  create mode 100644 gas/testsuite/gas/i386/avx-vnni-int8.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-avx-vnni-int8-intel.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-avx-vnni-int8.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-avx-vnni-int8.s
>
> diff --git a/gas/NEWS b/gas/NEWS
> index 121aaa80c5..1547bfd469 100644
> --- a/gas/NEWS
> +++ b/gas/NEWS
> @@ -1,5 +1,7 @@
>  -*- text -*-
>
> +* Add support for Intel AVX-VNNI-INT8 instructions.
> +
>  * Add support for Intel AVX-IFMA instructions.
>
>  * Add support for Intel PREFETCHI instructions.
> diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
> index adbc22de8d..26d8efb47e 100644
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -1097,6 +1097,7 @@ static const arch_entry cpu_arch[] =
>    SUBARCH (avx512_fp16, AVX512_FP16, ANY_AVX512_FP16, false),
>    SUBARCH (prefetchi, PREFETCHI, ANY_PREFETCHI, false),
>    SUBARCH (avx_ifma, AVX_IFMA, ANY_AVX_IFMA, false),
> +  SUBARCH (avx_vnni_int8, AVX_VNNI_INT8, ANY_AVX_VNNI_INT8, false),
>  };
>
>  #undef SUBARCH
> diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
> index 7bdbd26538..029f5f2e04 100644
> --- a/gas/doc/c-i386.texi
> +++ b/gas/doc/c-i386.texi
> @@ -196,6 +196,7 @@ accept various extension mnemonics.  For example,
>  @code{avx512_fp16},
>  @code{prefetchi},
>  @code{avx_ifma},
> +@code{avx_vnni_int8},
>  @code{amx_int8},
>  @code{amx_bf16},
>  @code{amx_fp16},
> @@ -1489,7 +1490,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
>  @item @samp{.avx512_bitalg} @tab @samp{.avx512_bf16} @tab @samp{.avx512_vp2intersect}
>  @item @samp{.tdx} @tab @samp{.avx_vnni}  @tab @samp{.avx512_fp16}
>  @item @samp{.clwb} @tab @samp{.rdpid} @tab @samp{.ptwrite} @tab @samp{.ibt}
> -@item @samp{.prefetchi} @tab @samp{.avx_ifma}
> +@item @samp{.prefetchi} @tab @samp{.avx_ifma} @tab @samp{.avx_vnni_int8}
>  @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
>  @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
>  @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
> diff --git a/gas/testsuite/gas/i386/avx-vnni-int8-intel.d b/gas/testsuite/gas/i386/avx-vnni-int8-intel.d
> new file mode 100644
> index 0000000000..1d7d162f20
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/avx-vnni-int8-intel.d
> @@ -0,0 +1,71 @@
> +#as:
> +#objdump: -dw -Mintel
> +#name: i386 AVX-VNNI-INT8 insns (Intel disassembly)
> +#source: avx-vnni-int8.s
> +
> +.*: +file format .*
> +
> +Disassembly of section \.text:
> +
> +0+ <_start>:
> +\s*[a-f0-9]+:\s*c4 e2 57 50 f4\s+vpdpbssd ymm6,ymm5,ymm4
> +\s*[a-f0-9]+:\s*c4 e2 53 50 f4\s+vpdpbssd xmm6,xmm5,xmm4
> +\s*[a-f0-9]+:\s*c4 e2 57 50 b4 f4 00 00 00 10\s+vpdpbssd ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 57 50 31\s+vpdpbssd ymm6,ymm5,YMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 57 50 b1 e0 0f 00 00\s+vpdpbssd ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 e2 57 50 b2 00 f0 ff ff\s+vpdpbssd ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 e2 53 50 b4 f4 00 00 00 10\s+vpdpbssd xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 53 50 31\s+vpdpbssd xmm6,xmm5,XMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 53 50 b1 f0 07 00 00\s+vpdpbssd xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 e2 53 50 b2 00 f8 ff ff\s+vpdpbssd xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
> +\s*[a-f0-9]+:\s*c4 e2 57 51 f4\s+vpdpbssds ymm6,ymm5,ymm4
> +\s*[a-f0-9]+:\s*c4 e2 53 51 f4\s+vpdpbssds xmm6,xmm5,xmm4
> +\s*[a-f0-9]+:\s*c4 e2 57 51 b4 f4 00 00 00 10\s+vpdpbssds ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 57 51 31\s+vpdpbssds ymm6,ymm5,YMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 57 51 b1 e0 0f 00 00\s+vpdpbssds ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 e2 57 51 b2 00 f0 ff ff\s+vpdpbssds ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 e2 53 51 b4 f4 00 00 00 10\s+vpdpbssds xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 53 51 31\s+vpdpbssds xmm6,xmm5,XMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 53 51 b1 f0 07 00 00\s+vpdpbssds xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 e2 53 51 b2 00 f8 ff ff\s+vpdpbssds xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
> +\s*[a-f0-9]+:\s*c4 e2 56 50 f4\s+vpdpbsud ymm6,ymm5,ymm4
> +\s*[a-f0-9]+:\s*c4 e2 52 50 f4\s+vpdpbsud xmm6,xmm5,xmm4
> +\s*[a-f0-9]+:\s*c4 e2 56 50 b4 f4 00 00 00 10\s+vpdpbsud ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 56 50 31\s+vpdpbsud ymm6,ymm5,YMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 56 50 b1 e0 0f 00 00\s+vpdpbsud ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 e2 56 50 b2 00 f0 ff ff\s+vpdpbsud ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 e2 52 50 b4 f4 00 00 00 10\s+vpdpbsud xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 52 50 31\s+vpdpbsud xmm6,xmm5,XMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 52 50 b1 f0 07 00 00\s+vpdpbsud xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 e2 52 50 b2 00 f8 ff ff\s+vpdpbsud xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
> +\s*[a-f0-9]+:\s*c4 e2 56 51 f4\s+vpdpbsuds ymm6,ymm5,ymm4
> +\s*[a-f0-9]+:\s*c4 e2 52 51 f4\s+vpdpbsuds xmm6,xmm5,xmm4
> +\s*[a-f0-9]+:\s*c4 e2 56 51 b4 f4 00 00 00 10\s+vpdpbsuds ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 56 51 31\s+vpdpbsuds ymm6,ymm5,YMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 56 51 b1 e0 0f 00 00\s+vpdpbsuds ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 e2 56 51 b2 00 f0 ff ff\s+vpdpbsuds ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 e2 52 51 b4 f4 00 00 00 10\s+vpdpbsuds xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 52 51 31\s+vpdpbsuds xmm6,xmm5,XMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 52 51 b1 f0 07 00 00\s+vpdpbsuds xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 e2 52 51 b2 00 f8 ff ff\s+vpdpbsuds xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
> +\s*[a-f0-9]+:\s*c4 e2 54 50 f4\s+vpdpbuud ymm6,ymm5,ymm4
> +\s*[a-f0-9]+:\s*c4 e2 50 50 f4\s+vpdpbuud xmm6,xmm5,xmm4
> +\s*[a-f0-9]+:\s*c4 e2 54 50 b4 f4 00 00 00 10\s+vpdpbuud ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 54 50 31\s+vpdpbuud ymm6,ymm5,YMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 54 50 b1 e0 0f 00 00\s+vpdpbuud ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 e2 54 50 b2 00 f0 ff ff\s+vpdpbuud ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 e2 50 50 b4 f4 00 00 00 10\s+vpdpbuud xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 50 50 31\s+vpdpbuud xmm6,xmm5,XMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 50 50 b1 f0 07 00 00\s+vpdpbuud xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 e2 50 50 b2 00 f8 ff ff\s+vpdpbuud xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
> +\s*[a-f0-9]+:\s*c4 e2 54 51 f4\s+vpdpbuuds ymm6,ymm5,ymm4
> +\s*[a-f0-9]+:\s*c4 e2 50 51 f4\s+vpdpbuuds xmm6,xmm5,xmm4
> +\s*[a-f0-9]+:\s*c4 e2 54 51 b4 f4 00 00 00 10\s+vpdpbuuds ymm6,ymm5,YMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 54 51 31\s+vpdpbuuds ymm6,ymm5,YMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 54 51 b1 e0 0f 00 00\s+vpdpbuuds ymm6,ymm5,YMMWORD PTR \[ecx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 e2 54 51 b2 00 f0 ff ff\s+vpdpbuuds ymm6,ymm5,YMMWORD PTR \[edx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 e2 50 51 b4 f4 00 00 00 10\s+vpdpbuuds xmm6,xmm5,XMMWORD PTR \[esp\+esi\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 e2 50 51 31\s+vpdpbuuds xmm6,xmm5,XMMWORD PTR \[ecx\]
> +\s*[a-f0-9]+:\s*c4 e2 50 51 b1 f0 07 00 00\s+vpdpbuuds xmm6,xmm5,XMMWORD PTR \[ecx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 e2 50 51 b2 00 f8 ff ff\s+vpdpbuuds xmm6,xmm5,XMMWORD PTR \[edx-0x800\]
> +#pass
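
As a reading aid for the byte patterns in the expected output above: the
third byte of each 3-byte VEX prefix (0x57, 0x53, 0x56, ...) packs W, the
inverted vvvv register number, L and pp.  A minimal Python sketch, assuming
the standard 3-byte VEX layout from the Intel SDM; the function and table
names are hypothetical and not taken from binutils:

```python
# Hypothetical helper: split the third byte of a 3-byte (0xc4) VEX prefix.
# Layout assumed from the Intel SDM: W | ~vvvv (4 bits) | L | pp (2 bits).
PP_TO_PREFIX = {0b00: "none", 0b01: "66", 0b10: "f3", 0b11: "f2"}

def decode_vex_byte2(b):
    return {
        "w": (b >> 7) & 1,
        "vvvv": (~(b >> 3)) & 0xF,  # stored inverted; selects the middle operand
        "l": (b >> 2) & 1,          # 0 = 128-bit (xmm), 1 = 256-bit (ymm)
        "prefix": PP_TO_PREFIX[b & 0b11],
    }

# Third prefix bytes copied from the expected disassembly above.
for byte, insn in [(0x57, "vpdpbssd ymm"), (0x53, "vpdpbssd xmm"),
                   (0x56, "vpdpbsud ymm"), (0x54, "vpdpbuud ymm")]:
    print(f"{byte:#04x} {insn}: {decode_vex_byte2(byte)}")
```

Read this way, the three signedness variants in the test share opcodes
0x50/0x51 and differ only in pp: f2 for vpdpbssd[s], f3 for vpdpbsud[s], and
no prefix for vpdpbuud[s].  vvvv decodes to 5 (ymm5/xmm5) in every line.
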
> diff --git a/gas/testsuite/gas/i386/avx-vnni-int8.d b/gas/testsuite/gas/i386/avx-vnni-int8.d
> new file mode 100644
> index 0000000000..cd4499e59f
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/avx-vnni-int8.d
> @@ -0,0 +1,71 @@
> +#as:
> +#objdump: -dw
> +#name: i386 AVX-VNNI-INT8 insns
> +#source: avx-vnni-int8.s
> +
> +.*: +file format .*
> +
> +Disassembly of section \.text:
> +
> +0+ <_start>:
> +\s*[a-f0-9]+:\s*c4 e2 57 50 f4\s+vpdpbssd %ymm4,%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 53 50 f4\s+vpdpbssd %xmm4,%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 57 50 b4 f4 00 00 00 10\s+vpdpbssd 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 57 50 31\s+vpdpbssd \(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 57 50 b1 e0 0f 00 00\s+vpdpbssd 0xfe0\(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 57 50 b2 00 f0 ff ff\s+vpdpbssd -0x1000\(%edx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 53 50 b4 f4 00 00 00 10\s+vpdpbssd 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 53 50 31\s+vpdpbssd \(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 53 50 b1 f0 07 00 00\s+vpdpbssd 0x7f0\(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 53 50 b2 00 f8 ff ff\s+vpdpbssd -0x800\(%edx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 57 51 f4\s+vpdpbssds %ymm4,%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 53 51 f4\s+vpdpbssds %xmm4,%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 57 51 b4 f4 00 00 00 10\s+vpdpbssds 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 57 51 31\s+vpdpbssds \(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 57 51 b1 e0 0f 00 00\s+vpdpbssds 0xfe0\(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 57 51 b2 00 f0 ff ff\s+vpdpbssds -0x1000\(%edx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 53 51 b4 f4 00 00 00 10\s+vpdpbssds 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 53 51 31\s+vpdpbssds \(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 53 51 b1 f0 07 00 00\s+vpdpbssds 0x7f0\(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 53 51 b2 00 f8 ff ff\s+vpdpbssds -0x800\(%edx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 56 50 f4\s+vpdpbsud %ymm4,%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 52 50 f4\s+vpdpbsud %xmm4,%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 56 50 b4 f4 00 00 00 10\s+vpdpbsud 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 56 50 31\s+vpdpbsud \(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 56 50 b1 e0 0f 00 00\s+vpdpbsud 0xfe0\(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 56 50 b2 00 f0 ff ff\s+vpdpbsud -0x1000\(%edx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 52 50 b4 f4 00 00 00 10\s+vpdpbsud 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 52 50 31\s+vpdpbsud \(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 52 50 b1 f0 07 00 00\s+vpdpbsud 0x7f0\(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 52 50 b2 00 f8 ff ff\s+vpdpbsud -0x800\(%edx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 56 51 f4\s+vpdpbsuds %ymm4,%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 52 51 f4\s+vpdpbsuds %xmm4,%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 56 51 b4 f4 00 00 00 10\s+vpdpbsuds 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 56 51 31\s+vpdpbsuds \(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 56 51 b1 e0 0f 00 00\s+vpdpbsuds 0xfe0\(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 56 51 b2 00 f0 ff ff\s+vpdpbsuds -0x1000\(%edx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 52 51 b4 f4 00 00 00 10\s+vpdpbsuds 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 52 51 31\s+vpdpbsuds \(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 52 51 b1 f0 07 00 00\s+vpdpbsuds 0x7f0\(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 52 51 b2 00 f8 ff ff\s+vpdpbsuds -0x800\(%edx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 54 50 f4\s+vpdpbuud %ymm4,%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 50 50 f4\s+vpdpbuud %xmm4,%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 54 50 b4 f4 00 00 00 10\s+vpdpbuud 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 54 50 31\s+vpdpbuud \(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 54 50 b1 e0 0f 00 00\s+vpdpbuud 0xfe0\(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 54 50 b2 00 f0 ff ff\s+vpdpbuud -0x1000\(%edx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 50 50 b4 f4 00 00 00 10\s+vpdpbuud 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 50 50 31\s+vpdpbuud \(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 50 50 b1 f0 07 00 00\s+vpdpbuud 0x7f0\(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 50 50 b2 00 f8 ff ff\s+vpdpbuud -0x800\(%edx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 54 51 f4\s+vpdpbuuds %ymm4,%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 50 51 f4\s+vpdpbuuds %xmm4,%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 54 51 b4 f4 00 00 00 10\s+vpdpbuuds 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 54 51 31\s+vpdpbuuds \(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 54 51 b1 e0 0f 00 00\s+vpdpbuuds 0xfe0\(%ecx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 54 51 b2 00 f0 ff ff\s+vpdpbuuds -0x1000\(%edx\),%ymm5,%ymm6
> +\s*[a-f0-9]+:\s*c4 e2 50 51 b4 f4 00 00 00 10\s+vpdpbuuds 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 50 51 31\s+vpdpbuuds \(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 50 51 b1 f0 07 00 00\s+vpdpbuuds 0x7f0\(%ecx\),%xmm5,%xmm6
> +\s*[a-f0-9]+:\s*c4 e2 50 51 b2 00 f8 ff ff\s+vpdpbuuds -0x800\(%edx\),%xmm5,%xmm6
> +#pass
> diff --git a/gas/testsuite/gas/i386/avx-vnni-int8.s b/gas/testsuite/gas/i386/avx-vnni-int8.s
> new file mode 100644
> index 0000000000..e3cfeb6680
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/avx-vnni-int8.s
> @@ -0,0 +1,127 @@
> +# Check 32bit AVX-VNNI-INT8 instructions
> +
> +       .allow_index_reg
> +       .text
> +_start:
> +       vpdpbssd        %ymm4, %ymm5, %ymm6      #AVX-VNNI-INT8
> +       vpdpbssd        %xmm4, %xmm5, %xmm6      #AVX-VNNI-INT8
> +       vpdpbssd        0x10000000(%esp, %esi, 8), %ymm5, %ymm6  #AVX-VNNI-INT8
> +       vpdpbssd        (%ecx), %ymm5, %ymm6     #AVX-VNNI-INT8
> +       vpdpbssd        4064(%ecx), %ymm5, %ymm6         #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbssd        -4096(%edx), %ymm5, %ymm6        #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbssd        0x10000000(%esp, %esi, 8), %xmm5, %xmm6  #AVX-VNNI-INT8
> +       vpdpbssd        (%ecx), %xmm5, %xmm6     #AVX-VNNI-INT8
> +       vpdpbssd        2032(%ecx), %xmm5, %xmm6         #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbssd        -2048(%edx), %xmm5, %xmm6        #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbssds       %ymm4, %ymm5, %ymm6      #AVX-VNNI-INT8
> +       vpdpbssds       %xmm4, %xmm5, %xmm6      #AVX-VNNI-INT8
> +       vpdpbssds       0x10000000(%esp, %esi, 8), %ymm5, %ymm6  #AVX-VNNI-INT8
> +       vpdpbssds       (%ecx), %ymm5, %ymm6     #AVX-VNNI-INT8
> +       vpdpbssds       4064(%ecx), %ymm5, %ymm6         #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbssds       -4096(%edx), %ymm5, %ymm6        #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbssds       0x10000000(%esp, %esi, 8), %xmm5, %xmm6  #AVX-VNNI-INT8
> +       vpdpbssds       (%ecx), %xmm5, %xmm6     #AVX-VNNI-INT8
> +       vpdpbssds       2032(%ecx), %xmm5, %xmm6         #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbssds       -2048(%edx), %xmm5, %xmm6        #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbsud        %ymm4, %ymm5, %ymm6      #AVX-VNNI-INT8
> +       vpdpbsud        %xmm4, %xmm5, %xmm6      #AVX-VNNI-INT8
> +       vpdpbsud        0x10000000(%esp, %esi, 8), %ymm5, %ymm6  #AVX-VNNI-INT8
> +       vpdpbsud        (%ecx), %ymm5, %ymm6     #AVX-VNNI-INT8
> +       vpdpbsud        4064(%ecx), %ymm5, %ymm6         #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbsud        -4096(%edx), %ymm5, %ymm6        #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbsud        0x10000000(%esp, %esi, 8), %xmm5, %xmm6  #AVX-VNNI-INT8
> +       vpdpbsud        (%ecx), %xmm5, %xmm6     #AVX-VNNI-INT8
> +       vpdpbsud        2032(%ecx), %xmm5, %xmm6         #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbsud        -2048(%edx), %xmm5, %xmm6        #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbsuds       %ymm4, %ymm5, %ymm6      #AVX-VNNI-INT8
> +       vpdpbsuds       %xmm4, %xmm5, %xmm6      #AVX-VNNI-INT8
> +       vpdpbsuds       0x10000000(%esp, %esi, 8), %ymm5, %ymm6  #AVX-VNNI-INT8
> +       vpdpbsuds       (%ecx), %ymm5, %ymm6     #AVX-VNNI-INT8
> +       vpdpbsuds       4064(%ecx), %ymm5, %ymm6         #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbsuds       -4096(%edx), %ymm5, %ymm6        #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbsuds       0x10000000(%esp, %esi, 8), %xmm5, %xmm6  #AVX-VNNI-INT8
> +       vpdpbsuds       (%ecx), %xmm5, %xmm6     #AVX-VNNI-INT8
> +       vpdpbsuds       2032(%ecx), %xmm5, %xmm6         #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbsuds       -2048(%edx), %xmm5, %xmm6        #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbuud        %ymm4, %ymm5, %ymm6      #AVX-VNNI-INT8
> +       vpdpbuud        %xmm4, %xmm5, %xmm6      #AVX-VNNI-INT8
> +       vpdpbuud        0x10000000(%esp, %esi, 8), %ymm5, %ymm6  #AVX-VNNI-INT8
> +       vpdpbuud        (%ecx), %ymm5, %ymm6     #AVX-VNNI-INT8
> +       vpdpbuud        4064(%ecx), %ymm5, %ymm6         #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbuud        -4096(%edx), %ymm5, %ymm6        #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbuud        0x10000000(%esp, %esi, 8), %xmm5, %xmm6  #AVX-VNNI-INT8
> +       vpdpbuud        (%ecx), %xmm5, %xmm6     #AVX-VNNI-INT8
> +       vpdpbuud        2032(%ecx), %xmm5, %xmm6         #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbuud        -2048(%edx), %xmm5, %xmm6        #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbuuds       %ymm4, %ymm5, %ymm6      #AVX-VNNI-INT8
> +       vpdpbuuds       %xmm4, %xmm5, %xmm6      #AVX-VNNI-INT8
> +       vpdpbuuds       0x10000000(%esp, %esi, 8), %ymm5, %ymm6  #AVX-VNNI-INT8
> +       vpdpbuuds       (%ecx), %ymm5, %ymm6     #AVX-VNNI-INT8
> +       vpdpbuuds       4064(%ecx), %ymm5, %ymm6         #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbuuds       -4096(%edx), %ymm5, %ymm6        #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbuuds       0x10000000(%esp, %esi, 8), %xmm5, %xmm6  #AVX-VNNI-INT8
> +       vpdpbuuds       (%ecx), %xmm5, %xmm6     #AVX-VNNI-INT8
> +       vpdpbuuds       2032(%ecx), %xmm5, %xmm6         #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbuuds       -2048(%edx), %xmm5, %xmm6        #AVX-VNNI-INT8 Disp32(00f8ffff)
> +
> +.intel_syntax noprefix
> +       vpdpbssd        ymm6, ymm5, ymm4         #AVX-VNNI-INT8
> +       vpdpbssd        xmm6, xmm5, xmm4         #AVX-VNNI-INT8
> +       vpdpbssd        ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbssd        ymm6, ymm5, YMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbssd        ymm6, ymm5, YMMWORD PTR [ecx+4064]       #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbssd        ymm6, ymm5, YMMWORD PTR [edx-4096]       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbssd        xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbssd        xmm6, xmm5, XMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbssd        xmm6, xmm5, XMMWORD PTR [ecx+2032]       #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbssd        xmm6, xmm5, XMMWORD PTR [edx-2048]       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbssds       ymm6, ymm5, ymm4         #AVX-VNNI-INT8
> +       vpdpbssds       xmm6, xmm5, xmm4         #AVX-VNNI-INT8
> +       vpdpbssds       ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbssds       ymm6, ymm5, YMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbssds       ymm6, ymm5, YMMWORD PTR [ecx+4064]       #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbssds       ymm6, ymm5, YMMWORD PTR [edx-4096]       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbssds       xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbssds       xmm6, xmm5, XMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbssds       xmm6, xmm5, XMMWORD PTR [ecx+2032]       #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbssds       xmm6, xmm5, XMMWORD PTR [edx-2048]       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbsud        ymm6, ymm5, ymm4         #AVX-VNNI-INT8
> +       vpdpbsud        xmm6, xmm5, xmm4         #AVX-VNNI-INT8
> +       vpdpbsud        ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbsud        ymm6, ymm5, YMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbsud        ymm6, ymm5, YMMWORD PTR [ecx+4064]       #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbsud        ymm6, ymm5, YMMWORD PTR [edx-4096]       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbsud        xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbsud        xmm6, xmm5, XMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbsud        xmm6, xmm5, XMMWORD PTR [ecx+2032]       #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbsud        xmm6, xmm5, XMMWORD PTR [edx-2048]       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbsuds       ymm6, ymm5, ymm4         #AVX-VNNI-INT8
> +       vpdpbsuds       xmm6, xmm5, xmm4         #AVX-VNNI-INT8
> +       vpdpbsuds       ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbsuds       ymm6, ymm5, YMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbsuds       ymm6, ymm5, YMMWORD PTR [ecx+4064]       #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbsuds       ymm6, ymm5, YMMWORD PTR [edx-4096]       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbsuds       xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbsuds       xmm6, xmm5, XMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbsuds       xmm6, xmm5, XMMWORD PTR [ecx+2032]       #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbsuds       xmm6, xmm5, XMMWORD PTR [edx-2048]       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbuud        ymm6, ymm5, ymm4         #AVX-VNNI-INT8
> +       vpdpbuud        xmm6, xmm5, xmm4         #AVX-VNNI-INT8
> +       vpdpbuud        ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbuud        ymm6, ymm5, YMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbuud        ymm6, ymm5, YMMWORD PTR [ecx+4064]       #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbuud        ymm6, ymm5, YMMWORD PTR [edx-4096]       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbuud        xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbuud        xmm6, xmm5, XMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbuud        xmm6, xmm5, XMMWORD PTR [ecx+2032]       #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbuud        xmm6, xmm5, XMMWORD PTR [edx-2048]       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbuuds       ymm6, ymm5, ymm4         #AVX-VNNI-INT8
> +       vpdpbuuds       xmm6, xmm5, xmm4         #AVX-VNNI-INT8
> +       vpdpbuuds       ymm6, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbuuds       ymm6, ymm5, YMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbuuds       ymm6, ymm5, YMMWORD PTR [ecx+4064]       #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbuuds       ymm6, ymm5, YMMWORD PTR [edx-4096]       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbuuds       xmm6, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]   #AVX-VNNI-INT8
> +       vpdpbuuds       xmm6, xmm5, XMMWORD PTR [ecx]    #AVX-VNNI-INT8
> +       vpdpbuuds       xmm6, xmm5, XMMWORD PTR [ecx+2032]       #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbuuds       xmm6, xmm5, XMMWORD PTR [edx-2048]       #AVX-VNNI-INT8 Disp32(00f8ffff)
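
The Disp32(...) comments in the test source give each displacement as it
appears in the encoding, that is, as a 32-bit little-endian two's-complement
value.  A quick Python check using only the standard struct module (the
helper name is hypothetical):

```python
import struct

def disp32_hex(disp):
    # Pack as signed 32-bit little-endian, the byte order of x86 displacements.
    return struct.pack("<i", disp).hex()

# Displacements used in the test source above.
for disp in (4064, -4096, 2032, -2048):
    print(disp, disp32_hex(disp))
```

None of the four values fits in a signed 8-bit displacement, which is why
each one is emitted as a full disp32 in the expected byte patterns.
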
> diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
> index 96ab1a02d1..b75fe85cb3 100644
> --- a/gas/testsuite/gas/i386/i386.exp
> +++ b/gas/testsuite/gas/i386/i386.exp
> @@ -477,6 +477,8 @@ if [gas_32_check] then {
>      run_dump_test "avx-ifma"
>      run_dump_test "avx-ifma-intel"
>      run_list_test "avx-ifma-inval"
> +    run_dump_test "avx-vnni-int8"
> +    run_dump_test "avx-vnni-int8-intel"
>      run_list_test "sg"
>      run_dump_test "clzero"
>      run_dump_test "invlpgb"
> @@ -1148,6 +1150,8 @@ if [gas_64_check] then {
>      run_dump_test "x86-64-avx-ifma"
>      run_dump_test "x86-64-avx-ifma-intel"
>      run_list_test "x86-64-avx-ifma-inval"
> +    run_dump_test "x86-64-avx-vnni-int8"
> +    run_dump_test "x86-64-avx-vnni-int8-intel"
>      run_dump_test "x86-64-clzero"
>      run_dump_test "x86-64-mwaitx-bdver4"
>      run_list_test "x86-64-mwaitx-reg"
> diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-int8-intel.d b/gas/testsuite/gas/i386/x86-64-avx-vnni-int8-intel.d
> new file mode 100644
> index 0000000000..61c01124ef
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-int8-intel.d
> @@ -0,0 +1,71 @@
> +#as:
> +#objdump: -dw -Mintel
> +#name: x86_64 AVX-VNNI-INT8 insns (Intel disassembly)
> +#source: x86-64-avx-vnni-int8.s
> +
> +.*: +file format .*
> +
> +Disassembly of section \.text:
> +
> +0+ <_start>:
> +\s*[a-f0-9]+:\s*c4 42 37 50 d0\s+vpdpbssd ymm10,ymm9,ymm8
> +\s*[a-f0-9]+:\s*c4 42 33 50 d0\s+vpdpbssd xmm10,xmm9,xmm8
> +\s*[a-f0-9]+:\s*c4 22 37 50 94 f5 00 00 00 10\s+vpdpbssd ymm10,ymm9,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 37 50 11\s+vpdpbssd ymm10,ymm9,YMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 37 50 91 e0 0f 00 00\s+vpdpbssd ymm10,ymm9,YMMWORD PTR \[rcx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 62 37 50 92 00 f0 ff ff\s+vpdpbssd ymm10,ymm9,YMMWORD PTR \[rdx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 22 33 50 94 f5 00 00 00 10\s+vpdpbssd xmm10,xmm9,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 33 50 11\s+vpdpbssd xmm10,xmm9,XMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 33 50 91 f0 07 00 00\s+vpdpbssd xmm10,xmm9,XMMWORD PTR \[rcx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 62 33 50 92 00 f8 ff ff\s+vpdpbssd xmm10,xmm9,XMMWORD PTR \[rdx-0x800\]
> +\s*[a-f0-9]+:\s*c4 42 37 51 d0\s+vpdpbssds ymm10,ymm9,ymm8
> +\s*[a-f0-9]+:\s*c4 42 33 51 d0\s+vpdpbssds xmm10,xmm9,xmm8
> +\s*[a-f0-9]+:\s*c4 22 37 51 94 f5 00 00 00 10\s+vpdpbssds ymm10,ymm9,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 37 51 11\s+vpdpbssds ymm10,ymm9,YMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 37 51 91 e0 0f 00 00\s+vpdpbssds ymm10,ymm9,YMMWORD PTR \[rcx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 62 37 51 92 00 f0 ff ff\s+vpdpbssds ymm10,ymm9,YMMWORD PTR \[rdx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 22 33 51 94 f5 00 00 00 10\s+vpdpbssds xmm10,xmm9,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 33 51 11\s+vpdpbssds xmm10,xmm9,XMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 33 51 91 f0 07 00 00\s+vpdpbssds xmm10,xmm9,XMMWORD PTR \[rcx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 62 33 51 92 00 f8 ff ff\s+vpdpbssds xmm10,xmm9,XMMWORD PTR \[rdx-0x800\]
> +\s*[a-f0-9]+:\s*c4 42 36 50 d0\s+vpdpbsud ymm10,ymm9,ymm8
> +\s*[a-f0-9]+:\s*c4 42 32 50 d0\s+vpdpbsud xmm10,xmm9,xmm8
> +\s*[a-f0-9]+:\s*c4 22 36 50 94 f5 00 00 00 10\s+vpdpbsud ymm10,ymm9,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 36 50 11\s+vpdpbsud ymm10,ymm9,YMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 36 50 91 e0 0f 00 00\s+vpdpbsud ymm10,ymm9,YMMWORD PTR \[rcx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 62 36 50 92 00 f0 ff ff\s+vpdpbsud ymm10,ymm9,YMMWORD PTR \[rdx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 22 32 50 94 f5 00 00 00 10\s+vpdpbsud xmm10,xmm9,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 32 50 11\s+vpdpbsud xmm10,xmm9,XMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 32 50 91 f0 07 00 00\s+vpdpbsud xmm10,xmm9,XMMWORD PTR \[rcx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 62 32 50 92 00 f8 ff ff\s+vpdpbsud xmm10,xmm9,XMMWORD PTR \[rdx-0x800\]
> +\s*[a-f0-9]+:\s*c4 42 36 51 d0\s+vpdpbsuds ymm10,ymm9,ymm8
> +\s*[a-f0-9]+:\s*c4 42 32 51 d0\s+vpdpbsuds xmm10,xmm9,xmm8
> +\s*[a-f0-9]+:\s*c4 22 36 51 94 f5 00 00 00 10\s+vpdpbsuds ymm10,ymm9,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 36 51 11\s+vpdpbsuds ymm10,ymm9,YMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 36 51 91 e0 0f 00 00\s+vpdpbsuds ymm10,ymm9,YMMWORD PTR \[rcx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 62 36 51 92 00 f0 ff ff\s+vpdpbsuds ymm10,ymm9,YMMWORD PTR \[rdx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 22 32 51 94 f5 00 00 00 10\s+vpdpbsuds xmm10,xmm9,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 32 51 11\s+vpdpbsuds xmm10,xmm9,XMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 32 51 91 f0 07 00 00\s+vpdpbsuds xmm10,xmm9,XMMWORD PTR \[rcx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 62 32 51 92 00 f8 ff ff\s+vpdpbsuds xmm10,xmm9,XMMWORD PTR \[rdx-0x800\]
> +\s*[a-f0-9]+:\s*c4 42 34 50 d0\s+vpdpbuud ymm10,ymm9,ymm8
> +\s*[a-f0-9]+:\s*c4 42 30 50 d0\s+vpdpbuud xmm10,xmm9,xmm8
> +\s*[a-f0-9]+:\s*c4 22 34 50 94 f5 00 00 00 10\s+vpdpbuud ymm10,ymm9,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 34 50 11\s+vpdpbuud ymm10,ymm9,YMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 34 50 91 e0 0f 00 00\s+vpdpbuud ymm10,ymm9,YMMWORD PTR \[rcx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 62 34 50 92 00 f0 ff ff\s+vpdpbuud ymm10,ymm9,YMMWORD PTR \[rdx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 22 30 50 94 f5 00 00 00 10\s+vpdpbuud xmm10,xmm9,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 30 50 11\s+vpdpbuud xmm10,xmm9,XMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 30 50 91 f0 07 00 00\s+vpdpbuud xmm10,xmm9,XMMWORD PTR \[rcx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 62 30 50 92 00 f8 ff ff\s+vpdpbuud xmm10,xmm9,XMMWORD PTR \[rdx-0x800\]
> +\s*[a-f0-9]+:\s*c4 42 34 51 d0\s+vpdpbuuds ymm10,ymm9,ymm8
> +\s*[a-f0-9]+:\s*c4 42 30 51 d0\s+vpdpbuuds xmm10,xmm9,xmm8
> +\s*[a-f0-9]+:\s*c4 22 34 51 94 f5 00 00 00 10\s+vpdpbuuds ymm10,ymm9,YMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 34 51 11\s+vpdpbuuds ymm10,ymm9,YMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 34 51 91 e0 0f 00 00\s+vpdpbuuds ymm10,ymm9,YMMWORD PTR \[rcx\+0xfe0\]
> +\s*[a-f0-9]+:\s*c4 62 34 51 92 00 f0 ff ff\s+vpdpbuuds ymm10,ymm9,YMMWORD PTR \[rdx-0x1000\]
> +\s*[a-f0-9]+:\s*c4 22 30 51 94 f5 00 00 00 10\s+vpdpbuuds xmm10,xmm9,XMMWORD PTR \[rbp\+r14\*8\+0x10000000\]
> +\s*[a-f0-9]+:\s*c4 42 30 51 11\s+vpdpbuuds xmm10,xmm9,XMMWORD PTR \[r9\]
> +\s*[a-f0-9]+:\s*c4 62 30 51 91 f0 07 00 00\s+vpdpbuuds xmm10,xmm9,XMMWORD PTR \[rcx\+0x7f0\]
> +\s*[a-f0-9]+:\s*c4 62 30 51 92 00 f8 ff ff\s+vpdpbuuds xmm10,xmm9,XMMWORD PTR \[rdx-0x800\]
> +#pass
> diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-int8.d b/gas/testsuite/gas/i386/x86-64-avx-vnni-int8.d
> new file mode 100644
> index 0000000000..90faed581b
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-int8.d
> @@ -0,0 +1,71 @@
> +#as:
> +#objdump: -dw
> +#name: x86_64 AVX-VNNI-INT8 insns
> +#source: x86-64-avx-vnni-int8.s
> +
> +.*: +file format .*
> +
> +Disassembly of section \.text:
> +
> +0+ <_start>:
> +\s*[a-f0-9]+:\s*c4 42 37 50 d0\s+vpdpbssd %ymm8,%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 33 50 d0\s+vpdpbssd %xmm8,%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 22 37 50 94 f5 00 00 00 10\s+vpdpbssd 0x10000000\(%rbp,%r14,8\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 37 50 11\s+vpdpbssd \(%r9\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 37 50 91 e0 0f 00 00\s+vpdpbssd 0xfe0\(%rcx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 37 50 92 00 f0 ff ff\s+vpdpbssd -0x1000\(%rdx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 22 33 50 94 f5 00 00 00 10\s+vpdpbssd 0x10000000\(%rbp,%r14,8\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 33 50 11\s+vpdpbssd \(%r9\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 33 50 91 f0 07 00 00\s+vpdpbssd 0x7f0\(%rcx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 33 50 92 00 f8 ff ff\s+vpdpbssd -0x800\(%rdx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 37 51 d0\s+vpdpbssds %ymm8,%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 33 51 d0\s+vpdpbssds %xmm8,%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 22 37 51 94 f5 00 00 00 10\s+vpdpbssds 0x10000000\(%rbp,%r14,8\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 37 51 11\s+vpdpbssds \(%r9\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 37 51 91 e0 0f 00 00\s+vpdpbssds 0xfe0\(%rcx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 37 51 92 00 f0 ff ff\s+vpdpbssds -0x1000\(%rdx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 22 33 51 94 f5 00 00 00 10\s+vpdpbssds 0x10000000\(%rbp,%r14,8\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 33 51 11\s+vpdpbssds \(%r9\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 33 51 91 f0 07 00 00\s+vpdpbssds 0x7f0\(%rcx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 33 51 92 00 f8 ff ff\s+vpdpbssds -0x800\(%rdx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 36 50 d0\s+vpdpbsud %ymm8,%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 32 50 d0\s+vpdpbsud %xmm8,%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 22 36 50 94 f5 00 00 00 10\s+vpdpbsud 0x10000000\(%rbp,%r14,8\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 36 50 11\s+vpdpbsud \(%r9\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 36 50 91 e0 0f 00 00\s+vpdpbsud 0xfe0\(%rcx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 36 50 92 00 f0 ff ff\s+vpdpbsud -0x1000\(%rdx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 22 32 50 94 f5 00 00 00 10\s+vpdpbsud 0x10000000\(%rbp,%r14,8\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 32 50 11\s+vpdpbsud \(%r9\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 32 50 91 f0 07 00 00\s+vpdpbsud 0x7f0\(%rcx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 32 50 92 00 f8 ff ff\s+vpdpbsud -0x800\(%rdx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 36 51 d0\s+vpdpbsuds %ymm8,%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 32 51 d0\s+vpdpbsuds %xmm8,%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 22 36 51 94 f5 00 00 00 10\s+vpdpbsuds 0x10000000\(%rbp,%r14,8\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 36 51 11\s+vpdpbsuds \(%r9\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 36 51 91 e0 0f 00 00\s+vpdpbsuds 0xfe0\(%rcx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 36 51 92 00 f0 ff ff\s+vpdpbsuds -0x1000\(%rdx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 22 32 51 94 f5 00 00 00 10\s+vpdpbsuds 0x10000000\(%rbp,%r14,8\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 32 51 11\s+vpdpbsuds \(%r9\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 32 51 91 f0 07 00 00\s+vpdpbsuds 0x7f0\(%rcx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 32 51 92 00 f8 ff ff\s+vpdpbsuds -0x800\(%rdx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 34 50 d0\s+vpdpbuud %ymm8,%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 30 50 d0\s+vpdpbuud %xmm8,%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 22 34 50 94 f5 00 00 00 10\s+vpdpbuud 0x10000000\(%rbp,%r14,8\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 34 50 11\s+vpdpbuud \(%r9\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 34 50 91 e0 0f 00 00\s+vpdpbuud 0xfe0\(%rcx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 34 50 92 00 f0 ff ff\s+vpdpbuud -0x1000\(%rdx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 22 30 50 94 f5 00 00 00 10\s+vpdpbuud 0x10000000\(%rbp,%r14,8\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 30 50 11\s+vpdpbuud \(%r9\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 30 50 91 f0 07 00 00\s+vpdpbuud 0x7f0\(%rcx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 30 50 92 00 f8 ff ff\s+vpdpbuud -0x800\(%rdx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 34 51 d0\s+vpdpbuuds %ymm8,%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 30 51 d0\s+vpdpbuuds %xmm8,%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 22 34 51 94 f5 00 00 00 10\s+vpdpbuuds 0x10000000\(%rbp,%r14,8\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 42 34 51 11\s+vpdpbuuds \(%r9\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 34 51 91 e0 0f 00 00\s+vpdpbuuds 0xfe0\(%rcx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 62 34 51 92 00 f0 ff ff\s+vpdpbuuds -0x1000\(%rdx\),%ymm9,%ymm10
> +\s*[a-f0-9]+:\s*c4 22 30 51 94 f5 00 00 00 10\s+vpdpbuuds 0x10000000\(%rbp,%r14,8\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 42 30 51 11\s+vpdpbuuds \(%r9\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 30 51 91 f0 07 00 00\s+vpdpbuuds 0x7f0\(%rcx\),%xmm9,%xmm10
> +\s*[a-f0-9]+:\s*c4 62 30 51 92 00 f8 ff ff\s+vpdpbuuds -0x800\(%rdx\),%xmm9,%xmm10
> +#pass
> diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-int8.s b/gas/testsuite/gas/i386/x86-64-avx-vnni-int8.s
> new file mode 100644
> index 0000000000..bc9145b26f
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-int8.s
> @@ -0,0 +1,127 @@
> +# Check 64bit AVX-VNNI-INT8 instructions
> +
> +       .allow_index_reg
> +       .text
> +_start:
> +       vpdpbssd        %ymm8, %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbssd        %xmm8, %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbssd        0x10000000(%rbp, %r14, 8), %ymm9, %ymm10         #AVX-VNNI-INT8
> +       vpdpbssd        (%r9), %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbssd        4064(%rcx), %ymm9, %ymm10        #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbssd        -4096(%rdx), %ymm9, %ymm10       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbssd        0x10000000(%rbp, %r14, 8), %xmm9, %xmm10         #AVX-VNNI-INT8
> +       vpdpbssd        (%r9), %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbssd        2032(%rcx), %xmm9, %xmm10        #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbssd        -2048(%rdx), %xmm9, %xmm10       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbssds       %ymm8, %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbssds       %xmm8, %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbssds       0x10000000(%rbp, %r14, 8), %ymm9, %ymm10         #AVX-VNNI-INT8
> +       vpdpbssds       (%r9), %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbssds       4064(%rcx), %ymm9, %ymm10        #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbssds       -4096(%rdx), %ymm9, %ymm10       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbssds       0x10000000(%rbp, %r14, 8), %xmm9, %xmm10         #AVX-VNNI-INT8
> +       vpdpbssds       (%r9), %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbssds       2032(%rcx), %xmm9, %xmm10        #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbssds       -2048(%rdx), %xmm9, %xmm10       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbsud        %ymm8, %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbsud        %xmm8, %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbsud        0x10000000(%rbp, %r14, 8), %ymm9, %ymm10         #AVX-VNNI-INT8
> +       vpdpbsud        (%r9), %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbsud        4064(%rcx), %ymm9, %ymm10        #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbsud        -4096(%rdx), %ymm9, %ymm10       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbsud        0x10000000(%rbp, %r14, 8), %xmm9, %xmm10         #AVX-VNNI-INT8
> +       vpdpbsud        (%r9), %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbsud        2032(%rcx), %xmm9, %xmm10        #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbsud        -2048(%rdx), %xmm9, %xmm10       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbsuds       %ymm8, %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbsuds       %xmm8, %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbsuds       0x10000000(%rbp, %r14, 8), %ymm9, %ymm10         #AVX-VNNI-INT8
> +       vpdpbsuds       (%r9), %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbsuds       4064(%rcx), %ymm9, %ymm10        #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbsuds       -4096(%rdx), %ymm9, %ymm10       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbsuds       0x10000000(%rbp, %r14, 8), %xmm9, %xmm10         #AVX-VNNI-INT8
> +       vpdpbsuds       (%r9), %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbsuds       2032(%rcx), %xmm9, %xmm10        #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbsuds       -2048(%rdx), %xmm9, %xmm10       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbuud        %ymm8, %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbuud        %xmm8, %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbuud        0x10000000(%rbp, %r14, 8), %ymm9, %ymm10         #AVX-VNNI-INT8
> +       vpdpbuud        (%r9), %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbuud        4064(%rcx), %ymm9, %ymm10        #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbuud        -4096(%rdx), %ymm9, %ymm10       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbuud        0x10000000(%rbp, %r14, 8), %xmm9, %xmm10         #AVX-VNNI-INT8
> +       vpdpbuud        (%r9), %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbuud        2032(%rcx), %xmm9, %xmm10        #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbuud        -2048(%rdx), %xmm9, %xmm10       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbuuds       %ymm8, %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbuuds       %xmm8, %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbuuds       0x10000000(%rbp, %r14, 8), %ymm9, %ymm10         #AVX-VNNI-INT8
> +       vpdpbuuds       (%r9), %ymm9, %ymm10     #AVX-VNNI-INT8
> +       vpdpbuuds       4064(%rcx), %ymm9, %ymm10        #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbuuds       -4096(%rdx), %ymm9, %ymm10       #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbuuds       0x10000000(%rbp, %r14, 8), %xmm9, %xmm10         #AVX-VNNI-INT8
> +       vpdpbuuds       (%r9), %xmm9, %xmm10     #AVX-VNNI-INT8
> +       vpdpbuuds       2032(%rcx), %xmm9, %xmm10        #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbuuds       -2048(%rdx), %xmm9, %xmm10       #AVX-VNNI-INT8 Disp32(00f8ffff)
> +
> +.intel_syntax noprefix
> +       vpdpbssd        ymm10, ymm9, ymm8        #AVX-VNNI-INT8
> +       vpdpbssd        xmm10, xmm9, xmm8        #AVX-VNNI-INT8
> +       vpdpbssd        ymm10, ymm9, YMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbssd        ymm10, ymm9, YMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbssd        ymm10, ymm9, YMMWORD PTR [rcx+4064]      #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbssd        ymm10, ymm9, YMMWORD PTR [rdx-4096]      #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbssd        xmm10, xmm9, XMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbssd        xmm10, xmm9, XMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbssd        xmm10, xmm9, XMMWORD PTR [rcx+2032]      #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbssd        xmm10, xmm9, XMMWORD PTR [rdx-2048]      #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbssds       ymm10, ymm9, ymm8        #AVX-VNNI-INT8
> +       vpdpbssds       xmm10, xmm9, xmm8        #AVX-VNNI-INT8
> +       vpdpbssds       ymm10, ymm9, YMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbssds       ymm10, ymm9, YMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbssds       ymm10, ymm9, YMMWORD PTR [rcx+4064]      #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbssds       ymm10, ymm9, YMMWORD PTR [rdx-4096]      #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbssds       xmm10, xmm9, XMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbssds       xmm10, xmm9, XMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbssds       xmm10, xmm9, XMMWORD PTR [rcx+2032]      #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbssds       xmm10, xmm9, XMMWORD PTR [rdx-2048]      #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbsud        ymm10, ymm9, ymm8        #AVX-VNNI-INT8
> +       vpdpbsud        xmm10, xmm9, xmm8        #AVX-VNNI-INT8
> +       vpdpbsud        ymm10, ymm9, YMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbsud        ymm10, ymm9, YMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbsud        ymm10, ymm9, YMMWORD PTR [rcx+4064]      #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbsud        ymm10, ymm9, YMMWORD PTR [rdx-4096]      #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbsud        xmm10, xmm9, XMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbsud        xmm10, xmm9, XMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbsud        xmm10, xmm9, XMMWORD PTR [rcx+2032]      #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbsud        xmm10, xmm9, XMMWORD PTR [rdx-2048]      #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbsuds       ymm10, ymm9, ymm8        #AVX-VNNI-INT8
> +       vpdpbsuds       xmm10, xmm9, xmm8        #AVX-VNNI-INT8
> +       vpdpbsuds       ymm10, ymm9, YMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbsuds       ymm10, ymm9, YMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbsuds       ymm10, ymm9, YMMWORD PTR [rcx+4064]      #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbsuds       ymm10, ymm9, YMMWORD PTR [rdx-4096]      #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbsuds       xmm10, xmm9, XMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbsuds       xmm10, xmm9, XMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbsuds       xmm10, xmm9, XMMWORD PTR [rcx+2032]      #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbsuds       xmm10, xmm9, XMMWORD PTR [rdx-2048]      #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbuud        ymm10, ymm9, ymm8        #AVX-VNNI-INT8
> +       vpdpbuud        xmm10, xmm9, xmm8        #AVX-VNNI-INT8
> +       vpdpbuud        ymm10, ymm9, YMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbuud        ymm10, ymm9, YMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbuud        ymm10, ymm9, YMMWORD PTR [rcx+4064]      #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbuud        ymm10, ymm9, YMMWORD PTR [rdx-4096]      #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbuud        xmm10, xmm9, XMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbuud        xmm10, xmm9, XMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbuud        xmm10, xmm9, XMMWORD PTR [rcx+2032]      #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbuud        xmm10, xmm9, XMMWORD PTR [rdx-2048]      #AVX-VNNI-INT8 Disp32(00f8ffff)
> +       vpdpbuuds       ymm10, ymm9, ymm8        #AVX-VNNI-INT8
> +       vpdpbuuds       xmm10, xmm9, xmm8        #AVX-VNNI-INT8
> +       vpdpbuuds       ymm10, ymm9, YMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbuuds       ymm10, ymm9, YMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbuuds       ymm10, ymm9, YMMWORD PTR [rcx+4064]      #AVX-VNNI-INT8 Disp32(e00f0000)
> +       vpdpbuuds       ymm10, ymm9, YMMWORD PTR [rdx-4096]      #AVX-VNNI-INT8 Disp32(00f0ffff)
> +       vpdpbuuds       xmm10, xmm9, XMMWORD PTR [rbp+r14*8+0x10000000]  #AVX-VNNI-INT8
> +       vpdpbuuds       xmm10, xmm9, XMMWORD PTR [r9]    #AVX-VNNI-INT8
> +       vpdpbuuds       xmm10, xmm9, XMMWORD PTR [rcx+2032]      #AVX-VNNI-INT8 Disp32(f0070000)
> +       vpdpbuuds       xmm10, xmm9, XMMWORD PTR [rdx-2048]      #AVX-VNNI-INT8 Disp32(00f8ffff)
> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
> index ba232939d7..436d2e7a08 100644
> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -1132,6 +1132,8 @@ enum
>    PREFIX_VEX_0FF0,
>    PREFIX_VEX_0F3849_X86_64,
>    PREFIX_VEX_0F384B_X86_64,
> +  PREFIX_VEX_0F3850_W_0,
> +  PREFIX_VEX_0F3851_W_0,
>    PREFIX_VEX_0F385C_X86_64,
>    PREFIX_VEX_0F385E_X86_64,
>    PREFIX_VEX_0F38F5_L_0,
> @@ -4014,6 +4016,21 @@ static const struct dis386 prefix_table[][4] = {
>      { VEX_W_TABLE (VEX_W_0F384B_X86_64_P_3) },
>    },
>
> +  /* PREFIX_VEX_0F3850_W_0 */
> +  {
> +    { "vpdpbuud",      { XM, Vex, EXx }, 0 },
> +    { "vpdpbsud",      { XM, Vex, EXx }, 0 },
> +    { "%XVvpdpbusd",   { XM, Vex, EXx }, 0 },
> +    { "vpdpbssd",      { XM, Vex, EXx }, 0 },
> +  },
> +
> +  /* PREFIX_VEX_0F3851_W_0 */
> +  {
> +    { "vpdpbuuds",     { XM, Vex, EXx }, 0 },
> +    { "vpdpbsuds",     { XM, Vex, EXx }, 0 },
> +    { "%XVvpdpbusds",  { XM, Vex, EXx }, 0 },
> +    { "vpdpbssds",     { XM, Vex, EXx }, 0 },
> +  },
>    /* PREFIX_VEX_0F385C_X86_64 */
>    {
>      { Bad_Opcode },
> @@ -7575,11 +7592,11 @@ static const struct dis386 vex_w_table[][2] = {
>    },
>    {
>      /* VEX_W_0F3850 */
> -    { "%XVvpdpbusd",   { XM, Vex, EXx }, PREFIX_DATA },
> +    { PREFIX_TABLE (PREFIX_VEX_0F3850_W_0) },
>    },
>    {
>      /* VEX_W_0F3851 */
> -    { "%XVvpdpbusds",  { XM, Vex, EXx }, PREFIX_DATA },
> +    { PREFIX_TABLE (PREFIX_VEX_0F3851_W_0) },
>    },
>    {
>      /* VEX_W_0F3852 */
> diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
> index dd759fbc7c..21986220d6 100644
> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -249,6 +249,8 @@ static initializer cpu_flag_init[] =
>      "CpuPREFETCHI"},
>    { "CPU_AVX_IFMA_FLAGS",
>      "CPU_AVX2_FLAGS|CpuAVX_IFMA" },
> +  { "CPU_AVX_VNNI_INT8_FLAGS",
> +    "CPU_AVX2_FLAGS|CpuAVX_VNNI_INT8" },
>    { "CPU_IAMCU_FLAGS",
>      "Cpu186|Cpu286|Cpu386|Cpu486|Cpu586|CpuIAMCU" },
>    { "CPU_ADX_FLAGS",
> @@ -376,7 +378,7 @@ static initializer cpu_flag_init[] =
>    { "CPU_ANY_AVX_FLAGS",
>      "CPU_ANY_AVX2_FLAGS|CpuF16C|CpuFMA|CpuFMA4|CpuXOP|CpuAVX" },
>    { "CPU_ANY_AVX2_FLAGS",
> -    "CPU_ANY_AVX512F_FLAGS|CpuAVX2|CpuAVX_VNNI|CpuAVX_IFMA" },
> +    "CPU_ANY_AVX512F_FLAGS|CpuAVX2|CpuAVX_VNNI|CpuAVX_IFMA|CpuAVX_VNNI_INT8" },
>    { "CPU_ANY_AVX512F_FLAGS",
>      "CpuAVX512F|CpuAVX512CD|CpuAVX512ER|CpuAVX512PF|CpuAVX512DQ|CPU_ANY_AVX512BW_FLAGS|CpuAVX512VL|CpuAVX512IFMA|CpuAVX512VBMI|CpuAVX512_4FMAPS|CpuAVX512_4VNNIW|CpuAVX512_VPOPCNTDQ|CpuAVX512_VBMI2|CpuAVX512_VNNI|CpuAVX512_BITALG|CpuAVX512_BF16|CpuAVX512_VP2INTERSECT" },
>    { "CPU_ANY_AVX512CD_FLAGS",
> @@ -449,6 +451,8 @@ static initializer cpu_flag_init[] =
>      "CpuPREFETCHI" },
>    { "CPU_ANY_AVX_IFMA_FLAGS",
>      "CpuAVX_IFMA" },
> +  { "CPU_ANY_AVX_VNNI_INT8_FLAGS",
> +    "CpuAVX_VNNI_INT8" },
>  };
>
>  static initializer operand_type_init[] =
> @@ -652,6 +656,7 @@ static bitfield cpu_flags[] =
>    BITFIELD (CpuAVX512_FP16),
>    BITFIELD (CpuPREFETCHI),
>    BITFIELD (CpuAVX_IFMA),
> +  BITFIELD (CpuAVX_VNNI_INT8),
>    BITFIELD (CpuMWAITX),
>    BITFIELD (CpuCLZERO),
>    BITFIELD (CpuOSPKE),
> diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
> index 7cd601e924..905908749b 100644
> --- a/opcodes/i386-opc.h
> +++ b/opcodes/i386-opc.h
> @@ -213,6 +213,8 @@ enum
>    CpuPREFETCHI,
>    /* Intel AVX IFMA Instructions support required.  */
>    CpuAVX_IFMA,
> +  /* Intel AVX VNNI-INT8 Instructions support required.  */
> +  CpuAVX_VNNI_INT8,
>    /* mwaitx instruction required */
>    CpuMWAITX,
>    /* Clzero instruction required */
> @@ -296,7 +298,7 @@ enum
>
>  /* If you get a compiler error for zero width of the unused field,
>     comment it out.  */
> -#define CpuUnused      (CpuMax + 1)
> +/* #define CpuUnused   (CpuMax + 1) */
>
>  /* We can check if an instruction is available with array instead
>     of bitfield. */
> @@ -396,6 +398,7 @@ typedef union i386_cpu_flags
>        unsigned int cpuavx512_fp16:1;
>        unsigned int cpuprefetchi:1;
>        unsigned int cpuavx_ifma:1;
> +      unsigned int cpuavx_vnni_int8:1;
>        unsigned int cpumwaitx:1;
>        unsigned int cpuclzero:1;
>        unsigned int cpuospke:1;
> diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
> index 489a5335e2..77a5787c4b 100644
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -2888,6 +2888,17 @@ vpdpwssds, 0x6653, None, CpuAVX_VNNI, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckReg
>
>  // AVX_VNNI instructions end
>
> +// AVX-VNNI-INT8 instructions.
> +
> +vpdpbuud, 0x50, None, CpuAVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +vpdpbuuds, 0x51, None, CpuAVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +vpdpbssd, 0xf250, None, CpuAVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +vpdpbssds, 0xf251, None, CpuAVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +vpdpbsud, 0xf350, None, CpuAVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +vpdpbsuds, 0xf351, None, CpuAVX_VNNI_INT8, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> +
> +// AVX-VNNI-INT8 instructions end.
> +
>  // AVX512_BITALG instructions
>
>  vpopcnt<bw>, 0x6654, None, CpuAVX512_BITALG, Modrm|Masking=3|Space0F38|<bw:vexw>|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
> --
> 2.18.1
>

OK.

Thanks.

-- 
H.J.


Thread overview: 18+ messages
2022-10-31  3:05 [PATCH v4 0/6] Support Intel Sierra Forest Instructions Haochen Jiang
2022-10-31  3:05 ` [PATCH 1/6] Support Intel AVX-IFMA Haochen Jiang
2022-10-31 16:52   ` H.J. Lu
2022-10-31  3:05 ` [PATCH 2/6] Support Intel AVX-VNNI-INT8 Haochen Jiang
2022-10-31 16:53   ` H.J. Lu [this message]
2022-11-02 10:45   ` Jan Beulich
2022-10-31  3:05 ` [PATCH 3/6] Support Intel CMPccXADD Haochen Jiang
2022-10-31 16:54   ` H.J. Lu
2022-11-02 10:52   ` Jan Beulich
2022-11-02 16:25     ` H.J. Lu
2022-10-31  3:05 ` [PATCH 4/6] Add handler for more i386_cpu_flags Haochen Jiang
2022-10-31 16:54   ` H.J. Lu
2022-10-31  3:05 ` [PATCH 5/6] Support Intel WRMSRNS Haochen Jiang
2022-10-31 16:56   ` H.J. Lu
2022-11-02 10:56   ` Jan Beulich
2022-11-02 14:35   ` Jan Beulich
2022-10-31  3:05 ` [PATCH 6/6] Support Intel MSRLIST Haochen Jiang
2022-10-31 16:55   ` H.J. Lu
