From: Haochen Jiang <haochen.jiang@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com,
Hongyu Wang <hongyu.wang@intel.com>
Subject: [PATCH 03/12] [PATCH 2/2] AVX10.2: Support media instructions
Date: Mon, 19 Aug 2024 01:56:47 -0700 [thread overview]
Message-ID: <20240819085717.193256-4-haochen.jiang@intel.com> (raw)
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/avx10_2-512mediaintrin.h: Add new intrins.
* config/i386/avx10_2mediaintrin.h: Ditto.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/i386-builtins.cc (def_builtin): Handle shared
builtins between AVXVNNIINT16 and AVX10.2.
* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
Ditto.
* config/i386/sse.md (unspec): Add UNSPEC_VDPPHPS.
(<mask_codefor><sse4_1_avx2>_mpsadbw<mask_name>): New define_insn.
(avx10_2_mpsadbw<mask_name>): Ditto.
(vpdp<vpdpwprodtype>_<mode>): Add AVX10_2_256.
(vpdp<vpdpwprodtype>_v16si): New defin_insn.
(vpdp<vpdpwprodtype>_<mode>_mask): Ditto.
(*vpdp<vpdpwprodtype>_<mode>_maskz): Ditto.
(vpdp<vpdpwprodtype>_<mode>_maskz): New expander.
(vdpphps_<mode>): New define_insn.
(vdpphps_<mode>_mask): Ditto.
(*vdpphps_<mode>_maskz): Ditto.
(vdpphps_<mode>_maskz): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avxvnniint16-1.c: Add new macro test.
* gcc.target/i386/avx-1.c: Ditto.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-512-media-1.c: Add test.
* gcc.target/i386/avx10_2-media-1.c: Ditto.
* gcc.target/i386/avxvnniint16-builtin.c: New test.
* gcc.target/i386/avx10_2-512-vdpphps-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vmpsadbw-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwsud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwusd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwuud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-builtin-2.c: Ditto.
* gcc.target/i386/avx10_2-vdpphps-2.c: Ditto.
* gcc.target/i386/avx10_2-vmpsadbw-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuuds-2.c: Ditto.
Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
---
gcc/config/i386/avx10_2-512mediaintrin.h | 280 +++++++++++
gcc/config/i386/avx10_2mediaintrin.h | 472 ++++++++++++++++++
gcc/config/i386/i386-builtin.def | 76 ++-
gcc/config/i386/i386-builtins.cc | 11 +-
gcc/config/i386/i386-expand.cc | 3 +
gcc/config/i386/sse.md | 145 +++++-
gcc/testsuite/gcc.target/i386/avx-1.c | 8 +
.../gcc.target/i386/avx10_2-512-media-1.c | 60 +++
.../gcc.target/i386/avx10_2-512-vdpphps-2.c | 71 +++
.../gcc.target/i386/avx10_2-512-vmpsadbw-2.c | 93 ++++
.../gcc.target/i386/avx10_2-512-vpdpwsud-2.c | 71 +++
.../gcc.target/i386/avx10_2-512-vpdpwsuds-2.c | 74 +++
.../gcc.target/i386/avx10_2-512-vpdpwusd-2.c | 71 +++
.../gcc.target/i386/avx10_2-512-vpdpwusds-2.c | 74 +++
.../gcc.target/i386/avx10_2-512-vpdpwuud-2.c | 70 +++
.../gcc.target/i386/avx10_2-512-vpdpwuuds-2.c | 73 +++
.../gcc.target/i386/avx10_2-builtin-2.c | 8 +
.../gcc.target/i386/avx10_2-media-1.c | 112 +++++
.../gcc.target/i386/avx10_2-vdpphps-2.c | 16 +
.../gcc.target/i386/avx10_2-vmpsadbw-2.c | 16 +
.../gcc.target/i386/avx10_2-vpdpwsud-2.c | 16 +
.../gcc.target/i386/avx10_2-vpdpwsuds-2.c | 16 +
.../gcc.target/i386/avx10_2-vpdpwusd-2.c | 16 +
.../gcc.target/i386/avx10_2-vpdpwusds-2.c | 16 +
.../gcc.target/i386/avx10_2-vpdpwuud-2.c | 16 +
.../gcc.target/i386/avx10_2-vpdpwuuds-2.c | 16 +
.../gcc.target/i386/avxvnniint16-1.c | 42 +-
.../gcc.target/i386/avxvnniint16-builtin.c | 8 +
gcc/testsuite/gcc.target/i386/sse-13.c | 8 +
gcc/testsuite/gcc.target/i386/sse-14.c | 11 +
gcc/testsuite/gcc.target/i386/sse-22.c | 11 +
gcc/testsuite/gcc.target/i386/sse-23.c | 8 +
32 files changed, 1953 insertions(+), 35 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vdpphps-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vmpsadbw-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsud-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsuds-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusd-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusds-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuud-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuuds-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-builtin-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vdpphps-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vmpsadbw-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsud-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsuds-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusd-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusds-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuud-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuuds-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint16-builtin.c
diff --git a/gcc/config/i386/avx10_2-512mediaintrin.h b/gcc/config/i386/avx10_2-512mediaintrin.h
index 02d826b24cd..e471c83b1c4 100644
--- a/gcc/config/i386/avx10_2-512mediaintrin.h
+++ b/gcc/config/i386/avx10_2-512mediaintrin.h
@@ -226,6 +226,286 @@ _mm512_maskz_dpbuuds_epi32 (__mmask16 __U, __m512i __W,
(__mmask16) __U);
}
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_dpwsud_epi32 (__m512i __W, __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwsud512 ((__v16si) __W, (__v16si) __A, (__v16si) __B);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_dpwsud_epi32 (__m512i __W, __mmask16 __U,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwsud_v16si_mask ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_dpwsud_epi32 (__mmask16 __U, __m512i __W,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwsud_v16si_maskz ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_dpwsuds_epi32 (__m512i __W, __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwsuds512 ((__v16si) __W, (__v16si) __A, (__v16si) __B);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_dpwsuds_epi32 (__m512i __W, __mmask16 __U,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwsuds_v16si_mask ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_dpwsuds_epi32 (__mmask16 __U, __m512i __W,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwsuds_v16si_maskz ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_dpwusd_epi32 (__m512i __W, __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwusd512 ((__v16si) __W, (__v16si) __A, (__v16si) __B);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_dpwusd_epi32 (__m512i __W, __mmask16 __U,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwusd_v16si_mask ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_dpwusd_epi32 (__mmask16 __U, __m512i __W,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwusd_v16si_maskz ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_dpwusds_epi32 (__m512i __W, __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwusds512 ((__v16si) __W, (__v16si) __A, (__v16si) __B);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_dpwusds_epi32 (__m512i __W, __mmask16 __U,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwusds_v16si_mask ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_dpwusds_epi32 (__mmask16 __U, __m512i __W,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwusds_v16si_maskz ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_dpwuud_epi32 (__m512i __W, __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwuud512 ((__v16si) __W, (__v16si) __A, (__v16si) __B);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_dpwuud_epi32 (__m512i __W, __mmask16 __U,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwuud_v16si_mask ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_dpwuud_epi32 (__mmask16 __U, __m512i __W,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwuud_v16si_maskz ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_dpwuuds_epi32 (__m512i __W, __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwuuds512 ((__v16si) __W, (__v16si) __A, (__v16si) __B);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_dpwuuds_epi32 (__m512i __W, __mmask16 __U,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwuuds_v16si_mask ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_dpwuuds_epi32 (__mmask16 __U, __m512i __W,
+ __m512i __A, __m512i __B)
+{
+ return (__m512i)
+ __builtin_ia32_vpdpwuuds_v16si_maskz ((__v16si) __W,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_dpph_ps (__m512 __W, __m512h __A, __m512h __B)
+{
+ return (__m512)
+ __builtin_ia32_vdpphps512_mask ((__v16sf) __W,
+ (__v16sf) __A,
+ (__v16sf) __B,
+ (__mmask16) -1);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_dpph_ps (__m512 __W, __mmask16 __U, __m512h __A,
+ __m512h __B)
+{
+ return (__m512)
+ __builtin_ia32_vdpphps512_mask ((__v16sf) __W,
+ (__v16sf) __A,
+ (__v16sf) __B,
+ (__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_dpph_ps (__mmask16 __U, __m512 __W, __m512h __A,
+ __m512h __B)
+{
+ return (__m512)
+ __builtin_ia32_vdpphps512_maskz ((__v16sf) __W,
+ (__v16sf) __A,
+ (__v16sf) __B,
+ (__mmask16) __U);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mpsadbw_epu8 (__m512i __X, __m512i __Y, const int __M)
+{
+ return (__m512i) __builtin_ia32_mpsadbw512 ((__v64qi) __X,
+ (__v64qi) __Y,
+ __M);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_mpsadbw_epu8 (__m512i __W, __mmask32 __U, __m512i __X,
+ __m512i __Y, const int __M)
+{
+ return (__m512i) __builtin_ia32_mpsadbw512_mask ((__v64qi) __X,
+ (__v64qi) __Y,
+ __M,
+ (__v32hi) __W,
+ __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_mpsadbw_epu8 (__mmask32 __U, __m512i __X,
+ __m512i __Y, const int __M)
+{
+ return (__m512i) __builtin_ia32_mpsadbw512_mask ((__v64qi) __X,
+ (__v64qi) __Y,
+ __M,
+ (__v32hi) _mm512_setzero_epi32 (),
+ __U);
+}
+#else
+#define _mm512_mpsadbw_epu8(X, Y, M) \
+ (__m512i) __builtin_ia32_mpsadbw512 ((__v64qi)(__m512i)(X), \
+ (__v64qi)(__m512i)(Y), (int)(M))
+
+#define _mm512_mask_mpsadbw_epu8(W, U, X, Y, M) \
+ (__m512i) __builtin_ia32_mpsadbw512_mask ((__v64qi)(__m512i)(X), \
+ (__v64qi)(__m512i)(Y), \
+ (int)(M), \
+ (__v32hi)(__m512i)(W), \
+ (__mmask32)(U))
+
+#define _mm512_maskz_mpsadbw_epu8(U, X, Y, M) \
+ (__m512i) __builtin_ia32_mpsadbw512_mask ((__v64qi)(__m512i)(X), \
+ (__v64qi)(__m512i)(Y), \
+ (int)(M), \
+ (__v32hi) _mm512_setzero_epi32 (), \
+ (__mmask32)(U))
+#endif
+
#ifdef __DISABLE_AVX10_2_512__
#undef __DISABLE_AVX10_2_512__
#pragma GCC pop_options
diff --git a/gcc/config/i386/avx10_2mediaintrin.h b/gcc/config/i386/avx10_2mediaintrin.h
index e668af62e36..5456c185284 100644
--- a/gcc/config/i386/avx10_2mediaintrin.h
+++ b/gcc/config/i386/avx10_2mediaintrin.h
@@ -70,6 +70,42 @@
#define _mm256_dpbuuds_epi32(W, A, B) \
(__m256i) __builtin_ia32_vpdpbuuds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B))
+#define _mm_dpwsud_epi32(W, A, B) \
+ (__m128i) __builtin_ia32_vpdpwsud128 ((__v4si) (W), (__v4si) (A), (__v4si) (B))
+
+#define _mm_dpwsuds_epi32(W, A, B) \
+ (__m128i) __builtin_ia32_vpdpwsuds128 ((__v4si) (W), (__v4si) (A), (__v4si) (B))
+
+#define _mm_dpwusd_epi32(W, A, B) \
+ (__m128i) __builtin_ia32_vpdpwusd128 ((__v4si) (W), (__v4si) (A), (__v4si) (B))
+
+#define _mm_dpwusds_epi32(W, A, B) \
+ (__m128i) __builtin_ia32_vpdpwusds128 ((__v4si) (W), (__v4si) (A), (__v4si) (B))
+
+#define _mm_dpwuud_epi32(W, A, B) \
+ (__m128i) __builtin_ia32_vpdpwuud128 ((__v4si) (W), (__v4si) (A), (__v4si) (B))
+
+#define _mm_dpwuuds_epi32(W, A, B) \
+ (__m128i) __builtin_ia32_vpdpwuuds128 ((__v4si) (W), (__v4si) (A), (__v4si) (B))
+
+#define _mm256_dpwsud_epi32(W, A, B) \
+ (__m256i) __builtin_ia32_vpdpwsud256 ((__v8si) (W), (__v8si) (A), (__v8si) (B))
+
+#define _mm256_dpwsuds_epi32(W, A, B) \
+ (__m256i) __builtin_ia32_vpdpwsuds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B))
+
+#define _mm256_dpwusd_epi32(W, A, B) \
+ (__m256i) __builtin_ia32_vpdpwusd256 ((__v8si) (W), (__v8si) (A), (__v8si) (B))
+
+#define _mm256_dpwusds_epi32(W, A, B) \
+ (__m256i) __builtin_ia32_vpdpwusds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B))
+
+#define _mm256_dpwuud_epi32(W, A, B) \
+ (__m256i) __builtin_ia32_vpdpwuud256 ((__v8si) (W), (__v8si) (A), (__v8si) (B))
+
+#define _mm256_dpwuuds_epi32(W, A, B) \
+ (__m256i) __builtin_ia32_vpdpwuuds256 ((__v8si) (W), (__v8si) (A), (__v8si) (B))
+
extern __inline __m128i
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_mask_dpbssd_epi32 (__m128i __W, __mmask8 __U,
@@ -358,6 +394,442 @@ _mm256_maskz_dpbuuds_epi32 (__mmask8 __U, __m256i __W,
(__mmask8) __U);
}
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_dpwsud_epi32 (__m128i __W, __mmask8 __U,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwsud_v4si_mask ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_dpwsud_epi32 (__mmask8 __U, __m128i __W,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwsud_v4si_maskz ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_dpwsuds_epi32 (__m128i __W, __mmask8 __U,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwsuds_v4si_mask ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_dpwsuds_epi32 (__mmask8 __U, __m128i __W,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwsuds_v4si_maskz ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_dpwusd_epi32 (__m128i __W, __mmask8 __U,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwusd_v4si_mask ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_dpwusd_epi32 (__mmask8 __U, __m128i __W,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwusd_v4si_maskz ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_dpwusds_epi32 (__m128i __W, __mmask8 __U,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwusds_v4si_mask ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_dpwusds_epi32 (__mmask8 __U, __m128i __W,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwusds_v4si_maskz ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_dpwuud_epi32 (__m128i __W, __mmask8 __U,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwuud_v4si_mask ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_dpwuud_epi32 (__mmask8 __U, __m128i __W,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwuud_v4si_maskz ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_dpwuuds_epi32 (__m128i __W, __mmask8 __U,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwuuds_v4si_mask ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_dpwuuds_epi32 (__mmask8 __U, __m128i __W,
+ __m128i __A, __m128i __B)
+{
+ return (__m128i)
+ __builtin_ia32_vpdpwuuds_v4si_maskz ((__v4si) __W,
+ (__v4si) __A,
+ (__v4si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_dpwsud_epi32 (__m256i __W, __mmask8 __U,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwsud_v8si_mask ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_dpwsud_epi32 (__mmask8 __U, __m256i __W,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwsud_v8si_maskz ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_dpwsuds_epi32 (__m256i __W, __mmask8 __U,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwsuds_v8si_mask ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_dpwsuds_epi32 (__mmask8 __U, __m256i __W,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwsuds_v8si_maskz ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_dpwusd_epi32 (__m256i __W, __mmask8 __U,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwusd_v8si_mask ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_dpwusd_epi32 (__mmask8 __U, __m256i __W,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwusd_v8si_maskz ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_dpwusds_epi32 (__m256i __W, __mmask8 __U,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwusds_v8si_mask ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_dpwusds_epi32 (__mmask8 __U, __m256i __W,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwusds_v8si_maskz ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_dpwuud_epi32 (__m256i __W, __mmask8 __U,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwuud_v8si_mask ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_dpwuud_epi32 (__mmask8 __U, __m256i __W,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwuud_v8si_maskz ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_dpwuuds_epi32 (__m256i __W, __mmask8 __U,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwuuds_v8si_mask ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_dpwuuds_epi32 (__mmask8 __U, __m256i __W,
+ __m256i __A, __m256i __B)
+{
+ return (__m256i)
+ __builtin_ia32_vpdpwuuds_v8si_maskz ((__v8si) __W,
+ (__v8si) __A,
+ (__v8si) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_dpph_ps (__m256 __W, __m256h __A, __m256h __B)
+{
+ return (__m256)
+ __builtin_ia32_vdpphps256_mask ((__v8sf) __W,
+ (__v8sf) __A,
+ (__v8sf) __B,
+ (__mmask8) -1);
+}
+
+extern __inline __m256
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_dpph_ps (__m256 __W, __mmask8 __U, __m256h __A,
+ __m256h __B)
+{
+ return (__m256)
+ __builtin_ia32_vdpphps256_mask ((__v8sf) __W,
+ (__v8sf) __A,
+ (__v8sf) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m256
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_dpph_ps (__mmask8 __U, __m256 __W, __m256h __A,
+ __m256h __B)
+{
+ return (__m256)
+ __builtin_ia32_vdpphps256_maskz ((__v8sf) __W,
+ (__v8sf) __A,
+ (__v8sf) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_dpph_ps (__m128 __W, __m128h __A, __m128h __B)
+{
+ return (__m128)
+ __builtin_ia32_vdpphps128_mask ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) -1);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_dpph_ps (__m128 __W, __mmask8 __U, __m128h __A,
+ __m128h __B)
+{
+ return (__m128)
+ __builtin_ia32_vdpphps128_mask ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_dpph_ps (__mmask8 __U, __m128 __W, __m128h __A,
+ __m128h __B)
+{
+ return (__m128)
+ __builtin_ia32_vdpphps128_maskz ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_mpsadbw_epu8 (__m128i __W, __mmask8 __U, __m128i __X,
+ __m128i __Y, const int __M)
+{
+ return (__m128i) __builtin_ia32_mpsadbw128_mask ((__v16qi) __X,
+ (__v16qi) __Y,
+ __M,
+ (__v8hi) __W,
+ __U);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_mpsadbw_epu8 (__mmask8 __U, __m128i __X,
+ __m128i __Y, const int __M)
+{
+ return (__m128i) __builtin_ia32_mpsadbw128_mask ((__v16qi) __X,
+ (__v16qi) __Y,
+ __M,
+ (__v8hi) _mm_setzero_si128 (),
+ __U);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_mpsadbw_epu8 (__m256i __W, __mmask16 __U, __m256i __X,
+ __m256i __Y, const int __M)
+{
+ return (__m256i) __builtin_ia32_mpsadbw256_mask ((__v32qi) __X,
+ (__v32qi) __Y,
+ __M,
+ (__v16hi) __W,
+ __U);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_mpsadbw_epu8 (__mmask16 __U, __m256i __X,
+ __m256i __Y, const int __M)
+{
+ return (__m256i) __builtin_ia32_mpsadbw256_mask ((__v32qi) __X,
+ (__v32qi) __Y,
+ __M,
+ (__v16hi) _mm256_setzero_si256 (),
+ __U);
+}
+#else
+#define _mm_mask_mpsadbw_epu8(W, U, X, Y, M) \
+ (__m128i) __builtin_ia32_mpsadbw128_mask ((__v16qi)(__m128i)(X), \
+ (__v16qi)(__m128i)(Y), \
+ (int)(M), \
+ (__v8hi)(__m128i)(W), \
+ (__mmask8)(U))
+
+#define _mm_maskz_mpsadbw_epu8(U, X, Y, M) \
+ (__m128i) __builtin_ia32_mpsadbw128_mask ((__v16qi)(__m128i)(X), \
+ (__v16qi)(__m128i)(Y), \
+ (int)(M), \
+ (__v8hi) _mm_setzero_si128 (), \
+ (__mmask8)(U))
+
+#define _mm256_mask_mpsadbw_epu8(W, U, X, Y, M) \
+ (__m256i) __builtin_ia32_mpsadbw256_mask ((__v32qi)(__m256i)(X), \
+ (__v32qi)(__m256i)(Y), \
+ (int)(M), \
+ (__v16hi)(__m256i)(W), \
+ (__mmask16)(U))
+
+#define _mm256_maskz_mpsadbw_epu8(U, X, Y, M) \
+ (__m256i) __builtin_ia32_mpsadbw256_mask ((__v32qi)(__m256i)(X), \
+ (__v32qi)(__m256i)(Y), \
+ (int)(M), \
+ (__v16hi) _mm256_setzero_si256 (), \
+ (__mmask16)(U))
+
+#endif
#ifdef __DISABLE_AVX10_2_256__
#undef __DISABLE_AVX10_2_256__
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 5bd9aabdc52..cdf28cd261c 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2762,18 +2762,18 @@ BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_
BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v4si, "__builtin_ia32_vpdpbuuds128", IX86_BUILTIN_VPDPBUUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
/* AVXVNNIINT16 */
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwusd_v8si, "__builtin_ia32_vpdpwusd256", IX86_BUILTIN_VPDPWUSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwusds_v8si, "__builtin_ia32_vpdpwusds256", IX86_BUILTIN_VPDPWUSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwsud_v8si, "__builtin_ia32_vpdpwsud256", IX86_BUILTIN_VPDPWSUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwsuds_v8si, "__builtin_ia32_vpdpwsuds256", IX86_BUILTIN_VPDPWSUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuud_v8si, "__builtin_ia32_vpdpwuud256", IX86_BUILTIN_VPDPWUUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuuds_v8si, "__builtin_ia32_vpdpwuuds256", IX86_BUILTIN_VPDPWUUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwusd_v4si, "__builtin_ia32_vpdpwusd128", IX86_BUILTIN_VPDPWUSDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwusds_v4si, "__builtin_ia32_vpdpwusds128", IX86_BUILTIN_VPDPWUSDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwsud_v4si, "__builtin_ia32_vpdpwsud128", IX86_BUILTIN_VPDPWSUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwsuds_v4si, "__builtin_ia32_vpdpwsuds128", IX86_BUILTIN_VPDPWSUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuud_v4si, "__builtin_ia32_vpdpwuud128", IX86_BUILTIN_VPDPWUUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
-BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuuds_v4si, "__builtin_ia32_vpdpwuuds128", IX86_BUILTIN_VPDPWUUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v8si, "__builtin_ia32_vpdpwusd256", IX86_BUILTIN_VPDPWUSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v8si, "__builtin_ia32_vpdpwusds256", IX86_BUILTIN_VPDPWUSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v8si, "__builtin_ia32_vpdpwsud256", IX86_BUILTIN_VPDPWSUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v8si, "__builtin_ia32_vpdpwsuds256", IX86_BUILTIN_VPDPWSUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v8si, "__builtin_ia32_vpdpwuud256", IX86_BUILTIN_VPDPWUUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v8si, "__builtin_ia32_vpdpwuuds256", IX86_BUILTIN_VPDPWUUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v4si, "__builtin_ia32_vpdpwusd128", IX86_BUILTIN_VPDPWUSDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v4si, "__builtin_ia32_vpdpwusds128", IX86_BUILTIN_VPDPWUSDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v4si, "__builtin_ia32_vpdpwsud128", IX86_BUILTIN_VPDPWSUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v4si, "__builtin_ia32_vpdpwsuds128", IX86_BUILTIN_VPDPWSUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v4si, "__builtin_ia32_vpdpwuud128", IX86_BUILTIN_VPDPWUUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
+BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16 | OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v4si, "__builtin_ia32_vpdpwuuds128", IX86_BUILTIN_VPDPWUUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
/* VPCLMULQDQ */
BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpclmulqdq_v2di, "__builtin_ia32_vpclmulqdq_v2di", IX86_BUILTIN_VPCLMULQDQ2, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT)
@@ -3063,6 +3063,58 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuud_v4si_mask, "__builtin_
BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuud_v4si_maskz, "__builtin_ia32_vpdpbuud_v4si_maskz", IX86_BUILTIN_VPDPBUUDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v4si_mask, "__builtin_ia32_vpdpbuuds_v4si_mask", IX86_BUILTIN_VPDPBUUDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpbuuds_v4si_maskz, "__builtin_ia32_vpdpbuuds_v4si_maskz", IX86_BUILTIN_VPDPBUUDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsud_v16si, "__builtin_ia32_vpdpwsud512", IX86_BUILTIN_VPDPWSUDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsuds_v16si, "__builtin_ia32_vpdpwsuds512", IX86_BUILTIN_VPDPWSUDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusd_v16si, "__builtin_ia32_vpdpwusd512", IX86_BUILTIN_VPDPWUSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusds_v16si, "__builtin_ia32_vpdpwusds512", IX86_BUILTIN_VPDPWUSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuud_v16si, "__builtin_ia32_vpdpwuud512", IX86_BUILTIN_VPDPWUUDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuuds_v16si, "__builtin_ia32_vpdpwuuds512", IX86_BUILTIN_VPDPWUUDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsud_v16si_mask, "__builtin_ia32_vpdpwsud_v16si_mask", IX86_BUILTIN_VPDPWSUDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsud_v16si_maskz, "__builtin_ia32_vpdpwsud_v16si_maskz", IX86_BUILTIN_VPDPWSUDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsuds_v16si_mask, "__builtin_ia32_vpdpwsuds_v16si_mask", IX86_BUILTIN_VPDPWSUDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwsuds_v16si_maskz, "__builtin_ia32_vpdpwsuds_v16si_maskz", IX86_BUILTIN_VPDPWSUDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusd_v16si_mask, "__builtin_ia32_vpdpwusd_v16si_mask", IX86_BUILTIN_VPDPWUSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusd_v16si_maskz, "__builtin_ia32_vpdpwusd_v16si_maskz", IX86_BUILTIN_VPDPWUSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusds_v16si_mask, "__builtin_ia32_vpdpwusds_v16si_mask", IX86_BUILTIN_VPDPWUSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwusds_v16si_maskz, "__builtin_ia32_vpdpwusds_v16si_maskz", IX86_BUILTIN_VPDPWUSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuud_v16si_mask, "__builtin_ia32_vpdpwuud_v16si_mask", IX86_BUILTIN_VPDPWUUDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuud_v16si_maskz, "__builtin_ia32_vpdpwuud_v16si_maskz", IX86_BUILTIN_VPDPWUUDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuuds_v16si_mask, "__builtin_ia32_vpdpwuuds_v16si_mask", IX86_BUILTIN_VPDPWUUDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vpdpwuuds_v16si_maskz, "__builtin_ia32_vpdpwuuds_v16si_maskz", IX86_BUILTIN_VPDPWUUDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v8si_mask, "__builtin_ia32_vpdpwsud_v8si_mask", IX86_BUILTIN_VPDPWSUDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v8si_maskz, "__builtin_ia32_vpdpwsud_v8si_maskz", IX86_BUILTIN_VPDPWSUDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v8si_mask, "__builtin_ia32_vpdpwsuds_v8si_mask", IX86_BUILTIN_VPDPWSUDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v8si_maskz, "__builtin_ia32_vpdpwsuds_v8si_maskz", IX86_BUILTIN_VPDPWSUDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v8si_mask, "__builtin_ia32_vpdpwusd_v8si_mask", IX86_BUILTIN_VPDPWUSDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v8si_maskz, "__builtin_ia32_vpdpwusd_v8si_maskz", IX86_BUILTIN_VPDPWUSDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v8si_mask, "__builtin_ia32_vpdpwusds_v8si_mask", IX86_BUILTIN_VPDPWUSDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v8si_maskz, "__builtin_ia32_vpdpwusds_v8si_maskz", IX86_BUILTIN_VPDPWUSDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v8si_mask, "__builtin_ia32_vpdpwuud_v8si_mask", IX86_BUILTIN_VPDPWUUDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v8si_maskz, "__builtin_ia32_vpdpwuud_v8si_maskz", IX86_BUILTIN_VPDPWUUDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v8si_mask, "__builtin_ia32_vpdpwuuds_v8si_mask", IX86_BUILTIN_VPDPWUUDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v8si_maskz, "__builtin_ia32_vpdpwuuds_v8si_maskz", IX86_BUILTIN_VPDPWUUDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v4si_mask, "__builtin_ia32_vpdpwsud_v4si_mask", IX86_BUILTIN_VPDPWSUDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsud_v4si_maskz, "__builtin_ia32_vpdpwsud_v4si_maskz", IX86_BUILTIN_VPDPWSUDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v4si_mask, "__builtin_ia32_vpdpwsuds_v4si_mask", IX86_BUILTIN_VPDPWSUDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwsuds_v4si_maskz, "__builtin_ia32_vpdpwsuds_v4si_maskz", IX86_BUILTIN_VPDPWSUDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v4si_mask, "__builtin_ia32_vpdpwusd_v4si_mask", IX86_BUILTIN_VPDPWUSDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusd_v4si_maskz, "__builtin_ia32_vpdpwusd_v4si_maskz", IX86_BUILTIN_VPDPWUSDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v4si_mask, "__builtin_ia32_vpdpwusds_v4si_mask", IX86_BUILTIN_VPDPWUSDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwusds_v4si_maskz, "__builtin_ia32_vpdpwusds_v4si_maskz", IX86_BUILTIN_VPDPWUSDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v4si_mask, "__builtin_ia32_vpdpwuud_v4si_mask", IX86_BUILTIN_VPDPWUUDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuud_v4si_maskz, "__builtin_ia32_vpdpwuud_v4si_maskz", IX86_BUILTIN_VPDPWUUDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v4si_mask, "__builtin_ia32_vpdpwuuds_v4si_mask", IX86_BUILTIN_VPDPWUUDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vpdpwuuds_v4si_maskz, "__builtin_ia32_vpdpwuuds_v4si_maskz", IX86_BUILTIN_VPDPWUUDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vdpphps_v16sf_mask, "__builtin_ia32_vdpphps512_mask", IX86_BUILTIN_VDPPHPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vdpphps_v16sf_maskz, "__builtin_ia32_vdpphps512_maskz", IX86_BUILTIN_VDPPHPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vdpphps_v8sf_mask, "__builtin_ia32_vdpphps256_mask", IX86_BUILTIN_VDPPHPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vdpphps_v8sf_maskz, "__builtin_ia32_vdpphps256_maskz", IX86_BUILTIN_VDPPHPS256_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vdpphps_v4sf_mask, "__builtin_ia32_vdpphps128_mask", IX86_BUILTIN_VDPPHPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vdpphps_v4sf_maskz, "__builtin_ia32_vdpphps128_maskz", IX86_BUILTIN_VDPPHPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mpsadbw, "__builtin_ia32_mpsadbw512", IX86_BUILTIN_AVX10_2_MPSADBW, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mpsadbw_mask, "__builtin_ia32_mpsadbw512_mask", IX86_BUILTIN_VMPSADBW_V32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx2_mpsadbw_mask, "__builtin_ia32_mpsadbw256_mask", IX86_BUILTIN_VMPSADBW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V32QI_V32QI_INT_V16HI_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_sse4_1_mpsadbw_mask, "__builtin_ia32_mpsadbw128_mask", IX86_BUILTIN_VMPSADBW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V16QI_V16QI_INT_V8HI_UQI)
/* Builtins with rounding support. */
BDESC_END (ARGS, ROUND_ARGS)
diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index 130ba853125..4286eeb80e6 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -280,17 +280,18 @@ def_builtin (HOST_WIDE_INT mask, HOST_WIDE_INT mask2,
if (((mask2 == 0 || (mask2 & ix86_isa_flags2) != 0)
&& (mask == 0 || (mask & ix86_isa_flags) != 0))
|| ((mask & OPTION_MASK_ISA_MMX) != 0 && TARGET_MMX_WITH_SSE)
- /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES/AVXVNNIINT8
- intrinsics or AVX512VNNIVL/AVX512IFMAVL/VAESVL/AVX10.2 non-mask
- intrinsics should be defined whenever avxvnni/avxifma/aes/
- avxvnniint8 or avx512vnni && avx512vl/avx512ifma && avx512vl/vaes
- && avx512vl/avx10.2 exist. */
+ /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES/
+ AVXVNNIINT{8,16} intrinsics or AVX512VNNIVL/AVX512IFMAVL/VAESVL/
+ AVX10.2 non-mask intrinsics should be defined whenever avxvnni/
+ avxifma/aes/avxvnniint{8,16} or avx512vnni && avx512vl/avx512ifma
+ && avx512vl/vaes && avx512vl/avx10.2 exist. */
|| (mask2 == OPTION_MASK_ISA2_AVXVNNI)
|| (mask2 == OPTION_MASK_ISA2_AVXIFMA)
|| (mask2 == (OPTION_MASK_ISA2_AVXNECONVERT
| OPTION_MASK_ISA2_AVX512BF16))
|| ((mask2 & OPTION_MASK_ISA2_VAES) != 0)
|| ((mask2 & OPTION_MASK_ISA2_AVXVNNIINT8) != 0)
+ || ((mask2 & OPTION_MASK_ISA2_AVXVNNIINT16) != 0)
|| (lang_hooks.builtin_function
== lang_hooks.builtin_function_ext_scope))
{
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 200b768f5d9..f1e6bc11f86 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -13299,6 +13299,7 @@ ix86_check_builtin_isa_match (unsigned int fcode,
OPTION_MASK_ISA2_AVXNECONVERT
OPTION_MASK_ISA_AES or (OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA2_VAES)
OPTION_MASK_ISA2_AVX10_2 or OPTION_MASK_ISA2_AVXVNNIINT8
+ OPTION_MASK_ISA2_AVX10_2 or OPTION_MASK_ISA2_AVXVNNIINT16
where for each such pair it is sufficient if either of the ISAs is
enabled, plus if it is ored with other options also those others.
OPTION_MASK_ISA_MMX in bisa is satisfied also if TARGET_MMX_WITH_SSE. */
@@ -13326,6 +13327,8 @@ ix86_check_builtin_isa_match (unsigned int fcode,
OPTION_MASK_ISA2_VAES);
SHARE_BUILTIN (0, OPTION_MASK_ISA2_AVXVNNIINT8, 0,
OPTION_MASK_ISA2_AVX10_2_256);
+ SHARE_BUILTIN (0, OPTION_MASK_ISA2_AVXVNNIINT16, 0,
+ OPTION_MASK_ISA2_AVX10_2_256);
isa = tmp_isa;
isa2 = tmp_isa2;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 41d448f57cb..6f76e8f50ad 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -214,6 +214,8 @@
UNSPEC_SM4KEY4
UNSPEC_SM4RNDS4
+ ;; For AVX10.2 suppport
+ UNSPEC_VDPPHPS
])
(define_c_enum "unspecv" [
@@ -465,6 +467,9 @@
(define_mode_iterator VF1_AVX512VL
[(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
+(define_mode_iterator VF1_AVX10_2
+ [(V16SF "TARGET_AVX10_2_512") V8SF V4SF])
+
(define_mode_iterator VHFBF
[(V32HF "TARGET_EVEX512") V16HF V8HF
(V32BF "TARGET_EVEX512") V16BF V8BF])
@@ -23555,6 +23560,31 @@
(set_attr "znver1_decode" "vector,vector,vector")
(set_attr "mode" "<sseinsnmode>")])
+(define_insn "avx10_2_mpsadbw<mask_name>"
+ [(set (match_operand:V64QI 0 "register_operand" "=v")
+ (unspec:V64QI
+ [(match_operand:V64QI 1 "register_operand" "v")
+ (match_operand:V64QI 2 "vector_operand" "vm")
+ (match_operand:SI 3 "const_0_to_255_operand" "n")]
+ UNSPEC_MPSADBW))]
+ "TARGET_AVX10_2_512"
+ "vmpsadbw\t{%3, %2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2, %3}"
+ [(set_attr "length_immediate" "1")
+ (set_attr "prefix" "evex")])
+
+(define_insn "<mask_codefor><sse4_1_avx2>_mpsadbw<mask_name>"
+ [(set (match_operand:VI1 0 "register_operand" "=v")
+ (unspec:VI1
+ [(match_operand:VI1 1 "register_operand" "v")
+ (match_operand:VI1 2 "vector_operand" "vm")
+ (match_operand:SI 3 "const_0_to_255_operand" "n")]
+ UNSPEC_MPSADBW))]
+ "TARGET_AVX10_2_256"
+ "vmpsadbw\t{%3, %2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2, %3}"
+ [(set_attr "length_immediate" "1")
+ (set_attr "prefix" "evex")
+ (set_attr "mode" "<sseinsnmode>")])
+
(define_insn "<sse4_1_avx2>_packusdw<mask_name>"
[(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=Yr,*x,<v_Yw>")
(unspec:VI2_AVX2_AVX512BW
@@ -31438,13 +31468,116 @@
})
(define_insn "vpdp<vpdpwprodtype>_<mode>"
- [(set (match_operand:VI4_AVX 0 "register_operand" "=x")
+ [(set (match_operand:VI4_AVX 0 "register_operand" "=v")
(unspec:VI4_AVX
[(match_operand:VI4_AVX 1 "register_operand" "0")
- (match_operand:VI4_AVX 2 "register_operand" "x")
- (match_operand:VI4_AVX 3 "nonimmediate_operand" "xjm")]
+ (match_operand:VI4_AVX 2 "register_operand" "v")
+ (match_operand:VI4_AVX 3 "nonimmediate_operand" "vm")]
VPDPWPROD))]
- "TARGET_AVXVNNIINT16"
+ "TARGET_AVXVNNIINT16 || TARGET_AVX10_2_256"
+ "vpdp<vpdpwprodtype>\t{%3, %2, %0|%0, %2, %3}"
+ [(set_attr "prefix" "maybe_evex")])
+
+(define_insn "vpdp<vpdpwprodtype>_v16si"
+ [(set (match_operand:V16SI 0 "register_operand" "=v")
+ (unspec:V16SI
+ [(match_operand:V16SI 1 "register_operand" "0")
+ (match_operand:V16SI 2 "register_operand" "v")
+ (match_operand:V16SI 3 "nonimmediate_operand" "vm")]
+ VPDPWPROD))]
+ "TARGET_AVX10_2_512"
"vpdp<vpdpwprodtype>\t{%3, %2, %0|%0, %2, %3}"
- [(set_attr "prefix" "vex")
- (set_attr "addr" "gpr16")])
+ [(set_attr "prefix" "evex")])
+
+(define_insn "vpdp<vpdpwprodtype>_<mode>_mask"
+ [(set (match_operand:VI4_AVX10_2 0 "register_operand" "=v")
+ (vec_merge:VI4_AVX10_2
+ (unspec:VI4_AVX10_2
+ [(match_operand:VI4_AVX10_2 1 "register_operand" "0")
+ (match_operand:VI4_AVX10_2 2 "register_operand" "v")
+ (match_operand:VI4_AVX10_2 3 "nonimmediate_operand" "vm")]
+ VPDPWPROD)
+ (match_dup 1)
+ (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
+ "TARGET_AVX10_2_256"
+ "vpdp<vpdpwprodtype>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+ [(set_attr "prefix" "evex")])
+
+(define_expand "vpdp<vpdpwprodtype>_<mode>_maskz"
+ [(set (match_operand:VI4_AVX10_2 0 "register_operand")
+ (vec_merge:VI4_AVX10_2
+ (unspec:VI4_AVX10_2
+ [(match_operand:VI4_AVX10_2 1 "register_operand")
+ (match_operand:VI4_AVX10_2 2 "register_operand")
+ (match_operand:VI4_AVX10_2 3 "nonimmediate_operand")]
+ VPDPWPROD)
+ (match_dup 5)
+ (match_operand:<avx512fmaskmode> 4 "register_operand")))]
+ "TARGET_AVX10_2_256"
+ "operands[5] = CONST0_RTX (<MODE>mode);")
+
+(define_insn "*vpdp<vpdpwprodtype>_<mode>_maskz"
+ [(set (match_operand:VI4_AVX10_2 0 "register_operand" "=v")
+ (vec_merge:VI4_AVX10_2
+ (unspec:VI4_AVX10_2
+ [(match_operand:VI4_AVX10_2 1 "register_operand" "0")
+ (match_operand:VI4_AVX10_2 2 "register_operand" "v")
+ (match_operand:VI4_AVX10_2 3 "nonimmediate_operand" "vm")]
+ VPDPWPROD)
+ (match_operand:VI4_AVX10_2 5 "const0_operand" "C")
+ (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
+ "TARGET_AVX10_2_256"
+ "vpdp<vpdpwprodtype>\t{%3, %2, %0%{%4%}%N5|%0%{%4%}%N5, %2, %3}"
+ [(set_attr "prefix" "evex")])
+
+(define_insn "vdpphps_<mode>"
+ [(set (match_operand:VF1_AVX10_2 0 "register_operand" "=v")
+ (unspec:VF1_AVX10_2
+ [(match_operand:VF1_AVX10_2 1 "register_operand" "0")
+ (match_operand:VF1_AVX10_2 2 "register_operand" "v")
+ (match_operand:VF1_AVX10_2 3 "nonimmediate_operand" "vm")]
+ UNSPEC_VDPPHPS))]
+ "TARGET_AVX10_2_256"
+ "vdpphps\t{%3, %2, %0|%0, %2, %3}"
+ [(set_attr "prefix" "evex")])
+
+(define_insn "vdpphps_<mode>_mask"
+ [(set (match_operand:VF1_AVX10_2 0 "register_operand" "=v")
+ (vec_merge:VF1_AVX10_2
+ (unspec:VF1_AVX10_2
+ [(match_operand:VF1_AVX10_2 1 "register_operand" "0")
+ (match_operand:VF1_AVX10_2 2 "register_operand" "v")
+ (match_operand:VF1_AVX10_2 3 "nonimmediate_operand" "vm")]
+ UNSPEC_VDPPHPS)
+ (match_dup 1)
+ (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
+ "TARGET_AVX10_2_256"
+ "vdpphps\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+ [(set_attr "prefix" "evex")])
+
+(define_expand "vdpphps_<mode>_maskz"
+ [(match_operand:VF1_AVX10_2 0 "register_operand")
+ (match_operand:VF1_AVX10_2 1 "register_operand")
+ (match_operand:VF1_AVX10_2 2 "register_operand")
+ (match_operand:VF1_AVX10_2 3 "nonimmediate_operand")
+ (match_operand:<avx512fmaskmode> 4 "register_operand")]
+ "TARGET_AVX10_2_256"
+{
+ emit_insn (gen_vdpphps_<mode>_maskz_1 (operands[0], operands[1],
+ operands[2], operands[3], CONST0_RTX(<MODE>mode), operands[4]));
+ DONE;
+})
+
+(define_insn "vdpphps_<mode>_maskz_1"
+ [(set (match_operand:VF1_AVX10_2 0 "register_operand" "=v")
+ (vec_merge:VF1_AVX10_2
+ (unspec:VF1_AVX10_2
+ [(match_operand:VF1_AVX10_2 1 "register_operand" "0")
+ (match_operand:VF1_AVX10_2 2 "register_operand" "v")
+ (match_operand:VF1_AVX10_2 3 "nonimmediate_operand" "vm")]
+ UNSPEC_VDPPHPS)
+ (match_operand:VF1_AVX10_2 4 "const0_operand" "C")
+ (match_operand:<avx512fmaskmode> 5 "register_operand" "Yk")))]
+ "TARGET_AVX10_2_256"
+ "vdpphps\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3}"
+ [(set_attr "prefix" "evex")])
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index f64d0c88264..5fc84234b57 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -1002,6 +1002,14 @@
#define __builtin_ia32_subph256_mask_round(A, B, C, D, E) __builtin_ia32_subph256_mask_round(A, B, C, D, 8)
#define __builtin_ia32_subps256_mask_round(A, B, C, D, E) __builtin_ia32_subps256_mask_round(A, B, C, D, 8)
+/* avx10_2-512mediaintrin.h */
+#define __builtin_ia32_mpsadbw512(A, B, C) __builtin_ia32_mpsadbw512 (A, B, 1)
+#define __builtin_ia32_mpsadbw512_mask(A, B, C, D, E) __builtin_ia32_mpsadbw512_mask (A, B, 1, D, E)
+
+/* avx10_2mediaintrin.h */
+#define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E)
+#define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E)
+
#include <wmmintrin.h>
#include <immintrin.h>
#include <mm3dnow.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c
index d4145c41a99..00df32194e5 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-media-1.c
@@ -18,11 +18,39 @@
/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
#include <immintrin.h>
+volatile __m512 a;
+volatile __m512h b,c;
volatile __m512i x,y,z,z1;
volatile __mmask16 m16;
+volatile __mmask32 m32;
void avx10_2_512_test (void)
{
@@ -49,4 +77,36 @@ void avx10_2_512_test (void)
x = _mm512_dpbuuds_epi32 (x, y, z);
x = _mm512_mask_dpbuuds_epi32 (x, m16, y, z);
x = _mm512_maskz_dpbuuds_epi32 (m16, x, y, z);
+
+ x = _mm512_dpwsud_epi32 (x, y, z);
+ x = _mm512_mask_dpwsud_epi32 (x, m16, y, z);
+ x = _mm512_maskz_dpwsud_epi32 (m16, x, y, z);
+
+ x = _mm512_dpwsuds_epi32 (x, y, z);
+ x = _mm512_mask_dpwsuds_epi32 (x, m16, y, z);
+ x = _mm512_maskz_dpwsuds_epi32 (m16, x, y, z);
+
+ x = _mm512_dpwusd_epi32 (x, y, z);
+ x = _mm512_mask_dpwusd_epi32 (x, m16, y, z);
+ x = _mm512_maskz_dpwusd_epi32 (m16, x, y, z);
+
+ x = _mm512_dpwusds_epi32 (x, y, z);
+ x = _mm512_mask_dpwusds_epi32 (x, m16, y, z);
+ x = _mm512_maskz_dpwusds_epi32 (m16, x, y, z);
+
+ x = _mm512_dpwuud_epi32 (x, y, z);
+ x = _mm512_mask_dpwuud_epi32 (x, m16, y, z);
+ x = _mm512_maskz_dpwuud_epi32 (m16, x, y, z);
+
+ x = _mm512_dpwuuds_epi32 (x, y, z);
+ x = _mm512_mask_dpwuuds_epi32 (x, m16, y, z);
+ x = _mm512_maskz_dpwuuds_epi32 (m16, x, y, z);
+
+ a = _mm512_dpph_ps (a, b, c);
+ a = _mm512_mask_dpph_ps (a, m16, b, c);
+ a = _mm512_maskz_dpph_ps (m16, a, b, c);
+
+ x = _mm512_mpsadbw_epu8 (x, y, 1);
+ x = _mm512_mask_mpsadbw_epu8 (x, m32, y, z, 1);
+ x = _mm512_maskz_mpsadbw_epu8 (m32, x, y, 1);
}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vdpphps-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vdpphps-2.c
new file mode 100644
index 00000000000..9b73a298fb9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vdpphps-2.c
@@ -0,0 +1,71 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+
+#define SRC_SIZE AVX512F_LEN / 16
+#define SIZE AVX512F_LEN / 32
+
+static void
+CALC (float *dest, _Float16 *src1, _Float16 *src2)
+{
+ int i;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ dest[i] += (float) src1[2 * i + 1] * (float) src2[2 * i + 1];
+ dest[i] += (float) src1[2 * i] * (float) src2[2 * i];
+ }
+}
+
+void
+TEST(void)
+{
+ UNION_TYPE (AVX512F_LEN, h) src1, src2;
+ UNION_TYPE (AVX512F_LEN,) res1, res2, res3;
+ MASK_TYPE mask = MASK_VALUE;
+ float res_ref[SIZE], res_ref2[SIZE], res_ref3[SIZE];
+
+ for (int i = 0; i < SRC_SIZE; i++)
+ {
+ src1.a[i] = (_Float16) (i * 4) + 1.25f16;
+ src2.a[i] = (_Float16) (i * 2) + 2.5f16;
+ }
+
+ for (int i = 0; i < SIZE; i++)
+ {
+ res1.a[i] = 3.125f + 2 * i;
+ res_ref[i] = 3.125f + 2 * i;
+ res2.a[i] = DEFAULT_VALUE;
+ res3.a[i] = DEFAULT_VALUE;
+ res_ref2[i] = DEFAULT_VALUE;
+ res_ref3[i] = DEFAULT_VALUE;
+ }
+
+ res1.x = INTRINSIC (_dpph_ps) (res1.x, src1.x, src2.x);
+ res2.x = INTRINSIC (_mask_dpph_ps) (res2.x, mask, src1.x, src2.x);
+ res3.x = INTRINSIC (_maskz_dpph_ps) (mask, res3.x, src1.x, src2.x);
+
+ CALC(res_ref, src1.a, src2.a);
+ CALC(res_ref2, src1.a, src2.a);
+ CALC(res_ref3, src1.a, src2.a);
+
+ if (UNION_CHECK(AVX512F_LEN,) (res1, res_ref))
+ abort ();
+
+ MASK_MERGE () (res_ref2, mask, SIZE);
+ if (UNION_CHECK(AVX512F_LEN,) (res2, res_ref2))
+ abort ();
+
+ MASK_ZERO () (res_ref3, mask, SIZE);
+ if (UNION_CHECK(AVX512F_LEN,) (res3, res_ref3))
+ abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vmpsadbw-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmpsadbw-2.c
new file mode 100644
index 00000000000..3cedab490fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmpsadbw-2.c
@@ -0,0 +1,93 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+
+#define SIZE (AVX512F_LEN / 8)
+#define SIZE_RES (AVX512F_LEN / 16)
+
+
+static void
+CALC (short* dst, char* src1, char* src2, int cont)
+{
+ int blk2_pos, blk1_pos, i, j, k, c;
+ char blk1[12], blk2[4], x;
+ short tmp[4], s;
+
+ for (k = 0; k < AVX512F_LEN / 128; k++)
+ {
+ c = cont & 0xff;
+ if (k % 2 == 1)
+ c >>= 3;
+ blk2_pos = (c & 3) * 4;
+ blk1_pos = ((c >> 2) & 1) * 4;
+
+ for (i = 0; i < 11; i++)
+ blk1[i] = src1[16 * k + i + blk1_pos];
+
+ for (i = 0; i < 4; i++)
+ blk2[i] = src2[16 * k + i + blk2_pos];
+
+ for (i = 0; i < 8; i++)
+ {
+ for (j = 0; j < 4; j++)
+ {
+ x = blk1[j + i] - blk2[j];
+ tmp[j] = x > 0 ? x : -x;
+ }
+
+ s = 0;
+ for (j = 0; j < 4; j++)
+ s += tmp[j];
+ dst[8 * k + i] = s;
+ }
+ }
+}
+
+void
+TEST (void)
+{
+ int i;
+ UNION_TYPE (AVX512F_LEN, i_w) res1, res2, res3;
+ UNION_TYPE (AVX512F_LEN, i_b) src1;
+ UNION_TYPE (AVX512F_LEN, i_b) src2;
+ MASK_TYPE mask = MASK_VALUE;
+ short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+ for (i = 0; i < SIZE; i++)
+ {
+ src1.a[i] = 10 + 2 * i;
+ src2.a[i] = 3 * i;
+ }
+
+ for (i = 0; i < SIZE_RES; i++)
+ {
+ res1.a[i] = 0x7FFF;
+ res2.a[i] = DEFAULT_VALUE;
+ res3.a[i] = DEFAULT_VALUE;
+ }
+
+ CALC (res_ref, src1.a, src2.a, 0x21);
+ CALC (res_ref2, src1.a, src2.a, 0x21);
+
+ res1.x = INTRINSIC (_mpsadbw_epu8) (src1.x, src2.x, 0x21);
+ res2.x = INTRINSIC (_mask_mpsadbw_epu8) (res2.x, mask, src1.x, src2.x, 0x21);
+ res3.x = INTRINSIC (_maskz_mpsadbw_epu8) (mask, src1.x, src2.x, 0x21);
+
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res1, res_ref))
+ abort ();
+
+ MASK_MERGE (i_w) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res2, res_ref2))
+ abort ();
+
+ MASK_ZERO (i_w) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res3, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsud-2.c
new file mode 100644
index 00000000000..1643f6f0803
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsud-2.c
@@ -0,0 +1,71 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+
+#define SIZE (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN / 32)
+
+
+static void
+CALC (int *r, int *dst, short *s1, unsigned short *s2)
+{
+ int tempres[SIZE];
+ for (int i = 0; i < SIZE; i++)
+ tempres[i] = (int) s1[i] * (unsigned int) s2[i];
+ for (int i = 0; i < SIZE_RES; i++)
+ {
+ long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1];
+ r[i] = test;
+ }
+}
+
+void
+TEST (void)
+{
+ int i;
+ UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3;
+ UNION_TYPE (AVX512F_LEN, i_w) src1;
+ UNION_TYPE (AVX512F_LEN, i_uw) src2;
+ MASK_TYPE mask = MASK_VALUE;
+ int res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+ for (i = 0; i < SIZE; i++)
+ {
+ int sign = i % 2 ? 1 : -1;
+ src1.a[i] = sign * (10 + 3 * i * i);
+ src2.a[i] = sign * 10 * i * i;
+ }
+
+ for (i = 0; i < SIZE_RES; i++)
+ {
+ res1.a[i] = 0x7FFFFFFF;
+ res2.a[i] = DEFAULT_VALUE;
+ res3.a[i] = DEFAULT_VALUE;
+ }
+
+ CALC (res_ref, res1.a, src1.a, src2.a);
+ CALC (res_ref2, res2.a, src1.a, src2.a);
+
+ res1.x = INTRINSIC (_dpwsud_epi32) (res1.x, src1.x, src2.x);
+ res2.x = INTRINSIC (_mask_dpwsud_epi32) (res2.x, mask, src1.x, src2.x);
+ res3.x = INTRINSIC (_maskz_dpwsud_epi32) (mask, res3.x, src1.x, src2.x);
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+ abort ();
+
+ MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2))
+ abort ();
+
+ MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsuds-2.c
new file mode 100644
index 00000000000..7c959119a2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwsuds-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+
+#define SIZE (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN / 32)
+
+
+static void
+CALC (int *r, int *dst, short *s1, unsigned short *s2)
+{
+ int tempres[SIZE];
+ for (int i = 0; i < SIZE; i++)
+ tempres[i] = (int) s1[i] * (unsigned int) s2[i];
+ for (int i = 0; i < SIZE_RES; i++)
+ {
+ long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1];
+ long long max_int = 0x7FFFFFFF;
+ if (test > max_int)
+ test = max_int;
+ r[i] = test;
+ }
+}
+
+void
+TEST (void)
+{
+ int i;
+ UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3;
+ UNION_TYPE (AVX512F_LEN, i_w) src1;
+ UNION_TYPE (AVX512F_LEN, i_uw) src2;
+ MASK_TYPE mask = MASK_VALUE;
+ int res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+ for (i = 0; i < SIZE; i++)
+ {
+ int sign = i % 2 ? 1 : -1;
+ src1.a[i] = sign * (10 + 3 * i * i);
+ src2.a[i] = sign * 10 * i * i;
+ }
+
+ for (i = 0; i < SIZE_RES; i++)
+ {
+ res1.a[i] = 0x7FFFFFFF;
+ res2.a[i] = DEFAULT_VALUE;
+ res3.a[i] = DEFAULT_VALUE;
+ }
+
+ CALC (res_ref, res1.a, src1.a, src2.a);
+ CALC (res_ref2, res2.a, src1.a, src2.a);
+
+ res1.x = INTRINSIC (_dpwsuds_epi32) (res1.x, src1.x, src2.x);
+ res2.x = INTRINSIC (_mask_dpwsuds_epi32) (res2.x, mask, src1.x, src2.x);
+ res3.x = INTRINSIC (_maskz_dpwsuds_epi32) (mask, res3.x, src1.x, src2.x);
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+ abort ();
+
+ MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2))
+ abort ();
+
+ MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusd-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusd-2.c
new file mode 100644
index 00000000000..b780e41bfba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusd-2.c
@@ -0,0 +1,71 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+
+#define SIZE (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN / 32)
+
+
+static void
+CALC (int *r, int *dst, unsigned short *s1, short *s2)
+{
+ int tempres[SIZE];
+ for (int i = 0; i < SIZE; i++)
+ tempres[i] = (unsigned int) s1[i] * (int) s2[i];
+ for (int i = 0; i < SIZE_RES; i++)
+ {
+ long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1];
+ r[i] = test;
+ }
+}
+
+void
+TEST (void)
+{
+ int i;
+ UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3;
+ UNION_TYPE (AVX512F_LEN, i_uw) src1;
+ UNION_TYPE (AVX512F_LEN, i_w) src2;
+ MASK_TYPE mask = MASK_VALUE;
+ int res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+ for (i = 0; i < SIZE; i++)
+ {
+ int sign = i % 2 ? 1 : -1;
+ src1.a[i] = sign * 10 * i * i;
+ src2.a[i] = 10 + 3 * i * i + sign;
+ }
+
+ for (i = 0; i < SIZE_RES; i++)
+ {
+ res1.a[i] = 0x7FFFFFFF;
+ res2.a[i] = DEFAULT_VALUE;
+ res3.a[i] = DEFAULT_VALUE;
+ }
+
+ CALC (res_ref, res1.a, src1.a, src2.a);
+ CALC (res_ref2, res2.a, src1.a, src2.a);
+
+ res1.x = INTRINSIC (_dpwusd_epi32) (res1.x, src1.x, src2.x);
+ res2.x = INTRINSIC (_mask_dpwusd_epi32) (res2.x, mask, src1.x, src2.x);
+ res3.x = INTRINSIC (_maskz_dpwusd_epi32) (mask, res3.x, src1.x, src2.x);
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+ abort ();
+
+ MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2))
+ abort ();
+
+ MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusds-2.c
new file mode 100644
index 00000000000..922d4b37ab8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwusds-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+
+#define SIZE (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN / 32)
+
+
+static void
+CALC (int *r, int *dst, unsigned short *s1, short *s2)
+{
+ int tempres[SIZE];
+ for (int i = 0; i < SIZE; i++)
+ tempres[i] = (unsigned int) s1[i] * (int) s2[i];
+ for (int i = 0; i < SIZE_RES; i++)
+ {
+ long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1];
+ long long max_int = 0x7FFFFFFF;
+ if (test > max_int)
+ test = max_int;
+ r[i] = test;
+ }
+}
+
+void
+TEST (void)
+{
+ int i;
+ UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3;
+ UNION_TYPE (AVX512F_LEN, i_uw) src1;
+ UNION_TYPE (AVX512F_LEN, i_w) src2;
+ MASK_TYPE mask = MASK_VALUE;
+ int res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+ for (i = 0; i < SIZE; i++)
+ {
+ int sign = i % 2 ? 1 : -1;
+ src1.a[i] = sign * 10 * i * i;
+ src2.a[i] = 10 + 3 * i * i + sign;
+ }
+
+ for (i = 0; i < SIZE_RES; i++)
+ {
+ res1.a[i] = 0x7FFFFFFF;
+ res2.a[i] = DEFAULT_VALUE;
+ res3.a[i] = DEFAULT_VALUE;
+ }
+
+ CALC (res_ref, res1.a, src1.a, src2.a);
+ CALC (res_ref2, res2.a, src1.a, src2.a);
+
+ res1.x = INTRINSIC (_dpwusds_epi32) (res1.x, src1.x, src2.x);
+ res2.x = INTRINSIC (_mask_dpwusds_epi32) (res2.x, mask, src1.x, src2.x);
+ res3.x = INTRINSIC (_maskz_dpwusds_epi32) (mask, res3.x, src1.x, src2.x);
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+ abort ();
+
+ MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2))
+ abort ();
+
+ MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuud-2.c
new file mode 100644
index 00000000000..d9f5dba8dff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuud-2.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+
+#define SIZE (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN / 32)
+
+
+static void
+CALC (int *r, int *dst, unsigned short *s1, unsigned short *s2)
+{
+ unsigned int tempres[SIZE];
+ for (int i = 0; i < SIZE; i++)
+ tempres[i] = (unsigned int) s1[i] * (unsigned int) s2[i];
+ for (int i = 0; i < SIZE_RES; i++)
+ {
+ long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1];
+ r[i] = test;
+ }
+}
+
+void
+TEST (void)
+{
+ int i;
+ UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3;
+ UNION_TYPE (AVX512F_LEN, i_uw) src1;
+ UNION_TYPE (AVX512F_LEN, i_uw) src2;
+ MASK_TYPE mask = MASK_VALUE;
+ int res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+ for (i = 0; i < SIZE; i++)
+ {
+ src1.a[i] = 10 + 3 * i * i;
+ src2.a[i] = 10 * i * i;
+ }
+
+ for (i = 0; i < SIZE_RES; i++)
+ {
+ res1.a[i] = 0x7FFFFFFF;
+ res2.a[i] = DEFAULT_VALUE;
+ res3.a[i] = DEFAULT_VALUE;
+ }
+
+ CALC (res_ref, res1.a, src1.a, src2.a);
+ CALC (res_ref2, res2.a, src1.a, src2.a);
+
+ res1.x = INTRINSIC (_dpwuud_epi32) (res1.x, src1.x, src2.x);
+ res2.x = INTRINSIC (_mask_dpwuud_epi32) (res2.x, mask, src1.x, src2.x);
+ res3.x = INTRINSIC (_maskz_dpwuud_epi32) (mask, res3.x, src1.x, src2.x);
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+ abort ();
+
+ MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2))
+ abort ();
+
+ MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuuds-2.c
new file mode 100644
index 00000000000..da3c82bd4cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vpdpwuuds-2.c
@@ -0,0 +1,73 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+
+#include "avx10-helper.h"
+
+#define SIZE (AVX512F_LEN / 16)
+#define SIZE_RES (AVX512F_LEN / 32)
+
+
+static void
+CALC (int *r, int *dst, unsigned short *s1, unsigned short *s2)
+{
+ unsigned int tempres[SIZE];
+ for (int i = 0; i < SIZE; i++)
+ tempres[i] = (unsigned int) s1[i] * (unsigned int) s2[i];
+ for (int i = 0; i < SIZE_RES; i++)
+ {
+ long long test = (long long) dst[i] + tempres[i * 2] + tempres[i * 2 + 1];
+ long long max_uint = 0xFFFFFFFF;
+ if (test > max_uint)
+ test = max_uint;
+ r[i] = test;
+ }
+}
+
+void
+TEST (void)
+{
+ int i;
+ UNION_TYPE (AVX512F_LEN, i_d) res1, res2, res3;
+ UNION_TYPE (AVX512F_LEN, i_uw) src1;
+ UNION_TYPE (AVX512F_LEN, i_uw) src2;
+ MASK_TYPE mask = MASK_VALUE;
+ int res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+ for (i = 0; i < SIZE; i++)
+ {
+ src1.a[i] = 10 + 3 * i * i;
+ src2.a[i] = 10 * i * i;
+ }
+
+ for (i = 0; i < SIZE_RES; i++)
+ {
+ res1.a[i] = 0x7FFFFFFF;
+ res2.a[i] = DEFAULT_VALUE;
+ res3.a[i] = DEFAULT_VALUE;
+ }
+
+ CALC (res_ref, res1.a, src1.a, src2.a);
+ CALC (res_ref2, res2.a, src1.a, src2.a);
+
+ res1.x = INTRINSIC (_dpwuuds_epi32) (res1.x, src1.x, src2.x);
+ res2.x = INTRINSIC (_mask_dpwuuds_epi32) (res2.x, mask, src1.x, src2.x);
+ res3.x = INTRINSIC (_maskz_dpwuuds_epi32) (mask, res3.x, src1.x, src2.x);
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+ abort ();
+
+ MASK_MERGE (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref2))
+ abort ();
+
+ MASK_ZERO (i_d) (res_ref2, mask, SIZE_RES);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-builtin-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-builtin-2.c
new file mode 100644
index 00000000000..521768e92b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-builtin-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx10.2 -mno-avxvnniint16" } */
+typedef int v8si __attribute__ ((vector_size (32)));
+v8si
+foo (v8si a, v8si b, v8si c)
+{
+ return __builtin_ia32_vpdpwsud256 (a, b, c);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c
index c2b3e5527d9..1be3605b81c 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-media-1.c
@@ -36,11 +36,62 @@
/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdpphps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmpsadbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
#include <immintrin.h>
+volatile __m256 a;
+volatile __m256h b,c;
volatile __m256i x,y,z;
+volatile __m128 a_;
+volatile __m128h b_,c_;
volatile __m128i x_,y_,z_;
+volatile __mmask16 m16;
volatile __mmask8 m;
void extern
@@ -93,4 +144,65 @@ avx10_2_test (void)
x_ = _mm_dpbuuds_epi32 (x_, y_, z_);
x_ = _mm_mask_dpbuuds_epi32 (x_, m, y_, z_);
x_ = _mm_maskz_dpbuuds_epi32 (m, x_, y_, z_);
+
+ x = _mm256_dpwsud_epi32 (x, y, z);
+ x = _mm256_mask_dpwsud_epi32 (x, m, y, z);
+ x = _mm256_maskz_dpwsud_epi32 (m, x, y, z);
+
+ x_ = _mm_dpwsud_epi32 (x_, y_, z_);
+ x_ = _mm_mask_dpwsud_epi32 (x_, m, y_, z_);
+ x_ = _mm_maskz_dpwsud_epi32 (m, x_, y_, z_);
+
+ x = _mm256_dpwsuds_epi32 (x, y, z);
+ x = _mm256_mask_dpwsuds_epi32 (x, m, y, z);
+ x = _mm256_maskz_dpwsuds_epi32 (m, x, y, z);
+
+ x_ = _mm_dpwsuds_epi32 (x_, y_, z_);
+ x_ = _mm_mask_dpwsuds_epi32 (x_, m, y_, z_);
+ x_ = _mm_maskz_dpwsuds_epi32 (m, x_, y_, z_);
+
+ x = _mm256_dpwusd_epi32 (x, y, z);
+ x = _mm256_mask_dpwusd_epi32 (x, m, y, z);
+ x = _mm256_maskz_dpwusd_epi32 (m, x, y, z);
+
+ x_ = _mm_dpwusd_epi32 (x_, y_, z_);
+ x_ = _mm_mask_dpwusd_epi32 (x_, m, y_, z_);
+ x_ = _mm_maskz_dpwusd_epi32 (m, x_, y_, z_);
+
+ x = _mm256_dpwusds_epi32 (x, y, z);
+ x = _mm256_mask_dpwusds_epi32 (x, m, y, z);
+ x = _mm256_maskz_dpwusds_epi32 (m, x, y, z);
+
+ x_ = _mm_dpwusds_epi32 (x_, y_, z_);
+ x_ = _mm_mask_dpwusds_epi32 (x_, m, y_, z_);
+ x_ = _mm_maskz_dpwusds_epi32 (m, x_, y_, z_);
+
+ x = _mm256_dpwuud_epi32 (x, y, z);
+ x = _mm256_mask_dpwuud_epi32 (x, m, y, z);
+ x = _mm256_maskz_dpwuud_epi32 (m, x, y, z);
+
+ x_ = _mm_dpwuud_epi32 (x_, y_, z_);
+ x_ = _mm_mask_dpwuud_epi32 (x_, m, y_, z_);
+ x_ = _mm_maskz_dpwuud_epi32 (m, x_, y_, z_);
+
+ x = _mm256_dpwuuds_epi32 (x, y, z);
+ x = _mm256_mask_dpwuuds_epi32 (x, m, y, z);
+ x = _mm256_maskz_dpwuuds_epi32 (m, x, y, z);
+
+ x_ = _mm_dpwuuds_epi32 (x_, y_, z_);
+ x_ = _mm_mask_dpwuuds_epi32 (x_, m, y_, z_);
+ x_ = _mm_maskz_dpwuuds_epi32 (m, x_, y_, z_);
+
+ a = _mm256_dpph_ps (a, b, c);
+ a = _mm256_mask_dpph_ps (a, m, b, c);
+ a = _mm256_maskz_dpph_ps (m, a, b, c);
+
+ a_ = _mm_dpph_ps (a_, b_, c_);
+ a_ = _mm_mask_dpph_ps (a_, m, b_, c_);
+ a_ = _mm_maskz_dpph_ps (m, a_, b_, c_);
+
+ x = _mm256_mask_mpsadbw_epu8 (x, m16, y, z, 1);
+ x = _mm256_maskz_mpsadbw_epu8 (m16, x, y, 1);
+ x_ = _mm_mask_mpsadbw_epu8 (x_, m, y_, z_, 1);
+ x_ = _mm_maskz_mpsadbw_epu8 (m, x_, y_, 1);
}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vdpphps-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vdpphps-2.c
new file mode 100644
index 00000000000..26d98b70590
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vdpphps-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vdpphps-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vdpphps-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmpsadbw-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vmpsadbw-2.c
new file mode 100644
index 00000000000..746ea7baacb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmpsadbw-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vmpsadbw-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vmpsadbw-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsud-2.c
new file mode 100644
index 00000000000..e1c7a81b54f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsud-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwsud-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwsud-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsuds-2.c
new file mode 100644
index 00000000000..d046fd8747a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwsuds-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwsuds-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwsuds-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusd-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusd-2.c
new file mode 100644
index 00000000000..5a8af9b8728
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusd-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwusd-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwusd-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusds-2.c
new file mode 100644
index 00000000000..88d877f381a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwusds-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwusds-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwusds-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuud-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuud-2.c
new file mode 100644
index 00000000000..aaefe02d29d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuud-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwuud-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwuud-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuuds-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuuds-2.c
new file mode 100644
index 00000000000..6a61112e161
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vpdpwuuds-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwuuds-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vpdpwuuds-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint16-1.c b/gcc/testsuite/gcc.target/i386/avxvnniint16-1.c
index 6ae57b150fe..5a093c97351 100644
--- a/gcc/testsuite/gcc.target/i386/avxvnniint16-1.c
+++ b/gcc/testsuite/gcc.target/i386/avxvnniint16-1.c
@@ -1,17 +1,17 @@
/* { dg-do compile } */
/* { dg-options "-mavxvnniint16 -O2" } */
-/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwusd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwusds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpdpwuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
#include <immintrin.h>
@@ -40,4 +40,22 @@ avxvnniint16_test (void)
x = _mm256_dpwuuds_avx_epi32 (x, y, z);
x_ = _mm_dpwuuds_avx_epi32 (x_, y_, z_);
+
+ x = _mm256_dpwusd_epi32 (x, y, z);
+ x_ = _mm_dpwusd_epi32 (x_, y_, z_);
+
+ x = _mm256_dpwusds_epi32 (x, y, z);
+ x_ = _mm_dpwusds_epi32 (x_, y_, z_);
+
+ x = _mm256_dpwsud_epi32 (x, y, z);
+ x_ = _mm_dpwsud_epi32 (x_, y_, z_);
+
+ x = _mm256_dpwsuds_epi32 (x, y, z);
+ x_ = _mm_dpwsuds_epi32 (x_, y_, z_);
+
+ x = _mm256_dpwuud_epi32 (x, y, z);
+ x_ = _mm_dpwuud_epi32 (x_, y_, z_);
+
+ x = _mm256_dpwuuds_epi32 (x, y, z);
+ x_ = _mm_dpwuuds_epi32 (x_, y_, z_);
}
diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint16-builtin.c b/gcc/testsuite/gcc.target/i386/avxvnniint16-builtin.c
new file mode 100644
index 00000000000..10e9b643920
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avxvnniint16-builtin.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavxvnniint16 -mno-avx10.2" } */
+typedef int v8si __attribute__ ((vector_size (32)));
+v8si
+foo (v8si a, v8si b, v8si c)
+{
+ return __builtin_ia32_vpdpwsud256 (a, b, c);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index a5b1775ed2d..6b1c9e545f0 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1010,4 +1010,12 @@
#define __builtin_ia32_subph256_mask_round(A, B, C, D, E) __builtin_ia32_subph256_mask_round(A, B, C, D, 8)
#define __builtin_ia32_subps256_mask_round(A, B, C, D, E) __builtin_ia32_subps256_mask_round(A, B, C, D, 8)
+/* avx10_2-512mediaintrin.h */
+#define __builtin_ia32_mpsadbw512(A, B, C) __builtin_ia32_mpsadbw512 (A, B, 1)
+#define __builtin_ia32_mpsadbw512_mask(A, B, C, D, E) __builtin_ia32_mpsadbw512_mask (A, B, 1, D, E)
+
+/* avx10_2mediaintrin.h */
+#define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E)
+#define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E)
+
#include <x86intrin.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 4736b2a5d52..6dfdaa96c76 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1371,3 +1371,14 @@ test_4x (_mm256_mask_fixupimm_round_pd, __m256d, __m256d, __mmask8, __m256d, __m
test_4x (_mm256_mask_fixupimm_round_ps, __m256, __m256, __mmask8, __m256, __m256i, 3, 8)
test_4x (_mm256_mask_range_round_pd, __m256d, __m256d, __mmask8, __m256d, __m256d, 15, 8)
test_4x (_mm256_mask_range_round_ps, __m256, __m256, __mmask8, __m256, __m256, 15, 8)
+
+/* avx10_2-512mediaintrin.h */
+test_2 (_mm512_mpsadbw_epu8, __m512i, __m512i, __m512i, 1)
+test_3 (_mm512_maskz_mpsadbw_epu8, __m512i, __mmask32, __m512i, __m512i, 1)
+test_4 (_mm512_mask_mpsadbw_epu8, __m512i, __m512i, __mmask32, __m512i, __m512i, 1)
+
+/* avx10_2mediaintrin.h */
+test_3 (_mm_maskz_mpsadbw_epu8, __m128i, __mmask8, __m128i, __m128i, 1)
+test_3 (_mm256_maskz_mpsadbw_epu8, __m256i, __mmask16, __m256i, __m256i, 1)
+test_4 (_mm_mask_mpsadbw_epu8, __m128i, __m128i, __mmask8, __m128i, __m128i, 1)
+test_4 (_mm256_mask_mpsadbw_epu8, __m256i, __m256i, __mmask16, __m256i, __m256i, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 5bfccd52630..102b6b878c8 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -1410,3 +1410,14 @@ test_4x (_mm256_mask_fixupimm_round_pd, __m256d, __m256d, __mmask8, __m256d, __m
test_4x (_mm256_mask_fixupimm_round_ps, __m256, __m256, __mmask8, __m256, __m256i, 3, 8)
test_4x (_mm256_mask_range_round_pd, __m256d, __m256d, __mmask8, __m256d, __m256d, 15, 8)
test_4x (_mm256_mask_range_round_ps, __m256, __m256, __mmask8, __m256, __m256, 15, 8)
+
+/* avx10_2-512mediaintrin.h */
+test_2 (_mm512_mpsadbw_epu8, __m512i, __m512i, __m512i, 1)
+test_3 (_mm512_maskz_mpsadbw_epu8, __m512i, __mmask32, __m512i, __m512i, 1)
+test_4 (_mm512_mask_mpsadbw_epu8, __m512i, __m512i, __mmask32, __m512i, __m512i, 1)
+
+/* avx10_2mediaintrin.h */
+test_3 (_mm_maskz_mpsadbw_epu8, __m128i, __mmask8, __m128i, __m128i, 1)
+test_3 (_mm256_maskz_mpsadbw_epu8, __m256i, __mmask16, __m256i, __m256i, 1)
+test_4 (_mm_mask_mpsadbw_epu8, __m128i, __m128i, __mmask8, __m128i, __m128i, 1)
+test_4 (_mm256_mask_mpsadbw_epu8, __m256i, __m256i, __mmask16, __m256i, __m256i, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index e63c100f452..962b9507283 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -984,6 +984,14 @@
#define __builtin_ia32_subph256_mask_round(A, B, C, D, E) __builtin_ia32_subph256_mask_round(A, B, C, D, 8)
#define __builtin_ia32_subps256_mask_round(A, B, C, D, E) __builtin_ia32_subps256_mask_round(A, B, C, D, 8)
+/* avx10_2-512mediaintrin.h */
+#define __builtin_ia32_mpsadbw512(A, B, C) __builtin_ia32_mpsadbw512 (A, B, 1)
+#define __builtin_ia32_mpsadbw512_mask(A, B, C, D, E) __builtin_ia32_mpsadbw512_mask (A, B, 1, D, E)
+
+/* avx10_2-mediaintrin.h */
+#define __builtin_ia32_mpsadbw128_mask(A, B, C, D, E) __builtin_ia32_mpsadbw128_mask (A, B, 1, D, E)
+#define __builtin_ia32_mpsadbw256_mask(A, B, C, D, E) __builtin_ia32_mpsadbw256_mask (A, B, 1, D, E)
+
#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,sha,xsavec,xsaves,clflushopt,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,vpclmulqdq,pconfig,wbnoinvd,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avxifma,avxvnniint8,avxneconvert,cmpccxadd,amx-fp16,prefetchi,raoint,amx-complex,avxvnniint16,sm3,sha512,sm4,avx10.2-512")
#include <x86intrin.h>
--
2.43.5
next prev parent reply other threads:[~2024-08-19 8:57 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-19 8:56 [PATCH 00/12] AVX10.2: Support new instructions Haochen Jiang
2024-08-19 8:56 ` [PATCH 01/12] i386: Refactor m512-check.h Haochen Jiang
2024-08-19 8:56 ` [PATCH 02/12] [PATCH 1/2] AVX10.2: Support media instructions Haochen Jiang
2024-08-19 8:56 ` Haochen Jiang [this message]
2024-08-19 8:56 ` [PATCH 04/12] AVX10.2: Support convert instructions Haochen Jiang
2024-08-19 8:56 ` [PATCH 05/12] [PATCH 1/2] AVX10.2: Support BF16 instructions Haochen Jiang
2024-08-19 8:56 ` [PATCH 06/12] [PATCH 2/2] " Haochen Jiang
2024-08-19 8:56 ` [PATCH 07/12] [PATCH 1/2] AVX10.2: Support saturating convert instructions Haochen Jiang
2024-08-19 8:56 ` [PATCH 08/12] [PATCH 2/2] " Haochen Jiang
2024-08-19 9:02 ` [PATCH 09/12] AVX10.2: Support minmax instructions Haochen Jiang
2024-08-19 9:03 ` [PATCH 10/12] AVX10.2: Support vector copy instructions Haochen Jiang
2024-08-19 9:03 ` [PATCH 11/12] AVX10.2: Support compare instructions Haochen Jiang
2024-08-19 9:03 ` [PATCH 12/12] i386: Add bf8 -> fp16 intrin Haochen Jiang
2024-08-26 1:45 ` [PATCH 00/12] AVX10.2: Support new instructions Hongtao Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240819085717.193256-4-haochen.jiang@intel.com \
--to=haochen.jiang@intel.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=hongtao.liu@intel.com \
--cc=hongyu.wang@intel.com \
--cc=ubizjak@gmail.com \
--cc=zewei.mo@pitt.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).