public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Haochen Jiang <haochen.jiang@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, zewei.mo@pitt.edu, ubizjak@gmail.com,
	konglin1 <lingling.kong@intel.com>, Levy Hsu <admin@levyhsu.com>
Subject: [PATCH 05/12] [PATCH 1/2] AVX10.2: Support BF16 instructions
Date: Mon, 19 Aug 2024 01:56:49 -0700	[thread overview]
Message-ID: <20240819085717.193256-6-haochen.jiang@intel.com> (raw)
In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com>

From: konglin1 <lingling.kong@intel.com>

gcc/ChangeLog:

	* config.gcc: Add avx10_2-512bf16intrin.h and avx10_2bf16intrin.h.
	* config/i386/i386-builtin-types.def : Add new
	DEF_FUNCTION_TYPE for V32BF_FTYPE_V32BF_V32BF,
	V16BF_FTYPE_V16BF_V16BF, V8BF_FTYPE_V8BF_V8BF,
	V8BF_FTYPE_V8BF_V8BF_UQI, V16BF_FTYPE_V16BF_V16BF_UHI,
	V32BF_FTYPE_V32BF_V32BF_USI, V32BF_FTYPE_V32BF_V32BF_V32BF_USI,
	V8BF_FTYPE_V8BF_V8BF_V8BF_UQI and V16BF_FTYPE_V16BF_V16BF_V16BF_UHI.
	* config/i386/i386-builtin.def (BDESC): Add new builtins.
	* config/i386/i386-expand.cc (ix86_expand_args_builtin):
	Handle new DEF_FUNCTION_TYPE.
	* config/i386/immintrin.h: Include avx10_2-512bf16intrin.h and
	avx10_2bf16intrin.h.
	* config/i386/sse.md
	(avx10_2_scalefpbf16_<mode><mask_name>): New define_insn.
	(avx10_2_<code>nepbf16_<mode><mask_name>): Ditto.
	(avx10_2_<insn>nepbf16_<mode><mask_name>): Ditto.
	(avx10_2_<bf16nefma132_213>pbf16_<mode>_maskz): Ditto.
	(avx10_2_<bf16nefma132_213>pbf16_<mode><mask_name>): Ditto.
	(avx10_2_<bf16nefma_231>pbf16_<mode>_mask3): Ditto.
	* config/i386/avx10_2-512bf16intrin.h: New file.
	* config/i386/avx10_2bf16intrin.h: Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512f-helper.h: Add MAKE_MASK_MERGE and MAKE_MASK_ZERO
	for bf16_uw.
	* gcc.target/i386/m512-check.h: Add union512bf16_uw, union256bf16_uw,
	union128bf16_uw and CHECK_EXP for them.
	* gcc.target/i386/avx10-helper.h: New file.
	* gcc.target/i386/avx10_2-512-bf16ne-1.c: New test.
	* gcc.target/i386/avx10_2-512-vaddnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vminpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vscalefpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vsubnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-bf16ne-1.c: Ditto.
	* gcc.target/i386/avx10_2-vaddnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vdivnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vmaxpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vminpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vmulnepbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vscalefpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vsubnepbf16-2.c: Ditto.

Co-authored-by: Levy Hsu <admin@levyhsu.com>
---
 gcc/config.gcc                                |   2 +-
 gcc/config/i386/avx10_2-512bf16intrin.h       | 364 ++++++++++
 gcc/config/i386/avx10_2bf16intrin.h           | 685 ++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def        |   9 +
 gcc/config/i386/i386-builtin.def              |  78 ++
 gcc/config/i386/i386-expand.cc                |   9 +
 gcc/config/i386/immintrin.h                   |   4 +
 gcc/config/i386/sse.md                        | 293 ++++++++
 gcc/testsuite/gcc.target/i386/avx10-helper.h  |  48 +-
 .../gcc.target/i386/avx10_2-512-bf16-1.c      |  87 +++
 .../i386/avx10_2-512-vaddnepbf16-2.c          |  49 ++
 .../i386/avx10_2-512-vdivnepbf16-2.c          |  49 ++
 .../i386/avx10_2-512-vfmaddXXXnepbf16-2.c     |  52 ++
 .../i386/avx10_2-512-vfmsubXXXnepbf16-2.c     |  53 ++
 .../i386/avx10_2-512-vfnmaddXXXnepbf16-2.c    |  53 ++
 .../i386/avx10_2-512-vfnmsubXXXnepbf16-2.c    |  53 ++
 .../gcc.target/i386/avx10_2-512-vmaxpbf16-2.c |  51 ++
 .../gcc.target/i386/avx10_2-512-vminpbf16-2.c |  51 ++
 .../i386/avx10_2-512-vmulnepbf16-2.c          |  49 ++
 .../i386/avx10_2-512-vscalefpbf16-2.c         |  51 ++
 .../i386/avx10_2-512-vsubnepbf16-2.c          |  49 ++
 .../gcc.target/i386/avx10_2-bf16-1.c          | 172 +++++
 .../gcc.target/i386/avx10_2-vaddnepbf16-2.c   |  16 +
 .../gcc.target/i386/avx10_2-vdivnepbf16-2.c   |  16 +
 .../i386/avx10_2-vfmaddXXXnepbf16-2.c         |  16 +
 .../i386/avx10_2-vfmsubXXXnepbf16-2.c         |  16 +
 .../i386/avx10_2-vfnmaddXXXnepbf16-2.c        |  16 +
 .../i386/avx10_2-vfnmsubXXXnepbf16-2.c        |  16 +
 .../gcc.target/i386/avx10_2-vmaxpbf16-2.c     |  16 +
 .../gcc.target/i386/avx10_2-vminpbf16-2.c     |  16 +
 .../gcc.target/i386/avx10_2-vmulnepbf16-2.c   |  16 +
 .../gcc.target/i386/avx10_2-vscalefpbf16-2.c  |  16 +
 .../gcc.target/i386/avx10_2-vsubnepbf16-2.c   |  16 +
 .../gcc.target/i386/avx512f-helper.h          |   2 +
 gcc/testsuite/gcc.target/i386/m512-check.h    |  27 +
 35 files changed, 2514 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/i386/avx10_2-512bf16intrin.h
 create mode 100644 gcc/config/i386/avx10_2bf16intrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vaddnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vdivnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vmaxpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vminpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vmulnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-512-vsubnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vaddnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vdivnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vmaxpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vminpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vmulnepbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vscalefpbf16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vsubnepbf16-2.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5e9c36a2aad..7d761b257cd 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -454,7 +454,7 @@ i[34567]86-*-* | x86_64-*-*)
 		       sm3intrin.h sha512intrin.h sm4intrin.h
 		       usermsrintrin.h avx10_2roundingintrin.h
 		       avx10_2mediaintrin.h avx10_2-512mediaintrin.h
-		       avx10_2convertintrin.h avx10_2-512convertintrin.h"
+		       avx10_2bf16intrin.h avx10_2-512bf16intrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx10_2-512bf16intrin.h b/gcc/config/i386/avx10_2-512bf16intrin.h
new file mode 100644
index 00000000000..b409ea17adb
--- /dev/null
+++ b/gcc/config/i386/avx10_2-512bf16intrin.h
@@ -0,0 +1,364 @@
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2-512bf16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AVX10_2_512BF16INTRIN_H_INCLUDED
+#define _AVX10_2_512BF16INTRIN_H_INCLUDED
+
+#if !defined (__AVX10_2_512__)
+#pragma GCC push_options
+#pragma GCC target("avx10.2-512")
+#define __DISABLE_AVX10_2_512__
+#endif /* __AVX10_2_512__ */
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_addne_pbh (__m512bh __A, __m512bh __B)
+{
+  return (__m512bh) __builtin_ia32_addnepbf16512 (__A, __B);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_addne_pbh (__m512bh __W, __mmask32 __U,
+		       __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_addnepbf16512_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_addne_pbh (__mmask32 __U, __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_addnepbf16512_mask (__A, __B,
+				       (__v32bf) _mm512_setzero_si512 (),
+				        __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_subne_pbh (__m512bh __A, __m512bh __B)
+{
+  return (__m512bh) __builtin_ia32_subnepbf16512 (__A, __B);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_subne_pbh (__m512bh __W, __mmask32 __U,
+		       __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_subnepbf16512_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_subne_pbh (__mmask32 __U, __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_subnepbf16512_mask (__A, __B,
+				       (__v32bf) _mm512_setzero_si512 (),
+				        __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mulne_pbh (__m512bh __A, __m512bh __B)
+{
+  return (__m512bh) __builtin_ia32_mulnepbf16512 (__A, __B);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_mulne_pbh (__m512bh __W, __mmask32 __U,
+		       __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_mulnepbf16512_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_mulne_pbh (__mmask32 __U, __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_mulnepbf16512_mask (__A, __B,
+				       (__v32bf) _mm512_setzero_si512 (),
+				        __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_divne_pbh (__m512bh __A, __m512bh __B)
+{
+  return (__m512bh) __builtin_ia32_divnepbf16512 (__A, __B);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_divne_pbh (__m512bh __W, __mmask32 __U,
+		       __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_divnepbf16512_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_divne_pbh (__mmask32 __U, __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_divnepbf16512_mask (__A, __B,
+				       (__v32bf) _mm512_setzero_si512 (),
+				        __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_max_pbh (__m512bh __A, __m512bh __B)
+{
+  return (__m512bh) __builtin_ia32_maxpbf16512 (__A, __B);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_max_pbh (__m512bh __W, __mmask32 __U,
+		       __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_maxpbf16512_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_max_pbh (__mmask32 __U, __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_maxpbf16512_mask (__A, __B,
+				     (__v32bf) _mm512_setzero_si512 (),
+				     __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_min_pbh (__m512bh __A, __m512bh __B)
+{
+  return (__m512bh) __builtin_ia32_minpbf16512 (__A, __B);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_min_pbh (__m512bh __W, __mmask32 __U,
+		       __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_minpbf16512_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_min_pbh (__mmask32 __U, __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_minpbf16512_mask (__A, __B,
+				     (__v32bf) _mm512_setzero_si512 (),
+				     __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_scalef_pbh (__m512bh __A, __m512bh __B)
+{
+  return (__m512bh) __builtin_ia32_scalefpbf16512 (__A, __B);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_scalef_pbh (__m512bh __W, __mmask32 __U,
+			  __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_scalefpbf16512_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_scalef_pbh (__mmask32 __U, __m512bh __A, __m512bh __B)
+{
+  return (__m512bh)
+    __builtin_ia32_scalefpbf16512_mask (__A, __B,
+					(__v32bf) _mm512_setzero_si512 (),
+					__U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmaddne_pbh (__m512bh __A, __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fmaddnepbf16512_mask (__A, __B, __C, (__mmask32) -1);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmaddne_pbh (__m512bh __A, __mmask32 __U,
+			 __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fmaddnepbf16512_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmaddne_pbh (__m512bh __A, __m512bh __B,
+			  __m512bh __C, __mmask32 __U)
+{
+  return (__m512bh)
+    __builtin_ia32_fmaddnepbf16512_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmaddne_pbh (__mmask32 __U, __m512bh __A,
+			  __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fmaddnepbf16512_maskz (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmsubne_pbh (__m512bh __A, __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fmsubnepbf16512_mask (__A, __B, __C, (__mmask32) -1);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmsubne_pbh (__m512bh __A, __mmask32 __U,
+			 __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fmsubnepbf16512_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmsubne_pbh (__m512bh __A, __m512bh __B,
+			  __m512bh __C, __mmask32 __U)
+{
+  return (__m512bh)
+    __builtin_ia32_fmsubnepbf16512_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmsubne_pbh (__mmask32 __U, __m512bh __A,
+			  __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fmsubnepbf16512_maskz (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fnmaddne_pbh (__m512bh __A, __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fnmaddnepbf16512_mask (__A, __B, __C, (__mmask32) -1);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fnmaddne_pbh (__m512bh __A, __mmask32 __U,
+			  __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fnmaddnepbf16512_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fnmaddne_pbh (__m512bh __A, __m512bh __B,
+			   __m512bh __C, __mmask32 __U)
+{
+  return (__m512bh)
+    __builtin_ia32_fnmaddnepbf16512_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fnmaddne_pbh (__mmask32 __U, __m512bh __A,
+			   __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fnmaddnepbf16512_maskz (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fnmsubne_pbh (__m512bh __A, __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fnmsubnepbf16512_mask (__A, __B, __C, (__mmask32) -1);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fnmsubne_pbh (__m512bh __A, __mmask32 __U,
+			  __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fnmsubnepbf16512_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fnmsubne_pbh (__m512bh __A, __m512bh __B,
+			   __m512bh __C, __mmask32 __U)
+{
+  return (__m512bh)
+    __builtin_ia32_fnmsubnepbf16512_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m512bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fnmsubne_pbh (__mmask32 __U, __m512bh __A,
+			   __m512bh __B, __m512bh __C)
+{
+  return (__m512bh)
+    __builtin_ia32_fnmsubnepbf16512_maskz (__A, __B, __C, __U);
+}
+
+#ifdef __DISABLE_AVX10_2_512__
+#undef __DISABLE_AVX10_2_512__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX10_2_512__ */
+
+#endif /* _AVX10_2_512BF16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx10_2bf16intrin.h b/gcc/config/i386/avx10_2bf16intrin.h
new file mode 100644
index 00000000000..e16f1b66481
--- /dev/null
+++ b/gcc/config/i386/avx10_2bf16intrin.h
@@ -0,0 +1,685 @@
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2bf16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AVX10_2BF16INTRIN_H_INCLUDED
+#define _AVX10_2BF16INTRIN_H_INCLUDED
+
+#if !defined(__AVX10_2_256__)
+#pragma GCC push_options
+#pragma GCC target("avx10.2")
+#define __DISABLE_AVX10_2_256__
+#endif /* __AVX10_2_256__ */
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_addne_pbh (__m256bh __A, __m256bh __B)
+{
+  return (__m256bh) __builtin_ia32_addnepbf16256 (__A, __B);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_addne_pbh (__m256bh __W, __mmask16 __U,
+		       __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_addnepbf16256_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_addne_pbh (__mmask16 __U, __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_addnepbf16256_mask (__A, __B,
+				       (__v16bf) _mm256_setzero_si256 (),
+				       __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_addne_pbh (__m128bh __A, __m128bh __B)
+{
+  return (__m128bh) __builtin_ia32_addnepbf16128 (__A, __B);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_addne_pbh (__m128bh __W, __mmask8 __U,
+		    __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_addnepbf16128_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_addne_pbh (__mmask8 __U, __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_addnepbf16128_mask (__A, __B,
+				       (__v8bf) _mm_setzero_si128 (),
+				       __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_subne_pbh (__m256bh __A, __m256bh __B)
+{
+  return (__m256bh) __builtin_ia32_subnepbf16256 (__A, __B);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_subne_pbh (__m256bh __W, __mmask16 __U,
+		       __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_subnepbf16256_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_subne_pbh (__mmask16 __U, __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_subnepbf16256_mask (__A, __B,
+				       (__v16bf) _mm256_setzero_si256 (),
+				       __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_subne_pbh (__m128bh __A, __m128bh __B)
+{
+  return (__m128bh) __builtin_ia32_subnepbf16128 (__A, __B);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_subne_pbh (__m128bh __W, __mmask8 __U,
+		    __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_subnepbf16128_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_subne_pbh (__mmask8 __U, __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_subnepbf16128_mask (__A, __B,
+				       (__v8bf) _mm_setzero_si128 (),
+				       __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mulne_pbh (__m256bh __A, __m256bh __B)
+{
+  return (__m256bh) __builtin_ia32_mulnepbf16256 (__A, __B);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_mulne_pbh (__m256bh __W, __mmask16 __U,
+		       __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_mulnepbf16256_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_mulne_pbh (__mmask16 __U, __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_mulnepbf16256_mask (__A, __B,
+				       (__v16bf) _mm256_setzero_si256 (),
+				       __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mulne_pbh (__m128bh __A, __m128bh __B)
+{
+  return (__m128bh) __builtin_ia32_mulnepbf16128 (__A, __B);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_mulne_pbh (__m128bh __W, __mmask8 __U,
+		    __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_mulnepbf16128_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_mulne_pbh (__mmask8 __U, __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_mulnepbf16128_mask (__A, __B,
+				       (__v8bf) _mm_setzero_si128 (),
+				       __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_divne_pbh (__m256bh __A, __m256bh __B)
+{
+  return (__m256bh) __builtin_ia32_divnepbf16256 (__A, __B);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_divne_pbh (__m256bh __W, __mmask16 __U,
+		       __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_divnepbf16256_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_divne_pbh (__mmask16 __U, __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_divnepbf16256_mask (__A, __B,
+				       (__v16bf) _mm256_setzero_si256 (),
+				       __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_divne_pbh (__m128bh __A, __m128bh __B)
+{
+  return (__m128bh) __builtin_ia32_divnepbf16128 (__A, __B);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_divne_pbh (__m128bh __W, __mmask8 __U,
+		    __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_divnepbf16128_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_divne_pbh (__mmask8 __U, __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_divnepbf16128_mask (__A, __B,
+				       (__v8bf) _mm_setzero_si128 (),
+				       __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_max_pbh (__m256bh __A, __m256bh __B)
+{
+  return (__m256bh) __builtin_ia32_maxpbf16256 (__A, __B);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_max_pbh (__m256bh __W, __mmask16 __U,
+		       __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_maxpbf16256_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_max_pbh (__mmask16 __U, __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_maxpbf16256_mask (__A, __B,
+				       (__v16bf) _mm256_setzero_si256 (),
+				       __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_pbh (__m128bh __A, __m128bh __B)
+{
+  return (__m128bh) __builtin_ia32_maxpbf16128 (__A, __B);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_max_pbh (__m128bh __W, __mmask8 __U,
+		    __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_maxpbf16128_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_max_pbh (__mmask8 __U, __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_maxpbf16128_mask (__A, __B,
+				     (__v8bf) _mm_setzero_si128 (),
+				     __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_min_pbh (__m256bh __A, __m256bh __B)
+{
+  return (__m256bh) __builtin_ia32_minpbf16256 (__A, __B);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_min_pbh (__m256bh __W, __mmask16 __U,
+		     __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_minpbf16256_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_min_pbh (__mmask16 __U, __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_minpbf16256_mask (__A, __B,
+				     (__v16bf) _mm256_setzero_si256 (),
+				     __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_pbh (__m128bh __A, __m128bh __B)
+{
+  return (__m128bh) __builtin_ia32_minpbf16128 (__A, __B);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_min_pbh (__m128bh __W, __mmask8 __U,
+		  __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_minpbf16128_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_min_pbh (__mmask8 __U, __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_minpbf16128_mask (__A, __B,
+				     (__v8bf) _mm_setzero_si128 (),
+				     __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_scalef_pbh (__m256bh __A, __m256bh __B)
+{
+  return (__m256bh) __builtin_ia32_scalefpbf16256 (__A, __B);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_scalef_pbh (__m256bh __W, __mmask16 __U,
+			__m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_scalefpbf16256_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_scalef_pbh (__mmask16 __U, __m256bh __A, __m256bh __B)
+{
+  return (__m256bh)
+    __builtin_ia32_scalefpbf16256_mask (__A, __B,
+					(__v16bf) _mm256_setzero_si256 (),
+					__U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_pbh (__m128bh __A, __m128bh __B)
+{
+  return (__m128bh) __builtin_ia32_scalefpbf16128 (__A, __B);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_scalef_pbh (__m128bh __W, __mmask8 __U,
+		     __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_scalefpbf16128_mask (__A, __B, __W, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_scalef_pbh (__mmask8 __U, __m128bh __A, __m128bh __B)
+{
+  return (__m128bh)
+    __builtin_ia32_scalefpbf16128_mask (__A, __B,
+					(__v8bf) _mm_setzero_si128 (),
+					__U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmaddne_pbh (__m256bh __A, __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fmaddnepbf16256_mask (__A, __B, __C, (__mmask16) -1);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmaddne_pbh (__m256bh __A, __mmask16 __U,
+			 __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fmaddnepbf16256_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmaddne_pbh (__m256bh __A, __m256bh __B,
+			  __m256bh __C, __mmask16 __U)
+{
+  return (__m256bh)
+    __builtin_ia32_fmaddnepbf16256_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmaddne_pbh (__mmask16 __U, __m256bh __A,
+			  __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fmaddnepbf16256_maskz (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmaddne_pbh (__m128bh __A, __m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fmaddnepbf16128_mask (__A, __B, __C, (__mmask8) -1);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmaddne_pbh (__m128bh __A, __mmask8 __U,
+		      __m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fmaddnepbf16128_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmaddne_pbh (__m128bh __A, __m128bh __B,
+		       __m128bh __C, __mmask8 __U)
+{
+  return (__m128bh)
+    __builtin_ia32_fmaddnepbf16128_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmaddne_pbh (__mmask8 __U, __m128bh __A,
+		       __m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fmaddnepbf16128_maskz (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmsubne_pbh (__m256bh __A, __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fmsubnepbf16256_mask (__A, __B, __C, (__mmask16) -1);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmsubne_pbh (__m256bh __A, __mmask16 __U,
+			 __m256bh __B, __m256bh __C)
+{
+  return (__m256bh) __builtin_ia32_fmsubnepbf16256_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmsubne_pbh (__m256bh __A, __m256bh __B,
+			  __m256bh __C, __mmask16 __U)
+{
+  return (__m256bh)
+    __builtin_ia32_fmsubnepbf16256_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmsubne_pbh (__mmask16 __U, __m256bh __A,
+			  __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fmsubnepbf16256_maskz (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmsubne_pbh (__m128bh __A, __m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fmsubnepbf16128_mask (__A, __B, __C, (__mmask8) -1);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmsubne_pbh (__m128bh __A, __mmask8 __U,
+		      __m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fmsubnepbf16128_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmsubne_pbh (__m128bh __A, __m128bh __B,
+		       __m128bh __C, __mmask8 __U)
+{
+  return (__m128bh)
+    __builtin_ia32_fmsubnepbf16128_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmsubne_pbh (__mmask8 __U, __m128bh __A,
+		       __m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fmsubnepbf16128_maskz (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fnmaddne_pbh (__m256bh __A, __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fnmaddnepbf16256_mask (__A, __B, __C, (__mmask16) -1);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fnmaddne_pbh (__m256bh __A, __mmask16 __U,
+			  __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fnmaddnepbf16256_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fnmaddne_pbh (__m256bh __A, __m256bh __B,
+			   __m256bh __C, __mmask16 __U)
+{
+  return (__m256bh)
+    __builtin_ia32_fnmaddnepbf16256_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fnmaddne_pbh (__mmask16 __U, __m256bh __A,
+			   __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fnmaddnepbf16256_maskz (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmaddne_pbh (__m128bh __A, __m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fnmaddnepbf16128_mask (__A, __B, __C, (__mmask8) -1);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fnmaddne_pbh (__m128bh __A, __mmask8 __U,
+		       __m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fnmaddnepbf16128_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmaddne_pbh (__m128bh __A, __m128bh __B,
+			__m128bh __C, __mmask8 __U)
+{
+  return (__m128bh)
+    __builtin_ia32_fnmaddnepbf16128_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmaddne_pbh (__mmask8 __U, __m128bh __A,
+			__m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fnmaddnepbf16128_maskz (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fnmsubne_pbh (__m256bh __A, __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fnmsubnepbf16256_mask (__A, __B, __C, (__mmask16) -1);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fnmsubne_pbh (__m256bh __A, __mmask16 __U,
+			  __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fnmsubnepbf16256_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fnmsubne_pbh (__m256bh __A, __m256bh __B,
+			   __m256bh __C, __mmask16 __U)
+{
+  return (__m256bh)
+    __builtin_ia32_fnmsubnepbf16256_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m256bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fnmsubne_pbh (__mmask16 __U, __m256bh __A,
+			   __m256bh __B, __m256bh __C)
+{
+  return (__m256bh)
+    __builtin_ia32_fnmsubnepbf16256_maskz (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsubne_pbh (__m128bh __A, __m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fnmsubnepbf16128_mask (__A, __B, __C, (__mmask8) -1);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fnmsubne_pbh (__m128bh __A, __mmask8 __U,
+		       __m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fnmsubnepbf16128_mask (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmsubne_pbh (__m128bh __A, __m128bh __B,
+			__m128bh __C, __mmask8 __U)
+{
+  return (__m128bh)
+    __builtin_ia32_fnmsubnepbf16128_mask3 (__A, __B, __C, __U);
+}
+
+extern __inline__ __m128bh
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmsubne_pbh (__mmask8 __U, __m128bh __A,
+			__m128bh __B, __m128bh __C)
+{
+  return (__m128bh)
+    __builtin_ia32_fnmsubnepbf16128_maskz (__A, __B, __C, __U);
+}
+
+#ifdef __DISABLE_AVX10_2_256__
+#undef __DISABLE_AVX10_2_256__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX10_2_256__ */
+
+#endif /* __AVX10_2BF16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 63b65846c8f..f3838424fd4 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1474,3 +1474,12 @@ DEF_FUNCTION_TYPE (V64QI, V32HF, V32HF, V64QI, UDI)
 DEF_FUNCTION_TYPE (V16QI, V8HF, V16QI, UQI)
 DEF_FUNCTION_TYPE (V16QI, V16HF, V16QI, UHI)
 DEF_FUNCTION_TYPE (V32QI, V32HF, V32QI, USI)
+DEF_FUNCTION_TYPE (V32BF, V32BF, V32BF)
+DEF_FUNCTION_TYPE (V16BF, V16BF, V16BF)
+DEF_FUNCTION_TYPE (V8BF, V8BF, V8BF)
+DEF_FUNCTION_TYPE (V32BF, V32BF, V32BF, USI)
+DEF_FUNCTION_TYPE (V16BF, V16BF, V16BF, UHI)
+DEF_FUNCTION_TYPE (V8BF, V8BF, V8BF, UQI)
+DEF_FUNCTION_TYPE (V32BF, V32BF, V32BF, V32BF, USI)
+DEF_FUNCTION_TYPE (V16BF, V16BF, V16BF, V16BF, UHI)
+DEF_FUNCTION_TYPE (V8BF, V8BF, V8BF, V8BF, UQI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 6f5ab32dd0d..3f3bc768348 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -3159,6 +3159,84 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvtneph2hf8sv32hf_mask, "__bui
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvthf82phv8hf_mask, "__builtin_ia32_vcvthf82ph128_mask", IX86_BUILTIN_VCVTHF82PH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V16QI_V8HF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_vcvthf82phv16hf_mask, "__builtin_ia32_vcvthf82ph256_mask", IX86_BUILTIN_VCVTHF82PH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16QI_V16HF_UHI)
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_vcvthf82phv32hf_mask, "__builtin_ia32_vcvthf82ph512_mask", IX86_BUILTIN_VCVTHF82PH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32QI_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_addnepbf16_v32bf, "__builtin_ia32_addnepbf16512", IX86_BUILTIN_ADDNEPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_addnepbf16_v32bf_mask, "__builtin_ia32_addnepbf16512_mask", IX86_BUILTIN_ADDNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_addnepbf16_v16bf, "__builtin_ia32_addnepbf16256", IX86_BUILTIN_ADDNEPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_addnepbf16_v16bf_mask, "__builtin_ia32_addnepbf16256_mask", IX86_BUILTIN_ADDNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_addnepbf16_v8bf, "__builtin_ia32_addnepbf16128", IX86_BUILTIN_ADDNEPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_addnepbf16_v8bf_mask, "__builtin_ia32_addnepbf16128_mask", IX86_BUILTIN_ADDNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_subnepbf16_v32bf, "__builtin_ia32_subnepbf16512", IX86_BUILTIN_SUBNEPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_subnepbf16_v32bf_mask, "__builtin_ia32_subnepbf16512_mask", IX86_BUILTIN_SUBNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_subnepbf16_v16bf, "__builtin_ia32_subnepbf16256", IX86_BUILTIN_SUBNEPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_subnepbf16_v16bf_mask, "__builtin_ia32_subnepbf16256_mask", IX86_BUILTIN_SUBNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_subnepbf16_v8bf, "__builtin_ia32_subnepbf16128", IX86_BUILTIN_SUBNEPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_subnepbf16_v8bf_mask, "__builtin_ia32_subnepbf16128_mask", IX86_BUILTIN_SUBNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mulnepbf16_v32bf, "__builtin_ia32_mulnepbf16512", IX86_BUILTIN_MULNEPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_mulnepbf16_v32bf_mask, "__builtin_ia32_mulnepbf16512_mask", IX86_BUILTIN_MULNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_mulnepbf16_v16bf, "__builtin_ia32_mulnepbf16256", IX86_BUILTIN_MULNEPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_mulnepbf16_v16bf_mask, "__builtin_ia32_mulnepbf16256_mask", IX86_BUILTIN_MULNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_mulnepbf16_v8bf, "__builtin_ia32_mulnepbf16128", IX86_BUILTIN_MULNEPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_mulnepbf16_v8bf_mask, "__builtin_ia32_mulnepbf16128_mask", IX86_BUILTIN_MULNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_divnepbf16_v32bf, "__builtin_ia32_divnepbf16512", IX86_BUILTIN_DIVNEPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_divnepbf16_v32bf_mask, "__builtin_ia32_divnepbf16512_mask", IX86_BUILTIN_DIVNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_divnepbf16_v16bf, "__builtin_ia32_divnepbf16256", IX86_BUILTIN_DIVNEPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_divnepbf16_v16bf_mask, "__builtin_ia32_divnepbf16256_mask", IX86_BUILTIN_DIVNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_divnepbf16_v8bf, "__builtin_ia32_divnepbf16128", IX86_BUILTIN_DIVNEPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_divnepbf16_v8bf_mask, "__builtin_ia32_divnepbf16128_mask", IX86_BUILTIN_DIVNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_smaxpbf16_v32bf, "__builtin_ia32_maxpbf16512", IX86_BUILTIN_MAXPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_smaxpbf16_v32bf_mask, "__builtin_ia32_maxpbf16512_mask", IX86_BUILTIN_MAXPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_smaxpbf16_v16bf, "__builtin_ia32_maxpbf16256", IX86_BUILTIN_MAXPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_smaxpbf16_v16bf_mask, "__builtin_ia32_maxpbf16256_mask", IX86_BUILTIN_MAXPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_smaxpbf16_v8bf, "__builtin_ia32_maxpbf16128", IX86_BUILTIN_MAXPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_smaxpbf16_v8bf_mask, "__builtin_ia32_maxpbf16128_mask", IX86_BUILTIN_MAXPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_sminpbf16_v32bf, "__builtin_ia32_minpbf16512", IX86_BUILTIN_MINPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_sminpbf16_v32bf_mask, "__builtin_ia32_minpbf16512_mask", IX86_BUILTIN_MINPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_sminpbf16_v16bf, "__builtin_ia32_minpbf16256", IX86_BUILTIN_MINPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_sminpbf16_v16bf_mask, "__builtin_ia32_minpbf16256_mask", IX86_BUILTIN_MINPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_sminpbf16_v8bf, "__builtin_ia32_minpbf16128", IX86_BUILTIN_MINPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_sminpbf16_v8bf_mask, "__builtin_ia32_minpbf16128_mask", IX86_BUILTIN_MINPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_scalefpbf16_v32bf, "__builtin_ia32_scalefpbf16512", IX86_BUILTIN_SCALEFPBF16512, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_scalefpbf16_v32bf_mask, "__builtin_ia32_scalefpbf16512_mask", IX86_BUILTIN_SCALEFPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_scalefpbf16_v16bf, "__builtin_ia32_scalefpbf16256", IX86_BUILTIN_SCALEFPBF16256, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_scalefpbf16_v16bf_mask, "__builtin_ia32_scalefpbf16256_mask", IX86_BUILTIN_SCALEFPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_scalefpbf16_v8bf, "__builtin_ia32_scalefpbf16128", IX86_BUILTIN_SCALEFPBF16128, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_scalefpbf16_v8bf_mask, "__builtin_ia32_scalefpbf16128_mask", IX86_BUILTIN_SCALEFPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmaddnepbf16_v32bf_mask, "__builtin_ia32_fmaddnepbf16512_mask", IX86_BUILTIN_FMADDNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmaddnepbf16_v32bf_mask3, "__builtin_ia32_fmaddnepbf16512_mask3", IX86_BUILTIN_FMADDNEPBF16512_MASK3, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmaddnepbf16_v32bf_maskz, "__builtin_ia32_fmaddnepbf16512_maskz", IX86_BUILTIN_FMADDNEPBF16512_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v16bf_mask, "__builtin_ia32_fmaddnepbf16256_mask", IX86_BUILTIN_FMADDNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v16bf_mask3, "__builtin_ia32_fmaddnepbf16256_mask3", IX86_BUILTIN_FMADDNEPBF16256_MASK3, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v16bf_maskz, "__builtin_ia32_fmaddnepbf16256_maskz", IX86_BUILTIN_FMADDNEPBF16256_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v8bf_mask, "__builtin_ia32_fmaddnepbf16128_mask", IX86_BUILTIN_FMADDNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v8bf_mask3, "__builtin_ia32_fmaddnepbf16128_mask3", IX86_BUILTIN_FMADDNEPBF16128_MASK3, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmaddnepbf16_v8bf_maskz, "__builtin_ia32_fmaddnepbf16128_maskz", IX86_BUILTIN_FMADDNEPBF16128_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmsubnepbf16_v32bf_mask, "__builtin_ia32_fmsubnepbf16512_mask", IX86_BUILTIN_FMSUBNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmsubnepbf16_v32bf_mask3, "__builtin_ia32_fmsubnepbf16512_mask3", IX86_BUILTIN_FMSUBNEPBF16512_MASK3, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fmsubnepbf16_v32bf_maskz, "__builtin_ia32_fmsubnepbf16512_maskz", IX86_BUILTIN_FMSUBNEPBF16512_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v16bf_mask, "__builtin_ia32_fmsubnepbf16256_mask", IX86_BUILTIN_FMSUBNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v16bf_mask3, "__builtin_ia32_fmsubnepbf16256_mask3", IX86_BUILTIN_FMSUBNEPBF16256_MASK3, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v16bf_maskz, "__builtin_ia32_fmsubnepbf16256_maskz", IX86_BUILTIN_FMSUBNEPBF16256_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v8bf_mask, "__builtin_ia32_fmsubnepbf16128_mask", IX86_BUILTIN_FMSUBNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v8bf_mask3, "__builtin_ia32_fmsubnepbf16128_mask3", IX86_BUILTIN_FMSUBNEPBF16128_MASK3, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fmsubnepbf16_v8bf_maskz, "__builtin_ia32_fmsubnepbf16128_maskz", IX86_BUILTIN_FMSUBNEPBF16128_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmaddnepbf16_v32bf_mask, "__builtin_ia32_fnmaddnepbf16512_mask", IX86_BUILTIN_FNMADDNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmaddnepbf16_v32bf_mask3, "__builtin_ia32_fnmaddnepbf16512_mask3", IX86_BUILTIN_FNMADDNEPBF16512_MASK3, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmaddnepbf16_v32bf_maskz, "__builtin_ia32_fnmaddnepbf16512_maskz", IX86_BUILTIN_FNMADDNEPBF16512_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v16bf_mask, "__builtin_ia32_fnmaddnepbf16256_mask", IX86_BUILTIN_FNMADDNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v16bf_mask3, "__builtin_ia32_fnmaddnepbf16256_mask3", IX86_BUILTIN_FNMADDNEPBF16256_MASK3, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v16bf_maskz, "__builtin_ia32_fnmaddnepbf16256_maskz", IX86_BUILTIN_FNMADDNEPBF16256_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v8bf_mask, "__builtin_ia32_fnmaddnepbf16128_mask", IX86_BUILTIN_FNMADDNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v8bf_mask3, "__builtin_ia32_fnmaddnepbf16128_mask3", IX86_BUILTIN_FNMADDNEPBF16128_MASK3, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmaddnepbf16_v8bf_maskz, "__builtin_ia32_fnmaddnepbf16128_maskz", IX86_BUILTIN_FNMADDNEPBF16128_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmsubnepbf16_v32bf_mask, "__builtin_ia32_fnmsubnepbf16512_mask", IX86_BUILTIN_FNMSUBNEPBF16512_MASK, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmsubnepbf16_v32bf_mask3, "__builtin_ia32_fnmsubnepbf16512_mask3", IX86_BUILTIN_FNMSUBNEPBF16512_MASK3, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_fnmsubnepbf16_v32bf_maskz, "__builtin_ia32_fnmsubnepbf16512_maskz", IX86_BUILTIN_FNMSUBNEPBF16512_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V32BF_V32BF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v16bf_mask, "__builtin_ia32_fnmsubnepbf16256_mask", IX86_BUILTIN_FNMSUBNEPBF16256_MASK, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v16bf_mask3, "__builtin_ia32_fnmsubnepbf16256_mask3", IX86_BUILTIN_FNMSUBNEPBF16256_MASK3, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v16bf_maskz, "__builtin_ia32_fnmsubnepbf16256_maskz", IX86_BUILTIN_FNMSUBNEPBF16256_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16BF_V16BF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v8bf_mask, "__builtin_ia32_fnmsubnepbf16128_mask", IX86_BUILTIN_FNMSUBNEPBF16128_MASK, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v8bf_mask3, "__builtin_ia32_fnmsubnepbf16128_mask3", IX86_BUILTIN_FNMSUBNEPBF16128_MASK3, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_fnmsubnepbf16_v8bf_maskz, "__builtin_ia32_fnmsubnepbf16128_maskz", IX86_BUILTIN_FNMSUBNEPBF16128_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8BF_V8BF_V8BF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index c5305395a64..dff9e09809e 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -11330,6 +11330,9 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16HI_FTYPE_V8SI_V8SI:
     case V64QI_FTYPE_V64QI_V64QI:
     case V32QI_FTYPE_V32QI_V32QI:
+    case V32BF_FTYPE_V32BF_V32BF:
+    case V16BF_FTYPE_V16BF_V16BF:
+    case V8BF_FTYPE_V8BF_V8BF:
     case V16HI_FTYPE_V32QI_V32QI:
     case V16HI_FTYPE_V16HI_V16HI:
     case V8SI_FTYPE_V4DF_V4DF:
@@ -11497,6 +11500,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16HI_FTYPE_V8HI_V16HI_UHI:
     case V16HI_FTYPE_HI_V16HI_UHI:
     case V8HI_FTYPE_V8HI_V8HI_UQI:
+    case V8BF_FTYPE_V8BF_V8BF_UQI:
     case V8HI_FTYPE_HI_V8HI_UQI:
     case V16HF_FTYPE_V16HF_V16HF_UHI:
     case V8SF_FTYPE_V8HI_V8SF_UQI:
@@ -11594,9 +11598,11 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16HF_FTYPE_V16HF_V16HF_V16HF:
     case V16HI_FTYPE_V16HF_V16HI_UHI:
     case V16HI_FTYPE_V16HI_V16HI_UHI:
+    case V16BF_FTYPE_V16BF_V16BF_UHI:
     case V8HI_FTYPE_V16QI_V8HI_UQI:
     case V16HI_FTYPE_V16QI_V16HI_UHI:
     case V32HI_FTYPE_V32HI_V32HI_USI:
+    case V32BF_FTYPE_V32BF_V32BF_USI:
     case V32HI_FTYPE_V32QI_V32HI_USI:
     case V8DI_FTYPE_V16QI_V8DI_UQI:
     case V8DI_FTYPE_V2DI_V8DI_UQI:
@@ -11726,6 +11732,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
       break;
     case V32QI_FTYPE_V32QI_V32QI_V32QI_USI:
     case V32HI_FTYPE_V32HI_V32HI_V32HI_USI:
+    case V32BF_FTYPE_V32BF_V32BF_V32BF_USI:
     case V32HI_FTYPE_V64QI_V64QI_V32HI_USI:
     case V16SI_FTYPE_V32HI_V32HI_V16SI_UHI:
     case V64QI_FTYPE_V64QI_V64QI_V64QI_UDI:
@@ -11756,6 +11763,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16SI_FTYPE_V16SI_V16SI_V16SI_UHI:
     case V16SI_FTYPE_V16SI_V4SI_V16SI_UHI:
     case V8HI_FTYPE_V8HI_V8HI_V8HI_UQI:
+    case V8BF_FTYPE_V8BF_V8BF_V8BF_UQI:
     case V8SI_FTYPE_V8SI_V8SI_V8SI_UQI:
     case V4SI_FTYPE_V4SI_V4SI_V4SI_UQI:
     case V16HF_FTYPE_V16HF_V16HF_V16HF_UQI:
@@ -11763,6 +11771,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V8SF_FTYPE_V8SF_V8SF_V8SF_UQI:
     case V16QI_FTYPE_V16QI_V16QI_V16QI_UHI:
     case V16HI_FTYPE_V16HI_V16HI_V16HI_UHI:
+    case V16BF_FTYPE_V16BF_V16BF_V16BF_UHI:
     case V2DI_FTYPE_V2DI_V2DI_V2DI_UQI:
     case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI:
     case V4DI_FTYPE_V4DI_V4DI_V4DI_UQI:
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index fea55a298fc..025334027eb 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -148,4 +148,8 @@
 
 #include <avx10_2-512convertintrin.h>
 
+#include <avx10_2bf16intrin.h>
+
+#include <avx10_2-512bf16intrin.h>
+
 #endif /* _IMMINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 1d62f96dcc5..50274f01a01 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -229,6 +229,7 @@
   UNSPEC_VCVTNEPH2HF8
   UNSPEC_VCVTNEPH2HF8S
   UNSPEC_VCVTHF82PH
+  UNSPEC_VSCALEFPBF16
 ])
 
 (define_c_enum "unspecv" [
@@ -499,6 +500,9 @@
 (define_mode_iterator VHF_AVX10_2
   [(V32HF "TARGET_AVX10_2_512") V16HF V8HF])
 
+(define_mode_iterator VBF_AVX10_2
+  [(V32BF "TARGET_AVX10_2_512") V16BF V8BF])
+
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI "TARGET_AVX512F && TARGET_EVEX512")
@@ -31812,3 +31816,292 @@
   "TARGET_AVX10_2_256"
   "vdpphps\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3}"
   [(set_attr "prefix" "evex")])
+
+(define_insn "avx10_2_scalefpbf16_<mode><mask_name>"
+   [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+      (unspec:VBF_AVX10_2
+	[(match_operand:VBF_AVX10_2 1 "register_operand" "v")
+	 (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm")]
+       UNSPEC_VSCALEFPBF16))]
+   "TARGET_AVX10_2_256"
+   "vscalefpbf16\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   [(set_attr "prefix" "evex")])
+
+(define_insn "avx10_2_<code>pbf16_<mode><mask_name>"
+   [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+      (smaxmin:VBF_AVX10_2
+	 (match_operand:VBF_AVX10_2 1 "register_operand" "v")
+	 (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm")))]
+   "TARGET_AVX10_2_256"
+   "v<maxmin_float>pbf16\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   [(set_attr "prefix" "evex")
+    (set_attr "mode" "<MODE>")])
+
+(define_insn "avx10_2_<insn>nepbf16_<mode><mask_name>"
+   [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+      (plusminusmultdiv:VBF_AVX10_2
+	(match_operand:VBF_AVX10_2 1 "register_operand" "v")
+	(match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm")))]
+   "TARGET_AVX10_2_256"
+   "v<insn>nepbf16\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   [(set_attr "prefix" "evex")])
+
+(define_expand "avx10_2_fmaddnepbf16_<mode>_maskz"
+  [(match_operand:VBF_AVX10_2 0 "register_operand")
+   (match_operand:VBF_AVX10_2 1 "nonimmediate_operand")
+   (match_operand:VBF_AVX10_2 2 "nonimmediate_operand")
+   (match_operand:VBF_AVX10_2 3 "nonimmediate_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX10_2_256"
+  {
+    emit_insn (gen_avx10_2_fmaddnepbf16_<mode>_maskz_1 (operands[0], operands[1],
+						      operands[2], operands[3], 
+						      CONST0_RTX(<MODE>mode), 
+						      operands[4]));
+    DONE;
+  })
+
+(define_insn "avx10_2_fmaddnepbf16_<mode><sd_maskz_name>"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v,v")
+	  (fma:VBF_AVX10_2
+	    (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%0,0,v")
+	    (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v,vm")
+	    (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm,0")))]
+  "TARGET_AVX10_2_256"
+  "@
+   vfmadd132nepbf16\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
+   vfmadd213nepbf16\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
+   vfmadd231nepbf16\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx10_2_fmaddnepbf16_<mode>_mask"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v")
+	(vec_merge:VBF_AVX10_2
+	  (fma:VBF_AVX10_2
+	     (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "0,0")
+	     (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v")
+	     (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm"))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX10_2_256"
+  "@
+   vfmadd132nepbf16\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmadd213nepbf16\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx10_2_fmaddnepbf16_<mode>_mask3"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+	(vec_merge:VBF_AVX10_2
+	  (fma:VBF_AVX10_2
+	     (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%v")
+	     (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm")
+	     (match_operand:VBF_AVX10_2 3 "register_operand" "0"))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
+  "TARGET_AVX10_2_256"
+  "vfmadd231nepbf16\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx10_2_fnmaddnepbf16_<mode>_maskz"
+  [(match_operand:VBF_AVX10_2 0 "register_operand")
+   (match_operand:VBF_AVX10_2 1 "nonimmediate_operand")
+   (match_operand:VBF_AVX10_2 2 "nonimmediate_operand")
+   (match_operand:VBF_AVX10_2 3 "nonimmediate_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX10_2_256"
+  {
+    emit_insn (gen_avx10_2_fnmaddnepbf16_<mode>_maskz_1 (operands[0], operands[1],
+						       operands[2], operands[3], 
+						       CONST0_RTX(<MODE>mode), 
+						       operands[4]));
+    DONE;
+  })
+
+(define_insn "avx10_2_fnmaddnepbf16_<mode><sd_maskz_name>"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v,v")
+	  (fma:VBF_AVX10_2
+            (neg:VBF_AVX10_2
+	      (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%0,0,v"))
+	    (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v,vm")
+	    (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm,0")))]
+  "TARGET_AVX10_2_256"
+  "@
+   vfnmadd132nepbf16\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
+   vfnmadd213nepbf16\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
+   vfnmadd231nepbf16\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx10_2_fnmaddnepbf16_<mode>_mask"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v")
+	(vec_merge:VBF_AVX10_2
+	  (fma:VBF_AVX10_2
+            (neg:VBF_AVX10_2
+	      (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "0,0"))
+	    (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm"))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX10_2_256"
+  "@
+   vfnmadd132nepbf16\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfnmadd213nepbf16\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx10_2_fnmaddnepbf16_<mode>_mask3"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+	(vec_merge:VBF_AVX10_2
+	  (fma:VBF_AVX10_2
+            (neg:VBF_AVX10_2
+	      (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%v"))
+	    (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm")
+	    (match_operand:VBF_AVX10_2 3 "register_operand" "0"))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
+  "TARGET_AVX10_2_256"
+  "vfnmadd231nepbf16\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx10_2_fmsubnepbf16_<mode>_maskz"
+  [(match_operand:VBF_AVX10_2 0 "register_operand")
+   (match_operand:VBF_AVX10_2 1 "nonimmediate_operand")
+   (match_operand:VBF_AVX10_2 2 "nonimmediate_operand")
+   (match_operand:VBF_AVX10_2 3 "nonimmediate_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX10_2_256"
+  {
+    emit_insn (gen_avx10_2_fmsubnepbf16_<mode>_maskz_1 (operands[0], operands[1],
+						      operands[2], operands[3], 
+						      CONST0_RTX(<MODE>mode), 
+						      operands[4]));
+    DONE;
+  })
+
+(define_insn "avx10_2_fmsubnepbf16_<mode><sd_maskz_name>"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v,v")
+	  (fma:VBF_AVX10_2
+	    (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%0,0,v")
+	    (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v,vm")
+            (neg:VBF_AVX10_2
+	      (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm,0"))))]
+  "TARGET_AVX10_2_256"
+  "@
+   vfmsub132nepbf16\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
+   vfmsub213nepbf16\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
+   vfmsub231nepbf16\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx10_2_fmsubnepbf16_<mode>_mask"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v")
+	(vec_merge:VBF_AVX10_2
+	  (fma:VBF_AVX10_2
+	     (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "0,0")
+	     (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v")
+            (neg:VBF_AVX10_2
+	       (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm")))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX10_2_256"
+  "@
+   vfmsub132nepbf16\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmsub213nepbf16\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx10_2_fmsubnepbf16_<mode>_mask3"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+	(vec_merge:VBF_AVX10_2
+	  (fma:VBF_AVX10_2
+	     (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%v")
+	     (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm")
+             (neg:VBF_AVX10_2
+	       (match_operand:VBF_AVX10_2 3 "register_operand" "0")))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
+  "TARGET_AVX10_2_256"
+  "vfmsub231nepbf16\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx10_2_fnmsubnepbf16_<mode>_maskz"
+  [(match_operand:VBF_AVX10_2 0 "register_operand")
+   (match_operand:VBF_AVX10_2 1 "nonimmediate_operand")
+   (match_operand:VBF_AVX10_2 2 "nonimmediate_operand")
+   (match_operand:VBF_AVX10_2 3 "nonimmediate_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX10_2_256"
+  {
+    emit_insn (gen_avx10_2_fnmsubnepbf16_<mode>_maskz_1 (operands[0], operands[1],
+						       operands[2], operands[3], 
+						       CONST0_RTX(<MODE>mode), 
+						       operands[4]));
+    DONE;
+  })
+
+(define_insn "avx10_2_fnmsubnepbf16_<mode><sd_maskz_name>"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v,v")
+	  (fma:VBF_AVX10_2
+            (neg:VBF_AVX10_2
+	      (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%0,0,v"))
+	    (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v,vm")
+            (neg:VBF_AVX10_2
+	      (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm,0"))))]
+  "TARGET_AVX10_2_256"
+  "@
+   vfnmsub132nepbf16\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
+   vfnmsub213nepbf16\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
+   vfnmsub231nepbf16\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx10_2_fnmsubnepbf16_<mode>_mask"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v,v")
+	(vec_merge:VBF_AVX10_2
+	  (fma:VBF_AVX10_2
+            (neg:VBF_AVX10_2
+	      (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "0,0"))
+	    (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm,v")
+            (neg:VBF_AVX10_2
+	      (match_operand:VBF_AVX10_2 3 "nonimmediate_operand" "v,vm")))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX10_2_256"
+  "@
+   vfnmsub132nepbf16\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfnmsub213nepbf16\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx10_2_fnmsubnepbf16_<mode>_mask3"
+  [(set (match_operand:VBF_AVX10_2 0 "register_operand" "=v")
+	(vec_merge:VBF_AVX10_2
+	  (fma:VBF_AVX10_2
+            (neg:VBF_AVX10_2
+	      (match_operand:VBF_AVX10_2 1 "nonimmediate_operand" "%v"))
+	    (match_operand:VBF_AVX10_2 2 "nonimmediate_operand" "vm")
+            (neg:VBF_AVX10_2
+	      (match_operand:VBF_AVX10_2 3 "register_operand" "0")))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
+  "TARGET_AVX10_2_256"
+  "vfnmsub231nepbf16\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<sseinsnmode>")])
diff --git a/gcc/testsuite/gcc.target/i386/avx10-helper.h b/gcc/testsuite/gcc.target/i386/avx10-helper.h
index 385c7446979..9ff1dd72e92 100644
--- a/gcc/testsuite/gcc.target/i386/avx10-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx10-helper.h
@@ -3,9 +3,55 @@
 
 #define AVX10
 #define AVX512FP16
-
+#define AVX512BF16
 #include "avx512f-helper.h"
 #include "avx512f-mask-type.h"
+#include <stdint.h>
+
+#define NOINLINE __attribute__((noinline,noclone))
+typedef union
+{
+  uint32_t int32;
+  float flt;
+}float_int_t;
+
+float NOINLINE
+convert_bf16_to_fp32 (unsigned short bf16)
+{
+  unsigned int ii = bf16 << 16;
+  return *(float*)&ii;
+}
+
+unsigned short NOINLINE
+convert_fp32_to_bf16 (float fp)
+{
+  float_int_t fi;
+  fi.flt = fp;
+  return ((fi.int32 >> 16) & 0xffff);
+}
+
+unsigned short NOINLINE
+convert_fp32_to_bf16_ne (float fp)
+{
+  float_int_t fi;
+  uint32_t rounding_bias, lsb;
+
+  fi.flt = fp;
+  lsb = (fi.int32 >> 16) & 0x1;
+  rounding_bias = 0x7fff + lsb;
+  fi.int32 += rounding_bias;
+
+  return ((fi.int32 >> 16) & 0xffff);
+}
+
+float NOINLINE
+scalef (float x, float y)
+{
+  __m128 px = _mm_load_ss (&x);
+  __m128 py = _mm_load_ss (&y);
+  __m128 out = _mm_scalef_ss (px, py);
+  return _mm_cvtss_f32 (out);
+}
 
 #endif /* AVX10_HELPER_INCLUDED */
 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c
new file mode 100644
index 00000000000..78839fb1297
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-1.c
@@ -0,0 +1,87 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.2-512 -O2" } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd231nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub231nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512bh res, x1, x2;
+volatile __mmask32 m32;
+
+void extern
+avx10_2_512_test (void)
+{
+  res = _mm512_addne_pbh (x1, x2);
+  res = _mm512_mask_addne_pbh (res, m32, x1, x2);
+  res = _mm512_maskz_addne_pbh (m32, x1, x2);
+  res = _mm512_subne_pbh (x1, x2);
+  res = _mm512_mask_subne_pbh (res, m32, x1, x2);
+  res = _mm512_maskz_subne_pbh (m32, x1, x2);
+  res = _mm512_mulne_pbh (x1, x2);
+  res = _mm512_mask_mulne_pbh (res, m32, x1, x2);
+  res = _mm512_maskz_mulne_pbh (m32, x1, x2);
+  res = _mm512_divne_pbh (x1, x2);
+  res = _mm512_mask_divne_pbh (res, m32, x1, x2);
+  res = _mm512_maskz_divne_pbh (m32, x1, x2);
+  res = _mm512_max_pbh (x1, x2);
+  res = _mm512_mask_max_pbh (res, m32, x1, x2);
+  res = _mm512_maskz_max_pbh (m32, x1, x2);
+  res = _mm512_min_pbh (x1, x2);
+  res = _mm512_mask_min_pbh (res, m32, x1, x2);
+  res = _mm512_maskz_min_pbh (m32, x1, x2);
+  res = _mm512_scalef_pbh (x1, x2);
+  res = _mm512_mask_scalef_pbh (res, m32, x1, x2);
+  res = _mm512_maskz_scalef_pbh (m32, x1, x2);
+
+  res = _mm512_fmaddne_pbh (res, x1, x2);
+  res = _mm512_mask_fmaddne_pbh (res, m32, x1, x2);
+  res = _mm512_mask3_fmaddne_pbh (res, x1, x2, m32);
+  res = _mm512_maskz_fmaddne_pbh (m32,res, x1, x2);
+  res = _mm512_fmsubne_pbh (res, x1, x2);
+  res = _mm512_mask_fmsubne_pbh (res, m32, x1, x2);
+  res = _mm512_mask3_fmsubne_pbh (res, x1, x2, m32);
+  res = _mm512_maskz_fmsubne_pbh (m32,res, x1, x2);
+  res = _mm512_fnmaddne_pbh (res, x1, x2);
+  res = _mm512_mask_fnmaddne_pbh (res, m32, x1, x2);
+  res = _mm512_mask3_fnmaddne_pbh (res, x1, x2, m32);
+  res = _mm512_maskz_fnmaddne_pbh (m32,res, x1, x2);
+  res = _mm512_fnmsubne_pbh (res, x1, x2);
+  res = _mm512_mask_fnmsubne_pbh (res, m32, x1, x2);
+  res = _mm512_mask3_fnmsubne_pbh (res, x1, x2, m32);
+  res = _mm512_maskz_fnmsubne_pbh (m32,res, x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vaddnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vaddnepbf16-2.c
new file mode 100644
index 00000000000..3b7d1635335
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vaddnepbf16-2.c
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+  
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      res1.a[i] = 0;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+      float x = (float) (2 * (i % 7) + 7);
+      float y = (float) (3 * (i % 7) - 5);
+      float res;
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      res = x + y;
+      res_ref[i] = res_ref2[i] =  convert_fp32_to_bf16_ne (res);
+    }
+  
+  res1.x = INTRINSIC (_addne_pbh) (src1.x, src2.x);
+  res2.x = INTRINSIC (_mask_addne_pbh) (res2.x, mask, src1.x, src2.x);
+  res3.x = INTRINSIC (_maskz_addne_pbh) (mask, src1.x, src2.x);
+
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+
+  MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vdivnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vdivnepbf16-2.c
new file mode 100644
index 00000000000..ca9082885e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vdivnepbf16-2.c
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+  
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      res1.a[i] = 0;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+      float x = (float) (2 * (i % 7) + 7);
+      float y = (float) (3 * (i % 7) - 5);
+      float res;
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      res = x / y;
+      res_ref[i] = res_ref2[i] =  convert_fp32_to_bf16_ne (res);
+    }
+
+  res1.x = INTRINSIC (_divne_pbh) (src1.x, src2.x);
+  res2.x = INTRINSIC (_mask_divne_pbh) (res2.x, mask, src1.x, src2.x);
+  res3.x = INTRINSIC (_maskz_divne_pbh) (mask, src1.x, src2.x);
+
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+
+  MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c
new file mode 100644
index 00000000000..b19c9d437fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      float x = 0.5;
+      float y = 2;
+      float z = 0.25;
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      res1.a[i] = convert_fp32_to_bf16 (z);
+      res2.a[i] = res1.a[i];
+      float x16, y16, z16, m1, m2;
+      x16 = convert_bf16_to_fp32 (src1.a[i]);
+      y16 = convert_bf16_to_fp32 (src2.a[i]);
+      z16 = convert_bf16_to_fp32 (res1.a[i]);
+      m1 = y16 + x16 * z16;
+      m2 = z16 + x16 * y16;
+      res_ref[i] = convert_fp32_to_bf16 (m1);
+      res_ref2[i] = convert_fp32_to_bf16 (m2);
+    }
+
+  MASK_MERGE (bf16_uw) (res1.a, mask, SIZE_RES);
+  MASK_MERGE (bf16_uw) (res2.a, mask, SIZE_RES);
+  res1.x = INTRINSIC (_mask_fmaddne_pbh) (res1.x, mask, src1.x, src2.x);
+  res2.x = INTRINSIC (_mask3_fmaddne_pbh) (src1.x, src2.x, res2.x, mask);
+  
+  MASK_MERGE (bf16_uw) (res_ref, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c
new file mode 100644
index 00000000000..86adbc5fba4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      float x = 0.5;
+      float y = 2;
+      float z = 0.25;
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      res1.a[i] = convert_fp32_to_bf16 (z);
+      res2.a[i] = res1.a[i];
+      float x16, y16, z16, m1, m2;
+      x16 = convert_bf16_to_fp32 (src1.a[i]);
+      y16 = convert_bf16_to_fp32 (src2.a[i]);
+      z16 = convert_bf16_to_fp32 (res1.a[i]);
+      m1 = -y16 + x16 * z16;
+      m2 = -z16 + x16 * y16;
+      res_ref[i] = convert_fp32_to_bf16 (m1);
+      res_ref2[i] = convert_fp32_to_bf16 (m2);
+    }
+
+  MASK_MERGE (bf16_uw) (res1.a, mask, SIZE_RES);
+  MASK_MERGE (bf16_uw) (res2.a, mask, SIZE_RES);
+  res1.x = INTRINSIC (_mask_fmsubne_pbh) (res1.x, mask, src1.x, src2.x);
+  res2.x = INTRINSIC (_mask3_fmsubne_pbh) (src1.x, src2.x, res2.x, mask);
+  
+  MASK_MERGE (bf16_uw) (res_ref, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c
new file mode 100644
index 00000000000..3a7d4cfca48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      float x = 0.5;
+      float y = 2;
+      float z = 0.25;
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      res1.a[i] = convert_fp32_to_bf16 (z);
+      res2.a[i] = res1.a[i];
+      float x16, y16, z16, m1, m2;
+      x16 = convert_bf16_to_fp32 (src1.a[i]);
+      y16 = convert_bf16_to_fp32 (src2.a[i]);
+      z16 = convert_bf16_to_fp32 (res1.a[i]);
+      m1 = y16 - x16 * z16;
+      m2 = z16 - x16 * y16;
+      res_ref[i] = convert_fp32_to_bf16 (m1);
+      res_ref2[i] = convert_fp32_to_bf16 (m2);
+    }
+
+  MASK_MERGE (bf16_uw) (res1.a, mask, SIZE_RES);
+  MASK_MERGE (bf16_uw) (res2.a, mask, SIZE_RES);
+  res1.x = INTRINSIC (_mask_fnmaddne_pbh) (res1.x, mask, src1.x, src2.x);
+  res2.x = INTRINSIC (_mask3_fnmaddne_pbh) (src1.x, src2.x, res2.x, mask);
+  
+  MASK_MERGE (bf16_uw) (res_ref, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c
new file mode 100644
index 00000000000..943146e14f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      float x = 0.5;
+      float y = 2;
+      float z = 0.25;
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      res1.a[i] = convert_fp32_to_bf16 (z);
+      res2.a[i] = res1.a[i];
+      float x16, y16, z16, m1, m2;
+      x16 = convert_bf16_to_fp32 (src1.a[i]);
+      y16 = convert_bf16_to_fp32 (src2.a[i]);
+      z16 = convert_bf16_to_fp32 (res1.a[i]);
+      m1 = -y16 - x16 * z16;
+      m2 = -z16 - x16 * y16;
+      res_ref[i] = convert_fp32_to_bf16 (m1);
+      res_ref2[i] = convert_fp32_to_bf16 (m2);
+    }
+
+  MASK_MERGE (bf16_uw) (res1.a, mask, SIZE_RES);
+  MASK_MERGE (bf16_uw) (res2.a, mask, SIZE_RES);
+  res1.x = INTRINSIC (_mask_fnmsubne_pbh) (res1.x, mask, src1.x, src2.x);
+  res2.x = INTRINSIC (_mask3_fnmsubne_pbh) (src1.x, src2.x, res2.x, mask);
+  
+  MASK_MERGE (bf16_uw) (res_ref, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vmaxpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmaxpbf16-2.c
new file mode 100644
index 00000000000..a563b1e933e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmaxpbf16-2.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+  
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      res1.a[i] = 0;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+      float x = 0.5;
+      float y = 0.25;
+      float res;
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      if (x > y)
+	res_ref[i] = res_ref2[i] = src1.a[i];
+      else
+	res_ref[i] = res_ref2[i] = src2.a[i];
+    }
+
+  res1.x = INTRINSIC (_max_pbh) (src1.x, src2.x);
+  res2.x = INTRINSIC (_mask_max_pbh) (res2.x, mask, src1.x, src2.x);
+  res3.x = INTRINSIC (_maskz_max_pbh) (mask, src1.x, src2.x);
+
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+
+  MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vminpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminpbf16-2.c
new file mode 100644
index 00000000000..10f13d45403
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vminpbf16-2.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+  
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      res1.a[i] = 0;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+      float x = 0.5;
+      float y = 0.25;
+      float res;
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      if (x < y)
+	res_ref[i] = res_ref2[i] = src1.a[i];
+      else
+	res_ref[i] = res_ref2[i] = src2.a[i];
+    }
+
+  res1.x = INTRINSIC (_min_pbh) (src1.x, src2.x);
+  res2.x = INTRINSIC (_mask_min_pbh) (res2.x, mask, src1.x, src2.x);
+  res3.x = INTRINSIC (_maskz_min_pbh) (mask, src1.x, src2.x);
+
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+
+  MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vmulnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmulnepbf16-2.c
new file mode 100644
index 00000000000..ce168070a93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vmulnepbf16-2.c
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+  
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      res1.a[i] = 0;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+      float x = (float) (2 * (i % 7) + 7);
+      float y = (float) (3 * (i % 7) - 5);
+      float res;
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      res = x * y;
+      res_ref[i] = res_ref2[i] =  convert_fp32_to_bf16_ne (res);
+    }
+
+  res1.x = INTRINSIC (_mulne_pbh) (src1.x, src2.x);
+  res2.x = INTRINSIC (_mask_mulne_pbh) (res2.x, mask, src1.x, src2.x);
+  res3.x = INTRINSIC (_maskz_mulne_pbh) (mask, src1.x, src2.x);
+
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+
+  MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c
new file mode 100644
index 00000000000..867f77ad3a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vscalefpbf16-2.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+  
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      res1.a[i] = 0;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+      float x = (float) (2 * (i % 7) + 7);
+      float y = 1.0 + (float) (4 * i) / (float) SIZE_RES;
+      float xx, yy, res;
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      xx = convert_bf16_to_fp32 (src1.a[i]);
+      yy = convert_bf16_to_fp32 (src2.a[i]);
+      res = scalef (xx, yy);
+      res_ref[i] = res_ref2[i] = convert_fp32_to_bf16_ne(res);
+    }
+
+  res1.x = INTRINSIC (_scalef_pbh) (src1.x, src2.x);
+  res2.x = INTRINSIC (_mask_scalef_pbh) (res2.x, mask, src1.x, src2.x);
+  res3.x = INTRINSIC (_maskz_scalef_pbh) (mask, src1.x, src2.x);
+
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+
+  MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vsubnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-vsubnepbf16-2.c
new file mode 100644
index 00000000000..f8a9a51cd37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vsubnepbf16-2.c
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2-512" } */
+/* { dg-require-effective-target avx10_2_512 } */
+
+#ifndef AVX10_2
+#define AVX10_2
+#define AVX10_2_512
+#define AVX10_512BIT
+#endif
+#include "avx10-helper.h"
+#define SIZE_RES (AVX512F_LEN / 16)
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, bf16_uw) res1, res2, res3, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  unsigned short res_ref[SIZE_RES], res_ref2[SIZE_RES];
+  
+  for (i = 0; i < SIZE_RES; i++)
+    {
+      res1.a[i] = 0;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+      float x = (float) (2 * (i % 7) + 7);
+      float y = (float) (3 * (i % 7) - 5);
+      float res;
+      src2.a[i] = convert_fp32_to_bf16 (y);
+      src1.a[i] = convert_fp32_to_bf16 (x);
+      res = x - y;
+      res_ref[i] = res_ref2[i] =  convert_fp32_to_bf16_ne (res);
+    }
+
+  res1.x = INTRINSIC (_subne_pbh) (src1.x, src2.x);
+  res2.x = INTRINSIC (_mask_subne_pbh) (res2.x, mask, src1.x, src2.x);
+  res3.x = INTRINSIC (_maskz_subne_pbh) (mask, src1.x, src2.x);
+
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res1, res_ref))
+    abort ();
+  
+  MASK_MERGE (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res2, res_ref2))
+    abort ();
+
+  MASK_ZERO (bf16_uw) (res_ref2, mask, SIZE_RES);
+  if (UNION_CHECK (AVX512F_LEN, bf16_uw) (res3, res_ref2))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c
new file mode 100644
index 00000000000..831c8f849ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-bf16-1.c
@@ -0,0 +1,172 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.2 -O2" } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefpbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd231nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd231nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub231nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub231nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256bh res, x1, x2;
+volatile __m128bh res1, x3, x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx10_2_test (void)
+{
+  res = _mm256_addne_pbh (x1, x2);
+  res = _mm256_mask_addne_pbh (res, m16, x1, x2);
+  res = _mm256_maskz_addne_pbh (m16, x1, x2); 
+  res1 = _mm_addne_pbh (x3, x4);
+  res1 = _mm_mask_addne_pbh (res1, m8, x3, x4);
+  res1 = _mm_maskz_addne_pbh (m8, x3, x4);
+
+  res = _mm256_subne_pbh (x1, x2);
+  res = _mm256_mask_subne_pbh (res, m16, x1, x2);
+  res = _mm256_maskz_subne_pbh (m16, x1, x2); 
+  res1 = _mm_subne_pbh (x3, x4);
+  res1 = _mm_mask_subne_pbh (res1, m8, x3, x4);
+  res1 = _mm_maskz_subne_pbh (m8, x3, x4);
+
+  res = _mm256_mulne_pbh (x1, x2);
+  res = _mm256_mask_mulne_pbh (res, m16, x1, x2);
+  res = _mm256_maskz_mulne_pbh (m16, x1, x2); 
+  res1 = _mm_mulne_pbh (x3, x4);
+  res1 = _mm_mask_mulne_pbh (res1, m8, x3, x4);
+  res1 = _mm_maskz_mulne_pbh (m8, x3, x4);
+
+  res = _mm256_divne_pbh (x1, x2);
+  res = _mm256_mask_divne_pbh (res, m16, x1, x2);
+  res = _mm256_maskz_divne_pbh (m16, x1, x2); 
+  res1 = _mm_divne_pbh (x3, x4);
+  res1 = _mm_mask_divne_pbh (res1, m8, x3, x4);
+  res1 = _mm_maskz_divne_pbh (m8, x3, x4);
+
+  res = _mm256_max_pbh (x1, x2);
+  res = _mm256_mask_max_pbh (res, m16, x1, x2);
+  res = _mm256_maskz_max_pbh (m16, x1, x2); 
+  res1 = _mm_max_pbh (x3, x4);
+  res1 = _mm_mask_max_pbh (res1, m8, x3, x4);
+  res1 = _mm_maskz_max_pbh (m8, x3, x4);
+
+  res = _mm256_min_pbh (x1, x2);
+  res = _mm256_mask_min_pbh (res, m16, x1, x2);
+  res = _mm256_maskz_min_pbh (m16, x1, x2); 
+  res1 = _mm_min_pbh (x3, x4);
+  res1 = _mm_mask_min_pbh (res1, m8, x3, x4);
+  res1 = _mm_maskz_min_pbh (m8, x3, x4);
+
+  res = _mm256_scalef_pbh (x1, x2);
+  res = _mm256_mask_scalef_pbh (res, m16, x1, x2);
+  res = _mm256_maskz_scalef_pbh (m16, x1, x2); 
+  res1 = _mm_scalef_pbh (x3, x4);
+  res1 = _mm_mask_scalef_pbh (res1, m8, x3, x4);
+  res1 = _mm_maskz_scalef_pbh (m8, x3, x4);
+
+  res = _mm256_fmaddne_pbh (res, x1, x2);
+  res = _mm256_mask_fmaddne_pbh (res, m16, x1, x2);
+  res = _mm256_mask3_fmaddne_pbh (res, x1, x2, m16);
+  res = _mm256_maskz_fmaddne_pbh (m16,res, x1, x2);
+  res1 = _mm_fmaddne_pbh (res1, x3, x4);
+  res1 = _mm_mask_fmaddne_pbh (res1, m8, x3, x4);
+  res1 = _mm_mask3_fmaddne_pbh (res1, x3, x4, m8);
+  res1 = _mm_maskz_fmaddne_pbh (m8,res1, x3, x4);
+
+  res = _mm256_fmsubne_pbh (res, x1, x2);
+  res = _mm256_mask_fmsubne_pbh (res, m16, x1, x2);
+  res = _mm256_mask3_fmsubne_pbh (res, x1, x2, m16);
+  res = _mm256_maskz_fmsubne_pbh (m16,res, x1, x2);
+  res1 = _mm_fmsubne_pbh (res1, x3, x4);
+  res1 = _mm_mask_fmsubne_pbh (res1, m8, x3, x4);
+  res1 = _mm_mask3_fmsubne_pbh (res1, x3, x4, m8);
+  res1 = _mm_maskz_fmsubne_pbh (m8,res1, x3, x4);
+
+  res = _mm256_fnmaddne_pbh (res, x1, x2);
+  res = _mm256_mask_fnmaddne_pbh (res, m16, x1, x2);
+  res = _mm256_mask3_fnmaddne_pbh (res, x1, x2, m16);
+  res = _mm256_maskz_fnmaddne_pbh (m16,res, x1, x2);
+  res1 = _mm_fnmaddne_pbh (res1, x3, x4);
+  res1 = _mm_mask_fnmaddne_pbh (res1, m8, x3, x4);
+  res1 = _mm_mask3_fnmaddne_pbh (res1, x3, x4, m8);
+  res1 = _mm_maskz_fnmaddne_pbh (m8,res1, x3, x4);
+
+  res = _mm256_fnmsubne_pbh (res, x1, x2);
+  res = _mm256_mask_fnmsubne_pbh (res, m16, x1, x2);
+  res = _mm256_mask3_fnmsubne_pbh (res, x1, x2, m16);
+  res = _mm256_maskz_fnmsubne_pbh (m16,res, x1, x2);
+  res1 = _mm_fnmsubne_pbh (res1, x3, x4);
+  res1 = _mm_mask_fnmsubne_pbh (res1, m8, x3, x4);
+  res1 = _mm_mask3_fnmsubne_pbh (res1, x3, x4, m8);
+  res1 = _mm_maskz_fnmsubne_pbh (m8,res1, x3, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vaddnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vaddnepbf16-2.c
new file mode 100644
index 00000000000..7783dcee820
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vaddnepbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vaddnepbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vaddnepbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vdivnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vdivnepbf16-2.c
new file mode 100644
index 00000000000..dd2c5442c47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vdivnepbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vdivnepbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vdivnepbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c
new file mode 100644
index 00000000000..a4f2e5f791c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vfmaddXXXnepbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vfmaddXXXnepbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c
new file mode 100644
index 00000000000..406c1739e00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vfmsubXXXnepbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vfmsubXXXnepbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c
new file mode 100644
index 00000000000..3f53099bc4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vfnmaddXXXnepbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vfnmaddXXXnepbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c
new file mode 100644
index 00000000000..fc906ccad3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vfnmsubXXXnepbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vfnmsubXXXnepbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmaxpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vmaxpbf16-2.c
new file mode 100644
index 00000000000..2b8f820822b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmaxpbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vmaxpbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vmaxpbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vminpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vminpbf16-2.c
new file mode 100644
index 00000000000..dcb7c0e4a7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vminpbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vminpbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vminpbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmulnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vmulnepbf16-2.c
new file mode 100644
index 00000000000..753e2d100d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmulnepbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vmulnepbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vmulnepbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vscalefpbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vscalefpbf16-2.c
new file mode 100644
index 00000000000..8f26dfbc9bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vscalefpbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vscalefpbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vscalefpbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vsubnepbf16-2.c b/gcc/testsuite/gcc.target/i386/avx10_2-vsubnepbf16-2.c
new file mode 100644
index 00000000000..ad02ee19de2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vsubnepbf16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-require-effective-target avx10_2 } */
+
+#define AVX10_2
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vsubnepbf16-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx10_2-512-vsubnepbf16-2.c" 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
index 3cd6751af26..b61c03b4781 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
@@ -45,6 +45,7 @@ MAKE_MASK_MERGE(, float)
 MAKE_MASK_MERGE(d, double)
 MAKE_MASK_MERGE(i_ub, unsigned char)
 MAKE_MASK_MERGE(i_uw, unsigned short)
+MAKE_MASK_MERGE(bf16_uw, unsigned short)
 MAKE_MASK_MERGE(i_ud, unsigned int)
 MAKE_MASK_MERGE(i_uq, unsigned long long)
 
@@ -70,6 +71,7 @@ MAKE_MASK_ZERO(, float)
 MAKE_MASK_ZERO(d, double)
 MAKE_MASK_ZERO(i_ub, unsigned char)
 MAKE_MASK_ZERO(i_uw, unsigned short)
+MAKE_MASK_ZERO(bf16_uw, unsigned short)
 MAKE_MASK_ZERO(i_ud, unsigned int)
 MAKE_MASK_ZERO(i_uq, unsigned long long)
 
diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h b/gcc/testsuite/gcc.target/i386/m512-check.h
index d5d18372947..bdc682d63bb 100644
--- a/gcc/testsuite/gcc.target/i386/m512-check.h
+++ b/gcc/testsuite/gcc.target/i386/m512-check.h
@@ -67,6 +67,12 @@ typedef union
   _Float16 a[32];
 } union512h;
 
+typedef union
+{
+  __m512bh x;
+  unsigned short a[32];
+} union512bf16_uw;
+
 typedef union
 {
   __m128h x;
@@ -79,6 +85,18 @@ typedef union
   _Float16 a[16];
 } union256h;
 
+typedef union
+{
+  __m128bh x;
+  unsigned short a[8];
+} union128bf16_uw;
+
+typedef union
+{
+  __m256bh x;
+  unsigned short a[16];
+} union256bf16_uw;
+
 #define CHECK_ROUGH_EXP(UNION_TYPE, VALUE_TYPE, FMT)		\
 static int							\
 __attribute__((noinline, unused))				\
@@ -155,3 +173,12 @@ CHECK_FP_EXP (union256h, _Float16, ESP_FLOAT16, "%f")
 CHECK_ROUGH_EXP (union128h, _Float16, "%f")
 CHECK_ROUGH_EXP (union256h, _Float16, "%f")
 #endif
+
+#if defined(AVX512BF16)
+CHECK_EXP (union512bf16_uw, unsigned short, "%d")
+#endif
+
+#if defined(AVX512BF16)
+CHECK_EXP (union128bf16_uw, unsigned short, "%d")
+CHECK_EXP (union256bf16_uw, unsigned short, "%d")
+#endif
-- 
2.43.5


  parent reply	other threads:[~2024-08-19  8:57 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-19  8:56 [PATCH 00/12] AVX10.2: Support new instructions Haochen Jiang
2024-08-19  8:56 ` [PATCH 01/12] i386: Refactor m512-check.h Haochen Jiang
2024-08-19  8:56 ` [PATCH 02/12] [PATCH 1/2] AVX10.2: Support media instructions Haochen Jiang
2024-08-19  8:56 ` [PATCH 03/12] [PATCH 2/2] " Haochen Jiang
2024-08-19  8:56 ` [PATCH 04/12] AVX10.2: Support convert instructions Haochen Jiang
2024-08-19  8:56 ` Haochen Jiang [this message]
2024-08-19  8:56 ` [PATCH 06/12] [PATCH 2/2] AVX10.2: Support BF16 instructions Haochen Jiang
2024-08-19  8:56 ` [PATCH 07/12] [PATCH 1/2] AVX10.2: Support saturating convert instructions Haochen Jiang
2024-08-19  8:56 ` [PATCH 08/12] [PATCH 2/2] " Haochen Jiang
2024-08-19  9:02 ` [PATCH 09/12] AVX10.2: Support minmax instructions Haochen Jiang
2024-08-19  9:03 ` [PATCH 10/12] AVX10.2: Support vector copy instructions Haochen Jiang
2024-08-19  9:03 ` [PATCH 11/12] AVX10.2: Support compare instructions Haochen Jiang
2024-08-19  9:03 ` [PATCH 12/12] i386: Add bf8 -> fp16 intrin Haochen Jiang
2024-08-26  1:45 ` [PATCH 00/12] AVX10.2: Support new instructions Hongtao Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240819085717.193256-6-haochen.jiang@intel.com \
    --to=haochen.jiang@intel.com \
    --cc=admin@levyhsu.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hongtao.liu@intel.com \
    --cc=lingling.kong@intel.com \
    --cc=ubizjak@gmail.com \
    --cc=zewei.mo@pitt.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).