public inbox for gcc-patches@gcc.gnu.org
* [PATCH] x86: Support vector __bf16 type.
@ 2022-08-16  7:49 Kong, Lingling
  2022-08-17  5:56 ` Hongtao Liu
  2022-08-18  7:34 ` [PATCH] Add ABI test for " Haochen Jiang
  0 siblings, 2 replies; 9+ messages in thread
From: Kong, Lingling @ 2022-08-16  7:49 UTC
  To: Liu, Hongtao, gcc-patches

Hi,

This patch adds support for vector init/broadcast/set/extract for the __bf16 type.
The __bf16 type is a storage type.
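
To illustrate what "storage type" means for the operations above, here is a
minimal sketch, not taken from the patch. It models the eight lanes of a
128-bit vector as raw 16-bit patterns using the GNU vector extension, so it
compiles without __bf16 support. The names v8bf_bits, float_to_bf16_bits,
bf16_bits_to_float and broadcast_bf16 are hypothetical, and the float
conversion truncates rather than rounds, which is an assumption made for
brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a V8BF vector: eight 16-bit lanes that hold
   raw bfloat16 bit patterns.  No arithmetic is done on the lanes.  */
typedef uint16_t v8bf_bits __attribute__ ((vector_size (16)));

/* bfloat16 keeps the upper 16 bits of an IEEE binary32 value
   (1 sign bit, 8 exponent bits, 7 mantissa bits).  This helper
   truncates; it does not round to nearest (assumption for brevity).  */
static uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return (uint16_t) (u >> 16);
}

/* Widen a bfloat16 bit pattern back to float by zero-filling the
   low 16 mantissa bits.  */
static float
bf16_bits_to_float (uint16_t h)
{
  uint32_t u = (uint32_t) h << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Broadcast: every lane receives the same 16-bit pattern.  */
static v8bf_bits
broadcast_bf16 (float f)
{
  uint16_t h = float_to_bf16_bits (f);
  v8bf_bits v = { h, h, h, h, h, h, h, h };
  return v;
}
```

With these helpers, a lane is set with v[3] = float_to_bf16_bits (x) and
extracted with bf16_bits_to_float (v[3]); any computation happens only after
widening to float.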

OK for master?

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle vector
	BFmode.
	(ix86_expand_vector_init_duplicate): Support vector BFmode.
	(ix86_expand_vector_init_one_nonzero): Ditto.
	(ix86_expand_vector_init_one_var): Ditto.
	(ix86_expand_vector_init_concat): Ditto.
	(ix86_expand_vector_init_interleave): Ditto.
	(ix86_expand_vector_init_general): Ditto.
	(ix86_expand_vector_init): Ditto.
	(ix86_expand_vector_set_var): Ditto.
	(ix86_expand_vector_set): Ditto.
	(ix86_expand_vector_extract): Ditto.
	* config/i386/i386.cc (classify_argument): Add BF vector modes.
	(function_arg_64): Ditto.
	(ix86_gimplify_va_arg): Ditto.
	(ix86_get_ssemov): Ditto.
	* config/i386/i386.h (VALID_AVX256_REG_MODE): Add BF vector modes.
	(VALID_AVX512F_REG_MODE): Ditto.
	(host_detect_local_cpu): Ditto.
	(VALID_SSE2_REG_MODE): Ditto.
	* config/i386/i386.md: Add BF vector modes.
	(MODE_SIZE): Ditto.
	(ssemodesuffix): Add bf suffix for BF vector modes.
	(ssevecmode): Ditto.
	* config/i386/sse.md (VMOVE): Adjust for BF vector modes.
	(VI12HFBF_AVX512VL): Ditto.
	(V_256_512): Ditto.
	(VF_AVX512HFBF16): Ditto.
	(VF_AVX512BWHFBF16): Ditto.
	(VIHFBF): Ditto.
	(avx512): Ditto.
	(VIHFBF_256): Ditto.
	(VIHFBF_AVX512BW): Ditto.
	(VI2F_256_512): Ditto.
	(V8_128): Ditto.
	(V16_256): Ditto.
	(V32_512): Ditto.
	(sseinsnmode): Ditto.
	(sseconstm1): Ditto.
	(sseintmodesuffix): New mode_attr.
	(avx512fmaskmode): Ditto.
	(avx512fmaskmodelower): Ditto.
	(ssedoublevecmode): Ditto.
	(ssehalfvecmode): Ditto.
	(ssehalfvecmodelower): Ditto.
	(ssescalarmode): Add vector BFmode mapping.
	(ssescalarmodelower): Ditto.
	(ssexmmmode): Ditto.
	(ternlogsuffix): Ditto.
	(ssescalarsize): Ditto.
	(sseintprefix): Ditto.
	(i128): Ditto.
	(xtg_mode): Ditto.
	(bcstscalarsuff): Ditto.
	(<avx512>_blendm<mode>): New define_insn for BFmode.
	(<avx512>_store<mode>_mask): Ditto.
	(vcond_mask_<mode><avx512fmaskmodelower>): Ditto.
	(vec_set<mode>_0): New define_insn for BF vector set.
	(V8BFH_128): New mode_iterator for BFmode.
	(avx512fp16_mov<mode>): Ditto.
	(vec_set<mode>): New define_insn for BF vector set.
	(@vec_extract_hi_<mode>): Ditto.
	(@vec_extract_lo_<mode>): Ditto.
	(vec_set_hi_<mode>): Ditto.
	(vec_set_lo_<mode>): Ditto.
	(*vec_extract<mode>_0): New define_insn_and_split for BF
	vector extract.
	(*vec_extract<mode>): New define_insn.
	(VEC_EXTRACT_MODE): Add BF vector modes.
	(PINSR_MODE): Add V8BF.
	(sse2p4_1): Ditto.
	(pinsr_evex_isa): Ditto.
	(<sse2p4_1>_pinsr<ssemodesuffix>): Adjust to support
	insert for V8BFmode.
	(pbroadcast_evex_isa): Add BF vector modes.
	(AVX2_VEC_DUP_MODE): Ditto.
	(VEC_INIT_MODE): Ditto.
	(VEC_INIT_HALF_MODE): Ditto.
	(avx2_pbroadcast<mode>): Adjust to support BF vector mode
	broadcast.
	(avx2_pbroadcast<mode>_1): Ditto.
	(<avx512>_vec_dup<mode>_1): Ditto.
	(<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>):
	Ditto.

gcc/testsuite/ChangeLog:

	* g++.target/i386/vect-bfloat16-1.C: New test.
	* gcc.target/i386/vect-bfloat16-1.c: New test.
	* gcc.target/i386/vect-bfloat16-2a.c: New test.
	* gcc.target/i386/vect-bfloat16-2b.c: New test.
	* gcc.target/i386/vect-bfloat16-typecheck_1.c: New test.
	* gcc.target/i386/vect-bfloat16-typecheck_2.c: New test.
---
 gcc/config/i386/i386-expand.cc                | 129 +++++++--
 gcc/config/i386/i386.cc                       |  16 +-
 gcc/config/i386/i386.h                        |  12 +-
 gcc/config/i386/i386.md                       |   9 +-
 gcc/config/i386/sse.md                        | 211 ++++++++------
 .../g++.target/i386/vect-bfloat16-1.C         |  13 +
 .../gcc.target/i386/vect-bfloat16-1.c         |  30 ++
 .../gcc.target/i386/vect-bfloat16-2a.c        | 121 ++++++++
 .../gcc.target/i386/vect-bfloat16-2b.c        |  22 ++
 .../i386/vect-bfloat16-typecheck_1.c          | 258 ++++++++++++++++++
 .../i386/vect-bfloat16-typecheck_2.c          | 248 +++++++++++++++++
 11 files changed, 950 insertions(+), 119 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/vect-bfloat16-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 66d8f28984c..c3da9bf1636 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -4064,6 +4064,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
     case E_V16QImode:
     case E_V8HImode:
     case E_V8HFmode:
+    case E_V8BFmode:
     case E_V4SImode:
     case E_V2DImode:
     case E_V1TImode:
@@ -4084,6 +4085,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
     case E_V32QImode:
     case E_V16HImode:
     case E_V16HFmode:
+    case E_V16BFmode:
     case E_V8SImode:
     case E_V4DImode:
       if (TARGET_AVX2)
@@ -4102,6 +4104,9 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
     case E_V32HFmode:
       gen = gen_avx512bw_blendmv32hf;
       break;
+    case E_V32BFmode:
+      gen = gen_avx512bw_blendmv32bf;
+      break;
     case E_V16SImode:
       gen = gen_avx512f_blendmv16si;
       break;
@@ -15008,6 +15013,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
 
     case E_V8HImode:
     case E_V8HFmode:
+    case E_V8BFmode:
       if (TARGET_AVX2)
 	return ix86_vector_duplicate_value (mode, target, val);
 
@@ -15092,6 +15098,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
 
     case E_V16HImode:
     case E_V16HFmode:
+    case E_V16BFmode:
     case E_V32QImode:
       if (TARGET_AVX2)
 	return ix86_vector_duplicate_value (mode, target, val);
@@ -15112,6 +15119,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
 
     case E_V32HImode:
     case E_V32HFmode:
+    case E_V32BFmode:
     case E_V64QImode:
       if (TARGET_AVX512BW)
 	return ix86_vector_duplicate_value (mode, target, val);
@@ -15119,6 +15127,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
 	{
 	  machine_mode hvmode = (mode == V32HImode ? V16HImode
 				 : mode == V32HFmode ? V16HFmode
+				 : mode == V32BFmode ? V16BFmode
 				 : V32QImode);
 	  rtx x = gen_reg_rtx (hvmode);
 
@@ -15232,6 +15241,18 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
       use_vector_set = TARGET_AVX512FP16 && one_var == 0;
       gen_vec_set_0 = gen_vec_setv32hf_0;
       break;
+    case E_V8BFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv8bf_0;
+      break;
+    case E_V16BFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv16bf_0;
+      break;
+    case E_V32BFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv32bf_0;
+      break;
     case E_V32HImode:
       use_vector_set = TARGET_AVX512FP16 && one_var == 0;
       gen_vec_set_0 = gen_vec_setv32hi_0;
@@ -15386,6 +15407,8 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode,
       /* FALLTHRU */
     case E_V8HFmode:
     case E_V16HFmode:
+    case E_V8BFmode:
+    case E_V16BFmode:
     case E_V4DFmode:
     case E_V8SFmode:
     case E_V8SImode:
@@ -15469,6 +15492,9 @@ ix86_expand_vector_init_concat (machine_mode mode,
 	case E_V32HFmode:
 	  half_mode = V16HFmode;
 	  break;
+	case E_V32BFmode:
+	  half_mode = V16BFmode;
+	  break;
 	case E_V16SImode:
 	  half_mode = V8SImode;
 	  break;
@@ -15484,6 +15510,9 @@ ix86_expand_vector_init_concat (machine_mode mode,
 	case E_V16HFmode:
 	  half_mode = V8HFmode;
 	  break;
+	case E_V16BFmode:
+	  half_mode = V8BFmode;
+	  break;
 	case E_V8SImode:
 	  half_mode = V4SImode;
 	  break;
@@ -15642,6 +15671,15 @@ ix86_expand_vector_init_interleave (machine_mode mode,
       second_imode = V2DImode;
       third_imode = VOIDmode;
       break;
+    case E_V8BFmode:
+      gen_load_even = gen_vec_interleave_lowv8bf;
+      gen_interleave_first_low = gen_vec_interleave_lowv4si;
+      gen_interleave_second_low = gen_vec_interleave_lowv2di;
+      inner_mode = BFmode;
+      first_imode = V4SImode;
+      second_imode = V2DImode;
+      third_imode = VOIDmode;
+      break;
     case E_V8HImode:
       gen_load_even = gen_vec_setv8hi;
       gen_interleave_first_low = gen_vec_interleave_lowv4si;
@@ -15667,15 +15705,18 @@ ix86_expand_vector_init_interleave (machine_mode mode,
   for (i = 0; i < n; i++)
     {
       op = ops [i + i];
-      if (inner_mode == HFmode)
+      if (inner_mode == HFmode || inner_mode == BFmode)
 	{
 	  rtx even, odd;
-	  /* Use vpuncklwd to pack 2 HFmode.  */
-	  op0 = gen_reg_rtx (V8HFmode);
-	  even = lowpart_subreg (V8HFmode, force_reg (HFmode, op), HFmode);
-	  odd = lowpart_subreg (V8HFmode,
-				force_reg (HFmode, ops[i + i + 1]),
-				HFmode);
+	  /* Use vpuncklwd to pack 2 HFmode or BFmode.  */
+	  machine_mode vec_mode = ((inner_mode == HFmode)
+				   ? V8HFmode : V8BFmode);
+	  op0 = gen_reg_rtx (vec_mode);
+	  even = lowpart_subreg (vec_mode,
+				 force_reg (inner_mode, op), inner_mode);
+	  odd = lowpart_subreg (vec_mode,
+				force_reg (inner_mode, ops[i + i + 1]),
+				inner_mode);
 	  emit_insn (gen_load_even (op0, even, odd));
 	}
       else
@@ -15824,6 +15865,10 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode,
       half_mode = V8HFmode;
       goto half;
 
+    case E_V16BFmode:
+      half_mode = V8BFmode;
+      goto half;
+
 half:
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -15852,6 +15897,11 @@ half:
       half_mode = V16HFmode;
       goto quarter;
 
+    case E_V32BFmode:
+      quarter_mode = V8BFmode;
+      half_mode = V16BFmode;
+      goto quarter;
+
 quarter:
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -15891,6 +15941,7 @@ quarter:
       /* FALLTHRU */
 
     case E_V8HFmode:
+    case E_V8BFmode:
 
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -15994,7 +16045,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals)
 	  if (inner_mode == QImode
 	      || inner_mode == HImode
 	      || inner_mode == TImode
-	      || inner_mode == HFmode)
+	      || inner_mode == HFmode
+	      || inner_mode == BFmode)
 	    {
 	      unsigned int n_bits = n_elts * GET_MODE_SIZE (inner_mode);
 	      scalar_mode elt_mode = inner_mode == TImode ? DImode : SImode;
@@ -16078,7 +16130,8 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
   /* 512-bits vector byte/word broadcast and comparison only available
      under TARGET_AVX512BW, break 512-bits vector into two 256-bits vector
      when without TARGET_AVX512BW.  */
-  if ((mode == V32HImode || mode == V32HFmode || mode == V64QImode)
+  if ((mode == V32HImode || mode == V32HFmode || mode == V32BFmode
+       || mode == V64QImode)
       && !TARGET_AVX512BW)
     {
       gcc_assert (TARGET_AVX512F);
@@ -16099,6 +16152,12 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
 	  extract_hi = gen_vec_extract_hi_v32hf;
 	  extract_lo = gen_vec_extract_lo_v32hf;
 	}
+      else if (mode == V32BFmode)
+	{
+	  half_mode = V16BFmode;
+	  extract_hi = gen_vec_extract_hi_v32bf;
+	  extract_lo = gen_vec_extract_lo_v32bf;
+	}
       else
 	{
 	  half_mode = V32QImode;
@@ -16155,6 +16214,15 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
 	case E_V32HFmode:
 	  cmp_mode = V32HImode;
 	  break;
+	case E_V8BFmode:
+	  cmp_mode = V8HImode;
+	  break;
+	case E_V16BFmode:
+	  cmp_mode = V16HImode;
+	  break;
+	case E_V32BFmode:
+	  cmp_mode = V32HImode;
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -16192,7 +16260,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
   bool use_vec_merge = false;
   bool blendm_const = false;
   rtx tmp;
-  static rtx (*gen_extract[7][2]) (rtx, rtx)
+  static rtx (*gen_extract[8][2]) (rtx, rtx)
     = {
 	{ gen_vec_extract_lo_v32qi, gen_vec_extract_hi_v32qi },
 	{ gen_vec_extract_lo_v16hi, gen_vec_extract_hi_v16hi },
@@ -16200,9 +16268,10 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
 	{ gen_vec_extract_lo_v4di, gen_vec_extract_hi_v4di },
 	{ gen_vec_extract_lo_v8sf, gen_vec_extract_hi_v8sf },
 	{ gen_vec_extract_lo_v4df, gen_vec_extract_hi_v4df },
-	{ gen_vec_extract_lo_v16hf, gen_vec_extract_hi_v16hf }
+	{ gen_vec_extract_lo_v16hf, gen_vec_extract_hi_v16hf },
+	{ gen_vec_extract_lo_v16bf, gen_vec_extract_hi_v16bf }
       };
-  static rtx (*gen_insert[7][2]) (rtx, rtx, rtx)
+  static rtx (*gen_insert[8][2]) (rtx, rtx, rtx)
     = {
 	{ gen_vec_set_lo_v32qi, gen_vec_set_hi_v32qi },
 	{ gen_vec_set_lo_v16hi, gen_vec_set_hi_v16hi },
@@ -16211,6 +16280,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
 	{ gen_vec_set_lo_v8sf, gen_vec_set_hi_v8sf },
 	{ gen_vec_set_lo_v4df, gen_vec_set_hi_v4df },
 	{ gen_vec_set_lo_v16hf, gen_vec_set_hi_v16hf },
+	{ gen_vec_set_lo_v16bf, gen_vec_set_hi_v16bf },
       };
   int i, j, n;
   machine_mode mmode = VOIDmode;
@@ -16379,6 +16449,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
 
     case E_V8HImode:
     case E_V8HFmode:
+    case E_V8BFmode:
     case E_V2HImode:
       use_vec_merge = TARGET_SSE2;
       break;
@@ -16402,18 +16473,20 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
       goto half;
 
     case E_V16HFmode:
+    case E_V16BFmode:
       /* For ELT == 0, vec_setv8hf_0 can save 1 vpbroadcastw.  */
       if (TARGET_AVX2 && elt != 0)
 	{
 	  mmode = SImode;
-	  gen_blendm = gen_avx2_pblendph_1;
+	  gen_blendm = ((mode == E_V16HFmode) ? gen_avx2_pblendph_1
+						: gen_avx2_pblendbf_1);
 	  blendm_const = true;
 	  break;
 	}
       else
 	{
-	  half_mode = V8HFmode;
-	  j = 6;
+	  half_mode = ((mode == E_V16HFmode) ? V8HFmode : V8BFmode);
+	  j = ((mode == E_V16HFmode) ? 6 : 7);
 	  n = 8;
 	  goto half;
 	}
@@ -16505,6 +16578,13 @@ half:
 	  gen_blendm = gen_avx512bw_blendmv32hf;
 	}
       break;
+    case E_V32BFmode:
+      if (TARGET_AVX512BW)
+	{
+	  mmode = SImode;
+	  gen_blendm = gen_avx512bw_blendmv32bf;
+	}
+      break;
     case E_V32HImode:
       if (TARGET_AVX512BW)
 	{
@@ -16712,6 +16792,7 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
 
     case E_V8HImode:
     case E_V8HFmode:
+    case E_V8BFmode:
     case E_V2HImode:
       use_vec_extr = TARGET_SSE2;
       break;
@@ -16878,26 +16959,32 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
       return;
 
     case E_V32HFmode:
+    case E_V32BFmode:
       if (TARGET_AVX512BW)
 	{
-	  tmp = gen_reg_rtx (V16HFmode);
+	  tmp = (mode == E_V32HFmode
+		 ? gen_reg_rtx (V16HFmode)
+		 : gen_reg_rtx (V16BFmode));
 	  if (elt < 16)
-	    emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
+	    emit_insn (maybe_gen_vec_extract_lo (mode, tmp, vec));
 	  else
-	    emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
+	    emit_insn (maybe_gen_vec_extract_hi (mode, tmp, vec));
 	  ix86_expand_vector_extract (false, target, tmp, elt & 15);
 	  return;
 	}
       break;
 
     case E_V16HFmode:
+    case E_V16BFmode:
       if (TARGET_AVX)
 	{
-	  tmp = gen_reg_rtx (V8HFmode);
+	  tmp = (mode == E_V16HFmode
+		 ? gen_reg_rtx (V8HFmode)
+		 : gen_reg_rtx (V8BFmode));
 	  if (elt < 8)
-	    emit_insn (gen_vec_extract_lo_v16hf (tmp, vec));
+	    emit_insn (maybe_gen_vec_extract_lo (mode, tmp, vec));
 	  else
-	    emit_insn (gen_vec_extract_hi_v16hf (tmp, vec));
+	    emit_insn (maybe_gen_vec_extract_hi (mode, tmp, vec));
 	  ix86_expand_vector_extract (false, target, tmp, elt & 7);
 	  return;
 	}
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fa3722a11e1..e27c87f8c83 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -2463,6 +2463,7 @@ classify_argument (machine_mode mode, const_tree type,
     case E_V8SImode:
     case E_V32QImode:
     case E_V16HFmode:
+    case E_V16BFmode:
     case E_V16HImode:
     case E_V4DFmode:
     case E_V4DImode:
@@ -2474,6 +2475,7 @@ classify_argument (machine_mode mode, const_tree type,
     case E_V8DFmode:
     case E_V16SFmode:
     case E_V32HFmode:
+    case E_V32BFmode:
     case E_V8DImode:
     case E_V16SImode:
     case E_V32HImode:
@@ -2492,6 +2494,7 @@ classify_argument (machine_mode mode, const_tree type,
     case E_V16QImode:
     case E_V8HImode:
     case E_V8HFmode:
+    case E_V8BFmode:
     case E_V2DFmode:
     case E_V2DImode:
       classes[0] = X86_64_SSE_CLASS;
@@ -2947,6 +2950,7 @@ pass_in_reg:
       /* FALLTHRU */
 
     case E_V16HFmode:
+    case E_V16BFmode:
     case E_V8SFmode:
     case E_V8SImode:
     case E_V64QImode:
@@ -2954,6 +2958,7 @@ pass_in_reg:
     case E_V16SImode:
     case E_V8DImode:
     case E_V32HFmode:
+    case E_V32BFmode:
     case E_V16SFmode:
     case E_V8DFmode:
     case E_V32QImode:
@@ -2966,6 +2971,7 @@ pass_in_reg:
     case E_V4SImode:
     case E_V2DImode:
     case E_V8HFmode:
+    case E_V8BFmode:
     case E_V4SFmode:
     case E_V2DFmode:
       if (!type || !AGGREGATE_TYPE_P (type))
@@ -3190,6 +3196,7 @@ pass_in_reg:
     case E_V4SImode:
     case E_V2DImode:
     case E_V8HFmode:
+    case E_V8BFmode:
     case E_V4SFmode:
     case E_V2DFmode:
       if (!type || !AGGREGATE_TYPE_P (type))
@@ -3210,9 +3217,11 @@ pass_in_reg:
     case E_V16SImode:
     case E_V8DImode:
     case E_V32HFmode:
+    case E_V32BFmode:
     case E_V16SFmode:
     case E_V8DFmode:
     case E_V16HFmode:
+    case E_V16BFmode:
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
@@ -3273,6 +3282,7 @@ function_arg_64 (const CUMULATIVE_ARGS *cum, machine_mode mode,
       break;
 
     case E_V16HFmode:
+    case E_V16BFmode:
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
@@ -3280,6 +3290,7 @@ function_arg_64 (const CUMULATIVE_ARGS *cum, machine_mode mode,
     case E_V4DFmode:
     case E_V4DImode:
     case E_V32HFmode:
+    case E_V32BFmode:
     case E_V16SFmode:
     case E_V16SImode:
     case E_V64QImode:
@@ -4748,6 +4759,7 @@ ix86_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
   switch (nat_mode)
     {
     case E_V16HFmode:
+    case E_V16BFmode:
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
@@ -4755,6 +4767,7 @@ ix86_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
     case E_V4DFmode:
     case E_V4DImode:
     case E_V32HFmode:
+    case E_V32BFmode:
     case E_V16SFmode:
     case E_V16SImode:
     case E_V64QImode:
@@ -5430,7 +5443,7 @@ ix86_get_ssemov (rtx *operands, unsigned size,
       switch (type)
 	{
 	case opcode_int:
-	  if (scalar_mode == E_HFmode)
+	  if (scalar_mode == E_HFmode || scalar_mode == E_BFmode)
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
 		      : "vmovdqa64");
@@ -5450,6 +5463,7 @@ ix86_get_ssemov (rtx *operands, unsigned size,
       switch (scalar_mode)
 	{
 	case E_HFmode:
+	case E_BFmode:
 	  if (evex_reg_p)
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 0da3dce1d31..0de5c77bc7d 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1011,7 +1011,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define VALID_AVX256_REG_MODE(MODE)					\
   ((MODE) == V32QImode || (MODE) == V16HImode || (MODE) == V8SImode	\
    || (MODE) == V4DImode || (MODE) == V2TImode || (MODE) == V8SFmode	\
-   || (MODE) == V4DFmode || (MODE) == V16HFmode)
+   || (MODE) == V4DFmode || (MODE) == V16HFmode || (MODE) == V16BFmode)
 
 #define VALID_AVX256_REG_OR_OI_MODE(MODE)		\
   (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
@@ -1026,7 +1026,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define VALID_AVX512F_REG_MODE(MODE)					\
   ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode	\
    || (MODE) == V16SImode || (MODE) == V16SFmode || (MODE) == V32HImode \
-   || (MODE) == V4TImode || (MODE) == V32HFmode)
+   || (MODE) == V4TImode || (MODE) == V32HFmode || (MODE) == V32BFmode)
 
 #define VALID_AVX512F_REG_OR_XI_MODE(MODE)				\
   (VALID_AVX512F_REG_MODE (MODE) || (MODE) == XImode)
@@ -1035,7 +1035,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
   ((MODE) == V2DImode || (MODE) == V2DFmode || (MODE) == V16QImode	\
    || (MODE) == V4SImode || (MODE) == V4SFmode || (MODE) == V8HImode	\
    || (MODE) == TFmode || (MODE) == V1TImode || (MODE) == V8HFmode	\
-   || (MODE) == TImode)
+   || (MODE) == V8BFmode || (MODE) == TImode)
 
 #define VALID_AVX512FP16_REG_MODE(MODE)					\
   ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode	\
@@ -1044,6 +1044,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define VALID_SSE2_REG_MODE(MODE)					\
   ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode	\
    || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode	\
+   || (MODE) == V8BFmode \
    || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode	\
    || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode	\
    || (MODE) == HFmode || (MODE) == BFmode)
@@ -1095,8 +1096,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode	\
    || (MODE) == V2TImode || (MODE) == V8DImode || (MODE) == V64QImode	\
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
-   || (MODE) == V16SFmode || (MODE) == V32HFmode || (MODE) == V16HFmode \
-   || (MODE) == V8HFmode)
+   || (MODE) == V16SFmode \
+   || (MODE) == V32HFmode || (MODE) == V16HFmode || (MODE) == V8HFmode  \
+   || (MODE) == V32BFmode || (MODE) == V16BFmode || (MODE) == V8BFmode)
 
 #define X87_FLOAT_MODE_P(MODE)	\
   (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5f7e2457f5c..58fcc382fa2 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1114,7 +1114,8 @@
 			     (V2DF "16") (V4DF "32") (V8DF "64")
 			     (V4SF "16") (V8SF "32") (V16SF "64")
 			     (V8HF "16") (V16HF "32") (V32HF "64")
-			     (V4HF "8") (V2HF "4")])
+			     (V4HF "8") (V2HF "4")
+			     (V8BF "16") (V16BF "32") (V32BF "64")])
 
 ;; Double word integer modes as mode attribute.
 (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "OI")])
@@ -1258,8 +1259,8 @@
 (define_mode_attr ssemodesuffix
   [(HF "sh") (SF "ss") (DF "sd")
    (V32HF "ph") (V16SF "ps") (V8DF "pd")
-   (V16HF "ph") (V8SF "ps") (V4DF "pd")
-   (V8HF "ph") (V4SF "ps") (V2DF "pd")
+   (V16HF "ph") (V16BF "bf") (V8SF "ps") (V4DF "pd")
+   (V8HF "ph")  (V8BF "bf") (V4SF "ps") (V2DF "pd")
    (V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
    (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
    (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")])
@@ -1269,7 +1270,7 @@
 
 ;; SSE vector mode corresponding to a scalar mode
 (define_mode_attr ssevecmode
-  [(QI "V16QI") (HI "V8HI") (SI "V4SI") (DI "V2DI") (HF "V8HF") (SF "V4SF") (DF "V2DF")])
+  [(QI "V16QI") (HI "V8HI") (SI "V4SI") (DI "V2DI") (HF "V8HF") (BF "V8BF") (SF "V4SF") (DF "V2DF")])
 (define_mode_attr ssevecmodelower
   [(QI "v16qi") (HI "v8hi") (SI "v4si") (DI "v2di") (SF "v4sf") (DF "v2df")])
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b23f07e08c6..9ba47b62a01 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -232,6 +232,7 @@
    (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI
    (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
+   (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") V2DF])
 
@@ -263,10 +264,11 @@
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
    V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
 
-(define_mode_iterator VI12HF_AVX512VL
+(define_mode_iterator VI12HFBF_AVX512VL
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
    V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
-   V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
+   V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
+   V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
 
 ;; Same iterator, but without supposed TARGET_AVX512BW
 (define_mode_iterator VI12_AVX512VLBW
@@ -309,10 +311,10 @@
 
 ;; All 256bit and 512bit vector modes
 (define_mode_iterator V_256_512
-  [V32QI V16HI V16HF V8SI V4DI V8SF V4DF
+  [V32QI V16HI V16HF V16BF V8SI V4DI V8SF V4DF
    (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V32HF "TARGET_AVX512F")
-   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
-   (V8DF "TARGET_AVX512F")])
+   (V32BF "TARGET_AVX512F") (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
+   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
 
 ;; All vector float modes
 (define_mode_iterator VF
@@ -435,6 +437,13 @@
 (define_mode_iterator VF_AVX512FP16
   [V32HF V16HF V8HF])
 
+(define_mode_iterator VF_AVX512HFBF16
+  [(V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16") V32BF V16BF V8BF])
+
+(define_mode_iterator VF_AVX512BWHFBF16
+  [V32HF V16HF V8HF V32BF V16BF V8BF])
+
 (define_mode_iterator VF_AVX512FP16VL
   [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
 
@@ -447,13 +456,14 @@
    (V4DI "TARGET_AVX") V2DI])
 
 ;; All vector integer and HF modes
-(define_mode_iterator VIHF
+(define_mode_iterator VIHFBF
   [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
    (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF])
+   (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
+   (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF])
 
 (define_mode_iterator VI_AVX2
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
@@ -676,6 +686,7 @@
    (V4SI  "avx512vl") (V8SI  "avx512vl") (V16SI "avx512f")
    (V2DI  "avx512vl") (V4DI  "avx512vl") (V8DI "avx512f")
    (V8HF "avx512fp16") (V16HF "avx512vl") (V32HF "avx512bw")
+   (V8BF "avx512vl") (V16BF "avx512vl") (V32BF "avx512bw")
    (V4SF "avx512vl") (V8SF "avx512vl") (V16SF "avx512f")
    (V2DF "avx512vl") (V4DF "avx512vl") (V8DF "avx512f")])
 
@@ -786,7 +797,7 @@
 ;; All 128 and 256bit vector integer modes
 (define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
 ;; All 256bit vector integer and HF modes
-(define_mode_iterator VIHF_256 [V32QI V16HI V8SI V4DI V16HF])
+(define_mode_iterator VIHFBF_256 [V32QI V16HI V8SI V4DI V16HF V16BF])
 
 ;; Various 128bit vector integer mode combinations
 (define_mode_iterator VI12_128 [V16QI V8HI])
@@ -813,12 +824,12 @@
 (define_mode_iterator VI4_256_8_512 [V8SI V8DI])
 (define_mode_iterator VI_AVX512BW
   [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
-(define_mode_iterator VIHF_AVX512BW
+(define_mode_iterator VIHFBF_AVX512BW
   [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
-  (V32HF "TARGET_AVX512BW")])
+  (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")])
 
 ;; Int-float size matches
-(define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF])
+(define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF])
 (define_mode_iterator VI4F_128 [V4SI V4SF])
 (define_mode_iterator VI8F_128 [V2DI V2DF])
 (define_mode_iterator VI4F_256 [V8SI V8SF])
@@ -863,9 +874,9 @@
    (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
    V16SF V8DF])
 
-(define_mode_iterator V8_128 [V8HI V8HF])
-(define_mode_iterator V16_256 [V16HI V16HF])
-(define_mode_iterator V32_512 [V32HI V32HF])
+(define_mode_iterator V8_128 [V8HI V8HF V8BF])
+(define_mode_iterator V16_256 [V16HI V16HF V16BF])
+(define_mode_iterator V32_512 [V32HI V32HF V32BF])
 
 ;; Mapping from float mode to required SSE level
 (define_mode_attr sse
@@ -910,6 +921,7 @@
    (V8SF "V8SF") (V4DF "V4DF")
    (V4SF "V4SF") (V2DF "V2DF")
    (V8HF "TI") (V16HF "OI") (V32HF "XI")
+   (V8BF "TI") (V16BF "OI") (V32BF "XI")
    (TI "TI")])
 
 (define_mode_attr sseintvecinsnmode
@@ -926,16 +938,17 @@
   [(V64QI "BC") (V32HI "BC") (V16SI "BC") (V8DI "BC") (V4TI "BC")
    (V32QI "BC") (V16HI "BC") (V8SI "BC") (V4DI "BC") (V2TI "BC")
    (V16QI "BC") (V8HI "BC") (V4SI "BC") (V2DI "BC") (V1TI "BC")
-   (V32HF "BF") (V16SF "BF") (V8DF "BF")
-   (V16HF "BF") (V8SF "BF") (V4DF "BF")
-   (V8HF "BF") (V4SF "BF") (V2DF "BF")])
+   (V32HF "BF") (V32BF "BF") (V16SF "BF") (V8DF "BF")
+   (V16HF "BF") (V16BF "BF") (V8SF "BF") (V4DF "BF")
+   (V8HF "BF") (V8BF "BF") (V4SF "BF") (V2DF "BF")])
 
 ;; SSE integer instruction suffix for various modes
 (define_mode_attr sseintmodesuffix
   [(V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
    (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
    (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")
-   (V8HF "w") (V16HF "w") (V32HF "w")])
+   (V8HF "w") (V16HF "w") (V32HF "w")
+   (V8BF "w") (V16BF "w") (V32BF "w")])
 
 ;; Mapping of vector modes to corresponding mask size
 (define_mode_attr avx512fmaskmode
@@ -944,6 +957,7 @@
    (V16SI "HI") (V8SI  "QI") (V4SI  "QI")
    (V8DI  "QI") (V4DI  "QI") (V2DI  "QI")
    (V32HF "SI") (V16HF "HI") (V8HF  "QI")
+   (V32BF "SI") (V16BF "HI") (V8BF  "QI")
    (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
@@ -958,6 +972,7 @@
    (V16SI "hi") (V8SI  "qi") (V4SI  "qi")
    (V8DI  "qi") (V4DI  "qi") (V2DI  "qi")
    (V32HF "si") (V16HF "hi") (V8HF  "qi")
+   (V32BF "si") (V16BF "hi") (V8BF  "qi")
    (V16SF "hi") (V8SF  "qi") (V4SF  "qi")
    (V8DF  "qi") (V4DF  "qi") (V2DF  "qi")])
 
@@ -973,9 +988,9 @@
 
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
-  [(V32HF "V32HI") (V16SF "V16SI") (V8DF  "V8DI")
-   (V16HF "V16HI") (V8SF  "V8SI")  (V4DF  "V4DI")
-   (V8HF "V8HI") (V4SF  "V4SI")  (V2DF  "V2DI")
+  [(V32HF "V32HI") (V32BF "V32HI") (V16SF "V16SI") (V8DF  "V8DI")
+   (V16HF "V16HI") (V16BF "V16HI") (V8SF  "V8SI")  (V4DF  "V4DI")
+   (V8HF "V8HI") (V8BF "V8HI") (V4SF "V4SI")  (V2DF  "V2DI")
    (V16SI "V16SI") (V8DI  "V8DI")
    (V8SI  "V8SI")  (V4DI  "V4DI")
    (V4SI  "V4SI")  (V2DI  "V2DI")
@@ -998,9 +1013,9 @@
    (V16HF "OI") (V8HF "TI")])
 
 (define_mode_attr sseintvecmodelower
-  [(V32HF "v32hi") (V16SF "v16si") (V8DF "v8di")
-   (V16HF "v16hi") (V8SF "v8si") (V4DF "v4di")
-   (V8HF "v8hi") (V4SF "v4si") (V2DF "v2di")
+  [(V32HF "v32hi") (V32BF "v32hi") (V16SF "v16si") (V8DF "v8di")
+   (V16HF "v16hi") (V16BF "v16hi") (V8SF "v8si") (V4DF "v4di")
+   (V8HF "v8hi") (V8BF "v8hi") (V4SF "v4si") (V2DF "v2di")
    (V8SI "v8si") (V4DI "v4di")
    (V4SI "v4si") (V2DI "v2di")
    (V16HI "v16hi") (V8HI "v8hi")
@@ -1014,7 +1029,8 @@
    (V16SF "V32SF") (V8DF "V16DF")
    (V8SF "V16SF") (V4DF "V8DF")
    (V4SF "V8SF") (V2DF "V4DF")
-   (V32HF "V64HF") (V16HF "V32HF") (V8HF "V16HF")])
+   (V32HF "V64HF") (V16HF "V32HF") (V8HF "V16HF")
+   (V32BF "V64BF") (V16BF "V32BF") (V8BF "V16BF")])
 
 ;; Mapping of vector modes to a vector mode of half size
 ;; instead of V1DI/V1DF, DI/DF are used for V2DI/V2DF although they are scalar.
@@ -1025,7 +1041,8 @@
    (V16SF "V8SF") (V8DF "V4DF")
    (V8SF  "V4SF") (V4DF "V2DF")
    (V4SF  "V2SF") (V2DF "DF")
-   (V32HF "V16HF") (V16HF "V8HF") (V8HF "V4HF")])
+   (V32HF "V16HF") (V16HF "V8HF") (V8HF "V4HF")
+   (V32BF "V16BF") (V16BF "V8BF") (V8BF "V4BF")])
 
 (define_mode_attr ssehalfvecmodelower
   [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti")
@@ -1034,7 +1051,8 @@
    (V16SF "v8sf") (V8DF "v4df")
    (V8SF  "v4sf") (V4DF "v2df")
    (V4SF  "v2sf")
-   (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")])
+   (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")
+   (V32BF "v16bf") (V16BF "v8bf") (V8BF "v4bf")])
 
 ;; Mapping of vector modes to vector hf modes of conversion.
 (define_mode_attr ssePHmode
@@ -1085,6 +1103,7 @@
    (V16SI "SI") (V8SI "SI")  (V4SI "SI")
    (V8DI "DI")  (V4DI "DI")  (V2DI "DI")
    (V32HF "HF") (V16HF "HF") (V8HF "HF")
+   (V32BF "BF") (V16BF "BF") (V8BF "BF")
    (V16SF "SF") (V8SF "SF")  (V4SF "SF")
    (V8DF "DF")  (V4DF "DF")  (V2DF "DF")
    (V4TI "TI")  (V2TI "TI")])
@@ -1096,6 +1115,7 @@
    (V16SI "si") (V8SI "si")  (V4SI "si")
    (V8DI "di")  (V4DI "di")  (V2DI "di")
    (V32HF "hf") (V16HF "hf")  (V8HF "hf")
+   (V32BF "bf") (V16BF "bf")  (V8BF "bf")
    (V16SF "sf") (V8SF "sf")  (V4SF "sf")
    (V8DF "df")  (V4DF "df")  (V2DF "df")
    (V4TI "ti")  (V2TI "ti")])
@@ -1107,6 +1127,7 @@
    (V16SI "V4SI")  (V8SI "V4SI")  (V4SI "V4SI")
    (V8DI "V2DI")   (V4DI "V2DI")  (V2DI "V2DI")
    (V32HF "V8HF")  (V16HF "V8HF") (V8HF "V8HF")
+   (V32BF "V8BF")  (V16BF "V8BF") (V8BF "V8BF")
    (V16SF "V4SF")  (V8SF "V4SF")  (V4SF "V4SF")
    (V8DF "V2DF")   (V4DF "V2DF")  (V2DF "V2DF")])
 
@@ -1128,6 +1149,7 @@
    (V16SF "d") (V8SF "d") (V4SF "d")
    (V32HI "d") (V16HI "d") (V8HI "d")
    (V32HF "d") (V16HF "d") (V8HF "d")
+   (V32BF "d") (V16BF "d") (V8BF "d")
    (V64QI "d") (V32QI "d") (V16QI "d")])
 
 ;; Number of scalar elements in each vector type
@@ -1153,6 +1175,7 @@
    (V32HI "16") (V16HI "16") (V8HI "16")
    (V16SI "32") (V8SI "32") (V4SI "32")
    (V32HF "16") (V16HF "16") (V8HF "16")
+   (V32BF "16") (V16BF "16") (V8BF "16")
    (V16SF "32") (V8SF "32") (V4SF "32")
    (V8DF "64") (V4DF "64") (V2DF "64")])
 
@@ -1164,9 +1187,9 @@
    (V4SI  "p") (V4SF  "")
    (V8SI  "p") (V8SF  "")
    (V16SI "p") (V16SF "")
-   (V16QI "p") (V8HI "p") (V8HF "p")
-   (V32QI "p") (V16HI "p") (V16HF "p")
-   (V64QI "p") (V32HI "p") (V32HF "p")])
+   (V16QI "p") (V8HI "p") (V8HF "p") (V8BF "p")
+   (V32QI "p") (V16HI "p") (V16HF "p") (V16BF "p")
+   (V64QI "p") (V32HI "p") (V32HF "p") (V32BF "p")])
 
 ;; SSE prefix for integer and HF vector comparison.
 (define_mode_attr ssecmpintprefix
@@ -1219,7 +1242,8 @@
 ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise.
 ;; i64x4 or f64x4 for 512bit modes.
 (define_mode_attr i128
-  [(V16HF "%~128") (V32HF "i64x4") (V16SF "f64x4") (V8SF "f128")
+  [(V16HF "%~128") (V32HF "i64x4") (V16BF "%~128") (V32BF "i64x4")
+   (V16SF "f64x4") (V8SF "f128")
    (V8DF "f64x4") (V4DF "f128")
    (V64QI "i64x4") (V32QI "%~128") (V32HI "i64x4") (V16HI "%~128")
    (V16SI "i64x4") (V8SI "%~128") (V8DI "i64x4") (V4DI "%~128")])
@@ -1245,17 +1269,18 @@
    (V16SI "d")  (V8SI "d")  (V4SI "d")
    (V8DI "q")   (V4DI "q")  (V2DI "q")
    (V32HF "w")  (V16HF "w") (V8HF "w")
+   (V32BF "w")  (V16BF "w") (V8BF "w")
    (V16SF "ss") (V8SF "ss") (V4SF "ss")
    (V8DF "sd")  (V4DF "sd") (V2DF "sd")])
 
 ;; Tie mode of assembler operand to mode iterator
 (define_mode_attr xtg_mode
   [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x")
-   (V8HF "x") (V4SF "x") (V2DF "x")
+   (V8HF "x")  (V8BF "x") (V4SF "x") (V2DF "x")
    (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t")
-   (V16HF "t") (V8SF "t") (V4DF "t")
+   (V16HF "t") (V16BF "t") (V8SF "t") (V4DF "t")
    (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g")
-   (V32HF "g") (V16SF "g") (V8DF "g")])
+   (V32HF "g") (V32BF "g") (V16SF "g") (V8DF "g")])
 
 ;; Half mask mode for unpacks
 (define_mode_attr HALFMASKMODE
@@ -1553,10 +1578,10 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<avx512>_blendm<mode>"
-  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v,v")
-	(vec_merge:VF_AVX512FP16
-	  (match_operand:VF_AVX512FP16 2 "nonimmediate_operand" "vm,vm")
-	  (match_operand:VF_AVX512FP16 1 "nonimm_or_0_operand" "0C,v")
+  [(set (match_operand:VF_AVX512BWHFBF16 0 "register_operand" "=v,v")
+	(vec_merge:VF_AVX512BWHFBF16
+	  (match_operand:VF_AVX512BWHFBF16 2 "nonimmediate_operand" "vm,vm")
+	  (match_operand:VF_AVX512BWHFBF16 1 "nonimm_or_0_operand" "0C,v")
 	  (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk,Yk")))]
   "TARGET_AVX512BW"
   "@
@@ -1595,9 +1620,9 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<avx512>_store<mode>_mask"
-  [(set (match_operand:VI12HF_AVX512VL 0 "memory_operand" "=m")
-	(vec_merge:VI12HF_AVX512VL
-	  (match_operand:VI12HF_AVX512VL 1 "register_operand" "v")
+  [(set (match_operand:VI12HFBF_AVX512VL 0 "memory_operand" "=m")
+	(vec_merge:VI12HFBF_AVX512VL
+	  (match_operand:VI12HFBF_AVX512VL 1 "register_operand" "v")
 	  (match_dup 0)
 	  (match_operand:<avx512fmaskmode> 2 "register_operand" "Yk")))]
   "TARGET_AVX512BW"
@@ -4513,14 +4538,18 @@
   DONE;
 })
 
+(define_mode_iterator VF_AVX512HFBFVL
+  [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
+   V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
+
 (define_expand "vcond<mode><sseintvecmodelower>"
-  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
-	(if_then_else:VF_AVX512FP16VL
+  [(set (match_operand:VF_AVX512HFBFVL 0 "register_operand")
+	(if_then_else:VF_AVX512HFBFVL
 	  (match_operator 3 ""
 	    [(match_operand:<sseintvecmode> 4 "vector_operand")
 	     (match_operand:<sseintvecmode> 5 "vector_operand")])
-	  (match_operand:VF_AVX512FP16VL 1 "general_operand")
-	  (match_operand:VF_AVX512FP16VL 2 "general_operand")))]
+	  (match_operand:VF_AVX512HFBFVL 1 "general_operand")
+	  (match_operand:VF_AVX512HFBFVL 2 "general_operand")))]
   "TARGET_AVX512FP16"
 {
   bool ok = ix86_expand_int_vcond (operands);
@@ -4552,10 +4581,10 @@
   "TARGET_AVX512F")
 
 (define_expand "vcond_mask_<mode><avx512fmaskmodelower>"
-  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand")
-	(vec_merge:VI12HF_AVX512VL
-	  (match_operand:VI12HF_AVX512VL 1 "nonimmediate_operand")
-	  (match_operand:VI12HF_AVX512VL 2 "nonimm_or_0_operand")
+  [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand")
+	(vec_merge:VI12HFBF_AVX512VL
+	  (match_operand:VI12HFBF_AVX512VL 1 "nonimmediate_operand")
+	  (match_operand:VI12HFBF_AVX512VL 2 "nonimm_or_0_operand")
 	  (match_operand:<avx512fmaskmode> 3 "register_operand")))]
   "TARGET_AVX512BW")
 
@@ -10747,7 +10776,7 @@
 		   (const_string "HF")
 		   (const_string "TI")))
    (set (attr "enabled")
-     (cond [(and (not (match_test "<MODE>mode == V8HFmode"))
+     (cond [(and (not (match_test "<MODE>mode == V8HFmode || <MODE>mode == V8BFmode"))
 		 (eq_attr "alternative" "2"))
 	      (symbol_ref "false")
 	   ]
@@ -10809,11 +10838,13 @@
   DONE;
 })
 
-(define_insn "avx512fp16_movsh"
-  [(set (match_operand:V8HF 0 "register_operand" "=v")
-	(vec_merge:V8HF
-          (match_operand:V8HF 2 "register_operand" "v")
-	  (match_operand:V8HF 1 "register_operand" "v")
+(define_mode_iterator V8BFH_128 [V8HF V8BF])
+
+(define_insn "avx512fp16_mov<mode>"
+  [(set (match_operand:V8BFH_128 0 "register_operand" "=v")
+	(vec_merge:V8BFH_128
+	  (match_operand:V8BFH_128 2 "register_operand" "v")
+	  (match_operand:V8BFH_128 1 "register_operand" "v")
 	  (const_int 1)))]
   "TARGET_AVX512FP16"
   "vmovsh\t{%2, %1, %0|%0, %1, %2}"
@@ -10996,9 +11027,9 @@
   DONE;
 })
 
-(define_expand "vec_setv8hf"
-  [(match_operand:V8HF 0 "register_operand")
-   (match_operand:HF 1 "register_operand")
+(define_expand "vec_set<mode>"
+  [(match_operand:V8BFH_128 0 "register_operand")
+   (match_operand:<ssescalarmode> 1 "register_operand")
    (match_operand 2 "vec_setm_sse41_operand")]
   "TARGET_SSE"
 {
@@ -11726,7 +11757,7 @@
    (set_attr "length_immediate" "1")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn_and_split "vec_extract_lo_<mode>"
+(define_insn_and_split "@vec_extract_lo_<mode>"
   [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,v,m")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V32_512 1 "nonimmediate_operand" "v,m,v")
@@ -11768,7 +11799,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "vec_extract_hi_<mode>"
+(define_insn "@vec_extract_hi_<mode>"
   [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V32_512 1 "register_operand" "v")
@@ -11788,7 +11819,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn_and_split "vec_extract_lo_<mode>"
+(define_insn_and_split "@vec_extract_lo_<mode>"
   [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,m")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V16_256 1 "nonimmediate_operand" "vm,v")
@@ -11802,7 +11833,7 @@
   [(set (match_dup 0) (match_dup 1))]
   "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);")
 
-(define_insn "vec_extract_hi_<mode>"
+(define_insn "@vec_extract_hi_<mode>"
   [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=xm,vm,vm")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V16_256 1 "register_operand" "x,v,v")
@@ -11944,20 +11975,20 @@
 ;; NB: *vec_extract<mode>_0 must be placed before *vec_extracthf.
 ;; Otherwise, it will be ignored.
 (define_insn_and_split "*vec_extract<mode>_0"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "=v,m,r")
-	(vec_select:HF
-	  (match_operand:VF_AVX512FP16 1 "nonimmediate_operand" "vm,v,m")
+  [(set (match_operand:<ssescalarmode> 0 "nonimmediate_operand" "=v,m,r")
+	(vec_select:<ssescalarmode>
+	  (match_operand:VF_AVX512HFBF16 1 "nonimmediate_operand" "vm,v,m")
 	  (parallel [(const_int 0)])))]
-  "TARGET_AVX512FP16 && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (HFmode, operands[1]);")
+  "operands[1] = gen_lowpart (<ssescalarmode>mode, operands[1]);")
 
-(define_insn "*vec_extracthf"
-  [(set (match_operand:HF 0 "register_sse4nonimm_operand" "=?r,m,x,v")
-	(vec_select:HF
-	  (match_operand:V8HF 1 "register_operand" "v,v,0,v")
+(define_insn "*vec_extract<mode>"
+  [(set (match_operand:HFBF 0 "register_sse4nonimm_operand" "=?r,m,x,v")
+	(vec_select:HFBF
+	  (match_operand:<ssevecmode> 1 "register_operand" "v,v,0,v")
 	  (parallel
 	    [(match_operand:SI 2 "const_0_to_7_operand")])))]
   "TARGET_SSE2"
@@ -11992,6 +12023,7 @@
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
    (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
+   (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -18097,17 +18129,17 @@
 
 ;; Modes handled by pinsr patterns.
 (define_mode_iterator PINSR_MODE
-  [(V16QI "TARGET_SSE4_1") V8HI V8HF
+  [(V16QI "TARGET_SSE4_1") V8HI V8HF V8BF
    (V4SI "TARGET_SSE4_1")
    (V2DI "TARGET_SSE4_1 && TARGET_64BIT")])
 
 (define_mode_attr sse2p4_1
   [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse2")
-   (V4SI "sse4_1") (V2DI "sse4_1")])
+   (V8BF "sse2") (V4SI "sse4_1") (V2DI "sse4_1")])
 
 (define_mode_attr pinsr_evex_isa
   [(V16QI "avx512bw") (V8HI "avx512bw") (V8HF "avx512bw")
-   (V4SI "avx512dq") (V2DI "avx512dq")])
+   (V8BF "avx512bw") (V4SI "avx512dq") (V2DI "avx512dq")])
 
 ;; sse4_1_pinsrd must come before sse2_loadld since it is preferred.
 (define_insn "<sse2p4_1>_pinsr<ssemodesuffix>"
@@ -25193,11 +25225,12 @@
    (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
    (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f")
    (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")
-   (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")])
+   (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")
+   (V32BF "avx512bw") (V16BF "avx512bw") (V8BF "avx512bw")])
 
 (define_insn "avx2_pbroadcast<mode>"
-  [(set (match_operand:VIHF 0 "register_operand" "=x,v")
-	(vec_duplicate:VIHF
+  [(set (match_operand:VIHFBF 0 "register_operand" "=x,v")
+	(vec_duplicate:VIHFBF
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "xm,vm")
 	    (parallel [(const_int 0)]))))]
@@ -25210,10 +25243,10 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "avx2_pbroadcast<mode>_1"
-  [(set (match_operand:VIHF_256 0 "register_operand" "=x,x,v,v")
-	(vec_duplicate:VIHF_256
+  [(set (match_operand:VIHFBF_256 0 "register_operand" "=x,x,v,v")
+	(vec_duplicate:VIHFBF_256
 	  (vec_select:<ssescalarmode>
-	    (match_operand:VIHF_256 1 "nonimmediate_operand" "m,x,m,v")
+	    (match_operand:VIHFBF_256 1 "nonimmediate_operand" "m,x,m,v")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX2"
   "@
@@ -25589,10 +25622,10 @@
    (set_attr "mode" "V4DF")])
 
 (define_insn "<avx512>_vec_dup<mode>_1"
-  [(set (match_operand:VIHF_AVX512BW 0 "register_operand" "=v,v")
-	(vec_duplicate:VIHF_AVX512BW
+  [(set (match_operand:VIHFBF_AVX512BW 0 "register_operand" "=v,v")
+	(vec_duplicate:VIHFBF_AVX512BW
 	  (vec_select:<ssescalarmode>
-	    (match_operand:VIHF_AVX512BW 1 "nonimmediate_operand" "v,m")
+	    (match_operand:VIHFBF_AVX512BW 1 "nonimmediate_operand" "v,m")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
   "@
@@ -25622,8 +25655,8 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<avx512>_vec_dup<mode><mask_name>"
-  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v")
-	(vec_duplicate:VI12HF_AVX512VL
+  [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand" "=v")
+	(vec_duplicate:VI12HFBF_AVX512VL
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
 	    (parallel [(const_int 0)]))))]
@@ -25658,8 +25691,8 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>"
-  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v,v")
-	(vec_duplicate:VI12HF_AVX512VL
+  [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand" "=v,v")
+	(vec_duplicate:VI12HFBF_AVX512VL
 	  (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm,r")))]
   "TARGET_AVX512BW"
   "@
@@ -25759,7 +25792,7 @@
   [(V8SF "ss") (V4DF "sd") (V8SI "ss") (V4DI "sd")])
 ;; Modes handled by AVX2 vec_dup patterns.
 (define_mode_iterator AVX2_VEC_DUP_MODE
-  [V32QI V16QI V16HI V8HI V8SI V4SI V16HF V8HF])
+  [V32QI V16QI V16HI V8HI V8SI V4SI V16HF V8HF V16BF V8BF])
 
 (define_insn "*vec_dup<mode>"
   [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand" "=x,x,v")
@@ -26522,6 +26555,7 @@
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
    (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
+   (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -26534,6 +26568,7 @@
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
    (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
+   (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
    (V4TI "TARGET_AVX512F")])
diff --git a/gcc/testsuite/g++.target/i386/vect-bfloat16-1.C b/gcc/testsuite/g++.target/i386/vect-bfloat16-1.C
new file mode 100644
index 00000000000..71b4d86d36e
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/vect-bfloat16-1.C
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpblendmw" 1 } }  */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef __bf16 v8bf __attribute__((vector_size(16)));
+
+v8bf
+foo (v8hi a, v8hi b, v8bf c, v8bf d)
+{
+      return a > b ? c : d;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c
new file mode 100644
index 00000000000..dd33f1add9c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+
+/* { dg-final { scan-assembler-times "vpbroadcastw" 1 { target { ! ia32 } } } }  */
+/* { dg-final { scan-assembler-times "vpblendw" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovsh" 1 { target { ! ia32 } } } }  */
+
+/* { dg-final { scan-assembler-times "vpinsrw" 2 { target ia32 } } }  */
+#include <immintrin.h>
+
+typedef __bf16 __v8bf __attribute__ ((__vector_size__ (16)));
+typedef __bf16 __m128bf16 __attribute__ ((__vector_size__ (16), __may_alias__));
+
+__m128bf16
+__attribute__ ((noinline, noclone))
+foo1 (__m128bf16 a, __bf16 f)
+{
+  __v8bf x = (__v8bf) a;
+  x[2] = f;
+  return (__m128bf16) x;
+}
+
+__m128bf16
+__attribute__ ((noinline, noclone))
+foo2 (__m128bf16 a, __bf16 f)
+{
+  __v8bf x = (__v8bf) a;
+  x[0] = f;
+  return (__m128bf16) x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c
new file mode 100644
index 00000000000..70152d03f92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c
@@ -0,0 +1,121 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+
+typedef __bf16 v8bf __attribute__ ((__vector_size__ (16)));
+typedef __bf16 v16bf __attribute__ ((__vector_size__ (32)));
+typedef __bf16 v32bf __attribute__ ((__vector_size__ (64)));
+
+#define VEC_EXTRACT(V,S,IDX)			\
+  S						\
+  __attribute__((noipa))			\
+  vec_extract_##V##_##IDX (V v)			\
+  {						\
+    return v[IDX];				\
+  }
+
+#define VEC_SET(V,S,IDX)			\
+  V						\
+  __attribute__((noipa))			\
+  vec_set_##V##_##IDX (V v, S s)		\
+  {						\
+    v[IDX] = s;				\
+    return v;					\
+  }
+
+v8bf
+vec_init_v8bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
+	       __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8)
+{
+    return __extension__ (v8bf) {a1, a2, a3, a4, a5, a6, a7, a8};
+}
+
+v16bf
+vec_init_v16bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
+	       __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8,
+	       __bf16 a9,  __bf16 a10, __bf16 a11, __bf16 a12,
+	       __bf16 a13,  __bf16 a14, __bf16 a15, __bf16 a16)
+{
+    return __extension__ (v16bf) {a1, a2, a3, a4, a5, a6, a7, a8,
+				  a9, a10, a11, a12, a13, a14, a15, a16};
+}
+
+v32bf
+vec_init_v32bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
+		__bf16 a5, __bf16 a6, __bf16 a7, __bf16 a8,
+		__bf16 a9, __bf16 a10, __bf16 a11, __bf16 a12,
+		__bf16 a13, __bf16 a14, __bf16 a15, __bf16 a16,
+		__bf16 a17, __bf16 a18, __bf16 a19, __bf16 a20,
+		__bf16 a21, __bf16 a22, __bf16 a23, __bf16 a24,
+		__bf16 a25, __bf16 a26, __bf16 a27, __bf16 a28,
+		__bf16 a29, __bf16 a30, __bf16 a31, __bf16 a32)
+{
+    return __extension__ (v32bf) {a1, a2, a3, a4, a5, a6, a7, a8,
+				  a9, a10, a11, a12, a13, a14, a15, a16,
+				  a17, a18, a19, a20, a21, a22, a23, a24,
+				  a25, a26, a27, a28, a29, a30, a31, a32};
+}
+
+v8bf
+vec_init_dup_v8bf (__bf16 a1)
+{
+    return __extension__ (v8bf) {a1, a1, a1, a1, a1, a1, a1, a1};
+}
+
+v16bf
+vec_init_dup_v16bf (__bf16 a1)
+{
+    return __extension__ (v16bf) {a1, a1, a1, a1, a1, a1, a1, a1,
+				  a1, a1, a1, a1, a1, a1, a1, a1};
+}
+
+v32bf
+vec_init_dup_v32bf (__bf16 a1)
+{
+    return __extension__ (v32bf) {a1, a1, a1, a1, a1, a1, a1, a1,
+				  a1, a1, a1, a1, a1, a1, a1, a1,
+				  a1, a1, a1, a1, a1, a1, a1, a1,
+				  a1, a1, a1, a1, a1, a1, a1, a1};
+}
+
+/* { dg-final { scan-assembler-times "vpunpcklwd" 28 } } */
+/* { dg-final { scan-assembler-times "vpunpckldq" 14 } } */
+/* { dg-final { scan-assembler-times "vpunpcklqdq" 7 } } */
+
+VEC_EXTRACT (v8bf, __bf16, 0);
+VEC_EXTRACT (v8bf, __bf16, 4);
+VEC_EXTRACT (v16bf, __bf16, 0);
+VEC_EXTRACT (v16bf, __bf16, 3);
+VEC_EXTRACT (v16bf, __bf16, 8);
+VEC_EXTRACT (v16bf, __bf16, 15);
+VEC_EXTRACT (v32bf, __bf16, 0);
+VEC_EXTRACT (v32bf, __bf16, 5);
+VEC_EXTRACT (v32bf, __bf16, 8);
+VEC_EXTRACT (v32bf, __bf16, 14);
+VEC_EXTRACT (v32bf, __bf16, 16);
+VEC_EXTRACT (v32bf, __bf16, 24);
+VEC_EXTRACT (v32bf, __bf16, 28);
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$8" 2 } } */
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$6" 1 } } */
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$14" 1 } } */
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$10" 1 } } */
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$12" 1 } } */
+/* { dg-final { scan-assembler-times "vextract" 9 } } */
+
+VEC_SET (v8bf, __bf16, 4);
+VEC_SET (v16bf, __bf16, 3);
+VEC_SET (v16bf, __bf16, 8);
+VEC_SET (v16bf, __bf16, 15);
+VEC_SET (v32bf, __bf16, 5);
+VEC_SET (v32bf, __bf16, 8);
+VEC_SET (v32bf, __bf16, 14);
+VEC_SET (v32bf, __bf16, 16);
+VEC_SET (v32bf, __bf16, 24);
+VEC_SET (v32bf, __bf16, 28);
+/* { dg-final { scan-assembler-times "vpbroadcastw" 13 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpblendw" 4 { target { ! ia32 } } } } */
+
+/* { dg-final { scan-assembler-times "vpbroadcastw" 12 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpblendw" 3 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpinsrw" 1 { target ia32 } } } */
+
+/* { dg-final { scan-assembler-times "vpblendd" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
new file mode 100644
index 00000000000..5b846e68c99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+
+#include "vect-bfloat16-2a.c"
+
+/* { dg-final { scan-assembler-times "vpunpcklwd" 28 } } */
+/* { dg-final { scan-assembler-times "vpunpckldq" 14 } } */
+/* { dg-final { scan-assembler-times "vpunpcklqdq" 7 } } */
+
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$8" 1 } } */
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$6" 1 } } */
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$14" 1 } } */
+/* { dg-final { scan-assembler-times "vextract" 2 } } */
+
+/* { dg-final { scan-assembler-times "vpbroadcastw" 7 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpblendw" 4 { target { ! ia32 } } } } */
+
+/* { dg-final { scan-assembler-times "vpbroadcastw" 6 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpblendw" 3 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpinsrw" 63 { target ia32 } } } */
+
+/* { dg-final { scan-assembler-times "vpblendd" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c
new file mode 100644
index 00000000000..3804bac7220
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c
@@ -0,0 +1,258 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+
+#include <immintrin.h>
+
+typedef __bf16 __v8bf __attribute__ ((__vector_size__ (16)));
+typedef __bf16 __m128bf16 __attribute__ ((__vector_size__ (16), __may_alias__));
+
+__bf16 glob_bfloat;
+__m128bf16 glob_bfloat_vec;
+
+__m256 is_a_float_vec;
+__m128 is_a_float_pair;
+
+__m128h *float_ptr;
+__m128h is_a_float16_vec;
+
+__v8si is_an_int_vec;
+__v4si is_an_int_pair;
+__v8hi is_a_short_vec;
+
+int is_an_int;
+short is_a_short_int;
+float is_a_float;
+float is_a_float16;
+double is_a_double;
+
+__m128bf16 footest (__m128bf16 vector0)
+{
+  /* Initialisation  */
+
+  __m128bf16 vector1_1;
+  __m128bf16 vector1_2 = glob_bfloat_vec;
+  __m128bf16 vector1_3 = is_a_float_vec; /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__m256'} }*/
+  __m128bf16 vector1_4 = is_an_int_vec;  /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__v8si'} } */
+  __m128bf16 vector1_5 = is_a_float16_vec; /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__m128h'} } */
+  __m128bf16 vector1_6 = is_a_float_pair; /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__m128'} } */
+  __m128bf16 vector1_7 = is_an_int_pair; /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__v4si'} } */
+  __m128bf16 vector1_8 = is_a_short_vec; /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__v8hi'} } */
+
+  __v8si initi_1_1 = glob_bfloat_vec;   /* { dg-error {incompatible types when initializing type '__v8si' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  __m256 initi_1_2 = glob_bfloat_vec; /* { dg-error {incompatible types when initializing type '__m256' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  __m128h initi_1_3 = glob_bfloat_vec; /* { dg-error {incompatible types when initializing type '__m128h' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  __m128 initi_1_4 = glob_bfloat_vec; /* { dg-error {incompatible types when initializing type '__m128' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  __v4si initi_1_5 = glob_bfloat_vec;  /* { dg-error {incompatible types when initializing type '__v4si' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  __v4hi initi_1_6 = glob_bfloat_vec;  /* { dg-error {incompatible types when initializing type '__v4hi' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+
+  __m128bf16 vector2_1 = {};
+  __m128bf16 vector2_2 = { glob_bfloat };
+  __m128bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
+  __m128bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m128bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m128bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m128bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m128bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m128bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
+
+  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m128h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m128 initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __v4si initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+  __v4hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+
+  /* Assignments to/from vectors.  */
+
+  glob_bfloat_vec = glob_bfloat_vec;
+  glob_bfloat_vec = 0;   /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type 'int'} } */
+  glob_bfloat_vec = 0.1; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type 'double'} } */
+  glob_bfloat_vec = is_a_float_vec; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__m256'} } */
+  glob_bfloat_vec = is_an_int_vec; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__v8si'} } */
+  glob_bfloat_vec = is_a_float16_vec; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__m128h'} } */
+  glob_bfloat_vec = is_a_float_pair; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__m128'} } */
+  glob_bfloat_vec = is_an_int_pair; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__v4si'} } */
+  glob_bfloat_vec = is_a_short_vec; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__v8hi'} } */
+
+  is_an_int_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__v8si' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  is_a_float_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m256' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  is_a_float16_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m128h' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  is_a_float_pair = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m128' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  is_an_int_pair = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__v4si' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  is_a_short_vec = glob_bfloat_vec;/* { dg-error {incompatible types when assigning to type '__v8hi' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+
+  /* Assignments to/from elements.  */
+
+  vector2_3[0] = glob_bfloat;
+  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
+
+  glob_bfloat = vector2_3[0];
+  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+
+  /* Compound literals.  */
+
+  (__m128bf16) {};
+
+  (__m128bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (__m128bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (__m128bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
+  (__m128bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
+  (__m128bf16) { is_a_float_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128'} } */
+  (__m128bf16) { is_an_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v4si'} } */
+  (__m128bf16) { is_a_float16_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128h'} } */
+  (__m128bf16) { is_a_short_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8hi'} } */
+
+  (__m128bf16) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  (__v8si) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'int' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  (__m256) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'float' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  (__v4si) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'int' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  (__m256h) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '_Float16' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+  (__v8hi) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'short int' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
+
+  /* Casting.  */
+
+  (void) glob_bfloat_vec;
+  (__m128bf16) glob_bfloat_vec;
+
+  (__bf16) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
+  (short) glob_bfloat_vec; /* { dg-error {cannot convert a vector of type '__m128bf16' {aka '__vector\(8\) __bf16'} to type 'short int' which has different size} } */
+  (int) glob_bfloat_vec; /* { dg-error {cannot convert a vector of type '__m128bf16' {aka '__vector\(8\) __bf16'} to type 'int' which has different size} } */
+  (_Float16) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
+  (float) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
+  (double) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
+
+  (__v8si) glob_bfloat_vec; /* { dg-error {cannot convert a value of type '__m128bf16' {aka '__vector\(8\) __bf16'} to vector type '__vector\(8\) int' which has different size} } */
+  (__m256) glob_bfloat_vec; /* { dg-error {cannot convert a value of type '__m128bf16' {aka '__vector\(8\) __bf16'} to vector type '__vector\(8\) float' which has different size} } */
+  (__m128h) glob_bfloat_vec;
+  (__v4si) glob_bfloat_vec;
+  (__m128) glob_bfloat_vec;
+  (__v8hi) glob_bfloat_vec;
+
+  (__m128bf16) is_an_int_vec; /* { dg-error {cannot convert a value of type '__v8si' to vector type '__vector\(8\) __bf16' which has different size} } */
+  (__m128bf16) is_a_float_vec; /* { dg-error {cannot convert a value of type '__m256' to vector type '__vector\(8\) __bf16' which has different size} } */
+  (__m128bf16) is_a_float16_vec;
+  (__m128bf16) is_an_int_pair;
+  (__m128bf16) is_a_float_pair;
+  (__m128bf16) is_a_short_vec;
+  (__m128bf16) is_a_double; /* { dg-error {cannot convert value to a vector} } */
+
+  /* Arrays and Structs.  */
+
+  typedef __m128bf16 array_type[2];
+  extern __m128bf16 extern_array[];
+
+  __m128bf16 array[2];
+  __m128bf16 zero_length_array[0];
+  __m128bf16 empty_init_array[] = {};
+  typedef __m128bf16 some_other_type[is_an_int];
+
+  struct struct1 {
+    __m128bf16 a;
+  };
+
+  union union1 {
+    __m128bf16 a;
+  };
+
+  /* Addressing and dereferencing.  */
+
+  __m128bf16 *bfloat_ptr = &vector0;
+  vector0 = *bfloat_ptr;
+
+  /* Pointer assignment.  */
+
+  __m128bf16 *bfloat_ptr2 = bfloat_ptr;
+  __m128bf16 *bfloat_ptr3 = array;
+
+  /* Pointer arithmetic.  */
+
+  ++bfloat_ptr;
+  --bfloat_ptr;
+  bfloat_ptr++;
+  bfloat_ptr--;
+  bfloat_ptr += 1;
+  bfloat_ptr -= 1;
+  bfloat_ptr - bfloat_ptr2;
+  bfloat_ptr = &bfloat_ptr3[0];
+  bfloat_ptr = &bfloat_ptr3[1];
+
+  /* Simple comparison.  */
+  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
+  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+
+  /* Pointer comparison.  */
+
+  bfloat_ptr == &vector0;
+  bfloat_ptr != &vector0;
+  bfloat_ptr < &vector0;
+  bfloat_ptr <= &vector0;
+  bfloat_ptr > &vector0;
+  bfloat_ptr >= &vector0;
+  bfloat_ptr == bfloat_ptr2;
+  bfloat_ptr != bfloat_ptr2;
+  bfloat_ptr < bfloat_ptr2;
+  bfloat_ptr <= bfloat_ptr2;
+  bfloat_ptr > bfloat_ptr2;
+  bfloat_ptr >= bfloat_ptr2;
+
+  /* Conditional expressions.  */
+
+  0 ? vector0 : vector0;
+  0 ? vector0 : is_a_float_vec; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? is_a_float_vec : vector0; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? vector0 : is_a_float16_vec; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? is_a_float16_vec : vector0; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? vector0 : 0; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? 0 : vector0; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? 0.1 : vector0; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? vector0 : 0.1; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? bfloat_ptr : bfloat_ptr2;
+  0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
+  0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
+
+  vector0 ? vector0 : vector0; /* { dg-error {used vector type where scalar is required} } */
+  vector0 ? is_a_float16_vec : vector0; /* { dg-error {used vector type where scalar is required} } */
+  vector0 ? vector0 : is_a_float16_vec; /* { dg-error {used vector type where scalar is required} } */
+  vector0 ? is_a_float16_vec : is_a_float16_vec; /* { dg-error {used vector type where scalar is required} } */
+
+  /* Unary operators.  */
+
+  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
+  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+
+  /* Binary arithmetic operations.  */
+
+  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+
+  return vector0;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c
new file mode 100644
index 00000000000..f63b41d832b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c
@@ -0,0 +1,248 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+
+#include <immintrin.h>
+
+typedef __bf16 __v16bf __attribute__ ((__vector_size__ (32)));
+typedef __bf16 __m256bf16 __attribute__ ((__vector_size__ (32), __may_alias__));
+
+__bf16 glob_bfloat;
+__m256bf16 glob_bfloat_vec;
+
+__m256 is_a_float_vec;
+
+__m256h *float_ptr;
+__m256h is_a_float16_vec;
+
+__v8si is_an_int_vec;
+__m256i is_a_long_int_pair;
+__v16hi is_a_short_vec;
+
+int is_an_int;
+short is_a_short_int;
+float is_a_float;
+float is_a_float16;
+double is_a_double;
+
+__m256bf16 footest (__m256bf16 vector0)
+{
+  /* Initialisation  */
+
+  __m256bf16 vector1_1;
+  __m256bf16 vector1_2 = glob_bfloat_vec;
+  __m256bf16 vector1_3 = is_a_float_vec; /* { dg-error {incompatible types when initializing type '__m256bf16' {aka '__vector\(16\) __bf16'} using type '__m256'} } */
+  __m256bf16 vector1_4 = is_an_int_vec;  /* { dg-error {incompatible types when initializing type '__m256bf16' {aka '__vector\(16\) __bf16'} using type '__v8si'} } */
+  __m256bf16 vector1_5 = is_a_float16_vec; /* { dg-error {incompatible types when initializing type '__m256bf16' {aka '__vector\(16\) __bf16'} using type '__m256h'} } */
+  __m256bf16 vector1_7 = is_a_long_int_pair; /* { dg-error {incompatible types when initializing type '__m256bf16' {aka '__vector\(16\) __bf16'} using type '__m256i'} } */
+  __m256bf16 vector1_8 = is_a_short_vec; /* { dg-error {incompatible types when initializing type '__m256bf16' {aka '__vector\(16\) __bf16'} using type '__v16hi'} } */
+
+  __v8si initi_1_1 = glob_bfloat_vec;   /* { dg-error {incompatible types when initializing type '__v8si' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  __m256 initi_1_2 = glob_bfloat_vec; /* { dg-error {incompatible types when initializing type '__m256' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  __m256h initi_1_3 = glob_bfloat_vec; /* { dg-error {incompatible types when initializing type '__m256h' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  __m256i initi_1_5 = glob_bfloat_vec;  /* { dg-error {incompatible types when initializing type '__m256i' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  __v16hi initi_1_6 = glob_bfloat_vec;  /* { dg-error {incompatible types when initializing type '__v16hi' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+
+  __m256bf16 vector2_1 = {};
+  __m256bf16 vector2_2 = { glob_bfloat };
+  __m256bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
+  __m256bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m256bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m256bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m256bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m256bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m256bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error "invalid conversion to type '__bf16'" } */
+
+  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m256h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m256i initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+  __v16hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+
+  /* Assignments to/from vectors.  */
+
+  glob_bfloat_vec = glob_bfloat_vec;
+  glob_bfloat_vec = 0;   /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type 'int'} } */
+  glob_bfloat_vec = 0.1; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type 'double'} } */
+  glob_bfloat_vec = is_a_float_vec; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type '__m256'} } */
+  glob_bfloat_vec = is_an_int_vec; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type '__v8si'} } */
+  glob_bfloat_vec = is_a_float16_vec; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type '__m256h'} } */
+  glob_bfloat_vec = is_a_long_int_pair; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type '__m256i'} } */
+  glob_bfloat_vec = is_a_short_vec; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type '__v16hi'} } */
+
+  is_an_int_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__v8si' from type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  is_a_float_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m256' from type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  is_a_float16_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m256h' from type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  is_a_long_int_pair = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m256i' from type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  is_a_short_vec = glob_bfloat_vec;/* { dg-error {incompatible types when assigning to type '__v16hi' from type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+
+  /* Assignments to/from elements.  */
+
+  vector2_3[0] = glob_bfloat;
+  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
+
+  glob_bfloat = vector2_3[0];
+  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+
+  /* Compound literals.  */
+
+  (__m256bf16) {};
+
+  (__m256bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (__m256bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (__m256bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
+  (__m256bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
+  (__m256bf16) { is_a_long_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256i'} } */
+  (__m256bf16) { is_a_float16_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256h'} } */
+  (__m256bf16) { is_a_short_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v16hi'} } */
+
+  (__m256bf16) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  (__v8si) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'int' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  (__m256) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'float' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  (__m256i) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'long long int' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  (__m256h) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '_Float16' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+  (__v16hi) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'short int' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
+
+  /* Casting.  */
+
+  (void) glob_bfloat_vec;
+  (__m256bf16) glob_bfloat_vec;
+
+  (__bf16) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
+  (short) glob_bfloat_vec; /* { dg-error {cannot convert a vector of type '__m256bf16' {aka '__vector\(16\) __bf16'} to type 'short int' which has different size} } */
+  (int) glob_bfloat_vec; /* { dg-error {cannot convert a vector of type '__m256bf16' {aka '__vector\(16\) __bf16'} to type 'int' which has different size} } */
+  (_Float16) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
+  (float) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
+  (double) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
+
+  (__v8si) glob_bfloat_vec;
+  (__m256) glob_bfloat_vec;
+  (__m256h) glob_bfloat_vec;
+  (__m256i) glob_bfloat_vec;
+  (__v16hi) glob_bfloat_vec;
+
+  (__m256bf16) is_an_int_vec;
+  (__m256bf16) is_a_float_vec;
+  (__m256bf16) is_a_float16_vec;
+  (__m256bf16) is_a_long_int_pair;
+  (__m256bf16) is_a_short_vec;
+
+  /* Arrays and Structs.  */
+
+  typedef __m256bf16 array_type[2];
+  extern __m256bf16 extern_array[];
+
+  __m256bf16 array[2];
+  __m256bf16 zero_length_array[0];
+  __m256bf16 empty_init_array[] = {};
+  typedef __m256bf16 some_other_type[is_an_int];
+
+  struct struct1 {
+    __m256bf16 a;
+  };
+
+  union union1 {
+    __m256bf16 a;
+  };
+
+  /* Addressing and dereferencing.  */
+
+  __m256bf16 *bfloat_ptr = &vector0;
+  vector0 = *bfloat_ptr;
+
+  /* Pointer assignment.  */
+
+  __m256bf16 *bfloat_ptr2 = bfloat_ptr;
+  __m256bf16 *bfloat_ptr3 = array;
+
+  /* Pointer arithmetic.  */
+
+  ++bfloat_ptr;
+  --bfloat_ptr;
+  bfloat_ptr++;
+  bfloat_ptr--;
+  bfloat_ptr += 1;
+  bfloat_ptr -= 1;
+  bfloat_ptr - bfloat_ptr2;
+  bfloat_ptr = &bfloat_ptr3[0];
+  bfloat_ptr = &bfloat_ptr3[1];
+
+  /* Simple comparison.  */
+  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
+  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+
+  /* Pointer comparison.  */
+
+  bfloat_ptr == &vector0;
+  bfloat_ptr != &vector0;
+  bfloat_ptr < &vector0;
+  bfloat_ptr <= &vector0;
+  bfloat_ptr > &vector0;
+  bfloat_ptr >= &vector0;
+  bfloat_ptr == bfloat_ptr2;
+  bfloat_ptr != bfloat_ptr2;
+  bfloat_ptr < bfloat_ptr2;
+  bfloat_ptr <= bfloat_ptr2;
+  bfloat_ptr > bfloat_ptr2;
+  bfloat_ptr >= bfloat_ptr2;
+
+  /* Conditional expressions.  */
+
+  0 ? vector0 : vector0;
+  0 ? vector0 : is_a_float_vec; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? is_a_float_vec : vector0; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? vector0 : is_a_float16_vec; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? is_a_float16_vec : vector0; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? vector0 : 0; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? 0 : vector0; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? 0.1 : vector0; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? vector0 : 0.1; /* { dg-error {type mismatch in conditional expression} } */
+  0 ? bfloat_ptr : bfloat_ptr2;
+  0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
+  0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
+
+  vector0 ? vector0 : vector0; /* { dg-error {used vector type where scalar is required} } */
+  vector0 ? is_a_float16_vec : vector0; /* { dg-error {used vector type where scalar is required} } */
+  vector0 ? vector0 : is_a_float16_vec; /* { dg-error {used vector type where scalar is required} } */
+  vector0 ? is_a_float16_vec : is_a_float16_vec; /* { dg-error {used vector type where scalar is required} } */
+
+  /* Unary operators.  */
+
+  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
+  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+
+  /* Binary arithmetic operations.  */
+
+  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+
+  return vector0;
+}
+
-- 
2.18.2
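
For readers following the thread, here is a minimal sketch of what "storage
type" means for the vector operations this patch covers (init, broadcast, set,
extract): values are moved around as 16-bit patterns and no arithmetic is done
on them. To stay portable it does not use __bf16 at all. The type v8bf_bits
and every helper name below are hypothetical stand-ins, not part of the patch,
and the float conversion simply truncates to the upper 16 bits of an IEEE
binary32 value (no rounding).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m128bf16: eight 16-bit lanes that hold raw
   bfloat16 bit patterns.  Uses the GCC/Clang vector extension.  */
typedef uint16_t v8bf_bits __attribute__ ((__vector_size__ (16)));

/* bfloat16 keeps the upper 16 bits of a binary32 value.  This helper
   truncates; it does not round to nearest.  */
uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return (uint16_t) (u >> 16);
}

/* Widen a bfloat16 bit pattern back to float by zero-filling the low
   16 bits of the binary32 representation.  */
float
bf16_bits_to_float (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Broadcast: every lane receives the same bit pattern.  */
v8bf_bits
broadcast_bits (uint16_t b)
{
  v8bf_bits v = { b, b, b, b, b, b, b, b };
  return v;
}

/* Init by broadcast, set one lane, extract two lanes.  All arithmetic
   happens on float after widening, never on the 16-bit lanes.  */
float
demo_broadcast_set_extract (void)
{
  v8bf_bits v = broadcast_bits (float_to_bf16_bits (1.5f));
  v[3] = float_to_bf16_bits (-2.0f);            /* set lane 3 */
  assert (v[0] == 0x3FC0 && v[3] == 0xC000);    /* extract lanes 0 and 3 */
  return bf16_bits_to_float (v[0]) + bf16_bits_to_float (v[3]);
}
```

With the real type, the typecheck tests above expect the compiler to reject
direct conversions such as vector2_3[0] = is_a_float, which is why the sketch
goes through explicit bit-level helpers instead.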



* Re: [PATCH] x86: Support vector __bf16 type.
  2022-08-16  7:49 [PATCH] x86: Support vector __bf16 type Kong, Lingling
@ 2022-08-17  5:56 ` Hongtao Liu
  2022-08-18  7:34 ` [PATCH] Add ABI test for " Haochen Jiang
  1 sibling, 0 replies; 9+ messages in thread
From: Hongtao Liu @ 2022-08-17  5:56 UTC (permalink / raw)
  To: Kong, Lingling; +Cc: Liu, Hongtao, gcc-patches

On Tue, Aug 16, 2022 at 3:50 PM Kong, Lingling via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> This patch supports vector init/broadcast/set/extract for the __bf16 type.
> The __bf16 type is a storage type.
>
> OK for master?
Ok.
>
> gcc/ChangeLog:
>
>         * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle vector
>         BFmode.
>         (ix86_expand_vector_init_duplicate): Support vector BFmode.
>         (ix86_expand_vector_init_one_nonzero): Ditto.
>         (ix86_expand_vector_init_one_var): Ditto.
>         (ix86_expand_vector_init_concat): Ditto.
>         (ix86_expand_vector_init_interleave): Ditto.
>         (ix86_expand_vector_init_general): Ditto.
>         (ix86_expand_vector_init): Ditto.
>         (ix86_expand_vector_set_var): Ditto.
>         (ix86_expand_vector_set): Ditto.
>         (ix86_expand_vector_extract): Ditto.
>         * config/i386/i386.cc (classify_argument): Add BF vector modes.
>         (function_arg_64): Ditto.
>         (ix86_gimplify_va_arg): Ditto.
>         (ix86_get_ssemov): Ditto.
>         * config/i386/i386.h (VALID_AVX256_REG_MODE): Add BF vector modes.
>         (VALID_AVX512F_REG_MODE): Ditto.
>         (host_detect_local_cpu): Ditto.
>         (VALID_SSE2_REG_MODE): Ditto.
>         * config/i386/i386.md: Add BF vector modes.
>         (MODE_SIZE): Ditto.
>         (ssemodesuffix): Add bf suffix for BF vector modes.
>         (ssevecmode): Ditto.
>         * config/i386/sse.md (VMOVE): Adjust for BF vector modes.
>         (VI12HFBF_AVX512VL): Ditto.
>         (V_256_512): Ditto.
>         (VF_AVX512HFBF16): Ditto.
>         (VF_AVX512BWHFBF16): Ditto.
>         (VIHFBF): Ditto.
>         (avx512): Ditto.
>         (VIHFBF_256): Ditto.
>         (VIHFBF_AVX512BW): Ditto.
>         (VI2F_256_512): Ditto.
>         (V8_128): Ditto.
>         (V16_256): Ditto.
>         (V32_512): Ditto.
>         (sseinsnmode): Ditto.
>         (sseconstm1): Ditto.
>         (sseintmodesuffix): New mode_attr.
>         (avx512fmaskmode): Ditto.
>         (avx512fmaskmodelower): Ditto.
>         (ssedoublevecmode): Ditto.
>         (ssehalfvecmode): Ditto.
>         (ssehalfvecmodelower): Ditto.
>         (ssescalarmode): Add vector BFmode mapping.
>         (ssescalarmodelower): Ditto.
>         (ssexmmmode): Ditto.
>         (ternlogsuffix): Ditto.
>         (ssescalarsize): Ditto.
>         (sseintprefix): Ditto.
>         (i128): Ditto.
>         (xtg_mode): Ditto.
>         (bcstscalarsuff): Ditto.
>         (<avx512>_blendm<mode>): New define_insn for BFmode.
>         (<avx512>_store<mode>_mask): Ditto.
>         (vcond_mask_<mode><avx512fmaskmodelower>): Ditto.
>         (vec_set<mode>_0): New define_insn for BF vector set.
>         (V8BFH_128): New mode_iterator for BFmode.
>         (avx512fp16_mov<mode>): Ditto.
>         (vec_set<mode>): New define_insn for BF vector set.
>         (@vec_extract_hi_<mode>): Ditto.
>         (@vec_extract_lo_<mode>): Ditto.
>         (vec_set_hi_<mode>): Ditto.
>         (vec_set_lo_<mode>): Ditto.
>         (*vec_extract<mode>_0): New define_insn_and_split for BF
>         vector extract.
>         (*vec_extract<mode>): New define_insn.
>         (VEC_EXTRACT_MODE): Add BF vector modes.
>         (PINSR_MODE): Add V8BF.
>         (sse2p4_1): Ditto.
>         (pinsr_evex_isa): Ditto.
>         (<sse2p4_1>_pinsr<ssemodesuffix>): Adjust to support
>         insert for V8BFmode.
>         (pbroadcast_evex_isa): Add BF vector modes.
>         (AVX2_VEC_DUP_MODE): Ditto.
>         (VEC_INIT_MODE): Ditto.
>         (VEC_INIT_HALF_MODE): Ditto.
>         (avx2_pbroadcast<mode>): Adjust to support BF vector mode
>         broadcast.
>         (avx2_pbroadcast<mode>_1): Ditto.
>         (<avx512>_vec_dup<mode>_1): Ditto.
>         (<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>):
>         Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.target/i386/vect-bfloat16-1.C: New test.
>         * gcc.target/i386/vect-bfloat16-1.c: New test.
>         * gcc.target/i386/vect-bfloat16-2a.c: New test.
>         * gcc.target/i386/vect-bfloat16-2b.c: New test.
>         * gcc.target/i386/vect-bfloat16-typecheck_1.c: New test.
>         * gcc.target/i386/vect-bfloat16-typecheck_2.c: New test.
> ---
>  gcc/config/i386/i386-expand.cc                | 129 +++++++--
>  gcc/config/i386/i386.cc                       |  16 +-
>  gcc/config/i386/i386.h                        |  12 +-
>  gcc/config/i386/i386.md                       |   9 +-
>  gcc/config/i386/sse.md                        | 211 ++++++++------
>  .../g++.target/i386/vect-bfloat16-1.C         |  13 +
>  .../gcc.target/i386/vect-bfloat16-1.c         |  30 ++
>  .../gcc.target/i386/vect-bfloat16-2a.c        | 121 ++++++++
>  .../gcc.target/i386/vect-bfloat16-2b.c        |  22 ++
>  .../i386/vect-bfloat16-typecheck_1.c          | 258 ++++++++++++++++++
>  .../i386/vect-bfloat16-typecheck_2.c          | 248 +++++++++++++++++
>  11 files changed, 950 insertions(+), 119 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/vect-bfloat16-1.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 66d8f28984c..c3da9bf1636 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -4064,6 +4064,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
>      case E_V16QImode:
>      case E_V8HImode:
>      case E_V8HFmode:
> +    case E_V8BFmode:
>      case E_V4SImode:
>      case E_V2DImode:
>      case E_V1TImode:
> @@ -4084,6 +4085,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
>      case E_V32QImode:
>      case E_V16HImode:
>      case E_V16HFmode:
> +    case E_V16BFmode:
>      case E_V8SImode:
>      case E_V4DImode:
>        if (TARGET_AVX2)
> @@ -4102,6 +4104,9 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
>      case E_V32HFmode:
>        gen = gen_avx512bw_blendmv32hf;
>        break;
> +    case E_V32BFmode:
> +      gen = gen_avx512bw_blendmv32bf;
> +      break;
>      case E_V16SImode:
>        gen = gen_avx512f_blendmv16si;
>        break;
> @@ -15008,6 +15013,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
>
>      case E_V8HImode:
>      case E_V8HFmode:
> +    case E_V8BFmode:
>        if (TARGET_AVX2)
>         return ix86_vector_duplicate_value (mode, target, val);
>
> @@ -15092,6 +15098,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
>
>      case E_V16HImode:
>      case E_V16HFmode:
> +    case E_V16BFmode:
>      case E_V32QImode:
>        if (TARGET_AVX2)
>         return ix86_vector_duplicate_value (mode, target, val);
> @@ -15112,6 +15119,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
>
>      case E_V32HImode:
>      case E_V32HFmode:
> +    case E_V32BFmode:
>      case E_V64QImode:
>        if (TARGET_AVX512BW)
>         return ix86_vector_duplicate_value (mode, target, val);
> @@ -15119,6 +15127,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
>         {
>           machine_mode hvmode = (mode == V32HImode ? V16HImode
>                                  : mode == V32HFmode ? V16HFmode
> +                                : mode == V32BFmode ? V16BFmode
>                                  : V32QImode);
>           rtx x = gen_reg_rtx (hvmode);
>
> @@ -15232,6 +15241,18 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
>        use_vector_set = TARGET_AVX512FP16 && one_var == 0;
>        gen_vec_set_0 = gen_vec_setv32hf_0;
>        break;
> +    case E_V8BFmode:
> +      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> +      gen_vec_set_0 = gen_vec_setv8bf_0;
> +      break;
> +    case E_V16BFmode:
> +      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> +      gen_vec_set_0 = gen_vec_setv16bf_0;
> +      break;
> +    case E_V32BFmode:
> +      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> +      gen_vec_set_0 = gen_vec_setv32bf_0;
> +      break;
>      case E_V32HImode:
>        use_vector_set = TARGET_AVX512FP16 && one_var == 0;
>        gen_vec_set_0 = gen_vec_setv32hi_0;
> @@ -15386,6 +15407,8 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode,
>        /* FALLTHRU */
>      case E_V8HFmode:
>      case E_V16HFmode:
> +    case E_V8BFmode:
> +    case E_V16BFmode:
>      case E_V4DFmode:
>      case E_V8SFmode:
>      case E_V8SImode:
> @@ -15469,6 +15492,9 @@ ix86_expand_vector_init_concat (machine_mode mode,
>         case E_V32HFmode:
>           half_mode = V16HFmode;
>           break;
> +       case E_V32BFmode:
> +         half_mode = V16BFmode;
> +         break;
>         case E_V16SImode:
>           half_mode = V8SImode;
>           break;
> @@ -15484,6 +15510,9 @@ ix86_expand_vector_init_concat (machine_mode mode,
>         case E_V16HFmode:
>           half_mode = V8HFmode;
>           break;
> +       case E_V16BFmode:
> +         half_mode = V8BFmode;
> +         break;
>         case E_V8SImode:
>           half_mode = V4SImode;
>           break;
> @@ -15642,6 +15671,15 @@ ix86_expand_vector_init_interleave (machine_mode mode,
>        second_imode = V2DImode;
>        third_imode = VOIDmode;
>        break;
> +    case E_V8BFmode:
> +      gen_load_even = gen_vec_interleave_lowv8bf;
> +      gen_interleave_first_low = gen_vec_interleave_lowv4si;
> +      gen_interleave_second_low = gen_vec_interleave_lowv2di;
> +      inner_mode = BFmode;
> +      first_imode = V4SImode;
> +      second_imode = V2DImode;
> +      third_imode = VOIDmode;
> +      break;
>      case E_V8HImode:
>        gen_load_even = gen_vec_setv8hi;
>        gen_interleave_first_low = gen_vec_interleave_lowv4si;
> @@ -15667,15 +15705,18 @@ ix86_expand_vector_init_interleave (machine_mode mode,
>    for (i = 0; i < n; i++)
>      {
>        op = ops [i + i];
> -      if (inner_mode == HFmode)
> +      if (inner_mode == HFmode || inner_mode == BFmode)
>         {
>           rtx even, odd;
> -         /* Use vpuncklwd to pack 2 HFmode.  */
> -         op0 = gen_reg_rtx (V8HFmode);
> -         even = lowpart_subreg (V8HFmode, force_reg (HFmode, op), HFmode);
> -         odd = lowpart_subreg (V8HFmode,
> -                               force_reg (HFmode, ops[i + i + 1]),
> -                               HFmode);
> +         /* Use vpunpcklwd to pack 2 HFmode or BFmode.  */
> +         machine_mode vec_mode = ((inner_mode == HFmode)
> +                                  ? V8HFmode : V8BFmode);
> +         op0 = gen_reg_rtx (vec_mode);
> +         even = lowpart_subreg (vec_mode,
> +                                force_reg (inner_mode, op), inner_mode);
> +         odd = lowpart_subreg (vec_mode,
> +                               force_reg (inner_mode, ops[i + i + 1]),
> +                               inner_mode);
>           emit_insn (gen_load_even (op0, even, odd));
>         }
>        else
> @@ -15824,6 +15865,10 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode,
>        half_mode = V8HFmode;
>        goto half;
>
> +    case E_V16BFmode:
> +      half_mode = V8BFmode;
> +      goto half;
> +
>  half:
>        n = GET_MODE_NUNITS (mode);
>        for (i = 0; i < n; i++)
> @@ -15852,6 +15897,11 @@ half:
>        half_mode = V16HFmode;
>        goto quarter;
>
> +    case E_V32BFmode:
> +      quarter_mode = V8BFmode;
> +      half_mode = V16BFmode;
> +      goto quarter;
> +
>  quarter:
>        n = GET_MODE_NUNITS (mode);
>        for (i = 0; i < n; i++)
> @@ -15891,6 +15941,7 @@ quarter:
>        /* FALLTHRU */
>
>      case E_V8HFmode:
> +    case E_V8BFmode:
>
>        n = GET_MODE_NUNITS (mode);
>        for (i = 0; i < n; i++)
> @@ -15994,7 +16045,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals)
>           if (inner_mode == QImode
>               || inner_mode == HImode
>               || inner_mode == TImode
> -             || inner_mode == HFmode)
> +             || inner_mode == HFmode
> +             || inner_mode == BFmode)
>             {
>               unsigned int n_bits = n_elts * GET_MODE_SIZE (inner_mode);
>               scalar_mode elt_mode = inner_mode == TImode ? DImode : SImode;
> @@ -16078,7 +16130,8 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
>    /* 512-bits vector byte/word broadcast and comparison only available
>       under TARGET_AVX512BW, break 512-bits vector into two 256-bits vector
>       when without TARGET_AVX512BW.  */
> -  if ((mode == V32HImode || mode == V32HFmode || mode == V64QImode)
> +  if ((mode == V32HImode || mode == V32HFmode || mode == V32BFmode
> +       || mode == V64QImode)
>        && !TARGET_AVX512BW)
>      {
>        gcc_assert (TARGET_AVX512F);
> @@ -16099,6 +16152,12 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
>           extract_hi = gen_vec_extract_hi_v32hf;
>           extract_lo = gen_vec_extract_lo_v32hf;
>         }
> +      else if (mode == V32BFmode)
> +       {
> +         half_mode = V16BFmode;
> +         extract_hi = gen_vec_extract_hi_v32bf;
> +         extract_lo = gen_vec_extract_lo_v32bf;
> +       }
>        else
>         {
>           half_mode = V32QImode;
> @@ -16155,6 +16214,15 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
>         case E_V32HFmode:
>           cmp_mode = V32HImode;
>           break;
> +       case E_V8BFmode:
> +         cmp_mode = V8HImode;
> +         break;
> +       case E_V16BFmode:
> +         cmp_mode = V16HImode;
> +         break;
> +       case E_V32BFmode:
> +         cmp_mode = V32HImode;
> +         break;
>         default:
>           gcc_unreachable ();
>         }
> @@ -16192,7 +16260,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
>    bool use_vec_merge = false;
>    bool blendm_const = false;
>    rtx tmp;
> -  static rtx (*gen_extract[7][2]) (rtx, rtx)
> +  static rtx (*gen_extract[8][2]) (rtx, rtx)
>      = {
>         { gen_vec_extract_lo_v32qi, gen_vec_extract_hi_v32qi },
>         { gen_vec_extract_lo_v16hi, gen_vec_extract_hi_v16hi },
> @@ -16200,9 +16268,10 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
>         { gen_vec_extract_lo_v4di, gen_vec_extract_hi_v4di },
>         { gen_vec_extract_lo_v8sf, gen_vec_extract_hi_v8sf },
>         { gen_vec_extract_lo_v4df, gen_vec_extract_hi_v4df },
> -       { gen_vec_extract_lo_v16hf, gen_vec_extract_hi_v16hf }
> +       { gen_vec_extract_lo_v16hf, gen_vec_extract_hi_v16hf },
> +       { gen_vec_extract_lo_v16bf, gen_vec_extract_hi_v16bf }
>        };
> -  static rtx (*gen_insert[7][2]) (rtx, rtx, rtx)
> +  static rtx (*gen_insert[8][2]) (rtx, rtx, rtx)
>      = {
>         { gen_vec_set_lo_v32qi, gen_vec_set_hi_v32qi },
>         { gen_vec_set_lo_v16hi, gen_vec_set_hi_v16hi },
> @@ -16211,6 +16280,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
>         { gen_vec_set_lo_v8sf, gen_vec_set_hi_v8sf },
>         { gen_vec_set_lo_v4df, gen_vec_set_hi_v4df },
>         { gen_vec_set_lo_v16hf, gen_vec_set_hi_v16hf },
> +       { gen_vec_set_lo_v16bf, gen_vec_set_hi_v16bf },
>        };
>    int i, j, n;
>    machine_mode mmode = VOIDmode;
> @@ -16379,6 +16449,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
>
>      case E_V8HImode:
>      case E_V8HFmode:
> +    case E_V8BFmode:
>      case E_V2HImode:
>        use_vec_merge = TARGET_SSE2;
>        break;
> @@ -16402,18 +16473,20 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
>        goto half;
>
>      case E_V16HFmode:
> +    case E_V16BFmode:
>        /* For ELT == 0, vec_setv8hf_0 can save 1 vpbroadcastw.  */
>        if (TARGET_AVX2 && elt != 0)
>         {
>           mmode = SImode;
> -         gen_blendm = gen_avx2_pblendph_1;
> +         gen_blendm = ((mode == E_V16HFmode) ? gen_avx2_pblendph_1
> +                                               : gen_avx2_pblendbf_1);
>           blendm_const = true;
>           break;
>         }
>        else
>         {
> -         half_mode = V8HFmode;
> -         j = 6;
> +         half_mode = ((mode == E_V16HFmode) ? V8HFmode : V8BFmode);
> +         j = ((mode == E_V16HFmode) ? 6 : 7);
>           n = 8;
>           goto half;
>         }
> @@ -16505,6 +16578,13 @@ half:
>           gen_blendm = gen_avx512bw_blendmv32hf;
>         }
>        break;
> +    case E_V32BFmode:
> +      if (TARGET_AVX512BW)
> +       {
> +         mmode = SImode;
> +         gen_blendm = gen_avx512bw_blendmv32bf;
> +       }
> +      break;
>      case E_V32HImode:
>        if (TARGET_AVX512BW)
>         {
> @@ -16712,6 +16792,7 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
>
>      case E_V8HImode:
>      case E_V8HFmode:
> +    case E_V8BFmode:
>      case E_V2HImode:
>        use_vec_extr = TARGET_SSE2;
>        break;
> @@ -16878,26 +16959,32 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
>        return;
>
>      case E_V32HFmode:
> +    case E_V32BFmode:
>        if (TARGET_AVX512BW)
>         {
> -         tmp = gen_reg_rtx (V16HFmode);
> +         tmp = (mode == E_V32HFmode
> +                ? gen_reg_rtx (V16HFmode)
> +                : gen_reg_rtx (V16BFmode));
>           if (elt < 16)
> -           emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
> +           emit_insn (maybe_gen_vec_extract_lo (mode, tmp, vec));
>           else
> -           emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
> +           emit_insn (maybe_gen_vec_extract_hi (mode, tmp, vec));
>           ix86_expand_vector_extract (false, target, tmp, elt & 15);
>           return;
>         }
>        break;
>
>      case E_V16HFmode:
> +    case E_V16BFmode:
>        if (TARGET_AVX)
>         {
> -         tmp = gen_reg_rtx (V8HFmode);
> +         tmp = (mode == E_V16HFmode
> +                ? gen_reg_rtx (V8HFmode)
> +                : gen_reg_rtx (V8BFmode));
>           if (elt < 8)
> -           emit_insn (gen_vec_extract_lo_v16hf (tmp, vec));
> +           emit_insn (maybe_gen_vec_extract_lo (mode, tmp, vec));
>           else
> -           emit_insn (gen_vec_extract_hi_v16hf (tmp, vec));
> +           emit_insn (maybe_gen_vec_extract_hi (mode, tmp, vec));
>           ix86_expand_vector_extract (false, target, tmp, elt & 7);
>           return;
>         }
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index fa3722a11e1..e27c87f8c83 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -2463,6 +2463,7 @@ classify_argument (machine_mode mode, const_tree type,
>      case E_V8SImode:
>      case E_V32QImode:
>      case E_V16HFmode:
> +    case E_V16BFmode:
>      case E_V16HImode:
>      case E_V4DFmode:
>      case E_V4DImode:
> @@ -2474,6 +2475,7 @@ classify_argument (machine_mode mode, const_tree type,
>      case E_V8DFmode:
>      case E_V16SFmode:
>      case E_V32HFmode:
> +    case E_V32BFmode:
>      case E_V8DImode:
>      case E_V16SImode:
>      case E_V32HImode:
> @@ -2492,6 +2494,7 @@ classify_argument (machine_mode mode, const_tree type,
>      case E_V16QImode:
>      case E_V8HImode:
>      case E_V8HFmode:
> +    case E_V8BFmode:
>      case E_V2DFmode:
>      case E_V2DImode:
>        classes[0] = X86_64_SSE_CLASS;
> @@ -2947,6 +2950,7 @@ pass_in_reg:
>        /* FALLTHRU */
>
>      case E_V16HFmode:
> +    case E_V16BFmode:
>      case E_V8SFmode:
>      case E_V8SImode:
>      case E_V64QImode:
> @@ -2954,6 +2958,7 @@ pass_in_reg:
>      case E_V16SImode:
>      case E_V8DImode:
>      case E_V32HFmode:
> +    case E_V32BFmode:
>      case E_V16SFmode:
>      case E_V8DFmode:
>      case E_V32QImode:
> @@ -2966,6 +2971,7 @@ pass_in_reg:
>      case E_V4SImode:
>      case E_V2DImode:
>      case E_V8HFmode:
> +    case E_V8BFmode:
>      case E_V4SFmode:
>      case E_V2DFmode:
>        if (!type || !AGGREGATE_TYPE_P (type))
> @@ -3190,6 +3196,7 @@ pass_in_reg:
>      case E_V4SImode:
>      case E_V2DImode:
>      case E_V8HFmode:
> +    case E_V8BFmode:
>      case E_V4SFmode:
>      case E_V2DFmode:
>        if (!type || !AGGREGATE_TYPE_P (type))
> @@ -3210,9 +3217,11 @@ pass_in_reg:
>      case E_V16SImode:
>      case E_V8DImode:
>      case E_V32HFmode:
> +    case E_V32BFmode:
>      case E_V16SFmode:
>      case E_V8DFmode:
>      case E_V16HFmode:
> +    case E_V16BFmode:
>      case E_V8SFmode:
>      case E_V8SImode:
>      case E_V32QImode:
> @@ -3273,6 +3282,7 @@ function_arg_64 (const CUMULATIVE_ARGS *cum, machine_mode mode,
>        break;
>
>      case E_V16HFmode:
> +    case E_V16BFmode:
>      case E_V8SFmode:
>      case E_V8SImode:
>      case E_V32QImode:
> @@ -3280,6 +3290,7 @@ function_arg_64 (const CUMULATIVE_ARGS *cum, machine_mode mode,
>      case E_V4DFmode:
>      case E_V4DImode:
>      case E_V32HFmode:
> +    case E_V32BFmode:
>      case E_V16SFmode:
>      case E_V16SImode:
>      case E_V64QImode:
> @@ -4748,6 +4759,7 @@ ix86_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
>    switch (nat_mode)
>      {
>      case E_V16HFmode:
> +    case E_V16BFmode:
>      case E_V8SFmode:
>      case E_V8SImode:
>      case E_V32QImode:
> @@ -4755,6 +4767,7 @@ ix86_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
>      case E_V4DFmode:
>      case E_V4DImode:
>      case E_V32HFmode:
> +    case E_V32BFmode:
>      case E_V16SFmode:
>      case E_V16SImode:
>      case E_V64QImode:
> @@ -5430,7 +5443,7 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>        switch (type)
>         {
>         case opcode_int:
> -         if (scalar_mode == E_HFmode)
> +         if (scalar_mode == E_HFmode || scalar_mode == E_BFmode)
>             opcode = (misaligned_p
>                       ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
>                       : "vmovdqa64");
> @@ -5450,6 +5463,7 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>        switch (scalar_mode)
>         {
>         case E_HFmode:
> +       case E_BFmode:
>           if (evex_reg_p)
>             opcode = (misaligned_p
>                       ? (TARGET_AVX512BW
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 0da3dce1d31..0de5c77bc7d 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1011,7 +1011,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define VALID_AVX256_REG_MODE(MODE)                                    \
>    ((MODE) == V32QImode || (MODE) == V16HImode || (MODE) == V8SImode    \
>     || (MODE) == V4DImode || (MODE) == V2TImode || (MODE) == V8SFmode   \
> -   || (MODE) == V4DFmode || (MODE) == V16HFmode)
> +   || (MODE) == V4DFmode || (MODE) == V16HFmode || (MODE) == V16BFmode)
>
>  #define VALID_AVX256_REG_OR_OI_MODE(MODE)              \
>    (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
> @@ -1026,7 +1026,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define VALID_AVX512F_REG_MODE(MODE)                                   \
>    ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode     \
>     || (MODE) == V16SImode || (MODE) == V16SFmode || (MODE) == V32HImode \
> -   || (MODE) == V4TImode || (MODE) == V32HFmode)
> +   || (MODE) == V4TImode || (MODE) == V32HFmode || (MODE) == V32BFmode)
>
>  #define VALID_AVX512F_REG_OR_XI_MODE(MODE)                             \
>    (VALID_AVX512F_REG_MODE (MODE) || (MODE) == XImode)
> @@ -1035,7 +1035,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>    ((MODE) == V2DImode || (MODE) == V2DFmode || (MODE) == V16QImode     \
>     || (MODE) == V4SImode || (MODE) == V4SFmode || (MODE) == V8HImode   \
>     || (MODE) == TFmode || (MODE) == V1TImode || (MODE) == V8HFmode     \
> -   || (MODE) == TImode)
> +   || (MODE) == V8BFmode || (MODE) == TImode)
>
>  #define VALID_AVX512FP16_REG_MODE(MODE)                                        \
>    ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode    \
> @@ -1044,6 +1044,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define VALID_SSE2_REG_MODE(MODE)                                      \
>    ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode     \
>     || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode   \
> +   || (MODE) == V8BFmode \
>     || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
>     || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode     \
>     || (MODE) == HFmode || (MODE) == BFmode)
> @@ -1095,8 +1096,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>     || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode   \
>     || (MODE) == V2TImode || (MODE) == V8DImode || (MODE) == V64QImode  \
>     || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode \
> -   || (MODE) == V16SFmode || (MODE) == V32HFmode || (MODE) == V16HFmode \
> -   || (MODE) == V8HFmode)
> +   || (MODE) == V16SFmode \
> +   || (MODE) == V32HFmode || (MODE) == V16HFmode || (MODE) == V8HFmode  \
> +   || (MODE) == V32BFmode || (MODE) == V16BFmode || (MODE) == V8BFmode)
>
>  #define X87_FLOAT_MODE_P(MODE) \
>    (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode))
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 5f7e2457f5c..58fcc382fa2 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1114,7 +1114,8 @@
>                              (V2DF "16") (V4DF "32") (V8DF "64")
>                              (V4SF "16") (V8SF "32") (V16SF "64")
>                              (V8HF "16") (V16HF "32") (V32HF "64")
> -                            (V4HF "8") (V2HF "4")])
> +                            (V4HF "8") (V2HF "4")
> +                            (V8BF "16") (V16BF "32") (V32BF "64")])
>
>  ;; Double word integer modes as mode attribute.
>  (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "OI")])
> @@ -1258,8 +1259,8 @@
>  (define_mode_attr ssemodesuffix
>    [(HF "sh") (SF "ss") (DF "sd")
>     (V32HF "ph") (V16SF "ps") (V8DF "pd")
> -   (V16HF "ph") (V8SF "ps") (V4DF "pd")
> -   (V8HF "ph") (V4SF "ps") (V2DF "pd")
> +   (V16HF "ph") (V16BF "bf") (V8SF "ps") (V4DF "pd")
> +   (V8HF "ph")  (V8BF "bf") (V4SF "ps") (V2DF "pd")
>     (V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
>     (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
>     (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")])
> @@ -1269,7 +1270,7 @@
>
>  ;; SSE vector mode corresponding to a scalar mode
>  (define_mode_attr ssevecmode
> -  [(QI "V16QI") (HI "V8HI") (SI "V4SI") (DI "V2DI") (HF "V8HF") (SF "V4SF") (DF "V2DF")])
> +  [(QI "V16QI") (HI "V8HI") (SI "V4SI") (DI "V2DI") (HF "V8HF") (BF "V8BF") (SF "V4SF") (DF "V2DF")])
>  (define_mode_attr ssevecmodelower
>    [(QI "v16qi") (HI "v8hi") (SI "v4si") (DI "v2di") (SF "v4sf") (DF "v2df")])
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index b23f07e08c6..9ba47b62a01 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -232,6 +232,7 @@
>     (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
>     (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI
>     (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
> +   (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
>     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") V2DF])
>
> @@ -263,10 +264,11 @@
>    [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
>     V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
>
> -(define_mode_iterator VI12HF_AVX512VL
> +(define_mode_iterator VI12HFBF_AVX512VL
>    [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
>     V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
> -   V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
> +   V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
> +   V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
>
>  ;; Same iterator, but without supposed TARGET_AVX512BW
>  (define_mode_iterator VI12_AVX512VLBW
> @@ -309,10 +311,10 @@
>
>  ;; All 256bit and 512bit vector modes
>  (define_mode_iterator V_256_512
> -  [V32QI V16HI V16HF V8SI V4DI V8SF V4DF
> +  [V32QI V16HI V16HF V16BF V8SI V4DI V8SF V4DF
>     (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V32HF "TARGET_AVX512F")
> -   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
> -   (V8DF "TARGET_AVX512F")])
> +   (V32BF "TARGET_AVX512F") (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
> +   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
>
>  ;; All vector float modes
>  (define_mode_iterator VF
> @@ -435,6 +437,13 @@
>  (define_mode_iterator VF_AVX512FP16
>    [V32HF V16HF V8HF])
>
> +(define_mode_iterator VF_AVX512HFBF16
> +  [(V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
> +   (V8HF "TARGET_AVX512FP16") V32BF V16BF V8BF])
> +
> +(define_mode_iterator VF_AVX512BWHFBF16
> +  [V32HF V16HF V8HF V32BF V16BF V8BF])
> +
>  (define_mode_iterator VF_AVX512FP16VL
>    [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
>
> @@ -447,13 +456,14 @@
>     (V4DI "TARGET_AVX") V2DI])
>
>  ;; All vector integer and HF modes
> -(define_mode_iterator VIHF
> +(define_mode_iterator VIHFBF
>    [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
>     (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
>     (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
>     (V8SI "TARGET_AVX") V4SI
>     (V4DI "TARGET_AVX") V2DI
> -   (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF])
> +   (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
> +   (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF])
>
>  (define_mode_iterator VI_AVX2
>    [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
> @@ -676,6 +686,7 @@
>     (V4SI  "avx512vl") (V8SI  "avx512vl") (V16SI "avx512f")
>     (V2DI  "avx512vl") (V4DI  "avx512vl") (V8DI "avx512f")
>     (V8HF "avx512fp16") (V16HF "avx512vl") (V32HF "avx512bw")
> +   (V8BF "avx512vl") (V16BF "avx512vl") (V32BF "avx512bw")
>     (V4SF "avx512vl") (V8SF "avx512vl") (V16SF "avx512f")
>     (V2DF "avx512vl") (V4DF "avx512vl") (V8DF "avx512f")])
>
> @@ -786,7 +797,7 @@
>  ;; All 128 and 256bit vector integer modes
>  (define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
>  ;; All 256bit vector integer and HF modes
> -(define_mode_iterator VIHF_256 [V32QI V16HI V8SI V4DI V16HF])
> +(define_mode_iterator VIHFBF_256 [V32QI V16HI V8SI V4DI V16HF V16BF])
>
>  ;; Various 128bit vector integer mode combinations
>  (define_mode_iterator VI12_128 [V16QI V8HI])
> @@ -813,12 +824,12 @@
>  (define_mode_iterator VI4_256_8_512 [V8SI V8DI])
>  (define_mode_iterator VI_AVX512BW
>    [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
> -(define_mode_iterator VIHF_AVX512BW
> +(define_mode_iterator VIHFBF_AVX512BW
>    [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
> -  (V32HF "TARGET_AVX512BW")])
> +  (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")])
>
>  ;; Int-float size matches
> -(define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF])
> +(define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF])
>  (define_mode_iterator VI4F_128 [V4SI V4SF])
>  (define_mode_iterator VI8F_128 [V2DI V2DF])
>  (define_mode_iterator VI4F_256 [V8SI V8SF])
> @@ -863,9 +874,9 @@
>     (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
>     V16SF V8DF])
>
> -(define_mode_iterator V8_128 [V8HI V8HF])
> -(define_mode_iterator V16_256 [V16HI V16HF])
> -(define_mode_iterator V32_512 [V32HI V32HF])
> +(define_mode_iterator V8_128 [V8HI V8HF V8BF])
> +(define_mode_iterator V16_256 [V16HI V16HF V16BF])
> +(define_mode_iterator V32_512 [V32HI V32HF V32BF])
>
>  ;; Mapping from float mode to required SSE level
>  (define_mode_attr sse
> @@ -910,6 +921,7 @@
>     (V8SF "V8SF") (V4DF "V4DF")
>     (V4SF "V4SF") (V2DF "V2DF")
>     (V8HF "TI") (V16HF "OI") (V32HF "XI")
> +   (V8BF "TI") (V16BF "OI") (V32BF "XI")
>     (TI "TI")])
>
>  (define_mode_attr sseintvecinsnmode
> @@ -926,16 +938,17 @@
>    [(V64QI "BC") (V32HI "BC") (V16SI "BC") (V8DI "BC") (V4TI "BC")
>     (V32QI "BC") (V16HI "BC") (V8SI "BC") (V4DI "BC") (V2TI "BC")
>     (V16QI "BC") (V8HI "BC") (V4SI "BC") (V2DI "BC") (V1TI "BC")
> -   (V32HF "BF") (V16SF "BF") (V8DF "BF")
> -   (V16HF "BF") (V8SF "BF") (V4DF "BF")
> -   (V8HF "BF") (V4SF "BF") (V2DF "BF")])
> +   (V32HF "BF") (V32BF "BF") (V16SF "BF") (V8DF "BF")
> +   (V16HF "BF") (V16BF "BF") (V8SF "BF") (V4DF "BF")
> +   (V8HF "BF") (V8BF "BF") (V4SF "BF") (V2DF "BF")])
>
>  ;; SSE integer instruction suffix for various modes
>  (define_mode_attr sseintmodesuffix
>    [(V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
>     (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
>     (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")
> -   (V8HF "w") (V16HF "w") (V32HF "w")])
> +   (V8HF "w") (V16HF "w") (V32HF "w")
> +   (V8BF "w") (V16BF "w") (V32BF "w")])
>
>  ;; Mapping of vector modes to corresponding mask size
>  (define_mode_attr avx512fmaskmode
> @@ -944,6 +957,7 @@
>     (V16SI "HI") (V8SI  "QI") (V4SI  "QI")
>     (V8DI  "QI") (V4DI  "QI") (V2DI  "QI")
>     (V32HF "SI") (V16HF "HI") (V8HF  "QI")
> +   (V32BF "SI") (V16BF "HI") (V8BF  "QI")
>     (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
>     (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
>
> @@ -958,6 +972,7 @@
>     (V16SI "hi") (V8SI  "qi") (V4SI  "qi")
>     (V8DI  "qi") (V4DI  "qi") (V2DI  "qi")
>     (V32HF "si") (V16HF "hi") (V8HF  "qi")
> +   (V32BF "si") (V16BF "hi") (V8BF  "qi")
>     (V16SF "hi") (V8SF  "qi") (V4SF  "qi")
>     (V8DF  "qi") (V4DF  "qi") (V2DF  "qi")])
>
> @@ -973,9 +988,9 @@
>
>  ;; Mapping of vector float modes to an integer mode of the same size
>  (define_mode_attr sseintvecmode
> -  [(V32HF "V32HI") (V16SF "V16SI") (V8DF  "V8DI")
> -   (V16HF "V16HI") (V8SF  "V8SI")  (V4DF  "V4DI")
> -   (V8HF "V8HI") (V4SF  "V4SI")  (V2DF  "V2DI")
> +  [(V32HF "V32HI") (V32BF "V32HI") (V16SF "V16SI") (V8DF  "V8DI")
> +   (V16HF "V16HI") (V16BF "V16HI") (V8SF  "V8SI")  (V4DF  "V4DI")
> +   (V8HF "V8HI") (V8BF "V8HI") (V4SF "V4SI")  (V2DF  "V2DI")
>     (V16SI "V16SI") (V8DI  "V8DI")
>     (V8SI  "V8SI")  (V4DI  "V4DI")
>     (V4SI  "V4SI")  (V2DI  "V2DI")
> @@ -998,9 +1013,9 @@
>     (V16HF "OI") (V8HF "TI")])
>
>  (define_mode_attr sseintvecmodelower
> -  [(V32HF "v32hi") (V16SF "v16si") (V8DF "v8di")
> -   (V16HF "v16hi") (V8SF "v8si") (V4DF "v4di")
> -   (V8HF "v8hi") (V4SF "v4si") (V2DF "v2di")
> +  [(V32HF "v32hi") (V32BF "v32hi") (V16SF "v16si") (V8DF "v8di")
> +   (V16HF "v16hi") (V16BF "v16hi") (V8SF "v8si") (V4DF "v4di")
> +   (V8HF "v8hi") (V8BF "v8hi") (V4SF "v4si") (V2DF "v2di")
>     (V8SI "v8si") (V4DI "v4di")
>     (V4SI "v4si") (V2DI "v2di")
>     (V16HI "v16hi") (V8HI "v8hi")
> @@ -1014,7 +1029,8 @@
>     (V16SF "V32SF") (V8DF "V16DF")
>     (V8SF "V16SF") (V4DF "V8DF")
>     (V4SF "V8SF") (V2DF "V4DF")
> -   (V32HF "V64HF") (V16HF "V32HF") (V8HF "V16HF")])
> +   (V32HF "V64HF") (V16HF "V32HF") (V8HF "V16HF")
> +   (V32BF "V64BF") (V16BF "V32BF") (V8BF "V16BF")])
>
>  ;; Mapping of vector modes to a vector mode of half size
>  ;; instead of V1DI/V1DF, DI/DF are used for V2DI/V2DF although they are scalar.
> @@ -1025,7 +1041,8 @@
>     (V16SF "V8SF") (V8DF "V4DF")
>     (V8SF  "V4SF") (V4DF "V2DF")
>     (V4SF  "V2SF") (V2DF "DF")
> -   (V32HF "V16HF") (V16HF "V8HF") (V8HF "V4HF")])
> +   (V32HF "V16HF") (V16HF "V8HF") (V8HF "V4HF")
> +   (V32BF "V16BF") (V16BF "V8BF") (V8BF "V4BF")])
>
>  (define_mode_attr ssehalfvecmodelower
>    [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti")
> @@ -1034,7 +1051,8 @@
>     (V16SF "v8sf") (V8DF "v4df")
>     (V8SF  "v4sf") (V4DF "v2df")
>     (V4SF  "v2sf")
> -   (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")])
> +   (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")
> +   (V32BF "v16bf") (V16BF "v8bf") (V8BF "v4bf")])
>
>  ;; Mapping of vector modes to vector hf modes of conversion.
>  (define_mode_attr ssePHmode
> @@ -1085,6 +1103,7 @@
>     (V16SI "SI") (V8SI "SI")  (V4SI "SI")
>     (V8DI "DI")  (V4DI "DI")  (V2DI "DI")
>     (V32HF "HF") (V16HF "HF") (V8HF "HF")
> +   (V32BF "BF") (V16BF "BF") (V8BF "BF")
>     (V16SF "SF") (V8SF "SF")  (V4SF "SF")
>     (V8DF "DF")  (V4DF "DF")  (V2DF "DF")
>     (V4TI "TI")  (V2TI "TI")])
> @@ -1096,6 +1115,7 @@
>     (V16SI "si") (V8SI "si")  (V4SI "si")
>     (V8DI "di")  (V4DI "di")  (V2DI "di")
>     (V32HF "hf") (V16HF "hf")  (V8HF "hf")
> +   (V32BF "bf") (V16BF "bf")  (V8BF "bf")
>     (V16SF "sf") (V8SF "sf")  (V4SF "sf")
>     (V8DF "df")  (V4DF "df")  (V2DF "df")
>     (V4TI "ti")  (V2TI "ti")])
> @@ -1107,6 +1127,7 @@
>     (V16SI "V4SI")  (V8SI "V4SI")  (V4SI "V4SI")
>     (V8DI "V2DI")   (V4DI "V2DI")  (V2DI "V2DI")
>     (V32HF "V8HF")  (V16HF "V8HF") (V8HF "V8HF")
> +   (V32BF "V8BF")  (V16BF "V8BF") (V8BF "V8BF")
>     (V16SF "V4SF")  (V8SF "V4SF")  (V4SF "V4SF")
>     (V8DF "V2DF")   (V4DF "V2DF")  (V2DF "V2DF")])
>
> @@ -1128,6 +1149,7 @@
>     (V16SF "d") (V8SF "d") (V4SF "d")
>     (V32HI "d") (V16HI "d") (V8HI "d")
>     (V32HF "d") (V16HF "d") (V8HF "d")
> +   (V32BF "d") (V16BF "d") (V8BF "d")
>     (V64QI "d") (V32QI "d") (V16QI "d")])
>
>  ;; Number of scalar elements in each vector type
> @@ -1153,6 +1175,7 @@
>     (V32HI "16") (V16HI "16") (V8HI "16")
>     (V16SI "32") (V8SI "32") (V4SI "32")
>     (V32HF "16") (V16HF "16") (V8HF "16")
> +   (V32BF "16") (V16BF "16") (V8BF "16")
>     (V16SF "32") (V8SF "32") (V4SF "32")
>     (V8DF "64") (V4DF "64") (V2DF "64")])
>
> @@ -1164,9 +1187,9 @@
>     (V4SI  "p") (V4SF  "")
>     (V8SI  "p") (V8SF  "")
>     (V16SI "p") (V16SF "")
> -   (V16QI "p") (V8HI "p") (V8HF "p")
> -   (V32QI "p") (V16HI "p") (V16HF "p")
> -   (V64QI "p") (V32HI "p") (V32HF "p")])
> +   (V16QI "p") (V8HI "p") (V8HF "p") (V8BF "p")
> +   (V32QI "p") (V16HI "p") (V16HF "p") (V16BF "p")
> +   (V64QI "p") (V32HI "p") (V32HF "p") (V32BF "p")])
>
>  ;; SSE prefix for integer and HF vector comparison.
>  (define_mode_attr ssecmpintprefix
> @@ -1219,7 +1242,8 @@
>  ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise.
>  ;; i64x4 or f64x4 for 512bit modes.
>  (define_mode_attr i128
> -  [(V16HF "%~128") (V32HF "i64x4") (V16SF "f64x4") (V8SF "f128")
> +  [(V16HF "%~128") (V32HF "i64x4") (V16BF "%~128") (V32BF "i64x4")
> +   (V16SF "f64x4") (V8SF "f128")
>     (V8DF "f64x4") (V4DF "f128")
>     (V64QI "i64x4") (V32QI "%~128") (V32HI "i64x4") (V16HI "%~128")
>     (V16SI "i64x4") (V8SI "%~128") (V8DI "i64x4") (V4DI "%~128")])
> @@ -1245,17 +1269,18 @@
>     (V16SI "d")  (V8SI "d")  (V4SI "d")
>     (V8DI "q")   (V4DI "q")  (V2DI "q")
>     (V32HF "w")  (V16HF "w") (V8HF "w")
> +   (V32BF "w")  (V16BF "w") (V8BF "w")
>     (V16SF "ss") (V8SF "ss") (V4SF "ss")
>     (V8DF "sd")  (V4DF "sd") (V2DF "sd")])
>
>  ;; Tie mode of assembler operand to mode iterator
>  (define_mode_attr xtg_mode
>    [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x")
> -   (V8HF "x") (V4SF "x") (V2DF "x")
> +   (V8HF "x")  (V8BF "x") (V4SF "x") (V2DF "x")
>     (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t")
> -   (V16HF "t") (V8SF "t") (V4DF "t")
> +   (V16HF "t") (V16BF "t") (V8SF "t") (V4DF "t")
>     (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g")
> -   (V32HF "g") (V16SF "g") (V8DF "g")])
> +   (V32HF "g") (V32BF "g") (V16SF "g") (V8DF "g")])
>
>  ;; Half mask mode for unpacks
>  (define_mode_attr HALFMASKMODE
> @@ -1553,10 +1578,10 @@
>     (set_attr "mode" "<sseinsnmode>")])
>
>  (define_insn "<avx512>_blendm<mode>"
> -  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v,v")
> -       (vec_merge:VF_AVX512FP16
> -         (match_operand:VF_AVX512FP16 2 "nonimmediate_operand" "vm,vm")
> -         (match_operand:VF_AVX512FP16 1 "nonimm_or_0_operand" "0C,v")
> +  [(set (match_operand:VF_AVX512BWHFBF16 0 "register_operand" "=v,v")
> +       (vec_merge:VF_AVX512BWHFBF16
> +         (match_operand:VF_AVX512BWHFBF16 2 "nonimmediate_operand" "vm,vm")
> +         (match_operand:VF_AVX512BWHFBF16 1 "nonimm_or_0_operand" "0C,v")
>           (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk,Yk")))]
>    "TARGET_AVX512BW"
>    "@
> @@ -1595,9 +1620,9 @@
>     (set_attr "mode" "<sseinsnmode>")])
>
>  (define_insn "<avx512>_store<mode>_mask"
> -  [(set (match_operand:VI12HF_AVX512VL 0 "memory_operand" "=m")
> -       (vec_merge:VI12HF_AVX512VL
> -         (match_operand:VI12HF_AVX512VL 1 "register_operand" "v")
> +  [(set (match_operand:VI12HFBF_AVX512VL 0 "memory_operand" "=m")
> +       (vec_merge:VI12HFBF_AVX512VL
> +         (match_operand:VI12HFBF_AVX512VL 1 "register_operand" "v")
>           (match_dup 0)
>           (match_operand:<avx512fmaskmode> 2 "register_operand" "Yk")))]
>    "TARGET_AVX512BW"
> @@ -4513,14 +4538,18 @@
>    DONE;
>  })
>
> +(define_mode_iterator VF_AVX512HFBFVL
> +  [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
> +   V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
> +
>  (define_expand "vcond<mode><sseintvecmodelower>"
> -  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
> -       (if_then_else:VF_AVX512FP16VL
> +  [(set (match_operand:VF_AVX512HFBFVL 0 "register_operand")
> +       (if_then_else:VF_AVX512HFBFVL
>           (match_operator 3 ""
>             [(match_operand:<sseintvecmode> 4 "vector_operand")
>              (match_operand:<sseintvecmode> 5 "vector_operand")])
> -         (match_operand:VF_AVX512FP16VL 1 "general_operand")
> -         (match_operand:VF_AVX512FP16VL 2 "general_operand")))]
> +         (match_operand:VF_AVX512HFBFVL 1 "general_operand")
> +         (match_operand:VF_AVX512HFBFVL 2 "general_operand")))]
>    "TARGET_AVX512FP16"
>  {
>    bool ok = ix86_expand_int_vcond (operands);
> @@ -4552,10 +4581,10 @@
>    "TARGET_AVX512F")
>
>  (define_expand "vcond_mask_<mode><avx512fmaskmodelower>"
> -  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand")
> -       (vec_merge:VI12HF_AVX512VL
> -         (match_operand:VI12HF_AVX512VL 1 "nonimmediate_operand")
> -         (match_operand:VI12HF_AVX512VL 2 "nonimm_or_0_operand")
> +  [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand")
> +       (vec_merge:VI12HFBF_AVX512VL
> +         (match_operand:VI12HFBF_AVX512VL 1 "nonimmediate_operand")
> +         (match_operand:VI12HFBF_AVX512VL 2 "nonimm_or_0_operand")
>           (match_operand:<avx512fmaskmode> 3 "register_operand")))]
>    "TARGET_AVX512BW")
>
> @@ -10747,7 +10776,7 @@
>                    (const_string "HF")
>                    (const_string "TI")))
>     (set (attr "enabled")
> -     (cond [(and (not (match_test "<MODE>mode == V8HFmode"))
> +     (cond [(and (not (match_test "<MODE>mode == V8HFmode || <MODE>mode == V8BFmode"))
>                  (eq_attr "alternative" "2"))
>               (symbol_ref "false")
>            ]
> @@ -10809,11 +10838,13 @@
>    DONE;
>  })
>
> -(define_insn "avx512fp16_movsh"
> -  [(set (match_operand:V8HF 0 "register_operand" "=v")
> -       (vec_merge:V8HF
> -          (match_operand:V8HF 2 "register_operand" "v")
> -         (match_operand:V8HF 1 "register_operand" "v")
> +(define_mode_iterator V8BFH_128 [V8HF V8BF])
> +
> +(define_insn "avx512fp16_mov<mode>"
> +  [(set (match_operand:V8BFH_128 0 "register_operand" "=v")
> +       (vec_merge:V8BFH_128
> +         (match_operand:V8BFH_128 2 "register_operand" "v")
> +         (match_operand:V8BFH_128 1 "register_operand" "v")
>           (const_int 1)))]
>    "TARGET_AVX512FP16"
>    "vmovsh\t{%2, %1, %0|%0, %1, %2}"
> @@ -10996,9 +11027,9 @@
>    DONE;
>  })
>
> -(define_expand "vec_setv8hf"
> -  [(match_operand:V8HF 0 "register_operand")
> -   (match_operand:HF 1 "register_operand")
> +(define_expand "vec_set<mode>"
> +  [(match_operand:V8BFH_128 0 "register_operand")
> +   (match_operand:<ssescalarmode> 1 "register_operand")
>     (match_operand 2 "vec_setm_sse41_operand")]
>    "TARGET_SSE"
>  {
> @@ -11726,7 +11757,7 @@
>     (set_attr "length_immediate" "1")
>     (set_attr "mode" "<sseinsnmode>")])
>
> -(define_insn_and_split "vec_extract_lo_<mode>"
> +(define_insn_and_split "@vec_extract_lo_<mode>"
>    [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,v,m")
>         (vec_select:<ssehalfvecmode>
>           (match_operand:V32_512 1 "nonimmediate_operand" "v,m,v")
> @@ -11768,7 +11799,7 @@
>     (set_attr "prefix" "evex")
>     (set_attr "mode" "XI")])
>
> -(define_insn "vec_extract_hi_<mode>"
> +(define_insn "@vec_extract_hi_<mode>"
>    [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
>         (vec_select:<ssehalfvecmode>
>           (match_operand:V32_512 1 "register_operand" "v")
> @@ -11788,7 +11819,7 @@
>     (set_attr "prefix" "evex")
>     (set_attr "mode" "XI")])
>
> -(define_insn_and_split "vec_extract_lo_<mode>"
> +(define_insn_and_split "@vec_extract_lo_<mode>"
>    [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,m")
>         (vec_select:<ssehalfvecmode>
>           (match_operand:V16_256 1 "nonimmediate_operand" "vm,v")
> @@ -11802,7 +11833,7 @@
>    [(set (match_dup 0) (match_dup 1))]
>    "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);")
>
> -(define_insn "vec_extract_hi_<mode>"
> +(define_insn "@vec_extract_hi_<mode>"
>    [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=xm,vm,vm")
>         (vec_select:<ssehalfvecmode>
>           (match_operand:V16_256 1 "register_operand" "x,v,v")
> @@ -11944,20 +11975,20 @@
>  ;; NB: *vec_extract<mode>_0 must be placed before *vec_extracthf.
>  ;; Otherwise, it will be ignored.
>  (define_insn_and_split "*vec_extract<mode>_0"
> -  [(set (match_operand:HF 0 "nonimmediate_operand" "=v,m,r")
> -       (vec_select:HF
> -         (match_operand:VF_AVX512FP16 1 "nonimmediate_operand" "vm,v,m")
> +  [(set (match_operand:<ssescalarmode> 0 "nonimmediate_operand" "=v,m,r")
> +       (vec_select:<ssescalarmode>
> +         (match_operand:VF_AVX512HFBF16 1 "nonimmediate_operand" "vm,v,m")
>           (parallel [(const_int 0)])))]
> -  "TARGET_AVX512FP16 && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
> +  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
>    "#"
>    "&& reload_completed"
>    [(set (match_dup 0) (match_dup 1))]
> -  "operands[1] = gen_lowpart (HFmode, operands[1]);")
> +  "operands[1] = gen_lowpart (<ssescalarmode>mode, operands[1]);")
>
> -(define_insn "*vec_extracthf"
> -  [(set (match_operand:HF 0 "register_sse4nonimm_operand" "=?r,m,x,v")
> -       (vec_select:HF
> -         (match_operand:V8HF 1 "register_operand" "v,v,0,v")
> +(define_insn "*vec_extract<mode>"
> +  [(set (match_operand:HFBF 0 "register_sse4nonimm_operand" "=?r,m,x,v")
> +       (vec_select:HFBF
> +         (match_operand:<ssevecmode> 1 "register_operand" "v,v,0,v")
>           (parallel
>             [(match_operand:SI 2 "const_0_to_7_operand")])))]
>    "TARGET_SSE2"
> @@ -11992,6 +12023,7 @@
>     (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
>     (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
>     (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
> +   (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF
>     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
>     (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
> @@ -18097,17 +18129,17 @@
>
>  ;; Modes handled by pinsr patterns.
>  (define_mode_iterator PINSR_MODE
> -  [(V16QI "TARGET_SSE4_1") V8HI V8HF
> +  [(V16QI "TARGET_SSE4_1") V8HI V8HF V8BF
>     (V4SI "TARGET_SSE4_1")
>     (V2DI "TARGET_SSE4_1 && TARGET_64BIT")])
>
>  (define_mode_attr sse2p4_1
>    [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse2")
> -   (V4SI "sse4_1") (V2DI "sse4_1")])
> +   (V8BF "sse2") (V4SI "sse4_1") (V2DI "sse4_1")])
>
>  (define_mode_attr pinsr_evex_isa
>    [(V16QI "avx512bw") (V8HI "avx512bw") (V8HF "avx512bw")
> -   (V4SI "avx512dq") (V2DI "avx512dq")])
> +   (V8BF "avx512bw") (V4SI "avx512dq") (V2DI "avx512dq")])
>
>  ;; sse4_1_pinsrd must come before sse2_loadld since it is preferred.
>  (define_insn "<sse2p4_1>_pinsr<ssemodesuffix>"
> @@ -25193,11 +25225,12 @@
>     (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
>     (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f")
>     (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")
> -   (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")])
> +   (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")
> +   (V32BF "avx512bw") (V16BF "avx512bw") (V8BF "avx512bw")])
>
>  (define_insn "avx2_pbroadcast<mode>"
> -  [(set (match_operand:VIHF 0 "register_operand" "=x,v")
> -       (vec_duplicate:VIHF
> +  [(set (match_operand:VIHFBF 0 "register_operand" "=x,v")
> +       (vec_duplicate:VIHFBF
>           (vec_select:<ssescalarmode>
>             (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "xm,vm")
>             (parallel [(const_int 0)]))))]
> @@ -25210,10 +25243,10 @@
>     (set_attr "mode" "<sseinsnmode>")])
>
>  (define_insn "avx2_pbroadcast<mode>_1"
> -  [(set (match_operand:VIHF_256 0 "register_operand" "=x,x,v,v")
> -       (vec_duplicate:VIHF_256
> +  [(set (match_operand:VIHFBF_256 0 "register_operand" "=x,x,v,v")
> +       (vec_duplicate:VIHFBF_256
>           (vec_select:<ssescalarmode>
> -           (match_operand:VIHF_256 1 "nonimmediate_operand" "m,x,m,v")
> +           (match_operand:VIHFBF_256 1 "nonimmediate_operand" "m,x,m,v")
>             (parallel [(const_int 0)]))))]
>    "TARGET_AVX2"
>    "@
> @@ -25589,10 +25622,10 @@
>     (set_attr "mode" "V4DF")])
>
>  (define_insn "<avx512>_vec_dup<mode>_1"
> -  [(set (match_operand:VIHF_AVX512BW 0 "register_operand" "=v,v")
> -       (vec_duplicate:VIHF_AVX512BW
> +  [(set (match_operand:VIHFBF_AVX512BW 0 "register_operand" "=v,v")
> +       (vec_duplicate:VIHFBF_AVX512BW
>           (vec_select:<ssescalarmode>
> -           (match_operand:VIHF_AVX512BW 1 "nonimmediate_operand" "v,m")
> +           (match_operand:VIHFBF_AVX512BW 1 "nonimmediate_operand" "v,m")
>             (parallel [(const_int 0)]))))]
>    "TARGET_AVX512F"
>    "@
> @@ -25622,8 +25655,8 @@
>     (set_attr "mode" "<sseinsnmode>")])
>
>  (define_insn "<avx512>_vec_dup<mode><mask_name>"
> -  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v")
> -       (vec_duplicate:VI12HF_AVX512VL
> +  [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand" "=v")
> +       (vec_duplicate:VI12HFBF_AVX512VL
>           (vec_select:<ssescalarmode>
>             (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
>             (parallel [(const_int 0)]))))]
> @@ -25658,8 +25691,8 @@
>     (set_attr "mode" "<sseinsnmode>")])
>
>  (define_insn "<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>"
> -  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v,v")
> -       (vec_duplicate:VI12HF_AVX512VL
> +  [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand" "=v,v")
> +       (vec_duplicate:VI12HFBF_AVX512VL
>           (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm,r")))]
>    "TARGET_AVX512BW"
>    "@
> @@ -25759,7 +25792,7 @@
>    [(V8SF "ss") (V4DF "sd") (V8SI "ss") (V4DI "sd")])
>  ;; Modes handled by AVX2 vec_dup patterns.
>  (define_mode_iterator AVX2_VEC_DUP_MODE
> -  [V32QI V16QI V16HI V8HI V8SI V4SI V16HF V8HF])
> +  [V32QI V16QI V16HI V8HI V8SI V4SI V16HF V8HF V16BF V8BF])
>
>  (define_insn "*vec_dup<mode>"
>    [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand" "=x,x,v")
> @@ -26522,6 +26555,7 @@
>     (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
>     (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
>     (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
> +   (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
>     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
>     (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
> @@ -26534,6 +26568,7 @@
>     (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
>     (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
>     (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
> +   (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
>     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
>     (V4TI "TARGET_AVX512F")])
> diff --git a/gcc/testsuite/g++.target/i386/vect-bfloat16-1.C b/gcc/testsuite/g++.target/i386/vect-bfloat16-1.C
> new file mode 100644
> index 00000000000..71b4d86d36e
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/vect-bfloat16-1.C
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
> +/* { dg-final { scan-assembler-times "vpblendmw" 1 } }  */
> +
> +typedef short v8hi __attribute__((vector_size(16)));
> +typedef __bf16 v8bf __attribute__((vector_size(16)));
> +
> +v8bf
> +foo (v8hi a, v8hi b, v8bf c, v8bf d)
> +{
> +      return a > b ? c : d;
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c
> new file mode 100644
> index 00000000000..dd33f1add9c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512fp16 -O2" } */
> +
> +/* { dg-final { scan-assembler-times "vpbroadcastw" 1 { target { ! ia32 } } } }  */
> +/* { dg-final { scan-assembler-times "vpblendw" 1 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vmovsh" 1 { target { ! ia32 } } } }  */
> +
> +/* { dg-final { scan-assembler-times "vpinsrw" 2 { target ia32 } } }  */
> +#include <immintrin.h>
> +
> +typedef __bf16 __v8bf __attribute__ ((__vector_size__ (16)));
> +typedef __bf16 __m128bf16 __attribute__ ((__vector_size__ (16), __may_alias__));
> +
> +__m128bf16
> +__attribute__ ((noinline, noclone))
> +foo1 (__m128bf16 a, __bf16 f)
> +{
> +  __v8bf x = (__v8bf) a;
> +  x[2] = f;
> +  return (__m128bf16) x;
> +}
> +
> +__m128bf16
> +__attribute__ ((noinline, noclone))
> +foo2 (__m128bf16 a, __bf16 f)
> +{
> +  __v8bf x = (__v8bf) a;
> +  x[0] = f;
> +  return (__m128bf16) x;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c
> new file mode 100644
> index 00000000000..70152d03f92
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c
> @@ -0,0 +1,121 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512fp16 -O2" } */
> +
> +typedef __bf16 v8bf __attribute__ ((__vector_size__ (16)));
> +typedef __bf16 v16bf __attribute__ ((__vector_size__ (32)));
> +typedef __bf16 v32bf __attribute__ ((__vector_size__ (64)));
> +
> +#define VEC_EXTRACT(V,S,IDX)                   \
> +  S                                            \
> +  __attribute__((noipa))                       \
> +  vec_extract_##V##_##IDX (V v)                        \
> +  {                                            \
> +    return v[IDX];                             \
> +  }
> +
> +#define VEC_SET(V,S,IDX)                       \
> +  V                                            \
> +  __attribute__((noipa))                       \
> +  vec_set_##V##_##IDX (V v, S s)               \
> +  {                                            \
> +    v[IDX] = s;                                \
> +    return v;                                  \
> +  }
> +
> +v8bf
> +vec_init_v8bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
> +              __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8)
> +{
> +    return __extension__ (v8bf) {a1, a2, a3, a4, a5, a6, a7, a8};
> +}
> +
> +v16bf
> +vec_init_v16bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
> +              __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8,
> +              __bf16 a9,  __bf16 a10, __bf16 a11, __bf16 a12,
> +              __bf16 a13,  __bf16 a14, __bf16 a15, __bf16 a16)
> +{
> +    return __extension__ (v16bf) {a1, a2, a3, a4, a5, a6, a7, a8,
> +                                 a9, a10, a11, a12, a13, a14, a15, a16};
> +}
> +
> +v32bf
> +vec_init_v32bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
> +               __bf16 a5, __bf16 a6, __bf16 a7, __bf16 a8,
> +               __bf16 a9, __bf16 a10, __bf16 a11, __bf16 a12,
> +               __bf16 a13, __bf16 a14, __bf16 a15, __bf16 a16,
> +               __bf16 a17, __bf16 a18, __bf16 a19, __bf16 a20,
> +               __bf16 a21, __bf16 a22, __bf16 a23, __bf16 a24,
> +               __bf16 a25, __bf16 a26, __bf16 a27, __bf16 a28,
> +               __bf16 a29, __bf16 a30, __bf16 a31, __bf16 a32)
> +{
> +    return __extension__ (v32bf) {a1, a2, a3, a4, a5, a6, a7, a8,
> +                                 a9, a10, a11, a12, a13, a14, a15, a16,
> +                                 a17, a18, a19, a20, a21, a22, a23, a24,
> +                                 a25, a26, a27, a28, a29, a30, a31, a32};
> +}
> +
> +v8bf
> +vec_init_dup_v8bf (__bf16 a1)
> +{
> +    return __extension__ (v8bf) {a1, a1, a1, a1, a1, a1, a1, a1};
> +}
> +
> +v16bf
> +vec_init_dup_v16bf (__bf16 a1)
> +{
> +    return __extension__ (v16bf) {a1, a1, a1, a1, a1, a1, a1, a1,
> +                                 a1, a1, a1, a1, a1, a1, a1, a1};
> +}
> +
> +v32bf
> +vec_init_dup_v32bf (__bf16 a1)
> +{
> +    return __extension__ (v32bf) {a1, a1, a1, a1, a1, a1, a1, a1,
> +                                 a1, a1, a1, a1, a1, a1, a1, a1,
> +                                 a1, a1, a1, a1, a1, a1, a1, a1,
> +                                 a1, a1, a1, a1, a1, a1, a1, a1};
> +}
> +
> +/* { dg-final { scan-assembler-times "vpunpcklwd" 28 } } */
> +/* { dg-final { scan-assembler-times "vpunpckldq" 14 } } */
> +/* { dg-final { scan-assembler-times "vpunpcklqdq" 7 } } */
> +
> +VEC_EXTRACT (v8bf, __bf16, 0);
> +VEC_EXTRACT (v8bf, __bf16, 4);
> +VEC_EXTRACT (v16bf, __bf16, 0);
> +VEC_EXTRACT (v16bf, __bf16, 3);
> +VEC_EXTRACT (v16bf, __bf16, 8);
> +VEC_EXTRACT (v16bf, __bf16, 15);
> +VEC_EXTRACT (v32bf, __bf16, 0);
> +VEC_EXTRACT (v32bf, __bf16, 5);
> +VEC_EXTRACT (v32bf, __bf16, 8);
> +VEC_EXTRACT (v32bf, __bf16, 14);
> +VEC_EXTRACT (v32bf, __bf16, 16);
> +VEC_EXTRACT (v32bf, __bf16, 24);
> +VEC_EXTRACT (v32bf, __bf16, 28);
> +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$8" 2 } } */
> +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$6" 1 } } */
> +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$14" 1 } } */
> +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$10" 1 } } */
> +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$12" 1 } } */
> +/* { dg-final { scan-assembler-times "vextract" 9 } } */
> +
> +VEC_SET (v8bf, __bf16, 4);
> +VEC_SET (v16bf, __bf16, 3);
> +VEC_SET (v16bf, __bf16, 8);
> +VEC_SET (v16bf, __bf16, 15);
> +VEC_SET (v32bf, __bf16, 5);
> +VEC_SET (v32bf, __bf16, 8);
> +VEC_SET (v32bf, __bf16, 14);
> +VEC_SET (v32bf, __bf16, 16);
> +VEC_SET (v32bf, __bf16, 24);
> +VEC_SET (v32bf, __bf16, 28);
> +/* { dg-final { scan-assembler-times "vpbroadcastw" 13 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpblendw" 4 { target { ! ia32 } } } } */
> +
> +/* { dg-final { scan-assembler-times "vpbroadcastw" 12 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vpblendw" 3 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vpinsrw" 1 { target ia32 } } } */
> +
> +/* { dg-final { scan-assembler-times "vpblendd" 3 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
> new file mode 100644
> index 00000000000..5b846e68c99
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +
> +#include "vect-bfloat16-2a.c"
> +
> +/* { dg-final { scan-assembler-times "vpunpcklwd" 28 } } */
> +/* { dg-final { scan-assembler-times "vpunpckldq" 14 } } */
> +/* { dg-final { scan-assembler-times "vpunpcklqdq" 7 } } */
> +
> +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$8" 1 } } */
> +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$6" 1 } } */
> +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$14" 1 } } */
> +/* { dg-final { scan-assembler-times "vextract" 2 } } */
> +
> +/* { dg-final { scan-assembler-times "vpbroadcastw" 7 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpblendw" 4 { target { ! ia32 } } } } */
> +
> +/* { dg-final { scan-assembler-times "vpbroadcastw" 6 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vpblendw" 3 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vpinsrw" 63 { target ia32 } } } */
> +
> +/* { dg-final { scan-assembler-times "vpblendd" 3 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c
> new file mode 100644
> index 00000000000..3804bac7220
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c
> @@ -0,0 +1,258 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512fp16 -O2" } */
> +
> +#include <immintrin.h>
> +
> +typedef __bf16 __v8bf __attribute__ ((__vector_size__ (16)));
> +typedef __bf16 __m128bf16 __attribute__ ((__vector_size__ (16), __may_alias__));
> +
> +__bf16 glob_bfloat;
> +__m128bf16 glob_bfloat_vec;
> +
> +__m256 is_a_float_vec;
> +__m128 is_a_float_pair;
> +
> +__m128h *float_ptr;
> +__m128h is_a_float16_vec;
> +
> +__v8si is_an_int_vec;
> +__v4si is_an_int_pair;
> +__v8hi is_a_short_vec;
> +
> +int is_an_int;
> +short is_a_short_int;
> +float is_a_float;
> +float is_a_float16;
> +double is_a_double;
> +
> +__m128bf16 footest (__m128bf16 vector0)
> +{
> +  /* Initialisation  */
> +
> +  __m128bf16 vector1_1;
> +  __m128bf16 vector1_2 = glob_bfloat_vec;
> +  __m128bf16 vector1_3 = is_a_float_vec; /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__m256'} }*/
> +  __m128bf16 vector1_4 = is_an_int_vec;  /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__v8si'} } */
> +  __m128bf16 vector1_5 = is_a_float16_vec; /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__m128h'} } */
> +  __m128bf16 vector1_6 = is_a_float_pair; /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__m128'} } */
> +  __m128bf16 vector1_7 = is_an_int_pair; /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__v4si'} } */
> +  __m128bf16 vector1_8 = is_a_short_vec; /* { dg-error {incompatible types when initializing type '__m128bf16' {aka '__vector\(8\) __bf16'} using type '__v8hi'} } */
> +
> +  __v8si initi_1_1 = glob_bfloat_vec;   /* { dg-error {incompatible types when initializing type '__v8si' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  __m256 initi_1_2 = glob_bfloat_vec; /* { dg-error {incompatible types when initializing type '__m256' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  __m128h initi_1_3 = glob_bfloat_vec; /* { dg-error {incompatible types when initializing type '__m128h' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  __m128 initi_1_4 = glob_bfloat_vec; /* { dg-error {incompatible types when initializing type '__m128' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  __v4si initi_1_5 = glob_bfloat_vec;  /* { dg-error {incompatible types when initializing type '__v4si' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  __v4hi initi_1_6 = glob_bfloat_vec;  /* { dg-error {incompatible types when initializing type '__v4hi' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +
> +  __m128bf16 vector2_1 = {};
> +  __m128bf16 vector2_2 = { glob_bfloat };
> +  __m128bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
> +  __m128bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m128bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m128bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m128bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m128bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m128bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +
> +  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m128h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m128 initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __v4si initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __v4hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +
> +  /* Assignments to/from vectors.  */
> +
> +  glob_bfloat_vec = glob_bfloat_vec;
> +  glob_bfloat_vec = 0;   /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type 'int'} } */
> +  glob_bfloat_vec = 0.1; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type 'double'} } */
> +  glob_bfloat_vec = is_a_float_vec; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__m256'} } */
> +  glob_bfloat_vec = is_an_int_vec; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__v8si'} } */
> +  glob_bfloat_vec = is_a_float16_vec; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__m128h'} } */
> +  glob_bfloat_vec = is_a_float_pair; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__m128'} } */
> +  glob_bfloat_vec = is_an_int_pair; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__v4si'} } */
> +  glob_bfloat_vec = is_a_short_vec; /* { dg-error {incompatible types when assigning to type '__m128bf16' {aka '__vector\(8\) __bf16'} from type '__v8hi'} } */
> +
> +  is_an_int_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__v8si' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  is_a_float_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m256' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  is_a_float16_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m128h' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  is_a_float_pair = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m128' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  is_an_int_pair = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__v4si' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  is_a_short_vec = glob_bfloat_vec;/* { dg-error {incompatible types when assigning to type '__v8hi' from type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +
> +  /* Assignments to/from elements.  */
> +
> +  vector2_3[0] = glob_bfloat;
> +  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> +
> +  glob_bfloat = vector2_3[0];
> +  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +
> +  /* Compound literals.  */
> +
> +  (__m128bf16) {};
> +
> +  (__m128bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (__m128bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (__m128bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
> +  (__m128bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
> +  (__m128bf16) { is_a_float_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128'} } */
> +  (__m128bf16) { is_an_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v4si'} } */
> +  (__m128bf16) { is_a_float16_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128h'} } */
> +  (__m128bf16) { is_a_short_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8hi'} } */
> +
> +  (__m128bf16) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  (__v8si) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'int' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  (__m256) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'float' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  (__v4si) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'int' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  (__m256h) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '_Float16' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +  (__v8hi) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'short int' using type '__m128bf16' {aka '__vector\(8\) __bf16'}} } */
> +
> +  /* Casting.  */
> +
> +  (void) glob_bfloat_vec;
> +  (__m128bf16) glob_bfloat_vec;
> +
> +  (__bf16) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
> +  (short) glob_bfloat_vec; /* { dg-error {cannot convert a vector of type '__m128bf16' {aka '__vector\(8\) __bf16'} to type 'short int' which has different size} } */
> +  (int) glob_bfloat_vec; /* { dg-error {cannot convert a vector of type '__m128bf16' {aka '__vector\(8\) __bf16'} to type 'int' which has different size} } */
> +  (_Float16) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
> +  (float) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
> +  (double) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
> +
> +  (__v8si) glob_bfloat_vec; /* { dg-error {cannot convert a value of type '__m128bf16' {aka '__vector\(8\) __bf16'} to vector type '__vector\(8\) int' which has different size} } */
> +  (__m256) glob_bfloat_vec; /* { dg-error {cannot convert a value of type '__m128bf16' {aka '__vector\(8\) __bf16'} to vector type '__vector\(8\) float' which has different size} } */
> +  (__m128h) glob_bfloat_vec;
> +  (__v4si) glob_bfloat_vec;
> +  (__m128) glob_bfloat_vec;
> +  (__v8hi) glob_bfloat_vec;
> +
> +  (__m128bf16) is_an_int_vec; /* { dg-error {cannot convert a value of type '__v8si' to vector type '__vector\(8\) __bf16' which has different size} } */
> +  (__m128bf16) is_a_float_vec; /* { dg-error {cannot convert a value of type '__m256' to vector type '__vector\(8\) __bf16' which has different size} } */
> +  (__m128bf16) is_a_float16_vec;
> +  (__m128bf16) is_an_int_pair;
> +  (__m128bf16) is_a_float_pair;
> +  (__m128bf16) is_a_short_vec;
> +  (__m128bf16) is_a_double; /* { dg-error {cannot convert value to a vector} } */
> +
> +  /* Arrays and Structs.  */
> +
> +  typedef __m128bf16 array_type[2];
> +  extern __m128bf16 extern_array[];
> +
> +  __m128bf16 array[2];
> +  __m128bf16 zero_length_array[0];
> +  __m128bf16 empty_init_array[] = {};
> +  typedef __m128bf16 some_other_type[is_an_int];
> +
> +  struct struct1 {
> +    __m128bf16 a;
> +  };
> +
> +  union union1 {
> +    __m128bf16 a;
> +  };
> +
> +  /* Addressing and dereferencing.  */
> +
> +  __m128bf16 *bfloat_ptr = &vector0;
> +  vector0 = *bfloat_ptr;
> +
> +  /* Pointer assignment.  */
> +
> +  __m128bf16 *bfloat_ptr2 = bfloat_ptr;
> +  __m128bf16 *bfloat_ptr3 = array;
> +
> +  /* Pointer arithmetic.  */
> +
> +  ++bfloat_ptr;
> +  --bfloat_ptr;
> +  bfloat_ptr++;
> +  bfloat_ptr--;
> +  bfloat_ptr += 1;
> +  bfloat_ptr -= 1;
> +  bfloat_ptr - bfloat_ptr2;
> +  bfloat_ptr = &bfloat_ptr3[0];
> +  bfloat_ptr = &bfloat_ptr3[1];
> +
> +  /* Simple comparison.  */
> +  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +
> +  /* Pointer comparison.  */
> +
> +  bfloat_ptr == &vector0;
> +  bfloat_ptr != &vector0;
> +  bfloat_ptr < &vector0;
> +  bfloat_ptr <= &vector0;
> +  bfloat_ptr > &vector0;
> +  bfloat_ptr >= &vector0;
> +  bfloat_ptr == bfloat_ptr2;
> +  bfloat_ptr != bfloat_ptr2;
> +  bfloat_ptr < bfloat_ptr2;
> +  bfloat_ptr <= bfloat_ptr2;
> +  bfloat_ptr > bfloat_ptr2;
> +  bfloat_ptr >= bfloat_ptr2;
> +
> +  /* Conditional expressions.  */
> +
> +  0 ? vector0 : vector0;
> +  0 ? vector0 : is_a_float_vec; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? is_a_float_vec : vector0; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? vector0 : is_a_float16_vec; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? is_a_float16_vec : vector0; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? vector0 : 0; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? 0 : vector0; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? 0.1 : vector0; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? vector0 : 0.1; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? bfloat_ptr : bfloat_ptr2;
> +  0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
> +  0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
> +
> +  vector0 ? vector0 : vector0; /* { dg-error {used vector type where scalar is required} } */
> +  vector0 ? is_a_float16_vec : vector0; /* { dg-error {used vector type where scalar is required} } */
> +  vector0 ? vector0 : is_a_float16_vec; /* { dg-error {used vector type where scalar is required} } */
> +  vector0 ? is_a_float16_vec : is_a_float16_vec; /* { dg-error {used vector type where scalar is required} } */
> +
> +  /* Unary operators.  */
> +
> +  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
> +  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +
> +  /* Binary arithmetic operations.  */
> +
> +  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +
> +  return vector0;
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c b/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c
> new file mode 100644
> index 00000000000..f63b41d832b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c
> @@ -0,0 +1,248 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512fp16 -O2" } */
> +
> +#include <immintrin.h>
> +
> +typedef __bf16 __v16bf __attribute__ ((__vector_size__ (32)));
> +typedef __bf16 __m256bf16 __attribute__ ((__vector_size__ (32), __may_alias__));
> +
> +__bf16 glob_bfloat;
> +__m256bf16 glob_bfloat_vec;
> +
> +__m256 is_a_float_vec;
> +
> +__m256h *float_ptr;
> +__m256h is_a_float16_vec;
> +
> +__v8si is_an_int_vec;
> +__m256i is_a_long_int_pair;
> +__v16hi is_a_short_vec;
> +
> +int is_an_int;
> +short is_a_short_int;
> +float is_a_float;
> +float is_a_float16;
> +double is_a_double;
> +
> +__m256bf16 footest (__m256bf16 vector0)
> +{
> +  /* Initialisation  */
> +
> +  __m256bf16 vector1_1;
> +  __m256bf16 vector1_2 = glob_bfloat_vec;
> +  __m256bf16 vector1_3 = is_a_float_vec; /* { dg-error {incompatible types when initializing type '__m256bf16' {aka '__vector\(16\) __bf16'} using type '__m256'} } */
> +  __m256bf16 vector1_4 = is_an_int_vec;  /* { dg-error {incompatible types when initializing type '__m256bf16' {aka '__vector\(16\) __bf16'} using type '__v8si'} } */
> +  __m256bf16 vector1_5 = is_a_float16_vec; /* { dg-error {incompatible types when initializing type '__m256bf16' {aka '__vector\(16\) __bf16'} using type '__m256h'} } */
> +  __m256bf16 vector1_7 = is_a_long_int_pair; /* { dg-error {incompatible types when initializing type '__m256bf16' {aka '__vector\(16\) __bf16'} using type '__m256i'} } */
> +  __m256bf16 vector1_8 = is_a_short_vec; /* { dg-error {incompatible types when initializing type '__m256bf16' {aka '__vector\(16\) __bf16'} using type '__v16hi'} } */
> +
> +  __v8si initi_1_1 = glob_bfloat_vec;   /* { dg-error {incompatible types when initializing type '__v8si' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  __m256 initi_1_2 = glob_bfloat_vec; /* { dg-error {incompatible types when initializing type '__m256' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  __m256h initi_1_3 = glob_bfloat_vec; /* { dg-error {incompatible types when initializing type '__m256h' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  __m256i initi_1_5 = glob_bfloat_vec;  /* { dg-error {incompatible types when initializing type '__m256i' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  __v16hi initi_1_6 = glob_bfloat_vec;  /* { dg-error {incompatible types when initializing type '__v16hi' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +
> +  __m256bf16 vector2_1 = {};
> +  __m256bf16 vector2_2 = { glob_bfloat };
> +  __m256bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
> +  __m256bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m256bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m256bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m256bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m256bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m256bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error "invalid conversion to type '__bf16'" } */
> +
> +  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m256h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m256i initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __v16hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +
> +  /* Assignments to/from vectors.  */
> +
> +  glob_bfloat_vec = glob_bfloat_vec;
> +  glob_bfloat_vec = 0;   /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type 'int'} } */
> +  glob_bfloat_vec = 0.1; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type 'double'} } */
> +  glob_bfloat_vec = is_a_float_vec; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type '__m256'} } */
> +  glob_bfloat_vec = is_an_int_vec; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type '__v8si'} } */
> +  glob_bfloat_vec = is_a_float16_vec; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type '__m256h'} } */
> +  glob_bfloat_vec = is_a_long_int_pair; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type '__m256i'} } */
> +  glob_bfloat_vec = is_a_short_vec; /* { dg-error {incompatible types when assigning to type '__m256bf16' {aka '__vector\(16\) __bf16'} from type '__v16hi'} } */
> +
> +  is_an_int_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__v8si' from type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  is_a_float_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m256' from type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  is_a_float16_vec = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m256h' from type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  is_a_long_int_pair = glob_bfloat_vec; /* { dg-error {incompatible types when assigning to type '__m256i' from type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  is_a_short_vec = glob_bfloat_vec;/* { dg-error {incompatible types when assigning to type '__v16hi' from type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +
> +  /* Assignments to/from elements.  */
> +
> +  vector2_3[0] = glob_bfloat;
> +  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> +
> +  glob_bfloat = vector2_3[0];
> +  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +
> +  /* Compound literals.  */
> +
> +  (__m256bf16) {};
> +
> +  (__m256bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (__m256bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (__m256bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
> +  (__m256bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
> +  (__m256bf16) { is_a_long_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256i'} } */
> +  (__m256bf16) { is_a_float16_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256h'} } */
> +  (__m256bf16) { is_a_short_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v16hi'} } */
> +
> +  (__m256bf16) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  (__v8si) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'int' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  (__m256) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'float' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  (__m256i) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'long long int' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  (__m256h) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '_Float16' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +  (__v16hi) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'short int' using type '__m256bf16' {aka '__vector\(16\) __bf16'}} } */
> +
> +  /* Casting.  */
> +
> +  (void) glob_bfloat_vec;
> +  (__m256bf16) glob_bfloat_vec;
> +
> +  (__bf16) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
> +  (short) glob_bfloat_vec; /* { dg-error {cannot convert a vector of type '__m256bf16' {aka '__vector\(16\) __bf16'} to type 'short int' which has different size} } */
> +  (int) glob_bfloat_vec; /* { dg-error {cannot convert a vector of type '__m256bf16' {aka '__vector\(16\) __bf16'} to type 'int' which has different size} } */
> +  (_Float16) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
> +  (float) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
> +  (double) glob_bfloat_vec; /* { dg-error {aggregate value used where a floating-point was expected} } */
> +
> +  (__v8si) glob_bfloat_vec;
> +  (__m256) glob_bfloat_vec;
> +  (__m256h) glob_bfloat_vec;
> +  (__m256i) glob_bfloat_vec;
> +  (__v16hi) glob_bfloat_vec;
> +
> +  (__m256bf16) is_an_int_vec;
> +  (__m256bf16) is_a_float_vec;
> +  (__m256bf16) is_a_float16_vec;
> +  (__m256bf16) is_a_long_int_pair;
> +  (__m256bf16) is_a_short_vec;
> +
> +  /* Arrays and Structs.  */
> +
> +  typedef __m256bf16 array_type[2];
> +  extern __m256bf16 extern_array[];
> +
> +  __m256bf16 array[2];
> +  __m256bf16 zero_length_array[0];
> +  __m256bf16 empty_init_array[] = {};
> +  typedef __m256bf16 some_other_type[is_an_int];
> +
> +  struct struct1 {
> +    __m256bf16 a;
> +  };
> +
> +  union union1 {
> +    __m256bf16 a;
> +  };
> +
> +  /* Addressing and dereferencing.  */
> +
> +  __m256bf16 *bfloat_ptr = &vector0;
> +  vector0 = *bfloat_ptr;
> +
> +  /* Pointer assignment.  */
> +
> +  __m256bf16 *bfloat_ptr2 = bfloat_ptr;
> +  __m256bf16 *bfloat_ptr3 = array;
> +
> +  /* Pointer arithmetic.  */
> +
> +  ++bfloat_ptr;
> +  --bfloat_ptr;
> +  bfloat_ptr++;
> +  bfloat_ptr--;
> +  bfloat_ptr += 1;
> +  bfloat_ptr -= 1;
> +  bfloat_ptr - bfloat_ptr2;
> +  bfloat_ptr = &bfloat_ptr3[0];
> +  bfloat_ptr = &bfloat_ptr3[1];
> +
> +  /* Simple comparison.  */
> +  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +
> +  /* Pointer comparison.  */
> +
> +  bfloat_ptr == &vector0;
> +  bfloat_ptr != &vector0;
> +  bfloat_ptr < &vector0;
> +  bfloat_ptr <= &vector0;
> +  bfloat_ptr > &vector0;
> +  bfloat_ptr >= &vector0;
> +  bfloat_ptr == bfloat_ptr2;
> +  bfloat_ptr != bfloat_ptr2;
> +  bfloat_ptr < bfloat_ptr2;
> +  bfloat_ptr <= bfloat_ptr2;
> +  bfloat_ptr > bfloat_ptr2;
> +  bfloat_ptr >= bfloat_ptr2;
> +
> +  /* Conditional expressions.  */
> +
> +  0 ? vector0 : vector0;
> +  0 ? vector0 : is_a_float_vec; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? is_a_float_vec : vector0; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? vector0 : is_a_float16_vec; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? is_a_float16_vec : vector0; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? vector0 : 0; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? 0 : vector0; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? 0.1 : vector0; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? vector0 : 0.1; /* { dg-error {type mismatch in conditional expression} } */
> +  0 ? bfloat_ptr : bfloat_ptr2;
> +  0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
> +  0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
> +
> +  vector0 ? vector0 : vector0; /* { dg-error {used vector type where scalar is required} } */
> +  vector0 ? is_a_float16_vec : vector0; /* { dg-error {used vector type where scalar is required} } */
> +  vector0 ? vector0 : is_a_float16_vec; /* { dg-error {used vector type where scalar is required} } */
> +  vector0 ? is_a_float16_vec : is_a_float16_vec; /* { dg-error {used vector type where scalar is required} } */
> +
> +  /* Unary operators.  */
> +
> +  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
> +  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +
> +  /* Binary arithmetic operations.  */
> +
> +  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +
> +  return vector0;
> +}
> +
> --
> 2.18.2
>


-- 
BR,
Hongtao


* [PATCH] Add ABI test for __bf16 type
  2022-08-16  7:49 [PATCH] x86: Support vector __bf16 type Kong, Lingling
  2022-08-17  5:56 ` Hongtao Liu
@ 2022-08-18  7:34 ` Haochen Jiang
  2022-08-19  0:58   ` Hongtao Liu
  1 sibling, 1 reply; 9+ messages in thread
From: Haochen Jiang @ 2022-08-18  7:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

Hi all,

This patch adds ABI tests for bf16, now that support for the __bf16 type is complete.

Regtested on x86_64-pc-linux-gnu. Ok for trunk?
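
For readers unfamiliar with what these tests exercise, here is a minimal sketch of the storage layout they rely on. It is not part of the patch. It assumes bf16 is the upper 16 bits of an IEEE binary32 value and uses a plain 16-bit integer as a stand-in for __bf16, so it builds on compilers without __bf16 support. The names bf16_bits, float_to_bf16_bits, bf16_bits_to_float and xmm_sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __bf16: raw 16-bit storage, no arithmetic.  */
typedef uint16_t bf16_bits;

/* Keep the upper 16 bits of a binary32 value (truncation, no rounding).  */
static bf16_bits
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return (bf16_bits) (u >> 16);
}

/* Widen bf16 bits back to binary32 by zero-filling the low 16 bits.  */
static float
bf16_bits_to_float (bf16_bits b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Loosely mirrors the shape of XMM_T in args.h: eight 16-bit lanes
   overlay four floats in one 128-bit register image.  */
typedef union {
  bf16_bits bf16[8];
  float f[4];
} xmm_sketch;
```

The real tests compare register snapshots taken in asm-support.S against expected values; this sketch only illustrates the size and lane layout that make such comparisons meaningful.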

BRs,
Haochen

gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/bf16/abi-bf16.exp: New test.
	* gcc.target/x86_64/abi/bf16/args.h: Ditto.
	* gcc.target/x86_64/abi/bf16/asm-support.S: Ditto.
	* gcc.target/x86_64/abi/bf16/bf16-check.h: Ditto.
	* gcc.target/x86_64/abi/bf16/bf16-helper.h: Ditto.
	* gcc.target/x86_64/abi/bf16/defines.h: Ditto.
	* gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp: Ditto.
	* gcc.target/x86_64/abi/bf16/m256bf16/args.h: Ditto.
	* gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S: Ditto.
	* gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h: Ditto.
	* gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c: Ditto.
	* gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c: Ditto.
	* gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c: Ditto.
	* gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c: Ditto.
	* gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c: Ditto.
	* gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp: Ditto.
	* gcc.target/x86_64/abi/bf16/m512bf16/args.h: Ditto.
	* gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S: Ditto.
	* gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h: Ditto.
	* gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c: Ditto.
	* gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c: Ditto.
	* gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c: Ditto.
	* gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c: Ditto.
	* gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c: Ditto.
	* gcc.target/x86_64/abi/bf16/macros.h: Ditto.
	* gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_basic_alignment.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_basic_returning.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_basic_sizes.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_m128_returning.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_passing_floats.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_passing_m128.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_passing_structs.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_passing_unions.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_struct_returning.c: Ditto.
	* gcc.target/x86_64/abi/bf16/test_varargs-m128.c: Ditto.
---
 .../gcc.target/x86_64/abi/bf16/abi-bf16.exp   |  46 +++
 .../gcc.target/x86_64/abi/bf16/args.h         | 164 +++++++++
 .../gcc.target/x86_64/abi/bf16/asm-support.S  |  84 +++++
 .../gcc.target/x86_64/abi/bf16/bf16-check.h   |  24 ++
 .../gcc.target/x86_64/abi/bf16/bf16-helper.h  |  41 +++
 .../gcc.target/x86_64/abi/bf16/defines.h      | 163 +++++++++
 .../x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp |  46 +++
 .../x86_64/abi/bf16/m256bf16/args.h           | 152 +++++++++
 .../x86_64/abi/bf16/m256bf16/asm-support.S    |  84 +++++
 .../x86_64/abi/bf16/m256bf16/bf16-ymm-check.h |  24 ++
 .../abi/bf16/m256bf16/test_m256_returning.c   |  38 +++
 .../abi/bf16/m256bf16/test_passing_m256.c     | 235 +++++++++++++
 .../abi/bf16/m256bf16/test_passing_structs.c  |  69 ++++
 .../abi/bf16/m256bf16/test_passing_unions.c   | 179 ++++++++++
 .../abi/bf16/m256bf16/test_varargs-m256.c     | 107 ++++++
 .../x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp |  46 +++
 .../x86_64/abi/bf16/m512bf16/args.h           | 155 +++++++++
 .../x86_64/abi/bf16/m512bf16/asm-support.S    | 100 ++++++
 .../x86_64/abi/bf16/m512bf16/bf16-zmm-check.h |  23 ++
 .../abi/bf16/m512bf16/test_m512_returning.c   |  44 +++
 .../abi/bf16/m512bf16/test_passing_m512.c     | 243 ++++++++++++++
 .../abi/bf16/m512bf16/test_passing_structs.c  |  77 +++++
 .../abi/bf16/m512bf16/test_passing_unions.c   | 222 +++++++++++++
 .../abi/bf16/m512bf16/test_varargs-m512.c     | 111 +++++++
 .../gcc.target/x86_64/abi/bf16/macros.h       |  53 +++
 .../bf16/test_3_element_struct_and_unions.c   | 214 ++++++++++++
 .../x86_64/abi/bf16/test_basic_alignment.c    |  14 +
 .../bf16/test_basic_array_size_and_align.c    |  13 +
 .../x86_64/abi/bf16/test_basic_returning.c    |  20 ++
 .../x86_64/abi/bf16/test_basic_sizes.c        |  14 +
 .../bf16/test_basic_struct_size_and_align.c   |  14 +
 .../bf16/test_basic_union_size_and_align.c    |  12 +
 .../x86_64/abi/bf16/test_m128_returning.c     |  38 +++
 .../x86_64/abi/bf16/test_passing_floats.c     | 312 ++++++++++++++++++
 .../x86_64/abi/bf16/test_passing_m128.c       | 238 +++++++++++++
 .../x86_64/abi/bf16/test_passing_structs.c    |  67 ++++
 .../x86_64/abi/bf16/test_passing_unions.c     | 160 +++++++++
 .../x86_64/abi/bf16/test_struct_returning.c   | 176 ++++++++++
 .../x86_64/abi/bf16/test_varargs-m128.c       | 111 +++++++
 39 files changed, 3933 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/abi-bf16.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/defines.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/macros.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_alignment.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_sizes.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_m128_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_floats.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_m128.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_struct_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_varargs-m128.c

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/abi-bf16.exp b/gcc/testsuite/gcc.target/x86_64/abi/bf16/abi-bf16.exp
new file mode 100644
index 00000000000..bd386f2a560
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/abi-bf16.exp
@@ -0,0 +1,46 @@
+# Copyright (C) 2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib clearcap.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || ![is-effective-target lp64]
+     || ![is-effective-target sse2] } then {
+  return
+}
+
+
+torture-init
+clearcap-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -msse2"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+    if {[runtest_file_p $runtests $src]} {
+        c-torture-execute [list $src \
+                                $srcdir/$subdir/asm-support.S] \
+                                $additional_flags
+    }
+}
+
+clearcap-finish
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/args.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/args.h
new file mode 100644
index 00000000000..11d7e2b3a1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/args.h
@@ -0,0 +1,164 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include <string.h>
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 xmm0
+#define F1 xmm1
+#define F2 xmm2
+#define F3 xmm3
+#define F4 xmm4
+#define F5 xmm5
+#define F6 xmm6
+#define F7 xmm7
+
+typedef union {
+  __bf16 ___bf16[8];
+  float _float[4];
+  double _double[2];
+  long long _longlong[2];
+  int _int[4];
+  ulonglong _ulonglong[2];
+#ifdef CHECK_M64_M128
+  __m64 _m64[2];
+  __m128 _m128[1];
+  __m128bf16 _m128bf16[1];
+#endif
+} XMM_T;
+
+typedef union {
+  __bf16 ___bf16;
+  float _float;
+  double _double;
+  ldouble _ldouble;
+  ulonglong _ulonglong[2];
+} X87_T;
+extern void (*callthis)(void);
+extern unsigned long long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
+XMM_T xmm_regs[16];
+X87_T x87_regs[8];
+extern volatile unsigned long long volatile_var;
+extern void snapshot (void);
+extern void snapshot_ret (void);
+#define WRAP_CALL(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
+#define WRAP_RET(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
+
+/* Clear all integer registers.  */
+#define clear_int_hardware_registers \
+  asm __volatile__ ("xor %%rax, %%rax\n\t" \
+		    "xor %%rbx, %%rbx\n\t" \
+		    "xor %%rcx, %%rcx\n\t" \
+		    "xor %%rdx, %%rdx\n\t" \
+		    "xor %%rsi, %%rsi\n\t" \
+		    "xor %%rdi, %%rdi\n\t" \
+		    "xor %%r8, %%r8\n\t" \
+		    "xor %%r9, %%r9\n\t" \
+		    "xor %%r10, %%r10\n\t" \
+		    "xor %%r11, %%r11\n\t" \
+		    "xor %%r12, %%r12\n\t" \
+		    "xor %%r13, %%r13\n\t" \
+		    "xor %%r14, %%r14\n\t" \
+		    "xor %%r15, %%r15\n\t" \
+		    ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
+		    "r9", "r10", "r11", "r12", "r13", "r14", "r15");
+
+/* This is the list of registers available for passing arguments. Not all of
+   these are used or even really available.  */
+struct IntegerRegisters
+{
+  unsigned long long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
+};
+struct FloatRegisters
+{
+  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
+  ldouble st0, st1, st2, st3, st4, st5, st6, st7;
+  XMM_T xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8, xmm9,
+        xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
+};
+
+/* Implemented in scalarargs.c  */
+extern struct IntegerRegisters iregs;
+extern struct FloatRegisters fregs;
+extern unsigned int num_iregs, num_fregs;
+
+/* Clear register struct.  */
+#define clear_struct_registers \
+  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
+    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
+  memset (&iregs, 0, sizeof (iregs)); \
+  memset (&fregs, 0, sizeof (fregs)); \
+  memset (xmm_regs, 0, sizeof (xmm_regs)); \
+  memset (x87_regs, 0, sizeof (x87_regs));
+
+/* Clear both hardware and register structs for integers.  */
+#define clear_int_registers \
+  clear_struct_registers \
+  clear_int_hardware_registers
+
+/* Do the checking.  */
+#define check_f_arguments(T) do { \
+  assert (num_fregs <= 0 || check_bf16 (fregs.xmm0._ ## T [0], xmm_regs[0]._ ## T [0]) == 1); \
+  assert (num_fregs <= 1 || check_bf16 (fregs.xmm1._ ## T [0], xmm_regs[1]._ ## T [0]) == 1); \
+  assert (num_fregs <= 2 || check_bf16 (fregs.xmm2._ ## T [0], xmm_regs[2]._ ## T [0]) == 1); \
+  assert (num_fregs <= 3 || check_bf16 (fregs.xmm3._ ## T [0], xmm_regs[3]._ ## T [0]) == 1); \
+  assert (num_fregs <= 4 || check_bf16 (fregs.xmm4._ ## T [0], xmm_regs[4]._ ## T [0]) == 1); \
+  assert (num_fregs <= 5 || check_bf16 (fregs.xmm5._ ## T [0], xmm_regs[5]._ ## T [0]) == 1); \
+  assert (num_fregs <= 6 || check_bf16 (fregs.xmm6._ ## T [0], xmm_regs[6]._ ## T [0]) == 1); \
+  assert (num_fregs <= 7 || check_bf16 (fregs.xmm7._ ## T [0], xmm_regs[7]._ ## T [0]) == 1); \
+  } while (0)
+
+#define check_bf16_arguments check_f_arguments(__bf16)
+
+#define check_vector_arguments(T,O) do { \
+  assert (num_fregs <= 0 \
+	  || memcmp (((char *) &fregs.xmm0) + (O), \
+		     &xmm_regs[0], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 1 \
+	  || memcmp (((char *) &fregs.xmm1) + (O), \
+		     &xmm_regs[1], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 2 \
+	  || memcmp (((char *) &fregs.xmm2) + (O), \
+		     &xmm_regs[2], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 3 \
+	  || memcmp (((char *) &fregs.xmm3) + (O), \
+		     &xmm_regs[3], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 4 \
+	  || memcmp (((char *) &fregs.xmm4) + (O), \
+		     &xmm_regs[4], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 5 \
+	  || memcmp (((char *) &fregs.xmm5) + (O), \
+		     &xmm_regs[5], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 6 \
+	  || memcmp (((char *) &fregs.xmm6) + (O), \
+		     &xmm_regs[6], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 7 \
+	  || memcmp (((char *) &fregs.xmm7) + (O), \
+		     &xmm_regs[7], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  } while (0)
+
+#define check_m128_arguments check_vector_arguments(m128, 0)
+
+#define clear_float_registers \
+  clear_struct_registers
+
+#define clear_x87_registers \
+  clear_struct_registers
+
+#endif /* INCLUDED_ARGS_H  */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/bf16/asm-support.S
new file mode 100644
index 00000000000..a8165d86317
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/asm-support.S
@@ -0,0 +1,84 @@
+	.text
+	.p2align 4,,15
+.globl snapshot
+	.type	snapshot, @function
+snapshot:
+.LFB3:
+	movq	%rax, rax(%rip)
+	movq	%rbx, rbx(%rip)
+	movq	%rcx, rcx(%rip)
+	movq	%rdx, rdx(%rip)
+	movq	%rdi, rdi(%rip)
+	movq	%rsi, rsi(%rip)
+	movq	%rbp, rbp(%rip)
+	movq	%rsp, rsp(%rip)
+	movq	%r8, r8(%rip)
+	movq	%r9, r9(%rip)
+	movq	%r10, r10(%rip)
+	movq	%r11, r11(%rip)
+	movq	%r12, r12(%rip)
+	movq	%r13, r13(%rip)
+	movq	%r14, r14(%rip)
+	movq	%r15, r15(%rip)
+	vmovdqu	%xmm0, xmm_regs+0(%rip)
+	vmovdqu	%xmm1, xmm_regs+16(%rip)
+	vmovdqu	%xmm2, xmm_regs+32(%rip)
+	vmovdqu	%xmm3, xmm_regs+48(%rip)
+	vmovdqu	%xmm4, xmm_regs+64(%rip)
+	vmovdqu	%xmm5, xmm_regs+80(%rip)
+	vmovdqu	%xmm6, xmm_regs+96(%rip)
+	vmovdqu	%xmm7, xmm_regs+112(%rip)
+	vmovdqu	%xmm8, xmm_regs+128(%rip)
+	vmovdqu	%xmm9, xmm_regs+144(%rip)
+	vmovdqu	%xmm10, xmm_regs+160(%rip)
+	vmovdqu	%xmm11, xmm_regs+176(%rip)
+	vmovdqu	%xmm12, xmm_regs+192(%rip)
+	vmovdqu	%xmm13, xmm_regs+208(%rip)
+	vmovdqu	%xmm14, xmm_regs+224(%rip)
+	vmovdqu	%xmm15, xmm_regs+240(%rip)
+	jmp	*callthis(%rip)
+.LFE3:
+	.size	snapshot, .-snapshot
+
+	.p2align 4,,15
+.globl snapshot_ret
+	.type	snapshot_ret, @function
+snapshot_ret:
+	movq	%rdi, rdi(%rip)
+	subq	$8, %rsp
+	call	*callthis(%rip)
+	addq	$8, %rsp
+	movq	%rax, rax(%rip)
+	movq	%rdx, rdx(%rip)
+	vmovdqu	%xmm0, xmm_regs+0(%rip)
+	vmovdqu	%xmm1, xmm_regs+16(%rip)
+	fstpt	x87_regs(%rip)
+	fstpt	x87_regs+16(%rip)
+	fldt	x87_regs+16(%rip)
+	fldt	x87_regs(%rip)
+	ret
+	.size	snapshot_ret, .-snapshot_ret
+
+	.comm	callthis,8,8
+	.comm	rax,8,8
+	.comm	rbx,8,8
+	.comm	rcx,8,8
+	.comm	rdx,8,8
+	.comm	rsi,8,8
+	.comm	rdi,8,8
+	.comm	rsp,8,8
+	.comm	rbp,8,8
+	.comm	r8,8,8
+	.comm	r9,8,8
+	.comm	r10,8,8
+	.comm	r11,8,8
+	.comm	r12,8,8
+	.comm	r13,8,8
+	.comm	r14,8,8
+	.comm	r15,8,8
+	.comm	xmm_regs,256,32
+	.comm	x87_regs,128,32
+	.comm   volatile_var,8,8
+#ifdef __linux__
+	.section	.note.GNU-stack,"",@progbits
+#endif
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-check.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-check.h
new file mode 100644
index 00000000000..25448fc6863
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-check.h
@@ -0,0 +1,24 @@
+#include <stdlib.h>
+#include "bf16-helper.h"
+
+static void do_test (void);
+
+int
+main ()
+{
+
+  if (__builtin_cpu_supports ("sse2"))
+    {
+      do_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+      return 0;
+    }
+
+#ifdef DEBUG
+  printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h
new file mode 100644
index 00000000000..83d89fcf62c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h
@@ -0,0 +1,41 @@
+typedef union
+{
+  float f;
+  unsigned int u;
+  __bf16 b[2];
+} unionf_b;
+
+static __bf16 make_f32_bf16 (float f)
+{
+  unionf_b tmp;
+  tmp.f = f;
+  return tmp.b[1];
+}
+
+static float make_bf16_f32 (__bf16 bf)
+{
+  unionf_b tmp;
+  tmp.u = 0;
+  tmp.b[1] = bf;
+  return tmp.f;
+}
+
+static int check_bf16 (__bf16 bf1, __bf16 bf2)
+{
+  unionf_b tmp1, tmp2;
+  tmp1.u = 0;
+  tmp2.u = 0;
+  tmp1.b[1] = bf1;
+  tmp2.b[1] = bf2;
+  return (tmp1.u == tmp2.u);
+}
+
+static int check_bf16_float (__bf16 bf, float f)
+{
+  unionf_b tmp1, tmp2;
+  tmp1.u = 0;
+  tmp1.b[0] = bf;
+  tmp2.f = f;
+  tmp2.u >>= 16;
+  return (tmp1.u == tmp2.u);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/defines.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/defines.h
new file mode 100644
index 00000000000..a4df0b0528d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/defines.h
@@ -0,0 +1,163 @@
+#ifndef DEFINED_DEFINES_H
+#define DEFINED_DEFINES_H
+
+/* Get __m64 and __m128. */
+#include <immintrin.h>
+
+typedef unsigned long long ulonglong;
+typedef long double ldouble;
+
+/* These defines determine what part of the test should be run.  When
+   GCC implements these parts, the defines should be uncommented to
+   enable testing.  */
+
+/* Scalar type __int128.  */
+/* #define CHECK_INT128 */
+
+/* Scalar type long double.  */
+#define CHECK_LONG_DOUBLE
+
+/* Scalar type __float128.  */
+/* #define CHECK_FLOAT128 */
+
+/* Scalar types __m64 and __m128.  */
+#define CHECK_M64_M128
+
+/* Structs with size >= 16.  */
+#define CHECK_LARGER_STRUCTS
+
+/* Checks for passing floats and doubles.  */
+#define CHECK_FLOAT_DOUBLE_PASSING
+
+/* Union passing with not-extremely-simple unions.  */
+#define CHECK_LARGER_UNION_PASSING
+
+/* Variable args.  */
+#define CHECK_VARARGS
+
+/* Check argument passing and returning for scalar types with sizeof = 16.  */
+/* TODO: Implement these tests. Don't activate them for now.  */
+#define CHECK_LARGE_SCALAR_PASSING
+
+/* Defines for sizing and alignment.  */
+
+#define TYPE_SIZE_CHAR         1
+#define TYPE_SIZE_SHORT        2
+#define TYPE_SIZE_INT          4
+#ifdef __ILP32__
+# define TYPE_SIZE_LONG        4
+#else
+# define TYPE_SIZE_LONG        8
+#endif
+#define TYPE_SIZE_LONG_LONG    8
+#define TYPE_SIZE_INT128       16
+#define TYPE_SIZE_BF16	       2
+#define TYPE_SIZE_FLOAT        4
+#define TYPE_SIZE_DOUBLE       8
+#define TYPE_SIZE_LONG_DOUBLE  16
+#define TYPE_SIZE_FLOAT128     16
+#define TYPE_SIZE_M64          8
+#define TYPE_SIZE_M128         16
+#define TYPE_SIZE_ENUM         4
+#ifdef __ILP32__
+# define TYPE_SIZE_POINTER     4
+#else
+# define TYPE_SIZE_POINTER     8
+#endif
+
+#define TYPE_ALIGN_CHAR        1
+#define TYPE_ALIGN_SHORT       2
+#define TYPE_ALIGN_INT         4
+#ifdef __ILP32__
+# define TYPE_ALIGN_LONG       4
+#else
+# define TYPE_ALIGN_LONG       8
+#endif
+#define TYPE_ALIGN_LONG_LONG   8
+#define TYPE_ALIGN_INT128      16
+#define TYPE_ALIGN_BF16	       2
+#define TYPE_ALIGN_FLOAT       4
+#define TYPE_ALIGN_DOUBLE      8
+#define TYPE_ALIGN_LONG_DOUBLE 16
+#define TYPE_ALIGN_FLOAT128    16
+#define TYPE_ALIGN_M64         8
+#define TYPE_ALIGN_M128        16
+#define TYPE_ALIGN_ENUM        4
+#ifdef __ILP32__
+# define TYPE_ALIGN_POINTER    4
+#else
+# define TYPE_ALIGN_POINTER    8
+#endif
+
+/* These defines control the building of the list of types to check. There
+   is a string identifying the type (with a comma after), a size of the type
+   (also with a comma and an integer for adding to the total amount of types)
+   and an alignment of the type (which is currently not really needed since
+   the abi specifies that alignof == sizeof for all scalar types).  */
+#ifdef CHECK_INT128
+#define CI128_STR "__int128",
+#define CI128_SIZ TYPE_SIZE_INT128,
+#define CI128_ALI TYPE_ALIGN_INT128,
+#define CI128_RET "???",
+#else
+#define CI128_STR
+#define CI128_SIZ
+#define CI128_ALI
+#define CI128_RET
+#endif
+#ifdef CHECK_LONG_DOUBLE
+#define CLD_STR "long double",
+#define CLD_SIZ TYPE_SIZE_LONG_DOUBLE,
+#define CLD_ALI TYPE_ALIGN_LONG_DOUBLE,
+#define CLD_RET "x87_regs[0]._ldouble",
+#else
+#define CLD_STR
+#define CLD_SIZ
+#define CLD_ALI
+#define CLD_RET
+#endif
+#ifdef CHECK_FLOAT128
+#define CF128_STR "__float128",
+#define CF128_SIZ TYPE_SIZE_FLOAT128,
+#define CF128_ALI TYPE_ALIGN_FLOAT128,
+#define CF128_RET "???",
+#else
+#define CF128_STR
+#define CF128_SIZ
+#define CF128_ALI
+#define CF128_RET
+#endif
+#ifdef CHECK_M64_M128
+#define CMM_STR "__m64", "__m128",
+#define CMM_SIZ TYPE_SIZE_M64, TYPE_SIZE_M128,
+#define CMM_ALI TYPE_ALIGN_M64, TYPE_ALIGN_M128,
+#define CMM_RET "???", "???",
+#else
+#define CMM_STR
+#define CMM_SIZ
+#define CMM_ALI
+#define CMM_RET
+#endif
+
+/* Used in size and alignment tests.  */
+enum dummytype { enumtype };
+
+extern void abort (void);
+
+/* Assertion macro.  */
+#define assert(test) if (!(test)) abort()
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+#ifdef __GNUC__
+#define PACKED __attribute__((__packed__))
+#else
+#warning Some tests will fail due to missing __packed__ support
+#define PACKED
+#endif
+
+#endif /* DEFINED_DEFINES_H */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp
new file mode 100644
index 00000000000..309db8ff12e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp
@@ -0,0 +1,46 @@
+# Copyright (C) 2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib clearcap.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || ![is-effective-target lp64]
+     || ![is-effective-target avx2] } then {
+  return
+}
+
+
+torture-init
+clearcap-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -mavx2"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+    if {[runtest_file_p $runtests $src]} {
+        c-torture-execute [list $src \
+                                $srcdir/$subdir/asm-support.S] \
+                                $additional_flags
+    }
+}
+
+clearcap-finish
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/args.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/args.h
new file mode 100644
index 00000000000..94627ffbd44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/args.h
@@ -0,0 +1,152 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include <immintrin.h>
+#include <string.h>
+
+/* Assertion macro.  */
+#define assert(test) if (!(test)) abort()
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 ymm0
+#define F1 ymm1
+#define F2 ymm2
+#define F3 ymm3
+#define F4 ymm4
+#define F5 ymm5
+#define F6 ymm6
+#define F7 ymm7
+
+typedef union {
+  __bf16 ___bf16[16];
+  float _float[8];
+  double _double[4];
+  long long _longlong[4];
+  int _int[8];
+  unsigned long long _ulonglong[4];
+  __m64 _m64[4];
+  __m128 _m128[2];
+  __m256 _m256[1];
+  __m256bf16 _m256bf16[1];
+} YMM_T;
+
+typedef union {
+  float _float;
+  double _double;
+  long double _ldouble;
+  unsigned long long _ulonglong[2];
+} X87_T;
+extern void (*callthis)(void);
+extern unsigned long long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
+YMM_T ymm_regs[16];
+X87_T x87_regs[8];
+extern volatile unsigned long long volatile_var;
+extern void snapshot (void);
+extern void snapshot_ret (void);
+#define WRAP_CALL(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
+#define WRAP_RET(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
+
+/* Clear all integer registers.  */
+#define clear_int_hardware_registers \
+  asm __volatile__ ("xor %%rax, %%rax\n\t" \
+		    "xor %%rbx, %%rbx\n\t" \
+		    "xor %%rcx, %%rcx\n\t" \
+		    "xor %%rdx, %%rdx\n\t" \
+		    "xor %%rsi, %%rsi\n\t" \
+		    "xor %%rdi, %%rdi\n\t" \
+		    "xor %%r8, %%r8\n\t" \
+		    "xor %%r9, %%r9\n\t" \
+		    "xor %%r10, %%r10\n\t" \
+		    "xor %%r11, %%r11\n\t" \
+		    "xor %%r12, %%r12\n\t" \
+		    "xor %%r13, %%r13\n\t" \
+		    "xor %%r14, %%r14\n\t" \
+		    "xor %%r15, %%r15\n\t" \
+		    ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
+		    "r9", "r10", "r11", "r12", "r13", "r14", "r15");
+
+/* This is the list of registers available for passing arguments. Not all of
+   these are used or even really available.  */
+struct IntegerRegisters
+{
+  unsigned long long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
+};
+struct FloatRegisters
+{
+  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
+  long double st0, st1, st2, st3, st4, st5, st6, st7;
+  YMM_T ymm0, ymm1, ymm2, ymm3, ymm4, ymm5, ymm6, ymm7, ymm8, ymm9,
+        ymm10, ymm11, ymm12, ymm13, ymm14, ymm15;
+};
+
+/* Implemented in scalarargs.c  */
+extern struct IntegerRegisters iregs;
+extern struct FloatRegisters fregs;
+extern unsigned int num_iregs, num_fregs;
+
+/* Clear register struct.  */
+#define clear_struct_registers \
+  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
+    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
+  memset (&iregs, 0, sizeof (iregs)); \
+  memset (&fregs, 0, sizeof (fregs)); \
+  memset (ymm_regs, 0, sizeof (ymm_regs)); \
+  memset (x87_regs, 0, sizeof (x87_regs));
+
+/* Clear both hardware and register structs for integers.  */
+#define clear_int_registers \
+  clear_struct_registers \
+  clear_int_hardware_registers
+
+#define check_vector_arguments(T,O) do { \
+  assert (num_fregs <= 0 \
+	  || memcmp (((char *) &fregs.ymm0) + (O), \
+		     &ymm_regs[0], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 1 \
+	  || memcmp (((char *) &fregs.ymm1) + (O), \
+		     &ymm_regs[1], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 2 \
+	  || memcmp (((char *) &fregs.ymm2) + (O), \
+		     &ymm_regs[2], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 3 \
+	  || memcmp (((char *) &fregs.ymm3) + (O), \
+		     &ymm_regs[3], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 4 \
+	  || memcmp (((char *) &fregs.ymm4) + (O), \
+		     &ymm_regs[4], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 5 \
+	  || memcmp (((char *) &fregs.ymm5) + (O), \
+		     &ymm_regs[5], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 6 \
+	  || memcmp (((char *) &fregs.ymm6) + (O), \
+		     &ymm_regs[6], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 7 \
+	  || memcmp (((char *) &fregs.ymm7) + (O), \
+		     &ymm_regs[7], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  } while (0)
+
+#define check_m256_arguments check_vector_arguments(m256, 0)
+
+#endif /* INCLUDED_ARGS_H  */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S
new file mode 100644
index 00000000000..24c8b3c9023
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S
@@ -0,0 +1,84 @@
+	.text
+	.p2align 4,,15
+.globl snapshot
+	.type	snapshot, @function
+snapshot:
+.LFB3:
+	movq	%rax, rax(%rip)
+	movq	%rbx, rbx(%rip)
+	movq	%rcx, rcx(%rip)
+	movq	%rdx, rdx(%rip)
+	movq	%rdi, rdi(%rip)
+	movq	%rsi, rsi(%rip)
+	movq	%rbp, rbp(%rip)
+	movq	%rsp, rsp(%rip)
+	movq	%r8, r8(%rip)
+	movq	%r9, r9(%rip)
+	movq	%r10, r10(%rip)
+	movq	%r11, r11(%rip)
+	movq	%r12, r12(%rip)
+	movq	%r13, r13(%rip)
+	movq	%r14, r14(%rip)
+	movq	%r15, r15(%rip)
+	vmovdqu	%ymm0, ymm_regs+0(%rip)
+	vmovdqu	%ymm1, ymm_regs+32(%rip)
+	vmovdqu	%ymm2, ymm_regs+64(%rip)
+	vmovdqu	%ymm3, ymm_regs+96(%rip)
+	vmovdqu	%ymm4, ymm_regs+128(%rip)
+	vmovdqu	%ymm5, ymm_regs+160(%rip)
+	vmovdqu	%ymm6, ymm_regs+192(%rip)
+	vmovdqu	%ymm7, ymm_regs+224(%rip)
+	vmovdqu	%ymm8, ymm_regs+256(%rip)
+	vmovdqu	%ymm9, ymm_regs+288(%rip)
+	vmovdqu	%ymm10, ymm_regs+320(%rip)
+	vmovdqu	%ymm11, ymm_regs+352(%rip)
+	vmovdqu	%ymm12, ymm_regs+384(%rip)
+	vmovdqu	%ymm13, ymm_regs+416(%rip)
+	vmovdqu	%ymm14, ymm_regs+448(%rip)
+	vmovdqu	%ymm15, ymm_regs+480(%rip)
+	jmp	*callthis(%rip)
+.LFE3:
+	.size	snapshot, .-snapshot
+
+	.p2align 4,,15
+.globl snapshot_ret
+	.type	snapshot_ret, @function
+snapshot_ret:
+	movq	%rdi, rdi(%rip)
+	subq	$8, %rsp
+	call	*callthis(%rip)
+	addq	$8, %rsp
+	movq	%rax, rax(%rip)
+	movq	%rdx, rdx(%rip)
+	vmovdqu	%ymm0, ymm_regs+0(%rip)
+	vmovdqu	%ymm1, ymm_regs+32(%rip)
+	fstpt	x87_regs(%rip)
+	fstpt	x87_regs+16(%rip)
+	fldt	x87_regs+16(%rip)
+	fldt	x87_regs(%rip)
+	ret
+	.size	snapshot_ret, .-snapshot_ret
+
+	.comm	callthis,8,8
+	.comm	rax,8,8
+	.comm	rbx,8,8
+	.comm	rcx,8,8
+	.comm	rdx,8,8
+	.comm	rsi,8,8
+	.comm	rdi,8,8
+	.comm	rsp,8,8
+	.comm	rbp,8,8
+	.comm	r8,8,8
+	.comm	r9,8,8
+	.comm	r10,8,8
+	.comm	r11,8,8
+	.comm	r12,8,8
+	.comm	r13,8,8
+	.comm	r14,8,8
+	.comm	r15,8,8
+	.comm	ymm_regs,512,32
+	.comm	x87_regs,128,32
+	.comm   volatile_var,8,8
+#ifdef __linux__
+	.section	.note.GNU-stack,"",@progbits
+#endif
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h
new file mode 100644
index 00000000000..479ebc3ec3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h
@@ -0,0 +1,24 @@
+#include <stdlib.h>
+#include "../bf16-helper.h"
+
+static void do_test (void);
+
+int
+main ()
+{
+
+  if (__builtin_cpu_supports ("avx2"))
+    {
+      do_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+      return 0;
+    }
+
+#ifdef DEBUG
+  printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c
new file mode 100644
index 00000000000..ea7512850ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c
@@ -0,0 +1,38 @@
+#include <stdio.h>
+#include "bf16-ymm-check.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+		bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
+
+__m256bf16
+fun_test_returning___m256bf16 (void)
+{
+  volatile_var++;
+  return (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+			bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16};
+}
+
+__m256bf16 test_256bf16;
+
+static void
+do_test (void)
+{
+  unsigned failed = 0;
+  YMM_T ymmt1, ymmt2;
+
+  clear_struct_registers;
+  test_256bf16 = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+				bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16};
+  ymmt1._m256bf16[0] = test_256bf16;
+  ymmt2._m256bf16[0] = WRAP_RET (fun_test_returning___m256bf16) ();
+  if (memcmp (&ymmt1, &ymmt2, sizeof (ymmt2)) != 0)
+    printf ("fail m256bf16\n"), failed++;
+
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c
new file mode 100644
index 00000000000..3fb2d7d20f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c
@@ -0,0 +1,235 @@
+#include <stdio.h>
+#include "bf16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+    i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void fun_check_passing_m256bf16_8_values (__m256bf16 i0 ATTRIBUTE_UNUSED,
+					  __m256bf16 i1 ATTRIBUTE_UNUSED,
+					  __m256bf16 i2 ATTRIBUTE_UNUSED,
+					  __m256bf16 i3 ATTRIBUTE_UNUSED,
+					  __m256bf16 i4 ATTRIBUTE_UNUSED,
+					  __m256bf16 i5 ATTRIBUTE_UNUSED,
+					  __m256bf16 i6 ATTRIBUTE_UNUSED,
+					  __m256bf16 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256bf16);
+  compare (values.i1, i1, __m256bf16);
+  compare (values.i2, i2, __m256bf16);
+  compare (values.i3, i3, __m256bf16);
+  compare (values.i4, i4, __m256bf16);
+  compare (values.i5, i5, __m256bf16);
+  compare (values.i6, i6, __m256bf16);
+  compare (values.i7, i7, __m256bf16);
+}
+
+void
+fun_check_passing_m256bf16_8_regs (__m256bf16 i0 ATTRIBUTE_UNUSED,
+				   __m256bf16 i1 ATTRIBUTE_UNUSED,
+				   __m256bf16 i2 ATTRIBUTE_UNUSED,
+				   __m256bf16 i3 ATTRIBUTE_UNUSED,
+				   __m256bf16 i4 ATTRIBUTE_UNUSED,
+				   __m256bf16 i5 ATTRIBUTE_UNUSED,
+				   __m256bf16 i6 ATTRIBUTE_UNUSED,
+				   __m256bf16 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+void
+fun_check_passing_m256bf16_20_values (__m256bf16 i0 ATTRIBUTE_UNUSED,
+				      __m256bf16 i1 ATTRIBUTE_UNUSED,
+				      __m256bf16 i2 ATTRIBUTE_UNUSED,
+				      __m256bf16 i3 ATTRIBUTE_UNUSED,
+				      __m256bf16 i4 ATTRIBUTE_UNUSED,
+				      __m256bf16 i5 ATTRIBUTE_UNUSED,
+				      __m256bf16 i6 ATTRIBUTE_UNUSED,
+				      __m256bf16 i7 ATTRIBUTE_UNUSED,
+				      __m256bf16 i8 ATTRIBUTE_UNUSED,
+				      __m256bf16 i9 ATTRIBUTE_UNUSED,
+				      __m256bf16 i10 ATTRIBUTE_UNUSED,
+				      __m256bf16 i11 ATTRIBUTE_UNUSED,
+				      __m256bf16 i12 ATTRIBUTE_UNUSED,
+				      __m256bf16 i13 ATTRIBUTE_UNUSED,
+				      __m256bf16 i14 ATTRIBUTE_UNUSED,
+				      __m256bf16 i15 ATTRIBUTE_UNUSED,
+				      __m256bf16 i16 ATTRIBUTE_UNUSED,
+				      __m256bf16 i17 ATTRIBUTE_UNUSED,
+				      __m256bf16 i18 ATTRIBUTE_UNUSED,
+				      __m256bf16 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256bf16);
+  compare (values.i1, i1, __m256bf16);
+  compare (values.i2, i2, __m256bf16);
+  compare (values.i3, i3, __m256bf16);
+  compare (values.i4, i4, __m256bf16);
+  compare (values.i5, i5, __m256bf16);
+  compare (values.i6, i6, __m256bf16);
+  compare (values.i7, i7, __m256bf16);
+  compare (values.i8, i8, __m256bf16);
+  compare (values.i9, i9, __m256bf16);
+  compare (values.i10, i10, __m256bf16);
+  compare (values.i11, i11, __m256bf16);
+  compare (values.i12, i12, __m256bf16);
+  compare (values.i13, i13, __m256bf16);
+  compare (values.i14, i14, __m256bf16);
+  compare (values.i15, i15, __m256bf16);
+  compare (values.i16, i16, __m256bf16);
+  compare (values.i17, i17, __m256bf16);
+  compare (values.i18, i18, __m256bf16);
+  compare (values.i19, i19, __m256bf16);
+}
+
+void
+fun_check_passing_m256bf16_20_regs (__m256bf16 i0 ATTRIBUTE_UNUSED,
+				    __m256bf16 i1 ATTRIBUTE_UNUSED,
+				    __m256bf16 i2 ATTRIBUTE_UNUSED,
+				    __m256bf16 i3 ATTRIBUTE_UNUSED,
+				    __m256bf16 i4 ATTRIBUTE_UNUSED,
+				    __m256bf16 i5 ATTRIBUTE_UNUSED,
+				    __m256bf16 i6 ATTRIBUTE_UNUSED,
+				    __m256bf16 i7 ATTRIBUTE_UNUSED,
+				    __m256bf16 i8 ATTRIBUTE_UNUSED,
+				    __m256bf16 i9 ATTRIBUTE_UNUSED,
+				    __m256bf16 i10 ATTRIBUTE_UNUSED,
+				    __m256bf16 i11 ATTRIBUTE_UNUSED,
+				    __m256bf16 i12 ATTRIBUTE_UNUSED,
+				    __m256bf16 i13 ATTRIBUTE_UNUSED,
+				    __m256bf16 i14 ATTRIBUTE_UNUSED,
+				    __m256bf16 i15 ATTRIBUTE_UNUSED,
+				    __m256bf16 i16 ATTRIBUTE_UNUSED,
+				    __m256bf16 i17 ATTRIBUTE_UNUSED,
+				    __m256bf16 i18 ATTRIBUTE_UNUSED,
+				    __m256bf16 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+#define def_check_passing8(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
+
+#define def_check_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, \
+			    _i8, _i9, _i10, _i11, _i12, _i13, _i14, \
+			    _i15, _i16, _i17, _i18, _i19, _func1, \
+			    _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  values.i10.TYPE[0] = _i10; \
+  values.i11.TYPE[0] = _i11; \
+  values.i12.TYPE[0] = _i12; \
+  values.i13.TYPE[0] = _i13; \
+  values.i14.TYPE[0] = _i14; \
+  values.i15.TYPE[0] = _i15; \
+  values.i16.TYPE[0] = _i16; \
+  values.i17.TYPE[0] = _i17; \
+  values.i18.TYPE[0] = _i18; \
+  values.i19.TYPE[0] = _i19; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, \
+		     _i16, _i17, _i18, _i19); \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, \
+		     _i16, _i17, _i18, _i19);
+
+volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+		bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
+
+void
+test_m256bf16_on_stack ()
+{
+  __m256bf16 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+			  bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16};
+  pass = "m256bf16-8";
+  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+		      fun_check_passing_m256bf16_8_values,
+		      fun_check_passing_m256bf16_8_regs, _m256bf16);
+}
+
+void
+test_too_many_m256bf16 ()
+{
+  __m256bf16 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+			  bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16};
+  pass = "m256bf16-20";
+  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
+		       x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
+		       x[17], x[18], x[19], fun_check_passing_m256bf16_20_values,
+		       fun_check_passing_m256bf16_20_regs, _m256bf16);
+}
+
+static void
+do_test (void)
+{
+  test_m256bf16_on_stack ();
+  test_too_many_m256bf16 ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c
new file mode 100644
index 00000000000..e06350ed493
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c
@@ -0,0 +1,69 @@
+#include "bf16-ymm-check.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+struct m256bf16_struct
+{
+  __m256bf16 x;
+};
+
+struct m256bf16_2_struct
+{
+  __m256bf16 x1, x2;
+};
+
+/* Check that the struct is passed as the individual members in fregs.  */
+void
+check_struct_passing1bf16 (struct m256bf16_struct ms1 ATTRIBUTE_UNUSED,
+			   struct m256bf16_struct ms2 ATTRIBUTE_UNUSED,
+			   struct m256bf16_struct ms3 ATTRIBUTE_UNUSED,
+			   struct m256bf16_struct ms4 ATTRIBUTE_UNUSED,
+			   struct m256bf16_struct ms5 ATTRIBUTE_UNUSED,
+			   struct m256bf16_struct ms6 ATTRIBUTE_UNUSED,
+			   struct m256bf16_struct ms7 ATTRIBUTE_UNUSED,
+			   struct m256bf16_struct ms8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_struct_passing2bf16 (struct m256bf16_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+40);
+}
+
+volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+		bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
+
+static void
+do_test (void)
+{
+  struct m256bf16_struct m256bf16s [8];
+  struct m256bf16_2_struct m256bf16_2s = {
+    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+      bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16},
+    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+      bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16},
+  };
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      m256bf16s[i].x = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+				      bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16};
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.ymm0)[i]._m256bf16[0] = m256bf16s[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1bf16) (m256bf16s[0], m256bf16s[1], m256bf16s[2], m256bf16s[3],
+					 m256bf16s[4], m256bf16s[5], m256bf16s[6], m256bf16s[7]);
+  WRAP_CALL (check_struct_passing2bf16) (m256bf16_2s);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c
new file mode 100644
index 00000000000..6d663b88b1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c
@@ -0,0 +1,179 @@
+#include "bf16-ymm-check.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+union un1b
+{
+  __m256bf16 x;
+  float f;
+};
+
+union un1bb
+{
+  __m256bf16 x;
+  __bf16 f;
+};
+
+union un2b
+{
+  __m256bf16 x;
+  double d;
+};
+
+union un3b
+{
+  __m256bf16 x;
+  __m128 v;
+};
+
+union un4b
+{
+  __m256bf16 x;
+  long double ld;
+};
+
+union un5b
+{
+  __m256bf16 x;
+  int i;
+};
+
+void
+check_union_passing1b (union un1b u1 ATTRIBUTE_UNUSED,
+		       union un1b u2 ATTRIBUTE_UNUSED,
+		       union un1b u3 ATTRIBUTE_UNUSED,
+		       union un1b u4 ATTRIBUTE_UNUSED,
+		       union un1b u5 ATTRIBUTE_UNUSED,
+		       union un1b u6 ATTRIBUTE_UNUSED,
+		       union un1b u7 ATTRIBUTE_UNUSED,
+		       union un1b u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing1bb (union un1bb u1 ATTRIBUTE_UNUSED,
+			union un1bb u2 ATTRIBUTE_UNUSED,
+			union un1bb u3 ATTRIBUTE_UNUSED,
+			union un1bb u4 ATTRIBUTE_UNUSED,
+			union un1bb u5 ATTRIBUTE_UNUSED,
+			union un1bb u6 ATTRIBUTE_UNUSED,
+			union un1bb u7 ATTRIBUTE_UNUSED,
+			union un1bb u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing2b (union un2b u1 ATTRIBUTE_UNUSED,
+		       union un2b u2 ATTRIBUTE_UNUSED,
+		       union un2b u3 ATTRIBUTE_UNUSED,
+		       union un2b u4 ATTRIBUTE_UNUSED,
+		       union un2b u5 ATTRIBUTE_UNUSED,
+		       union un2b u6 ATTRIBUTE_UNUSED,
+		       union un2b u7 ATTRIBUTE_UNUSED,
+		       union un2b u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing3b (union un3b u1 ATTRIBUTE_UNUSED,
+		       union un3b u2 ATTRIBUTE_UNUSED,
+		       union un3b u3 ATTRIBUTE_UNUSED,
+		       union un3b u4 ATTRIBUTE_UNUSED,
+		       union un3b u5 ATTRIBUTE_UNUSED,
+		       union un3b u6 ATTRIBUTE_UNUSED,
+		       union un3b u7 ATTRIBUTE_UNUSED,
+		       union un3b u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing4b (union un4b u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+void
+check_union_passing5b (union un5b u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+#define check_union_passing1b WRAP_CALL(check_union_passing1b)
+#define check_union_passing1bb WRAP_CALL(check_union_passing1bb)
+#define check_union_passing2b WRAP_CALL(check_union_passing2b)
+#define check_union_passing3b WRAP_CALL(check_union_passing3b)
+#define check_union_passing4b WRAP_CALL(check_union_passing4b)
+#define check_union_passing5b WRAP_CALL(check_union_passing5b)
+
+static void
+do_test (void)
+{
+  union un1b u1b[8];
+  union un1bb u1bb[8];
+  union un2b u2b[8];
+  union un3b u3b[8];
+  union un4b u4b;
+  union un5b u5b;
+  int i;
+  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+	 bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
+
+  for (i = 0; i < 8; i++)
+    {
+      u1b[i].x = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+				bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16 };
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.ymm0)[i]._m256bf16[0] = u1b[i].x;
+  num_fregs = 8;
+  check_union_passing1b (u1b[0], u1b[1], u1b[2], u1b[3],
+			 u1b[4], u1b[5], u1b[6], u1b[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1bb[i].x = u1b[i].x;
+      (&fregs.ymm0)[i]._m256bf16[0] = u1bb[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1bb (u1bb[0], u1bb[1], u1bb[2], u1bb[3],
+			  u1bb[4], u1bb[5], u1bb[6], u1bb[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2b[i].x = u1b[i].x;
+      (&fregs.ymm0)[i]._m256bf16[0] = u2b[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2b (u2b[0], u2b[1], u2b[2], u2b[3],
+			 u2b[4], u2b[5], u2b[6], u2b[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3b[i].x = u1b[i].x;
+      (&fregs.ymm0)[i]._m256bf16[0] = u3b[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3b (u3b[0], u3b[1], u3b[2], u3b[3],
+			 u3b[4], u3b[5], u3b[6], u3b[7]);
+
+  check_union_passing4b (u4b);
+  check_union_passing5b (u5b);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c
new file mode 100644
index 00000000000..b69e095d808
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c
@@ -0,0 +1,107 @@
+/* Test variable number of 256-bit vector arguments passed to functions.  */
+
+#include <stdio.h>
+#include "bf16-ymm-check.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m256bf16_varargs (__m256bf16 i0, __m256bf16 i1, __m256bf16 i2,
+				    __m256bf16 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m256bf16 *argp;
+
+  compare (values.i0, i0, __m256bf16);
+  compare (values.i1, i1, __m256bf16);
+  compare (values.i2, i2, __m256bf16);
+  compare (values.i3, i3, __m256bf16);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m256bf16 *)(((char *) fp) + 8);
+
+  /* Check __m256bf16 arguments passed on stack.  */
+  compare (values.i4, argp[0], __m256bf16);
+  compare (values.i5, argp[1], __m256bf16);
+  compare (values.i6, argp[2], __m256bf16);
+  compare (values.i7, argp[3], __m256bf16);
+  compare (values.i8, argp[4], __m256bf16);
+  compare (values.i9, argp[5], __m256bf16);
+
+  /* Check register contents.  */
+  compare (fregs.ymm0, ymm_regs[0], __m256bf16);
+  compare (fregs.ymm1, ymm_regs[1], __m256bf16);
+  compare (fregs.ymm2, ymm_regs[2], __m256bf16);
+  compare (fregs.ymm3, ymm_regs[3], __m256bf16);
+}
+
+#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
+				      _i6, _i7, _i8, _i9, \
+				      _func, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
+
+void
+test_m256bf16_varargs (void)
+{
+  __m256bf16 x[10];
+  int i;
+  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+	 bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+			  bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16 };
+  pass = "m256bf16-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m256bf16_varargs,
+				 _m256bf16);
+}
+
+void
+do_test (void)
+{
+  test_m256bf16_varargs ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp
new file mode 100644
index 00000000000..b6e0fed4cb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp
@@ -0,0 +1,46 @@
+# Copyright (C) 2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib clearcap.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || ![is-effective-target lp64]
+     || ![is-effective-target avx512f] } then {
+  return
+}
+
+
+torture-init
+clearcap-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -mavx512f"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+    if {[runtest_file_p $runtests $src]} {
+        c-torture-execute [list $src \
+                                $srcdir/$subdir/asm-support.S] \
+                                $additional_flags
+    }
+}
+
+clearcap-finish
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/args.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/args.h
new file mode 100644
index 00000000000..64b24783833
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/args.h
@@ -0,0 +1,155 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include <immintrin.h>
+#include <string.h>
+
+/* Assertion macro.  */
+#define assert(test) do { if (!(test)) abort (); } while (0)
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 zmm0
+#define F1 zmm1
+#define F2 zmm2
+#define F3 zmm3
+#define F4 zmm4
+#define F5 zmm5
+#define F6 zmm6
+#define F7 zmm7
+
+typedef union {
+  __bf16 ___bf16[32];
+  float _float[16];
+  double _double[8];
+  long long _longlong[8];
+  int _int[16];
+  unsigned long long _ulonglong[8];
+  __m64 _m64[8];
+  __m128 _m128[4];
+  __m256 _m256[2];
+  __m512 _m512[1];
+  __m512bf16 _m512bf16[1];
+} ZMM_T;
+
+typedef union {
+  float _float;
+  double _double;
+  long double _ldouble;
+  unsigned long long _ulonglong[2];
+} X87_T;
+extern void (*callthis)(void);
+extern unsigned long long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
+ZMM_T zmm_regs[32];
+X87_T x87_regs[8];
+extern volatile unsigned long long volatile_var;
+extern void snapshot (void);
+extern void snapshot_ret (void);
+#define WRAP_CALL(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
+#define WRAP_RET(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
+
+/* Clear all integer registers.  */
+#define clear_int_hardware_registers \
+  asm __volatile__ ("xor %%rax, %%rax\n\t" \
+		    "xor %%rbx, %%rbx\n\t" \
+		    "xor %%rcx, %%rcx\n\t" \
+		    "xor %%rdx, %%rdx\n\t" \
+		    "xor %%rsi, %%rsi\n\t" \
+		    "xor %%rdi, %%rdi\n\t" \
+		    "xor %%r8, %%r8\n\t" \
+		    "xor %%r9, %%r9\n\t" \
+		    "xor %%r10, %%r10\n\t" \
+		    "xor %%r11, %%r11\n\t" \
+		    "xor %%r12, %%r12\n\t" \
+		    "xor %%r13, %%r13\n\t" \
+		    "xor %%r14, %%r14\n\t" \
+		    "xor %%r15, %%r15\n\t" \
+		    ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
+		    "r9", "r10", "r11", "r12", "r13", "r14", "r15");
+
+/* This is the list of registers available for passing arguments.  Not all of
+   these are used or even really available.  */
+struct IntegerRegisters
+{
+  unsigned long long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
+};
+struct FloatRegisters
+{
+  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
+  long double st0, st1, st2, st3, st4, st5, st6, st7;
+  ZMM_T zmm0, zmm1, zmm2, zmm3, zmm4, zmm5, zmm6, zmm7, zmm8, zmm9,
+        zmm10, zmm11, zmm12, zmm13, zmm14, zmm15, zmm16, zmm17, zmm18,
+	zmm19, zmm20, zmm21, zmm22, zmm23, zmm24, zmm25, zmm26, zmm27,
+	zmm28, zmm29, zmm30, zmm31;
+};
+
+/* Defined by each test source file.  */
+extern struct IntegerRegisters iregs;
+extern struct FloatRegisters fregs;
+extern unsigned int num_iregs, num_fregs;
+
+/* Clear register struct.  */
+#define clear_struct_registers \
+  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
+    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
+  memset (&iregs, 0, sizeof (iregs)); \
+  memset (&fregs, 0, sizeof (fregs)); \
+  memset (zmm_regs, 0, sizeof (zmm_regs)); \
+  memset (x87_regs, 0, sizeof (x87_regs));
+
+/* Clear both hardware and register structs for integers.  */
+#define clear_int_registers \
+  clear_struct_registers \
+  clear_int_hardware_registers
+
+#define check_vector_arguments(T,O) do { \
+  assert (num_fregs <= 0 \
+	  || memcmp (((char *) &fregs.zmm0) + (O), \
+		     &zmm_regs[0], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 1 \
+	  || memcmp (((char *) &fregs.zmm1) + (O), \
+		     &zmm_regs[1], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 2 \
+	  || memcmp (((char *) &fregs.zmm2) + (O), \
+		     &zmm_regs[2], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 3 \
+	  || memcmp (((char *) &fregs.zmm3) + (O), \
+		     &zmm_regs[3], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 4 \
+	  || memcmp (((char *) &fregs.zmm4) + (O), \
+		     &zmm_regs[4], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 5 \
+	  || memcmp (((char *) &fregs.zmm5) + (O), \
+		     &zmm_regs[5], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 6 \
+	  || memcmp (((char *) &fregs.zmm6) + (O), \
+		     &zmm_regs[6], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 7 \
+	  || memcmp (((char *) &fregs.zmm7) + (O), \
+		     &zmm_regs[7], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  } while (0)
+
+#define check_m512_arguments check_vector_arguments(m512, 0)
+
+#endif /* INCLUDED_ARGS_H  */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S
new file mode 100644
index 00000000000..86d54d11c58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S
@@ -0,0 +1,100 @@
+	.text
+	.p2align 4,,15
+.globl snapshot
+	.type	snapshot, @function
+snapshot:
+.LFB3:
+	movq	%rax, rax(%rip)
+	movq	%rbx, rbx(%rip)
+	movq	%rcx, rcx(%rip)
+	movq	%rdx, rdx(%rip)
+	movq	%rdi, rdi(%rip)
+	movq	%rsi, rsi(%rip)
+	movq	%rbp, rbp(%rip)
+	movq	%rsp, rsp(%rip)
+	movq	%r8, r8(%rip)
+	movq	%r9, r9(%rip)
+	movq	%r10, r10(%rip)
+	movq	%r11, r11(%rip)
+	movq	%r12, r12(%rip)
+	movq	%r13, r13(%rip)
+	movq	%r14, r14(%rip)
+	movq	%r15, r15(%rip)
+	vmovdqu32 %zmm0, zmm_regs+0(%rip)
+	vmovdqu32 %zmm1, zmm_regs+64(%rip)
+	vmovdqu32 %zmm2, zmm_regs+128(%rip)
+	vmovdqu32 %zmm3, zmm_regs+192(%rip)
+	vmovdqu32 %zmm4, zmm_regs+256(%rip)
+	vmovdqu32 %zmm5, zmm_regs+320(%rip)
+	vmovdqu32 %zmm6, zmm_regs+384(%rip)
+	vmovdqu32 %zmm7, zmm_regs+448(%rip)
+	vmovdqu32 %zmm8, zmm_regs+512(%rip)
+	vmovdqu32 %zmm9, zmm_regs+576(%rip)
+	vmovdqu32 %zmm10, zmm_regs+640(%rip)
+	vmovdqu32 %zmm11, zmm_regs+704(%rip)
+	vmovdqu32 %zmm12, zmm_regs+768(%rip)
+	vmovdqu32 %zmm13, zmm_regs+832(%rip)
+	vmovdqu32 %zmm14, zmm_regs+896(%rip)
+	vmovdqu32 %zmm15, zmm_regs+960(%rip)
+	vmovdqu32 %zmm16, zmm_regs+1024(%rip)
+	vmovdqu32 %zmm17, zmm_regs+1088(%rip)
+	vmovdqu32 %zmm18, zmm_regs+1152(%rip)
+	vmovdqu32 %zmm19, zmm_regs+1216(%rip)
+	vmovdqu32 %zmm20, zmm_regs+1280(%rip)
+	vmovdqu32 %zmm21, zmm_regs+1344(%rip)
+	vmovdqu32 %zmm22, zmm_regs+1408(%rip)
+	vmovdqu32 %zmm23, zmm_regs+1472(%rip)
+	vmovdqu32 %zmm24, zmm_regs+1536(%rip)
+	vmovdqu32 %zmm25, zmm_regs+1600(%rip)
+	vmovdqu32 %zmm26, zmm_regs+1664(%rip)
+	vmovdqu32 %zmm27, zmm_regs+1728(%rip)
+	vmovdqu32 %zmm28, zmm_regs+1792(%rip)
+	vmovdqu32 %zmm29, zmm_regs+1856(%rip)
+	vmovdqu32 %zmm30, zmm_regs+1920(%rip)
+	vmovdqu32 %zmm31, zmm_regs+1984(%rip)
+	jmp	*callthis(%rip)
+.LFE3:
+	.size	snapshot, .-snapshot
+
+	.p2align 4,,15
+.globl snapshot_ret
+	.type	snapshot_ret, @function
+snapshot_ret:
+	movq	%rdi, rdi(%rip)
+	subq	$8, %rsp
+	call	*callthis(%rip)
+	addq	$8, %rsp
+	movq	%rax, rax(%rip)
+	movq	%rdx, rdx(%rip)
+	vmovdqu32	%zmm0, zmm_regs+0(%rip)
+	vmovdqu32	%zmm1, zmm_regs+64(%rip)
+	fstpt	x87_regs(%rip)
+	fstpt	x87_regs+16(%rip)
+	fldt	x87_regs+16(%rip)
+	fldt	x87_regs(%rip)
+	ret
+	.size	snapshot_ret, .-snapshot_ret
+
+	.comm	callthis,8,8
+	.comm	rax,8,8
+	.comm	rbx,8,8
+	.comm	rcx,8,8
+	.comm	rdx,8,8
+	.comm	rsi,8,8
+	.comm	rdi,8,8
+	.comm	rsp,8,8
+	.comm	rbp,8,8
+	.comm	r8,8,8
+	.comm	r9,8,8
+	.comm	r10,8,8
+	.comm	r11,8,8
+	.comm	r12,8,8
+	.comm	r13,8,8
+	.comm	r14,8,8
+	.comm	r15,8,8
+	.comm	zmm_regs,2048,64
+	.comm	x87_regs,128,32
+	.comm	volatile_var,8,8
+#ifdef __linux__
+	.section	.note.GNU-stack,"",@progbits
+#endif
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h
new file mode 100644
index 00000000000..8379fcfaf8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h
@@ -0,0 +1,24 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+static void do_test (void);
+
+int
+main ()
+{
+
+  if (__builtin_cpu_supports ("avx512f"))
+    {
+      do_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+      return 0;
+    }
+
+#ifdef DEBUG
+  printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c
new file mode 100644
index 00000000000..1a2500bd883
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c
@@ -0,0 +1,44 @@
+#include <stdio.h>
+#include "bf16-zmm-check.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+		bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+		bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+		bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32;
+
+__m512bf16
+fun_test_returning___m512bf16 (void)
+{
+  volatile_var++;
+  return (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+			bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+			bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+			bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
+}
+
+__m512bf16 test_512bf16;
+
+static void
+do_test (void)
+{
+  unsigned failed = 0;
+  ZMM_T zmmt1, zmmt2;
+
+  clear_struct_registers;
+  test_512bf16 = (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+				bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+				bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+				bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
+  zmmt1._m512bf16[0] = test_512bf16;
+  zmmt2._m512bf16[0] = WRAP_RET (fun_test_returning___m512bf16)();
+  if (memcmp (&zmmt1, &zmmt2, sizeof (zmmt2)) != 0)
+    printf ("fail m512bf16\n"), failed++;
+
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c
new file mode 100644
index 00000000000..1c5c407efee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c
@@ -0,0 +1,244 @@
+#include <stdio.h>
+#include "bf16-zmm-check.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+    i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m512bf16_8_values (__m512bf16 i0 ATTRIBUTE_UNUSED,
+				     __m512bf16 i1 ATTRIBUTE_UNUSED,
+				     __m512bf16 i2 ATTRIBUTE_UNUSED,
+				     __m512bf16 i3 ATTRIBUTE_UNUSED,
+				     __m512bf16 i4 ATTRIBUTE_UNUSED,
+				     __m512bf16 i5 ATTRIBUTE_UNUSED,
+				     __m512bf16 i6 ATTRIBUTE_UNUSED,
+				     __m512bf16 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m512bf16);
+  compare (values.i1, i1, __m512bf16);
+  compare (values.i2, i2, __m512bf16);
+  compare (values.i3, i3, __m512bf16);
+  compare (values.i4, i4, __m512bf16);
+  compare (values.i5, i5, __m512bf16);
+  compare (values.i6, i6, __m512bf16);
+  compare (values.i7, i7, __m512bf16);
+}
+
+void
+fun_check_passing_m512bf16_8_regs (__m512bf16 i0 ATTRIBUTE_UNUSED,
+				   __m512bf16 i1 ATTRIBUTE_UNUSED,
+				   __m512bf16 i2 ATTRIBUTE_UNUSED,
+				   __m512bf16 i3 ATTRIBUTE_UNUSED,
+				   __m512bf16 i4 ATTRIBUTE_UNUSED,
+				   __m512bf16 i5 ATTRIBUTE_UNUSED,
+				   __m512bf16 i6 ATTRIBUTE_UNUSED,
+				   __m512bf16 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+fun_check_passing_m512bf16_20_values (__m512bf16 i0 ATTRIBUTE_UNUSED,
+				      __m512bf16 i1 ATTRIBUTE_UNUSED,
+				      __m512bf16 i2 ATTRIBUTE_UNUSED,
+				      __m512bf16 i3 ATTRIBUTE_UNUSED,
+				      __m512bf16 i4 ATTRIBUTE_UNUSED,
+				      __m512bf16 i5 ATTRIBUTE_UNUSED,
+				      __m512bf16 i6 ATTRIBUTE_UNUSED,
+				      __m512bf16 i7 ATTRIBUTE_UNUSED,
+				      __m512bf16 i8 ATTRIBUTE_UNUSED,
+				      __m512bf16 i9 ATTRIBUTE_UNUSED,
+				      __m512bf16 i10 ATTRIBUTE_UNUSED,
+				      __m512bf16 i11 ATTRIBUTE_UNUSED,
+				      __m512bf16 i12 ATTRIBUTE_UNUSED,
+				      __m512bf16 i13 ATTRIBUTE_UNUSED,
+				      __m512bf16 i14 ATTRIBUTE_UNUSED,
+				      __m512bf16 i15 ATTRIBUTE_UNUSED,
+				      __m512bf16 i16 ATTRIBUTE_UNUSED,
+				      __m512bf16 i17 ATTRIBUTE_UNUSED,
+				      __m512bf16 i18 ATTRIBUTE_UNUSED,
+				      __m512bf16 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m512bf16);
+  compare (values.i1, i1, __m512bf16);
+  compare (values.i2, i2, __m512bf16);
+  compare (values.i3, i3, __m512bf16);
+  compare (values.i4, i4, __m512bf16);
+  compare (values.i5, i5, __m512bf16);
+  compare (values.i6, i6, __m512bf16);
+  compare (values.i7, i7, __m512bf16);
+  compare (values.i8, i8, __m512bf16);
+  compare (values.i9, i9, __m512bf16);
+  compare (values.i10, i10, __m512bf16);
+  compare (values.i11, i11, __m512bf16);
+  compare (values.i12, i12, __m512bf16);
+  compare (values.i13, i13, __m512bf16);
+  compare (values.i14, i14, __m512bf16);
+  compare (values.i15, i15, __m512bf16);
+  compare (values.i16, i16, __m512bf16);
+  compare (values.i17, i17, __m512bf16);
+  compare (values.i18, i18, __m512bf16);
+  compare (values.i19, i19, __m512bf16);
+}
+
+void
+fun_check_passing_m512bf16_20_regs (__m512bf16 i0 ATTRIBUTE_UNUSED,
+				    __m512bf16 i1 ATTRIBUTE_UNUSED,
+				    __m512bf16 i2 ATTRIBUTE_UNUSED,
+				    __m512bf16 i3 ATTRIBUTE_UNUSED,
+				    __m512bf16 i4 ATTRIBUTE_UNUSED,
+				    __m512bf16 i5 ATTRIBUTE_UNUSED,
+				    __m512bf16 i6 ATTRIBUTE_UNUSED,
+				    __m512bf16 i7 ATTRIBUTE_UNUSED,
+				    __m512bf16 i8 ATTRIBUTE_UNUSED,
+				    __m512bf16 i9 ATTRIBUTE_UNUSED,
+				    __m512bf16 i10 ATTRIBUTE_UNUSED,
+				    __m512bf16 i11 ATTRIBUTE_UNUSED,
+				    __m512bf16 i12 ATTRIBUTE_UNUSED,
+				    __m512bf16 i13 ATTRIBUTE_UNUSED,
+				    __m512bf16 i14 ATTRIBUTE_UNUSED,
+				    __m512bf16 i15 ATTRIBUTE_UNUSED,
+				    __m512bf16 i16 ATTRIBUTE_UNUSED,
+				    __m512bf16 i17 ATTRIBUTE_UNUSED,
+				    __m512bf16 i18 ATTRIBUTE_UNUSED,
+				    __m512bf16 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+#define def_check_passing8(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
+  \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
+
+#define def_check_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
+			    _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
+			    _i18, _i19, _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  values.i10.TYPE[0] = _i10; \
+  values.i11.TYPE[0] = _i11; \
+  values.i12.TYPE[0] = _i12; \
+  values.i13.TYPE[0] = _i13; \
+  values.i14.TYPE[0] = _i14; \
+  values.i15.TYPE[0] = _i15; \
+  values.i16.TYPE[0] = _i16; \
+  values.i17.TYPE[0] = _i17; \
+  values.i18.TYPE[0] = _i18; \
+  values.i19.TYPE[0] = _i19; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
+		     _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
+		     _i18, _i19); \
+  \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
+		     _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
+		     _i18, _i19);
+
+volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+		bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+		bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+		bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32;
+
+void
+test_m512bf16_on_stack ()
+{
+  __m512bf16 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+			  bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+			  bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+			  bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
+
+  pass = "m512bf16-8";
+  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+		      fun_check_passing_m512bf16_8_values,
+		      fun_check_passing_m512bf16_8_regs, _m512bf16);
+}
+
+void
+test_too_many_m512bf16 ()
+{
+  __m512bf16 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+			  bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+			  bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+			  bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
+  pass = "m512bf16-20";
+  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
+		       x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
+		       x[17], x[18], x[19], fun_check_passing_m512bf16_20_values,
+		       fun_check_passing_m512bf16_20_regs, _m512bf16);
+}
+
+static void
+do_test (void)
+{
+  test_m512bf16_on_stack ();
+  test_too_many_m512bf16 ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c
new file mode 100644
index 00000000000..f93a2b81086
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c
@@ -0,0 +1,77 @@
+#include "bf16-zmm-check.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+struct m512bf16_struct
+{
+  __m512bf16 x;
+};
+
+struct m512bf16_2_struct
+{
+  __m512bf16 x1, x2;
+};
+
+/* Check that the struct is passed as the individual members in fregs.  */
+void
+check_struct_passing1bf16 (struct m512bf16_struct ms1 ATTRIBUTE_UNUSED,
+			   struct m512bf16_struct ms2 ATTRIBUTE_UNUSED,
+			   struct m512bf16_struct ms3 ATTRIBUTE_UNUSED,
+			   struct m512bf16_struct ms4 ATTRIBUTE_UNUSED,
+			   struct m512bf16_struct ms5 ATTRIBUTE_UNUSED,
+			   struct m512bf16_struct ms6 ATTRIBUTE_UNUSED,
+			   struct m512bf16_struct ms7 ATTRIBUTE_UNUSED,
+			   struct m512bf16_struct ms8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_struct_passing2bf16 (struct m512bf16_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+72);
+}
+
+static void
+do_test (void)
+{
+  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+	 bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+	 bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+	 bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32;
+  struct m512bf16_struct m512bf16s [8];
+  struct m512bf16_2_struct m512bf16_2s = {
+    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+      bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+      bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+      bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 },
+    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+      bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+      bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+      bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 }
+  };
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      m512bf16s[i].x = (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+				      bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+				      bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+				      bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.zmm0)[i]._m512bf16[0] = m512bf16s[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1bf16) (m512bf16s[0], m512bf16s[1], m512bf16s[2], m512bf16s[3],
+					 m512bf16s[4], m512bf16s[5], m512bf16s[6], m512bf16s[7]);
+  WRAP_CALL (check_struct_passing2bf16) (m512bf16_2s);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c
new file mode 100644
index 00000000000..3769b38aeb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c
@@ -0,0 +1,222 @@
+#include "bf16-zmm-check.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+union un1b
+{
+  __m512bf16 x;
+  float f;
+};
+
+union un1bb
+{
+  __m512bf16 x;
+  __bf16 f;
+};
+
+union un2b
+{
+  __m512bf16 x;
+  double d;
+};
+
+union un3b
+{
+  __m512bf16 x;
+  __m128 v;
+};
+
+union un4b
+{
+  __m512bf16 x;
+  long double ld;
+};
+
+union un5b
+{
+  __m512bf16 x;
+  int i;
+};
+
+union un6b
+{
+  __m512bf16 x;
+  __m256 v;
+};
+
+void
+check_union_passing1b (union un1b u1 ATTRIBUTE_UNUSED,
+		       union un1b u2 ATTRIBUTE_UNUSED,
+		       union un1b u3 ATTRIBUTE_UNUSED,
+		       union un1b u4 ATTRIBUTE_UNUSED,
+		       union un1b u5 ATTRIBUTE_UNUSED,
+		       union un1b u6 ATTRIBUTE_UNUSED,
+		       union un1b u7 ATTRIBUTE_UNUSED,
+		       union un1b u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing1bb (union un1bb u1 ATTRIBUTE_UNUSED,
+			union un1bb u2 ATTRIBUTE_UNUSED,
+			union un1bb u3 ATTRIBUTE_UNUSED,
+			union un1bb u4 ATTRIBUTE_UNUSED,
+			union un1bb u5 ATTRIBUTE_UNUSED,
+			union un1bb u6 ATTRIBUTE_UNUSED,
+			union un1bb u7 ATTRIBUTE_UNUSED,
+			union un1bb u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+
+void
+check_union_passing2b (union un2b u1 ATTRIBUTE_UNUSED,
+		       union un2b u2 ATTRIBUTE_UNUSED,
+		       union un2b u3 ATTRIBUTE_UNUSED,
+		       union un2b u4 ATTRIBUTE_UNUSED,
+		       union un2b u5 ATTRIBUTE_UNUSED,
+		       union un2b u6 ATTRIBUTE_UNUSED,
+		       union un2b u7 ATTRIBUTE_UNUSED,
+		       union un2b u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing3b (union un3b u1 ATTRIBUTE_UNUSED,
+		       union un3b u2 ATTRIBUTE_UNUSED,
+		       union un3b u3 ATTRIBUTE_UNUSED,
+		       union un3b u4 ATTRIBUTE_UNUSED,
+		       union un3b u5 ATTRIBUTE_UNUSED,
+		       union un3b u6 ATTRIBUTE_UNUSED,
+		       union un3b u7 ATTRIBUTE_UNUSED,
+		       union un3b u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing4b (union un4b u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+void
+check_union_passing5b (union un5b u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+void
+check_union_passing6b (union un6b u1 ATTRIBUTE_UNUSED,
+		       union un6b u2 ATTRIBUTE_UNUSED,
+		       union un6b u3 ATTRIBUTE_UNUSED,
+		       union un6b u4 ATTRIBUTE_UNUSED,
+		       union un6b u5 ATTRIBUTE_UNUSED,
+		       union un6b u6 ATTRIBUTE_UNUSED,
+		       union un6b u7 ATTRIBUTE_UNUSED,
+		       union un6b u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+#define check_union_passing1b WRAP_CALL(check_union_passing1b)
+#define check_union_passing1bf WRAP_CALL(check_union_passing1bf)
+#define check_union_passing1bb WRAP_CALL(check_union_passing1bb)
+#define check_union_passing2b WRAP_CALL(check_union_passing2b)
+#define check_union_passing3b WRAP_CALL(check_union_passing3b)
+#define check_union_passing4b WRAP_CALL(check_union_passing4b)
+#define check_union_passing5b WRAP_CALL(check_union_passing5b)
+#define check_union_passing6b WRAP_CALL(check_union_passing6b)
+
+
+static void
+do_test (void)
+{
+  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+	 bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+	 bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+	 bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32;
+  union un1b u1b[8];
+  union un1bb u1bb[8];
+  union un2b u2b[8];
+  union un3b u3b[8];
+  union un4b u4b;
+  union un5b u5b;
+  union un6b u6b[8];
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      u1b[i].x =  (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+				 bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+				 bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+				 bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.zmm0)[i]._m512bf16[0] = u1b[i].x;
+  num_fregs = 8;
+  check_union_passing1b (u1b[0], u1b[1], u1b[2], u1b[3],
+			 u1b[4], u1b[5], u1b[6], u1b[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1bb[i].x = u1b[i].x;
+      (&fregs.zmm0)[i]._m512bf16[0] = u1bb[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1bb (u1bb[0], u1bb[1], u1bb[2], u1bb[3],
+			  u1bb[4], u1bb[5], u1bb[6], u1bb[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2b[i].x = u1bb[i].x;
+      (&fregs.zmm0)[i]._m512bf16[0] = u2b[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2b (u2b[0], u2b[1], u2b[2], u2b[3],
+			 u2b[4], u2b[5], u2b[6], u2b[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3b[i].x = u1b[i].x;
+      (&fregs.zmm0)[i]._m512bf16[0] = u3b[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3b (u3b[0], u3b[1], u3b[2], u3b[3],
+			 u3b[4], u3b[5], u3b[6], u3b[7]);
+
+  check_union_passing4b (u4b);
+  check_union_passing5b (u5b);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u6b[i].x = u1b[i].x;
+      (&fregs.zmm0)[i]._m512bf16[0] = u6b[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing6b (u6b[0], u6b[1], u6b[2], u6b[3],
+			 u6b[4], u6b[5], u6b[6], u6b[7]);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c
new file mode 100644
index 00000000000..2be57b8b5fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c
@@ -0,0 +1,111 @@
+/* Test variable number of 512-bit vector arguments passed to functions.  */
+
+#include <stdio.h>
+#include "bf16-zmm-check.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m512bf16_varargs (__m512bf16 i0, __m512bf16 i1, __m512bf16 i2,
+				 __m512bf16 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m512bf16 *argp;
+
+  compare (values.i0, i0, __m512bf16);
+  compare (values.i1, i1, __m512bf16);
+  compare (values.i2, i2, __m512bf16);
+  compare (values.i3, i3, __m512bf16);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m512bf16 *)(((char *) fp) + 8);
+
+  /* Check __m512bf16 arguments passed on stack.  */
+  compare (values.i4, argp[0], __m512bf16);
+  compare (values.i5, argp[1], __m512bf16);
+  compare (values.i6, argp[2], __m512bf16);
+  compare (values.i7, argp[3], __m512bf16);
+  compare (values.i8, argp[4], __m512bf16);
+  compare (values.i9, argp[5], __m512bf16);
+
+  /* Check register contents.  */
+  compare (fregs.zmm0, zmm_regs[0], __m512bf16);
+  compare (fregs.zmm1, zmm_regs[1], __m512bf16);
+  compare (fregs.zmm2, zmm_regs[2], __m512bf16);
+  compare (fregs.zmm3, zmm_regs[3], __m512bf16);
+}
+
+#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
+				      _i6, _i7, _i8, _i9, \
+				      _func, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
+
+void
+test_m512bf16_varargs (void)
+{
+  __m512bf16 x[10];
+  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+	 bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+	 bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+	 bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32;
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+			  bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+			  bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
+			  bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
+  pass = "m512bf16-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m512bf16_varargs,
+				 _m512bf16);
+}
+
+void
+do_test (void)
+{
+  test_m512bf16_varargs ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/macros.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/macros.h
new file mode 100644
index 00000000000..98fbc660f27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/macros.h
@@ -0,0 +1,53 @@
+#ifndef MACROS_H
+#define MACROS_H
+#define check_size(_t, _size) assert(sizeof(_t) == (_size))
+
+#define check_align(_t, _align) assert(__alignof__(_t) == (_align))
+
+#define check_align_lv(_t, _align) assert(__alignof__(_t) == (_align) \
+					  && (((unsigned long)&(_t)) & ((_align) - 1) ) == 0)
+
+#define check_basic_struct_size_and_align(_type, _size, _align) { \
+  struct _str { _type dummy; } _t; \
+  check_size(_t, _size); \
+  check_align_lv(_t, _align); \
+}
+
+#define check_array_size_and_align(_type, _size, _align) { \
+  _type _a[1]; _type _b[2]; _type _c[16]; \
+  struct _str { _type _a[1]; } _s; \
+  check_align_lv(_a[0], _align); \
+  check_size(_a, _size); \
+  check_size(_b, (_size*2)); \
+  check_size(_c, (_size*16)); \
+  check_size(_s, _size); \
+  check_align_lv(_s._a[0], _align); \
+}
+
+#define check_basic_union_size_and_align(_type, _size, _align) { \
+  union _union { _type dummy; } _u; \
+  check_size(_u, _size); \
+  check_align_lv(_u, _align); \
+}
+
+#define run_signed_tests2(_function, _arg1, _arg2) \
+  _function(_arg1, _arg2); \
+  _function(signed _arg1, _arg2); \
+  _function(unsigned _arg1, _arg2);
+
+#define run_signed_tests3(_function, _arg1, _arg2, _arg3) \
+  _function(_arg1, _arg2, _arg3); \
+  _function(signed _arg1, _arg2, _arg3); \
+  _function(unsigned _arg1, _arg2, _arg3);
+
+/* Check size of a struct and a union of three types.  */
+
+#define check_struct_and_union3(type1, type2, type3, struct_size, align_size) \
+{ \
+  struct _str { type1 t1; type2 t2; type3 t3; } _t; \
+  union _uni { type1 t1; type2 t2; type3 t3; } _u; \
+  check_size(_t, struct_size); \
+  check_size(_u, align_size); \
+}
+
+#endif // MACROS_H
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c
new file mode 100644
index 00000000000..0c58db101e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c
@@ -0,0 +1,214 @@
+/* This is an autogenerated file. Do not edit.  */
+
+#include "defines.h"
+#include "macros.h"
+
+/* Check structs and unions of all permutations of 3 basic types.  */
+int
+main (void)
+{
+  check_struct_and_union3(char, char, __bf16, 4, 2);
+  check_struct_and_union3(char, __bf16, char, 6, 2);
+  check_struct_and_union3(char, __bf16, __bf16, 6, 2);
+  check_struct_and_union3(char, __bf16, int, 8, 4);
+#ifndef __ILP32__
+  check_struct_and_union3(char, __bf16, long, 16, 8);
+#endif
+  check_struct_and_union3(char, __bf16, long long, 16, 8);
+  check_struct_and_union3(char, __bf16, float, 8, 4);
+  check_struct_and_union3(char, __bf16, double, 16, 8);
+  check_struct_and_union3(char, __bf16, long double, 32, 16);
+  check_struct_and_union3(char, int, __bf16, 12, 4);
+#ifndef __ILP32__
+  check_struct_and_union3(char, long, __bf16, 24, 8);
+#endif
+  check_struct_and_union3(char, long long, __bf16, 24, 8);
+  check_struct_and_union3(char, float, __bf16, 12, 4);
+  check_struct_and_union3(char, double, __bf16, 24, 8);
+  check_struct_and_union3(char, long double, __bf16, 48, 16);
+  check_struct_and_union3(__bf16, char, char, 4, 2);
+  check_struct_and_union3(__bf16, char, __bf16, 6, 2);
+  check_struct_and_union3(__bf16, char, int, 8, 4);
+#ifndef __ILP32__
+  check_struct_and_union3(__bf16, char, long, 16, 8);
+#endif
+  check_struct_and_union3(__bf16, char, long long, 16, 8);
+  check_struct_and_union3(__bf16, char, float, 8, 4);
+  check_struct_and_union3(__bf16, char, double, 16, 8);
+  check_struct_and_union3(__bf16, char, long double, 32, 16);
+  check_struct_and_union3(__bf16, __bf16, char, 6, 2);
+  check_struct_and_union3(__bf16, __bf16, __bf16, 6, 2);
+  check_struct_and_union3(__bf16, __bf16, int, 8, 4);
+#ifndef __ILP32__
+  check_struct_and_union3(__bf16, __bf16, long, 16, 8);
+#endif
+  check_struct_and_union3(__bf16, __bf16, long long, 16, 8);
+  check_struct_and_union3(__bf16, __bf16, float, 8, 4);
+  check_struct_and_union3(__bf16, __bf16, double, 16, 8);
+  check_struct_and_union3(__bf16, __bf16, long double, 32, 16);
+  check_struct_and_union3(__bf16, int, char, 12, 4);
+  check_struct_and_union3(__bf16, int, __bf16, 12, 4);
+  check_struct_and_union3(__bf16, int, int, 12, 4);
+#ifndef __ILP32__
+  check_struct_and_union3(__bf16, int, long, 16, 8);
+#endif
+  check_struct_and_union3(__bf16, int, long long, 16, 8);
+  check_struct_and_union3(__bf16, int, float, 12, 4);
+  check_struct_and_union3(__bf16, int, double, 16, 8);
+  check_struct_and_union3(__bf16, int, long double, 32, 16);
+#ifndef __ILP32__
+  check_struct_and_union3(__bf16, long, char, 24, 8);
+  check_struct_and_union3(__bf16, long, __bf16, 24, 8);
+  check_struct_and_union3(__bf16, long, int, 24, 8);
+  check_struct_and_union3(__bf16, long, long, 24, 8);
+  check_struct_and_union3(__bf16, long, long long, 24, 8);
+  check_struct_and_union3(__bf16, long, float, 24, 8);
+  check_struct_and_union3(__bf16, long, double, 24, 8);
+#endif
+  check_struct_and_union3(__bf16, long, long double, 32, 16);
+  check_struct_and_union3(__bf16, long long, char, 24, 8);
+  check_struct_and_union3(__bf16, long long, __bf16, 24, 8);
+  check_struct_and_union3(__bf16, long long, int, 24, 8);
+  check_struct_and_union3(__bf16, long long, long, 24, 8);
+  check_struct_and_union3(__bf16, long long, long long, 24, 8);
+  check_struct_and_union3(__bf16, long long, float, 24, 8);
+  check_struct_and_union3(__bf16, long long, double, 24, 8);
+  check_struct_and_union3(__bf16, long long, long double, 32, 16);
+  check_struct_and_union3(__bf16, float, char, 12, 4);
+  check_struct_and_union3(__bf16, float, __bf16, 12, 4);
+  check_struct_and_union3(__bf16, float, int, 12, 4);
+#ifndef __ILP32__
+  check_struct_and_union3(__bf16, float, long, 16, 8);
+#endif
+  check_struct_and_union3(__bf16, float, long long, 16, 8);
+  check_struct_and_union3(__bf16, float, float, 12, 4);
+  check_struct_and_union3(__bf16, float, double, 16, 8);
+  check_struct_and_union3(__bf16, float, long double, 32, 16);
+  check_struct_and_union3(__bf16, double, char, 24, 8);
+  check_struct_and_union3(__bf16, double, __bf16, 24, 8);
+  check_struct_and_union3(__bf16, double, int, 24, 8);
+  check_struct_and_union3(__bf16, double, long, 24, 8);
+  check_struct_and_union3(__bf16, double, long long, 24, 8);
+  check_struct_and_union3(__bf16, double, float, 24, 8);
+  check_struct_and_union3(__bf16, double, double, 24, 8);
+  check_struct_and_union3(__bf16, double, long double, 32, 16);
+  check_struct_and_union3(__bf16, long double, char, 48, 16);
+  check_struct_and_union3(__bf16, long double, __bf16, 48, 16);
+  check_struct_and_union3(__bf16, long double, int, 48, 16);
+  check_struct_and_union3(__bf16, long double, long, 48, 16);
+  check_struct_and_union3(__bf16, long double, long long, 48, 16);
+  check_struct_and_union3(__bf16, long double, float, 48, 16);
+  check_struct_and_union3(__bf16, long double, double, 48, 16);
+  check_struct_and_union3(__bf16, long double, long double, 48, 16);
+  check_struct_and_union3(int, char, __bf16, 8, 4);
+  check_struct_and_union3(int, __bf16, char, 8, 4);
+  check_struct_and_union3(int, __bf16, __bf16, 8, 4);
+  check_struct_and_union3(int, __bf16, int, 12, 4);
+#ifndef __ILP32__
+  check_struct_and_union3(int, __bf16, long, 16, 8);
+#endif
+  check_struct_and_union3(int, __bf16, long long, 16, 8);
+  check_struct_and_union3(int, __bf16, float, 12, 4);
+  check_struct_and_union3(int, __bf16, double, 16, 8);
+  check_struct_and_union3(int, __bf16, long double, 32, 16);
+  check_struct_and_union3(int, int, __bf16, 12, 4);
+#ifndef __ILP32__
+  check_struct_and_union3(int, long, __bf16, 24, 8);
+#endif
+  check_struct_and_union3(int, long long, __bf16, 24, 8);
+  check_struct_and_union3(int, float, __bf16, 12, 4);
+  check_struct_and_union3(int, double, __bf16, 24, 8);
+  check_struct_and_union3(int, long double, __bf16, 48, 16);
+#ifndef __ILP32__
+  check_struct_and_union3(long, char, __bf16, 16, 8);
+  check_struct_and_union3(long, __bf16, char, 16, 8);
+  check_struct_and_union3(long, __bf16, __bf16, 16, 8);
+  check_struct_and_union3(long, __bf16, int, 16, 8);
+  check_struct_and_union3(long, __bf16, long, 24, 8);
+  check_struct_and_union3(long, __bf16, long long, 24, 8);
+  check_struct_and_union3(long, __bf16, float, 16, 8);
+  check_struct_and_union3(long, __bf16, double, 24, 8);
+#endif
+  check_struct_and_union3(long, __bf16, long double, 32, 16);
+#ifndef __ILP32__
+  check_struct_and_union3(long, int, __bf16, 16, 8);
+  check_struct_and_union3(long, long, __bf16, 24, 8);
+  check_struct_and_union3(long, long long, __bf16, 24, 8);
+  check_struct_and_union3(long, float, __bf16, 16, 8);
+  check_struct_and_union3(long, double, __bf16, 24, 8);
+#endif
+  check_struct_and_union3(long, long double, __bf16, 48, 16);
+  check_struct_and_union3(long long, char, __bf16, 16, 8);
+  check_struct_and_union3(long long, __bf16, char, 16, 8);
+  check_struct_and_union3(long long, __bf16, __bf16, 16, 8);
+  check_struct_and_union3(long long, __bf16, int, 16, 8);
+#ifndef __ILP32__
+  check_struct_and_union3(long long, __bf16, long, 24, 8);
+#endif
+  check_struct_and_union3(long long, __bf16, long long, 24, 8);
+  check_struct_and_union3(long long, __bf16, float, 16, 8);
+  check_struct_and_union3(long long, __bf16, double, 24, 8);
+  check_struct_and_union3(long long, __bf16, long double, 32, 16);
+  check_struct_and_union3(long long, int, __bf16, 16, 8);
+#ifndef __ILP32__
+  check_struct_and_union3(long long, long, __bf16, 24, 8);
+#endif
+  check_struct_and_union3(long long, long long, __bf16, 24, 8);
+  check_struct_and_union3(long long, float, __bf16, 16, 8);
+  check_struct_and_union3(long long, double, __bf16, 24, 8);
+  check_struct_and_union3(long long, long double, __bf16, 48, 16);
+  check_struct_and_union3(float, char, __bf16, 8, 4);
+  check_struct_and_union3(float, __bf16, char, 8, 4);
+  check_struct_and_union3(float, __bf16, __bf16, 8, 4);
+  check_struct_and_union3(float, __bf16, int, 12, 4);
+#ifndef __ILP32__
+  check_struct_and_union3(float, __bf16, long, 16, 8);
+#endif
+  check_struct_and_union3(float, __bf16, long long, 16, 8);
+  check_struct_and_union3(float, __bf16, float, 12, 4);
+  check_struct_and_union3(float, __bf16, double, 16, 8);
+  check_struct_and_union3(float, __bf16, long double, 32, 16);
+  check_struct_and_union3(float, int, __bf16, 12, 4);
+#ifndef __ILP32__
+  check_struct_and_union3(float, long, __bf16, 24, 8);
+#endif
+  check_struct_and_union3(float, long long, __bf16, 24, 8);
+  check_struct_and_union3(float, float, __bf16, 12, 4);
+  check_struct_and_union3(float, double, __bf16, 24, 8);
+  check_struct_and_union3(float, long double, __bf16, 48, 16);
+  check_struct_and_union3(double, char, __bf16, 16, 8);
+  check_struct_and_union3(double, __bf16, char, 16, 8);
+  check_struct_and_union3(double, __bf16, __bf16, 16, 8);
+  check_struct_and_union3(double, __bf16, int, 16, 8);
+#ifndef __ILP32__
+  check_struct_and_union3(double, __bf16, long, 24, 8);
+#endif
+  check_struct_and_union3(double, __bf16, long long, 24, 8);
+  check_struct_and_union3(double, __bf16, float, 16, 8);
+  check_struct_and_union3(double, __bf16, double, 24, 8);
+  check_struct_and_union3(double, __bf16, long double, 32, 16);
+  check_struct_and_union3(double, int, __bf16, 16, 8);
+#ifndef __ILP32__
+  check_struct_and_union3(double, long, __bf16, 24, 8);
+#endif
+  check_struct_and_union3(double, long long, __bf16, 24, 8);
+  check_struct_and_union3(double, float, __bf16, 16, 8);
+  check_struct_and_union3(double, double, __bf16, 24, 8);
+  check_struct_and_union3(double, long double, __bf16, 48, 16);
+  check_struct_and_union3(long double, char, __bf16, 32, 16);
+  check_struct_and_union3(long double, __bf16, char, 32, 16);
+  check_struct_and_union3(long double, __bf16, __bf16, 32, 16);
+  check_struct_and_union3(long double, __bf16, int, 32, 16);
+  check_struct_and_union3(long double, __bf16, long, 32, 16);
+  check_struct_and_union3(long double, __bf16, long long, 32, 16);
+  check_struct_and_union3(long double, __bf16, float, 32, 16);
+  check_struct_and_union3(long double, __bf16, double, 32, 16);
+  check_struct_and_union3(long double, __bf16, long double, 48, 16);
+  check_struct_and_union3(long double, int, __bf16, 32, 16);
+  check_struct_and_union3(long double, long, __bf16, 32, 16);
+  check_struct_and_union3(long double, long long, __bf16, 32, 16);
+  check_struct_and_union3(long double, float, __bf16, 32, 16);
+  check_struct_and_union3(long double, double, __bf16, 32, 16);
+  check_struct_and_union3(long double, long double, __bf16, 48, 16);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_alignment.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_alignment.c
new file mode 100644
index 00000000000..6490a5228ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_alignment.c
@@ -0,0 +1,14 @@
+/* This checks alignment of basic types.  */
+
+#include "defines.h"
+#include "macros.h"
+
+
+int
+main (void)
+{
+  /* __bf16 type.  */
+  check_align(__bf16, TYPE_ALIGN_BF16);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c
new file mode 100644
index 00000000000..c004c35bb83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c
@@ -0,0 +1,13 @@
+/* This checks size and alignment of arrays of __bf16.  */
+
+#include "defines.h"
+#include "macros.h"
+
+
+int
+main (void)
+{
+  check_array_size_and_align(__bf16, TYPE_SIZE_BF16, TYPE_ALIGN_BF16);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_returning.c
new file mode 100644
index 00000000000..cfea2224733
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_returning.c
@@ -0,0 +1,20 @@
+#include "bf16-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+__bf16
+fun_test_returning_bf16 (void)
+{
+  __bf16 b = make_f32_bf16 (72.0f);
+  volatile_var++;
+  return b;
+}
+
+static void
+do_test (void)
+{
+  __bf16 var = WRAP_RET (fun_test_returning_bf16) ();
+  assert (check_bf16_float (xmm_regs[0].___bf16[0], 72.0f) == 1);
+  assert (check_bf16_float (var, 72.0f) == 1);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_sizes.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_sizes.c
new file mode 100644
index 00000000000..b81a8d971b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_sizes.c
@@ -0,0 +1,14 @@
+/* This checks sizes of basic types.  */
+
+#include "defines.h"
+#include "macros.h"
+
+
+int
+main (void)
+{
+  /* Floating point types.  */
+  check_size(__bf16, TYPE_SIZE_BF16);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c
new file mode 100644
index 00000000000..f282506703c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c
@@ -0,0 +1,14 @@
+/* This checks size and alignment of structs with a single basic type
+   element.  Only __bf16 is checked here.  */
+
+#include "bf16-check.h"
+#include "defines.h"
+#include "macros.h"
+
+
+static void
+do_test (void)
+{
+  /* Floating point types.  */
+  check_basic_struct_size_and_align(__bf16, TYPE_SIZE_BF16, TYPE_ALIGN_BF16);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c
new file mode 100644
index 00000000000..03afa68c0e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c
@@ -0,0 +1,12 @@
+/* Test of simple unions, size and alignment.  */
+
+#include "bf16-check.h"
+#include "defines.h"
+#include "macros.h"
+
+static void
+do_test (void)
+{
+  /* Floating point types.  */
+  check_basic_union_size_and_align(__bf16, TYPE_SIZE_BF16, TYPE_ALIGN_BF16);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_m128_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_m128_returning.c
new file mode 100644
index 00000000000..64857ce7b71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_m128_returning.c
@@ -0,0 +1,38 @@
+#include <stdio.h>
+#include "bf16-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8;
+
+__m128bf16
+fun_test_returning___m128bf16 (void)
+{
+  volatile_var++;
+  return (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
+}
+
+__m128bf16 test_128bf16;
+
+static void
+do_test (void)
+{
+  unsigned failed = 0;
+  XMM_T xmmt1, xmmt2;
+
+  clear_struct_registers;
+  test_128bf16 = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
+  xmmt1._m128bf16[0] = test_128bf16;
+  xmmt2._m128bf16[0] = WRAP_RET (fun_test_returning___m128bf16)();
+  if (xmmt1._longlong[0] != xmmt2._longlong[0]
+      || xmmt1._longlong[0] != xmm_regs[0]._longlong[0])
+    printf ("fail m128bf16\n"), failed++;
+
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_floats.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_floats.c
new file mode 100644
index 00000000000..fe08042286b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_floats.c
@@ -0,0 +1,312 @@
+/* This is an autogenerated file. Do not edit.  */
+
+#include "bf16-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  __bf16 f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14,
+    f15, f16, f17, f18, f19, f20, f21, f22, f23;
+} values___bf16;
+
+void
+fun_check_bf16_passing_8_values (__bf16 f0 ATTRIBUTE_UNUSED,
+				 __bf16 f1 ATTRIBUTE_UNUSED,
+				 __bf16 f2 ATTRIBUTE_UNUSED,
+				 __bf16 f3 ATTRIBUTE_UNUSED,
+				 __bf16 f4 ATTRIBUTE_UNUSED,
+				 __bf16 f5 ATTRIBUTE_UNUSED,
+				 __bf16 f6 ATTRIBUTE_UNUSED,
+				 __bf16 f7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  check_bf16 (values___bf16.f0, f0);
+  check_bf16 (values___bf16.f1, f1);
+  check_bf16 (values___bf16.f2, f2);
+  check_bf16 (values___bf16.f3, f3);
+  check_bf16 (values___bf16.f4, f4);
+  check_bf16 (values___bf16.f5, f5);
+  check_bf16 (values___bf16.f6, f6);
+  check_bf16 (values___bf16.f7, f7);
+}
+
+void
+fun_check_bf16_passing_8_regs (__bf16 f0 ATTRIBUTE_UNUSED,
+			       __bf16 f1 ATTRIBUTE_UNUSED,
+			       __bf16 f2 ATTRIBUTE_UNUSED,
+			       __bf16 f3 ATTRIBUTE_UNUSED,
+			       __bf16 f4 ATTRIBUTE_UNUSED,
+			       __bf16 f5 ATTRIBUTE_UNUSED,
+			       __bf16 f6 ATTRIBUTE_UNUSED,
+			       __bf16 f7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_bf16_arguments;
+}
+
+void
+fun_check_bf16_passing_16_values (__bf16 f0 ATTRIBUTE_UNUSED,
+				  __bf16 f1 ATTRIBUTE_UNUSED,
+				  __bf16 f2 ATTRIBUTE_UNUSED,
+				  __bf16 f3 ATTRIBUTE_UNUSED,
+				  __bf16 f4 ATTRIBUTE_UNUSED,
+				  __bf16 f5 ATTRIBUTE_UNUSED,
+				  __bf16 f6 ATTRIBUTE_UNUSED,
+				  __bf16 f7 ATTRIBUTE_UNUSED,
+				  __bf16 f8 ATTRIBUTE_UNUSED,
+				  __bf16 f9 ATTRIBUTE_UNUSED,
+				  __bf16 f10 ATTRIBUTE_UNUSED,
+				  __bf16 f11 ATTRIBUTE_UNUSED,
+				  __bf16 f12 ATTRIBUTE_UNUSED,
+				  __bf16 f13 ATTRIBUTE_UNUSED,
+				  __bf16 f14 ATTRIBUTE_UNUSED,
+				  __bf16 f15 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  check_bf16 (values___bf16.f0, f0);
+  check_bf16 (values___bf16.f1, f1);
+  check_bf16 (values___bf16.f2, f2);
+  check_bf16 (values___bf16.f3, f3);
+  check_bf16 (values___bf16.f4, f4);
+  check_bf16 (values___bf16.f5, f5);
+  check_bf16 (values___bf16.f6, f6);
+  check_bf16 (values___bf16.f7, f7);
+  check_bf16 (values___bf16.f8, f8);
+  check_bf16 (values___bf16.f9, f9);
+  check_bf16 (values___bf16.f10, f10);
+  check_bf16 (values___bf16.f11, f11);
+  check_bf16 (values___bf16.f12, f12);
+  check_bf16 (values___bf16.f13, f13);
+  check_bf16 (values___bf16.f14, f14);
+  check_bf16 (values___bf16.f15, f15);
+}
+
+void
+fun_check_bf16_passing_16_regs (__bf16 f0 ATTRIBUTE_UNUSED,
+				__bf16 f1 ATTRIBUTE_UNUSED,
+				__bf16 f2 ATTRIBUTE_UNUSED,
+				__bf16 f3 ATTRIBUTE_UNUSED,
+				__bf16 f4 ATTRIBUTE_UNUSED,
+				__bf16 f5 ATTRIBUTE_UNUSED,
+				__bf16 f6 ATTRIBUTE_UNUSED,
+				__bf16 f7 ATTRIBUTE_UNUSED,
+				__bf16 f8 ATTRIBUTE_UNUSED,
+				__bf16 f9 ATTRIBUTE_UNUSED,
+				__bf16 f10 ATTRIBUTE_UNUSED,
+				__bf16 f11 ATTRIBUTE_UNUSED,
+				__bf16 f12 ATTRIBUTE_UNUSED,
+				__bf16 f13 ATTRIBUTE_UNUSED,
+				__bf16 f14 ATTRIBUTE_UNUSED,
+				__bf16 f15 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_bf16_arguments;
+}
+
+void
+fun_check_bf16_passing_20_values (__bf16 f0 ATTRIBUTE_UNUSED,
+				  __bf16 f1 ATTRIBUTE_UNUSED,
+				  __bf16 f2 ATTRIBUTE_UNUSED,
+				  __bf16 f3 ATTRIBUTE_UNUSED,
+				  __bf16 f4 ATTRIBUTE_UNUSED,
+				  __bf16 f5 ATTRIBUTE_UNUSED,
+				  __bf16 f6 ATTRIBUTE_UNUSED,
+				  __bf16 f7 ATTRIBUTE_UNUSED,
+				  __bf16 f8 ATTRIBUTE_UNUSED,
+				  __bf16 f9 ATTRIBUTE_UNUSED,
+				  __bf16 f10 ATTRIBUTE_UNUSED,
+				  __bf16 f11 ATTRIBUTE_UNUSED,
+				  __bf16 f12 ATTRIBUTE_UNUSED,
+				  __bf16 f13 ATTRIBUTE_UNUSED,
+				  __bf16 f14 ATTRIBUTE_UNUSED,
+				  __bf16 f15 ATTRIBUTE_UNUSED,
+				  __bf16 f16 ATTRIBUTE_UNUSED,
+				  __bf16 f17 ATTRIBUTE_UNUSED,
+				  __bf16 f18 ATTRIBUTE_UNUSED,
+				  __bf16 f19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  check_bf16 (values___bf16.f0, f0);
+  check_bf16 (values___bf16.f1, f1);
+  check_bf16 (values___bf16.f2, f2);
+  check_bf16 (values___bf16.f3, f3);
+  check_bf16 (values___bf16.f4, f4);
+  check_bf16 (values___bf16.f5, f5);
+  check_bf16 (values___bf16.f6, f6);
+  check_bf16 (values___bf16.f7, f7);
+  check_bf16 (values___bf16.f8, f8);
+  check_bf16 (values___bf16.f9, f9);
+  check_bf16 (values___bf16.f10, f10);
+  check_bf16 (values___bf16.f11, f11);
+  check_bf16 (values___bf16.f12, f12);
+  check_bf16 (values___bf16.f13, f13);
+  check_bf16 (values___bf16.f14, f14);
+  check_bf16 (values___bf16.f15, f15);
+  check_bf16 (values___bf16.f16, f16);
+  check_bf16 (values___bf16.f17, f17);
+  check_bf16 (values___bf16.f18, f18);
+  check_bf16 (values___bf16.f19, f19);
+}
+
+void
+fun_check_bf16_passing_20_regs (__bf16 f0 ATTRIBUTE_UNUSED,
+				__bf16 f1 ATTRIBUTE_UNUSED,
+				__bf16 f2 ATTRIBUTE_UNUSED,
+				__bf16 f3 ATTRIBUTE_UNUSED,
+				__bf16 f4 ATTRIBUTE_UNUSED,
+				__bf16 f5 ATTRIBUTE_UNUSED,
+				__bf16 f6 ATTRIBUTE_UNUSED,
+				__bf16 f7 ATTRIBUTE_UNUSED,
+				__bf16 f8 ATTRIBUTE_UNUSED,
+				__bf16 f9 ATTRIBUTE_UNUSED,
+				__bf16 f10 ATTRIBUTE_UNUSED,
+				__bf16 f11 ATTRIBUTE_UNUSED,
+				__bf16 f12 ATTRIBUTE_UNUSED,
+				__bf16 f13 ATTRIBUTE_UNUSED,
+				__bf16 f14 ATTRIBUTE_UNUSED,
+				__bf16 f15 ATTRIBUTE_UNUSED,
+				__bf16 f16 ATTRIBUTE_UNUSED,
+				__bf16 f17 ATTRIBUTE_UNUSED,
+				__bf16 f18 ATTRIBUTE_UNUSED,
+				__bf16 f19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_bf16_arguments;
+}
+
+#define def_check_bf16_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6,\
+				   _f7, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7);
+
+#define def_check_bf16_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \
+				    _f7, _f8, _f9, _f10, _f11, _f12, _f13, \
+				    _f14, _f15, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
+		     _f10, _f11, _f12, _f13, _f14, _f15); \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
+		     _f10, _f11, _f12, _f13, _f14, _f15);
+
+#define def_check_bf16_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \
+				    _f7, _f8, _f9, _f10, _f11, _f12, \
+				    _f13, _f14, _f15, _f16, _f17, \
+				    _f18, _f19, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  values_ ## TYPE .f16 = _f16; \
+  values_ ## TYPE .f17 = _f17; \
+  values_ ## TYPE .f18 = _f18; \
+  values_ ## TYPE .f19 = _f19; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, \
+		     _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, \
+		     _f17, _f18, _f19); \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
+		     _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, \
+		     _f18, _f19);
+
+volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8, bf9, bf10,
+		bf11,bf12,bf13,bf14,bf15,bf16,bf17,bf18,bf19,bf20;
+
+void
+test_bf16_on_stack ()
+{
+  def_check_bf16_passing8 (bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+			   fun_check_bf16_passing_8_values,
+			   fun_check_bf16_passing_8_regs, __bf16);
+
+  def_check_bf16_passing16 (bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+			    bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
+			    fun_check_bf16_passing_16_values,
+			    fun_check_bf16_passing_16_regs, __bf16);
+}
+
+void
+test_too_many_bf16 ()
+{
+  def_check_bf16_passing20 (bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8, bf9, bf10,
+			    bf11,bf12,bf13,bf14,bf15,bf16,bf17,bf18,bf19,bf20,
+			    fun_check_bf16_passing_20_values,
+			    fun_check_bf16_passing_20_regs, __bf16);
+}
+
+static void
+do_test (void)
+{
+  test_bf16_on_stack ();
+  test_too_many_bf16 ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_m128.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_m128.c
new file mode 100644
index 00000000000..298b644e93d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_m128.c
@@ -0,0 +1,238 @@
+#include <stdio.h>
+#include "bf16-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  XMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+    i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m128bf16_8_values (__m128bf16 i0 ATTRIBUTE_UNUSED,
+				     __m128bf16 i1 ATTRIBUTE_UNUSED,
+				     __m128bf16 i2 ATTRIBUTE_UNUSED,
+				     __m128bf16 i3 ATTRIBUTE_UNUSED,
+				     __m128bf16 i4 ATTRIBUTE_UNUSED,
+				     __m128bf16 i5 ATTRIBUTE_UNUSED,
+				     __m128bf16 i6 ATTRIBUTE_UNUSED,
+				     __m128bf16 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m128bf16);
+  compare (values.i1, i1, __m128bf16);
+  compare (values.i2, i2, __m128bf16);
+  compare (values.i3, i3, __m128bf16);
+  compare (values.i4, i4, __m128bf16);
+  compare (values.i5, i5, __m128bf16);
+  compare (values.i6, i6, __m128bf16);
+  compare (values.i7, i7, __m128bf16);
+}
+
+void
+fun_check_passing_m128bf16_8_regs (__m128bf16 i0 ATTRIBUTE_UNUSED,
+				   __m128bf16 i1 ATTRIBUTE_UNUSED,
+				   __m128bf16 i2 ATTRIBUTE_UNUSED,
+				   __m128bf16 i3 ATTRIBUTE_UNUSED,
+				   __m128bf16 i4 ATTRIBUTE_UNUSED,
+				   __m128bf16 i5 ATTRIBUTE_UNUSED,
+				   __m128bf16 i6 ATTRIBUTE_UNUSED,
+				   __m128bf16 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m128_arguments;
+}
+
+void
+fun_check_passing_m128bf16_20_values (__m128bf16 i0 ATTRIBUTE_UNUSED,
+				      __m128bf16 i1 ATTRIBUTE_UNUSED,
+				      __m128bf16 i2 ATTRIBUTE_UNUSED,
+				      __m128bf16 i3 ATTRIBUTE_UNUSED,
+				      __m128bf16 i4 ATTRIBUTE_UNUSED,
+				      __m128bf16 i5 ATTRIBUTE_UNUSED,
+				      __m128bf16 i6 ATTRIBUTE_UNUSED,
+				      __m128bf16 i7 ATTRIBUTE_UNUSED,
+				      __m128bf16 i8 ATTRIBUTE_UNUSED,
+				      __m128bf16 i9 ATTRIBUTE_UNUSED,
+				      __m128bf16 i10 ATTRIBUTE_UNUSED,
+				      __m128bf16 i11 ATTRIBUTE_UNUSED,
+				      __m128bf16 i12 ATTRIBUTE_UNUSED,
+				      __m128bf16 i13 ATTRIBUTE_UNUSED,
+				      __m128bf16 i14 ATTRIBUTE_UNUSED,
+				      __m128bf16 i15 ATTRIBUTE_UNUSED,
+				      __m128bf16 i16 ATTRIBUTE_UNUSED,
+				      __m128bf16 i17 ATTRIBUTE_UNUSED,
+				      __m128bf16 i18 ATTRIBUTE_UNUSED,
+				      __m128bf16 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m128bf16);
+  compare (values.i1, i1, __m128bf16);
+  compare (values.i2, i2, __m128bf16);
+  compare (values.i3, i3, __m128bf16);
+  compare (values.i4, i4, __m128bf16);
+  compare (values.i5, i5, __m128bf16);
+  compare (values.i6, i6, __m128bf16);
+  compare (values.i7, i7, __m128bf16);
+  compare (values.i8, i8, __m128bf16);
+  compare (values.i9, i9, __m128bf16);
+  compare (values.i10, i10, __m128bf16);
+  compare (values.i11, i11, __m128bf16);
+  compare (values.i12, i12, __m128bf16);
+  compare (values.i13, i13, __m128bf16);
+  compare (values.i14, i14, __m128bf16);
+  compare (values.i15, i15, __m128bf16);
+  compare (values.i16, i16, __m128bf16);
+  compare (values.i17, i17, __m128bf16);
+  compare (values.i18, i18, __m128bf16);
+  compare (values.i19, i19, __m128bf16);
+}
+
+void
+fun_check_passing_m128bf16_20_regs (__m128bf16 i0 ATTRIBUTE_UNUSED,
+				    __m128bf16 i1 ATTRIBUTE_UNUSED,
+				    __m128bf16 i2 ATTRIBUTE_UNUSED,
+				    __m128bf16 i3 ATTRIBUTE_UNUSED,
+				    __m128bf16 i4 ATTRIBUTE_UNUSED,
+				    __m128bf16 i5 ATTRIBUTE_UNUSED,
+				    __m128bf16 i6 ATTRIBUTE_UNUSED,
+				    __m128bf16 i7 ATTRIBUTE_UNUSED,
+				    __m128bf16 i8 ATTRIBUTE_UNUSED,
+				    __m128bf16 i9 ATTRIBUTE_UNUSED,
+				    __m128bf16 i10 ATTRIBUTE_UNUSED,
+				    __m128bf16 i11 ATTRIBUTE_UNUSED,
+				    __m128bf16 i12 ATTRIBUTE_UNUSED,
+				    __m128bf16 i13 ATTRIBUTE_UNUSED,
+				    __m128bf16 i14 ATTRIBUTE_UNUSED,
+				    __m128bf16 i15 ATTRIBUTE_UNUSED,
+				    __m128bf16 i16 ATTRIBUTE_UNUSED,
+				    __m128bf16 i17 ATTRIBUTE_UNUSED,
+				    __m128bf16 i18 ATTRIBUTE_UNUSED,
+				    __m128bf16 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m128_arguments;
+}
+
+#define def_check_int_passing8(_i0, _i1, _i2, _i3, \
+			       _i4, _i5, _i6, _i7, \
+			       _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
+  clear_float_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
+
+#define def_check_int_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, \
+				_i7, _i8, _i9, _i10, _i11, _i12, _i13, \
+				_i14, _i15, _i16, _i17, _i18, _i19, \
+				_func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  values.i10.TYPE[0] = _i10; \
+  values.i11.TYPE[0] = _i11; \
+  values.i12.TYPE[0] = _i12; \
+  values.i13.TYPE[0] = _i13; \
+  values.i14.TYPE[0] = _i14; \
+  values.i15.TYPE[0] = _i15; \
+  values.i16.TYPE[0] = _i16; \
+  values.i17.TYPE[0] = _i17; \
+  values.i18.TYPE[0] = _i18; \
+  values.i19.TYPE[0] = _i19; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, _i16, \
+		     _i17, _i18, _i19); \
+  clear_float_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, _i16, \
+		     _i17, _i18, _i19);
+
+volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8;
+
+void
+test_m128bf16_on_stack ()
+{
+  __m128bf16 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
+  pass = "m128bf16-8";
+  def_check_int_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			  fun_check_passing_m128bf16_8_values,
+			  fun_check_passing_m128bf16_8_regs, _m128bf16);
+}
+
+void
+test_too_many_m128bf16 ()
+{
+  __m128bf16 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
+  pass = "m128bf16-20";
+  def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			   x[8], x[9], x[10], x[11], x[12], x[13], x[14],
+			   x[15], x[16], x[17], x[18], x[19],
+			   fun_check_passing_m128bf16_20_values,
+			   fun_check_passing_m128bf16_20_regs, _m128bf16);
+}
+
+static void
+do_test (void)
+{
+  test_m128bf16_on_stack ();
+  test_too_many_m128bf16 ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_structs.c
new file mode 100644
index 00000000000..8d966005741
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_structs.c
@@ -0,0 +1,67 @@
+#include "bf16-check.h"
+#include "defines.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+struct m128bf16_struct
+{
+  __m128bf16 x;
+};
+
+struct m128bf16_2_struct
+{
+  __m128bf16 x1, x2;
+};
+
+/* Check that the struct is passed as the individual members in fregs.  */
+void
+check_struct_passing1bf16 (struct m128bf16_struct ms1 ATTRIBUTE_UNUSED,
+			   struct m128bf16_struct ms2 ATTRIBUTE_UNUSED,
+			   struct m128bf16_struct ms3 ATTRIBUTE_UNUSED,
+			   struct m128bf16_struct ms4 ATTRIBUTE_UNUSED,
+			   struct m128bf16_struct ms5 ATTRIBUTE_UNUSED,
+			   struct m128bf16_struct ms6 ATTRIBUTE_UNUSED,
+			   struct m128bf16_struct ms7 ATTRIBUTE_UNUSED,
+			   struct m128bf16_struct ms8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+void
+check_struct_passing2bf16 (struct m128bf16_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+24);
+}
+
+volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
+		bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
+
+static void
+do_test (void)
+{
+  struct m128bf16_struct m128bf16s [8];
+  struct m128bf16_2_struct m128bf16_2s = {
+    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 },
+    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 },
+  };
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      m128bf16s[i].x = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.xmm0)[i]._m128bf16[0] = m128bf16s[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1bf16) (m128bf16s[0], m128bf16s[1], m128bf16s[2], m128bf16s[3],
+					 m128bf16s[4], m128bf16s[5], m128bf16s[6], m128bf16s[7]);
+  WRAP_CALL (check_struct_passing2bf16) (m128bf16_2s);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_unions.c
new file mode 100644
index 00000000000..83e4380512b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_unions.c
@@ -0,0 +1,160 @@
+#include "bf16-check.h"
+#include "defines.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+unsigned int num_fregs, num_iregs;
+
+union un1b
+{
+  __m128bf16 x;
+  float f;
+};
+
+union un1bb
+{
+  __m128bf16 x;
+  __bf16 f;
+};
+
+union un2b
+{
+  __m128bf16 x;
+  double d;
+};
+
+union un3b
+{
+  __m128bf16 x;
+  __m128 v;
+};
+
+union un4b
+{
+  __m128bf16 x;
+  long double ld;
+};
+
+void
+check_union_passing1b (union un1b u1 ATTRIBUTE_UNUSED,
+		       union un1b u2 ATTRIBUTE_UNUSED,
+		       union un1b u3 ATTRIBUTE_UNUSED,
+		       union un1b u4 ATTRIBUTE_UNUSED,
+		       union un1b u5 ATTRIBUTE_UNUSED,
+		       union un1b u6 ATTRIBUTE_UNUSED,
+		       union un1b u7 ATTRIBUTE_UNUSED,
+		       union un1b u8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+void
+check_union_passing1bb (union un1bb u1 ATTRIBUTE_UNUSED,
+		        union un1bb u2 ATTRIBUTE_UNUSED,
+		        union un1bb u3 ATTRIBUTE_UNUSED,
+		        union un1bb u4 ATTRIBUTE_UNUSED,
+		        union un1bb u5 ATTRIBUTE_UNUSED,
+		        union un1bb u6 ATTRIBUTE_UNUSED,
+		        union un1bb u7 ATTRIBUTE_UNUSED,
+		        union un1bb u8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+void
+check_union_passing2b (union un2b u1 ATTRIBUTE_UNUSED,
+		       union un2b u2 ATTRIBUTE_UNUSED,
+		       union un2b u3 ATTRIBUTE_UNUSED,
+		       union un2b u4 ATTRIBUTE_UNUSED,
+		       union un2b u5 ATTRIBUTE_UNUSED,
+		       union un2b u6 ATTRIBUTE_UNUSED,
+		       union un2b u7 ATTRIBUTE_UNUSED,
+		       union un2b u8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+void
+check_union_passing3b (union un3b u1 ATTRIBUTE_UNUSED,
+		       union un3b u2 ATTRIBUTE_UNUSED,
+		       union un3b u3 ATTRIBUTE_UNUSED,
+		       union un3b u4 ATTRIBUTE_UNUSED,
+		       union un3b u5 ATTRIBUTE_UNUSED,
+		       union un3b u6 ATTRIBUTE_UNUSED,
+		       union un3b u7 ATTRIBUTE_UNUSED,
+		       union un3b u8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+void
+check_union_passing4b (union un4b u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+#define check_union_passing1b WRAP_CALL(check_union_passing1b)
+#define check_union_passing1bb WRAP_CALL(check_union_passing1bb)
+#define check_union_passing2b WRAP_CALL(check_union_passing2b)
+#define check_union_passing3b WRAP_CALL(check_union_passing3b)
+#define check_union_passing4b WRAP_CALL(check_union_passing4b)
+
+static void
+do_test (void)
+{
+  union un1b u1b[8];
+  union un1bb u1bb[8];
+  union un2b u2b[8];
+  union un3b u3b[8];
+  union un4b u4b;
+  int i;
+  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8;
+
+  for (i = 0; i < 8; i++)
+    {
+      u1b[i].x = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.xmm0)[i]._m128bf16[0] = u1b[i].x;
+  num_fregs = 8;
+  check_union_passing1b (u1b[0], u1b[1], u1b[2], u1b[3],
+		         u1b[4], u1b[5], u1b[6], u1b[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1bb[i].x = u1b[i].x;
+      (&fregs.xmm0)[i]._m128bf16[0] = u1bb[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1bb (u1bb[0], u1bb[1], u1bb[2], u1bb[3],
+		          u1bb[4], u1bb[5], u1bb[6], u1bb[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2b[i].x = u1b[i].x;
+      (&fregs.xmm0)[i]._m128bf16[0] = u2b[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2b (u2b[0], u2b[1], u2b[2], u2b[3],
+		         u2b[4], u2b[5], u2b[6], u2b[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3b[i].x = u1b[i].x;
+      (&fregs.xmm0)[i]._m128bf16[0] = u3b[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3b (u3b[0], u3b[1], u3b[2], u3b[3],
+		         u3b[4], u3b[5], u3b[6], u3b[7]);
+
+  check_union_passing4b (u4b);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_struct_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_struct_returning.c
new file mode 100644
index 00000000000..757ccc26b79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_struct_returning.c
@@ -0,0 +1,176 @@
+/* This tests returning of structures.  */
+
+#include <stdio.h>
+#include "bf16-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+int current_test;
+int num_failed = 0;
+
+#undef assert
+#define assert(test) do { if (!(test)) {fprintf (stderr, "failed in test %d\n", current_test); num_failed++; } } while (0)
+
+#define xmm0b xmm_regs[0].___bf16
+#define xmm1b xmm_regs[1].___bf16
+#define xmm0f xmm_regs[0]._float
+#define xmm0d xmm_regs[0]._double
+#define xmm1f xmm_regs[1]._float
+#define xmm1d xmm_regs[1]._double
+
+typedef enum {
+  SSE_B = 0,
+  SSE_D,
+  MEM,
+  INT_SSE,
+  SSE_INT,
+  SSE_F_H,
+  SSE_F_H8
+} Type;
+
+/* Structures which should be returned in SSE.  */
+#define D(I,MEMBERS,C,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = C; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s)); B; return s; }
+
+D(120,__bf16 f,SSE_B, s.f=make_f32_bf16(42.0f))
+D(121,__bf16 f;__bf16 f2,SSE_B, s.f=make_f32_bf16(42.0f))
+D(122,__bf16 f;float d,SSE_B, s.f=make_f32_bf16(42.0f))
+D(123,__bf16 f;double d,SSE_B, s.f=make_f32_bf16(42.0f))
+D(124,double d; __bf16 f,SSE_D, s.d=42)
+D(125,__bf16 f[2],SSE_B, s.f[0]=make_f32_bf16(42.0f))
+D(126,__bf16 f[3],SSE_B, s.f[0]=make_f32_bf16(42.0f))
+D(127,__bf16 f[4],SSE_B, s.f[0]=make_f32_bf16(42.0f))
+D(128,__bf16 f[2]; double d,SSE_B, s.f[0]=make_f32_bf16(42.0f))
+D(129,double d;__bf16 f[2],SSE_D, s.d=42)
+
+#undef D
+
+#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = INT_SSE; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s = { 42, make_f32_bf16(43.0f) }; return s; }
+
+D(310,char m1; __bf16 m2)
+D(311,short m1; __bf16 m2)
+D(312,int m1; __bf16 m2)
+D(313,long long m1; __bf16 m2)
+
+#undef D
+
+void check_300 (void)
+{
+  XMM_T x;
+  x._ulonglong[0] = rax;
+  switch (current_test) {
+    case 310: assert ((rax & 0xff) == 42
+		      && check_bf16_float (x.___bf16[1], 43.0f) == 1); break;
+    case 311: assert ((rax & 0xffff) == 42
+		      && check_bf16_float (x.___bf16[1], 43.0f) == 1); break;
+    case 312: assert ((rax & 0xffffffff) == 42
+		      && check_bf16_float (x.___bf16[2], 43.0f) == 1); break;
+    case 313: assert (rax == 42
+		      && check_bf16_float (xmm0b[0], 43.0f) == 1); break;
+
+    default: assert (0); break;
+  }
+}
+
+/* Structures which should be returned in SSE (low) and INT (high).  */
+#define D(I,MEMBERS,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = SSE_INT; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s));  B; return s; }
+
+D(402,__bf16 f[4];char c, s.f[0]=make_f32_bf16(42.0f); s.c=43)
+
+#undef D
+
+void check_400 (void)
+{
+  switch (current_test) {
+    case 402: assert (check_bf16_float (xmm0b[0], 42.0f) == 1 && (rax & 0xff) == 43); break;
+
+    default: assert (0); break;
+  }
+}
+
+/* Structures which should be returned in MEM.  */
+void *struct_addr;
+#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = MEM; \
+struct S_ ## I f_ ## I (void) { union {unsigned char c; struct S_ ## I s;} u; memset (&u.s, 0, sizeof(u.s)); u.c = 42; return u.s; }
+
+/* Unnaturally aligned members.  */
+D(540,__bf16 m1[10])
+D(541,char m1[1];__bf16 f[8])
+
+#undef D
+
+
+/* Special tests.  */
+#define D(I,MEMBERS,C,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = C; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; B; return s; }
+D(601,__bf16 f[4], SSE_F_H, s.f[0] = s.f[1] = s.f[2] = s.f[3] = make_f32_bf16 (42.0f))
+D(602,__bf16 f[8], SSE_F_H8,
+  s.f[0] = s.f[1] = s.f[2] = s.f[3] = s.f[4] = s.f[5] = s.f[6] = s.f[7] = make_f32_bf16 (42.0f))
+#undef D
+
+void clear_all (void)
+{
+  clear_int_registers;
+}
+
+void check_all (Type class, unsigned long size)
+{
+  switch (class) {
+    case SSE_B: assert (check_bf16_float (xmm0b[0], 42.0f) == 1); break;
+    case SSE_D: assert (xmm0d[0] == 42); break;
+    case SSE_F_H: assert (check_bf16_float (xmm0b[0], 42) == 1
+			  && check_bf16_float (xmm0b[1], 42) == 1
+			  && check_bf16_float (xmm0b[2], 42) == 1
+			  && check_bf16_float (xmm0b[3], 42) == 1); break;
+    case SSE_F_H8: assert (check_bf16_float (xmm0b[0], 42) == 1
+			   && check_bf16_float (xmm0b[1], 42) == 1
+			   && check_bf16_float (xmm0b[2], 42) == 1
+			   && check_bf16_float (xmm0b[3], 42) == 1
+			   && check_bf16_float (xmm1b[0], 42) == 1
+			   && check_bf16_float (xmm1b[1], 42) == 1
+			   && check_bf16_float (xmm1b[2], 42) == 1
+			   && check_bf16_float (xmm1b[3], 42) == 1); break;
+    case INT_SSE: check_300(); break;
+    case SSE_INT: check_400(); break;
+    /* Ideally we would like to check that rax == struct_addr.
+       Unfortunately the address of the target struct escapes (for setting
+       struct_addr), so the return struct is a temporary one whose address
+       is given to the f_* functions, otherwise a conforming program
+       could notice the struct changing already before the function returns.
+       This temporary struct could be anywhere.  For GCC it will be on
+       stack, but no one is forbidding that it could be a static variable
+       if there's no threading or proper locking.  In practice, any sane
+       implementation will use the stack for that.  */
+    case MEM: assert (*(unsigned char*)struct_addr == 42 && rdi == rax); break;
+  }
+}
+
+#define D(I) { struct S_ ## I s; current_test = I; struct_addr = (void*)&s; \
+  clear_all(); \
+  s = WRAP_RET(f_ ## I) (); \
+  check_all(class_ ## I, sizeof(s)); \
+}
+
+static void
+do_test (void)
+{
+  D(120) D(121) D(122) D(123) D(124) D(125) D(126) D(127) D(128) D(129)
+
+  D(310) D(311) D(312) D(313)
+
+  D(402)
+
+  D(540) D(541)
+
+  D(601) D(602)
+  if (num_failed)
+    abort ();
+}
+#undef D
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_varargs-m128.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_varargs-m128.c
new file mode 100644
index 00000000000..4eea7eb7d3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_varargs-m128.c
@@ -0,0 +1,111 @@
+/* Test variable number of 128-bit vector arguments passed to functions.  */
+
+#include <stdio.h>
+#include "bf16-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct FloatRegisters fregs;
+struct IntegerRegisters iregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  XMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m128bf16_varargs (__m128bf16 i0, __m128bf16 i1, __m128bf16 i2,
+				 __m128bf16 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m128bf16 *argp;
+
+  compare (values.i0, i0, __m128bf16);
+  compare (values.i1, i1, __m128bf16);
+  compare (values.i2, i2, __m128bf16);
+  compare (values.i3, i3, __m128bf16);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m128bf16 *) (((char *) fp) + 8);
+
+  /* Check __m128bf16 arguments passed on stack.  */
+  compare (values.i8, argp[0], __m128bf16);
+  compare (values.i9, argp[1], __m128bf16);
+
+  /* Check register contents.  */
+  compare (fregs.xmm0, xmm_regs[0], __m128bf16);
+  compare (fregs.xmm1, xmm_regs[1], __m128bf16);
+  compare (fregs.xmm2, xmm_regs[2], __m128bf16);
+  compare (fregs.xmm3, xmm_regs[3], __m128bf16);
+  compare (fregs.xmm4, xmm_regs[4], __m128bf16);
+  compare (fregs.xmm5, xmm_regs[5], __m128bf16);
+  compare (fregs.xmm6, xmm_regs[6], __m128bf16);
+  compare (fregs.xmm7, xmm_regs[7], __m128bf16);
+}
+
+#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
+				      _i6, _i7, _i8, _i9, \
+				      _func, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  clear_float_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
+
+void
+test_m128bf16_varargs (void)
+{
+  __m128bf16 x[10];
+  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8;
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
+  pass = "m128bf16-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m128bf16_varargs,
+				 _m128bf16);
+}
+
+static void
+do_test (void)
+{
+  test_m128bf16_varargs ();
+  if (failed)
+    abort ();
+}
-- 
2.18.1
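
For readers following check_300 in test_struct_returning.c: the element
indices used there (x.___bf16[1] for tests 310 and 311, x.___bf16[2] for
312, and xmm0b[0] for 313) follow from where the __bf16 member lands
inside the struct. Below is a minimal sketch of that arithmetic, not
part of the patch. It assumes uint16_t can stand in for __bf16 (same
2-byte size and alignment) so it builds without __bf16 support; the
names bf16_bits, s310..s313, eightbyte_of, bf16_slot_of and
check_layout_sketch are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: uint16_t has the same size and alignment as __bf16.  */
typedef uint16_t bf16_bits;

/* Mirrors of the member lists used by D(310) .. D(313).  */
struct s310 { char m1; bf16_bits m2; };
struct s311 { short m1; bf16_bits m2; };
struct s312 { int m1; bf16_bits m2; };
struct s313 { long long m1; bf16_bits m2; };

/* Which 8-byte chunk ("eightbyte") of the struct holds this offset.  */
static unsigned
eightbyte_of (size_t offset)
{
  return (unsigned) (offset / 8);
}

/* Index of the 16-bit element within its eightbyte.  */
static unsigned
bf16_slot_of (size_t offset)
{
  return (unsigned) ((offset % 8) / sizeof (bf16_bits));
}

/* Checks the layouts that the indices in check_300 rely on.  */
static void
check_layout_sketch (void)
{
  /* char or short first: m2 sits at byte 2, element 1 of eightbyte 0.  */
  assert (eightbyte_of (offsetof (struct s310, m2)) == 0);
  assert (bf16_slot_of (offsetof (struct s310, m2)) == 1);
  assert (bf16_slot_of (offsetof (struct s311, m2)) == 1);
  /* int first: m2 sits at byte 4, element 2 of eightbyte 0.  */
  assert (bf16_slot_of (offsetof (struct s312, m2)) == 2);
  /* long long first: m2 starts the second eightbyte, element 0.  */
  assert (eightbyte_of (offsetof (struct s313, m2)) == 1);
  assert (bf16_slot_of (offsetof (struct s313, m2)) == 0);
}
```

Calling check_layout_sketch () from a main function should pass silently
on x86-64. In the first three cases the whole struct fits in one
eightbyte that also holds an integer member, which is why the test reads
the value back out of rax; in the last case the __bf16 member has an
eightbyte to itself, which is why the test looks in xmm0 instead.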



* Re: [PATCH] Add ABI test for __bf16 type
  2022-08-18  7:34 ` [PATCH] Add ABI test for " Haochen Jiang
@ 2022-08-19  0:58   ` Hongtao Liu
  2022-08-19 17:30     ` H.J. Lu
  0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2022-08-19  0:58 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: gcc-patches, hongtao.liu

On Thu, Aug 18, 2022 at 3:36 PM Haochen Jiang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi all,
>
> This patch adds ABI tests for bf16 now that support for the __bf16 type is complete.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Ok.
>
> BRs,
> Haochen
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/x86_64/abi/bf16/abi-bf16.exp: New test.
>         * gcc.target/x86_64/abi/bf16/args.h: Ditto.
>         * gcc.target/x86_64/abi/bf16/asm-support.S: Ditto.
>         * gcc.target/x86_64/abi/bf16/bf16-check.h: Ditto.
>         * gcc.target/x86_64/abi/bf16/bf16-helper.h: Ditto.
>         * gcc.target/x86_64/abi/bf16/defines.h: Ditto.
>         * gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp: Ditto.
>         * gcc.target/x86_64/abi/bf16/m256bf16/args.h: Ditto.
>         * gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S: Ditto.
>         * gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h: Ditto.
>         * gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp: Ditto.
>         * gcc.target/x86_64/abi/bf16/m512bf16/args.h: Ditto.
>         * gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S: Ditto.
>         * gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h: Ditto.
>         * gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/macros.h: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_basic_alignment.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_basic_returning.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_basic_sizes.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_m128_returning.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_passing_floats.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_passing_m128.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_passing_structs.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_passing_unions.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_struct_returning.c: Ditto.
>         * gcc.target/x86_64/abi/bf16/test_varargs-m128.c: Ditto.
> ---
>  .../gcc.target/x86_64/abi/bf16/abi-bf16.exp   |  46 +++
>  .../gcc.target/x86_64/abi/bf16/args.h         | 164 +++++++++
>  .../gcc.target/x86_64/abi/bf16/asm-support.S  |  84 +++++
>  .../gcc.target/x86_64/abi/bf16/bf16-check.h   |  24 ++
>  .../gcc.target/x86_64/abi/bf16/bf16-helper.h  |  41 +++
>  .../gcc.target/x86_64/abi/bf16/defines.h      | 163 +++++++++
>  .../x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp |  46 +++
>  .../x86_64/abi/bf16/m256bf16/args.h           | 152 +++++++++
>  .../x86_64/abi/bf16/m256bf16/asm-support.S    |  84 +++++
>  .../x86_64/abi/bf16/m256bf16/bf16-ymm-check.h |  24 ++
>  .../abi/bf16/m256bf16/test_m256_returning.c   |  38 +++
>  .../abi/bf16/m256bf16/test_passing_m256.c     | 235 +++++++++++++
>  .../abi/bf16/m256bf16/test_passing_structs.c  |  69 ++++
>  .../abi/bf16/m256bf16/test_passing_unions.c   | 179 ++++++++++
>  .../abi/bf16/m256bf16/test_varargs-m256.c     | 107 ++++++
>  .../x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp |  46 +++
>  .../x86_64/abi/bf16/m512bf16/args.h           | 155 +++++++++
>  .../x86_64/abi/bf16/m512bf16/asm-support.S    | 100 ++++++
>  .../x86_64/abi/bf16/m512bf16/bf16-zmm-check.h |  23 ++
>  .../abi/bf16/m512bf16/test_m512_returning.c   |  44 +++
>  .../abi/bf16/m512bf16/test_passing_m512.c     | 243 ++++++++++++++
>  .../abi/bf16/m512bf16/test_passing_structs.c  |  77 +++++
>  .../abi/bf16/m512bf16/test_passing_unions.c   | 222 +++++++++++++
>  .../abi/bf16/m512bf16/test_varargs-m512.c     | 111 +++++++
>  .../gcc.target/x86_64/abi/bf16/macros.h       |  53 +++
>  .../bf16/test_3_element_struct_and_unions.c   | 214 ++++++++++++
>  .../x86_64/abi/bf16/test_basic_alignment.c    |  14 +
>  .../bf16/test_basic_array_size_and_align.c    |  13 +
>  .../x86_64/abi/bf16/test_basic_returning.c    |  20 ++
>  .../x86_64/abi/bf16/test_basic_sizes.c        |  14 +
>  .../bf16/test_basic_struct_size_and_align.c   |  14 +
>  .../bf16/test_basic_union_size_and_align.c    |  12 +
>  .../x86_64/abi/bf16/test_m128_returning.c     |  38 +++
>  .../x86_64/abi/bf16/test_passing_floats.c     | 312 ++++++++++++++++++
>  .../x86_64/abi/bf16/test_passing_m128.c       | 238 +++++++++++++
>  .../x86_64/abi/bf16/test_passing_structs.c    |  67 ++++
>  .../x86_64/abi/bf16/test_passing_unions.c     | 160 +++++++++
>  .../x86_64/abi/bf16/test_struct_returning.c   | 176 ++++++++++
>  .../x86_64/abi/bf16/test_varargs-m128.c       | 111 +++++++
>  39 files changed, 3933 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/abi-bf16.exp
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/args.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/asm-support.S
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-check.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/defines.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/args.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/args.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/macros.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_alignment.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_sizes.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_m128_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_floats.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_m128.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_structs.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_unions.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_struct_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/bf16/test_varargs-m128.c
>
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/abi-bf16.exp b/gcc/testsuite/gcc.target/x86_64/abi/bf16/abi-bf16.exp
> new file mode 100644
> index 00000000000..bd386f2a560
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/abi-bf16.exp
> @@ -0,0 +1,46 @@
> +# Copyright (C) 2022 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# The x86-64 ABI testsuite needs one additional assembler file for most
> +# testcases.  For simplicity we will just link it into each test.
> +
> +load_lib c-torture.exp
> +load_lib target-supports.exp
> +load_lib torture-options.exp
> +load_lib clearcap.exp
> +
> +if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
> +     || ![is-effective-target lp64]
> +     || ![is-effective-target sse2] } then {
> +  return
> +}
> +
> +
> +torture-init
> +clearcap-init
> +set-torture-options $C_TORTURE_OPTIONS
> +set additional_flags "-W -Wall -msse2"
> +
> +foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
> +    if {[runtest_file_p $runtests $src]} {
> +        c-torture-execute [list $src \
> +                                $srcdir/$subdir/asm-support.S] \
> +                                $additional_flags
> +    }
> +}
> +
> +clearcap-finish
> +torture-finish
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/args.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/args.h
> new file mode 100644
> index 00000000000..11d7e2b3a1c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/args.h
> @@ -0,0 +1,164 @@
> +#ifndef INCLUDED_ARGS_H
> +#define INCLUDED_ARGS_H
> +
> +#include <string.h>
> +
> +/* This defines the calling sequences for integers and floats.  */
> +#define I0 rdi
> +#define I1 rsi
> +#define I2 rdx
> +#define I3 rcx
> +#define I4 r8
> +#define I5 r9
> +#define F0 xmm0
> +#define F1 xmm1
> +#define F2 xmm2
> +#define F3 xmm3
> +#define F4 xmm4
> +#define F5 xmm5
> +#define F6 xmm6
> +#define F7 xmm7
> +
> +typedef union {
> +  __bf16 ___bf16[8];
> +  float _float[4];
> +  double _double[2];
> +  long long _longlong[2];
> +  int _int[4];
> +  ulonglong _ulonglong[2];
> +#ifdef CHECK_M64_M128
> +  __m64 _m64[2];
> +  __m128 _m128[1];
> +  __m128bf16 _m128bf16[1];
> +#endif
> +} XMM_T;
> +
> +typedef union {
> +  __bf16 ___bf16;
> +  float _float;
> +  double _double;
> +  ldouble _ldouble;
> +  ulonglong _ulonglong[2];
> +} X87_T;
> +extern void (*callthis)(void);
> +extern unsigned long long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
> +XMM_T xmm_regs[16];
> +X87_T x87_regs[8];
> +extern volatile unsigned long long volatile_var;
> +extern void snapshot (void);
> +extern void snapshot_ret (void);
> +#define WRAP_CALL(N) \
> +  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
> +#define WRAP_RET(N) \
> +  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
> +
> +/* Clear all integer registers.  */
> +#define clear_int_hardware_registers \
> +  asm __volatile__ ("xor %%rax, %%rax\n\t" \
> +                   "xor %%rbx, %%rbx\n\t" \
> +                   "xor %%rcx, %%rcx\n\t" \
> +                   "xor %%rdx, %%rdx\n\t" \
> +                   "xor %%rsi, %%rsi\n\t" \
> +                   "xor %%rdi, %%rdi\n\t" \
> +                   "xor %%r8, %%r8\n\t" \
> +                   "xor %%r9, %%r9\n\t" \
> +                   "xor %%r10, %%r10\n\t" \
> +                   "xor %%r11, %%r11\n\t" \
> +                   "xor %%r12, %%r12\n\t" \
> +                   "xor %%r13, %%r13\n\t" \
> +                   "xor %%r14, %%r14\n\t" \
> +                   "xor %%r15, %%r15\n\t" \
> +                   ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
> +                   "r9", "r10", "r11", "r12", "r13", "r14", "r15");
> +
> +/* This is the list of registers available for passing arguments. Not all of
> +   these are used or even really available.  */
> +struct IntegerRegisters
> +{
> +  unsigned long long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
> +};
> +struct FloatRegisters
> +{
> +  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
> +  ldouble st0, st1, st2, st3, st4, st5, st6, st7;
> +  XMM_T xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8, xmm9,
> +        xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
> +};
> +
> +/* Implemented in scalarargs.c  */
> +extern struct IntegerRegisters iregs;
> +extern struct FloatRegisters fregs;
> +extern unsigned int num_iregs, num_fregs;
> +
> +/* Clear register struct.  */
> +#define clear_struct_registers \
> +  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
> +    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
> +  memset (&iregs, 0, sizeof (iregs)); \
> +  memset (&fregs, 0, sizeof (fregs)); \
> +  memset (xmm_regs, 0, sizeof (xmm_regs)); \
> +  memset (x87_regs, 0, sizeof (x87_regs));
> +
> +/* Clear both hardware and register structs for integers.  */
> +#define clear_int_registers \
> +  clear_struct_registers \
> +  clear_int_hardware_registers
> +
> +/* Do the checking.  */
> +#define check_f_arguments(T) do { \
> +  assert (num_fregs <= 0 || check_bf16 (fregs.xmm0._ ## T [0], xmm_regs[0]._ ## T [0]) == 1); \
> +  assert (num_fregs <= 1 || check_bf16 (fregs.xmm1._ ## T [0], xmm_regs[1]._ ## T [0]) == 1); \
> +  assert (num_fregs <= 2 || check_bf16 (fregs.xmm2._ ## T [0], xmm_regs[2]._ ## T [0]) == 1); \
> +  assert (num_fregs <= 3 || check_bf16 (fregs.xmm3._ ## T [0], xmm_regs[3]._ ## T [0]) == 1); \
> +  assert (num_fregs <= 4 || check_bf16 (fregs.xmm4._ ## T [0], xmm_regs[4]._ ## T [0]) == 1); \
> +  assert (num_fregs <= 5 || check_bf16 (fregs.xmm5._ ## T [0], xmm_regs[5]._ ## T [0]) == 1); \
> +  assert (num_fregs <= 6 || check_bf16 (fregs.xmm6._ ## T [0], xmm_regs[6]._ ## T [0]) == 1); \
> +  assert (num_fregs <= 7 || check_bf16 (fregs.xmm7._ ## T [0], xmm_regs[7]._ ## T [0]) == 1); \
> +  } while (0)
> +
> +#define check_bf16_arguments check_f_arguments(__bf16)
> +
> +#define check_vector_arguments(T,O) do { \
> +  assert (num_fregs <= 0 \
> +         || memcmp (((char *) &fregs.xmm0) + (O), \
> +                    &xmm_regs[0], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 1 \
> +         || memcmp (((char *) &fregs.xmm1) + (O), \
> +                    &xmm_regs[1], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 2 \
> +         || memcmp (((char *) &fregs.xmm2) + (O), \
> +                    &xmm_regs[2], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 3 \
> +         || memcmp (((char *) &fregs.xmm3) + (O), \
> +                    &xmm_regs[3], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 4 \
> +         || memcmp (((char *) &fregs.xmm4) + (O), \
> +                    &xmm_regs[4], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 5 \
> +         || memcmp (((char *) &fregs.xmm5) + (O), \
> +                    &xmm_regs[5], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 6 \
> +         || memcmp (((char *) &fregs.xmm6) + (O), \
> +                    &xmm_regs[6], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 7 \
> +         || memcmp (((char *) &fregs.xmm7) + (O), \
> +                    &xmm_regs[7], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  } while (0)
> +
> +#define check_m128_arguments check_vector_arguments(m128, 0)
> +
> +#define clear_float_registers \
> +  clear_struct_registers
> +
> +#define clear_x87_registers \
> +  clear_struct_registers
> +
> +#endif /* INCLUDED_ARGS_H  */
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/bf16/asm-support.S
> new file mode 100644
> index 00000000000..a8165d86317
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/asm-support.S
> @@ -0,0 +1,84 @@
> +       .text
> +       .p2align 4,,15
> +.globl snapshot
> +       .type   snapshot, @function
> +snapshot:
> +.LFB3:
> +       movq    %rax, rax(%rip)
> +       movq    %rbx, rbx(%rip)
> +       movq    %rcx, rcx(%rip)
> +       movq    %rdx, rdx(%rip)
> +       movq    %rdi, rdi(%rip)
> +       movq    %rsi, rsi(%rip)
> +       movq    %rbp, rbp(%rip)
> +       movq    %rsp, rsp(%rip)
> +       movq    %r8, r8(%rip)
> +       movq    %r9, r9(%rip)
> +       movq    %r10, r10(%rip)
> +       movq    %r11, r11(%rip)
> +       movq    %r12, r12(%rip)
> +       movq    %r13, r13(%rip)
> +       movq    %r14, r14(%rip)
> +       movq    %r15, r15(%rip)
> +       vmovdqu %xmm0, xmm_regs+0(%rip)
> +       vmovdqu %xmm1, xmm_regs+16(%rip)
> +       vmovdqu %xmm2, xmm_regs+32(%rip)
> +       vmovdqu %xmm3, xmm_regs+48(%rip)
> +       vmovdqu %xmm4, xmm_regs+64(%rip)
> +       vmovdqu %xmm5, xmm_regs+80(%rip)
> +       vmovdqu %xmm6, xmm_regs+96(%rip)
> +       vmovdqu %xmm7, xmm_regs+112(%rip)
> +       vmovdqu %xmm8, xmm_regs+128(%rip)
> +       vmovdqu %xmm9, xmm_regs+144(%rip)
> +       vmovdqu %xmm10, xmm_regs+160(%rip)
> +       vmovdqu %xmm11, xmm_regs+176(%rip)
> +       vmovdqu %xmm12, xmm_regs+192(%rip)
> +       vmovdqu %xmm13, xmm_regs+208(%rip)
> +       vmovdqu %xmm14, xmm_regs+224(%rip)
> +       vmovdqu %xmm15, xmm_regs+240(%rip)
> +       jmp     *callthis(%rip)
> +.LFE3:
> +       .size   snapshot, .-snapshot
> +
> +       .p2align 4,,15
> +.globl snapshot_ret
> +       .type   snapshot_ret, @function
> +snapshot_ret:
> +       movq    %rdi, rdi(%rip)
> +       subq    $8, %rsp
> +       call    *callthis(%rip)
> +       addq    $8, %rsp
> +       movq    %rax, rax(%rip)
> +       movq    %rdx, rdx(%rip)
> +       vmovdqu %xmm0, xmm_regs+0(%rip)
> +       vmovdqu %xmm1, xmm_regs+16(%rip)
> +       fstpt   x87_regs(%rip)
> +       fstpt   x87_regs+16(%rip)
> +       fldt    x87_regs+16(%rip)
> +       fldt    x87_regs(%rip)
> +       ret
> +       .size   snapshot_ret, .-snapshot_ret
> +
> +       .comm   callthis,8,8
> +       .comm   rax,8,8
> +       .comm   rbx,8,8
> +       .comm   rcx,8,8
> +       .comm   rdx,8,8
> +       .comm   rsi,8,8
> +       .comm   rdi,8,8
> +       .comm   rsp,8,8
> +       .comm   rbp,8,8
> +       .comm   r8,8,8
> +       .comm   r9,8,8
> +       .comm   r10,8,8
> +       .comm   r11,8,8
> +       .comm   r12,8,8
> +       .comm   r13,8,8
> +       .comm   r14,8,8
> +       .comm   r15,8,8
> +       .comm   xmm_regs,256,32
> +       .comm   x87_regs,128,32
> +       .comm   volatile_var,8,8
> +#ifdef __linux__
> +       .section        .note.GNU-stack,"",@progbits
> +#endif
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-check.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-check.h
> new file mode 100644
> index 00000000000..25448fc6863
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-check.h
> @@ -0,0 +1,24 @@
> +#include <stdlib.h>
> +#include "bf16-helper.h"
> +
> +static void do_test (void);
> +
> +int
> +main ()
> +{
> +
> +  if (__builtin_cpu_supports ("sse2"))
> +    {
> +      do_test ();
> +#ifdef DEBUG
> +      printf ("PASSED\n");
> +#endif
> +      return 0;
> +    }
> +
> +#ifdef DEBUG
> +  printf ("SKIPPED\n");
> +#endif
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h
> new file mode 100644
> index 00000000000..83d89fcf62c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h
> @@ -0,0 +1,41 @@
> +typedef union
> +{
> +  float f;
> +  unsigned int u;
> +  __bf16 b[2];
> +} unionf_b;
> +
> +static __bf16 make_f32_bf16 (float f)
> +{
> +  unionf_b tmp;
> +  tmp.f = f;
> +  return tmp.b[1];
> +}
> +
> +static float make_bf16_f32 (__bf16 bf)
> +{
> +  unionf_b tmp;
> +  tmp.u = 0;
> +  tmp.b[1] = bf;
> +  return tmp.f;
> +}
> +
> +static int check_bf16 (__bf16 bf1, __bf16 bf2)
> +{
> +  unionf_b tmp1, tmp2;
> +  tmp1.u = 0;
> +  tmp2.u = 0;
> +  tmp1.b[1] = bf1;
> +  tmp2.b[1] = bf2;
> +  return (tmp1.u == tmp2.u);
> +}
> +
> +static int check_bf16_float (__bf16 bf, float f)
> +{
> +  unionf_b tmp1, tmp2;
> +  tmp1.u = 0;
> +  tmp1.b[0] = bf;
> +  tmp2.f = f;
> +  tmp2.u >>= 16;
> +  return (tmp1.u == tmp2.u);
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/defines.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/defines.h
> new file mode 100644
> index 00000000000..a4df0b0528d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/defines.h
> @@ -0,0 +1,163 @@
> +#ifndef DEFINED_DEFINES_H
> +#define DEFINED_DEFINES_H
> +
> +/* Get __m64 and __m128. */
> +#include <immintrin.h>
> +
> +typedef unsigned long long ulonglong;
> +typedef long double ldouble;
> +
> +/* These defines determine what part of the test should be run.  When
> +   GCC implements these parts, the defines should be uncommented to
> +   enable testing.  */
> +
> +/* Scalar type __int128.  */
> +/* #define CHECK_INT128 */
> +
> +/* Scalar type long double.  */
> +#define CHECK_LONG_DOUBLE
> +
> +/* Scalar type __float128.  */
> +/* #define CHECK_FLOAT128 */
> +
> +/* Scalar types __m64 and __m128.  */
> +#define CHECK_M64_M128
> +
> +/* Structs with size >= 16.  */
> +#define CHECK_LARGER_STRUCTS
> +
> +/* Checks for passing floats and doubles.  */
> +#define CHECK_FLOAT_DOUBLE_PASSING
> +
> +/* Union passing with not-extremely-simple unions.  */
> +#define CHECK_LARGER_UNION_PASSING
> +
> +/* Variable args.  */
> +#define CHECK_VARARGS
> +
> +/* Check argument passing and returning for scalar types with sizeof = 16.  */
> +/* TODO: Implement these tests. Don't activate them for now.  */
> +#define CHECK_LARGE_SCALAR_PASSING
> +
> +/* Defines for sizing and alignment.  */
> +
> +#define TYPE_SIZE_CHAR         1
> +#define TYPE_SIZE_SHORT        2
> +#define TYPE_SIZE_INT          4
> +#ifdef __ILP32__
> +# define TYPE_SIZE_LONG        4
> +#else
> +# define TYPE_SIZE_LONG        8
> +#endif
> +#define TYPE_SIZE_LONG_LONG    8
> +#define TYPE_SIZE_INT128       16
> +#define TYPE_SIZE_BF16         2
> +#define TYPE_SIZE_FLOAT        4
> +#define TYPE_SIZE_DOUBLE       8
> +#define TYPE_SIZE_LONG_DOUBLE  16
> +#define TYPE_SIZE_FLOAT128     16
> +#define TYPE_SIZE_M64          8
> +#define TYPE_SIZE_M128         16
> +#define TYPE_SIZE_ENUM         4
> +#ifdef __ILP32__
> +# define TYPE_SIZE_POINTER     4
> +#else
> +# define TYPE_SIZE_POINTER     8
> +#endif
> +
> +#define TYPE_ALIGN_CHAR        1
> +#define TYPE_ALIGN_SHORT       2
> +#define TYPE_ALIGN_INT         4
> +#ifdef __ILP32__
> +# define TYPE_ALIGN_LONG       4
> +#else
> +# define TYPE_ALIGN_LONG       8
> +#endif
> +#define TYPE_ALIGN_LONG_LONG   8
> +#define TYPE_ALIGN_INT128      16
> +#define TYPE_ALIGN_BF16        2
> +#define TYPE_ALIGN_FLOAT       4
> +#define TYPE_ALIGN_DOUBLE      8
> +#define TYPE_ALIGN_LONG_DOUBLE 16
> +#define TYPE_ALIGN_FLOAT128    16
> +#define TYPE_ALIGN_M64         8
> +#define TYPE_ALIGN_M128        16
> +#define TYPE_ALIGN_ENUM        4
> +#ifdef __ILP32__
> +# define TYPE_ALIGN_POINTER    4
> +#else
> +# define TYPE_ALIGN_POINTER    8
> +#endif
> +
> +/* These defines control the building of the list of types to check. There
> +   is a string identifying the type (with a comma after), a size of the type
> +   (also with a comma and an integer for adding to the total amount of types)
> +   and an alignment of the type (which is currently not really needed since
> +   the abi specifies that alignof == sizeof for all scalar types).  */
> +#ifdef CHECK_INT128
> +#define CI128_STR "__int128",
> +#define CI128_SIZ TYPE_SIZE_INT128,
> +#define CI128_ALI TYPE_ALIGN_INT128,
> +#define CI128_RET "???",
> +#else
> +#define CI128_STR
> +#define CI128_SIZ
> +#define CI128_ALI
> +#define CI128_RET
> +#endif
> +#ifdef CHECK_LONG_DOUBLE
> +#define CLD_STR "long double",
> +#define CLD_SIZ TYPE_SIZE_LONG_DOUBLE,
> +#define CLD_ALI TYPE_ALIGN_LONG_DOUBLE,
> +#define CLD_RET "x87_regs[0]._ldouble",
> +#else
> +#define CLD_STR
> +#define CLD_SIZ
> +#define CLD_ALI
> +#define CLD_RET
> +#endif
> +#ifdef CHECK_FLOAT128
> +#define CF128_STR "__float128",
> +#define CF128_SIZ TYPE_SIZE_FLOAT128,
> +#define CF128_ALI TYPE_ALIGN_FLOAT128,
> +#define CF128_RET "???",
> +#else
> +#define CF128_STR
> +#define CF128_SIZ
> +#define CF128_ALI
> +#define CF128_RET
> +#endif
> +#ifdef CHECK_M64_M128
> +#define CMM_STR "__m64", "__m128",
> +#define CMM_SIZ TYPE_SIZE_M64, TYPE_SIZE_M128,
> +#define CMM_ALI TYPE_ALIGN_M64, TYPE_ALIGN_M128,
> +#define CMM_RET "???", "???",
> +#else
> +#define CMM_STR
> +#define CMM_SIZ
> +#define CMM_ALI
> +#define CMM_RET
> +#endif
> +
> +/* Used in size and alignment tests.  */
> +enum dummytype { enumtype };
> +
> +extern void abort (void);
> +
> +/* Assertion macro.  */
> +#define assert(test) if (!(test)) abort()
> +
> +#ifdef __GNUC__
> +#define ATTRIBUTE_UNUSED __attribute__((__unused__))
> +#else
> +#define ATTRIBUTE_UNUSED
> +#endif
> +
> +#ifdef __GNUC__
> +#define PACKED __attribute__((__packed__))
> +#else
> +#warning Some tests will fail due to missing __packed__ support
> +#define PACKED
> +#endif
> +
> +#endif /* DEFINED_DEFINES_H */
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp
> new file mode 100644
> index 00000000000..309db8ff12e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp
> @@ -0,0 +1,46 @@
> +# Copyright (C) 2022 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# The x86-64 ABI testsuite needs one additional assembler file for most
> +# testcases.  For simplicity we will just link it into each test.
> +
> +load_lib c-torture.exp
> +load_lib target-supports.exp
> +load_lib torture-options.exp
> +load_lib clearcap.exp
> +
> +if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
> +     || ![is-effective-target lp64]
> +     || ![is-effective-target avx2] } then {
> +  return
> +}
> +
> +
> +torture-init
> +clearcap-init
> +set-torture-options $C_TORTURE_OPTIONS
> +set additional_flags "-W -Wall -mavx2"
> +
> +foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
> +    if {[runtest_file_p $runtests $src]} {
> +        c-torture-execute [list $src \
> +                                $srcdir/$subdir/asm-support.S] \
> +                                $additional_flags
> +    }
> +}
> +
> +clearcap-finish
> +torture-finish
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/args.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/args.h
> new file mode 100644
> index 00000000000..94627ffbd44
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/args.h
> @@ -0,0 +1,152 @@
> +#ifndef INCLUDED_ARGS_H
> +#define INCLUDED_ARGS_H
> +
> +#include <immintrin.h>
> +#include <string.h>
> +
> +/* Assertion macro.  */
> +#define assert(test) if (!(test)) abort()
> +
> +#ifdef __GNUC__
> +#define ATTRIBUTE_UNUSED __attribute__((__unused__))
> +#else
> +#define ATTRIBUTE_UNUSED
> +#endif
> +
> +/* This defines the calling sequences for integers and floats.  */
> +#define I0 rdi
> +#define I1 rsi
> +#define I2 rdx
> +#define I3 rcx
> +#define I4 r8
> +#define I5 r9
> +#define F0 ymm0
> +#define F1 ymm1
> +#define F2 ymm2
> +#define F3 ymm3
> +#define F4 ymm4
> +#define F5 ymm5
> +#define F6 ymm6
> +#define F7 ymm7
> +
> +typedef union {
> +  __bf16 ___bf16[16];
> +  float _float[8];
> +  double _double[4];
> +  long long _longlong[4];
> +  int _int[8];
> +  unsigned long long _ulonglong[4];
> +  __m64 _m64[4];
> +  __m128 _m128[2];
> +  __m256 _m256[1];
> +  __m256bf16 _m256bf16[1];
> +} YMM_T;
> +
> +typedef union {
> +  float _float;
> +  double _double;
> +  long double _ldouble;
> +  unsigned long long _ulonglong[2];
> +} X87_T;
> +extern void (*callthis)(void);
> +extern unsigned long long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
> +YMM_T ymm_regs[16];
> +X87_T x87_regs[8];
> +extern volatile unsigned long long volatile_var;
> +extern void snapshot (void);
> +extern void snapshot_ret (void);
> +#define WRAP_CALL(N) \
> +  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
> +#define WRAP_RET(N) \
> +  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
> +
> +/* Clear all integer registers.  */
> +#define clear_int_hardware_registers \
> +  asm __volatile__ ("xor %%rax, %%rax\n\t" \
> +                   "xor %%rbx, %%rbx\n\t" \
> +                   "xor %%rcx, %%rcx\n\t" \
> +                   "xor %%rdx, %%rdx\n\t" \
> +                   "xor %%rsi, %%rsi\n\t" \
> +                   "xor %%rdi, %%rdi\n\t" \
> +                   "xor %%r8, %%r8\n\t" \
> +                   "xor %%r9, %%r9\n\t" \
> +                   "xor %%r10, %%r10\n\t" \
> +                   "xor %%r11, %%r11\n\t" \
> +                   "xor %%r12, %%r12\n\t" \
> +                   "xor %%r13, %%r13\n\t" \
> +                   "xor %%r14, %%r14\n\t" \
> +                   "xor %%r15, %%r15\n\t" \
> +                   ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
> +                   "r9", "r10", "r11", "r12", "r13", "r14", "r15");
> +
> +/* This is the list of registers available for passing arguments. Not all of
> +   these are used or even really available.  */
> +struct IntegerRegisters
> +{
> +  unsigned long long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
> +};
> +struct FloatRegisters
> +{
> +  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
> +  long double st0, st1, st2, st3, st4, st5, st6, st7;
> +  YMM_T ymm0, ymm1, ymm2, ymm3, ymm4, ymm5, ymm6, ymm7, ymm8, ymm9,
> +        ymm10, ymm11, ymm12, ymm13, ymm14, ymm15;
> +};
> +
> +/* Implemented in scalarargs.c  */
> +extern struct IntegerRegisters iregs;
> +extern struct FloatRegisters fregs;
> +extern unsigned int num_iregs, num_fregs;
> +
> +/* Clear register struct.  */
> +#define clear_struct_registers \
> +  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
> +    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
> +  memset (&iregs, 0, sizeof (iregs)); \
> +  memset (&fregs, 0, sizeof (fregs)); \
> +  memset (ymm_regs, 0, sizeof (ymm_regs)); \
> +  memset (x87_regs, 0, sizeof (x87_regs));
> +
> +/* Clear both hardware and register structs for integers.  */
> +#define clear_int_registers \
> +  clear_struct_registers \
> +  clear_int_hardware_registers
> +
> +#define check_vector_arguments(T,O) do { \
> +  assert (num_fregs <= 0 \
> +         || memcmp (((char *) &fregs.ymm0) + (O), \
> +                    &ymm_regs[0], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 1 \
> +         || memcmp (((char *) &fregs.ymm1) + (O), \
> +                    &ymm_regs[1], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 2 \
> +         || memcmp (((char *) &fregs.ymm2) + (O), \
> +                    &ymm_regs[2], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 3 \
> +         || memcmp (((char *) &fregs.ymm3) + (O), \
> +                    &ymm_regs[3], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 4 \
> +         || memcmp (((char *) &fregs.ymm4) + (O), \
> +                    &ymm_regs[4], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 5 \
> +         || memcmp (((char *) &fregs.ymm5) + (O), \
> +                    &ymm_regs[5], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 6 \
> +         || memcmp (((char *) &fregs.ymm6) + (O), \
> +                    &ymm_regs[6], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 7 \
> +         || memcmp (((char *) &fregs.ymm7) + (O), \
> +                    &ymm_regs[7], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  } while (0)
> +
> +#define check_m256_arguments check_vector_arguments(m256, 0)
> +
> +#endif /* INCLUDED_ARGS_H  */
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S
> new file mode 100644
> index 00000000000..24c8b3c9023
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S
> @@ -0,0 +1,84 @@
> +       .text
> +       .p2align 4,,15
> +.globl snapshot
> +       .type   snapshot, @function
> +snapshot:
> +.LFB3:
> +       movq    %rax, rax(%rip)
> +       movq    %rbx, rbx(%rip)
> +       movq    %rcx, rcx(%rip)
> +       movq    %rdx, rdx(%rip)
> +       movq    %rdi, rdi(%rip)
> +       movq    %rsi, rsi(%rip)
> +       movq    %rbp, rbp(%rip)
> +       movq    %rsp, rsp(%rip)
> +       movq    %r8, r8(%rip)
> +       movq    %r9, r9(%rip)
> +       movq    %r10, r10(%rip)
> +       movq    %r11, r11(%rip)
> +       movq    %r12, r12(%rip)
> +       movq    %r13, r13(%rip)
> +       movq    %r14, r14(%rip)
> +       movq    %r15, r15(%rip)
> +       vmovdqu %ymm0, ymm_regs+0(%rip)
> +       vmovdqu %ymm1, ymm_regs+32(%rip)
> +       vmovdqu %ymm2, ymm_regs+64(%rip)
> +       vmovdqu %ymm3, ymm_regs+96(%rip)
> +       vmovdqu %ymm4, ymm_regs+128(%rip)
> +       vmovdqu %ymm5, ymm_regs+160(%rip)
> +       vmovdqu %ymm6, ymm_regs+192(%rip)
> +       vmovdqu %ymm7, ymm_regs+224(%rip)
> +       vmovdqu %ymm8, ymm_regs+256(%rip)
> +       vmovdqu %ymm9, ymm_regs+288(%rip)
> +       vmovdqu %ymm10, ymm_regs+320(%rip)
> +       vmovdqu %ymm11, ymm_regs+352(%rip)
> +       vmovdqu %ymm12, ymm_regs+384(%rip)
> +       vmovdqu %ymm13, ymm_regs+416(%rip)
> +       vmovdqu %ymm14, ymm_regs+448(%rip)
> +       vmovdqu %ymm15, ymm_regs+480(%rip)
> +       jmp     *callthis(%rip)
> +.LFE3:
> +       .size   snapshot, .-snapshot
> +
> +       .p2align 4,,15
> +.globl snapshot_ret
> +       .type   snapshot_ret, @function
> +snapshot_ret:
> +       movq    %rdi, rdi(%rip)
> +       subq    $8, %rsp
> +       call    *callthis(%rip)
> +       addq    $8, %rsp
> +       movq    %rax, rax(%rip)
> +       movq    %rdx, rdx(%rip)
> +       vmovdqu %ymm0, ymm_regs+0(%rip)
> +       vmovdqu %ymm1, ymm_regs+32(%rip)
> +       fstpt   x87_regs(%rip)
> +       fstpt   x87_regs+16(%rip)
> +       fldt    x87_regs+16(%rip)
> +       fldt    x87_regs(%rip)
> +       ret
> +       .size   snapshot_ret, .-snapshot_ret
> +
> +       .comm   callthis,8,8
> +       .comm   rax,8,8
> +       .comm   rbx,8,8
> +       .comm   rcx,8,8
> +       .comm   rdx,8,8
> +       .comm   rsi,8,8
> +       .comm   rdi,8,8
> +       .comm   rsp,8,8
> +       .comm   rbp,8,8
> +       .comm   r8,8,8
> +       .comm   r9,8,8
> +       .comm   r10,8,8
> +       .comm   r11,8,8
> +       .comm   r12,8,8
> +       .comm   r13,8,8
> +       .comm   r14,8,8
> +       .comm   r15,8,8
> +       .comm   ymm_regs,512,32
> +       .comm   x87_regs,128,32
> +       .comm   volatile_var,8,8
> +#ifdef __linux__
> +       .section        .note.GNU-stack,"",@progbits
> +#endif
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h
> new file mode 100644
> index 00000000000..479ebc3ec3f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h
> @@ -0,0 +1,24 @@
> +#include <stdlib.h>
> +#include "../bf16-helper.h"
> +
> +static void do_test (void);
> +
> +int
> +main ()
> +{
> +
> +  if (__builtin_cpu_supports ("avx2"))
> +    {
> +      do_test ();
> +#ifdef DEBUG
> +      printf ("PASSED\n");
> +#endif
> +      return 0;
> +    }
> +
> +#ifdef DEBUG
> +  printf ("SKIPPED\n");
> +#endif
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c
> new file mode 100644
> index 00000000000..ea7512850ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c
> @@ -0,0 +1,38 @@
> +#include <stdio.h>
> +#include "bf16-ymm-check.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +               bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
> +
> +__m256bf16
> +fun_test_returning___m256bf16 (void)
> +{
> +  volatile_var++;
> +  return (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                       bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16};
> +}
> +
> +__m256bf16 test_256bf16;
> +
> +static void
> +do_test (void)
> +{
> +  unsigned failed = 0;
> +  YMM_T ymmt1, ymmt2;
> +
> +  clear_struct_registers;
> +  test_256bf16 = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                               bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16};
> +  ymmt1._m256bf16[0] = test_256bf16;
> +  ymmt2._m256bf16[0] = WRAP_RET (fun_test_returning___m256bf16) ();
> +  if (memcmp (&ymmt1, &ymmt2, sizeof (ymmt2)) != 0)
> +    printf ("fail m256bf16\n"), failed++;
> +
> +  if (failed)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c
> new file mode 100644
> index 00000000000..3fb2d7d20f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c
> @@ -0,0 +1,236 @@
> +#include <stdio.h>
> +#include "bf16-ymm-check.h"
> +#include "args.h"
> +
> +struct IntegerRegisters iregs;
> +struct FloatRegisters fregs;
> +unsigned int num_iregs, num_fregs;
> +
> +/* This struct holds values for argument checking.  */
> +struct
> +{
> +  YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
> +    i16, i17, i18, i19, i20, i21, i22, i23;
> +} values;
> +
> +char *pass;
> +int failed = 0;
> +
> +#undef assert
> +#define assert(c) do { \
> +  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
> +} while (0)
> +
> +#define compare(X1,X2,T) do { \
> +  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
> +} while (0)
> +
> +void
> +fun_check_passing_m256bf16_8_values (__m256bf16 i0 ATTRIBUTE_UNUSED,
> +                                    __m256bf16 i1 ATTRIBUTE_UNUSED,
> +                                    __m256bf16 i2 ATTRIBUTE_UNUSED,
> +                                    __m256bf16 i3 ATTRIBUTE_UNUSED,
> +                                    __m256bf16 i4 ATTRIBUTE_UNUSED,
> +                                    __m256bf16 i5 ATTRIBUTE_UNUSED,
> +                                    __m256bf16 i6 ATTRIBUTE_UNUSED,
> +                                    __m256bf16 i7 ATTRIBUTE_UNUSED)
> +{
> +  /* Check argument values.  */
> +  compare (values.i0, i0, __m256bf16);
> +  compare (values.i1, i1, __m256bf16);
> +  compare (values.i2, i2, __m256bf16);
> +  compare (values.i3, i3, __m256bf16);
> +  compare (values.i4, i4, __m256bf16);
> +  compare (values.i5, i5, __m256bf16);
> +  compare (values.i6, i6, __m256bf16);
> +  compare (values.i7, i7, __m256bf16);
> +}
> +
> +void
> +fun_check_passing_m256bf16_8_regs (__m256bf16 i0 ATTRIBUTE_UNUSED,
> +                                  __m256bf16 i1 ATTRIBUTE_UNUSED,
> +                                  __m256bf16 i2 ATTRIBUTE_UNUSED,
> +                                  __m256bf16 i3 ATTRIBUTE_UNUSED,
> +                                  __m256bf16 i4 ATTRIBUTE_UNUSED,
> +                                  __m256bf16 i5 ATTRIBUTE_UNUSED,
> +                                  __m256bf16 i6 ATTRIBUTE_UNUSED,
> +                                  __m256bf16 i7 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m256_arguments;
> +}
> +
> +void
> +fun_check_passing_m256bf16_20_values (__m256bf16 i0 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i1 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i2 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i3 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i4 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i5 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i6 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i7 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i8 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i9 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i10 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i11 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i12 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i13 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i14 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i15 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i16 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i17 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i18 ATTRIBUTE_UNUSED,
> +                                     __m256bf16 i19 ATTRIBUTE_UNUSED)
> +{
> +  /* Check argument values.  */
> +  compare (values.i0, i0, __m256bf16);
> +  compare (values.i1, i1, __m256bf16);
> +  compare (values.i2, i2, __m256bf16);
> +  compare (values.i3, i3, __m256bf16);
> +  compare (values.i4, i4, __m256bf16);
> +  compare (values.i5, i5, __m256bf16);
> +  compare (values.i6, i6, __m256bf16);
> +  compare (values.i7, i7, __m256bf16);
> +  compare (values.i8, i8, __m256bf16);
> +  compare (values.i9, i9, __m256bf16);
> +  compare (values.i10, i10, __m256bf16);
> +  compare (values.i11, i11, __m256bf16);
> +  compare (values.i12, i12, __m256bf16);
> +  compare (values.i13, i13, __m256bf16);
> +  compare (values.i14, i14, __m256bf16);
> +  compare (values.i15, i15, __m256bf16);
> +  compare (values.i16, i16, __m256bf16);
> +  compare (values.i17, i17, __m256bf16);
> +  compare (values.i18, i18, __m256bf16);
> +  compare (values.i19, i19, __m256bf16);
> +}
> +
> +void
> +fun_check_passing_m256bf16_20_regs (__m256bf16 i0 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i1 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i2 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i3 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i4 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i5 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i6 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i7 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i8 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i9 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i10 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i11 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i12 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i13 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i14 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i15 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i16 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i17 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i18 ATTRIBUTE_UNUSED,
> +                                   __m256bf16 i19 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m256_arguments;
> +}
> +
> +#define def_check_passing8(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _func1, _func2, TYPE) \
> +  values.i0.TYPE[0] = _i0; \
> +  values.i1.TYPE[0] = _i1; \
> +  values.i2.TYPE[0] = _i2; \
> +  values.i3.TYPE[0] = _i3; \
> +  values.i4.TYPE[0] = _i4; \
> +  values.i5.TYPE[0] = _i5; \
> +  values.i6.TYPE[0] = _i6; \
> +  values.i7.TYPE[0] = _i7; \
> +  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
> +  clear_struct_registers; \
> +  fregs.F0.TYPE[0] = _i0; \
> +  fregs.F1.TYPE[0] = _i1; \
> +  fregs.F2.TYPE[0] = _i2; \
> +  fregs.F3.TYPE[0] = _i3; \
> +  fregs.F4.TYPE[0] = _i4; \
> +  fregs.F5.TYPE[0] = _i5; \
> +  fregs.F6.TYPE[0] = _i6; \
> +  fregs.F7.TYPE[0] = _i7; \
> +  num_fregs = 8; \
> +  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
> +
> +#define def_check_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, \
> +                           _i8, _i9, _i10, _i11, _i12, _i13, _i14, \
> +                           _i15, _i16, _i17, _i18, _i19, _func1, \
> +                           _func2, TYPE) \
> +  values.i0.TYPE[0] = _i0; \
> +  values.i1.TYPE[0] = _i1; \
> +  values.i2.TYPE[0] = _i2; \
> +  values.i3.TYPE[0] = _i3; \
> +  values.i4.TYPE[0] = _i4; \
> +  values.i5.TYPE[0] = _i5; \
> +  values.i6.TYPE[0] = _i6; \
> +  values.i7.TYPE[0] = _i7; \
> +  values.i8.TYPE[0] = _i8; \
> +  values.i9.TYPE[0] = _i9; \
> +  values.i10.TYPE[0] = _i10; \
> +  values.i11.TYPE[0] = _i11; \
> +  values.i12.TYPE[0] = _i12; \
> +  values.i13.TYPE[0] = _i13; \
> +  values.i14.TYPE[0] = _i14; \
> +  values.i15.TYPE[0] = _i15; \
> +  values.i16.TYPE[0] = _i16; \
> +  values.i17.TYPE[0] = _i17; \
> +  values.i18.TYPE[0] = _i18; \
> +  values.i19.TYPE[0] = _i19; \
> +  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
> +                    _i9, _i10, _i11, _i12, _i13, _i14, _i15, \
> +                    _i16, _i17, _i18, _i19); \
> +  clear_struct_registers; \
> +  fregs.F0.TYPE[0] = _i0; \
> +  fregs.F1.TYPE[0] = _i1; \
> +  fregs.F2.TYPE[0] = _i2; \
> +  fregs.F3.TYPE[0] = _i3; \
> +  fregs.F4.TYPE[0] = _i4; \
> +  fregs.F5.TYPE[0] = _i5; \
> +  fregs.F6.TYPE[0] = _i6; \
> +  fregs.F7.TYPE[0] = _i7; \
> +  num_fregs = 8; \
> +  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
> +                    _i9, _i10, _i11, _i12, _i13, _i14, _i15, \
> +                    _i16, _i17, _i18, _i19);
> +
> +volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +               bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
> +
> +void
> +test_m256bf16_on_stack ()
> +{
> +  __m256bf16 x[8];
> +  int i;
> +  for (i = 0; i < 8; i++)
> +    x[i] = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                         bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16};
> +  pass = "m256bf16-8";
> +  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
> +                     fun_check_passing_m256bf16_8_values,
> +                     fun_check_passing_m256bf16_8_regs, _m256bf16);
> +}
> +
> +void
> +test_too_many_m256bf16 ()
> +{
> +  __m256bf16 x[20];
> +  int i;
> +  for (i = 0; i < 20; i++)
> +    x[i] = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                         bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16};
> +  pass = "m256bf16-20";
> +  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
> +                      x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
> +                      x[17], x[18], x[19], fun_check_passing_m256bf16_20_values,
> +                      fun_check_passing_m256bf16_20_regs, _m256bf16);
> +}
> +
> +static void
> +do_test (void)
> +{
> +  test_m256bf16_on_stack ();
> +  test_too_many_m256bf16 ();
> +  if (failed)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c
> new file mode 100644
> index 00000000000..e06350ed493
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c
> @@ -0,0 +1,69 @@
> +#include "bf16-ymm-check.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +struct m256bf16_struct
> +{
> +  __m256bf16 x;
> +};
> +
> +struct m256bf16_2_struct
> +{
> +  __m256bf16 x1, x2;
> +};
> +
> +/* Check that the struct is passed as the individual members in fregs.  */
> +void
> +check_struct_passing1bf16 (struct m256bf16_struct ms1 ATTRIBUTE_UNUSED,
> +                          struct m256bf16_struct ms2 ATTRIBUTE_UNUSED,
> +                          struct m256bf16_struct ms3 ATTRIBUTE_UNUSED,
> +                          struct m256bf16_struct ms4 ATTRIBUTE_UNUSED,
> +                          struct m256bf16_struct ms5 ATTRIBUTE_UNUSED,
> +                          struct m256bf16_struct ms6 ATTRIBUTE_UNUSED,
> +                          struct m256bf16_struct ms7 ATTRIBUTE_UNUSED,
> +                          struct m256bf16_struct ms8 ATTRIBUTE_UNUSED)
> +{
> +  check_m256_arguments;
> +}
> +
> +void
> +check_struct_passing2bf16 (struct m256bf16_2_struct ms ATTRIBUTE_UNUSED)
> +{
> +  /* Check the passing on the stack by comparing the address of the
> +     stack elements to the expected place on the stack.  */
> +  assert ((unsigned long)&ms.x1 == rsp+8);
> +  assert ((unsigned long)&ms.x2 == rsp+40);
> +}
> +
> +volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +               bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
> +
> +static void
> +do_test (void)
> +{
> +  struct m256bf16_struct m256bf16s [8];
> +  struct m256bf16_2_struct m256bf16_2s = {
> +    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +      bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16},
> +    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +      bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16},
> +  };
> +  int i;
> +
> +  for (i = 0; i < 8; i++)
> +    {
> +      m256bf16s[i].x = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                                     bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16};
> +    }
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    (&fregs.ymm0)[i]._m256bf16[0] = m256bf16s[i].x;
> +  num_fregs = 8;
> +  WRAP_CALL (check_struct_passing1bf16) (m256bf16s[0], m256bf16s[1], m256bf16s[2], m256bf16s[3],
> +                                        m256bf16s[4], m256bf16s[5], m256bf16s[6], m256bf16s[7]);
> +  WRAP_CALL (check_struct_passing2bf16) (m256bf16_2s);
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c
> new file mode 100644
> index 00000000000..6d663b88b1a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c
> @@ -0,0 +1,179 @@
> +#include "bf16-ymm-check.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +union un1b
> +{
> +  __m256bf16 x;
> +  float f;
> +};
> +
> +union un1bb
> +{
> +  __m256bf16 x;
> +  __bf16 f;
> +};
> +
> +union un2b
> +{
> +  __m256bf16 x;
> +  double d;
> +};
> +
> +union un3b
> +{
> +  __m256bf16 x;
> +  __m128 v;
> +};
> +
> +union un4b
> +{
> +  __m256bf16 x;
> +  long double ld;
> +};
> +
> +union un5b
> +{
> +  __m256bf16 x;
> +  int i;
> +};
> +
> +void
> +check_union_passing1b (union un1b u1 ATTRIBUTE_UNUSED,
> +                      union un1b u2 ATTRIBUTE_UNUSED,
> +                      union un1b u3 ATTRIBUTE_UNUSED,
> +                      union un1b u4 ATTRIBUTE_UNUSED,
> +                      union un1b u5 ATTRIBUTE_UNUSED,
> +                      union un1b u6 ATTRIBUTE_UNUSED,
> +                      union un1b u7 ATTRIBUTE_UNUSED,
> +                      union un1b u8 ATTRIBUTE_UNUSED)
> +{
> +  check_m256_arguments;
> +}
> +
> +void
> +check_union_passing1bb (union un1bb u1 ATTRIBUTE_UNUSED,
> +                       union un1bb u2 ATTRIBUTE_UNUSED,
> +                       union un1bb u3 ATTRIBUTE_UNUSED,
> +                       union un1bb u4 ATTRIBUTE_UNUSED,
> +                       union un1bb u5 ATTRIBUTE_UNUSED,
> +                       union un1bb u6 ATTRIBUTE_UNUSED,
> +                       union un1bb u7 ATTRIBUTE_UNUSED,
> +                       union un1bb u8 ATTRIBUTE_UNUSED)
> +{
> +  check_m256_arguments;
> +}
> +
> +void
> +check_union_passing2b (union un2b u1 ATTRIBUTE_UNUSED,
> +                      union un2b u2 ATTRIBUTE_UNUSED,
> +                      union un2b u3 ATTRIBUTE_UNUSED,
> +                      union un2b u4 ATTRIBUTE_UNUSED,
> +                      union un2b u5 ATTRIBUTE_UNUSED,
> +                      union un2b u6 ATTRIBUTE_UNUSED,
> +                      union un2b u7 ATTRIBUTE_UNUSED,
> +                      union un2b u8 ATTRIBUTE_UNUSED)
> +{
> +  check_m256_arguments;
> +}
> +
> +void
> +check_union_passing3b (union un3b u1 ATTRIBUTE_UNUSED,
> +                      union un3b u2 ATTRIBUTE_UNUSED,
> +                      union un3b u3 ATTRIBUTE_UNUSED,
> +                      union un3b u4 ATTRIBUTE_UNUSED,
> +                      union un3b u5 ATTRIBUTE_UNUSED,
> +                      union un3b u6 ATTRIBUTE_UNUSED,
> +                      union un3b u7 ATTRIBUTE_UNUSED,
> +                      union un3b u8 ATTRIBUTE_UNUSED)
> +{
> +  check_m256_arguments;
> +}
> +
> +void
> +check_union_passing4b (union un4b u ATTRIBUTE_UNUSED)
> +{
> +  /* Check the passing on the stack by comparing the address of the
> +     stack elements to the expected place on the stack.  */
> +  assert ((unsigned long)&u.x == rsp+8);
> +  assert ((unsigned long)&u.ld == rsp+8);
> +}
> +
> +void
> +check_union_passing5b (union un5b u ATTRIBUTE_UNUSED)
> +{
> +  /* Check the passing on the stack by comparing the address of the
> +     stack elements to the expected place on the stack.  */
> +  assert ((unsigned long)&u.x == rsp+8);
> +  assert ((unsigned long)&u.i == rsp+8);
> +}
> +
> +#define check_union_passing1b WRAP_CALL(check_union_passing1b)
> +#define check_union_passing1bb WRAP_CALL(check_union_passing1bb)
> +#define check_union_passing2b WRAP_CALL(check_union_passing2b)
> +#define check_union_passing3b WRAP_CALL(check_union_passing3b)
> +#define check_union_passing4b WRAP_CALL(check_union_passing4b)
> +#define check_union_passing5b WRAP_CALL(check_union_passing5b)
> +
> +static void
> +do_test (void)
> +{
> +  union un1b u1b[8];
> +  union un1bb u1bb[8];
> +  union un2b u2b[8];
> +  union un3b u3b[8];
> +  union un4b u4b;
> +  union un5b u5b;
> +  int i;
> +  static __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
> +
> +  for (i = 0; i < 8; i++)
> +    {
> +      u1b[i].x = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                               bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16 };
> +    }
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    (&fregs.ymm0)[i]._m256bf16[0] = u1b[i].x;
> +  num_fregs = 8;
> +  check_union_passing1b (u1b[0], u1b[1], u1b[2], u1b[3],
> +                        u1b[4], u1b[5], u1b[6], u1b[7]);
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    {
> +      u1bb[i].x = u1b[i].x;
> +      (&fregs.ymm0)[i]._m256bf16[0] = u1bb[i].x;
> +    }
> +  num_fregs = 8;
> +  check_union_passing1bb (u1bb[0], u1bb[1], u1bb[2], u1bb[3],
> +                         u1bb[4], u1bb[5], u1bb[6], u1bb[7]);
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    {
> +      u2b[i].x = u1b[i].x;
> +      (&fregs.ymm0)[i]._m256bf16[0] = u2b[i].x;
> +    }
> +  num_fregs = 8;
> +  check_union_passing2b (u2b[0], u2b[1], u2b[2], u2b[3],
> +                        u2b[4], u2b[5], u2b[6], u2b[7]);
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    {
> +      u3b[i].x = u1b[i].x;
> +      (&fregs.ymm0)[i]._m256bf16[0] = u3b[i].x;
> +    }
> +  num_fregs = 8;
> +  check_union_passing3b (u3b[0], u3b[1], u3b[2], u3b[3],
> +                        u3b[4], u3b[5], u3b[6], u3b[7]);
> +
> +  check_union_passing4b (u4b);
> +  check_union_passing5b (u5b);
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c
> new file mode 100644
> index 00000000000..b69e095d808
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c
> @@ -0,0 +1,107 @@
> +/* Test variable number of 256-bit vector arguments passed to functions.  */
> +
> +#include <stdio.h>
> +#include "bf16-ymm-check.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +
> +/* This struct holds values for argument checking.  */
> +struct
> +{
> +  YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
> +} values;
> +
> +char *pass;
> +int failed = 0;
> +
> +#undef assert
> +#define assert(c) do { \
> +  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
> +} while (0)
> +
> +#define compare(X1,X2,T) do { \
> +  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
> +} while (0)
> +
> +void
> +fun_check_passing_m256bf16_varargs (__m256bf16 i0, __m256bf16 i1, __m256bf16 i2,
> +                                __m256bf16 i3, ...)
> +{
> +  /* Check argument values.  */
> +  void **fp = __builtin_frame_address (0);
> +  void *ra = __builtin_return_address (0);
> +  __m256bf16 *argp;
> +
> +  compare (values.i0, i0, __m256bf16);
> +  compare (values.i1, i1, __m256bf16);
> +  compare (values.i2, i2, __m256bf16);
> +  compare (values.i3, i3, __m256bf16);
> +
> +  /* Get the pointer to the return address on stack.  */
> +  while (*fp != ra)
> +    fp++;
> +
> +  /* Skip the return address stack slot.  */
> +  argp = (__m256bf16 *)(((char *) fp) + 8);
> +
> +  /* Check __m256bf16 arguments passed on stack.  */
> +  compare (values.i4, argp[0], __m256bf16);
> +  compare (values.i5, argp[1], __m256bf16);
> +  compare (values.i6, argp[2], __m256bf16);
> +  compare (values.i7, argp[3], __m256bf16);
> +  compare (values.i8, argp[4], __m256bf16);
> +  compare (values.i9, argp[5], __m256bf16);
> +
> +  /* Check register contents.  */
> +  compare (fregs.ymm0, ymm_regs[0], __m256bf16);
> +  compare (fregs.ymm1, ymm_regs[1], __m256bf16);
> +  compare (fregs.ymm2, ymm_regs[2], __m256bf16);
> +  compare (fregs.ymm3, ymm_regs[3], __m256bf16);
> +}
> +
> +#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
> +                                     _i6, _i7, _i8, _i9, \
> +                                     _func, TYPE) \
> +  values.i0.TYPE[0] = _i0; \
> +  values.i1.TYPE[0] = _i1; \
> +  values.i2.TYPE[0] = _i2; \
> +  values.i3.TYPE[0] = _i3; \
> +  values.i4.TYPE[0] = _i4; \
> +  values.i5.TYPE[0] = _i5; \
> +  values.i6.TYPE[0] = _i6; \
> +  values.i7.TYPE[0] = _i7; \
> +  values.i8.TYPE[0] = _i8; \
> +  values.i9.TYPE[0] = _i9; \
> +  clear_struct_registers; \
> +  fregs.F0.TYPE[0] = _i0; \
> +  fregs.F1.TYPE[0] = _i1; \
> +  fregs.F2.TYPE[0] = _i2; \
> +  fregs.F3.TYPE[0] = _i3; \
> +  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
> +
> +void
> +test_m256bf16_varargs (void)
> +{
> +  __m256bf16 x[10];
> +  int i;
> +  static __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
> +  for (i = 0; i < 10; i++)
> +    x[i] = (__m256bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                         bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16 };
> +  pass = "m256bf16-varargs";
> +  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
> +                                x[6], x[7], x[8], x[9],
> +                                fun_check_passing_m256bf16_varargs,
> +                                _m256bf16);
> +}
> +
> +void
> +do_test (void)
> +{
> +  test_m256bf16_varargs ();
> +  if (failed)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp
> new file mode 100644
> index 00000000000..b6e0fed4cb4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp
> @@ -0,0 +1,46 @@
> +# Copyright (C) 2022 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# The x86-64 ABI testsuite needs one additional assembler file for most
> +# testcases.  For simplicity we will just link it into each test.
> +
> +load_lib c-torture.exp
> +load_lib target-supports.exp
> +load_lib torture-options.exp
> +load_lib clearcap.exp
> +
> +if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
> +     || ![is-effective-target lp64]
> +     || ![is-effective-target avx512f] } then {
> +  return
> +}
> +
> +
> +torture-init
> +clearcap-init
> +set-torture-options $C_TORTURE_OPTIONS
> +set additional_flags "-W -Wall -mavx512f"
> +
> +foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
> +    if {[runtest_file_p $runtests $src]} {
> +        c-torture-execute [list $src \
> +                                $srcdir/$subdir/asm-support.S] \
> +                                $additional_flags
> +    }
> +}
> +
> +clearcap-finish
> +torture-finish
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/args.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/args.h
> new file mode 100644
> index 00000000000..64b24783833
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/args.h
> @@ -0,0 +1,155 @@
> +#ifndef INCLUDED_ARGS_H
> +#define INCLUDED_ARGS_H
> +
> +#include <immintrin.h>
> +#include <string.h>
> +
> +/* Assertion macro.  */
> +#define assert(test) if (!(test)) abort()
> +
> +#ifdef __GNUC__
> +#define ATTRIBUTE_UNUSED __attribute__((__unused__))
> +#else
> +#define ATTRIBUTE_UNUSED
> +#endif
> +
> +/* This defines the calling sequences for integers and floats.  */
> +#define I0 rdi
> +#define I1 rsi
> +#define I2 rdx
> +#define I3 rcx
> +#define I4 r8
> +#define I5 r9
> +#define F0 zmm0
> +#define F1 zmm1
> +#define F2 zmm2
> +#define F3 zmm3
> +#define F4 zmm4
> +#define F5 zmm5
> +#define F6 zmm6
> +#define F7 zmm7
> +
> +typedef union {
> +  __bf16 ___bf16[32];
> +  float _float[16];
> +  double _double[8];
> +  long long _longlong[8];
> +  int _int[16];
> +  unsigned long long _ulonglong[8];
> +  __m64 _m64[8];
> +  __m128 _m128[4];
> +  __m256 _m256[2];
> +  __m512 _m512[1];
> +  __m512bf16 _m512bf16[1];
> +} ZMM_T;
> +
> +typedef union {
> +  float _float;
> +  double _double;
> +  long double _ldouble;
> +  unsigned long long _ulonglong[2];
> +} X87_T;
> +extern void (*callthis)(void);
> +extern unsigned long long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
> +ZMM_T zmm_regs[32];
> +X87_T x87_regs[8];
> +extern volatile unsigned long long volatile_var;
> +extern void snapshot (void);
> +extern void snapshot_ret (void);
> +#define WRAP_CALL(N) \
> +  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
> +#define WRAP_RET(N) \
> +  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
> +
> +/* Clear all integer registers.  */
> +#define clear_int_hardware_registers \
> +  asm __volatile__ ("xor %%rax, %%rax\n\t" \
> +                   "xor %%rbx, %%rbx\n\t" \
> +                   "xor %%rcx, %%rcx\n\t" \
> +                   "xor %%rdx, %%rdx\n\t" \
> +                   "xor %%rsi, %%rsi\n\t" \
> +                   "xor %%rdi, %%rdi\n\t" \
> +                   "xor %%r8, %%r8\n\t" \
> +                   "xor %%r9, %%r9\n\t" \
> +                   "xor %%r10, %%r10\n\t" \
> +                   "xor %%r11, %%r11\n\t" \
> +                   "xor %%r12, %%r12\n\t" \
> +                   "xor %%r13, %%r13\n\t" \
> +                   "xor %%r14, %%r14\n\t" \
> +                   "xor %%r15, %%r15\n\t" \
> +                   ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
> +                   "r9", "r10", "r11", "r12", "r13", "r14", "r15");
> +
> +/* This is the list of registers available for passing arguments.  Not all of
> +   these are used or even really available.  */
> +struct IntegerRegisters
> +{
> +  unsigned long long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
> +};
> +struct FloatRegisters
> +{
> +  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
> +  long double st0, st1, st2, st3, st4, st5, st6, st7;
> +  ZMM_T zmm0, zmm1, zmm2, zmm3, zmm4, zmm5, zmm6, zmm7, zmm8, zmm9,
> +        zmm10, zmm11, zmm12, zmm13, zmm14, zmm15, zmm16, zmm17, zmm18,
> +       zmm19, zmm20, zmm21, zmm22, zmm23, zmm24, zmm25, zmm26, zmm27,
> +       zmm28, zmm29, zmm30, zmm31;
> +};
> +
> +/* Defined in each test file that includes this header.  */
> +extern struct IntegerRegisters iregs;
> +extern struct FloatRegisters fregs;
> +extern unsigned int num_iregs, num_fregs;
> +
> +/* Clear register struct.  */
> +#define clear_struct_registers \
> +  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
> +    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
> +  memset (&iregs, 0, sizeof (iregs)); \
> +  memset (&fregs, 0, sizeof (fregs)); \
> +  memset (zmm_regs, 0, sizeof (zmm_regs)); \
> +  memset (x87_regs, 0, sizeof (x87_regs));
> +
> +/* Clear both hardware and register structs for integers.  */
> +#define clear_int_registers \
> +  clear_struct_registers \
> +  clear_int_hardware_registers
> +
> +#define check_vector_arguments(T,O) do { \
> +  assert (num_fregs <= 0 \
> +         || memcmp (((char *) &fregs.zmm0) + (O), \
> +                    &zmm_regs[0], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 1 \
> +         || memcmp (((char *) &fregs.zmm1) + (O), \
> +                    &zmm_regs[1], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 2 \
> +         || memcmp (((char *) &fregs.zmm2) + (O), \
> +                    &zmm_regs[2], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 3 \
> +         || memcmp (((char *) &fregs.zmm3) + (O), \
> +                    &zmm_regs[3], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 4 \
> +         || memcmp (((char *) &fregs.zmm4) + (O), \
> +                    &zmm_regs[4], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 5 \
> +         || memcmp (((char *) &fregs.zmm5) + (O), \
> +                    &zmm_regs[5], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 6 \
> +         || memcmp (((char *) &fregs.zmm6) + (O), \
> +                    &zmm_regs[6], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  assert (num_fregs <= 7 \
> +         || memcmp (((char *) &fregs.zmm7) + (O), \
> +                    &zmm_regs[7], \
> +                    sizeof (__ ## T) - (O)) == 0); \
> +  } while (0)
> +
> +#define check_m512_arguments check_vector_arguments(m512, 0)
> +
> +#endif /* INCLUDED_ARGS_H  */
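
For readers following check_vector_arguments above: with O == 0 (the only
case check_m512_arguments uses) it compares, for each of the first num_fregs
registers, the expected image in fregs against the image that snapshot
stored in zmm_regs. A minimal portable sketch of that idea, written as a
loop instead of eight unrolled asserts. The names reg_image and
count_reg_mismatches are hypothetical and are not part of the patch; the
64-byte struct only stands in for ZMM_T so the sketch builds without
AVX-512.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 64-byte register image; stands in for ZMM_T.  */
typedef struct { unsigned char bytes[64]; } reg_image;

/* Count how many of the first NUM_REGS registers differ between the
   EXPECTED images and the SNAPSHOT images, comparing SIZE bytes each.
   This mirrors the O == 0 case of check_vector_arguments only.  */
static unsigned int
count_reg_mismatches (const reg_image *expected, const reg_image *snapshot,
                      unsigned int num_regs, size_t size)
{
  unsigned int i, bad = 0;

  assert (size <= sizeof (reg_image));
  for (i = 0; i < num_regs; i++)
    if (memcmp (expected[i].bytes, snapshot[i].bytes, size) != 0)
      bad++;
  return bad;
}
```

Registers at index num_regs and above are ignored, which matches the
"num_fregs <= N ||" guard in each assert of the macro.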
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S
> new file mode 100644
> index 00000000000..86d54d11c58
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S
> @@ -0,0 +1,100 @@
> +       .text
> +       .p2align 4,,15
> +.globl snapshot
> +       .type   snapshot, @function
> +snapshot:
> +.LFB3:
> +       movq    %rax, rax(%rip)
> +       movq    %rbx, rbx(%rip)
> +       movq    %rcx, rcx(%rip)
> +       movq    %rdx, rdx(%rip)
> +       movq    %rdi, rdi(%rip)
> +       movq    %rsi, rsi(%rip)
> +       movq    %rbp, rbp(%rip)
> +       movq    %rsp, rsp(%rip)
> +       movq    %r8, r8(%rip)
> +       movq    %r9, r9(%rip)
> +       movq    %r10, r10(%rip)
> +       movq    %r11, r11(%rip)
> +       movq    %r12, r12(%rip)
> +       movq    %r13, r13(%rip)
> +       movq    %r14, r14(%rip)
> +       movq    %r15, r15(%rip)
> +       vmovdqu32 %zmm0, zmm_regs+0(%rip)
> +       vmovdqu32 %zmm1, zmm_regs+64(%rip)
> +       vmovdqu32 %zmm2, zmm_regs+128(%rip)
> +       vmovdqu32 %zmm3, zmm_regs+192(%rip)
> +       vmovdqu32 %zmm4, zmm_regs+256(%rip)
> +       vmovdqu32 %zmm5, zmm_regs+320(%rip)
> +       vmovdqu32 %zmm6, zmm_regs+384(%rip)
> +       vmovdqu32 %zmm7, zmm_regs+448(%rip)
> +       vmovdqu32 %zmm8, zmm_regs+512(%rip)
> +       vmovdqu32 %zmm9, zmm_regs+576(%rip)
> +       vmovdqu32 %zmm10, zmm_regs+640(%rip)
> +       vmovdqu32 %zmm11, zmm_regs+704(%rip)
> +       vmovdqu32 %zmm12, zmm_regs+768(%rip)
> +       vmovdqu32 %zmm13, zmm_regs+832(%rip)
> +       vmovdqu32 %zmm14, zmm_regs+896(%rip)
> +       vmovdqu32 %zmm15, zmm_regs+960(%rip)
> +       vmovdqu32 %zmm16, zmm_regs+1024(%rip)
> +       vmovdqu32 %zmm17, zmm_regs+1088(%rip)
> +       vmovdqu32 %zmm18, zmm_regs+1152(%rip)
> +       vmovdqu32 %zmm19, zmm_regs+1216(%rip)
> +       vmovdqu32 %zmm20, zmm_regs+1280(%rip)
> +       vmovdqu32 %zmm21, zmm_regs+1344(%rip)
> +       vmovdqu32 %zmm22, zmm_regs+1408(%rip)
> +       vmovdqu32 %zmm23, zmm_regs+1472(%rip)
> +       vmovdqu32 %zmm24, zmm_regs+1536(%rip)
> +       vmovdqu32 %zmm25, zmm_regs+1600(%rip)
> +       vmovdqu32 %zmm26, zmm_regs+1664(%rip)
> +       vmovdqu32 %zmm27, zmm_regs+1728(%rip)
> +       vmovdqu32 %zmm28, zmm_regs+1792(%rip)
> +       vmovdqu32 %zmm29, zmm_regs+1856(%rip)
> +       vmovdqu32 %zmm30, zmm_regs+1920(%rip)
> +       vmovdqu32 %zmm31, zmm_regs+1984(%rip)
> +       jmp     *callthis(%rip)
> +.LFE3:
> +       .size   snapshot, .-snapshot
> +
> +       .p2align 4,,15
> +.globl snapshot_ret
> +       .type   snapshot_ret, @function
> +snapshot_ret:
> +       movq    %rdi, rdi(%rip)
> +       subq    $8, %rsp
> +       call    *callthis(%rip)
> +       addq    $8, %rsp
> +       movq    %rax, rax(%rip)
> +       movq    %rdx, rdx(%rip)
> +       vmovdqu32       %zmm0, zmm_regs+0(%rip)
> +       vmovdqu32       %zmm1, zmm_regs+64(%rip)
> +       fstpt   x87_regs(%rip)
> +       fstpt   x87_regs+16(%rip)
> +       fldt    x87_regs+16(%rip)
> +       fldt    x87_regs(%rip)
> +       ret
> +       .size   snapshot_ret, .-snapshot_ret
> +
> +       .comm   callthis,8,8
> +       .comm   rax,8,8
> +       .comm   rbx,8,8
> +       .comm   rcx,8,8
> +       .comm   rdx,8,8
> +       .comm   rsi,8,8
> +       .comm   rdi,8,8
> +       .comm   rsp,8,8
> +       .comm   rbp,8,8
> +       .comm   r8,8,8
> +       .comm   r9,8,8
> +       .comm   r10,8,8
> +       .comm   r11,8,8
> +       .comm   r12,8,8
> +       .comm   r13,8,8
> +       .comm   r14,8,8
> +       .comm   r15,8,8
> +       .comm   zmm_regs,2048,64
> +       .comm   x87_regs,128,32
> +       .comm   volatile_var,8,8
> +#ifdef __linux__
> +       .section        .note.GNU-stack,"",@progbits
> +#endif
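
A quick cross-check of the offsets used by snapshot above: each ZMM_T slot
is 64 bytes, so register n is stored at zmm_regs + 64 * n, and 32 slots need
the 2048 bytes reserved by ".comm zmm_regs,2048,64". A small sketch of that
arithmetic in C; ZMM_BYTES, NUM_ZMM, zmm_slot_offset and zmm_buffer_size are
illustrative names, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants, not taken from the patch.  */
#define ZMM_BYTES 64   /* One 512-bit register image.  */
#define NUM_ZMM   32   /* zmm0 .. zmm31.  */

/* Byte offset of register N inside a flat zmm_regs-style buffer.  */
static size_t
zmm_slot_offset (unsigned int n)
{
  assert (n < NUM_ZMM);
  return (size_t) n * ZMM_BYTES;
}

/* Total number of bytes such a buffer needs.  */
static size_t
zmm_buffer_size (void)
{
  return (size_t) NUM_ZMM * ZMM_BYTES;
}
```

The last store in snapshot, "vmovdqu32 %zmm31, zmm_regs+1984(%rip)",
corresponds to zmm_slot_offset (31).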
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h
> new file mode 100644
> index 00000000000..8379fcfaf8c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h
> @@ -0,0 +1,23 @@
> +#include <stdlib.h>
> +
> +static void do_test (void);
> +
> +int
> +main ()
> +{
> +
> +  if (__builtin_cpu_supports ("avx512f"))
> +    {
> +      do_test ();
> +#ifdef DEBUG
> +      printf ("PASSED\n");
> +#endif
> +      return 0;
> +    }
> +
> +#ifdef DEBUG
> +  printf ("SKIPPED\n");
> +#endif
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c
> new file mode 100644
> index 00000000000..1a2500bd883
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c
> @@ -0,0 +1,44 @@
> +#include <stdio.h>
> +#include "bf16-zmm-check.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +               bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +               bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +               bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32;
> +
> +__m512bf16
> +fun_test_returning___m512bf16 (void)
> +{
> +  volatile_var++;
> +  return (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                       bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +                       bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +                       bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
> +}
> +
> +__m512bf16 test_512bf16;
> +
> +static void
> +do_test (void)
> +{
> +  unsigned failed = 0;
> +  ZMM_T zmmt1, zmmt2;
> +
> +  clear_struct_registers;
> +  test_512bf16 = (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                               bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +                               bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +                               bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
> +  zmmt1._m512bf16[0] = test_512bf16;
> +  zmmt2._m512bf16[0] = WRAP_RET (fun_test_returning___m512bf16)();
> +  if (memcmp (&zmmt1, &zmmt2, sizeof (zmmt2)) != 0)
> +    printf ("fail m512bf16\n"), failed++;
> +
> +  if (failed)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c
> new file mode 100644
> index 00000000000..1c5c407efee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c
> @@ -0,0 +1,243 @@
> +#include <stdio.h>
> +#include "bf16-zmm-check.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +/* This struct holds values for argument checking.  */
> +struct
> +{
> +  ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
> +    i16, i17, i18, i19, i20, i21, i22, i23;
> +} values;
> +
> +char *pass;
> +int failed = 0;
> +
> +#undef assert
> +#define assert(c) do { \
> +  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
> +} while (0)
> +
> +#define compare(X1,X2,T) do { \
> +  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
> +} while (0)
> +
> +void fun_check_passing_m512bf16_8_values (__m512bf16 i0 ATTRIBUTE_UNUSED,
> +                                          __m512bf16 i1 ATTRIBUTE_UNUSED,
> +                                          __m512bf16 i2 ATTRIBUTE_UNUSED,
> +                                          __m512bf16 i3 ATTRIBUTE_UNUSED,
> +                                          __m512bf16 i4 ATTRIBUTE_UNUSED,
> +                                          __m512bf16 i5 ATTRIBUTE_UNUSED,
> +                                          __m512bf16 i6 ATTRIBUTE_UNUSED,
> +                                          __m512bf16 i7 ATTRIBUTE_UNUSED)
> +{
> +  /* Check argument values.  */
> +  compare (values.i0, i0, __m512bf16);
> +  compare (values.i1, i1, __m512bf16);
> +  compare (values.i2, i2, __m512bf16);
> +  compare (values.i3, i3, __m512bf16);
> +  compare (values.i4, i4, __m512bf16);
> +  compare (values.i5, i5, __m512bf16);
> +  compare (values.i6, i6, __m512bf16);
> +  compare (values.i7, i7, __m512bf16);
> +}
> +
> +void
> +fun_check_passing_m512bf16_8_regs (__m512bf16 i0 ATTRIBUTE_UNUSED,
> +                                  __m512bf16 i1 ATTRIBUTE_UNUSED,
> +                                  __m512bf16 i2 ATTRIBUTE_UNUSED,
> +                                  __m512bf16 i3 ATTRIBUTE_UNUSED,
> +                                  __m512bf16 i4 ATTRIBUTE_UNUSED,
> +                                  __m512bf16 i5 ATTRIBUTE_UNUSED,
> +                                  __m512bf16 i6 ATTRIBUTE_UNUSED,
> +                                  __m512bf16 i7 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m512_arguments;
> +}
> +
> +void
> +fun_check_passing_m512bf16_20_values (__m512bf16 i0 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i1 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i2 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i3 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i4 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i5 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i6 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i7 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i8 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i9 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i10 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i11 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i12 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i13 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i14 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i15 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i16 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i17 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i18 ATTRIBUTE_UNUSED,
> +                                     __m512bf16 i19 ATTRIBUTE_UNUSED)
> +{
> +  /* Check argument values.  */
> +  compare (values.i0, i0, __m512bf16);
> +  compare (values.i1, i1, __m512bf16);
> +  compare (values.i2, i2, __m512bf16);
> +  compare (values.i3, i3, __m512bf16);
> +  compare (values.i4, i4, __m512bf16);
> +  compare (values.i5, i5, __m512bf16);
> +  compare (values.i6, i6, __m512bf16);
> +  compare (values.i7, i7, __m512bf16);
> +  compare (values.i8, i8, __m512bf16);
> +  compare (values.i9, i9, __m512bf16);
> +  compare (values.i10, i10, __m512bf16);
> +  compare (values.i11, i11, __m512bf16);
> +  compare (values.i12, i12, __m512bf16);
> +  compare (values.i13, i13, __m512bf16);
> +  compare (values.i14, i14, __m512bf16);
> +  compare (values.i15, i15, __m512bf16);
> +  compare (values.i16, i16, __m512bf16);
> +  compare (values.i17, i17, __m512bf16);
> +  compare (values.i18, i18, __m512bf16);
> +  compare (values.i19, i19, __m512bf16);
> +}
> +
> +void
> +fun_check_passing_m512bf16_20_regs (__m512bf16 i0 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i1 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i2 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i3 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i4 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i5 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i6 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i7 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i8 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i9 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i10 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i11 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i12 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i13 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i14 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i15 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i16 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i17 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i18 ATTRIBUTE_UNUSED,
> +                                   __m512bf16 i19 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m512_arguments;
> +}
> +
> +#define def_check_passing8(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _func1, _func2, TYPE) \
> +  values.i0.TYPE[0] = _i0; \
> +  values.i1.TYPE[0] = _i1; \
> +  values.i2.TYPE[0] = _i2; \
> +  values.i3.TYPE[0] = _i3; \
> +  values.i4.TYPE[0] = _i4; \
> +  values.i5.TYPE[0] = _i5; \
> +  values.i6.TYPE[0] = _i6; \
> +  values.i7.TYPE[0] = _i7; \
> +  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
> +  \
> +  clear_struct_registers; \
> +  fregs.F0.TYPE[0] = _i0; \
> +  fregs.F1.TYPE[0] = _i1; \
> +  fregs.F2.TYPE[0] = _i2; \
> +  fregs.F3.TYPE[0] = _i3; \
> +  fregs.F4.TYPE[0] = _i4; \
> +  fregs.F5.TYPE[0] = _i5; \
> +  fregs.F6.TYPE[0] = _i6; \
> +  fregs.F7.TYPE[0] = _i7; \
> +  num_fregs = 8; \
> +  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
> +
> +#define def_check_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
> +                           _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
> +                           _i18, _i19, _func1, _func2, TYPE) \
> +  values.i0.TYPE[0] = _i0; \
> +  values.i1.TYPE[0] = _i1; \
> +  values.i2.TYPE[0] = _i2; \
> +  values.i3.TYPE[0] = _i3; \
> +  values.i4.TYPE[0] = _i4; \
> +  values.i5.TYPE[0] = _i5; \
> +  values.i6.TYPE[0] = _i6; \
> +  values.i7.TYPE[0] = _i7; \
> +  values.i8.TYPE[0] = _i8; \
> +  values.i9.TYPE[0] = _i9; \
> +  values.i10.TYPE[0] = _i10; \
> +  values.i11.TYPE[0] = _i11; \
> +  values.i12.TYPE[0] = _i12; \
> +  values.i13.TYPE[0] = _i13; \
> +  values.i14.TYPE[0] = _i14; \
> +  values.i15.TYPE[0] = _i15; \
> +  values.i16.TYPE[0] = _i16; \
> +  values.i17.TYPE[0] = _i17; \
> +  values.i18.TYPE[0] = _i18; \
> +  values.i19.TYPE[0] = _i19; \
> +  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
> +                    _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
> +                    _i18, _i19); \
> +  \
> +  clear_struct_registers; \
> +  fregs.F0.TYPE[0] = _i0; \
> +  fregs.F1.TYPE[0] = _i1; \
> +  fregs.F2.TYPE[0] = _i2; \
> +  fregs.F3.TYPE[0] = _i3; \
> +  fregs.F4.TYPE[0] = _i4; \
> +  fregs.F5.TYPE[0] = _i5; \
> +  fregs.F6.TYPE[0] = _i6; \
> +  fregs.F7.TYPE[0] = _i7; \
> +  num_fregs = 8; \
> +  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
> +                    _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
> +                    _i18, _i19);
> +
> +volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +               bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +               bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +               bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32;
> +
> +void
> +test_m512bf16_on_stack ()
> +{
> +  __m512bf16 x[8];
> +  int i;
> +  for (i = 0; i < 8; i++)
> +    x[i] = (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                         bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +                         bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +                         bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
> +
> +  pass = "m512bf16-8";
> +  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
> +                     fun_check_passing_m512bf16_8_values,
> +                     fun_check_passing_m512bf16_8_regs, _m512bf16);
> +}
> +
> +void
> +test_too_many_m512bf16 ()
> +{
> +  __m512bf16 x[20];
> +  int i;
> +  for (i = 0; i < 20; i++)
> +    x[i] = (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                         bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +                         bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +                         bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
> +  pass = "m512bf16-20";
> +  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
> +                      x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
> +                      x[17], x[18], x[19], fun_check_passing_m512bf16_20_values,
> +                      fun_check_passing_m512bf16_20_regs, _m512bf16);
> +}
> +
> +static void
> +do_test (void)
> +{
> +  test_m512bf16_on_stack ();
> +  test_too_many_m512bf16 ();
> +  if (failed)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c
> new file mode 100644
> index 00000000000..f93a2b81086
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c
> @@ -0,0 +1,77 @@
> +#include "bf16-zmm-check.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +struct m512bf16_struct
> +{
> +  __m512bf16 x;
> +};
> +
> +struct m512bf16_2_struct
> +{
> +  __m512bf16 x1, x2;
> +};
> +
> +/* Check that the struct is passed as the individual members in fregs.  */
> +void
> +check_struct_passing1bf16 (struct m512bf16_struct ms1 ATTRIBUTE_UNUSED,
> +                          struct m512bf16_struct ms2 ATTRIBUTE_UNUSED,
> +                          struct m512bf16_struct ms3 ATTRIBUTE_UNUSED,
> +                          struct m512bf16_struct ms4 ATTRIBUTE_UNUSED,
> +                          struct m512bf16_struct ms5 ATTRIBUTE_UNUSED,
> +                          struct m512bf16_struct ms6 ATTRIBUTE_UNUSED,
> +                          struct m512bf16_struct ms7 ATTRIBUTE_UNUSED,
> +                          struct m512bf16_struct ms8 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m512_arguments;
> +}
> +
> +void
> +check_struct_passing2bf16 (struct m512bf16_2_struct ms ATTRIBUTE_UNUSED)
> +{
> +  /* Check the passing on the stack by comparing the address of the
> +     stack elements to the expected place on the stack.  */
> +  assert ((unsigned long)&ms.x1 == rsp+8);
> +  assert ((unsigned long)&ms.x2 == rsp+72);
> +}
> +
> +static void
> +do_test (void)
> +{
> +  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +        bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +        bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +        bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32;
> +  struct m512bf16_struct m512bf16s [8];
> +  struct m512bf16_2_struct m512bf16_2s = {
> +    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +      bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +      bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +      bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 },
> +    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +      bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +      bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +      bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 }
> +  };
> +  int i;
> +
> +  for (i = 0; i < 8; i++)
> +    {
> +      m512bf16s[i].x = (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                                     bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +                                     bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +                                     bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
> +    }
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    (&fregs.zmm0)[i]._m512bf16[0] = m512bf16s[i].x;
> +  num_fregs = 8;
> +  WRAP_CALL (check_struct_passing1bf16) (m512bf16s[0], m512bf16s[1], m512bf16s[2], m512bf16s[3],
> +                                        m512bf16s[4], m512bf16s[5], m512bf16s[6], m512bf16s[7]);
> +  WRAP_CALL (check_struct_passing2bf16) (m512bf16_2s);
> +}
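
On the two asserts in check_struct_passing2bf16 above: snapshot records rsp
while it still points at the return address pushed by the call, so the first
stack argument is expected 8 bytes above it, and x2 follows x1 by the 64
bytes of one 512-bit vector, giving rsp+8 and rsp+72. A sketch of that
arithmetic, assuming an 8-byte return address and 64-byte members. The names
vec64, two_vec and expected_arg_addr are hypothetical; vec64 only stands in
for __m512bf16 so the sketch builds without AVX-512.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for __m512bf16: 64 bytes of storage.  */
typedef struct { unsigned char bytes[64]; } vec64;

/* Mirrors the shape of struct m512bf16_2_struct.  */
struct two_vec
{
  vec64 x1, x2;
};

/* Address at which the member at MEMBER_OFFSET of the first stack argument
   is expected, given RSP as recorded on entry to the callee, where it still
   points at the 8-byte return address.  */
static unsigned long long
expected_arg_addr (unsigned long long rsp_at_entry, size_t member_offset)
{
  return rsp_at_entry + 8 + member_offset;
}
```

With offsetof (struct two_vec, x2) equal to 64, the second member comes out
at rsp_at_entry + 72, the value the test compares against.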
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c
> new file mode 100644
> index 00000000000..3769b38aeb7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c
> @@ -0,0 +1,222 @@
> +#include "bf16-zmm-check.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +union un1b
> +{
> +  __m512bf16 x;
> +  float f;
> +};
> +
> +union un1bb
> +{
> +  __m512bf16 x;
> +  __bf16 f;
> +};
> +
> +union un2b
> +{
> +  __m512bf16 x;
> +  double d;
> +};
> +
> +union un3b
> +{
> +  __m512bf16 x;
> +  __m128 v;
> +};
> +
> +union un4b
> +{
> +  __m512bf16 x;
> +  long double ld;
> +};
> +
> +union un5b
> +{
> +  __m512bf16 x;
> +  int i;
> +};
> +
> +union un6b
> +{
> +  __m512bf16 x;
> +  __m256 v;
> +};
> +
> +void
> +check_union_passing1b (union un1b u1 ATTRIBUTE_UNUSED,
> +                      union un1b u2 ATTRIBUTE_UNUSED,
> +                      union un1b u3 ATTRIBUTE_UNUSED,
> +                      union un1b u4 ATTRIBUTE_UNUSED,
> +                      union un1b u5 ATTRIBUTE_UNUSED,
> +                      union un1b u6 ATTRIBUTE_UNUSED,
> +                      union un1b u7 ATTRIBUTE_UNUSED,
> +                      union un1b u8 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m512_arguments;
> +}
> +
> +void
> +check_union_passing1bb (union un1bb u1 ATTRIBUTE_UNUSED,
> +                       union un1bb u2 ATTRIBUTE_UNUSED,
> +                       union un1bb u3 ATTRIBUTE_UNUSED,
> +                       union un1bb u4 ATTRIBUTE_UNUSED,
> +                       union un1bb u5 ATTRIBUTE_UNUSED,
> +                       union un1bb u6 ATTRIBUTE_UNUSED,
> +                       union un1bb u7 ATTRIBUTE_UNUSED,
> +                       union un1bb u8 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m512_arguments;
> +}
> +
> +
> +void
> +check_union_passing2b (union un2b u1 ATTRIBUTE_UNUSED,
> +                      union un2b u2 ATTRIBUTE_UNUSED,
> +                      union un2b u3 ATTRIBUTE_UNUSED,
> +                      union un2b u4 ATTRIBUTE_UNUSED,
> +                      union un2b u5 ATTRIBUTE_UNUSED,
> +                      union un2b u6 ATTRIBUTE_UNUSED,
> +                      union un2b u7 ATTRIBUTE_UNUSED,
> +                      union un2b u8 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m512_arguments;
> +}
> +
> +void
> +check_union_passing3b (union un3b u1 ATTRIBUTE_UNUSED,
> +                      union un3b u2 ATTRIBUTE_UNUSED,
> +                      union un3b u3 ATTRIBUTE_UNUSED,
> +                      union un3b u4 ATTRIBUTE_UNUSED,
> +                      union un3b u5 ATTRIBUTE_UNUSED,
> +                      union un3b u6 ATTRIBUTE_UNUSED,
> +                      union un3b u7 ATTRIBUTE_UNUSED,
> +                      union un3b u8 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m512_arguments;
> +}
> +
> +void
> +check_union_passing4b (union un4b u ATTRIBUTE_UNUSED)
> +{
> +   /* Check the passing on the stack by comparing the address of the
> +      stack elements to the expected place on the stack.  */
> +  assert ((unsigned long)&u.x == rsp+8);
> +  assert ((unsigned long)&u.ld == rsp+8);
> +}
> +
> +void
> +check_union_passing5b (union un5b u ATTRIBUTE_UNUSED)
> +{
> +   /* Check the passing on the stack by comparing the address of the
> +      stack elements to the expected place on the stack.  */
> +  assert ((unsigned long)&u.x == rsp+8);
> +  assert ((unsigned long)&u.i == rsp+8);
> +}
> +
> +void
> +check_union_passing6b (union un6b u1 ATTRIBUTE_UNUSED,
> +                      union un6b u2 ATTRIBUTE_UNUSED,
> +                      union un6b u3 ATTRIBUTE_UNUSED,
> +                      union un6b u4 ATTRIBUTE_UNUSED,
> +                      union un6b u5 ATTRIBUTE_UNUSED,
> +                      union un6b u6 ATTRIBUTE_UNUSED,
> +                      union un6b u7 ATTRIBUTE_UNUSED,
> +                      union un6b u8 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m512_arguments;
> +}
> +
> +#define check_union_passing1b WRAP_CALL(check_union_passing1b)
> +#define check_union_passing1bf WRAP_CALL(check_union_passing1bf)
> +#define check_union_passing1bb WRAP_CALL(check_union_passing1bb)
> +#define check_union_passing2b WRAP_CALL(check_union_passing2b)
> +#define check_union_passing3b WRAP_CALL(check_union_passing3b)
> +#define check_union_passing4b WRAP_CALL(check_union_passing4b)
> +#define check_union_passing5b WRAP_CALL(check_union_passing5b)
> +#define check_union_passing6b WRAP_CALL(check_union_passing6b)
> +
> +
> +static void
> +do_test (void)
> +{
> +  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +        bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +        bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +        bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32;
> +  union un1b u1b[8];
> +  union un1bb u1bb[8];
> +  union un2b u2b[8];
> +  union un3b u3b[8];
> +  union un4b u4b;
> +  union un5b u5b;
> +  union un6b u6b[8];
> +  int i;
> +
> +  for (i = 0; i < 8; i++)
> +    {
> +      u1b[i].x =  (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                                bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +                                bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +                                bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
> +    }
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    (&fregs.zmm0)[i]._m512bf16[0] = u1b[i].x;
> +  num_fregs = 8;
> +  check_union_passing1b (u1b[0], u1b[1], u1b[2], u1b[3],
> +                        u1b[4], u1b[5], u1b[6], u1b[7]);
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    {
> +      u1bb[i].x = u1b[i].x;
> +      (&fregs.zmm0)[i]._m512bf16[0] = u1bb[i].x;
> +    }
> +  num_fregs = 8;
> +  check_union_passing1bb (u1bb[0], u1bb[1], u1bb[2], u1bb[3],
> +                         u1bb[4], u1bb[5], u1bb[6], u1bb[7]);
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    {
> +      u2b[i].x = u1bb[i].x;
> +      (&fregs.zmm0)[i]._m512bf16[0] = u2b[i].x;
> +    }
> +  num_fregs = 8;
> +  check_union_passing2b (u2b[0], u2b[1], u2b[2], u2b[3],
> +                        u2b[4], u2b[5], u2b[6], u2b[7]);
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    {
> +      u3b[i].x = u1b[i].x;
> +      (&fregs.zmm0)[i]._m512bf16[0] = u3b[i].x;
> +    }
> +  num_fregs = 8;
> +  check_union_passing3b (u3b[0], u3b[1], u3b[2], u3b[3],
> +                        u3b[4], u3b[5], u3b[6], u3b[7]);
> +
> +  check_union_passing4b (u4b);
> +  check_union_passing5b (u5b);
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    {
> +      u6b[i].x = u1b[i].x;
> +      (&fregs.zmm0)[i]._m512bf16[0] = u6b[i].x;
> +    }
> +  num_fregs = 8;
> +  check_union_passing6b (u6b[0], u6b[1], u6b[2], u6b[3],
> +                        u6b[4], u6b[5], u6b[6], u6b[7]);
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c
> new file mode 100644
> index 00000000000..2be57b8b5fb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c
> @@ -0,0 +1,111 @@
> +/* Test variable number of 512-bit vector arguments passed to functions.  */
> +
> +#include <stdio.h>
> +#include "bf16-zmm-check.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +
> +/* This struct holds values for argument checking.  */
> +struct
> +{
> +  ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
> +} values;
> +
> +char *pass;
> +int failed = 0;
> +
> +#undef assert
> +#define assert(c) do { \
> +  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
> +} while (0)
> +
> +#define compare(X1,X2,T) do { \
> +  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
> +} while (0)
> +
> +void
> +fun_check_passing_m512bf16_varargs (__m512bf16 i0, __m512bf16 i1, __m512bf16 i2,
> +                                __m512bf16 i3, ...)
> +{
> +  /* Check argument values.  */
> +  void **fp = __builtin_frame_address (0);
> +  void *ra = __builtin_return_address (0);
> +  __m512bf16 *argp;
> +
> +  compare (values.i0, i0, __m512bf16);
> +  compare (values.i1, i1, __m512bf16);
> +  compare (values.i2, i2, __m512bf16);
> +  compare (values.i3, i3, __m512bf16);
> +
> +  /* Get the pointer to the return address on the stack.  */
> +  while (*fp != ra)
> +    fp++;
> +
> +  /* Skip the return address stack slot.  */
> +  argp = (__m512bf16 *)(((char *) fp) + 8);
> +
> +  /* Check __m512bf16 arguments passed on stack.  */
> +  compare (values.i4, argp[0], __m512bf16);
> +  compare (values.i5, argp[1], __m512bf16);
> +  compare (values.i6, argp[2], __m512bf16);
> +  compare (values.i7, argp[3], __m512bf16);
> +  compare (values.i8, argp[4], __m512bf16);
> +  compare (values.i9, argp[5], __m512bf16);
> +
> +  /* Check register contents.  */
> +  compare (fregs.zmm0, zmm_regs[0], __m512bf16);
> +  compare (fregs.zmm1, zmm_regs[1], __m512bf16);
> +  compare (fregs.zmm2, zmm_regs[2], __m512bf16);
> +  compare (fregs.zmm3, zmm_regs[3], __m512bf16);
> +}
> +
> +#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
> +                                     _i6, _i7, _i8, _i9, \
> +                                     _func, TYPE) \
> +  values.i0.TYPE[0] = _i0; \
> +  values.i1.TYPE[0] = _i1; \
> +  values.i2.TYPE[0] = _i2; \
> +  values.i3.TYPE[0] = _i3; \
> +  values.i4.TYPE[0] = _i4; \
> +  values.i5.TYPE[0] = _i5; \
> +  values.i6.TYPE[0] = _i6; \
> +  values.i7.TYPE[0] = _i7; \
> +  values.i8.TYPE[0] = _i8; \
> +  values.i9.TYPE[0] = _i9; \
> +  clear_struct_registers; \
> +  fregs.F0.TYPE[0] = _i0; \
> +  fregs.F1.TYPE[0] = _i1; \
> +  fregs.F2.TYPE[0] = _i2; \
> +  fregs.F3.TYPE[0] = _i3; \
> +  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
> +
> +void
> +test_m512bf16_varargs (void)
> +{
> +  __m512bf16 x[10];
> +  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +        bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +        bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +        bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32;
> +  int i;
> +  for (i = 0; i < 10; i++)
> +    x[i] = (__m512bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                         bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +                         bf17,bf18,bf19,bf20,bf21,bf22,bf23,bf24,
> +                         bf25,bf26,bf27,bf28,bf29,bf30,bf31,bf32 };
> +  pass = "m512bf16-varargs";
> +  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
> +                                x[6], x[7], x[8], x[9],
> +                                fun_check_passing_m512bf16_varargs,
> +                                _m512bf16);
> +}
> +
> +void
> +do_test (void)
> +{
> +  test_m512bf16_varargs ();
> +  if (failed)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/macros.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/macros.h
> new file mode 100644
> index 00000000000..98fbc660f27
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/macros.h
> @@ -0,0 +1,53 @@
> +#ifndef MACROS_H
> +#define MACROS_H
> +#define check_size(_t, _size) assert(sizeof(_t) == (_size))
> +
> +#define check_align(_t, _align) assert(__alignof__(_t) == (_align))
> +
> +#define check_align_lv(_t, _align) assert(__alignof__(_t) == (_align) \
> +                                         && (((unsigned long)&(_t)) & ((_align) - 1) ) == 0)
> +
> +#define check_basic_struct_size_and_align(_type, _size, _align) { \
> +  struct _str { _type dummy; } _t; \
> +  check_size(_t, _size); \
> +  check_align_lv(_t, _align); \
> +}
> +
> +#define check_array_size_and_align(_type, _size, _align) { \
> +  _type _a[1]; _type _b[2]; _type _c[16]; \
> +  struct _str { _type _a[1]; } _s; \
> +  check_align_lv(_a[0], _align); \
> +  check_size(_a, _size); \
> +  check_size(_b, (_size*2)); \
> +  check_size(_c, (_size*16)); \
> +  check_size(_s, _size); \
> +  check_align_lv(_s._a[0], _align); \
> +}
> +
> +#define check_basic_union_size_and_align(_type, _size, _align) { \
> +  union _union { _type dummy; } _u; \
> +  check_size(_u, _size); \
> +  check_align_lv(_u, _align); \
> +}
> +
> +#define run_signed_tests2(_function, _arg1, _arg2) \
> +  _function(_arg1, _arg2); \
> +  _function(signed _arg1, _arg2); \
> +  _function(unsigned _arg1, _arg2);
> +
> +#define run_signed_tests3(_function, _arg1, _arg2, _arg3) \
> +  _function(_arg1, _arg2, _arg3); \
> +  _function(signed _arg1, _arg2, _arg3); \
> +  _function(unsigned _arg1, _arg2, _arg3);
> +
> +/* Check size of a struct and a union of three types.  */
> +
> +#define check_struct_and_union3(type1, type2, type3, struct_size, align_size) \
> +{ \
> +  struct _str { type1 t1; type2 t2; type3 t3; } _t; \
> +  union _uni { type1 t1; type2 t2; type3 t3; } _u; \
> +  check_size(_t, struct_size); \
> +  check_size(_u, align_size); \
> +}
> +
> +#endif // MACROS_H
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c
> new file mode 100644
> index 00000000000..0c58db101e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c
> @@ -0,0 +1,214 @@
> +/* This is an autogenerated file. Do not edit.  */
> +
> +#include "defines.h"
> +#include "macros.h"
> +
> +/* Check structs and unions of all permutations of 3 basic types.  */
> +int
> +main (void)
> +{
> +  check_struct_and_union3(char, char, __bf16, 4, 2);
> +  check_struct_and_union3(char, __bf16, char, 6, 2);
> +  check_struct_and_union3(char, __bf16, __bf16, 6, 2);
> +  check_struct_and_union3(char, __bf16, int, 8, 4);
> +#ifndef __ILP32__
> +  check_struct_and_union3(char, __bf16, long, 16, 8);
> +#endif
> +  check_struct_and_union3(char, __bf16, long long, 16, 8);
> +  check_struct_and_union3(char, __bf16, float, 8, 4);
> +  check_struct_and_union3(char, __bf16, double, 16, 8);
> +  check_struct_and_union3(char, __bf16, long double, 32, 16);
> +  check_struct_and_union3(char, int, __bf16, 12, 4);
> +#ifndef __ILP32__
> +  check_struct_and_union3(char, long, __bf16, 24, 8);
> +#endif
> +  check_struct_and_union3(char, long long, __bf16, 24, 8);
> +  check_struct_and_union3(char, float, __bf16, 12, 4);
> +  check_struct_and_union3(char, double, __bf16, 24, 8);
> +  check_struct_and_union3(char, long double, __bf16, 48, 16);
> +  check_struct_and_union3(__bf16, char, char, 4, 2);
> +  check_struct_and_union3(__bf16, char, __bf16, 6, 2);
> +  check_struct_and_union3(__bf16, char, int, 8, 4);
> +#ifndef __ILP32__
> +  check_struct_and_union3(__bf16, char, long, 16, 8);
> +#endif
> +  check_struct_and_union3(__bf16, char, long long, 16, 8);
> +  check_struct_and_union3(__bf16, char, float, 8, 4);
> +  check_struct_and_union3(__bf16, char, double, 16, 8);
> +  check_struct_and_union3(__bf16, char, long double, 32, 16);
> +  check_struct_and_union3(__bf16, __bf16, char, 6, 2);
> +  check_struct_and_union3(__bf16, __bf16, __bf16, 6, 2);
> +  check_struct_and_union3(__bf16, __bf16, int, 8, 4);
> +#ifndef __ILP32__
> +  check_struct_and_union3(__bf16, __bf16, long, 16, 8);
> +#endif
> +  check_struct_and_union3(__bf16, __bf16, long long, 16, 8);
> +  check_struct_and_union3(__bf16, __bf16, float, 8, 4);
> +  check_struct_and_union3(__bf16, __bf16, double, 16, 8);
> +  check_struct_and_union3(__bf16, __bf16, long double, 32, 16);
> +  check_struct_and_union3(__bf16, int, char, 12, 4);
> +  check_struct_and_union3(__bf16, int, __bf16, 12, 4);
> +  check_struct_and_union3(__bf16, int, int, 12, 4);
> +#ifndef __ILP32__
> +  check_struct_and_union3(__bf16, int, long, 16, 8);
> +#endif
> +  check_struct_and_union3(__bf16, int, long long, 16, 8);
> +  check_struct_and_union3(__bf16, int, float, 12, 4);
> +  check_struct_and_union3(__bf16, int, double, 16, 8);
> +  check_struct_and_union3(__bf16, int, long double, 32, 16);
> +#ifndef __ILP32__
> +  check_struct_and_union3(__bf16, long, char, 24, 8);
> +  check_struct_and_union3(__bf16, long, __bf16, 24, 8);
> +  check_struct_and_union3(__bf16, long, int, 24, 8);
> +  check_struct_and_union3(__bf16, long, long, 24, 8);
> +  check_struct_and_union3(__bf16, long, long long, 24, 8);
> +  check_struct_and_union3(__bf16, long, float, 24, 8);
> +  check_struct_and_union3(__bf16, long, double, 24, 8);
> +#endif
> +  check_struct_and_union3(__bf16, long, long double, 32, 16);
> +  check_struct_and_union3(__bf16, long long, char, 24, 8);
> +  check_struct_and_union3(__bf16, long long, __bf16, 24, 8);
> +  check_struct_and_union3(__bf16, long long, int, 24, 8);
> +  check_struct_and_union3(__bf16, long long, long, 24, 8);
> +  check_struct_and_union3(__bf16, long long, long long, 24, 8);
> +  check_struct_and_union3(__bf16, long long, float, 24, 8);
> +  check_struct_and_union3(__bf16, long long, double, 24, 8);
> +  check_struct_and_union3(__bf16, long long, long double, 32, 16);
> +  check_struct_and_union3(__bf16, float, char, 12, 4);
> +  check_struct_and_union3(__bf16, float, __bf16, 12, 4);
> +  check_struct_and_union3(__bf16, float, int, 12, 4);
> +#ifndef __ILP32__
> +  check_struct_and_union3(__bf16, float, long, 16, 8);
> +#endif
> +  check_struct_and_union3(__bf16, float, long long, 16, 8);
> +  check_struct_and_union3(__bf16, float, float, 12, 4);
> +  check_struct_and_union3(__bf16, float, double, 16, 8);
> +  check_struct_and_union3(__bf16, float, long double, 32, 16);
> +  check_struct_and_union3(__bf16, double, char, 24, 8);
> +  check_struct_and_union3(__bf16, double, __bf16, 24, 8);
> +  check_struct_and_union3(__bf16, double, int, 24, 8);
> +  check_struct_and_union3(__bf16, double, long, 24, 8);
> +  check_struct_and_union3(__bf16, double, long long, 24, 8);
> +  check_struct_and_union3(__bf16, double, float, 24, 8);
> +  check_struct_and_union3(__bf16, double, double, 24, 8);
> +  check_struct_and_union3(__bf16, double, long double, 32, 16);
> +  check_struct_and_union3(__bf16, long double, char, 48, 16);
> +  check_struct_and_union3(__bf16, long double, __bf16, 48, 16);
> +  check_struct_and_union3(__bf16, long double, int, 48, 16);
> +  check_struct_and_union3(__bf16, long double, long, 48, 16);
> +  check_struct_and_union3(__bf16, long double, long long, 48, 16);
> +  check_struct_and_union3(__bf16, long double, float, 48, 16);
> +  check_struct_and_union3(__bf16, long double, double, 48, 16);
> +  check_struct_and_union3(__bf16, long double, long double, 48, 16);
> +  check_struct_and_union3(int, char, __bf16, 8, 4);
> +  check_struct_and_union3(int, __bf16, char, 8, 4);
> +  check_struct_and_union3(int, __bf16, __bf16, 8, 4);
> +  check_struct_and_union3(int, __bf16, int, 12, 4);
> +#ifndef __ILP32__
> +  check_struct_and_union3(int, __bf16, long, 16, 8);
> +#endif
> +  check_struct_and_union3(int, __bf16, long long, 16, 8);
> +  check_struct_and_union3(int, __bf16, float, 12, 4);
> +  check_struct_and_union3(int, __bf16, double, 16, 8);
> +  check_struct_and_union3(int, __bf16, long double, 32, 16);
> +  check_struct_and_union3(int, int, __bf16, 12, 4);
> +#ifndef __ILP32__
> +  check_struct_and_union3(int, long, __bf16, 24, 8);
> +#endif
> +  check_struct_and_union3(int, long long, __bf16, 24, 8);
> +  check_struct_and_union3(int, float, __bf16, 12, 4);
> +  check_struct_and_union3(int, double, __bf16, 24, 8);
> +  check_struct_and_union3(int, long double, __bf16, 48, 16);
> +#ifndef __ILP32__
> +  check_struct_and_union3(long, char, __bf16, 16, 8);
> +  check_struct_and_union3(long, __bf16, char, 16, 8);
> +  check_struct_and_union3(long, __bf16, __bf16, 16, 8);
> +  check_struct_and_union3(long, __bf16, int, 16, 8);
> +  check_struct_and_union3(long, __bf16, long, 24, 8);
> +  check_struct_and_union3(long, __bf16, long long, 24, 8);
> +  check_struct_and_union3(long, __bf16, float, 16, 8);
> +  check_struct_and_union3(long, __bf16, double, 24, 8);
> +#endif
> +  check_struct_and_union3(long, __bf16, long double, 32, 16);
> +#ifndef __ILP32__
> +  check_struct_and_union3(long, int, __bf16, 16, 8);
> +  check_struct_and_union3(long, long, __bf16, 24, 8);
> +  check_struct_and_union3(long, long long, __bf16, 24, 8);
> +  check_struct_and_union3(long, float, __bf16, 16, 8);
> +  check_struct_and_union3(long, double, __bf16, 24, 8);
> +#endif
> +  check_struct_and_union3(long, long double, __bf16, 48, 16);
> +  check_struct_and_union3(long long, char, __bf16, 16, 8);
> +  check_struct_and_union3(long long, __bf16, char, 16, 8);
> +  check_struct_and_union3(long long, __bf16, __bf16, 16, 8);
> +  check_struct_and_union3(long long, __bf16, int, 16, 8);
> +#ifndef __ILP32__
> +  check_struct_and_union3(long long, __bf16, long, 24, 8);
> +#endif
> +  check_struct_and_union3(long long, __bf16, long long, 24, 8);
> +  check_struct_and_union3(long long, __bf16, float, 16, 8);
> +  check_struct_and_union3(long long, __bf16, double, 24, 8);
> +  check_struct_and_union3(long long, __bf16, long double, 32, 16);
> +  check_struct_and_union3(long long, int, __bf16, 16, 8);
> +#ifndef __ILP32__
> +  check_struct_and_union3(long long, long, __bf16, 24, 8);
> +#endif
> +  check_struct_and_union3(long long, long long, __bf16, 24, 8);
> +  check_struct_and_union3(long long, float, __bf16, 16, 8);
> +  check_struct_and_union3(long long, double, __bf16, 24, 8);
> +  check_struct_and_union3(long long, long double, __bf16, 48, 16);
> +  check_struct_and_union3(float, char, __bf16, 8, 4);
> +  check_struct_and_union3(float, __bf16, char, 8, 4);
> +  check_struct_and_union3(float, __bf16, __bf16, 8, 4);
> +  check_struct_and_union3(float, __bf16, int, 12, 4);
> +#ifndef __ILP32__
> +  check_struct_and_union3(float, __bf16, long, 16, 8);
> +#endif
> +  check_struct_and_union3(float, __bf16, long long, 16, 8);
> +  check_struct_and_union3(float, __bf16, float, 12, 4);
> +  check_struct_and_union3(float, __bf16, double, 16, 8);
> +  check_struct_and_union3(float, __bf16, long double, 32, 16);
> +  check_struct_and_union3(float, int, __bf16, 12, 4);
> +#ifndef __ILP32__
> +  check_struct_and_union3(float, long, __bf16, 24, 8);
> +#endif
> +  check_struct_and_union3(float, long long, __bf16, 24, 8);
> +  check_struct_and_union3(float, float, __bf16, 12, 4);
> +  check_struct_and_union3(float, double, __bf16, 24, 8);
> +  check_struct_and_union3(float, long double, __bf16, 48, 16);
> +  check_struct_and_union3(double, char, __bf16, 16, 8);
> +  check_struct_and_union3(double, __bf16, char, 16, 8);
> +  check_struct_and_union3(double, __bf16, __bf16, 16, 8);
> +  check_struct_and_union3(double, __bf16, int, 16, 8);
> +#ifndef __ILP32__
> +  check_struct_and_union3(double, __bf16, long, 24, 8);
> +#endif
> +  check_struct_and_union3(double, __bf16, long long, 24, 8);
> +  check_struct_and_union3(double, __bf16, float, 16, 8);
> +  check_struct_and_union3(double, __bf16, double, 24, 8);
> +  check_struct_and_union3(double, __bf16, long double, 32, 16);
> +  check_struct_and_union3(double, int, __bf16, 16, 8);
> +#ifndef __ILP32__
> +  check_struct_and_union3(double, long, __bf16, 24, 8);
> +#endif
> +  check_struct_and_union3(double, long long, __bf16, 24, 8);
> +  check_struct_and_union3(double, float, __bf16, 16, 8);
> +  check_struct_and_union3(double, double, __bf16, 24, 8);
> +  check_struct_and_union3(double, long double, __bf16, 48, 16);
> +  check_struct_and_union3(long double, char, __bf16, 32, 16);
> +  check_struct_and_union3(long double, __bf16, char, 32, 16);
> +  check_struct_and_union3(long double, __bf16, __bf16, 32, 16);
> +  check_struct_and_union3(long double, __bf16, int, 32, 16);
> +  check_struct_and_union3(long double, __bf16, long, 32, 16);
> +  check_struct_and_union3(long double, __bf16, long long, 32, 16);
> +  check_struct_and_union3(long double, __bf16, float, 32, 16);
> +  check_struct_and_union3(long double, __bf16, double, 32, 16);
> +  check_struct_and_union3(long double, __bf16, long double, 48, 16);
> +  check_struct_and_union3(long double, int, __bf16, 32, 16);
> +  check_struct_and_union3(long double, long, __bf16, 32, 16);
> +  check_struct_and_union3(long double, long long, __bf16, 32, 16);
> +  check_struct_and_union3(long double, float, __bf16, 32, 16);
> +  check_struct_and_union3(long double, double, __bf16, 32, 16);
> +  check_struct_and_union3(long double, long double, __bf16, 48, 16);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_alignment.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_alignment.c
> new file mode 100644
> index 00000000000..6490a5228ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_alignment.c
> @@ -0,0 +1,14 @@
> +/* This checks alignment of basic types.  */
> +
> +#include "defines.h"
> +#include "macros.h"
> +
> +
> +int
> +main (void)
> +{
> +  /* __bf16 floating point type.  */
> +  check_align(__bf16, TYPE_ALIGN_BF16);
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c
> new file mode 100644
> index 00000000000..c004c35bb83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c
> @@ -0,0 +1,13 @@
> +/* This checks size and alignment of arrays of basic types.  */
> +
> +#include "defines.h"
> +#include "macros.h"
> +
> +
> +int
> +main (void)
> +{
> +  check_array_size_and_align(__bf16, TYPE_SIZE_BF16, TYPE_ALIGN_BF16);
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_returning.c
> new file mode 100644
> index 00000000000..cfea2224733
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_returning.c
> @@ -0,0 +1,20 @@
> +#include "bf16-check.h"
> +#include "defines.h"
> +#include "macros.h"
> +#include "args.h"
> +
> +__bf16
> +fun_test_returning_bf16 (void)
> +{
> +  __bf16 b = make_f32_bf16 (72.0f);
> +  volatile_var++;
> +  return b;
> +}
> +
> +static void
> +do_test (void)
> +{
> +  __bf16 var = WRAP_RET (fun_test_returning_bf16) ();
> +  assert (check_bf16_float (xmm_regs[0].___bf16[0], 72.0f) == 1);
> +  assert (check_bf16_float (var, 72.0f) == 1);
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_sizes.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_sizes.c
> new file mode 100644
> index 00000000000..b81a8d971b5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_sizes.c
> @@ -0,0 +1,14 @@
> +/* This checks sizes of basic types.  */
> +
> +#include "defines.h"
> +#include "macros.h"
> +
> +
> +int
> +main (void)
> +{
> +  /* Floating point types.  */
> +  check_size(__bf16, TYPE_SIZE_BF16);
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c
> new file mode 100644
> index 00000000000..f282506703c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c
> @@ -0,0 +1,14 @@
> +/* This checks size and alignment of structs with a single basic type
> +   element. All basic types are checked.  */
> +
> +#include "bf16-check.h"
> +#include "defines.h"
> +#include "macros.h"
> +
> +
> +static void
> +do_test (void)
> +{
> +  /* Floating point types.  */
> +  check_basic_struct_size_and_align(__bf16, TYPE_SIZE_BF16, TYPE_ALIGN_BF16);
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c
> new file mode 100644
> index 00000000000..03afa68c0e4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c
> @@ -0,0 +1,12 @@
> +/* Test of simple unions, size and alignment.  */
> +
> +#include "bf16-check.h"
> +#include "defines.h"
> +#include "macros.h"
> +
> +static void
> +do_test (void)
> +{
> +  /* Floating point types.  */
> +  check_basic_union_size_and_align(__bf16, TYPE_SIZE_BF16, TYPE_ALIGN_BF16);
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_m128_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_m128_returning.c
> new file mode 100644
> index 00000000000..64857ce7b71
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_m128_returning.c
> @@ -0,0 +1,38 @@
> +#include <stdio.h>
> +#include "bf16-check.h"
> +#include "defines.h"
> +#include "macros.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8;
> +
> +__m128bf16
> +fun_test_returning___m128bf16 (void)
> +{
> +  volatile_var++;
> +  return (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
> +}
> +
> +__m128bf16 test_128bf16;
> +
> +static void
> +do_test (void)
> +{
> +  unsigned failed = 0;
> +  XMM_T xmmt1, xmmt2;
> +
> +  clear_struct_registers;
> +  test_128bf16 = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
> +  xmmt1._m128bf16[0] = test_128bf16;
> +  xmmt2._m128bf16[0] = WRAP_RET (fun_test_returning___m128bf16)();
> +  if (xmmt1._longlong[0] != xmmt2._longlong[0]
> +      || xmmt1._longlong[0] != xmm_regs[0]._longlong[0])
> +    printf ("fail m128bf16\n"), failed++;
> +
> +  if (failed)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_floats.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_floats.c
> new file mode 100644
> index 00000000000..fe08042286b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_floats.c
> @@ -0,0 +1,312 @@
> +/* This is an autogenerated file. Do not edit.  */
> +
> +#include "bf16-check.h"
> +#include "defines.h"
> +#include "macros.h"
> +#include "args.h"
> +
> +struct IntegerRegisters iregs;
> +struct FloatRegisters fregs;
> +unsigned int num_iregs, num_fregs;
> +
> +/* This struct holds values for argument checking.  */
> +struct
> +{
> +  __bf16 f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14,
> +    f15, f16, f17, f18, f19, f20, f21, f22, f23;
> +} values___bf16;
> +
> +void
> +fun_check_bf16_passing_8_values (__bf16 f0 ATTRIBUTE_UNUSED,
> +                                __bf16 f1 ATTRIBUTE_UNUSED,
> +                                __bf16 f2 ATTRIBUTE_UNUSED,
> +                                __bf16 f3 ATTRIBUTE_UNUSED,
> +                                __bf16 f4 ATTRIBUTE_UNUSED,
> +                                __bf16 f5 ATTRIBUTE_UNUSED,
> +                                __bf16 f6 ATTRIBUTE_UNUSED,
> +                                __bf16 f7 ATTRIBUTE_UNUSED)
> +{
> +  /* Check argument values.  */
> +  check_bf16 (values___bf16.f0, f0);
> +  check_bf16 (values___bf16.f1, f1);
> +  check_bf16 (values___bf16.f2, f2);
> +  check_bf16 (values___bf16.f3, f3);
> +  check_bf16 (values___bf16.f4, f4);
> +  check_bf16 (values___bf16.f5, f5);
> +  check_bf16 (values___bf16.f6, f6);
> +  check_bf16 (values___bf16.f7, f7);
> +}
> +
> +void
> +fun_check_bf16_passing_8_regs (__bf16 f0 ATTRIBUTE_UNUSED,
> +                              __bf16 f1 ATTRIBUTE_UNUSED,
> +                              __bf16 f2 ATTRIBUTE_UNUSED,
> +                              __bf16 f3 ATTRIBUTE_UNUSED,
> +                              __bf16 f4 ATTRIBUTE_UNUSED,
> +                              __bf16 f5 ATTRIBUTE_UNUSED,
> +                              __bf16 f6 ATTRIBUTE_UNUSED,
> +                              __bf16 f7 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_bf16_arguments;
> +}
> +
> +void
> +fun_check_bf16_passing_16_values (__bf16 f0 ATTRIBUTE_UNUSED,
> +                                 __bf16 f1 ATTRIBUTE_UNUSED,
> +                                 __bf16 f2 ATTRIBUTE_UNUSED,
> +                                 __bf16 f3 ATTRIBUTE_UNUSED,
> +                                 __bf16 f4 ATTRIBUTE_UNUSED,
> +                                 __bf16 f5 ATTRIBUTE_UNUSED,
> +                                 __bf16 f6 ATTRIBUTE_UNUSED,
> +                                 __bf16 f7 ATTRIBUTE_UNUSED,
> +                                 __bf16 f8 ATTRIBUTE_UNUSED,
> +                                 __bf16 f9 ATTRIBUTE_UNUSED,
> +                                 __bf16 f10 ATTRIBUTE_UNUSED,
> +                                 __bf16 f11 ATTRIBUTE_UNUSED,
> +                                 __bf16 f12 ATTRIBUTE_UNUSED,
> +                                 __bf16 f13 ATTRIBUTE_UNUSED,
> +                                 __bf16 f14 ATTRIBUTE_UNUSED,
> +                                 __bf16 f15 ATTRIBUTE_UNUSED)
> +{
> +  /* Check argument values.  */
> +  check_bf16 (values___bf16.f0, f0);
> +  check_bf16 (values___bf16.f1, f1);
> +  check_bf16 (values___bf16.f2, f2);
> +  check_bf16 (values___bf16.f3, f3);
> +  check_bf16 (values___bf16.f4, f4);
> +  check_bf16 (values___bf16.f5, f5);
> +  check_bf16 (values___bf16.f6, f6);
> +  check_bf16 (values___bf16.f7, f7);
> +  check_bf16 (values___bf16.f8, f8);
> +  check_bf16 (values___bf16.f9, f9);
> +  check_bf16 (values___bf16.f10, f10);
> +  check_bf16 (values___bf16.f11, f11);
> +  check_bf16 (values___bf16.f12, f12);
> +  check_bf16 (values___bf16.f13, f13);
> +  check_bf16 (values___bf16.f14, f14);
> +  check_bf16 (values___bf16.f15, f15);
> +}
> +
> +void
> +fun_check_bf16_passing_16_regs (__bf16 f0 ATTRIBUTE_UNUSED,
> +                               __bf16 f1 ATTRIBUTE_UNUSED,
> +                               __bf16 f2 ATTRIBUTE_UNUSED,
> +                               __bf16 f3 ATTRIBUTE_UNUSED,
> +                               __bf16 f4 ATTRIBUTE_UNUSED,
> +                               __bf16 f5 ATTRIBUTE_UNUSED,
> +                               __bf16 f6 ATTRIBUTE_UNUSED,
> +                               __bf16 f7 ATTRIBUTE_UNUSED,
> +                               __bf16 f8 ATTRIBUTE_UNUSED,
> +                               __bf16 f9 ATTRIBUTE_UNUSED,
> +                               __bf16 f10 ATTRIBUTE_UNUSED,
> +                               __bf16 f11 ATTRIBUTE_UNUSED,
> +                               __bf16 f12 ATTRIBUTE_UNUSED,
> +                               __bf16 f13 ATTRIBUTE_UNUSED,
> +                               __bf16 f14 ATTRIBUTE_UNUSED,
> +                               __bf16 f15 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_bf16_arguments;
> +}
> +
> +void
> +fun_check_bf16_passing_20_values (__bf16 f0 ATTRIBUTE_UNUSED,
> +                                 __bf16 f1 ATTRIBUTE_UNUSED,
> +                                 __bf16 f2 ATTRIBUTE_UNUSED,
> +                                 __bf16 f3 ATTRIBUTE_UNUSED,
> +                                 __bf16 f4 ATTRIBUTE_UNUSED,
> +                                 __bf16 f5 ATTRIBUTE_UNUSED,
> +                                 __bf16 f6 ATTRIBUTE_UNUSED,
> +                                 __bf16 f7 ATTRIBUTE_UNUSED,
> +                                 __bf16 f8 ATTRIBUTE_UNUSED,
> +                                 __bf16 f9 ATTRIBUTE_UNUSED,
> +                                 __bf16 f10 ATTRIBUTE_UNUSED,
> +                                 __bf16 f11 ATTRIBUTE_UNUSED,
> +                                 __bf16 f12 ATTRIBUTE_UNUSED,
> +                                 __bf16 f13 ATTRIBUTE_UNUSED,
> +                                 __bf16 f14 ATTRIBUTE_UNUSED,
> +                                 __bf16 f15 ATTRIBUTE_UNUSED,
> +                                 __bf16 f16 ATTRIBUTE_UNUSED,
> +                                 __bf16 f17 ATTRIBUTE_UNUSED,
> +                                 __bf16 f18 ATTRIBUTE_UNUSED,
> +                                 __bf16 f19 ATTRIBUTE_UNUSED)
> +{
> +  /* Check argument values.  */
> +  check_bf16 (values___bf16.f0, f0);
> +  check_bf16 (values___bf16.f1, f1);
> +  check_bf16 (values___bf16.f2, f2);
> +  check_bf16 (values___bf16.f3, f3);
> +  check_bf16 (values___bf16.f4, f4);
> +  check_bf16 (values___bf16.f5, f5);
> +  check_bf16 (values___bf16.f6, f6);
> +  check_bf16 (values___bf16.f7, f7);
> +  check_bf16 (values___bf16.f8, f8);
> +  check_bf16 (values___bf16.f9, f9);
> +  check_bf16 (values___bf16.f10, f10);
> +  check_bf16 (values___bf16.f11, f11);
> +  check_bf16 (values___bf16.f12, f12);
> +  check_bf16 (values___bf16.f13, f13);
> +  check_bf16 (values___bf16.f14, f14);
> +  check_bf16 (values___bf16.f15, f15);
> +  check_bf16 (values___bf16.f16, f16);
> +  check_bf16 (values___bf16.f17, f17);
> +  check_bf16 (values___bf16.f18, f18);
> +  check_bf16 (values___bf16.f19, f19);
> +}
> +
> +void
> +fun_check_bf16_passing_20_regs (__bf16 f0 ATTRIBUTE_UNUSED,
> +                               __bf16 f1 ATTRIBUTE_UNUSED,
> +                               __bf16 f2 ATTRIBUTE_UNUSED,
> +                               __bf16 f3 ATTRIBUTE_UNUSED,
> +                               __bf16 f4 ATTRIBUTE_UNUSED,
> +                               __bf16 f5 ATTRIBUTE_UNUSED,
> +                               __bf16 f6 ATTRIBUTE_UNUSED,
> +                               __bf16 f7 ATTRIBUTE_UNUSED,
> +                               __bf16 f8 ATTRIBUTE_UNUSED,
> +                               __bf16 f9 ATTRIBUTE_UNUSED,
> +                               __bf16 f10 ATTRIBUTE_UNUSED,
> +                               __bf16 f11 ATTRIBUTE_UNUSED,
> +                               __bf16 f12 ATTRIBUTE_UNUSED,
> +                               __bf16 f13 ATTRIBUTE_UNUSED,
> +                               __bf16 f14 ATTRIBUTE_UNUSED,
> +                               __bf16 f15 ATTRIBUTE_UNUSED,
> +                               __bf16 f16 ATTRIBUTE_UNUSED,
> +                               __bf16 f17 ATTRIBUTE_UNUSED,
> +                               __bf16 f18 ATTRIBUTE_UNUSED,
> +                               __bf16 f19 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_bf16_arguments;
> +}
> +
> +#define def_check_bf16_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6,\
> +                                  _f7, _func1, _func2, TYPE) \
> +  values_ ## TYPE .f0 = _f0; \
> +  values_ ## TYPE .f1 = _f1; \
> +  values_ ## TYPE .f2 = _f2; \
> +  values_ ## TYPE .f3 = _f3; \
> +  values_ ## TYPE .f4 = _f4; \
> +  values_ ## TYPE .f5 = _f5; \
> +  values_ ## TYPE .f6 = _f6; \
> +  values_ ## TYPE .f7 = _f7; \
> +  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \
> +  clear_float_registers; \
> +  fregs.F0._ ## TYPE [0] = _f0; \
> +  fregs.F1._ ## TYPE [0] = _f1; \
> +  fregs.F2._ ## TYPE [0] = _f2; \
> +  fregs.F3._ ## TYPE [0] = _f3; \
> +  fregs.F4._ ## TYPE [0] = _f4; \
> +  fregs.F5._ ## TYPE [0] = _f5; \
> +  fregs.F6._ ## TYPE [0] = _f6; \
> +  fregs.F7._ ## TYPE [0] = _f7; \
> +  num_fregs = 8; \
> +  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7);
> +
> +#define def_check_bf16_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \
> +                                   _f7, _f8, _f9, _f10, _f11, _f12, _f13, \
> +                                   _f14, _f15, _func1, _func2, TYPE) \
> +  values_ ## TYPE .f0 = _f0; \
> +  values_ ## TYPE .f1 = _f1; \
> +  values_ ## TYPE .f2 = _f2; \
> +  values_ ## TYPE .f3 = _f3; \
> +  values_ ## TYPE .f4 = _f4; \
> +  values_ ## TYPE .f5 = _f5; \
> +  values_ ## TYPE .f6 = _f6; \
> +  values_ ## TYPE .f7 = _f7; \
> +  values_ ## TYPE .f8 = _f8; \
> +  values_ ## TYPE .f9 = _f9; \
> +  values_ ## TYPE .f10 = _f10; \
> +  values_ ## TYPE .f11 = _f11; \
> +  values_ ## TYPE .f12 = _f12; \
> +  values_ ## TYPE .f13 = _f13; \
> +  values_ ## TYPE .f14 = _f14; \
> +  values_ ## TYPE .f15 = _f15; \
> +  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
> +                    _f10, _f11, _f12, _f13, _f14, _f15); \
> +  clear_float_registers; \
> +  fregs.F0._ ## TYPE [0] = _f0; \
> +  fregs.F1._ ## TYPE [0] = _f1; \
> +  fregs.F2._ ## TYPE [0] = _f2; \
> +  fregs.F3._ ## TYPE [0] = _f3; \
> +  fregs.F4._ ## TYPE [0] = _f4; \
> +  fregs.F5._ ## TYPE [0] = _f5; \
> +  fregs.F6._ ## TYPE [0] = _f6; \
> +  fregs.F7._ ## TYPE [0] = _f7; \
> +  num_fregs = 8; \
> +  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
> +                    _f10, _f11, _f12, _f13, _f14, _f15);
> +
> +#define def_check_bf16_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \
> +                                   _f7, _f8, _f9, _f10, _f11, _f12, \
> +                                   _f13, _f14, _f15, _f16, _f17, \
> +                                   _f18, _f19, _func1, _func2, TYPE) \
> +  values_ ## TYPE .f0 = _f0; \
> +  values_ ## TYPE .f1 = _f1; \
> +  values_ ## TYPE .f2 = _f2; \
> +  values_ ## TYPE .f3 = _f3; \
> +  values_ ## TYPE .f4 = _f4; \
> +  values_ ## TYPE .f5 = _f5; \
> +  values_ ## TYPE .f6 = _f6; \
> +  values_ ## TYPE .f7 = _f7; \
> +  values_ ## TYPE .f8 = _f8; \
> +  values_ ## TYPE .f9 = _f9; \
> +  values_ ## TYPE .f10 = _f10; \
> +  values_ ## TYPE .f11 = _f11; \
> +  values_ ## TYPE .f12 = _f12; \
> +  values_ ## TYPE .f13 = _f13; \
> +  values_ ## TYPE .f14 = _f14; \
> +  values_ ## TYPE .f15 = _f15; \
> +  values_ ## TYPE .f16 = _f16; \
> +  values_ ## TYPE .f17 = _f17; \
> +  values_ ## TYPE .f18 = _f18; \
> +  values_ ## TYPE .f19 = _f19; \
> +  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, \
> +                    _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, \
> +                    _f17, _f18, _f19); \
> +  clear_float_registers; \
> +  fregs.F0._ ## TYPE [0] = _f0; \
> +  fregs.F1._ ## TYPE [0] = _f1; \
> +  fregs.F2._ ## TYPE [0] = _f2; \
> +  fregs.F3._ ## TYPE [0] = _f3; \
> +  fregs.F4._ ## TYPE [0] = _f4; \
> +  fregs.F5._ ## TYPE [0] = _f5; \
> +  fregs.F6._ ## TYPE [0] = _f6; \
> +  fregs.F7._ ## TYPE [0] = _f7; \
> +  num_fregs = 8; \
> +  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
> +                    _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, \
> +                    _f18, _f19);
> +
> +volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8, bf9, bf10,
> +               bf11,bf12,bf13,bf14,bf15,bf16,bf17,bf18,bf19,bf20;
> +
> +void
> +test_bf16_on_stack ()
> +{
> +  def_check_bf16_passing8 (bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                          fun_check_bf16_passing_8_values,
> +                          fun_check_bf16_passing_8_regs, __bf16);
> +
> +  def_check_bf16_passing16 (bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +                           bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16,
> +                           fun_check_bf16_passing_16_values,
> +                           fun_check_bf16_passing_16_regs, __bf16);
> +}
> +
> +void
> +test_too_many_bf16 ()
> +{
> +  def_check_bf16_passing20 (bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8, bf9, bf10,
> +                           bf11,bf12,bf13,bf14,bf15,bf16,bf17,bf18,bf19,bf20,
> +                           fun_check_bf16_passing_20_values,
> +                           fun_check_bf16_passing_20_regs, __bf16);
> +}
> +
> +static void
> +do_test (void)
> +{
> +  test_bf16_on_stack ();
> +  test_too_many_bf16 ();
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_m128.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_m128.c
> new file mode 100644
> index 00000000000..298b644e93d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_m128.c
> @@ -0,0 +1,238 @@
> +#include <stdio.h>
> +#include "bf16-check.h"
> +#include "defines.h"
> +#include "macros.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +/* This struct holds values for argument checking.  */
> +struct
> +{
> +  XMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
> +    i16, i17, i18, i19, i20, i21, i22, i23;
> +} values;
> +
> +char *pass;
> +int failed = 0;
> +
> +#undef assert
> +#define assert(c) do { \
> +  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
> +} while (0)
> +
> +#define compare(X1,X2,T) do { \
> +  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
> +} while (0)
> +
> +void
> +fun_check_passing_m128bf16_8_values (__m128bf16 i0 ATTRIBUTE_UNUSED,
> +                                    __m128bf16 i1 ATTRIBUTE_UNUSED,
> +                                    __m128bf16 i2 ATTRIBUTE_UNUSED,
> +                                    __m128bf16 i3 ATTRIBUTE_UNUSED,
> +                                    __m128bf16 i4 ATTRIBUTE_UNUSED,
> +                                    __m128bf16 i5 ATTRIBUTE_UNUSED,
> +                                    __m128bf16 i6 ATTRIBUTE_UNUSED,
> +                                    __m128bf16 i7 ATTRIBUTE_UNUSED)
> +{
> +  /* Check argument values.  */
> +  compare (values.i0, i0, __m128bf16);
> +  compare (values.i1, i1, __m128bf16);
> +  compare (values.i2, i2, __m128bf16);
> +  compare (values.i3, i3, __m128bf16);
> +  compare (values.i4, i4, __m128bf16);
> +  compare (values.i5, i5, __m128bf16);
> +  compare (values.i6, i6, __m128bf16);
> +  compare (values.i7, i7, __m128bf16);
> +}
> +
> +void
> +fun_check_passing_m128bf16_8_regs (__m128bf16 i0 ATTRIBUTE_UNUSED,
> +                                  __m128bf16 i1 ATTRIBUTE_UNUSED,
> +                                  __m128bf16 i2 ATTRIBUTE_UNUSED,
> +                                  __m128bf16 i3 ATTRIBUTE_UNUSED,
> +                                  __m128bf16 i4 ATTRIBUTE_UNUSED,
> +                                  __m128bf16 i5 ATTRIBUTE_UNUSED,
> +                                  __m128bf16 i6 ATTRIBUTE_UNUSED,
> +                                  __m128bf16 i7 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m128_arguments;
> +}
> +
> +void
> +fun_check_passing_m128bf16_20_values (__m128bf16 i0 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i1 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i2 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i3 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i4 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i5 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i6 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i7 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i8 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i9 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i10 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i11 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i12 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i13 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i14 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i15 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i16 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i17 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i18 ATTRIBUTE_UNUSED,
> +                                     __m128bf16 i19 ATTRIBUTE_UNUSED)
> +{
> +  /* Check argument values.  */
> +  compare (values.i0, i0, __m128bf16);
> +  compare (values.i1, i1, __m128bf16);
> +  compare (values.i2, i2, __m128bf16);
> +  compare (values.i3, i3, __m128bf16);
> +  compare (values.i4, i4, __m128bf16);
> +  compare (values.i5, i5, __m128bf16);
> +  compare (values.i6, i6, __m128bf16);
> +  compare (values.i7, i7, __m128bf16);
> +  compare (values.i8, i8, __m128bf16);
> +  compare (values.i9, i9, __m128bf16);
> +  compare (values.i10, i10, __m128bf16);
> +  compare (values.i11, i11, __m128bf16);
> +  compare (values.i12, i12, __m128bf16);
> +  compare (values.i13, i13, __m128bf16);
> +  compare (values.i14, i14, __m128bf16);
> +  compare (values.i15, i15, __m128bf16);
> +  compare (values.i16, i16, __m128bf16);
> +  compare (values.i17, i17, __m128bf16);
> +  compare (values.i18, i18, __m128bf16);
> +  compare (values.i19, i19, __m128bf16);
> +}
> +
> +void
> +fun_check_passing_m128bf16_20_regs (__m128bf16 i0 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i1 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i2 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i3 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i4 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i5 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i6 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i7 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i8 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i9 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i10 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i11 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i12 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i13 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i14 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i15 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i16 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i17 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i18 ATTRIBUTE_UNUSED,
> +                                   __m128bf16 i19 ATTRIBUTE_UNUSED)
> +{
> +  /* Check register contents.  */
> +  check_m128_arguments;
> +}
> +
> +#define def_check_int_passing8(_i0, _i1, _i2, _i3, \
> +                              _i4, _i5, _i6, _i7, \
> +                              _func1, _func2, TYPE) \
> +  values.i0.TYPE[0] = _i0; \
> +  values.i1.TYPE[0] = _i1; \
> +  values.i2.TYPE[0] = _i2; \
> +  values.i3.TYPE[0] = _i3; \
> +  values.i4.TYPE[0] = _i4; \
> +  values.i5.TYPE[0] = _i5; \
> +  values.i6.TYPE[0] = _i6; \
> +  values.i7.TYPE[0] = _i7; \
> +  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
> +  clear_float_registers; \
> +  fregs.F0.TYPE[0] = _i0; \
> +  fregs.F1.TYPE[0] = _i1; \
> +  fregs.F2.TYPE[0] = _i2; \
> +  fregs.F3.TYPE[0] = _i3; \
> +  fregs.F4.TYPE[0] = _i4; \
> +  fregs.F5.TYPE[0] = _i5; \
> +  fregs.F6.TYPE[0] = _i6; \
> +  fregs.F7.TYPE[0] = _i7; \
> +  num_fregs = 8; \
> +  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
> +
> +#define def_check_int_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, \
> +                               _i7, _i8, _i9, _i10, _i11, _i12, _i13, \
> +                               _i14, _i15, _i16, _i17, _i18, _i19, \
> +                               _func1, _func2, TYPE) \
> +  values.i0.TYPE[0] = _i0; \
> +  values.i1.TYPE[0] = _i1; \
> +  values.i2.TYPE[0] = _i2; \
> +  values.i3.TYPE[0] = _i3; \
> +  values.i4.TYPE[0] = _i4; \
> +  values.i5.TYPE[0] = _i5; \
> +  values.i6.TYPE[0] = _i6; \
> +  values.i7.TYPE[0] = _i7; \
> +  values.i8.TYPE[0] = _i8; \
> +  values.i9.TYPE[0] = _i9; \
> +  values.i10.TYPE[0] = _i10; \
> +  values.i11.TYPE[0] = _i11; \
> +  values.i12.TYPE[0] = _i12; \
> +  values.i13.TYPE[0] = _i13; \
> +  values.i14.TYPE[0] = _i14; \
> +  values.i15.TYPE[0] = _i15; \
> +  values.i16.TYPE[0] = _i16; \
> +  values.i17.TYPE[0] = _i17; \
> +  values.i18.TYPE[0] = _i18; \
> +  values.i19.TYPE[0] = _i19; \
> +  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
> +                    _i9, _i10, _i11, _i12, _i13, _i14, _i15, _i16, \
> +                    _i17, _i18, _i19); \
> +  clear_float_registers; \
> +  fregs.F0.TYPE[0] = _i0; \
> +  fregs.F1.TYPE[0] = _i1; \
> +  fregs.F2.TYPE[0] = _i2; \
> +  fregs.F3.TYPE[0] = _i3; \
> +  fregs.F4.TYPE[0] = _i4; \
> +  fregs.F5.TYPE[0] = _i5; \
> +  fregs.F6.TYPE[0] = _i6; \
> +  fregs.F7.TYPE[0] = _i7; \
> +  num_fregs = 8; \
> +  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
> +                    _i9, _i10, _i11, _i12, _i13, _i14, _i15, _i16, \
> +                    _i17, _i18, _i19);
> +
> +volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8;
> +
> +void
> +test_m128bf16_on_stack ()
> +{
> +  __m128bf16 x[8];
> +  int i;
> +  for (i = 0; i < 8; i++)
> +    x[i] = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
> +  pass = "m128bf16-8";
> +  def_check_int_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
> +                         fun_check_passing_m128bf16_8_values,
> +                         fun_check_passing_m128bf16_8_regs, _m128bf16);
> +}
> +
> +void
> +test_too_many_m128bf16 ()
> +{
> +  __m128bf16 x[20];
> +  int i;
> +  for (i = 0; i < 20; i++)
> +    x[i] = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
> +  pass = "m128bf16-20";
> +  def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
> +                          x[8], x[9], x[10], x[11], x[12], x[13], x[14],
> +                          x[15], x[16], x[17], x[18], x[19],
> +                          fun_check_passing_m128bf16_20_values,
> +                          fun_check_passing_m128bf16_20_regs, _m128bf16);
> +}
> +
> +static void
> +do_test (void)
> +{
> +  test_m128bf16_on_stack ();
> +  test_too_many_m128bf16 ();
> +  if (failed)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_structs.c
> new file mode 100644
> index 00000000000..8d966005741
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_structs.c
> @@ -0,0 +1,67 @@
> +#include "bf16-check.h"
> +#include "defines.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +struct m128bf16_struct
> +{
> +  __m128bf16 x;
> +};
> +
> +struct m128bf16_2_struct
> +{
> +  __m128bf16 x1, x2;
> +};
> +
> +/* Check that the struct is passed as the individual members in fregs.  */
> +void
> +check_struct_passing1bf16 (struct m128bf16_struct ms1 ATTRIBUTE_UNUSED,
> +                          struct m128bf16_struct ms2 ATTRIBUTE_UNUSED,
> +                          struct m128bf16_struct ms3 ATTRIBUTE_UNUSED,
> +                          struct m128bf16_struct ms4 ATTRIBUTE_UNUSED,
> +                          struct m128bf16_struct ms5 ATTRIBUTE_UNUSED,
> +                          struct m128bf16_struct ms6 ATTRIBUTE_UNUSED,
> +                          struct m128bf16_struct ms7 ATTRIBUTE_UNUSED,
> +                          struct m128bf16_struct ms8 ATTRIBUTE_UNUSED)
> +{
> +  check_m128_arguments;
> +}
> +
> +void
> +check_struct_passing2bf16 (struct m128bf16_2_struct ms ATTRIBUTE_UNUSED)
> +{
> +  /* Check the passing on the stack by comparing the address of the
> +     stack elements to the expected place on the stack.  */
> +  assert ((unsigned long)&ms.x1 == rsp+8);
> +  assert ((unsigned long)&ms.x2 == rsp+24);
> +}
> +
> +volatile __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8,
> +               bf9, bf10,bf11,bf12,bf13,bf14,bf15,bf16;
> +
> +static void
> +do_test (void)
> +{
> +  struct m128bf16_struct m128bf16s [8];
> +  struct m128bf16_2_struct m128bf16_2s = {
> +    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 },
> +    { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 },
> +  };
> +  int i;
> +
> +  for (i = 0; i < 8; i++)
> +    {
> +      m128bf16s[i].x = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
> +    }
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    (&fregs.xmm0)[i]._m128bf16[0] = m128bf16s[i].x;
> +  num_fregs = 8;
> +  WRAP_CALL (check_struct_passing1bf16) (m128bf16s[0], m128bf16s[1], m128bf16s[2], m128bf16s[3],
> +                                        m128bf16s[4], m128bf16s[5], m128bf16s[6], m128bf16s[7]);
> +  WRAP_CALL (check_struct_passing2bf16) (m128bf16_2s);
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_unions.c
> new file mode 100644
> index 00000000000..83e4380512b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_passing_unions.c
> @@ -0,0 +1,160 @@
> +#include "bf16-check.h"
> +#include "defines.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +unsigned int num_fregs, num_iregs;
> +
> +union un1b
> +{
> +  __m128bf16 x;
> +  float f;
> +};
> +
> +union un1bb
> +{
> +  __m128bf16 x;
> +  __bf16 f;
> +};
> +
> +union un2b
> +{
> +  __m128bf16 x;
> +  double d;
> +};
> +
> +union un3b
> +{
> +  __m128bf16 x;
> +  __m128 v;
> +};
> +
> +union un4b
> +{
> +  __m128bf16 x;
> +  long double ld;
> +};
> +
> +void
> +check_union_passing1b (union un1b u1 ATTRIBUTE_UNUSED,
> +                      union un1b u2 ATTRIBUTE_UNUSED,
> +                      union un1b u3 ATTRIBUTE_UNUSED,
> +                      union un1b u4 ATTRIBUTE_UNUSED,
> +                      union un1b u5 ATTRIBUTE_UNUSED,
> +                      union un1b u6 ATTRIBUTE_UNUSED,
> +                      union un1b u7 ATTRIBUTE_UNUSED,
> +                      union un1b u8 ATTRIBUTE_UNUSED)
> +{
> +  check_m128_arguments;
> +}
> +
> +void
> +check_union_passing1bb (union un1bb u1 ATTRIBUTE_UNUSED,
> +                       union un1bb u2 ATTRIBUTE_UNUSED,
> +                       union un1bb u3 ATTRIBUTE_UNUSED,
> +                       union un1bb u4 ATTRIBUTE_UNUSED,
> +                       union un1bb u5 ATTRIBUTE_UNUSED,
> +                       union un1bb u6 ATTRIBUTE_UNUSED,
> +                       union un1bb u7 ATTRIBUTE_UNUSED,
> +                       union un1bb u8 ATTRIBUTE_UNUSED)
> +{
> +  check_m128_arguments;
> +}
> +
> +void
> +check_union_passing2b (union un2b u1 ATTRIBUTE_UNUSED,
> +                      union un2b u2 ATTRIBUTE_UNUSED,
> +                      union un2b u3 ATTRIBUTE_UNUSED,
> +                      union un2b u4 ATTRIBUTE_UNUSED,
> +                      union un2b u5 ATTRIBUTE_UNUSED,
> +                      union un2b u6 ATTRIBUTE_UNUSED,
> +                      union un2b u7 ATTRIBUTE_UNUSED,
> +                      union un2b u8 ATTRIBUTE_UNUSED)
> +{
> +  check_m128_arguments;
> +}
> +
> +void
> +check_union_passing3b (union un3b u1 ATTRIBUTE_UNUSED,
> +                      union un3b u2 ATTRIBUTE_UNUSED,
> +                      union un3b u3 ATTRIBUTE_UNUSED,
> +                      union un3b u4 ATTRIBUTE_UNUSED,
> +                      union un3b u5 ATTRIBUTE_UNUSED,
> +                      union un3b u6 ATTRIBUTE_UNUSED,
> +                      union un3b u7 ATTRIBUTE_UNUSED,
> +                      union un3b u8 ATTRIBUTE_UNUSED)
> +{
> +  check_m128_arguments;
> +}
> +
> +void
> +check_union_passing4b (union un4b u ATTRIBUTE_UNUSED)
> +{
> +   /* Check the passing on the stack by comparing the address of the
> +      stack elements to the expected place on the stack.  */
> +  assert ((unsigned long)&u.x == rsp+8);
> +  assert ((unsigned long)&u.ld == rsp+8);
> +}
> +
> +#define check_union_passing1b WRAP_CALL(check_union_passing1b)
> +#define check_union_passing1bb WRAP_CALL(check_union_passing1bb)
> +#define check_union_passing2b WRAP_CALL(check_union_passing2b)
> +#define check_union_passing3b WRAP_CALL(check_union_passing3b)
> +#define check_union_passing4b WRAP_CALL(check_union_passing4b)
> +
> +static void
> +do_test (void)
> +{
> +  union un1b u1b[8];
> +  union un1bb u1bb[8];
> +  union un2b u2b[8];
> +  union un3b u3b[8];
> +  union un4b u4b;
> +  int i;
> +  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8;
> +
> +  for (i = 0; i < 8; i++)
> +    {
> +      u1b[i].x = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
> +    }
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    (&fregs.xmm0)[i]._m128bf16[0] = u1b[i].x;
> +  num_fregs = 8;
> +  check_union_passing1b (u1b[0], u1b[1], u1b[2], u1b[3],
> +                        u1b[4], u1b[5], u1b[6], u1b[7]);
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    {
> +      u1bb[i].x = u1b[i].x;
> +      (&fregs.xmm0)[i]._m128bf16[0] = u1bb[i].x;
> +    }
> +  num_fregs = 8;
> +  check_union_passing1bb (u1bb[0], u1bb[1], u1bb[2], u1bb[3],
> +                         u1bb[4], u1bb[5], u1bb[6], u1bb[7]);
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    {
> +      u2b[i].x = u1b[i].x;
> +      (&fregs.xmm0)[i]._m128bf16[0] = u2b[i].x;
> +    }
> +  num_fregs = 8;
> +  check_union_passing2b (u2b[0], u2b[1], u2b[2], u2b[3],
> +                        u2b[4], u2b[5], u2b[6], u2b[7]);
> +
> +  clear_struct_registers;
> +  for (i = 0; i < 8; i++)
> +    {
> +      u3b[i].x = u1b[i].x;
> +      (&fregs.xmm0)[i]._m128bf16[0] = u3b[i].x;
> +    }
> +  num_fregs = 8;
> +  check_union_passing3b (u3b[0], u3b[1], u3b[2], u3b[3],
> +                        u3b[4], u3b[5], u3b[6], u3b[7]);
> +
> +  check_union_passing4b (u4b);
> +}
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_struct_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_struct_returning.c
> new file mode 100644
> index 00000000000..757ccc26b79
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_struct_returning.c
> @@ -0,0 +1,176 @@
> +/* This tests returning of structures.  */
> +
> +#include <stdio.h>
> +#include "bf16-check.h"
> +#include "defines.h"
> +#include "macros.h"
> +#include "args.h"
> +
> +struct IntegerRegisters iregs;
> +struct FloatRegisters fregs;
> +unsigned int num_iregs, num_fregs;
> +
> +int current_test;
> +int num_failed = 0;
> +
> +#undef assert
> +#define assert(test) do { if (!(test)) {fprintf (stderr, "failed in test %d\n", current_test); num_failed++; } } while (0)
> +
> +#define xmm0b xmm_regs[0].___bf16
> +#define xmm1b xmm_regs[1].___bf16
> +#define xmm0f xmm_regs[0]._float
> +#define xmm0d xmm_regs[0]._double
> +#define xmm1f xmm_regs[1]._float
> +#define xmm1d xmm_regs[1]._double
> +
> +typedef enum {
> +  SSE_B = 0,
> +  SSE_D,
> +  MEM,
> +  INT_SSE,
> +  SSE_INT,
> +  SSE_F_H,
> +  SSE_F_H8
> +} Type;
> +
> +/* Structures which should be returned in SSE.  */
> +#define D(I,MEMBERS,C,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = C; \
> +struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s)); B; return s; }
> +
> +D(120,__bf16 f,SSE_B, s.f=make_f32_bf16(42.0f))
> +D(121,__bf16 f;__bf16 f2,SSE_B, s.f=make_f32_bf16(42.0f))
> +D(122,__bf16 f;float d,SSE_B, s.f=make_f32_bf16(42.0f))
> +D(123,__bf16 f;double d,SSE_B, s.f=make_f32_bf16(42.0f))
> +D(124,double d; __bf16 f,SSE_D, s.d=42)
> +D(125,__bf16 f[2],SSE_B, s.f[0]=make_f32_bf16(42.0f))
> +D(126,__bf16 f[3],SSE_B, s.f[0]=make_f32_bf16(42.0f))
> +D(127,__bf16 f[4],SSE_B, s.f[0]=make_f32_bf16(42.0f))
> +D(128,__bf16 f[2]; double d,SSE_B, s.f[0]=make_f32_bf16(42.0f))
> +D(129,double d;__bf16 f[2],SSE_D, s.d=42)
> +
> +#undef D
> +
> +#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = INT_SSE; \
> +struct S_ ## I f_ ## I (void) { struct S_ ## I s = { 42, make_f32_bf16(43.0f) }; return s; }
> +
> +D(310,char m1; __bf16 m2)
> +D(311,short m1; __bf16 m2)
> +D(312,int m1; __bf16 m2)
> +D(313,long long m1; __bf16 m2)
> +
> +#undef D
> +
> +void check_300 (void)
> +{
> +  XMM_T x;
> +  x._ulonglong[0] = rax;
> +  switch (current_test) {
> +    case 310: assert ((rax & 0xff) == 42
> +                     && check_bf16_float (x.___bf16[1], 43.0f) == 1); break;
> +    case 311: assert ((rax & 0xffff) == 42
> +                     && check_bf16_float (x.___bf16[1], 43.0f) == 1); break;
> +    case 312: assert ((rax & 0xffffffff) == 42
> +                     && check_bf16_float (x.___bf16[2], 43.0f) == 1); break;
> +    case 313: assert (rax == 42
> +                     && check_bf16_float (xmm0b[0], 43.0f) == 1); break;
> +
> +    default: assert (0); break;
> +  }
> +}
> +
> +/* Structures which should be returned in SSE (low) and INT (high).  */
> +#define D(I,MEMBERS,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = SSE_INT; \
> +struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s));  B; return s; }
> +
> +D(402,__bf16 f[4];char c, s.f[0]=make_f32_bf16(42.0f); s.c=43)
> +
> +#undef D
> +
> +void check_400 (void)
> +{
> +  switch (current_test) {
> +    case 402: assert (check_bf16_float (xmm0b[0], 42.0f) == 1 && (rax & 0xff) == 43); break;
> +
> +    default: assert (0); break;
> +  }
> +}
> +
> +/* Structures which should be returned in MEM.  */
> +void *struct_addr;
> +#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = MEM; \
> +struct S_ ## I f_ ## I (void) { union {unsigned char c; struct S_ ## I s;} u; memset (&u.s, 0, sizeof(u.s)); u.c = 42; return u.s; }
> +
> +/* Unnaturally aligned members.  */
> +D(540,__bf16 m1[10])
> +D(541,char m1[1];__bf16 f[8])
> +
> +#undef D
> +
> +
> +/* Special tests.  */
> +#define D(I,MEMBERS,C,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = C; \
> +struct S_ ## I f_ ## I (void) { struct S_ ## I s; B; return s; }
> +D(601,__bf16 f[4], SSE_F_H, s.f[0] = s.f[1] = s.f[2] = s.f[3] = make_f32_bf16 (42.0f))
> +D(602,__bf16 f[8], SSE_F_H8,
> +  s.f[0] = s.f[1] = s.f[2] = s.f[3] = s.f[4] = s.f[5] = s.f[6] = s.f[7] = make_f32_bf16 (42.0f))
> +#undef D
> +
> +void clear_all (void)
> +{
> +  clear_int_registers;
> +}
> +
> +void check_all (Type class, unsigned long size)
> +{
> +  switch (class) {
> +    case SSE_B: assert (check_bf16_float (xmm0b[0], 42.0f) == 1); break;
> +    case SSE_D: assert (xmm0d[0] == 42); break;
> +    case SSE_F_H: assert (check_bf16_float (xmm0b[0], 42) == 1
> +                         && check_bf16_float (xmm0b[1], 42) == 1
> +                         && check_bf16_float (xmm0b[2], 42) == 1
> +                         && check_bf16_float (xmm0b[3], 42) == 1); break;
> +    case SSE_F_H8: assert (check_bf16_float (xmm0b[0], 42) == 1
> +                          && check_bf16_float (xmm0b[1], 42) == 1
> +                          && check_bf16_float (xmm0b[2], 42) == 1
> +                          && check_bf16_float (xmm0b[3], 42) == 1
> +                          && check_bf16_float (xmm1b[0], 42) == 1
> +                          && check_bf16_float (xmm1b[1], 42) == 1
> +                          && check_bf16_float (xmm1b[2], 42) == 1
> +                          && check_bf16_float (xmm1b[3], 42) == 1); break;
> +    case INT_SSE: check_300(); break;
> +    case SSE_INT: check_400(); break;
> +    /* Ideally we would like to check that rax == struct_addr.
> +       Unfortunately the address of the target struct escapes (for setting
> +       struct_addr), so the return struct is a temporary one whose address
> +       is given to the f_* functions, otherwise a conforming program
> +       could notice the struct changing already before the function returns.
> +       This temporary struct could be anywhere.  For GCC it will be on
> +       stack, but no one is forbidding that it could be a static variable
> +       if there's no threading or proper locking.  Nobody in his right mind
> +       would use anything but the stack for that.  */
> +    case MEM: assert (*(unsigned char*)struct_addr == 42 && rdi == rax); break;
> +  }
> +}
> +
> +#define D(I) { struct S_ ## I s; current_test = I; struct_addr = (void*)&s; \
> +  clear_all(); \
> +  s = WRAP_RET(f_ ## I) (); \
> +  check_all(class_ ## I, sizeof(s)); \
> +}
> +
> +static void
> +do_test (void)
> +{
> +  D(120) D(121) D(122) D(123) D(124) D(125) D(126) D(127) D(128) D(129)
> +
> +  D(310) D(311) D(312) D(313)
> +
> +  D(402)
> +
> +  D(540) D(541)
> +
> +  D(601) D(602)
> +  if (num_failed)
> +    abort ();
> +}
> +#undef D
> diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_varargs-m128.c b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_varargs-m128.c
> new file mode 100644
> index 00000000000..4eea7eb7d3c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/test_varargs-m128.c
> @@ -0,0 +1,111 @@
> +/* Test variable number of 128-bit vector arguments passed to functions.  */
> +
> +#include <stdio.h>
> +#include "bf16-check.h"
> +#include "defines.h"
> +#include "macros.h"
> +#include "args.h"
> +
> +struct FloatRegisters fregs;
> +struct IntegerRegisters iregs;
> +
> +/* This struct holds values for argument checking.  */
> +struct
> +{
> +  XMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
> +} values;
> +
> +char *pass;
> +int failed = 0;
> +
> +#undef assert
> +#define assert(c) do { \
> +  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
> +} while (0)
> +
> +#define compare(X1,X2,T) do { \
> +  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
> +} while (0)
> +
> +void
> +fun_check_passing_m128bf16_varargs (__m128bf16 i0, __m128bf16 i1, __m128bf16 i2,
> +                                __m128bf16 i3, ...)
> +{
> +  /* Check argument values.  */
> +  void **fp = __builtin_frame_address (0);
> +  void *ra = __builtin_return_address (0);
> +  __m128bf16 *argp;
> +
> +  compare (values.i0, i0, __m128bf16);
> +  compare (values.i1, i1, __m128bf16);
> +  compare (values.i2, i2, __m128bf16);
> +  compare (values.i3, i3, __m128bf16);
> +
> +  /* Get the pointer to the return address on stack.  */
> +  while (*fp != ra)
> +    fp++;
> +
> +  /* Skip the return address stack slot.  */
> +  argp = (__m128bf16 *) (((char *) fp) + 8);
> +
> +  /* Check __m128bf16 arguments passed on stack.  */
> +  compare (values.i8, argp[0], __m128bf16);
> +  compare (values.i9, argp[1], __m128bf16);
> +
> +  /* Check register contents.  */
> +  compare (fregs.xmm0, xmm_regs[0], __m128bf16);
> +  compare (fregs.xmm1, xmm_regs[1], __m128bf16);
> +  compare (fregs.xmm2, xmm_regs[2], __m128bf16);
> +  compare (fregs.xmm3, xmm_regs[3], __m128bf16);
> +  compare (fregs.xmm4, xmm_regs[4], __m128bf16);
> +  compare (fregs.xmm5, xmm_regs[5], __m128bf16);
> +  compare (fregs.xmm6, xmm_regs[6], __m128bf16);
> +  compare (fregs.xmm7, xmm_regs[7], __m128bf16);
> +}
> +
> +#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
> +                                     _i6, _i7, _i8, _i9, \
> +                                     _func, TYPE) \
> +  values.i0.TYPE[0] = _i0; \
> +  values.i1.TYPE[0] = _i1; \
> +  values.i2.TYPE[0] = _i2; \
> +  values.i3.TYPE[0] = _i3; \
> +  values.i4.TYPE[0] = _i4; \
> +  values.i5.TYPE[0] = _i5; \
> +  values.i6.TYPE[0] = _i6; \
> +  values.i7.TYPE[0] = _i7; \
> +  values.i8.TYPE[0] = _i8; \
> +  values.i9.TYPE[0] = _i9; \
> +  clear_float_registers; \
> +  fregs.F0.TYPE[0] = _i0; \
> +  fregs.F1.TYPE[0] = _i1; \
> +  fregs.F2.TYPE[0] = _i2; \
> +  fregs.F3.TYPE[0] = _i3; \
> +  fregs.F4.TYPE[0] = _i4; \
> +  fregs.F5.TYPE[0] = _i5; \
> +  fregs.F6.TYPE[0] = _i6; \
> +  fregs.F7.TYPE[0] = _i7; \
> +  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
> +
> +void
> +test_m128bf16_varargs (void)
> +{
> +  __m128bf16 x[10];
> +  __bf16 bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8;
> +  int i;
> +  for (i = 0; i < 10; i++)
> +    x[i] = (__m128bf16) { bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8 };
> +  pass = "m128bf16-varargs";
> +  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
> +                                x[6], x[7], x[8], x[9],
> +                                fun_check_passing_m128bf16_varargs,
> +                                _m128bf16);
> +}
> +
> +static void
> +do_test (void)
> +{
> +  test_m128bf16_varargs ();
> +  if (failed)
> +    abort ();
> +}
> --
> 2.18.1
>
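
For readers following the struct-returning test: it calls make_f32_bf16 and
check_bf16_float from bf16-check.h, which is not part of the quote above.
A minimal sketch of what such helpers might do, assuming plain truncation of
the IEEE single to its upper 16 bits. The sketch_* names and the uint16_t
stand-in type are hypothetical and are not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: a bf16 value kept as its raw 16-bit pattern.  */
typedef uint16_t sketch_bf16_bits;

/* Keep the upper 16 bits of the IEEE single (truncation, no rounding;
   this is an assumption, the real helper may round).  */
static inline sketch_bf16_bits
sketch_f32_to_bf16 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return (sketch_bf16_bits) (u >> 16);
}

/* Widen back by placing the 16 bits in the high half of a float.  */
static inline float
sketch_bf16_to_f32 (sketch_bf16_bits b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Return 1 when the bf16 pattern B is what F truncates to, else 0.  */
static inline int
sketch_check_bf16_float (sketch_bf16_bits b, float f)
{
  return b == sketch_f32_to_bf16 (f);
}

/* Small self-check: values with few mantissa bits survive the round trip.  */
static inline void
sketch_self_check (void)
{
  assert (sketch_bf16_to_f32 (sketch_f32_to_bf16 (42.0f)) == 42.0f);
}
```

Values such as 42.0f and 43.0f need only a few mantissa bits, so they
convert exactly, which is presumably why the tests compare against them.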


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Add ABI test for __bf16 type
  2022-08-19  0:58   ` Hongtao Liu
@ 2022-08-19 17:30     ` H.J. Lu
  2022-08-22  1:02       ` Hongtao Liu
  0 siblings, 1 reply; 9+ messages in thread
From: H.J. Lu @ 2022-08-19 17:30 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Haochen Jiang, Hongtao Liu, GCC Patches

On Thu, Aug 18, 2022 at 5:56 PM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 18, 2022 at 3:36 PM Haochen Jiang via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi all,
> >
> > This patch adds the bf16 ABI tests, to go in once the __bf16 type support has been added.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> Ok.

All BF16 ABI tests failed due to missing __m128bf16/__m256bf16/__m512bf16.
When will __bf16 types be added?

> >
> > BRs,
> > Haochen
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/x86_64/abi/bf16/abi-bf16.exp: New test.
> >         * gcc.target/x86_64/abi/bf16/args.h: Ditto.
> >         * gcc.target/x86_64/abi/bf16/asm-support.S: Ditto.
> >         * gcc.target/x86_64/abi/bf16/bf16-check.h: Ditto.
> >         * gcc.target/x86_64/abi/bf16/bf16-helper.h: Ditto.
> >         * gcc.target/x86_64/abi/bf16/defines.h: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m256bf16/args.h: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m512bf16/args.h: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/macros.h: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_basic_alignment.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_basic_returning.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_basic_sizes.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_m128_returning.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_passing_floats.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_passing_m128.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_passing_structs.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_passing_unions.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_struct_returning.c: Ditto.
> >         * gcc.target/x86_64/abi/bf16/test_varargs-m128.c: Ditto.



-- 
H.J.


* Re: [PATCH] Add ABI test for __bf16 type
  2022-08-19 17:30     ` H.J. Lu
@ 2022-08-22  1:02       ` Hongtao Liu
  2022-08-22  1:04         ` Hongtao Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2022-08-22  1:02 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Haochen Jiang, Hongtao Liu, GCC Patches

On Sat, Aug 20, 2022 at 1:31 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Aug 18, 2022 at 5:56 PM Hongtao Liu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Aug 18, 2022 at 3:36 PM Haochen Jiang via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi all,
> > >
> > > This patch aims to add bf16 abi test after the whole __bf16 type is added.
> > >
> > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > Ok.
>
> All BF16 ABI tests failed due to missing __m128bf16/__m256bf16/__m512bf16.
> When will __bf16 types be added?
It should be already in the trunk.



-- 
BR,
Hongtao


* Re: [PATCH] Add ABI test for __bf16 type
  2022-08-22  1:02       ` Hongtao Liu
@ 2022-08-22  1:04         ` Hongtao Liu
  2022-08-22  2:15           ` [PATCH] Add __m128bf16/__m256bf16/__m512bf16 type for bf16 abi test Haochen Jiang
  0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2022-08-22  1:04 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Haochen Jiang, Hongtao Liu, GCC Patches

On Mon, Aug 22, 2022 at 9:02 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Sat, Aug 20, 2022 at 1:31 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Aug 18, 2022 at 5:56 PM Hongtao Liu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Thu, Aug 18, 2022 at 3:36 PM Haochen Jiang via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > This patch aims to add bf16 abi test after the whole __bf16 type is added.
> > > >
> > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > Ok.
> >
> > All BF16 ABI tests failed due to missing __m128bf16/__m256bf16/__m512bf16.
> > When will __bf16 types be added?
> It should be already in the trunk.
Oh, __m128bf16/__m256bf16/__m512bf16 have not been added to the trunk.



-- 
BR,
Hongtao


* [PATCH] Add __m128bf16/__m256bf16/__m512bf16 type for bf16 abi test
  2022-08-22  1:04         ` Hongtao Liu
@ 2022-08-22  2:15           ` Haochen Jiang
  2022-08-23  3:01             ` Hongtao Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Haochen Jiang @ 2022-08-22  2:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, lingling.kong, hjl.tools

Hi all,

This patch adds the __m128bf16/__m256bf16/__m512bf16 types to the testcases.

BRs,
Haochen

gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/bf16/bf16-helper.h:
	Add __m128bf16/__m256bf16/__m512bf16.
	* gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h:
	Include bf16-helper.h.
---
 gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h        | 4 ++++
 .../gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h      | 1 +
 2 files changed, 5 insertions(+)

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h
index 83d89fcf62c..e090a7254f4 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/bf16-helper.h
@@ -1,3 +1,7 @@
+typedef __bf16 __m128bf16 __attribute__((__vector_size__(16), __aligned__(16)));
+typedef __bf16 __m256bf16 __attribute__((__vector_size__(32), __aligned__(32)));
+typedef __bf16 __m512bf16 __attribute__((__vector_size__(64), __aligned__(64)));
+
 typedef union
 {
   float f;
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h
index 8379fcfaf8c..9cd39b878dd 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h
+++ b/gcc/testsuite/gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h
@@ -1,4 +1,5 @@
 #include <stdlib.h>
+#include "../bf16-helper.h"
 
 static void do_test (void);
 
-- 
2.18.1



* Re: [PATCH] Add __m128bf16/__m256bf16/__m512bf16 type for bf16 abi test
  2022-08-22  2:15           ` [PATCH] Add __m128bf16/__m256bf16/__m512bf16 type for bf16 abi test Haochen Jiang
@ 2022-08-23  3:01             ` Hongtao Liu
  0 siblings, 0 replies; 9+ messages in thread
From: Hongtao Liu @ 2022-08-23  3:01 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: gcc-patches, hongtao.liu

On Mon, Aug 22, 2022 at 10:16 AM Haochen Jiang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi all,
>
> This patch adds the __m128bf16/__m256bf16/__m512bf16 types to the testcases.
Ok.


-- 
BR,
Hongtao


end of thread, other threads:[~2022-08-23  3:01 UTC | newest]

Thread overview: 9+ messages
2022-08-16  7:49 [PATCH] x86: Support vector __bf16 type Kong, Lingling
2022-08-17  5:56 ` Hongtao Liu
2022-08-18  7:34 ` [PATCH] Add ABI test for " Haochen Jiang
2022-08-19  0:58   ` Hongtao Liu
2022-08-19 17:30     ` H.J. Lu
2022-08-22  1:02       ` Hongtao Liu
2022-08-22  1:04         ` Hongtao Liu
2022-08-22  2:15           ` [PATCH] Add __m128bf16/__m256bf16/__m512bf16 type for bf16 abi test Haochen Jiang
2022-08-23  3:01             ` Hongtao Liu
