public inbox for gcc-patches@gcc.gnu.org

* [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support
@ 2022-09-29 15:55 Jakub Jelinek
From: Jakub Jelinek @ 2022-09-29 15:55 UTC
  To: Jason Merrill, Joseph S. Myers, Hongtao Liu, hjl.tools,
	Richard Earnshaw, Kyrylo Tkachov, richard.sandiford
  Cc: gcc-patches

Hi!

Here is a more complete patch to add std::bfloat16_t support on
x86, AArch64 and (only partially) on 32-bit ARM.  The patch adds no
BFmode optabs, so binary and unary operations extend to SFmode
first and then truncate back to BFmode.
For {HF,SF,DF,XF,TF}mode -> BFmode conversions, libgcc implements
each conversion directly so that we avoid double rounding.  For
BFmode -> {DF,XF,TF}mode conversions, to avoid growing libgcc too much,
the compiler emits a BFmode -> SFmode conversion first and then converts
to the even wider mode; neither step should lose precision.
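
To illustrate why the direct libgcc truncations matter, here is a minimal
standalone C sketch.  It is not part of the patch: the helper names are
made up for this example, the code works on raw bit patterns, and the
direct conversion only handles finite inputs whose result is a normal
bfloat16 value.  It shows a double for which DF -> SF -> BF rounds
differently from a single DF -> BF rounding step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: SF -> BF with round to nearest even.
   Returns the raw bfloat16 bit pattern.  NaNs are not handled.  */
static uint16_t
sf_to_bf_bits (float f)
{
  uint32_t u;
  assert (f == f);		/* Precondition: not a NaN.  */
  memcpy (&u, &f, sizeof u);
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16);
}

/* Hypothetical helper: DF -> BF with a single rounding step.
   Only valid when the result is a normal bfloat16 number.  */
static uint16_t
df_to_bf_bits_direct (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  /* Drop the low 45 mantissa bits, rounding to nearest even.  */
  u += (((uint64_t) 1 << 44) - 1) + ((u >> 45) & 1);
  uint64_t sign = u >> 63;
  /* Rebias the exponent from double (1023) to bfloat16 (127).  */
  int exp = (int) ((u >> 52) & 0x7ff) - 1023 + 127;
  assert (exp >= 1 && exp <= 254);
  uint64_t mant = (u >> 45) & 0x7f;
  return (uint16_t) ((sign << 15) | ((uint64_t) exp << 7) | mant);
}

/* DF -> SF -> BF: two rounding steps.  */
static uint16_t
df_to_bf_bits_via_sf (double d)
{
  return sf_to_bf_bits ((float) d);
}
```

For d = 1.0 + 0x1p-8 + 0x1p-30 the direct conversion yields 0x3f81, because
d lies just above the midpoint between two bfloat16 values.  Going through
float first rounds d to exactly that midpoint, and the second rounding then
picks the even neighbour 0x3f80.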
For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
and then SFmode -> HFmode, because neither format is a subset or superset
of the other, while SFmode is a superset of both.
expr.cc then contains a -ffast-math optimization of the BF -> SF and
SF -> BF conversions, used if we don't optimize for space (and, for the
latter, only if -frounding-math isn't enabled either).
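
The arithmetic behind those two inline expansions can be modelled in plain
C as below.  This is only a sketch on raw bit patterns, not what the
compiler emits: the function names are hypothetical, float is assumed to be
IEEE single, NaN inputs are excluded, and the truncation assumes round to
nearest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* BF -> SF: a bfloat16 value is the upper 16 bits of an IEEE single,
   so shifting the bits up by 16 is an exact conversion.  */
static float
bf_bits_to_sf (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* SF -> BF with round to nearest even, computed as
   (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
static uint16_t
sf_to_bf_bits (float f)
{
  uint32_t u;
  assert (f == f);		/* Precondition: not a NaN.  */
  memcpy (&u, &f, sizeof u);
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16);
}
```

Adding 0x7fff rounds everything above the midpoint up, and the extra
((fromi >> 16) & 1) term breaks ties towards the even result: 1.00390625f
(bits 0x3f808000) becomes 0x3f80, while 1.01171875f (bits 0x3f818000)
becomes 0x3f82.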
For x86, perhaps a truncsfbf2 optab could be defined for TARGET_AVX512BF16,
but IMNSHO it should FAIL if !flag_finite_math_only || flag_rounding_math
|| !flag_unsafe_math_optimizations, because I think the insn doesn't
raise exceptions on sNaNs, hardcodes round to nearest and flushes denormals
to zero.
In C by default (unless x86 -fexcess-precision=16) we use float excess
precision for BFmode, so we truncate only on explicit casts and assignments.
In C++ unfortunately (as is also the case for _Float16) we don't
support excess precision yet, which means that for
__bf16 foo (__bf16 a, __bf16 b, __bf16 c, __bf16 d) { return a * b + c * d; }
we do a lot of conversions.
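
The difference between the two evaluation strategies for a * b + c * d can
be modelled with the following C sketch.  It is an illustration on raw
bfloat16 bit patterns with hypothetical helper names, not compiler output;
it assumes IEEE single float, round to nearest and no NaN inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Exact BF -> SF extension: shift the bits into the top half.  */
static float
bf_bits_to_sf (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* SF -> BF truncation with round to nearest even.  */
static uint16_t
sf_to_bf_bits (float f)
{
  uint32_t u;
  assert (f == f);		/* Precondition: not a NaN.  */
  memcpy (&u, &f, sizeof u);
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16);
}

/* Without excess precision: every operation extends its operands to
   float and truncates its result back to bfloat16.  */
static uint16_t
mul_add_per_op (uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
  uint16_t ab = sf_to_bf_bits (bf_bits_to_sf (a) * bf_bits_to_sf (b));
  uint16_t cd = sf_to_bf_bits (bf_bits_to_sf (c) * bf_bits_to_sf (d));
  return sf_to_bf_bits (bf_bits_to_sf (ab) + bf_bits_to_sf (cd));
}

/* With float excess precision: the whole expression is evaluated in
   float and truncated once, on return.  */
static uint16_t
mul_add_excess (uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
  float r = bf_bits_to_sf (a) * bf_bits_to_sf (b)
	    + bf_bits_to_sf (c) * bf_bits_to_sf (d);
  return sf_to_bf_bits (r);
}
```

With a = b = 0x3f81 (1 + 2^-7), c = 0xbf80 (-1.0) and d = 0x3f82 (1 + 2^-6),
the per-operation variant returns 0x0000, because a * b is rounded to
1 + 2^-6 before the addition.  The excess-precision variant keeps the low
bit of the product and returns 0x3880 (2^-14).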
The aarch64 part is untested but has a chance of working (IMHO),
though I'd appreciate it if the ARM maintainers could decide whether it is
acceptable for them that __bf16 changes mangling and allows arithmetic
and conversions.
The arm part is partial; the libgcc side is missing, as the target doesn't
really seem to use soft-fp right now.  Perhaps the config/arm/ changes can
be left out of the patch (thus keeping ARM 32-bit __bf16 as before) and
support for it can be added at some later time.

Thoughts on this?

2022-09-29  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
	* tree.h (bfloat16_type_node): Define.
	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
	like float16_type_mode.
	* expmed.h (maybe_expand_shift): Declare.
	* expmed.cc (maybe_expand_shift): No longer static.
	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
	-ffast-math generic implementation for BF -> SF and SF -> BF
	conversions.
	* config/arm/arm.h (arm_bf16_type_node): Remove.
	(arm_bf16_ptr_type_node): Adjust comment.
	* config/arm/arm.cc (TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	(arm_mangle_type): Mangle BFmode as DFb16_.
	(arm_invalid_conversion): Only reject BF <-> HF conversions if
	HFmode is non-IEEE format.
	(arm_invalid_unary_op, arm_invalid_binary_op): Remove.
	* config/arm/arm-builtins.cc (arm_bf16_type_node): Remove.
	(arm_simd_builtin_std_type): Use bfloat16_type_node rather than
	arm_bf16_type_node.
	(arm_init_simd_builtin_types): Likewise.
	(arm_init_simd_builtin_scalar_types): Likewise.
	(arm_init_bf16_types): Likewise.
	* config/i386/i386.cc (ix86_mangle_type): Mangle BFmode as DFb16_.
	(ix86_invalid_conversion, ix86_invalid_unary_op,
	ix86_invalid_binary_op): Remove.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
	ix86_bf16_type_node.
	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
	* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
	(aarch64_bf16_ptr_type_node): Adjust comment.
	* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
	bfloat16_type_node rather than aarch64_bf16_type_node.
	(aarch64_mangle_type): Mangle BFmode as DFb16_.
	(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
	(aarch64_invalid_binary_op): Remove BFmode related rejections.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
	* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
	(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
	aarch64_bf16_type_node.
	(aarch64_init_simd_builtin_types, aarch64_init_bf16_types): Likewise.
	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
gcc/c-family/
	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
	predefine for C++ __BFLT16_*__ macros and for C++23 also
	__STDCPP_BFLOAT16_T__.
	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16 for C++.
gcc/cp/
	* cp-tree.h (extended_float_type_p): Return true for
	bfloat16_type_node.
	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
libcpp/
	* include/cpplib.h (CPP_N_BFLOAT16): Define.
	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
	C++.
libgcc/
	* config/arm/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf dfbf sfbf hfbf.
	* config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,t,h}fbf2.
	* config/aarch64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
	* soft-fp/brain.h: New file.
	* soft-fp/truncsfbf2.c: New file.
	* soft-fp/truncdfbf2.c: New file.
	* soft-fp/truncxfbf2.c: New file.
	* soft-fp/trunctfbf2.c: New file.
	* soft-fp/trunchfbf2.c: New file.
	* soft-fp/truncbfhf2.c: New file.
	* soft-fp/extendbfsf2.c: New file.
libiberty/
	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
	entry.
	(cplus_demangle_type): Demangle DFb16_.
	* testsuite/demangle-expected (_Z3xxxDFb16_): New test.

--- gcc/tree-core.h.jj	2022-09-29 09:13:25.717718458 +0200
+++ gcc/tree-core.h	2022-09-29 12:40:17.417778754 +0200
@@ -665,6 +665,9 @@ enum tree_index {
   TI_DOUBLE_TYPE,
   TI_LONG_DOUBLE_TYPE,
 
+  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
+  TI_BFLOAT16_TYPE,
+
   /* The _FloatN and _FloatNx types must be consecutive, and in the
      same sequence as the corresponding complex types, which must also
      be consecutive; _FloatN must come before _FloatNx; the order must
--- gcc/tree.h.jj	2022-09-29 09:13:25.720718416 +0200
+++ gcc/tree.h	2022-09-29 12:40:17.416778768 +0200
@@ -4285,6 +4285,7 @@ tree_strip_any_location_wrapper (tree ex
 #define float_type_node			global_trees[TI_FLOAT_TYPE]
 #define double_type_node		global_trees[TI_DOUBLE_TYPE]
 #define long_double_type_node		global_trees[TI_LONG_DOUBLE_TYPE]
+#define bfloat16_type_node		global_trees[TI_BFLOAT16_TYPE]
 
 /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
 #define FLOATN_TYPE_NODE(IDX)		global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
--- gcc/tree.cc.jj	2022-09-29 09:13:31.328641080 +0200
+++ gcc/tree.cc	2022-09-29 12:40:17.400778985 +0200
@@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
     = (flag_excess_precision == EXCESS_PRECISION_FAST
        ? EXCESS_PRECISION_TYPE_FAST
        : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
-	  ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
+	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
 
   enum flt_eval_method target_flt_eval_method
     = targetm.c.excess_precision (requested_type);
@@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
   machine_mode float16_type_mode = (float16_type_node
 				    ? TYPE_MODE (float16_type_node)
 				    : VOIDmode);
+  machine_mode bfloat16_type_mode = (bfloat16_type_node
+				     ? TYPE_MODE (bfloat16_type_node)
+				     : VOIDmode);
   machine_mode float_type_mode = TYPE_MODE (float_type_node);
   machine_mode double_type_mode = TYPE_MODE (double_type_node);
 
@@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
 	switch (target_flt_eval_method)
 	  {
 	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	    if (type_mode == float16_type_mode)
+	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode)
 	      return float_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode)
 	      return double_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode
 		|| type_mode == double_type_mode)
 	      return long_double_type_node;
@@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
 	switch (target_flt_eval_method)
 	  {
 	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	    if (type_mode == float16_type_mode)
+	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode)
 	      return complex_float_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode)
 	      return complex_double_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode
 		|| type_mode == double_type_mode)
 	      return complex_long_double_type_node;
--- gcc/expmed.h.jj	2022-07-26 10:32:23.681271790 +0200
+++ gcc/expmed.h	2022-09-29 15:18:46.457023535 +0200
@@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
 				  rtx, tree, rtx, int);
 extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
 			 int);
+extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
+			       int);
 #ifdef GCC_OPTABS_H
 extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
 			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
--- gcc/expmed.cc.jj	2022-08-31 10:20:20.000000000 +0200
+++ gcc/expmed.cc	2022-09-29 15:17:52.224769673 +0200
@@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
 
 /* Likewise, but return 0 if that cannot be done.  */
 
-static rtx
+rtx
 maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
 		    int amount, rtx target, int unsignedp)
 {
--- gcc/expr.cc.jj	2022-09-09 09:50:35.228575531 +0200
+++ gcc/expr.cc	2022-09-29 17:09:46.716352938 +0200
@@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
       gcc_assert ((GET_MODE_PRECISION (from_mode)
 		   != GET_MODE_PRECISION (to_mode))
 		  || (DECIMAL_FLOAT_MODE_P (from_mode)
-		      != DECIMAL_FLOAT_MODE_P (to_mode)));
+		      != DECIMAL_FLOAT_MODE_P (to_mode))
+		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+		      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+		      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
 
       if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
 	/* Conversion between decimal float and binary float, same size.  */
@@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
 	  return;
 	}
 
+#ifdef HAVE_SFmode
+      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+	  && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
+	{
+	  if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
+	    {
+	      /* To cut down on libgcc size, implement
+		 BFmode -> {DF,XF,TF}mode conversions by
+		 BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
+	      rtx temp = gen_reg_rtx (SFmode);
+	      convert_mode_scalar (temp, from, unsignedp);
+	      convert_mode_scalar (to, temp, unsignedp);
+	      return;
+	    }
+	  if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+	    {
+	      /* Similarly, implement BFmode -> HFmode as
+		 BFmode -> SFmode -> HFmode conversion where SFmode
+		 has superset of BFmode values.  We don't need
+		 to handle sNaNs by raising exception and turning
+		 into qNaN though, as that can be done in the
+		 SFmode -> HFmode conversion too.  */
+	      rtx temp = gen_reg_rtx (SFmode);
+	      int save_flag_finite_math_only = flag_finite_math_only;
+	      flag_finite_math_only = true;
+	      convert_mode_scalar (temp, from, unsignedp);
+	      flag_finite_math_only = save_flag_finite_math_only;
+	      convert_mode_scalar (to, temp, unsignedp);
+	      return;
+	    }
+	  if (to_mode == SFmode
+	      && !HONOR_NANS (from_mode)
+	      && !HONOR_NANS (to_mode)
+	      && optimize_insn_for_speed_p ())
+	    {
+	      /* If we don't expect sNaNs, for BFmode -> SFmode we can just
+		 shift the bits up.  */
+	      machine_mode fromi_mode, toi_mode;
+	      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				     0).exists (&fromi_mode)
+		  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+					0).exists (&toi_mode))
+		{
+		  start_sequence ();
+		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+		  rtx tof = NULL_RTX;
+		  if (fromi)
+		    {
+		      rtx toi = gen_reg_rtx (toi_mode);
+		      convert_mode_scalar (toi, fromi, 1);
+		      toi
+			= maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
+					      GET_MODE_PRECISION (to_mode)
+					      - GET_MODE_PRECISION (from_mode),
+					      NULL_RTX, 1);
+		      if (toi)
+			{
+			  tof = lowpart_subreg (to_mode, toi, toi_mode);
+			  if (tof)
+			    emit_move_insn (to, tof);
+			}
+		    }
+		  insns = get_insns ();
+		  end_sequence ();
+		  if (tof)
+		    {
+		      emit_insn (insns);
+		      return;
+		    }
+		}
+	    }
+	}
+      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
+	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+	  && !HONOR_NANS (from_mode)
+	  && !HONOR_NANS (to_mode)
+	  && !flag_rounding_math
+	  && optimize_insn_for_speed_p ())
+	{
+	  /* If we don't expect qNaNs nor sNaNs and can assume rounding
+	     to nearest, we can expand the conversion inline as
+	     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
+	  machine_mode fromi_mode, toi_mode;
+	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				 0).exists (&fromi_mode)
+	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+				    0).exists (&toi_mode))
+	    {
+	      start_sequence ();
+	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+	      rtx tof = NULL_RTX;
+	      do
+		{
+		  if (!fromi)
+		    break;
+		  int shift = (GET_MODE_PRECISION (from_mode)
+			       - GET_MODE_PRECISION (to_mode));
+		  rtx temp1
+		    = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
+					  shift, NULL_RTX, 1);
+		  if (!temp1)
+		    break;
+		  rtx temp2
+		    = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+		  if (!temp2)
+		    break;
+		  rtx temp3
+		    = expand_binop (fromi_mode, add_optab, fromi,
+				    gen_int_mode ((HOST_WIDE_INT_1U
+						   << (shift - 1)) - 1,
+						  fromi_mode), NULL_RTX,
+				    1, OPTAB_DIRECT);
+		  if (!temp3)
+		    break;
+		  rtx temp4
+		    = expand_binop (fromi_mode, add_optab, temp3, temp2,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+		  if (!temp4)
+		    break;
+		  rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
+						  temp4, shift, NULL_RTX, 1);
+		  if (!temp5)
+		    break;
+		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
+		  if (!temp6)
+		    break;
+		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
+					toi_mode);
+		  if (tof)
+		    emit_move_insn (to, tof);
+		}
+	      while (0);
+	      insns = get_insns ();
+	      end_sequence ();
+	      if (tof)
+		{
+		  emit_insn (insns);
+		  return;
+		}
+	    }
+	}
+#endif
+
       /* Otherwise use a libcall.  */
       libcall = convert_optab_libfunc (tab, to_mode, from_mode);
 
--- gcc/config/arm/arm.h.jj	2022-09-29 09:13:25.709718568 +0200
+++ gcc/config/arm/arm.h	2022-09-29 12:40:17.401778971 +0200
@@ -78,9 +78,8 @@ extern void (*arm_lang_output_object_att
    the backend.  Defined in arm-builtins.cc.  */
 extern tree arm_fp16_type_node;
 
-/* This type is the user-visible __bf16.  We need it in a few places in
-   the backend.  Defined in arm-builtins.cc.  */
-extern tree arm_bf16_type_node;
+/* The user-visible __bf16 uses bfloat16_type_node, but for pointer to that
+   use backend specific tree.  Defined in arm-builtins.cc.  */
 extern tree arm_bf16_ptr_type_node;
 
 \f
--- gcc/config/arm/arm.cc.jj	2022-09-29 09:13:25.709718568 +0200
+++ gcc/config/arm/arm.cc	2022-09-29 15:33:07.997170885 +0200
@@ -688,12 +688,6 @@ static const struct attribute_spec arm_a
 #undef TARGET_INVALID_CONVERSION
 #define TARGET_INVALID_CONVERSION arm_invalid_conversion
 
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP arm_invalid_unary_op
-
-#undef TARGET_INVALID_BINARY_OP
-#define TARGET_INVALID_BINARY_OP arm_invalid_binary_op
-
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV arm_atomic_assign_expand_fenv
 
@@ -30360,7 +30354,7 @@ arm_mangle_type (const_tree type)
   if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
     {
       if (TYPE_MODE (type) == BFmode)
-	return "u6__bf16";
+	return "DFb16_";
       else
 	return "Dh";
     }
@@ -33996,47 +33990,22 @@ arm_invalid_conversion (const_tree fromt
 {
   if (element_mode (fromtype) != element_mode (totype))
     {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<bfloat16_t%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<bfloat16_t%>");
+      /* Do not allow conversions from BFmode to non-ieee HFmode
+	 scalar types or vice versa.  */
+      if (TYPE_MODE (fromtype) == BFmode
+	  && TYPE_MODE (totype) == HFmode
+	  && arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
+	return N_("invalid conversion from type %<bfloat16_t%> to %<__fp16%>");
+      if (TYPE_MODE (totype) == BFmode
+	  && TYPE_MODE (fromtype) == HFmode
+	  && arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
+	return N_("invalid conversion to type %<bfloat16_t%> from %<__fp16%>");
     }
 
   /* Conversion allowed.  */
   return NULL;
 }
 
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-arm_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the binary operation OP is
-   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
-
-static const char *
-arm_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
-			   const_tree type2)
-{
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 /* Implement TARGET_CAN_CHANGE_MODE_CLASS.
 
    In VFPv1, VFP registers could only be accessed in the mode they were
--- gcc/config/arm/arm-builtins.cc.jj	2022-09-29 09:13:25.681718954 +0200
+++ gcc/config/arm/arm-builtins.cc	2022-09-29 12:40:17.405778917 +0200
@@ -1370,7 +1370,6 @@ struct arm_simd_type_info arm_simd_types
 tree arm_fp16_type_node = NULL_TREE;
 
 /* Back-end node type for brain float (bfloat) types.  */
-tree arm_bf16_type_node = NULL_TREE;
 tree arm_bf16_ptr_type_node = NULL_TREE;
 
 static tree arm_simd_intOI_type_node = NULL_TREE;
@@ -1459,7 +1458,7 @@ arm_simd_builtin_std_type (machine_mode
     case E_DFmode:
       return double_type_node;
     case E_BFmode:
-      return arm_bf16_type_node;
+      return bfloat16_type_node;
     default:
       gcc_unreachable ();
     }
@@ -1570,9 +1569,9 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
   /* Init Bfloat vector types with underlying __bf16 scalar type.  */
-  arm_simd_types[Bfloat16x2_t].eltype = arm_bf16_type_node;
-  arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
-  arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
+  arm_simd_types[Bfloat16x2_t].eltype = bfloat16_type_node;
+  arm_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
+  arm_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
 
   for (i = 0; i < nelts; i++)
     {
@@ -1658,7 +1657,7 @@ arm_init_simd_builtin_scalar_types (void
 					     "__builtin_neon_df");
   (*lang_hooks.types.register_builtin_type) (intTI_type_node,
 					     "__builtin_neon_ti");
-  (*lang_hooks.types.register_builtin_type) (arm_bf16_type_node,
+  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
                                              "__builtin_neon_bf");
   /* Unsigned integer types for various mode sizes.  */
   (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
@@ -1797,13 +1796,13 @@ arm_init_builtin (unsigned int fcode, ar
 static void
 arm_init_bf16_types (void)
 {
-  arm_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (arm_bf16_type_node) = 16;
-  SET_TYPE_MODE (arm_bf16_type_node, BFmode);
-  layout_type (arm_bf16_type_node);
+  bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (bfloat16_type_node) = 16;
+  SET_TYPE_MODE (bfloat16_type_node, BFmode);
+  layout_type (bfloat16_type_node);
 
-  lang_hooks.types.register_builtin_type (arm_bf16_type_node, "__bf16");
-  arm_bf16_ptr_type_node = build_pointer_type (arm_bf16_type_node);
+  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+  arm_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
 }
 
 /* Set up ACLE builtins, even builtins for instructions that are not
--- gcc/config/i386/i386.cc.jj	2022-09-29 12:03:12.073350093 +0200
+++ gcc/config/i386/i386.cc	2022-09-29 12:40:17.409778863 +0200
@@ -22728,7 +22728,7 @@ ix86_mangle_type (const_tree type)
   switch (TYPE_MODE (type))
     {
     case E_BFmode:
-      return "u6__bf16";
+      return "DFb16_";
     case E_HFmode:
       /* _Float16 is "DF16_".
 	 Align with clang's decision in https://reviews.llvm.org/D33719. */
@@ -22747,55 +22747,6 @@ ix86_mangle_type (const_tree type)
     }
 }
 
-/* Return the diagnostic message string if conversion from FROMTYPE to
-   TOTYPE is not allowed, NULL otherwise.  */
-
-static const char *
-ix86_invalid_conversion (const_tree fromtype, const_tree totype)
-{
-  if (element_mode (fromtype) != element_mode (totype))
-    {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<__bf16%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<__bf16%>");
-    }
-
-  /* Conversion allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-ix86_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the binary operation OP is
-   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
-
-static const char *
-ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
-			   const_tree type2)
-{
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 static GTY(()) tree ix86_tls_stack_chk_guard_decl;
 
 static tree
@@ -24853,15 +24804,6 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE ix86_mangle_type
 
-#undef TARGET_INVALID_CONVERSION
-#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
-
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
-
-#undef TARGET_INVALID_BINARY_OP
-#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
-
 #undef TARGET_STACK_PROTECT_GUARD
 #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
 
--- gcc/config/i386/i386-builtins.cc.jj	2022-09-29 09:13:25.710718554 +0200
+++ gcc/config/i386/i386-builtins.cc	2022-09-29 12:40:17.406778903 +0200
@@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
 static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
 
 tree ix86_float16_type_node = NULL_TREE;
-tree ix86_bf16_type_node = NULL_TREE;
 tree ix86_bf16_ptr_type_node = NULL_TREE;
 
 /* Retrieve an element from the above table, building some of
@@ -1372,16 +1371,15 @@ ix86_register_float16_builtin_type (void
 static void
 ix86_register_bf16_builtin_type (void)
 {
-  ix86_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (ix86_bf16_type_node) = 16;
-  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
-  layout_type (ix86_bf16_type_node);
+  bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (bfloat16_type_node) = 16;
+  SET_TYPE_MODE (bfloat16_type_node, BFmode);
+  layout_type (bfloat16_type_node);
 
   if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
     {
-      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
-					    "__bf16");
-      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
+      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
     }
 }
 
--- gcc/config/i386/i386-builtin-types.def.jj	2022-09-29 09:13:25.709718568 +0200
+++ gcc/config/i386/i386-builtin-types.def	2022-09-29 12:40:17.406778903 +0200
@@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
 DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
-DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
+DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
--- gcc/config/aarch64/aarch64.h.jj	2022-09-29 09:13:25.680718968 +0200
+++ gcc/config/aarch64/aarch64.h	2022-09-29 12:40:17.409778863 +0200
@@ -1337,9 +1337,8 @@ extern const char *aarch64_rewrite_mcpu
 extern GTY(()) tree aarch64_fp16_type_node;
 extern GTY(()) tree aarch64_fp16_ptr_type_node;
 
-/* This type is the user-visible __bf16, and a pointer to that type.  Defined
-   in aarch64-builtins.cc.  */
-extern GTY(()) tree aarch64_bf16_type_node;
+/* Pointer to the user-visible __bf16 type.  __bf16 itself is generic
+   bfloat16_type_node.  Defined in aarch64-builtins.cc.  */
 extern GTY(()) tree aarch64_bf16_ptr_type_node;
 
 /* The generic unwind code in libgcc does not initialize the frame pointer.
--- gcc/config/aarch64/aarch64.cc.jj	2022-09-29 09:13:25.680718968 +0200
+++ gcc/config/aarch64/aarch64.cc	2022-09-29 12:40:17.413778808 +0200
@@ -19741,7 +19741,7 @@ aarch64_gimplify_va_arg_expr (tree valis
 	  field_ptr_t = aarch64_fp16_ptr_type_node;
 	  break;
 	case E_BFmode:
-	  field_t = aarch64_bf16_type_node;
+	  field_t = bfloat16_type_node;
 	  field_ptr_t = aarch64_bf16_ptr_type_node;
 	  break;
 	case E_V2SImode:
@@ -20645,7 +20645,7 @@ aarch64_mangle_type (const_tree type)
   if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
     {
       if (TYPE_MODE (type) == BFmode)
-	return "u6__bf16";
+	return "DFb16_";
       else
 	return "Dh";
     }
@@ -26820,39 +26820,6 @@ aarch64_stack_protect_guard (void)
   return NULL_TREE;
 }
 
-/* Return the diagnostic message string if conversion from FROMTYPE to
-   TOTYPE is not allowed, NULL otherwise.  */
-
-static const char *
-aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
-{
-  if (element_mode (fromtype) != element_mode (totype))
-    {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<bfloat16_t%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<bfloat16_t%>");
-    }
-
-  /* Conversion allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-aarch64_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 /* Return the diagnostic message string if the binary operation OP is
    not permitted on TYPE1 and TYPE2, NULL otherwise.  */
 
@@ -26860,11 +26827,6 @@ static const char *
 aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
 			   const_tree type2)
 {
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
   if (VECTOR_TYPE_P (type1)
       && VECTOR_TYPE_P (type2)
       && !TYPE_INDIVISIBLE_P (type1)
@@ -27461,12 +27423,6 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE aarch64_mangle_type
 
-#undef TARGET_INVALID_CONVERSION
-#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion
-
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op
-
 #undef TARGET_INVALID_BINARY_OP
 #define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op
 
--- gcc/config/aarch64/aarch64-builtins.cc.jj	2022-09-29 09:13:25.676719023 +0200
+++ gcc/config/aarch64/aarch64-builtins.cc	2022-09-29 12:40:17.410778849 +0200
@@ -918,7 +918,6 @@ tree aarch64_fp16_type_node = NULL_TREE;
 tree aarch64_fp16_ptr_type_node = NULL_TREE;
 
 /* Back-end node type for brain float (bfloat) types.  */
-tree aarch64_bf16_type_node = NULL_TREE;
 tree aarch64_bf16_ptr_type_node = NULL_TREE;
 
 /* Wrapper around add_builtin_function.  NAME is the name of the built-in
@@ -1010,7 +1009,7 @@ aarch64_int_or_fp_type (machine_mode mod
     case E_DFmode:
       return double_type_node;
     case E_BFmode:
-      return aarch64_bf16_type_node;
+      return bfloat16_type_node;
     default:
       gcc_unreachable ();
     }
@@ -1124,8 +1123,8 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Float64x2_t].eltype = double_type_node;
 
   /* Init Bfloat vector types with underlying __bf16 type.  */
-  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
-  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
+  aarch64_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
+  aarch64_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
 
   for (i = 0; i < nelts; i++)
     {
@@ -1197,7 +1196,7 @@ aarch64_init_simd_builtin_scalar_types (
 					     "__builtin_aarch64_simd_poly128");
   (*lang_hooks.types.register_builtin_type) (intTI_type_node,
 					     "__builtin_aarch64_simd_ti");
-  (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node,
+  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
 					     "__builtin_aarch64_simd_bf");
   /* Unsigned integer types for various mode sizes.  */
   (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
@@ -1682,13 +1681,13 @@ aarch64_init_fp16_types (void)
 static void
 aarch64_init_bf16_types (void)
 {
-  aarch64_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (aarch64_bf16_type_node) = 16;
-  SET_TYPE_MODE (aarch64_bf16_type_node, BFmode);
-  layout_type (aarch64_bf16_type_node);
+  bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (bfloat16_type_node) = 16;
+  SET_TYPE_MODE (bfloat16_type_node, BFmode);
+  layout_type (bfloat16_type_node);
 
-  lang_hooks.types.register_builtin_type (aarch64_bf16_type_node, "__bf16");
-  aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node);
+  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+  aarch64_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
 }
 
 /* Pointer authentication builtins that will become NOP on legacy platform.
--- gcc/config/aarch64/aarch64-sve-builtins.def.jj	2022-09-29 09:13:25.676719023 +0200
+++ gcc/config/aarch64/aarch64-sve-builtins.def	2022-09-29 12:40:17.413778808 +0200
@@ -61,7 +61,7 @@ DEF_SVE_MODE (u64offset, none, svuint64_
 DEF_SVE_MODE (vnum, none, none, vectors)
 
 DEF_SVE_TYPE (svbool_t, 10, __SVBool_t, boolean_type_node)
-DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, aarch64_bf16_type_node)
+DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, bfloat16_type_node)
 DEF_SVE_TYPE (svfloat16_t, 13, __SVFloat16_t, aarch64_fp16_type_node)
 DEF_SVE_TYPE (svfloat32_t, 13, __SVFloat32_t, float_type_node)
 DEF_SVE_TYPE (svfloat64_t, 13, __SVFloat64_t, double_type_node)
--- gcc/c-family/c-cppbuiltin.cc.jj	2022-09-29 09:13:25.675719037 +0200
+++ gcc/c-family/c-cppbuiltin.cc	2022-09-29 12:40:17.416778768 +0200
@@ -1264,6 +1264,13 @@ c_cpp_builtins (cpp_reader *pfile)
       builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
 				      csuffix, FLOATN_NX_TYPE_NODE (i));
     }
+  if (bfloat16_type_node && c_dialect_cxx ())
+    {
+      if (cxx_dialect > cxx20)
+	cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
+      builtin_define_float_constants ("BFLT16", "BF16", "%s",
+				      "BF16", bfloat16_type_node);
+    }
 
   /* For float.h.  */
   if (targetm.decimal_float_supported_p ())
--- gcc/c-family/c-lex.cc.jj	2022-09-29 09:13:25.675719037 +0200
+++ gcc/c-family/c-lex.cc	2022-09-29 12:40:17.416778768 +0200
@@ -995,6 +995,19 @@ interpret_float (const cpp_token *token,
 	  pedwarn (input_location, OPT_Wpedantic,
 		   "non-standard suffix on floating constant");
       }
+    else if ((flags & CPP_N_BFLOAT16) != 0 && c_dialect_cxx ())
+      {
+	type = bfloat16_type_node;
+	if (type == NULL_TREE)
+	  {
+	    error ("unsupported non-standard suffix on floating constant");
+	    return error_mark_node;
+	  }
+	if (cxx_dialect < cxx23)
+	  pedwarn (input_location, OPT_Wpedantic,
+		   "%<bf16%> or %<BF16%> suffix on floating constant only "
+		   "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
+      }
     else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
       type = long_double_type_node;
     else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
--- gcc/cp/cp-tree.h.jj	2022-09-29 09:13:31.164643341 +0200
+++ gcc/cp/cp-tree.h	2022-09-29 12:40:17.414778795 +0200
@@ -8714,6 +8714,8 @@ extended_float_type_p (tree type)
   for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
     if (type == FLOATN_TYPE_NODE (i))
       return true;
+  if (type == bfloat16_type_node)
+    return true;
   return false;
 }
 
--- gcc/cp/typeck.cc.jj	2022-09-29 09:13:25.716718472 +0200
+++ gcc/cp/typeck.cc	2022-09-29 12:40:17.415778781 +0200
@@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
       if (mv2 == FLOATN_NX_TYPE_NODE (i))
 	extended2 = i + 1;
     }
+  if (mv1 == bfloat16_type_node)
+    extended1 = true;
+  if (mv2 == bfloat16_type_node)
+    extended2 = true;
   if (extended2 && !extended1)
     {
       int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
@@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
   if (cnt > 1 && mv2 == long_double_type_node)
     return -2;
   /* Otherwise, they have equal rank, but extended types
-     (other than std::bfloat16_t) have higher subrank.  */
+     (other than std::bfloat16_t) have higher subrank.
+     std::bfloat16_t shouldn't have equal rank to any standard
+     floating point type.  */
   return 1;
 }
 
--- libcpp/include/cpplib.h.jj	2022-09-08 13:01:19.853771383 +0200
+++ libcpp/include/cpplib.h	2022-09-28 19:06:59.615380690 +0200
@@ -1275,6 +1275,7 @@ struct cpp_num
 #define CPP_N_USERDEF	0x1000000 /* C++11 user-defined literal.  */
 
 #define CPP_N_SIZE_T	0x2000000 /* C++23 size_t literal.  */
+#define CPP_N_BFLOAT16	0x4000000 /* std::bfloat16_t type.  */
 
 #define CPP_N_WIDTH_FLOATN_NX	0xF0000000 /* _FloatN / _FloatNx value
 					      of N, divided by 16.  */
--- libcpp/expr.cc.jj	2022-09-27 08:03:27.119982735 +0200
+++ libcpp/expr.cc	2022-09-28 17:55:36.667177540 +0200
@@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
   size_t orig_len = len;
   const uchar *orig_s = s;
   size_t flags;
-  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
+  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
 
   flags = 0;
-  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
+  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
 
   /* The following decimal float suffixes, from TR 24732:2009, TS
      18661-2:2015 and C2X, are supported:
@@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
      w, W - machine-specific type such as __float80 (GNU extension).
      q, Q - machine-specific type such as __float128 (GNU extension).
      fN, FN - _FloatN (TS 18661-3:2015).
-     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
+     fNx, FNx - _FloatNx (TS 18661-3:2015).
+     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
 
   /* Process decimal float suffixes, which are two letters starting
      with d or D.  Order and case are significant.  */
@@ -239,6 +240,20 @@ interpret_float_suffix (cpp_reader *pfil
 		fn++;
 	    }
 	  break;
+	case 'b': case 'B':
+	  if (len > 2
+	      /* Except for bf16 / BF16 where case is significant.  */
+	      && s[1] == (s[0] == 'b' ? 'f' : 'F')
+	      && s[2] == '1'
+	      && s[3] == '6'
+	      && CPP_OPTION (pfile, cplusplus))
+	    {
+	      bf16++;
+	      len -= 3;
+	      s += 3;
+	      break;
+	    }
+	  return 0;
 	case 'd': case 'D': d++; break;
 	case 'l': case 'L': l++; break;
 	case 'w': case 'W': w++; break;
@@ -257,7 +272,7 @@ interpret_float_suffix (cpp_reader *pfil
      of N larger than can be represented in the return value.  The
      caller is responsible for rejecting _FloatN suffixes where
      _FloatN is not supported on the chosen target.  */
-  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
+  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
     return 0;
   if (fn_bits > CPP_FLOATN_MAX)
     return 0;
@@ -295,6 +310,7 @@ interpret_float_suffix (cpp_reader *pfil
 	     q ? CPP_N_MD_Q :
 	     fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
 	     fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
+	     bf16 ? CPP_N_BFLOAT16 :
 	     CPP_N_DEFAULT));
 }
 
--- libgcc/config/arm/sfp-machine.h.jj	2020-01-12 11:54:38.615380187 +0100
+++ libgcc/config/arm/sfp-machine.h	2022-09-28 19:02:51.922710542 +0200
@@ -22,6 +22,7 @@ typedef int __gcc_CMPtype __attribute__
 /* According to RTABI, QNAN is only with the most significant bit of the
    significand set, and all other significand bits zero.  */
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
 #define _FP_NANFRAC_Q		_FP_QNANBIT_Q, 0, 0, 0
--- libgcc/config/aarch64/t-softfp.jj	2020-09-29 11:32:02.988602194 +0200
+++ libgcc/config/aarch64/t-softfp	2022-09-28 18:59:43.381246466 +0200
@@ -1,7 +1,7 @@
 softfp_float_modes := tf
 softfp_int_modes := si di ti
-softfp_extensions := sftf dftf hftf
-softfp_truncations := tfsf tfdf tfhf
+softfp_extensions := sftf dftf hftf bfsf
+softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
 softfp_extras := fixhfti fixunshfti floattihf floatuntihf
 
--- libgcc/config/aarch64/libgcc-softfp.ver.jj	2022-01-11 23:11:23.691271871 +0100
+++ libgcc/config/aarch64/libgcc-softfp.ver	2022-09-28 19:00:36.050537146 +0200
@@ -26,3 +26,12 @@ GCC_11.0 {
   __mulhc3
   __trunctfhf2
 }
+
+%inherit GCC_13.0.0 GCC_11.0.0
+GCC_13.0.0 {
+  __extendbfsf2
+  __truncdfbf2
+  __truncsfbf2
+  __trunctfbf2
+  __trunchfbf2
+}
--- libgcc/config/aarch64/sfp-machine.h.jj	2022-01-11 23:11:23.691271871 +0100
+++ libgcc/config/aarch64/sfp-machine.h	2022-09-28 19:02:10.303270053 +0200
@@ -43,6 +43,7 @@ typedef int __gcc_CMPtype __attribute__
 #define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		((_FP_QNANBIT_H << 1) - 1)
+#define _FP_NANFRAC_B		((_FP_QNANBIT_B << 1) - 1)
 #define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
 #define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1)
 #define _FP_NANFRAC_Q		((_FP_QNANBIT_Q << 1) - 1), -1
--- libgcc/config/i386/t-softfp.jj	2022-09-23 09:02:31.759659479 +0200
+++ libgcc/config/i386/t-softfp	2022-09-28 18:58:09.114520943 +0200
@@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
 libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
 LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
 
-softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
-softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
+softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
+softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
+		      tfbf xfbf dfbf sfbf hfbf
 
 softfp_extras += eqhf2
 
@@ -20,6 +21,7 @@ CFLAGS-truncsfhf2.c += -msse2
 CFLAGS-truncdfhf2.c += -msse2
 CFLAGS-truncxfhf2.c += -msse2
 CFLAGS-trunctfhf2.c += -msse2
+CFLAGS-trunchfbf2.c += -msse2
 
 CFLAGS-eqhf2.c += -msse2
 CFLAGS-_divhc3.c += -msse2
--- libgcc/config/i386/libgcc-glibc.ver.jj	2022-09-23 09:02:31.746659658 +0200
+++ libgcc/config/i386/libgcc-glibc.ver	2022-09-28 18:58:09.114520943 +0200
@@ -214,3 +214,13 @@ GCC_12.0.0 {
   __trunctfhf2
   __truncxfhf2
 }
+
+%inherit GCC_13.0.0 GCC_12.0.0
+GCC_13.0.0 {
+  __extendbfsf2
+  __truncdfbf2
+  __truncsfbf2
+  __trunctfbf2
+  __truncxfbf2
+  __trunchfbf2
+}
--- libgcc/config/i386/sfp-machine.h.jj	2022-09-23 09:02:31.747659644 +0200
+++ libgcc/config/i386/sfp-machine.h	2022-09-28 18:58:09.114520943 +0200
@@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
 #define _FP_QNANNEGATEDP 0
 
 #define _FP_NANSIGN_H		1
+#define _FP_NANSIGN_B		1
 #define _FP_NANSIGN_S		1
 #define _FP_NANSIGN_D		1
 #define _FP_NANSIGN_E		1
--- libgcc/config/i386/64/sfp-machine.h.jj	2022-09-23 09:02:31.700660291 +0200
+++ libgcc/config/i386/64/sfp-machine.h	2022-09-28 18:58:09.114520943 +0200
@@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D
 #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
--- libgcc/config/i386/32/sfp-machine.h.jj	2022-09-23 09:02:31.683660526 +0200
+++ libgcc/config/i386/32/sfp-machine.h	2022-09-28 18:58:09.115520929 +0200
@@ -87,6 +87,7 @@
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
 /* Even if XFmode is 12byte,  we have to pad it to
--- libgcc/soft-fp/brain.h.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/brain.h	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,172 @@
+/* Software floating-point emulation.
+   Definitions for Brain Floating Point format (bfloat16).
+   Copyright (C) 1997-2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef SOFT_FP_BRAIN_H
+#define SOFT_FP_BRAIN_H	1
+
+#if _FP_W_TYPE_SIZE < 32
+# error "Here's a nickel kid.  Go buy yourself a real computer."
+#endif
+
+#define _FP_FRACTBITS_B		(_FP_W_TYPE_SIZE)
+
+#define _FP_FRACTBITS_DW_B	(_FP_W_TYPE_SIZE)
+
+#define _FP_FRACBITS_B		8
+#define _FP_FRACXBITS_B		(_FP_FRACTBITS_B - _FP_FRACBITS_B)
+#define _FP_WFRACBITS_B		(_FP_WORKBITS + _FP_FRACBITS_B)
+#define _FP_WFRACXBITS_B	(_FP_FRACTBITS_B - _FP_WFRACBITS_B)
+#define _FP_EXPBITS_B		8
+#define _FP_EXPBIAS_B		127
+#define _FP_EXPMAX_B		255
+
+#define _FP_QNANBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
+#define _FP_QNANBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
+#define _FP_IMPLBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
+#define _FP_IMPLBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
+#define _FP_OVERFLOW_B		((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
+
+#define _FP_WFRACBITS_DW_B	(2 * _FP_WFRACBITS_B)
+#define _FP_WFRACXBITS_DW_B	(_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
+#define _FP_HIGHBIT_DW_B	\
+  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
+
+/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
+   chosen by the target machine.  */
+
+typedef float BFtype __attribute__ ((mode (BF)));
+
+union _FP_UNION_B
+{
+  BFtype flt;
+  struct _FP_STRUCT_LAYOUT
+  {
+#if __BYTE_ORDER == __BIG_ENDIAN
+    unsigned sign : 1;
+    unsigned exp  : _FP_EXPBITS_B;
+    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
+#else
+    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
+    unsigned exp  : _FP_EXPBITS_B;
+    unsigned sign : 1;
+#endif
+  } bits;
+};
+
+#define FP_DECL_B(X)		_FP_DECL (1, X)
+#define FP_UNPACK_RAW_B(X, val)	_FP_UNPACK_RAW_1 (B, X, (val))
+#define FP_UNPACK_RAW_BP(X, val)	_FP_UNPACK_RAW_1_P (B, X, (val))
+#define FP_PACK_RAW_B(val, X)	_FP_PACK_RAW_1 (B, (val), X)
+#define FP_PACK_RAW_BP(val, X)			\
+  do						\
+    {						\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_B(X, val)			\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1 (B, X, (val));		\
+      _FP_UNPACK_CANONICAL (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_BP(X, val)			\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1_P (B, X, (val));		\
+      _FP_UNPACK_CANONICAL (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_SEMIRAW_B(X, val)		\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1 (B, X, (val));		\
+      _FP_UNPACK_SEMIRAW (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_SEMIRAW_BP(X, val)		\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1_P (B, X, (val));		\
+      _FP_UNPACK_SEMIRAW (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_B(val, X)			\
+  do						\
+    {						\
+      _FP_PACK_CANONICAL (B, 1, X);		\
+      _FP_PACK_RAW_1 (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_BP(val, X)			\
+  do						\
+    {						\
+      _FP_PACK_CANONICAL (B, 1, X);		\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_SEMIRAW_B(val, X)		\
+  do						\
+    {						\
+      _FP_PACK_SEMIRAW (B, 1, X);		\
+      _FP_PACK_RAW_1 (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_SEMIRAW_BP(val, X)		\
+  do						\
+    {						\
+      _FP_PACK_SEMIRAW (B, 1, X);		\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_TO_INT_B(r, X, rsz, rsg)	_FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
+#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)	\
+  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
+#define FP_FROM_INT_B(X, r, rs, rt)	_FP_FROM_INT (B, 1, X, (r), (rs), rt)
+
+/* BFmode arithmetic is not implemented.  */
+
+#define _FP_FRAC_HIGH_B(X)	_FP_FRAC_HIGH_1 (X)
+#define _FP_FRAC_HIGH_RAW_B(X)	_FP_FRAC_HIGH_1 (X)
+#define _FP_FRAC_HIGH_DW_B(X)	_FP_FRAC_HIGH_1 (X)
+
+#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
+
+#endif /* !SOFT_FP_BRAIN_H */
--- libgcc/soft-fp/truncsfbf2.c.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/truncsfbf2.c	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,48 @@
+/* Software floating-point emulation.
+   Truncate IEEE single into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "single.h"
+
+BFtype
+__truncsfbf2 (SFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_S (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_S (A, a);
+  FP_TRUNC (B, S, 1, 1, R, A);
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncdfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/truncdfbf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE double into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "double.h"
+
+BFtype
+__truncdfbf2 (DFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_D (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_D (A, a);
+#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
+  FP_TRUNC (B, D, 1, 2, R, A);
+#else
+  FP_TRUNC (B, D, 1, 1, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncxfbf2.c.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/truncxfbf2.c	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE extended into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "extended.h"
+
+BFtype
+__truncxfbf2 (XFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_E (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_E (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (B, E, 1, 4, R, A);
+#else
+  FP_TRUNC (B, E, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/trunctfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/trunctfbf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE quad into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "quad.h"
+
+BFtype
+__trunctfbf2 (TFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_Q (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_Q (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (B, Q, 1, 4, R, A);
+#else
+  FP_TRUNC (B, Q, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/trunchfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/trunchfbf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,58 @@
+/* Software floating-point emulation.
+   Truncate IEEE half into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "half.h"
+#include "single.h"
+
+/* BFtype and HFtype are unordered, neither is a superset or subset
+   of the other.  Convert HFtype to SFtype (lossless) and then
+   truncate to BFtype.  */
+
+BFtype
+__trunchfbf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_S (B);
+  FP_DECL_B (R);
+  SFtype b;
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_RAW_H (A, a);
+  FP_EXTEND (S, H, 1, 1, B, A);
+  FP_PACK_RAW_S (b, B);
+  FP_UNPACK_SEMIRAW_S (B, b);
+  FP_TRUNC (B, S, 1, 1, R, B);
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncbfhf2.c.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/truncbfhf2.c	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,75 @@
+/* Software floating-point emulation.
+   Truncate bfloat16 into IEEE half.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "brain.h"
+#include "single.h"
+
+/* BFtype and HFtype are unordered, neither is a superset or subset
+   of the other.  Convert BFtype to SFtype (lossless) and then
+   truncate to HFtype.  */
+
+HFtype
+__truncbfhf2 (BFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_B (A);
+  FP_DECL_S (B);
+  FP_DECL_H (R);
+  SFtype b;
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  /* Optimize BFtype to SFtype conversion to a simple left shift
+     by 16 if possible.  We don't need to raise exceptions on sNaN
+     here as the SFtype to HFtype truncation should do that too.  */
+  if (sizeof (BFtype) == 2
+      && sizeof (unsigned short) == 2
+      && sizeof (SFtype) == 4
+      && sizeof (unsigned int) == 4)
+    {
+      union { BFtype a; unsigned short b; } u1;
+      union { SFtype a; unsigned int b; } u2;
+      u1.a = a;
+      u2.b = (u1.b << 8) << 8;
+      b = u2.a;
+    }
+  else
+    {
+      FP_UNPACK_RAW_B (A, a);
+      FP_EXTEND (S, B, 1, 1, B, A);
+      FP_PACK_RAW_S (b, B);
+    }
+  FP_UNPACK_SEMIRAW_S (B, b);
+  FP_TRUNC (H, S, 1, 1, R, B);
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/extendbfsf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/extendbfsf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,49 @@
+/* Software floating-point emulation.
+   Return a bfloat16 converted to IEEE single.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "brain.h"
+#include "single.h"
+
+SFtype
+__extendbfsf2 (BFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_B (A);
+  FP_DECL_S (R);
+  SFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_B (A, a);
+  FP_EXTEND (S, B, 1, 1, R, A);
+  FP_PACK_RAW_S (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libiberty/cp-demangle.h.jj	2022-09-27 08:03:27.142982423 +0200
+++ libiberty/cp-demangle.h	2022-09-29 12:42:47.291727886 +0200
@@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
 extern const struct demangle_operator_info cplus_demangle_operators[];
 #endif
 
-#define D_BUILTIN_TYPE_COUNT (35)
+#define D_BUILTIN_TYPE_COUNT (36)
 
 CP_STATIC_IF_GLIBCPP_V3
 const struct demangle_builtin_type_info
--- libiberty/cp-demangle.c.jj	2022-09-27 08:03:27.141982437 +0200
+++ libiberty/cp-demangle.c	2022-09-29 13:04:57.083526204 +0200
@@ -2489,6 +2489,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
   /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
 	     D_PRINT_DEFAULT },
   /* 34 */ { NL ("_Float"),	NL ("_Float"),		D_PRINT_FLOAT },
+  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
 };
 
 CP_STATIC_IF_GLIBCPP_V3
@@ -2753,8 +2754,20 @@ cplus_demangle_type (struct d_info *di)
 
 	case 'F':
 	  /* DF<number>_ - _Float<number>.
-	     DF<number>x - _Float<number>x.  */
+	     DF<number>x - _Float<number>x
+	     DFb16_ - std::bfloat16_t.  */
 	  {
+	    if (d_peek_char (di) == 'b')
+	      {
+		d_advance (di, 1);
+		if (d_number (di) != 16 || d_peek_char (di) != '_')
+		  return NULL;
+		d_advance (di, 1);
+		ret = d_make_builtin_type (di,
+					   &cplus_demangle_builtin_types[35]);
+		di->expansion += ret->u.s_builtin.type->len;
+		break;
+	      }
 	    int arg = d_number (di);
 	    char buf[12];
 	    char suffix = 0;
--- libiberty/testsuite/demangle-expected.jj	2022-09-27 08:03:27.168982071 +0200
+++ libiberty/testsuite/demangle-expected	2022-09-29 12:49:02.181597532 +0200
@@ -1249,6 +1249,10 @@ xxx
 _Z3xxxDF32xDF64xDF128xCDF32xVb
 xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
 xxx
+--format=auto --no-params
+_Z3xxxDFb16_
+xxx(std::bfloat16_t)
+xxx
 # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
 --format=auto --no-params
 _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z
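
For readers who want to try the two inline fast paths outside the compiler,
here is a rough C sketch (not part of the patch).  It mirrors the BF -> SF
shift used in truncbfhf2.c and the round-to-nearest-even expression from the
expr.cc hunk.  The helper names bf16_bits_to_float and float_to_bf16_bits
are made up for illustration, a bfloat16 value is passed around as its raw
16 bits, and NaNs are assumed not to occur, as on the -ffast-math path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same size assumption the soft-fp code checks with sizeof.  */
static_assert (sizeof (float) == sizeof (uint32_t), "assumes 32-bit float");

/* BF -> SF: the 16 bfloat16 bits become the upper half of the float.
   Hypothetical helper, exact for every input that is not an sNaN.  */
static float
bf16_bits_to_float (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* SF -> BF with round to nearest, ties to even:
   (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.
   Hypothetical helper, only valid when NaNs need not be honored.  */
static uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16);
}
```

For instance, 1.00390625f (bits 0x3f808000) lies exactly halfway between two
bfloat16 values and goes down to 0x3f80, while 1.01171875f (bits 0x3f818000)
goes up to 0x3f82, in both cases picking the even significand.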

	Jakub


* Re: [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support
  2022-09-29 15:55 [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support Jakub Jelinek
@ 2022-09-30 13:49 ` Jason Merrill
  2022-09-30 14:08   ` Jakub Jelinek
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Merrill @ 2022-09-30 13:49 UTC (permalink / raw)
  To: Jakub Jelinek, Joseph S. Myers, Hongtao Liu, hjl.tools,
	Richard Earnshaw, Kyrylo Tkachov, richard.sandiford
  Cc: gcc-patches

On 9/29/22 11:55, Jakub Jelinek wrote:
> Hi!
> 
> Here is more complete patch to add std::bfloat16_t support on
> x86, AArch64 and (only partially) on ARM 32-bit.  No BFmode optabs
> are added by the patch, so for binops/unops it extends to SFmode
> first and then truncates back to BFmode.
> For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations
> of all those conversions so that we avoid double rounding, for
> BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much
> it emits BFmode -> SFmode conversion first and then converts to the even
> wider mode, neither step should be imprecise.
> For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
> and then SFmode -> HFmode, because neither format is subset or superset
> of the other, while SFmode is superset of both.
> expr.cc then contains a -ffast-math optimization of the BF -> SF and
> SF -> BF conversions if we don't optimize for space (and for the latter
> if -frounding-math isn't enabled either).
> For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16
> but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math
> || !flag_unsafe_math_optimizations, because I think the insn doesn't
> raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.
> In C by default (unless x86 -fexcess-precision=16) we use float excess
> precision for BFmode, so truncate only on explicit casts and assignments.
> In C++ unfortunately (but that is the case of also _Float16) we don't
> support excess precision yet which means that for
> __bf16 (__bf16 a, __bf16 b, __bf16 c, __bf16 d) { return a * b + c * d; }
> we do a lot of conversions.

The comment from Apple on the ABI mangling proposal suggests to me that 
we might want to delay enabling C++ std::bfloat16_t (i.e. defining 
__STDCPP_BFLOAT16_T__) until we have that excess precision support?

"Steve [Cannon] is concerned that adding this type as an arithmetic type 
might serve to be an attractive nuisance. Because the precision of 
bfloat16 is so limited, controlling when truncation back to bfloat16 
occurs is of paramount practical importance to bfloat16 users. The 
normal semantics of an arithmetic type in C and C++ encourage the 
independent evaluation of operations, which would require an implicit 
truncation back to bfloat16 on every intermediate result. That would 
have catastrophic effects on both the precision and the performance of 
typical bfloat16 code. For example, on the performance side, typical 
hardware support is built around complex fused operations (e.g. float32 
+= bfloat16 * bfloat16 + bfloat16 * bfloat16, with all intermediate 
results computed in float32) that it would not be correct to 
pattern-match from independent operations.

Now, C and C++ do allow excess precision evaluation (C 6.5p8; C++ 
[expr.pre]p6), and Steve and I think that that might fix this problem. 
But we'd really need to force excess precision evaluation in order to 
get acceptable results; otherwise, allowing arithmetic is really just 
encouraging people to write code that is effectively incorrect. And even 
then there's definitely risk that someone might e.g. accumulate the 
intermediate results of a loop in std::bfloat16_t instead of in float."
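
To make that concern concrete, here is a small sketch in plain C that
emulates bfloat16 with float bit manipulation.  The names round_to_bf16,
sum_truncating_each_step and sum_with_float_accumulator are invented for
this illustration and are not GCC or libgcc APIs; NaNs are ignored.  It
compares truncating after every addition with accumulating in float and
truncating once at the end.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (float) == sizeof (uint32_t), "assumes 32-bit float");

/* Round F to the nearest bfloat16 value (ties to even) and return that
   value widened back to float.  Hypothetical helper, ignores NaNs.  */
static float
round_to_bf16 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  u += 0x7fffu + ((u >> 16) & 1u);
  u &= 0xffff0000u;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Every intermediate sum is truncated back to bfloat16, which is what
   independent evaluation of each operation amounts to.  */
static float
sum_truncating_each_step (float x, int n)
{
  float acc = 0.0f;
  for (int i = 0; i < n; i++)
    acc = round_to_bf16 (acc + round_to_bf16 (x));
  return acc;
}

/* The accumulator stays in float; only the final result is truncated.  */
static float
sum_with_float_accumulator (float x, int n)
{
  float acc = 0.0f;
  for (int i = 0; i < n; i++)
    acc += round_to_bf16 (x);
  return round_to_bf16 (acc);
}
```

With x = 1.0f and n = 1000, the first function stops growing at 256, because
257 is a tie between 256 and 258 and rounds to the even significand, while
the second returns 1000.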

> The aarch64 part is untested but has a chance of working (IMHO),
> though I'd appreciate if ARM maintainers could decide whether it is
> acceptable for them that __bf16 changes mangling and will allow arithmetics
> and conversions.
> The arm part is partial, libgcc side is missing as the target doesn't really
> seem to use soft-fp right now.  Perhaps the config/arm/ changes can be
> left out from the patch (thus keep ARM 32-bit __bf16 as before) and support
> for it can be done at some later time.
> 
> Thoughts on this?
> 
> 2022-09-29  Jakub Jelinek  <jakub@redhat.com>
> 
> gcc/
> 	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
> 	* tree.h (bfloat16_type_node): Define.
> 	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
> 	like float16_type_mode.
> 	* expmed.h (maybe_expand_shift): Declare.
> 	* expmed.cc (maybe_expand_shift): No longer static.
> 	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
> 	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
> 	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
> 	-ffast-math generic implementation for BF -> SF and SF -> BF
> 	conversions.
> 	* config/arm/arm.h (arm_bf16_type_node): Remove.
> 	(arm_bf16_ptr_type_node): Adjust comment.
> 	* config/arm/arm.cc (TARGET_INVALID_UNARY_OP,
> 	TARGET_INVALID_BINARY_OP): Don't redefine.
> 	(arm_mangle_type): Mangle BFmode as DFb16_.

If we're using DF32x for _Float32x, maybe we want DF16b for bfloat16?
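
To spell out the encodings being compared, a toy classifier for the text
that follows "DF" might look like the sketch below.  It is only an
illustration in C, not the libiberty demangler; the enum and the function
name classify_df_suffix are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

enum ext_float_kind
{
  EXT_FLOAT_INVALID,
  EXT_FLOAT_N,		/* DF<number>_  is _Float<number>.  */
  EXT_FLOAT_NX,		/* DF<number>x  is _Float<number>x.  */
  EXT_BFLOAT16		/* DFb16_       is std::bfloat16_t, as posted.  */
};

/* Classify S, the text following "DF" in a mangled name, and store the
   parsed width in *BITS.  Hypothetical helper for illustration only.  */
static enum ext_float_kind
classify_df_suffix (const char *s, int *bits)
{
  assert (s != NULL && bits != NULL);

  int is_bf = 0;
  if (*s == 'b')
    {
      is_bf = 1;
      s++;
    }
  if (!isdigit ((unsigned char) *s))
    return EXT_FLOAT_INVALID;

  char *end;
  long n = strtol (s, &end, 10);
  *bits = (int) n;

  if (is_bf)
    return (n == 16 && *end == '_') ? EXT_BFLOAT16 : EXT_FLOAT_INVALID;
  if (*end == '_')
    return EXT_FLOAT_N;
  if (*end == 'x')
    return EXT_FLOAT_NX;
  return EXT_FLOAT_INVALID;
}
```

With the mangling as posted, "b16_" classifies as bfloat16 and "16b" is
rejected.  A suffix-style encoding such as DF16b would instead add a 'b'
case next to the 'x' check and drop the prefix test.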

> 	(arm_invalid_conversion): Only reject BF <-> HF conversions if
> 	HFmode is non-IEEE format.
> 	(arm_invalid_unary_op, arm_invalid_binary_op): Remove.
> 	* config/arm/arm-builtins.cc (arm_bf16_type_node): Remove.
> 	(arm_simd_builtin_std_type): Use bfloat16_type_node rather than
> 	arm_bf16_type_node.
> 	(arm_init_simd_builtin_types): Likewise.
> 	(arm_init_simd_builtin_scalar_types): Likewise.
> 	(arm_init_bf16_types): Likewise.
> 	* config/i386/i386.cc (ix86_mangle_type): Mangle BFmode as DFb16_.
> 	(ix86_invalid_conversion, ix86_invalid_unary_op,
> 	ix86_invalid_binary_op): Remove.
> 	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
> 	TARGET_INVALID_BINARY_OP): Don't redefine.
> 	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
> 	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
> 	ix86_bf16_type_node.
> 	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
> 	* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
> 	(aarch64_bf16_ptr_type_node): Adjust comment.
> 	* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
> 	bfloat16_type_node rather than aarch64_bf16_type_node.
> 	(aarch64_mangle_type): Mangle BFmode as DFb16_.
> 	(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
> 	(aarch64_invalid_binary_op): Remove BFmode related rejections.
> 	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
> 	* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
> 	(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
> 	aarch64_bf16_type_node.
> 	(aarch64_init_simd_builtin_types, aarch64_init_bf16_types): Likewise.
> 	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
> gcc/c-family/
> 	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
> 	predefine for C++ __BFLT16_*__ macros and for C++23 also
> 	__STDCPP_BFLOAT16_T__.
> 	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16 for C++.
> gcc/cp/
> 	* cp-tree.h (extended_float_type_p): Return true for
> 	bfloat16_type_node.
> 	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
> 	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
> libcpp/
> 	* include/cpplib.h (CPP_N_BFLOAT16): Define.
> 	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
> 	C++.
> libgcc/
> 	* config/arm/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
> 	(softfp_truncations): Add tfbf dfbf sfbf hfbf.
> 	* config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
> 	__extendbfsf2 and __trunc{s,d,t,h}fbf2.
> 	* config/aarch64/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* config/i386/t-softfp (softfp_extensions): Add bfsf.
> 	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
> 	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
> 	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
> 	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
> 	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* soft-fp/brain.h: New file.
> 	* soft-fp/truncsfbf2.c: New file.
> 	* soft-fp/truncdfbf2.c: New file.
> 	* soft-fp/truncxfbf2.c: New file.
> 	* soft-fp/trunctfbf2.c: New file.
> 	* soft-fp/trunchfbf2.c: New file.
> 	* soft-fp/truncbfhf2.c: New file.
> 	* soft-fp/extendbfsf2.c: New file.
> libiberty/
> 	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
> 	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
> 	entry.
> 	(cplus_demangle_type): Demangle DFb16_.
> 	* testsuite/demangle-expected (_Z3xxxDFb16_): New test.
> 
> --- gcc/tree-core.h.jj	2022-09-29 09:13:25.717718458 +0200
> +++ gcc/tree-core.h	2022-09-29 12:40:17.417778754 +0200
> @@ -665,6 +665,9 @@ enum tree_index {
>     TI_DOUBLE_TYPE,
>     TI_LONG_DOUBLE_TYPE,
>   
> +  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
> +  TI_BFLOAT16_TYPE,
> +
>     /* The _FloatN and _FloatNx types must be consecutive, and in the
>        same sequence as the corresponding complex types, which must also
>        be consecutive; _FloatN must come before _FloatNx; the order must
> --- gcc/tree.h.jj	2022-09-29 09:13:25.720718416 +0200
> +++ gcc/tree.h	2022-09-29 12:40:17.416778768 +0200
> @@ -4285,6 +4285,7 @@ tree_strip_any_location_wrapper (tree ex
>   #define float_type_node			global_trees[TI_FLOAT_TYPE]
>   #define double_type_node		global_trees[TI_DOUBLE_TYPE]
>   #define long_double_type_node		global_trees[TI_LONG_DOUBLE_TYPE]
> +#define bfloat16_type_node		global_trees[TI_BFLOAT16_TYPE]
>   
>   /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
>   #define FLOATN_TYPE_NODE(IDX)		global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
> --- gcc/tree.cc.jj	2022-09-29 09:13:31.328641080 +0200
> +++ gcc/tree.cc	2022-09-29 12:40:17.400778985 +0200
> @@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
>       = (flag_excess_precision == EXCESS_PRECISION_FAST
>          ? EXCESS_PRECISION_TYPE_FAST
>          : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
> -	  ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
> +	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
>   
>     enum flt_eval_method target_flt_eval_method
>       = targetm.c.excess_precision (requested_type);
> @@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
>     machine_mode float16_type_mode = (float16_type_node
>   				    ? TYPE_MODE (float16_type_node)
>   				    : VOIDmode);
> +  machine_mode bfloat16_type_mode = (bfloat16_type_node
> +				     ? TYPE_MODE (bfloat16_type_node)
> +				     : VOIDmode);
>     machine_mode float_type_mode = TYPE_MODE (float_type_node);
>     machine_mode double_type_mode = TYPE_MODE (double_type_node);
>   
> @@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
>   	switch (target_flt_eval_method)
>   	  {
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	    if (type_mode == float16_type_mode)
> +	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode)
>   	      return float_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode)
>   	      return double_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode
>   		|| type_mode == double_type_mode)
>   	      return long_double_type_node;
> @@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
>   	switch (target_flt_eval_method)
>   	  {
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	    if (type_mode == float16_type_mode)
> +	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode)
>   	      return complex_float_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode)
>   	      return complex_double_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode
>   		|| type_mode == double_type_mode)
>   	      return complex_long_double_type_node;
> --- gcc/expmed.h.jj	2022-07-26 10:32:23.681271790 +0200
> +++ gcc/expmed.h	2022-09-29 15:18:46.457023535 +0200
> @@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
>   				  rtx, tree, rtx, int);
>   extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
>   			 int);
> +extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
> +			       int);
>   #ifdef GCC_OPTABS_H
>   extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
>   			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
> --- gcc/expmed.cc.jj	2022-08-31 10:20:20.000000000 +0200
> +++ gcc/expmed.cc	2022-09-29 15:17:52.224769673 +0200
> @@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
>   
>   /* Likewise, but return 0 if that cannot be done.  */
>   
> -static rtx
> +rtx
>   maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>   		    int amount, rtx target, int unsignedp)
>   {
> --- gcc/expr.cc.jj	2022-09-09 09:50:35.228575531 +0200
> +++ gcc/expr.cc	2022-09-29 17:09:46.716352938 +0200
> @@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
>         gcc_assert ((GET_MODE_PRECISION (from_mode)
>   		   != GET_MODE_PRECISION (to_mode))
>   		  || (DECIMAL_FLOAT_MODE_P (from_mode)
> -		      != DECIMAL_FLOAT_MODE_P (to_mode)));
> +		      != DECIMAL_FLOAT_MODE_P (to_mode))
> +		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> +		      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> +		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +		      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
>   
>         if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
>   	/* Conversion between decimal float and binary float, same size.  */
> @@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
>   	  return;
>   	}
>   
> +#ifdef HAVE_SFmode
> +      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> +	  && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
> +	{
> +	  if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
> +	    {
> +	      /* To cut down on libgcc size, implement
> +		 BFmode -> {DF,XF,TF}mode conversions by
> +		 BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
> +	      rtx temp = gen_reg_rtx (SFmode);
> +	      convert_mode_scalar (temp, from, unsignedp);
> +	      convert_mode_scalar (to, temp, unsignedp);
> +	      return;
> +	    }
> +	  if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> +	    {
> +	      /* Similarly, implement BFmode -> HFmode as
> +		 BFmode -> SFmode -> HFmode conversion where SFmode
> +		 has superset of BFmode values.  We don't need
> +		 to handle sNaNs by raising exception and turning
> +		 into qNaN though, as that can be done in the
> +		 SFmode -> HFmode conversion too.  */
> +	      rtx temp = gen_reg_rtx (SFmode);
> +	      int save_flag_finite_math_only = flag_finite_math_only;
> +	      flag_finite_math_only = true;
> +	      convert_mode_scalar (temp, from, unsignedp);
> +	      flag_finite_math_only = save_flag_finite_math_only;
> +	      convert_mode_scalar (to, temp, unsignedp);
> +	      return;
> +	    }
> +	  if (to_mode == SFmode
> +	      && !HONOR_NANS (from_mode)
> +	      && !HONOR_NANS (to_mode)
> +	      && optimize_insn_for_speed_p ())
> +	    {
> +	      /* If we don't expect sNaNs, for BFmode -> SFmode we can just
> +		 shift the bits up.  */
> +	      machine_mode fromi_mode, toi_mode;
> +	      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> +				     0).exists (&fromi_mode)
> +		  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> +					0).exists (&toi_mode))
> +		{
> +		  start_sequence ();
> +		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> +		  rtx tof = NULL_RTX;
> +		  if (fromi)
> +		    {
> +		      rtx toi = gen_reg_rtx (toi_mode);
> +		      convert_mode_scalar (toi, fromi, 1);
> +		      toi
> +			= maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
> +					      GET_MODE_PRECISION (to_mode)
> +					      - GET_MODE_PRECISION (from_mode),
> +					      NULL_RTX, 1);
> +		      if (toi)
> +			{
> +			  tof = lowpart_subreg (to_mode, toi, toi_mode);
> +			  if (tof)
> +			    emit_move_insn (to, tof);
> +			}
> +		    }
> +		  insns = get_insns ();
> +		  end_sequence ();
> +		  if (tof)
> +		    {
> +		      emit_insn (insns);
> +		      return;
> +		    }
> +		}
> +	    }
> +	}
> +      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
> +	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +	  && !HONOR_NANS (from_mode)
> +	  && !HONOR_NANS (to_mode)
> +	  && !flag_rounding_math
> +	  && optimize_insn_for_speed_p ())
> +	{
> +	  /* If we don't expect qNaNs nor sNaNs and can assume rounding
> +	     to nearest, we can expand the conversion inline as
> +	     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
> +	  machine_mode fromi_mode, toi_mode;
> +	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> +				 0).exists (&fromi_mode)
> +	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> +				    0).exists (&toi_mode))
> +	    {
> +	      start_sequence ();
> +	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> +	      rtx tof = NULL_RTX;
> +	      do
> +		{
> +		  if (!fromi)
> +		    break;
> +		  int shift = (GET_MODE_PRECISION (from_mode)
> +			       - GET_MODE_PRECISION (to_mode));
> +		  rtx temp1
> +		    = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
> +					  shift, NULL_RTX, 1);
> +		  if (!temp1)
> +		    break;
> +		  rtx temp2
> +		    = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
> +				    NULL_RTX, 1, OPTAB_DIRECT);
> +		  if (!temp2)
> +		    break;
> +		  rtx temp3
> +		    = expand_binop (fromi_mode, add_optab, fromi,
> +				    gen_int_mode ((HOST_WIDE_INT_1U
> +						   << (shift - 1)) - 1,
> +						  fromi_mode), NULL_RTX,
> +				    1, OPTAB_DIRECT);
> +		  if (!temp3)
> +		    break;
> +		  rtx temp4
> +		    = expand_binop (fromi_mode, add_optab, temp3, temp2,
> +				    NULL_RTX, 1, OPTAB_DIRECT);
> +		  if (!temp4)
> +		    break;
> +		  rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
> +						  temp4, shift, NULL_RTX, 1);
> +		  if (!temp5)
> +		    break;
> +		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
> +		  if (!temp6)
> +		    break;
> +		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
> +					toi_mode);
> +		  if (tof)
> +		    emit_move_insn (to, tof);
> +		}
> +	      while (0);
> +	      insns = get_insns ();
> +	      end_sequence ();
> +	      if (tof)
> +		{
> +		  emit_insn (insns);
> +		  return;
> +		}
> +	    }
> +	}
> +#endif
> +
>         /* Otherwise use a libcall.  */
>         libcall = convert_optab_libfunc (tab, to_mode, from_mode);
>   
> --- gcc/config/arm/arm.h.jj	2022-09-29 09:13:25.709718568 +0200
> +++ gcc/config/arm/arm.h	2022-09-29 12:40:17.401778971 +0200
> @@ -78,9 +78,8 @@ extern void (*arm_lang_output_object_att
>      the backend.  Defined in arm-builtins.cc.  */
>   extern tree arm_fp16_type_node;
>   
> -/* This type is the user-visible __bf16.  We need it in a few places in
> -   the backend.  Defined in arm-builtins.cc.  */
> -extern tree arm_bf16_type_node;
> +/* The user-visible __bf16 uses bfloat16_type_node, but for pointer to that
> +   use backend specific tree.  Defined in arm-builtins.cc.  */
>   extern tree arm_bf16_ptr_type_node;
>   
>   \f
> --- gcc/config/arm/arm.cc.jj	2022-09-29 09:13:25.709718568 +0200
> +++ gcc/config/arm/arm.cc	2022-09-29 15:33:07.997170885 +0200
> @@ -688,12 +688,6 @@ static const struct attribute_spec arm_a
>   #undef TARGET_INVALID_CONVERSION
>   #define TARGET_INVALID_CONVERSION arm_invalid_conversion
>   
> -#undef TARGET_INVALID_UNARY_OP
> -#define TARGET_INVALID_UNARY_OP arm_invalid_unary_op
> -
> -#undef TARGET_INVALID_BINARY_OP
> -#define TARGET_INVALID_BINARY_OP arm_invalid_binary_op
> -
>   #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
>   #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV arm_atomic_assign_expand_fenv
>   
> @@ -30360,7 +30354,7 @@ arm_mangle_type (const_tree type)
>     if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
>       {
>         if (TYPE_MODE (type) == BFmode)
> -	return "u6__bf16";
> +	return "DFb16_";
>         else
>   	return "Dh";
>       }
> @@ -33996,47 +33990,22 @@ arm_invalid_conversion (const_tree fromt
>   {
>     if (element_mode (fromtype) != element_mode (totype))
>       {
> -      /* Do no allow conversions to/from BFmode scalar types.  */
> -      if (TYPE_MODE (fromtype) == BFmode)
> -	return N_("invalid conversion from type %<bfloat16_t%>");
> -      if (TYPE_MODE (totype) == BFmode)
> -	return N_("invalid conversion to type %<bfloat16_t%>");
> +      /* Do not allow conversions from BFmode to non-IEEE HFmode
> +	 scalar types or vice versa.  */
> +      if (TYPE_MODE (fromtype) == BFmode
> +	  && TYPE_MODE (totype) == HFmode
> +	  && arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
> +	return N_("invalid conversion from type %<bfloat16_t%> to %<__fp16%>");
> +      if (TYPE_MODE (totype) == BFmode
> +	  && TYPE_MODE (fromtype) == HFmode
> +	  && arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
> +	return N_("invalid conversion to type %<bfloat16_t%> from %<__fp16%>");
>       }
>   
>     /* Conversion allowed.  */
>     return NULL;
>   }
>   
> -/* Return the diagnostic message string if the unary operation OP is
> -   not permitted on TYPE, NULL otherwise.  */
> -
> -static const char *
> -arm_invalid_unary_op (int op, const_tree type)
> -{
> -  /* Reject all single-operand operations on BFmode except for &.  */
> -  if (element_mode (type) == BFmode && op != ADDR_EXPR)
> -    return N_("operation not permitted on type %<bfloat16_t%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the binary operation OP is
> -   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
> -
> -static const char *
> -arm_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
> -			   const_tree type2)
> -{
> -  /* Reject all 2-operand operations on BFmode.  */
> -  if (element_mode (type1) == BFmode
> -      || element_mode (type2) == BFmode)
> -    return N_("operation not permitted on type %<bfloat16_t%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
>   /* Implement TARGET_CAN_CHANGE_MODE_CLASS.
>   
>      In VFPv1, VFP registers could only be accessed in the mode they were
> --- gcc/config/arm/arm-builtins.cc.jj	2022-09-29 09:13:25.681718954 +0200
> +++ gcc/config/arm/arm-builtins.cc	2022-09-29 12:40:17.405778917 +0200
> @@ -1370,7 +1370,6 @@ struct arm_simd_type_info arm_simd_types
>   tree arm_fp16_type_node = NULL_TREE;
>   
>   /* Back-end node type for brain float (bfloat) types.  */
> -tree arm_bf16_type_node = NULL_TREE;
>   tree arm_bf16_ptr_type_node = NULL_TREE;
>   
>   static tree arm_simd_intOI_type_node = NULL_TREE;
> @@ -1459,7 +1458,7 @@ arm_simd_builtin_std_type (machine_mode
>       case E_DFmode:
>         return double_type_node;
>       case E_BFmode:
> -      return arm_bf16_type_node;
> +      return bfloat16_type_node;
>       default:
>         gcc_unreachable ();
>       }
> @@ -1570,9 +1569,9 @@ arm_init_simd_builtin_types (void)
>     arm_simd_types[Float32x4_t].eltype = float_type_node;
>   
>     /* Init Bfloat vector types with underlying __bf16 scalar type.  */
> -  arm_simd_types[Bfloat16x2_t].eltype = arm_bf16_type_node;
> -  arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
> -  arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
> +  arm_simd_types[Bfloat16x2_t].eltype = bfloat16_type_node;
> +  arm_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
> +  arm_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
>   
>     for (i = 0; i < nelts; i++)
>       {
> @@ -1658,7 +1657,7 @@ arm_init_simd_builtin_scalar_types (void
>   					     "__builtin_neon_df");
>     (*lang_hooks.types.register_builtin_type) (intTI_type_node,
>   					     "__builtin_neon_ti");
> -  (*lang_hooks.types.register_builtin_type) (arm_bf16_type_node,
> +  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
>                                                "__builtin_neon_bf");
>     /* Unsigned integer types for various mode sizes.  */
>     (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
> @@ -1797,13 +1796,13 @@ arm_init_builtin (unsigned int fcode, ar
>   static void
>   arm_init_bf16_types (void)
>   {
> -  arm_bf16_type_node = make_node (REAL_TYPE);
> -  TYPE_PRECISION (arm_bf16_type_node) = 16;
> -  SET_TYPE_MODE (arm_bf16_type_node, BFmode);
> -  layout_type (arm_bf16_type_node);
> +  bfloat16_type_node = make_node (REAL_TYPE);
> +  TYPE_PRECISION (bfloat16_type_node) = 16;
> +  SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +  layout_type (bfloat16_type_node);
>   
> -  lang_hooks.types.register_builtin_type (arm_bf16_type_node, "__bf16");
> -  arm_bf16_ptr_type_node = build_pointer_type (arm_bf16_type_node);
> +  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
> +  arm_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
>   }
>   
>   /* Set up ACLE builtins, even builtins for instructions that are not
> --- gcc/config/i386/i386.cc.jj	2022-09-29 12:03:12.073350093 +0200
> +++ gcc/config/i386/i386.cc	2022-09-29 12:40:17.409778863 +0200
> @@ -22728,7 +22728,7 @@ ix86_mangle_type (const_tree type)
>     switch (TYPE_MODE (type))
>       {
>       case E_BFmode:
> -      return "u6__bf16";
> +      return "DFb16_";
>       case E_HFmode:
>         /* _Float16 is "DF16_".
>   	 Align with clang's decision in https://reviews.llvm.org/D33719. */
> @@ -22747,55 +22747,6 @@ ix86_mangle_type (const_tree type)
>       }
>   }
>   
> -/* Return the diagnostic message string if conversion from FROMTYPE to
> -   TOTYPE is not allowed, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_conversion (const_tree fromtype, const_tree totype)
> -{
> -  if (element_mode (fromtype) != element_mode (totype))
> -    {
> -      /* Do no allow conversions to/from BFmode scalar types.  */
> -      if (TYPE_MODE (fromtype) == BFmode)
> -	return N_("invalid conversion from type %<__bf16%>");
> -      if (TYPE_MODE (totype) == BFmode)
> -	return N_("invalid conversion to type %<__bf16%>");
> -    }
> -
> -  /* Conversion allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the unary operation OP is
> -   not permitted on TYPE, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_unary_op (int op, const_tree type)
> -{
> -  /* Reject all single-operand operations on BFmode except for &.  */
> -  if (element_mode (type) == BFmode && op != ADDR_EXPR)
> -    return N_("operation not permitted on type %<__bf16%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the binary operation OP is
> -   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
> -			   const_tree type2)
> -{
> -  /* Reject all 2-operand operations on BFmode.  */
> -  if (element_mode (type1) == BFmode
> -      || element_mode (type2) == BFmode)
> -    return N_("operation not permitted on type %<__bf16%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
>   static GTY(()) tree ix86_tls_stack_chk_guard_decl;
>   
>   static tree
> @@ -24853,15 +24804,6 @@ ix86_libgcc_floating_mode_supported_p
>   #undef TARGET_MANGLE_TYPE
>   #define TARGET_MANGLE_TYPE ix86_mangle_type
>   
> -#undef TARGET_INVALID_CONVERSION
> -#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
> -
> -#undef TARGET_INVALID_UNARY_OP
> -#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
> -
> -#undef TARGET_INVALID_BINARY_OP
> -#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
> -
>   #undef TARGET_STACK_PROTECT_GUARD
>   #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
>   
> --- gcc/config/i386/i386-builtins.cc.jj	2022-09-29 09:13:25.710718554 +0200
> +++ gcc/config/i386/i386-builtins.cc	2022-09-29 12:40:17.406778903 +0200
> @@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
>   static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
>   
>   tree ix86_float16_type_node = NULL_TREE;
> -tree ix86_bf16_type_node = NULL_TREE;
>   tree ix86_bf16_ptr_type_node = NULL_TREE;
>   
>   /* Retrieve an element from the above table, building some of
> @@ -1372,16 +1371,15 @@ ix86_register_float16_builtin_type (void
>   static void
>   ix86_register_bf16_builtin_type (void)
>   {
> -  ix86_bf16_type_node = make_node (REAL_TYPE);
> -  TYPE_PRECISION (ix86_bf16_type_node) = 16;
> -  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
> -  layout_type (ix86_bf16_type_node);
> +  bfloat16_type_node = make_node (REAL_TYPE);
> +  TYPE_PRECISION (bfloat16_type_node) = 16;
> +  SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +  layout_type (bfloat16_type_node);
>   
>     if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
>       {
> -      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
> -					    "__bf16");
> -      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
> +      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
> +      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
>       }
>   }
>   
> --- gcc/config/i386/i386-builtin-types.def.jj	2022-09-29 09:13:25.709718568 +0200
> +++ gcc/config/i386/i386-builtin-types.def	2022-09-29 12:40:17.406778903 +0200
> @@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
>   DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
>   DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
> -DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
> +DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
>   DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> --- gcc/config/aarch64/aarch64.h.jj	2022-09-29 09:13:25.680718968 +0200
> +++ gcc/config/aarch64/aarch64.h	2022-09-29 12:40:17.409778863 +0200
> @@ -1337,9 +1337,8 @@ extern const char *aarch64_rewrite_mcpu
>   extern GTY(()) tree aarch64_fp16_type_node;
>   extern GTY(()) tree aarch64_fp16_ptr_type_node;
>   
> -/* This type is the user-visible __bf16, and a pointer to that type.  Defined
> -   in aarch64-builtins.cc.  */
> -extern GTY(()) tree aarch64_bf16_type_node;
> +/* Pointer to the user-visible __bf16 type.  __bf16 itself is generic
> +   bfloat16_type_node.  Defined in aarch64-builtins.cc.  */
>   extern GTY(()) tree aarch64_bf16_ptr_type_node;
>   
>   /* The generic unwind code in libgcc does not initialize the frame pointer.
> --- gcc/config/aarch64/aarch64.cc.jj	2022-09-29 09:13:25.680718968 +0200
> +++ gcc/config/aarch64/aarch64.cc	2022-09-29 12:40:17.413778808 +0200
> @@ -19741,7 +19741,7 @@ aarch64_gimplify_va_arg_expr (tree valis
>   	  field_ptr_t = aarch64_fp16_ptr_type_node;
>   	  break;
>   	case E_BFmode:
> -	  field_t = aarch64_bf16_type_node;
> +	  field_t = bfloat16_type_node;
>   	  field_ptr_t = aarch64_bf16_ptr_type_node;
>   	  break;
>   	case E_V2SImode:
> @@ -20645,7 +20645,7 @@ aarch64_mangle_type (const_tree type)
>     if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
>       {
>         if (TYPE_MODE (type) == BFmode)
> -	return "u6__bf16";
> +	return "DFb16_";
>         else
>   	return "Dh";
>       }
> @@ -26820,39 +26820,6 @@ aarch64_stack_protect_guard (void)
>     return NULL_TREE;
>   }
>   
> -/* Return the diagnostic message string if conversion from FROMTYPE to
> -   TOTYPE is not allowed, NULL otherwise.  */
> -
> -static const char *
> -aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
> -{
> -  if (element_mode (fromtype) != element_mode (totype))
> -    {
> -      /* Do no allow conversions to/from BFmode scalar types.  */
> -      if (TYPE_MODE (fromtype) == BFmode)
> -	return N_("invalid conversion from type %<bfloat16_t%>");
> -      if (TYPE_MODE (totype) == BFmode)
> -	return N_("invalid conversion to type %<bfloat16_t%>");
> -    }
> -
> -  /* Conversion allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the unary operation OP is
> -   not permitted on TYPE, NULL otherwise.  */
> -
> -static const char *
> -aarch64_invalid_unary_op (int op, const_tree type)
> -{
> -  /* Reject all single-operand operations on BFmode except for &.  */
> -  if (element_mode (type) == BFmode && op != ADDR_EXPR)
> -    return N_("operation not permitted on type %<bfloat16_t%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
>   /* Return the diagnostic message string if the binary operation OP is
>      not permitted on TYPE1 and TYPE2, NULL otherwise.  */
>   
> @@ -26860,11 +26827,6 @@ static const char *
>   aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
>   			   const_tree type2)
>   {
> -  /* Reject all 2-operand operations on BFmode.  */
> -  if (element_mode (type1) == BFmode
> -      || element_mode (type2) == BFmode)
> -    return N_("operation not permitted on type %<bfloat16_t%>");
> -
>     if (VECTOR_TYPE_P (type1)
>         && VECTOR_TYPE_P (type2)
>         && !TYPE_INDIVISIBLE_P (type1)
> @@ -27461,12 +27423,6 @@ aarch64_libgcc_floating_mode_supported_p
>   #undef TARGET_MANGLE_TYPE
>   #define TARGET_MANGLE_TYPE aarch64_mangle_type
>   
> -#undef TARGET_INVALID_CONVERSION
> -#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion
> -
> -#undef TARGET_INVALID_UNARY_OP
> -#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op
> -
>   #undef TARGET_INVALID_BINARY_OP
>   #define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op
>   
> --- gcc/config/aarch64/aarch64-builtins.cc.jj	2022-09-29 09:13:25.676719023 +0200
> +++ gcc/config/aarch64/aarch64-builtins.cc	2022-09-29 12:40:17.410778849 +0200
> @@ -918,7 +918,6 @@ tree aarch64_fp16_type_node = NULL_TREE;
>   tree aarch64_fp16_ptr_type_node = NULL_TREE;
>   
>   /* Back-end node type for brain float (bfloat) types.  */
> -tree aarch64_bf16_type_node = NULL_TREE;
>   tree aarch64_bf16_ptr_type_node = NULL_TREE;
>   
>   /* Wrapper around add_builtin_function.  NAME is the name of the built-in
> @@ -1010,7 +1009,7 @@ aarch64_int_or_fp_type (machine_mode mod
>       case E_DFmode:
>         return double_type_node;
>       case E_BFmode:
> -      return aarch64_bf16_type_node;
> +      return bfloat16_type_node;
>       default:
>         gcc_unreachable ();
>       }
> @@ -1124,8 +1123,8 @@ aarch64_init_simd_builtin_types (void)
>     aarch64_simd_types[Float64x2_t].eltype = double_type_node;
>   
>     /* Init Bfloat vector types with underlying __bf16 type.  */
> -  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
> -  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
> +  aarch64_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
> +  aarch64_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
>   
>     for (i = 0; i < nelts; i++)
>       {
> @@ -1197,7 +1196,7 @@ aarch64_init_simd_builtin_scalar_types (
>   					     "__builtin_aarch64_simd_poly128");
>     (*lang_hooks.types.register_builtin_type) (intTI_type_node,
>   					     "__builtin_aarch64_simd_ti");
> -  (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node,
> +  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
>   					     "__builtin_aarch64_simd_bf");
>     /* Unsigned integer types for various mode sizes.  */
>     (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
> @@ -1682,13 +1681,13 @@ aarch64_init_fp16_types (void)
>   static void
>   aarch64_init_bf16_types (void)
>   {
> -  aarch64_bf16_type_node = make_node (REAL_TYPE);
> -  TYPE_PRECISION (aarch64_bf16_type_node) = 16;
> -  SET_TYPE_MODE (aarch64_bf16_type_node, BFmode);
> -  layout_type (aarch64_bf16_type_node);
> +  bfloat16_type_node = make_node (REAL_TYPE);
> +  TYPE_PRECISION (bfloat16_type_node) = 16;
> +  SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +  layout_type (bfloat16_type_node);
>   
> -  lang_hooks.types.register_builtin_type (aarch64_bf16_type_node, "__bf16");
> -  aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node);
> +  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
> +  aarch64_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
>   }
>   
>   /* Pointer authentication builtins that will become NOP on legacy platform.
> --- gcc/config/aarch64/aarch64-sve-builtins.def.jj	2022-09-29 09:13:25.676719023 +0200
> +++ gcc/config/aarch64/aarch64-sve-builtins.def	2022-09-29 12:40:17.413778808 +0200
> @@ -61,7 +61,7 @@ DEF_SVE_MODE (u64offset, none, svuint64_
>   DEF_SVE_MODE (vnum, none, none, vectors)
>   
>   DEF_SVE_TYPE (svbool_t, 10, __SVBool_t, boolean_type_node)
> -DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, aarch64_bf16_type_node)
> +DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, bfloat16_type_node)
>   DEF_SVE_TYPE (svfloat16_t, 13, __SVFloat16_t, aarch64_fp16_type_node)
>   DEF_SVE_TYPE (svfloat32_t, 13, __SVFloat32_t, float_type_node)
>   DEF_SVE_TYPE (svfloat64_t, 13, __SVFloat64_t, double_type_node)
> --- gcc/c-family/c-cppbuiltin.cc.jj	2022-09-29 09:13:25.675719037 +0200
> +++ gcc/c-family/c-cppbuiltin.cc	2022-09-29 12:40:17.416778768 +0200
> @@ -1264,6 +1264,13 @@ c_cpp_builtins (cpp_reader *pfile)
>         builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
>   				      csuffix, FLOATN_NX_TYPE_NODE (i));
>       }
> +  if (bfloat16_type_node && c_dialect_cxx ())
> +    {
> +      if (cxx_dialect > cxx20)
> +	cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
> +      builtin_define_float_constants ("BFLT16", "BF16", "%s",
> +				      "BF16", bfloat16_type_node);
> +    }
>   
>     /* For float.h.  */
>     if (targetm.decimal_float_supported_p ())
> --- gcc/c-family/c-lex.cc.jj	2022-09-29 09:13:25.675719037 +0200
> +++ gcc/c-family/c-lex.cc	2022-09-29 12:40:17.416778768 +0200
> @@ -995,6 +995,19 @@ interpret_float (const cpp_token *token,
>   	  pedwarn (input_location, OPT_Wpedantic,
>   		   "non-standard suffix on floating constant");
>         }
> +    else if ((flags & CPP_N_BFLOAT16) != 0 && c_dialect_cxx ())
> +      {
> +	type = bfloat16_type_node;
> +	if (type == NULL_TREE)
> +	  {
> +	    error ("unsupported non-standard suffix on floating constant");
> +	    return error_mark_node;
> +	  }
> +	if (cxx_dialect < cxx23)
> +	  pedwarn (input_location, OPT_Wpedantic,
> +		   "%<bf16%> or %<BF16%> suffix on floating constant only "
> +		   "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
> +      }
>       else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
>         type = long_double_type_node;
>       else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
> --- gcc/cp/cp-tree.h.jj	2022-09-29 09:13:31.164643341 +0200
> +++ gcc/cp/cp-tree.h	2022-09-29 12:40:17.414778795 +0200
> @@ -8714,6 +8714,8 @@ extended_float_type_p (tree type)
>     for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
>       if (type == FLOATN_TYPE_NODE (i))
>         return true;
> +  if (type == bfloat16_type_node)
> +    return true;
>     return false;
>   }
>   
> --- gcc/cp/typeck.cc.jj	2022-09-29 09:13:25.716718472 +0200
> +++ gcc/cp/typeck.cc	2022-09-29 12:40:17.415778781 +0200
> @@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
>         if (mv2 == FLOATN_NX_TYPE_NODE (i))
>   	extended2 = i + 1;
>       }
> +  if (mv1 == bfloat16_type_node)
> +    extended1 = true;
> +  if (mv2 == bfloat16_type_node)
> +    extended2 = true;
>     if (extended2 && !extended1)
>       {
>         int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
> @@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
>     if (cnt > 1 && mv2 == long_double_type_node)
>       return -2;
>     /* Otherwise, they have equal rank, but extended types
> -     (other than std::bfloat16_t) have higher subrank.  */
> +     (other than std::bfloat16_t) have higher subrank.
> +     std::bfloat16_t shouldn't have equal rank to any standard
> +     floating point type.  */
>     return 1;
>   }
>   
> --- libcpp/include/cpplib.h.jj	2022-09-08 13:01:19.853771383 +0200
> +++ libcpp/include/cpplib.h	2022-09-28 19:06:59.615380690 +0200
> @@ -1275,6 +1275,7 @@ struct cpp_num
>   #define CPP_N_USERDEF	0x1000000 /* C++11 user-defined literal.  */
>   
>   #define CPP_N_SIZE_T	0x2000000 /* C++23 size_t literal.  */
> +#define CPP_N_BFLOAT16	0x4000000 /* std::bfloat16_t type.  */
>   
>   #define CPP_N_WIDTH_FLOATN_NX	0xF0000000 /* _FloatN / _FloatNx value
>   					      of N, divided by 16.  */
> --- libcpp/expr.cc.jj	2022-09-27 08:03:27.119982735 +0200
> +++ libcpp/expr.cc	2022-09-28 17:55:36.667177540 +0200
> @@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
>     size_t orig_len = len;
>     const uchar *orig_s = s;
>     size_t flags;
> -  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
> +  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
>   
>     flags = 0;
> -  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
> +  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
>   
>     /* The following decimal float suffixes, from TR 24732:2009, TS
>        18661-2:2015 and C2X, are supported:
> @@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
>        w, W - machine-specific type such as __float80 (GNU extension).
>        q, Q - machine-specific type such as __float128 (GNU extension).
>        fN, FN - _FloatN (TS 18661-3:2015).
> -     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
> +     fNx, FNx - _FloatNx (TS 18661-3:2015).
> +     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
>   
>     /* Process decimal float suffixes, which are two letters starting
>        with d or D.  Order and case are significant.  */
> @@ -239,6 +240,20 @@ interpret_float_suffix (cpp_reader *pfil
>   		fn++;
>   	    }
>   	  break;
> +	case 'b': case 'B':
> +	  if (len > 2
> +	      /* Except for bf16 / BF16 where case is significant.  */
> +	      && s[1] == (s[0] == 'b' ? 'f' : 'F')
> +	      && s[2] == '1'
> +	      && s[3] == '6'
> +	      && CPP_OPTION (pfile, cplusplus))
> +	    {
> +	      bf16++;
> +	      len -= 3;
> +	      s += 3;
> +	      break;
> +	    }
> +	  return 0;
>   	case 'd': case 'D': d++; break;
>   	case 'l': case 'L': l++; break;
>   	case 'w': case 'W': w++; break;
> @@ -257,7 +272,7 @@ interpret_float_suffix (cpp_reader *pfil
>        of N larger than can be represented in the return value.  The
>        caller is responsible for rejecting _FloatN suffixes where
>        _FloatN is not supported on the chosen target.  */
> -  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
> +  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
>       return 0;
>     if (fn_bits > CPP_FLOATN_MAX)
>       return 0;
> @@ -295,6 +310,7 @@ interpret_float_suffix (cpp_reader *pfil
>   	     q ? CPP_N_MD_Q :
>   	     fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
>   	     fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
> +	     bf16 ? CPP_N_BFLOAT16 :
>   	     CPP_N_DEFAULT));
>   }
>   
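
For readers following the suffix handling above: the check accepts only the all-lowercase and all-uppercase spellings. A minimal standalone sketch of that rule follows; `is_bf16_suffix` is a hypothetical helper for illustration, not a libcpp function, and it takes the whole suffix rather than working inside the `while (len--)` loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of libcpp: return true if the LEN
   characters at S spell exactly "bf16" or "BF16".  Mixed case such as
   "Bf16" or "bF16" is rejected, mirroring the case-sensitive test in
   the quoted hunk.  */
static bool
is_bf16_suffix (const char *s, size_t len)
{
  if (len != 4)
    return false;
  if (s[0] != 'b' && s[0] != 'B')
    return false;
  /* The second letter must have the same case as the first.  */
  return s[1] == (s[0] == 'b' ? 'f' : 'F') && s[2] == '1' && s[3] == '6';
}
```

In the patch itself the same test runs only when `CPP_OPTION (pfile, cplusplus)` is set, so C sources never see the suffix.
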
> --- libgcc/config/arm/sfp-machine.h.jj	2020-01-12 11:54:38.615380187 +0100
> +++ libgcc/config/arm/sfp-machine.h	2022-09-28 19:02:51.922710542 +0200
> @@ -22,6 +22,7 @@ typedef int __gcc_CMPtype __attribute__
>   /* According to RTABI, QNAN is only with the most significant bit of the
>      significand set, and all other significand bits zero.  */
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
>   #define _FP_NANFRAC_Q		_FP_QNANBIT_Q, 0, 0, 0
> --- libgcc/config/aarch64/t-softfp.jj	2020-09-29 11:32:02.988602194 +0200
> +++ libgcc/config/aarch64/t-softfp	2022-09-28 18:59:43.381246466 +0200
> @@ -1,7 +1,7 @@
>   softfp_float_modes := tf
>   softfp_int_modes := si di ti
> -softfp_extensions := sftf dftf hftf
> -softfp_truncations := tfsf tfdf tfhf
> +softfp_extensions := sftf dftf hftf bfsf
> +softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
>   softfp_exclude_libgcc2 := n
>   softfp_extras := fixhfti fixunshfti floattihf floatuntihf
>   
> --- libgcc/config/aarch64/libgcc-softfp.ver.jj	2022-01-11 23:11:23.691271871 +0100
> +++ libgcc/config/aarch64/libgcc-softfp.ver	2022-09-28 19:00:36.050537146 +0200
> @@ -26,3 +26,12 @@ GCC_11.0 {
>     __mulhc3
>     __trunctfhf2
>   }
> +
> +%inherit GCC_13.0.0 GCC_11.0.0
> +GCC_13.0.0 {
> +  __extendbfsf2
> +  __truncdfbf2
> +  __truncsfbf2
> +  __trunctfbf2
> +  __trunchfbf2
> +}
> --- libgcc/config/aarch64/sfp-machine.h.jj	2022-01-11 23:11:23.691271871 +0100
> +++ libgcc/config/aarch64/sfp-machine.h	2022-09-28 19:02:10.303270053 +0200
> @@ -43,6 +43,7 @@ typedef int __gcc_CMPtype __attribute__
>   #define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		((_FP_QNANBIT_H << 1) - 1)
> +#define _FP_NANFRAC_B		((_FP_QNANBIT_B << 1) - 1)
>   #define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
>   #define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1)
>   #define _FP_NANFRAC_Q		((_FP_QNANBIT_Q << 1) - 1), -1
> --- libgcc/config/i386/t-softfp.jj	2022-09-23 09:02:31.759659479 +0200
> +++ libgcc/config/i386/t-softfp	2022-09-28 18:58:09.114520943 +0200
> @@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
>   libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
>   LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
>   
> -softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> -softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
> +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
> +		      tfbf xfbf dfbf sfbf hfbf
>   
>   softfp_extras += eqhf2
>   
> @@ -20,6 +21,7 @@ CFLAGS-truncsfhf2.c += -msse2
>   CFLAGS-truncdfhf2.c += -msse2
>   CFLAGS-truncxfhf2.c += -msse2
>   CFLAGS-trunctfhf2.c += -msse2
> +CFLAGS-trunchfbf2.c += -msse2
>   
>   CFLAGS-eqhf2.c += -msse2
>   CFLAGS-_divhc3.c += -msse2
> --- libgcc/config/i386/libgcc-glibc.ver.jj	2022-09-23 09:02:31.746659658 +0200
> +++ libgcc/config/i386/libgcc-glibc.ver	2022-09-28 18:58:09.114520943 +0200
> @@ -214,3 +214,13 @@ GCC_12.0.0 {
>     __trunctfhf2
>     __truncxfhf2
>   }
> +
> +%inherit GCC_13.0.0 GCC_12.0.0
> +GCC_13.0.0 {
> +  __extendbfsf2
> +  __truncdfbf2
> +  __truncsfbf2
> +  __trunctfbf2
> +  __truncxfbf2
> +  __trunchfbf2
> +}
> --- libgcc/config/i386/sfp-machine.h.jj	2022-09-23 09:02:31.747659644 +0200
> +++ libgcc/config/i386/sfp-machine.h	2022-09-28 18:58:09.114520943 +0200
> @@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
>   #define _FP_QNANNEGATEDP 0
>   
>   #define _FP_NANSIGN_H		1
> +#define _FP_NANSIGN_B		1
>   #define _FP_NANSIGN_S		1
>   #define _FP_NANSIGN_D		1
>   #define _FP_NANSIGN_E		1
> --- libgcc/config/i386/64/sfp-machine.h.jj	2022-09-23 09:02:31.700660291 +0200
> +++ libgcc/config/i386/64/sfp-machine.h	2022-09-28 18:58:09.114520943 +0200
> @@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
>   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D
>   #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
> --- libgcc/config/i386/32/sfp-machine.h.jj	2022-09-23 09:02:31.683660526 +0200
> +++ libgcc/config/i386/32/sfp-machine.h	2022-09-28 18:58:09.115520929 +0200
> @@ -87,6 +87,7 @@
>   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
>   /* Even if XFmode is 12byte,  we have to pad it to
> --- libgcc/soft-fp/brain.h.jj	2022-09-28 18:58:09.113520956 +0200
> +++ libgcc/soft-fp/brain.h	2022-09-28 18:58:09.113520956 +0200
> @@ -0,0 +1,172 @@
> +/* Software floating-point emulation.
> +   Definitions for Brain Floating Point format (bfloat16).
> +   Copyright (C) 1997-2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef SOFT_FP_BRAIN_H
> +#define SOFT_FP_BRAIN_H	1
> +
> +#if _FP_W_TYPE_SIZE < 32
> +# error "Here's a nickel kid.  Go buy yourself a real computer."
> +#endif
> +
> +#define _FP_FRACTBITS_B		(_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACTBITS_DW_B	(_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACBITS_B		8
> +#define _FP_FRACXBITS_B		(_FP_FRACTBITS_B - _FP_FRACBITS_B)
> +#define _FP_WFRACBITS_B		(_FP_WORKBITS + _FP_FRACBITS_B)
> +#define _FP_WFRACXBITS_B	(_FP_FRACTBITS_B - _FP_WFRACBITS_B)
> +#define _FP_EXPBITS_B		8
> +#define _FP_EXPBIAS_B		127
> +#define _FP_EXPMAX_B		255
> +
> +#define _FP_QNANBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
> +#define _FP_QNANBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
> +#define _FP_IMPLBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
> +#define _FP_IMPLBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
> +#define _FP_OVERFLOW_B		((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
> +
> +#define _FP_WFRACBITS_DW_B	(2 * _FP_WFRACBITS_B)
> +#define _FP_WFRACXBITS_DW_B	(_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
> +#define _FP_HIGHBIT_DW_B	\
> +  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
> +
> +/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
> +   chosen by the target machine.  */
> +
> +typedef float BFtype __attribute__ ((mode (BF)));
> +
> +union _FP_UNION_B
> +{
> +  BFtype flt;
> +  struct _FP_STRUCT_LAYOUT
> +  {
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +    unsigned sign : 1;
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +#else
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned sign : 1;
> +#endif
> +  } bits;
> +};
> +
> +#define FP_DECL_B(X)		_FP_DECL (1, X)
> +#define FP_UNPACK_RAW_B(X, val)	_FP_UNPACK_RAW_1 (B, X, (val))
> +#define FP_UNPACK_RAW_BP(X, val)	_FP_UNPACK_RAW_1_P (B, X, (val))
> +#define FP_PACK_RAW_B(val, X)	_FP_PACK_RAW_1 (B, (val), X)
> +#define FP_PACK_RAW_BP(val, X)			\
> +  do						\
> +    {						\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_B(X, val)			\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1 (B, X, (val));		\
> +      _FP_UNPACK_CANONICAL (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_BP(X, val)			\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1_P (B, X, (val));		\
> +      _FP_UNPACK_CANONICAL (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_B(X, val)		\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1 (B, X, (val));		\
> +      _FP_UNPACK_SEMIRAW (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_BP(X, val)		\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1_P (B, X, (val));		\
> +      _FP_UNPACK_SEMIRAW (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_B(val, X)			\
> +  do						\
> +    {						\
> +      _FP_PACK_CANONICAL (B, 1, X);		\
> +      _FP_PACK_RAW_1 (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_BP(val, X)			\
> +  do						\
> +    {						\
> +      _FP_PACK_CANONICAL (B, 1, X);		\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_B(val, X)		\
> +  do						\
> +    {						\
> +      _FP_PACK_SEMIRAW (B, 1, X);		\
> +      _FP_PACK_RAW_1 (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_BP(val, X)		\
> +  do						\
> +    {						\
> +      _FP_PACK_SEMIRAW (B, 1, X);		\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_TO_INT_B(r, X, rsz, rsg)	_FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
> +#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)	\
> +  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
> +#define FP_FROM_INT_B(X, r, rs, rt)	_FP_FROM_INT (B, 1, X, (r), (rs), rt)
> +
> +/* BFmode arithmetic is not implemented.  */
> +
> +#define _FP_FRAC_HIGH_B(X)	_FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_RAW_B(X)	_FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_DW_B(X)	_FP_FRAC_HIGH_1 (X)
> +
> +#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
> +
> +#endif /* !SOFT_FP_BRAIN_H */
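
The constants in brain.h describe 1 sign bit, 8 exponent bits with bias 127 and 7 stored fraction bits (`_FP_FRACBITS_B` of 8 counts the implicit bit), which is the upper half of an IEEE single. A rough sketch of why widening is exact, assuming `float` is IEEE binary32; `bf16_bits_to_float` is a hypothetical name, and unlike `__extendbfsf2` it neither quiets signaling NaNs nor raises exceptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a raw bfloat16 bit pattern to float by
   placing it in the top 16 bits of a binary32 value.  Assumes float is
   IEEE binary32.  No sNaN quieting and no exception flags.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);	/* Type-pun without aliasing issues.  */
  return f;
}
```

For example, the pattern 0x3F80 has sign 0, exponent field 127 and fraction 0, so it widens to 1.0f.
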
> --- libgcc/soft-fp/truncsfbf2.c.jj	2022-09-28 18:58:09.113520956 +0200
> +++ libgcc/soft-fp/truncsfbf2.c	2022-09-28 18:58:09.113520956 +0200
> @@ -0,0 +1,48 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE single into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +BFtype
> +__truncsfbf2 (SFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_S (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_S (A, a);
> +  FP_TRUNC (B, S, 1, 1, R, A);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
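
As a point of comparison for `__truncsfbf2`, here is a sketch of the common bit-level round-to-nearest-even narrowing. It is not what the soft-fp code does: it assumes `float` is IEEE binary32, hardcodes round to nearest, and raises no exceptions, whereas the function above honours the rounding mode via `FP_INIT_ROUNDMODE` and reports exceptions via `FP_HANDLE_EXCEPTIONS`. The name `float_to_bf16_bits_rne` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: narrow a float to a raw bfloat16 bit pattern,
   rounding to nearest with ties to even.  Assumes float is IEEE
   binary32.  Ignores the dynamic rounding mode and sets no flags.  */
static uint16_t
float_to_bf16_bits_rne (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);

  /* NaN: keep the sign and top payload bits, and force the quiet bit
     so that dropping the low bits cannot turn the NaN into infinity.  */
  if ((u & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((u >> 16) | 0x0040u);

  /* Add 0x7fff plus the lowest kept bit: values above the halfway
     point round up, exact ties round to the even neighbour.  */
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16);
}
```

With this rule 1 + 2^-8, which lies exactly halfway between two bfloat16 values, rounds down to 1.0 (even), while 1 + 2^-7 + 2^-8 rounds up to 1 + 2^-6.
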
> --- libgcc/soft-fp/truncdfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
> +++ libgcc/soft-fp/truncdfbf2.c	2022-09-28 18:58:09.114520943 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE double into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "double.h"
> +
> +BFtype
> +__truncdfbf2 (DFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_D (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_D (A, a);
> +#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
> +  FP_TRUNC (B, D, 1, 2, R, A);
> +#else
> +  FP_TRUNC (B, D, 1, 1, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncxfbf2.c.jj	2022-09-28 18:58:09.113520956 +0200
> +++ libgcc/soft-fp/truncxfbf2.c	2022-09-28 18:58:09.113520956 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE extended into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "extended.h"
> +
> +BFtype
> +__truncxfbf2 (XFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_E (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_E (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, E, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, E, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunctfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
> +++ libgcc/soft-fp/trunctfbf2.c	2022-09-28 18:58:09.114520943 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE quad into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "quad.h"
> +
> +BFtype
> +__trunctfbf2 (TFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_Q (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_Q (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, Q, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, Q, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunchfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
> +++ libgcc/soft-fp/trunchfbf2.c	2022-09-28 18:58:09.114520943 +0200
> @@ -0,0 +1,58 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE half into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "half.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert HFtype to SFtype (lossless) and then
> +   truncate to BFtype.  */
> +
> +BFtype
> +__trunchfbf2 (HFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_RAW_H (A, a);
> +  FP_EXTEND (S, H, 1, 1, B, A);
> +  FP_PACK_RAW_S (b, B);
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (B, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncbfhf2.c.jj	2022-09-28 18:58:09.113520956 +0200
> +++ libgcc/soft-fp/truncbfhf2.c	2022-09-28 18:58:09.113520956 +0200
> @@ -0,0 +1,75 @@
> +/* Software floating-point emulation.
> +   Truncate bfloat16 into IEEE half.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "half.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert BFtype to SFtype (lossless) and then
> +   truncate to HFtype.  */
> +
> +HFtype
> +__truncbfhf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  HFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  /* Optimize BFtype to SFtype conversion to simple left shift
> +     by 16 if possible, we don't need to raise exceptions on sNaN
> +     here as the SFtype to HFtype truncation should do that too.  */
> +  if (sizeof (BFtype) == 2
> +      && sizeof (unsigned short) == 2
> +      && sizeof (SFtype) == 4
> +      && sizeof (unsigned int) == 4)
> +    {
> +      union { BFtype a; unsigned short b; } u1;
> +      union { SFtype a; unsigned int b; } u2;
> +      u1.a = a;
> +      u2.b = (u1.b << 8) << 8;
> +      b = u2.a;
> +    }
> +  else
> +    {
> +      FP_UNPACK_RAW_B (A, a);
> +      FP_EXTEND (S, B, 1, 1, B, A);
> +      FP_PACK_RAW_S (b, B);
> +    }
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (H, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_H (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/extendbfsf2.c.jj	2022-09-28 18:58:09.114520943 +0200
> +++ libgcc/soft-fp/extendbfsf2.c	2022-09-28 18:58:09.114520943 +0200
> @@ -0,0 +1,49 @@
> +/* Software floating-point emulation.
> +   Return an bfloat16 converted to IEEE single
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define FP_NO_EXACT_UNDERFLOW
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +SFtype
> +__extendbfsf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_B (A);
> +  FP_DECL_S (R);
> +  SFtype r;
> +
> +  FP_INIT_EXCEPTIONS;
> +  FP_UNPACK_RAW_B (A, a);
> +  FP_EXTEND (S, B, 1, 1, R, A);
> +  FP_PACK_RAW_S (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libiberty/cp-demangle.h.jj	2022-09-27 08:03:27.142982423 +0200
> +++ libiberty/cp-demangle.h	2022-09-29 12:42:47.291727886 +0200
> @@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
>   extern const struct demangle_operator_info cplus_demangle_operators[];
>   #endif
>   
> -#define D_BUILTIN_TYPE_COUNT (35)
> +#define D_BUILTIN_TYPE_COUNT (36)
>   
>   CP_STATIC_IF_GLIBCPP_V3
>   const struct demangle_builtin_type_info
> --- libiberty/cp-demangle.c.jj	2022-09-27 08:03:27.141982437 +0200
> +++ libiberty/cp-demangle.c	2022-09-29 13:04:57.083526204 +0200
> @@ -2489,6 +2489,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
>     /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
>   	     D_PRINT_DEFAULT },
>     /* 34 */ { NL ("_Float"),	NL ("_Float"),		D_PRINT_FLOAT },
> +  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
>   };
>   
>   CP_STATIC_IF_GLIBCPP_V3
> @@ -2753,8 +2754,20 @@ cplus_demangle_type (struct d_info *di)
>   
>   	case 'F':
>   	  /* DF<number>_ - _Float<number>.
> -	     DF<number>x - _Float<number>x.  */
> +	     DF<number>x - _Float<number>x
> +	     DFb16_ - std::bfloat16_t.  */
>   	  {
> +	    if (d_peek_char (di) == 'b')
> +	      {
> +		d_advance (di, 1);
> +		if (d_number (di) != 16 || d_peek_char (di) != '_')
> +		  return NULL;
> +		d_advance (di, 1);
> +		ret = d_make_builtin_type (di,
> +					   &cplus_demangle_builtin_types[35]);
> +		di->expansion += ret->u.s_builtin.type->len;
> +		break;
> +	      }
>   	    int arg = d_number (di);
>   	    char buf[12];
>   	    char suffix = 0;
> --- libiberty/testsuite/demangle-expected.jj	2022-09-27 08:03:27.168982071 +0200
> +++ libiberty/testsuite/demangle-expected	2022-09-29 12:49:02.181597532 +0200
> @@ -1249,6 +1249,10 @@ xxx
>   _Z3xxxDF32xDF64xDF128xCDF32xVb
>   xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
>   xxx
> +--format=auto --no-params
> +_Z3xxxDFb16_
> +xxx(std::bfloat16_t)
> +xxx
>   # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
>   --format=auto --no-params
>   _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z
> 
> 	Jakub
> 



* Re: [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support
  2022-09-30 13:49 ` Jason Merrill
@ 2022-09-30 14:08   ` Jakub Jelinek
  2022-09-30 18:21     ` Joseph Myers
  2022-10-04  9:06     ` [PATCH] middle-end, c++, i386, " Jakub Jelinek
  0 siblings, 2 replies; 22+ messages in thread
From: Jakub Jelinek @ 2022-09-30 14:08 UTC (permalink / raw)
  To: Jason Merrill, Jonathan Wakely
  Cc: Joseph S. Myers, Hongtao Liu, hjl.tools, Richard Earnshaw,
	Kyrylo Tkachov, richard.sandiford, gcc-patches

On Fri, Sep 30, 2022 at 09:49:08AM -0400, Jason Merrill wrote:
> The comment from Apple on the ABI mangling proposal suggests to me that we
> might want to delay enabling C++ std::bfloat16_t (i.e. defining
> __STDCPP_BFLOAT16_T__) until we have that excess precision support?

I saw that comment.  We have a similar problem with _Float16 too, where C++
right now effectively behaves as if -fexcess-precision=16 were used in C
(which isn't the default).
I can look into how hard it would be to add EXCESS_PRECISION_EXPR support
to the C++ FE.
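
To illustrate why the missing excess precision support matters, here is a
minimal sketch in plain C that emulates bfloat16 rounding on float values.
It is not what the FE does; the helper names (round_to_bf16,
sum3_round_each_op, sum3_excess_precision) are made up for this example,
and only finite inputs are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round a float to the nearest bfloat16 value (ties to even) and return
   it widened back to float.  Finite inputs only; NaNs are not handled.  */
float
round_to_bf16 (float f)
{
  uint32_t x;
  memcpy (&x, &f, sizeof x);
  x = (x + 0x7fffu + ((x >> 16) & 1u)) & 0xffff0000u;
  memcpy (&f, &x, sizeof f);
  return f;
}

/* -fexcess-precision=16 style: every intermediate result is rounded.  */
float
sum3_round_each_op (float a, float b, float c)
{
  float t = round_to_bf16 (a + b);
  return round_to_bf16 (t + c);
}

/* Float excess precision: evaluate in float, round once at the end.  */
float
sum3_excess_precision (float a, float b, float c)
{
  return round_to_bf16 (a + b + c);
}

/* Hypothetical self-check showing that the two strategies can differ.  */
void
check_excess_precision_example (void)
{
  /* 257 is halfway between the bfloat16 neighbours 256 and 258, so
     rounding after the first addition loses the 1.  */
  assert (sum3_round_each_op (256.0f, 1.0f, 1.0f) == 256.0f);
  assert (sum3_excess_precision (256.0f, 1.0f, 1.0f) == 258.0f);
}
```

With only 8 bits of precision in bfloat16, rounding after every operation
drops both increments, while evaluating in float and rounding once keeps
them.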

> > 	* config/arm/arm.cc (TARGET_INVALID_UNARY_OP,
> > 	TARGET_INVALID_BINARY_OP): Don't redefine.
> > 	(arm_mangle_type): Mangle BFmode as DFb16_.
> 
> If we're using DF32x for _Float32x, maybe we want DF16b for bfloat16?

Perhaps, I just followed what was in the pull request.  Can change it.

Anyway, overnight testing found some missing CFLAGS-*.c += -msse2
lines in libgcc/config/i386/t-softfp, as well as some i386 tests that
checked that __bf16 can't be cast or used in expressions, so I have
adjusted those too.
What isn't in the patch, but what I think we'll also need to add, is a
minimal set of __builtin_*bf16 builtins.  It seems that for _Float16 GCC
provides all the __builtin_*f16 builtins (and for C/ObjC even under the
*f16 names), but there is no glibc support for any of them, so builtins
that are expanded by the compiler are fine, but those that should fall
back to libm won't work.
Maybe at least for now it is acceptable to implement most <cmath> and
<complex> std::float16_t and std::bfloat16_t overloads by calling
__builtin_*f and explicitly narrowing the result, but I think at least
nextafter (and apparently nexttoward, as an alias for it for extended
floating point types) needs to be specific to the particular floating
point format.
And, while e.g. most of the bfloat16_t stuff in numeric_limits can be done
using say (__bf16) __builtin_nanf ("") and similar, at least for sNaN
I'm afraid there is no replacement, as it needs to be constexpr and so
we can't use union type punning to get a signaling NaN value.  So we'll
need at least __builtin_nansbf16, maybe __builtin_huge_valbf16 and some
others for a start.
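
As a rough illustration of why nextafter has to know the format, here is
a sketch in plain C that works on raw 16-bit bfloat16 encodings.  It is
not a proposed libstdc++ or builtin implementation: all names are
hypothetical, no exceptions are raised, NaN payloads are not propagated,
and it assumes the usual convention that the top mantissa bit
distinguishes quiet from signaling NaNs.

```c
#include <assert.h>
#include <stdint.h>

/* bfloat16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.  */
#define BF16_SIGN  0x8000u
#define BF16_EXP   0x7f80u
#define BF16_MANT  0x007fu
#define BF16_QUIET 0x0040u

int
bf16_is_nan (uint16_t b)
{
  return (b & BF16_EXP) == BF16_EXP && (b & BF16_MANT) != 0;
}

/* Assumes the top mantissa bit marks a quiet NaN.  */
int
bf16_is_snan (uint16_t b)
{
  return bf16_is_nan (b) && (b & BF16_QUIET) == 0;
}

/* Map an encoding to an integer that orders like the encoded value.  */
static int
bf16_key (uint16_t b)
{
  int mag = b & 0x7fff;
  return (b & BF16_SIGN) ? -mag : mag;
}

/* Next representable bfloat16 after X in the direction of Y.  */
uint16_t
bf16_nextafter (uint16_t x, uint16_t y)
{
  if (bf16_is_nan (x) || bf16_is_nan (y))
    return 0x7fc0;		/* A quiet NaN.  */
  int kx = bf16_key (x), ky = bf16_key (y);
  if (kx == ky)
    return y;			/* Also covers +0 versus -0.  */
  kx += (ky > kx) ? 1 : -1;
  if (kx == 0)
    return (uint16_t) (x & BF16_SIGN);	/* Keep the sign of X.  */
  return (uint16_t) (kx < 0 ? (BF16_SIGN | (unsigned) -kx) : (unsigned) kx);
}

/* Hypothetical self-check.  */
void
check_bf16_nextafter (void)
{
  assert (bf16_nextafter (0x3f80, 0x4000) == 0x3f81);	/* 1.0 towards 2.0 */
  assert (bf16_nextafter (0x3f80, 0x0000) == 0x3f7f);	/* 1.0 towards 0 */
  assert (bf16_nextafter (0x0000, 0xbf80) == 0x8001);	/* 0 towards -1.0 */
  assert (bf16_nextafter (0x7f7f, 0x7f80) == 0x7f80);	/* max towards inf */
  assert (bf16_is_snan (0x7fa0) && !bf16_is_snan (0x7fc0));
}
```

By contrast, stepping 1.0f by one float ulp gives 0x3f800001, which
narrows back to the bfloat16 encoding 0x3f80, i.e. 1.0 again, so calling
the float nextafter and narrowing can't be right.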

2022-09-30  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
	* tree.h (bfloat16_type_node): Define.
	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
	like float16_type_mode.
	* expmed.h (maybe_expand_shift): Declare.
	* expmed.cc (maybe_expand_shift): No longer static.
	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
	-ffast-math generic implementation for BF -> SF and SF -> BF
	conversions.
	* config/arm/arm.h (arm_bf16_type_node): Remove.
	(arm_bf16_ptr_type_node): Adjust comment.
	* config/arm/arm.cc (TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	(arm_mangle_type): Mangle BFmode as DFb16_.
	(arm_invalid_conversion): Only reject BF <-> HF conversions if
	HFmode is non-IEEE format.
	(arm_invalid_unary_op, arm_invalid_binary_op): Remove.
	* config/arm/arm-builtins.cc (arm_bf16_type_node): Remove.
	(arm_simd_builtin_std_type): Use bfloat16_type_node rather than
	arm_bf16_type_node.
	(arm_init_simd_builtin_types): Likewise.
	(arm_init_simd_builtin_scalar_types): Likewise.
	(arm_init_bf16_types): Likewise.
	* config/i386/i386.cc (ix86_mangle_type): Mangle BFmode as DFb16_.
	(ix86_invalid_conversion, ix86_invalid_unary_op,
	ix86_invalid_binary_op): Remove.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
	ix86_bf16_type_node.
	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
	* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
	(aarch64_bf16_ptr_type_node): Adjust comment.
	* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
	bfloat16_type_node rather than aarch64_bf16_type_node.
	(aarch64_mangle_type): Mangle BFmode as DFb16_.
	(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
	(aarch64_invalid_binary_op): Remove BFmode related rejections.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
	* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
	(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
	aarch64_bf16_type_node.
	(aarch64_init_simd_builtin_types, aarch64_init_bf16_types): Likewise.
	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
gcc/c-family/
	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
	predefine for C++ __BFLT16_*__ macros and for C++23 also
	__STDCPP_BFLOAT16_T__.
	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16 for C++.
gcc/cp/
	* cp-tree.h (extended_float_type_p): Return true for
	bfloat16_type_node.
	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
gcc/testsuite/
	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
	diagnostics.
	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
libcpp/
	* include/cpplib.h (CPP_N_BFLOAT16): Define.
	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
	C++.
libgcc/
	* config/arm/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf dfbf sfbf hfbf.
	* config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,t,h}fbf2.
	* config/aarch64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
	CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
	-msse2.
	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
	* soft-fp/brain.h: New file.
	* soft-fp/truncsfbf2.c: New file.
	* soft-fp/truncdfbf2.c: New file.
	* soft-fp/truncxfbf2.c: New file.
	* soft-fp/trunctfbf2.c: New file.
	* soft-fp/trunchfbf2.c: New file.
	* soft-fp/truncbfhf2.c: New file.
	* soft-fp/extendbfsf2.c: New file.
libiberty/
	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
	entry.
	(cplus_demangle_type): Demangle DFb16_.
	* testsuite/demangle-expected (_Z3xxxDFb16_): New test.
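
For reference, the two inline fast paths mentioned in the expr.cc entry
above amount to the following bit manipulation, shown here as a plain C
sketch rather than the RTL expansion itself.  The function names are
hypothetical; like the expansion, it assumes NaNs need not be honored
and, for the SF -> BF direction, round to nearest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
bits_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* BF -> SF: a bfloat16 is the upper half of an IEEE single, so
   shifting the bits up by 16 is exact.  */
float
bf16_bits_to_float (uint16_t b)
{
  return bits_float ((uint32_t) b << 16);
}

/* SF -> BF with round to nearest, ties to even, no NaN handling:
   (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
uint16_t
float_to_bf16_bits (float f)
{
  uint32_t x = float_bits (f);
  return (uint16_t) ((x + 0x7fffu + ((x >> 16) & 1u)) >> 16);
}

/* Hypothetical self-check.  */
void
check_bf16_conversions (void)
{
  assert (float_to_bf16_bits (1.0f) == 0x3f80);
  assert (bf16_bits_to_float (0x3f80) == 1.0f);
  /* Exact ties go to the even neighbour.  */
  assert (float_to_bf16_bits (bits_float (0x3f808000u)) == 0x3f80);
  assert (float_to_bf16_bits (bits_float (0x3f818000u)) == 0x3f82);
  /* Anything above the halfway point rounds up.  */
  assert (float_to_bf16_bits (bits_float (0x3f808001u)) == 0x3f81);
}
```

Adding 0x7fff plus the lowest kept bit carries into bit 16 exactly when
the discarded half is above the halfway point, or at it with an odd kept
value, which gives ties-to-even without a branch.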

--- gcc/tree-core.h.jj	2022-09-29 09:13:25.717718458 +0200
+++ gcc/tree-core.h	2022-09-29 12:40:17.417778754 +0200
@@ -665,6 +665,9 @@ enum tree_index {
   TI_DOUBLE_TYPE,
   TI_LONG_DOUBLE_TYPE,
 
+  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
+  TI_BFLOAT16_TYPE,
+
   /* The _FloatN and _FloatNx types must be consecutive, and in the
      same sequence as the corresponding complex types, which must also
      be consecutive; _FloatN must come before _FloatNx; the order must
--- gcc/tree.h.jj	2022-09-29 09:13:25.720718416 +0200
+++ gcc/tree.h	2022-09-29 12:40:17.416778768 +0200
@@ -4285,6 +4285,7 @@ tree_strip_any_location_wrapper (tree ex
 #define float_type_node			global_trees[TI_FLOAT_TYPE]
 #define double_type_node		global_trees[TI_DOUBLE_TYPE]
 #define long_double_type_node		global_trees[TI_LONG_DOUBLE_TYPE]
+#define bfloat16_type_node		global_trees[TI_BFLOAT16_TYPE]
 
 /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
 #define FLOATN_TYPE_NODE(IDX)		global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
--- gcc/tree.cc.jj	2022-09-29 09:13:31.328641080 +0200
+++ gcc/tree.cc	2022-09-29 12:40:17.400778985 +0200
@@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
     = (flag_excess_precision == EXCESS_PRECISION_FAST
        ? EXCESS_PRECISION_TYPE_FAST
        : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
-	  ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
+	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
 
   enum flt_eval_method target_flt_eval_method
     = targetm.c.excess_precision (requested_type);
@@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
   machine_mode float16_type_mode = (float16_type_node
 				    ? TYPE_MODE (float16_type_node)
 				    : VOIDmode);
+  machine_mode bfloat16_type_mode = (bfloat16_type_node
+				     ? TYPE_MODE (bfloat16_type_node)
+				     : VOIDmode);
   machine_mode float_type_mode = TYPE_MODE (float_type_node);
   machine_mode double_type_mode = TYPE_MODE (double_type_node);
 
@@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
 	switch (target_flt_eval_method)
 	  {
 	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	    if (type_mode == float16_type_mode)
+	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode)
 	      return float_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode)
 	      return double_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode
 		|| type_mode == double_type_mode)
 	      return long_double_type_node;
@@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
 	switch (target_flt_eval_method)
 	  {
 	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	    if (type_mode == float16_type_mode)
+	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode)
 	      return complex_float_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode)
 	      return complex_double_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode
 		|| type_mode == double_type_mode)
 	      return complex_long_double_type_node;
--- gcc/expmed.h.jj	2022-07-26 10:32:23.681271790 +0200
+++ gcc/expmed.h	2022-09-29 15:18:46.457023535 +0200
@@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
 				  rtx, tree, rtx, int);
 extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
 			 int);
+extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
+			       int);
 #ifdef GCC_OPTABS_H
 extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
 			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
--- gcc/expmed.cc.jj	2022-08-31 10:20:20.000000000 +0200
+++ gcc/expmed.cc	2022-09-29 15:17:52.224769673 +0200
@@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
 
 /* Likewise, but return 0 if that cannot be done.  */
 
-static rtx
+rtx
 maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
 		    int amount, rtx target, int unsignedp)
 {
--- gcc/expr.cc.jj	2022-09-09 09:50:35.228575531 +0200
+++ gcc/expr.cc	2022-09-29 17:09:46.716352938 +0200
@@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
       gcc_assert ((GET_MODE_PRECISION (from_mode)
 		   != GET_MODE_PRECISION (to_mode))
 		  || (DECIMAL_FLOAT_MODE_P (from_mode)
-		      != DECIMAL_FLOAT_MODE_P (to_mode)));
+		      != DECIMAL_FLOAT_MODE_P (to_mode))
+		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+		      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+		      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
 
       if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
 	/* Conversion between decimal float and binary float, same size.  */
@@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
 	  return;
 	}
 
+#ifdef HAVE_SFmode
+      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+	  && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
+	{
+	  if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
+	    {
+	      /* To cut down on libgcc size, implement
+		 BFmode -> {DF,XF,TF}mode conversions by
+		 BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
+	      rtx temp = gen_reg_rtx (SFmode);
+	      convert_mode_scalar (temp, from, unsignedp);
+	      convert_mode_scalar (to, temp, unsignedp);
+	      return;
+	    }
+	  if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+	    {
+	      /* Similarly, implement BFmode -> HFmode as
+		 BFmode -> SFmode -> HFmode conversion where SFmode
+		 has superset of BFmode values.  We don't need
+		 to handle sNaNs by raising exception and turning
+		 it into qNaN though, as that can be done in the
+		 SFmode -> HFmode conversion too.  */
+	      rtx temp = gen_reg_rtx (SFmode);
+	      int save_flag_finite_math_only = flag_finite_math_only;
+	      flag_finite_math_only = true;
+	      convert_mode_scalar (temp, from, unsignedp);
+	      flag_finite_math_only = save_flag_finite_math_only;
+	      convert_mode_scalar (to, temp, unsignedp);
+	      return;
+	    }
+	  if (to_mode == SFmode
+	      && !HONOR_NANS (from_mode)
+	      && !HONOR_NANS (to_mode)
+	      && optimize_insn_for_speed_p ())
+	    {
+	      /* If we don't expect sNaNs, for BFmode -> SFmode we can just
+		 shift the bits up.  */
+	      machine_mode fromi_mode, toi_mode;
+	      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				     0).exists (&fromi_mode)
+		  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+					0).exists (&toi_mode))
+		{
+		  start_sequence ();
+		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+		  rtx tof = NULL_RTX;
+		  if (fromi)
+		    {
+		      rtx toi = gen_reg_rtx (toi_mode);
+		      convert_mode_scalar (toi, fromi, 1);
+		      toi
+			= maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
+					      GET_MODE_PRECISION (to_mode)
+					      - GET_MODE_PRECISION (from_mode),
+					      NULL_RTX, 1);
+		      if (toi)
+			{
+			  tof = lowpart_subreg (to_mode, toi, toi_mode);
+			  if (tof)
+			    emit_move_insn (to, tof);
+			}
+		    }
+		  insns = get_insns ();
+		  end_sequence ();
+		  if (tof)
+		    {
+		      emit_insn (insns);
+		      return;
+		    }
+		}
+	    }
+	}
+      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
+	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+	  && !HONOR_NANS (from_mode)
+	  && !HONOR_NANS (to_mode)
+	  && !flag_rounding_math
+	  && optimize_insn_for_speed_p ())
+	{
+	  /* If we don't expect qNaNs nor sNaNs and can assume rounding
+	     to nearest, we can expand the conversion inline as
+	     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
+	  machine_mode fromi_mode, toi_mode;
+	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				 0).exists (&fromi_mode)
+	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+				    0).exists (&toi_mode))
+	    {
+	      start_sequence ();
+	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+	      rtx tof = NULL_RTX;
+	      do
+		{
+		  if (!fromi)
+		    break;
+		  int shift = (GET_MODE_PRECISION (from_mode)
+			       - GET_MODE_PRECISION (to_mode));
+		  rtx temp1
+		    = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
+					  shift, NULL_RTX, 1);
+		  if (!temp1)
+		    break;
+		  rtx temp2
+		    = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+		  if (!temp2)
+		    break;
+		  rtx temp3
+		    = expand_binop (fromi_mode, add_optab, fromi,
+				    gen_int_mode ((HOST_WIDE_INT_1U
+						   << (shift - 1)) - 1,
+						  fromi_mode), NULL_RTX,
+				    1, OPTAB_DIRECT);
+		  if (!temp3)
+		    break;
+		  rtx temp4
+		    = expand_binop (fromi_mode, add_optab, temp3, temp2,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+		  if (!temp4)
+		    break;
+		  rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
+						  temp4, shift, NULL_RTX, 1);
+		  if (!temp5)
+		    break;
+		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
+		  if (!temp6)
+		    break;
+		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
+					toi_mode);
+		  if (tof)
+		    emit_move_insn (to, tof);
+		}
+	      while (0);
+	      insns = get_insns ();
+	      end_sequence ();
+	      if (tof)
+		{
+		  emit_insn (insns);
+		  return;
+		}
+	    }
+	}
+#endif
+
       /* Otherwise use a libcall.  */
       libcall = convert_optab_libfunc (tab, to_mode, from_mode);
 
--- gcc/config/arm/arm.h.jj	2022-09-29 09:13:25.709718568 +0200
+++ gcc/config/arm/arm.h	2022-09-29 12:40:17.401778971 +0200
@@ -78,9 +78,8 @@ extern void (*arm_lang_output_object_att
    the backend.  Defined in arm-builtins.cc.  */
 extern tree arm_fp16_type_node;
 
-/* This type is the user-visible __bf16.  We need it in a few places in
-   the backend.  Defined in arm-builtins.cc.  */
-extern tree arm_bf16_type_node;
+/* The user-visible __bf16 uses bfloat16_type_node, but for pointer to that
+   use backend specific tree.  Defined in arm-builtins.cc.  */
 extern tree arm_bf16_ptr_type_node;
 
 \f
--- gcc/config/arm/arm.cc.jj	2022-09-29 09:13:25.709718568 +0200
+++ gcc/config/arm/arm.cc	2022-09-29 15:33:07.997170885 +0200
@@ -688,12 +688,6 @@ static const struct attribute_spec arm_a
 #undef TARGET_INVALID_CONVERSION
 #define TARGET_INVALID_CONVERSION arm_invalid_conversion
 
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP arm_invalid_unary_op
-
-#undef TARGET_INVALID_BINARY_OP
-#define TARGET_INVALID_BINARY_OP arm_invalid_binary_op
-
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV arm_atomic_assign_expand_fenv
 
@@ -30360,7 +30354,7 @@ arm_mangle_type (const_tree type)
   if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
     {
       if (TYPE_MODE (type) == BFmode)
-	return "u6__bf16";
+	return "DFb16_";
       else
 	return "Dh";
     }
@@ -33996,47 +33990,22 @@ arm_invalid_conversion (const_tree fromt
 {
   if (element_mode (fromtype) != element_mode (totype))
     {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<bfloat16_t%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<bfloat16_t%>");
+      /* Do not allow conversions from BFmode to non-ieee HFmode
+	 scalar types or vice versa.  */
+      if (TYPE_MODE (fromtype) == BFmode
+	  && TYPE_MODE (totype) == HFmode
+	  && arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
+	return N_("invalid conversion from type %<bfloat16_t%> to %<__fp16%>");
+      if (TYPE_MODE (totype) == BFmode
+	  && TYPE_MODE (fromtype) == HFmode
+	  && arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
+	return N_("invalid conversion to type %<bfloat16_t%> from %<__fp16%>");
     }
 
   /* Conversion allowed.  */
   return NULL;
 }
 
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-arm_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the binary operation OP is
-   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
-
-static const char *
-arm_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
-			   const_tree type2)
-{
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 /* Implement TARGET_CAN_CHANGE_MODE_CLASS.
 
    In VFPv1, VFP registers could only be accessed in the mode they were
--- gcc/config/arm/arm-builtins.cc.jj	2022-09-29 09:13:25.681718954 +0200
+++ gcc/config/arm/arm-builtins.cc	2022-09-29 12:40:17.405778917 +0200
@@ -1370,7 +1370,6 @@ struct arm_simd_type_info arm_simd_types
 tree arm_fp16_type_node = NULL_TREE;
 
 /* Back-end node type for brain float (bfloat) types.  */
-tree arm_bf16_type_node = NULL_TREE;
 tree arm_bf16_ptr_type_node = NULL_TREE;
 
 static tree arm_simd_intOI_type_node = NULL_TREE;
@@ -1459,7 +1458,7 @@ arm_simd_builtin_std_type (machine_mode
     case E_DFmode:
       return double_type_node;
     case E_BFmode:
-      return arm_bf16_type_node;
+      return bfloat16_type_node;
     default:
       gcc_unreachable ();
     }
@@ -1570,9 +1569,9 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
   /* Init Bfloat vector types with underlying __bf16 scalar type.  */
-  arm_simd_types[Bfloat16x2_t].eltype = arm_bf16_type_node;
-  arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
-  arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
+  arm_simd_types[Bfloat16x2_t].eltype = bfloat16_type_node;
+  arm_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
+  arm_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
 
   for (i = 0; i < nelts; i++)
     {
@@ -1658,7 +1657,7 @@ arm_init_simd_builtin_scalar_types (void
 					     "__builtin_neon_df");
   (*lang_hooks.types.register_builtin_type) (intTI_type_node,
 					     "__builtin_neon_ti");
-  (*lang_hooks.types.register_builtin_type) (arm_bf16_type_node,
+  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
                                              "__builtin_neon_bf");
   /* Unsigned integer types for various mode sizes.  */
   (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
@@ -1797,13 +1796,13 @@ arm_init_builtin (unsigned int fcode, ar
 static void
 arm_init_bf16_types (void)
 {
-  arm_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (arm_bf16_type_node) = 16;
-  SET_TYPE_MODE (arm_bf16_type_node, BFmode);
-  layout_type (arm_bf16_type_node);
+  bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (bfloat16_type_node) = 16;
+  SET_TYPE_MODE (bfloat16_type_node, BFmode);
+  layout_type (bfloat16_type_node);
 
-  lang_hooks.types.register_builtin_type (arm_bf16_type_node, "__bf16");
-  arm_bf16_ptr_type_node = build_pointer_type (arm_bf16_type_node);
+  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+  arm_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
 }
 
 /* Set up ACLE builtins, even builtins for instructions that are not
--- gcc/config/i386/i386.cc.jj	2022-09-29 12:03:12.073350093 +0200
+++ gcc/config/i386/i386.cc	2022-09-29 12:40:17.409778863 +0200
@@ -22728,7 +22728,7 @@ ix86_mangle_type (const_tree type)
   switch (TYPE_MODE (type))
     {
     case E_BFmode:
-      return "u6__bf16";
+      return "DFb16_";
     case E_HFmode:
       /* _Float16 is "DF16_".
 	 Align with clang's decision in https://reviews.llvm.org/D33719. */
@@ -22747,55 +22747,6 @@ ix86_mangle_type (const_tree type)
     }
 }
 
-/* Return the diagnostic message string if conversion from FROMTYPE to
-   TOTYPE is not allowed, NULL otherwise.  */
-
-static const char *
-ix86_invalid_conversion (const_tree fromtype, const_tree totype)
-{
-  if (element_mode (fromtype) != element_mode (totype))
-    {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<__bf16%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<__bf16%>");
-    }
-
-  /* Conversion allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-ix86_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the binary operation OP is
-   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
-
-static const char *
-ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
-			   const_tree type2)
-{
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 static GTY(()) tree ix86_tls_stack_chk_guard_decl;
 
 static tree
@@ -24853,15 +24804,6 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE ix86_mangle_type
 
-#undef TARGET_INVALID_CONVERSION
-#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
-
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
-
-#undef TARGET_INVALID_BINARY_OP
-#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
-
 #undef TARGET_STACK_PROTECT_GUARD
 #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
 
--- gcc/config/i386/i386-builtins.cc.jj	2022-09-29 09:13:25.710718554 +0200
+++ gcc/config/i386/i386-builtins.cc	2022-09-29 12:40:17.406778903 +0200
@@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
 static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
 
 tree ix86_float16_type_node = NULL_TREE;
-tree ix86_bf16_type_node = NULL_TREE;
 tree ix86_bf16_ptr_type_node = NULL_TREE;
 
 /* Retrieve an element from the above table, building some of
@@ -1372,16 +1371,15 @@ ix86_register_float16_builtin_type (void
 static void
 ix86_register_bf16_builtin_type (void)
 {
-  ix86_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (ix86_bf16_type_node) = 16;
-  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
-  layout_type (ix86_bf16_type_node);
+  bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (bfloat16_type_node) = 16;
+  SET_TYPE_MODE (bfloat16_type_node, BFmode);
+  layout_type (bfloat16_type_node);
 
   if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
     {
-      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
-					    "__bf16");
-      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
+      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
     }
 }
 
--- gcc/config/i386/i386-builtin-types.def.jj	2022-09-29 09:13:25.709718568 +0200
+++ gcc/config/i386/i386-builtin-types.def	2022-09-29 12:40:17.406778903 +0200
@@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
 DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
-DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
+DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
--- gcc/config/aarch64/aarch64.h.jj	2022-09-29 09:13:25.680718968 +0200
+++ gcc/config/aarch64/aarch64.h	2022-09-29 12:40:17.409778863 +0200
@@ -1337,9 +1337,8 @@ extern const char *aarch64_rewrite_mcpu
 extern GTY(()) tree aarch64_fp16_type_node;
 extern GTY(()) tree aarch64_fp16_ptr_type_node;
 
-/* This type is the user-visible __bf16, and a pointer to that type.  Defined
-   in aarch64-builtins.cc.  */
-extern GTY(()) tree aarch64_bf16_type_node;
+/* Pointer to the user-visible __bf16 type.  __bf16 itself is the generic
+   bfloat16_type_node.  Defined in aarch64-builtins.cc.  */
 extern GTY(()) tree aarch64_bf16_ptr_type_node;
 
 /* The generic unwind code in libgcc does not initialize the frame pointer.
--- gcc/config/aarch64/aarch64.cc.jj	2022-09-29 09:13:25.680718968 +0200
+++ gcc/config/aarch64/aarch64.cc	2022-09-29 12:40:17.413778808 +0200
@@ -19741,7 +19741,7 @@ aarch64_gimplify_va_arg_expr (tree valis
 	  field_ptr_t = aarch64_fp16_ptr_type_node;
 	  break;
 	case E_BFmode:
-	  field_t = aarch64_bf16_type_node;
+	  field_t = bfloat16_type_node;
 	  field_ptr_t = aarch64_bf16_ptr_type_node;
 	  break;
 	case E_V2SImode:
@@ -20645,7 +20645,7 @@ aarch64_mangle_type (const_tree type)
   if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
     {
       if (TYPE_MODE (type) == BFmode)
-	return "u6__bf16";
+	return "DFb16_";
       else
 	return "Dh";
     }
@@ -26820,39 +26820,6 @@ aarch64_stack_protect_guard (void)
   return NULL_TREE;
 }
 
-/* Return the diagnostic message string if conversion from FROMTYPE to
-   TOTYPE is not allowed, NULL otherwise.  */
-
-static const char *
-aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
-{
-  if (element_mode (fromtype) != element_mode (totype))
-    {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<bfloat16_t%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<bfloat16_t%>");
-    }
-
-  /* Conversion allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-aarch64_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 /* Return the diagnostic message string if the binary operation OP is
    not permitted on TYPE1 and TYPE2, NULL otherwise.  */
 
@@ -26860,11 +26827,6 @@ static const char *
 aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
 			   const_tree type2)
 {
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
   if (VECTOR_TYPE_P (type1)
       && VECTOR_TYPE_P (type2)
       && !TYPE_INDIVISIBLE_P (type1)
@@ -27461,12 +27423,6 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE aarch64_mangle_type
 
-#undef TARGET_INVALID_CONVERSION
-#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion
-
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op
-
 #undef TARGET_INVALID_BINARY_OP
 #define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op
 
--- gcc/config/aarch64/aarch64-builtins.cc.jj	2022-09-29 09:13:25.676719023 +0200
+++ gcc/config/aarch64/aarch64-builtins.cc	2022-09-29 12:40:17.410778849 +0200
@@ -918,7 +918,6 @@ tree aarch64_fp16_type_node = NULL_TREE;
 tree aarch64_fp16_ptr_type_node = NULL_TREE;
 
 /* Back-end node type for brain float (bfloat) types.  */
-tree aarch64_bf16_type_node = NULL_TREE;
 tree aarch64_bf16_ptr_type_node = NULL_TREE;
 
 /* Wrapper around add_builtin_function.  NAME is the name of the built-in
@@ -1010,7 +1009,7 @@ aarch64_int_or_fp_type (machine_mode mod
     case E_DFmode:
       return double_type_node;
     case E_BFmode:
-      return aarch64_bf16_type_node;
+      return bfloat16_type_node;
     default:
       gcc_unreachable ();
     }
@@ -1124,8 +1123,8 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Float64x2_t].eltype = double_type_node;
 
   /* Init Bfloat vector types with underlying __bf16 type.  */
-  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
-  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
+  aarch64_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
+  aarch64_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
 
   for (i = 0; i < nelts; i++)
     {
@@ -1197,7 +1196,7 @@ aarch64_init_simd_builtin_scalar_types (
 					     "__builtin_aarch64_simd_poly128");
   (*lang_hooks.types.register_builtin_type) (intTI_type_node,
 					     "__builtin_aarch64_simd_ti");
-  (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node,
+  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
 					     "__builtin_aarch64_simd_bf");
   /* Unsigned integer types for various mode sizes.  */
   (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
@@ -1682,13 +1681,13 @@ aarch64_init_fp16_types (void)
 static void
 aarch64_init_bf16_types (void)
 {
-  aarch64_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (aarch64_bf16_type_node) = 16;
-  SET_TYPE_MODE (aarch64_bf16_type_node, BFmode);
-  layout_type (aarch64_bf16_type_node);
+  bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (bfloat16_type_node) = 16;
+  SET_TYPE_MODE (bfloat16_type_node, BFmode);
+  layout_type (bfloat16_type_node);
 
-  lang_hooks.types.register_builtin_type (aarch64_bf16_type_node, "__bf16");
-  aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node);
+  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+  aarch64_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
 }
 
 /* Pointer authentication builtins that will become NOP on legacy platform.
--- gcc/config/aarch64/aarch64-sve-builtins.def.jj	2022-09-29 09:13:25.676719023 +0200
+++ gcc/config/aarch64/aarch64-sve-builtins.def	2022-09-29 12:40:17.413778808 +0200
@@ -61,7 +61,7 @@ DEF_SVE_MODE (u64offset, none, svuint64_
 DEF_SVE_MODE (vnum, none, none, vectors)
 
 DEF_SVE_TYPE (svbool_t, 10, __SVBool_t, boolean_type_node)
-DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, aarch64_bf16_type_node)
+DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, bfloat16_type_node)
 DEF_SVE_TYPE (svfloat16_t, 13, __SVFloat16_t, aarch64_fp16_type_node)
 DEF_SVE_TYPE (svfloat32_t, 13, __SVFloat32_t, float_type_node)
 DEF_SVE_TYPE (svfloat64_t, 13, __SVFloat64_t, double_type_node)
--- gcc/c-family/c-cppbuiltin.cc.jj	2022-09-29 09:13:25.675719037 +0200
+++ gcc/c-family/c-cppbuiltin.cc	2022-09-29 12:40:17.416778768 +0200
@@ -1264,6 +1264,13 @@ c_cpp_builtins (cpp_reader *pfile)
       builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
 				      csuffix, FLOATN_NX_TYPE_NODE (i));
     }
+  if (bfloat16_type_node && c_dialect_cxx ())
+    {
+      if (cxx_dialect > cxx20)
+	cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
+      builtin_define_float_constants ("BFLT16", "BF16", "%s",
+				      "BF16", bfloat16_type_node);
+    }
 
   /* For float.h.  */
   if (targetm.decimal_float_supported_p ())
--- gcc/c-family/c-lex.cc.jj	2022-09-29 09:13:25.675719037 +0200
+++ gcc/c-family/c-lex.cc	2022-09-29 12:40:17.416778768 +0200
@@ -995,6 +995,19 @@ interpret_float (const cpp_token *token,
 	  pedwarn (input_location, OPT_Wpedantic,
 		   "non-standard suffix on floating constant");
       }
+    else if ((flags & CPP_N_BFLOAT16) != 0 && c_dialect_cxx ())
+      {
+	type = bfloat16_type_node;
+	if (type == NULL_TREE)
+	  {
+	    error ("unsupported non-standard suffix on floating constant");
+	    return error_mark_node;
+	  }
+	if (cxx_dialect < cxx23)
+	  pedwarn (input_location, OPT_Wpedantic,
+		   "%<bf16%> or %<BF16%> suffix on floating constant only "
+		   "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
+      }
     else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
       type = long_double_type_node;
     else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
--- gcc/cp/cp-tree.h.jj	2022-09-29 09:13:31.164643341 +0200
+++ gcc/cp/cp-tree.h	2022-09-29 12:40:17.414778795 +0200
@@ -8714,6 +8714,8 @@ extended_float_type_p (tree type)
   for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
     if (type == FLOATN_TYPE_NODE (i))
       return true;
+  if (type == bfloat16_type_node)
+    return true;
   return false;
 }
 
--- gcc/cp/typeck.cc.jj	2022-09-29 09:13:25.716718472 +0200
+++ gcc/cp/typeck.cc	2022-09-29 12:40:17.415778781 +0200
@@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
       if (mv2 == FLOATN_NX_TYPE_NODE (i))
 	extended2 = i + 1;
     }
+  if (mv1 == bfloat16_type_node)
+    extended1 = true;
+  if (mv2 == bfloat16_type_node)
+    extended2 = true;
   if (extended2 && !extended1)
     {
       int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
@@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
   if (cnt > 1 && mv2 == long_double_type_node)
     return -2;
   /* Otherwise, they have equal rank, but extended types
-     (other than std::bfloat16_t) have higher subrank.  */
+     (other than std::bfloat16_t) have higher subrank.
+     std::bfloat16_t shouldn't have equal rank to any standard
+     floating point type.  */
   return 1;
 }
 
--- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c.jj	2022-08-19 23:26:22.656373568 +0200
+++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c	2022-09-30 01:37:41.092709166 +0200
@@ -45,19 +45,19 @@ __m256bf16 footest (__m256bf16 vector0)
   __m256bf16 vector2_1 = {};
   __m256bf16 vector2_2 = { glob_bfloat };
   __m256bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
-  __m256bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error "invalid conversion to type '__bf16'" } */
-
-  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256i initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __v16hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m256bf16 vector2_4 = { 0 };
+  __m256bf16 vector2_5 = { 0.1 };
+  __m256bf16 vector2_6 = { is_a_float16 };
+  __m256bf16 vector2_7 = { is_a_float };
+  __m256bf16 vector2_8 = { is_an_int };
+  __m256bf16 vector2_9 = { is_a_short_int };
+  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
+
+  __v8si initi_2_1 = { glob_bfloat };
+  __m256 initi_2_2 = { glob_bfloat };
+  __m256h initi_2_3 = { glob_bfloat };
+  __m256i initi_2_5 = { glob_bfloat };
+  __v16hi initi_2_6 = { glob_bfloat };
 
   /* Assignments to/from vectors.  */
 
@@ -79,25 +79,25 @@ __m256bf16 footest (__m256bf16 vector0)
   /* Assignments to/from elements.  */
 
   vector2_3[0] = glob_bfloat;
-  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_an_int;
+  vector2_3[0] = is_a_short_int;
+  vector2_3[0] = is_a_float;
+  vector2_3[0] = is_a_float16;
+  vector2_3[0] = 0;
+  vector2_3[0] = 0.1;
 
   glob_bfloat = vector2_3[0];
-  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_an_int = vector2_3[0];
+  is_a_short_int = vector2_3[0];
+  is_a_float = vector2_3[0];
+  is_a_float16 = vector2_3[0];
 
   /* Compound literals.  */
 
   (__m256bf16) {};
 
-  (__m256bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__m256bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (__m256bf16) { 0 };
+  (__m256bf16) { 0.1 };
   (__m256bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
   (__m256bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
   (__m256bf16) { is_a_long_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256i'} } */
@@ -176,16 +176,16 @@ __m256bf16 footest (__m256bf16 vector0)
   bfloat_ptr = &bfloat_ptr3[1];
 
   /* Simple comparison.  */
-  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > glob_bfloat_vec;
+  glob_bfloat_vec == vector0;
+  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
+  vector0 > 0;
+  0 == vector0;
+  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
+  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
+  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
 
   /* Pointer comparison.  */
 
@@ -224,24 +224,24 @@ __m256bf16 footest (__m256bf16 vector0)
 
   /* Unary operators.  */
 
-  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  +vector0;
+  -vector0;
+  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
+  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
   *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
-  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __real vector0; /* { dg-error {wrong type argument to __real} } */
+  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
+  ++vector0;
+  --vector0;
+  vector0++;
+  vector0--;
 
   /* Binary arithmetic operations.  */
 
-  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + *bfloat_ptr;
+  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
+  vector0 = glob_bfloat_vec + 0;
+  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
 
   return vector0;
 }
--- gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c.jj	2022-08-27 23:01:28.323565905 +0200
+++ gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c	2022-09-30 01:22:54.886600717 +0200
@@ -12,8 +12,8 @@ double is_a_double;
 
 float *float_ptr;
 
-__bf16 foo1 (void) { return (__bf16) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
-__bf16 foo2 (void) { return (__bf16) (short) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
+__bf16 foo1 (void) { return (__bf16) 0x1234; }
+__bf16 foo2 (void) { return (__bf16) (short) 0x1234; }
 
 __bf16 footest (__bf16 scalar0)
 {
@@ -22,87 +22,87 @@ __bf16 footest (__bf16 scalar0)
 
   __bf16 scalar1_1;
   __bf16 scalar1_2 = glob_bfloat;
-  __bf16 scalar1_3 = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_4 = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_5 = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_6 = is_an_int;  /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_7 = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_8 = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_9 = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  int initi_1_1 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  float initi_1_2 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  _Float16 initi_1_3 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  short initi_1_4 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  double initi_1_5 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __bf16 scalar1_3 = 0;
+  __bf16 scalar1_4 = 0.1;
+  __bf16 scalar1_5 = is_a_float;
+  __bf16 scalar1_6 = is_an_int;
+  __bf16 scalar1_7 = is_a_float16;
+  __bf16 scalar1_8 = is_a_double;
+  __bf16 scalar1_9 = is_a_short_int;
+
+  int initi_1_1 = glob_bfloat;
+  float initi_1_2 = glob_bfloat;
+  _Float16 initi_1_3 = glob_bfloat;
+  short initi_1_4 = glob_bfloat;
+  double initi_1_5 = glob_bfloat;
 
   __bf16 scalar2_1 = {};
   __bf16 scalar2_2 = { glob_bfloat };
-  __bf16 scalar2_3 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_4 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_5 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_6 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_7 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_8 = { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  int initi_2_1 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  float initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  _Float16 initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  short initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  double initi_2_5 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __bf16 scalar2_3 = { 0 };
+  __bf16 scalar2_4 = { 0.1 };
+  __bf16 scalar2_5 = { is_a_float };
+  __bf16 scalar2_6 = { is_an_int };
+  __bf16 scalar2_7 = { is_a_float16 };
+  __bf16 scalar2_8 = { is_a_double };
+  __bf16 scalar2_9 = { is_a_short_int };
+
+  int initi_2_1 = { glob_bfloat };
+  float initi_2_2 = { glob_bfloat };
+  _Float16 initi_2_3 = { glob_bfloat };
+  short initi_2_4 = { glob_bfloat };
+  double initi_2_5 = { glob_bfloat };
 
   /* Assignments.  */
 
   glob_bfloat = glob_bfloat;
-  glob_bfloat = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  is_an_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float16 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_double = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_short_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
+  glob_bfloat = 0;
+  glob_bfloat = 0.1;
+  glob_bfloat = is_a_float;
+  glob_bfloat = is_an_int;
+  glob_bfloat = is_a_float16;
+  glob_bfloat = is_a_double;
+  glob_bfloat = is_a_short_int;
+
+  is_an_int = glob_bfloat;
+  is_a_float = glob_bfloat;
+  is_a_float16 = glob_bfloat;
+  is_a_double = glob_bfloat;
+  is_a_short_int = glob_bfloat;
 
   /* Casting.  */
 
   (void) glob_bfloat;
   (__bf16) glob_bfloat;
 
-  (int) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (float) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (_Float16) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (double) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (short) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-
-  (__bf16) is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (int) glob_bfloat;
+  (float) glob_bfloat;
+  (_Float16) glob_bfloat;
+  (double) glob_bfloat;
+  (short) glob_bfloat;
+
+  (__bf16) is_an_int;
+  (__bf16) is_a_float;
+  (__bf16) is_a_float16;
+  (__bf16) is_a_double;
+  (__bf16) is_a_short_int;
 
   /* Compound literals.  */
 
   (__bf16) {};
   (__bf16) { glob_bfloat };
-  (__bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  (int) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (float) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (_Float16) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (double) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (short) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  (__bf16) { 0 };
+  (__bf16) { 0.1 };
+  (__bf16) { is_a_float };
+  (__bf16) { is_an_int };
+  (__bf16) { is_a_float16 };
+  (__bf16) { is_a_double };
+  (__bf16) { is_a_short_int };
+
+  (int) { glob_bfloat };
+  (float) { glob_bfloat };
+  (_Float16) { glob_bfloat };
+  (double) { glob_bfloat };
+  (short) { glob_bfloat };
 
   /* Arrays and Structs.  */
 
@@ -145,16 +145,16 @@ __bf16 footest (__bf16 scalar0)
   bfloat_ptr = &bfloat_ptr3[1];
 
   /* Simple comparison.  */
-  scalar0 > glob_bfloat; /* { dg-error {operation not permitted on type '__bf16'} } */
-  glob_bfloat == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_a_float == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0.1 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > is_an_int; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_an_int == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  scalar0 > glob_bfloat;
+  glob_bfloat == scalar0;
+  scalar0 > is_a_float;
+  is_a_float == scalar0;
+  scalar0 > 0;
+  0 == scalar0;
+  scalar0 > 0.1;
+  0.1 == scalar0;
+  scalar0 > is_an_int;
+  is_an_int == scalar0;
 
   /* Pointer comparison.  */
 
@@ -174,41 +174,41 @@ __bf16 footest (__bf16 scalar0)
   /* Conditional expressions.  */
 
   0 ? scalar0 : scalar0;
-  0 ? scalar0 : is_a_float; /* { dg-error {invalid conversion from type '__bf16'} } */
-  0 ? is_a_float : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
-  0 ? scalar0 : 0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  0 ? 0 : scalar0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  0 ? 0.1 : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
-  0 ? scalar0 : 0.1; /* { dg-error {invalid conversion from type '__bf16'} } */
+  0 ? scalar0 : is_a_float;
+  0 ? is_a_float : scalar0;
+  0 ? scalar0 : 0;
+  0 ? 0 : scalar0;
+  0 ? 0.1 : scalar0;
+  0 ? scalar0 : 0.1;
   0 ? bfloat_ptr : bfloat_ptr2;
   0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
   0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
 
-  scalar0 ? scalar0 : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 ? is_a_float : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 ? scalar0 : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 ? is_a_float : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
+  scalar0 ? scalar0 : scalar0;
+  scalar0 ? is_a_float : scalar0;
+  scalar0 ? scalar0 : is_a_float;
+  scalar0 ? is_a_float : is_a_float;
 
   /* Unary operators.  */
 
-  +scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  -scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ~scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  !scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  +scalar0;
+  -scalar0;
+  ~scalar0; /* { dg-error {wrong type argument to bit-complement} } */
+  !scalar0;
   *scalar0; /* { dg-error {invalid type argument of unary '\*'} } */
-  __real scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  __imag scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ++scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  --scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0++; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __real scalar0;
+  __imag scalar0;
+  ++scalar0;
+  --scalar0;
+  scalar0++;
+  scalar0--;
 
   /* Binary arithmetic operations.  */
 
-  scalar0 = glob_bfloat + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 = glob_bfloat + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 = glob_bfloat + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 = glob_bfloat + is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
+  scalar0 = glob_bfloat + *bfloat_ptr;
+  scalar0 = glob_bfloat + 0.1;
+  scalar0 = glob_bfloat + 0;
+  scalar0 = glob_bfloat + is_a_float;
 
   return scalar0;
 }
--- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c.jj	2022-08-19 23:26:22.656373568 +0200
+++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c	2022-09-30 01:37:34.270800614 +0200
@@ -48,20 +48,20 @@ __m128bf16 footest (__m128bf16 vector0)
   __m128bf16 vector2_1 = {};
   __m128bf16 vector2_2 = { glob_bfloat };
   __m128bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
-  __m128bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m128h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m128 initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __v4si initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __v4hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m128bf16 vector2_4 = { 0 };
+  __m128bf16 vector2_5 = { 0.1 };
+  __m128bf16 vector2_6 = { is_a_float16 };
+  __m128bf16 vector2_7 = { is_a_float };
+  __m128bf16 vector2_8 = { is_an_int };
+  __m128bf16 vector2_9 = { is_a_short_int };
+  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
+
+  __v8si initi_2_1 = { glob_bfloat };
+  __m256 initi_2_2 = { glob_bfloat };
+  __m128h initi_2_3 = { glob_bfloat };
+  __m128 initi_2_4 = { glob_bfloat };
+  __v4si initi_2_5 = { glob_bfloat };
+  __v4hi initi_2_6 = { glob_bfloat };
 
   /* Assignments to/from vectors.  */
 
@@ -85,25 +85,25 @@ __m128bf16 footest (__m128bf16 vector0)
   /* Assignments to/from elements.  */
 
   vector2_3[0] = glob_bfloat;
-  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_an_int;
+  vector2_3[0] = is_a_short_int;
+  vector2_3[0] = is_a_float;
+  vector2_3[0] = is_a_float16;
+  vector2_3[0] = 0;
+  vector2_3[0] = 0.1;
 
   glob_bfloat = vector2_3[0];
-  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_an_int = vector2_3[0];
+  is_a_short_int = vector2_3[0];
+  is_a_float = vector2_3[0];
+  is_a_float16 = vector2_3[0];
 
   /* Compound literals.  */
 
   (__m128bf16) {};
 
-  (__m128bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__m128bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (__m128bf16) { 0 };
+  (__m128bf16) { 0.1 };
   (__m128bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
   (__m128bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
   (__m128bf16) { is_a_float_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128'} } */
@@ -186,16 +186,16 @@ __m128bf16 footest (__m128bf16 vector0)
   bfloat_ptr = &bfloat_ptr3[1];
 
   /* Simple comparison.  */
-  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > glob_bfloat_vec;
+  glob_bfloat_vec == vector0;
+  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
+  vector0 > 0;
+  0 == vector0;
+  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
+  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
+  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
 
   /* Pointer comparison.  */
 
@@ -234,24 +234,24 @@ __m128bf16 footest (__m128bf16 vector0)
 
   /* Unary operators.  */
 
-  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  +vector0;
+  -vector0;
+  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
+  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
   *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
-  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __real vector0; /* { dg-error {wrong type argument to __real} } */
+  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
+  ++vector0;
+  --vector0;
+  vector0++;
+  vector0--;
 
   /* Binary arithmetic operations.  */
 
-  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + *bfloat_ptr;
+  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
+  vector0 = glob_bfloat_vec + 0;
+  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
 
   return vector0;
 }
--- gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C.jj	2022-08-16 23:06:21.395321891 +0200
+++ gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C	2022-09-30 01:39:06.137569307 +0200
@@ -5,6 +5,6 @@ void foo (void)
 {
   __bf16 (); /* { dg-bogus {invalid conversion to type '__bf16'} } */
   __bf16 a = __bf16(); /* { dg-bogus {invalid conversion to type '__bf16'} } */
-  __bf16 (0x1234); /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 (0.1); /* { dg-error {invalid conversion to type '__bf16'} } */
+  __bf16 (0x1234); /* { dg-bogus {invalid conversion to type '__bf16'} } */
+  __bf16 (0.1); /* { dg-bogus {invalid conversion to type '__bf16'} } */
 }
--- libcpp/include/cpplib.h.jj	2022-09-08 13:01:19.853771383 +0200
+++ libcpp/include/cpplib.h	2022-09-28 19:06:59.615380690 +0200
@@ -1275,6 +1275,7 @@ struct cpp_num
 #define CPP_N_USERDEF	0x1000000 /* C++11 user-defined literal.  */
 
 #define CPP_N_SIZE_T	0x2000000 /* C++23 size_t literal.  */
+#define CPP_N_BFLOAT16	0x4000000 /* std::bfloat16_t type.  */
 
 #define CPP_N_WIDTH_FLOATN_NX	0xF0000000 /* _FloatN / _FloatNx value
 					      of N, divided by 16.  */
--- libcpp/expr.cc.jj	2022-09-27 08:03:27.119982735 +0200
+++ libcpp/expr.cc	2022-09-28 17:55:36.667177540 +0200
@@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
   size_t orig_len = len;
   const uchar *orig_s = s;
   size_t flags;
-  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
+  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
 
   flags = 0;
-  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
+  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
 
   /* The following decimal float suffixes, from TR 24732:2009, TS
      18661-2:2015 and C2X, are supported:
@@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
      w, W - machine-specific type such as __float80 (GNU extension).
      q, Q - machine-specific type such as __float128 (GNU extension).
      fN, FN - _FloatN (TS 18661-3:2015).
-     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
+     fNx, FNx - _FloatNx (TS 18661-3:2015).
+     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
 
   /* Process decimal float suffixes, which are two letters starting
      with d or D.  Order and case are significant.  */
@@ -239,6 +240,20 @@ interpret_float_suffix (cpp_reader *pfil
 		fn++;
 	    }
 	  break;
+	case 'b': case 'B':
+	  if (len > 2
+	      /* Case must match: only bf16 or BF16, never bF16 or Bf16.  */
+	      && s[1] == (s[0] == 'b' ? 'f' : 'F')
+	      && s[2] == '1'
+	      && s[3] == '6'
+	      && CPP_OPTION (pfile, cplusplus))
+	    {
+	      bf16++;
+	      len -= 3;
+	      s += 3;
+	      break;
+	    }
+	  return 0;
 	case 'd': case 'D': d++; break;
 	case 'l': case 'L': l++; break;
 	case 'w': case 'W': w++; break;
@@ -257,7 +272,7 @@ interpret_float_suffix (cpp_reader *pfil
      of N larger than can be represented in the return value.  The
      caller is responsible for rejecting _FloatN suffixes where
      _FloatN is not supported on the chosen target.  */
-  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
+  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
     return 0;
   if (fn_bits > CPP_FLOATN_MAX)
     return 0;
@@ -295,6 +310,7 @@ interpret_float_suffix (cpp_reader *pfil
 	     q ? CPP_N_MD_Q :
 	     fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
 	     fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
+	     bf16 ? CPP_N_BFLOAT16 :
 	     CPP_N_DEFAULT));
 }
 
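Not part of the patch: a minimal standalone sketch of the bf16 / BF16
suffix check above, assuming the same "while (len--)" loop shape that
interpret_float_suffix uses.  The function name is_bf16_suffix is
hypothetical, and the CPP_OPTION (pfile, cplusplus) test is left out.
It is only meant to show why "len > 2" is enough to make s[3]
readable, and why a repeated suffix is rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libcpp: return 1 if the LEN bytes
   at S are exactly one "bf16" or "BF16" suffix, 0 otherwise.  */
int
is_bf16_suffix (const unsigned char *s, size_t len)
{
  size_t bf16 = 0;

  assert (s != NULL);
  while (len--)
    {
      /* LEN was already decremented by the loop test, so it counts the
	 bytes after s[0]; LEN > 2 means s[1], s[2] and s[3] exist.  */
      if ((s[0] == 'b' || s[0] == 'B')
	  && len > 2
	  && s[1] == (s[0] == 'b' ? 'f' : 'F')
	  && s[2] == '1'
	  && s[3] == '6')
	{
	  bf16++;
	  len -= 3;
	  s += 3;
	}
      else
	return 0;
      s++;
    }
  /* Mirrors the "more than one suffix" rejection in the real code.  */
  return bf16 == 1;
}
```

With this sketch "bf16" and "BF16" are accepted, while "bF16", "bf1"
and "bf16bf16" are not.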
--- libgcc/config/arm/sfp-machine.h.jj	2020-01-12 11:54:38.615380187 +0100
+++ libgcc/config/arm/sfp-machine.h	2022-09-28 19:02:51.922710542 +0200
@@ -22,6 +22,7 @@ typedef int __gcc_CMPtype __attribute__
 /* According to RTABI, QNAN is only with the most significant bit of the
    significand set, and all other significand bits zero.  */
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
 #define _FP_NANFRAC_Q		_FP_QNANBIT_Q, 0, 0, 0
--- libgcc/config/aarch64/t-softfp.jj	2020-09-29 11:32:02.988602194 +0200
+++ libgcc/config/aarch64/t-softfp	2022-09-28 18:59:43.381246466 +0200
@@ -1,7 +1,7 @@
 softfp_float_modes := tf
 softfp_int_modes := si di ti
-softfp_extensions := sftf dftf hftf
-softfp_truncations := tfsf tfdf tfhf
+softfp_extensions := sftf dftf hftf bfsf
+softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
 softfp_extras := fixhfti fixunshfti floattihf floatuntihf
 
--- libgcc/config/aarch64/libgcc-softfp.ver.jj	2022-01-11 23:11:23.691271871 +0100
+++ libgcc/config/aarch64/libgcc-softfp.ver	2022-09-28 19:00:36.050537146 +0200
@@ -26,3 +26,12 @@ GCC_11.0 {
   __mulhc3
   __trunctfhf2
 }
+
+%inherit GCC_13.0.0 GCC_11.0.0
+GCC_13.0.0 {
+  __extendbfsf2
+  __truncdfbf2
+  __truncsfbf2
+  __trunctfbf2
+  __trunchfbf2
+}
--- libgcc/config/aarch64/sfp-machine.h.jj	2022-01-11 23:11:23.691271871 +0100
+++ libgcc/config/aarch64/sfp-machine.h	2022-09-28 19:02:10.303270053 +0200
@@ -43,6 +43,7 @@ typedef int __gcc_CMPtype __attribute__
 #define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		((_FP_QNANBIT_H << 1) - 1)
+#define _FP_NANFRAC_B		((_FP_QNANBIT_B << 1) - 1)
 #define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
 #define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1)
 #define _FP_NANFRAC_Q		((_FP_QNANBIT_Q << 1) - 1), -1
--- libgcc/config/i386/t-softfp.jj	2022-09-23 09:02:31.759659479 +0200
+++ libgcc/config/i386/t-softfp	2022-09-29 22:42:26.554156219 +0200
@@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
 libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
 LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
 
-softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
-softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
+softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
+softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
+		      tfbf xfbf dfbf sfbf hfbf
 
 softfp_extras += eqhf2
 
@@ -15,11 +16,17 @@ CFLAGS-extendhfsf2.c += -msse2
 CFLAGS-extendhfdf2.c += -msse2
 CFLAGS-extendhftf2.c += -msse2
 CFLAGS-extendhfxf2.c += -msse2
+CFLAGS-extendbfsf2.c += -msse2
 
 CFLAGS-truncsfhf2.c += -msse2
 CFLAGS-truncdfhf2.c += -msse2
 CFLAGS-truncxfhf2.c += -msse2
 CFLAGS-trunctfhf2.c += -msse2
+CFLAGS-truncsfbf2.c += -msse2
+CFLAGS-truncdfbf2.c += -msse2
+CFLAGS-truncxfbf2.c += -msse2
+CFLAGS-trunctfbf2.c += -msse2
+CFLAGS-trunchfbf2.c += -msse2
 
 CFLAGS-eqhf2.c += -msse2
 CFLAGS-_divhc3.c += -msse2
--- libgcc/config/i386/libgcc-glibc.ver.jj	2022-09-23 09:02:31.746659658 +0200
+++ libgcc/config/i386/libgcc-glibc.ver	2022-09-28 18:58:09.114520943 +0200
@@ -214,3 +214,13 @@ GCC_12.0.0 {
   __trunctfhf2
   __truncxfhf2
 }
+
+%inherit GCC_13.0.0 GCC_12.0.0
+GCC_13.0.0 {
+  __extendbfsf2
+  __truncdfbf2
+  __truncsfbf2
+  __trunctfbf2
+  __truncxfbf2
+  __trunchfbf2
+}
--- libgcc/config/i386/sfp-machine.h.jj	2022-09-23 09:02:31.747659644 +0200
+++ libgcc/config/i386/sfp-machine.h	2022-09-28 18:58:09.114520943 +0200
@@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
 #define _FP_QNANNEGATEDP 0
 
 #define _FP_NANSIGN_H		1
+#define _FP_NANSIGN_B		1
 #define _FP_NANSIGN_S		1
 #define _FP_NANSIGN_D		1
 #define _FP_NANSIGN_E		1
--- libgcc/config/i386/64/sfp-machine.h.jj	2022-09-23 09:02:31.700660291 +0200
+++ libgcc/config/i386/64/sfp-machine.h	2022-09-28 18:58:09.114520943 +0200
@@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D
 #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
--- libgcc/config/i386/32/sfp-machine.h.jj	2022-09-23 09:02:31.683660526 +0200
+++ libgcc/config/i386/32/sfp-machine.h	2022-09-28 18:58:09.115520929 +0200
@@ -87,6 +87,7 @@
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
 /* Even if XFmode is 12byte,  we have to pad it to
--- libgcc/soft-fp/brain.h.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/brain.h	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,172 @@
+/* Software floating-point emulation.
+   Definitions for Brain Floating Point format (bfloat16).
+   Copyright (C) 1997-2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef SOFT_FP_BRAIN_H
+#define SOFT_FP_BRAIN_H	1
+
+#if _FP_W_TYPE_SIZE < 32
+# error "Here's a nickel kid.  Go buy yourself a real computer."
+#endif
+
+#define _FP_FRACTBITS_B		(_FP_W_TYPE_SIZE)
+
+#define _FP_FRACTBITS_DW_B	(_FP_W_TYPE_SIZE)
+
+#define _FP_FRACBITS_B		8
+#define _FP_FRACXBITS_B		(_FP_FRACTBITS_B - _FP_FRACBITS_B)
+#define _FP_WFRACBITS_B		(_FP_WORKBITS + _FP_FRACBITS_B)
+#define _FP_WFRACXBITS_B	(_FP_FRACTBITS_B - _FP_WFRACBITS_B)
+#define _FP_EXPBITS_B		8
+#define _FP_EXPBIAS_B		127
+#define _FP_EXPMAX_B		255
+
+#define _FP_QNANBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
+#define _FP_QNANBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
+#define _FP_IMPLBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
+#define _FP_IMPLBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
+#define _FP_OVERFLOW_B		((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
+
+#define _FP_WFRACBITS_DW_B	(2 * _FP_WFRACBITS_B)
+#define _FP_WFRACXBITS_DW_B	(_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
+#define _FP_HIGHBIT_DW_B	\
+  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
+
+/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
+   chosen by the target machine.  */
+
+typedef float BFtype __attribute__ ((mode (BF)));
+
+union _FP_UNION_B
+{
+  BFtype flt;
+  struct _FP_STRUCT_LAYOUT
+  {
+#if __BYTE_ORDER == __BIG_ENDIAN
+    unsigned sign : 1;
+    unsigned exp  : _FP_EXPBITS_B;
+    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
+#else
+    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
+    unsigned exp  : _FP_EXPBITS_B;
+    unsigned sign : 1;
+#endif
+  } bits;
+};
+
+#define FP_DECL_B(X)		_FP_DECL (1, X)
+#define FP_UNPACK_RAW_B(X, val)	_FP_UNPACK_RAW_1 (B, X, (val))
+#define FP_UNPACK_RAW_BP(X, val)	_FP_UNPACK_RAW_1_P (B, X, (val))
+#define FP_PACK_RAW_B(val, X)	_FP_PACK_RAW_1 (B, (val), X)
+#define FP_PACK_RAW_BP(val, X)			\
+  do						\
+    {						\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_B(X, val)			\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1 (B, X, (val));		\
+      _FP_UNPACK_CANONICAL (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_BP(X, val)			\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1_P (B, X, (val));		\
+      _FP_UNPACK_CANONICAL (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_SEMIRAW_B(X, val)		\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1 (B, X, (val));		\
+      _FP_UNPACK_SEMIRAW (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_SEMIRAW_BP(X, val)		\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1_P (B, X, (val));		\
+      _FP_UNPACK_SEMIRAW (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_B(val, X)			\
+  do						\
+    {						\
+      _FP_PACK_CANONICAL (B, 1, X);		\
+      _FP_PACK_RAW_1 (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_BP(val, X)			\
+  do						\
+    {						\
+      _FP_PACK_CANONICAL (B, 1, X);		\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_SEMIRAW_B(val, X)		\
+  do						\
+    {						\
+      _FP_PACK_SEMIRAW (B, 1, X);		\
+      _FP_PACK_RAW_1 (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_SEMIRAW_BP(val, X)		\
+  do						\
+    {						\
+      _FP_PACK_SEMIRAW (B, 1, X);		\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_TO_INT_B(r, X, rsz, rsg)	_FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
+#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)	\
+  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
+#define FP_FROM_INT_B(X, r, rs, rt)	_FP_FROM_INT (B, 1, X, (r), (rs), rt)
+
+/* BFmode arithmetic is not implemented.  */
+
+#define _FP_FRAC_HIGH_B(X)	_FP_FRAC_HIGH_1 (X)
+#define _FP_FRAC_HIGH_RAW_B(X)	_FP_FRAC_HIGH_1 (X)
+#define _FP_FRAC_HIGH_DW_B(X)	_FP_FRAC_HIGH_1 (X)
+
+#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
+
+#endif /* !SOFT_FP_BRAIN_H */
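Not part of the patch: a small sketch of the layout that brain.h
describes, i.e. 1 sign bit, _FP_EXPBITS_B = 8 exponent bits and
_FP_FRACBITS_B - 1 = 7 stored fraction bits.  It assumes float is IEEE
binary32, and the helper names are hypothetical.  Since bfloat16 is the
upper half of a binary32 value, widening is a 16-bit shift here; unlike
the soft-fp based __extendbfsf2 this sketch does nothing special for
signaling NaNs or exceptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field accessors for a raw bfloat16 bit pattern.  */
unsigned
bf16_sign_field (uint16_t bits)
{
  return bits >> 15;
}

unsigned
bf16_exp_field (uint16_t bits)
{
  return (bits >> 7) & 0xffu;	/* 8 exponent bits, bias 127.  */
}

unsigned
bf16_frac_field (uint16_t bits)
{
  return bits & 0x7fu;		/* 7 stored fraction bits.  */
}

/* Widen a raw bfloat16 bit pattern to float by placing it in the
   upper 16 bits of a binary32 value.  */
float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;

  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&f, &wide, sizeof f);
  return f;
}
```

For example, the pattern 0x3F80 has exponent field 127 and fraction 0,
so it widens to 1.0f.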
--- libgcc/soft-fp/truncsfbf2.c.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/truncsfbf2.c	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,48 @@
+/* Software floating-point emulation.
+   Truncate IEEE single into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "single.h"
+
+BFtype
+__truncsfbf2 (SFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_S (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_S (A, a);
+  FP_TRUNC (B, S, 1, 1, R, A);
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
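Not part of the patch: a bit-level sketch of what an SFmode -> BFmode
truncation with round-to-nearest-even amounts to.  It is not the
soft-fp implementation above: it assumes float is IEEE binary32,
ignores the dynamic rounding mode and raises no exceptions.  The name
float_to_bf16_rne is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert F to a raw bfloat16 bit pattern,
   rounding to nearest with ties to even.  */
uint16_t
float_to_bf16_rne (float f)
{
  uint32_t bits;

  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&bits, &f, sizeof bits);

  /* NaN: keep the top bits and force the quiet bit so that the result
     stays a NaN even if only low fraction bits were set.  */
  if ((bits & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((bits >> 16) | 0x0040u);

  /* Add 0x7fff plus the lowest kept bit: values above the halfway
     point carry into bit 16, exact ties carry only if that bit is
     odd.  */
  bits += 0x7fffu + ((bits >> 16) & 1u);
  return (uint16_t) (bits >> 16);
}
```

The value 1 + 2^-8 is an exact tie between 0x3F80 and 0x3F81 and goes
to the even pattern 0x3F80, while 1 + 3 * 2^-8 is a tie between 0x3F81
and 0x3F82 and goes to 0x3F82.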
--- libgcc/soft-fp/truncdfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/truncdfbf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE double into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "double.h"
+
+BFtype
+__truncdfbf2 (DFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_D (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_D (A, a);
+#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
+  FP_TRUNC (B, D, 1, 2, R, A);
+#else
+  FP_TRUNC (B, D, 1, 1, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
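Not part of the patch: a sketch of the double rounding problem that a
direct DFmode -> BFmode conversion avoids.  It assumes IEEE binary32 /
binary64 and round-to-nearest-even, and only handles finite values well
inside the bfloat16 exponent range; the helper names are hypothetical.
For x = 1 + 2^-8 + 2^-30 the direct conversion rounds up to 1 + 2^-7,
whereas going through float first drops the 2^-30 term, leaves an exact
tie and then rounds down to 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round D to the 8 significand bits of bfloat16,
   ties to even, returning the result as a double.  */
double
round_double_to_bf16_precision (double d)
{
  uint64_t bits;

  assert (sizeof (double) == sizeof (uint64_t));
  memcpy (&bits, &d, sizeof bits);
  /* double stores 52 fraction bits, bfloat16 stores 7: drop 45.  */
  bits += ((UINT64_C (1) << 44) - 1) + ((bits >> 45) & 1u);
  bits &= ~((UINT64_C (1) << 45) - 1);
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Same target precision, but with an intermediate rounding to float.  */
double
round_via_float (double d)
{
  float f = (float) d;	/* First rounding: 53 -> 24 bits.  */

  return round_double_to_bf16_precision ((double) f);  /* 24 -> 8.  */
}
```

With x = 1.0 + 0x1p-8 + 0x1p-30 the first function yields 1.0078125
and the second yields 1.0, which is the discrepancy a dedicated
__truncdfbf2 is there to avoid.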
--- libgcc/soft-fp/truncxfbf2.c.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/truncxfbf2.c	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE extended into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "extended.h"
+
+BFtype
+__truncxfbf2 (XFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_E (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_E (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (B, E, 1, 4, R, A);
+#else
+  FP_TRUNC (B, E, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/trunctfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/trunctfbf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE quad into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "quad.h"
+
+BFtype
+__trunctfbf2 (TFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_Q (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_Q (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (B, Q, 1, 4, R, A);
+#else
+  FP_TRUNC (B, Q, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/trunchfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/trunchfbf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,58 @@
+/* Software floating-point emulation.
+   Truncate IEEE half into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "half.h"
+#include "single.h"
+
+/* BFtype and HFtype are unordered, neither is a superset or subset
+   of each other.  Convert HFtype to SFtype (lossless) and then
+   truncate to BFtype.  */
+
+BFtype
+__trunchfbf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_S (B);
+  FP_DECL_B (R);
+  SFtype b;
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_RAW_H (A, a);
+  FP_EXTEND (S, H, 1, 1, B, A);
+  FP_PACK_RAW_S (b, B);
+  FP_UNPACK_SEMIRAW_S (B, b);
+  FP_TRUNC (B, S, 1, 1, R, B);
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncbfhf2.c.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/truncbfhf2.c	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,75 @@
+/* Software floating-point emulation.
+   Truncate bfloat16 into IEEE half.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "brain.h"
+#include "single.h"
+
+/* BFtype and HFtype are unordered, neither is a superset or subset
+   of each other.  Convert BFtype to SFtype (lossless) and then
+   truncate to HFtype.  */
+
+HFtype
+__truncbfhf2 (BFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_S (B);
+  FP_DECL_B (R);
+  SFtype b;
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  /* Optimize BFtype to SFtype conversion to simple left shift
+     by 16 if possible, we don't need to raise exceptions on sNaN
+     here as the SFtype to HFtype truncation should do that too.  */
+  if (sizeof (BFtype) == 2
+      && sizeof (unsigned short) == 2
+      && sizeof (SFtype) == 4
+      && sizeof (unsigned int) == 4)
+    {
+      union { BFtype a; unsigned short b; } u1;
+      union { SFtype a; unsigned int b; } u2;
+      u1.a = a;
+      u2.b = (u1.b << 8) << 8;
+      b = u2.a;
+    }
+  else
+    {
+      FP_UNPACK_RAW_B (A, a);
+      FP_EXTEND (S, B, 1, 1, B, A);
+      FP_PACK_RAW_S (b, B);
+    }
+  FP_UNPACK_SEMIRAW_S (B, b);
+  FP_TRUNC (H, S, 1, 1, R, B);
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/extendbfsf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/extendbfsf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,49 @@
+/* Software floating-point emulation.
+   Return a bfloat16 converted to IEEE single
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "brain.h"
+#include "single.h"
+
+SFtype
+__extendbfsf2 (BFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_B (A);
+  FP_DECL_S (R);
+  SFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_B (A, a);
+  FP_EXTEND (S, B, 1, 1, R, A);
+  FP_PACK_RAW_S (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libiberty/cp-demangle.h.jj	2022-09-27 08:03:27.142982423 +0200
+++ libiberty/cp-demangle.h	2022-09-29 12:42:47.291727886 +0200
@@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
 extern const struct demangle_operator_info cplus_demangle_operators[];
 #endif
 
-#define D_BUILTIN_TYPE_COUNT (35)
+#define D_BUILTIN_TYPE_COUNT (36)
 
 CP_STATIC_IF_GLIBCPP_V3
 const struct demangle_builtin_type_info
--- libiberty/cp-demangle.c.jj	2022-09-27 08:03:27.141982437 +0200
+++ libiberty/cp-demangle.c	2022-09-29 13:04:57.083526204 +0200
@@ -2489,6 +2489,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
   /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
 	     D_PRINT_DEFAULT },
   /* 34 */ { NL ("_Float"),	NL ("_Float"),		D_PRINT_FLOAT },
+  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
 };
 
 CP_STATIC_IF_GLIBCPP_V3
@@ -2753,8 +2754,20 @@ cplus_demangle_type (struct d_info *di)
 
 	case 'F':
 	  /* DF<number>_ - _Float<number>.
-	     DF<number>x - _Float<number>x.  */
+	     DF<number>x - _Float<number>x
+	     DFb16_ - std::bfloat16_t.  */
 	  {
+	    if (d_peek_char (di) == 'b')
+	      {
+		d_advance (di, 1);
+		if (d_number (di) != 16 || d_peek_char (di) != '_')
+		  return NULL;
+		d_advance (di, 1);
+		ret = d_make_builtin_type (di,
+					   &cplus_demangle_builtin_types[35]);
+		di->expansion += ret->u.s_builtin.type->len;
+		break;
+	      }
 	    int arg = d_number (di);
 	    char buf[12];
 	    char suffix = 0;
--- libiberty/testsuite/demangle-expected.jj	2022-09-27 08:03:27.168982071 +0200
+++ libiberty/testsuite/demangle-expected	2022-09-29 12:49:02.181597532 +0200
@@ -1249,6 +1249,10 @@ xxx
 _Z3xxxDF32xDF64xDF128xCDF32xVb
 xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
 xxx
+--format=auto --no-params
+_Z3xxxDFb16_
+xxx(std::bfloat16_t)
+xxx
 # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
 --format=auto --no-params
 _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z


	Jakub


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support
  2022-09-30 14:08   ` Jakub Jelinek
@ 2022-09-30 18:21     ` Joseph Myers
  2022-09-30 18:38       ` Jakub Jelinek
  2022-10-04  9:06     ` [PATCH] middle-end, c++, i386, " Jakub Jelinek
  1 sibling, 1 reply; 22+ messages in thread
From: Joseph Myers @ 2022-09-30 18:21 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jason Merrill, Jonathan Wakely, Richard Earnshaw,
	richard.sandiford, gcc-patches

On Fri, 30 Sep 2022, Jakub Jelinek via Gcc-patches wrote:

> What isn't in the patch but I think we'll need to also change are some
> minimal set of __builtin_*bf16 builtins.  Seems for _Float16, GCC provides
> all the __builtin_*f16 (and for C/ObjC even with *f16 names), but there is
> no glibc support for any of that, so builtins that are expanded by the
> compiler are fine, but what should fall back to libm won't work.
> Maybe at least for now it is acceptable to implement most <cmath> and
> <complex> std::float16_t and std::bfloat16_t overloads with using
> __builtin_*f and explicitly narrow down, but I think at least nextafter
> (and apparently nexttoward as an alias for it for extended floating
> point types) needs to be specific for the particular floating point format.

C doesn't have nexttoward (the function whose second argument is always 
long double, independent of the type of the first argument) for any of the 
_FloatN or _FloatNx types; why does C++ have it (but with second argument 
the same type as the first?) for those types?

(C2x also doesn't have fmax or fmin for the _FloatN or _FloatNx types, but 
that's because the operations those are bound to are replaced in IEEE 
754-2019 by different max/min operations with different corresponding 
functions; glibc still provides the fmax and fmin operations for _FloatN 
and _FloatNx, as in TS 18661-3, as extensions.)

Providing a full set of _Float16 operations in glibc would be reasonable 
(modulo the issues with implications for the minimum GCC version for 
building glibc for a given architecture).  It's rather less clear that 
__bf16 operations would be appropriate.

There may also be double rounding considerations for implementing some 
functions (that are required to be correctly rounded) via operations on 
float; at least for fma (if the result of fmaf is half way between two 
representable values of the narrower type, but not exactly equal to the 
value of a*b+c).
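
The double rounding concern can be seen on a plain conversion as well.  The
following is only a sketch, not the libgcc code: the helper names are made
up for illustration, both helpers work on bfloat16 bit patterns, and they
ignore NaNs, exceptions, non-default rounding modes and results outside the
normal bfloat16 range.  It compares rounding a double to bfloat16 in one
step against going through float first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a float to a bfloat16 bit pattern using
   round-to-nearest-even.  NaNs and exceptions are not handled.  */
static uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  /* Add just under half an ulp, plus one more if the kept lsb is odd.  */
  u += 0x7fffu + ((u >> 16) & 1);
  return (uint16_t) (u >> 16);
}

/* Hypothetical helper: round a double to a bfloat16 bit pattern with a
   single rounding.  Assumes a finite input whose result is a normal
   bfloat16 (no subnormals, no NaNs).  */
static uint16_t
double_to_bf16_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  uint64_t sign = u >> 63;
  /* Rebias the exponent from double (1023) to bfloat16 (127).  */
  uint64_t exp = ((u >> 52) & 0x7ff) - 1023 + 127;
  uint64_t frac = u & ((UINT64_C (1) << 52) - 1);
  uint64_t keep = frac >> 45;			/* top 7 fraction bits */
  uint64_t rest = frac & ((UINT64_C (1) << 45) - 1);
  uint64_t half = UINT64_C (1) << 44;
  uint64_t mag = (exp << 7) | keep;
  if (rest > half || (rest == half && (keep & 1)))
    mag++;	/* a carry out of the fraction bumps the exponent */
  return (uint16_t) ((sign << 15) | mag);
}

/* 1 + 2^-8 is exactly half way between the bfloat16 values 1.0 and
   1 + 2^-7; the extra 2^-40 puts the double just above that point.  */
static void
check_double_rounding (void)
{
  double d = 1.0 + 0x1p-8 + 0x1p-40;
  /* Single rounding: above the half way point, so round up.  */
  assert (double_to_bf16_bits (d) == 0x3f81);
  /* Via float: 2^-40 is lost first, then the tie goes to even (1.0).  */
  assert (float_to_bf16_bits ((float) d) == 0x3f80);
}
```

The two results differ by one ulp of bfloat16, which is the kind of
discrepancy a correctly rounded function computed via float would have to
rule out.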

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support
  2022-09-30 18:21     ` Joseph Myers
@ 2022-09-30 18:38       ` Jakub Jelinek
  2022-09-30 19:27         ` Jonathan Wakely
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2022-09-30 18:38 UTC (permalink / raw)
  To: Jonathan Wakely, Joseph Myers
  Cc: Jason Merrill, Richard Earnshaw, richard.sandiford, gcc-patches

On Fri, Sep 30, 2022 at 06:21:04PM +0000, Joseph Myers wrote:
> On Fri, 30 Sep 2022, Jakub Jelinek via Gcc-patches wrote:
> 
> > What isn't in the patch but I think we'll need to also change are some
> > minimal set of __builtin_*bf16 builtins.  Seems for _Float16, GCC provides
> > all the __builtin_*f16 (and for C/ObjC even with *f16 names), but there is
> > no glibc support for any of that, so builtins that are expanded by the
> > compiler are fine, but what should fall back to libm won't work.
> > Maybe at least for now it is acceptable to implement most <cmath> and
> > <complex> std::float16_t and std::bfloat16_t overloads with using
> > __builtin_*f and explicitly narrow down, but I think at least nextafter
> > (and apparently nexttoward as an alias for it for extended floating
> > point types) needs to be specific for the particular floating point format.
> 
> C doesn't have nexttoward (the function whose second argument is always 
> long double, independent of the type of the first argument) for any of the 
> _FloatN or _FloatNx types; why does C++ have it (but with second argument 
> the same type as the first?) for those types?

No idea.
https://eel.is/c++draft/cmath.syn
has (since that std::float*_t/std::bfloat16_t paper)
constexpr floating-point-type nextafter(floating-point-type x, floating-point-type y);
constexpr float nextafterf(float x, float y);
constexpr long double nextafterl(long double x, long double y);

constexpr floating-point-type nexttoward(floating-point-type x, floating-point-type y);
constexpr float nexttowardf(float x, long double y);
constexpr long double nexttowardl(long double x, long double y);

That is certainly wrong for double, because it is a backwards incompatible
change, where
constexpr double nexttoward(double x, long double y);
is gone and
constexpr double nexttoward(double x, double y);
is added.
IMHO the nexttoward case just shouldn't be changed, so it should remain:
constexpr float nexttoward(float x, long double y);
constexpr double nexttoward(double x, long double y);
constexpr long double nexttoward(long double x, long double y);
constexpr float nexttowardf(float x, long double y);
constexpr long double nexttowardl(long double x, long double y);
Having
constexpr floating-point-type nexttoward(floating-point-type x, long double y);
would be problematic, because long double might have unordered floating
point rank to floating-point-type (say powerpc IBM extended long double
vs. std::float128_t), or could have smaller floating point rank (say if
it is IEEE double).
But just as nexttowardl and nexttoward(long double, long double) aren't
very useful, because they are the same thing as nextafterl and
nextafter(long double, long double), introducing other nexttoward overloads
doesn't seem useful either.
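
On the quoted point that nextafter has to be specific to the floating
point format: calling nextafterf on the widened value and narrowing back
would step by one float ulp, which rounds straight back to the starting
bfloat16 value.  A rough sketch of a format-specific version working
directly on bfloat16 bit patterns could look like the following.  The
function names are hypothetical, this is not what the patch or libstdc++
does, and no floating point exceptions are raised.

```c
#include <assert.h>
#include <stdint.h>

/* Exponent all ones and a non-zero fraction.  */
static int
bf16_is_nan (uint16_t v)
{
  return (v & 0x7fff) > 0x7f80;
}

/* Map a non-NaN bit pattern to an integer that orders like the value;
   +0 and -0 both map to 0.  */
static int
bf16_key (uint16_t v)
{
  int mag = v & 0x7fff;
  return (v & 0x8000) ? -mag : mag;
}

/* Hypothetical bfloat16 nextafter on raw bit patterns.  */
static uint16_t
bf16_nextafter_bits (uint16_t x, uint16_t y)
{
  if (bf16_is_nan (x) || bf16_is_nan (y))
    return 0x7fc0;			/* a quiet NaN */
  int kx = bf16_key (x), ky = bf16_key (y);
  if (kx == ky)
    return y;				/* equal values, including +-0 */
  if (kx == 0)
    return (uint16_t) ((y & 0x8000) | 1);	/* smallest subnormal */
  /* Stepping away from zero increments the magnitude bits, stepping
     towards zero decrements them, whatever the sign.  */
  int away = (kx < ky) == (kx > 0);
  return (uint16_t) (away ? x + 1 : x - 1);
}

static void
check_bf16_nextafter (void)
{
  assert (bf16_nextafter_bits (0x3f80, 0x4000) == 0x3f81); /* 1.0 up */
  assert (bf16_nextafter_bits (0x3f80, 0x0000) == 0x3f7f); /* 1.0 down */
  assert (bf16_nextafter_bits (0x0000, 0xbf80) == 0x8001); /* 0 towards -1 */
  assert (bf16_nextafter_bits (0x7f7f, 0x7f80) == 0x7f80); /* max to inf */
}
```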

	Jakub


* Re: [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support
  2022-09-30 18:38       ` Jakub Jelinek
@ 2022-09-30 19:27         ` Jonathan Wakely
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Wakely @ 2022-09-30 19:27 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Joseph Myers, Jason Merrill, Richard Earnshaw, richard.sandiford,
	gcc-patches

On Fri, 30 Sept 2022 at 19:38, Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Fri, Sep 30, 2022 at 06:21:04PM +0000, Joseph Myers wrote:
> > On Fri, 30 Sep 2022, Jakub Jelinek via Gcc-patches wrote:
> >
> > > What isn't in the patch but I think we'll need to also change are some
> > > minimal set of __builtin_*bf16 builtins.  Seems for _Float16, GCC provides
> > > all the __builtin_*f16 (and for C/ObjC even with *f16 names), but there is
> > > no glibc support for any of that, so builtins that are expanded by the
> > > > compiler are fine, but what should fall back to libm won't work.
> > > Maybe at least for now it is acceptable to implement most <cmath> and
> > > <complex> std::float16_t and std::bfloat16_t overloads with using
> > > __builtin_*f and explicitly narrow down, but I think at least nextafter
> > > (and apparently nexttoward as an alias for it for extended floating
> > > point types) needs to be specific for the particular floating point format.
> >
> > C doesn't have nexttoward (the function whose second argument is always
> > long double, independent of the type of the first argument) for any of the
> > > _FloatN or _FloatNx types; why does C++ have it (but with second argument
> > the same type as the first?) for those types?
>
> No idea.
> https://eel.is/c++draft/cmath.syn
> has (since that std::float*_t/std::bfloat16_t paper)
> constexpr floating-point-type nextafter(floating-point-type x, floating-point-type y);
> constexpr float nextafterf(float x, float y);
> constexpr long double nextafterl(long double x, long double y);
>
> constexpr floating-point-type nexttoward(floating-point-type x, floating-point-type y);
> constexpr float nexttowardf(float x, long double y);
> constexpr long double nexttowardl(long double x, long double y);
>
> That is certainly wrong for double, because it is a backwards incompatible
> change, where
> constexpr double nexttoward(double x, long double y);
> is gone and
> constexpr double nexttoward(double x, double y);
> is added.
> IMHO the nexttoward case just shouldn't be changed, so it should remain:
> constexpr float nexttoward(float x, long double y);
> constexpr double nexttoward(double x, long double y);
> constexpr long double nexttoward(long double x, long double y);
> constexpr float nexttowardf(float x, long double y);
> constexpr long double nexttowardl(long double x, long double y);

Right, this is a defect. The author of P1467 reported that as an
accidental change (which we all missed). It should be in the LWG
issues list by Monday, and (I hope) will be voted Tentatively Ready
immediately.

> Having
> constexpr floating-point-type nexttoward(floating-point-type x, long double y);
> would be problematic, because long double might have unordered floating
> point rank to floating-point-type (say powerpc IBM extended long double
> vs. std::float128_t), or could have smaller floating point rank (say if
> it is IEEE double).
> But like nexttowardl or nexttoward(long double, long double) aren't very
> useful because they are the same thing as nextafterl or
> nextafter(long double, long double), introducing other nexttoward overloads
> doesn't seem useful.
>
>         Jakub
>



* [PATCH] middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support
  2022-09-30 14:08   ` Jakub Jelinek
  2022-09-30 18:21     ` Joseph Myers
@ 2022-10-04  9:06     ` Jakub Jelinek
  2022-10-04 15:54       ` Joseph Myers
  2022-10-04 21:50       ` Jason Merrill
  1 sibling, 2 replies; 22+ messages in thread
From: Jakub Jelinek @ 2022-10-04  9:06 UTC (permalink / raw)
  To: Jason Merrill, Joseph S. Myers, Richard Biener, Jeff Law, Uros Bizjak
  Cc: gcc-patches

On Fri, Sep 30, 2022 at 04:08:10PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Fri, Sep 30, 2022 at 09:49:08AM -0400, Jason Merrill wrote:
> > The comment from Apple on the ABI mangling proposal suggests to me that we
> > might want to delay enabling C++ std::bfloat16_t (i.e. defining
> > __STDCPP_BFLOAT16_T__) until we have that excess precision support?
> 
> I saw that comment.  We have similar problem with _Float16 too, where C++
> effectively right now works as when one uses -fexcess-precision=16 in C
> (which isn't default).
> I can see how hard would it be to add EXCESS_PRECISION_EXPR support to C++
> FE.

I've started on that but it will take some time.  That said, it should
work, though less efficiently, even without that; even in C users can
always request such behavior with -fexcess-precision=16.

> > If we're using DF32x for _Float32x, maybe we want DF16b for bfloat16?
> 
> Perhaps, I just followed what was in the pull request.  Can change it.

Changed now, added support for the builtins and ported most of the
float16 tests, so that it gets at least some test coverage.
Also, for now I've left the aarch64 and arm changes out of the patch,
because I haven't tested it on aarch64 yet and arm support was incomplete
and I haven't heard from the ARM maintainers yet what they want or don't
want.

The added testcases showed a few problems.  One is that i?86 distinguishes
two kinds of fp comparisons, trivial and non-trivial.  The trivial ones,
which need just a single conditional jump or setCC, are handled directly,
while the complex ones, which need two, are not, and the generic code then
synthesizes them from the trivial ones.  Unfortunately this means that
== and != end up as libcalls.  For _Float16, we added the __nehf2 and
__eqhf2 entrypoints last year.  I wanted to avoid doing the same for
__bf16, so I've added cbranchbf4 and cstorebf4 expanders that handle all
fp comparisons and internally just shift the operands up to construct
SFmode, without even handling sNaNs, and then call the generic code to
handle the SFmode comparisons.
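
In C terms, what those expanders do is roughly the following.  This is an
illustrative sketch on raw bit patterns with made-up helper names, not the
RTL the expanders emit: a bfloat16 value is the upper half of the float
with the same value, so a 16-bit left shift widens it exactly and the
ordinary float comparison gives the right answer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a bfloat16 bit pattern to float by shifting it into the upper
   16 bits.  Like the expanders, this does nothing special for sNaNs.  */
static float
bf16_bits_to_float (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Hypothetical comparison helpers: compare as float after widening.  */
static int
bf16_eq (uint16_t a, uint16_t b)
{
  return bf16_bits_to_float (a) == bf16_bits_to_float (b);
}

static int
bf16_ne (uint16_t a, uint16_t b)
{
  return bf16_bits_to_float (a) != bf16_bits_to_float (b);
}

static int
bf16_lt (uint16_t a, uint16_t b)
{
  return bf16_bits_to_float (a) < bf16_bits_to_float (b);
}

static void
check_bf16_compare (void)
{
  assert (bf16_eq (0x0000, 0x8000));	/* +0 == -0 */
  assert (!bf16_eq (0x7fc0, 0x7fc0));	/* NaN is not equal to itself */
  assert (bf16_ne (0x7fc0, 0x7fc0));	/* ... so != is true */
  assert (bf16_lt (0xbf80, 0x3f80));	/* -1.0 < 1.0 */
}
```

A bitwise integer compare would get the +0/-0 and NaN cases wrong, which
is why == and != still have to go through a floating point comparison.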

Another problem is with HFmode comparisons.  When some HFmode comparison
isn't supported directly, we iterate over wider scalar float modes
looking for a usable comparison, but BFmode and HFmode are unordered:
one of them has to be listed as the wider one, yet neither is a subset nor
a superset of the other.  So I had to skip wider modes which have the same
precision as the starting one.
Yet another problem arises because I've only enabled the bf16/BF16 suffixes
in C++, since for C they might clash with some later extension.  Am I right
to worry about that, or do you think C will never standardize suffixes that
would clash, because C++ has already standardized the bf16/BF16 suffixes for
something?  If I could enable them, I'd always pedwarn for C on those
and could enable the __BFLT16_*__ macros.  Right now I had to disable some
-fbuilding-libgcc macros because of that (though nothing really uses them
right now).

Another question is the suffixes of the builtins.  For now I have added
bf16 suffix and enabled the builtins with !both_p, so one always needs to
use __builtin_* form for them.  None of the GCC builtins end with b,
so this isn't ambiguous with __builtin_*f16, but some libm functions do end
with b, in particular ilogb, logb and f{??,??x}sub.  ilogb and the subs
always have it, but is __builtin_logbf16 f16 suffixed logb or bf16 suffixed
log?  Shall the builtins use f16b suffixes instead like the mangling does?

Full patch bootstrapped/regtested on x86_64-linux and i686-linux.

2022-10-04  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
	* tree.h (bfloat16_type_node): Define.
	(CASE_FLT_FN_FLOATN_NX): Also include BUILT_IN_*BF16.
	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
	like float16_type_mode.
	(build_common_tree_nodes): Initialize bfloat16_type_node if
	BFmode is supported.
	* expmed.h (maybe_expand_shift): Declare.
	* expmed.cc (maybe_expand_shift): No longer static.
	(emit_store_flag_1): Don't consider [BH]Fmode as wider mode to
	narrower modes.
	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
	-ffast-math generic implementation for BF -> SF and SF -> BF
	conversions.
	* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16,
	BT_FN_BFLOAT16_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING,
	BT_FN_BFLOAT16_BFLOAT16_BFLOAT16,
	BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16): New.
	* builtins.def (DEF_GCC_FLOATN_NX_BUILTINS,
	DEF_EXT_LIB_FLOATN_NX_BUILTINS): Also add *bf16 suffixed builtins,
	but for these only __builtin_ prefixed functions.
	* optabs.cc (can_compare_p, prepare_cmp_insn): Don't consider
	[BH]Fmode as wider mode to narrower modes.
	* config/i386/i386.cc (classify_argument): Handle E_BCmode.
	(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
	for -msse2.
	(ix86_mangle_type): Mangle BFmode as DF16b.
	(ix86_invalid_conversion, ix86_invalid_unary_op,
	ix86_invalid_binary_op): Remove.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
	ix86_bf16_type_node, only create it if still NULL.
	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
	* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
gcc/c-family/
	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
	predefine for C++ __BFLT16_*__ macros and for C++23 also
	__STDCPP_BFLOAT16_T__.
	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16 for C++.
gcc/c/
	* c-typeck.cc (convert_arguments): Don't promote __bf16 to
	double.
gcc/cp/
	* cp-tree.h (extended_float_type_p): Return true for
	bfloat16_type_node.
	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_bfloat16,
	check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
	New.
	* gcc.dg/torture/bfloat16-basic.c: New test.
	* gcc.dg/torture/bfloat16-builtin.c: New test.
	* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
	* gcc.dg/torture/bfloat16-complex.c: New test.
	* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
	from bfloat16-builtin-issignaling-1.c.
	* gcc.dg/torture/floatn-basic.h: Allow to be includable from
	bfloat16-basic.c.
	* gcc.dg/torture/floatn-builtin.h: Allow to be includable from
	bfloat16-builtin.c.
	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
	diagnostics.
	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
libcpp/
	* include/cpplib.h (CPP_N_BFLOAT16): Define.
	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
	C++.
libgcc/
	* config/i386/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
	CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
	-msse2.
	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
	* soft-fp/brain.h: New file.
	* soft-fp/truncsfbf2.c: New file.
	* soft-fp/truncdfbf2.c: New file.
	* soft-fp/truncxfbf2.c: New file.
	* soft-fp/trunctfbf2.c: New file.
	* soft-fp/trunchfbf2.c: New file.
	* soft-fp/truncbfhf2.c: New file.
	* soft-fp/extendbfsf2.c: New file.
libiberty/
	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
	entry.
	(cplus_demangle_type): Demangle DF16b.
	* testsuite/demangle-expected (_Z3xxxDF16b): New test.

--- gcc/tree-core.h.jj	2022-10-01 21:44:52.521002702 +0200
+++ gcc/tree-core.h	2022-10-03 22:46:34.218787107 +0200
@@ -665,6 +665,9 @@ enum tree_index {
   TI_DOUBLE_TYPE,
   TI_LONG_DOUBLE_TYPE,
 
+  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
+  TI_BFLOAT16_TYPE,
+
   /* The _FloatN and _FloatNx types must be consecutive, and in the
      same sequence as the corresponding complex types, which must also
      be consecutive; _FloatN must come before _FloatNx; the order must
--- gcc/tree.h.jj	2022-10-01 21:44:52.525002648 +0200
+++ gcc/tree.h	2022-10-03 22:46:34.220787080 +0200
@@ -279,7 +279,7 @@ code_helper::is_builtin_fn () const
 #define CASE_FLT_FN(FN) case FN: case FN##F: case FN##L
 #define CASE_FLT_FN_FLOATN_NX(FN)			   \
   case FN##F16: case FN##F32: case FN##F64: case FN##F128: \
-  case FN##F32X: case FN##F64X: case FN##F128X
+  case FN##F32X: case FN##F64X: case FN##F128X: case FN##BF16
 #define CASE_FLT_FN_REENT(FN) case FN##_R: case FN##F_R: case FN##L_R
 #define CASE_INT_FN(FN) case FN: case FN##L: case FN##LL: case FN##IMAX
 
@@ -4285,6 +4285,7 @@ tree_strip_any_location_wrapper (tree ex
 #define float_type_node			global_trees[TI_FLOAT_TYPE]
 #define double_type_node		global_trees[TI_DOUBLE_TYPE]
 #define long_double_type_node		global_trees[TI_LONG_DOUBLE_TYPE]
+#define bfloat16_type_node		global_trees[TI_BFLOAT16_TYPE]
 
 /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
 #define FLOATN_TYPE_NODE(IDX)		global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
--- gcc/tree.cc.jj	2022-10-01 21:44:52.524002662 +0200
+++ gcc/tree.cc	2022-10-03 22:46:34.223787040 +0200
@@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
     = (flag_excess_precision == EXCESS_PRECISION_FAST
        ? EXCESS_PRECISION_TYPE_FAST
        : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
-	  ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
+	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
 
   enum flt_eval_method target_flt_eval_method
     = targetm.c.excess_precision (requested_type);
@@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
   machine_mode float16_type_mode = (float16_type_node
 				    ? TYPE_MODE (float16_type_node)
 				    : VOIDmode);
+  machine_mode bfloat16_type_mode = (bfloat16_type_node
+				     ? TYPE_MODE (bfloat16_type_node)
+				     : VOIDmode);
   machine_mode float_type_mode = TYPE_MODE (float_type_node);
   machine_mode double_type_mode = TYPE_MODE (double_type_node);
 
@@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
 	switch (target_flt_eval_method)
 	  {
 	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	    if (type_mode == float16_type_mode)
+	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode)
 	      return float_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode)
 	      return double_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode
 		|| type_mode == double_type_mode)
 	      return long_double_type_node;
@@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
 	switch (target_flt_eval_method)
 	  {
 	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	    if (type_mode == float16_type_mode)
+	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode)
 	      return complex_float_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode)
 	      return complex_double_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode
 		|| type_mode == double_type_mode)
 	      return complex_long_double_type_node;
@@ -9462,6 +9471,17 @@ build_common_tree_nodes (bool signed_cha
       SET_TYPE_MODE (FLOATN_NX_TYPE_NODE (i), mode);
     }
   float128t_type_node = float128_type_node;
+#ifdef HAVE_BFmode
+  if (REAL_MODE_FORMAT (BFmode) == &arm_bfloat_half_format
+      && targetm.scalar_mode_supported_p (BFmode)
+      && targetm.libgcc_floating_mode_supported_p (BFmode))
+    {
+      bfloat16_type_node = make_node (REAL_TYPE);
+      TYPE_PRECISION (bfloat16_type_node) = GET_MODE_PRECISION (BFmode);
+      layout_type (bfloat16_type_node);
+      SET_TYPE_MODE (bfloat16_type_node, BFmode);
+    }
+#endif
 
   float_ptr_type_node = build_pointer_type (float_type_node);
   double_ptr_type_node = build_pointer_type (double_type_node);
--- gcc/expmed.h.jj	2022-10-01 21:44:52.503002947 +0200
+++ gcc/expmed.h	2022-10-03 22:46:34.223787040 +0200
@@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
 				  rtx, tree, rtx, int);
 extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
 			 int);
+extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
+			       int);
 #ifdef GCC_OPTABS_H
 extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
 			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
--- gcc/expmed.cc.jj	2022-10-01 21:44:52.501002974 +0200
+++ gcc/expmed.cc	2022-10-03 22:59:19.176483448 +0200
@@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
 
 /* Likewise, but return 0 if that cannot be done.  */
 
-static rtx
+rtx
 maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
 		    int amount, rtx target, int unsignedp)
 {
@@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
     {
      machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
      icode = optab_handler (cstore_optab, optab_mode);
-     if (icode != CODE_FOR_nothing)
+     if (icode != CODE_FOR_nothing
+	 /* Don't consider [BH]Fmode as usable wider mode, as neither is
+	    a subset or superset of the other.  */
+	 && (compare_mode == mode
+	     || !SCALAR_FLOAT_MODE_P (compare_mode)
+	     || maybe_ne (GET_MODE_PRECISION (compare_mode),
+			  GET_MODE_PRECISION (mode))))
 	{
 	  do_pending_stack_adjust ();
 	  rtx tem = emit_cstore (target, icode, code, mode, compare_mode,
--- gcc/expr.cc.jj	2022-10-01 21:44:52.506002906 +0200
+++ gcc/expr.cc	2022-10-03 22:46:34.226787000 +0200
@@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
       gcc_assert ((GET_MODE_PRECISION (from_mode)
 		   != GET_MODE_PRECISION (to_mode))
 		  || (DECIMAL_FLOAT_MODE_P (from_mode)
-		      != DECIMAL_FLOAT_MODE_P (to_mode)));
+		      != DECIMAL_FLOAT_MODE_P (to_mode))
+		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+		      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+		      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
 
       if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
 	/* Conversion between decimal float and binary float, same size.  */
@@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
 	  return;
 	}
 
+#ifdef HAVE_SFmode
+      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+	  && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
+	{
+	  if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
+	    {
+	      /* To cut down on libgcc size, implement
+		 BFmode -> {DF,XF,TF}mode conversions by
+		 BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
+	      rtx temp = gen_reg_rtx (SFmode);
+	      convert_mode_scalar (temp, from, unsignedp);
+	      convert_mode_scalar (to, temp, unsignedp);
+	      return;
+	    }
+	  if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+	    {
+	      /* Similarly, implement BFmode -> HFmode as
+		 BFmode -> SFmode -> HFmode conversion where SFmode
+		 has a superset of BFmode values.  We don't need
+		 to handle sNaNs by raising an exception and turning
+		 them into qNaNs though, as that can be done in the
+		 SFmode -> HFmode conversion too.  */
+	      rtx temp = gen_reg_rtx (SFmode);
+	      int save_flag_finite_math_only = flag_finite_math_only;
+	      flag_finite_math_only = true;
+	      convert_mode_scalar (temp, from, unsignedp);
+	      flag_finite_math_only = save_flag_finite_math_only;
+	      convert_mode_scalar (to, temp, unsignedp);
+	      return;
+	    }
+	  if (to_mode == SFmode
+	      && !HONOR_NANS (from_mode)
+	      && !HONOR_NANS (to_mode)
+	      && optimize_insn_for_speed_p ())
+	    {
+	      /* If we don't expect sNaNs, for BFmode -> SFmode we can just
+		 shift the bits up.  */
+	      machine_mode fromi_mode, toi_mode;
+	      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				     0).exists (&fromi_mode)
+		  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+					0).exists (&toi_mode))
+		{
+		  start_sequence ();
+		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+		  rtx tof = NULL_RTX;
+		  if (fromi)
+		    {
+		      rtx toi = gen_reg_rtx (toi_mode);
+		      convert_mode_scalar (toi, fromi, 1);
+		      toi
+			= maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
+					      GET_MODE_PRECISION (to_mode)
+					      - GET_MODE_PRECISION (from_mode),
+					      NULL_RTX, 1);
+		      if (toi)
+			{
+			  tof = lowpart_subreg (to_mode, toi, toi_mode);
+			  if (tof)
+			    emit_move_insn (to, tof);
+			}
+		    }
+		  insns = get_insns ();
+		  end_sequence ();
+		  if (tof)
+		    {
+		      emit_insn (insns);
+		      return;
+		    }
+		}
+	    }
+	}
+      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
+	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+	  && !HONOR_NANS (from_mode)
+	  && !HONOR_NANS (to_mode)
+	  && !flag_rounding_math
+	  && optimize_insn_for_speed_p ())
+	{
+	  /* If we don't expect qNaNs nor sNaNs and can assume rounding
+	     to nearest, we can expand the conversion inline as
+	     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
+	  machine_mode fromi_mode, toi_mode;
+	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				 0).exists (&fromi_mode)
+	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+				    0).exists (&toi_mode))
+	    {
+	      start_sequence ();
+	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+	      rtx tof = NULL_RTX;
+	      do
+		{
+		  if (!fromi)
+		    break;
+		  int shift = (GET_MODE_PRECISION (from_mode)
+			       - GET_MODE_PRECISION (to_mode));
+		  rtx temp1
+		    = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
+					  shift, NULL_RTX, 1);
+		  if (!temp1)
+		    break;
+		  rtx temp2
+		    = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+		  if (!temp2)
+		    break;
+		  rtx temp3
+		    = expand_binop (fromi_mode, add_optab, fromi,
+				    gen_int_mode ((HOST_WIDE_INT_1U
+						   << (shift - 1)) - 1,
+						  fromi_mode), NULL_RTX,
+				    1, OPTAB_DIRECT);
+		  if (!temp3)
+		    break;
+		  rtx temp4
+		    = expand_binop (fromi_mode, add_optab, temp3, temp2,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+		  if (!temp4)
+		    break;
+		  rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
+						  temp4, shift, NULL_RTX, 1);
+		  if (!temp5)
+		    break;
+		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
+		  if (!temp6)
+		    break;
+		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
+					toi_mode);
+		  if (tof)
+		    emit_move_insn (to, tof);
+		}
+	      while (0);
+	      insns = get_insns ();
+	      end_sequence ();
+	      if (tof)
+		{
+		  emit_insn (insns);
+		  return;
+		}
+	    }
+	}
+#endif
+
       /* Otherwise use a libcall.  */
       libcall = convert_optab_libfunc (tab, to_mode, from_mode);
 
--- gcc/builtin-types.def.jj	2022-01-11 22:31:40.590769786 +0100
+++ gcc/builtin-types.def	2022-10-03 22:46:34.227786987 +0200
@@ -82,6 +82,9 @@ DEF_PRIMITIVE_TYPE (BT_UNWINDWORD, (*lan
 DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
+DEF_PRIMITIVE_TYPE (BT_BFLOAT16, (bfloat16_type_node
+				  ? bfloat16_type_node
+				  : error_mark_node))
 DEF_PRIMITIVE_TYPE (BT_FLOAT16, (float16_type_node
 				 ? float16_type_node
 				 : error_mark_node))
@@ -187,6 +190,7 @@ DEF_FUNCTION_TYPE_0 (BT_FN_DOUBLE, BT_DO
    distinguish it from two types in sequence, "long" followed by
    "double".  */
 DEF_FUNCTION_TYPE_0 (BT_FN_LONGDOUBLE, BT_LONGDOUBLE)
+DEF_FUNCTION_TYPE_0 (BT_FN_BFLOAT16, BT_BFLOAT16)
 DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT16, BT_FLOAT16)
 DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT32, BT_FLOAT32)
 DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT64, BT_FLOAT64)
@@ -206,6 +210,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_DOUBLE
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_LONGDOUBLE,
 		     BT_LONGDOUBLE, BT_LONGDOUBLE)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_FLOAT16, BT_FLOAT16, BT_FLOAT16)
+DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_FLOAT32, BT_FLOAT32, BT_FLOAT32)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_FLOAT64, BT_FLOAT64, BT_FLOAT64)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT128_FLOAT128, BT_FLOAT128, BT_FLOAT128)
@@ -264,6 +269,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_CONST_S
 DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_CONST_STRING, BT_DOUBLE, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_CONST_STRING,
 		     BT_LONGDOUBLE, BT_CONST_STRING)
+DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_CONST_STRING, BT_BFLOAT16, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_CONST_STRING, BT_FLOAT16, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_CONST_STRING, BT_FLOAT32, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_CONST_STRING, BT_FLOAT64, BT_CONST_STRING)
@@ -401,6 +407,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE
 		     BT_DOUBLE, BT_DOUBLE, BT_DOUBLE)
 DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
 		     BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE)
+DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16,
+		     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
 DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_FLOAT16,
 		     BT_FLOAT16, BT_FLOAT16, BT_FLOAT16)
 DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_FLOAT32,
@@ -554,6 +562,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_DOUBLE
 		     BT_DOUBLE, BT_DOUBLE, BT_DOUBLE, BT_DOUBLE)
 DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
 		     BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE)
+DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16,
+		     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
 DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_FLOAT16_FLOAT16_FLOAT16,
 		     BT_FLOAT16, BT_FLOAT16, BT_FLOAT16, BT_FLOAT16)
 DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_FLOAT32_FLOAT32_FLOAT32,
--- gcc/builtins.def.jj	2022-09-29 22:16:46.928044191 +0200
+++ gcc/builtins.def	2022-10-03 22:46:34.227786987 +0200
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.
    value for the type.  */
 #undef DEF_GCC_FLOATN_NX_BUILTINS
 #define DEF_GCC_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS)	\
+  DEF_GCC_BUILTIN (ENUM ## BF16, NAME "bf16", TYPE_MACRO (BFLOAT16), ATTRS) \
   DEF_GCC_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
   DEF_GCC_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
   DEF_GCC_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
@@ -123,6 +124,7 @@ along with GCC; see the file COPYING3.
 	       false, true)
 #undef DEF_EXT_LIB_FLOATN_NX_BUILTINS
 #define DEF_EXT_LIB_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS)	\
+  DEF_GCC_BUILTIN (ENUM ## BF16, NAME "bf16", TYPE_MACRO (BFLOAT16), ATTRS) \
   DEF_FLOATN_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
   DEF_FLOATN_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
   DEF_FLOATN_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
--- gcc/optabs.cc.jj	2022-07-26 21:43:55.638403562 +0200
+++ gcc/optabs.cc	2022-10-03 23:00:17.402698229 +0200
@@ -4254,11 +4254,24 @@ can_compare_p (enum rtx_code code, machi
 	       enum can_compare_purpose purpose)
 {
   rtx test;
+  machine_mode orig_mode = mode;
   test = gen_rtx_fmt_ee (code, mode, const0_rtx, const0_rtx);
   do
     {
       enum insn_code icode;
 
+      /* Don't consider [BH]Fmode as usable wider mode, as neither is
+	 a subset or superset of the other.  */
+      if (mode != orig_mode
+	  && SCALAR_FLOAT_MODE_P (mode)
+	  && known_eq (GET_MODE_PRECISION (mode),
+		       GET_MODE_PRECISION (orig_mode)))
+	{
+	  mode = GET_MODE_WIDER_MODE (mode).else_void ();
+	  PUT_MODE (test, mode);
+	  continue;
+	}
+
       if (purpose == ccp_jump
           && (icode = optab_handler (cbranch_optab, mode)) != CODE_FOR_nothing
           && insn_operand_matches (icode, 0, test))
@@ -4497,7 +4510,13 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
       enum insn_code icode;
       icode = optab_handler (cbranch_optab, cmp_mode);
       if (icode != CODE_FOR_nothing
-	  && insn_operand_matches (icode, 0, test))
+	  && insn_operand_matches (icode, 0, test)
+	  /* Don't consider [BH]Fmode as usable wider mode, as neither is
+	     a subset or superset of the other.  */
+	  && (cmp_mode == mode
+	      || !SCALAR_FLOAT_MODE_P (cmp_mode)
+	      || maybe_ne (GET_MODE_PRECISION (cmp_mode),
+			   GET_MODE_PRECISION (mode))))
 	{
 	  rtx_insn *last = get_last_insn ();
 	  rtx op0 = prepare_operand (icode, x, 1, mode, cmp_mode, unsignedp);
--- gcc/config/i386/i386.cc.jj	2022-10-01 21:44:58.477921753 +0200
+++ gcc/config/i386/i386.cc	2022-10-03 22:46:34.233786906 +0200
@@ -2423,6 +2423,7 @@ classify_argument (machine_mode mode, co
       classes[1] = X86_64_SSEUP_CLASS;
       return 2;
     case E_HCmode:
+    case E_BCmode:
       classes[0] = X86_64_SSE_CLASS;
       if (!(bit_offset % 64))
 	return 1;
@@ -22428,7 +22429,7 @@ ix86_libgcc_floating_mode_supported_p (s
      be defined by the C front-end for AVX512FP16 intrinsics.  We will
      issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
      enabled.  */
-  return ((mode == HFmode && TARGET_SSE2)
+  return (((mode == HFmode || mode == BFmode) && TARGET_SSE2)
 	  ? true
 	  : default_libgcc_floating_mode_supported_p (mode));
 }
@@ -22731,7 +22732,7 @@ ix86_mangle_type (const_tree type)
   switch (TYPE_MODE (type))
     {
     case E_BFmode:
-      return "u6__bf16";
+      return "DF16b";
     case E_HFmode:
       /* _Float16 is "DF16_".
 	 Align with clang's decision in https://reviews.llvm.org/D33719. */
@@ -22747,55 +22748,6 @@ ix86_mangle_type (const_tree type)
     }
 }
 
-/* Return the diagnostic message string if conversion from FROMTYPE to
-   TOTYPE is not allowed, NULL otherwise.  */
-
-static const char *
-ix86_invalid_conversion (const_tree fromtype, const_tree totype)
-{
-  if (element_mode (fromtype) != element_mode (totype))
-    {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<__bf16%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<__bf16%>");
-    }
-
-  /* Conversion allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-ix86_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the binary operation OP is
-   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
-
-static const char *
-ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
-			   const_tree type2)
-{
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 static GTY(()) tree ix86_tls_stack_chk_guard_decl;
 
 static tree
@@ -24853,15 +24805,6 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE ix86_mangle_type
 
-#undef TARGET_INVALID_CONVERSION
-#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
-
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
-
-#undef TARGET_INVALID_BINARY_OP
-#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
-
 #undef TARGET_STACK_PROTECT_GUARD
 #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
 
--- gcc/config/i386/i386-builtins.cc.jj	2022-10-01 21:44:52.478003286 +0200
+++ gcc/config/i386/i386-builtins.cc	2022-10-03 22:46:34.233786906 +0200
@@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
 static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
 
 tree ix86_float16_type_node = NULL_TREE;
-tree ix86_bf16_type_node = NULL_TREE;
 tree ix86_bf16_ptr_type_node = NULL_TREE;
 
 /* Retrieve an element from the above table, building some of
@@ -1372,16 +1371,18 @@ ix86_register_float16_builtin_type (void
 static void
 ix86_register_bf16_builtin_type (void)
 {
-  ix86_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (ix86_bf16_type_node) = 16;
-  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
-  layout_type (ix86_bf16_type_node);
+  if (bfloat16_type_node == NULL_TREE)
+    {
+      bfloat16_type_node = make_node (REAL_TYPE);
+      TYPE_PRECISION (bfloat16_type_node) = 16;
+      SET_TYPE_MODE (bfloat16_type_node, BFmode);
+      layout_type (bfloat16_type_node);
+    }
 
   if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
     {
-      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
-					    "__bf16");
-      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
+      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
     }
 }
 
--- gcc/config/i386/i386-builtin-types.def.jj	2022-10-01 21:44:52.476003314 +0200
+++ gcc/config/i386/i386-builtin-types.def	2022-10-03 22:46:34.233786906 +0200
@@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
 DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
-DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
+DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
--- gcc/config/i386/i386.md.jj	2022-09-05 23:25:28.627019050 +0200
+++ gcc/config/i386/i386.md	2022-10-03 22:46:34.239786826 +0200
@@ -1644,6 +1644,48 @@ (define_expand "cbranch<mode>4"
   DONE;
 })
 
+(define_expand "cbranchbf4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:BF 1 "cmp_fp_expander_operand")
+		    (match_operand:BF 2 "cmp_fp_expander_operand")))
+   (set (pc) (if_then_else
+	      (match_operator 0 "comparison_operator"
+	       [(reg:CC FLAGS_REG)
+		(const_int 0)])
+	      (label_ref (match_operand 3))
+	      (pc)))]
+  ""
+{
+  rtx op1 = gen_lowpart (HImode, operands[1]);
+  if (CONST_INT_P (op1))
+    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[1], BFmode);
+  else
+    {
+      rtx t1 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t1, op1));
+      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
+      op1 = gen_lowpart (SFmode, t1);
+    }
+  rtx op2 = gen_lowpart (HImode, operands[2]);
+  if (CONST_INT_P (op2))
+    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[2], BFmode);
+  else
+    {
+      rtx t2 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t2, op2));
+      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
+      op2 = gen_lowpart (SFmode, t2);
+    }
+  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
+			   SFmode, NULL_RTX, NULL,
+			   as_a <rtx_code_label *> (operands[3]),
+			   /* Unfortunately this isn't propagated.  */
+			   profile_probability::even ());
+  DONE;
+})
+
 (define_expand "cstorehf4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:HF 2 "cmp_fp_expander_operand")
@@ -1659,6 +1701,45 @@ (define_expand "cstorehf4"
   DONE;
 })
 
+(define_expand "cstorebf4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:BF 2 "cmp_fp_expander_operand")
+		    (match_operand:BF 3 "cmp_fp_expander_operand")))
+   (set (match_operand:QI 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	  [(reg:CC FLAGS_REG)
+	   (const_int 0)]))]
+  ""
+{
+  rtx op1 = gen_lowpart (HImode, operands[2]);
+  if (CONST_INT_P (op1))
+    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[2], BFmode);
+  else
+    {
+      rtx t1 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t1, op1));
+      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
+      op1 = gen_lowpart (SFmode, t1);
+    }
+  rtx op2 = gen_lowpart (HImode, operands[3]);
+  if (CONST_INT_P (op2))
+    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[3], BFmode);
+  else
+    {
+      rtx t2 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t2, op2));
+      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
+      op2 = gen_lowpart (SFmode, t2);
+    }
+  rtx res = emit_store_flag_force (operands[0], GET_CODE (operands[1]),
+				   op1, op2, SFmode, 0, 1);
+  if (!rtx_equal_p (res, operands[0]))
+    emit_move_insn (operands[0], res);
+  DONE;
+})
+
 (define_expand "cstore<mode>4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")
--- gcc/c-family/c-cppbuiltin.cc.jj	2022-10-03 22:45:46.041435824 +0200
+++ gcc/c-family/c-cppbuiltin.cc	2022-10-03 23:11:46.111410475 +0200
@@ -1264,6 +1264,13 @@ c_cpp_builtins (cpp_reader *pfile)
       builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
 				      csuffix, FLOATN_NX_TYPE_NODE (i));
     }
+  if (bfloat16_type_node && c_dialect_cxx ())
+    {
+      if (cxx_dialect > cxx20)
+	cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
+      builtin_define_float_constants ("BFLT16", "BF16", "%s",
+				      "BF16", bfloat16_type_node);
+    }
 
   /* For float.h.  */
   if (targetm.decimal_float_supported_p ())
@@ -1351,6 +1358,8 @@ c_cpp_builtins (cpp_reader *pfile)
 	  if (!targetm.scalar_mode_supported_p (mode)
 	      || !targetm.libgcc_floating_mode_supported_p (mode))
 	    continue;
+	  if (bfloat16_type_node && TYPE_MODE (bfloat16_type_node) == mode)
+	    continue;
 	  macro_name = XALLOCAVEC (char, name_len
 				   + sizeof ("__LIBGCC_HAS__MODE__"));
 	  sprintf (macro_name, "__LIBGCC_HAS_%s_MODE__", name);
--- gcc/c-family/c-lex.cc.jj	2022-10-03 22:46:14.597051320 +0200
+++ gcc/c-family/c-lex.cc	2022-10-03 22:46:34.240786812 +0200
@@ -1000,6 +1000,19 @@ interpret_float (const cpp_token *token,
 	  pedwarn (input_location, OPT_Wpedantic,
 		   "non-standard suffix on floating constant");
       }
+    else if ((flags & CPP_N_BFLOAT16) != 0 && c_dialect_cxx ())
+      {
+	type = bfloat16_type_node;
+	if (type == NULL_TREE)
+	  {
+	    error ("unsupported non-standard suffix on floating constant");
+	    return error_mark_node;
+	  }
+	if (cxx_dialect < cxx23)
+	  pedwarn (input_location, OPT_Wpedantic,
+		   "%<bf16%> or %<BF16%> suffix on floating constant only "
+		   "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
+      }
     else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
       type = long_double_type_node;
     else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
--- gcc/c/c-typeck.cc.jj	2022-09-25 22:22:03.963596917 +0200
+++ gcc/c/c-typeck.cc	2022-10-03 22:46:34.245786745 +0200
@@ -3676,6 +3676,9 @@ convert_arguments (location_t loc, vec<l
 		promote_float_arg = false;
 		break;
 	      }
+	  /* Don't promote __bf16 either.  */
+	  if (TYPE_MAIN_VARIANT (valtype) == bfloat16_type_node)
+	    promote_float_arg = false;
 	}
 
       if (type != NULL_TREE)
--- gcc/cp/cp-tree.h.jj	2022-10-03 22:46:23.896926090 +0200
+++ gcc/cp/cp-tree.h	2022-10-03 22:46:34.246786732 +0200
@@ -8702,6 +8702,8 @@ extended_float_type_p (tree type)
   for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
     if (type == FLOATN_TYPE_NODE (i))
       return true;
+  if (type == bfloat16_type_node)
+    return true;
   return false;
 }
 
--- gcc/cp/typeck.cc.jj	2022-10-01 21:44:52.497003028 +0200
+++ gcc/cp/typeck.cc	2022-10-03 22:46:34.249786691 +0200
@@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
       if (mv2 == FLOATN_NX_TYPE_NODE (i))
 	extended2 = i + 1;
     }
+  if (mv1 == bfloat16_type_node)
+    extended1 = true;
+  if (mv2 == bfloat16_type_node)
+    extended2 = true;
   if (extended2 && !extended1)
     {
       int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
@@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
   if (cnt > 1 && mv2 == long_double_type_node)
     return -2;
   /* Otherwise, they have equal rank, but extended types
-     (other than std::bfloat16_t) have higher subrank.  */
+     (other than std::bfloat16_t) have higher subrank.
+     std::bfloat16_t shouldn't have equal rank to any standard
+     floating point type.  */
   return 1;
 }
 
--- gcc/testsuite/lib/target-supports.exp.jj	2022-10-01 21:44:58.540920897 +0200
+++ gcc/testsuite/lib/target-supports.exp	2022-10-03 22:46:34.250786678 +0200
@@ -3416,6 +3416,22 @@ proc check_effective_target_base_quadflo
     return 1
 }
 
+# Return 1 if the target supports the __bf16 type, 0 otherwise.
+
+proc check_effective_target_bfloat16 {} {
+    return [check_no_compiler_messages_nocache bfloat16 object {
+	__bf16 foo (__bf16 x) { return x + x; }
+    } [add_options_for_bfloat16 ""]]
+}
+
+proc check_effective_target_bfloat16_runtime {} {
+    return [check_effective_target_bfloat16]
+}
+
+proc add_options_for_bfloat16 { flags } {
+    return "$flags"
+}
+
 # Return 1 if the target supports all four forms of fused multiply-add
 # (fma, fms, fnma, and fnms) for both float and double.
 
--- gcc/testsuite/gcc.dg/torture/bfloat16-basic.c.jj	2022-10-03 22:46:34.251786665 +0200
+++ gcc/testsuite/gcc.dg/torture/bfloat16-basic.c	2022-10-03 22:46:34.251786665 +0200
@@ -0,0 +1,11 @@
+/* Test __bf16.  */
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-add-options bfloat16 } */
+/* { dg-require-effective-target bfloat16_runtime } */
+
+#define TYPE __bf16
+#define CST(C) ((__bf16) (C))
+#define CSTU(C) CST(C)
+
+#include "floatn-basic.h"
--- gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c.jj	2022-10-03 22:46:34.251786665 +0200
+++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c	2022-10-03 22:46:34.251786665 +0200
@@ -0,0 +1,15 @@
+/* Test __bf16 built-in functions.  */
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-add-options bfloat16 } */
+/* { dg-add-options ieee } */
+/* { dg-require-effective-target bfloat16_runtime } */
+
+#define CONCATX(X, Y) X ## Y
+#define CONCAT(X, Y) CONCATX (X, Y)
+
+#define TYPE __bf16
+#define CST(C) ((__bf16) C)
+#define FN(F) CONCAT (F, bf16)
+
+#include "floatn-builtin.h"
--- gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c.jj	2022-10-03 22:46:34.251786665 +0200
+++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c	2022-10-03 22:46:34.251786665 +0200
@@ -0,0 +1,19 @@
+/* Test __bf16 __builtin_issignaling.  */
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-add-options bfloat16 } */
+/* { dg-add-options ieee } */
+/* { dg-require-effective-target bfloat16_runtime } */
+/* { dg-additional-options "-fsignaling-nans" } */
+/* Workaround for PR57484 on ia32: */
+/* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
+
+#define CONCATX(X, Y) X ## Y
+#define CONCAT(X, Y) CONCATX (X, Y)
+
+#define TYPE __bf16
+#define CST(C) ((__bf16) C)
+#define FN(F) CONCAT (F, bf16)
+#define EXT 0
+
+#include "builtin-issignaling-1.c"
--- gcc/testsuite/gcc.dg/torture/bfloat16-complex.c.jj	2022-10-03 22:46:34.251786665 +0200
+++ gcc/testsuite/gcc.dg/torture/bfloat16-complex.c	2022-10-03 22:46:34.251786665 +0200
@@ -0,0 +1,61 @@
+/* Test __bf16 complex arithmetic.  */
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-add-options bfloat16 } */
+/* { dg-require-effective-target bfloat16_runtime } */
+
+extern void exit (int);
+extern void abort (void);
+
+volatile __bf16 a = ((__bf16) 1.0);
+typedef _Complex float __cbf16 __attribute__((__mode__(__BC__)));
+volatile __cbf16 b = __builtin_complex (((__bf16) 2.0), ((__bf16) 3.0));
+volatile __cbf16 c = __builtin_complex (((__bf16) 2.0), ((__bf16) 3.0));
+volatile __cbf16 d = __builtin_complex (((__bf16) 2.0), ((__bf16) 3.0));
+
+__cbf16
+fn (__cbf16 arg)
+{
+  return arg / 4;
+}
+
+int
+main (void)
+{
+  volatile __cbf16 r;
+  if (b != c)
+    abort ();
+  if (b != d)
+    abort ();
+  r = a + b;
+  if (__real__ r != ((__bf16) 3.0) || __imag__ r != ((__bf16) 3.0))
+    abort ();
+  r += d;
+  if (__real__ r != ((__bf16) 5.0) || __imag__ r != ((__bf16) 6.0))
+    abort ();
+  r -= a;
+  if (__real__ r != ((__bf16) 4.0) || __imag__ r != ((__bf16) 6.0))
+    abort ();
+  r /= (a + a);
+  if (__real__ r != ((__bf16) 2.0) || __imag__ r != ((__bf16) 3.0))
+    abort ();
+  r *= (a + a);
+  if (__real__ r != ((__bf16) 4.0) || __imag__ r != ((__bf16) 6.0))
+    abort ();
+  r -= b;
+  if (__real__ r != ((__bf16) 2.0) || __imag__ r != ((__bf16) 3.0))
+    abort ();
+  r *= r;
+  if (__real__ r != -((__bf16) 5.0) || __imag__ r != ((__bf16) 12.0))
+    abort ();
+  /* Division may not be exact, so round result before comparing.  */
+  r /= b;
+  r += __builtin_complex (((__bf16) 100.0), ((__bf16) 100.0));
+  r -= __builtin_complex (((__bf16) 100.0), ((__bf16) 100.0));
+  if (r != b)
+    abort ();
+  r = fn (r);
+  if (__real__ r != ((__bf16) 0.5) || __imag__ r != ((__bf16) 0.75))
+    abort ();
+  exit (0);
+}
--- gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c.jj	2022-08-27 23:01:28.323565905 +0200
+++ gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c	2022-10-03 22:46:34.251786665 +0200
@@ -4,7 +4,7 @@
 /* Workaround for PR57484 on ia32: */
 /* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
 
-#ifndef EXT
+#if !defined(EXT) && !defined(TYPE)
 int
 f1 (void)
 {
@@ -41,19 +41,21 @@ f6 (long double x)
   return __builtin_issignaling (x);
 }
 #else
-#define CONCATX(X, Y) X ## Y
-#define CONCAT(X, Y) CONCATX (X, Y)
-#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
-#define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
-
-#if EXT
-# define TYPE CONCAT3 (_Float, WIDTH, x)
-# define CST(C) CONCAT4 (C, f, WIDTH, x)
-# define FN(F) CONCAT4 (F, f, WIDTH, x)
-#else
-# define TYPE CONCAT (_Float, WIDTH)
-# define CST(C) CONCAT3 (C, f, WIDTH)
-# define FN(F) CONCAT3 (F, f, WIDTH)
+#ifndef TYPE
+# define CONCATX(X, Y) X ## Y
+# define CONCAT(X, Y) CONCATX (X, Y)
+# define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
+# define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
+
+# if EXT
+#  define TYPE CONCAT3 (_Float, WIDTH, x)
+#  define CST(C) CONCAT4 (C, f, WIDTH, x)
+#  define FN(F) CONCAT4 (F, f, WIDTH, x)
+# else
+#  define TYPE CONCAT (_Float, WIDTH)
+#  define CST(C) CONCAT3 (C, f, WIDTH)
+#  define FN(F) CONCAT3 (F, f, WIDTH)
+# endif
 #endif
 
 int
--- gcc/testsuite/gcc.dg/torture/floatn-basic.h.jj	2020-01-14 20:02:47.411600427 +0100
+++ gcc/testsuite/gcc.dg/torture/floatn-basic.h	2022-10-03 22:46:34.251786665 +0200
@@ -9,14 +9,16 @@
 #define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
 #define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
 
-#if EXT
-# define TYPE CONCAT3 (_Float, WIDTH, x)
-# define CST(C) CONCAT4 (C, f, WIDTH, x)
-# define CSTU(C) CONCAT4 (C, F, WIDTH, x)
-#else
-# define TYPE CONCAT (_Float, WIDTH)
-# define CST(C) CONCAT3 (C, f, WIDTH)
-# define CSTU(C) CONCAT3 (C, F, WIDTH)
+#ifndef TYPE
+# if EXT
+#  define TYPE CONCAT3 (_Float, WIDTH, x)
+#  define CST(C) CONCAT4 (C, f, WIDTH, x)
+#  define CSTU(C) CONCAT4 (C, F, WIDTH, x)
+# else
+#  define TYPE CONCAT (_Float, WIDTH)
+#  define CST(C) CONCAT3 (C, f, WIDTH)
+#  define CSTU(C) CONCAT3 (C, F, WIDTH)
+# endif
 #endif
 
 extern void exit (int);
--- gcc/testsuite/gcc.dg/torture/floatn-builtin.h.jj	2020-01-14 20:02:47.412600412 +0100
+++ gcc/testsuite/gcc.dg/torture/floatn-builtin.h	2022-10-03 22:46:34.251786665 +0200
@@ -2,19 +2,21 @@
    built-in functions.  Before including this file, define WIDTH as
    the value N; define EXT to 1 for _FloatNx and 0 for _FloatN.  */
 
-#define CONCATX(X, Y) X ## Y
-#define CONCAT(X, Y) CONCATX (X, Y)
-#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
-#define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
+#ifndef TYPE
+# define CONCATX(X, Y) X ## Y
+# define CONCAT(X, Y) CONCATX (X, Y)
+# define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
+# define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
 
-#if EXT
-# define TYPE CONCAT3 (_Float, WIDTH, x)
-# define CST(C) CONCAT4 (C, f, WIDTH, x)
-# define FN(F) CONCAT4 (F, f, WIDTH, x)
-#else
-# define TYPE CONCAT (_Float, WIDTH)
-# define CST(C) CONCAT3 (C, f, WIDTH)
-# define FN(F) CONCAT3 (F, f, WIDTH)
+# if EXT
+#  define TYPE CONCAT3 (_Float, WIDTH, x)
+#  define CST(C) CONCAT4 (C, f, WIDTH, x)
+#  define FN(F) CONCAT4 (F, f, WIDTH, x)
+# else
+#  define TYPE CONCAT (_Float, WIDTH)
+#  define CST(C) CONCAT3 (C, f, WIDTH)
+#  define FN(F) CONCAT3 (F, f, WIDTH)
+# endif
 #endif
 
 extern void exit (int);
--- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c.jj	2022-10-01 21:44:52.519002730 +0200
+++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c	2022-10-03 22:46:34.252786651 +0200
@@ -45,19 +45,19 @@ __m256bf16 footest (__m256bf16 vector0)
   __m256bf16 vector2_1 = {};
   __m256bf16 vector2_2 = { glob_bfloat };
   __m256bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
-  __m256bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error "invalid conversion to type '__bf16'" } */
-
-  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256i initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __v16hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m256bf16 vector2_4 = { 0 };
+  __m256bf16 vector2_5 = { 0.1 };
+  __m256bf16 vector2_6 = { is_a_float16 };
+  __m256bf16 vector2_7 = { is_a_float };
+  __m256bf16 vector2_8 = { is_an_int };
+  __m256bf16 vector2_9 = { is_a_short_int };
+  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
+
+  __v8si initi_2_1 = { glob_bfloat };
+  __m256 initi_2_2 = { glob_bfloat };
+  __m256h initi_2_3 = { glob_bfloat };
+  __m256i initi_2_5 = { glob_bfloat };
+  __v16hi initi_2_6 = { glob_bfloat };
 
   /* Assignments to/from vectors.  */
 
@@ -79,25 +79,25 @@ __m256bf16 footest (__m256bf16 vector0)
   /* Assignments to/from elements.  */
 
   vector2_3[0] = glob_bfloat;
-  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_an_int;
+  vector2_3[0] = is_a_short_int;
+  vector2_3[0] = is_a_float;
+  vector2_3[0] = is_a_float16;
+  vector2_3[0] = 0;
+  vector2_3[0] = 0.1;
 
   glob_bfloat = vector2_3[0];
-  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_an_int = vector2_3[0];
+  is_a_short_int = vector2_3[0];
+  is_a_float = vector2_3[0];
+  is_a_float16 = vector2_3[0];
 
   /* Compound literals.  */
 
   (__m256bf16) {};
 
-  (__m256bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__m256bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (__m256bf16) { 0 };
+  (__m256bf16) { 0.1 };
   (__m256bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
   (__m256bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
   (__m256bf16) { is_a_long_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256i'} } */
@@ -176,16 +176,16 @@ __m256bf16 footest (__m256bf16 vector0)
   bfloat_ptr = &bfloat_ptr3[1];
 
   /* Simple comparison.  */
-  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > glob_bfloat_vec;
+  glob_bfloat_vec == vector0;
+  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
+  vector0 > 0;
+  0 == vector0;
+  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
+  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
+  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
 
   /* Pointer comparison.  */
 
@@ -224,24 +224,24 @@ __m256bf16 footest (__m256bf16 vector0)
 
   /* Unary operators.  */
 
-  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  +vector0;
+  -vector0;
+  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
+  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
   *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
-  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __real vector0; /* { dg-error {wrong type argument to __real} } */
+  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
+  ++vector0;
+  --vector0;
+  vector0++;
+  vector0--;
 
   /* Binary arithmetic operations.  */
 
-  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + *bfloat_ptr;
+  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
+  vector0 = glob_bfloat_vec + 0;
+  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
 
   return vector0;
 }
--- gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c.jj	2022-10-01 21:44:52.515002784 +0200
+++ gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c	2022-10-03 22:46:34.252786651 +0200
@@ -12,8 +12,8 @@ double is_a_double;
 
 float *float_ptr;
 
-__bf16 foo1 (void) { return (__bf16) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
-__bf16 foo2 (void) { return (__bf16) (short) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
+__bf16 foo1 (void) { return (__bf16) 0x1234; }
+__bf16 foo2 (void) { return (__bf16) (short) 0x1234; }
 
 __bf16 footest (__bf16 scalar0)
 {
@@ -22,87 +22,87 @@ __bf16 footest (__bf16 scalar0)
 
   __bf16 scalar1_1;
   __bf16 scalar1_2 = glob_bfloat;
-  __bf16 scalar1_3 = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_4 = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_5 = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_6 = is_an_int;  /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_7 = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_8 = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_9 = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  int initi_1_1 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  float initi_1_2 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  _Float16 initi_1_3 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  short initi_1_4 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  double initi_1_5 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __bf16 scalar1_3 = 0;
+  __bf16 scalar1_4 = 0.1;
+  __bf16 scalar1_5 = is_a_float;
+  __bf16 scalar1_6 = is_an_int;
+  __bf16 scalar1_7 = is_a_float16;
+  __bf16 scalar1_8 = is_a_double;
+  __bf16 scalar1_9 = is_a_short_int;
+
+  int initi_1_1 = glob_bfloat;
+  float initi_1_2 = glob_bfloat;
+  _Float16 initi_1_3 = glob_bfloat;
+  short initi_1_4 = glob_bfloat;
+  double initi_1_5 = glob_bfloat;
 
   __bf16 scalar2_1 = {};
   __bf16 scalar2_2 = { glob_bfloat };
-  __bf16 scalar2_3 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_4 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_5 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_6 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_7 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_8 = { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  int initi_2_1 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  float initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  _Float16 initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  short initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  double initi_2_5 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __bf16 scalar2_3 = { 0 };
+  __bf16 scalar2_4 = { 0.1 };
+  __bf16 scalar2_5 = { is_a_float };
+  __bf16 scalar2_6 = { is_an_int };
+  __bf16 scalar2_7 = { is_a_float16 };
+  __bf16 scalar2_8 = { is_a_double };
+  __bf16 scalar2_9 = { is_a_short_int };
+
+  int initi_2_1 = { glob_bfloat };
+  float initi_2_2 = { glob_bfloat };
+  _Float16 initi_2_3 = { glob_bfloat };
+  short initi_2_4 = { glob_bfloat };
+  double initi_2_5 = { glob_bfloat };
 
   /* Assignments.  */
 
   glob_bfloat = glob_bfloat;
-  glob_bfloat = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  is_an_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float16 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_double = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_short_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
+  glob_bfloat = 0;
+  glob_bfloat = 0.1;
+  glob_bfloat = is_a_float;
+  glob_bfloat = is_an_int;
+  glob_bfloat = is_a_float16;
+  glob_bfloat = is_a_double;
+  glob_bfloat = is_a_short_int;
+
+  is_an_int = glob_bfloat;
+  is_a_float = glob_bfloat;
+  is_a_float16 = glob_bfloat;
+  is_a_double = glob_bfloat;
+  is_a_short_int = glob_bfloat;
 
   /* Casting.  */
 
   (void) glob_bfloat;
   (__bf16) glob_bfloat;
 
-  (int) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (float) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (_Float16) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (double) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (short) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-
-  (__bf16) is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (int) glob_bfloat;
+  (float) glob_bfloat;
+  (_Float16) glob_bfloat;
+  (double) glob_bfloat;
+  (short) glob_bfloat;
+
+  (__bf16) is_an_int;
+  (__bf16) is_a_float;
+  (__bf16) is_a_float16;
+  (__bf16) is_a_double;
+  (__bf16) is_a_short_int;
 
   /* Compound literals.  */
 
   (__bf16) {};
   (__bf16) { glob_bfloat };
-  (__bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  (int) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (float) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (_Float16) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (double) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (short) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  (__bf16) { 0 };
+  (__bf16) { 0.1 };
+  (__bf16) { is_a_float };
+  (__bf16) { is_an_int };
+  (__bf16) { is_a_float16 };
+  (__bf16) { is_a_double };
+  (__bf16) { is_a_short_int };
+
+  (int) { glob_bfloat };
+  (float) { glob_bfloat };
+  (_Float16) { glob_bfloat };
+  (double) { glob_bfloat };
+  (short) { glob_bfloat };
 
   /* Arrays and Structs.  */
 
@@ -145,16 +145,16 @@ __bf16 footest (__bf16 scalar0)
   bfloat_ptr = &bfloat_ptr3[1];
 
   /* Simple comparison.  */
-  scalar0 > glob_bfloat; /* { dg-error {operation not permitted on type '__bf16'} } */
-  glob_bfloat == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_a_float == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0.1 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > is_an_int; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_an_int == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  scalar0 > glob_bfloat;
+  glob_bfloat == scalar0;
+  scalar0 > is_a_float;
+  is_a_float == scalar0;
+  scalar0 > 0;
+  0 == scalar0;
+  scalar0 > 0.1;
+  0.1 == scalar0;
+  scalar0 > is_an_int;
+  is_an_int == scalar0;
 
   /* Pointer comparison.  */
 
@@ -174,41 +174,41 @@ __bf16 footest (__bf16 scalar0)
   /* Conditional expressions.  */
 
   0 ? scalar0 : scalar0;
-  0 ? scalar0 : is_a_float; /* { dg-error {invalid conversion from type '__bf16'} } */
-  0 ? is_a_float : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
-  0 ? scalar0 : 0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  0 ? 0 : scalar0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  0 ? 0.1 : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
-  0 ? scalar0 : 0.1; /* { dg-error {invalid conversion from type '__bf16'} } */
+  0 ? scalar0 : is_a_float;
+  0 ? is_a_float : scalar0;
+  0 ? scalar0 : 0;
+  0 ? 0 : scalar0;
+  0 ? 0.1 : scalar0;
+  0 ? scalar0 : 0.1;
   0 ? bfloat_ptr : bfloat_ptr2;
   0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
   0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
 
-  scalar0 ? scalar0 : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 ? is_a_float : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 ? scalar0 : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 ? is_a_float : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
+  scalar0 ? scalar0 : scalar0;
+  scalar0 ? is_a_float : scalar0;
+  scalar0 ? scalar0 : is_a_float;
+  scalar0 ? is_a_float : is_a_float;
 
   /* Unary operators.  */
 
-  +scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  -scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ~scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  !scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  +scalar0;
+  -scalar0;
+  ~scalar0; /* { dg-error {wrong type argument to bit-complement} } */
+  !scalar0;
   *scalar0; /* { dg-error {invalid type argument of unary '\*'} } */
-  __real scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  __imag scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ++scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  --scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0++; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __real scalar0;
+  __imag scalar0;
+  ++scalar0;
+  --scalar0;
+  scalar0++;
+  scalar0--;
 
   /* Binary arithmetic operations.  */
 
-  scalar0 = glob_bfloat + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 = glob_bfloat + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 = glob_bfloat + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 = glob_bfloat + is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
+  scalar0 = glob_bfloat + *bfloat_ptr;
+  scalar0 = glob_bfloat + 0.1;
+  scalar0 = glob_bfloat + 0;
+  scalar0 = glob_bfloat + is_a_float;
 
   return scalar0;
 }
--- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c.jj	2022-10-01 21:44:52.517002757 +0200
+++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c	2022-10-03 22:46:34.252786651 +0200
@@ -48,20 +48,20 @@ __m128bf16 footest (__m128bf16 vector0)
   __m128bf16 vector2_1 = {};
   __m128bf16 vector2_2 = { glob_bfloat };
   __m128bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
-  __m128bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m128h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m128 initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __v4si initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __v4hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m128bf16 vector2_4 = { 0 };
+  __m128bf16 vector2_5 = { 0.1 };
+  __m128bf16 vector2_6 = { is_a_float16 };
+  __m128bf16 vector2_7 = { is_a_float };
+  __m128bf16 vector2_8 = { is_an_int };
+  __m128bf16 vector2_9 = { is_a_short_int };
+  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
+
+  __v8si initi_2_1 = { glob_bfloat };
+  __m256 initi_2_2 = { glob_bfloat };
+  __m128h initi_2_3 = { glob_bfloat };
+  __m128 initi_2_4 = { glob_bfloat };
+  __v4si initi_2_5 = { glob_bfloat };
+  __v4hi initi_2_6 = { glob_bfloat };
 
   /* Assignments to/from vectors.  */
 
@@ -85,25 +85,25 @@ __m128bf16 footest (__m128bf16 vector0)
   /* Assignments to/from elements.  */
 
   vector2_3[0] = glob_bfloat;
-  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_an_int;
+  vector2_3[0] = is_a_short_int;
+  vector2_3[0] = is_a_float;
+  vector2_3[0] = is_a_float16;
+  vector2_3[0] = 0;
+  vector2_3[0] = 0.1;
 
   glob_bfloat = vector2_3[0];
-  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_an_int = vector2_3[0];
+  is_a_short_int = vector2_3[0];
+  is_a_float = vector2_3[0];
+  is_a_float16 = vector2_3[0];
 
   /* Compound literals.  */
 
   (__m128bf16) {};
 
-  (__m128bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__m128bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (__m128bf16) { 0 };
+  (__m128bf16) { 0.1 };
   (__m128bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
   (__m128bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
   (__m128bf16) { is_a_float_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128'} } */
@@ -186,16 +186,16 @@ __m128bf16 footest (__m128bf16 vector0)
   bfloat_ptr = &bfloat_ptr3[1];
 
   /* Simple comparison.  */
-  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > glob_bfloat_vec;
+  glob_bfloat_vec == vector0;
+  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
+  vector0 > 0;
+  0 == vector0;
+  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
+  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
+  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
 
   /* Pointer comparison.  */
 
@@ -234,24 +234,24 @@ __m128bf16 footest (__m128bf16 vector0)
 
   /* Unary operators.  */
 
-  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  +vector0;
+  -vector0;
+  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
+  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
   *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
-  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __real vector0; /* { dg-error {wrong type argument to __real} } */
+  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
+  ++vector0;
+  --vector0;
+  vector0++;
+  vector0--;
 
   /* Binary arithmetic operations.  */
 
-  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + *bfloat_ptr;
+  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
+  vector0 = glob_bfloat_vec + 0;
+  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
 
   return vector0;
 }
--- gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C.jj	2022-10-01 21:44:52.512002825 +0200
+++ gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C	2022-10-03 22:46:34.252786651 +0200
@@ -5,6 +5,6 @@ void foo (void)
 {
   __bf16 (); /* { dg-bogus {invalid conversion to type '__bf16'} } */
   __bf16 a = __bf16(); /* { dg-bogus {invalid conversion to type '__bf16'} } */
-  __bf16 (0x1234); /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 (0.1); /* { dg-error {invalid conversion to type '__bf16'} } */
+  __bf16 (0x1234); /* { dg-bogus {invalid conversion to type '__bf16'} } */
+  __bf16 (0.1); /* { dg-bogus {invalid conversion to type '__bf16'} } */
 }
--- libcpp/include/cpplib.h.jj	2022-09-29 18:11:28.760749857 +0200
+++ libcpp/include/cpplib.h	2022-10-03 11:10:11.084028291 +0200
@@ -1275,6 +1275,7 @@ struct cpp_num
 #define CPP_N_USERDEF	0x1000000 /* C++11 user-defined literal.  */
 
 #define CPP_N_SIZE_T	0x2000000 /* C++23 size_t literal.  */
+#define CPP_N_BFLOAT16	0x4000000 /* std::bfloat16_t type.  */
 
 #define CPP_N_WIDTH_FLOATN_NX	0xF0000000 /* _FloatN / _FloatNx value
 					      of N, divided by 16.  */
--- libcpp/expr.cc.jj	2022-09-29 18:11:28.760749857 +0200
+++ libcpp/expr.cc	2022-10-03 11:10:11.107027980 +0200
@@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
   size_t orig_len = len;
   const uchar *orig_s = s;
   size_t flags;
-  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
+  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
 
   flags = 0;
-  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
+  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
 
   /* The following decimal float suffixes, from TR 24732:2009, TS
      18661-2:2015 and C2X, are supported:
@@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
      w, W - machine-specific type such as __float80 (GNU extension).
      q, Q - machine-specific type such as __float128 (GNU extension).
      fN, FN - _FloatN (TS 18661-3:2015).
-     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
+     fNx, FNx - _FloatNx (TS 18661-3:2015).
+     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
 
   /* Process decimal float suffixes, which are two letters starting
      with d or D.  Order and case are significant.  */
@@ -239,6 +240,20 @@ interpret_float_suffix (cpp_reader *pfil
 		fn++;
 	    }
 	  break;
+	case 'b': case 'B':
+	  if (len > 2
+	      /* Except for bf16 / BF16 where case is significant.  */
+	      && s[1] == (s[0] == 'b' ? 'f' : 'F')
+	      && s[2] == '1'
+	      && s[3] == '6'
+	      && CPP_OPTION (pfile, cplusplus))
+	    {
+	      bf16++;
+	      len -= 3;
+	      s += 3;
+	      break;
+	    }
+	  return 0;
 	case 'd': case 'D': d++; break;
 	case 'l': case 'L': l++; break;
 	case 'w': case 'W': w++; break;
@@ -257,7 +272,7 @@ interpret_float_suffix (cpp_reader *pfil
      of N larger than can be represented in the return value.  The
      caller is responsible for rejecting _FloatN suffixes where
      _FloatN is not supported on the chosen target.  */
-  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
+  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
     return 0;
   if (fn_bits > CPP_FLOATN_MAX)
     return 0;
@@ -295,6 +310,7 @@ interpret_float_suffix (cpp_reader *pfil
 	     q ? CPP_N_MD_Q :
 	     fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
 	     fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
+	     bf16 ? CPP_N_BFLOAT16 :
 	     CPP_N_DEFAULT));
 }
 
--- libgcc/config/i386/t-softfp.jj	2022-09-29 18:11:28.761749843 +0200
+++ libgcc/config/i386/t-softfp	2022-10-03 11:10:11.158027289 +0200
@@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
 libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
 LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
 
-softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
-softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
+softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
+softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
+		      tfbf xfbf dfbf sfbf hfbf
 
 softfp_extras += eqhf2
 
@@ -15,11 +16,17 @@ CFLAGS-extendhfsf2.c += -msse2
 CFLAGS-extendhfdf2.c += -msse2
 CFLAGS-extendhftf2.c += -msse2
 CFLAGS-extendhfxf2.c += -msse2
+CFLAGS-extendbfsf2.c += -msse2
 
 CFLAGS-truncsfhf2.c += -msse2
 CFLAGS-truncdfhf2.c += -msse2
 CFLAGS-truncxfhf2.c += -msse2
 CFLAGS-trunctfhf2.c += -msse2
+CFLAGS-truncsfbf2.c += -msse2
+CFLAGS-truncdfbf2.c += -msse2
+CFLAGS-truncxfbf2.c += -msse2
+CFLAGS-trunctfbf2.c += -msse2
+CFLAGS-trunchfbf2.c += -msse2
 
 CFLAGS-eqhf2.c += -msse2
 CFLAGS-_divhc3.c += -msse2
--- libgcc/config/i386/libgcc-glibc.ver.jj	2022-09-29 18:11:28.761749843 +0200
+++ libgcc/config/i386/libgcc-glibc.ver	2022-10-03 11:10:11.168027153 +0200
@@ -214,3 +214,13 @@ GCC_12.0.0 {
   __trunctfhf2
   __truncxfhf2
 }
+
+%inherit GCC_13.0.0 GCC_12.0.0
+GCC_13.0.0 {
+  __extendbfsf2
+  __truncdfbf2
+  __truncsfbf2
+  __trunctfbf2
+  __truncxfbf2
+  __trunchfbf2
+}
--- libgcc/config/i386/sfp-machine.h.jj	2022-09-29 18:11:28.761749843 +0200
+++ libgcc/config/i386/sfp-machine.h	2022-10-03 11:10:11.181026977 +0200
@@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
 #define _FP_QNANNEGATEDP 0
 
 #define _FP_NANSIGN_H		1
+#define _FP_NANSIGN_B		1
 #define _FP_NANSIGN_S		1
 #define _FP_NANSIGN_D		1
 #define _FP_NANSIGN_E		1
--- libgcc/config/i386/64/sfp-machine.h.jj	2022-09-29 18:11:28.761749843 +0200
+++ libgcc/config/i386/64/sfp-machine.h	2022-10-03 11:10:11.181026977 +0200
@@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D
 #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
--- libgcc/config/i386/32/sfp-machine.h.jj	2022-09-29 18:11:28.761749843 +0200
+++ libgcc/config/i386/32/sfp-machine.h	2022-10-03 11:10:11.182026963 +0200
@@ -87,6 +87,7 @@
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
 /* Even if XFmode is 12byte,  we have to pad it to
--- libgcc/soft-fp/brain.h.jj	2022-10-03 11:10:11.182026963 +0200
+++ libgcc/soft-fp/brain.h	2022-10-03 11:10:11.182026963 +0200
@@ -0,0 +1,172 @@
+/* Software floating-point emulation.
+   Definitions for Brain Floating Point format (bfloat16).
+   Copyright (C) 1997-2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef SOFT_FP_BRAIN_H
+#define SOFT_FP_BRAIN_H	1
+
+#if _FP_W_TYPE_SIZE < 32
+# error "Here's a nickel kid.  Go buy yourself a real computer."
+#endif
+
+#define _FP_FRACTBITS_B		(_FP_W_TYPE_SIZE)
+
+#define _FP_FRACTBITS_DW_B	(_FP_W_TYPE_SIZE)
+
+#define _FP_FRACBITS_B		8
+#define _FP_FRACXBITS_B		(_FP_FRACTBITS_B - _FP_FRACBITS_B)
+#define _FP_WFRACBITS_B		(_FP_WORKBITS + _FP_FRACBITS_B)
+#define _FP_WFRACXBITS_B	(_FP_FRACTBITS_B - _FP_WFRACBITS_B)
+#define _FP_EXPBITS_B		8
+#define _FP_EXPBIAS_B		127
+#define _FP_EXPMAX_B		255
+
+#define _FP_QNANBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
+#define _FP_QNANBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
+#define _FP_IMPLBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
+#define _FP_IMPLBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
+#define _FP_OVERFLOW_B		((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
+
+#define _FP_WFRACBITS_DW_B	(2 * _FP_WFRACBITS_B)
+#define _FP_WFRACXBITS_DW_B	(_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
+#define _FP_HIGHBIT_DW_B	\
+  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
+
+/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
+   chosen by the target machine.  */
+
+typedef float BFtype __attribute__ ((mode (BF)));
+
+union _FP_UNION_B
+{
+  BFtype flt;
+  struct _FP_STRUCT_LAYOUT
+  {
+#if __BYTE_ORDER == __BIG_ENDIAN
+    unsigned sign : 1;
+    unsigned exp  : _FP_EXPBITS_B;
+    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
+#else
+    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
+    unsigned exp  : _FP_EXPBITS_B;
+    unsigned sign : 1;
+#endif
+  } bits;
+};
+
+#define FP_DECL_B(X)		_FP_DECL (1, X)
+#define FP_UNPACK_RAW_B(X, val)	_FP_UNPACK_RAW_1 (B, X, (val))
+#define FP_UNPACK_RAW_BP(X, val)	_FP_UNPACK_RAW_1_P (B, X, (val))
+#define FP_PACK_RAW_B(val, X)	_FP_PACK_RAW_1 (B, (val), X)
+#define FP_PACK_RAW_BP(val, X)			\
+  do						\
+    {						\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_B(X, val)			\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1 (B, X, (val));		\
+      _FP_UNPACK_CANONICAL (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_BP(X, val)			\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1_P (B, X, (val));		\
+      _FP_UNPACK_CANONICAL (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_SEMIRAW_B(X, val)		\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1 (B, X, (val));		\
+      _FP_UNPACK_SEMIRAW (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_SEMIRAW_BP(X, val)		\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1_P (B, X, (val));		\
+      _FP_UNPACK_SEMIRAW (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_B(val, X)			\
+  do						\
+    {						\
+      _FP_PACK_CANONICAL (B, 1, X);		\
+      _FP_PACK_RAW_1 (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_BP(val, X)			\
+  do						\
+    {						\
+      _FP_PACK_CANONICAL (B, 1, X);		\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_SEMIRAW_B(val, X)		\
+  do						\
+    {						\
+      _FP_PACK_SEMIRAW (B, 1, X);		\
+      _FP_PACK_RAW_1 (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_SEMIRAW_BP(val, X)		\
+  do						\
+    {						\
+      _FP_PACK_SEMIRAW (B, 1, X);		\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_TO_INT_B(r, X, rsz, rsg)	_FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
+#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)	\
+  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
+#define FP_FROM_INT_B(X, r, rs, rt)	_FP_FROM_INT (B, 1, X, (r), (rs), rt)
+
+/* BFmode arithmetic is not implemented.  */
+
+#define _FP_FRAC_HIGH_B(X)	_FP_FRAC_HIGH_1 (X)
+#define _FP_FRAC_HIGH_RAW_B(X)	_FP_FRAC_HIGH_1 (X)
+#define _FP_FRAC_HIGH_DW_B(X)	_FP_FRAC_HIGH_1 (X)
+
+#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
+
+#endif /* !SOFT_FP_BRAIN_H */
--- libgcc/soft-fp/truncsfbf2.c.jj	2022-10-03 11:10:11.182026963 +0200
+++ libgcc/soft-fp/truncsfbf2.c	2022-10-03 11:10:11.182026963 +0200
@@ -0,0 +1,48 @@
+/* Software floating-point emulation.
+   Truncate IEEE single into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "single.h"
+
+BFtype
+__truncsfbf2 (SFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_S (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_S (A, a);
+  FP_TRUNC (B, S, 1, 1, R, A);
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncdfbf2.c.jj	2022-10-03 11:10:11.182026963 +0200
+++ libgcc/soft-fp/truncdfbf2.c	2022-10-03 11:10:11.182026963 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE double into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "double.h"
+
+BFtype
+__truncdfbf2 (DFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_D (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_D (A, a);
+#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
+  FP_TRUNC (B, D, 1, 2, R, A);
+#else
+  FP_TRUNC (B, D, 1, 1, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncxfbf2.c.jj	2022-10-03 11:10:11.183026950 +0200
+++ libgcc/soft-fp/truncxfbf2.c	2022-10-03 11:10:11.183026950 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE extended into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "extended.h"
+
+BFtype
+__truncxfbf2 (XFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_E (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_E (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (B, E, 1, 4, R, A);
+#else
+  FP_TRUNC (B, E, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/trunctfbf2.c.jj	2022-10-03 11:10:11.183026950 +0200
+++ libgcc/soft-fp/trunctfbf2.c	2022-10-03 11:10:11.183026950 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE quad into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "quad.h"
+
+BFtype
+__trunctfbf2 (TFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_Q (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_Q (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (B, Q, 1, 4, R, A);
+#else
+  FP_TRUNC (B, Q, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/trunchfbf2.c.jj	2022-10-03 11:10:11.183026950 +0200
+++ libgcc/soft-fp/trunchfbf2.c	2022-10-03 11:10:11.183026950 +0200
@@ -0,0 +1,58 @@
+/* Software floating-point emulation.
+   Truncate IEEE half into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "half.h"
+#include "single.h"
+
+/* BFtype and HFtype are unordered, neither is a superset or subset
+   of each other.  Convert HFtype to SFtype (lossless) and then
+   truncate to BFtype.  */
+
+BFtype
+__trunchfbf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_S (B);
+  FP_DECL_B (R);
+  SFtype b;
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_RAW_H (A, a);
+  FP_EXTEND (S, H, 1, 1, B, A);
+  FP_PACK_RAW_S (b, B);
+  FP_UNPACK_SEMIRAW_S (B, b);
+  FP_TRUNC (B, S, 1, 1, R, B);
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncbfhf2.c.jj	2022-10-03 11:10:11.183026950 +0200
+++ libgcc/soft-fp/truncbfhf2.c	2022-10-03 11:10:11.183026950 +0200
@@ -0,0 +1,75 @@
+/* Software floating-point emulation.
+   Truncate bfloat16 into IEEE half.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "brain.h"
+#include "single.h"
+
+/* BFtype and HFtype are unordered, neither is a superset or subset
+   of each other.  Convert BFtype to SFtype (lossless) and then
+   truncate to HFtype.  */
+
+HFtype
+__truncbfhf2 (BFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_B (A);
+  FP_DECL_S (B);
+  FP_DECL_H (R);
+  SFtype b;
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  /* Optimize BFtype to SFtype conversion to simple left shift
+     by 16 if possible, we don't need to raise exceptions on sNaN
+     here as the SFtype to HFtype truncation should do that too.  */
+  if (sizeof (BFtype) == 2
+      && sizeof (unsigned short) == 2
+      && sizeof (SFtype) == 4
+      && sizeof (unsigned int) == 4)
+    {
+      union { BFtype a; unsigned short b; } u1;
+      union { SFtype a; unsigned int b; } u2;
+      u1.a = a;
+      u2.b = (u1.b << 8) << 8;
+      b = u2.a;
+    }
+  else
+    {
+      FP_UNPACK_RAW_B (A, a);
+      FP_EXTEND (S, B, 1, 1, B, A);
+      FP_PACK_RAW_S (b, B);
+    }
+  FP_UNPACK_SEMIRAW_S (B, b);
+  FP_TRUNC (H, S, 1, 1, R, B);
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/extendbfsf2.c.jj	2022-10-03 11:10:11.183026950 +0200
+++ libgcc/soft-fp/extendbfsf2.c	2022-10-03 11:10:11.183026950 +0200
@@ -0,0 +1,49 @@
+/* Software floating-point emulation.
+   Return a bfloat16 converted to IEEE single.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "brain.h"
+#include "single.h"
+
+SFtype
+__extendbfsf2 (BFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_B (A);
+  FP_DECL_S (R);
+  SFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_B (A, a);
+  FP_EXTEND (S, B, 1, 1, R, A);
+  FP_PACK_RAW_S (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libiberty/cp-demangle.h.jj	2022-09-29 18:11:28.762749829 +0200
+++ libiberty/cp-demangle.h	2022-10-03 11:10:11.184026936 +0200
@@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
 extern const struct demangle_operator_info cplus_demangle_operators[];
 #endif
 
-#define D_BUILTIN_TYPE_COUNT (35)
+#define D_BUILTIN_TYPE_COUNT (36)
 
 CP_STATIC_IF_GLIBCPP_V3
 const struct demangle_builtin_type_info
--- libiberty/cp-demangle.c.jj	2022-09-29 18:11:28.762749829 +0200
+++ libiberty/cp-demangle.c	2022-10-03 11:39:01.324587895 +0200
@@ -2489,6 +2489,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
   /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
 	     D_PRINT_DEFAULT },
   /* 34 */ { NL ("_Float"),	NL ("_Float"),		D_PRINT_FLOAT },
+  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
 };
 
 CP_STATIC_IF_GLIBCPP_V3
@@ -2753,11 +2754,22 @@ cplus_demangle_type (struct d_info *di)
 
 	case 'F':
 	  /* DF<number>_ - _Float<number>.
-	     DF<number>x - _Float<number>x.  */
+	     DF<number>x - _Float<number>x
+	     DF16b - std::bfloat16_t.  */
 	  {
 	    int arg = d_number (di);
 	    char buf[12];
 	    char suffix = 0;
+	    if (d_peek_char (di) == 'b')
+	      {
+		if (arg != 16)
+		  return NULL;
+		d_advance (di, 1);
+		ret = d_make_builtin_type (di,
+					   &cplus_demangle_builtin_types[35]);
+		di->expansion += ret->u.s_builtin.type->len;
+		break;
+	      }
 	    if (d_peek_char (di) == 'x')
 	      suffix = 'x';
 	    if (!suffix && d_peek_char (di) != '_')
--- libiberty/testsuite/demangle-expected.jj	2022-09-29 18:11:28.762749829 +0200
+++ libiberty/testsuite/demangle-expected	2022-10-03 11:39:12.666434242 +0200
@@ -1249,6 +1249,10 @@ xxx
 _Z3xxxDF32xDF64xDF128xCDF32xVb
 xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
 xxx
+--format=auto --no-params
+_Z3xxxDF16b
+xxx(std::bfloat16_t)
+xxx
 # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
 --format=auto --no-params
 _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z
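
For readers following the demangler hunk above, the three DF forms it distinguishes can be sketched in a few lines of standalone C.  This is only an illustration, not libiberty code: decode_df is a hypothetical helper, and the output strings are simply the names the patch comment lists (_Float<number>, _Float<number>x, std::bfloat16_t).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libiberty: decode one "DF..." builtin
   type code at S into OUT.  Returns the number of characters consumed,
   or 0 if S does not start with a valid code.  */
static size_t
decode_df (const char *s, char *out, size_t outsz)
{
  char *end;
  long n;

  assert (s != NULL && out != NULL && outsz > 0);
  if (strncmp (s, "DF", 2) != 0 || s[2] < '0' || s[2] > '9')
    return 0;
  n = strtol (s + 2, &end, 10);

  if (*end == 'b')
    {
      /* DF16b is the only valid 'b' form, as in the hunk above.  */
      if (n != 16)
        return 0;
      snprintf (out, outsz, "std::bfloat16_t");
    }
  else if (*end == 'x')
    snprintf (out, outsz, "_Float%ldx", n);   /* DF<number>x */
  else if (*end == '_')
    snprintf (out, outsz, "_Float%ld", n);    /* DF<number>_ */
  else
    return 0;

  /* Digits plus the "DF" prefix plus the one terminating character.  */
  return (size_t) (end - s) + 1;
}
```

Calling decode_df on "DF16b" fills the buffer with std::bfloat16_t, while "DF32b" is rejected, mirroring the arg != 16 check in the patch.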


	Jakub


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support
  2022-10-04  9:06     ` [PATCH] middle-end, c++, i386, " Jakub Jelinek
@ 2022-10-04 15:54       ` Joseph Myers
  2022-10-04 21:50       ` Jason Merrill
  1 sibling, 0 replies; 22+ messages in thread
From: Joseph Myers @ 2022-10-04 15:54 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jason Merrill, Richard Biener, Jeff Law, Uros Bizjak, gcc-patches

On Tue, 4 Oct 2022, Jakub Jelinek via Gcc-patches wrote:

> Yet another problem is that I've only enabled the bf16/BF16 suffixes in
> C++, because for C they might clash with some later extension.  Am I right to
> worry about that, or do you think C will never standardize suffixes that
> would clash with that because C++ standardized the bf16/BF16 suffixes for
> something already?  If I could enable it, I'd always pedwarn for C for those

I think any C proposal to standardize something conflicting with C++ would 
get objections from the WG21 liaison.

> Another question is the suffixes of the builtins.  For now I have added
> bf16 suffix and enabled the builtins with !both_p, so one always needs to
> use __builtin_* form for them.  None of the GCC builtins end with b,
> so this isn't ambiguous with __builtin_*f16, but some libm functions do end
> with b, in particular ilogb, logb and f{??,??x}sub.  ilogb and the subs
> always have it, but is __builtin_logbf16 f16 suffixed logb or bf16 suffixed
> log?  Shall the builtins use f16b suffixes instead like the mangling does?

Indeed, that conflict means bf16 isn't suitable for the built-in function 
suffix.
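
The clash can be made concrete with a small sketch.  Everything here is hypothetical (the known_bases table is a tiny illustrative subset and count_readings is not a GCC function); it just counts how many ways a builtin name splits into a known base name plus a type suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative base names only; a real compiler knows many more.  */
static const char *const known_bases[] = { "log", "logb", "ilogb", "sqrt" };

/* Return 1 if the first LEN characters of NAME are a known base name.  */
static int
is_known_base (const char *name, size_t len)
{
  size_t i;

  for (i = 0; i < sizeof known_bases / sizeof known_bases[0]; i++)
    if (strlen (known_bases[i]) == len
        && strncmp (known_bases[i], name, len) == 0)
      return 1;
  return 0;
}

/* Count the ways NAME can be read as <known base><one of SUFFIXES>.  */
static int
count_readings (const char *name, const char *const *suffixes,
                size_t nsuffixes)
{
  size_t n, i;
  int readings = 0;

  assert (name != NULL && suffixes != NULL);
  n = strlen (name);
  for (i = 0; i < nsuffixes; i++)
    {
      size_t s = strlen (suffixes[i]);

      if (n > s
          && strcmp (name + n - s, suffixes[i]) == 0
          && is_known_base (name, n - s))
        readings++;
    }
  return readings;
}
```

With the suffix set { "f16", "bf16" }, the name logbf16 has two readings (logb + f16 and log + bf16); with { "f16", "f16b" } each name in this small table has exactly one.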

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support
  2022-10-04  9:06     ` [PATCH] middle-end, c++, i386, " Jakub Jelinek
  2022-10-04 15:54       ` Joseph Myers
@ 2022-10-04 21:50       ` Jason Merrill
  2022-10-05 13:47         ` Jakub Jelinek
  1 sibling, 1 reply; 22+ messages in thread
From: Jason Merrill @ 2022-10-04 21:50 UTC (permalink / raw)
  To: Jakub Jelinek, Joseph S. Myers, Richard Biener, Jeff Law, Uros Bizjak
  Cc: gcc-patches

On 10/4/22 05:06, Jakub Jelinek wrote:
> On Fri, Sep 30, 2022 at 04:08:10PM +0200, Jakub Jelinek via Gcc-patches wrote:
>> On Fri, Sep 30, 2022 at 09:49:08AM -0400, Jason Merrill wrote:
>>> The comment from Apple on the ABI mangling proposal suggests to me that we
>>> might want to delay enabling C++ std::bfloat16_t (i.e. defining
>>> __STDCPP_BFLOAT16_T__) until we have that excess precision support?
>>
>> I saw that comment.  We have a similar problem with _Float16 too, where C++
>> effectively right now works as when one uses -fexcess-precision=16 in C
>> (which isn't default).
>> I can see how hard it would be to add EXCESS_PRECISION_EXPR support to C++
>> FE.
> 
> I've started on that but it will take some time.  That said, it should
> work, though less efficiently, even without that; even in C, users can always
> request such behavior with -fexcess-precision=16.
> 
>>> If we're using DF32x for _Float32x, maybe we want DF16b for bfloat16?
>>
>> Perhaps, I just followed what was in the pull request.  Can change it.
> 
> Changed now, added support for the builtins and ported most of the
> float16 tests, so that it gets at least some test coverage.
> Also, for now I've left the aarch64 and arm changes out of the patch,
> because I haven't tested it on aarch64 yet and arm support was incomplete
> and I haven't heard from the ARM maintainers yet what they want or don't
> want.
> 
> The added testcases showed a few problems.  One is that i?86 distinguishes
> two kinds of fp comparisons, trivial and non-trivial.  The trivial ones,
> which need just a single conditional jump or setCC, are handled directly,
> while the complex ones, which need two, are not handled and the generic
> code then synthesizes them from the trivial ones.  Unfortunately this means
> that for == and != we end up with libcalls.  For _Float16, we
> added __nehf2 and __eqhf2 entrypoints last year.  I wanted to avoid doing
> the same for __bf16, so I've added cbranchbf4 and cstorebf4 expanders
> that handle all fp comparisons and internally just shift the operands up
> to construct SFmode without even handling sNaNs and then call the generic
> code to handle SFmode comparisons.
> 
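
A minimal sketch of the widening idea described above, written as plain C
rather than as RTL expanders.  The helper names (bf16_bits_to_float, bf16_eq,
bf16_lt) are hypothetical and not part of the patch; the sketch only assumes
that a bfloat16 value is the upper 16 bits of an IEEE single.  Like the
expanders, it does nothing special for sNaNs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a bfloat16 bit pattern to float by placing
   it in the upper 16 bits of a 32-bit word (zero-extend, then shift).  */
static float bf16_bits_to_float(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float f;
    memcpy(&f, &wide, sizeof f);   /* reinterpret the bits as float */
    return f;
}

/* Comparisons are then ordinary float comparisons on the widened values.  */
static bool bf16_eq(uint16_t a, uint16_t b)
{
    return bf16_bits_to_float(a) == bf16_bits_to_float(b);
}

static bool bf16_lt(uint16_t a, uint16_t b)
{
    return bf16_bits_to_float(a) < bf16_bits_to_float(b);
}
```

Because the comparison is done on floats, +0 and -0 compare equal and a NaN
compares unequal to itself, which a comparison of the raw 16-bit integers
would get wrong.
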
> Another problem is for HFmode comparisons, when we see we don't support
> directly some HFmode comparison, we iterate on wider scalar float modes
> and look for usable comparisons, but BFmode and HFmode are unordered and
> one of them has to appear as wider but neither is a subset nor superset,
> so I had to skip wider modes which have equal precision to the starting one.
> Yet another problem is that I've only enabled the bf16/BF16 suffixes in
> C++, because for C they might clash with some later extension.  Am I right to
> worry about that, or do you think C will never standardize suffixes that
> would clash with that because C++ standardized the bf16/BF16 suffixes for
> something already?  If I could enable it, I'd always pedwarn for C for those
> and could enable the __BF16_*__ macros.  Right now I had to disable some
> -fbuilding-libgcc macros because of that (though nothing really uses them
> right now).
> 
> Another question is the suffixes of the builtins.  For now I have added
> bf16 suffix and enabled the builtins with !both_p, so one always needs to
> use __builtin_* form for them.  None of the GCC builtins end with b,
> so this isn't ambiguous with __builtin_*f16, but some libm functions do end
> with b, in particular ilogb, logb and f{??,??x}sub.  ilogb and the subs
> always have it, but is __builtin_logbf16 f16 suffixed logb or bf16 suffixed
> log?  Shall the builtins use f16b suffixes instead like the mangling does?

Do we want bf16 builtins at all?  The impression I've gotten is that 
users want computation to happen in SFmode and only later truncate back 
to BFmode.
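
To make that model concrete, here is a rough C sketch of "compute in float,
truncate at the end".  The function names are hypothetical; the rounding step
reuses the formula from the expr.cc comment in the patch,
(fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16, and so, like that fast path, it
assumes round-to-nearest and no NaN inputs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bfloat16 bits -> float (shift into the top half).  */
static float bf16_bits_to_float(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float f;
    memcpy(&f, &wide, sizeof f);
    return f;
}

/* Hypothetical helper: float -> bfloat16 bits, round to nearest, ties to
   even.  Not valid for NaNs (the payload could carry into the exponent).  */
static uint16_t float_to_bf16_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    u += 0x7fffu + ((u >> 16) & 1u);   /* bias so that ties go to even */
    return (uint16_t)(u >> 16);
}

/* One addition: widen both operands, add in float, truncate once.  */
static uint16_t bf16_add(uint16_t a, uint16_t b)
{
    return float_to_bf16_bits(bf16_bits_to_float(a) + bf16_bits_to_float(b));
}
```

With excess precision, a longer expression would keep the intermediate values
in float and call the truncation only on the final assignment or cast, rather
than after every operation as bf16_add does.
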

> Full patch bootstrapped/regtested on x86_64-linux and i686-linux.
> 
> 2022-10-04  Jakub Jelinek  <jakub@redhat.com>
> 
> gcc/
> 	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
> 	* tree.h (bfloat16_type_node): Define.
> 	(CASE_FLT_FN_FLOATN_NX): Also include BUILT_IN_*BF16.
> 	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
> 	like float16_type_mode.
> 	(build_common_tree_nodes): Initialize bfloat16_type_node if
> 	BFmode is supported.
> 	* expmed.h (maybe_expand_shift): Declare.
> 	* expmed.cc (maybe_expand_shift): No longer static.
> 	(emit_store_flag_1): Don't consider [BH]Fmode as wider mode to
> 	narrower modes.
> 	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
> 	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
> 	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
> 	-ffast-math generic implementation for BF -> SF and SF -> BF
> 	conversions.
> 	* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16,
> 	BT_FN_BFLOAT16_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING,
> 	BT_FN_BFLOAT16_BFLOAT16_BFLOAT16,
> 	BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16): New.
> 	* builtins.def (DEF_GCC_FLOATN_NX_BUILTINS,
> 	DEF_EXT_LIB_FLOATN_NX_BUILTINS): Also add *bf16 suffixed builtins,
> 	but for these only __builtin_ prefixed functions.
> 	* optabs.cc (can_compare_p, prepare_cmp_insn): Don't consider
> 	[BH]Fmode as wider mode to narrower modes.
> 	* config/i386/i386.cc (classify_argument): Handle E_BCmode.
> 	(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
> 	for -msse2.
> 	(ix86_mangle_type): Mangle BFmode as DF16b.
> 	(ix86_invalid_conversion, ix86_invalid_unary_op,
> 	ix86_invalid_binary_op): Remove.
> 	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
> 	TARGET_INVALID_BINARY_OP): Don't redefine.
> 	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
> 	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
> 	ix86_bf16_type_node, only create it if still NULL.
> 	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
> 	* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
> gcc/c-family/
> 	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
> 	predefine for C++ __BFLT16_*__ macros and for C++23 also
> 	__STDCPP_BFLOAT16_T__.
> 	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16 for C++.
> gcc/c/
> 	* c-typeck.cc (convert_arguments): Don't promote __bf16 to
> 	double.
> gcc/cp/
> 	* cp-tree.h (extended_float_type_p): Return true for
> 	bfloat16_type_node.
> 	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
> 	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
> gcc/testsuite/
> 	* lib/target-supports.exp (check_effective_target_bfloat16,
> 	check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
> 	New.
> 	* gcc.dg/torture/bfloat16-basic.c: New test.
> 	* gcc.dg/torture/bfloat16-builtin.c: New test.
> 	* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
> 	* gcc.dg/torture/bfloat16-complex.c: New test.
> 	* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
> 	from bfloat16-builtin-issignaling-1.c.
> 	* gcc.dg/torture/floatn-basic.h: Allow to be includable from
> 	bfloat16-basic.c.
> 	* gcc.dg/torture/floatn-builtin.h: Allow to be includable from
> 	bfloat16-builtin.c.
> 	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
> 	diagnostics.
> 	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
> 	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
> 	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
> libcpp/
> 	* include/cpplib.h (CPP_N_BFLOAT16): Define.
> 	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
> 	C++.
> libgcc/
> 	* config/i386/t-softfp (softfp_extensions): Add bfsf.
> 	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
> 	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
> 	CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
> 	-msse2.
> 	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
> 	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
> 	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
> 	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* soft-fp/brain.h: New file.
> 	* soft-fp/truncsfbf2.c: New file.
> 	* soft-fp/truncdfbf2.c: New file.
> 	* soft-fp/truncxfbf2.c: New file.
> 	* soft-fp/trunctfbf2.c: New file.
> 	* soft-fp/trunchfbf2.c: New file.
> 	* soft-fp/truncbfhf2.c: New file.
> 	* soft-fp/extendbfsf2.c: New file.
> libiberty/
> 	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
> 	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
> 	entry.
> 	(cplus_demangle_type): Demangle DF16b.
> 	* testsuite/demangle-expected (_Z3xxxDF16b): New test.
> 
> --- gcc/tree-core.h.jj	2022-10-01 21:44:52.521002702 +0200
> +++ gcc/tree-core.h	2022-10-03 22:46:34.218787107 +0200
> @@ -665,6 +665,9 @@ enum tree_index {
>     TI_DOUBLE_TYPE,
>     TI_LONG_DOUBLE_TYPE,
>   
> +  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
> +  TI_BFLOAT16_TYPE,
> +
>     /* The _FloatN and _FloatNx types must be consecutive, and in the
>        same sequence as the corresponding complex types, which must also
>        be consecutive; _FloatN must come before _FloatNx; the order must
> --- gcc/tree.h.jj	2022-10-01 21:44:52.525002648 +0200
> +++ gcc/tree.h	2022-10-03 22:46:34.220787080 +0200
> @@ -279,7 +279,7 @@ code_helper::is_builtin_fn () const
>   #define CASE_FLT_FN(FN) case FN: case FN##F: case FN##L
>   #define CASE_FLT_FN_FLOATN_NX(FN)			   \
>     case FN##F16: case FN##F32: case FN##F64: case FN##F128: \
> -  case FN##F32X: case FN##F64X: case FN##F128X
> +  case FN##F32X: case FN##F64X: case FN##F128X: case FN##BF16
>   #define CASE_FLT_FN_REENT(FN) case FN##_R: case FN##F_R: case FN##L_R
>   #define CASE_INT_FN(FN) case FN: case FN##L: case FN##LL: case FN##IMAX
>   
> @@ -4285,6 +4285,7 @@ tree_strip_any_location_wrapper (tree ex
>   #define float_type_node			global_trees[TI_FLOAT_TYPE]
>   #define double_type_node		global_trees[TI_DOUBLE_TYPE]
>   #define long_double_type_node		global_trees[TI_LONG_DOUBLE_TYPE]
> +#define bfloat16_type_node		global_trees[TI_BFLOAT16_TYPE]
>   
>   /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
>   #define FLOATN_TYPE_NODE(IDX)		global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
> --- gcc/tree.cc.jj	2022-10-01 21:44:52.524002662 +0200
> +++ gcc/tree.cc	2022-10-03 22:46:34.223787040 +0200
> @@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
>       = (flag_excess_precision == EXCESS_PRECISION_FAST
>          ? EXCESS_PRECISION_TYPE_FAST
>          : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
> -	  ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
> +	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
>   
>     enum flt_eval_method target_flt_eval_method
>       = targetm.c.excess_precision (requested_type);
> @@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
>     machine_mode float16_type_mode = (float16_type_node
>   				    ? TYPE_MODE (float16_type_node)
>   				    : VOIDmode);
> +  machine_mode bfloat16_type_mode = (bfloat16_type_node
> +				     ? TYPE_MODE (bfloat16_type_node)
> +				     : VOIDmode);
>     machine_mode float_type_mode = TYPE_MODE (float_type_node);
>     machine_mode double_type_mode = TYPE_MODE (double_type_node);
>   
> @@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
>   	switch (target_flt_eval_method)
>   	  {
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	    if (type_mode == float16_type_mode)
> +	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode)
>   	      return float_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode)
>   	      return double_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode
>   		|| type_mode == double_type_mode)
>   	      return long_double_type_node;
> @@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
>   	switch (target_flt_eval_method)
>   	  {
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	    if (type_mode == float16_type_mode)
> +	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode)
>   	      return complex_float_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode)
>   	      return complex_double_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode
>   		|| type_mode == double_type_mode)
>   	      return complex_long_double_type_node;
> @@ -9462,6 +9471,17 @@ build_common_tree_nodes (bool signed_cha
>         SET_TYPE_MODE (FLOATN_NX_TYPE_NODE (i), mode);
>       }
>     float128t_type_node = float128_type_node;
> +#ifdef HAVE_BFmode
> +  if (REAL_MODE_FORMAT (BFmode) == &arm_bfloat_half_format
> +      && targetm.scalar_mode_supported_p (BFmode)
> +      && targetm.libgcc_floating_mode_supported_p (BFmode))
> +    {
> +      bfloat16_type_node = make_node (REAL_TYPE);
> +      TYPE_PRECISION (bfloat16_type_node) = GET_MODE_PRECISION (BFmode);
> +      layout_type (bfloat16_type_node);
> +      SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +    }
> +#endif
>   
>     float_ptr_type_node = build_pointer_type (float_type_node);
>     double_ptr_type_node = build_pointer_type (double_type_node);
> --- gcc/expmed.h.jj	2022-10-01 21:44:52.503002947 +0200
> +++ gcc/expmed.h	2022-10-03 22:46:34.223787040 +0200
> @@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
>   				  rtx, tree, rtx, int);
>   extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
>   			 int);
> +extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
> +			       int);
>   #ifdef GCC_OPTABS_H
>   extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
>   			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
> --- gcc/expmed.cc.jj	2022-10-01 21:44:52.501002974 +0200
> +++ gcc/expmed.cc	2022-10-03 22:59:19.176483448 +0200
> @@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
>   
>   /* Likewise, but return 0 if that cannot be done.  */
>   
> -static rtx
> +rtx
>   maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>   		    int amount, rtx target, int unsignedp)
>   {
> @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
>       {
>        machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
>        icode = optab_handler (cstore_optab, optab_mode);
> -     if (icode != CODE_FOR_nothing)
> +     if (icode != CODE_FOR_nothing
> +	 /* Don't consider [BH]Fmode as usable wider mode, as neither is
> +	    a subset or superset of the other.  */
> +	 && (compare_mode == mode
> +	     || !SCALAR_FLOAT_MODE_P (compare_mode)
> +	     || maybe_ne (GET_MODE_PRECISION (compare_mode),
> +			  GET_MODE_PRECISION (mode))))

Why do you need to do this here (and in prepare_cmp_insn, and similarly 
in can_compare_p)?  Shouldn't get_wider skip over modes that are not 
actually wider?

>   	{
>   	  do_pending_stack_adjust ();
>   	  rtx tem = emit_cstore (target, icode, code, mode, compare_mode,
> --- gcc/expr.cc.jj	2022-10-01 21:44:52.506002906 +0200
> +++ gcc/expr.cc	2022-10-03 22:46:34.226787000 +0200
> @@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
>         gcc_assert ((GET_MODE_PRECISION (from_mode)
>   		   != GET_MODE_PRECISION (to_mode))
>   		  || (DECIMAL_FLOAT_MODE_P (from_mode)
> -		      != DECIMAL_FLOAT_MODE_P (to_mode)));
> +		      != DECIMAL_FLOAT_MODE_P (to_mode))
> +		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> +		      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> +		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +		      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
>   
>         if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
>   	/* Conversion between decimal float and binary float, same size.  */
> @@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
>   	  return;
>   	}
>   
> +#ifdef HAVE_SFmode
> +      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> +	  && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
> +	{
> +	  if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
> +	    {
> +	      /* To cut down on libgcc size, implement
> +		 BFmode -> {DF,XF,TF}mode conversions by
> +		 BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
> +	      rtx temp = gen_reg_rtx (SFmode);
> +	      convert_mode_scalar (temp, from, unsignedp);
> +	      convert_mode_scalar (to, temp, unsignedp);
> +	      return;
> +	    }
> +	  if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> +	    {
> +	      /* Similarly, implement BFmode -> HFmode as
> +		 BFmode -> SFmode -> HFmode conversion where SFmode
> +		 has superset of BFmode values.  We don't need
> +		 to handle sNaNs by raising exception and turning
> +		 into qNaN though, as that can be done in the
> +		 SFmode -> HFmode conversion too.  */
> +	      rtx temp = gen_reg_rtx (SFmode);
> +	      int save_flag_finite_math_only = flag_finite_math_only;
> +	      flag_finite_math_only = true;
> +	      convert_mode_scalar (temp, from, unsignedp);
> +	      flag_finite_math_only = save_flag_finite_math_only;
> +	      convert_mode_scalar (to, temp, unsignedp);
> +	      return;
> +	    }
> +	  if (to_mode == SFmode
> +	      && !HONOR_NANS (from_mode)
> +	      && !HONOR_NANS (to_mode)
> +	      && optimize_insn_for_speed_p ())
> +	    {
> +	      /* If we don't expect sNaNs, for BFmode -> SFmode we can just
> +		 shift the bits up.  */
> +	      machine_mode fromi_mode, toi_mode;
> +	      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> +				     0).exists (&fromi_mode)
> +		  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> +					0).exists (&toi_mode))
> +		{
> +		  start_sequence ();
> +		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> +		  rtx tof = NULL_RTX;
> +		  if (fromi)
> +		    {
> +		      rtx toi = gen_reg_rtx (toi_mode);
> +		      convert_mode_scalar (toi, fromi, 1);
> +		      toi
> +			= maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
> +					      GET_MODE_PRECISION (to_mode)
> +					      - GET_MODE_PRECISION (from_mode),
> +					      NULL_RTX, 1);
> +		      if (toi)
> +			{
> +			  tof = lowpart_subreg (to_mode, toi, toi_mode);
> +			  if (tof)
> +			    emit_move_insn (to, tof);
> +			}
> +		    }
> +		  insns = get_insns ();
> +		  end_sequence ();
> +		  if (tof)
> +		    {
> +		      emit_insn (insns);
> +		      return;
> +		    }
> +		}
> +	    }
> +	}
> +      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
> +	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +	  && !HONOR_NANS (from_mode)
> +	  && !HONOR_NANS (to_mode)
> +	  && !flag_rounding_math
> +	  && optimize_insn_for_speed_p ())
> +	{
> +	  /* If we don't expect qNaNs nor sNaNs and can assume rounding
> +	     to nearest, we can expand the conversion inline as
> +	     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
> +	  machine_mode fromi_mode, toi_mode;
> +	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> +				 0).exists (&fromi_mode)
> +	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> +				    0).exists (&toi_mode))
> +	    {
> +	      start_sequence ();
> +	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> +	      rtx tof = NULL_RTX;
> +	      do
> +		{
> +		  if (!fromi)
> +		    break;
> +		  int shift = (GET_MODE_PRECISION (from_mode)
> +			       - GET_MODE_PRECISION (to_mode));
> +		  rtx temp1
> +		    = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
> +					  shift, NULL_RTX, 1);
> +		  if (!temp1)
> +		    break;
> +		  rtx temp2
> +		    = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
> +				    NULL_RTX, 1, OPTAB_DIRECT);
> +		  if (!temp2)
> +		    break;
> +		  rtx temp3
> +		    = expand_binop (fromi_mode, add_optab, fromi,
> +				    gen_int_mode ((HOST_WIDE_INT_1U
> +						   << (shift - 1)) - 1,
> +						  fromi_mode), NULL_RTX,
> +				    1, OPTAB_DIRECT);
> +		  if (!temp3)
> +		    break;
> +		  rtx temp4
> +		    = expand_binop (fromi_mode, add_optab, temp3, temp2,
> +				    NULL_RTX, 1, OPTAB_DIRECT);
> +		  if (!temp4)
> +		    break;
> +		  rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
> +						  temp4, shift, NULL_RTX, 1);
> +		  if (!temp5)
> +		    break;
> +		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
> +		  if (!temp6)
> +		    break;
> +		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
> +					toi_mode);
> +		  if (tof)
> +		    emit_move_insn (to, tof);
> +		}
> +	      while (0);
> +	      insns = get_insns ();
> +	      end_sequence ();
> +	      if (tof)
> +		{
> +		  emit_insn (insns);
> +		  return;
> +		}
> +	    }
> +	}
> +#endif
> +
>         /* Otherwise use a libcall.  */
>         libcall = convert_optab_libfunc (tab, to_mode, from_mode);
>   
> --- gcc/builtin-types.def.jj	2022-01-11 22:31:40.590769786 +0100
> +++ gcc/builtin-types.def	2022-10-03 22:46:34.227786987 +0200
> @@ -82,6 +82,9 @@ DEF_PRIMITIVE_TYPE (BT_UNWINDWORD, (*lan
>   DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
>   DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
>   DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
> +DEF_PRIMITIVE_TYPE (BT_BFLOAT16, (bfloat16_type_node
> +				  ? bfloat16_type_node
> +				  : error_mark_node))
>   DEF_PRIMITIVE_TYPE (BT_FLOAT16, (float16_type_node
>   				 ? float16_type_node
>   				 : error_mark_node))
> @@ -187,6 +190,7 @@ DEF_FUNCTION_TYPE_0 (BT_FN_DOUBLE, BT_DO
>      distinguish it from two types in sequence, "long" followed by
>      "double".  */
>   DEF_FUNCTION_TYPE_0 (BT_FN_LONGDOUBLE, BT_LONGDOUBLE)
> +DEF_FUNCTION_TYPE_0 (BT_FN_BFLOAT16, BT_BFLOAT16)
>   DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT16, BT_FLOAT16)
>   DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT32, BT_FLOAT32)
>   DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT64, BT_FLOAT64)
> @@ -206,6 +210,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_DOUBLE
>   DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_LONGDOUBLE,
>   		     BT_LONGDOUBLE, BT_LONGDOUBLE)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_FLOAT16, BT_FLOAT16, BT_FLOAT16)
> +DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_FLOAT32, BT_FLOAT32, BT_FLOAT32)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_FLOAT64, BT_FLOAT64, BT_FLOAT64)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT128_FLOAT128, BT_FLOAT128, BT_FLOAT128)
> @@ -264,6 +269,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_CONST_S
>   DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_CONST_STRING, BT_DOUBLE, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_CONST_STRING,
>   		     BT_LONGDOUBLE, BT_CONST_STRING)
> +DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_CONST_STRING, BT_BFLOAT16, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_CONST_STRING, BT_FLOAT16, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_CONST_STRING, BT_FLOAT32, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_CONST_STRING, BT_FLOAT64, BT_CONST_STRING)
> @@ -401,6 +407,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE
>   		     BT_DOUBLE, BT_DOUBLE, BT_DOUBLE)
>   DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
>   		     BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE)
> +DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16,
> +		     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
>   DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_FLOAT16,
>   		     BT_FLOAT16, BT_FLOAT16, BT_FLOAT16)
>   DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_FLOAT32,
> @@ -554,6 +562,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_DOUBLE
>   		     BT_DOUBLE, BT_DOUBLE, BT_DOUBLE, BT_DOUBLE)
>   DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
>   		     BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE)
> +DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16,
> +		     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
>   DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_FLOAT16_FLOAT16_FLOAT16,
>   		     BT_FLOAT16, BT_FLOAT16, BT_FLOAT16, BT_FLOAT16)
>   DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_FLOAT32_FLOAT32_FLOAT32,
> --- gcc/builtins.def.jj	2022-09-29 22:16:46.928044191 +0200
> +++ gcc/builtins.def	2022-10-03 22:46:34.227786987 +0200
> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.
>      value for the type.  */
>   #undef DEF_GCC_FLOATN_NX_BUILTINS
>   #define DEF_GCC_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS)	\
> +  DEF_GCC_BUILTIN (ENUM ## BF16, NAME "bf16", TYPE_MACRO (BFLOAT16), ATTRS) \
>     DEF_GCC_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
>     DEF_GCC_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
>     DEF_GCC_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
> @@ -123,6 +124,7 @@ along with GCC; see the file COPYING3.
>   	       false, true)
>   #undef DEF_EXT_LIB_FLOATN_NX_BUILTINS
>   #define DEF_EXT_LIB_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS)	\
> +  DEF_GCC_BUILTIN (ENUM ## BF16, NAME "bf16", TYPE_MACRO (BFLOAT16), ATTRS) \
>     DEF_FLOATN_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
>     DEF_FLOATN_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
>     DEF_FLOATN_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
> --- gcc/optabs.cc.jj	2022-07-26 21:43:55.638403562 +0200
> +++ gcc/optabs.cc	2022-10-03 23:00:17.402698229 +0200
> @@ -4254,11 +4254,24 @@ can_compare_p (enum rtx_code code, machi
>   	       enum can_compare_purpose purpose)
>   {
>     rtx test;
> +  machine_mode orig_mode = mode;
>     test = gen_rtx_fmt_ee (code, mode, const0_rtx, const0_rtx);
>     do
>       {
>         enum insn_code icode;
>   
> +      /* Don't consider [BH]Fmode as usable wider mode, as neither is
> +	 a subset or superset of the other.  */
> +      if (mode != orig_mode
> +	  && SCALAR_FLOAT_MODE_P (mode)
> +	  && known_eq (GET_MODE_PRECISION (mode),
> +		       GET_MODE_PRECISION (orig_mode)))
> +	{
> +	  mode = GET_MODE_WIDER_MODE (mode).else_void ();
> +	  PUT_MODE (test, mode);
> +	  continue;
> +	}
> +
>         if (purpose == ccp_jump
>             && (icode = optab_handler (cbranch_optab, mode)) != CODE_FOR_nothing
>             && insn_operand_matches (icode, 0, test))
> @@ -4497,7 +4510,13 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
>         enum insn_code icode;
>         icode = optab_handler (cbranch_optab, cmp_mode);
>         if (icode != CODE_FOR_nothing
> -	  && insn_operand_matches (icode, 0, test))
> +	  && insn_operand_matches (icode, 0, test)
> +	  /* Don't consider [BH]Fmode as usable wider mode, as neither is
> +	     a subset or superset of the other.  */
> +	  && (cmp_mode == mode
> +	      || !SCALAR_FLOAT_MODE_P (cmp_mode)
> +	      || maybe_ne (GET_MODE_PRECISION (cmp_mode),
> +			   GET_MODE_PRECISION (mode))))
>   	{
>   	  rtx_insn *last = get_last_insn ();
>   	  rtx op0 = prepare_operand (icode, x, 1, mode, cmp_mode, unsignedp);
> --- gcc/config/i386/i386.cc.jj	2022-10-01 21:44:58.477921753 +0200
> +++ gcc/config/i386/i386.cc	2022-10-03 22:46:34.233786906 +0200
> @@ -2423,6 +2423,7 @@ classify_argument (machine_mode mode, co
>         classes[1] = X86_64_SSEUP_CLASS;
>         return 2;
>       case E_HCmode:
> +    case E_BCmode:
>         classes[0] = X86_64_SSE_CLASS;
>         if (!(bit_offset % 64))
>   	return 1;
> @@ -22428,7 +22429,7 @@ ix86_libgcc_floating_mode_supported_p (s
>        be defined by the C front-end for AVX512FP16 intrinsics.  We will
>        issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
>        enabled.  */
> -  return ((mode == HFmode && TARGET_SSE2)
> +  return (((mode == HFmode || mode == BFmode) && TARGET_SSE2)
>   	  ? true
>   	  : default_libgcc_floating_mode_supported_p (mode));
>   }
> @@ -22731,7 +22732,7 @@ ix86_mangle_type (const_tree type)
>     switch (TYPE_MODE (type))
>       {
>       case E_BFmode:
> -      return "u6__bf16";
> +      return "DF16b";
>       case E_HFmode:
>         /* _Float16 is "DF16_".
>   	 Align with clang's decision in https://reviews.llvm.org/D33719. */
> @@ -22747,55 +22748,6 @@ ix86_mangle_type (const_tree type)
>       }
>   }
>   
> -/* Return the diagnostic message string if conversion from FROMTYPE to
> -   TOTYPE is not allowed, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_conversion (const_tree fromtype, const_tree totype)
> -{
> -  if (element_mode (fromtype) != element_mode (totype))
> -    {
> -      /* Do no allow conversions to/from BFmode scalar types.  */
> -      if (TYPE_MODE (fromtype) == BFmode)
> -	return N_("invalid conversion from type %<__bf16%>");
> -      if (TYPE_MODE (totype) == BFmode)
> -	return N_("invalid conversion to type %<__bf16%>");
> -    }
> -
> -  /* Conversion allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the unary operation OP is
> -   not permitted on TYPE, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_unary_op (int op, const_tree type)
> -{
> -  /* Reject all single-operand operations on BFmode except for &.  */
> -  if (element_mode (type) == BFmode && op != ADDR_EXPR)
> -    return N_("operation not permitted on type %<__bf16%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the binary operation OP is
> -   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
> -			   const_tree type2)
> -{
> -  /* Reject all 2-operand operations on BFmode.  */
> -  if (element_mode (type1) == BFmode
> -      || element_mode (type2) == BFmode)
> -    return N_("operation not permitted on type %<__bf16%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
>   static GTY(()) tree ix86_tls_stack_chk_guard_decl;
>   
>   static tree
> @@ -24853,15 +24805,6 @@ ix86_libgcc_floating_mode_supported_p
>   #undef TARGET_MANGLE_TYPE
>   #define TARGET_MANGLE_TYPE ix86_mangle_type
>   
> -#undef TARGET_INVALID_CONVERSION
> -#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
> -
> -#undef TARGET_INVALID_UNARY_OP
> -#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
> -
> -#undef TARGET_INVALID_BINARY_OP
> -#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
> -
>   #undef TARGET_STACK_PROTECT_GUARD
>   #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
>   
> --- gcc/config/i386/i386-builtins.cc.jj	2022-10-01 21:44:52.478003286 +0200
> +++ gcc/config/i386/i386-builtins.cc	2022-10-03 22:46:34.233786906 +0200
> @@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
>   static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
>   
>   tree ix86_float16_type_node = NULL_TREE;
> -tree ix86_bf16_type_node = NULL_TREE;
>   tree ix86_bf16_ptr_type_node = NULL_TREE;
>   
>   /* Retrieve an element from the above table, building some of
> @@ -1372,16 +1371,18 @@ ix86_register_float16_builtin_type (void
>   static void
>   ix86_register_bf16_builtin_type (void)
>   {
> -  ix86_bf16_type_node = make_node (REAL_TYPE);
> -  TYPE_PRECISION (ix86_bf16_type_node) = 16;
> -  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
> -  layout_type (ix86_bf16_type_node);
> +  if (bfloat16_type_node == NULL_TREE)
> +    {
> +      bfloat16_type_node = make_node (REAL_TYPE);
> +      TYPE_PRECISION (bfloat16_type_node) = 16;
> +      SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +      layout_type (bfloat16_type_node);
> +    }
>   
>     if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
>       {
> -      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
> -					    "__bf16");
> -      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
> +      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
> +      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
>       }
>   }
>   
> --- gcc/config/i386/i386-builtin-types.def.jj	2022-10-01 21:44:52.476003314 +0200
> +++ gcc/config/i386/i386-builtin-types.def	2022-10-03 22:46:34.233786906 +0200
> @@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
>   DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
>   DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
> -DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
> +DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
>   DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> --- gcc/config/i386/i386.md.jj	2022-09-05 23:25:28.627019050 +0200
> +++ gcc/config/i386/i386.md	2022-10-03 22:46:34.239786826 +0200
> @@ -1644,6 +1644,48 @@ (define_expand "cbranch<mode>4"
>     DONE;
>   })
>   
> +(define_expand "cbranchbf4"
> +  [(set (reg:CC FLAGS_REG)
> +	(compare:CC (match_operand:BF 1 "cmp_fp_expander_operand")
> +		    (match_operand:BF 2 "cmp_fp_expander_operand")))
> +   (set (pc) (if_then_else
> +	      (match_operator 0 "comparison_operator"
> +	       [(reg:CC FLAGS_REG)
> +		(const_int 0)])
> +	      (label_ref (match_operand 3))
> +	      (pc)))]
> +  ""
> +{
> +  rtx op1 = gen_lowpart (HImode, operands[1]);
> +  if (CONST_INT_P (op1))
> +    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[1], BFmode);
> +  else
> +    {
> +      rtx t1 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t1, op1));
> +      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> +      op1 = gen_lowpart (SFmode, t1);
> +    }
> +  rtx op2 = gen_lowpart (HImode, operands[2]);
> +  if (CONST_INT_P (op2))
> +    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[2], BFmode);
> +  else
> +    {
> +      rtx t2 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> +      op2 = gen_lowpart (SFmode, t2);
> +    }
> +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> +			   SFmode, NULL_RTX, NULL,
> +			   as_a <rtx_code_label *> (operands[3]),
> +			   /* Unfortunately this isn't propagated.  */
> +			   profile_probability::even ());
> +  DONE;
> +})
> +
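For readers following the expander above: the non-constant path widens BFmode to SFmode without any conversion insn. It zero-extends the 16-bit pattern to SImode, shifts left by 16, and reinterprets the result as SFmode. A minimal C sketch of the same bit manipulation follows. The helper names `bf16_bits_to_float` and `bf16_bits_less` are hypothetical (not part of the patch), and the sketch assumes `float` is IEEE binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the patch: widen a raw bfloat16 bit
   pattern to float the way the expander does for non-constant
   operands, i.e. zero-extend to 32 bits, then shift left by 16.
   Assumes float is IEEE binary32.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;

  assert (sizeof f == sizeof wide);
  /* Reinterpret the bits; this mirrors gen_lowpart (SFmode, t1).  */
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Hypothetical helper: order two bfloat16 bit patterns by comparing
   their widened float values, as the expander does in SFmode.  */
static int
bf16_bits_less (uint16_t a, uint16_t b)
{
  return bf16_bits_to_float (a) < bf16_bits_to_float (b);
}
```

Since bfloat16 is the upper half of a binary32 value, the widening is exact, so comparing in SFmode gives the same ordering as a native BFmode comparison would, including for NaNs.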
>   (define_expand "cstorehf4"
>     [(set (reg:CC FLAGS_REG)
>   	(compare:CC (match_operand:HF 2 "cmp_fp_expander_operand")
> @@ -1659,6 +1701,45 @@ (define_expand "cstorehf4"
>     DONE;
>   })
>   
> +(define_expand "cstorebf4"
> +  [(set (reg:CC FLAGS_REG)
> +	(compare:CC (match_operand:BF 2 "cmp_fp_expander_operand")
> +		    (match_operand:BF 3 "cmp_fp_expander_operand")))
> +   (set (match_operand:QI 0 "register_operand")
> +	(match_operator 1 "comparison_operator"
> +	  [(reg:CC FLAGS_REG)
> +	   (const_int 0)]))]
> +  ""
> +{
> +  rtx op1 = gen_lowpart (HImode, operands[2]);
> +  if (CONST_INT_P (op1))
> +    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[2], BFmode);
> +  else
> +    {
> +      rtx t1 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t1, op1));
> +      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> +      op1 = gen_lowpart (SFmode, t1);
> +    }
> +  rtx op2 = gen_lowpart (HImode, operands[3]);
> +  if (CONST_INT_P (op2))
> +    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[3], BFmode);
> +  else
> +    {
> +      rtx t2 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> +      op2 = gen_lowpart (SFmode, t2);
> +    }
> +  rtx res = emit_store_flag_force (operands[0], GET_CODE (operands[1]),
> +				   op1, op2, SFmode, 0, 1);
> +  if (!rtx_equal_p (res, operands[0]))
> +    emit_move_insn (operands[0], res);
> +  DONE;
> +})
> +
>   (define_expand "cstore<mode>4"
>     [(set (reg:CC FLAGS_REG)
>   	(compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")
> --- gcc/c-family/c-cppbuiltin.cc.jj	2022-10-03 22:45:46.041435824 +0200
> +++ gcc/c-family/c-cppbuiltin.cc	2022-10-03 23:11:46.111410475 +0200
> @@ -1264,6 +1264,13 @@ c_cpp_builtins (cpp_reader *pfile)
>         builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
>   				      csuffix, FLOATN_NX_TYPE_NODE (i));
>       }
> +  if (bfloat16_type_node && c_dialect_cxx ())
> +    {
> +      if (cxx_dialect > cxx20)
> +	cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
> +      builtin_define_float_constants ("BFLT16", "BF16", "%s",
> +				      "BF16", bfloat16_type_node);
> +    }
>   
>     /* For float.h.  */
>     if (targetm.decimal_float_supported_p ())
> @@ -1351,6 +1358,8 @@ c_cpp_builtins (cpp_reader *pfile)
>   	  if (!targetm.scalar_mode_supported_p (mode)
>   	      || !targetm.libgcc_floating_mode_supported_p (mode))
>   	    continue;
> +	  if (bfloat16_type_node && TYPE_MODE (bfloat16_type_node) == mode)
> +	    continue;
>   	  macro_name = XALLOCAVEC (char, name_len
>   				   + sizeof ("__LIBGCC_HAS__MODE__"));
>   	  sprintf (macro_name, "__LIBGCC_HAS_%s_MODE__", name);
> --- gcc/c-family/c-lex.cc.jj	2022-10-03 22:46:14.597051320 +0200
> +++ gcc/c-family/c-lex.cc	2022-10-03 22:46:34.240786812 +0200
> @@ -1000,6 +1000,19 @@ interpret_float (const cpp_token *token,
>   	  pedwarn (input_location, OPT_Wpedantic,
>   		   "non-standard suffix on floating constant");
>         }
> +    else if ((flags & CPP_N_BFLOAT16) != 0 && c_dialect_cxx ())
> +      {
> +	type = bfloat16_type_node;
> +	if (type == NULL_TREE)
> +	  {
> +	    error ("unsupported non-standard suffix on floating constant");
> +	    return error_mark_node;
> +	  }
> +	if (cxx_dialect < cxx23)
> +	  pedwarn (input_location, OPT_Wpedantic,
> +		   "%<bf16%> or %<BF16%> suffix on floating constant only "
> +		   "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
> +      }
>       else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
>         type = long_double_type_node;
>       else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
> --- gcc/c/c-typeck.cc.jj	2022-09-25 22:22:03.963596917 +0200
> +++ gcc/c/c-typeck.cc	2022-10-03 22:46:34.245786745 +0200
> @@ -3676,6 +3676,9 @@ convert_arguments (location_t loc, vec<l
>   		promote_float_arg = false;
>   		break;
>   	      }
> +	  /* Don't promote __bf16 either.  */
> +	  if (TYPE_MAIN_VARIANT (valtype) == bfloat16_type_node)
> +	    promote_float_arg = false;
>   	}
>   
>         if (type != NULL_TREE)
> --- gcc/cp/cp-tree.h.jj	2022-10-03 22:46:23.896926090 +0200
> +++ gcc/cp/cp-tree.h	2022-10-03 22:46:34.246786732 +0200
> @@ -8702,6 +8702,8 @@ extended_float_type_p (tree type)
>     for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
>       if (type == FLOATN_TYPE_NODE (i))
>         return true;
> +  if (type == bfloat16_type_node)
> +    return true;
>     return false;
>   }
>   
> --- gcc/cp/typeck.cc.jj	2022-10-01 21:44:52.497003028 +0200
> +++ gcc/cp/typeck.cc	2022-10-03 22:46:34.249786691 +0200
> @@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
>         if (mv2 == FLOATN_NX_TYPE_NODE (i))
>   	extended2 = i + 1;
>       }
> +  if (mv1 == bfloat16_type_node)
> +    extended1 = true;
> +  if (mv2 == bfloat16_type_node)
> +    extended2 = true;
>     if (extended2 && !extended1)
>       {
>         int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
> @@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
>     if (cnt > 1 && mv2 == long_double_type_node)
>       return -2;
>     /* Otherwise, they have equal rank, but extended types
> -     (other than std::bfloat16_t) have higher subrank.  */
> +     (other than std::bfloat16_t) have higher subrank.
> +     std::bfloat16_t shouldn't have equal rank to any standard
> +     floating point type.  */
>     return 1;
>   }
>   
> --- gcc/testsuite/lib/target-supports.exp.jj	2022-10-01 21:44:58.540920897 +0200
> +++ gcc/testsuite/lib/target-supports.exp	2022-10-03 22:46:34.250786678 +0200
> @@ -3416,6 +3416,22 @@ proc check_effective_target_base_quadflo
>       return 1
>   }
>   
> +# Return 1 if the target supports the __bf16 type, 0 otherwise.
> +
> +proc check_effective_target_bfloat16 {} {
> +    return [check_no_compiler_messages_nocache bfloat16 object {
> +	__bf16 foo (__bf16 x) { return x + x; }
> +    } [add_options_for_bfloat16 ""]]
> +}
> +
> +proc check_effective_target_bfloat16_runtime {} {
> +    return [check_effective_target_bfloat16]
> +}
> +
> +proc add_options_for_bfloat16 { flags } {
> +    return "$flags"
> +}
> +
>   # Return 1 if the target supports all four forms of fused multiply-add
>   # (fma, fms, fnma, and fnms) for both float and double.
>   
> --- gcc/testsuite/gcc.dg/torture/bfloat16-basic.c.jj	2022-10-03 22:46:34.251786665 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-basic.c	2022-10-03 22:46:34.251786665 +0200
> @@ -0,0 +1,11 @@
> +/* Test __bf16.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +#define TYPE __bf16
> +#define CST(C) ((__bf16) (C))
> +#define CSTU(C) CST(C)
> +
> +#include "floatn-basic.h"
> --- gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c.jj	2022-10-03 22:46:34.251786665 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c	2022-10-03 22:46:34.251786665 +0200
> @@ -0,0 +1,15 @@
> +/* Test __bf16 built-in functions.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-add-options ieee } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +#define CONCATX(X, Y) X ## Y
> +#define CONCAT(X, Y) CONCATX (X, Y)
> +
> +#define TYPE __bf16
> +#define CST(C) ((__bf16) C)
> +#define FN(F) CONCAT (F, bf16)
> +
> +#include "floatn-builtin.h"
> --- gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c.jj	2022-10-03 22:46:34.251786665 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c	2022-10-03 22:46:34.251786665 +0200
> @@ -0,0 +1,19 @@
> +/* Test __bf16 __builtin_issignaling.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-add-options ieee } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +/* { dg-additional-options "-fsignaling-nans" } */
> +/* Workaround for PR57484 on ia32: */
> +/* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
> +
> +#define CONCATX(X, Y) X ## Y
> +#define CONCAT(X, Y) CONCATX (X, Y)
> +
> +#define TYPE __bf16
> +#define CST(C) ((__bf16) C)
> +#define FN(F) CONCAT (F, bf16)
> +#define EXT 0
> +
> +#include "builtin-issignaling-1.c"
> --- gcc/testsuite/gcc.dg/torture/bfloat16-complex.c.jj	2022-10-03 22:46:34.251786665 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-complex.c	2022-10-03 22:46:34.251786665 +0200
> @@ -0,0 +1,61 @@
> +/* Test __bf16 complex arithmetic.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +extern void exit (int);
> +extern void abort (void);
> +
> +volatile __bf16 a = ((__bf16) 1.0);
> +typedef _Complex float __cbf16 __attribute__((__mode__(__BC__)));
> +volatile __cbf16 b = __builtin_complex (((__bf16) 2.0), ((__bf16) 3.0));
> +volatile __cbf16 c = __builtin_complex (((__bf16) 2.0), ((__bf16) 3.0));
> +volatile __cbf16 d = __builtin_complex (((__bf16) 2.0), ((__bf16) 3.0));
> +
> +__cbf16
> +fn (__cbf16 arg)
> +{
> +  return arg / 4;
> +}
> +
> +int
> +main (void)
> +{
> +  volatile __cbf16 r;
> +  if (b != c)
> +    abort ();
> +  if (b != d)
> +    abort ();
> +  r = a + b;
> +  if (__real__ r != ((__bf16) 3.0) || __imag__ r != ((__bf16) 3.0))
> +    abort ();
> +  r += d;
> +  if (__real__ r != ((__bf16) 5.0) || __imag__ r != ((__bf16) 6.0))
> +    abort ();
> +  r -= a;
> +  if (__real__ r != ((__bf16) 4.0) || __imag__ r != ((__bf16) 6.0))
> +    abort ();
> +  r /= (a + a);
> +  if (__real__ r != ((__bf16) 2.0) || __imag__ r != ((__bf16) 3.0))
> +    abort ();
> +  r *= (a + a);
> +  if (__real__ r != ((__bf16) 4.0) || __imag__ r != ((__bf16) 6.0))
> +    abort ();
> +  r -= b;
> +  if (__real__ r != ((__bf16) 2.0) || __imag__ r != ((__bf16) 3.0))
> +    abort ();
> +  r *= r;
> +  if (__real__ r != -((__bf16) 5.0) || __imag__ r != ((__bf16) 12.0))
> +    abort ();
> +  /* Division may not be exact, so round result before comparing.  */
> +  r /= b;
> +  r += __builtin_complex (((__bf16) 100.0), ((__bf16) 100.0));
> +  r -= __builtin_complex (((__bf16) 100.0), ((__bf16) 100.0));
> +  if (r != b)
> +    abort ();
> +  r = fn (r);
> +  if (__real__ r != ((__bf16) 0.5) || __imag__ r != ((__bf16) 0.75))
> +    abort ();
> +  exit (0);
> +}
> --- gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c.jj	2022-08-27 23:01:28.323565905 +0200
> +++ gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c	2022-10-03 22:46:34.251786665 +0200
> @@ -4,7 +4,7 @@
>   /* Workaround for PR57484 on ia32: */
>   /* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
>   
> -#ifndef EXT
> +#if !defined(EXT) && !defined(TYPE)
>   int
>   f1 (void)
>   {
> @@ -41,19 +41,21 @@ f6 (long double x)
>     return __builtin_issignaling (x);
>   }
>   #else
> -#define CONCATX(X, Y) X ## Y
> -#define CONCAT(X, Y) CONCATX (X, Y)
> -#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> -#define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> -
> -#if EXT
> -# define TYPE CONCAT3 (_Float, WIDTH, x)
> -# define CST(C) CONCAT4 (C, f, WIDTH, x)
> -# define FN(F) CONCAT4 (F, f, WIDTH, x)
> -#else
> -# define TYPE CONCAT (_Float, WIDTH)
> -# define CST(C) CONCAT3 (C, f, WIDTH)
> -# define FN(F) CONCAT3 (F, f, WIDTH)
> +#ifndef TYPE
> +# define CONCATX(X, Y) X ## Y
> +# define CONCAT(X, Y) CONCATX (X, Y)
> +# define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> +# define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> +
> +# if EXT
> +#  define TYPE CONCAT3 (_Float, WIDTH, x)
> +#  define CST(C) CONCAT4 (C, f, WIDTH, x)
> +#  define FN(F) CONCAT4 (F, f, WIDTH, x)
> +# else
> +#  define TYPE CONCAT (_Float, WIDTH)
> +#  define CST(C) CONCAT3 (C, f, WIDTH)
> +#  define FN(F) CONCAT3 (F, f, WIDTH)
> +# endif
>   #endif
>   
>   int
> --- gcc/testsuite/gcc.dg/torture/floatn-basic.h.jj	2020-01-14 20:02:47.411600427 +0100
> +++ gcc/testsuite/gcc.dg/torture/floatn-basic.h	2022-10-03 22:46:34.251786665 +0200
> @@ -9,14 +9,16 @@
>   #define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
>   #define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
>   
> -#if EXT
> -# define TYPE CONCAT3 (_Float, WIDTH, x)
> -# define CST(C) CONCAT4 (C, f, WIDTH, x)
> -# define CSTU(C) CONCAT4 (C, F, WIDTH, x)
> -#else
> -# define TYPE CONCAT (_Float, WIDTH)
> -# define CST(C) CONCAT3 (C, f, WIDTH)
> -# define CSTU(C) CONCAT3 (C, F, WIDTH)
> +#ifndef TYPE
> +# if EXT
> +#  define TYPE CONCAT3 (_Float, WIDTH, x)
> +#  define CST(C) CONCAT4 (C, f, WIDTH, x)
> +#  define CSTU(C) CONCAT4 (C, F, WIDTH, x)
> +# else
> +#  define TYPE CONCAT (_Float, WIDTH)
> +#  define CST(C) CONCAT3 (C, f, WIDTH)
> +#  define CSTU(C) CONCAT3 (C, F, WIDTH)
> +# endif
>   #endif
>   
>   extern void exit (int);
> --- gcc/testsuite/gcc.dg/torture/floatn-builtin.h.jj	2020-01-14 20:02:47.412600412 +0100
> +++ gcc/testsuite/gcc.dg/torture/floatn-builtin.h	2022-10-03 22:46:34.251786665 +0200
> @@ -2,19 +2,21 @@
>      built-in functions.  Before including this file, define WIDTH as
>      the value N; define EXT to 1 for _FloatNx and 0 for _FloatN.  */
>   
> -#define CONCATX(X, Y) X ## Y
> -#define CONCAT(X, Y) CONCATX (X, Y)
> -#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> -#define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> +#ifndef TYPE
> +# define CONCATX(X, Y) X ## Y
> +# define CONCAT(X, Y) CONCATX (X, Y)
> +# define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> +# define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
>   
> -#if EXT
> -# define TYPE CONCAT3 (_Float, WIDTH, x)
> -# define CST(C) CONCAT4 (C, f, WIDTH, x)
> -# define FN(F) CONCAT4 (F, f, WIDTH, x)
> -#else
> -# define TYPE CONCAT (_Float, WIDTH)
> -# define CST(C) CONCAT3 (C, f, WIDTH)
> -# define FN(F) CONCAT3 (F, f, WIDTH)
> +# if EXT
> +#  define TYPE CONCAT3 (_Float, WIDTH, x)
> +#  define CST(C) CONCAT4 (C, f, WIDTH, x)
> +#  define FN(F) CONCAT4 (F, f, WIDTH, x)
> +# else
> +#  define TYPE CONCAT (_Float, WIDTH)
> +#  define CST(C) CONCAT3 (C, f, WIDTH)
> +#  define FN(F) CONCAT3 (F, f, WIDTH)
> +# endif
>   #endif
>   
>   extern void exit (int);
> --- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c.jj	2022-10-01 21:44:52.519002730 +0200
> +++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c	2022-10-03 22:46:34.252786651 +0200
> @@ -45,19 +45,19 @@ __m256bf16 footest (__m256bf16 vector0)
>     __m256bf16 vector2_1 = {};
>     __m256bf16 vector2_2 = { glob_bfloat };
>     __m256bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
> -  __m256bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error "invalid conversion to type '__bf16'" } */
> -
> -  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256i initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __v16hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m256bf16 vector2_4 = { 0 };
> +  __m256bf16 vector2_5 = { 0.1 };
> +  __m256bf16 vector2_6 = { is_a_float16 };
> +  __m256bf16 vector2_7 = { is_a_float };
> +  __m256bf16 vector2_8 = { is_an_int };
> +  __m256bf16 vector2_9 = { is_a_short_int };
> +  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
> +
> +  __v8si initi_2_1 = { glob_bfloat };
> +  __m256 initi_2_2 = { glob_bfloat };
> +  __m256h initi_2_3 = { glob_bfloat };
> +  __m256i initi_2_5 = { glob_bfloat };
> +  __v16hi initi_2_6 = { glob_bfloat };
>   
>     /* Assignments to/from vectors.  */
>   
> @@ -79,25 +79,25 @@ __m256bf16 footest (__m256bf16 vector0)
>     /* Assignments to/from elements.  */
>   
>     vector2_3[0] = glob_bfloat;
> -  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_an_int;
> +  vector2_3[0] = is_a_short_int;
> +  vector2_3[0] = is_a_float;
> +  vector2_3[0] = is_a_float16;
> +  vector2_3[0] = 0;
> +  vector2_3[0] = 0.1;
>   
>     glob_bfloat = vector2_3[0];
> -  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_an_int = vector2_3[0];
> +  is_a_short_int = vector2_3[0];
> +  is_a_float = vector2_3[0];
> +  is_a_float16 = vector2_3[0];
>   
>     /* Compound literals.  */
>   
>     (__m256bf16) {};
>   
> -  (__m256bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__m256bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (__m256bf16) { 0 };
> +  (__m256bf16) { 0.1 };
>     (__m256bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
>     (__m256bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
>     (__m256bf16) { is_a_long_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256i'} } */
> @@ -176,16 +176,16 @@ __m256bf16 footest (__m256bf16 vector0)
>     bfloat_ptr = &bfloat_ptr3[1];
>   
>     /* Simple comparison.  */
> -  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > glob_bfloat_vec;
> +  glob_bfloat_vec == vector0;
> +  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
> +  vector0 > 0;
> +  0 == vector0;
> +  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> +  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> +  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
>   
>     /* Pointer comparison.  */
>   
> @@ -224,24 +224,24 @@ __m256bf16 footest (__m256bf16 vector0)
>   
>     /* Unary operators.  */
>   
> -  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  +vector0;
> +  -vector0;
> +  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
> +  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
>     *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
> -  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __real vector0; /* { dg-error {wrong type argument to __real} } */
> +  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
> +  ++vector0;
> +  --vector0;
> +  vector0++;
> +  vector0--;
>   
>     /* Binary arithmetic operations.  */
>   
> -  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + *bfloat_ptr;
> +  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> +  vector0 = glob_bfloat_vec + 0;
> +  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
>   
>     return vector0;
>   }
> --- gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c.jj	2022-10-01 21:44:52.515002784 +0200
> +++ gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c	2022-10-03 22:46:34.252786651 +0200
> @@ -12,8 +12,8 @@ double is_a_double;
>   
>   float *float_ptr;
>   
> -__bf16 foo1 (void) { return (__bf16) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
> -__bf16 foo2 (void) { return (__bf16) (short) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
> +__bf16 foo1 (void) { return (__bf16) 0x1234; }
> +__bf16 foo2 (void) { return (__bf16) (short) 0x1234; }
>   
>   __bf16 footest (__bf16 scalar0)
>   {
> @@ -22,87 +22,87 @@ __bf16 footest (__bf16 scalar0)
>   
>     __bf16 scalar1_1;
>     __bf16 scalar1_2 = glob_bfloat;
> -  __bf16 scalar1_3 = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_4 = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_5 = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_6 = is_an_int;  /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_7 = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_8 = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_9 = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  int initi_1_1 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  float initi_1_2 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  _Float16 initi_1_3 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  short initi_1_4 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  double initi_1_5 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __bf16 scalar1_3 = 0;
> +  __bf16 scalar1_4 = 0.1;
> +  __bf16 scalar1_5 = is_a_float;
> +  __bf16 scalar1_6 = is_an_int;
> +  __bf16 scalar1_7 = is_a_float16;
> +  __bf16 scalar1_8 = is_a_double;
> +  __bf16 scalar1_9 = is_a_short_int;
> +
> +  int initi_1_1 = glob_bfloat;
> +  float initi_1_2 = glob_bfloat;
> +  _Float16 initi_1_3 = glob_bfloat;
> +  short initi_1_4 = glob_bfloat;
> +  double initi_1_5 = glob_bfloat;
>   
>     __bf16 scalar2_1 = {};
>     __bf16 scalar2_2 = { glob_bfloat };
> -  __bf16 scalar2_3 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_4 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_5 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_6 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_7 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_8 = { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  int initi_2_1 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  float initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  _Float16 initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  short initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  double initi_2_5 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __bf16 scalar2_3 = { 0 };
> +  __bf16 scalar2_4 = { 0.1 };
> +  __bf16 scalar2_5 = { is_a_float };
> +  __bf16 scalar2_6 = { is_an_int };
> +  __bf16 scalar2_7 = { is_a_float16 };
> +  __bf16 scalar2_8 = { is_a_double };
> +  __bf16 scalar2_9 = { is_a_short_int };
> +
> +  int initi_2_1 = { glob_bfloat };
> +  float initi_2_2 = { glob_bfloat };
> +  _Float16 initi_2_3 = { glob_bfloat };
> +  short initi_2_4 = { glob_bfloat };
> +  double initi_2_5 = { glob_bfloat };
>   
>     /* Assignments.  */
>   
>     glob_bfloat = glob_bfloat;
> -  glob_bfloat = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  is_an_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float16 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_double = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_short_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  glob_bfloat = 0;
> +  glob_bfloat = 0.1;
> +  glob_bfloat = is_a_float;
> +  glob_bfloat = is_an_int;
> +  glob_bfloat = is_a_float16;
> +  glob_bfloat = is_a_double;
> +  glob_bfloat = is_a_short_int;
> +
> +  is_an_int = glob_bfloat;
> +  is_a_float = glob_bfloat;
> +  is_a_float16 = glob_bfloat;
> +  is_a_double = glob_bfloat;
> +  is_a_short_int = glob_bfloat;
>   
>     /* Casting.  */
>   
>     (void) glob_bfloat;
>     (__bf16) glob_bfloat;
>   
> -  (int) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (float) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (_Float16) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (double) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (short) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -
> -  (__bf16) is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (int) glob_bfloat;
> +  (float) glob_bfloat;
> +  (_Float16) glob_bfloat;
> +  (double) glob_bfloat;
> +  (short) glob_bfloat;
> +
> +  (__bf16) is_an_int;
> +  (__bf16) is_a_float;
> +  (__bf16) is_a_float16;
> +  (__bf16) is_a_double;
> +  (__bf16) is_a_short_int;
>   
>     /* Compound literals.  */
>   
>     (__bf16) {};
>     (__bf16) { glob_bfloat };
> -  (__bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  (int) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (float) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (_Float16) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (double) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (short) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  (__bf16) { 0 };
> +  (__bf16) { 0.1 };
> +  (__bf16) { is_a_float };
> +  (__bf16) { is_an_int };
> +  (__bf16) { is_a_float16 };
> +  (__bf16) { is_a_double };
> +  (__bf16) { is_a_short_int };
> +
> +  (int) { glob_bfloat };
> +  (float) { glob_bfloat };
> +  (_Float16) { glob_bfloat };
> +  (double) { glob_bfloat };
> +  (short) { glob_bfloat };
>   
>     /* Arrays and Structs.  */
>   
> @@ -145,16 +145,16 @@ __bf16 footest (__bf16 scalar0)
>     bfloat_ptr = &bfloat_ptr3[1];
>   
>     /* Simple comparison.  */
> -  scalar0 > glob_bfloat; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  glob_bfloat == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_a_float == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0.1 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > is_an_int; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_an_int == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  scalar0 > glob_bfloat;
> +  glob_bfloat == scalar0;
> +  scalar0 > is_a_float;
> +  is_a_float == scalar0;
> +  scalar0 > 0;
> +  0 == scalar0;
> +  scalar0 > 0.1;
> +  0.1 == scalar0;
> +  scalar0 > is_an_int;
> +  is_an_int == scalar0;
>   
>     /* Pointer comparison.  */
>   
> @@ -174,41 +174,41 @@ __bf16 footest (__bf16 scalar0)
>     /* Conditional expressions.  */
>   
>     0 ? scalar0 : scalar0;
> -  0 ? scalar0 : is_a_float; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  0 ? is_a_float : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  0 ? scalar0 : 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  0 ? 0 : scalar0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  0 ? 0.1 : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  0 ? scalar0 : 0.1; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  0 ? scalar0 : is_a_float;
> +  0 ? is_a_float : scalar0;
> +  0 ? scalar0 : 0;
> +  0 ? 0 : scalar0;
> +  0 ? 0.1 : scalar0;
> +  0 ? scalar0 : 0.1;
>     0 ? bfloat_ptr : bfloat_ptr2;
>     0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
>     0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
>   
> -  scalar0 ? scalar0 : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 ? is_a_float : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 ? scalar0 : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 ? is_a_float : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  scalar0 ? scalar0 : scalar0;
> +  scalar0 ? is_a_float : scalar0;
> +  scalar0 ? scalar0 : is_a_float;
> +  scalar0 ? is_a_float : is_a_float;
>   
>     /* Unary operators.  */
>   
> -  +scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  -scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ~scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  !scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  +scalar0;
> +  -scalar0;
> +  ~scalar0; /* { dg-error {wrong type argument to bit-complement} } */
> +  !scalar0;
>     *scalar0; /* { dg-error {invalid type argument of unary '\*'} } */
> -  __real scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  __imag scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ++scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  --scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __real scalar0;
> +  __imag scalar0;
> +  ++scalar0;
> +  --scalar0;
> +  scalar0++;
> +  scalar0--;
>   
>     /* Binary arithmetic operations.  */
>   
> -  scalar0 = glob_bfloat + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 = glob_bfloat + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 = glob_bfloat + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 = glob_bfloat + is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  scalar0 = glob_bfloat + *bfloat_ptr;
> +  scalar0 = glob_bfloat + 0.1;
> +  scalar0 = glob_bfloat + 0;
> +  scalar0 = glob_bfloat + is_a_float;
>   
>     return scalar0;
>   }
> --- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c.jj	2022-10-01 21:44:52.517002757 +0200
> +++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c	2022-10-03 22:46:34.252786651 +0200
> @@ -48,20 +48,20 @@ __m128bf16 footest (__m128bf16 vector0)
>     __m128bf16 vector2_1 = {};
>     __m128bf16 vector2_2 = { glob_bfloat };
>     __m128bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
> -  __m128bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m128h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m128 initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __v4si initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __v4hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m128bf16 vector2_4 = { 0 };
> +  __m128bf16 vector2_5 = { 0.1 };
> +  __m128bf16 vector2_6 = { is_a_float16 };
> +  __m128bf16 vector2_7 = { is_a_float };
> +  __m128bf16 vector2_8 = { is_an_int };
> +  __m128bf16 vector2_9 = { is_a_short_int };
> +  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
> +
> +  __v8si initi_2_1 = { glob_bfloat };
> +  __m256 initi_2_2 = { glob_bfloat };
> +  __m128h initi_2_3 = { glob_bfloat };
> +  __m128 initi_2_4 = { glob_bfloat };
> +  __v4si initi_2_5 = { glob_bfloat };
> +  __v4hi initi_2_6 = { glob_bfloat };
>   
>     /* Assignments to/from vectors.  */
>   
> @@ -85,25 +85,25 @@ __m128bf16 footest (__m128bf16 vector0)
>     /* Assignments to/from elements.  */
>   
>     vector2_3[0] = glob_bfloat;
> -  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_an_int;
> +  vector2_3[0] = is_a_short_int;
> +  vector2_3[0] = is_a_float;
> +  vector2_3[0] = is_a_float16;
> +  vector2_3[0] = 0;
> +  vector2_3[0] = 0.1;
>   
>     glob_bfloat = vector2_3[0];
> -  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_an_int = vector2_3[0];
> +  is_a_short_int = vector2_3[0];
> +  is_a_float = vector2_3[0];
> +  is_a_float16 = vector2_3[0];
>   
>     /* Compound literals.  */
>   
>     (__m128bf16) {};
>   
> -  (__m128bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__m128bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (__m128bf16) { 0 };
> +  (__m128bf16) { 0.1 };
>     (__m128bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
>     (__m128bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
>     (__m128bf16) { is_a_float_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128'} } */
> @@ -186,16 +186,16 @@ __m128bf16 footest (__m128bf16 vector0)
>     bfloat_ptr = &bfloat_ptr3[1];
>   
>     /* Simple comparison.  */
> -  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > glob_bfloat_vec;
> +  glob_bfloat_vec == vector0;
> +  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
> +  vector0 > 0;
> +  0 == vector0;
> +  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> +  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> +  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
>   
>     /* Pointer comparison.  */
>   
> @@ -234,24 +234,24 @@ __m128bf16 footest (__m128bf16 vector0)
>   
>     /* Unary operators.  */
>   
> -  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  +vector0;
> +  -vector0;
> +  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
> +  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
>     *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
> -  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __real vector0; /* { dg-error {wrong type argument to __real} } */
> +  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
> +  ++vector0;
> +  --vector0;
> +  vector0++;
> +  vector0--;
>   
>     /* Binary arithmetic operations.  */
>   
> -  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + *bfloat_ptr;
> +  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> +  vector0 = glob_bfloat_vec + 0;
> +  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
>   
>     return vector0;
>   }
> --- gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C.jj	2022-10-01 21:44:52.512002825 +0200
> +++ gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C	2022-10-03 22:46:34.252786651 +0200
> @@ -5,6 +5,6 @@ void foo (void)
>   {
>     __bf16 (); /* { dg-bogus {invalid conversion to type '__bf16'} } */
>     __bf16 a = __bf16(); /* { dg-bogus {invalid conversion to type '__bf16'} } */
> -  __bf16 (0x1234); /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 (0.1); /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __bf16 (0x1234); /* { dg-bogus {invalid conversion to type '__bf16'} } */
> +  __bf16 (0.1); /* { dg-bogus {invalid conversion to type '__bf16'} } */
>   }
> --- libcpp/include/cpplib.h.jj	2022-09-29 18:11:28.760749857 +0200
> +++ libcpp/include/cpplib.h	2022-10-03 11:10:11.084028291 +0200
> @@ -1275,6 +1275,7 @@ struct cpp_num
>   #define CPP_N_USERDEF	0x1000000 /* C++11 user-defined literal.  */
>   
>   #define CPP_N_SIZE_T	0x2000000 /* C++23 size_t literal.  */
> +#define CPP_N_BFLOAT16	0x4000000 /* std::bfloat16_t type.  */
>   
>   #define CPP_N_WIDTH_FLOATN_NX	0xF0000000 /* _FloatN / _FloatNx value
>   					      of N, divided by 16.  */
> --- libcpp/expr.cc.jj	2022-09-29 18:11:28.760749857 +0200
> +++ libcpp/expr.cc	2022-10-03 11:10:11.107027980 +0200
> @@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
>     size_t orig_len = len;
>     const uchar *orig_s = s;
>     size_t flags;
> -  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
> +  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
>   
>     flags = 0;
> -  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
> +  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
>   
>     /* The following decimal float suffixes, from TR 24732:2009, TS
>        18661-2:2015 and C2X, are supported:
> @@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
>        w, W - machine-specific type such as __float80 (GNU extension).
>        q, Q - machine-specific type such as __float128 (GNU extension).
>        fN, FN - _FloatN (TS 18661-3:2015).
> -     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
> +     fNx, FNx - _FloatNx (TS 18661-3:2015).
> +     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
>   
>     /* Process decimal float suffixes, which are two letters starting
>        with d or D.  Order and case are significant.  */
> @@ -239,6 +240,20 @@ interpret_float_suffix (cpp_reader *pfil
>   		fn++;
>   	    }
>   	  break;
> +	case 'b': case 'B':
> +	  if (len > 2
> +	      /* Except for bf16 / BF16 where case is significant.  */
> +	      && s[1] == (s[0] == 'b' ? 'f' : 'F')
> +	      && s[2] == '1'
> +	      && s[3] == '6'
> +	      && CPP_OPTION (pfile, cplusplus))
> +	    {
> +	      bf16++;
> +	      len -= 3;
> +	      s += 3;
> +	      break;
> +	    }
> +	  return 0;
>   	case 'd': case 'D': d++; break;
>   	case 'l': case 'L': l++; break;
>   	case 'w': case 'W': w++; break;
> @@ -257,7 +272,7 @@ interpret_float_suffix (cpp_reader *pfil
>        of N larger than can be represented in the return value.  The
>        caller is responsible for rejecting _FloatN suffixes where
>        _FloatN is not supported on the chosen target.  */
> -  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
> +  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
>       return 0;
>     if (fn_bits > CPP_FLOATN_MAX)
>       return 0;
> @@ -295,6 +310,7 @@ interpret_float_suffix (cpp_reader *pfil
>   	     q ? CPP_N_MD_Q :
>   	     fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
>   	     fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
> +	     bf16 ? CPP_N_BFLOAT16 :
>   	     CPP_N_DEFAULT));
>   }
>   
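
To illustrate the case rule in the hunk above: the suffix is accepted only as
"bf16" or "BF16", never with mixed case such as "Bf16".  A minimal standalone
sketch of that check follows.  The helper name is_bf16_suffix is hypothetical
and is not part of libcpp; it only mirrors the s[1] == (s[0] == 'b' ? 'f' : 'F')
test and ignores the C++-only condition and the surrounding suffix loop.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libcpp: return 1 when the LEN bytes
   at S are exactly "bf16" or "BF16", and 0 otherwise.  */
static int
is_bf16_suffix (const char *s, size_t len)
{
  if (len != 4)
    return 0;
  if (s[0] != 'b' && s[0] != 'B')
    return 0;
  /* The second letter must have the same case as the first.  */
  if (s[1] != (s[0] == 'b' ? 'f' : 'F'))
    return 0;
  return s[2] == '1' && s[3] == '6';
}
```

With this helper, "bf16" and "BF16" match while "Bf16" and "bF16" do not.
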
> --- libgcc/config/i386/t-softfp.jj	2022-09-29 18:11:28.761749843 +0200
> +++ libgcc/config/i386/t-softfp	2022-10-03 11:10:11.158027289 +0200
> @@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
>   libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
>   LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
>   
> -softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> -softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
> +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
> +		      tfbf xfbf dfbf sfbf hfbf
>   
>   softfp_extras += eqhf2
>   
> @@ -15,11 +16,17 @@ CFLAGS-extendhfsf2.c += -msse2
>   CFLAGS-extendhfdf2.c += -msse2
>   CFLAGS-extendhftf2.c += -msse2
>   CFLAGS-extendhfxf2.c += -msse2
> +CFLAGS-extendbfsf2.c += -msse2
>   
>   CFLAGS-truncsfhf2.c += -msse2
>   CFLAGS-truncdfhf2.c += -msse2
>   CFLAGS-truncxfhf2.c += -msse2
>   CFLAGS-trunctfhf2.c += -msse2
> +CFLAGS-truncsfbf2.c += -msse2
> +CFLAGS-truncdfbf2.c += -msse2
> +CFLAGS-truncxfbf2.c += -msse2
> +CFLAGS-trunctfbf2.c += -msse2
> +CFLAGS-trunchfbf2.c += -msse2
>   
>   CFLAGS-eqhf2.c += -msse2
>   CFLAGS-_divhc3.c += -msse2
> --- libgcc/config/i386/libgcc-glibc.ver.jj	2022-09-29 18:11:28.761749843 +0200
> +++ libgcc/config/i386/libgcc-glibc.ver	2022-10-03 11:10:11.168027153 +0200
> @@ -214,3 +214,13 @@ GCC_12.0.0 {
>     __trunctfhf2
>     __truncxfhf2
>   }
> +
> +%inherit GCC_13.0.0 GCC_12.0.0
> +GCC_13.0.0 {
> +  __extendbfsf2
> +  __truncdfbf2
> +  __truncsfbf2
> +  __trunctfbf2
> +  __truncxfbf2
> +  __trunchfbf2
> +}
> --- libgcc/config/i386/sfp-machine.h.jj	2022-09-29 18:11:28.761749843 +0200
> +++ libgcc/config/i386/sfp-machine.h	2022-10-03 11:10:11.181026977 +0200
> @@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
>   #define _FP_QNANNEGATEDP 0
>   
>   #define _FP_NANSIGN_H		1
> +#define _FP_NANSIGN_B		1
>   #define _FP_NANSIGN_S		1
>   #define _FP_NANSIGN_D		1
>   #define _FP_NANSIGN_E		1
> --- libgcc/config/i386/64/sfp-machine.h.jj	2022-09-29 18:11:28.761749843 +0200
> +++ libgcc/config/i386/64/sfp-machine.h	2022-10-03 11:10:11.181026977 +0200
> @@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
>   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D
>   #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
> --- libgcc/config/i386/32/sfp-machine.h.jj	2022-09-29 18:11:28.761749843 +0200
> +++ libgcc/config/i386/32/sfp-machine.h	2022-10-03 11:10:11.182026963 +0200
> @@ -87,6 +87,7 @@
>   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
>   /* Even if XFmode is 12byte,  we have to pad it to
> --- libgcc/soft-fp/brain.h.jj	2022-10-03 11:10:11.182026963 +0200
> +++ libgcc/soft-fp/brain.h	2022-10-03 11:10:11.182026963 +0200
> @@ -0,0 +1,172 @@
> +/* Software floating-point emulation.
> +   Definitions for Brain Floating Point format (bfloat16).
> +   Copyright (C) 1997-2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef SOFT_FP_BRAIN_H
> +#define SOFT_FP_BRAIN_H	1
> +
> +#if _FP_W_TYPE_SIZE < 32
> +# error "Here's a nickel kid.  Go buy yourself a real computer."
> +#endif
> +
> +#define _FP_FRACTBITS_B		(_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACTBITS_DW_B	(_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACBITS_B		8
> +#define _FP_FRACXBITS_B		(_FP_FRACTBITS_B - _FP_FRACBITS_B)
> +#define _FP_WFRACBITS_B		(_FP_WORKBITS + _FP_FRACBITS_B)
> +#define _FP_WFRACXBITS_B	(_FP_FRACTBITS_B - _FP_WFRACBITS_B)
> +#define _FP_EXPBITS_B		8
> +#define _FP_EXPBIAS_B		127
> +#define _FP_EXPMAX_B		255
> +
> +#define _FP_QNANBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
> +#define _FP_QNANBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
> +#define _FP_IMPLBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
> +#define _FP_IMPLBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
> +#define _FP_OVERFLOW_B		((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
> +
> +#define _FP_WFRACBITS_DW_B	(2 * _FP_WFRACBITS_B)
> +#define _FP_WFRACXBITS_DW_B	(_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
> +#define _FP_HIGHBIT_DW_B	\
> +  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
> +
> +/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
> +   chosen by the target machine.  */
> +
> +typedef float BFtype __attribute__ ((mode (BF)));
> +
> +union _FP_UNION_B
> +{
> +  BFtype flt;
> +  struct _FP_STRUCT_LAYOUT
> +  {
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +    unsigned sign : 1;
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +#else
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned sign : 1;
> +#endif
> +  } bits;
> +};
> +
> +#define FP_DECL_B(X)		_FP_DECL (1, X)
> +#define FP_UNPACK_RAW_B(X, val)	_FP_UNPACK_RAW_1 (B, X, (val))
> +#define FP_UNPACK_RAW_BP(X, val)	_FP_UNPACK_RAW_1_P (B, X, (val))
> +#define FP_PACK_RAW_B(val, X)	_FP_PACK_RAW_1 (B, (val), X)
> +#define FP_PACK_RAW_BP(val, X)			\
> +  do						\
> +    {						\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_B(X, val)			\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1 (B, X, (val));		\
> +      _FP_UNPACK_CANONICAL (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_BP(X, val)			\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1_P (B, X, (val));		\
> +      _FP_UNPACK_CANONICAL (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_B(X, val)		\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1 (B, X, (val));		\
> +      _FP_UNPACK_SEMIRAW (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_BP(X, val)		\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1_P (B, X, (val));		\
> +      _FP_UNPACK_SEMIRAW (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_B(val, X)			\
> +  do						\
> +    {						\
> +      _FP_PACK_CANONICAL (B, 1, X);		\
> +      _FP_PACK_RAW_1 (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_BP(val, X)			\
> +  do						\
> +    {						\
> +      _FP_PACK_CANONICAL (B, 1, X);		\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_B(val, X)		\
> +  do						\
> +    {						\
> +      _FP_PACK_SEMIRAW (B, 1, X);		\
> +      _FP_PACK_RAW_1 (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_BP(val, X)		\
> +  do						\
> +    {						\
> +      _FP_PACK_SEMIRAW (B, 1, X);		\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_TO_INT_B(r, X, rsz, rsg)	_FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
> +#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)	\
> +  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
> +#define FP_FROM_INT_B(X, r, rs, rt)	_FP_FROM_INT (B, 1, X, (r), (rs), rt)
> +
> +/* BFmode arithmetic is not implemented.  */
> +
> +#define _FP_FRAC_HIGH_B(X)	_FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_RAW_B(X)	_FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_DW_B(X)	_FP_FRAC_HIGH_1 (X)
> +
> +#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
> +
> +#endif /* !SOFT_FP_BRAIN_H */
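
For readers less familiar with the format described by brain.h: the union
above gives 1 sign bit, 8 exponent bits (bias 127) and 7 stored fraction bits
(_FP_FRACBITS_B is 8 because it counts the implicit bit).  That is the upper
half of an IEEE single, so widening is exact.  The sketch below shows this on
raw bit patterns.  It assumes float is IEEE binary32, and the names
bf16_fields, bf16_split and bf16_bits_to_float are made up for illustration;
this is not how __extendbfsf2 is implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the three bfloat16 fields.  */
struct bf16_fields
{
  unsigned sign;	/* 1 bit.  */
  unsigned exp;		/* 8 bits, biased by 127.  */
  unsigned frac;	/* 7 stored bits, implicit leading bit not included.  */
};

/* Split a raw bfloat16 bit pattern into its fields.  */
static struct bf16_fields
bf16_split (uint16_t bits)
{
  struct bf16_fields f;
  f.sign = (bits >> 15) & 0x1u;
  f.exp = (bits >> 7) & 0xffu;
  f.frac = bits & 0x7fu;
  return f;
}

/* Widen a raw bfloat16 bit pattern to float.  Assumes float is IEEE
   binary32; the 16 low bits of the result are zero, so no rounding
   takes place.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  assert (sizeof f == sizeof wide);
  memcpy (&f, &wide, sizeof f);
  return f;
}
```

For example, the pattern 0x3F80 has sign 0, exponent 127 and fraction 0, and
widens to 1.0f.
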
> --- libgcc/soft-fp/truncsfbf2.c.jj	2022-10-03 11:10:11.182026963 +0200
> +++ libgcc/soft-fp/truncsfbf2.c	2022-10-03 11:10:11.182026963 +0200
> @@ -0,0 +1,48 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE single into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +BFtype
> +__truncsfbf2 (SFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_S (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_S (A, a);
> +  FP_TRUNC (B, S, 1, 1, R, A);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
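
As a rough illustration of what the SFmode -> BFmode truncation has to do in
the default rounding mode, here is a bit-level sketch of round-to-nearest-even.
It is not the soft-fp implementation above: it assumes float is IEEE binary32,
ignores the dynamic rounding mode and raises no exceptions, and the name
float_to_bf16_bits_rne is invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: convert F to a raw bfloat16 bit pattern using
   round-to-nearest, ties-to-even.  Assumes float is IEEE binary32.  */
static uint16_t
float_to_bf16_bits_rne (float f)
{
  uint32_t u;
  assert (sizeof f == sizeof u);
  memcpy (&u, &f, sizeof u);

  /* NaN: keep the sign and the upper payload bits, force the quiet bit
     so the result cannot collapse to an infinity.  */
  if ((u & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((u >> 16) | 0x0040u);

  /* Add 0x7fff plus the lowest kept bit: anything above half an ulp
     rounds up, and an exact tie rounds up only when the kept value
     is odd.  */
  uint32_t lsb = (u >> 16) & 1u;
  u += 0x7fffu + lsb;
  return (uint16_t) (u >> 16);
}
```

The two tie cases show the even rounding: 1 + 2^-8 rounds down to 0x3F80,
while 1 + 3 * 2^-8 rounds up to 0x3F82.
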
> --- libgcc/soft-fp/truncdfbf2.c.jj	2022-10-03 11:10:11.182026963 +0200
> +++ libgcc/soft-fp/truncdfbf2.c	2022-10-03 11:10:11.182026963 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE double into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "double.h"
> +
> +BFtype
> +__truncdfbf2 (DFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_D (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_D (A, a);
> +#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
> +  FP_TRUNC (B, D, 1, 2, R, A);
> +#else
> +  FP_TRUNC (B, D, 1, 1, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncxfbf2.c.jj	2022-10-03 11:10:11.183026950 +0200
> +++ libgcc/soft-fp/truncxfbf2.c	2022-10-03 11:10:11.183026950 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE extended into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "extended.h"
> +
> +BFtype
> +__truncxfbf2 (XFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_E (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_E (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, E, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, E, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunctfbf2.c.jj	2022-10-03 11:10:11.183026950 +0200
> +++ libgcc/soft-fp/trunctfbf2.c	2022-10-03 11:10:11.183026950 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE quad into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "quad.h"
> +
> +BFtype
> +__trunctfbf2 (TFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_Q (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_Q (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, Q, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, Q, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunchfbf2.c.jj	2022-10-03 11:10:11.183026950 +0200
> +++ libgcc/soft-fp/trunchfbf2.c	2022-10-03 11:10:11.183026950 +0200
> @@ -0,0 +1,58 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE half into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "half.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert HFtype to SFtype (lossless) and then
> +   truncate to BFtype.  */
> +
> +BFtype
> +__trunchfbf2 (HFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_RAW_H (A, a);
> +  FP_EXTEND (S, H, 1, 1, B, A);
> +  FP_PACK_RAW_S (b, B);
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (B, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncbfhf2.c.jj	2022-10-03 11:10:11.183026950 +0200
> +++ libgcc/soft-fp/truncbfhf2.c	2022-10-03 11:10:11.183026950 +0200
> @@ -0,0 +1,75 @@
> +/* Software floating-point emulation.
> +   Truncate bfloat16 into IEEE half.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "half.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert BFtype to SFtype (lossless) and then
> +   truncate to HFtype.  */
> +
> +HFtype
> +__truncbfhf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_B (A);
> +  FP_DECL_S (B);
> +  FP_DECL_H (R);
> +  SFtype b;
> +  HFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  /* Optimize BFtype to SFtype conversion to simple left shift
> +     by 16 if possible, we don't need to raise exceptions on sNaN
> +     here as the SFtype to HFtype truncation should do that too.  */
> +  if (sizeof (BFtype) == 2
> +      && sizeof (unsigned short) == 2
> +      && sizeof (SFtype) == 4
> +      && sizeof (unsigned int) == 4)
> +    {
> +      union { BFtype a; unsigned short b; } u1;
> +      union { SFtype a; unsigned int b; } u2;
> +      u1.a = a;
> +      u2.b = (u1.b << 8) << 8;
> +      b = u2.a;
> +    }
> +  else
> +    {
> +      FP_UNPACK_RAW_B (A, a);
> +      FP_EXTEND (S, B, 1, 1, B, A);
> +      FP_PACK_RAW_S (b, B);
> +    }
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (H, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_H (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/extendbfsf2.c.jj	2022-10-03 11:10:11.183026950 +0200
> +++ libgcc/soft-fp/extendbfsf2.c	2022-10-03 11:10:11.183026950 +0200
> @@ -0,0 +1,49 @@
> +/* Software floating-point emulation.
> +   Return a bfloat16 converted to IEEE single.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define FP_NO_EXACT_UNDERFLOW
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +SFtype
> +__extendbfsf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_B (A);
> +  FP_DECL_S (R);
> +  SFtype r;
> +
> +  FP_INIT_EXCEPTIONS;
> +  FP_UNPACK_RAW_B (A, a);
> +  FP_EXTEND (S, B, 1, 1, R, A);
> +  FP_PACK_RAW_S (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libiberty/cp-demangle.h.jj	2022-09-29 18:11:28.762749829 +0200
> +++ libiberty/cp-demangle.h	2022-10-03 11:10:11.184026936 +0200
> @@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
>   extern const struct demangle_operator_info cplus_demangle_operators[];
>   #endif
>   
> -#define D_BUILTIN_TYPE_COUNT (35)
> +#define D_BUILTIN_TYPE_COUNT (36)
>   
>   CP_STATIC_IF_GLIBCPP_V3
>   const struct demangle_builtin_type_info
> --- libiberty/cp-demangle.c.jj	2022-09-29 18:11:28.762749829 +0200
> +++ libiberty/cp-demangle.c	2022-10-03 11:39:01.324587895 +0200
> @@ -2489,6 +2489,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
>     /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
>   	     D_PRINT_DEFAULT },
>     /* 34 */ { NL ("_Float"),	NL ("_Float"),		D_PRINT_FLOAT },
> +  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
>   };
>   
>   CP_STATIC_IF_GLIBCPP_V3
> @@ -2753,11 +2754,22 @@ cplus_demangle_type (struct d_info *di)
>   
>   	case 'F':
>   	  /* DF<number>_ - _Float<number>.
> -	     DF<number>x - _Float<number>x.  */
> +	     DF<number>x - _Float<number>x
> +	     DF16b - std::bfloat16_t.  */
>   	  {
>   	    int arg = d_number (di);
>   	    char buf[12];
>   	    char suffix = 0;
> +	    if (d_peek_char (di) == 'b')
> +	      {
> +		if (arg != 16)
> +		  return NULL;
> +		d_advance (di, 1);
> +		ret = d_make_builtin_type (di,
> +					   &cplus_demangle_builtin_types[35]);
> +		di->expansion += ret->u.s_builtin.type->len;
> +		break;
> +	      }
>   	    if (d_peek_char (di) == 'x')
>   	      suffix = 'x';
>   	    if (!suffix && d_peek_char (di) != '_')
> --- libiberty/testsuite/demangle-expected.jj	2022-09-29 18:11:28.762749829 +0200
> +++ libiberty/testsuite/demangle-expected	2022-10-03 11:39:12.666434242 +0200
> @@ -1249,6 +1249,10 @@ xxx
>   _Z3xxxDF32xDF64xDF128xCDF32xVb
>   xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
>   xxx
> +--format=auto --no-params
> +_Z3xxxDF16b
> +xxx(std::bfloat16_t)
> +xxx
>   # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
>   --format=auto --no-params
>   _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z
> 
> 
> 	Jakub
> 


* Re: [PATCH] middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support
  2022-10-04 21:50       ` Jason Merrill
@ 2022-10-05 13:47         ` Jakub Jelinek
  2022-10-05 20:02           ` Jason Merrill
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2022-10-05 13:47 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Joseph S. Myers, Richard Biener, Jeff Law, Uros Bizjak, gcc-patches

On Tue, Oct 04, 2022 at 05:50:50PM -0400, Jason Merrill wrote:
> > Another question is the suffixes of the builtins.  For now I have added
> > bf16 suffix and enabled the builtins with !both_p, so one always needs to
> > use __builtin_* form for them.  None of the GCC builtins end with b,
> > so this isn't ambiguous with __builtin_*f16, but some libm functions do end
> > with b, in particular ilogb, logb and f{??,??x}sub.  ilogb and the subs
> > always have it, but is __builtin_logbf16 f16 suffixed logb or bf16 suffixed
> > log?  Shall the builtins use f16b suffixes instead like the mangling does?
> 
> Do we want bf16 builtins at all?  The impression I've gotten is that users
> want computation to happen in SFmode and only later truncate back to BFmode.

As I wrote earlier, I think we need at least one, a __builtin_nans variant,
which would be used in the libstdc++
std::numeric_limits<std::bfloat16_t>::signaling_NaN() implementation.
I think
std::numeric_limits<std::bfloat16_t>::infinity() can be implemented as
return (__bf16) __builtin_huge_valf ();
and similarly
std::numeric_limits<std::bfloat16_t>::quiet_NaN() as
return (__bf16) __builtin_nanf ("");
but
return (__bf16) __builtin_nansf ("");
would lose the signaling NaN on the conversion and raise an exception,
and as the method is constexpr,
union { unsigned short a; __bf16 b; } u = { 0x7f81 };
return u.b;
wouldn't work.  I can certainly restrict the builtins to the single
one, but wonder whether the suffix for that builtin shouldn't be chosen
such that eventually we could add more builtins if we need to
and don't run into the log with bf16 suffix vs. logb with f16 suffix
ambiguity.
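
For illustration only, the 0x7f81 pattern used in the union above can be
decoded with plain integer tests.  This is a minimal sketch, assuming the
usual bfloat16 layout (1 sign bit, 8 exponent bits, 7 mantissa bits) and
assuming the convention that a set top mantissa bit means a quiet NaN; the
helper names are made up for this example and are not GCC or libstdc++ API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not part of GCC or libstdc++.
// Assumed bfloat16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.
constexpr std::uint16_t bf16_exp_mask = 0x7f80;
constexpr std::uint16_t bf16_mant_mask = 0x007f;
// Assumed convention: top mantissa bit set means quiet NaN.
constexpr std::uint16_t bf16_quiet_bit = 0x0040;

// NaN: exponent all ones and a non-zero mantissa.
constexpr bool
bf16_is_nan (std::uint16_t bits)
{
  return (bits & bf16_exp_mask) == bf16_exp_mask
	 && (bits & bf16_mant_mask) != 0;
}

// Signaling NaN: a NaN whose quiet bit is clear.
constexpr bool
bf16_is_signaling_nan (std::uint16_t bits)
{
  return bf16_is_nan (bits) && (bits & bf16_quiet_bit) == 0;
}

static_assert (bf16_is_signaling_nan (0x7f81), "0x7f81 is a signaling NaN");
static_assert (!bf16_is_signaling_nan (0x7fc0), "0x7fc0 is a quiet NaN");
static_assert (!bf16_is_nan (0x7f80), "0x7f80 is +Inf, not a NaN");
```

Under those assumptions the pattern is a signaling NaN, which is the value
a conversion from float would quiet.
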
As you said, most of the libstdc++ overloads for std::bfloat16_t then
can use float builtins or library calls under the hood, but std::nextafter
is another case where I think we'll need to have something bfloat16_t
specific, because float ulp isn't bfloat16_t ulp, the latter is much larger.
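
To put a number on the ulp difference, here is a small sketch that works on
raw bit patterns.  It assumes float is a 32-bit IEEE single and that
bfloat16 is the upper 16 bits of that encoding; the helper names are
hypothetical, and the helpers only handle positive finite inputs below the
largest finite value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert (sizeof (float) == 4, "sketch assumes a 32-bit IEEE single");

// Hypothetical helpers for illustration; not part of GCC or libstdc++.

// Widen a bfloat16 bit pattern to float.  bfloat16 is assumed to be the
// upper half of the IEEE single encoding, so this is a left shift by 16.
inline float
bf16_bits_to_float (std::uint16_t bits)
{
  std::uint32_t wide = static_cast<std::uint32_t> (bits) << 16;
  float f;
  std::memcpy (&f, &wide, sizeof f);
  return f;
}

// Gap between a positive finite float and the next larger float.
inline float
float_ulp_above (float x)
{
  std::uint32_t bits;
  std::memcpy (&bits, &x, sizeof bits);
  ++bits;
  float next;
  std::memcpy (&next, &bits, sizeof next);
  return next - x;
}

// Gap between a positive finite bfloat16 (given as its bit pattern) and
// the next larger bfloat16.
inline float
bf16_ulp_above (std::uint16_t bits)
{
  std::uint16_t next = static_cast<std::uint16_t> (bits + 1);
  return bf16_bits_to_float (next) - bf16_bits_to_float (bits);
}
```

At 1.0 (bit pattern 0x3f80) the bfloat16 step is 2^-7 while the float step
is 2^-23, so a nextafter computed in float and then truncated would
not move to the neighbouring bfloat16 value.
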

Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too
in the next iteration (always with pedwarn in that case).

> > @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
> >       {
> >        machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
> >        icode = optab_handler (cstore_optab, optab_mode);
> > -     if (icode != CODE_FOR_nothing)
> > +     if (icode != CODE_FOR_nothing
> > +	 /* Don't consider [BH]Fmode as usable wider mode, as neither is
> > +	    a subset or superset of the other.  */
> > +	 && (compare_mode == mode
> > +	     || !SCALAR_FLOAT_MODE_P (compare_mode)
> > +	     || maybe_ne (GET_MODE_PRECISION (compare_mode),
> > +			  GET_MODE_PRECISION (mode))))
> 
> Why do you need to do this here (and in prepare_cmp_insn, and similarly in
> can_compare_p)?  Shouldn't get_wider skip over modes that are not actually
> wider?

I'm afraid too many places rely on all modes of a certain class being
visible when walking from the "narrowest" to the "widest" mode.  Say
FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
&& GET_MODE_WIDER_MODE (HFmode) == SFmode, because such a walk would then
never reach one of the two 16-bit modes.

Note, besides this GET_MODE_PRECISION (HFmode) == GET_MODE_PRECISION (BFmode)
case, another set of modes which have the same size is powerpc*
TFmode/IFmode/KFmode, but there the backend resorts to an ugly hack and
artificially lowers the precision of 2 of them:
rs6000-modes.h:#define FLOAT_PRECISION_IFmode	128
rs6000-modes.h:#define FLOAT_PRECISION_TFmode	127
rs6000-modes.h:#define FLOAT_PRECISION_KFmode	126
(and the middle-end then has to work around that mess).  Doing something
similar wouldn't help the BFmode vs. HFmode case though: one of them would
have wider precision and so e.g. the C FE would then prefer it, but more
importantly, as they are unordered modes where most of the optabs aren't
implemented, it is bad to pick optabs for the "wider" mode to handle the
"narrower" one.  I think powerpc works because it defines optabs for
all 3 modes when those modes are usable.

	Jakub

* Re: [PATCH] middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support
  2022-10-05 13:47         ` Jakub Jelinek
@ 2022-10-05 20:02           ` Jason Merrill
  2022-10-12  8:23             ` [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE Jakub Jelinek
  2022-10-13 16:50             ` [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support Jakub Jelinek
  0 siblings, 2 replies; 22+ messages in thread
From: Jason Merrill @ 2022-10-05 20:02 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Joseph S. Myers, Richard Biener, Jeff Law, Uros Bizjak, gcc-patches

On 10/5/22 09:47, Jakub Jelinek wrote:
> On Tue, Oct 04, 2022 at 05:50:50PM -0400, Jason Merrill wrote:
>>> Another question is the suffixes of the builtins.  For now I have added
>>> bf16 suffix and enabled the builtins with !both_p, so one always needs to
>>> use __builtin_* form for them.  None of the GCC builtins end with b,
>>> so this isn't ambiguous with __builtin_*f16, but some libm functions do end
>>> with b, in particular ilogb, logb and f{??,??x}sub.  ilogb and the subs
>>> always have it, but is __builtin_logbf16 f16 suffixed logb or bf16 suffixed
>>> log?  Shall the builtins use f16b suffixes instead like the mangling does?
>>
>> Do we want bf16 builtins at all?  The impression I've gotten is that users
>> want computation to happen in SFmode and only later truncate back to BFmode.
> 
> As I wrote earlier, I think we need at least one, __builtin_nans variant
> which would be used in libstdc++
> std::numeric_limits<std::bfloat16_t>::signaling_NaN() implementation.
> I think
> std::numeric_limits<std::bfloat16_t>::infinity() can be implemented as
> return (__bf16) __builtin_huge_valf ();
> and similarly
> std::numeric_limits<std::bfloat16_t>::quiet_NaN() as
> return (__bf16) __builtin_nanf ("");
> but
> return (__bf16) __builtin_nansf ("");
> would loose the signaling NaN on the conversion and raise exception,
> and as the method is constexpr,
> union { unsigned short a; __bf16 b; } u = { 0x7f81 };
> return u.b;
> wouldn't work.  I can certainly restrict the builtins to the single
> one, but wonder whether the suffix for that builtin shouldn't be chosen
> such that eventually we could add more builtins if we need to
> and don't run into the log with bf16 suffix vs. logb with f16 suffix
> ambiguity.
> As you said, most of the libstdc++ overloads for std::bfloat16_t then
> can use float builtins or library calls under the hood, but std::nextafter
> is another case where I think we'll need to have something bfloat16_t
> specific, because float ulp isn't bfloat16_t ulp, the latter is much larger.

Makes sense.

> Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too
> in the next iteration (always with pedwarn in that case).
> 
>>> @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
>>>        {
>>>         machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
>>>         icode = optab_handler (cstore_optab, optab_mode);
>>> -     if (icode != CODE_FOR_nothing)
>>> +     if (icode != CODE_FOR_nothing
>>> +	 /* Don't consider [BH]Fmode as usable wider mode, as neither is
>>> +	    a subset or superset of the other.  */
>>> +	 && (compare_mode == mode
>>> +	     || !SCALAR_FLOAT_MODE_P (compare_mode)
>>> +	     || maybe_ne (GET_MODE_PRECISION (compare_mode),
>>> +			  GET_MODE_PRECISION (mode))))
>>
>> Why do you need to do this here (and in prepare_cmp_insn, and similarly in
>> can_compare_p)?  Shouldn't get_wider skip over modes that are not actually
>> wider?
> 
> I'm afraid too many places rely on all modes of a certain class to be
> visible when walking from "narrowest" to "widest" mode, say
> FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
> etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
> && GET_MODE_WIDER_MODE (HFmode) == SFmode.

Yes, it seems they need to change now that their assumptions have been 
violated.  I suppose FOR_EACH_MODE_IN_CLASS would need to change to not 
use get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to 
decide whether they want an iteration that uses get_wider (likely with a 
new name) or not.

> Note, besides this GET_MODE_PRECISION (HFmode) == GET_MODE_PRECISION (BFmode)
> case, another set of modes which have the same size are powerpc*
> TFmode/IFmode/KFmode, but in that case it makes ugly hacks where it
> artificially lowers the precision of 2 of them:
> rs6000-modes.h:#define FLOAT_PRECISION_IFmode	128
> rs6000-modes.h:#define FLOAT_PRECISION_TFmode	127
> rs6000-modes.h:#define FLOAT_PRECISION_KFmode	126
> (and the middle-end then has to work around that mess).  Doing something
> similar wouldn't help the BFmode vs. HFmode case though, one of them would
> have wider precision and so e.g. C FE would then prefer it, but more
> importantly, as they are unordered modes where most of the optabs aren't
> implemented it is bad to pick optabs for the "wider" mode to handle the
> "narrower" one.  I think powerpc works because they define optabs for
> all the 3 modes when those modes are usable.
> 
> 	Jakub
> 


* [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE
  2022-10-05 20:02           ` Jason Merrill
@ 2022-10-12  8:23             ` Jakub Jelinek
  2022-10-12 10:15               ` Richard Sandiford
  2022-10-12 10:37               ` [PATCH] machmode: " Eric Botcazou
  2022-10-13 16:50             ` [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support Jakub Jelinek
  1 sibling, 2 replies; 22+ messages in thread
From: Jakub Jelinek @ 2022-10-12  8:23 UTC (permalink / raw)
  To: Jason Merrill, Richard Biener, Jeff Law, Eric Botcazou,
	Richard Sandiford
  Cc: gcc-patches

On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote:
> > > > @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
> > > >        {
> > > >         machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
> > > >         icode = optab_handler (cstore_optab, optab_mode);
> > > > -     if (icode != CODE_FOR_nothing)
> > > > +     if (icode != CODE_FOR_nothing
> > > > +	 /* Don't consider [BH]Fmode as usable wider mode, as neither is
> > > > +	    a subset or superset of the other.  */
> > > > +	 && (compare_mode == mode
> > > > +	     || !SCALAR_FLOAT_MODE_P (compare_mode)
> > > > +	     || maybe_ne (GET_MODE_PRECISION (compare_mode),
> > > > +			  GET_MODE_PRECISION (mode))))
> > > 
> > > Why do you need to do this here (and in prepare_cmp_insn, and similarly in
> > > can_compare_p)?  Shouldn't get_wider skip over modes that are not actually
> > > wider?
> > 
> > I'm afraid too many places rely on all modes of a certain class to be
> > visible when walking from "narrowest" to "widest" mode, say
> > FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
> > etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
> > && GET_MODE_WIDER_MODE (HFmode) == SFmode.
> 
> Yes, it seems they need to change now that their assumptions have been
> violated.  I suppose FOR_EACH_MODE_IN_CLASS would need to change to not use
> get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to decide
> whether they want an iteration that uses get_wider (likely with a new name)
> or not.

Here is a patch which does that.
Passes bootstrap/regtest on x86_64-linux and i686-linux.

Though I admit I didn't go carefully through all 24 GET_MODE_WIDER_MODE
uses, 54 FOR_EACH_MODE_IN_CLASS uses, 3 FOR_EACH_MODE uses, 24
FOR_EACH_MODE_FROM, 6 FOR_EACH_MODE_UNTIL and 15 FOR_EACH_WIDER_MODE uses.
It is more important to go through the GET_MODE_WIDER_MODE and
FOR_EACH_WIDER_MODE uses because the patch changes behavior for those,
the rest keep their previous meaning and so can be changed incrementally
if the other meaning is desirable to them (I've of course changed the 3
spots I had to change in the previous BFmode patch and whatever triggered
during the bootstraps).
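
To make the difference between the two walks concrete, here is a toy model,
not the genmodes.cc code itself.  ToyMode, toy_next, toy_wider and the
four-entry mode list are all made up for illustration; toy_wider only
imitates the "skip modes with the same bytesize and precision" loop from
the emit_mode_wider change, and -1 stands in for VOIDmode.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for genmodes' mode_data; names and fields are illustrative.
struct ToyMode
{
  const char *name;
  unsigned bytesize;
  unsigned precision;
};

// One class of modes in genmodes order (an assumed, simplified list).
inline std::vector<ToyMode>
toy_float_modes ()
{
  return { { "HF", 2, 16 }, { "BF", 2, 16 }, { "SF", 4, 32 }, { "DF", 8, 64 } };
}

// Old GET_MODE_WIDER_MODE meaning, now GET_MODE_NEXT_MODE: simply the
// following mode in the class, or -1 if there is none.
inline int
toy_next (const std::vector<ToyMode> &modes, std::size_t i)
{
  return i + 1 < modes.size () ? static_cast<int> (i + 1) : -1;
}

// New GET_MODE_WIDER_MODE meaning: the first following mode that differs
// in bytesize or precision, or -1 if there is none.
inline int
toy_wider (const std::vector<ToyMode> &modes, std::size_t i)
{
  for (std::size_t j = i + 1; j < modes.size (); ++j)
    if (modes[j].bytesize != modes[i].bytesize
	|| modes[j].precision != modes[i].precision)
      return static_cast<int> (j);
  return -1;
}
```

In this model the "next" walk from HF visits BF before SF, while the
"wider" walk from either HF or BF goes straight to SF.
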

Thoughts on this?

2022-10-12  Jakub Jelinek  <jakub@redhat.com>

	* genmodes.cc (emit_mode_wider): Emit previous content of
	mode_wider array into mode_next array and for mode_wider
	emit always VOIDmode for !CLASS_HAS_WIDER_MODES_P classes,
	otherwise skip through modes with the same precision.
	* machmode.h (mode_next): Declare.
	(GET_MODE_NEXT_MODE): New inline function.
	(mode_iterator::get_next, mode_iterator::get_known_next): New
	function templates.
	(FOR_EACH_MODE_IN_CLASS): Use get_next instead of get_wider.
	(FOR_EACH_MODE): Use get_known_next instead of get_known_wider.
	(FOR_EACH_MODE_FROM): Use get_next instead of get_wider.
	(FOR_EACH_WIDER_MODE_FROM): Define.
	(FOR_EACH_NEXT_MODE): Define.
	* expmed.cc (emit_store_flag_1): Use FOR_EACH_WIDER_MODE_FROM
	instead of FOR_EACH_MODE_FROM.
	* optabs.cc (prepare_cmp_insn): Likewise.  Remove redundant
	!CLASS_HAS_WIDER_MODES_P check.
	(prepare_float_lib_cmp): Use FOR_EACH_WIDER_MODE_FROM instead of
	FOR_EACH_MODE_FROM.
	* config/i386/i386-expand.cc (get_mode_wider_vector): Use
	GET_MODE_NEXT_MODE instead of GET_MODE_WIDER_MODE.

--- gcc/genmodes.cc.jj	2022-05-23 21:44:48.080857253 +0200
+++ gcc/genmodes.cc	2022-10-11 22:35:39.680286764 +0200
@@ -1527,7 +1527,7 @@ emit_mode_wider (void)
   int c;
   struct mode_data *m;
 
-  print_decl ("unsigned char", "mode_wider", "NUM_MACHINE_MODES");
+  print_decl ("unsigned char", "mode_next", "NUM_MACHINE_MODES");
 
   for_all_modes (c, m)
     tagged_printf ("E_%smode",
@@ -1535,6 +1535,37 @@ emit_mode_wider (void)
 		   m->name);
 
   print_closer ();
+  print_decl ("unsigned char", "mode_wider", "NUM_MACHINE_MODES");
+
+  for_all_modes (c, m)
+    {
+      struct mode_data *m2 = 0;
+
+      if (m->cl == MODE_INT
+	  || m->cl == MODE_PARTIAL_INT
+	  || m->cl == MODE_FLOAT
+	  || m->cl == MODE_DECIMAL_FLOAT
+	  || m->cl == MODE_COMPLEX_FLOAT
+	  || m->cl == MODE_FRACT
+	  || m->cl == MODE_UFRACT
+	  || m->cl == MODE_ACCUM
+	  || m->cl == MODE_UACCUM)
+	for (m2 = m->wider; m2 && m2 != void_mode; m2 = m2->wider)
+	  {
+	    if (m2->bytesize == m->bytesize
+		&& m2->precision == m->precision)
+	      continue;
+	    break;
+	  }
+
+      if (m2 == void_mode)
+	m2 = 0;
+      tagged_printf ("E_%smode",
+		     m2 ? m2->name : void_mode->name,
+		     m->name);
+    }
+
+  print_closer ();
   print_decl ("unsigned char", "mode_2xwider", "NUM_MACHINE_MODES");
 
   for_all_modes (c, m)
--- gcc/machmode.h.jj	2022-01-18 00:18:02.823743394 +0100
+++ gcc/machmode.h	2022-10-11 22:35:39.680286764 +0200
@@ -28,6 +28,7 @@ extern const unsigned char mode_inner[NU
 extern CONST_MODE_NUNITS poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];
 extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
 extern const unsigned short mode_unit_precision[NUM_MACHINE_MODES];
+extern const unsigned char mode_next[NUM_MACHINE_MODES];
 extern const unsigned char mode_wider[NUM_MACHINE_MODES];
 extern const unsigned char mode_2xwider[NUM_MACHINE_MODES];
 
@@ -760,7 +761,21 @@ GET_MODE_NUNITS (const T &mode)
 }
 #endif
 
-/* Get the next wider natural mode (eg, QI -> HI -> SI -> DI -> TI).  */
+/* Get the next natural mode (not narrower, eg, QI -> HI -> SI -> DI -> TI).  */
+
+template<typename T>
+ALWAYS_INLINE opt_mode<T>
+GET_MODE_NEXT_MODE (const T &m)
+{
+  return typename opt_mode<T>::from_int (mode_next[m]);
+}
+
+/* Get the next wider mode (eg, QI -> HI -> SI -> DI -> TI).
+   This is similar to GET_MODE_NEXT_MODE, but while GET_MODE_NEXT_MODE
+   can include modes that have the same precision (e.g.
+   GET_MODE_NEXT_MODE (HFmode) can be BFmode even when both have the same
+   precision), this one will skip those.  It always returns VOIDmode for
+   modes whose class is !CLASS_HAS_WIDER_MODES_P.  */
 
 template<typename T>
 ALWAYS_INLINE opt_mode<T>
@@ -1098,7 +1113,33 @@ namespace mode_iterator
     return *iter != E_VOIDmode;
   }
 
-  /* Set mode iterator *ITER to the next widest mode in the same class,
+  /* Set mode iterator *ITER to the next mode in the same class,
+     if any.  */
+
+  template<typename T>
+  inline void
+  get_next (opt_mode<T> *iter)
+  {
+    *iter = GET_MODE_NEXT_MODE (iter->require ());
+  }
+
+  inline void
+  get_next (machine_mode *iter)
+  {
+    *iter = GET_MODE_NEXT_MODE (*iter).else_void ();
+  }
+
+  /* Set mode iterator *ITER to the next mode in the same class.
+     Such a mode is known to exist.  */
+
+  template<typename T>
+  inline void
+  get_known_next (T *iter)
+  {
+    *iter = GET_MODE_NEXT_MODE (*iter).require ();
+  }
+
+  /* Set mode iterator *ITER to the next wider mode in the same class,
      if any.  */
 
   template<typename T>
@@ -1114,7 +1155,7 @@ namespace mode_iterator
     *iter = GET_MODE_WIDER_MODE (*iter).else_void ();
   }
 
-  /* Set mode iterator *ITER to the next widest mode in the same class.
+  /* Set mode iterator *ITER to the next wider mode in the same class.
      Such a mode is known to exist.  */
 
   template<typename T>
@@ -1146,20 +1187,27 @@ namespace mode_iterator
 #define FOR_EACH_MODE_IN_CLASS(ITERATOR, CLASS)  \
   for (mode_iterator::start (&(ITERATOR), CLASS); \
        mode_iterator::iterate_p (&(ITERATOR)); \
-       mode_iterator::get_wider (&(ITERATOR)))
+       mode_iterator::get_next (&(ITERATOR)))
 
 /* Make ITERATOR iterate over all the modes in the range [START, END),
    in order of increasing width.  */
 #define FOR_EACH_MODE(ITERATOR, START, END) \
   for ((ITERATOR) = (START); \
        (ITERATOR) != (END); \
-       mode_iterator::get_known_wider (&(ITERATOR)))
+       mode_iterator::get_known_next (&(ITERATOR)))
 
-/* Make ITERATOR iterate over START and all wider modes in the same
+/* Make ITERATOR iterate over START and all non-narrower modes in the same
    class, in order of increasing width.  */
 #define FOR_EACH_MODE_FROM(ITERATOR, START) \
   for ((ITERATOR) = (START); \
        mode_iterator::iterate_p (&(ITERATOR)); \
+       mode_iterator::get_next (&(ITERATOR)))
+
+/* Make ITERATOR iterate over START and all wider modes in the same
+   class, in order of strictly increasing width.  */
+#define FOR_EACH_WIDER_MODE_FROM(ITERATOR, START) \
+  for ((ITERATOR) = (START); \
+       mode_iterator::iterate_p (&(ITERATOR)); \
        mode_iterator::get_wider (&(ITERATOR)))
 
 /* Make ITERATOR iterate over modes in the range [NARROWEST, END)
@@ -1170,6 +1218,14 @@ namespace mode_iterator
 
 /* Make ITERATOR iterate over modes in the same class as MODE, in order
    of increasing width.  Start at the first mode wider than START,
+   or don't iterate at all if there is no wider mode.  */
+#define FOR_EACH_NEXT_MODE(ITERATOR, START) \
+  for ((ITERATOR) = (START), mode_iterator::get_next (&(ITERATOR)); \
+       mode_iterator::iterate_p (&(ITERATOR)); \
+       mode_iterator::get_next (&(ITERATOR)))
+
+/* Make ITERATOR iterate over modes in the same class as MODE, in order
+   of increasing width.  Start at the first mode wider than START,
    or don't iterate at all if there is no wider mode.  */
 #define FOR_EACH_WIDER_MODE(ITERATOR, START) \
   for ((ITERATOR) = (START), mode_iterator::get_wider (&(ITERATOR)); \
--- gcc/expmed.cc.jj	2022-10-05 21:22:56.191918383 +0200
+++ gcc/expmed.cc	2022-10-11 22:35:39.682286736 +0200
@@ -5712,7 +5712,7 @@ emit_store_flag_1 (rtx target, enum rtx_
 
   /* Next try expanding this via the backend's cstore<mode>4.  */
   mclass = GET_MODE_CLASS (mode);
-  FOR_EACH_MODE_FROM (compare_mode, mode)
+  FOR_EACH_WIDER_MODE_FROM (compare_mode, mode)
     {
      machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
      icode = optab_handler (cstore_optab, optab_mode);
--- gcc/optabs.cc.jj	2022-10-05 21:22:56.217918032 +0200
+++ gcc/optabs.cc	2022-10-11 23:20:08.216037640 +0200
@@ -4384,7 +4384,6 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
   machine_mode mode = *pmode;
   rtx libfunc, test;
   machine_mode cmp_mode;
-  enum mode_class mclass;
 
   /* The other methods are not needed.  */
   gcc_assert (methods == OPTAB_DIRECT || methods == OPTAB_WIDEN
@@ -4490,9 +4489,8 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
       return;
     }
 
-  mclass = GET_MODE_CLASS (mode);
   test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
-  FOR_EACH_MODE_FROM (cmp_mode, mode)
+  FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
     {
       enum insn_code icode;
       icode = optab_handler (cbranch_optab, cmp_mode);
@@ -4515,7 +4513,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
 	  delete_insns_since (last);
 	}
 
-      if (methods == OPTAB_DIRECT || !CLASS_HAS_WIDER_MODES_P (mclass))
+      if (methods == OPTAB_DIRECT)
 	break;
     }
 
@@ -4711,7 +4709,7 @@ prepare_float_lib_cmp (rtx x, rtx y, enu
   bool reversed_p = false;
   scalar_int_mode cmp_mode = targetm.libgcc_cmp_return_mode ();
 
-  FOR_EACH_MODE_FROM (mode, orig_mode)
+  FOR_EACH_WIDER_MODE_FROM (mode, orig_mode)
     {
       if (code_to_optab (comparison)
 	  && (libfunc = optab_libfunc (code_to_optab (comparison), mode)))
--- gcc/config/i386/i386-expand.cc.jj	2022-09-26 22:29:41.407322933 +0200
+++ gcc/config/i386/i386-expand.cc	2022-10-11 23:22:55.579761522 +0200
@@ -14941,7 +14941,7 @@ static machine_mode
 get_mode_wider_vector (machine_mode o)
 {
   /* ??? Rely on the ordering that genmodes.cc gives to vectors.  */
-  machine_mode n = GET_MODE_WIDER_MODE (o).require ();
+  machine_mode n = GET_MODE_NEXT_MODE (o).require ();
   gcc_assert (GET_MODE_NUNITS (o) == GET_MODE_NUNITS (n) * 2);
   gcc_assert (GET_MODE_SIZE (o) == GET_MODE_SIZE (n));
   return n;


	Jakub



* Re: [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE
  2022-10-12  8:23             ` [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE Jakub Jelinek
@ 2022-10-12 10:15               ` Richard Sandiford
  2022-10-12 11:07                 ` [PATCH] machmode, v2: " Jakub Jelinek
  2022-10-12 10:37               ` [PATCH] machmode: " Eric Botcazou
  1 sibling, 1 reply; 22+ messages in thread
From: Richard Sandiford @ 2022-10-12 10:15 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jason Merrill, Richard Biener, Jeff Law, Eric Botcazou, gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote:
>> > > > @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
>> > > >        {
>> > > >         machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
>> > > >         icode = optab_handler (cstore_optab, optab_mode);
>> > > > -     if (icode != CODE_FOR_nothing)
>> > > > +     if (icode != CODE_FOR_nothing
>> > > > +	 /* Don't consider [BH]Fmode as usable wider mode, as neither is
>> > > > +	    a subset or superset of the other.  */
>> > > > +	 && (compare_mode == mode
>> > > > +	     || !SCALAR_FLOAT_MODE_P (compare_mode)
>> > > > +	     || maybe_ne (GET_MODE_PRECISION (compare_mode),
>> > > > +			  GET_MODE_PRECISION (mode))))
>> > > 
>> > > Why do you need to do this here (and in prepare_cmp_insn, and similarly in
>> > > can_compare_p)?  Shouldn't get_wider skip over modes that are not actually
>> > > wider?
>> > 
>> > I'm afraid too many places rely on all modes of a certain class to be
>> > visible when walking from "narrowest" to "widest" mode, say
>> > FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
>> > etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
>> > && GET_MODE_WIDER_MODE (HFmode) == SFmode.
>> 
>> Yes, it seems they need to change now that their assumptions have been
>> violated.  I suppose FOR_EACH_MODE_IN_CLASS would need to change to not use
>> get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to decide
>> whether they want an iteration that uses get_wider (likely with a new name)
>> or not.
>
> Here is a patch which does that.
> Passes bootstrap/regtest on x86_64-linux and i686-linux.
>
> Though I admit I didn't go carefully through all 24 GET_MODE_WIDER_MODE
> uses, 54 FOR_EACH_MODE_IN_CLASS uses, 3 FOR_EACH_MODE uses, 24
> FOR_EACH_MODE_FROM, 6 FOR_EACH_MODE_UNTIL and 15 FOR_EACH_WIDER_MODE uses.
> It is more important to go through the GET_MODE_WIDER_MODE and
> FOR_EACH_WIDER_MODE uses because the patch changes behavior for those,
> the rest keep their previous meaning and so can be changed incrementally
> if the other meaning is desirable to them (I've of course changed the 3
> spots I had to change in the previous BFmode patch and whatever triggered
> during the bootstraps).
>
> Thoughts on this?

Looks good to me, just some minor comments below.

> 2022-10-12  Jakub Jelinek  <jakub@redhat.com>
>
> 	* genmodes.cc (emit_mode_wider): Emit previous content of
> 	mode_wider array into mode_next array and for mode_wider
> 	emit always VOIDmode for !CLASS_HAS_WIDER_MODES_P classes,
> 	otherwise skip through modes with the same precision.
> 	* machmode.h (mode_next): Declare.
> 	(GET_MODE_NEXT_MODE): New inline function.
> 	(mode_iterator::get_next, mode_iterator::get_known_next): New
> 	function templates.
> 	(FOR_EACH_MODE_IN_CLASS): Use get_next instead of get_wider.
> 	(FOR_EACH_MODE): Use get_known_next instead of get_known_wider.
> 	(FOR_EACH_MODE_FROM): Use get_next instead of get_wider.
> 	(FOR_EACH_WIDER_MODE_FROM): Define.
> 	(FOR_EACH_NEXT_MODE): Define.
> 	* expmed.cc (emit_store_flag_1): Use FOR_EACH_WIDER_MODE_FROM
> 	instead of FOR_EACH_MODE_FROM.
> 	* optabs.cc (prepare_cmp_insn): Likewise.  Remove redundant
> 	!CLASS_HAS_WIDER_MODES_P check.
> 	(prepare_float_lib_cmp): Use FOR_EACH_WIDER_MODE_FROM instead of
> 	FOR_EACH_MODE_FROM.
> 	* config/i386/i386-expand.cc (get_mode_wider_vector): Use
> 	GET_MODE_NEXT_MODE instead of GET_MODE_WIDER_MODE.
>
> --- gcc/genmodes.cc.jj	2022-05-23 21:44:48.080857253 +0200
> +++ gcc/genmodes.cc	2022-10-11 22:35:39.680286764 +0200
> @@ -1527,7 +1527,7 @@ emit_mode_wider (void)
>    int c;
>    struct mode_data *m;
>  
> -  print_decl ("unsigned char", "mode_wider", "NUM_MACHINE_MODES");
> +  print_decl ("unsigned char", "mode_next", "NUM_MACHINE_MODES");
>  
>    for_all_modes (c, m)
>      tagged_printf ("E_%smode",
> @@ -1535,6 +1535,37 @@ emit_mode_wider (void)
>  		   m->name);
>  
>    print_closer ();
> +  print_decl ("unsigned char", "mode_wider", "NUM_MACHINE_MODES");
> +
> +  for_all_modes (c, m)
> +    {
> +      struct mode_data *m2 = 0;
> +
> +      if (m->cl == MODE_INT
> +	  || m->cl == MODE_PARTIAL_INT
> +	  || m->cl == MODE_FLOAT
> +	  || m->cl == MODE_DECIMAL_FLOAT
> +	  || m->cl == MODE_COMPLEX_FLOAT
> +	  || m->cl == MODE_FRACT
> +	  || m->cl == MODE_UFRACT
> +	  || m->cl == MODE_ACCUM
> +	  || m->cl == MODE_UACCUM)
> +	for (m2 = m->wider; m2 && m2 != void_mode; m2 = m2->wider)
> +	  {
> +	    if (m2->bytesize == m->bytesize
> +		&& m2->precision == m->precision)
> +	      continue;
> +	    break;
> +	  }
> +
> +      if (m2 == void_mode)
> +	m2 = 0;
> +      tagged_printf ("E_%smode",
> +		     m2 ? m2->name : void_mode->name,
> +		     m->name);
> +    }
> +
> +  print_closer ();
>    print_decl ("unsigned char", "mode_2xwider", "NUM_MACHINE_MODES");
>  
>    for_all_modes (c, m)
> --- gcc/machmode.h.jj	2022-01-18 00:18:02.823743394 +0100
> +++ gcc/machmode.h	2022-10-11 22:35:39.680286764 +0200
> @@ -28,6 +28,7 @@ extern const unsigned char mode_inner[NU
>  extern CONST_MODE_NUNITS poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];
>  extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
>  extern const unsigned short mode_unit_precision[NUM_MACHINE_MODES];
> +extern const unsigned char mode_next[NUM_MACHINE_MODES];
>  extern const unsigned char mode_wider[NUM_MACHINE_MODES];
>  extern const unsigned char mode_2xwider[NUM_MACHINE_MODES];
>  
> @@ -760,7 +761,21 @@ GET_MODE_NUNITS (const T &mode)
>  }
>  #endif
>  
> -/* Get the next wider natural mode (eg, QI -> HI -> SI -> DI -> TI).  */
> +/* Get the next natural mode (not narrower, eg, QI -> HI -> SI -> DI -> TI).  */

In addition to the comment you added below, I think it would be good to
give an FP example here as well, with HF and BF both included.

> +
> +template<typename T>
> +ALWAYS_INLINE opt_mode<T>
> +GET_MODE_NEXT_MODE (const T &m)
> +{
> +  return typename opt_mode<T>::from_int (mode_next[m]);
> +}
> +
> +/* Get the next wider mode (eg, QI -> HI -> SI -> DI -> TI).

And then the same example here, but with BF removed.

How robust is the mechanism that guarantees HF comes before BF,
and so is the mode that appears in the (new) wider list?

> +   This is similar to GET_MODE_NEXT_MODE, but while GET_MODE_NEXT_MODE
> +   can include mode that have the same precision (e.g.
> +   GET_MODE_NEXT_MODE (HFmode) can be BFmode even when both have the same
> +   precision), this one will skip those.  And always VOIDmode for
> +   modes whose class is !CLASS_HAS_WIDER_MODES_P.  */
>  
>  template<typename T>
>  ALWAYS_INLINE opt_mode<T>
> @@ -1098,7 +1113,33 @@ namespace mode_iterator
>      return *iter != E_VOIDmode;
>    }
>  
> -  /* Set mode iterator *ITER to the next widest mode in the same class,
> +  /* Set mode iterator *ITER to the next mode in the same class,
> +     if any.  */
> +
> +  template<typename T>
> +  inline void
> +  get_next (opt_mode<T> *iter)
> +  {
> +    *iter = GET_MODE_NEXT_MODE (iter->require ());
> +  }
> +
> +  inline void
> +  get_next (machine_mode *iter)
> +  {
> +    *iter = GET_MODE_NEXT_MODE (*iter).else_void ();
> +  }
> +
> +  /* Set mode iterator *ITER to the next wider mode in the same class.

s/wider //

> +     Such a mode is known to exist.  */
> +
> +  template<typename T>
> +  inline void
> +  get_known_next (T *iter)
> +  {
> +    *iter = GET_MODE_NEXT_MODE (*iter).require ();
> +  }
> +
> +  /* Set mode iterator *ITER to the next wider mode in the same class,
>       if any.  */
>  
>    template<typename T>
> @@ -1114,7 +1155,7 @@ namespace mode_iterator
>      *iter = GET_MODE_WIDER_MODE (*iter).else_void ();
>    }
>  
> -  /* Set mode iterator *ITER to the next widest mode in the same class.
> +  /* Set mode iterator *ITER to the next wider mode in the same class.
>       Such a mode is known to exist.  */

I'll take your word for it that this is correct. ;-)  I would say
"next widest", but it's very likely that I'm wrong.

>    template<typename T>
> @@ -1146,20 +1187,27 @@ namespace mode_iterator
>  #define FOR_EACH_MODE_IN_CLASS(ITERATOR, CLASS)  \
>    for (mode_iterator::start (&(ITERATOR), CLASS); \
>         mode_iterator::iterate_p (&(ITERATOR)); \
> -       mode_iterator::get_wider (&(ITERATOR)))
> +       mode_iterator::get_next (&(ITERATOR)))
>  
>  /* Make ITERATOR iterate over all the modes in the range [START, END),
>     in order of increasing width.  */
>  #define FOR_EACH_MODE(ITERATOR, START, END) \
>    for ((ITERATOR) = (START); \
>         (ITERATOR) != (END); \
> -       mode_iterator::get_known_wider (&(ITERATOR)))
> +       mode_iterator::get_known_next (&(ITERATOR)))
>  
> -/* Make ITERATOR iterate over START and all wider modes in the same
> +/* Make ITERATOR iterate over START and all non-narrower modes in the same
>     class, in order of increasing width.  */
>  #define FOR_EACH_MODE_FROM(ITERATOR, START) \
>    for ((ITERATOR) = (START); \
>         mode_iterator::iterate_p (&(ITERATOR)); \
> +       mode_iterator::get_next (&(ITERATOR)))
> +
> +/* Make ITERATOR iterate over START and all wider modes in the same
> +   class, in order of strictly increasing width.  */
> +#define FOR_EACH_WIDER_MODE_FROM(ITERATOR, START) \
> +  for ((ITERATOR) = (START); \
> +       mode_iterator::iterate_p (&(ITERATOR)); \
>         mode_iterator::get_wider (&(ITERATOR)))
>  
>  /* Make ITERATOR iterate over modes in the range [NARROWEST, END)
> @@ -1170,6 +1218,14 @@ namespace mode_iterator
>  
>  /* Make ITERATOR iterate over modes in the same class as MODE, in order
>     of increasing width.  Start at the first mode wider than START,

Maybe s/increasing/non-decreasing/?  And maybe
s/first mode wider than/next such mode after/.

Thanks,
Richard

> +   or don't iterate at all if there is no wider mode.  */
> +#define FOR_EACH_NEXT_MODE(ITERATOR, START) \
> +  for ((ITERATOR) = (START), mode_iterator::get_next (&(ITERATOR)); \
> +       mode_iterator::iterate_p (&(ITERATOR)); \
> +       mode_iterator::get_next (&(ITERATOR)))
> +
> +/* Make ITERATOR iterate over modes in the same class as MODE, in order
> +   of increasing width.  Start at the first mode wider than START,
>     or don't iterate at all if there is no wider mode.  */
>  #define FOR_EACH_WIDER_MODE(ITERATOR, START) \
>    for ((ITERATOR) = (START), mode_iterator::get_wider (&(ITERATOR)); \
> --- gcc/expmed.cc.jj	2022-10-05 21:22:56.191918383 +0200
> +++ gcc/expmed.cc	2022-10-11 22:35:39.682286736 +0200
> @@ -5712,7 +5712,7 @@ emit_store_flag_1 (rtx target, enum rtx_
>  
>    /* Next try expanding this via the backend's cstore<mode>4.  */
>    mclass = GET_MODE_CLASS (mode);
> -  FOR_EACH_MODE_FROM (compare_mode, mode)
> +  FOR_EACH_WIDER_MODE_FROM (compare_mode, mode)
>      {
>       machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
>       icode = optab_handler (cstore_optab, optab_mode);
> --- gcc/optabs.cc.jj	2022-10-05 21:22:56.217918032 +0200
> +++ gcc/optabs.cc	2022-10-11 23:20:08.216037640 +0200
> @@ -4384,7 +4384,6 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
>    machine_mode mode = *pmode;
>    rtx libfunc, test;
>    machine_mode cmp_mode;
> -  enum mode_class mclass;
>  
>    /* The other methods are not needed.  */
>    gcc_assert (methods == OPTAB_DIRECT || methods == OPTAB_WIDEN
> @@ -4490,9 +4489,8 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
>        return;
>      }
>  
> -  mclass = GET_MODE_CLASS (mode);
>    test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
> -  FOR_EACH_MODE_FROM (cmp_mode, mode)
> +  FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
>      {
>        enum insn_code icode;
>        icode = optab_handler (cbranch_optab, cmp_mode);
> @@ -4515,7 +4513,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
>  	  delete_insns_since (last);
>  	}
>  
> -      if (methods == OPTAB_DIRECT || !CLASS_HAS_WIDER_MODES_P (mclass))
> +      if (methods == OPTAB_DIRECT)
>  	break;
>      }
>  
> @@ -4711,7 +4709,7 @@ prepare_float_lib_cmp (rtx x, rtx y, enu
>    bool reversed_p = false;
>    scalar_int_mode cmp_mode = targetm.libgcc_cmp_return_mode ();
>  
> -  FOR_EACH_MODE_FROM (mode, orig_mode)
> +  FOR_EACH_WIDER_MODE_FROM (mode, orig_mode)
>      {
>        if (code_to_optab (comparison)
>  	  && (libfunc = optab_libfunc (code_to_optab (comparison), mode)))
> --- gcc/config/i386/i386-expand.cc.jj	2022-09-26 22:29:41.407322933 +0200
> +++ gcc/config/i386/i386-expand.cc	2022-10-11 23:22:55.579761522 +0200
> @@ -14941,7 +14941,7 @@ static machine_mode
>  get_mode_wider_vector (machine_mode o)
>  {
>    /* ??? Rely on the ordering that genmodes.cc gives to vectors.  */
> -  machine_mode n = GET_MODE_WIDER_MODE (o).require ();
> +  machine_mode n = GET_MODE_NEXT_MODE (o).require ();
>    gcc_assert (GET_MODE_NUNITS (o) == GET_MODE_NUNITS (n) * 2);
>    gcc_assert (GET_MODE_SIZE (o) == GET_MODE_SIZE (n));
>    return n;
>
>
> 	Jakub


* [PATCH] machmode, v2: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE
  2022-10-12 10:15               ` Richard Sandiford
@ 2022-10-12 11:07                 ` Jakub Jelinek
  2022-10-12 11:49                   ` Richard Sandiford
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2022-10-12 11:07 UTC (permalink / raw)
  To: Jason Merrill, Richard Biener, Jeff Law, Eric Botcazou,
	gcc-patches, richard.sandiford

On Wed, Oct 12, 2022 at 11:15:40AM +0100, Richard Sandiford wrote:
> Looks good to me, just some minor comments below.

Here is an updated patch.

> How robust is the mechanism that guarantees HF comes before BF,
> and so is the mode that appears in the (new) wider list?

genmodes.cc seems to have cmp_modes which does a lot of different
comparisons to make sure it is a total order.
I think the BFmode vs. HFmode ordering is about the last case:
  if (m->counter < n->counter)
    return -1;
  else
    return 1;
there because everything else is equal and ->counter is about which
mode is declared first in *-modes.def.
My code for the new mode_wider in genmodes.cc always uses VOIDmode
for !CLASS_HAS_WIDER_MODES_P classes.  For CLASS_HAS_WIDER_MODES_P
classes it provides a subset of the total ordering: walking the already
computed ->wider chain, it skips modes with the same size/precision.
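
To illustrate the two points above (the declaration-order tie-break and the
skipping of same size/precision modes), here is a minimal standalone sketch.
It is not the genmodes.cc code: ToyMode, toy_less, toy_float_chain, toy_next
and toy_wider are hypothetical names, the comparator is reduced to three keys,
and the byte sizes, precisions and counters are illustrative values that only
assume HF is declared before BF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a mode description; only the fields this sketch needs.
struct ToyMode {
  std::string name;
  unsigned bytesize;
  unsigned precision;
  unsigned counter;  // assumed: position of the declaration in *-modes.def
};

// Simplified total order: size, then precision, then declaration order.
// The real cmp_modes compares more properties before reaching the counter.
static bool toy_less(const ToyMode &a, const ToyMode &b) {
  if (a.bytesize != b.bytesize)
    return a.bytesize < b.bytesize;
  if (a.precision != b.precision)
    return a.precision < b.precision;
  return a.counter < b.counter;
}

// Build a sorted chain for one class from deliberately shuffled input.
static std::vector<ToyMode> toy_float_chain() {
  std::vector<ToyMode> modes = {
    {"DF", 8, 64, 3},
    {"BF", 2, 16, 1},
    {"SF", 4, 32, 2},
    {"HF", 2, 16, 0},
  };
  std::sort(modes.begin(), modes.end(), toy_less);
  return modes;
}

// "Next" mode: simply the following entry in the chain, or VOID at the end.
static std::string toy_next(const std::vector<ToyMode> &chain, std::size_t i) {
  return i + 1 < chain.size() ? chain[i + 1].name : "VOID";
}

// "Wider" mode: the first later entry that differs in size or precision.
static std::string toy_wider(const std::vector<ToyMode> &chain, std::size_t i) {
  for (std::size_t j = i + 1; j < chain.size(); ++j)
    if (chain[j].bytesize != chain[i].bytesize
        || chain[j].precision != chain[i].precision)
      return chain[j].name;
  return "VOID";
}
```

With this toy data the sorted chain is HF, BF, SF, DF, so the next mode after
HF is BF, while the wider mode of both HF and BF is SF.  Swapping the two
counters would swap HF and BF in the chain but leave the wider results alone.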

> > -  /* Set mode iterator *ITER to the next widest mode in the same class.
> > +  /* Set mode iterator *ITER to the next wider mode in the same class.
> >       Such a mode is known to exist.  */
> 
> I'll take your word for it that this is correct. ;-)  I would say
> "next widest", but it's very likely that I'm wrong.

I'm not a native English speaker, but to me "next" with a superlative means
this: if we have the widest mode, the next widest is the one whose only
wider mode is the widest mode.

Everything else changed.

2022-10-12  Jakub Jelinek  <jakub@redhat.com>

	* genmodes.cc (emit_mode_wider): Emit previous content of
	mode_wider array into mode_next array and for mode_wider
	emit always VOIDmode for !CLASS_HAS_WIDER_MODES_P classes,
	otherwise skip through modes with the same precision.
	* machmode.h (mode_next): Declare.
	(GET_MODE_NEXT_MODE): New inline function.
	(mode_iterator::get_next, mode_iterator::get_known_next): New
	function templates.
	(FOR_EACH_MODE_IN_CLASS): Use get_next instead of get_wider.
	(FOR_EACH_MODE): Use get_known_next instead of get_known_wider.
	(FOR_EACH_MODE_FROM): Use get_next instead of get_wider.
	(FOR_EACH_WIDER_MODE_FROM): Define.
	(FOR_EACH_NEXT_MODE): Define.
	* expmed.cc (emit_store_flag_1): Use FOR_EACH_WIDER_MODE_FROM
	instead of FOR_EACH_MODE_FROM.
	* optabs.cc (prepare_cmp_insn): Likewise.  Remove redundant
	!CLASS_HAS_WIDER_MODES_P check.
	(prepare_float_lib_cmp): Use FOR_EACH_WIDER_MODE_FROM instead of
	FOR_EACH_MODE_FROM.
	* config/i386/i386-expand.cc (get_mode_wider_vector): Use
	GET_MODE_NEXT_MODE instead of GET_MODE_WIDER_MODE.

--- gcc/genmodes.cc.jj	2022-10-12 10:15:21.444381490 +0200
+++ gcc/genmodes.cc	2022-10-12 12:28:02.414528652 +0200
@@ -1527,7 +1527,7 @@ emit_mode_wider (void)
   int c;
   struct mode_data *m;
 
-  print_decl ("unsigned char", "mode_wider", "NUM_MACHINE_MODES");
+  print_decl ("unsigned char", "mode_next", "NUM_MACHINE_MODES");
 
   for_all_modes (c, m)
     tagged_printf ("E_%smode",
@@ -1535,6 +1535,37 @@ emit_mode_wider (void)
 		   m->name);
 
   print_closer ();
+  print_decl ("unsigned char", "mode_wider", "NUM_MACHINE_MODES");
+
+  for_all_modes (c, m)
+    {
+      struct mode_data *m2 = 0;
+
+      if (m->cl == MODE_INT
+	  || m->cl == MODE_PARTIAL_INT
+	  || m->cl == MODE_FLOAT
+	  || m->cl == MODE_DECIMAL_FLOAT
+	  || m->cl == MODE_COMPLEX_FLOAT
+	  || m->cl == MODE_FRACT
+	  || m->cl == MODE_UFRACT
+	  || m->cl == MODE_ACCUM
+	  || m->cl == MODE_UACCUM)
+	for (m2 = m->wider; m2 && m2 != void_mode; m2 = m2->wider)
+	  {
+	    if (m2->bytesize == m->bytesize
+		&& m2->precision == m->precision)
+	      continue;
+	    break;
+	  }
+
+      if (m2 == void_mode)
+	m2 = 0;
+      tagged_printf ("E_%smode",
+		     m2 ? m2->name : void_mode->name,
+		     m->name);
+    }
+
+  print_closer ();
   print_decl ("unsigned char", "mode_2xwider", "NUM_MACHINE_MODES");
 
   for_all_modes (c, m)
--- gcc/machmode.h.jj	2022-10-12 10:15:21.491380846 +0200
+++ gcc/machmode.h	2022-10-12 12:33:36.795975117 +0200
@@ -28,6 +28,7 @@ extern const unsigned char mode_inner[NU
 extern CONST_MODE_NUNITS poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];
 extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
 extern const unsigned short mode_unit_precision[NUM_MACHINE_MODES];
+extern const unsigned char mode_next[NUM_MACHINE_MODES];
 extern const unsigned char mode_wider[NUM_MACHINE_MODES];
 extern const unsigned char mode_2xwider[NUM_MACHINE_MODES];
 
@@ -760,7 +761,23 @@ GET_MODE_NUNITS (const T &mode)
 }
 #endif
 
-/* Get the next wider natural mode (eg, QI -> HI -> SI -> DI -> TI).  */
+/* Get the next natural mode (not narrower, eg, QI -> HI -> SI -> DI -> TI
+   or HF -> BF -> SF -> DF -> XF -> TF).  */
+
+template<typename T>
+ALWAYS_INLINE opt_mode<T>
+GET_MODE_NEXT_MODE (const T &m)
+{
+  return typename opt_mode<T>::from_int (mode_next[m]);
+}
+
+/* Get the next wider mode (eg, QI -> HI -> SI -> DI -> TI
+   or { HF, BF } -> SF -> DF -> XF -> TF).
+   This is similar to GET_MODE_NEXT_MODE, but while GET_MODE_NEXT_MODE
+   can include mode that have the same precision (e.g.
+   GET_MODE_NEXT_MODE (HFmode) can be BFmode even when both have the same
+   precision), this one will skip those.  And always VOIDmode for
+   modes whose class is !CLASS_HAS_WIDER_MODES_P.  */
 
 template<typename T>
 ALWAYS_INLINE opt_mode<T>
@@ -1098,7 +1115,33 @@ namespace mode_iterator
     return *iter != E_VOIDmode;
   }
 
-  /* Set mode iterator *ITER to the next widest mode in the same class,
+  /* Set mode iterator *ITER to the next mode in the same class,
+     if any.  */
+
+  template<typename T>
+  inline void
+  get_next (opt_mode<T> *iter)
+  {
+    *iter = GET_MODE_NEXT_MODE (iter->require ());
+  }
+
+  inline void
+  get_next (machine_mode *iter)
+  {
+    *iter = GET_MODE_NEXT_MODE (*iter).else_void ();
+  }
+
+  /* Set mode iterator *ITER to the next mode in the same class.
+     Such a mode is known to exist.  */
+
+  template<typename T>
+  inline void
+  get_known_next (T *iter)
+  {
+    *iter = GET_MODE_NEXT_MODE (*iter).require ();
+  }
+
+  /* Set mode iterator *ITER to the next wider mode in the same class,
      if any.  */
 
   template<typename T>
@@ -1114,7 +1157,7 @@ namespace mode_iterator
     *iter = GET_MODE_WIDER_MODE (*iter).else_void ();
   }
 
-  /* Set mode iterator *ITER to the next widest mode in the same class.
+  /* Set mode iterator *ITER to the next wider mode in the same class.
      Such a mode is known to exist.  */
 
   template<typename T>
@@ -1146,20 +1189,27 @@ namespace mode_iterator
 #define FOR_EACH_MODE_IN_CLASS(ITERATOR, CLASS)  \
   for (mode_iterator::start (&(ITERATOR), CLASS); \
        mode_iterator::iterate_p (&(ITERATOR)); \
-       mode_iterator::get_wider (&(ITERATOR)))
+       mode_iterator::get_next (&(ITERATOR)))
 
 /* Make ITERATOR iterate over all the modes in the range [START, END),
    in order of increasing width.  */
 #define FOR_EACH_MODE(ITERATOR, START, END) \
   for ((ITERATOR) = (START); \
        (ITERATOR) != (END); \
-       mode_iterator::get_known_wider (&(ITERATOR)))
+       mode_iterator::get_known_next (&(ITERATOR)))
 
-/* Make ITERATOR iterate over START and all wider modes in the same
+/* Make ITERATOR iterate over START and all non-narrower modes in the same
    class, in order of increasing width.  */
 #define FOR_EACH_MODE_FROM(ITERATOR, START) \
   for ((ITERATOR) = (START); \
        mode_iterator::iterate_p (&(ITERATOR)); \
+       mode_iterator::get_next (&(ITERATOR)))
+
+/* Make ITERATOR iterate over START and all wider modes in the same
+   class, in order of strictly increasing width.  */
+#define FOR_EACH_WIDER_MODE_FROM(ITERATOR, START) \
+  for ((ITERATOR) = (START); \
+       mode_iterator::iterate_p (&(ITERATOR)); \
        mode_iterator::get_wider (&(ITERATOR)))
 
 /* Make ITERATOR iterate over modes in the range [NARROWEST, END)
@@ -1169,6 +1219,14 @@ namespace mode_iterator
   FOR_EACH_MODE (ITERATOR, get_narrowest_mode (END), END)
 
 /* Make ITERATOR iterate over modes in the same class as MODE, in order
+   of non-decreasing width.  Start at next such mode after START,
+   or don't iterate at all if there is no such mode.  */
+#define FOR_EACH_NEXT_MODE(ITERATOR, START) \
+  for ((ITERATOR) = (START), mode_iterator::get_next (&(ITERATOR)); \
+       mode_iterator::iterate_p (&(ITERATOR)); \
+       mode_iterator::get_next (&(ITERATOR)))
+
+/* Make ITERATOR iterate over modes in the same class as MODE, in order
    of increasing width.  Start at the first mode wider than START,
    or don't iterate at all if there is no wider mode.  */
 #define FOR_EACH_WIDER_MODE(ITERATOR, START) \
--- gcc/expmed.cc.jj	2022-10-12 10:15:21.335382983 +0200
+++ gcc/expmed.cc	2022-10-12 12:28:02.417528612 +0200
@@ -5712,7 +5712,7 @@ emit_store_flag_1 (rtx target, enum rtx_
 
   /* Next try expanding this via the backend's cstore<mode>4.  */
   mclass = GET_MODE_CLASS (mode);
-  FOR_EACH_MODE_FROM (compare_mode, mode)
+  FOR_EACH_WIDER_MODE_FROM (compare_mode, mode)
     {
      machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
      icode = optab_handler (cstore_optab, optab_mode);
--- gcc/optabs.cc.jj	2022-10-12 10:15:21.542380147 +0200
+++ gcc/optabs.cc	2022-10-12 12:28:02.418528598 +0200
@@ -4384,7 +4384,6 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
   machine_mode mode = *pmode;
   rtx libfunc, test;
   machine_mode cmp_mode;
-  enum mode_class mclass;
 
   /* The other methods are not needed.  */
   gcc_assert (methods == OPTAB_DIRECT || methods == OPTAB_WIDEN
@@ -4490,9 +4489,8 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
       return;
     }
 
-  mclass = GET_MODE_CLASS (mode);
   test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
-  FOR_EACH_MODE_FROM (cmp_mode, mode)
+  FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
     {
       enum insn_code icode;
       icode = optab_handler (cbranch_optab, cmp_mode);
@@ -4515,7 +4513,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
 	  delete_insns_since (last);
 	}
 
-      if (methods == OPTAB_DIRECT || !CLASS_HAS_WIDER_MODES_P (mclass))
+      if (methods == OPTAB_DIRECT)
 	break;
     }
 
@@ -4711,7 +4709,7 @@ prepare_float_lib_cmp (rtx x, rtx y, enu
   bool reversed_p = false;
   scalar_int_mode cmp_mode = targetm.libgcc_cmp_return_mode ();
 
-  FOR_EACH_MODE_FROM (mode, orig_mode)
+  FOR_EACH_WIDER_MODE_FROM (mode, orig_mode)
     {
       if (code_to_optab (comparison)
 	  && (libfunc = optab_libfunc (code_to_optab (comparison), mode)))
--- gcc/config/i386/i386-expand.cc.jj	2022-09-26 18:47:26.892350579 +0200
+++ gcc/config/i386/i386-expand.cc	2022-10-12 12:28:02.421528557 +0200
@@ -14941,7 +14941,7 @@ static machine_mode
 get_mode_wider_vector (machine_mode o)
 {
   /* ??? Rely on the ordering that genmodes.cc gives to vectors.  */
-  machine_mode n = GET_MODE_WIDER_MODE (o).require ();
+  machine_mode n = GET_MODE_NEXT_MODE (o).require ();
   gcc_assert (GET_MODE_NUNITS (o) == GET_MODE_NUNITS (n) * 2);
   gcc_assert (GET_MODE_SIZE (o) == GET_MODE_SIZE (n));
   return n;


	Jakub



* Re: [PATCH] machmode, v2: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE
  2022-10-12 11:07                 ` [PATCH] machmode, v2: " Jakub Jelinek
@ 2022-10-12 11:49                   ` Richard Sandiford
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Sandiford @ 2022-10-12 11:49 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jason Merrill, Richard Biener, Jeff Law, Eric Botcazou, gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Wed, Oct 12, 2022 at 11:15:40AM +0100, Richard Sandiford wrote:
>> Looks good to me, just some minor comments below.
>
> Here is an updated patch.
>
>> How robust is the mechanism that guarantees HF comes before BF,
>> and so is the mode that appears in the (new) wider list?
>
> genmodes.cc seems to have cmp_modes which does a lot of different
> comparisons to make sure it is a total order.
> I think the BFmode vs. HFmode ordering is about the last case:
>   if (m->counter < n->counter)
>     return -1;
>   else
>     return 1;
> there because everything else is equal and ->counter is about which
> mode is declared first in *-modes.def.
> My code for the new mode_wider in genmodes.cc always uses VOIDmode
> for !CLASS_HAS_WIDER_MODES_P classes.  For CLASS_HAS_WIDER_MODES_P
> classes it provides a subset of the total ordering: walking the already
> computed ->wider chain, it skips modes with the same size/precision.

OK, I guess that's good enough.

>
>> > -  /* Set mode iterator *ITER to the next widest mode in the same class.
>> > +  /* Set mode iterator *ITER to the next wider mode in the same class.
>> >       Such a mode is known to exist.  */
>> 
>> I'll take your word for it that this is correct. ;-)  I would say
>> "next widest", but it's very likely that I'm wrong.
>
> I'm not a native English speaker, but to me "next" with a superlative means
> this: if we have the widest mode, the next widest is the one whose only
> wider mode is the widest mode.
>
> Everything else changed.
>
> 2022-10-12  Jakub Jelinek  <jakub@redhat.com>
>
> 	* genmodes.cc (emit_mode_wider): Emit previous content of
> 	mode_wider array into mode_next array and for mode_wider
> 	emit always VOIDmode for !CLASS_HAS_WIDER_MODES_P classes,
> 	otherwise skip through modes with the same precision.
> 	* machmode.h (mode_next): Declare.
> 	(GET_MODE_NEXT_MODE): New inline function.
> 	(mode_iterator::get_next, mode_iterator::get_known_next): New
> 	function templates.
> 	(FOR_EACH_MODE_IN_CLASS): Use get_next instead of get_wider.
> 	(FOR_EACH_MODE): Use get_known_next instead of get_known_wider.
> 	(FOR_EACH_MODE_FROM): Use get_next instead of get_wider.
> 	(FOR_EACH_WIDER_MODE_FROM): Define.
> 	(FOR_EACH_NEXT_MODE): Define.
> 	* expmed.cc (emit_store_flag_1): Use FOR_EACH_WIDER_MODE_FROM
> 	instead of FOR_EACH_MODE_FROM.
> 	* optabs.cc (prepare_cmp_insn): Likewise.  Remove redundant
> 	!CLASS_HAS_WIDER_MODES_P check.
> 	(prepare_float_lib_cmp): Use FOR_EACH_WIDER_MODE_FROM instead of
> 	FOR_EACH_MODE_FROM.
> 	* config/i386/i386-expand.cc (get_mode_wider_vector): Use
> 	GET_MODE_NEXT_MODE instead of GET_MODE_WIDER_MODE.

LGTM, but please give others 24 hours to object.

Thanks,
Richard

> --- gcc/genmodes.cc.jj	2022-10-12 10:15:21.444381490 +0200
> +++ gcc/genmodes.cc	2022-10-12 12:28:02.414528652 +0200
> @@ -1527,7 +1527,7 @@ emit_mode_wider (void)
>    int c;
>    struct mode_data *m;
>  
> -  print_decl ("unsigned char", "mode_wider", "NUM_MACHINE_MODES");
> +  print_decl ("unsigned char", "mode_next", "NUM_MACHINE_MODES");
>  
>    for_all_modes (c, m)
>      tagged_printf ("E_%smode",
> @@ -1535,6 +1535,37 @@ emit_mode_wider (void)
>  		   m->name);
>  
>    print_closer ();
> +  print_decl ("unsigned char", "mode_wider", "NUM_MACHINE_MODES");
> +
> +  for_all_modes (c, m)
> +    {
> +      struct mode_data *m2 = 0;
> +
> +      if (m->cl == MODE_INT
> +	  || m->cl == MODE_PARTIAL_INT
> +	  || m->cl == MODE_FLOAT
> +	  || m->cl == MODE_DECIMAL_FLOAT
> +	  || m->cl == MODE_COMPLEX_FLOAT
> +	  || m->cl == MODE_FRACT
> +	  || m->cl == MODE_UFRACT
> +	  || m->cl == MODE_ACCUM
> +	  || m->cl == MODE_UACCUM)
> +	for (m2 = m->wider; m2 && m2 != void_mode; m2 = m2->wider)
> +	  {
> +	    if (m2->bytesize == m->bytesize
> +		&& m2->precision == m->precision)
> +	      continue;
> +	    break;
> +	  }
> +
> +      if (m2 == void_mode)
> +	m2 = 0;
> +      tagged_printf ("E_%smode",
> +		     m2 ? m2->name : void_mode->name,
> +		     m->name);
> +    }
> +
> +  print_closer ();
>    print_decl ("unsigned char", "mode_2xwider", "NUM_MACHINE_MODES");
>  
>    for_all_modes (c, m)
> --- gcc/machmode.h.jj	2022-10-12 10:15:21.491380846 +0200
> +++ gcc/machmode.h	2022-10-12 12:33:36.795975117 +0200
> @@ -28,6 +28,7 @@ extern const unsigned char mode_inner[NU
>  extern CONST_MODE_NUNITS poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];
>  extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
>  extern const unsigned short mode_unit_precision[NUM_MACHINE_MODES];
> +extern const unsigned char mode_next[NUM_MACHINE_MODES];
>  extern const unsigned char mode_wider[NUM_MACHINE_MODES];
>  extern const unsigned char mode_2xwider[NUM_MACHINE_MODES];
>  
> @@ -760,7 +761,23 @@ GET_MODE_NUNITS (const T &mode)
>  }
>  #endif
>  
> -/* Get the next wider natural mode (eg, QI -> HI -> SI -> DI -> TI).  */
> +/* Get the next natural mode (not narrower, eg, QI -> HI -> SI -> DI -> TI
> +   or HF -> BF -> SF -> DF -> XF -> TF).  */
> +
> +template<typename T>
> +ALWAYS_INLINE opt_mode<T>
> +GET_MODE_NEXT_MODE (const T &m)
> +{
> +  return typename opt_mode<T>::from_int (mode_next[m]);
> +}
> +
> +/* Get the next wider mode (eg, QI -> HI -> SI -> DI -> TI
> +   or { HF, BF } -> SF -> DF -> XF -> TF).
> +   This is similar to GET_MODE_NEXT_MODE, but while GET_MODE_NEXT_MODE
> +   can include modes that have the same precision (e.g.
> +   GET_MODE_NEXT_MODE (HFmode) can be BFmode even when both have the same
> +   precision), this one will skip those.  And always VOIDmode for
> +   modes whose class is !CLASS_HAS_WIDER_MODES_P.  */
>  
>  template<typename T>
>  ALWAYS_INLINE opt_mode<T>
> @@ -1098,7 +1115,33 @@ namespace mode_iterator
>      return *iter != E_VOIDmode;
>    }
>  
> -  /* Set mode iterator *ITER to the next widest mode in the same class,
> +  /* Set mode iterator *ITER to the next mode in the same class,
> +     if any.  */
> +
> +  template<typename T>
> +  inline void
> +  get_next (opt_mode<T> *iter)
> +  {
> +    *iter = GET_MODE_NEXT_MODE (iter->require ());
> +  }
> +
> +  inline void
> +  get_next (machine_mode *iter)
> +  {
> +    *iter = GET_MODE_NEXT_MODE (*iter).else_void ();
> +  }
> +
> +  /* Set mode iterator *ITER to the next mode in the same class.
> +     Such a mode is known to exist.  */
> +
> +  template<typename T>
> +  inline void
> +  get_known_next (T *iter)
> +  {
> +    *iter = GET_MODE_NEXT_MODE (*iter).require ();
> +  }
> +
> +  /* Set mode iterator *ITER to the next wider mode in the same class,
>       if any.  */
>  
>    template<typename T>
> @@ -1114,7 +1157,7 @@ namespace mode_iterator
>      *iter = GET_MODE_WIDER_MODE (*iter).else_void ();
>    }
>  
> -  /* Set mode iterator *ITER to the next widest mode in the same class.
> +  /* Set mode iterator *ITER to the next wider mode in the same class.
>       Such a mode is known to exist.  */
>  
>    template<typename T>
> @@ -1146,20 +1189,27 @@ namespace mode_iterator
>  #define FOR_EACH_MODE_IN_CLASS(ITERATOR, CLASS)  \
>    for (mode_iterator::start (&(ITERATOR), CLASS); \
>         mode_iterator::iterate_p (&(ITERATOR)); \
> -       mode_iterator::get_wider (&(ITERATOR)))
> +       mode_iterator::get_next (&(ITERATOR)))
>  
>  /* Make ITERATOR iterate over all the modes in the range [START, END),
>     in order of increasing width.  */
>  #define FOR_EACH_MODE(ITERATOR, START, END) \
>    for ((ITERATOR) = (START); \
>         (ITERATOR) != (END); \
> -       mode_iterator::get_known_wider (&(ITERATOR)))
> +       mode_iterator::get_known_next (&(ITERATOR)))
>  
> -/* Make ITERATOR iterate over START and all wider modes in the same
> +/* Make ITERATOR iterate over START and all non-narrower modes in the same
>     class, in order of increasing width.  */
>  #define FOR_EACH_MODE_FROM(ITERATOR, START) \
>    for ((ITERATOR) = (START); \
>         mode_iterator::iterate_p (&(ITERATOR)); \
> +       mode_iterator::get_next (&(ITERATOR)))
> +
> +/* Make ITERATOR iterate over START and all wider modes in the same
> +   class, in order of strictly increasing width.  */
> +#define FOR_EACH_WIDER_MODE_FROM(ITERATOR, START) \
> +  for ((ITERATOR) = (START); \
> +       mode_iterator::iterate_p (&(ITERATOR)); \
>         mode_iterator::get_wider (&(ITERATOR)))
>  
>  /* Make ITERATOR iterate over modes in the range [NARROWEST, END)
> @@ -1169,6 +1219,14 @@ namespace mode_iterator
>    FOR_EACH_MODE (ITERATOR, get_narrowest_mode (END), END)
>  
>  /* Make ITERATOR iterate over modes in the same class as MODE, in order
> +   of non-decreasing width.  Start at next such mode after START,
> +   or don't iterate at all if there is no such mode.  */
> +#define FOR_EACH_NEXT_MODE(ITERATOR, START) \
> +  for ((ITERATOR) = (START), mode_iterator::get_next (&(ITERATOR)); \
> +       mode_iterator::iterate_p (&(ITERATOR)); \
> +       mode_iterator::get_next (&(ITERATOR)))
> +
> +/* Make ITERATOR iterate over modes in the same class as MODE, in order
>     of increasing width.  Start at the first mode wider than START,
>     or don't iterate at all if there is no wider mode.  */
>  #define FOR_EACH_WIDER_MODE(ITERATOR, START) \
> --- gcc/expmed.cc.jj	2022-10-12 10:15:21.335382983 +0200
> +++ gcc/expmed.cc	2022-10-12 12:28:02.417528612 +0200
> @@ -5712,7 +5712,7 @@ emit_store_flag_1 (rtx target, enum rtx_
>  
>    /* Next try expanding this via the backend's cstore<mode>4.  */
>    mclass = GET_MODE_CLASS (mode);
> -  FOR_EACH_MODE_FROM (compare_mode, mode)
> +  FOR_EACH_WIDER_MODE_FROM (compare_mode, mode)
>      {
>       machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
>       icode = optab_handler (cstore_optab, optab_mode);
> --- gcc/optabs.cc.jj	2022-10-12 10:15:21.542380147 +0200
> +++ gcc/optabs.cc	2022-10-12 12:28:02.418528598 +0200
> @@ -4384,7 +4384,6 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
>    machine_mode mode = *pmode;
>    rtx libfunc, test;
>    machine_mode cmp_mode;
> -  enum mode_class mclass;
>  
>    /* The other methods are not needed.  */
>    gcc_assert (methods == OPTAB_DIRECT || methods == OPTAB_WIDEN
> @@ -4490,9 +4489,8 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
>        return;
>      }
>  
> -  mclass = GET_MODE_CLASS (mode);
>    test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
> -  FOR_EACH_MODE_FROM (cmp_mode, mode)
> +  FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
>      {
>        enum insn_code icode;
>        icode = optab_handler (cbranch_optab, cmp_mode);
> @@ -4515,7 +4513,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
>  	  delete_insns_since (last);
>  	}
>  
> -      if (methods == OPTAB_DIRECT || !CLASS_HAS_WIDER_MODES_P (mclass))
> +      if (methods == OPTAB_DIRECT)
>  	break;
>      }
>  
> @@ -4711,7 +4709,7 @@ prepare_float_lib_cmp (rtx x, rtx y, enu
>    bool reversed_p = false;
>    scalar_int_mode cmp_mode = targetm.libgcc_cmp_return_mode ();
>  
> -  FOR_EACH_MODE_FROM (mode, orig_mode)
> +  FOR_EACH_WIDER_MODE_FROM (mode, orig_mode)
>      {
>        if (code_to_optab (comparison)
>  	  && (libfunc = optab_libfunc (code_to_optab (comparison), mode)))
> --- gcc/config/i386/i386-expand.cc.jj	2022-09-26 18:47:26.892350579 +0200
> +++ gcc/config/i386/i386-expand.cc	2022-10-12 12:28:02.421528557 +0200
> @@ -14941,7 +14941,7 @@ static machine_mode
>  get_mode_wider_vector (machine_mode o)
>  {
>    /* ??? Rely on the ordering that genmodes.cc gives to vectors.  */
> -  machine_mode n = GET_MODE_WIDER_MODE (o).require ();
> +  machine_mode n = GET_MODE_NEXT_MODE (o).require ();
>    gcc_assert (GET_MODE_NUNITS (o) == GET_MODE_NUNITS (n) * 2);
>    gcc_assert (GET_MODE_SIZE (o) == GET_MODE_SIZE (n));
>    return n;
>
>
> 	Jakub


* Re: [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE
  2022-10-12  8:23             ` [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE Jakub Jelinek
  2022-10-12 10:15               ` Richard Sandiford
@ 2022-10-12 10:37               ` Eric Botcazou
  2022-10-12 10:57                 ` Jakub Jelinek
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Botcazou @ 2022-10-12 10:37 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jason Merrill, Richard Biener, Jeff Law, Richard Sandiford, gcc-patches

> Though I admit I didn't go carefully through all 24 GET_MODE_WIDER_MODE
> uses, 54 FOR_EACH_MODE_IN_CLASS uses, 3 FOR_EACH_MODE uses, 24
> FOR_EACH_MODE_FROM, 6 FOR_EACH_MODE_UNTIL and 15 FOR_EACH_WIDER_MODE uses.
> It is more important to go through the GET_MODE_WIDER_MODE and
> FOR_EACH_WIDER_MODE uses because the patch changes behavior for those,
> the rest keep their previous meaning and so can be changed incrementally
> if the other meaning is desirable to them (I've of course changed the 3
> spots I had to change in the previous BFmode patch and whatever triggered
> during the bootstraps).
> 
> Thoughts on this?

Can't we declare that one is wider than the other, for example BFmode since it 
has got a larger range?  Though I guess this would mean special-casing them in 
genmodes.cc as they are presumably strictly identical except for the format.

-- 
Eric Botcazou




* Re: [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE
  2022-10-12 10:37               ` [PATCH] machmode: " Eric Botcazou
@ 2022-10-12 10:57                 ` Jakub Jelinek
  0 siblings, 0 replies; 22+ messages in thread
From: Jakub Jelinek @ 2022-10-12 10:57 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: Jason Merrill, Richard Biener, Jeff Law, Richard Sandiford, gcc-patches

On Wed, Oct 12, 2022 at 12:37:39PM +0200, Eric Botcazou wrote:
> > Though I admit I didn't go carefully through all 24 GET_MODE_WIDER_MODE
> > uses, 54 FOR_EACH_MODE_IN_CLASS uses, 3 FOR_EACH_MODE uses, 24
> > FOR_EACH_MODE_FROM, 6 FOR_EACH_MODE_UNTIL and 15 FOR_EACH_WIDER_MODE uses.
> > It is more important to go through the GET_MODE_WIDER_MODE and
> > FOR_EACH_WIDER_MODE uses because the patch changes behavior for those,
> > the rest keep their previous meaning and so can be changed incrementally
> > if the other meaning is desirable to them (I've of course changed the 3
> > spots I had to change in the previous BFmode patch and whatever triggered
> > during the bootstraps).
> > 
> > Thoughts on this?
> 
> Can't we declare that one is wider than the other, for example BFmode since it 
> has got a larger range?  Though I guess this would mean special-casing them in 
> genmodes.cc as they are presumably strictly identical except for the format.

That doesn't work: one of the modes has a larger range, the other has higher
floating point precision.  So neither of the modes is a subset or superset
of the other.  If we don't handle a particular optab for one of these modes
and allow widening, for both of these modes we want to try SFmode next
(which is a true superset of both modes: it has the same range as BFmode
but higher floating point precision than both HFmode and BFmode).
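
To put numbers on the range vs. precision point, here is a small C sketch
that derives the largest finite value and the spacing at 1.0 from the field
widths alone.  The helper names (pow2, fmt_max, fmt_epsilon, check_formats)
are made up for this illustration and are not part of GCC; the field widths
assumed are 5 exponent / 10 stored mantissa bits for IEEE binary16 (HFmode),
8 / 7 for bfloat16 (BFmode) and 8 / 23 for IEEE binary32 (SFmode).

```c
#include <assert.h>

/* Illustrative helpers only; none of these names exist in GCC.  */

/* Return 2**e as a double; exact for the small exponents used here.  */
double
pow2 (int e)
{
  double r = 1.0;
  for (; e > 0; e--)
    r *= 2.0;
  for (; e < 0; e++)
    r /= 2.0;
  return r;
}

/* Largest finite value of a binary format with EXP_BITS exponent bits
   and MAN_BITS explicitly stored mantissa bits.  */
double
fmt_max (int exp_bits, int man_bits)
{
  int bias = (1 << (exp_bits - 1)) - 1;
  return (2.0 - pow2 (-man_bits)) * pow2 (bias);
}

/* Spacing between 1.0 and the next representable value.  */
double
fmt_epsilon (int man_bits)
{
  return pow2 (-man_bits);
}

/* Check the three claims: BF has more range than HF, HF has finer
   precision than BF, and SF is at least as good as both on both axes.  */
void
check_formats (void)
{
  assert (fmt_max (8, 7) > fmt_max (5, 10));
  assert (fmt_epsilon (10) < fmt_epsilon (7));
  assert (fmt_max (8, 23) >= fmt_max (8, 7));
  assert (fmt_epsilon (23) < fmt_epsilon (10));
}
```

With these widths the HFmode maximum is 65504 while its spacing at 1.0 is
2^-10; BFmode reaches roughly 3.39e38 but its spacing at 1.0 is only 2^-7.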

The only way to work around this widening problem would be to always make
sure that whenever we implement any optab for HFmode, we also implement the
same optab for BFmode under the exact same conditions and vice versa, even if
those optabs just do by hand whatever the generic code would do if the optab
didn't exist.  But that is way too limiting.

It is true that on PowerPC we have a similar situation for the widest
floating point modes, TFmode/IFmode/KFmode, and the backend has the ugly hack
of pretending they have different GET_MODE_PRECISION, but as those are the
widest modes and are implemented in hardware or in software emulation, the
backends implement the same optabs for all of them.  For HFmode/BFmode, by
contrast, very few optabs are actually implemented directly and the usual
intended use is performing most arithmetic in SFmode.  Even on PowerPC,
ibm_extended_format and ieee_quad_format are neither subset nor superset of
each other: the latter has larger range and in most cases higher floating
point precision, but the former for certain values can have even 10 times
higher floating point precision.
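
As a rough illustration of the "next" vs. "wider" distinction this relies
on, here is a toy model in C.  It is only a sketch: the enum, the tables
and the toy_* functions are hypothetical and much simpler than what
genmodes.cc generates (it compares precision only, while the real code also
compares byte size), and the XF/TF precisions are assumed values.  The
chain order follows the HF -> BF -> SF -> DF -> XF -> TF order quoted in
the machmode.h comment.

```c
#include <assert.h>

/* Toy model only: a hand-written float mode chain.  */
enum toy_mode { TOY_HF, TOY_BF, TOY_SF, TOY_DF, TOY_XF, TOY_TF, TOY_VOID };

/* Precision in bits; HF and BF deliberately tie.  The XF and TF
   entries are assumptions made for the illustration.  */
static const int toy_precision[] = { 16, 16, 32, 64, 80, 128, 0 };

/* Next mode in the chain, possibly of the same precision.  */
enum toy_mode
toy_next (enum toy_mode m)
{
  return m == TOY_VOID ? TOY_VOID : (enum toy_mode) (m + 1);
}

/* Next strictly wider mode: skip over modes of equal precision.  */
enum toy_mode
toy_wider (enum toy_mode m)
{
  enum toy_mode m2 = toy_next (m);
  while (m2 != TOY_VOID && toy_precision[m2] == toy_precision[m])
    m2 = toy_next (m2);
  return m2;
}

/* Both 16-bit modes widen straight to SF, yet a walk with toy_next
   still visits every mode exactly once.  */
void
toy_self_check (void)
{
  int visited = 0;
  for (enum toy_mode m = TOY_HF; m != TOY_VOID; m = toy_next (m))
    visited++;
  assert (visited == 6);
  assert (toy_wider (TOY_HF) == TOY_SF);
  assert (toy_wider (TOY_BF) == TOY_SF);
}
```

Widening from either 16-bit mode lands on SF, while iterating with the
"next" step still sees HF and BF as separate modes.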

	Jakub


* [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support
  2022-10-05 20:02           ` Jason Merrill
  2022-10-12  8:23             ` [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE Jakub Jelinek
@ 2022-10-13 16:50             ` Jakub Jelinek
  2022-10-13 19:37               ` Jason Merrill
  1 sibling, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2022-10-13 16:50 UTC (permalink / raw)
  To: Jason Merrill, Joseph S. Myers
  Cc: Richard Biener, Jeff Law, Uros Bizjak, gcc-patches

Hi!

On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote:
> > As I wrote earlier, I think we need at least one, __builtin_nans variant
> > which would be used in libstdc++
> > std::numeric_limits<std::bfloat16_t>::signaling_NaN() implementation.
> > I think
> > std::numeric_limits<std::bfloat16_t>::infinity() can be implemented as
> > return (__bf16) __builtin_huge_valf ();
> > and similarly
> > std::numeric_limits<std::bfloat16_t>::quiet_NaN() as
> > return (__bf16) __builtin_nanf ("");
> > but
> > return (__bf16) __builtin_nansf ("");
> > would lose the signaling NaN on the conversion and raise exception,
> > and as the method is constexpr,
> > union { unsigned short a; __bf16 b; } u = { 0x7f81 };
> > return u.b;
> > wouldn't work.  I can certainly restrict the builtins to the single
> > one, but wonder whether the suffix for that builtin shouldn't be chosen
> > such that eventually we could add more builtins if we need to
> > and don't run into the log with bf16 suffix vs. logb with f16 suffix
> > ambiguity.
> > As you said, most of the libstdc++ overloads for std::bfloat16_t then
> > can use float builtins or library calls under the hood, but std::nextafter
> > is another case where I think we'll need to have something bfloat16_t
> > specific, because float ulp isn't bfloat16_t ulp, the latter is much larger.
> 
> Makes sense.

So, this updated version of the patch adds just a single __builtin_nansf16b
builtin (or do you want __builtin_nansbf16?).
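
For reference, the two bfloat16-specific points quoted above (the 0x7f81
signaling NaN pattern and the much larger ulp) can be checked on raw bit
patterns with a sketch like the one below.  The helper names are invented
for this illustration and are not GCC or libstdc++ API; it assumes that
float is IEEE binary32 and that the top mantissa bit (0x0040) marks a quiet
NaN.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Illustrative helpers; these names are not GCC or libstdc++ API.  */

/* Nonzero if BITS is a bfloat16 signaling NaN, assuming the top
   mantissa bit (0x0040) is the quiet bit.  */
int
bf16_bits_is_snan (uint16_t bits)
{
  int exp_all_ones = (bits & 0x7f80) == 0x7f80;
  int mantissa = bits & 0x007f;
  return exp_all_ones && mantissa != 0 && (bits & 0x0040) == 0;
}

/* Widen bfloat16 bits to the float with the same value.  */
float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t u = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Next representable bfloat16 above a non-negative finite value:
   with this encoding that is simply the next bit pattern.  */
uint16_t
bf16_bits_next_up (uint16_t bits)
{
  return (uint16_t) (bits + 1);
}

/* One bfloat16 step above 1.0 is 2^-7, i.e. 65536 float ulps.  */
void
bf16_self_check (void)
{
  float one = bf16_bits_to_float (0x3f80);
  float up = bf16_bits_to_float (bf16_bits_next_up (0x3f80));
  assert (one == 1.0f);
  assert (up - one == FLT_EPSILON * 65536.0f);
  assert (bf16_bits_is_snan (0x7f81));
}
```

This is why a float-based nextafter cannot simply be reused: stepping by one
float ulp and truncating back would normally land on the same bfloat16
value again.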
> 
> > Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too
> > in the next iteration (always with pedwarn in that case).

This version also implements the bf16/BF16 suffixes for C.

> > I'm afraid too many places rely on all modes of a certain class to be
> > visible when walking from "narrowest" to "widest" mode, say
> > FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
> > etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
> > && GET_MODE_WIDER_MODE (HFmode) == SFmode.
> 
> Yes, it seems they need to change now that their assumptions have been
> violated.  I suppose FOR_EACH_MODE_IN_CLASS would need to change to not use
> get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to decide
> whether they want an iteration that uses get_wider (likely with a new name)
> or not.

And now that the GET_MODE_WIDER_MODE vs. GET_MODE_NEXT_MODE patch is in,
this patch is updated on top of those changes.

So far lightly tested on x86_64-linux, ok for trunk if it passes full
bootstrap/regtest on both x86_64-linux and i686-linux?

2022-10-13  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
	* tree.h (bfloat16_type_node): Define.
	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
	like float16_type_mode.
	(build_common_tree_nodes): Initialize bfloat16_type_node if
	BFmode is supported.
	* expmed.h (maybe_expand_shift): Declare.
	* expmed.cc (maybe_expand_shift): No longer static.
	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
	-ffast-math generic implementation for BF -> SF and SF -> BF
	conversions.
	* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
	* builtins.def (BUILT_IN_NANSF16B): New builtin.
	* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
	* config/i386/i386.cc (classify_argument): Handle E_BCmode.
	(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
	for -msse2.
	(ix86_mangle_type): Mangle BFmode as DF16b.
	(ix86_invalid_conversion, ix86_invalid_unary_op,
	ix86_invalid_binary_op): Remove.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
	ix86_bf16_type_node, only create it if still NULL.
	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
	* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
gcc/c-family/
	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
	predefine __BFLT16_*__ macros and for C++23 also
	__STDCPP_BFLOAT16_T__.  Predefine bfloat16_type_node related
	macros for -fbuilding-libgcc.
	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
gcc/c/
	* c-typeck.cc (convert_arguments): Don't promote __bf16 to
	double.
gcc/cp/
	* cp-tree.h (extended_float_type_p): Return true for
	bfloat16_type_node.
	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_bfloat16,
	check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
	New.
	* gcc.dg/torture/bfloat16-basic.c: New test.
	* gcc.dg/torture/bfloat16-builtin.c: New test.
	* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
	* gcc.dg/torture/bfloat16-complex.c: New test.
	* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
	from bfloat16-builtin-issignaling-1.c.
	* gcc.dg/torture/floatn-basic.h: Allow to be includable from
	bfloat16-basic.c.
	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
	diagnostics.
	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
libcpp/
	* include/cpplib.h (CPP_N_BFLOAT16): Define.
	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
	C++.
libgcc/
	* config/i386/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
	CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
	-msse2.
	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
	* soft-fp/brain.h: New file.
	* soft-fp/truncsfbf2.c: New file.
	* soft-fp/truncdfbf2.c: New file.
	* soft-fp/truncxfbf2.c: New file.
	* soft-fp/trunctfbf2.c: New file.
	* soft-fp/trunchfbf2.c: New file.
	* soft-fp/truncbfhf2.c: New file.
	* soft-fp/extendbfsf2.c: New file.
libiberty/
	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
	entry.
	(cplus_demangle_type): Demangle DF16b.
	* testsuite/demangle-expected (_Z3xxxDF16b): New test.
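
For readers who want to see what the -ffast-math BF <-> SF expansions in
expr.cc compute, here is a plain C sketch operating on raw bit patterns.
It is only an illustration under assumptions: the function names are
invented, and the code mirrors the formula from the expr.cc comment,
(fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16, rather than the RTL the
patch actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names only; the patch works on RTL, not on C integers.  */

/* BFmode -> SFmode on the raw bits: bfloat16 is the top half of an
   IEEE single, so shift the 16 bits up.  */
uint32_t
bf16_to_sf_bits (uint16_t bf)
{
  return (uint32_t) bf << 16;
}

/* SFmode -> BFmode on the raw bits, round to nearest with ties to
   even.  Not valid for NaN inputs.  */
uint16_t
sf_to_bf16_bits (uint32_t sf)
{
  return (uint16_t) ((sf + 0x7fffu + ((sf >> 16) & 1u)) >> 16);
}

/* Exact halfway cases go to the even neighbour, and a NaN whose
   payload sits only in the low 16 bits collapses to infinity.  */
void
conv_self_check (void)
{
  assert (sf_to_bf16_bits (0x3f808000u) == 0x3f80);
  assert (sf_to_bf16_bits (0x3f818000u) == 0x3f82);
  assert (sf_to_bf16_bits (0x7f800001u) == 0x7f80);
}
```

The last check shows why the inline expansion is guarded by !HONOR_NANS:
the integer formula can turn a NaN into an infinity.  The !flag_rounding_math
guard exists because the formula hardcodes round to nearest.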

--- gcc/tree-core.h.jj	2022-10-10 09:31:57.683981308 +0200
+++ gcc/tree-core.h	2022-10-13 16:57:08.953775013 +0200
@@ -665,6 +665,9 @@ enum tree_index {
   TI_DOUBLE_TYPE,
   TI_LONG_DOUBLE_TYPE,
 
+  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
+  TI_BFLOAT16_TYPE,
+
   /* The _FloatN and _FloatNx types must be consecutive, and in the
      same sequence as the corresponding complex types, which must also
      be consecutive; _FloatN must come before _FloatNx; the order must
--- gcc/tree.h.jj	2022-10-10 09:31:57.766980149 +0200
+++ gcc/tree.h	2022-10-13 17:22:14.728207071 +0200
@@ -4291,6 +4291,7 @@ tree_strip_any_location_wrapper (tree ex
 #define float_type_node			global_trees[TI_FLOAT_TYPE]
 #define double_type_node		global_trees[TI_DOUBLE_TYPE]
 #define long_double_type_node		global_trees[TI_LONG_DOUBLE_TYPE]
+#define bfloat16_type_node		global_trees[TI_BFLOAT16_TYPE]
 
 /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
 #define FLOATN_TYPE_NODE(IDX)		global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
--- gcc/tree.cc.jj	2022-10-10 09:31:57.743980470 +0200
+++ gcc/tree.cc	2022-10-13 16:57:08.956774972 +0200
@@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
     = (flag_excess_precision == EXCESS_PRECISION_FAST
        ? EXCESS_PRECISION_TYPE_FAST
        : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
-	  ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
+	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
 
   enum flt_eval_method target_flt_eval_method
     = targetm.c.excess_precision (requested_type);
@@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
   machine_mode float16_type_mode = (float16_type_node
 				    ? TYPE_MODE (float16_type_node)
 				    : VOIDmode);
+  machine_mode bfloat16_type_mode = (bfloat16_type_node
+				     ? TYPE_MODE (bfloat16_type_node)
+				     : VOIDmode);
   machine_mode float_type_mode = TYPE_MODE (float_type_node);
   machine_mode double_type_mode = TYPE_MODE (double_type_node);
 
@@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
 	switch (target_flt_eval_method)
 	  {
 	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	    if (type_mode == float16_type_mode)
+	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode)
 	      return float_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode)
 	      return double_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode
 		|| type_mode == double_type_mode)
 	      return long_double_type_node;
@@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
 	switch (target_flt_eval_method)
 	  {
 	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	    if (type_mode == float16_type_mode)
+	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode)
 	      return complex_float_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode)
 	      return complex_double_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode
 		|| type_mode == double_type_mode)
 	      return complex_long_double_type_node;
@@ -9462,6 +9471,17 @@ build_common_tree_nodes (bool signed_cha
       SET_TYPE_MODE (FLOATN_NX_TYPE_NODE (i), mode);
     }
   float128t_type_node = float128_type_node;
+#ifdef HAVE_BFmode
+  if (REAL_MODE_FORMAT (BFmode) == &arm_bfloat_half_format
+      && targetm.scalar_mode_supported_p (BFmode)
+      && targetm.libgcc_floating_mode_supported_p (BFmode))
+    {
+      bfloat16_type_node = make_node (REAL_TYPE);
+      TYPE_PRECISION (bfloat16_type_node) = GET_MODE_PRECISION (BFmode);
+      layout_type (bfloat16_type_node);
+      SET_TYPE_MODE (bfloat16_type_node, BFmode);
+    }
+#endif
 
   float_ptr_type_node = build_pointer_type (float_type_node);
   double_ptr_type_node = build_pointer_type (double_type_node);
--- gcc/expmed.h.jj	2022-10-03 18:00:53.046735271 +0200
+++ gcc/expmed.h	2022-10-13 16:57:08.957774958 +0200
@@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
 				  rtx, tree, rtx, int);
 extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
 			 int);
+extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
+			       int);
 #ifdef GCC_OPTABS_H
 extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
 			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
--- gcc/expmed.cc.jj	2022-10-13 16:22:17.755496384 +0200
+++ gcc/expmed.cc	2022-10-13 16:57:08.957774958 +0200
@@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
 
 /* Likewise, but return 0 if that cannot be done.  */
 
-static rtx
+rtx
 maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
 		    int amount, rtx target, int unsignedp)
 {
--- gcc/expr.cc.jj	2022-10-06 17:43:47.941502119 +0200
+++ gcc/expr.cc	2022-10-13 16:57:09.022774066 +0200
@@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
       gcc_assert ((GET_MODE_PRECISION (from_mode)
 		   != GET_MODE_PRECISION (to_mode))
 		  || (DECIMAL_FLOAT_MODE_P (from_mode)
-		      != DECIMAL_FLOAT_MODE_P (to_mode)));
+		      != DECIMAL_FLOAT_MODE_P (to_mode))
+		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+		      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+		      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
 
       if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
 	/* Conversion between decimal float and binary float, same size.  */
@@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
 	  return;
 	}
 
+#ifdef HAVE_SFmode
+      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+	  && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
+	{
+	  if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
+	    {
+	      /* To cut down on libgcc size, implement
+		 BFmode -> {DF,XF,TF}mode conversions by
+		 BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
+	      rtx temp = gen_reg_rtx (SFmode);
+	      convert_mode_scalar (temp, from, unsignedp);
+	      convert_mode_scalar (to, temp, unsignedp);
+	      return;
+	    }
+	  if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+	    {
+	      /* Similarly, implement BFmode -> HFmode as
+		 BFmode -> SFmode -> HFmode conversion where SFmode
+		 has superset of BFmode values.  We don't need
+		 to handle sNaNs by raising exception and turning
+		 into qNaN though, as that can be done in the
+		 SFmode -> HFmode conversion too.  */
+	      rtx temp = gen_reg_rtx (SFmode);
+	      int save_flag_finite_math_only = flag_finite_math_only;
+	      flag_finite_math_only = true;
+	      convert_mode_scalar (temp, from, unsignedp);
+	      flag_finite_math_only = save_flag_finite_math_only;
+	      convert_mode_scalar (to, temp, unsignedp);
+	      return;
+	    }
+	  if (to_mode == SFmode
+	      && !HONOR_NANS (from_mode)
+	      && !HONOR_NANS (to_mode)
+	      && optimize_insn_for_speed_p ())
+	    {
+	      /* If we don't expect sNaNs, for BFmode -> SFmode we can just
+		 shift the bits up.  */
+	      machine_mode fromi_mode, toi_mode;
+	      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				     0).exists (&fromi_mode)
+		  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+					0).exists (&toi_mode))
+		{
+		  start_sequence ();
+		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+		  rtx tof = NULL_RTX;
+		  if (fromi)
+		    {
+		      rtx toi = gen_reg_rtx (toi_mode);
+		      convert_mode_scalar (toi, fromi, 1);
+		      toi
+			= maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
+					      GET_MODE_PRECISION (to_mode)
+					      - GET_MODE_PRECISION (from_mode),
+					      NULL_RTX, 1);
+		      if (toi)
+			{
+			  tof = lowpart_subreg (to_mode, toi, toi_mode);
+			  if (tof)
+			    emit_move_insn (to, tof);
+			}
+		    }
+		  insns = get_insns ();
+		  end_sequence ();
+		  if (tof)
+		    {
+		      emit_insn (insns);
+		      return;
+		    }
+		}
+	    }
+	}
+      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
+	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+	  && !HONOR_NANS (from_mode)
+	  && !HONOR_NANS (to_mode)
+	  && !flag_rounding_math
+	  && optimize_insn_for_speed_p ())
+	{
+	  /* If we don't expect qNaNs nor sNaNs and can assume rounding
+	     to nearest, we can expand the conversion inline as
+	     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
+	  machine_mode fromi_mode, toi_mode;
+	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				 0).exists (&fromi_mode)
+	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+				    0).exists (&toi_mode))
+	    {
+	      start_sequence ();
+	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+	      rtx tof = NULL_RTX;
+	      do
+		{
+		  if (!fromi)
+		    break;
+		  int shift = (GET_MODE_PRECISION (from_mode)
+			       - GET_MODE_PRECISION (to_mode));
+		  rtx temp1
+		    = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
+					  shift, NULL_RTX, 1);
+		  if (!temp1)
+		    break;
+		  rtx temp2
+		    = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+		  if (!temp2)
+		    break;
+		  rtx temp3
+		    = expand_binop (fromi_mode, add_optab, fromi,
+				    gen_int_mode ((HOST_WIDE_INT_1U
+						   << (shift - 1)) - 1,
+						  fromi_mode), NULL_RTX,
+				    1, OPTAB_DIRECT);
+		  if (!temp3)
+		    break;
+		  rtx temp4
+		    = expand_binop (fromi_mode, add_optab, temp3, temp2,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+		  if (!temp4)
+		    break;
+		  rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
+						  temp4, shift, NULL_RTX, 1);
+		  if (!temp5)
+		    break;
+		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
+		  if (!temp6)
+		    break;
+		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
+					toi_mode);
+		  if (tof)
+		    emit_move_insn (to, tof);
+		}
+	      while (0);
+	      insns = get_insns ();
+	      end_sequence ();
+	      if (tof)
+		{
+		  emit_insn (insns);
+		  return;
+		}
+	    }
+	}
+#endif
+
       /* Otherwise use a libcall.  */
       libcall = convert_optab_libfunc (tab, to_mode, from_mode);
 
--- gcc/builtin-types.def.jj	2022-10-03 18:00:52.658740505 +0200
+++ gcc/builtin-types.def	2022-10-13 17:09:52.930317869 +0200
@@ -82,6 +82,9 @@ DEF_PRIMITIVE_TYPE (BT_UNWINDWORD, (*lan
 DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
+DEF_PRIMITIVE_TYPE (BT_BFLOAT16, (bfloat16_type_node
+				  ? bfloat16_type_node
+				  : error_mark_node))
 DEF_PRIMITIVE_TYPE (BT_FLOAT16, (float16_type_node
 				 ? float16_type_node
 				 : error_mark_node))
@@ -264,6 +267,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_CONST_S
 DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_CONST_STRING, BT_DOUBLE, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_CONST_STRING,
 		     BT_LONGDOUBLE, BT_CONST_STRING)
+DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_CONST_STRING, BT_BFLOAT16, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_CONST_STRING, BT_FLOAT16, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_CONST_STRING, BT_FLOAT32, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_CONST_STRING, BT_FLOAT64, BT_CONST_STRING)
--- gcc/builtins.def.jj	2022-10-03 18:00:52.679740221 +0200
+++ gcc/builtins.def	2022-10-13 17:09:05.633962625 +0200
@@ -514,6 +514,7 @@ DEF_GCC_BUILTIN        (BUILT_IN_NANSF,
 DEF_GCC_BUILTIN        (BUILT_IN_NANSL, "nansl", BT_FN_LONGDOUBLE_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
 DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NANS, "nans", NAN_TYPE, ATTR_CONST_NOTHROW_NONNULL)
 #undef NAN_TYPE
+DEF_GCC_BUILTIN        (BUILT_IN_NANSF16B, "nansf16b", BT_FN_BFLOAT16_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
 DEF_GCC_BUILTIN        (BUILT_IN_NANSD32, "nansd32", BT_FN_DFLOAT32_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
 DEF_GCC_BUILTIN        (BUILT_IN_NANSD64, "nansd64", BT_FN_DFLOAT64_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
 DEF_GCC_BUILTIN        (BUILT_IN_NANSD128, "nansd128", BT_FN_DFLOAT128_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
--- gcc/fold-const-call.cc.jj	2022-09-03 09:35:41.107989686 +0200
+++ gcc/fold-const-call.cc	2022-10-13 17:20:59.579229947 +0200
@@ -1301,6 +1301,7 @@ fold_const_call (combined_fn fn, tree ty
 
     CASE_CFN_NANS:
     CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NANS):
+    case CFN_BUILT_IN_NANSF16B:
     case CFN_BUILT_IN_NANSD32:
     case CFN_BUILT_IN_NANSD64:
     case CFN_BUILT_IN_NANSD128:
--- gcc/config/i386/i386.cc.jj	2022-10-03 18:00:52.942736674 +0200
+++ gcc/config/i386/i386.cc	2022-10-13 16:57:09.092773105 +0200
@@ -2423,6 +2423,7 @@ classify_argument (machine_mode mode, co
       classes[1] = X86_64_SSEUP_CLASS;
       return 2;
     case E_HCmode:
+    case E_BCmode:
       classes[0] = X86_64_SSE_CLASS;
       if (!(bit_offset % 64))
 	return 1;
@@ -22428,7 +22429,7 @@ ix86_libgcc_floating_mode_supported_p (s
      be defined by the C front-end for AVX512FP16 intrinsics.  We will
      issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
      enabled.  */
-  return ((mode == HFmode && TARGET_SSE2)
+  return (((mode == HFmode || mode == BFmode) && TARGET_SSE2)
 	  ? true
 	  : default_libgcc_floating_mode_supported_p (mode));
 }
@@ -22731,7 +22732,7 @@ ix86_mangle_type (const_tree type)
   switch (TYPE_MODE (type))
     {
     case E_BFmode:
-      return "u6__bf16";
+      return "DF16b";
     case E_HFmode:
       /* _Float16 is "DF16_".
 	 Align with clang's decision in https://reviews.llvm.org/D33719. */
@@ -22747,55 +22748,6 @@ ix86_mangle_type (const_tree type)
     }
 }
 
-/* Return the diagnostic message string if conversion from FROMTYPE to
-   TOTYPE is not allowed, NULL otherwise.  */
-
-static const char *
-ix86_invalid_conversion (const_tree fromtype, const_tree totype)
-{
-  if (element_mode (fromtype) != element_mode (totype))
-    {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<__bf16%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<__bf16%>");
-    }
-
-  /* Conversion allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-ix86_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the binary operation OP is
-   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
-
-static const char *
-ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
-			   const_tree type2)
-{
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 static GTY(()) tree ix86_tls_stack_chk_guard_decl;
 
 static tree
@@ -24853,15 +24805,6 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE ix86_mangle_type
 
-#undef TARGET_INVALID_CONVERSION
-#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
-
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
-
-#undef TARGET_INVALID_BINARY_OP
-#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
-
 #undef TARGET_STACK_PROTECT_GUARD
 #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
 
--- gcc/config/i386/i386-builtins.cc.jj	2022-10-03 18:00:52.918736997 +0200
+++ gcc/config/i386/i386-builtins.cc	2022-10-13 16:57:09.119772735 +0200
@@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
 static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
 
 tree ix86_float16_type_node = NULL_TREE;
-tree ix86_bf16_type_node = NULL_TREE;
 tree ix86_bf16_ptr_type_node = NULL_TREE;
 
 /* Retrieve an element from the above table, building some of
@@ -1372,16 +1371,18 @@ ix86_register_float16_builtin_type (void
 static void
 ix86_register_bf16_builtin_type (void)
 {
-  ix86_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (ix86_bf16_type_node) = 16;
-  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
-  layout_type (ix86_bf16_type_node);
+  if (bfloat16_type_node == NULL_TREE)
+    {
+      bfloat16_type_node = make_node (REAL_TYPE);
+      TYPE_PRECISION (bfloat16_type_node) = 16;
+      SET_TYPE_MODE (bfloat16_type_node, BFmode);
+      layout_type (bfloat16_type_node);
+    }
 
   if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
     {
-      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
-					    "__bf16");
-      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
+      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
     }
 }
 
--- gcc/config/i386/i386-builtin-types.def.jj	2022-10-03 18:00:52.894737321 +0200
+++ gcc/config/i386/i386-builtin-types.def	2022-10-13 16:57:09.139772460 +0200
@@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
 DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
-DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
+DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
--- gcc/config/i386/i386.md.jj	2022-10-11 15:57:05.005762022 +0200
+++ gcc/config/i386/i386.md	2022-10-13 16:57:09.187771801 +0200
@@ -1644,6 +1644,48 @@ (define_expand "cbranch<mode>4"
   DONE;
 })
 
+(define_expand "cbranchbf4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:BF 1 "cmp_fp_expander_operand")
+		    (match_operand:BF 2 "cmp_fp_expander_operand")))
+   (set (pc) (if_then_else
+	      (match_operator 0 "comparison_operator"
+	       [(reg:CC FLAGS_REG)
+		(const_int 0)])
+	      (label_ref (match_operand 3))
+	      (pc)))]
+  ""
+{
+  rtx op1 = gen_lowpart (HImode, operands[1]);
+  if (CONST_INT_P (op1))
+    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[1], BFmode);
+  else
+    {
+      rtx t1 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t1, op1));
+      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
+      op1 = gen_lowpart (SFmode, t1);
+    }
+  rtx op2 = gen_lowpart (HImode, operands[2]);
+  if (CONST_INT_P (op2))
+    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[2], BFmode);
+  else
+    {
+      rtx t2 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t2, op2));
+      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
+      op2 = gen_lowpart (SFmode, t2);
+    }
+  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
+			   SFmode, NULL_RTX, NULL,
+			   as_a <rtx_code_label *> (operands[3]),
+			   /* Unfortunately this isn't propagated.  */
+			   profile_probability::even ());
+  DONE;
+})
+
 (define_expand "cstorehf4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:HF 2 "cmp_fp_expander_operand")
@@ -1659,6 +1701,45 @@ (define_expand "cstorehf4"
   DONE;
 })
 
+(define_expand "cstorebf4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:BF 2 "cmp_fp_expander_operand")
+		    (match_operand:BF 3 "cmp_fp_expander_operand")))
+   (set (match_operand:QI 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	  [(reg:CC FLAGS_REG)
+	   (const_int 0)]))]
+  ""
+{
+  rtx op1 = gen_lowpart (HImode, operands[2]);
+  if (CONST_INT_P (op1))
+    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[2], BFmode);
+  else
+    {
+      rtx t1 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t1, op1));
+      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
+      op1 = gen_lowpart (SFmode, t1);
+    }
+  rtx op2 = gen_lowpart (HImode, operands[3]);
+  if (CONST_INT_P (op2))
+    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[3], BFmode);
+  else
+    {
+      rtx t2 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t2, op2));
+      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
+      op2 = gen_lowpart (SFmode, t2);
+    }
+  rtx res = emit_store_flag_force (operands[0], GET_CODE (operands[1]),
+				   op1, op2, SFmode, 0, 1);
+  if (!rtx_equal_p (res, operands[0]))
+    emit_move_insn (operands[0], res);
+  DONE;
+})
+
 (define_expand "cstore<mode>4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")
--- gcc/c-family/c-cppbuiltin.cc.jj	2022-10-13 08:41:04.718165419 +0200
+++ gcc/c-family/c-cppbuiltin.cc	2022-10-13 17:51:07.722665421 +0200
@@ -1260,6 +1260,13 @@ c_cpp_builtins (cpp_reader *pfile)
       builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
 				      csuffix, FLOATN_NX_TYPE_NODE (i));
     }
+  if (bfloat16_type_node)
+    {
+      if (c_dialect_cxx () && cxx_dialect > cxx20)
+	cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
+      builtin_define_float_constants ("BFLT16", "BF16", "%s",
+				      "BF16", bfloat16_type_node);
+    }
 
   /* For float.h.  */
   if (targetm.decimal_float_supported_p ())
@@ -1370,6 +1377,12 @@ c_cpp_builtins (cpp_reader *pfile)
 	      suffix[0] = 'l';
 	      memcpy (float_h_prefix, "LDBL", 5);
 	    }
+	  else if (bfloat16_type_node
+		   && mode == TYPE_MODE (bfloat16_type_node))
+	    {
+	      memcpy (suffix, "bf16", 5);
+	      memcpy (float_h_prefix, "BFLT16", 7);
+	    }
 	  else
 	    {
 	      bool found_suffix = false;
@@ -1396,22 +1409,28 @@ c_cpp_builtins (cpp_reader *pfile)
 	  machine_mode float16_type_mode = (float16_type_node
 					    ? TYPE_MODE (float16_type_node)
 					    : VOIDmode);
+	  machine_mode bfloat16_type_mode = (bfloat16_type_node
+					     ? TYPE_MODE (bfloat16_type_node)
+					     : VOIDmode);
 	  switch (targetm.c.excess_precision
 		    (EXCESS_PRECISION_TYPE_IMPLICIT))
 	    {
 	    case FLT_EVAL_METHOD_UNPREDICTABLE:
 	    case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	      excess_precision = (mode == float16_type_mode
+				  || mode == bfloat16_type_mode
 				  || mode == TYPE_MODE (float_type_node)
 				  || mode == TYPE_MODE (double_type_node));
 	      break;
 
 	    case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	      excess_precision = (mode == float16_type_mode
+				  || mode == bfloat16_type_mode
 				  || mode == TYPE_MODE (float_type_node));
 	      break;
 	    case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	      excess_precision = mode == float16_type_mode;
+	      excess_precision = (mode == float16_type_mode
+				  || mode == bfloat16_type_mode);
 	      break;
 	    case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16:
 	      excess_precision = false;
--- gcc/c-family/c-lex.cc.jj	2022-10-13 16:21:52.548842666 +0200
+++ gcc/c-family/c-lex.cc	2022-10-13 16:59:51.778540099 +0200
@@ -1000,6 +1000,22 @@ interpret_float (const cpp_token *token,
 	  pedwarn (input_location, OPT_Wpedantic,
 		   "non-standard suffix on floating constant");
       }
+    else if ((flags & CPP_N_BFLOAT16) != 0)
+      {
+	type = bfloat16_type_node;
+	if (type == NULL_TREE)
+	  {
+	    error ("unsupported non-standard suffix on floating constant");
+	    return error_mark_node;
+	  }
+	if (!c_dialect_cxx ())
+	  pedwarn (input_location, OPT_Wpedantic,
+		   "non-standard suffix on floating constant");
+	else if (cxx_dialect < cxx23)
+	  pedwarn (input_location, OPT_Wpedantic,
+		   "%<bf16%> or %<BF16%> suffix on floating constant only "
+		   "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
+      }
     else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
       type = long_double_type_node;
     else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
--- gcc/c/c-typeck.cc.jj	2022-10-06 17:43:47.900502672 +0200
+++ gcc/c/c-typeck.cc	2022-10-13 16:57:09.226771266 +0200
@@ -3678,6 +3678,9 @@ convert_arguments (location_t loc, vec<l
 		promote_float_arg = false;
 		break;
 	      }
+	  /* Don't promote __bf16 either.  */
+	  if (TYPE_MAIN_VARIANT (valtype) == bfloat16_type_node)
+	    promote_float_arg = false;
 	}
 
       if (type != NULL_TREE)
--- gcc/cp/cp-tree.h.jj	2022-10-13 16:21:52.600841952 +0200
+++ gcc/cp/cp-tree.h	2022-10-13 16:57:09.241771060 +0200
@@ -8741,6 +8741,8 @@ extended_float_type_p (tree type)
   for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
     if (type == FLOATN_TYPE_NODE (i))
       return true;
+  if (type == bfloat16_type_node)
+    return true;
   return false;
 }
 
--- gcc/cp/typeck.cc.jj	2022-10-13 16:21:52.642841375 +0200
+++ gcc/cp/typeck.cc	2022-10-13 16:57:09.269770676 +0200
@@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
       if (mv2 == FLOATN_NX_TYPE_NODE (i))
 	extended2 = i + 1;
     }
+  if (mv1 == bfloat16_type_node)
+    extended1 = true;
+  if (mv2 == bfloat16_type_node)
+    extended2 = true;
   if (extended2 && !extended1)
     {
       int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
@@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
   if (cnt > 1 && mv2 == long_double_type_node)
     return -2;
   /* Otherwise, they have equal rank, but extended types
-     (other than std::bfloat16_t) have higher subrank.  */
+     (other than std::bfloat16_t) have higher subrank.
+     std::bfloat16_t shouldn't have equal rank to any standard
+     floating point type.  */
   return 1;
 }
 
--- gcc/testsuite/lib/target-supports.exp.jj	2022-10-11 14:50:14.472773574 +0200
+++ gcc/testsuite/lib/target-supports.exp	2022-10-13 16:57:09.270770662 +0200
@@ -3416,6 +3416,22 @@ proc check_effective_target_base_quadflo
     return 1
 }
 
+# Return 1 if the target supports the __bf16 type, 0 otherwise.
+
+proc check_effective_target_bfloat16 {} {
+    return [check_no_compiler_messages_nocache bfloat16 object {
+	__bf16 foo (__bf16 x) { return x + x; }
+    } [add_options_for_bfloat16 ""]]
+}
+
+proc check_effective_target_bfloat16_runtime {} {
+    return [check_effective_target_bfloat16]
+}
+
+proc add_options_for_bfloat16 { flags } {
+    return "$flags"
+}
+
 # Return 1 if the target supports all four forms of fused multiply-add
 # (fma, fms, fnma, and fnms) for both float and double.
 
--- gcc/testsuite/gcc.dg/torture/bfloat16-basic.c.jj	2022-10-13 16:57:09.271770648 +0200
+++ gcc/testsuite/gcc.dg/torture/bfloat16-basic.c	2022-10-13 17:32:28.531884882 +0200
@@ -0,0 +1,11 @@
+/* Test __bf16.  */
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-add-options bfloat16 } */
+/* { dg-require-effective-target bfloat16_runtime } */
+
+#define TYPE __bf16
+#define CST(C) CONCAT (C, bf16)
+#define CSTU(C) CONCAT (C, BF16)
+
+#include "floatn-basic.h"
--- gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c.jj	2022-10-13 16:57:09.271770648 +0200
+++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c	2022-10-13 18:09:24.288913634 +0200
@@ -0,0 +1,47 @@
+/* Test __bf16 built-in functions.  */
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-add-options bfloat16 } */
+/* { dg-add-options ieee } */
+/* { dg-require-effective-target bfloat16_runtime } */
+
+extern void exit (int);
+extern void abort (void);
+
+extern __bf16 test_type;
+extern __typeof (__builtin_nansf16b ("")) test_type;
+
+volatile __bf16 inf_cst = (__bf16) __builtin_inff ();
+volatile __bf16 huge_val_cst = (__bf16) __builtin_huge_valf ();
+volatile __bf16 nan_cst = (__bf16) __builtin_nanf ("");
+volatile __bf16 nans_cst = __builtin_nansf16b ("");
+volatile __bf16 neg0 = -0.0bf16, neg1 = -1.0bf16, one = 1.0;
+
+int
+main (void)
+{
+  volatile __bf16 r;
+  if (!__builtin_isinf (inf_cst))
+    abort ();
+  if (!__builtin_isinf (huge_val_cst))
+    abort ();
+  if (inf_cst != huge_val_cst)
+    abort ();
+  if (!__builtin_isnan (nan_cst))
+    abort ();
+  if (!__builtin_isnan (nans_cst))
+    abort ();
+  r = __builtin_fabsf (neg1);
+  if (r != 1.0bf16)
+    abort ();
+  r = __builtin_copysignf (one, neg0);
+  if (r != neg1)
+    abort ();
+  r = __builtin_copysignf (inf_cst, neg1);
+  if (r != -huge_val_cst)
+    abort ();
+  r = __builtin_copysignf (-inf_cst, one);
+  if (r != huge_val_cst)
+    abort ();
+  exit (0);
+}
--- gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c.jj	2022-10-13 16:57:09.271770648 +0200
+++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c	2022-10-13 17:40:15.067555349 +0200
@@ -0,0 +1,21 @@
+/* Test __bf16 __builtin_issignaling.  */
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-add-options bfloat16 } */
+/* { dg-add-options ieee } */
+/* { dg-require-effective-target bfloat16_runtime } */
+/* { dg-additional-options "-fsignaling-nans" } */
+/* Workaround for PR57484 on ia32: */
+/* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
+
+#define CONCATX(X, Y) X ## Y
+#define CONCAT(X, Y) CONCATX (X, Y)
+
+#define TYPE __bf16
+#define CST(C) CONCAT (C, bf16)
+#define FN(F) CONCAT (F, f16b)
+#define NAN(x) ((__bf16) __builtin_nanf (x))
+#define INF ((__bf16) __builtin_inff ())
+#define EXT 0
+
+#include "builtin-issignaling-1.c"
--- gcc/testsuite/gcc.dg/torture/bfloat16-complex.c.jj	2022-10-13 16:57:09.271770648 +0200
+++ gcc/testsuite/gcc.dg/torture/bfloat16-complex.c	2022-10-13 17:46:43.259267724 +0200
@@ -0,0 +1,61 @@
+/* Test __bf16 complex arithmetic.  */
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-add-options bfloat16 } */
+/* { dg-require-effective-target bfloat16_runtime } */
+
+extern void exit (int);
+extern void abort (void);
+
+volatile __bf16 a = 1.0bf16;
+typedef _Complex float __cbf16 __attribute__((__mode__(__BC__)));
+volatile __cbf16 b = __builtin_complex (2.0bf16, 3.0bf16);
+volatile __cbf16 c = __builtin_complex (2.0bf16, 3.0bf16);
+volatile __cbf16 d = __builtin_complex (2.0bf16, 3.0bf16);
+
+__cbf16
+fn (__cbf16 arg)
+{
+  return arg / 4;
+}
+
+int
+main (void)
+{
+  volatile __cbf16 r;
+  if (b != c)
+    abort ();
+  if (b != d)
+    abort ();
+  r = a + b;
+  if (__real__ r != 3.0bf16 || __imag__ r != 3.0bf16)
+    abort ();
+  r += d;
+  if (__real__ r != 5.0bf16 || __imag__ r != 6.0bf16)
+    abort ();
+  r -= a;
+  if (__real__ r != 4.0bf16 || __imag__ r != 6.0bf16)
+    abort ();
+  r /= (a + a);
+  if (__real__ r != 2.0bf16 || __imag__ r != 3.0bf16)
+    abort ();
+  r *= (a + a);
+  if (__real__ r != 4.0bf16 || __imag__ r != 6.0bf16)
+    abort ();
+  r -= b;
+  if (__real__ r != 2.0bf16 || __imag__ r != 3.0bf16)
+    abort ();
+  r *= r;
+  if (__real__ r != -5.0bf16 || __imag__ r != 12.0bf16)
+    abort ();
+  /* Division may not be exact, so round result before comparing.  */
+  r /= b;
+  r += __builtin_complex (100.0bf16, 100.0bf16);
+  r -= __builtin_complex (100.0bf16, 100.0bf16);
+  if (r != b)
+    abort ();
+  r = fn (r);
+  if (__real__ r != 0.5bf16 || __imag__ r != 0.75bf16)
+    abort ();
+  exit (0);
+}
--- gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c.jj	2022-10-03 18:00:53.118734300 +0200
+++ gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c	2022-10-13 17:39:19.387313780 +0200
@@ -4,7 +4,7 @@
 /* Workaround for PR57484 on ia32: */
 /* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
 
-#ifndef EXT
+#if !defined(EXT) && !defined(TYPE)
 int
 f1 (void)
 {
@@ -41,31 +41,42 @@ f6 (long double x)
   return __builtin_issignaling (x);
 }
 #else
-#define CONCATX(X, Y) X ## Y
-#define CONCAT(X, Y) CONCATX (X, Y)
-#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
-#define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
-
-#if EXT
-# define TYPE CONCAT3 (_Float, WIDTH, x)
-# define CST(C) CONCAT4 (C, f, WIDTH, x)
-# define FN(F) CONCAT4 (F, f, WIDTH, x)
-#else
-# define TYPE CONCAT (_Float, WIDTH)
-# define CST(C) CONCAT3 (C, f, WIDTH)
-# define FN(F) CONCAT3 (F, f, WIDTH)
+#ifndef TYPE
+# define CONCATX(X, Y) X ## Y
+# define CONCAT(X, Y) CONCATX (X, Y)
+# define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
+# define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
+
+# if EXT
+#  define TYPE CONCAT3 (_Float, WIDTH, x)
+#  define CST(C) CONCAT4 (C, f, WIDTH, x)
+#  define FN(F) CONCAT4 (F, f, WIDTH, x)
+# else
+#  define TYPE CONCAT (_Float, WIDTH)
+#  define CST(C) CONCAT3 (C, f, WIDTH)
+#  define FN(F) CONCAT3 (F, f, WIDTH)
+# endif
+#endif
+#ifndef NANS
+# define NANS(x) FN (__builtin_nans) (x)
+#endif
+#ifndef NAN
+# define NAN(x) FN (__builtin_nan) (x)
+#endif
+#ifndef INF
+# define INF FN (__builtin_inf) ()
 #endif
 
 int
 f1 (void)
 {
-  return __builtin_issignaling (FN (__builtin_nans) (""));
+  return __builtin_issignaling (NANS (""));
 }
 
 int
 f2 (void)
 {
-  return __builtin_issignaling (FN (__builtin_nan) (""));
+  return __builtin_issignaling (NAN (""));
 }
 
 int
@@ -118,10 +129,10 @@ main ()
   if (!f6 (z))
     __builtin_abort ();
 #else
-  if (f4 (w) || !f4 (FN (__builtin_nans) ("0x123")) || f4 (CST (42.0)) || f4 (FN (__builtin_nan) ("0x234"))
-      || f4 (FN (__builtin_inf) ()) || f4 (-FN (__builtin_inf) ()) || f4 (CST (-42.0)) || f4 (CST (-0.0)) || f4 (CST (0.0)))
+  if (f4 (w) || !f4 (NANS ("0x123")) || f4 (CST (42.0)) || f4 (NAN ("0x234"))
+      || f4 (INF) || f4 (-INF) || f4 (CST (-42.0)) || f4 (CST (-0.0)) || f4 (CST (0.0)))
     __builtin_abort ();
-  w = FN (__builtin_nans) ("");
+  w = NANS ("");
   asm volatile ("" : : : "memory");
   if (!f4 (w))
     __builtin_abort ();
--- gcc/testsuite/gcc.dg/torture/floatn-basic.h.jj	2022-10-03 18:00:53.118734300 +0200
+++ gcc/testsuite/gcc.dg/torture/floatn-basic.h	2022-10-13 16:57:09.285770456 +0200
@@ -9,14 +9,16 @@
 #define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
 #define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
 
-#if EXT
-# define TYPE CONCAT3 (_Float, WIDTH, x)
-# define CST(C) CONCAT4 (C, f, WIDTH, x)
-# define CSTU(C) CONCAT4 (C, F, WIDTH, x)
-#else
-# define TYPE CONCAT (_Float, WIDTH)
-# define CST(C) CONCAT3 (C, f, WIDTH)
-# define CSTU(C) CONCAT3 (C, F, WIDTH)
+#ifndef TYPE
+# if EXT
+#  define TYPE CONCAT3 (_Float, WIDTH, x)
+#  define CST(C) CONCAT4 (C, f, WIDTH, x)
+#  define CSTU(C) CONCAT4 (C, F, WIDTH, x)
+# else
+#  define TYPE CONCAT (_Float, WIDTH)
+#  define CST(C) CONCAT3 (C, f, WIDTH)
+#  define CSTU(C) CONCAT3 (C, F, WIDTH)
+# endif
 #endif
 
 extern void exit (int);
--- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c.jj	2022-10-03 18:00:53.137734043 +0200
+++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c	2022-10-13 16:57:09.306770168 +0200
@@ -45,19 +45,19 @@ __m256bf16 footest (__m256bf16 vector0)
   __m256bf16 vector2_1 = {};
   __m256bf16 vector2_2 = { glob_bfloat };
   __m256bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
-  __m256bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error "invalid conversion to type '__bf16'" } */
-
-  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256i initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __v16hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m256bf16 vector2_4 = { 0 };
+  __m256bf16 vector2_5 = { 0.1 };
+  __m256bf16 vector2_6 = { is_a_float16 };
+  __m256bf16 vector2_7 = { is_a_float };
+  __m256bf16 vector2_8 = { is_an_int };
+  __m256bf16 vector2_9 = { is_a_short_int };
+  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
+
+  __v8si initi_2_1 = { glob_bfloat };
+  __m256 initi_2_2 = { glob_bfloat };
+  __m256h initi_2_3 = { glob_bfloat };
+  __m256i initi_2_5 = { glob_bfloat };
+  __v16hi initi_2_6 = { glob_bfloat };
 
   /* Assignments to/from vectors.  */
 
@@ -79,25 +79,25 @@ __m256bf16 footest (__m256bf16 vector0)
   /* Assignments to/from elements.  */
 
   vector2_3[0] = glob_bfloat;
-  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_an_int;
+  vector2_3[0] = is_a_short_int;
+  vector2_3[0] = is_a_float;
+  vector2_3[0] = is_a_float16;
+  vector2_3[0] = 0;
+  vector2_3[0] = 0.1;
 
   glob_bfloat = vector2_3[0];
-  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_an_int = vector2_3[0];
+  is_a_short_int = vector2_3[0];
+  is_a_float = vector2_3[0];
+  is_a_float16 = vector2_3[0];
 
   /* Compound literals.  */
 
   (__m256bf16) {};
 
-  (__m256bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__m256bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (__m256bf16) { 0 };
+  (__m256bf16) { 0.1 };
   (__m256bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
   (__m256bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
   (__m256bf16) { is_a_long_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256i'} } */
@@ -176,16 +176,16 @@ __m256bf16 footest (__m256bf16 vector0)
   bfloat_ptr = &bfloat_ptr3[1];
 
   /* Simple comparison.  */
-  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > glob_bfloat_vec;
+  glob_bfloat_vec == vector0;
+  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
+  vector0 > 0;
+  0 == vector0;
+  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
+  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
+  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
 
   /* Pointer comparison.  */
 
@@ -224,24 +224,24 @@ __m256bf16 footest (__m256bf16 vector0)
 
   /* Unary operators.  */
 
-  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  +vector0;
+  -vector0;
+  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
+  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
   *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
-  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __real vector0; /* { dg-error {wrong type argument to __real} } */
+  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
+  ++vector0;
+  --vector0;
+  vector0++;
+  vector0--;
 
   /* Binary arithmetic operations.  */
 
-  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + *bfloat_ptr;
+  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
+  vector0 = glob_bfloat_vec + 0;
+  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
 
   return vector0;
 }
--- gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c.jj	2022-10-03 18:00:53.136734057 +0200
+++ gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c	2022-10-13 16:57:09.327769880 +0200
@@ -12,8 +12,8 @@ double is_a_double;
 
 float *float_ptr;
 
-__bf16 foo1 (void) { return (__bf16) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
-__bf16 foo2 (void) { return (__bf16) (short) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
+__bf16 foo1 (void) { return (__bf16) 0x1234; }
+__bf16 foo2 (void) { return (__bf16) (short) 0x1234; }
 
 __bf16 footest (__bf16 scalar0)
 {
@@ -22,87 +22,87 @@ __bf16 footest (__bf16 scalar0)
 
   __bf16 scalar1_1;
   __bf16 scalar1_2 = glob_bfloat;
-  __bf16 scalar1_3 = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_4 = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_5 = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_6 = is_an_int;  /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_7 = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_8 = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar1_9 = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  int initi_1_1 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  float initi_1_2 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  _Float16 initi_1_3 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  short initi_1_4 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  double initi_1_5 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __bf16 scalar1_3 = 0;
+  __bf16 scalar1_4 = 0.1;
+  __bf16 scalar1_5 = is_a_float;
+  __bf16 scalar1_6 = is_an_int;
+  __bf16 scalar1_7 = is_a_float16;
+  __bf16 scalar1_8 = is_a_double;
+  __bf16 scalar1_9 = is_a_short_int;
+
+  int initi_1_1 = glob_bfloat;
+  float initi_1_2 = glob_bfloat;
+  _Float16 initi_1_3 = glob_bfloat;
+  short initi_1_4 = glob_bfloat;
+  double initi_1_5 = glob_bfloat;
 
   __bf16 scalar2_1 = {};
   __bf16 scalar2_2 = { glob_bfloat };
-  __bf16 scalar2_3 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_4 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_5 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_6 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_7 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_8 = { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 scalar2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  int initi_2_1 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  float initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  _Float16 initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  short initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  double initi_2_5 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  __bf16 scalar2_3 = { 0 };
+  __bf16 scalar2_4 = { 0.1 };
+  __bf16 scalar2_5 = { is_a_float };
+  __bf16 scalar2_6 = { is_an_int };
+  __bf16 scalar2_7 = { is_a_float16 };
+  __bf16 scalar2_8 = { is_a_double };
+  __bf16 scalar2_9 = { is_a_short_int };
+
+  int initi_2_1 = { glob_bfloat };
+  float initi_2_2 = { glob_bfloat };
+  _Float16 initi_2_3 = { glob_bfloat };
+  short initi_2_4 = { glob_bfloat };
+  double initi_2_5 = { glob_bfloat };
 
   /* Assignments.  */
 
   glob_bfloat = glob_bfloat;
-  glob_bfloat = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
-  glob_bfloat = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  is_an_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float16 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_double = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_short_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
+  glob_bfloat = 0;
+  glob_bfloat = 0.1;
+  glob_bfloat = is_a_float;
+  glob_bfloat = is_an_int;
+  glob_bfloat = is_a_float16;
+  glob_bfloat = is_a_double;
+  glob_bfloat = is_a_short_int;
+
+  is_an_int = glob_bfloat;
+  is_a_float = glob_bfloat;
+  is_a_float16 = glob_bfloat;
+  is_a_double = glob_bfloat;
+  is_a_short_int = glob_bfloat;
 
   /* Casting.  */
 
   (void) glob_bfloat;
   (__bf16) glob_bfloat;
 
-  (int) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (float) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (_Float16) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (double) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (short) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
-
-  (__bf16) is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (int) glob_bfloat;
+  (float) glob_bfloat;
+  (_Float16) glob_bfloat;
+  (double) glob_bfloat;
+  (short) glob_bfloat;
+
+  (__bf16) is_an_int;
+  (__bf16) is_a_float;
+  (__bf16) is_a_float16;
+  (__bf16) is_a_double;
+  (__bf16) is_a_short_int;
 
   /* Compound literals.  */
 
   (__bf16) {};
   (__bf16) { glob_bfloat };
-  (__bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__bf16) { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  (int) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (float) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (_Float16) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (double) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  (short) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
+  (__bf16) { 0 };
+  (__bf16) { 0.1 };
+  (__bf16) { is_a_float };
+  (__bf16) { is_an_int };
+  (__bf16) { is_a_float16 };
+  (__bf16) { is_a_double };
+  (__bf16) { is_a_short_int };
+
+  (int) { glob_bfloat };
+  (float) { glob_bfloat };
+  (_Float16) { glob_bfloat };
+  (double) { glob_bfloat };
+  (short) { glob_bfloat };
 
   /* Arrays and Structs.  */
 
@@ -145,16 +145,16 @@ __bf16 footest (__bf16 scalar0)
   bfloat_ptr = &bfloat_ptr3[1];
 
   /* Simple comparison.  */
-  scalar0 > glob_bfloat; /* { dg-error {operation not permitted on type '__bf16'} } */
-  glob_bfloat == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_a_float == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0.1 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 > is_an_int; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_an_int == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  scalar0 > glob_bfloat;
+  glob_bfloat == scalar0;
+  scalar0 > is_a_float;
+  is_a_float == scalar0;
+  scalar0 > 0;
+  0 == scalar0;
+  scalar0 > 0.1;
+  0.1 == scalar0;
+  scalar0 > is_an_int;
+  is_an_int == scalar0;
 
   /* Pointer comparison.  */
 
@@ -174,41 +174,41 @@ __bf16 footest (__bf16 scalar0)
   /* Conditional expressions.  */
 
   0 ? scalar0 : scalar0;
-  0 ? scalar0 : is_a_float; /* { dg-error {invalid conversion from type '__bf16'} } */
-  0 ? is_a_float : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
-  0 ? scalar0 : 0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  0 ? 0 : scalar0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  0 ? 0.1 : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
-  0 ? scalar0 : 0.1; /* { dg-error {invalid conversion from type '__bf16'} } */
+  0 ? scalar0 : is_a_float;
+  0 ? is_a_float : scalar0;
+  0 ? scalar0 : 0;
+  0 ? 0 : scalar0;
+  0 ? 0.1 : scalar0;
+  0 ? scalar0 : 0.1;
   0 ? bfloat_ptr : bfloat_ptr2;
   0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
   0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
 
-  scalar0 ? scalar0 : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 ? is_a_float : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 ? scalar0 : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 ? is_a_float : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
+  scalar0 ? scalar0 : scalar0;
+  scalar0 ? is_a_float : scalar0;
+  scalar0 ? scalar0 : is_a_float;
+  scalar0 ? is_a_float : is_a_float;
 
   /* Unary operators.  */
 
-  +scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  -scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ~scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  !scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  +scalar0;
+  -scalar0;
+  ~scalar0; /* { dg-error {wrong type argument to bit-complement} } */
+  !scalar0;
   *scalar0; /* { dg-error {invalid type argument of unary '\*'} } */
-  __real scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  __imag scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ++scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  --scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0++; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __real scalar0;
+  __imag scalar0;
+  ++scalar0;
+  --scalar0;
+  scalar0++;
+  scalar0--;
 
   /* Binary arithmetic operations.  */
 
-  scalar0 = glob_bfloat + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 = glob_bfloat + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 = glob_bfloat + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  scalar0 = glob_bfloat + is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
+  scalar0 = glob_bfloat + *bfloat_ptr;
+  scalar0 = glob_bfloat + 0.1;
+  scalar0 = glob_bfloat + 0;
+  scalar0 = glob_bfloat + is_a_float;
 
   return scalar0;
 }
--- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c.jj	2022-10-03 18:00:53.136734057 +0200
+++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c	2022-10-13 16:57:09.344769646 +0200
@@ -48,20 +48,20 @@ __m128bf16 footest (__m128bf16 vector0)
   __m128bf16 vector2_1 = {};
   __m128bf16 vector2_2 = { glob_bfloat };
   __m128bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
-  __m128bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
-
-  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m128h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __m128 initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
-  __v4si initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
-  __v4hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
+  __m128bf16 vector2_4 = { 0 };
+  __m128bf16 vector2_5 = { 0.1 };
+  __m128bf16 vector2_6 = { is_a_float16 };
+  __m128bf16 vector2_7 = { is_a_float };
+  __m128bf16 vector2_8 = { is_an_int };
+  __m128bf16 vector2_9 = { is_a_short_int };
+  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
+
+  __v8si initi_2_1 = { glob_bfloat };
+  __m256 initi_2_2 = { glob_bfloat };
+  __m128h initi_2_3 = { glob_bfloat };
+  __m128 initi_2_4 = { glob_bfloat };
+  __v4si initi_2_5 = { glob_bfloat };
+  __v4hi initi_2_6 = { glob_bfloat };
 
   /* Assignments to/from vectors.  */
 
@@ -85,25 +85,25 @@ __m128bf16 footest (__m128bf16 vector0)
   /* Assignments to/from elements.  */
 
   vector2_3[0] = glob_bfloat;
-  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
-  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
+  vector2_3[0] = is_an_int;
+  vector2_3[0] = is_a_short_int;
+  vector2_3[0] = is_a_float;
+  vector2_3[0] = is_a_float16;
+  vector2_3[0] = 0;
+  vector2_3[0] = 0.1;
 
   glob_bfloat = vector2_3[0];
-  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
-  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
+  is_an_int = vector2_3[0];
+  is_a_short_int = vector2_3[0];
+  is_a_float = vector2_3[0];
+  is_a_float16 = vector2_3[0];
 
   /* Compound literals.  */
 
   (__m128bf16) {};
 
-  (__m128bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
-  (__m128bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
+  (__m128bf16) { 0 };
+  (__m128bf16) { 0.1 };
   (__m128bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
   (__m128bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
   (__m128bf16) { is_a_float_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128'} } */
@@ -186,16 +186,16 @@ __m128bf16 footest (__m128bf16 vector0)
   bfloat_ptr = &bfloat_ptr3[1];
 
   /* Simple comparison.  */
-  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
-  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 > glob_bfloat_vec;
+  glob_bfloat_vec == vector0;
+  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
+  vector0 > 0;
+  0 == vector0;
+  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
+  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
+  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
+  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
 
   /* Pointer comparison.  */
 
@@ -234,24 +234,24 @@ __m128bf16 footest (__m128bf16 vector0)
 
   /* Unary operators.  */
 
-  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
+  +vector0;
+  -vector0;
+  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
+  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
   *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
-  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
+  __real vector0; /* { dg-error {wrong type argument to __real} } */
+  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
+  ++vector0;
+  --vector0;
+  vector0++;
+  vector0--;
 
   /* Binary arithmetic operations.  */
 
-  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
-  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
+  vector0 = glob_bfloat_vec + *bfloat_ptr;
+  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
+  vector0 = glob_bfloat_vec + 0;
+  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
 
   return vector0;
 }
--- gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C.jj	2022-10-03 18:00:53.109734421 +0200
+++ gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C	2022-10-13 16:57:09.362769399 +0200
@@ -5,6 +5,6 @@ void foo (void)
 {
   __bf16 (); /* { dg-bogus {invalid conversion to type '__bf16'} } */
   __bf16 a = __bf16(); /* { dg-bogus {invalid conversion to type '__bf16'} } */
-  __bf16 (0x1234); /* { dg-error {invalid conversion to type '__bf16'} } */
-  __bf16 (0.1); /* { dg-error {invalid conversion to type '__bf16'} } */
+  __bf16 (0x1234); /* { dg-bogus {invalid conversion to type '__bf16'} } */
+  __bf16 (0.1); /* { dg-bogus {invalid conversion to type '__bf16'} } */
 }
--- libcpp/include/cpplib.h.jj	2022-10-03 18:00:53.251732506 +0200
+++ libcpp/include/cpplib.h	2022-10-13 16:57:09.384769097 +0200
@@ -1275,6 +1275,7 @@ struct cpp_num
 #define CPP_N_USERDEF	0x1000000 /* C++11 user-defined literal.  */
 
 #define CPP_N_SIZE_T	0x2000000 /* C++23 size_t literal.  */
+#define CPP_N_BFLOAT16	0x4000000 /* std::bfloat16_t type.  */
 
 #define CPP_N_WIDTH_FLOATN_NX	0xF0000000 /* _FloatN / _FloatNx value
 					      of N, divided by 16.  */
--- libcpp/expr.cc.jj	2022-10-03 18:00:53.221732910 +0200
+++ libcpp/expr.cc	2022-10-13 16:58:01.360055690 +0200
@@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
   size_t orig_len = len;
   const uchar *orig_s = s;
   size_t flags;
-  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
+  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
 
   flags = 0;
-  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
+  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
 
   /* The following decimal float suffixes, from TR 24732:2009, TS
      18661-2:2015 and C2X, are supported:
@@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
      w, W - machine-specific type such as __float80 (GNU extension).
      q, Q - machine-specific type such as __float128 (GNU extension).
      fN, FN - _FloatN (TS 18661-3:2015).
-     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
+     fNx, FNx - _FloatNx (TS 18661-3:2015).
+     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
 
   /* Process decimal float suffixes, which are two letters starting
      with d or D.  Order and case are significant.  */
@@ -239,6 +240,19 @@ interpret_float_suffix (cpp_reader *pfil
 		fn++;
 	    }
 	  break;
+	case 'b': case 'B':
+	  if (len > 2
+	      /* Except for bf16 / BF16 where case is significant.  */
+	      && s[1] == (s[0] == 'b' ? 'f' : 'F')
+	      && s[2] == '1'
+	      && s[3] == '6')
+	    {
+	      bf16++;
+	      len -= 3;
+	      s += 3;
+	      break;
+	    }
+	  return 0;
 	case 'd': case 'D': d++; break;
 	case 'l': case 'L': l++; break;
 	case 'w': case 'W': w++; break;
@@ -257,7 +271,7 @@ interpret_float_suffix (cpp_reader *pfil
      of N larger than can be represented in the return value.  The
      caller is responsible for rejecting _FloatN suffixes where
      _FloatN is not supported on the chosen target.  */
-  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
+  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
     return 0;
   if (fn_bits > CPP_FLOATN_MAX)
     return 0;
@@ -295,6 +309,7 @@ interpret_float_suffix (cpp_reader *pfil
 	     q ? CPP_N_MD_Q :
 	     fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
 	     fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
+	     bf16 ? CPP_N_BFLOAT16 :
 	     CPP_N_DEFAULT));
 }
 
--- libgcc/config/i386/t-softfp.jj	2022-10-03 18:00:53.314731656 +0200
+++ libgcc/config/i386/t-softfp	2022-10-13 16:57:09.426768521 +0200
@@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
 libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
 LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
 
-softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
-softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
+softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
+softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
+		      tfbf xfbf dfbf sfbf hfbf
 
 softfp_extras += eqhf2
 
@@ -15,11 +16,17 @@ CFLAGS-extendhfsf2.c += -msse2
 CFLAGS-extendhfdf2.c += -msse2
 CFLAGS-extendhftf2.c += -msse2
 CFLAGS-extendhfxf2.c += -msse2
+CFLAGS-extendbfsf2.c += -msse2
 
 CFLAGS-truncsfhf2.c += -msse2
 CFLAGS-truncdfhf2.c += -msse2
 CFLAGS-truncxfhf2.c += -msse2
 CFLAGS-trunctfhf2.c += -msse2
+CFLAGS-truncsfbf2.c += -msse2
+CFLAGS-truncdfbf2.c += -msse2
+CFLAGS-truncxfbf2.c += -msse2
+CFLAGS-trunctfbf2.c += -msse2
+CFLAGS-trunchfbf2.c += -msse2
 
 CFLAGS-eqhf2.c += -msse2
 CFLAGS-_divhc3.c += -msse2
--- libgcc/config/i386/libgcc-glibc.ver.jj	2022-10-03 18:00:53.313731670 +0200
+++ libgcc/config/i386/libgcc-glibc.ver	2022-10-13 16:57:09.438768356 +0200
@@ -214,3 +214,13 @@ GCC_12.0.0 {
   __trunctfhf2
   __truncxfhf2
 }
+
+%inherit GCC_13.0.0 GCC_12.0.0
+GCC_13.0.0 {
+  __extendbfsf2
+  __truncdfbf2
+  __truncsfbf2
+  __trunctfbf2
+  __truncxfbf2
+  __trunchfbf2
+}
--- libgcc/config/i386/sfp-machine.h.jj	2022-10-03 18:00:53.313731670 +0200
+++ libgcc/config/i386/sfp-machine.h	2022-10-13 16:57:09.441768315 +0200
@@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
 #define _FP_QNANNEGATEDP 0
 
 #define _FP_NANSIGN_H		1
+#define _FP_NANSIGN_B		1
 #define _FP_NANSIGN_S		1
 #define _FP_NANSIGN_D		1
 #define _FP_NANSIGN_E		1
--- libgcc/config/i386/64/sfp-machine.h.jj	2022-10-03 18:00:53.290731980 +0200
+++ libgcc/config/i386/64/sfp-machine.h	2022-10-13 16:57:09.451768178 +0200
@@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D
 #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
--- libgcc/config/i386/32/sfp-machine.h.jj	2022-10-03 18:00:53.290731980 +0200
+++ libgcc/config/i386/32/sfp-machine.h	2022-10-13 16:57:09.459768068 +0200
@@ -87,6 +87,7 @@
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
 /* Even if XFmode is 12byte,  we have to pad it to
--- libgcc/soft-fp/brain.h.jj	2022-10-13 16:57:09.460768054 +0200
+++ libgcc/soft-fp/brain.h	2022-10-13 16:57:09.459768068 +0200
@@ -0,0 +1,172 @@
+/* Software floating-point emulation.
+   Definitions for Brain Floating Point format (bfloat16).
+   Copyright (C) 1997-2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef SOFT_FP_BRAIN_H
+#define SOFT_FP_BRAIN_H	1
+
+#if _FP_W_TYPE_SIZE < 32
+# error "Here's a nickel kid.  Go buy yourself a real computer."
+#endif
+
+#define _FP_FRACTBITS_B		(_FP_W_TYPE_SIZE)
+
+#define _FP_FRACTBITS_DW_B	(_FP_W_TYPE_SIZE)
+
+#define _FP_FRACBITS_B		8
+#define _FP_FRACXBITS_B		(_FP_FRACTBITS_B - _FP_FRACBITS_B)
+#define _FP_WFRACBITS_B		(_FP_WORKBITS + _FP_FRACBITS_B)
+#define _FP_WFRACXBITS_B	(_FP_FRACTBITS_B - _FP_WFRACBITS_B)
+#define _FP_EXPBITS_B		8
+#define _FP_EXPBIAS_B		127
+#define _FP_EXPMAX_B		255
+
+#define _FP_QNANBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
+#define _FP_QNANBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
+#define _FP_IMPLBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
+#define _FP_IMPLBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
+#define _FP_OVERFLOW_B		((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
+
+#define _FP_WFRACBITS_DW_B	(2 * _FP_WFRACBITS_B)
+#define _FP_WFRACXBITS_DW_B	(_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
+#define _FP_HIGHBIT_DW_B	\
+  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
+
+/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
+   chosen by the target machine.  */
+
+typedef float BFtype __attribute__ ((mode (BF)));
+
+union _FP_UNION_B
+{
+  BFtype flt;
+  struct _FP_STRUCT_LAYOUT
+  {
+#if __BYTE_ORDER == __BIG_ENDIAN
+    unsigned sign : 1;
+    unsigned exp  : _FP_EXPBITS_B;
+    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
+#else
+    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
+    unsigned exp  : _FP_EXPBITS_B;
+    unsigned sign : 1;
+#endif
+  } bits;
+};
+
+#define FP_DECL_B(X)		_FP_DECL (1, X)
+#define FP_UNPACK_RAW_B(X, val)	_FP_UNPACK_RAW_1 (B, X, (val))
+#define FP_UNPACK_RAW_BP(X, val)	_FP_UNPACK_RAW_1_P (B, X, (val))
+#define FP_PACK_RAW_B(val, X)	_FP_PACK_RAW_1 (B, (val), X)
+#define FP_PACK_RAW_BP(val, X)			\
+  do						\
+    {						\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_B(X, val)			\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1 (B, X, (val));		\
+      _FP_UNPACK_CANONICAL (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_BP(X, val)			\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1_P (B, X, (val));		\
+      _FP_UNPACK_CANONICAL (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_SEMIRAW_B(X, val)		\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1 (B, X, (val));		\
+      _FP_UNPACK_SEMIRAW (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_SEMIRAW_BP(X, val)		\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1_P (B, X, (val));		\
+      _FP_UNPACK_SEMIRAW (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_B(val, X)			\
+  do						\
+    {						\
+      _FP_PACK_CANONICAL (B, 1, X);		\
+      _FP_PACK_RAW_1 (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_BP(val, X)			\
+  do						\
+    {						\
+      _FP_PACK_CANONICAL (B, 1, X);		\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_SEMIRAW_B(val, X)		\
+  do						\
+    {						\
+      _FP_PACK_SEMIRAW (B, 1, X);		\
+      _FP_PACK_RAW_1 (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_SEMIRAW_BP(val, X)		\
+  do						\
+    {						\
+      _FP_PACK_SEMIRAW (B, 1, X);		\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_TO_INT_B(r, X, rsz, rsg)	_FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
+#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)	\
+  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
+#define FP_FROM_INT_B(X, r, rs, rt)	_FP_FROM_INT (B, 1, X, (r), (rs), rt)
+
+/* BFmode arithmetic is not implemented.  */
+
+#define _FP_FRAC_HIGH_B(X)	_FP_FRAC_HIGH_1 (X)
+#define _FP_FRAC_HIGH_RAW_B(X)	_FP_FRAC_HIGH_1 (X)
+#define _FP_FRAC_HIGH_DW_B(X)	_FP_FRAC_HIGH_1 (X)
+
+#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
+
+#endif /* !SOFT_FP_BRAIN_H */
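
A note on the format constants in brain.h above: _FP_FRACBITS_B is 8
because it counts the implicit leading bit, so the frac bitfield in
_FP_UNION_B is only 7 bits wide, next to 8 exponent bits and the sign.
The following is a minimal sketch of that layout, not code from the patch.
It assumes float is IEEE binary32 and that a bfloat16 value is held as raw
bits in a uint16_t; the names bf16_fields, bf16_unpack and
bf16_bits_to_float are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only; none of these names are in the patch.  */
struct bf16_fields
{
  unsigned sign;  /* 1 bit.  */
  unsigned exp;   /* 8 bits, bias 127 (same as binary32).  */
  unsigned frac;  /* 7 stored bits; the implicit leading 1 is not stored.  */
};

/* Split raw bfloat16 bits into sign, exponent and stored fraction.  */
static struct bf16_fields
bf16_unpack (uint16_t bits)
{
  struct bf16_fields f;
  f.sign = bits >> 15;
  f.exp = (bits >> 7) & 0xff;
  f.frac = bits & 0x7f;
  return f;
}

/* Widen bfloat16 bits to float.  Assumes float is IEEE binary32: the
   bfloat16 bits are the upper half of the binary32 encoding, so the
   widening is exact.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}
```

Because the exponent field has the same width and bias as binary32, the
widening needs only a shift. That is consistent with the BFmode -> SFmode
step being described as precise in the cover letter.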
--- libgcc/soft-fp/truncsfbf2.c.jj	2022-10-13 16:57:09.460768054 +0200
+++ libgcc/soft-fp/truncsfbf2.c	2022-10-13 16:57:09.460768054 +0200
@@ -0,0 +1,48 @@
+/* Software floating-point emulation.
+   Truncate IEEE single into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "single.h"
+
+BFtype
+__truncsfbf2 (SFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_S (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_S (A, a);
+  FP_TRUNC (B, S, 1, 1, R, A);
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
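
For comparison with __truncsfbf2 above, here is a rough sketch of what a
round-to-nearest-even SFmode -> BFmode truncation does at the bit level.
It is not the soft-fp implementation: it ignores the dynamic rounding mode
and raises no exceptions, whereas the function above goes through
FP_INIT_ROUNDMODE and FP_HANDLE_EXCEPTIONS. It assumes float is IEEE
binary32; the name float_to_bf16_bits_rne is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not part of the patch: convert a binary32 float
   to raw bfloat16 bits, rounding to nearest with ties to even.  No
   exception flags are raised and the rounding mode is fixed.  */
static uint16_t
float_to_bf16_bits_rne (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);

  /* NaN: keep the top bits and force the quiet bit so that the result
     cannot collapse to infinity when the low payload bits are dropped.  */
  if ((u & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((u >> 16) | 0x0040u);

  /* Add 0x7fff plus the lowest kept bit: values above the halfway point
     round up, exact ties round up only when the kept part is odd.  */
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16);
}
```

With this sketch, 1.0f maps to 0x3f80, the tie 1.00390625f (1 + 2^-8)
rounds down to the even 0x3f80, and the tie 1.01171875f (1 + 2^-7 + 2^-8)
rounds up to the even 0x3f82.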
--- libgcc/soft-fp/truncdfbf2.c.jj	2022-10-13 16:57:09.460768054 +0200
+++ libgcc/soft-fp/truncdfbf2.c	2022-10-13 16:57:09.460768054 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE double into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "double.h"
+
+BFtype
+__truncdfbf2 (DFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_D (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_D (A, a);
+#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
+  FP_TRUNC (B, D, 1, 2, R, A);
+#else
+  FP_TRUNC (B, D, 1, 1, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncxfbf2.c.jj	2022-10-13 16:57:09.460768054 +0200
+++ libgcc/soft-fp/truncxfbf2.c	2022-10-13 16:57:09.460768054 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE extended into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "extended.h"
+
+BFtype
+__truncxfbf2 (XFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_E (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_E (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (B, E, 1, 4, R, A);
+#else
+  FP_TRUNC (B, E, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/trunctfbf2.c.jj	2022-10-13 16:57:09.460768054 +0200
+++ libgcc/soft-fp/trunctfbf2.c	2022-10-13 16:57:09.460768054 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE quad into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "quad.h"
+
+BFtype
+__trunctfbf2 (TFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_Q (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_Q (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (B, Q, 1, 4, R, A);
+#else
+  FP_TRUNC (B, Q, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/trunchfbf2.c.jj	2022-10-13 16:57:09.460768054 +0200
+++ libgcc/soft-fp/trunchfbf2.c	2022-10-13 16:57:09.460768054 +0200
@@ -0,0 +1,58 @@
+/* Software floating-point emulation.
+   Truncate IEEE half into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "half.h"
+#include "single.h"
+
+/* BFtype and HFtype are unordered, neither is a superset or subset
+   of each other.  Convert HFtype to SFtype (lossless) and then
+   truncate to BFtype.  */
+
+BFtype
+__trunchfbf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_S (B);
+  FP_DECL_B (R);
+  SFtype b;
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_RAW_H (A, a);
+  FP_EXTEND (S, H, 1, 1, B, A);
+  FP_PACK_RAW_S (b, B);
+  FP_UNPACK_SEMIRAW_S (B, b);
+  FP_TRUNC (B, S, 1, 1, R, B);
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncbfhf2.c.jj	2022-10-13 16:57:09.460768054 +0200
+++ libgcc/soft-fp/truncbfhf2.c	2022-10-13 16:57:09.460768054 +0200
@@ -0,0 +1,75 @@
+/* Software floating-point emulation.
+   Truncate bfloat16 into IEEE half.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "brain.h"
+#include "single.h"
+
+/* BFtype and HFtype are unordered, neither is a superset or subset
+   of each other.  Convert BFtype to SFtype (lossless) and then
+   truncate to HFtype.  */
+
+HFtype
+__truncbfhf2 (BFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_S (B);
+  FP_DECL_B (R);
+  SFtype b;
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  /* Optimize BFtype to SFtype conversion to simple left shift
+     by 16 if possible, we don't need to raise exceptions on sNaN
+     here as the SFtype to HFtype truncation should do that too.  */
+  if (sizeof (BFtype) == 2
+      && sizeof (unsigned short) == 2
+      && sizeof (SFtype) == 4
+      && sizeof (unsigned int) == 4)
+    {
+      union { BFtype a; unsigned short b; } u1;
+      union { SFtype a; unsigned int b; } u2;
+      u1.a = a;
+      u2.b = (u1.b << 8) << 8;
+      b = u2.a;
+    }
+  else
+    {
+      FP_UNPACK_RAW_B (A, a);
+      FP_EXTEND (S, B, 1, 1, B, A);
+      FP_PACK_RAW_S (b, B);
+    }
+  FP_UNPACK_SEMIRAW_S (B, b);
+  FP_TRUNC (H, S, 1, 1, R, B);
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/extendbfsf2.c.jj	2022-10-13 16:57:09.460768054 +0200
+++ libgcc/soft-fp/extendbfsf2.c	2022-10-13 16:57:09.460768054 +0200
@@ -0,0 +1,49 @@
+/* Software floating-point emulation.
+   Return an bfloat16 converted to IEEE single
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "brain.h"
+#include "single.h"
+
+SFtype
+__extendbfsf2 (BFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_B (A);
+  FP_DECL_S (R);
+  SFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_B (A, a);
+  FP_EXTEND (S, B, 1, 1, R, A);
+  FP_PACK_RAW_S (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libiberty/cp-demangle.h.jj	2022-10-03 18:00:53.342731278 +0200
+++ libiberty/cp-demangle.h	2022-10-13 16:57:09.488767670 +0200
@@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
 extern const struct demangle_operator_info cplus_demangle_operators[];
 #endif
 
-#define D_BUILTIN_TYPE_COUNT (35)
+#define D_BUILTIN_TYPE_COUNT (36)
 
 CP_STATIC_IF_GLIBCPP_V3
 const struct demangle_builtin_type_info
--- libiberty/cp-demangle.c.jj	2022-10-11 14:50:14.605771753 +0200
+++ libiberty/cp-demangle.c	2022-10-13 16:57:09.538766983 +0200
@@ -2487,6 +2487,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
   /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
 	     D_PRINT_DEFAULT },
   /* 34 */ { NL ("_Float"),	NL ("_Float"),		D_PRINT_FLOAT },
+  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
 };
 
 CP_STATIC_IF_GLIBCPP_V3
@@ -2751,11 +2752,22 @@ cplus_demangle_type (struct d_info *di)
 
 	case 'F':
 	  /* DF<number>_ - _Float<number>.
-	     DF<number>x - _Float<number>x.  */
+	     DF<number>x - _Float<number>x
+	     DF16b - std::bfloat16_t.  */
 	  {
 	    int arg = d_number (di);
 	    char buf[12];
 	    char suffix = 0;
+	    if (d_peek_char (di) == 'b')
+	      {
+		if (arg != 16)
+		  return NULL;
+		d_advance (di, 1);
+		ret = d_make_builtin_type (di,
+					   &cplus_demangle_builtin_types[35]);
+		di->expansion += ret->u.s_builtin.type->len;
+		break;
+	      }
 	    if (d_peek_char (di) == 'x')
 	      suffix = 'x';
 	    if (!suffix && d_peek_char (di) != '_')
--- libiberty/testsuite/demangle-expected.jj	2022-10-11 14:50:14.618771575 +0200
+++ libiberty/testsuite/demangle-expected	2022-10-13 16:57:09.553766778 +0200
@@ -1249,6 +1249,10 @@ xxx
 _Z3xxxDF32xDF64xDF128xCDF32xVb
 xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
 xxx
+--format=auto --no-params
+_Z3xxxDF16b
+xxx(std::bfloat16_t)
+xxx
 # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
 --format=auto --no-params
 _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z


	Jakub



* Re: [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support
  2022-10-13 16:50             ` [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support Jakub Jelinek
@ 2022-10-13 19:37               ` Jason Merrill
  2022-10-13 21:11                 ` Uros Bizjak
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Merrill @ 2022-10-13 19:37 UTC (permalink / raw)
  To: Jakub Jelinek, Joseph S. Myers
  Cc: Richard Biener, Jeff Law, Uros Bizjak, gcc-patches

On 10/13/22 12:50, Jakub Jelinek wrote:
> Hi!
> 
> On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote:
>>> As I wrote earlier, I think we need at least one, __builtin_nans variant
>>> which would be used in libstdc++
>>> std::numeric_limits<std::bfloat16_t>::signaling_NaN() implementation.
>>> I think
>>> std::numeric_limits<std::bfloat16_t>::infinity() can be implemented as
>>> return (__bf16) __builtin_huge_valf ();
>>> and similarly
>>> std::numeric_limits<std::bfloat16_t>::quiet_NaN() as
>>> return (__bf16) __builtin_nanf ("");
>>> but
>>> return (__bf16) __builtin_nansf ("");
>>> would lose the signaling NaN on the conversion and raise an exception,
>>> and as the method is constexpr,
>>> union { unsigned short a; __bf16 b; } u = { 0x7f81 };
>>> return u.b;
>>> wouldn't work.  I can certainly restrict the builtins to the single
>>> one, but wonder whether the suffix for that builtin shouldn't be chosen
>>> such that eventually we could add more builtins if we need to
>>> and don't run into the log with bf16 suffix vs. logb with f16 suffix
>>> ambiguity.
>>> As you said, most of the libstdc++ overloads for std::bfloat16_t then
>>> can use float builtins or library calls under the hood, but std::nextafter
>>> is another case where I think we'll need to have something bfloat16_t
>>> specific, because float ulp isn't bfloat16_t ulp, the latter is much larger.
>>
>> Makes sense.
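
To make the std::nextafter point concrete, here is a rough sketch (not libstdc++ code) of why a float ulp step is the wrong step for bfloat16. The helper names are hypothetical, and it assumes float is IEEE single and that a bfloat16 value is simply the upper 16 bits of the corresponding single.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a bfloat16 bit pattern to float by placing
   it in the upper 16 bits of an IEEE single (assumed float format).  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t u = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Hypothetical helper: next representable bfloat16 above a finite,
   positive, non-maximal value.  For such inputs the next value up is
   the bit pattern plus one.  */
static uint16_t
bf16_next_up_positive (uint16_t bits)
{
  return (uint16_t) (bits + 1);
}
```

With 0x3f80 (1.0) as input, the next bfloat16 is 0x3f81, i.e. 1.0 + 2^-7, whereas the next float above 1.0f is 1.0f + FLT_EPSILON (2^-23), so stepping by a float ulp and truncating back would not move the bfloat16 value at all.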
> 
> So, this updated version of the patch adds just a single __builtin_nansf16b
> builtin (or do you want __builtin_nansbf16?).

16b sounds fine.
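
For reference, a small sketch of how the 0x7f81 pattern from the union example above classifies. The function names are made up for illustration, and it assumes the x86/ARM convention where the most significant mantissa bit set means quiet NaN.

```c
#include <stdint.h>

/* bfloat16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.  */

/* Hypothetical helper: nonzero if the bit pattern is any NaN
   (exponent all ones, mantissa nonzero).  */
static int
bf16_is_nan (uint16_t b)
{
  return (b & 0x7f80) == 0x7f80 && (b & 0x007f) != 0;
}

/* Hypothetical helper: nonzero if the bit pattern is a signaling NaN,
   assuming the top mantissa bit (0x0040) is the quiet bit.  */
static int
bf16_is_signaling (uint16_t b)
{
  return bf16_is_nan (b) && (b & 0x0040) == 0;
}
```

So 0x7f81 is an sNaN, 0x7fc0 is a qNaN and 0x7f80 is +Inf, which is why the value can't be produced by converting from a float sNaN without quieting it.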

>>> Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too
>>> in the next iteration (always with pedwarn in that case).
> 
> And implements bf16/BF16 suffixes for C too.
> 
>>> I'm afraid too many places rely on all modes of a certain class to be
>>> visible when walking from "narrowest" to "widest" mode, say
>>> FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
>>> etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
>>> && GET_MODE_WIDER_MODE (HFmode) == SFmode.
>>
>> Yes, it seems they need to change now that their assumptions have been
>> violated.  I suppose FOR_EACH_MODE_IN_CLASS would need to change to not use
>> get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to decide
>> whether they want an iteration that uses get_wider (likely with a new name)
>> or not.
> 
> And now that the GET_MODE_WIDER_MODE vs. GET_MODE_NEXT_MODE patch is in,
> the patch is updated on top of those changes.
> 
> So far lightly tested on x86_64-linux, ok for trunk if it passes full
> bootstrap/regtest on both x86_64-linux and i686-linux?

LGTM, but an i386 maintainer should review it as well.

> 2022-10-13  Jakub Jelinek  <jakub@redhat.com>
> 
> gcc/
> 	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
> 	* tree.h (bfloat16_type_node): Define.
> 	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
> 	like float16_type_mode.
> 	(build_common_tree_nodes): Initialize bfloat16_type_node if
> 	BFmode is supported.
> 	* expmed.h (maybe_expand_shift): Declare.
> 	* expmed.cc (maybe_expand_shift): No longer static.
> 	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
> 	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
> 	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
> 	-ffast-math generic implementation for BF -> SF and SF -> BF
> 	conversions.
> 	* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
> 	* builtins.def (BUILT_IN_NANSF16B): New builtin.
> 	* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
> 	* config/i386/i386.cc (classify_argument): Handle E_BCmode.
> 	(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
> 	for -msse2.
> 	(ix86_mangle_type): Mangle BFmode as DF16b.
> 	(ix86_invalid_conversion, ix86_invalid_unary_op,
> 	ix86_invalid_binary_op): Remove.
> 	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
> 	TARGET_INVALID_BINARY_OP): Don't redefine.
> 	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
> 	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
> 	ix86_bf16_type_node, only create it if still NULL.
> 	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
> 	* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
> gcc/c-family/
> 	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
> 	predefine __BFLT16_*__ macros and for C++23 also
> 	__STDCPP_BFLOAT16_T__.  Predefine bfloat16_type_node related
> 	macros for -fbuilding-libgcc.
> 	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
> gcc/c/
> 	* c-typeck.cc (convert_arguments): Don't promote __bf16 to
> 	double.
> gcc/cp/
> 	* cp-tree.h (extended_float_type_p): Return true for
> 	bfloat16_type_node.
> 	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
> 	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
> gcc/testsuite/
> 	* lib/target-supports.exp (check_effective_target_bfloat16,
> 	check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
> 	New.
> 	* gcc.dg/torture/bfloat16-basic.c: New test.
> 	* gcc.dg/torture/bfloat16-builtin.c: New test.
> 	* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
> 	* gcc.dg/torture/bfloat16-complex.c: New test.
> 	* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
> 	from bfloat16-builtin-issignaling-1.c.
> 	* gcc.dg/torture/floatn-basic.h: Allow to be includable from
> 	bfloat16-basic.c.
> 	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
> 	diagnostics.
> 	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
> 	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
> 	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
> libcpp/
> 	* include/cpplib.h (CPP_N_BFLOAT16): Define.
> 	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
> 	C++.
> libgcc/
> 	* config/i386/t-softfp (softfp_extensions): Add bfsf.
> 	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
> 	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
> 	CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
> 	-msse2.
> 	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
> 	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
> 	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
> 	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* soft-fp/brain.h: New file.
> 	* soft-fp/truncsfbf2.c: New file.
> 	* soft-fp/truncdfbf2.c: New file.
> 	* soft-fp/truncxfbf2.c: New file.
> 	* soft-fp/trunctfbf2.c: New file.
> 	* soft-fp/trunchfbf2.c: New file.
> 	* soft-fp/truncbfhf2.c: New file.
> 	* soft-fp/extendbfsf2.c: New file.
> libiberty/
> 	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
> 	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
> 	entry.
> 	(cplus_demangle_type): Demangle DF16b.
> 	* testsuite/demangle-expected (_Z3xxxDF16b): New test.
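
As a reading aid for the libiberty part, the DF<number>{_,x,b} dispatch described in the cplus_demangle_type entry can be sketched standalone roughly as below. This is not the demangler's code; classify_df and enum df_kind are hypothetical names, and the digit limit is an arbitrary guard added for the sketch.

```c
#include <ctype.h>

enum df_kind { DF_INVALID, DF_FLOATN, DF_FLOATNX, DF_BFLOAT16 };

/* Hypothetical helper: classify a mangled "DF<number><suffix>" type
   code and store <number> in *BITS.  The 'b' suffix is only accepted
   for 16, matching the DF16b mangling of std::bfloat16_t.  */
static enum df_kind
classify_df (const char *s, int *bits)
{
  int n = 0;

  if (s[0] != 'D' || s[1] != 'F')
    return DF_INVALID;
  s += 2;
  if (!isdigit ((unsigned char) *s))
    return DF_INVALID;
  while (isdigit ((unsigned char) *s))
    {
      n = n * 10 + (*s++ - '0');
      if (n > 1000)	/* Arbitrary guard against overflow.  */
	return DF_INVALID;
    }
  *bits = n;

  switch (*s)
    {
    case 'b':
      return n == 16 ? DF_BFLOAT16 : DF_INVALID;
    case 'x':
      return DF_FLOATNX;
    case '_':
      return DF_FLOATN;
    default:
      return DF_INVALID;
    }
}
```

With this, "DF16b" classifies as the bfloat16 case, "DF32x" and "DF64_" as the _FloatNx and _FloatN cases, and "DF32b" is rejected.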
> 
> --- gcc/tree-core.h.jj	2022-10-10 09:31:57.683981308 +0200
> +++ gcc/tree-core.h	2022-10-13 16:57:08.953775013 +0200
> @@ -665,6 +665,9 @@ enum tree_index {
>     TI_DOUBLE_TYPE,
>     TI_LONG_DOUBLE_TYPE,
>   
> +  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
> +  TI_BFLOAT16_TYPE,
> +
>     /* The _FloatN and _FloatNx types must be consecutive, and in the
>        same sequence as the corresponding complex types, which must also
>        be consecutive; _FloatN must come before _FloatNx; the order must
> --- gcc/tree.h.jj	2022-10-10 09:31:57.766980149 +0200
> +++ gcc/tree.h	2022-10-13 17:22:14.728207071 +0200
> @@ -4291,6 +4291,7 @@ tree_strip_any_location_wrapper (tree ex
>   #define float_type_node			global_trees[TI_FLOAT_TYPE]
>   #define double_type_node		global_trees[TI_DOUBLE_TYPE]
>   #define long_double_type_node		global_trees[TI_LONG_DOUBLE_TYPE]
> +#define bfloat16_type_node		global_trees[TI_BFLOAT16_TYPE]
>   
>   /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
>   #define FLOATN_TYPE_NODE(IDX)		global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
> --- gcc/tree.cc.jj	2022-10-10 09:31:57.743980470 +0200
> +++ gcc/tree.cc	2022-10-13 16:57:08.956774972 +0200
> @@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
>       = (flag_excess_precision == EXCESS_PRECISION_FAST
>          ? EXCESS_PRECISION_TYPE_FAST
>          : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
> -	  ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
> +	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
>   
>     enum flt_eval_method target_flt_eval_method
>       = targetm.c.excess_precision (requested_type);
> @@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
>     machine_mode float16_type_mode = (float16_type_node
>   				    ? TYPE_MODE (float16_type_node)
>   				    : VOIDmode);
> +  machine_mode bfloat16_type_mode = (bfloat16_type_node
> +				     ? TYPE_MODE (bfloat16_type_node)
> +				     : VOIDmode);
>     machine_mode float_type_mode = TYPE_MODE (float_type_node);
>     machine_mode double_type_mode = TYPE_MODE (double_type_node);
>   
> @@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
>   	switch (target_flt_eval_method)
>   	  {
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	    if (type_mode == float16_type_mode)
> +	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode)
>   	      return float_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode)
>   	      return double_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode
>   		|| type_mode == double_type_mode)
>   	      return long_double_type_node;
> @@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
>   	switch (target_flt_eval_method)
>   	  {
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	    if (type_mode == float16_type_mode)
> +	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode)
>   	      return complex_float_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode)
>   	      return complex_double_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode
>   		|| type_mode == double_type_mode)
>   	      return complex_long_double_type_node;
> @@ -9462,6 +9471,17 @@ build_common_tree_nodes (bool signed_cha
>         SET_TYPE_MODE (FLOATN_NX_TYPE_NODE (i), mode);
>       }
>     float128t_type_node = float128_type_node;
> +#ifdef HAVE_BFmode
> +  if (REAL_MODE_FORMAT (BFmode) == &arm_bfloat_half_format
> +      && targetm.scalar_mode_supported_p (BFmode)
> +      && targetm.libgcc_floating_mode_supported_p (BFmode))
> +    {
> +      bfloat16_type_node = make_node (REAL_TYPE);
> +      TYPE_PRECISION (bfloat16_type_node) = GET_MODE_PRECISION (BFmode);
> +      layout_type (bfloat16_type_node);
> +      SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +    }
> +#endif
>   
>     float_ptr_type_node = build_pointer_type (float_type_node);
>     double_ptr_type_node = build_pointer_type (double_type_node);
> --- gcc/expmed.h.jj	2022-10-03 18:00:53.046735271 +0200
> +++ gcc/expmed.h	2022-10-13 16:57:08.957774958 +0200
> @@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
>   				  rtx, tree, rtx, int);
>   extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
>   			 int);
> +extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
> +			       int);
>   #ifdef GCC_OPTABS_H
>   extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
>   			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
> --- gcc/expmed.cc.jj	2022-10-13 16:22:17.755496384 +0200
> +++ gcc/expmed.cc	2022-10-13 16:57:08.957774958 +0200
> @@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
>   
>   /* Likewise, but return 0 if that cannot be done.  */
>   
> -static rtx
> +rtx
>   maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>   		    int amount, rtx target, int unsignedp)
>   {
> --- gcc/expr.cc.jj	2022-10-06 17:43:47.941502119 +0200
> +++ gcc/expr.cc	2022-10-13 16:57:09.022774066 +0200
> @@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
>         gcc_assert ((GET_MODE_PRECISION (from_mode)
>   		   != GET_MODE_PRECISION (to_mode))
>   		  || (DECIMAL_FLOAT_MODE_P (from_mode)
> -		      != DECIMAL_FLOAT_MODE_P (to_mode)));
> +		      != DECIMAL_FLOAT_MODE_P (to_mode))
> +		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> +		      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> +		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +		      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
>   
>         if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
>   	/* Conversion between decimal float and binary float, same size.  */
> @@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
>   	  return;
>   	}
>   
> +#ifdef HAVE_SFmode
> +      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> +	  && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
> +	{
> +	  if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
> +	    {
> +	      /* To cut down on libgcc size, implement
> +		 BFmode -> {DF,XF,TF}mode conversions by
> +		 BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
> +	      rtx temp = gen_reg_rtx (SFmode);
> +	      convert_mode_scalar (temp, from, unsignedp);
> +	      convert_mode_scalar (to, temp, unsignedp);
> +	      return;
> +	    }
> +	  if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> +	    {
> +	      /* Similarly, implement BFmode -> HFmode as
> +		 BFmode -> SFmode -> HFmode conversion where SFmode
> +		 has superset of BFmode values.  We don't need
> +		 to handle sNaNs by raising exception and turning
> +		 into into qNaN though, as that can be done in the
> +		 SFmode -> HFmode conversion too.  */
> +	      rtx temp = gen_reg_rtx (SFmode);
> +	      int save_flag_finite_math_only = flag_finite_math_only;
> +	      flag_finite_math_only = true;
> +	      convert_mode_scalar (temp, from, unsignedp);
> +	      flag_finite_math_only = save_flag_finite_math_only;
> +	      convert_mode_scalar (to, temp, unsignedp);
> +	      return;
> +	    }
> +	  if (to_mode == SFmode
> +	      && !HONOR_NANS (from_mode)
> +	      && !HONOR_NANS (to_mode)
> +	      && optimize_insn_for_speed_p ())
> +	    {
> +	      /* If we don't expect sNaNs, for BFmode -> SFmode we can just
> +		 shift the bits up.  */
> +	      machine_mode fromi_mode, toi_mode;
> +	      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> +				     0).exists (&fromi_mode)
> +		  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> +					0).exists (&toi_mode))
> +		{
> +		  start_sequence ();
> +		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> +		  rtx tof = NULL_RTX;
> +		  if (fromi)
> +		    {
> +		      rtx toi = gen_reg_rtx (toi_mode);
> +		      convert_mode_scalar (toi, fromi, 1);
> +		      toi
> +			= maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
> +					      GET_MODE_PRECISION (to_mode)
> +					      - GET_MODE_PRECISION (from_mode),
> +					      NULL_RTX, 1);
> +		      if (toi)
> +			{
> +			  tof = lowpart_subreg (to_mode, toi, toi_mode);
> +			  if (tof)
> +			    emit_move_insn (to, tof);
> +			}
> +		    }
> +		  insns = get_insns ();
> +		  end_sequence ();
> +		  if (tof)
> +		    {
> +		      emit_insn (insns);
> +		      return;
> +		    }
> +		}
> +	    }
> +	}
> +      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
> +	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +	  && !HONOR_NANS (from_mode)
> +	  && !HONOR_NANS (to_mode)
> +	  && !flag_rounding_math
> +	  && optimize_insn_for_speed_p ())
> +	{
> +	  /* If we don't expect qNaNs nor sNaNs and can assume rounding
> +	     to nearest, we can expand the conversion inline as
> +	     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
> +	  machine_mode fromi_mode, toi_mode;
> +	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> +				 0).exists (&fromi_mode)
> +	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> +				    0).exists (&toi_mode))
> +	    {
> +	      start_sequence ();
> +	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> +	      rtx tof = NULL_RTX;
> +	      do
> +		{
> +		  if (!fromi)
> +		    break;
> +		  int shift = (GET_MODE_PRECISION (from_mode)
> +			       - GET_MODE_PRECISION (to_mode));
> +		  rtx temp1
> +		    = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
> +					  shift, NULL_RTX, 1);
> +		  if (!temp1)
> +		    break;
> +		  rtx temp2
> +		    = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
> +				    NULL_RTX, 1, OPTAB_DIRECT);
> +		  if (!temp2)
> +		    break;
> +		  rtx temp3
> +		    = expand_binop (fromi_mode, add_optab, fromi,
> +				    gen_int_mode ((HOST_WIDE_INT_1U
> +						   << (shift - 1)) - 1,
> +						  fromi_mode), NULL_RTX,
> +				    1, OPTAB_DIRECT);
> +		  if (!temp3)
> +		    break;
> +		  rtx temp4
> +		    = expand_binop (fromi_mode, add_optab, temp3, temp2,
> +				    NULL_RTX, 1, OPTAB_DIRECT);
> +		  if (!temp4)
> +		    break;
> +		  rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
> +						  temp4, shift, NULL_RTX, 1);
> +		  if (!temp5)
> +		    break;
> +		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
> +		  if (!temp6)
> +		    break;
> +		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
> +					toi_mode);
> +		  if (tof)
> +		    emit_move_insn (to, tof);
> +		}
> +	      while (0);
> +	      insns = get_insns ();
> +	      end_sequence ();
> +	      if (tof)
> +		{
> +		  emit_insn (insns);
> +		  return;
> +		}
> +	    }
> +	}
> +#endif
> +
>         /* Otherwise use a libcall.  */
>         libcall = convert_optab_libfunc (tab, to_mode, from_mode);
>   
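
For readers following the expr.cc hunk above, here is a minimal standalone sketch of the integer formula quoted in the comment, (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  It is not part of the patch: sf_to_bf_bits and bits_to_sf are hypothetical helper names, and like the inline expansion it assumes the input is not a NaN and that round-to-nearest-even is wanted.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as an IEEE single.  */
static float
bits_to_sf (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Hypothetical helper: narrow a float to bfloat16 bits.  Assumes X is not
   a NaN, mirroring the !HONOR_NANS guard in the quoted hunk.  */
static uint16_t
sf_to_bf_bits (float x)
{
  uint32_t fromi;
  memcpy (&fromi, &x, sizeof fromi);
  /* Adding 0x7fff plus the lowest kept bit rounds to nearest and sends
     exact ties to the even bfloat16 value.  */
  fromi += 0x7fffu + ((fromi >> 16) & 1u);
  return (uint16_t) (fromi >> 16);
}
```

With this, 0x3f808000 (a tie between 0x3f80 and 0x3f81) narrows to 0x3f80, while 0x3f818000 narrows to 0x3f82, i.e. ties go to the even result in both directions.
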
> --- gcc/builtin-types.def.jj	2022-10-03 18:00:52.658740505 +0200
> +++ gcc/builtin-types.def	2022-10-13 17:09:52.930317869 +0200
> @@ -82,6 +82,9 @@ DEF_PRIMITIVE_TYPE (BT_UNWINDWORD, (*lan
>   DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
>   DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
>   DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
> +DEF_PRIMITIVE_TYPE (BT_BFLOAT16, (bfloat16_type_node
> +				  ? bfloat16_type_node
> +				  : error_mark_node))
>   DEF_PRIMITIVE_TYPE (BT_FLOAT16, (float16_type_node
>   				 ? float16_type_node
>   				 : error_mark_node))
> @@ -264,6 +267,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_CONST_S
>   DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_CONST_STRING, BT_DOUBLE, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_CONST_STRING,
>   		     BT_LONGDOUBLE, BT_CONST_STRING)
> +DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_CONST_STRING, BT_BFLOAT16, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_CONST_STRING, BT_FLOAT16, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_CONST_STRING, BT_FLOAT32, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_CONST_STRING, BT_FLOAT64, BT_CONST_STRING)
> --- gcc/builtins.def.jj	2022-10-03 18:00:52.679740221 +0200
> +++ gcc/builtins.def	2022-10-13 17:09:05.633962625 +0200
> @@ -514,6 +514,7 @@ DEF_GCC_BUILTIN        (BUILT_IN_NANSF,
>   DEF_GCC_BUILTIN        (BUILT_IN_NANSL, "nansl", BT_FN_LONGDOUBLE_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
>   DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NANS, "nans", NAN_TYPE, ATTR_CONST_NOTHROW_NONNULL)
>   #undef NAN_TYPE
> +DEF_GCC_BUILTIN        (BUILT_IN_NANSF16B, "nansf16b", BT_FN_BFLOAT16_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
>   DEF_GCC_BUILTIN        (BUILT_IN_NANSD32, "nansd32", BT_FN_DFLOAT32_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
>   DEF_GCC_BUILTIN        (BUILT_IN_NANSD64, "nansd64", BT_FN_DFLOAT64_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
>   DEF_GCC_BUILTIN        (BUILT_IN_NANSD128, "nansd128", BT_FN_DFLOAT128_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> --- gcc/fold-const-call.cc.jj	2022-09-03 09:35:41.107989686 +0200
> +++ gcc/fold-const-call.cc	2022-10-13 17:20:59.579229947 +0200
> @@ -1301,6 +1301,7 @@ fold_const_call (combined_fn fn, tree ty
>   
>       CASE_CFN_NANS:
>       CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NANS):
> +    case CFN_BUILT_IN_NANSF16B:
>       case CFN_BUILT_IN_NANSD32:
>       case CFN_BUILT_IN_NANSD64:
>       case CFN_BUILT_IN_NANSD128:
> --- gcc/config/i386/i386.cc.jj	2022-10-03 18:00:52.942736674 +0200
> +++ gcc/config/i386/i386.cc	2022-10-13 16:57:09.092773105 +0200
> @@ -2423,6 +2423,7 @@ classify_argument (machine_mode mode, co
>         classes[1] = X86_64_SSEUP_CLASS;
>         return 2;
>       case E_HCmode:
> +    case E_BCmode:
>         classes[0] = X86_64_SSE_CLASS;
>         if (!(bit_offset % 64))
>   	return 1;
> @@ -22428,7 +22429,7 @@ ix86_libgcc_floating_mode_supported_p (s
>        be defined by the C front-end for AVX512FP16 intrinsics.  We will
>        issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
>        enabled.  */
> -  return ((mode == HFmode && TARGET_SSE2)
> +  return (((mode == HFmode || mode == BFmode) && TARGET_SSE2)
>   	  ? true
>   	  : default_libgcc_floating_mode_supported_p (mode));
>   }
> @@ -22731,7 +22732,7 @@ ix86_mangle_type (const_tree type)
>     switch (TYPE_MODE (type))
>       {
>       case E_BFmode:
> -      return "u6__bf16";
> +      return "DF16b";
>       case E_HFmode:
>         /* _Float16 is "DF16_".
>   	 Align with clang's decision in https://reviews.llvm.org/D33719. */
> @@ -22747,55 +22748,6 @@ ix86_mangle_type (const_tree type)
>       }
>   }
>   
> -/* Return the diagnostic message string if conversion from FROMTYPE to
> -   TOTYPE is not allowed, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_conversion (const_tree fromtype, const_tree totype)
> -{
> -  if (element_mode (fromtype) != element_mode (totype))
> -    {
> -      /* Do no allow conversions to/from BFmode scalar types.  */
> -      if (TYPE_MODE (fromtype) == BFmode)
> -	return N_("invalid conversion from type %<__bf16%>");
> -      if (TYPE_MODE (totype) == BFmode)
> -	return N_("invalid conversion to type %<__bf16%>");
> -    }
> -
> -  /* Conversion allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the unary operation OP is
> -   not permitted on TYPE, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_unary_op (int op, const_tree type)
> -{
> -  /* Reject all single-operand operations on BFmode except for &.  */
> -  if (element_mode (type) == BFmode && op != ADDR_EXPR)
> -    return N_("operation not permitted on type %<__bf16%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the binary operation OP is
> -   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
> -			   const_tree type2)
> -{
> -  /* Reject all 2-operand operations on BFmode.  */
> -  if (element_mode (type1) == BFmode
> -      || element_mode (type2) == BFmode)
> -    return N_("operation not permitted on type %<__bf16%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
>   static GTY(()) tree ix86_tls_stack_chk_guard_decl;
>   
>   static tree
> @@ -24853,15 +24805,6 @@ ix86_libgcc_floating_mode_supported_p
>   #undef TARGET_MANGLE_TYPE
>   #define TARGET_MANGLE_TYPE ix86_mangle_type
>   
> -#undef TARGET_INVALID_CONVERSION
> -#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
> -
> -#undef TARGET_INVALID_UNARY_OP
> -#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
> -
> -#undef TARGET_INVALID_BINARY_OP
> -#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
> -
>   #undef TARGET_STACK_PROTECT_GUARD
>   #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
>   
> --- gcc/config/i386/i386-builtins.cc.jj	2022-10-03 18:00:52.918736997 +0200
> +++ gcc/config/i386/i386-builtins.cc	2022-10-13 16:57:09.119772735 +0200
> @@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
>   static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
>   
>   tree ix86_float16_type_node = NULL_TREE;
> -tree ix86_bf16_type_node = NULL_TREE;
>   tree ix86_bf16_ptr_type_node = NULL_TREE;
>   
>   /* Retrieve an element from the above table, building some of
> @@ -1372,16 +1371,18 @@ ix86_register_float16_builtin_type (void
>   static void
>   ix86_register_bf16_builtin_type (void)
>   {
> -  ix86_bf16_type_node = make_node (REAL_TYPE);
> -  TYPE_PRECISION (ix86_bf16_type_node) = 16;
> -  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
> -  layout_type (ix86_bf16_type_node);
> +  if (bfloat16_type_node == NULL_TREE)
> +    {
> +      bfloat16_type_node = make_node (REAL_TYPE);
> +      TYPE_PRECISION (bfloat16_type_node) = 16;
> +      SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +      layout_type (bfloat16_type_node);
> +    }
>   
>     if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
>       {
> -      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
> -					    "__bf16");
> -      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
> +      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
> +      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
>       }
>   }
>   
> --- gcc/config/i386/i386-builtin-types.def.jj	2022-10-03 18:00:52.894737321 +0200
> +++ gcc/config/i386/i386-builtin-types.def	2022-10-13 16:57:09.139772460 +0200
> @@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
>   DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
>   DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
> -DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
> +DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
>   DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> --- gcc/config/i386/i386.md.jj	2022-10-11 15:57:05.005762022 +0200
> +++ gcc/config/i386/i386.md	2022-10-13 16:57:09.187771801 +0200
> @@ -1644,6 +1644,48 @@ (define_expand "cbranch<mode>4"
>     DONE;
>   })
>   
> +(define_expand "cbranchbf4"
> +  [(set (reg:CC FLAGS_REG)
> +	(compare:CC (match_operand:BF 1 "cmp_fp_expander_operand")
> +		    (match_operand:BF 2 "cmp_fp_expander_operand")))
> +   (set (pc) (if_then_else
> +	      (match_operator 0 "comparison_operator"
> +	       [(reg:CC FLAGS_REG)
> +		(const_int 0)])
> +	      (label_ref (match_operand 3))
> +	      (pc)))]
> +  ""
> +{
> +  rtx op1 = gen_lowpart (HImode, operands[1]);
> +  if (CONST_INT_P (op1))
> +    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[1], BFmode);
> +  else
> +    {
> +      rtx t1 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t1, op1));
> +      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> +      op1 = gen_lowpart (SFmode, t1);
> +    }
> +  rtx op2 = gen_lowpart (HImode, operands[2]);
> +  if (CONST_INT_P (op2))
> +    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[2], BFmode);
> +  else
> +    {
> +      rtx t2 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> +      op2 = gen_lowpart (SFmode, t2);
> +    }
> +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> +			   SFmode, NULL_RTX, NULL,
> +			   as_a <rtx_code_label *> (operands[3]),
> +			   /* Unfortunately this isn't propagated.  */
> +			   profile_probability::even ());
> +  DONE;
> +})
> +
>   (define_expand "cstorehf4"
>     [(set (reg:CC FLAGS_REG)
>   	(compare:CC (match_operand:HF 2 "cmp_fp_expander_operand")
> @@ -1659,6 +1701,45 @@ (define_expand "cstorehf4"
>     DONE;
>   })
>   
> +(define_expand "cstorebf4"
> +  [(set (reg:CC FLAGS_REG)
> +	(compare:CC (match_operand:BF 2 "cmp_fp_expander_operand")
> +		    (match_operand:BF 3 "cmp_fp_expander_operand")))
> +   (set (match_operand:QI 0 "register_operand")
> +	(match_operator 1 "comparison_operator"
> +	  [(reg:CC FLAGS_REG)
> +	   (const_int 0)]))]
> +  ""
> +{
> +  rtx op1 = gen_lowpart (HImode, operands[2]);
> +  if (CONST_INT_P (op1))
> +    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[2], BFmode);
> +  else
> +    {
> +      rtx t1 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t1, op1));
> +      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> +      op1 = gen_lowpart (SFmode, t1);
> +    }
> +  rtx op2 = gen_lowpart (HImode, operands[3]);
> +  if (CONST_INT_P (op2))
> +    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[3], BFmode);
> +  else
> +    {
> +      rtx t2 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> +      op2 = gen_lowpart (SFmode, t2);
> +    }
> +  rtx res = emit_store_flag_force (operands[0], GET_CODE (operands[1]),
> +				   op1, op2, SFmode, 0, 1);
> +  if (!rtx_equal_p (res, operands[0]))
> +    emit_move_insn (operands[0], res);
> +  DONE;
> +})
> +
>   (define_expand "cstore<mode>4"
>     [(set (reg:CC FLAGS_REG)
>   	(compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")
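
As an illustration of what the cbranchbf4 and cstorebf4 expanders above do for non-constant operands (zero-extend the 16 bits, shift left by 16, reinterpret as SFmode, compare in SFmode), here is a rough C equivalent.  The names bf_bits_to_sf and bf_bits_less are made up for this sketch and do not appear in the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen bfloat16 bits to float.  This is exact,
   since bfloat16 is the upper half of an IEEE single.  */
static float
bf_bits_to_sf (uint16_t b)
{
  uint32_t toi = (uint32_t) b << 16;	/* zero-extend, then shift left 16 */
  float f;
  memcpy (&f, &toi, sizeof f);
  return f;
}

/* Hypothetical helper: compare two bfloat16 values by widening both
   operands and comparing as floats.  */
static int
bf_bits_less (uint16_t a, uint16_t b)
{
  return bf_bits_to_sf (a) < bf_bits_to_sf (b);
}
```

Because the widening is exact, comparing in float gives the same answer a native bfloat16 comparison would, including for signed zeros.
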
> --- gcc/c-family/c-cppbuiltin.cc.jj	2022-10-13 08:41:04.718165419 +0200
> +++ gcc/c-family/c-cppbuiltin.cc	2022-10-13 17:51:07.722665421 +0200
> @@ -1260,6 +1260,13 @@ c_cpp_builtins (cpp_reader *pfile)
>         builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
>   				      csuffix, FLOATN_NX_TYPE_NODE (i));
>       }
> +  if (bfloat16_type_node)
> +    {
> +      if (c_dialect_cxx () && cxx_dialect > cxx20)
> +	cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
> +      builtin_define_float_constants ("BFLT16", "BF16", "%s",
> +				      "BF16", bfloat16_type_node);
> +    }
>   
>     /* For float.h.  */
>     if (targetm.decimal_float_supported_p ())
> @@ -1370,6 +1377,12 @@ c_cpp_builtins (cpp_reader *pfile)
>   	      suffix[0] = 'l';
>   	      memcpy (float_h_prefix, "LDBL", 5);
>   	    }
> +	  else if (bfloat16_type_node
> +		   && mode == TYPE_MODE (bfloat16_type_node))
> +	    {
> +	      memcpy (suffix, "bf16", 5);
> +	      memcpy (float_h_prefix, "BFLT16", 7);
> +	    }
>   	  else
>   	    {
>   	      bool found_suffix = false;
> @@ -1396,22 +1409,28 @@ c_cpp_builtins (cpp_reader *pfile)
>   	  machine_mode float16_type_mode = (float16_type_node
>   					    ? TYPE_MODE (float16_type_node)
>   					    : VOIDmode);
> +	  machine_mode bfloat16_type_mode = (bfloat16_type_node
> +					     ? TYPE_MODE (bfloat16_type_node)
> +					     : VOIDmode);
>   	  switch (targetm.c.excess_precision
>   		    (EXCESS_PRECISION_TYPE_IMPLICIT))
>   	    {
>   	    case FLT_EVAL_METHOD_UNPREDICTABLE:
>   	    case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	      excess_precision = (mode == float16_type_mode
> +				  || mode == bfloat16_type_mode
>   				  || mode == TYPE_MODE (float_type_node)
>   				  || mode == TYPE_MODE (double_type_node));
>   	      break;
>   
>   	    case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	      excess_precision = (mode == float16_type_mode
> +				  || mode == bfloat16_type_mode
>   				  || mode == TYPE_MODE (float_type_node));
>   	      break;
>   	    case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	      excess_precision = mode == float16_type_mode;
> +	      excess_precision = (mode == float16_type_mode
> +				  || mode == bfloat16_type_mode);
>   	      break;
>   	    case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16:
>   	      excess_precision = false;
> --- gcc/c-family/c-lex.cc.jj	2022-10-13 16:21:52.548842666 +0200
> +++ gcc/c-family/c-lex.cc	2022-10-13 16:59:51.778540099 +0200
> @@ -1000,6 +1000,22 @@ interpret_float (const cpp_token *token,
>   	  pedwarn (input_location, OPT_Wpedantic,
>   		   "non-standard suffix on floating constant");
>         }
> +    else if ((flags & CPP_N_BFLOAT16) != 0)
> +      {
> +	type = bfloat16_type_node;
> +	if (type == NULL_TREE)
> +	  {
> +	    error ("unsupported non-standard suffix on floating constant");
> +	    return error_mark_node;
> +	  }
> +	if (!c_dialect_cxx ())
> +	  pedwarn (input_location, OPT_Wpedantic,
> +		   "non-standard suffix on floating constant");
> +	else if (cxx_dialect < cxx23)
> +	  pedwarn (input_location, OPT_Wpedantic,
> +		   "%<bf16%> or %<BF16%> suffix on floating constant only "
> +		   "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
> +      }
>       else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
>         type = long_double_type_node;
>       else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
> --- gcc/c/c-typeck.cc.jj	2022-10-06 17:43:47.900502672 +0200
> +++ gcc/c/c-typeck.cc	2022-10-13 16:57:09.226771266 +0200
> @@ -3678,6 +3678,9 @@ convert_arguments (location_t loc, vec<l
>   		promote_float_arg = false;
>   		break;
>   	      }
> +	  /* Don't promote __bf16 either.  */
> +	  if (TYPE_MAIN_VARIANT (valtype) == bfloat16_type_node)
> +	    promote_float_arg = false;
>   	}
>   
>         if (type != NULL_TREE)
> --- gcc/cp/cp-tree.h.jj	2022-10-13 16:21:52.600841952 +0200
> +++ gcc/cp/cp-tree.h	2022-10-13 16:57:09.241771060 +0200
> @@ -8741,6 +8741,8 @@ extended_float_type_p (tree type)
>     for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
>       if (type == FLOATN_TYPE_NODE (i))
>         return true;
> +  if (type == bfloat16_type_node)
> +    return true;
>     return false;
>   }
>   
> --- gcc/cp/typeck.cc.jj	2022-10-13 16:21:52.642841375 +0200
> +++ gcc/cp/typeck.cc	2022-10-13 16:57:09.269770676 +0200
> @@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
>         if (mv2 == FLOATN_NX_TYPE_NODE (i))
>   	extended2 = i + 1;
>       }
> +  if (mv1 == bfloat16_type_node)
> +    extended1 = true;
> +  if (mv2 == bfloat16_type_node)
> +    extended2 = true;
>     if (extended2 && !extended1)
>       {
>         int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
> @@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
>     if (cnt > 1 && mv2 == long_double_type_node)
>       return -2;
>     /* Otherwise, they have equal rank, but extended types
> -     (other than std::bfloat16_t) have higher subrank.  */
> +     (other than std::bfloat16_t) have higher subrank.
> +     std::bfloat16_t shouldn't have equal rank to any standard
> +     floating point type.  */
>     return 1;
>   }
>   
> --- gcc/testsuite/lib/target-supports.exp.jj	2022-10-11 14:50:14.472773574 +0200
> +++ gcc/testsuite/lib/target-supports.exp	2022-10-13 16:57:09.270770662 +0200
> @@ -3416,6 +3416,22 @@ proc check_effective_target_base_quadflo
>       return 1
>   }
>   
> +# Return 1 if the target supports the __bf16 type, 0 otherwise.
> +
> +proc check_effective_target_bfloat16 {} {
> +    return [check_no_compiler_messages_nocache bfloat16 object {
> +	__bf16 foo (__bf16 x) { return x + x; }
> +    } [add_options_for_bfloat16 ""]]
> +}
> +
> +proc check_effective_target_bfloat16_runtime {} {
> +    return [check_effective_target_bfloat16]
> +}
> +
> +proc add_options_for_bfloat16 { flags } {
> +    return "$flags"
> +}
> +
>   # Return 1 if the target supports all four forms of fused multiply-add
>   # (fma, fms, fnma, and fnms) for both float and double.
>   
> --- gcc/testsuite/gcc.dg/torture/bfloat16-basic.c.jj	2022-10-13 16:57:09.271770648 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-basic.c	2022-10-13 17:32:28.531884882 +0200
> @@ -0,0 +1,11 @@
> +/* Test __bf16.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +#define TYPE __bf16
> +#define CST(C) CONCAT (C, bf16)
> +#define CSTU(C) CONCAT (C, BF16)
> +
> +#include "floatn-basic.h"
> --- gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c.jj	2022-10-13 16:57:09.271770648 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c	2022-10-13 18:09:24.288913634 +0200
> @@ -0,0 +1,47 @@
> +/* Test __bf16 built-in functions.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-add-options ieee } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +extern void exit (int);
> +extern void abort (void);
> +
> +extern __bf16 test_type;
> +extern __typeof (__builtin_nansf16b ("")) test_type;
> +
> +volatile __bf16 inf_cst = (__bf16) __builtin_inff ();
> +volatile __bf16 huge_val_cst = (__bf16) __builtin_huge_valf ();
> +volatile __bf16 nan_cst = (__bf16) __builtin_nanf ("");
> +volatile __bf16 nans_cst = __builtin_nansf16b ("");
> +volatile __bf16 neg0 = -0.0bf16, neg1 = -1.0bf16, one = 1.0;
> +
> +int
> +main (void)
> +{
> +  volatile __bf16 r;
> +  if (!__builtin_isinf (inf_cst))
> +    abort ();
> +  if (!__builtin_isinf (huge_val_cst))
> +    abort ();
> +  if (inf_cst != huge_val_cst)
> +    abort ();
> +  if (!__builtin_isnan (nan_cst))
> +    abort ();
> +  if (!__builtin_isnan (nans_cst))
> +    abort ();
> +  r = __builtin_fabsf (neg1);
> +  if (r != 1.0bf16)
> +    abort ();
> +  r = __builtin_copysignf (one, neg0);
> +  if (r != neg1)
> +    abort ();
> +  r = __builtin_copysignf (inf_cst, neg1);
> +  if (r != -huge_val_cst)
> +    abort ();
> +  r = __builtin_copysignf (-inf_cst, one);
> +  if (r != huge_val_cst)
> +    abort ();
> +  exit (0);
> +}
> --- gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c.jj	2022-10-13 16:57:09.271770648 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c	2022-10-13 17:40:15.067555349 +0200
> @@ -0,0 +1,21 @@
> +/* Test __bf16 __builtin_issignaling.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-add-options ieee } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +/* { dg-additional-options "-fsignaling-nans" } */
> +/* Workaround for PR57484 on ia32: */
> +/* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
> +
> +#define CONCATX(X, Y) X ## Y
> +#define CONCAT(X, Y) CONCATX (X, Y)
> +
> +#define TYPE __bf16
> +#define CST(C) CONCAT (C, bf16)
> +#define FN(F) CONCAT (F, f16b)
> +#define NAN(x) ((__bf16) __builtin_nanf (x))
> +#define INF ((__bf16) __builtin_inff ())
> +#define EXT 0
> +
> +#include "builtin-issignaling-1.c"
> --- gcc/testsuite/gcc.dg/torture/bfloat16-complex.c.jj	2022-10-13 16:57:09.271770648 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-complex.c	2022-10-13 17:46:43.259267724 +0200
> @@ -0,0 +1,61 @@
> +/* Test __bf16 complex arithmetic.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +extern void exit (int);
> +extern void abort (void);
> +
> +volatile __bf16 a = 1.0bf16;
> +typedef _Complex float __cbf16 __attribute__((__mode__(__BC__)));
> +volatile __cbf16 b = __builtin_complex (2.0bf16, 3.0bf16);
> +volatile __cbf16 c = __builtin_complex (2.0bf16, 3.0bf16);
> +volatile __cbf16 d = __builtin_complex (2.0bf16, 3.0bf16);
> +
> +__cbf16
> +fn (__cbf16 arg)
> +{
> +  return arg / 4;
> +}
> +
> +int
> +main (void)
> +{
> +  volatile __cbf16 r;
> +  if (b != c)
> +    abort ();
> +  if (b != d)
> +    abort ();
> +  r = a + b;
> +  if (__real__ r != 3.0bf16 || __imag__ r != 3.0bf16)
> +    abort ();
> +  r += d;
> +  if (__real__ r != 5.0bf16 || __imag__ r != 6.0bf16)
> +    abort ();
> +  r -= a;
> +  if (__real__ r != 4.0bf16 || __imag__ r != 6.0bf16)
> +    abort ();
> +  r /= (a + a);
> +  if (__real__ r != 2.0bf16 || __imag__ r != 3.0bf16)
> +    abort ();
> +  r *= (a + a);
> +  if (__real__ r != 4.0bf16 || __imag__ r != 6.0bf16)
> +    abort ();
> +  r -= b;
> +  if (__real__ r != 2.0bf16 || __imag__ r != 3.0bf16)
> +    abort ();
> +  r *= r;
> +  if (__real__ r != -5.0bf16 || __imag__ r != 12.0bf16)
> +    abort ();
> +  /* Division may not be exact, so round result before comparing.  */
> +  r /= b;
> +  r += __builtin_complex (100.0bf16, 100.0bf16);
> +  r -= __builtin_complex (100.0bf16, 100.0bf16);
> +  if (r != b)
> +    abort ();
> +  r = fn (r);
> +  if (__real__ r != 0.5bf16 || __imag__ r != 0.75bf16)
> +    abort ();
> +  exit (0);
> +}
> --- gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c.jj	2022-10-03 18:00:53.118734300 +0200
> +++ gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c	2022-10-13 17:39:19.387313780 +0200
> @@ -4,7 +4,7 @@
>   /* Workaround for PR57484 on ia32: */
>   /* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
>   
> -#ifndef EXT
> +#if !defined(EXT) && !defined(TYPE)
>   int
>   f1 (void)
>   {
> @@ -41,31 +41,42 @@ f6 (long double x)
>     return __builtin_issignaling (x);
>   }
>   #else
> -#define CONCATX(X, Y) X ## Y
> -#define CONCAT(X, Y) CONCATX (X, Y)
> -#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> -#define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> -
> -#if EXT
> -# define TYPE CONCAT3 (_Float, WIDTH, x)
> -# define CST(C) CONCAT4 (C, f, WIDTH, x)
> -# define FN(F) CONCAT4 (F, f, WIDTH, x)
> -#else
> -# define TYPE CONCAT (_Float, WIDTH)
> -# define CST(C) CONCAT3 (C, f, WIDTH)
> -# define FN(F) CONCAT3 (F, f, WIDTH)
> +#ifndef TYPE
> +# define CONCATX(X, Y) X ## Y
> +# define CONCAT(X, Y) CONCATX (X, Y)
> +# define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> +# define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> +
> +# if EXT
> +#  define TYPE CONCAT3 (_Float, WIDTH, x)
> +#  define CST(C) CONCAT4 (C, f, WIDTH, x)
> +#  define FN(F) CONCAT4 (F, f, WIDTH, x)
> +# else
> +#  define TYPE CONCAT (_Float, WIDTH)
> +#  define CST(C) CONCAT3 (C, f, WIDTH)
> +#  define FN(F) CONCAT3 (F, f, WIDTH)
> +# endif
> +#endif
> +#ifndef NANS
> +# define NANS(x) FN (__builtin_nans) (x)
> +#endif
> +#ifndef NAN
> +# define NAN(x) FN (__builtin_nan) (x)
> +#endif
> +#ifndef INF
> +# define INF FN (__builtin_inf) ()
>   #endif
>   
>   int
>   f1 (void)
>   {
> -  return __builtin_issignaling (FN (__builtin_nans) (""));
> +  return __builtin_issignaling (NANS (""));
>   }
>   
>   int
>   f2 (void)
>   {
> -  return __builtin_issignaling (FN (__builtin_nan) (""));
> +  return __builtin_issignaling (NAN (""));
>   }
>   
>   int
> @@ -118,10 +129,10 @@ main ()
>     if (!f6 (z))
>       __builtin_abort ();
>   #else
> -  if (f4 (w) || !f4 (FN (__builtin_nans) ("0x123")) || f4 (CST (42.0)) || f4 (FN (__builtin_nan) ("0x234"))
> -      || f4 (FN (__builtin_inf) ()) || f4 (-FN (__builtin_inf) ()) || f4 (CST (-42.0)) || f4 (CST (-0.0)) || f4 (CST (0.0)))
> +  if (f4 (w) || !f4 (NANS ("0x123")) || f4 (CST (42.0)) || f4 (NAN ("0x234"))
> +      || f4 (INF) || f4 (-INF) || f4 (CST (-42.0)) || f4 (CST (-0.0)) || f4 (CST (0.0)))
>       __builtin_abort ();
> -  w = FN (__builtin_nans) ("");
> +  w = NANS ("");
>     asm volatile ("" : : : "memory");
>     if (!f4 (w))
>       __builtin_abort ();
> --- gcc/testsuite/gcc.dg/torture/floatn-basic.h.jj	2022-10-03 18:00:53.118734300 +0200
> +++ gcc/testsuite/gcc.dg/torture/floatn-basic.h	2022-10-13 16:57:09.285770456 +0200
> @@ -9,14 +9,16 @@
>   #define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
>   #define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
>   
> -#if EXT
> -# define TYPE CONCAT3 (_Float, WIDTH, x)
> -# define CST(C) CONCAT4 (C, f, WIDTH, x)
> -# define CSTU(C) CONCAT4 (C, F, WIDTH, x)
> -#else
> -# define TYPE CONCAT (_Float, WIDTH)
> -# define CST(C) CONCAT3 (C, f, WIDTH)
> -# define CSTU(C) CONCAT3 (C, F, WIDTH)
> +#ifndef TYPE
> +# if EXT
> +#  define TYPE CONCAT3 (_Float, WIDTH, x)
> +#  define CST(C) CONCAT4 (C, f, WIDTH, x)
> +#  define CSTU(C) CONCAT4 (C, F, WIDTH, x)
> +# else
> +#  define TYPE CONCAT (_Float, WIDTH)
> +#  define CST(C) CONCAT3 (C, f, WIDTH)
> +#  define CSTU(C) CONCAT3 (C, F, WIDTH)
> +# endif
>   #endif
>   
>   extern void exit (int);
> --- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c.jj	2022-10-03 18:00:53.137734043 +0200
> +++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c	2022-10-13 16:57:09.306770168 +0200
> @@ -45,19 +45,19 @@ __m256bf16 footest (__m256bf16 vector0)
>     __m256bf16 vector2_1 = {};
>     __m256bf16 vector2_2 = { glob_bfloat };
>     __m256bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
> -  __m256bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error "invalid conversion to type '__bf16'" } */
> -
> -  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256i initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __v16hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m256bf16 vector2_4 = { 0 };
> +  __m256bf16 vector2_5 = { 0.1 };
> +  __m256bf16 vector2_6 = { is_a_float16 };
> +  __m256bf16 vector2_7 = { is_a_float };
> +  __m256bf16 vector2_8 = { is_an_int };
> +  __m256bf16 vector2_9 = { is_a_short_int };
> +  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
> +
> +  __v8si initi_2_1 = { glob_bfloat };
> +  __m256 initi_2_2 = { glob_bfloat };
> +  __m256h initi_2_3 = { glob_bfloat };
> +  __m256i initi_2_5 = { glob_bfloat };
> +  __v16hi initi_2_6 = { glob_bfloat };
>   
>     /* Assignments to/from vectors.  */
>   
> @@ -79,25 +79,25 @@ __m256bf16 footest (__m256bf16 vector0)
>     /* Assignments to/from elements.  */
>   
>     vector2_3[0] = glob_bfloat;
> -  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_an_int;
> +  vector2_3[0] = is_a_short_int;
> +  vector2_3[0] = is_a_float;
> +  vector2_3[0] = is_a_float16;
> +  vector2_3[0] = 0;
> +  vector2_3[0] = 0.1;
>   
>     glob_bfloat = vector2_3[0];
> -  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_an_int = vector2_3[0];
> +  is_a_short_int = vector2_3[0];
> +  is_a_float = vector2_3[0];
> +  is_a_float16 = vector2_3[0];
>   
>     /* Compound literals.  */
>   
>     (__m256bf16) {};
>   
> -  (__m256bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__m256bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (__m256bf16) { 0 };
> +  (__m256bf16) { 0.1 };
>     (__m256bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
>     (__m256bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
>     (__m256bf16) { is_a_long_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256i'} } */
> @@ -176,16 +176,16 @@ __m256bf16 footest (__m256bf16 vector0)
>     bfloat_ptr = &bfloat_ptr3[1];
>   
>     /* Simple comparison.  */
> -  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > glob_bfloat_vec;
> +  glob_bfloat_vec == vector0;
> +  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
> +  vector0 > 0;
> +  0 == vector0;
> +  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> +  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> +  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
>   
>     /* Pointer comparison.  */
>   
> @@ -224,24 +224,24 @@ __m256bf16 footest (__m256bf16 vector0)
>   
>     /* Unary operators.  */
>   
> -  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  +vector0;
> +  -vector0;
> +  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
> +  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
>     *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
> -  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __real vector0; /* { dg-error {wrong type argument to __real} } */
> +  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
> +  ++vector0;
> +  --vector0;
> +  vector0++;
> +  vector0--;
>   
>     /* Binary arithmetic operations.  */
>   
> -  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + *bfloat_ptr;
> +  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> +  vector0 = glob_bfloat_vec + 0;
> +  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
>   
>     return vector0;
>   }
> --- gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c.jj	2022-10-03 18:00:53.136734057 +0200
> +++ gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c	2022-10-13 16:57:09.327769880 +0200
> @@ -12,8 +12,8 @@ double is_a_double;
>   
>   float *float_ptr;
>   
> -__bf16 foo1 (void) { return (__bf16) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
> -__bf16 foo2 (void) { return (__bf16) (short) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
> +__bf16 foo1 (void) { return (__bf16) 0x1234; }
> +__bf16 foo2 (void) { return (__bf16) (short) 0x1234; }
>   
>   __bf16 footest (__bf16 scalar0)
>   {
> @@ -22,87 +22,87 @@ __bf16 footest (__bf16 scalar0)
>   
>     __bf16 scalar1_1;
>     __bf16 scalar1_2 = glob_bfloat;
> -  __bf16 scalar1_3 = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_4 = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_5 = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_6 = is_an_int;  /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_7 = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_8 = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_9 = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  int initi_1_1 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  float initi_1_2 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  _Float16 initi_1_3 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  short initi_1_4 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  double initi_1_5 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __bf16 scalar1_3 = 0;
> +  __bf16 scalar1_4 = 0.1;
> +  __bf16 scalar1_5 = is_a_float;
> +  __bf16 scalar1_6 = is_an_int;
> +  __bf16 scalar1_7 = is_a_float16;
> +  __bf16 scalar1_8 = is_a_double;
> +  __bf16 scalar1_9 = is_a_short_int;
> +
> +  int initi_1_1 = glob_bfloat;
> +  float initi_1_2 = glob_bfloat;
> +  _Float16 initi_1_3 = glob_bfloat;
> +  short initi_1_4 = glob_bfloat;
> +  double initi_1_5 = glob_bfloat;
>   
>     __bf16 scalar2_1 = {};
>     __bf16 scalar2_2 = { glob_bfloat };
> -  __bf16 scalar2_3 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_4 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_5 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_6 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_7 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_8 = { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  int initi_2_1 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  float initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  _Float16 initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  short initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  double initi_2_5 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __bf16 scalar2_3 = { 0 };
> +  __bf16 scalar2_4 = { 0.1 };
> +  __bf16 scalar2_5 = { is_a_float };
> +  __bf16 scalar2_6 = { is_an_int };
> +  __bf16 scalar2_7 = { is_a_float16 };
> +  __bf16 scalar2_8 = { is_a_double };
> +  __bf16 scalar2_9 = { is_a_short_int };
> +
> +  int initi_2_1 = { glob_bfloat };
> +  float initi_2_2 = { glob_bfloat };
> +  _Float16 initi_2_3 = { glob_bfloat };
> +  short initi_2_4 = { glob_bfloat };
> +  double initi_2_5 = { glob_bfloat };
>   
>     /* Assignments.  */
>   
>     glob_bfloat = glob_bfloat;
> -  glob_bfloat = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  is_an_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float16 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_double = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_short_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  glob_bfloat = 0;
> +  glob_bfloat = 0.1;
> +  glob_bfloat = is_a_float;
> +  glob_bfloat = is_an_int;
> +  glob_bfloat = is_a_float16;
> +  glob_bfloat = is_a_double;
> +  glob_bfloat = is_a_short_int;
> +
> +  is_an_int = glob_bfloat;
> +  is_a_float = glob_bfloat;
> +  is_a_float16 = glob_bfloat;
> +  is_a_double = glob_bfloat;
> +  is_a_short_int = glob_bfloat;
>   
>     /* Casting.  */
>   
>     (void) glob_bfloat;
>     (__bf16) glob_bfloat;
>   
> -  (int) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (float) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (_Float16) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (double) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (short) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -
> -  (__bf16) is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (int) glob_bfloat;
> +  (float) glob_bfloat;
> +  (_Float16) glob_bfloat;
> +  (double) glob_bfloat;
> +  (short) glob_bfloat;
> +
> +  (__bf16) is_an_int;
> +  (__bf16) is_a_float;
> +  (__bf16) is_a_float16;
> +  (__bf16) is_a_double;
> +  (__bf16) is_a_short_int;
>   
>     /* Compound literals.  */
>   
>     (__bf16) {};
>     (__bf16) { glob_bfloat };
> -  (__bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  (int) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (float) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (_Float16) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (double) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (short) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  (__bf16) { 0 };
> +  (__bf16) { 0.1 };
> +  (__bf16) { is_a_float };
> +  (__bf16) { is_an_int };
> +  (__bf16) { is_a_float16 };
> +  (__bf16) { is_a_double };
> +  (__bf16) { is_a_short_int };
> +
> +  (int) { glob_bfloat };
> +  (float) { glob_bfloat };
> +  (_Float16) { glob_bfloat };
> +  (double) { glob_bfloat };
> +  (short) { glob_bfloat };
>   
>     /* Arrays and Structs.  */
>   
> @@ -145,16 +145,16 @@ __bf16 footest (__bf16 scalar0)
>     bfloat_ptr = &bfloat_ptr3[1];
>   
>     /* Simple comparison.  */
> -  scalar0 > glob_bfloat; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  glob_bfloat == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_a_float == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0.1 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > is_an_int; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_an_int == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  scalar0 > glob_bfloat;
> +  glob_bfloat == scalar0;
> +  scalar0 > is_a_float;
> +  is_a_float == scalar0;
> +  scalar0 > 0;
> +  0 == scalar0;
> +  scalar0 > 0.1;
> +  0.1 == scalar0;
> +  scalar0 > is_an_int;
> +  is_an_int == scalar0;
>   
>     /* Pointer comparison.  */
>   
> @@ -174,41 +174,41 @@ __bf16 footest (__bf16 scalar0)
>     /* Conditional expressions.  */
>   
>     0 ? scalar0 : scalar0;
> -  0 ? scalar0 : is_a_float; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  0 ? is_a_float : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  0 ? scalar0 : 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  0 ? 0 : scalar0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  0 ? 0.1 : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  0 ? scalar0 : 0.1; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  0 ? scalar0 : is_a_float;
> +  0 ? is_a_float : scalar0;
> +  0 ? scalar0 : 0;
> +  0 ? 0 : scalar0;
> +  0 ? 0.1 : scalar0;
> +  0 ? scalar0 : 0.1;
>     0 ? bfloat_ptr : bfloat_ptr2;
>     0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
>     0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
>   
> -  scalar0 ? scalar0 : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 ? is_a_float : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 ? scalar0 : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 ? is_a_float : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  scalar0 ? scalar0 : scalar0;
> +  scalar0 ? is_a_float : scalar0;
> +  scalar0 ? scalar0 : is_a_float;
> +  scalar0 ? is_a_float : is_a_float;
>   
>     /* Unary operators.  */
>   
> -  +scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  -scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ~scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  !scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  +scalar0;
> +  -scalar0;
> +  ~scalar0; /* { dg-error {wrong type argument to bit-complement} } */
> +  !scalar0;
>     *scalar0; /* { dg-error {invalid type argument of unary '\*'} } */
> -  __real scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  __imag scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ++scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  --scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __real scalar0;
> +  __imag scalar0;
> +  ++scalar0;
> +  --scalar0;
> +  scalar0++;
> +  scalar0--;
>   
>     /* Binary arithmetic operations.  */
>   
> -  scalar0 = glob_bfloat + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 = glob_bfloat + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 = glob_bfloat + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 = glob_bfloat + is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  scalar0 = glob_bfloat + *bfloat_ptr;
> +  scalar0 = glob_bfloat + 0.1;
> +  scalar0 = glob_bfloat + 0;
> +  scalar0 = glob_bfloat + is_a_float;
>   
>     return scalar0;
>   }
> --- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c.jj	2022-10-03 18:00:53.136734057 +0200
> +++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c	2022-10-13 16:57:09.344769646 +0200
> @@ -48,20 +48,20 @@ __m128bf16 footest (__m128bf16 vector0)
>     __m128bf16 vector2_1 = {};
>     __m128bf16 vector2_2 = { glob_bfloat };
>     __m128bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
> -  __m128bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m128h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m128 initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __v4si initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __v4hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m128bf16 vector2_4 = { 0 };
> +  __m128bf16 vector2_5 = { 0.1 };
> +  __m128bf16 vector2_6 = { is_a_float16 };
> +  __m128bf16 vector2_7 = { is_a_float };
> +  __m128bf16 vector2_8 = { is_an_int };
> +  __m128bf16 vector2_9 = { is_a_short_int };
> +  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
> +
> +  __v8si initi_2_1 = { glob_bfloat };
> +  __m256 initi_2_2 = { glob_bfloat };
> +  __m128h initi_2_3 = { glob_bfloat };
> +  __m128 initi_2_4 = { glob_bfloat };
> +  __v4si initi_2_5 = { glob_bfloat };
> +  __v4hi initi_2_6 = { glob_bfloat };
>   
>     /* Assignments to/from vectors.  */
>   
> @@ -85,25 +85,25 @@ __m128bf16 footest (__m128bf16 vector0)
>     /* Assignments to/from elements.  */
>   
>     vector2_3[0] = glob_bfloat;
> -  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_an_int;
> +  vector2_3[0] = is_a_short_int;
> +  vector2_3[0] = is_a_float;
> +  vector2_3[0] = is_a_float16;
> +  vector2_3[0] = 0;
> +  vector2_3[0] = 0.1;
>   
>     glob_bfloat = vector2_3[0];
> -  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_an_int = vector2_3[0];
> +  is_a_short_int = vector2_3[0];
> +  is_a_float = vector2_3[0];
> +  is_a_float16 = vector2_3[0];
>   
>     /* Compound literals.  */
>   
>     (__m128bf16) {};
>   
> -  (__m128bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__m128bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (__m128bf16) { 0 };
> +  (__m128bf16) { 0.1 };
>     (__m128bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
>     (__m128bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
>     (__m128bf16) { is_a_float_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128'} } */
> @@ -186,16 +186,16 @@ __m128bf16 footest (__m128bf16 vector0)
>     bfloat_ptr = &bfloat_ptr3[1];
>   
>     /* Simple comparison.  */
> -  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > glob_bfloat_vec;
> +  glob_bfloat_vec == vector0;
> +  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
> +  vector0 > 0;
> +  0 == vector0;
> +  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> +  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> +  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
>   
>     /* Pointer comparison.  */
>   
> @@ -234,24 +234,24 @@ __m128bf16 footest (__m128bf16 vector0)
>   
>     /* Unary operators.  */
>   
> -  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  +vector0;
> +  -vector0;
> +  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
> +  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
>     *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
> -  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __real vector0; /* { dg-error {wrong type argument to __real} } */
> +  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
> +  ++vector0;
> +  --vector0;
> +  vector0++;
> +  vector0--;
>   
>     /* Binary arithmetic operations.  */
>   
> -  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + *bfloat_ptr;
> +  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> +  vector0 = glob_bfloat_vec + 0;
> +  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
>   
>     return vector0;
>   }
> --- gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C.jj	2022-10-03 18:00:53.109734421 +0200
> +++ gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C	2022-10-13 16:57:09.362769399 +0200
> @@ -5,6 +5,6 @@ void foo (void)
>   {
>     __bf16 (); /* { dg-bogus {invalid conversion to type '__bf16'} } */
>     __bf16 a = __bf16(); /* { dg-bogus {invalid conversion to type '__bf16'} } */
> -  __bf16 (0x1234); /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 (0.1); /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __bf16 (0x1234); /* { dg-bogus {invalid conversion to type '__bf16'} } */
> +  __bf16 (0.1); /* { dg-bogus {invalid conversion to type '__bf16'} } */
>   }
> --- libcpp/include/cpplib.h.jj	2022-10-03 18:00:53.251732506 +0200
> +++ libcpp/include/cpplib.h	2022-10-13 16:57:09.384769097 +0200
> @@ -1275,6 +1275,7 @@ struct cpp_num
>   #define CPP_N_USERDEF	0x1000000 /* C++11 user-defined literal.  */
>   
>   #define CPP_N_SIZE_T	0x2000000 /* C++23 size_t literal.  */
> +#define CPP_N_BFLOAT16	0x4000000 /* std::bfloat16_t type.  */
>   
>   #define CPP_N_WIDTH_FLOATN_NX	0xF0000000 /* _FloatN / _FloatNx value
>   					      of N, divided by 16.  */
> --- libcpp/expr.cc.jj	2022-10-03 18:00:53.221732910 +0200
> +++ libcpp/expr.cc	2022-10-13 16:58:01.360055690 +0200
> @@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
>     size_t orig_len = len;
>     const uchar *orig_s = s;
>     size_t flags;
> -  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
> +  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
>   
>     flags = 0;
> -  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
> +  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
>   
>     /* The following decimal float suffixes, from TR 24732:2009, TS
>        18661-2:2015 and C2X, are supported:
> @@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
>        w, W - machine-specific type such as __float80 (GNU extension).
>        q, Q - machine-specific type such as __float128 (GNU extension).
>        fN, FN - _FloatN (TS 18661-3:2015).
> -     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
> +     fNx, FNx - _FloatNx (TS 18661-3:2015).
> +     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
>   
>     /* Process decimal float suffixes, which are two letters starting
>        with d or D.  Order and case are significant.  */
> @@ -239,6 +240,19 @@ interpret_float_suffix (cpp_reader *pfil
>   		fn++;
>   	    }
>   	  break;
> +	case 'b': case 'B':
> +	  if (len > 2
> +	      /* Except for bf16 / BF16 where case is significant.  */
> +	      && s[1] == (s[0] == 'b' ? 'f' : 'F')
> +	      && s[2] == '1'
> +	      && s[3] == '6')
> +	    {
> +	      bf16++;
> +	      len -= 3;
> +	      s += 3;
> +	      break;
> +	    }
> +	  return 0;
>   	case 'd': case 'D': d++; break;
>   	case 'l': case 'L': l++; break;
>   	case 'w': case 'W': w++; break;
> @@ -257,7 +271,7 @@ interpret_float_suffix (cpp_reader *pfil
>        of N larger than can be represented in the return value.  The
>        caller is responsible for rejecting _FloatN suffixes where
>        _FloatN is not supported on the chosen target.  */
> -  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
> +  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
>       return 0;
>     if (fn_bits > CPP_FLOATN_MAX)
>       return 0;
> @@ -295,6 +309,7 @@ interpret_float_suffix (cpp_reader *pfil
>   	     q ? CPP_N_MD_Q :
>   	     fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
>   	     fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
> +	     bf16 ? CPP_N_BFLOAT16 :
>   	     CPP_N_DEFAULT));
>   }
>   
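
For readers following the libcpp/expr.cc hunk above, here is a minimal standalone sketch of the case rule it applies to the new suffix. The helper name is_bf16_suffix is hypothetical and not part of libcpp. It only looks at a bare four-character suffix and ignores the imaginary i/j suffixes and the counters that interpret_float_suffix also tracks.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libcpp: return 1 if the LEN bytes
   at S are exactly "bf16" or "BF16", otherwise 0.  Mixed case such as
   "Bf16" or "bF16" is rejected, matching the case-significant check
   in the hunk above.  */
static int
is_bf16_suffix (const unsigned char *s, size_t len)
{
  if (len != 4)
    return 0;
  if (s[0] != 'b' && s[0] != 'B')
    return 0;
  /* The second letter must have the same case as the first.  */
  if (s[1] != (s[0] == 'b' ? 'f' : 'F'))
    return 0;
  return s[2] == '1' && s[3] == '6';
}
```

So a literal such as 1.0bf16 or 1.0BF16 would be accepted by this rule, while 1.0Bf16 would not.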
> --- libgcc/config/i386/t-softfp.jj	2022-10-03 18:00:53.314731656 +0200
> +++ libgcc/config/i386/t-softfp	2022-10-13 16:57:09.426768521 +0200
> @@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
>   libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
>   LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
>   
> -softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> -softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
> +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
> +		      tfbf xfbf dfbf sfbf hfbf
>   
>   softfp_extras += eqhf2
>   
> @@ -15,11 +16,17 @@ CFLAGS-extendhfsf2.c += -msse2
>   CFLAGS-extendhfdf2.c += -msse2
>   CFLAGS-extendhftf2.c += -msse2
>   CFLAGS-extendhfxf2.c += -msse2
> +CFLAGS-extendbfsf2.c += -msse2
>   
>   CFLAGS-truncsfhf2.c += -msse2
>   CFLAGS-truncdfhf2.c += -msse2
>   CFLAGS-truncxfhf2.c += -msse2
>   CFLAGS-trunctfhf2.c += -msse2
> +CFLAGS-truncsfbf2.c += -msse2
> +CFLAGS-truncdfbf2.c += -msse2
> +CFLAGS-truncxfbf2.c += -msse2
> +CFLAGS-trunctfbf2.c += -msse2
> +CFLAGS-trunchfbf2.c += -msse2
>   
>   CFLAGS-eqhf2.c += -msse2
>   CFLAGS-_divhc3.c += -msse2
> --- libgcc/config/i386/libgcc-glibc.ver.jj	2022-10-03 18:00:53.313731670 +0200
> +++ libgcc/config/i386/libgcc-glibc.ver	2022-10-13 16:57:09.438768356 +0200
> @@ -214,3 +214,13 @@ GCC_12.0.0 {
>     __trunctfhf2
>     __truncxfhf2
>   }
> +
> +%inherit GCC_13.0.0 GCC_12.0.0
> +GCC_13.0.0 {
> +  __extendbfsf2
> +  __truncdfbf2
> +  __truncsfbf2
> +  __trunctfbf2
> +  __truncxfbf2
> +  __trunchfbf2
> +}
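
As a rough illustration of what the newly exported __extendbfsf2 and __truncsfbf2 compute, the sketch below works on raw bit patterns. It is not the libgcc code, which is built from the soft-fp macros and also honours the rounding mode and raises exceptions. The function names are hypothetical, and the sketch assumes float is IEEE binary32 and hardcodes round-to-nearest-even.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a bfloat16 bit pattern to float.
   bfloat16 is the upper 16 bits of binary32, so this is exact.  */
static float
bf16_bits_to_float (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Hypothetical helper: narrow a float to a bfloat16 bit pattern,
   rounding to nearest with ties to even.  No exceptions are raised
   and the dynamic rounding mode is ignored.  */
static uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  /* NaN: keep the top bits and set the quiet bit so that the result
     is still a NaN after the payload is cut off.  */
  if ((u & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((u >> 16) | 0x0040u);
  /* Add 0x7fff plus the lowest kept bit, so exact ties round to even.  */
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16);
}
```

Widening never loses information, which is why a single BFmode -> SFmode step followed by a wider extension is precise. Narrowing is where rounding happens, hence the separate direct truncations from each wider mode to avoid double rounding.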
> --- libgcc/config/i386/sfp-machine.h.jj	2022-10-03 18:00:53.313731670 +0200
> +++ libgcc/config/i386/sfp-machine.h	2022-10-13 16:57:09.441768315 +0200
> @@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
>   #define _FP_QNANNEGATEDP 0
>   
>   #define _FP_NANSIGN_H		1
> +#define _FP_NANSIGN_B		1
>   #define _FP_NANSIGN_S		1
>   #define _FP_NANSIGN_D		1
>   #define _FP_NANSIGN_E		1
> --- libgcc/config/i386/64/sfp-machine.h.jj	2022-10-03 18:00:53.290731980 +0200
> +++ libgcc/config/i386/64/sfp-machine.h	2022-10-13 16:57:09.451768178 +0200
> @@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
>   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D
>   #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
> --- libgcc/config/i386/32/sfp-machine.h.jj	2022-10-03 18:00:53.290731980 +0200
> +++ libgcc/config/i386/32/sfp-machine.h	2022-10-13 16:57:09.459768068 +0200
> @@ -87,6 +87,7 @@
>   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
>   /* Even if XFmode is 12byte,  we have to pad it to
> --- libgcc/soft-fp/brain.h.jj	2022-10-13 16:57:09.460768054 +0200
> +++ libgcc/soft-fp/brain.h	2022-10-13 16:57:09.459768068 +0200
> @@ -0,0 +1,172 @@
> +/* Software floating-point emulation.
> +   Definitions for Brain Floating Point format (bfloat16).
> +   Copyright (C) 1997-2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef SOFT_FP_BRAIN_H
> +#define SOFT_FP_BRAIN_H	1
> +
> +#if _FP_W_TYPE_SIZE < 32
> +# error "Here's a nickel kid.  Go buy yourself a real computer."
> +#endif
> +
> +#define _FP_FRACTBITS_B		(_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACTBITS_DW_B	(_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACBITS_B		8
> +#define _FP_FRACXBITS_B		(_FP_FRACTBITS_B - _FP_FRACBITS_B)
> +#define _FP_WFRACBITS_B		(_FP_WORKBITS + _FP_FRACBITS_B)
> +#define _FP_WFRACXBITS_B	(_FP_FRACTBITS_B - _FP_WFRACBITS_B)
> +#define _FP_EXPBITS_B		8
> +#define _FP_EXPBIAS_B		127
> +#define _FP_EXPMAX_B		255
> +
> +#define _FP_QNANBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
> +#define _FP_QNANBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
> +#define _FP_IMPLBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
> +#define _FP_IMPLBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
> +#define _FP_OVERFLOW_B		((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
> +
> +#define _FP_WFRACBITS_DW_B	(2 * _FP_WFRACBITS_B)
> +#define _FP_WFRACXBITS_DW_B	(_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
> +#define _FP_HIGHBIT_DW_B	\
> +  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
> +
> +/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
> +   chosen by the target machine.  */
> +
> +typedef float BFtype __attribute__ ((mode (BF)));
> +
> +union _FP_UNION_B
> +{
> +  BFtype flt;
> +  struct _FP_STRUCT_LAYOUT
> +  {
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +    unsigned sign : 1;
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +#else
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned sign : 1;
> +#endif
> +  } bits;
> +};
> +
> +#define FP_DECL_B(X)		_FP_DECL (1, X)
> +#define FP_UNPACK_RAW_B(X, val)	_FP_UNPACK_RAW_1 (B, X, (val))
> +#define FP_UNPACK_RAW_BP(X, val)	_FP_UNPACK_RAW_1_P (B, X, (val))
> +#define FP_PACK_RAW_B(val, X)	_FP_PACK_RAW_1 (B, (val), X)
> +#define FP_PACK_RAW_BP(val, X)			\
> +  do						\
> +    {						\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_B(X, val)			\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1 (B, X, (val));		\
> +      _FP_UNPACK_CANONICAL (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_BP(X, val)			\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1_P (B, X, (val));		\
> +      _FP_UNPACK_CANONICAL (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_B(X, val)		\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1 (B, X, (val));		\
> +      _FP_UNPACK_SEMIRAW (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_BP(X, val)		\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1_P (B, X, (val));		\
> +      _FP_UNPACK_SEMIRAW (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_B(val, X)			\
> +  do						\
> +    {						\
> +      _FP_PACK_CANONICAL (B, 1, X);		\
> +      _FP_PACK_RAW_1 (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_BP(val, X)			\
> +  do						\
> +    {						\
> +      _FP_PACK_CANONICAL (B, 1, X);		\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_B(val, X)		\
> +  do						\
> +    {						\
> +      _FP_PACK_SEMIRAW (B, 1, X);		\
> +      _FP_PACK_RAW_1 (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_BP(val, X)		\
> +  do						\
> +    {						\
> +      _FP_PACK_SEMIRAW (B, 1, X);		\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_TO_INT_B(r, X, rsz, rsg)	_FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
> +#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)	\
> +  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
> +#define FP_FROM_INT_B(X, r, rs, rt)	_FP_FROM_INT (B, 1, X, (r), (rs), rt)
> +
> +/* BFmode arithmetic is not implemented.  */
> +
> +#define _FP_FRAC_HIGH_B(X)	_FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_RAW_B(X)	_FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_DW_B(X)	_FP_FRAC_HIGH_1 (X)
> +
> +#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
> +
> +#endif /* !SOFT_FP_BRAIN_H */
> --- libgcc/soft-fp/truncsfbf2.c.jj	2022-10-13 16:57:09.460768054 +0200
> +++ libgcc/soft-fp/truncsfbf2.c	2022-10-13 16:57:09.460768054 +0200
> @@ -0,0 +1,48 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE single into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +BFtype
> +__truncsfbf2 (SFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_S (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_S (A, a);
> +  FP_TRUNC (B, S, 1, 1, R, A);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncdfbf2.c.jj	2022-10-13 16:57:09.460768054 +0200
> +++ libgcc/soft-fp/truncdfbf2.c	2022-10-13 16:57:09.460768054 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE double into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "double.h"
> +
> +BFtype
> +__truncdfbf2 (DFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_D (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_D (A, a);
> +#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
> +  FP_TRUNC (B, D, 1, 2, R, A);
> +#else
> +  FP_TRUNC (B, D, 1, 1, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncxfbf2.c.jj	2022-10-13 16:57:09.460768054 +0200
> +++ libgcc/soft-fp/truncxfbf2.c	2022-10-13 16:57:09.460768054 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE extended into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "extended.h"
> +
> +BFtype
> +__truncxfbf2 (XFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_E (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_E (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, E, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, E, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunctfbf2.c.jj	2022-10-13 16:57:09.460768054 +0200
> +++ libgcc/soft-fp/trunctfbf2.c	2022-10-13 16:57:09.460768054 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE quad into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "quad.h"
> +
> +BFtype
> +__trunctfbf2 (TFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_Q (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_Q (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, Q, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, Q, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunchfbf2.c.jj	2022-10-13 16:57:09.460768054 +0200
> +++ libgcc/soft-fp/trunchfbf2.c	2022-10-13 16:57:09.460768054 +0200
> @@ -0,0 +1,58 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE half into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "half.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert HFtype to SFtype (lossless) and then
> +   truncate to BFtype.  */
> +
> +BFtype
> +__trunchfbf2 (HFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_RAW_H (A, a);
> +  FP_EXTEND (S, H, 1, 1, B, A);
> +  FP_PACK_RAW_S (b, B);
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (B, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncbfhf2.c.jj	2022-10-13 16:57:09.460768054 +0200
> +++ libgcc/soft-fp/truncbfhf2.c	2022-10-13 16:57:09.460768054 +0200
> @@ -0,0 +1,75 @@
> +/* Software floating-point emulation.
> +   Truncate bfloat16 into IEEE half.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "half.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert BFtype to SFtype (lossless) and then
> +   truncate to HFtype.  */
> +
> +HFtype
> +__truncbfhf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  HFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  /* Optimize BFtype to SFtype conversion to simple left shift
> +     by 16 if possible, we don't need to raise exceptions on sNaN
> +     here as the SFtype to HFtype truncation should do that too.  */
> +  if (sizeof (BFtype) == 2
> +      && sizeof (unsigned short) == 2
> +      && sizeof (SFtype) == 4
> +      && sizeof (unsigned int) == 4)
> +    {
> +      union { BFtype a; unsigned short b; } u1;
> +      union { SFtype a; unsigned int b; } u2;
> +      u1.a = a;
> +      u2.b = (u1.b << 8) << 8;
> +      b = u2.a;
> +    }
> +  else
> +    {
> +      FP_UNPACK_RAW_B (A, a);
> +      FP_EXTEND (S, B, 1, 1, B, A);
> +      FP_PACK_RAW_S (b, B);
> +    }
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (H, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_H (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/extendbfsf2.c.jj	2022-10-13 16:57:09.460768054 +0200
> +++ libgcc/soft-fp/extendbfsf2.c	2022-10-13 16:57:09.460768054 +0200
> @@ -0,0 +1,49 @@
> +/* Software floating-point emulation.
> +   Return a bfloat16 converted to IEEE single.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define FP_NO_EXACT_UNDERFLOW
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +SFtype
> +__extendbfsf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_B (A);
> +  FP_DECL_S (R);
> +  SFtype r;
> +
> +  FP_INIT_EXCEPTIONS;
> +  FP_UNPACK_RAW_B (A, a);
> +  FP_EXTEND (S, B, 1, 1, R, A);
> +  FP_PACK_RAW_S (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libiberty/cp-demangle.h.jj	2022-10-03 18:00:53.342731278 +0200
> +++ libiberty/cp-demangle.h	2022-10-13 16:57:09.488767670 +0200
> @@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
>   extern const struct demangle_operator_info cplus_demangle_operators[];
>   #endif
>   
> -#define D_BUILTIN_TYPE_COUNT (35)
> +#define D_BUILTIN_TYPE_COUNT (36)
>   
>   CP_STATIC_IF_GLIBCPP_V3
>   const struct demangle_builtin_type_info
> --- libiberty/cp-demangle.c.jj	2022-10-11 14:50:14.605771753 +0200
> +++ libiberty/cp-demangle.c	2022-10-13 16:57:09.538766983 +0200
> @@ -2487,6 +2487,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
>     /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
>   	     D_PRINT_DEFAULT },
>     /* 34 */ { NL ("_Float"),	NL ("_Float"),		D_PRINT_FLOAT },
> +  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
>   };
>   
>   CP_STATIC_IF_GLIBCPP_V3
> @@ -2751,11 +2752,22 @@ cplus_demangle_type (struct d_info *di)
>   
>   	case 'F':
>   	  /* DF<number>_ - _Float<number>.
> -	     DF<number>x - _Float<number>x.  */
> +	     DF<number>x - _Float<number>x
> +	     DF16b - std::bfloat16_t.  */
>   	  {
>   	    int arg = d_number (di);
>   	    char buf[12];
>   	    char suffix = 0;
> +	    if (d_peek_char (di) == 'b')
> +	      {
> +		if (arg != 16)
> +		  return NULL;
> +		d_advance (di, 1);
> +		ret = d_make_builtin_type (di,
> +					   &cplus_demangle_builtin_types[35]);
> +		di->expansion += ret->u.s_builtin.type->len;
> +		break;
> +	      }
>   	    if (d_peek_char (di) == 'x')
>   	      suffix = 'x';
>   	    if (!suffix && d_peek_char (di) != '_')
> --- libiberty/testsuite/demangle-expected.jj	2022-10-11 14:50:14.618771575 +0200
> +++ libiberty/testsuite/demangle-expected	2022-10-13 16:57:09.553766778 +0200
> @@ -1249,6 +1249,10 @@ xxx
>   _Z3xxxDF32xDF64xDF128xCDF32xVb
>   xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
>   xxx
> +--format=auto --no-params
> +_Z3xxxDF16b
> +xxx(std::bfloat16_t)
> +xxx
>   # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
>   --format=auto --no-params
>   _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z
> 
> 
> 	Jakub
> 
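
For readers following the truncbfhf2.c shortcut quoted above: the BF -> SF
widening is exact because bfloat16 is the upper 16 bits of an IEEE single.
Below is a minimal C sketch of that widening, and of a round-to-nearest-even
narrowing, working on raw bit patterns.  The function names are hypothetical
and not part of the patch.  Unlike the soft-fp routines, the sketch ignores
the dynamic rounding mode and raises no exceptions, which is roughly what a
-ffast-math style expansion is allowed to assume.

```c
#include <stdint.h>
#include <string.h>

/* Widen bfloat16 bits to float.  bfloat16 is the upper half of an IEEE
   single, so an exact 16-bit left shift is enough.  (Hypothetical name.)  */
float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Narrow float to bfloat16 bits with round-to-nearest-even.
   Assumption: no exceptions are raised and the current rounding mode
   is ignored; NaNs are simply quieted.  (Hypothetical name.)  */
uint16_t
float_to_bf16_bits (float f)
{
  uint32_t wide;
  memcpy (&wide, &f, sizeof wide);
  if ((wide & 0x7fffffffu) > 0x7f800000u)
    /* NaN: keep sign and top payload bits, force the quiet bit.  */
    return (uint16_t) ((wide >> 16) | 0x0040u);
  /* Add 0x7fff plus the lowest kept bit so that ties round to even.  */
  wide += 0x7fffu + ((wide >> 16) & 1u);
  return (uint16_t) (wide >> 16);
}
```

A value such as 1 + 2^-8 lies exactly halfway between two bfloat16 values
and rounds down to 0x3f80 (even), while 1 + 3 * 2^-8 rounds up to 0x3f82.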



* Re: [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support
  2022-10-13 19:37               ` Jason Merrill
@ 2022-10-13 21:11                 ` Uros Bizjak
  2022-10-13 21:35                   ` Jakub Jelinek
  0 siblings, 1 reply; 22+ messages in thread
From: Uros Bizjak @ 2022-10-13 21:11 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Jakub Jelinek, Joseph S. Myers, Richard Biener, Jeff Law, gcc-patches

On Thu, Oct 13, 2022 at 9:38 PM Jason Merrill <jason@redhat.com> wrote:
>
> On 10/13/22 12:50, Jakub Jelinek wrote:
> > Hi!
> >
> > On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote:
> >>> As I wrote earlier, I think we need at least one, __builtin_nans variant
> >>> which would be used in libstdc++
> >>> std::numeric_limits<std::bfloat16_t>::signaling_NaN() implementation.
> >>> I think
> >>> std::numeric_limits<std::bfloat16_t>::infinity() can be implemented as
> >>> return (__bf16) __builtin_huge_valf ();
> >>> and similarly
> >>> std::numeric_limits<std::bfloat16_t>::quiet_NaN() as
> >>> return (__bf16) __builtin_nanf ("");
> >>> but
> >>> return (__bf16) __builtin_nansf ("");
> >>> would lose the signaling NaN on the conversion and raise an exception,
> >>> and as the method is constexpr,
> >>> union { unsigned short a; __bf16 b; } u = { 0x7f81 };
> >>> return u.b;
> >>> wouldn't work.  I can certainly restrict the builtins to the single
> >>> one, but wonder whether the suffix for that builtin shouldn't be chosen
> >>> such that eventually we could add more builtins if we need to
> >>> and don't run into the log with bf16 suffix vs. logb with f16 suffix
> >>> ambiguity.
> >>> As you said, most of the libstdc++ overloads for std::bfloat16_t then
> >>> can use float builtins or library calls under the hood, but std::nextafter
> >>> is another case where I think we'll need to have something bfloat16_t
> >>> specific, because float ulp isn't bfloat16_t ulp, the latter is much larger.
> >>
> >> Makes sense.
> >
> > So, this updated version of the patch adds just a single __builtin_nansf16b
> > builtin (or do you want __builtin_nansbf16?).
>
> 16b sounds fine.
>
> >>> Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too
> >>> in the next iteration (always with pedwarn in that case).
> >
> > And implements bf16/BF16 suffixes for C too.
> >
> >>> I'm afraid too many places rely on all modes of a certain class to be
> >>> visible when walking from "narrowest" to "widest" mode, say
> >>> FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
> >>> etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
> >>> && GET_MODE_WIDER_MODE (HFmode) == SFmode.
> >>
> >> Yes, it seems they need to change now that their assumptions have been
> >> violated.  I suppose FOR_EACH_MODE_IN_CLASS would need to change to not use
> >> get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to decide
> >> whether they want an iteration that uses get_wider (likely with a new name)
> >> or not.
> >
> > And now that the GET_MODE_WIDER_MODE vs. GET_MODE_NEXT_MODE patch is in,
> > is updated on top of those changes.
> >
> > So far lightly tested on x86_64-linux, ok for trunk if it passes full
> > bootstrap/regtest on both x86_64-linux and i686-linux?
>
> LGTM, but an i386 maintainer should review it as well.

OK with two changes to cbranch and cstore expanders, as explained inline.

Thanks,
Uros.

> > 2022-10-13  Jakub Jelinek  <jakub@redhat.com>
> >
> > gcc/
> >       * tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
> >       * tree.h (bfloat16_type_node): Define.
> >       * tree.cc (excess_precision_type): Promote bfloat16_type_mode
> >       like float16_type_mode.
> >       (build_common_tree_nodes): Initialize bfloat16_type_node if
> >       BFmode is supported.
> >       * expmed.h (maybe_expand_shift): Declare.
> >       * expmed.cc (maybe_expand_shift): No longer static.
> >       * expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
> >       conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
> >       conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
> >       -ffast-math generic implementation for BF -> SF and SF -> BF
> >       conversions.
> >       * builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
> >       * builtins.def (BUILT_IN_NANSF16B): New builtin.
> >       * fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
> >       * config/i386/i386.cc (classify_argument): Handle E_BCmode.
> >       (ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
> >       for -msse2.
> >       (ix86_mangle_type): Mangle BFmode as DF16b.
> >       (ix86_invalid_conversion, ix86_invalid_unary_op,
> >       ix86_invalid_binary_op): Remove.
> >       (TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
> >       TARGET_INVALID_BINARY_OP): Don't redefine.
> >       * config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
> >       (ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
> >       ix86_bf16_type_node, only create it if still NULL.
> >       * config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
> >       * config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
> > gcc/c-family/
> >       * c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
> >       predefine __BFLT16_*__ macros and for C++23 also
> >       __STDCPP_BFLOAT16_T__.  Predefine bfloat16_type_node related
> >       macros for -fbuilding-libgcc.
> >       * c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
> > gcc/c/
> >       * c-typeck.cc (convert_arguments): Don't promote __bf16 to
> >       double.
> > gcc/cp/
> >       * cp-tree.h (extended_float_type_p): Return true for
> >       bfloat16_type_node.
> >       * typeck.cc (cp_compare_floating_point_conversion_ranks): Set
> >       extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
> > gcc/testsuite/
> >       * lib/target-supports.exp (check_effective_target_bfloat16,
> >       check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
> >       New.
> >       * gcc.dg/torture/bfloat16-basic.c: New test.
> >       * gcc.dg/torture/bfloat16-builtin.c: New test.
> >       * gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
> >       * gcc.dg/torture/bfloat16-complex.c: New test.
> >       * gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
> >       from bfloat16-builtin-issignaling-1.c.
> >       * gcc.dg/torture/floatn-basic.h: Allow to be includable from
> >       bfloat16-basic.c.
> >       * gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
> >       diagnostics.
> >       * gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
> >       * gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
> >       * g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
> > libcpp/
> >       * include/cpplib.h (CPP_N_BFLOAT16): Define.
> >       * expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
> >       C++.
> > libgcc/
> >       * config/i386/t-softfp (softfp_extensions): Add bfsf.
> >       (softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
> >       (CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
> >       CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
> >       -msse2.
> >       * config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
> >       __extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
> >       * config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
> >       * config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
> >       * config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
> >       * soft-fp/brain.h: New file.
> >       * soft-fp/truncsfbf2.c: New file.
> >       * soft-fp/truncdfbf2.c: New file.
> >       * soft-fp/truncxfbf2.c: New file.
> >       * soft-fp/trunctfbf2.c: New file.
> >       * soft-fp/trunchfbf2.c: New file.
> >       * soft-fp/truncbfhf2.c: New file.
> >       * soft-fp/extendbfsf2.c: New file.
> > libiberty/
> >       * cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
> >       * cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
> >       entry.
> >       (cplus_demangle_type): Demangle DF16b.
> >       * testsuite/demangle-expected (_Z3xxxDF16b): New test.
> >
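
The libiberty entries above describe demangling of DF16b.  As a rough
illustration of the grammar only (DF<number>_ for _Float<number>,
DF<number>x for _Float<number>x, DF16b for std::bfloat16_t), here is a
standalone C sketch.  It is not the cp-demangle.c code; the helper name and
its interface are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Decode the part of a mangled type that follows "DF" into OUT.
   Returns 0 on success and -1 if the suffix is not recognised.
   (Hypothetical helper, not the libiberty API.)  */
int
decode_df_suffix (const char *s, char *out, size_t outlen)
{
  char *end;
  long n;

  /* A decimal width must come first.  */
  if (*s < '0' || *s > '9')
    return -1;
  n = strtol (s, &end, 10);
  if (n <= 0)
    return -1;

  switch (*end)
    {
    case 'b':
      /* Only a width of 16 is valid with the 'b' suffix.  */
      if (n != 16)
	return -1;
      snprintf (out, outlen, "std::bfloat16_t");
      return 0;
    case 'x':
      snprintf (out, outlen, "_Float%ldx", n);
      return 0;
    case '_':
      snprintf (out, outlen, "_Float%ld", n);
      return 0;
    default:
      return -1;
    }
}
```

Rejecting any width other than 16 before 'b' mirrors the arg != 16 check
in the cplus_demangle_type hunk of the patch.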
> > --- gcc/tree-core.h.jj        2022-10-10 09:31:57.683981308 +0200
> > +++ gcc/tree-core.h   2022-10-13 16:57:08.953775013 +0200
> > @@ -665,6 +665,9 @@ enum tree_index {
> >     TI_DOUBLE_TYPE,
> >     TI_LONG_DOUBLE_TYPE,
> >
> > +  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
> > +  TI_BFLOAT16_TYPE,
> > +
> >     /* The _FloatN and _FloatNx types must be consecutive, and in the
> >        same sequence as the corresponding complex types, which must also
> >        be consecutive; _FloatN must come before _FloatNx; the order must
> > --- gcc/tree.h.jj     2022-10-10 09:31:57.766980149 +0200
> > +++ gcc/tree.h        2022-10-13 17:22:14.728207071 +0200
> > @@ -4291,6 +4291,7 @@ tree_strip_any_location_wrapper (tree ex
> >   #define float_type_node                     global_trees[TI_FLOAT_TYPE]
> >   #define double_type_node            global_trees[TI_DOUBLE_TYPE]
> >   #define long_double_type_node               global_trees[TI_LONG_DOUBLE_TYPE]
> > +#define bfloat16_type_node           global_trees[TI_BFLOAT16_TYPE]
> >
> >   /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
> >   #define FLOATN_TYPE_NODE(IDX)               global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
> > --- gcc/tree.cc.jj    2022-10-10 09:31:57.743980470 +0200
> > +++ gcc/tree.cc       2022-10-13 16:57:08.956774972 +0200
> > @@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
> >       = (flag_excess_precision == EXCESS_PRECISION_FAST
> >          ? EXCESS_PRECISION_TYPE_FAST
> >          : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
> > -       ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
> > +       ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
> >
> >     enum flt_eval_method target_flt_eval_method
> >       = targetm.c.excess_precision (requested_type);
> > @@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
> >     machine_mode float16_type_mode = (float16_type_node
> >                                   ? TYPE_MODE (float16_type_node)
> >                                   : VOIDmode);
> > +  machine_mode bfloat16_type_mode = (bfloat16_type_node
> > +                                  ? TYPE_MODE (bfloat16_type_node)
> > +                                  : VOIDmode);
> >     machine_mode float_type_mode = TYPE_MODE (float_type_node);
> >     machine_mode double_type_mode = TYPE_MODE (double_type_node);
> >
> > @@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
> >       switch (target_flt_eval_method)
> >         {
> >         case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> > -         if (type_mode == float16_type_mode)
> > +         if (type_mode == float16_type_mode
> > +             || type_mode == bfloat16_type_mode)
> >             return float_type_node;
> >           break;
> >         case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
> >           if (type_mode == float16_type_mode
> > +             || type_mode == bfloat16_type_mode
> >               || type_mode == float_type_mode)
> >             return double_type_node;
> >           break;
> >         case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
> >           if (type_mode == float16_type_mode
> > +             || type_mode == bfloat16_type_mode
> >               || type_mode == float_type_mode
> >               || type_mode == double_type_mode)
> >             return long_double_type_node;
> > @@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
> >       switch (target_flt_eval_method)
> >         {
> >         case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> > -         if (type_mode == float16_type_mode)
> > +         if (type_mode == float16_type_mode
> > +             || type_mode == bfloat16_type_mode)
> >             return complex_float_type_node;
> >           break;
> >         case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
> >           if (type_mode == float16_type_mode
> > +             || type_mode == bfloat16_type_mode
> >               || type_mode == float_type_mode)
> >             return complex_double_type_node;
> >           break;
> >         case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
> >           if (type_mode == float16_type_mode
> > +             || type_mode == bfloat16_type_mode
> >               || type_mode == float_type_mode
> >               || type_mode == double_type_mode)
> >             return complex_long_double_type_node;
> > @@ -9462,6 +9471,17 @@ build_common_tree_nodes (bool signed_cha
> >         SET_TYPE_MODE (FLOATN_NX_TYPE_NODE (i), mode);
> >       }
> >     float128t_type_node = float128_type_node;
> > +#ifdef HAVE_BFmode
> > +  if (REAL_MODE_FORMAT (BFmode) == &arm_bfloat_half_format
> > +      && targetm.scalar_mode_supported_p (BFmode)
> > +      && targetm.libgcc_floating_mode_supported_p (BFmode))
> > +    {
> > +      bfloat16_type_node = make_node (REAL_TYPE);
> > +      TYPE_PRECISION (bfloat16_type_node) = GET_MODE_PRECISION (BFmode);
> > +      layout_type (bfloat16_type_node);
> > +      SET_TYPE_MODE (bfloat16_type_node, BFmode);
> > +    }
> > +#endif
> >
> >     float_ptr_type_node = build_pointer_type (float_type_node);
> >     double_ptr_type_node = build_pointer_type (double_type_node);
> > --- gcc/expmed.h.jj   2022-10-03 18:00:53.046735271 +0200
> > +++ gcc/expmed.h      2022-10-13 16:57:08.957774958 +0200
> > @@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
> >                                 rtx, tree, rtx, int);
> >   extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
> >                        int);
> > +extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
> > +                            int);
> >   #ifdef GCC_OPTABS_H
> >   extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
> >                         rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
> > --- gcc/expmed.cc.jj  2022-10-13 16:22:17.755496384 +0200
> > +++ gcc/expmed.cc     2022-10-13 16:57:08.957774958 +0200
> > @@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
> >
> >   /* Likewise, but return 0 if that cannot be done.  */
> >
> > -static rtx
> > +rtx
> >   maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
> >                   int amount, rtx target, int unsignedp)
> >   {
> > --- gcc/expr.cc.jj    2022-10-06 17:43:47.941502119 +0200
> > +++ gcc/expr.cc       2022-10-13 16:57:09.022774066 +0200
> > @@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
> >         gcc_assert ((GET_MODE_PRECISION (from_mode)
> >                  != GET_MODE_PRECISION (to_mode))
> >                 || (DECIMAL_FLOAT_MODE_P (from_mode)
> > -                   != DECIMAL_FLOAT_MODE_P (to_mode)));
> > +                   != DECIMAL_FLOAT_MODE_P (to_mode))
> > +               || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> > +                   && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> > +               || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> > +                   && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
> >
> >         if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
> >       /* Conversion between decimal float and binary float, same size.  */
> > @@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
> >         return;
> >       }
> >
> > +#ifdef HAVE_SFmode
> > +      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> > +       && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
> > +     {
> > +       if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
> > +         {
> > +           /* To cut down on libgcc size, implement
> > +              BFmode -> {DF,XF,TF}mode conversions by
> > +              BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
> > +           rtx temp = gen_reg_rtx (SFmode);
> > +           convert_mode_scalar (temp, from, unsignedp);
> > +           convert_mode_scalar (to, temp, unsignedp);
> > +           return;
> > +         }
> > +       if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> > +         {
> > +           /* Similarly, implement BFmode -> HFmode as
> > +              BFmode -> SFmode -> HFmode conversion where SFmode
> > +              has superset of BFmode values.  We don't need
> > +              to handle sNaNs by raising exception and turning
> > +              into qNaN though, as that can be done in the
> > +              SFmode -> HFmode conversion too.  */
> > +           rtx temp = gen_reg_rtx (SFmode);
> > +           int save_flag_finite_math_only = flag_finite_math_only;
> > +           flag_finite_math_only = true;
> > +           convert_mode_scalar (temp, from, unsignedp);
> > +           flag_finite_math_only = save_flag_finite_math_only;
> > +           convert_mode_scalar (to, temp, unsignedp);
> > +           return;
> > +         }
> > +       if (to_mode == SFmode
> > +           && !HONOR_NANS (from_mode)
> > +           && !HONOR_NANS (to_mode)
> > +           && optimize_insn_for_speed_p ())
> > +         {
> > +           /* If we don't expect sNaNs, for BFmode -> SFmode we can just
> > +              shift the bits up.  */
> > +           machine_mode fromi_mode, toi_mode;
> > +           if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> > +                                  0).exists (&fromi_mode)
> > +               && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> > +                                     0).exists (&toi_mode))
> > +             {
> > +               start_sequence ();
> > +               rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> > +               rtx tof = NULL_RTX;
> > +               if (fromi)
> > +                 {
> > +                   rtx toi = gen_reg_rtx (toi_mode);
> > +                   convert_mode_scalar (toi, fromi, 1);
> > +                   toi
> > +                     = maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
> > +                                           GET_MODE_PRECISION (to_mode)
> > +                                           - GET_MODE_PRECISION (from_mode),
> > +                                           NULL_RTX, 1);
> > +                   if (toi)
> > +                     {
> > +                       tof = lowpart_subreg (to_mode, toi, toi_mode);
> > +                       if (tof)
> > +                         emit_move_insn (to, tof);
> > +                     }
> > +                 }
> > +               insns = get_insns ();
> > +               end_sequence ();
> > +               if (tof)
> > +                 {
> > +                   emit_insn (insns);
> > +                   return;
> > +                 }
> > +             }
> > +         }
> > +     }
> > +      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
> > +       && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> > +       && !HONOR_NANS (from_mode)
> > +       && !HONOR_NANS (to_mode)
> > +       && !flag_rounding_math
> > +       && optimize_insn_for_speed_p ())
> > +     {
> > +       /* If we don't expect qNaNs nor sNaNs and can assume rounding
> > +          to nearest, we can expand the conversion inline as
> > +          (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
> > +       machine_mode fromi_mode, toi_mode;
> > +       if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> > +                              0).exists (&fromi_mode)
> > +           && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> > +                                 0).exists (&toi_mode))
> > +         {
> > +           start_sequence ();
> > +           rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> > +           rtx tof = NULL_RTX;
> > +           do
> > +             {
> > +               if (!fromi)
> > +                 break;
> > +               int shift = (GET_MODE_PRECISION (from_mode)
> > +                            - GET_MODE_PRECISION (to_mode));
> > +               rtx temp1
> > +                 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
> > +                                       shift, NULL_RTX, 1);
> > +               if (!temp1)
> > +                 break;
> > +               rtx temp2
> > +                 = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
> > +                                 NULL_RTX, 1, OPTAB_DIRECT);
> > +               if (!temp2)
> > +                 break;
> > +               rtx temp3
> > +                 = expand_binop (fromi_mode, add_optab, fromi,
> > +                                 gen_int_mode ((HOST_WIDE_INT_1U
> > +                                                << (shift - 1)) - 1,
> > +                                               fromi_mode), NULL_RTX,
> > +                                 1, OPTAB_DIRECT);
> > +               if (!temp3)
> > +                 break;
> > +               rtx temp4
> > +                 = expand_binop (fromi_mode, add_optab, temp3, temp2,
> > +                                 NULL_RTX, 1, OPTAB_DIRECT);
> > +               if (!temp4)
> > +                 break;
> > +               rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
> > +                                               temp4, shift, NULL_RTX, 1);
> > +               if (!temp5)
> > +                 break;
> > +               rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
> > +               if (!temp6)
> > +                 break;
> > +               tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
> > +                                     toi_mode);
> > +               if (tof)
> > +                 emit_move_insn (to, tof);
> > +             }
> > +           while (0);
> > +           insns = get_insns ();
> > +           end_sequence ();
> > +           if (tof)
> > +             {
> > +               emit_insn (insns);
> > +               return;
> > +             }
> > +         }
> > +     }
> > +#endif
> > +
> >         /* Otherwise use a libcall.  */
> >         libcall = convert_optab_libfunc (tab, to_mode, from_mode);
> >
> > --- gcc/builtin-types.def.jj  2022-10-03 18:00:52.658740505 +0200
> > +++ gcc/builtin-types.def     2022-10-13 17:09:52.930317869 +0200
> > @@ -82,6 +82,9 @@ DEF_PRIMITIVE_TYPE (BT_UNWINDWORD, (*lan
> >   DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
> >   DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
> >   DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
> > +DEF_PRIMITIVE_TYPE (BT_BFLOAT16, (bfloat16_type_node
> > +                               ? bfloat16_type_node
> > +                               : error_mark_node))
> >   DEF_PRIMITIVE_TYPE (BT_FLOAT16, (float16_type_node
> >                                ? float16_type_node
> >                                : error_mark_node))
> > @@ -264,6 +267,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_CONST_S
> >   DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_CONST_STRING, BT_DOUBLE, BT_CONST_STRING)
> >   DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_CONST_STRING,
> >                    BT_LONGDOUBLE, BT_CONST_STRING)
> > +DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_CONST_STRING, BT_BFLOAT16, BT_CONST_STRING)
> >   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_CONST_STRING, BT_FLOAT16, BT_CONST_STRING)
> >   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_CONST_STRING, BT_FLOAT32, BT_CONST_STRING)
> >   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_CONST_STRING, BT_FLOAT64, BT_CONST_STRING)
> > --- gcc/builtins.def.jj       2022-10-03 18:00:52.679740221 +0200
> > +++ gcc/builtins.def  2022-10-13 17:09:05.633962625 +0200
> > @@ -514,6 +514,7 @@ DEF_GCC_BUILTIN        (BUILT_IN_NANSF,
> >   DEF_GCC_BUILTIN        (BUILT_IN_NANSL, "nansl", BT_FN_LONGDOUBLE_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> >   DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NANS, "nans", NAN_TYPE, ATTR_CONST_NOTHROW_NONNULL)
> >   #undef NAN_TYPE
> > +DEF_GCC_BUILTIN        (BUILT_IN_NANSF16B, "nansf16b", BT_FN_BFLOAT16_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> >   DEF_GCC_BUILTIN        (BUILT_IN_NANSD32, "nansd32", BT_FN_DFLOAT32_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> >   DEF_GCC_BUILTIN        (BUILT_IN_NANSD64, "nansd64", BT_FN_DFLOAT64_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> >   DEF_GCC_BUILTIN        (BUILT_IN_NANSD128, "nansd128", BT_FN_DFLOAT128_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> > --- gcc/fold-const-call.cc.jj 2022-09-03 09:35:41.107989686 +0200
> > +++ gcc/fold-const-call.cc    2022-10-13 17:20:59.579229947 +0200
> > @@ -1301,6 +1301,7 @@ fold_const_call (combined_fn fn, tree ty
> >
> >       CASE_CFN_NANS:
> >       CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NANS):
> > +    case CFN_BUILT_IN_NANSF16B:
> >       case CFN_BUILT_IN_NANSD32:
> >       case CFN_BUILT_IN_NANSD64:
> >       case CFN_BUILT_IN_NANSD128:
> > --- gcc/config/i386/i386.cc.jj        2022-10-03 18:00:52.942736674 +0200
> > +++ gcc/config/i386/i386.cc   2022-10-13 16:57:09.092773105 +0200
> > @@ -2423,6 +2423,7 @@ classify_argument (machine_mode mode, co
> >         classes[1] = X86_64_SSEUP_CLASS;
> >         return 2;
> >       case E_HCmode:
> > +    case E_BCmode:
> >         classes[0] = X86_64_SSE_CLASS;
> >         if (!(bit_offset % 64))
> >       return 1;
> > @@ -22428,7 +22429,7 @@ ix86_libgcc_floating_mode_supported_p (s
> >        be defined by the C front-end for AVX512FP16 intrinsics.  We will
> >        issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
> >        enabled.  */
> > -  return ((mode == HFmode && TARGET_SSE2)
> > +  return (((mode == HFmode || mode == BFmode) && TARGET_SSE2)
> >         ? true
> >         : default_libgcc_floating_mode_supported_p (mode));
> >   }
> > @@ -22731,7 +22732,7 @@ ix86_mangle_type (const_tree type)
> >     switch (TYPE_MODE (type))
> >       {
> >       case E_BFmode:
> > -      return "u6__bf16";
> > +      return "DF16b";
> >       case E_HFmode:
> >         /* _Float16 is "DF16_".
> >        Align with clang's decision in https://reviews.llvm.org/D33719. */
> > @@ -22747,55 +22748,6 @@ ix86_mangle_type (const_tree type)
> >       }
> >   }
> >
> > -/* Return the diagnostic message string if conversion from FROMTYPE to
> > -   TOTYPE is not allowed, NULL otherwise.  */
> > -
> > -static const char *
> > -ix86_invalid_conversion (const_tree fromtype, const_tree totype)
> > -{
> > -  if (element_mode (fromtype) != element_mode (totype))
> > -    {
> > -      /* Do no allow conversions to/from BFmode scalar types.  */
> > -      if (TYPE_MODE (fromtype) == BFmode)
> > -     return N_("invalid conversion from type %<__bf16%>");
> > -      if (TYPE_MODE (totype) == BFmode)
> > -     return N_("invalid conversion to type %<__bf16%>");
> > -    }
> > -
> > -  /* Conversion allowed.  */
> > -  return NULL;
> > -}
> > -
> > -/* Return the diagnostic message string if the unary operation OP is
> > -   not permitted on TYPE, NULL otherwise.  */
> > -
> > -static const char *
> > -ix86_invalid_unary_op (int op, const_tree type)
> > -{
> > -  /* Reject all single-operand operations on BFmode except for &.  */
> > -  if (element_mode (type) == BFmode && op != ADDR_EXPR)
> > -    return N_("operation not permitted on type %<__bf16%>");
> > -
> > -  /* Operation allowed.  */
> > -  return NULL;
> > -}
> > -
> > -/* Return the diagnostic message string if the binary operation OP is
> > -   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
> > -
> > -static const char *
> > -ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
> > -                        const_tree type2)
> > -{
> > -  /* Reject all 2-operand operations on BFmode.  */
> > -  if (element_mode (type1) == BFmode
> > -      || element_mode (type2) == BFmode)
> > -    return N_("operation not permitted on type %<__bf16%>");
> > -
> > -  /* Operation allowed.  */
> > -  return NULL;
> > -}
> > -
> >   static GTY(()) tree ix86_tls_stack_chk_guard_decl;
> >
> >   static tree
> > @@ -24853,15 +24805,6 @@ ix86_libgcc_floating_mode_supported_p
> >   #undef TARGET_MANGLE_TYPE
> >   #define TARGET_MANGLE_TYPE ix86_mangle_type
> >
> > -#undef TARGET_INVALID_CONVERSION
> > -#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
> > -
> > -#undef TARGET_INVALID_UNARY_OP
> > -#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
> > -
> > -#undef TARGET_INVALID_BINARY_OP
> > -#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
> > -
> >   #undef TARGET_STACK_PROTECT_GUARD
> >   #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
> >
> > --- gcc/config/i386/i386-builtins.cc.jj       2022-10-03 18:00:52.918736997 +0200
> > +++ gcc/config/i386/i386-builtins.cc  2022-10-13 16:57:09.119772735 +0200
> > @@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
> >   static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
> >
> >   tree ix86_float16_type_node = NULL_TREE;
> > -tree ix86_bf16_type_node = NULL_TREE;
> >   tree ix86_bf16_ptr_type_node = NULL_TREE;
> >
> >   /* Retrieve an element from the above table, building some of
> > @@ -1372,16 +1371,18 @@ ix86_register_float16_builtin_type (void
> >   static void
> >   ix86_register_bf16_builtin_type (void)
> >   {
> > -  ix86_bf16_type_node = make_node (REAL_TYPE);
> > -  TYPE_PRECISION (ix86_bf16_type_node) = 16;
> > -  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
> > -  layout_type (ix86_bf16_type_node);
> > +  if (bfloat16_type_node == NULL_TREE)
> > +    {
> > +      bfloat16_type_node = make_node (REAL_TYPE);
> > +      TYPE_PRECISION (bfloat16_type_node) = 16;
> > +      SET_TYPE_MODE (bfloat16_type_node, BFmode);
> > +      layout_type (bfloat16_type_node);
> > +    }
> >
> >     if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
> >       {
> > -      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
> > -                                         "__bf16");
> > -      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
> > +      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
> > +      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
> >       }
> >   }
> >
> > --- gcc/config/i386/i386-builtin-types.def.jj 2022-10-03 18:00:52.894737321 +0200
> > +++ gcc/config/i386/i386-builtin-types.def    2022-10-13 16:57:09.139772460 +0200
> > @@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
> >   DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
> >   DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
> >   DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
> > -DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
> > +DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
> >   DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
> >   DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
> >   DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> > --- gcc/config/i386/i386.md.jj        2022-10-11 15:57:05.005762022 +0200
> > +++ gcc/config/i386/i386.md   2022-10-13 16:57:09.187771801 +0200
> > @@ -1644,6 +1644,48 @@ (define_expand "cbranch<mode>4"
> >     DONE;
> >   })
> >
> > +(define_expand "cbranchbf4"
> > +  [(set (reg:CC FLAGS_REG)
> > +     (compare:CC (match_operand:BF 1 "cmp_fp_expander_operand")
> > +                 (match_operand:BF 2 "cmp_fp_expander_operand")))
> > +   (set (pc) (if_then_else
> > +           (match_operator 0 "comparison_operator"
> > +            [(reg:CC FLAGS_REG)
> > +             (const_int 0)])
> > +           (label_ref (match_operand 3))
> > +           (pc)))]
> > +  ""
> > +{
> > +  rtx op1 = gen_lowpart (HImode, operands[1]);
> > +  if (CONST_INT_P (op1))
> > +    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> > +                                       operands[1], BFmode);
> > +  else
> > +    {
> > +      rtx t1 = gen_reg_rtx (SImode);
> > +      emit_insn (gen_zero_extendhisi2 (t1, op1));
> > +      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> > +      op1 = gen_lowpart (SFmode, t1);
> > +    }
> > +  rtx op2 = gen_lowpart (HImode, operands[2]);
> > +  if (CONST_INT_P (op2))
> > +    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> > +                                       operands[2], BFmode);
> > +  else
> > +    {
> > +      rtx t2 = gen_reg_rtx (SImode);
> > +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> > +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> > +      op2 = gen_lowpart (SFmode, t2);
> > +    }
> > +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> > +                        SFmode, NULL_RTX, NULL,
> > +                        as_a <rtx_code_label *> (operands[3]),
> > +                        /* Unfortunately this isn't propagated.  */
> > +                        profile_probability::even ());

You could use ix86_expand_branch instead of do_compare_rtx_and_jump
here. That would expand the comparison in SFmode, so the insn condition
from cbranchsf4 should be copied here:

  "TARGET_80387 || (SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"

Additionally, the ix86_fp_comparison_operator predicate should be used
for operand 0. Basically, just copy the predicates from cbranchsf4, since
we are effectively expanding an SFmode compare and branch.

> > +  DONE;
> > +})
> > +
> >   (define_expand "cstorehf4"
> >     [(set (reg:CC FLAGS_REG)
> >       (compare:CC (match_operand:HF 2 "cmp_fp_expander_operand")
> > @@ -1659,6 +1701,45 @@ (define_expand "cstorehf4"
> >     DONE;
> >   })
> >
> > +(define_expand "cstorebf4"
> > +  [(set (reg:CC FLAGS_REG)
> > +     (compare:CC (match_operand:BF 2 "cmp_fp_expander_operand")
> > +                 (match_operand:BF 3 "cmp_fp_expander_operand")))
> > +   (set (match_operand:QI 0 "register_operand")
> > +     (match_operator 1 "comparison_operator"
> > +       [(reg:CC FLAGS_REG)
> > +        (const_int 0)]))]
> > +  ""
> > +{
> > +  rtx op1 = gen_lowpart (HImode, operands[2]);
> > +  if (CONST_INT_P (op1))
> > +    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> > +                                       operands[2], BFmode);
> > +  else
> > +    {
> > +      rtx t1 = gen_reg_rtx (SImode);
> > +      emit_insn (gen_zero_extendhisi2 (t1, op1));
> > +      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> > +      op1 = gen_lowpart (SFmode, t1);
> > +    }
> > +  rtx op2 = gen_lowpart (HImode, operands[3]);
> > +  if (CONST_INT_P (op2))
> > +    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> > +                                       operands[3], BFmode);
> > +  else
> > +    {
> > +      rtx t2 = gen_reg_rtx (SImode);
> > +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> > +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> > +      op2 = gen_lowpart (SFmode, t2);
> > +    }

As with cbranchbf4 above, use ix86_expand_setcc here and copy the
predicates from cstoresf4.

Uros.

> > +  rtx res = emit_store_flag_force (operands[0], GET_CODE (operands[1]),
> > +                                op1, op2, SFmode, 0, 1);
> > +  if (!rtx_equal_p (res, operands[0]))
> > +    emit_move_insn (operands[0], res);
> > +  DONE;
> > +})
> > +
> >   (define_expand "cstore<mode>4"
> >     [(set (reg:CC FLAGS_REG)
> >       (compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")
> > --- gcc/c-family/c-cppbuiltin.cc.jj   2022-10-13 08:41:04.718165419 +0200
> > +++ gcc/c-family/c-cppbuiltin.cc      2022-10-13 17:51:07.722665421 +0200
> > @@ -1260,6 +1260,13 @@ c_cpp_builtins (cpp_reader *pfile)
> >         builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
> >                                     csuffix, FLOATN_NX_TYPE_NODE (i));
> >       }
> > +  if (bfloat16_type_node)
> > +    {
> > +      if (c_dialect_cxx () && cxx_dialect > cxx20)
> > +     cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
> > +      builtin_define_float_constants ("BFLT16", "BF16", "%s",
> > +                                   "BF16", bfloat16_type_node);
> > +    }
> >
> >     /* For float.h.  */
> >     if (targetm.decimal_float_supported_p ())
> > @@ -1370,6 +1377,12 @@ c_cpp_builtins (cpp_reader *pfile)
> >             suffix[0] = 'l';
> >             memcpy (float_h_prefix, "LDBL", 5);
> >           }
> > +       else if (bfloat16_type_node
> > +                && mode == TYPE_MODE (bfloat16_type_node))
> > +         {
> > +           memcpy (suffix, "bf16", 5);
> > +           memcpy (float_h_prefix, "BFLT16", 7);
> > +         }
> >         else
> >           {
> >             bool found_suffix = false;
> > @@ -1396,22 +1409,28 @@ c_cpp_builtins (cpp_reader *pfile)
> >         machine_mode float16_type_mode = (float16_type_node
> >                                           ? TYPE_MODE (float16_type_node)
> >                                           : VOIDmode);
> > +       machine_mode bfloat16_type_mode = (bfloat16_type_node
> > +                                          ? TYPE_MODE (bfloat16_type_node)
> > +                                          : VOIDmode);
> >         switch (targetm.c.excess_precision
> >                   (EXCESS_PRECISION_TYPE_IMPLICIT))
> >           {
> >           case FLT_EVAL_METHOD_UNPREDICTABLE:
> >           case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
> >             excess_precision = (mode == float16_type_mode
> > +                               || mode == bfloat16_type_mode
> >                                 || mode == TYPE_MODE (float_type_node)
> >                                 || mode == TYPE_MODE (double_type_node));
> >             break;
> >
> >           case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
> >             excess_precision = (mode == float16_type_mode
> > +                               || mode == bfloat16_type_mode
> >                                 || mode == TYPE_MODE (float_type_node));
> >             break;
> >           case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> > -           excess_precision = mode == float16_type_mode;
> > +           excess_precision = (mode == float16_type_mode
> > +                               || mode == bfloat16_type_mode);
> >             break;
> >           case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16:
> >             excess_precision = false;
> > --- gcc/c-family/c-lex.cc.jj  2022-10-13 16:21:52.548842666 +0200
> > +++ gcc/c-family/c-lex.cc     2022-10-13 16:59:51.778540099 +0200
> > @@ -1000,6 +1000,22 @@ interpret_float (const cpp_token *token,
> >         pedwarn (input_location, OPT_Wpedantic,
> >                  "non-standard suffix on floating constant");
> >         }
> > +    else if ((flags & CPP_N_BFLOAT16) != 0)
> > +      {
> > +     type = bfloat16_type_node;
> > +     if (type == NULL_TREE)
> > +       {
> > +         error ("unsupported non-standard suffix on floating constant");
> > +         return error_mark_node;
> > +       }
> > +     if (!c_dialect_cxx ())
> > +       pedwarn (input_location, OPT_Wpedantic,
> > +                "non-standard suffix on floating constant");
> > +     else if (cxx_dialect < cxx23)
> > +       pedwarn (input_location, OPT_Wpedantic,
> > +                "%<bf16%> or %<BF16%> suffix on floating constant only "
> > +                "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
> > +      }
> >       else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
> >         type = long_double_type_node;
> >       else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
> > --- gcc/c/c-typeck.cc.jj      2022-10-06 17:43:47.900502672 +0200
> > +++ gcc/c/c-typeck.cc 2022-10-13 16:57:09.226771266 +0200
> > @@ -3678,6 +3678,9 @@ convert_arguments (location_t loc, vec<l
> >               promote_float_arg = false;
> >               break;
> >             }
> > +       /* Don't promote __bf16 either.  */
> > +       if (TYPE_MAIN_VARIANT (valtype) == bfloat16_type_node)
> > +         promote_float_arg = false;
> >       }
> >
> >         if (type != NULL_TREE)
> > --- gcc/cp/cp-tree.h.jj       2022-10-13 16:21:52.600841952 +0200
> > +++ gcc/cp/cp-tree.h  2022-10-13 16:57:09.241771060 +0200
> > @@ -8741,6 +8741,8 @@ extended_float_type_p (tree type)
> >     for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
> >       if (type == FLOATN_TYPE_NODE (i))
> >         return true;
> > +  if (type == bfloat16_type_node)
> > +    return true;
> >     return false;
> >   }
> >
> > --- gcc/cp/typeck.cc.jj       2022-10-13 16:21:52.642841375 +0200
> > +++ gcc/cp/typeck.cc  2022-10-13 16:57:09.269770676 +0200
> > @@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
> >         if (mv2 == FLOATN_NX_TYPE_NODE (i))
> >       extended2 = i + 1;
> >       }
> > +  if (mv1 == bfloat16_type_node)
> > +    extended1 = true;
> > +  if (mv2 == bfloat16_type_node)
> > +    extended2 = true;
> >     if (extended2 && !extended1)
> >       {
> >         int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
> > @@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
> >     if (cnt > 1 && mv2 == long_double_type_node)
> >       return -2;
> >     /* Otherwise, they have equal rank, but extended types
> > -     (other than std::bfloat16_t) have higher subrank.  */
> > +     (other than std::bfloat16_t) have higher subrank.
> > +     std::bfloat16_t shouldn't have equal rank to any standard
> > +     floating point type.  */
> >     return 1;
> >   }
> >
> > --- gcc/testsuite/lib/target-supports.exp.jj  2022-10-11 14:50:14.472773574 +0200
> > +++ gcc/testsuite/lib/target-supports.exp     2022-10-13 16:57:09.270770662 +0200
> > @@ -3416,6 +3416,22 @@ proc check_effective_target_base_quadflo
> >       return 1
> >   }
> >
> > +# Return 1 if the target supports the __bf16 type, 0 otherwise.
> > +
> > +proc check_effective_target_bfloat16 {} {
> > +    return [check_no_compiler_messages_nocache bfloat16 object {
> > +     __bf16 foo (__bf16 x) { return x + x; }
> > +    } [add_options_for_bfloat16 ""]]
> > +}
> > +
> > +proc check_effective_target_bfloat16_runtime {} {
> > +    return [check_effective_target_bfloat16]
> > +}
> > +
> > +proc add_options_for_bfloat16 { flags } {
> > +    return "$flags"
> > +}
> > +
> >   # Return 1 if the target supports all four forms of fused multiply-add
> >   # (fma, fms, fnma, and fnms) for both float and double.
> >
> > --- gcc/testsuite/gcc.dg/torture/bfloat16-basic.c.jj  2022-10-13 16:57:09.271770648 +0200
> > +++ gcc/testsuite/gcc.dg/torture/bfloat16-basic.c     2022-10-13 17:32:28.531884882 +0200
> > @@ -0,0 +1,11 @@
> > +/* Test __bf16.  */
> > +/* { dg-do run } */
> > +/* { dg-options "" } */
> > +/* { dg-add-options bfloat16 } */
> > +/* { dg-require-effective-target bfloat16_runtime } */
> > +
> > +#define TYPE __bf16
> > +#define CST(C) CONCAT (C, bf16)
> > +#define CSTU(C) CONCAT (C, BF16)
> > +
> > +#include "floatn-basic.h"
> > --- gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c.jj        2022-10-13 16:57:09.271770648 +0200
> > +++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c   2022-10-13 18:09:24.288913634 +0200
> > @@ -0,0 +1,47 @@
> > +/* Test __bf16 built-in functions.  */
> > +/* { dg-do run } */
> > +/* { dg-options "" } */
> > +/* { dg-add-options bfloat16 } */
> > +/* { dg-add-options ieee } */
> > +/* { dg-require-effective-target bfloat16_runtime } */
> > +
> > +extern void exit (int);
> > +extern void abort (void);
> > +
> > +extern __bf16 test_type;
> > +extern __typeof (__builtin_nansf16b ("")) test_type;
> > +
> > +volatile __bf16 inf_cst = (__bf16) __builtin_inff ();
> > +volatile __bf16 huge_val_cst = (__bf16) __builtin_huge_valf ();
> > +volatile __bf16 nan_cst = (__bf16) __builtin_nanf ("");
> > +volatile __bf16 nans_cst = __builtin_nansf16b ("");
> > +volatile __bf16 neg0 = -0.0bf16, neg1 = -1.0bf16, one = 1.0;
> > +
> > +int
> > +main (void)
> > +{
> > +  volatile __bf16 r;
> > +  if (!__builtin_isinf (inf_cst))
> > +    abort ();
> > +  if (!__builtin_isinf (huge_val_cst))
> > +    abort ();
> > +  if (inf_cst != huge_val_cst)
> > +    abort ();
> > +  if (!__builtin_isnan (nan_cst))
> > +    abort ();
> > +  if (!__builtin_isnan (nans_cst))
> > +    abort ();
> > +  r = __builtin_fabsf (neg1);
> > +  if (r != 1.0bf16)
> > +    abort ();
> > +  r = __builtin_copysignf (one, neg0);
> > +  if (r != neg1)
> > +    abort ();
> > +  r = __builtin_copysignf (inf_cst, neg1);
> > +  if (r != -huge_val_cst)
> > +    abort ();
> > +  r = __builtin_copysignf (-inf_cst, one);
> > +  if (r != huge_val_cst)
> > +    abort ();
> > +  exit (0);
> > +}
> > --- gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c.jj  2022-10-13 16:57:09.271770648 +0200
> > +++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c     2022-10-13 17:40:15.067555349 +0200
> > @@ -0,0 +1,21 @@
> > +/* Test __bf16 __builtin_issignaling.  */
> > +/* { dg-do run } */
> > +/* { dg-options "" } */
> > +/* { dg-add-options bfloat16 } */
> > +/* { dg-add-options ieee } */
> > +/* { dg-require-effective-target bfloat16_runtime } */
> > +/* { dg-additional-options "-fsignaling-nans" } */
> > +/* Workaround for PR57484 on ia32: */
> > +/* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
> > +
> > +#define CONCATX(X, Y) X ## Y
> > +#define CONCAT(X, Y) CONCATX (X, Y)
> > +
> > +#define TYPE __bf16
> > +#define CST(C) CONCAT (C, bf16)
> > +#define FN(F) CONCAT (F, f16b)
> > +#define NAN(x) ((__bf16) __builtin_nanf (x))
> > +#define INF ((__bf16) __builtin_inff ())
> > +#define EXT 0
> > +
> > +#include "builtin-issignaling-1.c"
> > --- gcc/testsuite/gcc.dg/torture/bfloat16-complex.c.jj        2022-10-13 16:57:09.271770648 +0200
> > +++ gcc/testsuite/gcc.dg/torture/bfloat16-complex.c   2022-10-13 17:46:43.259267724 +0200
> > @@ -0,0 +1,61 @@
> > +/* Test __bf16 complex arithmetic.  */
> > +/* { dg-do run } */
> > +/* { dg-options "" } */
> > +/* { dg-add-options bfloat16 } */
> > +/* { dg-require-effective-target bfloat16_runtime } */
> > +
> > +extern void exit (int);
> > +extern void abort (void);
> > +
> > +volatile __bf16 a = 1.0bf16;
> > +typedef _Complex float __cbf16 __attribute__((__mode__(__BC__)));
> > +volatile __cbf16 b = __builtin_complex (2.0bf16, 3.0bf16);
> > +volatile __cbf16 c = __builtin_complex (2.0bf16, 3.0bf16);
> > +volatile __cbf16 d = __builtin_complex (2.0bf16, 3.0bf16);
> > +
> > +__cbf16
> > +fn (__cbf16 arg)
> > +{
> > +  return arg / 4;
> > +}
> > +
> > +int
> > +main (void)
> > +{
> > +  volatile __cbf16 r;
> > +  if (b != c)
> > +    abort ();
> > +  if (b != d)
> > +    abort ();
> > +  r = a + b;
> > +  if (__real__ r != 3.0bf16 || __imag__ r != 3.0bf16)
> > +    abort ();
> > +  r += d;
> > +  if (__real__ r != 5.0bf16 || __imag__ r != 6.0bf16)
> > +    abort ();
> > +  r -= a;
> > +  if (__real__ r != 4.0bf16 || __imag__ r != 6.0bf16)
> > +    abort ();
> > +  r /= (a + a);
> > +  if (__real__ r != 2.0bf16 || __imag__ r != 3.0bf16)
> > +    abort ();
> > +  r *= (a + a);
> > +  if (__real__ r != 4.0bf16 || __imag__ r != 6.0bf16)
> > +    abort ();
> > +  r -= b;
> > +  if (__real__ r != 2.0bf16 || __imag__ r != 3.0bf16)
> > +    abort ();
> > +  r *= r;
> > +  if (__real__ r != -5.0bf16 || __imag__ r != 12.0bf16)
> > +    abort ();
> > +  /* Division may not be exact, so round result before comparing.  */
> > +  r /= b;
> > +  r += __builtin_complex (100.0bf16, 100.0bf16);
> > +  r -= __builtin_complex (100.0bf16, 100.0bf16);
> > +  if (r != b)
> > +    abort ();
> > +  r = fn (r);
> > +  if (__real__ r != 0.5bf16 || __imag__ r != 0.75bf16)
> > +    abort ();
> > +  exit (0);
> > +}
> > --- gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c.jj   2022-10-03 18:00:53.118734300 +0200
> > +++ gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c      2022-10-13 17:39:19.387313780 +0200
> > @@ -4,7 +4,7 @@
> >   /* Workaround for PR57484 on ia32: */
> >   /* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
> >
> > -#ifndef EXT
> > +#if !defined(EXT) && !defined(TYPE)
> >   int
> >   f1 (void)
> >   {
> > @@ -41,31 +41,42 @@ f6 (long double x)
> >     return __builtin_issignaling (x);
> >   }
> >   #else
> > -#define CONCATX(X, Y) X ## Y
> > -#define CONCAT(X, Y) CONCATX (X, Y)
> > -#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> > -#define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> > -
> > -#if EXT
> > -# define TYPE CONCAT3 (_Float, WIDTH, x)
> > -# define CST(C) CONCAT4 (C, f, WIDTH, x)
> > -# define FN(F) CONCAT4 (F, f, WIDTH, x)
> > -#else
> > -# define TYPE CONCAT (_Float, WIDTH)
> > -# define CST(C) CONCAT3 (C, f, WIDTH)
> > -# define FN(F) CONCAT3 (F, f, WIDTH)
> > +#ifndef TYPE
> > +# define CONCATX(X, Y) X ## Y
> > +# define CONCAT(X, Y) CONCATX (X, Y)
> > +# define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> > +# define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> > +
> > +# if EXT
> > +#  define TYPE CONCAT3 (_Float, WIDTH, x)
> > +#  define CST(C) CONCAT4 (C, f, WIDTH, x)
> > +#  define FN(F) CONCAT4 (F, f, WIDTH, x)
> > +# else
> > +#  define TYPE CONCAT (_Float, WIDTH)
> > +#  define CST(C) CONCAT3 (C, f, WIDTH)
> > +#  define FN(F) CONCAT3 (F, f, WIDTH)
> > +# endif
> > +#endif
> > +#ifndef NANS
> > +# define NANS(x) FN (__builtin_nans) (x)
> > +#endif
> > +#ifndef NAN
> > +# define NAN(x) FN (__builtin_nan) (x)
> > +#endif
> > +#ifndef INF
> > +# define INF FN (__builtin_inf) ()
> >   #endif
> >
> >   int
> >   f1 (void)
> >   {
> > -  return __builtin_issignaling (FN (__builtin_nans) (""));
> > +  return __builtin_issignaling (NANS (""));
> >   }
> >
> >   int
> >   f2 (void)
> >   {
> > -  return __builtin_issignaling (FN (__builtin_nan) (""));
> > +  return __builtin_issignaling (NAN (""));
> >   }
> >
> >   int
> > @@ -118,10 +129,10 @@ main ()
> >     if (!f6 (z))
> >       __builtin_abort ();
> >   #else
> > -  if (f4 (w) || !f4 (FN (__builtin_nans) ("0x123")) || f4 (CST (42.0)) || f4 (FN (__builtin_nan) ("0x234"))
> > -      || f4 (FN (__builtin_inf) ()) || f4 (-FN (__builtin_inf) ()) || f4 (CST (-42.0)) || f4 (CST (-0.0)) || f4 (CST (0.0)))
> > +  if (f4 (w) || !f4 (NANS ("0x123")) || f4 (CST (42.0)) || f4 (NAN ("0x234"))
> > +      || f4 (INF) || f4 (-INF) || f4 (CST (-42.0)) || f4 (CST (-0.0)) || f4 (CST (0.0)))
> >       __builtin_abort ();
> > -  w = FN (__builtin_nans) ("");
> > +  w = NANS ("");
> >     asm volatile ("" : : : "memory");
> >     if (!f4 (w))
> >       __builtin_abort ();
> > --- gcc/testsuite/gcc.dg/torture/floatn-basic.h.jj    2022-10-03 18:00:53.118734300 +0200
> > +++ gcc/testsuite/gcc.dg/torture/floatn-basic.h       2022-10-13 16:57:09.285770456 +0200
> > @@ -9,14 +9,16 @@
> >   #define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> >   #define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> >
> > -#if EXT
> > -# define TYPE CONCAT3 (_Float, WIDTH, x)
> > -# define CST(C) CONCAT4 (C, f, WIDTH, x)
> > -# define CSTU(C) CONCAT4 (C, F, WIDTH, x)
> > -#else
> > -# define TYPE CONCAT (_Float, WIDTH)
> > -# define CST(C) CONCAT3 (C, f, WIDTH)
> > -# define CSTU(C) CONCAT3 (C, F, WIDTH)
> > +#ifndef TYPE
> > +# if EXT
> > +#  define TYPE CONCAT3 (_Float, WIDTH, x)
> > +#  define CST(C) CONCAT4 (C, f, WIDTH, x)
> > +#  define CSTU(C) CONCAT4 (C, F, WIDTH, x)
> > +# else
> > +#  define TYPE CONCAT (_Float, WIDTH)
> > +#  define CST(C) CONCAT3 (C, f, WIDTH)
> > +#  define CSTU(C) CONCAT3 (C, F, WIDTH)
> > +# endif
> >   #endif
> >
> >   extern void exit (int);
> > --- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c.jj      2022-10-03 18:00:53.137734043 +0200
> > +++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c 2022-10-13 16:57:09.306770168 +0200
> > @@ -45,19 +45,19 @@ __m256bf16 footest (__m256bf16 vector0)
> >     __m256bf16 vector2_1 = {};
> >     __m256bf16 vector2_2 = { glob_bfloat };
> >     __m256bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
> > -  __m256bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m256bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m256bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m256bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m256bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m256bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error "invalid conversion to type '__bf16'" } */
> > -
> > -  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  __m256h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  __m256i initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  __v16hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> > +  __m256bf16 vector2_4 = { 0 };
> > +  __m256bf16 vector2_5 = { 0.1 };
> > +  __m256bf16 vector2_6 = { is_a_float16 };
> > +  __m256bf16 vector2_7 = { is_a_float };
> > +  __m256bf16 vector2_8 = { is_an_int };
> > +  __m256bf16 vector2_9 = { is_a_short_int };
> > +  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
> > +
> > +  __v8si initi_2_1 = { glob_bfloat };
> > +  __m256 initi_2_2 = { glob_bfloat };
> > +  __m256h initi_2_3 = { glob_bfloat };
> > +  __m256i initi_2_5 = { glob_bfloat };
> > +  __v16hi initi_2_6 = { glob_bfloat };
> >
> >     /* Assignments to/from vectors.  */
> >
> > @@ -79,25 +79,25 @@ __m256bf16 footest (__m256bf16 vector0)
> >     /* Assignments to/from elements.  */
> >
> >     vector2_3[0] = glob_bfloat;
> > -  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> > +  vector2_3[0] = is_an_int;
> > +  vector2_3[0] = is_a_short_int;
> > +  vector2_3[0] = is_a_float;
> > +  vector2_3[0] = is_a_float16;
> > +  vector2_3[0] = 0;
> > +  vector2_3[0] = 0.1;
> >
> >     glob_bfloat = vector2_3[0];
> > -  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> > +  is_an_int = vector2_3[0];
> > +  is_a_short_int = vector2_3[0];
> > +  is_a_float = vector2_3[0];
> > +  is_a_float16 = vector2_3[0];
> >
> >     /* Compound literals.  */
> >
> >     (__m256bf16) {};
> >
> > -  (__m256bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__m256bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > +  (__m256bf16) { 0 };
> > +  (__m256bf16) { 0.1 };
> >     (__m256bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
> >     (__m256bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
> >     (__m256bf16) { is_a_long_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256i'} } */
> > @@ -176,16 +176,16 @@ __m256bf16 footest (__m256bf16 vector0)
> >     bfloat_ptr = &bfloat_ptr3[1];
> >
> >     /* Simple comparison.  */
> > -  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  vector0 > glob_bfloat_vec;
> > +  glob_bfloat_vec == vector0;
> > +  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
> > +  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
> > +  vector0 > 0;
> > +  0 == vector0;
> > +  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> > +  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> > +  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
> > +  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
> >
> >     /* Pointer comparison.  */
> >
> > @@ -224,24 +224,24 @@ __m256bf16 footest (__m256bf16 vector0)
> >
> >     /* Unary operators.  */
> >
> > -  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  +vector0;
> > +  -vector0;
> > +  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
> > +  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
> >     *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
> > -  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  __real vector0; /* { dg-error {wrong type argument to __real} } */
> > +  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
> > +  ++vector0;
> > +  --vector0;
> > +  vector0++;
> > +  vector0--;
> >
> >     /* Binary arithmetic operations.  */
> >
> > -  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  vector0 = glob_bfloat_vec + *bfloat_ptr;
> > +  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> > +  vector0 = glob_bfloat_vec + 0;
> > +  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
> >
> >     return vector0;
> >   }
> > --- gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c.jj 2022-10-03 18:00:53.136734057 +0200
> > +++ gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c    2022-10-13 16:57:09.327769880 +0200
> > @@ -12,8 +12,8 @@ double is_a_double;
> >
> >   float *float_ptr;
> >
> > -__bf16 foo1 (void) { return (__bf16) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
> > -__bf16 foo2 (void) { return (__bf16) (short) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
> > +__bf16 foo1 (void) { return (__bf16) 0x1234; }
> > +__bf16 foo2 (void) { return (__bf16) (short) 0x1234; }
> >
> >   __bf16 footest (__bf16 scalar0)
> >   {
> > @@ -22,87 +22,87 @@ __bf16 footest (__bf16 scalar0)
> >
> >     __bf16 scalar1_1;
> >     __bf16 scalar1_2 = glob_bfloat;
> > -  __bf16 scalar1_3 = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar1_4 = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar1_5 = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar1_6 = is_an_int;  /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar1_7 = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar1_8 = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar1_9 = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -
> > -  int initi_1_1 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  float initi_1_2 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  _Float16 initi_1_3 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  short initi_1_4 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  double initi_1_5 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > +  __bf16 scalar1_3 = 0;
> > +  __bf16 scalar1_4 = 0.1;
> > +  __bf16 scalar1_5 = is_a_float;
> > +  __bf16 scalar1_6 = is_an_int;
> > +  __bf16 scalar1_7 = is_a_float16;
> > +  __bf16 scalar1_8 = is_a_double;
> > +  __bf16 scalar1_9 = is_a_short_int;
> > +
> > +  int initi_1_1 = glob_bfloat;
> > +  float initi_1_2 = glob_bfloat;
> > +  _Float16 initi_1_3 = glob_bfloat;
> > +  short initi_1_4 = glob_bfloat;
> > +  double initi_1_5 = glob_bfloat;
> >
> >     __bf16 scalar2_1 = {};
> >     __bf16 scalar2_2 = { glob_bfloat };
> > -  __bf16 scalar2_3 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar2_4 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar2_5 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar2_6 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar2_7 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar2_8 = { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 scalar2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -
> > -  int initi_2_1 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  float initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  _Float16 initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  short initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  double initi_2_5 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > +  __bf16 scalar2_3 = { 0 };
> > +  __bf16 scalar2_4 = { 0.1 };
> > +  __bf16 scalar2_5 = { is_a_float };
> > +  __bf16 scalar2_6 = { is_an_int };
> > +  __bf16 scalar2_7 = { is_a_float16 };
> > +  __bf16 scalar2_8 = { is_a_double };
> > +  __bf16 scalar2_9 = { is_a_short_int };
> > +
> > +  int initi_2_1 = { glob_bfloat };
> > +  float initi_2_2 = { glob_bfloat };
> > +  _Float16 initi_2_3 = { glob_bfloat };
> > +  short initi_2_4 = { glob_bfloat };
> > +  double initi_2_5 = { glob_bfloat };
> >
> >     /* Assignments.  */
> >
> >     glob_bfloat = glob_bfloat;
> > -  glob_bfloat = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  glob_bfloat = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  glob_bfloat = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  glob_bfloat = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  glob_bfloat = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  glob_bfloat = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  glob_bfloat = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -
> > -  is_an_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  is_a_float = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  is_a_float16 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  is_a_double = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  is_a_short_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > +  glob_bfloat = 0;
> > +  glob_bfloat = 0.1;
> > +  glob_bfloat = is_a_float;
> > +  glob_bfloat = is_an_int;
> > +  glob_bfloat = is_a_float16;
> > +  glob_bfloat = is_a_double;
> > +  glob_bfloat = is_a_short_int;
> > +
> > +  is_an_int = glob_bfloat;
> > +  is_a_float = glob_bfloat;
> > +  is_a_float16 = glob_bfloat;
> > +  is_a_double = glob_bfloat;
> > +  is_a_short_int = glob_bfloat;
> >
> >     /* Casting.  */
> >
> >     (void) glob_bfloat;
> >     (__bf16) glob_bfloat;
> >
> > -  (int) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  (float) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  (_Float16) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  (double) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  (short) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -
> > -  (__bf16) is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__bf16) is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__bf16) is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__bf16) is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__bf16) is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> > +  (int) glob_bfloat;
> > +  (float) glob_bfloat;
> > +  (_Float16) glob_bfloat;
> > +  (double) glob_bfloat;
> > +  (short) glob_bfloat;
> > +
> > +  (__bf16) is_an_int;
> > +  (__bf16) is_a_float;
> > +  (__bf16) is_a_float16;
> > +  (__bf16) is_a_double;
> > +  (__bf16) is_a_short_int;
> >
> >     /* Compound literals.  */
> >
> >     (__bf16) {};
> >     (__bf16) { glob_bfloat };
> > -  (__bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__bf16) { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__bf16) { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__bf16) { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__bf16) { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__bf16) { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -
> > -  (int) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  (float) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  (_Float16) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  (double) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  (short) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > +  (__bf16) { 0 };
> > +  (__bf16) { 0.1 };
> > +  (__bf16) { is_a_float };
> > +  (__bf16) { is_an_int };
> > +  (__bf16) { is_a_float16 };
> > +  (__bf16) { is_a_double };
> > +  (__bf16) { is_a_short_int };
> > +
> > +  (int) { glob_bfloat };
> > +  (float) { glob_bfloat };
> > +  (_Float16) { glob_bfloat };
> > +  (double) { glob_bfloat };
> > +  (short) { glob_bfloat };
> >
> >     /* Arrays and Structs.  */
> >
> > @@ -145,16 +145,16 @@ __bf16 footest (__bf16 scalar0)
> >     bfloat_ptr = &bfloat_ptr3[1];
> >
> >     /* Simple comparison.  */
> > -  scalar0 > glob_bfloat; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  glob_bfloat == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0 > is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  is_a_float == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  0 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  0.1 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0 > is_an_int; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  is_an_int == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  scalar0 > glob_bfloat;
> > +  glob_bfloat == scalar0;
> > +  scalar0 > is_a_float;
> > +  is_a_float == scalar0;
> > +  scalar0 > 0;
> > +  0 == scalar0;
> > +  scalar0 > 0.1;
> > +  0.1 == scalar0;
> > +  scalar0 > is_an_int;
> > +  is_an_int == scalar0;
> >
> >     /* Pointer comparison.  */
> >
> > @@ -174,41 +174,41 @@ __bf16 footest (__bf16 scalar0)
> >     /* Conditional expressions.  */
> >
> >     0 ? scalar0 : scalar0;
> > -  0 ? scalar0 : is_a_float; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  0 ? is_a_float : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  0 ? scalar0 : 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  0 ? 0 : scalar0; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  0 ? 0.1 : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  0 ? scalar0 : 0.1; /* { dg-error {invalid conversion from type '__bf16'} } */
> > +  0 ? scalar0 : is_a_float;
> > +  0 ? is_a_float : scalar0;
> > +  0 ? scalar0 : 0;
> > +  0 ? 0 : scalar0;
> > +  0 ? 0.1 : scalar0;
> > +  0 ? scalar0 : 0.1;
> >     0 ? bfloat_ptr : bfloat_ptr2;
> >     0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
> >     0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
> >
> > -  scalar0 ? scalar0 : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0 ? is_a_float : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0 ? scalar0 : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0 ? is_a_float : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  scalar0 ? scalar0 : scalar0;
> > +  scalar0 ? is_a_float : scalar0;
> > +  scalar0 ? scalar0 : is_a_float;
> > +  scalar0 ? is_a_float : is_a_float;
> >
> >     /* Unary operators.  */
> >
> > -  +scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  -scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  ~scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  !scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  +scalar0;
> > +  -scalar0;
> > +  ~scalar0; /* { dg-error {wrong type argument to bit-complement} } */
> > +  !scalar0;
> >     *scalar0; /* { dg-error {invalid type argument of unary '\*'} } */
> > -  __real scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  __imag scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  ++scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  --scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  __real scalar0;
> > +  __imag scalar0;
> > +  ++scalar0;
> > +  --scalar0;
> > +  scalar0++;
> > +  scalar0--;
> >
> >     /* Binary arithmetic operations.  */
> >
> > -  scalar0 = glob_bfloat + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0 = glob_bfloat + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0 = glob_bfloat + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  scalar0 = glob_bfloat + is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  scalar0 = glob_bfloat + *bfloat_ptr;
> > +  scalar0 = glob_bfloat + 0.1;
> > +  scalar0 = glob_bfloat + 0;
> > +  scalar0 = glob_bfloat + is_a_float;
> >
> >     return scalar0;
> >   }
> > --- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c.jj      2022-10-03 18:00:53.136734057 +0200
> > +++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c 2022-10-13 16:57:09.344769646 +0200
> > @@ -48,20 +48,20 @@ __m128bf16 footest (__m128bf16 vector0)
> >     __m128bf16 vector2_1 = {};
> >     __m128bf16 vector2_2 = { glob_bfloat };
> >     __m128bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
> > -  __m128bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m128bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m128bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m128bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m128bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m128bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -
> > -  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  __m128h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  __m128 initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  __v4si initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  __v4hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> > +  __m128bf16 vector2_4 = { 0 };
> > +  __m128bf16 vector2_5 = { 0.1 };
> > +  __m128bf16 vector2_6 = { is_a_float16 };
> > +  __m128bf16 vector2_7 = { is_a_float };
> > +  __m128bf16 vector2_8 = { is_an_int };
> > +  __m128bf16 vector2_9 = { is_a_short_int };
> > +  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
> > +
> > +  __v8si initi_2_1 = { glob_bfloat };
> > +  __m256 initi_2_2 = { glob_bfloat };
> > +  __m128h initi_2_3 = { glob_bfloat };
> > +  __m128 initi_2_4 = { glob_bfloat };
> > +  __v4si initi_2_5 = { glob_bfloat };
> > +  __v4hi initi_2_6 = { glob_bfloat };
> >
> >     /* Assignments to/from vectors.  */
> >
> > @@ -85,25 +85,25 @@ __m128bf16 footest (__m128bf16 vector0)
> >     /* Assignments to/from elements.  */
> >
> >     vector2_3[0] = glob_bfloat;
> > -  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> > +  vector2_3[0] = is_an_int;
> > +  vector2_3[0] = is_a_short_int;
> > +  vector2_3[0] = is_a_float;
> > +  vector2_3[0] = is_a_float16;
> > +  vector2_3[0] = 0;
> > +  vector2_3[0] = 0.1;
> >
> >     glob_bfloat = vector2_3[0];
> > -  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> > -  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> > +  is_an_int = vector2_3[0];
> > +  is_a_short_int = vector2_3[0];
> > +  is_a_float = vector2_3[0];
> > +  is_a_float16 = vector2_3[0];
> >
> >     /* Compound literals.  */
> >
> >     (__m128bf16) {};
> >
> > -  (__m128bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  (__m128bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> > +  (__m128bf16) { 0 };
> > +  (__m128bf16) { 0.1 };
> >     (__m128bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
> >     (__m128bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
> >     (__m128bf16) { is_a_float_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128'} } */
> > @@ -186,16 +186,16 @@ __m128bf16 footest (__m128bf16 vector0)
> >     bfloat_ptr = &bfloat_ptr3[1];
> >
> >     /* Simple comparison.  */
> > -  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  vector0 > glob_bfloat_vec;
> > +  glob_bfloat_vec == vector0;
> > +  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
> > +  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
> > +  vector0 > 0;
> > +  0 == vector0;
> > +  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> > +  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> > +  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
> > +  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
> >
> >     /* Pointer comparison.  */
> >
> > @@ -234,24 +234,24 @@ __m128bf16 footest (__m128bf16 vector0)
> >
> >     /* Unary operators.  */
> >
> > -  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  +vector0;
> > +  -vector0;
> > +  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
> > +  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
> >     *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
> > -  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  __real vector0; /* { dg-error {wrong type argument to __real} } */
> > +  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
> > +  ++vector0;
> > +  --vector0;
> > +  vector0++;
> > +  vector0--;
> >
> >     /* Binary arithmetic operations.  */
> >
> > -  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> > -  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> > +  vector0 = glob_bfloat_vec + *bfloat_ptr;
> > +  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> > +  vector0 = glob_bfloat_vec + 0;
> > +  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
> >
> >     return vector0;
> >   }
> > --- gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C.jj   2022-10-03 18:00:53.109734421 +0200
> > +++ gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C      2022-10-13 16:57:09.362769399 +0200
> > @@ -5,6 +5,6 @@ void foo (void)
> >   {
> >     __bf16 (); /* { dg-bogus {invalid conversion to type '__bf16'} } */
> >     __bf16 a = __bf16(); /* { dg-bogus {invalid conversion to type '__bf16'} } */
> > -  __bf16 (0x1234); /* { dg-error {invalid conversion to type '__bf16'} } */
> > -  __bf16 (0.1); /* { dg-error {invalid conversion to type '__bf16'} } */
> > +  __bf16 (0x1234); /* { dg-bogus {invalid conversion to type '__bf16'} } */
> > +  __bf16 (0.1); /* { dg-bogus {invalid conversion to type '__bf16'} } */
> >   }
> > --- libcpp/include/cpplib.h.jj        2022-10-03 18:00:53.251732506 +0200
> > +++ libcpp/include/cpplib.h   2022-10-13 16:57:09.384769097 +0200
> > @@ -1275,6 +1275,7 @@ struct cpp_num
> >   #define CPP_N_USERDEF       0x1000000 /* C++11 user-defined literal.  */
> >
> >   #define CPP_N_SIZE_T        0x2000000 /* C++23 size_t literal.  */
> > +#define CPP_N_BFLOAT16       0x4000000 /* std::bfloat16_t type.  */
> >
> >   #define CPP_N_WIDTH_FLOATN_NX       0xF0000000 /* _FloatN / _FloatNx value
> >                                             of N, divided by 16.  */
> > --- libcpp/expr.cc.jj 2022-10-03 18:00:53.221732910 +0200
> > +++ libcpp/expr.cc    2022-10-13 16:58:01.360055690 +0200
> > @@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
> >     size_t orig_len = len;
> >     const uchar *orig_s = s;
> >     size_t flags;
> > -  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
> > +  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
> >
> >     flags = 0;
> > -  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
> > +  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
> >
> >     /* The following decimal float suffixes, from TR 24732:2009, TS
> >        18661-2:2015 and C2X, are supported:
> > @@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
> >        w, W - machine-specific type such as __float80 (GNU extension).
> >        q, Q - machine-specific type such as __float128 (GNU extension).
> >        fN, FN - _FloatN (TS 18661-3:2015).
> > -     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
> > +     fNx, FNx - _FloatNx (TS 18661-3:2015).
> > +     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
> >
> >     /* Process decimal float suffixes, which are two letters starting
> >        with d or D.  Order and case are significant.  */
> > @@ -239,6 +240,19 @@ interpret_float_suffix (cpp_reader *pfil
> >               fn++;
> >           }
> >         break;
> > +     case 'b': case 'B':
> > +       if (len > 2
> > +           /* Except for bf16 / BF16 where case is significant.  */
> > +           && s[1] == (s[0] == 'b' ? 'f' : 'F')
> > +           && s[2] == '1'
> > +           && s[3] == '6')
> > +         {
> > +           bf16++;
> > +           len -= 3;
> > +           s += 3;
> > +           break;
> > +         }
> > +       return 0;
> >       case 'd': case 'D': d++; break;
> >       case 'l': case 'L': l++; break;
> >       case 'w': case 'W': w++; break;
> > @@ -257,7 +271,7 @@ interpret_float_suffix (cpp_reader *pfil
> >        of N larger than can be represented in the return value.  The
> >        caller is responsible for rejecting _FloatN suffixes where
> >        _FloatN is not supported on the chosen target.  */
> > -  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
> > +  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
> >       return 0;
> >     if (fn_bits > CPP_FLOATN_MAX)
> >       return 0;
> > @@ -295,6 +309,7 @@ interpret_float_suffix (cpp_reader *pfil
> >            q ? CPP_N_MD_Q :
> >            fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
> >            fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
> > +          bf16 ? CPP_N_BFLOAT16 :
> >            CPP_N_DEFAULT));
> >   }
> >
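
For readers following the suffix handling, here is a minimal standalone
sketch of the case rule in the hunk above.  It is not libcpp code:
is_bf16_suffix is a hypothetical helper that only looks at a bare
suffix, whereas interpret_float_suffix walks the suffix one character
at a time and also allows an imaginary suffix alongside it.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libcpp: return 1 if the LEN bytes
   at S are exactly "bf16" or "BF16", else 0.  Mixed case such as
   "bF16" is rejected, mirroring the case-significant check in the
   patch.  */
int
is_bf16_suffix (const char *s, size_t len)
{
  if (len != 4)
    return 0;
  if (s[0] != 'b' && s[0] != 'B')
    return 0;
  /* The second letter must match the case of the first one.  */
  return (s[1] == (s[0] == 'b' ? 'f' : 'F')
          && s[2] == '1'
          && s[3] == '6');
}
```

So 1.0bf16 and 1.0BF16 would be accepted under this rule, while 1.0Bf16
would not.
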
> > --- libgcc/config/i386/t-softfp.jj    2022-10-03 18:00:53.314731656 +0200
> > +++ libgcc/config/i386/t-softfp       2022-10-13 16:57:09.426768521 +0200
> > @@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
> >   libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
> >   LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
> >
> > -softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> > -softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> > +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
> > +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
> > +                   tfbf xfbf dfbf sfbf hfbf
> >
> >   softfp_extras += eqhf2
> >
> > @@ -15,11 +16,17 @@ CFLAGS-extendhfsf2.c += -msse2
> >   CFLAGS-extendhfdf2.c += -msse2
> >   CFLAGS-extendhftf2.c += -msse2
> >   CFLAGS-extendhfxf2.c += -msse2
> > +CFLAGS-extendbfsf2.c += -msse2
> >
> >   CFLAGS-truncsfhf2.c += -msse2
> >   CFLAGS-truncdfhf2.c += -msse2
> >   CFLAGS-truncxfhf2.c += -msse2
> >   CFLAGS-trunctfhf2.c += -msse2
> > +CFLAGS-truncsfbf2.c += -msse2
> > +CFLAGS-truncdfbf2.c += -msse2
> > +CFLAGS-truncxfbf2.c += -msse2
> > +CFLAGS-trunctfbf2.c += -msse2
> > +CFLAGS-trunchfbf2.c += -msse2
> >
> >   CFLAGS-eqhf2.c += -msse2
> >   CFLAGS-_divhc3.c += -msse2
> > --- libgcc/config/i386/libgcc-glibc.ver.jj    2022-10-03 18:00:53.313731670 +0200
> > +++ libgcc/config/i386/libgcc-glibc.ver       2022-10-13 16:57:09.438768356 +0200
> > @@ -214,3 +214,13 @@ GCC_12.0.0 {
> >     __trunctfhf2
> >     __truncxfhf2
> >   }
> > +
> > +%inherit GCC_13.0.0 GCC_12.0.0
> > +GCC_13.0.0 {
> > +  __extendbfsf2
> > +  __truncdfbf2
> > +  __truncsfbf2
> > +  __trunctfbf2
> > +  __truncxfbf2
> > +  __trunchfbf2
> > +}
> > --- libgcc/config/i386/sfp-machine.h.jj       2022-10-03 18:00:53.313731670 +0200
> > +++ libgcc/config/i386/sfp-machine.h  2022-10-13 16:57:09.441768315 +0200
> > @@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
> >   #define _FP_QNANNEGATEDP 0
> >
> >   #define _FP_NANSIGN_H               1
> > +#define _FP_NANSIGN_B                1
> >   #define _FP_NANSIGN_S               1
> >   #define _FP_NANSIGN_D               1
> >   #define _FP_NANSIGN_E               1
> > --- libgcc/config/i386/64/sfp-machine.h.jj    2022-10-03 18:00:53.290731980 +0200
> > +++ libgcc/config/i386/64/sfp-machine.h       2022-10-13 16:57:09.451768178 +0200
> > @@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
> >   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
> >
> >   #define _FP_NANFRAC_H               _FP_QNANBIT_H
> > +#define _FP_NANFRAC_B                _FP_QNANBIT_B
> >   #define _FP_NANFRAC_S               _FP_QNANBIT_S
> >   #define _FP_NANFRAC_D               _FP_QNANBIT_D
> >   #define _FP_NANFRAC_E               _FP_QNANBIT_E, 0
> > --- libgcc/config/i386/32/sfp-machine.h.jj    2022-10-03 18:00:53.290731980 +0200
> > +++ libgcc/config/i386/32/sfp-machine.h       2022-10-13 16:57:09.459768068 +0200
> > @@ -87,6 +87,7 @@
> >   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
> >
> >   #define _FP_NANFRAC_H               _FP_QNANBIT_H
> > +#define _FP_NANFRAC_B                _FP_QNANBIT_B
> >   #define _FP_NANFRAC_S               _FP_QNANBIT_S
> >   #define _FP_NANFRAC_D               _FP_QNANBIT_D, 0
> >   /* Even if XFmode is 12byte,  we have to pad it to
> > --- libgcc/soft-fp/brain.h.jj 2022-10-13 16:57:09.460768054 +0200
> > +++ libgcc/soft-fp/brain.h    2022-10-13 16:57:09.459768068 +0200
> > @@ -0,0 +1,172 @@
> > +/* Software floating-point emulation.
> > +   Definitions for Brain Floating Point format (bfloat16).
> > +   Copyright (C) 1997-2022 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   In addition to the permissions in the GNU Lesser General Public
> > +   License, the Free Software Foundation gives you unlimited
> > +   permission to link the compiled version of this file into
> > +   combinations with other programs, and to distribute those
> > +   combinations without any restriction coming from the use of this
> > +   file.  (The Lesser General Public License restrictions do apply in
> > +   other respects; for example, they cover modification of the file,
> > +   and distribution when not linked into a combine executable.)
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#ifndef SOFT_FP_BRAIN_H
> > +#define SOFT_FP_BRAIN_H      1
> > +
> > +#if _FP_W_TYPE_SIZE < 32
> > +# error "Here's a nickel kid.  Go buy yourself a real computer."
> > +#endif
> > +
> > +#define _FP_FRACTBITS_B              (_FP_W_TYPE_SIZE)
> > +
> > +#define _FP_FRACTBITS_DW_B   (_FP_W_TYPE_SIZE)
> > +
> > +#define _FP_FRACBITS_B               8
> > +#define _FP_FRACXBITS_B              (_FP_FRACTBITS_B - _FP_FRACBITS_B)
> > +#define _FP_WFRACBITS_B              (_FP_WORKBITS + _FP_FRACBITS_B)
> > +#define _FP_WFRACXBITS_B     (_FP_FRACTBITS_B - _FP_WFRACBITS_B)
> > +#define _FP_EXPBITS_B                8
> > +#define _FP_EXPBIAS_B                127
> > +#define _FP_EXPMAX_B         255
> > +
> > +#define _FP_QNANBIT_B                ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
> > +#define _FP_QNANBIT_SH_B     ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
> > +#define _FP_IMPLBIT_B                ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
> > +#define _FP_IMPLBIT_SH_B     ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
> > +#define _FP_OVERFLOW_B               ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
> > +
> > +#define _FP_WFRACBITS_DW_B   (2 * _FP_WFRACBITS_B)
> > +#define _FP_WFRACXBITS_DW_B  (_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
> > +#define _FP_HIGHBIT_DW_B     \
> > +  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
> > +
> > +/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
> > +   chosen by the target machine.  */
> > +
> > +typedef float BFtype __attribute__ ((mode (BF)));
> > +
> > +union _FP_UNION_B
> > +{
> > +  BFtype flt;
> > +  struct _FP_STRUCT_LAYOUT
> > +  {
> > +#if __BYTE_ORDER == __BIG_ENDIAN
> > +    unsigned sign : 1;
> > +    unsigned exp  : _FP_EXPBITS_B;
> > +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> > +#else
> > +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> > +    unsigned exp  : _FP_EXPBITS_B;
> > +    unsigned sign : 1;
> > +#endif
> > +  } bits;
> > +};
> > +
> > +#define FP_DECL_B(X)         _FP_DECL (1, X)
> > +#define FP_UNPACK_RAW_B(X, val)      _FP_UNPACK_RAW_1 (B, X, (val))
> > +#define FP_UNPACK_RAW_BP(X, val)     _FP_UNPACK_RAW_1_P (B, X, (val))
> > +#define FP_PACK_RAW_B(val, X)        _FP_PACK_RAW_1 (B, (val), X)
> > +#define FP_PACK_RAW_BP(val, X)                       \
> > +  do                                         \
> > +    {                                                \
> > +      if (!FP_INHIBIT_RESULTS)                       \
> > +     _FP_PACK_RAW_1_P (B, (val), X);         \
> > +    }                                                \
> > +  while (0)
> > +
> > +#define FP_UNPACK_B(X, val)                  \
> > +  do                                         \
> > +    {                                                \
> > +      _FP_UNPACK_RAW_1 (B, X, (val));                \
> > +      _FP_UNPACK_CANONICAL (B, 1, X);                \
> > +    }                                                \
> > +  while (0)
> > +
> > +#define FP_UNPACK_BP(X, val)                 \
> > +  do                                         \
> > +    {                                                \
> > +      _FP_UNPACK_RAW_1_P (B, X, (val));              \
> > +      _FP_UNPACK_CANONICAL (B, 1, X);                \
> > +    }                                                \
> > +  while (0)
> > +
> > +#define FP_UNPACK_SEMIRAW_B(X, val)          \
> > +  do                                         \
> > +    {                                                \
> > +      _FP_UNPACK_RAW_1 (B, X, (val));                \
> > +      _FP_UNPACK_SEMIRAW (B, 1, X);          \
> > +    }                                                \
> > +  while (0)
> > +
> > +#define FP_UNPACK_SEMIRAW_BP(X, val)         \
> > +  do                                         \
> > +    {                                                \
> > +      _FP_UNPACK_RAW_1_P (B, X, (val));              \
> > +      _FP_UNPACK_SEMIRAW (B, 1, X);          \
> > +    }                                                \
> > +  while (0)
> > +
> > +#define FP_PACK_B(val, X)                    \
> > +  do                                         \
> > +    {                                                \
> > +      _FP_PACK_CANONICAL (B, 1, X);          \
> > +      _FP_PACK_RAW_1 (B, (val), X);          \
> > +    }                                                \
> > +  while (0)
> > +
> > +#define FP_PACK_BP(val, X)                   \
> > +  do                                         \
> > +    {                                                \
> > +      _FP_PACK_CANONICAL (B, 1, X);          \
> > +      if (!FP_INHIBIT_RESULTS)                       \
> > +     _FP_PACK_RAW_1_P (B, (val), X);         \
> > +    }                                                \
> > +  while (0)
> > +
> > +#define FP_PACK_SEMIRAW_B(val, X)            \
> > +  do                                         \
> > +    {                                                \
> > +      _FP_PACK_SEMIRAW (B, 1, X);            \
> > +      _FP_PACK_RAW_1 (B, (val), X);          \
> > +    }                                                \
> > +  while (0)
> > +
> > +#define FP_PACK_SEMIRAW_BP(val, X)           \
> > +  do                                         \
> > +    {                                                \
> > +      _FP_PACK_SEMIRAW (B, 1, X);            \
> > +      if (!FP_INHIBIT_RESULTS)                       \
> > +     _FP_PACK_RAW_1_P (B, (val), X);         \
> > +    }                                                \
> > +  while (0)
> > +
> > +#define FP_TO_INT_B(r, X, rsz, rsg)  _FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
> > +#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)    \
> > +  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
> > +#define FP_FROM_INT_B(X, r, rs, rt)  _FP_FROM_INT (B, 1, X, (r), (rs), rt)
> > +
> > +/* BFmode arithmetic is not implemented.  */
> > +
> > +#define _FP_FRAC_HIGH_B(X)   _FP_FRAC_HIGH_1 (X)
> > +#define _FP_FRAC_HIGH_RAW_B(X)       _FP_FRAC_HIGH_1 (X)
> > +#define _FP_FRAC_HIGH_DW_B(X)        _FP_FRAC_HIGH_1 (X)
> > +
> > +#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
> > +
> > +#endif /* !SOFT_FP_BRAIN_H */
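
To make the layout in brain.h concrete: _FP_FRACBITS_B is 8 including
the implicit bit, so the stored format is 1 sign bit, 8 exponent bits
and 7 explicit fraction bits, i.e. the upper half of an IEEE single.
The sketch below works on raw 16-bit patterns instead of the BFtype
used in the patch.  The struct and function names are made up for
illustration, and it assumes float is IEEE binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical unpacked view of a bfloat16 bit pattern.  */
struct bf16_fields
{
  unsigned sign;  /* 1 bit.  */
  unsigned exp;   /* 8 bits, biased by 127 (_FP_EXPBIAS_B).  */
  unsigned frac;  /* 7 explicit fraction bits.  */
};

/* Split raw bfloat16 bits into sign, exponent and fraction.  */
struct bf16_fields
bf16_unpack_bits (uint16_t b)
{
  struct bf16_fields f;
  f.sign = b >> 15;
  f.exp = (b >> 7) & 0xffu;
  f.frac = b & 0x7fu;
  return f;
}

/* Widen raw bfloat16 bits to float.  This is exact because the
   bfloat16 bits are the top 16 bits of the binary32 encoding.  */
float
bf16_bits_to_float (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}
```

For example 0x3f80 unpacks to sign 0, exponent 127, fraction 0 and
widens to 1.0f.
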
> > --- libgcc/soft-fp/truncsfbf2.c.jj    2022-10-13 16:57:09.460768054 +0200
> > +++ libgcc/soft-fp/truncsfbf2.c       2022-10-13 16:57:09.460768054 +0200
> > @@ -0,0 +1,48 @@
> > +/* Software floating-point emulation.
> > +   Truncate IEEE single into bfloat16.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   In addition to the permissions in the GNU Lesser General Public
> > +   License, the Free Software Foundation gives you unlimited
> > +   permission to link the compiled version of this file into
> > +   combinations with other programs, and to distribute those
> > +   combinations without any restriction coming from the use of this
> > +   file.  (The Lesser General Public License restrictions do apply in
> > +   other respects; for example, they cover modification of the file,
> > +   and distribution when not linked into a combine executable.)
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#include "soft-fp.h"
> > +#include "brain.h"
> > +#include "single.h"
> > +
> > +BFtype
> > +__truncsfbf2 (SFtype a)
> > +{
> > +  FP_DECL_EX;
> > +  FP_DECL_S (A);
> > +  FP_DECL_B (R);
> > +  BFtype r;
> > +
> > +  FP_INIT_ROUNDMODE;
> > +  FP_UNPACK_SEMIRAW_S (A, a);
> > +  FP_TRUNC (B, S, 1, 1, R, A);
> > +  FP_PACK_SEMIRAW_B (r, R);
> > +  FP_HANDLE_EXCEPTIONS;
> > +
> > +  return r;
> > +}
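
As a rough picture of what __truncsfbf2 computes in the default
rounding mode, here is a bit-level sketch of a float to bfloat16
narrowing with round-to-nearest-even.  This is not the soft-fp
implementation: float_to_bf16_bits is a hypothetical name, it returns
raw bits instead of BFtype, it assumes float is IEEE binary32, and it
ignores the dynamic rounding mode and exception flags that
FP_INIT_ROUNDMODE and FP_HANDLE_EXCEPTIONS take care of in the patch.

```c
#include <stdint.h>
#include <string.h>

/* Narrow a float to raw bfloat16 bits, rounding to nearest with ties
   to even.  Sketch only: no rounding-mode or exception support.  */
uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);

  /* NaN: keep sign and top payload bits, force the quiet bit
     (bit 6 of the result, matching _FP_QNANBIT_B).  */
  if ((u & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((u >> 16) | 0x0040u);

  /* Add 0x7fff plus the lowest kept bit, so exact ties round to the
     even neighbour; a carry out of the fraction bumps the exponent,
     and overflow naturally yields infinity.  */
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16);
}
```

With this rule 1.0f maps to 0x3f80, and the tie 1.00390625f (1 + 2^-8)
also maps to 0x3f80 because that neighbour is even.
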
> > --- libgcc/soft-fp/truncdfbf2.c.jj    2022-10-13 16:57:09.460768054 +0200
> > +++ libgcc/soft-fp/truncdfbf2.c       2022-10-13 16:57:09.460768054 +0200
> > @@ -0,0 +1,52 @@
> > +/* Software floating-point emulation.
> > +   Truncate IEEE double into bfloat16.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   In addition to the permissions in the GNU Lesser General Public
> > +   License, the Free Software Foundation gives you unlimited
> > +   permission to link the compiled version of this file into
> > +   combinations with other programs, and to distribute those
> > +   combinations without any restriction coming from the use of this
> > +   file.  (The Lesser General Public License restrictions do apply in
> > +   other respects; for example, they cover modification of the file,
> > +   and distribution when not linked into a combine executable.)
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#include "soft-fp.h"
> > +#include "brain.h"
> > +#include "double.h"
> > +
> > +BFtype
> > +__truncdfbf2 (DFtype a)
> > +{
> > +  FP_DECL_EX;
> > +  FP_DECL_D (A);
> > +  FP_DECL_B (R);
> > +  BFtype r;
> > +
> > +  FP_INIT_ROUNDMODE;
> > +  FP_UNPACK_SEMIRAW_D (A, a);
> > +#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
> > +  FP_TRUNC (B, D, 1, 2, R, A);
> > +#else
> > +  FP_TRUNC (B, D, 1, 1, R, A);
> > +#endif
> > +  FP_PACK_SEMIRAW_B (r, R);
> > +  FP_HANDLE_EXCEPTIONS;
> > +
> > +  return r;
> > +}
> > --- libgcc/soft-fp/truncxfbf2.c.jj    2022-10-13 16:57:09.460768054 +0200
> > +++ libgcc/soft-fp/truncxfbf2.c       2022-10-13 16:57:09.460768054 +0200
> > @@ -0,0 +1,52 @@
> > +/* Software floating-point emulation.
> > +   Truncate IEEE extended into bfloat16.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   In addition to the permissions in the GNU Lesser General Public
> > +   License, the Free Software Foundation gives you unlimited
> > +   permission to link the compiled version of this file into
> > +   combinations with other programs, and to distribute those
> > +   combinations without any restriction coming from the use of this
> > +   file.  (The Lesser General Public License restrictions do apply in
> > +   other respects; for example, they cover modification of the file,
> > +   and distribution when not linked into a combine executable.)
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#include "soft-fp.h"
> > +#include "brain.h"
> > +#include "extended.h"
> > +
> > +BFtype
> > +__truncxfbf2 (XFtype a)
> > +{
> > +  FP_DECL_EX;
> > +  FP_DECL_E (A);
> > +  FP_DECL_B (R);
> > +  BFtype r;
> > +
> > +  FP_INIT_ROUNDMODE;
> > +  FP_UNPACK_SEMIRAW_E (A, a);
> > +#if _FP_W_TYPE_SIZE < 64
> > +  FP_TRUNC (B, E, 1, 4, R, A);
> > +#else
> > +  FP_TRUNC (B, E, 1, 2, R, A);
> > +#endif
> > +  FP_PACK_SEMIRAW_B (r, R);
> > +  FP_HANDLE_EXCEPTIONS;
> > +
> > +  return r;
> > +}
> > --- libgcc/soft-fp/trunctfbf2.c.jj    2022-10-13 16:57:09.460768054 +0200
> > +++ libgcc/soft-fp/trunctfbf2.c       2022-10-13 16:57:09.460768054 +0200
> > @@ -0,0 +1,52 @@
> > +/* Software floating-point emulation.
> > +   Truncate IEEE quad into bfloat16.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   In addition to the permissions in the GNU Lesser General Public
> > +   License, the Free Software Foundation gives you unlimited
> > +   permission to link the compiled version of this file into
> > +   combinations with other programs, and to distribute those
> > +   combinations without any restriction coming from the use of this
> > +   file.  (The Lesser General Public License restrictions do apply in
> > +   other respects; for example, they cover modification of the file,
> > +   and distribution when not linked into a combine executable.)
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include "soft-fp.h"
> > +#include "brain.h"
> > +#include "quad.h"
> > +
> > +BFtype
> > +__trunctfbf2 (TFtype a)
> > +{
> > +  FP_DECL_EX;
> > +  FP_DECL_Q (A);
> > +  FP_DECL_B (R);
> > +  BFtype r;
> > +
> > +  FP_INIT_ROUNDMODE;
> > +  FP_UNPACK_SEMIRAW_Q (A, a);
> > +#if _FP_W_TYPE_SIZE < 64
> > +  FP_TRUNC (B, Q, 1, 4, R, A);
> > +#else
> > +  FP_TRUNC (B, Q, 1, 2, R, A);
> > +#endif
> > +  FP_PACK_SEMIRAW_B (r, R);
> > +  FP_HANDLE_EXCEPTIONS;
> > +
> > +  return r;
> > +}
> > --- libgcc/soft-fp/trunchfbf2.c.jj    2022-10-13 16:57:09.460768054 +0200
> > +++ libgcc/soft-fp/trunchfbf2.c       2022-10-13 16:57:09.460768054 +0200
> > @@ -0,0 +1,58 @@
> > +/* Software floating-point emulation.
> > +   Truncate IEEE half into bfloat16.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   In addition to the permissions in the GNU Lesser General Public
> > +   License, the Free Software Foundation gives you unlimited
> > +   permission to link the compiled version of this file into
> > +   combinations with other programs, and to distribute those
> > +   combinations without any restriction coming from the use of this
> > +   file.  (The Lesser General Public License restrictions do apply in
> > +   other respects; for example, they cover modification of the file,
> > +   and distribution when not linked into a combine executable.)
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#include "soft-fp.h"
> > +#include "brain.h"
> > +#include "half.h"
> > +#include "single.h"
> > +
> > +/* BFtype and HFtype are unordered, neither is a superset or subset
> > +   of each other.  Convert HFtype to SFtype (lossless) and then
> > +   truncate to BFtype.  */
> > +
> > +BFtype
> > +__trunchfbf2 (HFtype a)
> > +{
> > +  FP_DECL_EX;
> > +  FP_DECL_H (A);
> > +  FP_DECL_S (B);
> > +  FP_DECL_B (R);
> > +  SFtype b;
> > +  BFtype r;
> > +
> > +  FP_INIT_ROUNDMODE;
> > +  FP_UNPACK_RAW_H (A, a);
> > +  FP_EXTEND (S, H, 1, 1, B, A);
> > +  FP_PACK_RAW_S (b, B);
> > +  FP_UNPACK_SEMIRAW_S (B, b);
> > +  FP_TRUNC (B, S, 1, 1, R, B);
> > +  FP_PACK_SEMIRAW_B (r, R);
> > +  FP_HANDLE_EXCEPTIONS;
> > +
> > +  return r;
> > +}
> > --- libgcc/soft-fp/truncbfhf2.c.jj    2022-10-13 16:57:09.460768054 +0200
> > +++ libgcc/soft-fp/truncbfhf2.c       2022-10-13 16:57:09.460768054 +0200
> > @@ -0,0 +1,75 @@
> > +/* Software floating-point emulation.
> > +   Truncate bfloat16 into IEEE half.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   In addition to the permissions in the GNU Lesser General Public
> > +   License, the Free Software Foundation gives you unlimited
> > +   permission to link the compiled version of this file into
> > +   combinations with other programs, and to distribute those
> > +   combinations without any restriction coming from the use of this
> > +   file.  (The Lesser General Public License restrictions do apply in
> > +   other respects; for example, they cover modification of the file,
> > +   and distribution when not linked into a combine executable.)
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#include "soft-fp.h"
> > +#include "half.h"
> > +#include "brain.h"
> > +#include "single.h"
> > +
> > +/* BFtype and HFtype are unordered, neither is a superset or subset
> > +   of each other.  Convert BFtype to SFtype (lossless) and then
> > +   truncate to HFtype.  */
> > +
> > +HFtype
> > +__truncbfhf2 (BFtype a)
> > +{
> > +  FP_DECL_EX;
> > +  FP_DECL_H (A);
> > +  FP_DECL_S (B);
> > +  FP_DECL_B (R);
> > +  SFtype b;
> > +  HFtype r;
> > +
> > +  FP_INIT_ROUNDMODE;
> > +  /* Optimize BFtype to SFtype conversion to simple left shift
> > +     by 16 if possible, we don't need to raise exceptions on sNaN
> > +     here as the SFtype to HFtype truncation should do that too.  */
> > +  if (sizeof (BFtype) == 2
> > +      && sizeof (unsigned short) == 2
> > +      && sizeof (SFtype) == 4
> > +      && sizeof (unsigned int) == 4)
> > +    {
> > +      union { BFtype a; unsigned short b; } u1;
> > +      union { SFtype a; unsigned int b; } u2;
> > +      u1.a = a;
> > +      u2.b = (u1.b << 8) << 8;
> > +      b = u2.a;
> > +    }
> > +  else
> > +    {
> > +      FP_UNPACK_RAW_B (A, a);
> > +      FP_EXTEND (S, B, 1, 1, B, A);
> > +      FP_PACK_RAW_S (b, B);
> > +    }
> > +  FP_UNPACK_SEMIRAW_S (B, b);
> > +  FP_TRUNC (H, S, 1, 1, R, B);
> > +  FP_PACK_SEMIRAW_H (r, R);
> > +  FP_HANDLE_EXCEPTIONS;
> > +
> > +  return r;
> > +}
> > --- libgcc/soft-fp/extendbfsf2.c.jj   2022-10-13 16:57:09.460768054 +0200
> > +++ libgcc/soft-fp/extendbfsf2.c      2022-10-13 16:57:09.460768054 +0200
> > @@ -0,0 +1,49 @@
> > +/* Software floating-point emulation.
> > +   Return a bfloat16 converted to IEEE single.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   In addition to the permissions in the GNU Lesser General Public
> > +   License, the Free Software Foundation gives you unlimited
> > +   permission to link the compiled version of this file into
> > +   combinations with other programs, and to distribute those
> > +   combinations without any restriction coming from the use of this
> > +   file.  (The Lesser General Public License restrictions do apply in
> > +   other respects; for example, they cover modification of the file,
> > +   and distribution when not linked into a combine executable.)
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#define FP_NO_EXACT_UNDERFLOW
> > +#include "soft-fp.h"
> > +#include "brain.h"
> > +#include "single.h"
> > +
> > +SFtype
> > +__extendbfsf2 (BFtype a)
> > +{
> > +  FP_DECL_EX;
> > +  FP_DECL_B (A);
> > +  FP_DECL_S (R);
> > +  SFtype r;
> > +
> > +  FP_INIT_EXCEPTIONS;
> > +  FP_UNPACK_RAW_B (A, a);
> > +  FP_EXTEND (S, B, 1, 1, R, A);
> > +  FP_PACK_RAW_S (r, R);
> > +  FP_HANDLE_EXCEPTIONS;
> > +
> > +  return r;
> > +}
> > --- libiberty/cp-demangle.h.jj        2022-10-03 18:00:53.342731278 +0200
> > +++ libiberty/cp-demangle.h   2022-10-13 16:57:09.488767670 +0200
> > @@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
> >   extern const struct demangle_operator_info cplus_demangle_operators[];
> >   #endif
> >
> > -#define D_BUILTIN_TYPE_COUNT (35)
> > +#define D_BUILTIN_TYPE_COUNT (36)
> >
> >   CP_STATIC_IF_GLIBCPP_V3
> >   const struct demangle_builtin_type_info
> > --- libiberty/cp-demangle.c.jj        2022-10-11 14:50:14.605771753 +0200
> > +++ libiberty/cp-demangle.c   2022-10-13 16:57:09.538766983 +0200
> > @@ -2487,6 +2487,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
> >     /* 33 */ { NL ("decltype(nullptr)"),      NL ("decltype(nullptr)"),
> >            D_PRINT_DEFAULT },
> >     /* 34 */ { NL ("_Float"), NL ("_Float"),          D_PRINT_FLOAT },
> > +  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
> >   };
> >
> >   CP_STATIC_IF_GLIBCPP_V3
> > @@ -2751,11 +2752,22 @@ cplus_demangle_type (struct d_info *di)
> >
> >       case 'F':
> >         /* DF<number>_ - _Float<number>.
> > -          DF<number>x - _Float<number>x.  */
> > +          DF<number>x - _Float<number>x
> > +          DF16b - std::bfloat16_t.  */
> >         {
> >           int arg = d_number (di);
> >           char buf[12];
> >           char suffix = 0;
> > +         if (d_peek_char (di) == 'b')
> > +           {
> > +             if (arg != 16)
> > +               return NULL;
> > +             d_advance (di, 1);
> > +             ret = d_make_builtin_type (di,
> > +                                        &cplus_demangle_builtin_types[35]);
> > +             di->expansion += ret->u.s_builtin.type->len;
> > +             break;
> > +           }
> >           if (d_peek_char (di) == 'x')
> >             suffix = 'x';
> >           if (!suffix && d_peek_char (di) != '_')
> > --- libiberty/testsuite/demangle-expected.jj  2022-10-11 14:50:14.618771575 +0200
> > +++ libiberty/testsuite/demangle-expected     2022-10-13 16:57:09.553766778 +0200
> > @@ -1249,6 +1249,10 @@ xxx
> >   _Z3xxxDF32xDF64xDF128xCDF32xVb
> >   xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
> >   xxx
> > +--format=auto --no-params
> > +_Z3xxxDF16b
> > +xxx(std::bfloat16_t)
> > +xxx
> >   # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
> >   --format=auto --no-params
> >   _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z
> >
> >
> >       Jakub
> >
>


* Re: [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support
  2022-10-13 21:11                 ` Uros Bizjak
@ 2022-10-13 21:35                   ` Jakub Jelinek
  2022-10-13 21:46                     ` Uros Bizjak
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2022-10-13 21:35 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Jason Merrill, Joseph S. Myers, Richard Biener, Jeff Law, gcc-patches

On Thu, Oct 13, 2022 at 11:11:53PM +0200, Uros Bizjak wrote:
> > > +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> > > +                        SFmode, NULL_RTX, NULL,
> > > +                        as_a <rtx_code_label *> (operands[3]),
> > > +                        /* Unfortunately this isn't propagated.  */
> > > +                        profile_probability::even ());
> 
> You could use ix86_expand_branch instead of do_compare_rtx_and_jump
> here. This would expand in SFmode, so insn condition from cbranchsf4
> should be copied here:
> 
>   "TARGET_80387 || (SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"
> 
> Additionally, ix86_fp_comparison_operator predicate should be used for
> operator0. Basically, just copy predicates from cbranchsf4 as we are
> effectively expanding the SFmode compare & branch.

I used the generic routine there precisely so that it handles not just
ix86_fp_comparison_operator, but also comparisons that are more complex
than that (those that need 2 comparisons).
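As an illustration only (not part of the patch, and an assumption about
which cases "need 2 comparisons"): a ucomiss-style compare sets ZF for
both "equal" and "unordered", so an IEEE == or != has to look at PF as
well, while > is a single flag condition.  The struct and function names
below are hypothetical; this is a toy model of the flags, not GCC code.

```c
#include <math.h>
#include <stdbool.h>

/* Toy model of the three flags a ucomiss-style compare of A and B sets.  */
struct fp_flags
{
  bool zf, pf, cf;
};

static struct fp_flags
model_ucomiss (float a, float b)
{
  struct fp_flags f;
  if (isnan (a) || isnan (b))
    f.zf = f.pf = f.cf = true;	/* Unordered: all three flags set.  */
  else
    {
      f.zf = (a == b);
      f.pf = false;
      f.cf = (a < b);
    }
  return f;
}

/* a > b: one condition (the "above" test, CF and ZF both clear).  */
static bool
fp_gt (float a, float b)
{
  struct fp_flags f = model_ucomiss (a, b);
  return !f.cf && !f.zf;
}

/* a == b: ZF alone also fires for unordered, so PF is checked too.  */
static bool
fp_eq (float a, float b)
{
  struct fp_flags f = model_ucomiss (a, b);
  return f.zf && !f.pf;
}

/* a != b: true for unordered operands, again two flag checks.  */
static bool
fp_ne (float a, float b)
{
  struct fp_flags f = model_ucomiss (a, b);
  return !f.zf || f.pf;
}
```

With NaN operands fp_eq yields false and fp_ne yields true, which a lone
ZF test would get wrong.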

For the ix86_fp_comparison_operator cases the optabs wouldn't be strictly
needed: the generic code would see that e.g. cbranchbf4 isn't supported,
try cbranchsf4 and succeed with that.  The only disadvantage would be
that, unless -ffast-math is used, the BFmode -> SFmode extensions would
be performed using library functions, although they can be handled by
left shifting the 16 BFmode bits into the most significant 16 bits of
SFmode even when honoring NaNs.

For the non-ix86_fp_comparison_operator cases, however, the generic
behavior is that none of cbranchbf4, cbranchsf4, cbranchdf4, cbranchxf4
and cbranchtf4 works out, and the generic code emits a libcall
(__{eq,ne}bf2).  I bet that is the reason why libgcc contains
__{eq,ne}hf2 entrypoints.

I wanted to avoid adding __{eq,ne}bf2, and adding cbranchbf4/cstorebf4
was how I managed to do that: they tell the generic code that it can
handle those comparisons through the faster BFmode to SFmode conversions
of the operands followed by one or two bit checks.
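The BFmode -> SFmode widening by a left shift can be sketched in plain C
as follows.  This is a minimal illustration, not the RTL the patch emits;
it assumes float is IEEE binary32 with the same byte order as uint32_t,
and both helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Widen raw bfloat16 bits to float: the 16 bits become the most
   significant half of the binary32 encoding and the low 16 bits are
   zero.  Every bfloat16 value, NaNs included, maps to the float with
   the same sign, exponent and leading mantissa bits.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Helper to look at the resulting encoding.  */
static uint32_t
float_to_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```

For example, the bfloat16 encoding 0x3f80 widens to 0x3f800000, which is
1.0f, and the quiet NaN 0x7fc0 widens to the float quiet NaN 0x7fc00000.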

I guess another possibility would be to call ix86_expand_branch there
once or twice and repeat what the generic code does, or add the
libgcc entrypoints which would perhaps bypass soft-fp and just do the
shifts + SFmode comparison.
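A rough sketch of what such an entrypoint could look like, purely as an
assumption: the name sketch_eqbf2, the use of raw uint16_t bits instead
of a BFtype argument and the helper widen_bf16 are all hypothetical, and
only the return convention (zero if neither operand is NaN and the
operands are equal) is borrowed from the documented __eqsf2 routine.
It also assumes float is IEEE binary32.

```c
#include <stdint.h>
#include <string.h>

/* Place the 16 bfloat16 bits in the top half of a binary32 value.  */
static float
widen_bf16 (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Hypothetical comparison entrypoint: returns zero if neither operand
   is NaN and the operands compare equal, nonzero otherwise.  The work
   is a shift per operand plus one float comparison, no soft-fp.  */
int
sketch_eqbf2 (uint16_t a, uint16_t b)
{
  return !(widen_bf16 (a) == widen_bf16 (b));
}
```

Because the comparison is done on floats, +0 and -0 compare equal and a
NaN never compares equal to anything, itself included.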

> > > +  else
> > > +    {
> > > +      rtx t2 = gen_reg_rtx (SImode);
> > > +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> > > +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> > > +      op2 = gen_lowpart (SFmode, t2);
> > > +    }
> 
> Similar to cbranch above, use ix86_expand_setcc and copy predicates
> from cstoresf4.

Ditto here; the generic code effectively requires cstore once cbranch
is implemented.

	Jakub


* Re: [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support
  2022-10-13 21:35                   ` Jakub Jelinek
@ 2022-10-13 21:46                     ` Uros Bizjak
  0 siblings, 0 replies; 22+ messages in thread
From: Uros Bizjak @ 2022-10-13 21:46 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jason Merrill, Joseph S. Myers, Richard Biener, Jeff Law, gcc-patches

On Thu, Oct 13, 2022 at 11:35 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Oct 13, 2022 at 11:11:53PM +0200, Uros Bizjak wrote:
> > > > +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> > > > +                        SFmode, NULL_RTX, NULL,
> > > > +                        as_a <rtx_code_label *> (operands[3]),
> > > > +                        /* Unfortunately this isn't propagated.  */
> > > > +                        profile_probability::even ());
> >
> > You could use ix86_expand_branch instead of do_compare_rtx_and_jump
> > here. This would expand in SFmode, so insn condition from cbranchsf4
> > should be copied here:
> >
> >   "TARGET_80387 || (SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"
> >
> > Additionally, ix86_fp_comparison_operator predicate should be used for
> > operator0. Basically, just copy predicates from cbranchsf4 as we are
> > effectively expanding the SFmode compare & branch.
>
> I used the generic routine there precisely so that it handles not just
> ix86_fp_comparison_operator, but also comparisons that are more complex
> than that (those that need 2 comparisons).
>
> For the ix86_fp_comparison_operator cases the optabs wouldn't be strictly
> needed: the generic code would see that e.g. cbranchbf4 isn't supported,
> try cbranchsf4 and succeed with that.  The only disadvantage would be
> that, unless -ffast-math is used, the BFmode -> SFmode extensions would
> be performed using library functions, although they can be handled by
> left shifting the 16 BFmode bits into the most significant 16 bits of
> SFmode even when honoring NaNs.
>
> For the non-ix86_fp_comparison_operator cases, however, the generic
> behavior is that none of cbranchbf4, cbranchsf4, cbranchdf4, cbranchxf4
> and cbranchtf4 works out, and the generic code emits a libcall
> (__{eq,ne}bf2).  I bet that is the reason why libgcc contains
> __{eq,ne}hf2 entrypoints.
>
> I wanted to avoid adding __{eq,ne}bf2, and adding cbranchbf4/cstorebf4
> was how I managed to do that: they tell the generic code that it can
> handle those comparisons through the faster BFmode to SFmode conversions
> of the operands followed by one or two bit checks.

Thanks for the explanation, I see the intention now.

The patch is OK as is.

Thanks,
Uros.

> I guess another possibility would be to call ix86_expand_branch there
> once or twice and repeat what the generic code does, or add the
> libgcc entrypoints which would perhaps bypass soft-fp and just do the
> shifts + SFmode comparison.
>
> > > > +  else
> > > > +    {
> > > > +      rtx t2 = gen_reg_rtx (SImode);
> > > > +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> > > > +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> > > > +      op2 = gen_lowpart (SFmode, t2);
> > > > +    }
> >
> > Similar to cbranch above, use ix86_expand_setcc and copy predicates
> > from cstoresf4.
>
> Ditto here; the generic code effectively requires cstore once cbranch
> is implemented.
>
>         Jakub
>


Thread overview: 22+ messages
2022-09-29 15:55 [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support Jakub Jelinek
2022-09-30 13:49 ` Jason Merrill
2022-09-30 14:08   ` Jakub Jelinek
2022-09-30 18:21     ` Joseph Myers
2022-09-30 18:38       ` Jakub Jelinek
2022-09-30 19:27         ` Jonathan Wakely
2022-10-04  9:06     ` [PATCH] middle-end, c++, i386, " Jakub Jelinek
2022-10-04 15:54       ` Joseph Myers
2022-10-04 21:50       ` Jason Merrill
2022-10-05 13:47         ` Jakub Jelinek
2022-10-05 20:02           ` Jason Merrill
2022-10-12  8:23             ` [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE Jakub Jelinek
2022-10-12 10:15               ` Richard Sandiford
2022-10-12 11:07                 ` [PATCH] machmode, v2: " Jakub Jelinek
2022-10-12 11:49                   ` Richard Sandiford
2022-10-12 10:37               ` [PATCH] machmode: " Eric Botcazou
2022-10-12 10:57                 ` Jakub Jelinek
2022-10-13 16:50             ` [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support Jakub Jelinek
2022-10-13 19:37               ` Jason Merrill
2022-10-13 21:11                 ` Uros Bizjak
2022-10-13 21:35                   ` Jakub Jelinek
2022-10-13 21:46                     ` Uros Bizjak
