Re: [PATCH] middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jason Merrill <jason@redhat.com>
To: Jakub Jelinek <jakub@redhat.com>,
	"Joseph S. Myers" <joseph@codesourcery.com>,
	Richard Biener <rguenther@suse.de>,
	Jeff Law <jeffreyalaw@gmail.com>, Uros Bizjak <ubizjak@gmail.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support
Date: Tue, 4 Oct 2022 17:50:50 -0400	[thread overview]
Message-ID: <55062a15-79a1-f8cf-ed20-25ca8ff42abe@redhat.com> (raw)
In-Reply-To: <Yzv3kyZFBYlJpeyL@tucnak>

On 10/4/22 05:06, Jakub Jelinek wrote:
> On Fri, Sep 30, 2022 at 04:08:10PM +0200, Jakub Jelinek via Gcc-patches wrote:
>> On Fri, Sep 30, 2022 at 09:49:08AM -0400, Jason Merrill wrote:
>>> The comment from Apple on the ABI mangling proposal suggests to me that we
>>> might want to delay enabling C++ std::bfloat16_t (i.e. defining
>>> __STDCPP_BFLOAT16_T__) until we have that excess precision support?
>>
>> I saw that comment.  We have similar problem with _Float16 too, where C++
>> effectively right now works as when one uses -fexcess-precision=16 in C
>> (which isn't default).
>> I can see how hard would it be to add EXCESS_PRECISION_EXPR support to C++
>> FE.
> 
> I've started on that but it will take some time.  That said, it should
> work though less efficiently even without that, even in C users can always
> select request such behavior with -fexcess-precision=16.
> 
>>> If we're using DF32x for _Float32x, maybe we want DF16b for bfloat16?
>>
>> Perhaps, I just followed what was in the pull request.  Can change it.
> 
> Changed now, added support for the builtins and ported most of the
> float16 tests, so that it gets at least some test coverage.
> Also, for now I've left the aarch64 and arm changes out of the patch,
> because I haven't tested it on aarch64 yet and arm support was incomplete
> and I haven't heard from the ARM maintainers yet what they want or don't
> want.
> 
> The added testcases showed a few problems.  One is that i?86 maintains
> 2 kinds of fp comparisons, trivial and non-trivial, the trivial which can
> be handled by just a single conditional jump or setCC are handled directly,
> while the complex ones which need two are not handled and the generic
> code then figures it out using the trivial ones.  Unfortunately this means
> that for == and != we end up with libcalls for it.  For _Float16, we have
> added __nehf2 and __eqhf2 entrypoints last year.  I wanted to avoid doing
> the same for __bf16, so I've added cbranchbf4 and cstorebf4 expanders
> that handle all fp comparisons and internally just shift the operands up
> to construct SFmode without even handling sNaNs and then call the generic
> code to handle SFmode comparisons.
> 
> Another problem is for HFmode comparisons, when we see we don't support
> directly some HFmode comparison, we iterate on wider scalar float modes
> and look for usable comparisons, but BFmode and HFmode are unordered and
> one of them has to appear as wider but neither is a subset nor superset,
> so I had to skip wider modes which have equal precision to the starting one.
> Yet another problem is because I've only enabled the bf16/BF16 suffixes in
> C++ because for C it might clash with some later extension.  Am I right to
> fear about that, or do you think C will never standardize suffixes that
> would clash with that because C++ standardized the bf16/BF16 suffixes for
> something already?  If I could enable it, I'd always pedwarn for C for those
> and could enable the __BF16_*__ macros.  Right now I had to disable some
> -fbuilding-libgcc macros because of that (though nothing really uses them
> right now).
> 
> Another question is the suffixes of the builtins.  For now I have added
> bf16 suffix and enabled the builtins with !both_p, so one always needs to
> use __builtin_* form for them.  None of the GCC builtins end with b,
> so this isn't ambiguous with __builtin_*f16, but some libm functions do end
> with b, in particular ilogb, logb and f{??,??x}sub.  ilogb and the subs
> always have it, but is __builtin_logbf16 f16 suffixed logb or bf16 suffixed
> log?  Shall the builtins use f16b suffixes instead like the mangling does?

Do we want bf16 builtins at all?  The impression I've gotten is that 
users want computation to happen in SFmode and only later truncate back 
to BFmode.

> Full patch bootstrapped/regtested on x86_64-linux and i686-linux.
> 
> 2022-10-04  Jakub Jelinek  <jakub@redhat.com>
> 
> gcc/
> 	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
> 	* tree.h (bfloat16_type_node): Define.
> 	(CASE_FLT_FN_FLOATN_NX): Also include BUILT_IN_*BF16.
> 	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
> 	like float16_type_mode.
> 	(build_common_tree_nodes): Initialize bfloat16_type_node if
> 	BFmode is supported.
> 	* expmed.h (maybe_expand_shift): Declare.
> 	* expmed.cc (maybe_expand_shift): No longer static.
> 	(emit_store_flag_1): Don't consider [BH]Fmode as wider mode to
> 	narrower modes.
> 	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
> 	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
> 	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
> 	-ffast-math generic implementation for BF -> SF and SF -> BF
> 	conversions.
> 	* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16,
> 	BT_FN_BFLOAT16_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING,
> 	BT_FN_BFLOAT16_BFLOAT16_BFLOAT16,
> 	BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16): New.
> 	* builtins.def (DEF_GCC_FLOATN_NX_BUILTINS,
> 	DEF_EXT_LIB_FLOATN_NX_BUILTINS): Also add *bf16 suffixed builtins,
> 	but for these only __builtin_ prefixed functions.
> 	* optabs.cc (can_compare_p, prepare_cmp_insn): Don't consider
> 	[BH]Fmode as wider mode to narrower modes.
> 	* config/i386/i386.cc (classify_argument): Handle E_BCmode.
> 	(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
> 	for -msse2.
> 	(ix86_mangle_type): Mangle BFmode as DF16b.
> 	(ix86_invalid_conversion, ix86_invalid_unary_op,
> 	ix86_invalid_binary_op): Remove.
> 	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
> 	TARGET_INVALID_BINARY_OP): Don't redefine.
> 	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
> 	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
> 	ix86_bf16_type_node, only create it if still NULL.
> 	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
> 	* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
> gcc/c-family/
> 	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
> 	predefine for C++ __BFLT16_*__ macros and for C++23 also
> 	__STDCPP_BFLOAT16_T__.
> 	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16 for C++.
> gcc/c/
> 	* c-typeck.cc (convert_arguments): Don't promote __bf16 to
> 	double.
> gcc/cp/
> 	* cp-tree.h (extended_float_type_p): Return true for
> 	bfloat16_type_node.
> 	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
> 	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
> gcc/testsuite/
> 	* lib/target-supports.exp (check_effective_target_bfloat16,
> 	check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
> 	New.
> 	* gcc.dg/torture/bfloat16-basic.c: New test.
> 	* gcc.dg/torture/bfloat16-builtin.c: New test.
> 	* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
> 	* gcc.dg/torture/bfloat16-complex.c: New test.
> 	* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
> 	from bfloat16-builtin-issignaling-1.c.
> 	* gcc.dg/torture/floatn-basic.h: Allow to be includable from
> 	bfloat16-basic.c.
> 	* gcc.dg/torture/floatn-builtin.h: Allow to be includable from
> 	bfloat16-builtin.c.
> 	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
> 	diagnostics.
> 	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
> 	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
> 	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
> libcpp/
> 	* include/cpplib.h (CPP_N_BFLOAT16): Define.
> 	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
> 	C++.
> libgcc/
> 	* config/i386/t-softfp (softfp_extensions): Add bfsf.
> 	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
> 	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
> 	CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
> 	-msse2.
> 	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
> 	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
> 	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
> 	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* soft-fp/brain.h: New file.
> 	* soft-fp/truncsfbf2.c: New file.
> 	* soft-fp/truncdfbf2.c: New file.
> 	* soft-fp/truncxfbf2.c: New file.
> 	* soft-fp/trunctfbf2.c: New file.
> 	* soft-fp/trunchfbf2.c: New file.
> 	* soft-fp/truncbfhf2.c: New file.
> 	* soft-fp/extendbfsf2.c: New file.
> libiberty/
> 	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
> 	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
> 	entry.
> 	(cplus_demangle_type): Demangle DF16b.
> 	* testsuite/demangle-expected (_Z3xxxDF16b): New test.
> 
> --- gcc/tree-core.h.jj	2022-10-01 21:44:52.521002702 +0200
> +++ gcc/tree-core.h	2022-10-03 22:46:34.218787107 +0200
> @@ -665,6 +665,9 @@ enum tree_index {
>     TI_DOUBLE_TYPE,
>     TI_LONG_DOUBLE_TYPE,
>   
> +  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
> +  TI_BFLOAT16_TYPE,
> +
>     /* The _FloatN and _FloatNx types must be consecutive, and in the
>        same sequence as the corresponding complex types, which must also
>        be consecutive; _FloatN must come before _FloatNx; the order must
> --- gcc/tree.h.jj	2022-10-01 21:44:52.525002648 +0200
> +++ gcc/tree.h	2022-10-03 22:46:34.220787080 +0200
> @@ -279,7 +279,7 @@ code_helper::is_builtin_fn () const
>   #define CASE_FLT_FN(FN) case FN: case FN##F: case FN##L
>   #define CASE_FLT_FN_FLOATN_NX(FN)			   \
>     case FN##F16: case FN##F32: case FN##F64: case FN##F128: \
> -  case FN##F32X: case FN##F64X: case FN##F128X
> +  case FN##F32X: case FN##F64X: case FN##F128X: case FN##BF16
>   #define CASE_FLT_FN_REENT(FN) case FN##_R: case FN##F_R: case FN##L_R
>   #define CASE_INT_FN(FN) case FN: case FN##L: case FN##LL: case FN##IMAX
>   
> @@ -4285,6 +4285,7 @@ tree_strip_any_location_wrapper (tree ex
>   #define float_type_node			global_trees[TI_FLOAT_TYPE]
>   #define double_type_node		global_trees[TI_DOUBLE_TYPE]
>   #define long_double_type_node		global_trees[TI_LONG_DOUBLE_TYPE]
> +#define bfloat16_type_node		global_trees[TI_BFLOAT16_TYPE]
>   
>   /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
>   #define FLOATN_TYPE_NODE(IDX)		global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
> --- gcc/tree.cc.jj	2022-10-01 21:44:52.524002662 +0200
> +++ gcc/tree.cc	2022-10-03 22:46:34.223787040 +0200
> @@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
>       = (flag_excess_precision == EXCESS_PRECISION_FAST
>          ? EXCESS_PRECISION_TYPE_FAST
>          : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
> -	  ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
> +	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
>   
>     enum flt_eval_method target_flt_eval_method
>       = targetm.c.excess_precision (requested_type);
> @@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
>     machine_mode float16_type_mode = (float16_type_node
>   				    ? TYPE_MODE (float16_type_node)
>   				    : VOIDmode);
> +  machine_mode bfloat16_type_mode = (bfloat16_type_node
> +				     ? TYPE_MODE (bfloat16_type_node)
> +				     : VOIDmode);
>     machine_mode float_type_mode = TYPE_MODE (float_type_node);
>     machine_mode double_type_mode = TYPE_MODE (double_type_node);
>   
> @@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
>   	switch (target_flt_eval_method)
>   	  {
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	    if (type_mode == float16_type_mode)
> +	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode)
>   	      return float_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode)
>   	      return double_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode
>   		|| type_mode == double_type_mode)
>   	      return long_double_type_node;
> @@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
>   	switch (target_flt_eval_method)
>   	  {
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	    if (type_mode == float16_type_mode)
> +	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode)
>   	      return complex_float_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode)
>   	      return complex_double_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode
>   		|| type_mode == double_type_mode)
>   	      return complex_long_double_type_node;
> @@ -9462,6 +9471,17 @@ build_common_tree_nodes (bool signed_cha
>         SET_TYPE_MODE (FLOATN_NX_TYPE_NODE (i), mode);
>       }
>     float128t_type_node = float128_type_node;
> +#ifdef HAVE_BFmode
> +  if (REAL_MODE_FORMAT (BFmode) == &arm_bfloat_half_format
> +      && targetm.scalar_mode_supported_p (BFmode)
> +      && targetm.libgcc_floating_mode_supported_p (BFmode))
> +    {
> +      bfloat16_type_node = make_node (REAL_TYPE);
> +      TYPE_PRECISION (bfloat16_type_node) = GET_MODE_PRECISION (BFmode);
> +      layout_type (bfloat16_type_node);
> +      SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +    }
> +#endif
>   
>     float_ptr_type_node = build_pointer_type (float_type_node);
>     double_ptr_type_node = build_pointer_type (double_type_node);
> --- gcc/expmed.h.jj	2022-10-01 21:44:52.503002947 +0200
> +++ gcc/expmed.h	2022-10-03 22:46:34.223787040 +0200
> @@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
>   				  rtx, tree, rtx, int);
>   extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
>   			 int);
> +extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
> +			       int);
>   #ifdef GCC_OPTABS_H
>   extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
>   			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
> --- gcc/expmed.cc.jj	2022-10-01 21:44:52.501002974 +0200
> +++ gcc/expmed.cc	2022-10-03 22:59:19.176483448 +0200
> @@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
>   
>   /* Likewise, but return 0 if that cannot be done.  */
>   
> -static rtx
> +rtx
>   maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>   		    int amount, rtx target, int unsignedp)
>   {
> @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
>       {
>        machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
>        icode = optab_handler (cstore_optab, optab_mode);
> -     if (icode != CODE_FOR_nothing)
> +     if (icode != CODE_FOR_nothing
> +	 /* Don't consider [BH]Fmode as usable wider mode, as neither is
> +	    a subset or superset of the other.  */
> +	 && (compare_mode == mode
> +	     || !SCALAR_FLOAT_MODE_P (compare_mode)
> +	     || maybe_ne (GET_MODE_PRECISION (compare_mode),
> +			  GET_MODE_PRECISION (mode))))

Why do you need to do this here (and in prepare_cmp_insn, and similarly 
in can_compare_p)?  Shouldn't get_wider skip over modes that are not 
actually wider?

>   	{
>   	  do_pending_stack_adjust ();
>   	  rtx tem = emit_cstore (target, icode, code, mode, compare_mode,
> --- gcc/expr.cc.jj	2022-10-01 21:44:52.506002906 +0200
> +++ gcc/expr.cc	2022-10-03 22:46:34.226787000 +0200
> @@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
>         gcc_assert ((GET_MODE_PRECISION (from_mode)
>   		   != GET_MODE_PRECISION (to_mode))
>   		  || (DECIMAL_FLOAT_MODE_P (from_mode)
> -		      != DECIMAL_FLOAT_MODE_P (to_mode)));
> +		      != DECIMAL_FLOAT_MODE_P (to_mode))
> +		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> +		      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> +		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +		      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
>   
>         if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
>   	/* Conversion between decimal float and binary float, same size.  */
> @@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
>   	  return;
>   	}
>   
> +#ifdef HAVE_SFmode
> +      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> +	  && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
> +	{
> +	  if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
> +	    {
> +	      /* To cut down on libgcc size, implement
> +		 BFmode -> {DF,XF,TF}mode conversions by
> +		 BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
> +	      rtx temp = gen_reg_rtx (SFmode);
> +	      convert_mode_scalar (temp, from, unsignedp);
> +	      convert_mode_scalar (to, temp, unsignedp);
> +	      return;
> +	    }
> +	  if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> +	    {
> +	      /* Similarly, implement BFmode -> HFmode as
> +		 BFmode -> SFmode -> HFmode conversion where SFmode
> +		 has superset of BFmode values.  We don't need
> +		 to handle sNaNs by raising exception and turning
> +		 into into qNaN though, as that can be done in the
> +		 SFmode -> HFmode conversion too.  */
> +	      rtx temp = gen_reg_rtx (SFmode);
> +	      int save_flag_finite_math_only = flag_finite_math_only;
> +	      flag_finite_math_only = true;
> +	      convert_mode_scalar (temp, from, unsignedp);
> +	      flag_finite_math_only = save_flag_finite_math_only;
> +	      convert_mode_scalar (to, temp, unsignedp);
> +	      return;
> +	    }
> +	  if (to_mode == SFmode
> +	      && !HONOR_NANS (from_mode)
> +	      && !HONOR_NANS (to_mode)
> +	      && optimize_insn_for_speed_p ())
> +	    {
> +	      /* If we don't expect sNaNs, for BFmode -> SFmode we can just
> +		 shift the bits up.  */
> +	      machine_mode fromi_mode, toi_mode;
> +	      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> +				     0).exists (&fromi_mode)
> +		  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> +					0).exists (&toi_mode))
> +		{
> +		  start_sequence ();
> +		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> +		  rtx tof = NULL_RTX;
> +		  if (fromi)
> +		    {
> +		      rtx toi = gen_reg_rtx (toi_mode);
> +		      convert_mode_scalar (toi, fromi, 1);
> +		      toi
> +			= maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
> +					      GET_MODE_PRECISION (to_mode)
> +					      - GET_MODE_PRECISION (from_mode),
> +					      NULL_RTX, 1);
> +		      if (toi)
> +			{
> +			  tof = lowpart_subreg (to_mode, toi, toi_mode);
> +			  if (tof)
> +			    emit_move_insn (to, tof);
> +			}
> +		    }
> +		  insns = get_insns ();
> +		  end_sequence ();
> +		  if (tof)
> +		    {
> +		      emit_insn (insns);
> +		      return;
> +		    }
> +		}
> +	    }
> +	}
> +      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
> +	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +	  && !HONOR_NANS (from_mode)
> +	  && !HONOR_NANS (to_mode)
> +	  && !flag_rounding_math
> +	  && optimize_insn_for_speed_p ())
> +	{
> +	  /* If we don't expect qNaNs nor sNaNs and can assume rounding
> +	     to nearest, we can expand the conversion inline as
> +	     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
> +	  machine_mode fromi_mode, toi_mode;
> +	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> +				 0).exists (&fromi_mode)
> +	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> +				    0).exists (&toi_mode))
> +	    {
> +	      start_sequence ();
> +	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> +	      rtx tof = NULL_RTX;
> +	      do
> +		{
> +		  if (!fromi)
> +		    break;
> +		  int shift = (GET_MODE_PRECISION (from_mode)
> +			       - GET_MODE_PRECISION (to_mode));
> +		  rtx temp1
> +		    = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
> +					  shift, NULL_RTX, 1);
> +		  if (!temp1)
> +		    break;
> +		  rtx temp2
> +		    = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
> +				    NULL_RTX, 1, OPTAB_DIRECT);
> +		  if (!temp2)
> +		    break;
> +		  rtx temp3
> +		    = expand_binop (fromi_mode, add_optab, fromi,
> +				    gen_int_mode ((HOST_WIDE_INT_1U
> +						   << (shift - 1)) - 1,
> +						  fromi_mode), NULL_RTX,
> +				    1, OPTAB_DIRECT);
> +		  if (!temp3)
> +		    break;
> +		  rtx temp4
> +		    = expand_binop (fromi_mode, add_optab, temp3, temp2,
> +				    NULL_RTX, 1, OPTAB_DIRECT);
> +		  if (!temp4)
> +		    break;
> +		  rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
> +						  temp4, shift, NULL_RTX, 1);
> +		  if (!temp5)
> +		    break;
> +		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
> +		  if (!temp6)
> +		    break;
> +		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
> +					toi_mode);
> +		  if (tof)
> +		    emit_move_insn (to, tof);
> +		}
> +	      while (0);
> +	      insns = get_insns ();
> +	      end_sequence ();
> +	      if (tof)
> +		{
> +		  emit_insn (insns);
> +		  return;
> +		}
> +	    }
> +	}
> +#endif
> +
>         /* Otherwise use a libcall.  */
>         libcall = convert_optab_libfunc (tab, to_mode, from_mode);
>   
> --- gcc/builtin-types.def.jj	2022-01-11 22:31:40.590769786 +0100
> +++ gcc/builtin-types.def	2022-10-03 22:46:34.227786987 +0200
> @@ -82,6 +82,9 @@ DEF_PRIMITIVE_TYPE (BT_UNWINDWORD, (*lan
>   DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
>   DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
>   DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
> +DEF_PRIMITIVE_TYPE (BT_BFLOAT16, (bfloat16_type_node
> +				  ? bfloat16_type_node
> +				  : error_mark_node))
>   DEF_PRIMITIVE_TYPE (BT_FLOAT16, (float16_type_node
>   				 ? float16_type_node
>   				 : error_mark_node))
> @@ -187,6 +190,7 @@ DEF_FUNCTION_TYPE_0 (BT_FN_DOUBLE, BT_DO
>      distinguish it from two types in sequence, "long" followed by
>      "double".  */
>   DEF_FUNCTION_TYPE_0 (BT_FN_LONGDOUBLE, BT_LONGDOUBLE)
> +DEF_FUNCTION_TYPE_0 (BT_FN_BFLOAT16, BT_BFLOAT16)
>   DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT16, BT_FLOAT16)
>   DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT32, BT_FLOAT32)
>   DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT64, BT_FLOAT64)
> @@ -206,6 +210,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_DOUBLE
>   DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_LONGDOUBLE,
>   		     BT_LONGDOUBLE, BT_LONGDOUBLE)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_FLOAT16, BT_FLOAT16, BT_FLOAT16)
> +DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_FLOAT32, BT_FLOAT32, BT_FLOAT32)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_FLOAT64, BT_FLOAT64, BT_FLOAT64)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT128_FLOAT128, BT_FLOAT128, BT_FLOAT128)
> @@ -264,6 +269,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_CONST_S
>   DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_CONST_STRING, BT_DOUBLE, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_CONST_STRING,
>   		     BT_LONGDOUBLE, BT_CONST_STRING)
> +DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_CONST_STRING, BT_BFLOAT16, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_CONST_STRING, BT_FLOAT16, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_CONST_STRING, BT_FLOAT32, BT_CONST_STRING)
>   DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_CONST_STRING, BT_FLOAT64, BT_CONST_STRING)
> @@ -401,6 +407,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE
>   		     BT_DOUBLE, BT_DOUBLE, BT_DOUBLE)
>   DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
>   		     BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE)
> +DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16,
> +		     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
>   DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_FLOAT16,
>   		     BT_FLOAT16, BT_FLOAT16, BT_FLOAT16)
>   DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_FLOAT32,
> @@ -554,6 +562,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_DOUBLE
>   		     BT_DOUBLE, BT_DOUBLE, BT_DOUBLE, BT_DOUBLE)
>   DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
>   		     BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE)
> +DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16,
> +		     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
>   DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_FLOAT16_FLOAT16_FLOAT16,
>   		     BT_FLOAT16, BT_FLOAT16, BT_FLOAT16, BT_FLOAT16)
>   DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_FLOAT32_FLOAT32_FLOAT32,
> --- gcc/builtins.def.jj	2022-09-29 22:16:46.928044191 +0200
> +++ gcc/builtins.def	2022-10-03 22:46:34.227786987 +0200
> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.
>      value for the type.  */
>   #undef DEF_GCC_FLOATN_NX_BUILTINS
>   #define DEF_GCC_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS)	\
> +  DEF_GCC_BUILTIN (ENUM ## BF16, NAME "bf16", TYPE_MACRO (BFLOAT16), ATTRS) \
>     DEF_GCC_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
>     DEF_GCC_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
>     DEF_GCC_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
> @@ -123,6 +124,7 @@ along with GCC; see the file COPYING3.
>   	       false, true)
>   #undef DEF_EXT_LIB_FLOATN_NX_BUILTINS
>   #define DEF_EXT_LIB_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS)	\
> +  DEF_GCC_BUILTIN (ENUM ## BF16, NAME "bf16", TYPE_MACRO (BFLOAT16), ATTRS) \
>     DEF_FLOATN_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
>     DEF_FLOATN_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
>     DEF_FLOATN_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
> --- gcc/optabs.cc.jj	2022-07-26 21:43:55.638403562 +0200
> +++ gcc/optabs.cc	2022-10-03 23:00:17.402698229 +0200
> @@ -4254,11 +4254,24 @@ can_compare_p (enum rtx_code code, machi
>   	       enum can_compare_purpose purpose)
>   {
>     rtx test;
> +  machine_mode orig_mode = mode;
>     test = gen_rtx_fmt_ee (code, mode, const0_rtx, const0_rtx);
>     do
>       {
>         enum insn_code icode;
>   
> +      /* Don't consider [BH]Fmode as usable wider mode, as neither is
> +	 a subset or superset of the other.  */
> +      if (mode != orig_mode
> +	  && SCALAR_FLOAT_MODE_P (mode)
> +	  && known_eq (GET_MODE_PRECISION (mode),
> +		       GET_MODE_PRECISION (orig_mode)))
> +	{
> +	  mode = GET_MODE_WIDER_MODE (mode).else_void ();
> +	  PUT_MODE (test, mode);
> +	  continue;
> +	}
> +
>         if (purpose == ccp_jump
>             && (icode = optab_handler (cbranch_optab, mode)) != CODE_FOR_nothing
>             && insn_operand_matches (icode, 0, test))
> @@ -4497,7 +4510,13 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
>         enum insn_code icode;
>         icode = optab_handler (cbranch_optab, cmp_mode);
>         if (icode != CODE_FOR_nothing
> -	  && insn_operand_matches (icode, 0, test))
> +	  && insn_operand_matches (icode, 0, test)
> +	  /* Don't consider [BH]Fmode as usable wider mode, as neither is
> +	     a subset or superset of the other.  */
> +	  && (cmp_mode == mode
> +	      || !SCALAR_FLOAT_MODE_P (cmp_mode)
> +	      || maybe_ne (GET_MODE_PRECISION (cmp_mode),
> +			   GET_MODE_PRECISION (mode))))
>   	{
>   	  rtx_insn *last = get_last_insn ();
>   	  rtx op0 = prepare_operand (icode, x, 1, mode, cmp_mode, unsignedp);
> --- gcc/config/i386/i386.cc.jj	2022-10-01 21:44:58.477921753 +0200
> +++ gcc/config/i386/i386.cc	2022-10-03 22:46:34.233786906 +0200
> @@ -2423,6 +2423,7 @@ classify_argument (machine_mode mode, co
>         classes[1] = X86_64_SSEUP_CLASS;
>         return 2;
>       case E_HCmode:
> +    case E_BCmode:
>         classes[0] = X86_64_SSE_CLASS;
>         if (!(bit_offset % 64))
>   	return 1;
> @@ -22428,7 +22429,7 @@ ix86_libgcc_floating_mode_supported_p (s
>        be defined by the C front-end for AVX512FP16 intrinsics.  We will
>        issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
>        enabled.  */
> -  return ((mode == HFmode && TARGET_SSE2)
> +  return (((mode == HFmode || mode == BFmode) && TARGET_SSE2)
>   	  ? true
>   	  : default_libgcc_floating_mode_supported_p (mode));
>   }
> @@ -22731,7 +22732,7 @@ ix86_mangle_type (const_tree type)
>     switch (TYPE_MODE (type))
>       {
>       case E_BFmode:
> -      return "u6__bf16";
> +      return "DF16b";
>       case E_HFmode:
>         /* _Float16 is "DF16_".
>   	 Align with clang's decision in https://reviews.llvm.org/D33719. */
> @@ -22747,55 +22748,6 @@ ix86_mangle_type (const_tree type)
>       }
>   }
>   
> -/* Return the diagnostic message string if conversion from FROMTYPE to
> -   TOTYPE is not allowed, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_conversion (const_tree fromtype, const_tree totype)
> -{
> -  if (element_mode (fromtype) != element_mode (totype))
> -    {
> -      /* Do no allow conversions to/from BFmode scalar types.  */
> -      if (TYPE_MODE (fromtype) == BFmode)
> -	return N_("invalid conversion from type %<__bf16%>");
> -      if (TYPE_MODE (totype) == BFmode)
> -	return N_("invalid conversion to type %<__bf16%>");
> -    }
> -
> -  /* Conversion allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the unary operation OP is
> -   not permitted on TYPE, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_unary_op (int op, const_tree type)
> -{
> -  /* Reject all single-operand operations on BFmode except for &.  */
> -  if (element_mode (type) == BFmode && op != ADDR_EXPR)
> -    return N_("operation not permitted on type %<__bf16%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the binary operation OP is
> -   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
> -			   const_tree type2)
> -{
> -  /* Reject all 2-operand operations on BFmode.  */
> -  if (element_mode (type1) == BFmode
> -      || element_mode (type2) == BFmode)
> -    return N_("operation not permitted on type %<__bf16%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
>   static GTY(()) tree ix86_tls_stack_chk_guard_decl;
>   
>   static tree
> @@ -24853,15 +24805,6 @@ ix86_libgcc_floating_mode_supported_p
>   #undef TARGET_MANGLE_TYPE
>   #define TARGET_MANGLE_TYPE ix86_mangle_type
>   
> -#undef TARGET_INVALID_CONVERSION
> -#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
> -
> -#undef TARGET_INVALID_UNARY_OP
> -#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
> -
> -#undef TARGET_INVALID_BINARY_OP
> -#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
> -
>   #undef TARGET_STACK_PROTECT_GUARD
>   #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
>   
> --- gcc/config/i386/i386-builtins.cc.jj	2022-10-01 21:44:52.478003286 +0200
> +++ gcc/config/i386/i386-builtins.cc	2022-10-03 22:46:34.233786906 +0200
> @@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
>   static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
>   
>   tree ix86_float16_type_node = NULL_TREE;
> -tree ix86_bf16_type_node = NULL_TREE;
>   tree ix86_bf16_ptr_type_node = NULL_TREE;
>   
>   /* Retrieve an element from the above table, building some of
> @@ -1372,16 +1371,18 @@ ix86_register_float16_builtin_type (void
>   static void
>   ix86_register_bf16_builtin_type (void)
>   {
> -  ix86_bf16_type_node = make_node (REAL_TYPE);
> -  TYPE_PRECISION (ix86_bf16_type_node) = 16;
> -  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
> -  layout_type (ix86_bf16_type_node);
> +  if (bfloat16_type_node == NULL_TREE)
> +    {
> +      bfloat16_type_node = make_node (REAL_TYPE);
> +      TYPE_PRECISION (bfloat16_type_node) = 16;
> +      SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +      layout_type (bfloat16_type_node);
> +    }
>   
>     if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
>       {
> -      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
> -					    "__bf16");
> -      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
> +      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
> +      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
>       }
>   }
>   
> --- gcc/config/i386/i386-builtin-types.def.jj	2022-10-01 21:44:52.476003314 +0200
> +++ gcc/config/i386/i386-builtin-types.def	2022-10-03 22:46:34.233786906 +0200
> @@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
>   DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
>   DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
> -DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
> +DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
>   DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> --- gcc/config/i386/i386.md.jj	2022-09-05 23:25:28.627019050 +0200
> +++ gcc/config/i386/i386.md	2022-10-03 22:46:34.239786826 +0200
> @@ -1644,6 +1644,48 @@ (define_expand "cbranch<mode>4"
>     DONE;
>   })
>   
> +(define_expand "cbranchbf4"
> +  [(set (reg:CC FLAGS_REG)
> +	(compare:CC (match_operand:BF 1 "cmp_fp_expander_operand")
> +		    (match_operand:BF 2 "cmp_fp_expander_operand")))
> +   (set (pc) (if_then_else
> +	      (match_operator 0 "comparison_operator"
> +	       [(reg:CC FLAGS_REG)
> +		(const_int 0)])
> +	      (label_ref (match_operand 3))
> +	      (pc)))]
> +  ""
> +{
> +  rtx op1 = gen_lowpart (HImode, operands[1]);
> +  if (CONST_INT_P (op1))
> +    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[1], BFmode);
> +  else
> +    {
> +      rtx t1 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t1, op1));
> +      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> +      op1 = gen_lowpart (SFmode, t1);
> +    }
> +  rtx op2 = gen_lowpart (HImode, operands[2]);
> +  if (CONST_INT_P (op2))
> +    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[2], BFmode);
> +  else
> +    {
> +      rtx t2 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> +      op2 = gen_lowpart (SFmode, t2);
> +    }
> +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> +			   SFmode, NULL_RTX, NULL,
> +			   as_a <rtx_code_label *> (operands[3]),
> +			   /* Unfortunately this isn't propagated.  */
> +			   profile_probability::even ());
> +  DONE;
> +})
> +
>   (define_expand "cstorehf4"
>     [(set (reg:CC FLAGS_REG)
>   	(compare:CC (match_operand:HF 2 "cmp_fp_expander_operand")
> @@ -1659,6 +1701,45 @@ (define_expand "cstorehf4"
>     DONE;
>   })
>   
> +(define_expand "cstorebf4"
> +  [(set (reg:CC FLAGS_REG)
> +	(compare:CC (match_operand:BF 2 "cmp_fp_expander_operand")
> +		    (match_operand:BF 3 "cmp_fp_expander_operand")))
> +   (set (match_operand:QI 0 "register_operand")
> +	(match_operator 1 "comparison_operator"
> +	  [(reg:CC FLAGS_REG)
> +	   (const_int 0)]))]
> +  ""
> +{
> +  rtx op1 = gen_lowpart (HImode, operands[2]);
> +  if (CONST_INT_P (op1))
> +    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[2], BFmode);
> +  else
> +    {
> +      rtx t1 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t1, op1));
> +      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> +      op1 = gen_lowpart (SFmode, t1);
> +    }
> +  rtx op2 = gen_lowpart (HImode, operands[3]);
> +  if (CONST_INT_P (op2))
> +    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +					  operands[3], BFmode);
> +  else
> +    {
> +      rtx t2 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> +      op2 = gen_lowpart (SFmode, t2);
> +    }
> +  rtx res = emit_store_flag_force (operands[0], GET_CODE (operands[1]),
> +				   op1, op2, SFmode, 0, 1);
> +  if (!rtx_equal_p (res, operands[0]))
> +    emit_move_insn (operands[0], res);
> +  DONE;
> +})
> +
>   (define_expand "cstore<mode>4"
>     [(set (reg:CC FLAGS_REG)
>   	(compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")
> --- gcc/c-family/c-cppbuiltin.cc.jj	2022-10-03 22:45:46.041435824 +0200
> +++ gcc/c-family/c-cppbuiltin.cc	2022-10-03 23:11:46.111410475 +0200
> @@ -1264,6 +1264,13 @@ c_cpp_builtins (cpp_reader *pfile)
>         builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
>   				      csuffix, FLOATN_NX_TYPE_NODE (i));
>       }
> +  if (bfloat16_type_node && c_dialect_cxx ())
> +    {
> +      if (cxx_dialect > cxx20)
> +	cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
> +      builtin_define_float_constants ("BFLT16", "BF16", "%s",
> +				      "BF16", bfloat16_type_node);
> +    }
>   
>     /* For float.h.  */
>     if (targetm.decimal_float_supported_p ())
> @@ -1351,6 +1358,8 @@ c_cpp_builtins (cpp_reader *pfile)
>   	  if (!targetm.scalar_mode_supported_p (mode)
>   	      || !targetm.libgcc_floating_mode_supported_p (mode))
>   	    continue;
> +	  if (bfloat16_type_node && TYPE_MODE (bfloat16_type_node) == mode)
> +	    continue;
>   	  macro_name = XALLOCAVEC (char, name_len
>   				   + sizeof ("__LIBGCC_HAS__MODE__"));
>   	  sprintf (macro_name, "__LIBGCC_HAS_%s_MODE__", name);
> --- gcc/c-family/c-lex.cc.jj	2022-10-03 22:46:14.597051320 +0200
> +++ gcc/c-family/c-lex.cc	2022-10-03 22:46:34.240786812 +0200
> @@ -1000,6 +1000,19 @@ interpret_float (const cpp_token *token,
>   	  pedwarn (input_location, OPT_Wpedantic,
>   		   "non-standard suffix on floating constant");
>         }
> +    else if ((flags & CPP_N_BFLOAT16) != 0 && c_dialect_cxx ())
> +      {
> +	type = bfloat16_type_node;
> +	if (type == NULL_TREE)
> +	  {
> +	    error ("unsupported non-standard suffix on floating constant");
> +	    return error_mark_node;
> +	  }
> +	if (cxx_dialect < cxx23)
> +	  pedwarn (input_location, OPT_Wpedantic,
> +		   "%<bf16%> or %<BF16%> suffix on floating constant only "
> +		   "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
> +      }
>       else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
>         type = long_double_type_node;
>       else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
> --- gcc/c/c-typeck.cc.jj	2022-09-25 22:22:03.963596917 +0200
> +++ gcc/c/c-typeck.cc	2022-10-03 22:46:34.245786745 +0200
> @@ -3676,6 +3676,9 @@ convert_arguments (location_t loc, vec<l
>   		promote_float_arg = false;
>   		break;
>   	      }
> +	  /* Don't promote __bf16 either.  */
> +	  if (TYPE_MAIN_VARIANT (valtype) == bfloat16_type_node)
> +	    promote_float_arg = false;
>   	}
>   
>         if (type != NULL_TREE)
> --- gcc/cp/cp-tree.h.jj	2022-10-03 22:46:23.896926090 +0200
> +++ gcc/cp/cp-tree.h	2022-10-03 22:46:34.246786732 +0200
> @@ -8702,6 +8702,8 @@ extended_float_type_p (tree type)
>     for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
>       if (type == FLOATN_TYPE_NODE (i))
>         return true;
> +  if (type == bfloat16_type_node)
> +    return true;
>     return false;
>   }
>   
> --- gcc/cp/typeck.cc.jj	2022-10-01 21:44:52.497003028 +0200
> +++ gcc/cp/typeck.cc	2022-10-03 22:46:34.249786691 +0200
> @@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
>         if (mv2 == FLOATN_NX_TYPE_NODE (i))
>   	extended2 = i + 1;
>       }
> +  if (mv1 == bfloat16_type_node)
> +    extended1 = true;
> +  if (mv2 == bfloat16_type_node)
> +    extended2 = true;
>     if (extended2 && !extended1)
>       {
>         int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
> @@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
>     if (cnt > 1 && mv2 == long_double_type_node)
>       return -2;
>     /* Otherwise, they have equal rank, but extended types
> -     (other than std::bfloat16_t) have higher subrank.  */
> +     (other than std::bfloat16_t) have higher subrank.
> +     std::bfloat16_t shouldn't have equal rank to any standard
> +     floating point type.  */
>     return 1;
>   }
>   
> --- gcc/testsuite/lib/target-supports.exp.jj	2022-10-01 21:44:58.540920897 +0200
> +++ gcc/testsuite/lib/target-supports.exp	2022-10-03 22:46:34.250786678 +0200
> @@ -3416,6 +3416,22 @@ proc check_effective_target_base_quadflo
>       return 1
>   }
>   
> +# Return 1 if the target supports the __bf16 type, 0 otherwise.
> +
> +proc check_effective_target_bfloat16 {} {
> +    return [check_no_compiler_messages_nocache bfloat16 object {
> +	__bf16 foo (__bf16 x) { return x + x; }
> +    } [add_options_for_bfloat16 ""]]
> +}
> +
> +proc check_effective_target_bfloat16_runtime {} {
> +    return [check_effective_target_bfloat16]
> +}
> +
> +proc add_options_for_bfloat16 { flags } {
> +    return "$flags"
> +}
> +
>   # Return 1 if the target supports all four forms of fused multiply-add
>   # (fma, fms, fnma, and fnms) for both float and double.
>   
> --- gcc/testsuite/gcc.dg/torture/bfloat16-basic.c.jj	2022-10-03 22:46:34.251786665 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-basic.c	2022-10-03 22:46:34.251786665 +0200
> @@ -0,0 +1,11 @@
> +/* Test __bf16.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +#define TYPE __bf16
> +#define CST(C) ((__bf16) (C))
> +#define CSTU(C) CST(C)
> +
> +#include "floatn-basic.h"
> --- gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c.jj	2022-10-03 22:46:34.251786665 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin.c	2022-10-03 22:46:34.251786665 +0200
> @@ -0,0 +1,15 @@
> +/* Test __bf16 built-in functions.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-add-options ieee } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +#define CONCATX(X, Y) X ## Y
> +#define CONCAT(X, Y) CONCATX (X, Y)
> +
> +#define TYPE __bf16
> +#define CST(C) ((__bf16) C)
> +#define FN(F) CONCAT (F, bf16)
> +
> +#include "floatn-builtin.h"
> --- gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c.jj	2022-10-03 22:46:34.251786665 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-builtin-issignaling-1.c	2022-10-03 22:46:34.251786665 +0200
> @@ -0,0 +1,19 @@
> +/* Test __bf16 __builtin_issignaling.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-add-options ieee } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +/* { dg-additional-options "-fsignaling-nans" } */
> +/* Workaround for PR57484 on ia32: */
> +/* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
> +
> +#define CONCATX(X, Y) X ## Y
> +#define CONCAT(X, Y) CONCATX (X, Y)
> +
> +#define TYPE __bf16
> +#define CST(C) ((__bf16) C)
> +#define FN(F) CONCAT (F, bf16)
> +#define EXT 0
> +
> +#include "builtin-issignaling-1.c"
> --- gcc/testsuite/gcc.dg/torture/bfloat16-complex.c.jj	2022-10-03 22:46:34.251786665 +0200
> +++ gcc/testsuite/gcc.dg/torture/bfloat16-complex.c	2022-10-03 22:46:34.251786665 +0200
> @@ -0,0 +1,61 @@
> +/* Test __bf16 complex arithmetic.  */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +extern void exit (int);
> +extern void abort (void);
> +
> +volatile __bf16 a = ((__bf16) 1.0);
> +typedef _Complex float __cbf16 __attribute__((__mode__(__BC__)));
> +volatile __cbf16 b = __builtin_complex (((__bf16) 2.0), ((__bf16) 3.0));
> +volatile __cbf16 c = __builtin_complex (((__bf16) 2.0), ((__bf16) 3.0));
> +volatile __cbf16 d = __builtin_complex (((__bf16) 2.0), ((__bf16) 3.0));
> +
> +__cbf16
> +fn (__cbf16 arg)
> +{
> +  return arg / 4;
> +}
> +
> +int
> +main (void)
> +{
> +  volatile __cbf16 r;
> +  if (b != c)
> +    abort ();
> +  if (b != d)
> +    abort ();
> +  r = a + b;
> +  if (__real__ r != ((__bf16) 3.0) || __imag__ r != ((__bf16) 3.0))
> +    abort ();
> +  r += d;
> +  if (__real__ r != ((__bf16) 5.0) || __imag__ r != ((__bf16) 6.0))
> +    abort ();
> +  r -= a;
> +  if (__real__ r != ((__bf16) 4.0) || __imag__ r != ((__bf16) 6.0))
> +    abort ();
> +  r /= (a + a);
> +  if (__real__ r != ((__bf16) 2.0) || __imag__ r != ((__bf16) 3.0))
> +    abort ();
> +  r *= (a + a);
> +  if (__real__ r != ((__bf16) 4.0) || __imag__ r != ((__bf16) 6.0))
> +    abort ();
> +  r -= b;
> +  if (__real__ r != ((__bf16) 2.0) || __imag__ r != ((__bf16) 3.0))
> +    abort ();
> +  r *= r;
> +  if (__real__ r != -((__bf16) 5.0) || __imag__ r != ((__bf16) 12.0))
> +    abort ();
> +  /* Division may not be exact, so round result before comparing.  */
> +  r /= b;
> +  r += __builtin_complex (((__bf16) 100.0), ((__bf16) 100.0));
> +  r -= __builtin_complex (((__bf16) 100.0), ((__bf16) 100.0));
> +  if (r != b)
> +    abort ();
> +  r = fn (r);
> +  if (__real__ r != ((__bf16) 0.5) || __imag__ r != ((__bf16) 0.75))
> +    abort ();
> +  exit (0);
> +}
> --- gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c.jj	2022-08-27 23:01:28.323565905 +0200
> +++ gcc/testsuite/gcc.dg/torture/builtin-issignaling-1.c	2022-10-03 22:46:34.251786665 +0200
> @@ -4,7 +4,7 @@
>   /* Workaround for PR57484 on ia32: */
>   /* { dg-additional-options "-msse2 -mfpmath=sse" { target { ia32 && sse2_runtime } } } */
>   
> -#ifndef EXT
> +#if !defined(EXT) && !defined(TYPE)
>   int
>   f1 (void)
>   {
> @@ -41,19 +41,21 @@ f6 (long double x)
>     return __builtin_issignaling (x);
>   }
>   #else
> -#define CONCATX(X, Y) X ## Y
> -#define CONCAT(X, Y) CONCATX (X, Y)
> -#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> -#define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> -
> -#if EXT
> -# define TYPE CONCAT3 (_Float, WIDTH, x)
> -# define CST(C) CONCAT4 (C, f, WIDTH, x)
> -# define FN(F) CONCAT4 (F, f, WIDTH, x)
> -#else
> -# define TYPE CONCAT (_Float, WIDTH)
> -# define CST(C) CONCAT3 (C, f, WIDTH)
> -# define FN(F) CONCAT3 (F, f, WIDTH)
> +#ifndef TYPE
> +# define CONCATX(X, Y) X ## Y
> +# define CONCAT(X, Y) CONCATX (X, Y)
> +# define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> +# define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> +
> +# if EXT
> +#  define TYPE CONCAT3 (_Float, WIDTH, x)
> +#  define CST(C) CONCAT4 (C, f, WIDTH, x)
> +#  define FN(F) CONCAT4 (F, f, WIDTH, x)
> +# else
> +#  define TYPE CONCAT (_Float, WIDTH)
> +#  define CST(C) CONCAT3 (C, f, WIDTH)
> +#  define FN(F) CONCAT3 (F, f, WIDTH)
> +# endif
>   #endif
>   
>   int
> --- gcc/testsuite/gcc.dg/torture/floatn-basic.h.jj	2020-01-14 20:02:47.411600427 +0100
> +++ gcc/testsuite/gcc.dg/torture/floatn-basic.h	2022-10-03 22:46:34.251786665 +0200
> @@ -9,14 +9,16 @@
>   #define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
>   #define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
>   
> -#if EXT
> -# define TYPE CONCAT3 (_Float, WIDTH, x)
> -# define CST(C) CONCAT4 (C, f, WIDTH, x)
> -# define CSTU(C) CONCAT4 (C, F, WIDTH, x)
> -#else
> -# define TYPE CONCAT (_Float, WIDTH)
> -# define CST(C) CONCAT3 (C, f, WIDTH)
> -# define CSTU(C) CONCAT3 (C, F, WIDTH)
> +#ifndef TYPE
> +# if EXT
> +#  define TYPE CONCAT3 (_Float, WIDTH, x)
> +#  define CST(C) CONCAT4 (C, f, WIDTH, x)
> +#  define CSTU(C) CONCAT4 (C, F, WIDTH, x)
> +# else
> +#  define TYPE CONCAT (_Float, WIDTH)
> +#  define CST(C) CONCAT3 (C, f, WIDTH)
> +#  define CSTU(C) CONCAT3 (C, F, WIDTH)
> +# endif
>   #endif
>   
>   extern void exit (int);
> --- gcc/testsuite/gcc.dg/torture/floatn-builtin.h.jj	2020-01-14 20:02:47.412600412 +0100
> +++ gcc/testsuite/gcc.dg/torture/floatn-builtin.h	2022-10-03 22:46:34.251786665 +0200
> @@ -2,19 +2,21 @@
>      built-in functions.  Before including this file, define WIDTH as
>      the value N; define EXT to 1 for _FloatNx and 0 for _FloatN.  */
>   
> -#define CONCATX(X, Y) X ## Y
> -#define CONCAT(X, Y) CONCATX (X, Y)
> -#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> -#define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
> +#ifndef TYPE
> +# define CONCATX(X, Y) X ## Y
> +# define CONCAT(X, Y) CONCATX (X, Y)
> +# define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> +# define CONCAT4(W, X, Y, Z) CONCAT (CONCAT (CONCAT (W, X), Y), Z)
>   
> -#if EXT
> -# define TYPE CONCAT3 (_Float, WIDTH, x)
> -# define CST(C) CONCAT4 (C, f, WIDTH, x)
> -# define FN(F) CONCAT4 (F, f, WIDTH, x)
> -#else
> -# define TYPE CONCAT (_Float, WIDTH)
> -# define CST(C) CONCAT3 (C, f, WIDTH)
> -# define FN(F) CONCAT3 (F, f, WIDTH)
> +# if EXT
> +#  define TYPE CONCAT3 (_Float, WIDTH, x)
> +#  define CST(C) CONCAT4 (C, f, WIDTH, x)
> +#  define FN(F) CONCAT4 (F, f, WIDTH, x)
> +# else
> +#  define TYPE CONCAT (_Float, WIDTH)
> +#  define CST(C) CONCAT3 (C, f, WIDTH)
> +#  define FN(F) CONCAT3 (F, f, WIDTH)
> +# endif
>   #endif
>   
>   extern void exit (int);
> --- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c.jj	2022-10-01 21:44:52.519002730 +0200
> +++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c	2022-10-03 22:46:34.252786651 +0200
> @@ -45,19 +45,19 @@ __m256bf16 footest (__m256bf16 vector0)
>     __m256bf16 vector2_1 = {};
>     __m256bf16 vector2_2 = { glob_bfloat };
>     __m256bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
> -  __m256bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error "invalid conversion to type '__bf16'" } */
> -
> -  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256i initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __v16hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m256bf16 vector2_4 = { 0 };
> +  __m256bf16 vector2_5 = { 0.1 };
> +  __m256bf16 vector2_6 = { is_a_float16 };
> +  __m256bf16 vector2_7 = { is_a_float };
> +  __m256bf16 vector2_8 = { is_an_int };
> +  __m256bf16 vector2_9 = { is_a_short_int };
> +  __m256bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
> +
> +  __v8si initi_2_1 = { glob_bfloat };
> +  __m256 initi_2_2 = { glob_bfloat };
> +  __m256h initi_2_3 = { glob_bfloat };
> +  __m256i initi_2_5 = { glob_bfloat };
> +  __v16hi initi_2_6 = { glob_bfloat };
>   
>     /* Assignments to/from vectors.  */
>   
> @@ -79,25 +79,25 @@ __m256bf16 footest (__m256bf16 vector0)
>     /* Assignments to/from elements.  */
>   
>     vector2_3[0] = glob_bfloat;
> -  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_an_int;
> +  vector2_3[0] = is_a_short_int;
> +  vector2_3[0] = is_a_float;
> +  vector2_3[0] = is_a_float16;
> +  vector2_3[0] = 0;
> +  vector2_3[0] = 0.1;
>   
>     glob_bfloat = vector2_3[0];
> -  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_an_int = vector2_3[0];
> +  is_a_short_int = vector2_3[0];
> +  is_a_float = vector2_3[0];
> +  is_a_float16 = vector2_3[0];
>   
>     /* Compound literals.  */
>   
>     (__m256bf16) {};
>   
> -  (__m256bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__m256bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (__m256bf16) { 0 };
> +  (__m256bf16) { 0.1 };
>     (__m256bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
>     (__m256bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
>     (__m256bf16) { is_a_long_int_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256i'} } */
> @@ -176,16 +176,16 @@ __m256bf16 footest (__m256bf16 vector0)
>     bfloat_ptr = &bfloat_ptr3[1];
>   
>     /* Simple comparison.  */
> -  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > glob_bfloat_vec;
> +  glob_bfloat_vec == vector0;
> +  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
> +  vector0 > 0;
> +  0 == vector0;
> +  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> +  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> +  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
>   
>     /* Pointer comparison.  */
>   
> @@ -224,24 +224,24 @@ __m256bf16 footest (__m256bf16 vector0)
>   
>     /* Unary operators.  */
>   
> -  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  +vector0;
> +  -vector0;
> +  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
> +  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
>     *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
> -  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __real vector0; /* { dg-error {wrong type argument to __real} } */
> +  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
> +  ++vector0;
> +  --vector0;
> +  vector0++;
> +  vector0--;
>   
>     /* Binary arithmetic operations.  */
>   
> -  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + *bfloat_ptr;
> +  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m256bf16'} } */
> +  vector0 = glob_bfloat_vec + 0;
> +  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
>   
>     return vector0;
>   }
> --- gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c.jj	2022-10-01 21:44:52.515002784 +0200
> +++ gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c	2022-10-03 22:46:34.252786651 +0200
> @@ -12,8 +12,8 @@ double is_a_double;
>   
>   float *float_ptr;
>   
> -__bf16 foo1 (void) { return (__bf16) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
> -__bf16 foo2 (void) { return (__bf16) (short) 0x1234; } /* { dg-error {invalid conversion to type '__bf16'} } */
> +__bf16 foo1 (void) { return (__bf16) 0x1234; }
> +__bf16 foo2 (void) { return (__bf16) (short) 0x1234; }
>   
>   __bf16 footest (__bf16 scalar0)
>   {
> @@ -22,87 +22,87 @@ __bf16 footest (__bf16 scalar0)
>   
>     __bf16 scalar1_1;
>     __bf16 scalar1_2 = glob_bfloat;
> -  __bf16 scalar1_3 = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_4 = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_5 = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_6 = is_an_int;  /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_7 = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_8 = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar1_9 = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  int initi_1_1 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  float initi_1_2 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  _Float16 initi_1_3 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  short initi_1_4 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  double initi_1_5 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __bf16 scalar1_3 = 0;
> +  __bf16 scalar1_4 = 0.1;
> +  __bf16 scalar1_5 = is_a_float;
> +  __bf16 scalar1_6 = is_an_int;
> +  __bf16 scalar1_7 = is_a_float16;
> +  __bf16 scalar1_8 = is_a_double;
> +  __bf16 scalar1_9 = is_a_short_int;
> +
> +  int initi_1_1 = glob_bfloat;
> +  float initi_1_2 = glob_bfloat;
> +  _Float16 initi_1_3 = glob_bfloat;
> +  short initi_1_4 = glob_bfloat;
> +  double initi_1_5 = glob_bfloat;
>   
>     __bf16 scalar2_1 = {};
>     __bf16 scalar2_2 = { glob_bfloat };
> -  __bf16 scalar2_3 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_4 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_5 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_6 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_7 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_8 = { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 scalar2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  int initi_2_1 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  float initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  _Float16 initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  short initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  double initi_2_5 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __bf16 scalar2_3 = { 0 };
> +  __bf16 scalar2_4 = { 0.1 };
> +  __bf16 scalar2_5 = { is_a_float };
> +  __bf16 scalar2_6 = { is_an_int };
> +  __bf16 scalar2_7 = { is_a_float16 };
> +  __bf16 scalar2_8 = { is_a_double };
> +  __bf16 scalar2_9 = { is_a_short_int };
> +
> +  int initi_2_1 = { glob_bfloat };
> +  float initi_2_2 = { glob_bfloat };
> +  _Float16 initi_2_3 = { glob_bfloat };
> +  short initi_2_4 = { glob_bfloat };
> +  double initi_2_5 = { glob_bfloat };
>   
>     /* Assignments.  */
>   
>     glob_bfloat = glob_bfloat;
> -  glob_bfloat = 0;   /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  glob_bfloat = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  is_an_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float16 = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_double = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_short_int = glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  glob_bfloat = 0;
> +  glob_bfloat = 0.1;
> +  glob_bfloat = is_a_float;
> +  glob_bfloat = is_an_int;
> +  glob_bfloat = is_a_float16;
> +  glob_bfloat = is_a_double;
> +  glob_bfloat = is_a_short_int;
> +
> +  is_an_int = glob_bfloat;
> +  is_a_float = glob_bfloat;
> +  is_a_float16 = glob_bfloat;
> +  is_a_double = glob_bfloat;
> +  is_a_short_int = glob_bfloat;
>   
>     /* Casting.  */
>   
>     (void) glob_bfloat;
>     (__bf16) glob_bfloat;
>   
> -  (int) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (float) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (_Float16) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (double) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (short) glob_bfloat; /* { dg-error {invalid conversion from type '__bf16'} } */
> -
> -  (__bf16) is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_double; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (int) glob_bfloat;
> +  (float) glob_bfloat;
> +  (_Float16) glob_bfloat;
> +  (double) glob_bfloat;
> +  (short) glob_bfloat;
> +
> +  (__bf16) is_an_int;
> +  (__bf16) is_a_float;
> +  (__bf16) is_a_float16;
> +  (__bf16) is_a_double;
> +  (__bf16) is_a_short_int;
>   
>     /* Compound literals.  */
>   
>     (__bf16) {};
>     (__bf16) { glob_bfloat };
> -  (__bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_double }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__bf16) { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  (int) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (float) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (_Float16) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (double) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  (short) { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  (__bf16) { 0 };
> +  (__bf16) { 0.1 };
> +  (__bf16) { is_a_float };
> +  (__bf16) { is_an_int };
> +  (__bf16) { is_a_float16 };
> +  (__bf16) { is_a_double };
> +  (__bf16) { is_a_short_int };
> +
> +  (int) { glob_bfloat };
> +  (float) { glob_bfloat };
> +  (_Float16) { glob_bfloat };
> +  (double) { glob_bfloat };
> +  (short) { glob_bfloat };
>   
>     /* Arrays and Structs.  */
>   
> @@ -145,16 +145,16 @@ __bf16 footest (__bf16 scalar0)
>     bfloat_ptr = &bfloat_ptr3[1];
>   
>     /* Simple comparison.  */
> -  scalar0 > glob_bfloat; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  glob_bfloat == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_a_float == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0.1 == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 > is_an_int; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_an_int == scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  scalar0 > glob_bfloat;
> +  glob_bfloat == scalar0;
> +  scalar0 > is_a_float;
> +  is_a_float == scalar0;
> +  scalar0 > 0;
> +  0 == scalar0;
> +  scalar0 > 0.1;
> +  0.1 == scalar0;
> +  scalar0 > is_an_int;
> +  is_an_int == scalar0;
>   
>     /* Pointer comparison.  */
>   
> @@ -174,41 +174,41 @@ __bf16 footest (__bf16 scalar0)
>     /* Conditional expressions.  */
>   
>     0 ? scalar0 : scalar0;
> -  0 ? scalar0 : is_a_float; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  0 ? is_a_float : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  0 ? scalar0 : 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  0 ? 0 : scalar0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  0 ? 0.1 : scalar0; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  0 ? scalar0 : 0.1; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  0 ? scalar0 : is_a_float;
> +  0 ? is_a_float : scalar0;
> +  0 ? scalar0 : 0;
> +  0 ? 0 : scalar0;
> +  0 ? 0.1 : scalar0;
> +  0 ? scalar0 : 0.1;
>     0 ? bfloat_ptr : bfloat_ptr2;
>     0 ? bfloat_ptr : float_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
>     0 ? float_ptr : bfloat_ptr; /* { dg-warning {pointer type mismatch in conditional expression} } */
>   
> -  scalar0 ? scalar0 : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 ? is_a_float : scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 ? scalar0 : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 ? is_a_float : is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  scalar0 ? scalar0 : scalar0;
> +  scalar0 ? is_a_float : scalar0;
> +  scalar0 ? scalar0 : is_a_float;
> +  scalar0 ? is_a_float : is_a_float;
>   
>     /* Unary operators.  */
>   
> -  +scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  -scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ~scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  !scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  +scalar0;
> +  -scalar0;
> +  ~scalar0; /* { dg-error {wrong type argument to bit-complement} } */
> +  !scalar0;
>     *scalar0; /* { dg-error {invalid type argument of unary '\*'} } */
> -  __real scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  __imag scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ++scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  --scalar0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __real scalar0;
> +  __imag scalar0;
> +  ++scalar0;
> +  --scalar0;
> +  scalar0++;
> +  scalar0--;
>   
>     /* Binary arithmetic operations.  */
>   
> -  scalar0 = glob_bfloat + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 = glob_bfloat + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 = glob_bfloat + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  scalar0 = glob_bfloat + is_a_float; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  scalar0 = glob_bfloat + *bfloat_ptr;
> +  scalar0 = glob_bfloat + 0.1;
> +  scalar0 = glob_bfloat + 0;
> +  scalar0 = glob_bfloat + is_a_float;
>   
>     return scalar0;
>   }
> --- gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c.jj	2022-10-01 21:44:52.517002757 +0200
> +++ gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c	2022-10-03 22:46:34.252786651 +0200
> @@ -48,20 +48,20 @@ __m128bf16 footest (__m128bf16 vector0)
>     __m128bf16 vector2_1 = {};
>     __m128bf16 vector2_2 = { glob_bfloat };
>     __m128bf16 vector2_3 = { glob_bfloat, glob_bfloat, glob_bfloat, glob_bfloat };
> -  __m128bf16 vector2_4 = { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_5 = { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_6 = { is_a_float16 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_7 = { is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_8 = { is_an_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_9 = { is_a_short_int }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -
> -  __v8si initi_2_1 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m256 initi_2_2 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m128h initi_2_3 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __m128 initi_2_4 = { glob_bfloat }; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __v4si initi_2_5 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> -  __v4hi initi_2_6 = { glob_bfloat };   /* { dg-error {invalid conversion from type '__bf16'} } */
> +  __m128bf16 vector2_4 = { 0 };
> +  __m128bf16 vector2_5 = { 0.1 };
> +  __m128bf16 vector2_6 = { is_a_float16 };
> +  __m128bf16 vector2_7 = { is_a_float };
> +  __m128bf16 vector2_8 = { is_an_int };
> +  __m128bf16 vector2_9 = { is_a_short_int };
> +  __m128bf16 vector2_10 = { 0.0, 0, is_a_short_int, is_a_float };
> +
> +  __v8si initi_2_1 = { glob_bfloat };
> +  __m256 initi_2_2 = { glob_bfloat };
> +  __m128h initi_2_3 = { glob_bfloat };
> +  __m128 initi_2_4 = { glob_bfloat };
> +  __v4si initi_2_5 = { glob_bfloat };
> +  __v4hi initi_2_6 = { glob_bfloat };
>   
>     /* Assignments to/from vectors.  */
>   
> @@ -85,25 +85,25 @@ __m128bf16 footest (__m128bf16 vector0)
>     /* Assignments to/from elements.  */
>   
>     vector2_3[0] = glob_bfloat;
> -  vector2_3[0] = is_an_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_short_int; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = is_a_float16; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  vector2_3[0] = 0.1; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  vector2_3[0] = is_an_int;
> +  vector2_3[0] = is_a_short_int;
> +  vector2_3[0] = is_a_float;
> +  vector2_3[0] = is_a_float16;
> +  vector2_3[0] = 0;
> +  vector2_3[0] = 0.1;
>   
>     glob_bfloat = vector2_3[0];
> -  is_an_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_short_int = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> -  is_a_float16 = vector2_3[0]; /* { dg-error {invalid conversion from type '__bf16'} } */
> +  is_an_int = vector2_3[0];
> +  is_a_short_int = vector2_3[0];
> +  is_a_float = vector2_3[0];
> +  is_a_float16 = vector2_3[0];
>   
>     /* Compound literals.  */
>   
>     (__m128bf16) {};
>   
> -  (__m128bf16) { 0 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> -  (__m128bf16) { 0.1 }; /* { dg-error {invalid conversion to type '__bf16'} } */
> +  (__m128bf16) { 0 };
> +  (__m128bf16) { 0.1 };
>     (__m128bf16) { is_a_float_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m256'} } */
>     (__m128bf16) { is_an_int_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__v8si'} } */
>     (__m128bf16) { is_a_float_pair }; /* { dg-error {incompatible types when initializing type '__bf16' using type '__m128'} } */
> @@ -186,16 +186,16 @@ __m128bf16 footest (__m128bf16 vector0)
>     bfloat_ptr = &bfloat_ptr3[1];
>   
>     /* Simple comparison.  */
> -  vector0 > glob_bfloat_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  glob_bfloat_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_a_float_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  0.1 == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 > is_an_int_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  is_an_int_vec == vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 > glob_bfloat_vec;
> +  glob_bfloat_vec == vector0;
> +  vector0 > is_a_float_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_a_float_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
> +  vector0 > 0;
> +  0 == vector0;
> +  vector0 > 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> +  0.1 == vector0; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> +  vector0 > is_an_int_vec; /* { dg-error {comparing vectors with different element types} } */
> +  is_an_int_vec == vector0; /* { dg-error {comparing vectors with different element types} } */
>   
>     /* Pointer comparison.  */
>   
> @@ -234,24 +234,24 @@ __m128bf16 footest (__m128bf16 vector0)
>   
>     /* Unary operators.  */
>   
> -  +vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  -vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ~vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  !vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  +vector0;
> +  -vector0;
> +  ~vector0; /* { dg-error {wrong type argument to bit-complement} } */
> +  !vector0; /* { dg-error {wrong type argument to unary exclamation mark} } */
>     *vector0; /* { dg-error {invalid type argument of unary '\*'} } */
> -  __real vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  __imag vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  ++vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  --vector0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0++; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0--; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  __real vector0; /* { dg-error {wrong type argument to __real} } */
> +  __imag vector0; /* { dg-error {wrong type argument to __imag} } */
> +  ++vector0;
> +  --vector0;
> +  vector0++;
> +  vector0--;
>   
>     /* Binary arithmetic operations.  */
>   
> -  vector0 = glob_bfloat_vec + *bfloat_ptr; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + 0; /* { dg-error {operation not permitted on type '__bf16'} } */
> -  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {operation not permitted on type '__bf16'} } */
> +  vector0 = glob_bfloat_vec + *bfloat_ptr;
> +  vector0 = glob_bfloat_vec + 0.1; /* { dg-error {conversion of scalar 'double' to vector '__m128bf16'} } */
> +  vector0 = glob_bfloat_vec + 0;
> +  vector0 = glob_bfloat_vec + is_a_float_vec; /* { dg-error {invalid operands to binary \+} } */
>   
>     return vector0;
>   }
> --- gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C.jj	2022-10-01 21:44:52.512002825 +0200
> +++ gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C	2022-10-03 22:46:34.252786651 +0200
> @@ -5,6 +5,6 @@ void foo (void)
>   {
>     __bf16 (); /* { dg-bogus {invalid conversion to type '__bf16'} } */
>     __bf16 a = __bf16(); /* { dg-bogus {invalid conversion to type '__bf16'} } */
> -  __bf16 (0x1234); /* { dg-error {invalid conversion to type '__bf16'} } */
> -  __bf16 (0.1); /* { dg-error {invalid conversion to type '__bf16'} } */
> +  __bf16 (0x1234); /* { dg-bogus {invalid conversion to type '__bf16'} } */
> +  __bf16 (0.1); /* { dg-bogus {invalid conversion to type '__bf16'} } */
>   }
> --- libcpp/include/cpplib.h.jj	2022-09-29 18:11:28.760749857 +0200
> +++ libcpp/include/cpplib.h	2022-10-03 11:10:11.084028291 +0200
> @@ -1275,6 +1275,7 @@ struct cpp_num
>   #define CPP_N_USERDEF	0x1000000 /* C++11 user-defined literal.  */
>   
>   #define CPP_N_SIZE_T	0x2000000 /* C++23 size_t literal.  */
> +#define CPP_N_BFLOAT16	0x4000000 /* std::bfloat16_t type.  */
>   
>   #define CPP_N_WIDTH_FLOATN_NX	0xF0000000 /* _FloatN / _FloatNx value
>   					      of N, divided by 16.  */
> --- libcpp/expr.cc.jj	2022-09-29 18:11:28.760749857 +0200
> +++ libcpp/expr.cc	2022-10-03 11:10:11.107027980 +0200
> @@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
>     size_t orig_len = len;
>     const uchar *orig_s = s;
>     size_t flags;
> -  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
> +  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
>   
>     flags = 0;
> -  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
> +  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
>   
>     /* The following decimal float suffixes, from TR 24732:2009, TS
>        18661-2:2015 and C2X, are supported:
> @@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
>        w, W - machine-specific type such as __float80 (GNU extension).
>        q, Q - machine-specific type such as __float128 (GNU extension).
>        fN, FN - _FloatN (TS 18661-3:2015).
> -     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
> +     fNx, FNx - _FloatNx (TS 18661-3:2015).
> +     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
>   
>     /* Process decimal float suffixes, which are two letters starting
>        with d or D.  Order and case are significant.  */
> @@ -239,6 +240,20 @@ interpret_float_suffix (cpp_reader *pfil
>   		fn++;
>   	    }
>   	  break;
> +	case 'b': case 'B':
> +	  if (len > 2
> +	      /* Except for bf16 / BF16 where case is significant.  */
> +	      && s[1] == (s[0] == 'b' ? 'f' : 'F')
> +	      && s[2] == '1'
> +	      && s[3] == '6'
> +	      && CPP_OPTION (pfile, cplusplus))
> +	    {
> +	      bf16++;
> +	      len -= 3;
> +	      s += 3;
> +	      break;
> +	    }
> +	  return 0;
>   	case 'd': case 'D': d++; break;
>   	case 'l': case 'L': l++; break;
>   	case 'w': case 'W': w++; break;
> @@ -257,7 +272,7 @@ interpret_float_suffix (cpp_reader *pfil
>        of N larger than can be represented in the return value.  The
>        caller is responsible for rejecting _FloatN suffixes where
>        _FloatN is not supported on the chosen target.  */
> -  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
> +  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
>       return 0;
>     if (fn_bits > CPP_FLOATN_MAX)
>       return 0;
> @@ -295,6 +310,7 @@ interpret_float_suffix (cpp_reader *pfil
>   	     q ? CPP_N_MD_Q :
>   	     fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
>   	     fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
> +	     bf16 ? CPP_N_BFLOAT16 :
>   	     CPP_N_DEFAULT));
>   }
>   
> --- libgcc/config/i386/t-softfp.jj	2022-09-29 18:11:28.761749843 +0200
> +++ libgcc/config/i386/t-softfp	2022-10-03 11:10:11.158027289 +0200
> @@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
>   libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
>   LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
>   
> -softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> -softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
> +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
> +		      tfbf xfbf dfbf sfbf hfbf
>   
>   softfp_extras += eqhf2
>   
> @@ -15,11 +16,17 @@ CFLAGS-extendhfsf2.c += -msse2
>   CFLAGS-extendhfdf2.c += -msse2
>   CFLAGS-extendhftf2.c += -msse2
>   CFLAGS-extendhfxf2.c += -msse2
> +CFLAGS-extendbfsf2.c += -msse2
>   
>   CFLAGS-truncsfhf2.c += -msse2
>   CFLAGS-truncdfhf2.c += -msse2
>   CFLAGS-truncxfhf2.c += -msse2
>   CFLAGS-trunctfhf2.c += -msse2
> +CFLAGS-truncsfbf2.c += -msse2
> +CFLAGS-truncdfbf2.c += -msse2
> +CFLAGS-truncxfbf2.c += -msse2
> +CFLAGS-trunctfbf2.c += -msse2
> +CFLAGS-trunchfbf2.c += -msse2
>   
>   CFLAGS-eqhf2.c += -msse2
>   CFLAGS-_divhc3.c += -msse2
> --- libgcc/config/i386/libgcc-glibc.ver.jj	2022-09-29 18:11:28.761749843 +0200
> +++ libgcc/config/i386/libgcc-glibc.ver	2022-10-03 11:10:11.168027153 +0200
> @@ -214,3 +214,13 @@ GCC_12.0.0 {
>     __trunctfhf2
>     __truncxfhf2
>   }
> +
> +%inherit GCC_13.0.0 GCC_12.0.0
> +GCC_13.0.0 {
> +  __extendbfsf2
> +  __truncdfbf2
> +  __truncsfbf2
> +  __trunctfbf2
> +  __truncxfbf2
> +  __trunchfbf2
> +}
> --- libgcc/config/i386/sfp-machine.h.jj	2022-09-29 18:11:28.761749843 +0200
> +++ libgcc/config/i386/sfp-machine.h	2022-10-03 11:10:11.181026977 +0200
> @@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
>   #define _FP_QNANNEGATEDP 0
>   
>   #define _FP_NANSIGN_H		1
> +#define _FP_NANSIGN_B		1
>   #define _FP_NANSIGN_S		1
>   #define _FP_NANSIGN_D		1
>   #define _FP_NANSIGN_E		1
> --- libgcc/config/i386/64/sfp-machine.h.jj	2022-09-29 18:11:28.761749843 +0200
> +++ libgcc/config/i386/64/sfp-machine.h	2022-10-03 11:10:11.181026977 +0200
> @@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
>   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D
>   #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
> --- libgcc/config/i386/32/sfp-machine.h.jj	2022-09-29 18:11:28.761749843 +0200
> +++ libgcc/config/i386/32/sfp-machine.h	2022-10-03 11:10:11.182026963 +0200
> @@ -87,6 +87,7 @@
>   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
>   /* Even if XFmode is 12byte,  we have to pad it to
> --- libgcc/soft-fp/brain.h.jj	2022-10-03 11:10:11.182026963 +0200
> +++ libgcc/soft-fp/brain.h	2022-10-03 11:10:11.182026963 +0200
> @@ -0,0 +1,172 @@
> +/* Software floating-point emulation.
> +   Definitions for Brain Floating Point format (bfloat16).
> +   Copyright (C) 1997-2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef SOFT_FP_BRAIN_H
> +#define SOFT_FP_BRAIN_H	1
> +
> +#if _FP_W_TYPE_SIZE < 32
> +# error "Here's a nickel kid.  Go buy yourself a real computer."
> +#endif
> +
> +#define _FP_FRACTBITS_B		(_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACTBITS_DW_B	(_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACBITS_B		8
> +#define _FP_FRACXBITS_B		(_FP_FRACTBITS_B - _FP_FRACBITS_B)
> +#define _FP_WFRACBITS_B		(_FP_WORKBITS + _FP_FRACBITS_B)
> +#define _FP_WFRACXBITS_B	(_FP_FRACTBITS_B - _FP_WFRACBITS_B)
> +#define _FP_EXPBITS_B		8
> +#define _FP_EXPBIAS_B		127
> +#define _FP_EXPMAX_B		255
> +
> +#define _FP_QNANBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
> +#define _FP_QNANBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
> +#define _FP_IMPLBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
> +#define _FP_IMPLBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
> +#define _FP_OVERFLOW_B		((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
> +
> +#define _FP_WFRACBITS_DW_B	(2 * _FP_WFRACBITS_B)
> +#define _FP_WFRACXBITS_DW_B	(_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
> +#define _FP_HIGHBIT_DW_B	\
> +  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
> +
> +/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
> +   chosen by the target machine.  */
> +
> +typedef float BFtype __attribute__ ((mode (BF)));
> +
> +union _FP_UNION_B
> +{
> +  BFtype flt;
> +  struct _FP_STRUCT_LAYOUT
> +  {
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +    unsigned sign : 1;
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +#else
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned sign : 1;
> +#endif
> +  } bits;
> +};
> +
> +#define FP_DECL_B(X)		_FP_DECL (1, X)
> +#define FP_UNPACK_RAW_B(X, val)	_FP_UNPACK_RAW_1 (B, X, (val))
> +#define FP_UNPACK_RAW_BP(X, val)	_FP_UNPACK_RAW_1_P (B, X, (val))
> +#define FP_PACK_RAW_B(val, X)	_FP_PACK_RAW_1 (B, (val), X)
> +#define FP_PACK_RAW_BP(val, X)			\
> +  do						\
> +    {						\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_B(X, val)			\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1 (B, X, (val));		\
> +      _FP_UNPACK_CANONICAL (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_BP(X, val)			\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1_P (B, X, (val));		\
> +      _FP_UNPACK_CANONICAL (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_B(X, val)		\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1 (B, X, (val));		\
> +      _FP_UNPACK_SEMIRAW (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_BP(X, val)		\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1_P (B, X, (val));		\
> +      _FP_UNPACK_SEMIRAW (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_B(val, X)			\
> +  do						\
> +    {						\
> +      _FP_PACK_CANONICAL (B, 1, X);		\
> +      _FP_PACK_RAW_1 (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_BP(val, X)			\
> +  do						\
> +    {						\
> +      _FP_PACK_CANONICAL (B, 1, X);		\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_B(val, X)		\
> +  do						\
> +    {						\
> +      _FP_PACK_SEMIRAW (B, 1, X);		\
> +      _FP_PACK_RAW_1 (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_BP(val, X)		\
> +  do						\
> +    {						\
> +      _FP_PACK_SEMIRAW (B, 1, X);		\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_TO_INT_B(r, X, rsz, rsg)	_FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
> +#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)	\
> +  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
> +#define FP_FROM_INT_B(X, r, rs, rt)	_FP_FROM_INT (B, 1, X, (r), (rs), rt)
> +
> +/* BFmode arithmetic is not implemented.  */
> +
> +#define _FP_FRAC_HIGH_B(X)	_FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_RAW_B(X)	_FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_DW_B(X)	_FP_FRAC_HIGH_1 (X)
> +
> +#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
> +
> +#endif /* !SOFT_FP_BRAIN_H */
> --- libgcc/soft-fp/truncsfbf2.c.jj	2022-10-03 11:10:11.182026963 +0200
> +++ libgcc/soft-fp/truncsfbf2.c	2022-10-03 11:10:11.182026963 +0200
> @@ -0,0 +1,48 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE single into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +BFtype
> +__truncsfbf2 (SFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_S (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_S (A, a);
> +  FP_TRUNC (B, S, 1, 1, R, A);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncdfbf2.c.jj	2022-10-03 11:10:11.182026963 +0200
> +++ libgcc/soft-fp/truncdfbf2.c	2022-10-03 11:10:11.182026963 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE double into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "double.h"
> +
> +BFtype
> +__truncdfbf2 (DFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_D (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_D (A, a);
> +#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
> +  FP_TRUNC (B, D, 1, 2, R, A);
> +#else
> +  FP_TRUNC (B, D, 1, 1, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncxfbf2.c.jj	2022-10-03 11:10:11.183026950 +0200
> +++ libgcc/soft-fp/truncxfbf2.c	2022-10-03 11:10:11.183026950 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE extended into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "extended.h"
> +
> +BFtype
> +__truncxfbf2 (XFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_E (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_E (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, E, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, E, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunctfbf2.c.jj	2022-10-03 11:10:11.183026950 +0200
> +++ libgcc/soft-fp/trunctfbf2.c	2022-10-03 11:10:11.183026950 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE quad into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "quad.h"
> +
> +BFtype
> +__trunctfbf2 (TFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_Q (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_Q (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, Q, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, Q, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunchfbf2.c.jj	2022-10-03 11:10:11.183026950 +0200
> +++ libgcc/soft-fp/trunchfbf2.c	2022-10-03 11:10:11.183026950 +0200
> @@ -0,0 +1,58 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE half into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "half.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert HFtype to SFtype (lossless) and then
> +   truncate to BFtype.  */
> +
> +BFtype
> +__trunchfbf2 (HFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_RAW_H (A, a);
> +  FP_EXTEND (S, H, 1, 1, B, A);
> +  FP_PACK_RAW_S (b, B);
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (B, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncbfhf2.c.jj	2022-10-03 11:10:11.183026950 +0200
> +++ libgcc/soft-fp/truncbfhf2.c	2022-10-03 11:10:11.183026950 +0200
> @@ -0,0 +1,75 @@
> +/* Software floating-point emulation.
> +   Truncate bfloat16 into IEEE half.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "half.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert BFtype to SFtype (lossless) and then
> +   truncate to HFtype.  */
> +
> +HFtype
> +__truncbfhf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  HFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  /* Optimize BFtype to SFtype conversion to simple left shift
> +     by 16 if possible, we don't need to raise exceptions on sNaN
> +     here as the SFtype to HFtype truncation should do that too.  */
> +  if (sizeof (BFtype) == 2
> +      && sizeof (unsigned short) == 2
> +      && sizeof (SFtype) == 4
> +      && sizeof (unsigned int) == 4)
> +    {
> +      union { BFtype a; unsigned short b; } u1;
> +      union { SFtype a; unsigned int b; } u2;
> +      u1.a = a;
> +      u2.b = (u1.b << 8) << 8;
> +      b = u2.a;
> +    }
> +  else
> +    {
> +      FP_UNPACK_RAW_B (A, a);
> +      FP_EXTEND (S, B, 1, 1, B, A);
> +      FP_PACK_RAW_S (b, B);
> +    }
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (H, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_H (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/extendbfsf2.c.jj	2022-10-03 11:10:11.183026950 +0200
> +++ libgcc/soft-fp/extendbfsf2.c	2022-10-03 11:10:11.183026950 +0200
> @@ -0,0 +1,49 @@
> +/* Software floating-point emulation.
> +   Return an bfloat16 converted to IEEE single
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define FP_NO_EXACT_UNDERFLOW
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +SFtype
> +__extendbfsf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_B (A);
> +  FP_DECL_S (R);
> +  SFtype r;
> +
> +  FP_INIT_EXCEPTIONS;
> +  FP_UNPACK_RAW_B (A, a);
> +  FP_EXTEND (S, B, 1, 1, R, A);
> +  FP_PACK_RAW_S (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libiberty/cp-demangle.h.jj	2022-09-29 18:11:28.762749829 +0200
> +++ libiberty/cp-demangle.h	2022-10-03 11:10:11.184026936 +0200
> @@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
>   extern const struct demangle_operator_info cplus_demangle_operators[];
>   #endif
>   
> -#define D_BUILTIN_TYPE_COUNT (35)
> +#define D_BUILTIN_TYPE_COUNT (36)
>   
>   CP_STATIC_IF_GLIBCPP_V3
>   const struct demangle_builtin_type_info
> --- libiberty/cp-demangle.c.jj	2022-09-29 18:11:28.762749829 +0200
> +++ libiberty/cp-demangle.c	2022-10-03 11:39:01.324587895 +0200
> @@ -2489,6 +2489,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
>     /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
>   	     D_PRINT_DEFAULT },
>     /* 34 */ { NL ("_Float"),	NL ("_Float"),		D_PRINT_FLOAT },
> +  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
>   };
>   
>   CP_STATIC_IF_GLIBCPP_V3
> @@ -2753,11 +2754,22 @@ cplus_demangle_type (struct d_info *di)
>   
>   	case 'F':
>   	  /* DF<number>_ - _Float<number>.
> -	     DF<number>x - _Float<number>x.  */
> +	     DF<number>x - _Float<number>x
> +	     DF16b - std::bfloat16_t.  */
>   	  {
>   	    int arg = d_number (di);
>   	    char buf[12];
>   	    char suffix = 0;
> +	    if (d_peek_char (di) == 'b')
> +	      {
> +		if (arg != 16)
> +		  return NULL;
> +		d_advance (di, 1);
> +		ret = d_make_builtin_type (di,
> +					   &cplus_demangle_builtin_types[35]);
> +		di->expansion += ret->u.s_builtin.type->len;
> +		break;
> +	      }
>   	    if (d_peek_char (di) == 'x')
>   	      suffix = 'x';
>   	    if (!suffix && d_peek_char (di) != '_')
> --- libiberty/testsuite/demangle-expected.jj	2022-09-29 18:11:28.762749829 +0200
> +++ libiberty/testsuite/demangle-expected	2022-10-03 11:39:12.666434242 +0200
> @@ -1249,6 +1249,10 @@ xxx
>   _Z3xxxDF32xDF64xDF128xCDF32xVb
>   xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
>   xxx
> +--format=auto --no-params
> +_Z3xxxDF16b
> +xxx(std::bfloat16_t)
> +xxx
>   # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
>   --format=auto --no-params
>   _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z
> 
> 
> 	Jakub
>

next prev parent reply	other threads:[~2022-10-04 21:50 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-29 15:55 [RFC PATCH] c++, i386, arm, aarch64, " Jakub Jelinek
2022-09-30 13:49 ` Jason Merrill
2022-09-30 14:08   ` Jakub Jelinek
2022-09-30 18:21     ` Joseph Myers
2022-09-30 18:38       ` Jakub Jelinek
2022-09-30 19:27         ` Jonathan Wakely
2022-10-04  9:06     ` [PATCH] middle-end, c++, i386, " Jakub Jelinek
2022-10-04 15:54       ` Joseph Myers
2022-10-04 21:50       ` Jason Merrill [this message]
2022-10-05 13:47         ` Jakub Jelinek
2022-10-05 20:02           ` Jason Merrill
2022-10-12  8:23             ` [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE Jakub Jelinek
2022-10-12 10:15               ` Richard Sandiford
2022-10-12 11:07                 ` [PATCH] machmode, v2: " Jakub Jelinek
2022-10-12 11:49                   ` Richard Sandiford
2022-10-12 10:37               ` [PATCH] machmode: " Eric Botcazou
2022-10-12 10:57                 ` Jakub Jelinek
2022-10-13 16:50             ` [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support Jakub Jelinek
2022-10-13 19:37               ` Jason Merrill
2022-10-13 21:11                 ` Uros Bizjak
2022-10-13 21:35                   ` Jakub Jelinek
2022-10-13 21:46                     ` Uros Bizjak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55062a15-79a1-f8cf-ed20-25ca8ff42abe@redhat.com \
    --to=jason@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=jeffreyalaw@gmail.com \
    --cc=joseph@codesourcery.com \
    --cc=rguenther@suse.de \
    --cc=ubizjak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).