From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jason@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by sourceware.org (Postfix) with ESMTPS id F00923858D38
	for <gcc-patches@gcc.gnu.org>; Fri, 30 Sep 2022 13:49:14 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F00923858D38
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1664545754;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=Vx41JMo1qBI7nABFrkqmImy4GC1uysiB4iGBnRBzSqQ=;
	b=frVLg/lhVDtI0mdgXwAJIP6nX6ANDBsvJucyYTC9yMmbuPPEwmQvtoS2DNhwzRqKMvc8py
	4rSz/auWlqUhlvNhAwB/zSLPnLHilK4YFMyf1Cy1ZsnTYQ/dbc3RVGenB3SBLMXuEvUq5q
	ZUqj3b3Ou/sA/1+av8cm1tnoM+2q0wM=
Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com
 [209.85.219.70]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id
 us-mta-136-qkqGe5zAMJ6AEuZlyI6FMA-1; Fri, 30 Sep 2022 09:49:13 -0400
X-MC-Unique: qkqGe5zAMJ6AEuZlyI6FMA-1
Received: by mail-qv1-f70.google.com with SMTP id ll6-20020a056214598600b004af9fc1faf8so2885626qvb.2
        for <gcc-patches@gcc.gnu.org>; Fri, 30 Sep 2022 06:49:13 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :x-gm-message-state:from:to:cc:subject:date;
        bh=lqrLszrdg836LGrD+blR42p+RrIYxFG3DzaR8+KMNCI=;
        b=FHGlss8EzhY8GVSW+p797gpeqCf7++TD0+xsgYeXRjPGC4GPdfIgj5MlsZVoDLnkza
         PuLZnEeI6SEgZV5zzqjfktEajq3UfRhRElnGjNAAEtogeZYvN+vSpGQC7QAShfq+6a8n
         Nc2DpBope8eaIqdZLInYkrjC7NHi6SpHyMAiSuT8tJ+ITnKFaCjbdjmM24657U2jSg2N
         a6RWfJE4wMnkb4Tp7ukQmScPbcJVnUHZCnWOpsQvDGyllHG1RYllGkx/5HqkrFWYLk6x
         Q8VF4MRCAROS0tg6JPds1GMNykyw8fSCVRQurbLh4JS/32t/2E2r4EK8j3NR6NiRY1mO
         UiSw==
X-Gm-Message-State: ACrzQf1GdNRP8EuHWzSiOOIhvGc11S9D88oQkeYAOlSCy2qP8HRN6HHN
	RTs3+W7JYfQU9xcp86s9XQoJfPjkdlme1j5AY3FzYdrUfBTlSriCYDYqVuZE35th1t8arvFCHFY
	oRxwjx2s2XirACuesMA==
X-Received: by 2002:a05:6214:1c09:b0:4ac:9160:7484 with SMTP id u9-20020a0562141c0900b004ac91607484mr6725064qvc.13.1664545751634;
        Fri, 30 Sep 2022 06:49:11 -0700 (PDT)
X-Google-Smtp-Source: AMsMyM79/7RjPO+ONRdt41DE8an+HjRQf4nhUKTCiwi9eDwTzAtcPyQ+hd+BCrlrAHX2DJRVFopFMQ==
X-Received: by 2002:a05:6214:1c09:b0:4ac:9160:7484 with SMTP id u9-20020a0562141c0900b004ac91607484mr6724984qvc.13.1664545750576;
        Fri, 30 Sep 2022 06:49:10 -0700 (PDT)
Received: from [192.168.1.101] (130-44-159-43.s15913.c3-0.arl-cbr1.sbo-arl.ma.cable.rcncustomer.com. [130.44.159.43])
        by smtp.gmail.com with ESMTPSA id v6-20020a05622a014600b00342f8143599sm2057093qtw.13.2022.09.30.06.49.09
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Fri, 30 Sep 2022 06:49:09 -0700 (PDT)
Message-ID: <37522634-319a-b471-aa35-87e711b0479e@redhat.com>
Date: Fri, 30 Sep 2022 09:49:08 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.13.1
Subject: Re: [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and
 __bf16 arithmetic support
To: Jakub Jelinek <jakub@redhat.com>,
 "Joseph S. Myers" <joseph@codesourcery.com>, Hongtao Liu
 <crazylht@gmail.com>, hjl.tools@gmail.com,
 Richard Earnshaw <richard.earnshaw@arm.com>,
 Kyrylo Tkachov <kyrylo.tkachov@arm.com>, richard.sandiford@arm.com
Cc: gcc-patches@gcc.gnu.org
References: <YzXABvJX2wl3gHkK@tucnak>
From: Jason Merrill <jason@redhat.com>
In-Reply-To: <YzXABvJX2wl3gHkK@tucnak>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-19.5 required=5.0 tests=BAYES_00,DKIM_INVALID,DKIM_SIGNED,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_MANYTO,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On 9/29/22 11:55, Jakub Jelinek wrote:
> Hi!
> 
> Here is more complete patch to add std::bfloat16_t support on
> x86, AArch64 and (only partially) on ARM 32-bit.  No BFmode optabs
> are added by the patch, so for binops/unops it extends to SFmode
> first and then truncates back to BFmode.
> For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations
> of all those conversions so that we avoid double rounding, for
> BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much
> it emits BFmode -> SFmode conversion first and then converts to the even
> wider mode, neither step should be imprecise.
> For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
> and then SFmode -> HFmode, because neither format is subset or superset
> of the other, while SFmode is superset of both.
> expr.cc then contains a -ffast-math optimization of the BF -> SF and
> SF -> BF conversions if we don't optimize for space (and for the latter
> if -frounding-math isn't enabled either).
> For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16
> but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math
> || !flag_unsafe_math_optimizations, because I think the insn doesn't
> raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.
> In C by default (unless x86 -fexcess-precision=16) we use float excess
> precision for BFmode, so truncate only on explicit casts and assignments.
> In C++ unfortunately (but that is the case of also _Float16) we don't
> support excess precision yet which means that for
> __bf16 (__bf16 a, __bf16 b, __bf16 c, __bf16 d) { return a * b + c * d; }
> we do a lot of conversions.

The comment from Apple on the ABI mangling proposal suggests to me that 
we might want to delay enabling C++ std::bfloat16_t (i.e. defining 
__STDCPP_BFLOAT16_T__) until we have that excess precision support?

"Steve [Cannon] is concerned that adding this type as an arithmetic type 
might serve to be an attractive nuisance. Because the precision of 
bfloat16 is so limited, controlling when truncation back to bfloat16 
occurs is of paramount practical importance to bfloat16 users. The 
normal semantics of an arithmetic type in C and C++ encourage the 
independent evaluation of operations, which would require an implicit 
truncation back to bfloat16 on every intermediate result. That would 
have catastrophic effects on both the precision and the performance of 
typical bfloat16 code. For example, on the performance side, typical 
hardware support is built around complex fused operations (e.g. float32 
+= bfloat16 * bfloat16 + bfloat16 * bfloat16, with all intermediate 
results computed in float32) that it would not be correct to 
pattern-match from independent operations.

Now, C and C++ do allow excess precision evaluation (C 6.5p8; C++ 
[expr.pre]p6), and Steve and I think that that might fix this problem. 
But we'd really need to force excess precision evaluation in order to 
get acceptable results; otherwise, allowing arithmetic is really just 
encouraging people to write code that is effectively incorrect. And even 
then there's definitely risk that someone might e.g. accumulate the 
intermediate results of a loop in std::bfloat16_t instead of in float."

> The aarch64 part is untested but has a chance of working (IMHO),
> though I'd appreciate if ARM maintainers could decide whether it is
> acceptable for them that __bf16 changes mangling and will allow arithmetics
> and conversions.
> The arm part is partial, libgcc side is missing as the target doesn't really
> seem to use soft-fp right now.  Perhaps the config/arm/ changes can be
> left out from the patch (thus keep ARM 32-bit __bf16 as before) and support
> for it can be done at some later time.
> 
> Thoughts on this?
> 
> 2022-09-29  Jakub Jelinek  <jakub@redhat.com>
> 
> gcc/
> 	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
> 	* tree.h (bfloat16_type_node): Define.
> 	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
> 	like float16_type_mode.
> 	* expmed.h (maybe_expand_shift): Declare.
> 	* expmed.cc (maybe_expand_shift): No longer static.
> 	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
> 	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
> 	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
> 	-ffast-math generic implementation for BF -> SF and SF -> BF
> 	conversions.
> 	* config/arm/arm.h (arm_bf16_type_node): Remove.
> 	(arm_bf16_ptr_type_node): Adjust comment.
> 	* config/arm/arm.cc (TARGET_INVALID_UNARY_OP,
> 	TARGET_INVALID_BINARY_OP): Don't redefine.
> 	(arm_mangle_type): Mangle BFmode as DFb16_.

If we're using DF32x for _Float32x, maybe we want DF16b for bfloat16?

> 	(arm_invalid_conversion): Only reject BF <-> HF conversions if
> 	HFmode is non-IEEE format.
> 	(arm_invalid_unary_op, arm_invalid_binary_op): Remove.
> 	* config/arm/arm-builtins.cc (arm_bf16_type_node): Remove.
> 	(arm_simd_builtin_std_type): Use bfloat16_type_node rather than
> 	arm_bf16_type_node.
> 	(arm_init_simd_builtin_types): Likewise.
> 	(arm_init_simd_builtin_scalar_types): Likewise.
> 	(arm_init_bf16_types): Likewise.
> 	* config/i386/i386.cc (ix86_mangle_type): Mangle BFmode as DFb16_.
> 	(ix86_invalid_conversion, ix86_invalid_unary_op,
> 	ix86_invalid_binary_op): Remove.
> 	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
> 	TARGET_INVALID_BINARY_OP): Don't redefine.
> 	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
> 	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
> 	ix86_bf16_type_node.
> 	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
> 	* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
> 	(aarch64_bf16_ptr_type_node): Adjust comment.
> 	* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
> 	bfloat16_type_node rather than aarch64_bf16_type_node.
> 	(aarch64_mangle_type): Mangle BFmode as DFb16_.
> 	(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
> 	aarch64_invalid_binary_op): Remove BFmode related rejections.
> 	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
> 	* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
> 	(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
> 	aarch64_bf16_type_node.
> 	(aarch64_init_simd_builtin_types, aarch64_init_bf16_types): Likewise.
> 	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
> gcc/c-family/
> 	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
> 	predefine for C++ __BFLT16_*__ macros and for C++23 also
> 	__STDCPP_BFLOAT16_T__.
> 	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16 for C++.
> gcc/cp/
> 	* cp-tree.h (extended_float_type_p): Return true for
> 	bfloat16_type_node.
> 	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
> 	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
> libcpp/
> 	* include/cpplib.h (CPP_N_BFLOAT16): Define.
> 	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
> 	C++.
> libgcc/
> 	* config/arm/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
> 	(softfp_truncations): Add tfbf dfbf sfbf hfbf.
> 	* config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
> 	__extendbfsf2 and __trunc{s,d,t,h}fbf2.
> 	* config/aarch64/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* config/i386/t-softfp (softfp_extensions): Add bfsf.
> 	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
> 	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
> 	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
> 	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
> 	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
> 	* soft-fp/brain.h: New file.
> 	* soft-fp/truncsfbf2.c: New file.
> 	* soft-fp/truncdfbf2.c: New file.
> 	* soft-fp/truncxfbf2.c: New file.
> 	* soft-fp/trunctfbf2.c: New file.
> 	* soft-fp/trunchfbf2.c: New file.
> 	* soft-fp/truncbfhf2.c: New file.
> 	* soft-fp/extendbfsf2.c: New file.
> libiberty/
> 	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
> 	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
> 	entry.
> 	(cplus_demangle_type): Demangle DFb16_.
> 	* testsuite/demangle-expected (_Z3xxxDFb16_): New test.
> 
> --- gcc/tree-core.h.jj	2022-09-29 09:13:25.717718458 +0200
> +++ gcc/tree-core.h	2022-09-29 12:40:17.417778754 +0200
> @@ -665,6 +665,9 @@ enum tree_index {
>     TI_DOUBLE_TYPE,
>     TI_LONG_DOUBLE_TYPE,
>   
> +  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
> +  TI_BFLOAT16_TYPE,
> +
>     /* The _FloatN and _FloatNx types must be consecutive, and in the
>        same sequence as the corresponding complex types, which must also
>        be consecutive; _FloatN must come before _FloatNx; the order must
> --- gcc/tree.h.jj	2022-09-29 09:13:25.720718416 +0200
> +++ gcc/tree.h	2022-09-29 12:40:17.416778768 +0200
> @@ -4285,6 +4285,7 @@ tree_strip_any_location_wrapper (tree ex
>   #define float_type_node			global_trees[TI_FLOAT_TYPE]
>   #define double_type_node		global_trees[TI_DOUBLE_TYPE]
>   #define long_double_type_node		global_trees[TI_LONG_DOUBLE_TYPE]
> +#define bfloat16_type_node		global_trees[TI_BFLOAT16_TYPE]
>   
>   /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
>   #define FLOATN_TYPE_NODE(IDX)		global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
> --- gcc/tree.cc.jj	2022-09-29 09:13:31.328641080 +0200
> +++ gcc/tree.cc	2022-09-29 12:40:17.400778985 +0200
> @@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
>       = (flag_excess_precision == EXCESS_PRECISION_FAST
>          ? EXCESS_PRECISION_TYPE_FAST
>          : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
> -	  ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
> +	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
>   
>     enum flt_eval_method target_flt_eval_method
>       = targetm.c.excess_precision (requested_type);
> @@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
>     machine_mode float16_type_mode = (float16_type_node
>   				    ? TYPE_MODE (float16_type_node)
>   				    : VOIDmode);
> +  machine_mode bfloat16_type_mode = (bfloat16_type_node
> +				     ? TYPE_MODE (bfloat16_type_node)
> +				     : VOIDmode);
>     machine_mode float_type_mode = TYPE_MODE (float_type_node);
>     machine_mode double_type_mode = TYPE_MODE (double_type_node);
>   
> @@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
>   	switch (target_flt_eval_method)
>   	  {
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	    if (type_mode == float16_type_mode)
> +	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode)
>   	      return float_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode)
>   	      return double_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode
>   		|| type_mode == double_type_mode)
>   	      return long_double_type_node;
> @@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
>   	switch (target_flt_eval_method)
>   	  {
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
> -	    if (type_mode == float16_type_mode)
> +	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode)
>   	      return complex_float_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode)
>   	      return complex_double_type_node;
>   	    break;
>   	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
>   	    if (type_mode == float16_type_mode
> +		|| type_mode == bfloat16_type_mode
>   		|| type_mode == float_type_mode
>   		|| type_mode == double_type_mode)
>   	      return complex_long_double_type_node;
> --- gcc/expmed.h.jj	2022-07-26 10:32:23.681271790 +0200
> +++ gcc/expmed.h	2022-09-29 15:18:46.457023535 +0200
> @@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
>   				  rtx, tree, rtx, int);
>   extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
>   			 int);
> +extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
> +			       int);
>   #ifdef GCC_OPTABS_H
>   extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
>   			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
> --- gcc/expmed.cc.jj	2022-08-31 10:20:20.000000000 +0200
> +++ gcc/expmed.cc	2022-09-29 15:17:52.224769673 +0200
> @@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
>   
>   /* Likewise, but return 0 if that cannot be done.  */
>   
> -static rtx
> +rtx
>   maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>   		    int amount, rtx target, int unsignedp)
>   {
> --- gcc/expr.cc.jj	2022-09-09 09:50:35.228575531 +0200
> +++ gcc/expr.cc	2022-09-29 17:09:46.716352938 +0200
> @@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
>         gcc_assert ((GET_MODE_PRECISION (from_mode)
>   		   != GET_MODE_PRECISION (to_mode))
>   		  || (DECIMAL_FLOAT_MODE_P (from_mode)
> -		      != DECIMAL_FLOAT_MODE_P (to_mode)));
> +		      != DECIMAL_FLOAT_MODE_P (to_mode))
> +		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> +		      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> +		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +		      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
>   
>         if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
>   	/* Conversion between decimal float and binary float, same size.  */
> @@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
>   	  return;
>   	}
>   
> +#ifdef HAVE_SFmode
> +      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> +	  && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
> +	{
> +	  if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
> +	    {
> +	      /* To cut down on libgcc size, implement
> +		 BFmode -> {DF,XF,TF}mode conversions by
> +		 BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
> +	      rtx temp = gen_reg_rtx (SFmode);
> +	      convert_mode_scalar (temp, from, unsignedp);
> +	      convert_mode_scalar (to, temp, unsignedp);
> +	      return;
> +	    }
> +	  if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> +	    {
> +	      /* Similarly, implement BFmode -> HFmode as
> +		 BFmode -> SFmode -> HFmode conversion where SFmode
> +		 has superset of BFmode values.  We don't need
> +		 to handle sNaNs by raising exception and turning
> +		 into into qNaN though, as that can be done in the
> +		 SFmode -> HFmode conversion too.  */
> +	      rtx temp = gen_reg_rtx (SFmode);
> +	      int save_flag_finite_math_only = flag_finite_math_only;
> +	      flag_finite_math_only = true;
> +	      convert_mode_scalar (temp, from, unsignedp);
> +	      flag_finite_math_only = save_flag_finite_math_only;
> +	      convert_mode_scalar (to, temp, unsignedp);
> +	      return;
> +	    }
> +	  if (to_mode == SFmode
> +	      && !HONOR_NANS (from_mode)
> +	      && !HONOR_NANS (to_mode)
> +	      && optimize_insn_for_speed_p ())
> +	    {
> +	      /* If we don't expect sNaNs, for BFmode -> SFmode we can just
> +		 shift the bits up.  */
> +	      machine_mode fromi_mode, toi_mode;
> +	      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> +				     0).exists (&fromi_mode)
> +		  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> +					0).exists (&toi_mode))
> +		{
> +		  start_sequence ();
> +		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> +		  rtx tof = NULL_RTX;
> +		  if (fromi)
> +		    {
> +		      rtx toi = gen_reg_rtx (toi_mode);
> +		      convert_mode_scalar (toi, fromi, 1);
> +		      toi
> +			= maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
> +					      GET_MODE_PRECISION (to_mode)
> +					      - GET_MODE_PRECISION (from_mode),
> +					      NULL_RTX, 1);
> +		      if (toi)
> +			{
> +			  tof = lowpart_subreg (to_mode, toi, toi_mode);
> +			  if (tof)
> +			    emit_move_insn (to, tof);
> +			}
> +		    }
> +		  insns = get_insns ();
> +		  end_sequence ();
> +		  if (tof)
> +		    {
> +		      emit_insn (insns);
> +		      return;
> +		    }
> +		}
> +	    }
> +	}
> +      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
> +	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +	  && !HONOR_NANS (from_mode)
> +	  && !HONOR_NANS (to_mode)
> +	  && !flag_rounding_math
> +	  && optimize_insn_for_speed_p ())
> +	{
> +	  /* If we don't expect qNaNs nor sNaNs and can assume rounding
> +	     to nearest, we can expand the conversion inline as
> +	     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
> +	  machine_mode fromi_mode, toi_mode;
> +	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
> +				 0).exists (&fromi_mode)
> +	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
> +				    0).exists (&toi_mode))
> +	    {
> +	      start_sequence ();
> +	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
> +	      rtx tof = NULL_RTX;
> +	      do
> +		{
> +		  if (!fromi)
> +		    break;
> +		  int shift = (GET_MODE_PRECISION (from_mode)
> +			       - GET_MODE_PRECISION (to_mode));
> +		  rtx temp1
> +		    = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
> +					  shift, NULL_RTX, 1);
> +		  if (!temp1)
> +		    break;
> +		  rtx temp2
> +		    = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
> +				    NULL_RTX, 1, OPTAB_DIRECT);
> +		  if (!temp2)
> +		    break;
> +		  rtx temp3
> +		    = expand_binop (fromi_mode, add_optab, fromi,
> +				    gen_int_mode ((HOST_WIDE_INT_1U
> +						   << (shift - 1)) - 1,
> +						  fromi_mode), NULL_RTX,
> +				    1, OPTAB_DIRECT);
> +		  if (!temp3)
> +		    break;
> +		  rtx temp4
> +		    = expand_binop (fromi_mode, add_optab, temp3, temp2,
> +				    NULL_RTX, 1, OPTAB_DIRECT);
> +		  if (!temp4)
> +		    break;
> +		  rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
> +						  temp4, shift, NULL_RTX, 1);
> +		  if (!temp5)
> +		    break;
> +		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
> +		  if (!temp6)
> +		    break;
> +		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
> +					toi_mode);
> +		  if (tof)
> +		    emit_move_insn (to, tof);
> +		}
> +	      while (0);
> +	      insns = get_insns ();
> +	      end_sequence ();
> +	      if (tof)
> +		{
> +		  emit_insn (insns);
> +		  return;
> +		}
> +	    }
> +	}
> +#endif
> +
>         /* Otherwise use a libcall.  */
>         libcall = convert_optab_libfunc (tab, to_mode, from_mode);
>   
> --- gcc/config/arm/arm.h.jj	2022-09-29 09:13:25.709718568 +0200
> +++ gcc/config/arm/arm.h	2022-09-29 12:40:17.401778971 +0200
> @@ -78,9 +78,8 @@ extern void (*arm_lang_output_object_att
>      the backend.  Defined in arm-builtins.cc.  */
>   extern tree arm_fp16_type_node;
>   
> -/* This type is the user-visible __bf16.  We need it in a few places in
> -   the backend.  Defined in arm-builtins.cc.  */
> -extern tree arm_bf16_type_node;
> +/* The user-visible __bf16 uses bfloat16_type_node, but for pointer to that
> +   use backend specific tree.  Defined in arm-builtins.cc.  */
>   extern tree arm_bf16_ptr_type_node;
>   
>   
> --- gcc/config/arm/arm.cc.jj	2022-09-29 09:13:25.709718568 +0200
> +++ gcc/config/arm/arm.cc	2022-09-29 15:33:07.997170885 +0200
> @@ -688,12 +688,6 @@ static const struct attribute_spec arm_a
>   #undef TARGET_INVALID_CONVERSION
>   #define TARGET_INVALID_CONVERSION arm_invalid_conversion
>   
> -#undef TARGET_INVALID_UNARY_OP
> -#define TARGET_INVALID_UNARY_OP arm_invalid_unary_op
> -
> -#undef TARGET_INVALID_BINARY_OP
> -#define TARGET_INVALID_BINARY_OP arm_invalid_binary_op
> -
>   #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
>   #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV arm_atomic_assign_expand_fenv
>   
> @@ -30360,7 +30354,7 @@ arm_mangle_type (const_tree type)
>     if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
>       {
>         if (TYPE_MODE (type) == BFmode)
> -	return "u6__bf16";
> +	return "DFb16_";
>         else
>   	return "Dh";
>       }
> @@ -33996,47 +33990,22 @@ arm_invalid_conversion (const_tree fromt
>   {
>     if (element_mode (fromtype) != element_mode (totype))
>       {
> -      /* Do no allow conversions to/from BFmode scalar types.  */
> -      if (TYPE_MODE (fromtype) == BFmode)
> -	return N_("invalid conversion from type %<bfloat16_t%>");
> -      if (TYPE_MODE (totype) == BFmode)
> -	return N_("invalid conversion to type %<bfloat16_t%>");
> +      /* Do no allow conversions from BFmode to non-ieee HFmode
> +	 scalar types or vice versa.  */
> +      if (TYPE_MODE (fromtype) == BFmode
> +	  && TYPE_MODE (totype) == HFmode
> +	  && arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
> +	return N_("invalid conversion from type %<bfloat16_t%> to %<__fp16%>");
> +      if (TYPE_MODE (totype) == BFmode
> +	  && TYPE_MODE (fromtype) == HFmode
> +	  && arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
> +	return N_("invalid conversion to type %<bfloat16_t%> from %<__fp16%>");
>       }
>   
>     /* Conversion allowed.  */
>     return NULL;
>   }
>   
> -/* Return the diagnostic message string if the unary operation OP is
> -   not permitted on TYPE, NULL otherwise.  */
> -
> -static const char *
> -arm_invalid_unary_op (int op, const_tree type)
> -{
> -  /* Reject all single-operand operations on BFmode except for &.  */
> -  if (element_mode (type) == BFmode && op != ADDR_EXPR)
> -    return N_("operation not permitted on type %<bfloat16_t%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the binary operation OP is
> -   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
> -
> -static const char *
> -arm_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
> -			   const_tree type2)
> -{
> -  /* Reject all 2-operand operations on BFmode.  */
> -  if (element_mode (type1) == BFmode
> -      || element_mode (type2) == BFmode)
> -    return N_("operation not permitted on type %<bfloat16_t%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
>   /* Implement TARGET_CAN_CHANGE_MODE_CLASS.
>   
>      In VFPv1, VFP registers could only be accessed in the mode they were
> --- gcc/config/arm/arm-builtins.cc.jj	2022-09-29 09:13:25.681718954 +0200
> +++ gcc/config/arm/arm-builtins.cc	2022-09-29 12:40:17.405778917 +0200
> @@ -1370,7 +1370,6 @@ struct arm_simd_type_info arm_simd_types
>   tree arm_fp16_type_node = NULL_TREE;
>   
>   /* Back-end node type for brain float (bfloat) types.  */
> -tree arm_bf16_type_node = NULL_TREE;
>   tree arm_bf16_ptr_type_node = NULL_TREE;
>   
>   static tree arm_simd_intOI_type_node = NULL_TREE;
> @@ -1459,7 +1458,7 @@ arm_simd_builtin_std_type (machine_mode
>       case E_DFmode:
>         return double_type_node;
>       case E_BFmode:
> -      return arm_bf16_type_node;
> +      return bfloat16_type_node;
>       default:
>         gcc_unreachable ();
>       }
> @@ -1570,9 +1569,9 @@ arm_init_simd_builtin_types (void)
>     arm_simd_types[Float32x4_t].eltype = float_type_node;
>   
>     /* Init Bfloat vector types with underlying __bf16 scalar type.  */
> -  arm_simd_types[Bfloat16x2_t].eltype = arm_bf16_type_node;
> -  arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
> -  arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
> +  arm_simd_types[Bfloat16x2_t].eltype = bfloat16_type_node;
> +  arm_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
> +  arm_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
>   
>     for (i = 0; i < nelts; i++)
>       {
> @@ -1658,7 +1657,7 @@ arm_init_simd_builtin_scalar_types (void
>   					     "__builtin_neon_df");
>     (*lang_hooks.types.register_builtin_type) (intTI_type_node,
>   					     "__builtin_neon_ti");
> -  (*lang_hooks.types.register_builtin_type) (arm_bf16_type_node,
> +  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
>                                                "__builtin_neon_bf");
>     /* Unsigned integer types for various mode sizes.  */
>     (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
> @@ -1797,13 +1796,13 @@ arm_init_builtin (unsigned int fcode, ar
>   static void
>   arm_init_bf16_types (void)
>   {
> -  arm_bf16_type_node = make_node (REAL_TYPE);
> -  TYPE_PRECISION (arm_bf16_type_node) = 16;
> -  SET_TYPE_MODE (arm_bf16_type_node, BFmode);
> -  layout_type (arm_bf16_type_node);
> +  bfloat16_type_node = make_node (REAL_TYPE);
> +  TYPE_PRECISION (bfloat16_type_node) = 16;
> +  SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +  layout_type (bfloat16_type_node);
>   
> -  lang_hooks.types.register_builtin_type (arm_bf16_type_node, "__bf16");
> -  arm_bf16_ptr_type_node = build_pointer_type (arm_bf16_type_node);
> +  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
> +  arm_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
>   }
>   
>   /* Set up ACLE builtins, even builtins for instructions that are not
> --- gcc/config/i386/i386.cc.jj	2022-09-29 12:03:12.073350093 +0200
> +++ gcc/config/i386/i386.cc	2022-09-29 12:40:17.409778863 +0200
> @@ -22728,7 +22728,7 @@ ix86_mangle_type (const_tree type)
>     switch (TYPE_MODE (type))
>       {
>       case E_BFmode:
> -      return "u6__bf16";
> +      return "DFb16_";
>       case E_HFmode:
>         /* _Float16 is "DF16_".
>   	 Align with clang's decision in https://reviews.llvm.org/D33719. */
> @@ -22747,55 +22747,6 @@ ix86_mangle_type (const_tree type)
>       }
>   }
>   
> -/* Return the diagnostic message string if conversion from FROMTYPE to
> -   TOTYPE is not allowed, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_conversion (const_tree fromtype, const_tree totype)
> -{
> -  if (element_mode (fromtype) != element_mode (totype))
> -    {
> -      /* Do no allow conversions to/from BFmode scalar types.  */
> -      if (TYPE_MODE (fromtype) == BFmode)
> -	return N_("invalid conversion from type %<__bf16%>");
> -      if (TYPE_MODE (totype) == BFmode)
> -	return N_("invalid conversion to type %<__bf16%>");
> -    }
> -
> -  /* Conversion allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the unary operation OP is
> -   not permitted on TYPE, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_unary_op (int op, const_tree type)
> -{
> -  /* Reject all single-operand operations on BFmode except for &.  */
> -  if (element_mode (type) == BFmode && op != ADDR_EXPR)
> -    return N_("operation not permitted on type %<__bf16%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the binary operation OP is
> -   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
> -
> -static const char *
> -ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
> -			   const_tree type2)
> -{
> -  /* Reject all 2-operand operations on BFmode.  */
> -  if (element_mode (type1) == BFmode
> -      || element_mode (type2) == BFmode)
> -    return N_("operation not permitted on type %<__bf16%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
>   static GTY(()) tree ix86_tls_stack_chk_guard_decl;
>   
>   static tree
> @@ -24853,15 +24804,6 @@ ix86_libgcc_floating_mode_supported_p
>   #undef TARGET_MANGLE_TYPE
>   #define TARGET_MANGLE_TYPE ix86_mangle_type
>   
> -#undef TARGET_INVALID_CONVERSION
> -#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
> -
> -#undef TARGET_INVALID_UNARY_OP
> -#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
> -
> -#undef TARGET_INVALID_BINARY_OP
> -#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
> -
>   #undef TARGET_STACK_PROTECT_GUARD
>   #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
>   
> --- gcc/config/i386/i386-builtins.cc.jj	2022-09-29 09:13:25.710718554 +0200
> +++ gcc/config/i386/i386-builtins.cc	2022-09-29 12:40:17.406778903 +0200
> @@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
>   static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
>   
>   tree ix86_float16_type_node = NULL_TREE;
> -tree ix86_bf16_type_node = NULL_TREE;
>   tree ix86_bf16_ptr_type_node = NULL_TREE;
>   
>   /* Retrieve an element from the above table, building some of
> @@ -1372,16 +1371,15 @@ ix86_register_float16_builtin_type (void
>   static void
>   ix86_register_bf16_builtin_type (void)
>   {
> -  ix86_bf16_type_node = make_node (REAL_TYPE);
> -  TYPE_PRECISION (ix86_bf16_type_node) = 16;
> -  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
> -  layout_type (ix86_bf16_type_node);
> +  bfloat16_type_node = make_node (REAL_TYPE);
> +  TYPE_PRECISION (bfloat16_type_node) = 16;
> +  SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +  layout_type (bfloat16_type_node);
>   
>     if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
>       {
> -      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
> -					    "__bf16");
> -      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
> +      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
> +      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
>       }
>   }
>   
> --- gcc/config/i386/i386-builtin-types.def.jj	2022-09-29 09:13:25.709718568 +0200
> +++ gcc/config/i386/i386-builtin-types.def	2022-09-29 12:40:17.406778903 +0200
> @@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
>   DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
>   DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
> -DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
> +DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
>   DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
>   DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> --- gcc/config/aarch64/aarch64.h.jj	2022-09-29 09:13:25.680718968 +0200
> +++ gcc/config/aarch64/aarch64.h	2022-09-29 12:40:17.409778863 +0200
> @@ -1337,9 +1337,8 @@ extern const char *aarch64_rewrite_mcpu
>   extern GTY(()) tree aarch64_fp16_type_node;
>   extern GTY(()) tree aarch64_fp16_ptr_type_node;
>   
> -/* This type is the user-visible __bf16, and a pointer to that type.  Defined
> -   in aarch64-builtins.cc.  */
> -extern GTY(()) tree aarch64_bf16_type_node;
> +/* Pointer to the user-visible __bf16 type.  __bf16 itself is generic
> +   bfloat16_type_node.  Defined in aarch64-builtins.cc.  */
>   extern GTY(()) tree aarch64_bf16_ptr_type_node;
>   
>   /* The generic unwind code in libgcc does not initialize the frame pointer.
> --- gcc/config/aarch64/aarch64.cc.jj	2022-09-29 09:13:25.680718968 +0200
> +++ gcc/config/aarch64/aarch64.cc	2022-09-29 12:40:17.413778808 +0200
> @@ -19741,7 +19741,7 @@ aarch64_gimplify_va_arg_expr (tree valis
>   	  field_ptr_t = aarch64_fp16_ptr_type_node;
>   	  break;
>   	case E_BFmode:
> -	  field_t = aarch64_bf16_type_node;
> +	  field_t = bfloat16_type_node;
>   	  field_ptr_t = aarch64_bf16_ptr_type_node;
>   	  break;
>   	case E_V2SImode:
> @@ -20645,7 +20645,7 @@ aarch64_mangle_type (const_tree type)
>     if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
>       {
>         if (TYPE_MODE (type) == BFmode)
> -	return "u6__bf16";
> +	return "DFb16_";
>         else
>   	return "Dh";
>       }
> @@ -26820,39 +26820,6 @@ aarch64_stack_protect_guard (void)
>     return NULL_TREE;
>   }
>   
> -/* Return the diagnostic message string if conversion from FROMTYPE to
> -   TOTYPE is not allowed, NULL otherwise.  */
> -
> -static const char *
> -aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
> -{
> -  if (element_mode (fromtype) != element_mode (totype))
> -    {
> -      /* Do no allow conversions to/from BFmode scalar types.  */
> -      if (TYPE_MODE (fromtype) == BFmode)
> -	return N_("invalid conversion from type %<bfloat16_t%>");
> -      if (TYPE_MODE (totype) == BFmode)
> -	return N_("invalid conversion to type %<bfloat16_t%>");
> -    }
> -
> -  /* Conversion allowed.  */
> -  return NULL;
> -}
> -
> -/* Return the diagnostic message string if the unary operation OP is
> -   not permitted on TYPE, NULL otherwise.  */
> -
> -static const char *
> -aarch64_invalid_unary_op (int op, const_tree type)
> -{
> -  /* Reject all single-operand operations on BFmode except for &.  */
> -  if (element_mode (type) == BFmode && op != ADDR_EXPR)
> -    return N_("operation not permitted on type %<bfloat16_t%>");
> -
> -  /* Operation allowed.  */
> -  return NULL;
> -}
> -
>   /* Return the diagnostic message string if the binary operation OP is
>      not permitted on TYPE1 and TYPE2, NULL otherwise.  */
>   
> @@ -26860,11 +26827,6 @@ static const char *
>   aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
>   			   const_tree type2)
>   {
> -  /* Reject all 2-operand operations on BFmode.  */
> -  if (element_mode (type1) == BFmode
> -      || element_mode (type2) == BFmode)
> -    return N_("operation not permitted on type %<bfloat16_t%>");
> -
>     if (VECTOR_TYPE_P (type1)
>         && VECTOR_TYPE_P (type2)
>         && !TYPE_INDIVISIBLE_P (type1)
> @@ -27461,12 +27423,6 @@ aarch64_libgcc_floating_mode_supported_p
>   #undef TARGET_MANGLE_TYPE
>   #define TARGET_MANGLE_TYPE aarch64_mangle_type
>   
> -#undef TARGET_INVALID_CONVERSION
> -#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion
> -
> -#undef TARGET_INVALID_UNARY_OP
> -#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op
> -
>   #undef TARGET_INVALID_BINARY_OP
>   #define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op
>   
> --- gcc/config/aarch64/aarch64-builtins.cc.jj	2022-09-29 09:13:25.676719023 +0200
> +++ gcc/config/aarch64/aarch64-builtins.cc	2022-09-29 12:40:17.410778849 +0200
> @@ -918,7 +918,6 @@ tree aarch64_fp16_type_node = NULL_TREE;
>   tree aarch64_fp16_ptr_type_node = NULL_TREE;
>   
>   /* Back-end node type for brain float (bfloat) types.  */
> -tree aarch64_bf16_type_node = NULL_TREE;
>   tree aarch64_bf16_ptr_type_node = NULL_TREE;
>   
>   /* Wrapper around add_builtin_function.  NAME is the name of the built-in
> @@ -1010,7 +1009,7 @@ aarch64_int_or_fp_type (machine_mode mod
>       case E_DFmode:
>         return double_type_node;
>       case E_BFmode:
> -      return aarch64_bf16_type_node;
> +      return bfloat16_type_node;
>       default:
>         gcc_unreachable ();
>       }
> @@ -1124,8 +1123,8 @@ aarch64_init_simd_builtin_types (void)
>     aarch64_simd_types[Float64x2_t].eltype = double_type_node;
>   
>     /* Init Bfloat vector types with underlying __bf16 type.  */
> -  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
> -  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
> +  aarch64_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
> +  aarch64_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
>   
>     for (i = 0; i < nelts; i++)
>       {
> @@ -1197,7 +1196,7 @@ aarch64_init_simd_builtin_scalar_types (
>   					     "__builtin_aarch64_simd_poly128");
>     (*lang_hooks.types.register_builtin_type) (intTI_type_node,
>   					     "__builtin_aarch64_simd_ti");
> -  (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node,
> +  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
>   					     "__builtin_aarch64_simd_bf");
>     /* Unsigned integer types for various mode sizes.  */
>     (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
> @@ -1682,13 +1681,13 @@ aarch64_init_fp16_types (void)
>   static void
>   aarch64_init_bf16_types (void)
>   {
> -  aarch64_bf16_type_node = make_node (REAL_TYPE);
> -  TYPE_PRECISION (aarch64_bf16_type_node) = 16;
> -  SET_TYPE_MODE (aarch64_bf16_type_node, BFmode);
> -  layout_type (aarch64_bf16_type_node);
> +  bfloat16_type_node = make_node (REAL_TYPE);
> +  TYPE_PRECISION (bfloat16_type_node) = 16;
> +  SET_TYPE_MODE (bfloat16_type_node, BFmode);
> +  layout_type (bfloat16_type_node);
>   
> -  lang_hooks.types.register_builtin_type (aarch64_bf16_type_node, "__bf16");
> -  aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node);
> +  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
> +  aarch64_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
>   }
>   
>   /* Pointer authentication builtins that will become NOP on legacy platform.
> --- gcc/config/aarch64/aarch64-sve-builtins.def.jj	2022-09-29 09:13:25.676719023 +0200
> +++ gcc/config/aarch64/aarch64-sve-builtins.def	2022-09-29 12:40:17.413778808 +0200
> @@ -61,7 +61,7 @@ DEF_SVE_MODE (u64offset, none, svuint64_
>   DEF_SVE_MODE (vnum, none, none, vectors)
>   
>   DEF_SVE_TYPE (svbool_t, 10, __SVBool_t, boolean_type_node)
> -DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, aarch64_bf16_type_node)
> +DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, bfloat16_type_node)
>   DEF_SVE_TYPE (svfloat16_t, 13, __SVFloat16_t, aarch64_fp16_type_node)
>   DEF_SVE_TYPE (svfloat32_t, 13, __SVFloat32_t, float_type_node)
>   DEF_SVE_TYPE (svfloat64_t, 13, __SVFloat64_t, double_type_node)
> --- gcc/c-family/c-cppbuiltin.cc.jj	2022-09-29 09:13:25.675719037 +0200
> +++ gcc/c-family/c-cppbuiltin.cc	2022-09-29 12:40:17.416778768 +0200
> @@ -1264,6 +1264,13 @@ c_cpp_builtins (cpp_reader *pfile)
>         builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
>   				      csuffix, FLOATN_NX_TYPE_NODE (i));
>       }
> +  if (bfloat16_type_node && c_dialect_cxx ())
> +    {
> +      if (cxx_dialect > cxx20)
> +	cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
> +      builtin_define_float_constants ("BFLT16", "BF16", "%s",
> +				      "BF16", bfloat16_type_node);
> +    }
>   
>     /* For float.h.  */
>     if (targetm.decimal_float_supported_p ())
> --- gcc/c-family/c-lex.cc.jj	2022-09-29 09:13:25.675719037 +0200
> +++ gcc/c-family/c-lex.cc	2022-09-29 12:40:17.416778768 +0200
> @@ -995,6 +995,19 @@ interpret_float (const cpp_token *token,
>   	  pedwarn (input_location, OPT_Wpedantic,
>   		   "non-standard suffix on floating constant");
>         }
> +    else if ((flags & CPP_N_BFLOAT16) != 0 && c_dialect_cxx ())
> +      {
> +	type = bfloat16_type_node;
> +	if (type == NULL_TREE)
> +	  {
> +	    error ("unsupported non-standard suffix on floating constant");
> +	    return error_mark_node;
> +	  }
> +	if (cxx_dialect < cxx23)
> +	  pedwarn (input_location, OPT_Wpedantic,
> +		   "%<bf16%> or %<BF16%> suffix on floating constant only "
> +		   "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
> +      }
>       else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
>         type = long_double_type_node;
>       else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
> --- gcc/cp/cp-tree.h.jj	2022-09-29 09:13:31.164643341 +0200
> +++ gcc/cp/cp-tree.h	2022-09-29 12:40:17.414778795 +0200
> @@ -8714,6 +8714,8 @@ extended_float_type_p (tree type)
>     for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
>       if (type == FLOATN_TYPE_NODE (i))
>         return true;
> +  if (type == bfloat16_type_node)
> +    return true;
>     return false;
>   }
>   
> --- gcc/cp/typeck.cc.jj	2022-09-29 09:13:25.716718472 +0200
> +++ gcc/cp/typeck.cc	2022-09-29 12:40:17.415778781 +0200
> @@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
>         if (mv2 == FLOATN_NX_TYPE_NODE (i))
>   	extended2 = i + 1;
>       }
> +  if (mv1 == bfloat16_type_node)
> +    extended1 = true;
> +  if (mv2 == bfloat16_type_node)
> +    extended2 = true;
>     if (extended2 && !extended1)
>       {
>         int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
> @@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
>     if (cnt > 1 && mv2 == long_double_type_node)
>       return -2;
>     /* Otherwise, they have equal rank, but extended types
> -     (other than std::bfloat16_t) have higher subrank.  */
> +     (other than std::bfloat16_t) have higher subrank.
> +     std::bfloat16_t shouldn't have equal rank to any standard
> +     floating point type.  */
>     return 1;
>   }
>   
> --- libcpp/include/cpplib.h.jj	2022-09-08 13:01:19.853771383 +0200
> +++ libcpp/include/cpplib.h	2022-09-28 19:06:59.615380690 +0200
> @@ -1275,6 +1275,7 @@ struct cpp_num
>   #define CPP_N_USERDEF	0x1000000 /* C++11 user-defined literal.  */
>   
>   #define CPP_N_SIZE_T	0x2000000 /* C++23 size_t literal.  */
> +#define CPP_N_BFLOAT16	0x4000000 /* std::bfloat16_t type.  */
>   
>   #define CPP_N_WIDTH_FLOATN_NX	0xF0000000 /* _FloatN / _FloatNx value
>   					      of N, divided by 16.  */
> --- libcpp/expr.cc.jj	2022-09-27 08:03:27.119982735 +0200
> +++ libcpp/expr.cc	2022-09-28 17:55:36.667177540 +0200
> @@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
>     size_t orig_len = len;
>     const uchar *orig_s = s;
>     size_t flags;
> -  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
> +  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
>   
>     flags = 0;
> -  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
> +  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
>   
>     /* The following decimal float suffixes, from TR 24732:2009, TS
>        18661-2:2015 and C2X, are supported:
> @@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
>        w, W - machine-specific type such as __float80 (GNU extension).
>        q, Q - machine-specific type such as __float128 (GNU extension).
>        fN, FN - _FloatN (TS 18661-3:2015).
> -     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
> +     fNx, FNx - _FloatNx (TS 18661-3:2015).
> +     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
>   
>     /* Process decimal float suffixes, which are two letters starting
>        with d or D.  Order and case are significant.  */
> @@ -239,6 +240,20 @@ interpret_float_suffix (cpp_reader *pfil
>   		fn++;
>   	    }
>   	  break;
> +	case 'b': case 'B':
> +	  if (len > 2
> +	      /* Except for bf16 / BF16 where case is significant.  */
> +	      && s[1] == (s[0] == 'b' ? 'f' : 'F')
> +	      && s[2] == '1'
> +	      && s[3] == '6'
> +	      && CPP_OPTION (pfile, cplusplus))
> +	    {
> +	      bf16++;
> +	      len -= 3;
> +	      s += 3;
> +	      break;
> +	    }
> +	  return 0;
>   	case 'd': case 'D': d++; break;
>   	case 'l': case 'L': l++; break;
>   	case 'w': case 'W': w++; break;
> @@ -257,7 +272,7 @@ interpret_float_suffix (cpp_reader *pfil
>        of N larger than can be represented in the return value.  The
>        caller is responsible for rejecting _FloatN suffixes where
>        _FloatN is not supported on the chosen target.  */
> -  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
> +  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
>       return 0;
>     if (fn_bits > CPP_FLOATN_MAX)
>       return 0;
> @@ -295,6 +310,7 @@ interpret_float_suffix (cpp_reader *pfil
>   	     q ? CPP_N_MD_Q :
>   	     fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
>   	     fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
> +	     bf16 ? CPP_N_BFLOAT16 :
>   	     CPP_N_DEFAULT));
>   }
>   
> --- libgcc/config/arm/sfp-machine.h.jj	2020-01-12 11:54:38.615380187 +0100
> +++ libgcc/config/arm/sfp-machine.h	2022-09-28 19:02:51.922710542 +0200
> @@ -22,6 +22,7 @@ typedef int __gcc_CMPtype __attribute__
>   /* According to RTABI, QNAN is only with the most significant bit of the
>      significand set, and all other significand bits zero.  */
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
>   #define _FP_NANFRAC_Q		_FP_QNANBIT_Q, 0, 0, 0
> --- libgcc/config/aarch64/t-softfp.jj	2020-09-29 11:32:02.988602194 +0200
> +++ libgcc/config/aarch64/t-softfp	2022-09-28 18:59:43.381246466 +0200
> @@ -1,7 +1,7 @@
>   softfp_float_modes := tf
>   softfp_int_modes := si di ti
> -softfp_extensions := sftf dftf hftf
> -softfp_truncations := tfsf tfdf tfhf
> +softfp_extensions := sftf dftf hftf bfsf
> +softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
>   softfp_exclude_libgcc2 := n
>   softfp_extras := fixhfti fixunshfti floattihf floatuntihf
>   
> --- libgcc/config/aarch64/libgcc-softfp.ver.jj	2022-01-11 23:11:23.691271871 +0100
> +++ libgcc/config/aarch64/libgcc-softfp.ver	2022-09-28 19:00:36.050537146 +0200
> @@ -26,3 +26,12 @@ GCC_11.0 {
>     __mulhc3
>     __trunctfhf2
>   }
> +
> +%inherit GCC_13.0.0 GCC_11.0.0
> +GCC_13.0.0 {
> +  __extendbfsf2
> +  __truncdfbf2
> +  __truncsfbf2
> +  __trunctfbf2
> +  __trunchfbf2
> +}
> --- libgcc/config/aarch64/sfp-machine.h.jj	2022-01-11 23:11:23.691271871 +0100
> +++ libgcc/config/aarch64/sfp-machine.h	2022-09-28 19:02:10.303270053 +0200
> @@ -43,6 +43,7 @@ typedef int __gcc_CMPtype __attribute__
>   #define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		((_FP_QNANBIT_H << 1) - 1)
> +#define _FP_NANFRAC_B		((_FP_QNANBIT_B << 1) - 1)
>   #define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
>   #define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1)
>   #define _FP_NANFRAC_Q		((_FP_QNANBIT_Q << 1) - 1), -1
> --- libgcc/config/i386/t-softfp.jj	2022-09-23 09:02:31.759659479 +0200
> +++ libgcc/config/i386/t-softfp	2022-09-28 18:58:09.114520943 +0200
> @@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
>   libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
>   LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
>   
> -softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> -softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
> +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
> +		      tfbf xfbf dfbf sfbf hfbf
>   
>   softfp_extras += eqhf2
>   
> @@ -20,6 +21,7 @@ CFLAGS-truncsfhf2.c += -msse2
>   CFLAGS-truncdfhf2.c += -msse2
>   CFLAGS-truncxfhf2.c += -msse2
>   CFLAGS-trunctfhf2.c += -msse2
> +CFLAGS-trunchfbf2.c += -msse2
>   
>   CFLAGS-eqhf2.c += -msse2
>   CFLAGS-_divhc3.c += -msse2
> --- libgcc/config/i386/libgcc-glibc.ver.jj	2022-09-23 09:02:31.746659658 +0200
> +++ libgcc/config/i386/libgcc-glibc.ver	2022-09-28 18:58:09.114520943 +0200
> @@ -214,3 +214,13 @@ GCC_12.0.0 {
>     __trunctfhf2
>     __truncxfhf2
>   }
> +
> +%inherit GCC_13.0.0 GCC_12.0.0
> +GCC_13.0.0 {
> +  __extendbfsf2
> +  __truncdfbf2
> +  __truncsfbf2
> +  __trunctfbf2
> +  __truncxfbf2
> +  __trunchfbf2
> +}
> --- libgcc/config/i386/sfp-machine.h.jj	2022-09-23 09:02:31.747659644 +0200
> +++ libgcc/config/i386/sfp-machine.h	2022-09-28 18:58:09.114520943 +0200
> @@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
>   #define _FP_QNANNEGATEDP 0
>   
>   #define _FP_NANSIGN_H		1
> +#define _FP_NANSIGN_B		1
>   #define _FP_NANSIGN_S		1
>   #define _FP_NANSIGN_D		1
>   #define _FP_NANSIGN_E		1
> --- libgcc/config/i386/64/sfp-machine.h.jj	2022-09-23 09:02:31.700660291 +0200
> +++ libgcc/config/i386/64/sfp-machine.h	2022-09-28 18:58:09.114520943 +0200
> @@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
>   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D
>   #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
> --- libgcc/config/i386/32/sfp-machine.h.jj	2022-09-23 09:02:31.683660526 +0200
> +++ libgcc/config/i386/32/sfp-machine.h	2022-09-28 18:58:09.115520929 +0200
> @@ -87,6 +87,7 @@
>   #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>   
>   #define _FP_NANFRAC_H		_FP_QNANBIT_H
> +#define _FP_NANFRAC_B		_FP_QNANBIT_B
>   #define _FP_NANFRAC_S		_FP_QNANBIT_S
>   #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
>   /* Even if XFmode is 12byte,  we have to pad it to
> --- libgcc/soft-fp/brain.h.jj	2022-09-28 18:58:09.113520956 +0200
> +++ libgcc/soft-fp/brain.h	2022-09-28 18:58:09.113520956 +0200
> @@ -0,0 +1,172 @@
> +/* Software floating-point emulation.
> +   Definitions for Brain Floating Point format (bfloat16).
> +   Copyright (C) 1997-2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef SOFT_FP_BRAIN_H
> +#define SOFT_FP_BRAIN_H	1
> +
> +#if _FP_W_TYPE_SIZE < 32
> +# error "Here's a nickel kid.  Go buy yourself a real computer."
> +#endif
> +
> +#define _FP_FRACTBITS_B		(_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACTBITS_DW_B	(_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACBITS_B		8
> +#define _FP_FRACXBITS_B		(_FP_FRACTBITS_B - _FP_FRACBITS_B)
> +#define _FP_WFRACBITS_B		(_FP_WORKBITS + _FP_FRACBITS_B)
> +#define _FP_WFRACXBITS_B	(_FP_FRACTBITS_B - _FP_WFRACBITS_B)
> +#define _FP_EXPBITS_B		8
> +#define _FP_EXPBIAS_B		127
> +#define _FP_EXPMAX_B		255
> +
> +#define _FP_QNANBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
> +#define _FP_QNANBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
> +#define _FP_IMPLBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
> +#define _FP_IMPLBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
> +#define _FP_OVERFLOW_B		((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
> +
> +#define _FP_WFRACBITS_DW_B	(2 * _FP_WFRACBITS_B)
> +#define _FP_WFRACXBITS_DW_B	(_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
> +#define _FP_HIGHBIT_DW_B	\
> +  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
> +
> +/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
> +   chosen by the target machine.  */
> +
> +typedef float BFtype __attribute__ ((mode (BF)));
> +
> +union _FP_UNION_B
> +{
> +  BFtype flt;
> +  struct _FP_STRUCT_LAYOUT
> +  {
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +    unsigned sign : 1;
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +#else
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned sign : 1;
> +#endif
> +  } bits;
> +};
> +
> +#define FP_DECL_B(X)		_FP_DECL (1, X)
> +#define FP_UNPACK_RAW_B(X, val)	_FP_UNPACK_RAW_1 (B, X, (val))
> +#define FP_UNPACK_RAW_BP(X, val)	_FP_UNPACK_RAW_1_P (B, X, (val))
> +#define FP_PACK_RAW_B(val, X)	_FP_PACK_RAW_1 (B, (val), X)
> +#define FP_PACK_RAW_BP(val, X)			\
> +  do						\
> +    {						\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_B(X, val)			\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1 (B, X, (val));		\
> +      _FP_UNPACK_CANONICAL (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_BP(X, val)			\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1_P (B, X, (val));		\
> +      _FP_UNPACK_CANONICAL (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_B(X, val)		\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1 (B, X, (val));		\
> +      _FP_UNPACK_SEMIRAW (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_BP(X, val)		\
> +  do						\
> +    {						\
> +      _FP_UNPACK_RAW_1_P (B, X, (val));		\
> +      _FP_UNPACK_SEMIRAW (B, 1, X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_B(val, X)			\
> +  do						\
> +    {						\
> +      _FP_PACK_CANONICAL (B, 1, X);		\
> +      _FP_PACK_RAW_1 (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_BP(val, X)			\
> +  do						\
> +    {						\
> +      _FP_PACK_CANONICAL (B, 1, X);		\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_B(val, X)		\
> +  do						\
> +    {						\
> +      _FP_PACK_SEMIRAW (B, 1, X);		\
> +      _FP_PACK_RAW_1 (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_BP(val, X)		\
> +  do						\
> +    {						\
> +      _FP_PACK_SEMIRAW (B, 1, X);		\
> +      if (!FP_INHIBIT_RESULTS)			\
> +	_FP_PACK_RAW_1_P (B, (val), X);		\
> +    }						\
> +  while (0)
> +
> +#define FP_TO_INT_B(r, X, rsz, rsg)	_FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
> +#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)	\
> +  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
> +#define FP_FROM_INT_B(X, r, rs, rt)	_FP_FROM_INT (B, 1, X, (r), (rs), rt)
> +
> +/* BFmode arithmetic is not implemented.  */
> +
> +#define _FP_FRAC_HIGH_B(X)	_FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_RAW_B(X)	_FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_DW_B(X)	_FP_FRAC_HIGH_1 (X)
> +
> +#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
> +
> +#endif /* !SOFT_FP_BRAIN_H */
> --- libgcc/soft-fp/truncsfbf2.c.jj	2022-09-28 18:58:09.113520956 +0200
> +++ libgcc/soft-fp/truncsfbf2.c	2022-09-28 18:58:09.113520956 +0200
> @@ -0,0 +1,48 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE single into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +BFtype
> +__truncsfbf2 (SFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_S (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_S (A, a);
> +  FP_TRUNC (B, S, 1, 1, R, A);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncdfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
> +++ libgcc/soft-fp/truncdfbf2.c	2022-09-28 18:58:09.114520943 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE double into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "double.h"
> +
> +BFtype
> +__truncdfbf2 (DFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_D (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_D (A, a);
> +#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
> +  FP_TRUNC (B, D, 1, 2, R, A);
> +#else
> +  FP_TRUNC (B, D, 1, 1, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncxfbf2.c.jj	2022-09-28 18:58:09.113520956 +0200
> +++ libgcc/soft-fp/truncxfbf2.c	2022-09-28 18:58:09.113520956 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE extended into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "extended.h"
> +
> +BFtype
> +__truncxfbf2 (XFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_E (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_E (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, E, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, E, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunctfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
> +++ libgcc/soft-fp/trunctfbf2.c	2022-09-28 18:58:09.114520943 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE quad into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "quad.h"
> +
> +BFtype
> +__trunctfbf2 (TFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_Q (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_Q (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, Q, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, Q, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunchfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
> +++ libgcc/soft-fp/trunchfbf2.c	2022-09-28 18:58:09.114520943 +0200
> @@ -0,0 +1,58 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE half into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "half.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert HFtype to SFtype (lossless) and then
> +   truncate to BFtype.  */
> +
> +BFtype
> +__trunchfbf2 (HFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_RAW_H (A, a);
> +  FP_EXTEND (S, H, 1, 1, B, A);
> +  FP_PACK_RAW_S (b, B);
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (B, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncbfhf2.c.jj	2022-09-28 18:58:09.113520956 +0200
> +++ libgcc/soft-fp/truncbfhf2.c	2022-09-28 18:58:09.113520956 +0200
> @@ -0,0 +1,75 @@
> +/* Software floating-point emulation.
> +   Truncate bfloat16 into IEEE half.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "half.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert BFtype to SFtype (lossless) and then
> +   truncate to HFtype.  */
> +
> +HFtype
> +__truncbfhf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  HFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  /* Optimize BFtype to SFtype conversion to simple left shift
> +     by 16 if possible, we don't need to raise exceptions on sNaN
> +     here as the SFtype to HFtype truncation should do that too.  */
> +  if (sizeof (BFtype) == 2
> +      && sizeof (unsigned short) == 2
> +      && sizeof (SFtype) == 4
> +      && sizeof (unsigned int) == 4)
> +    {
> +      union { BFtype a; unsigned short b; } u1;
> +      union { SFtype a; unsigned int b; } u2;
> +      u1.a = a;
> +      u2.b = (u1.b << 8) << 8;
> +      b = u2.a;
> +    }
> +  else
> +    {
> +      FP_UNPACK_RAW_B (A, a);
> +      FP_EXTEND (S, B, 1, 1, B, A);
> +      FP_PACK_RAW_S (b, B);
> +    }
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (H, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_H (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/extendbfsf2.c.jj	2022-09-28 18:58:09.114520943 +0200
> +++ libgcc/soft-fp/extendbfsf2.c	2022-09-28 18:58:09.114520943 +0200
> @@ -0,0 +1,49 @@
> +/* Software floating-point emulation.
> +   Return an bfloat16 converted to IEEE single
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define FP_NO_EXACT_UNDERFLOW
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +SFtype
> +__extendbfsf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_B (A);
> +  FP_DECL_S (R);
> +  SFtype r;
> +
> +  FP_INIT_EXCEPTIONS;
> +  FP_UNPACK_RAW_B (A, a);
> +  FP_EXTEND (S, B, 1, 1, R, A);
> +  FP_PACK_RAW_S (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libiberty/cp-demangle.h.jj	2022-09-27 08:03:27.142982423 +0200
> +++ libiberty/cp-demangle.h	2022-09-29 12:42:47.291727886 +0200
> @@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
>   extern const struct demangle_operator_info cplus_demangle_operators[];
>   #endif
>   
> -#define D_BUILTIN_TYPE_COUNT (35)
> +#define D_BUILTIN_TYPE_COUNT (36)
>   
>   CP_STATIC_IF_GLIBCPP_V3
>   const struct demangle_builtin_type_info
> --- libiberty/cp-demangle.c.jj	2022-09-27 08:03:27.141982437 +0200
> +++ libiberty/cp-demangle.c	2022-09-29 13:04:57.083526204 +0200
> @@ -2489,6 +2489,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
>     /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
>   	     D_PRINT_DEFAULT },
>     /* 34 */ { NL ("_Float"),	NL ("_Float"),		D_PRINT_FLOAT },
> +  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
>   };
>   
>   CP_STATIC_IF_GLIBCPP_V3
> @@ -2753,8 +2754,20 @@ cplus_demangle_type (struct d_info *di)
>   
>   	case 'F':
>   	  /* DF<number>_ - _Float<number>.
> -	     DF<number>x - _Float<number>x.  */
> +	     DF<number>x - _Float<number>x
> +	     DFb16_ - std::bfloat16_t.  */
>   	  {
> +	    if (d_peek_char (di) == 'b')
> +	      {
> +		d_advance (di, 1);
> +		if (d_number (di) != 16 || d_peek_char (di) != '_')
> +		  return NULL;
> +		d_advance (di, 1);
> +		ret = d_make_builtin_type (di,
> +					   &cplus_demangle_builtin_types[35]);
> +		di->expansion += ret->u.s_builtin.type->len;
> +		break;
> +	      }
>   	    int arg = d_number (di);
>   	    char buf[12];
>   	    char suffix = 0;
> --- libiberty/testsuite/demangle-expected.jj	2022-09-27 08:03:27.168982071 +0200
> +++ libiberty/testsuite/demangle-expected	2022-09-29 12:49:02.181597532 +0200
> @@ -1249,6 +1249,10 @@ xxx
>   _Z3xxxDF32xDF64xDF128xCDF32xVb
>   xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
>   xxx
> +--format=auto --no-params
> +_Z3xxxDFb16_
> +xxx(std::bfloat16_t)
> +xxx
>   # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
>   --format=auto --no-params
>   _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z
> 
> 	Jakub
>