From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <crazylht@gmail.com>
Received: from mail-yb1-xb30.google.com (mail-yb1-xb30.google.com [IPv6:2607:f8b0:4864:20::b30])
	by sourceware.org (Postfix) with ESMTPS id AFE8F3858D38
	for <gcc-patches@gcc.gnu.org>; Fri, 23 Sep 2022 00:42:14 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AFE8F3858D38
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-yb1-xb30.google.com with SMTP id e81so15066195ybb.13
        for <gcc-patches@gcc.gnu.org>; Thu, 22 Sep 2022 17:42:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:from:to:cc:subject:date;
        bh=Prxn5ni/4VEzGiHI4QPO7N0Z16HNxyoHIpotmROajHw=;
        b=AovsI+Uk47kHi258BqH4tPgJkvT0gyPDbZhF36FtG94GfGUnlmphPV3WCqRGGmXld5
         vK7kCz+YMmzVXi4mOpEtplfsJ8/k9iHIopT8FS+RPRDAeb91v8N8gcPWScBaAib/IqwV
         IECevLixCNbLym86BaGip8Avv0Rh63/XVdnDa1qpvA6FbZwjCe/YYKmAddeP0xU7/Eaq
         htjDb8f7wo020o5n1WcyfV2Vwzr1ihxh/w+2W2t96QxTKEWcibe79nogD1ZKySZErNKI
         ZhQpaejNoIzRYtUgak8cBaAxj7A9FtOeZEtfKk1S2+aLAez9zpHzAUN8h5imPfir2IJa
         Oc0w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:x-gm-message-state:from:to:cc:subject:date;
        bh=Prxn5ni/4VEzGiHI4QPO7N0Z16HNxyoHIpotmROajHw=;
        b=S7AQRjBfG3tdEunROICeRTJFqVVZKcP2dEzx7C+oORf1q/etZZ4M0By6tn5cczDxd2
         KGs71kBRXOOlkO0SR6eun714Szw+j8t14MivjM5dHtETK46099xSHq2qXmvMeZ6Q/qUF
         vQ06GELcY6L57KrDGr2BZVQ+OiFytUM/qwWhCDBgQZ2sm3bE3xdfJbyWhNfjAQw6ebHN
         rSQloXZsPZSshyYeaGjuFXeBxxmmLdn6QCD/7R75mc55tpCJuJxowEMINjebISbO3I61
         XpDKUl3MJawPcF1/WL2UCKq78qU6RhpDgbJ044Sg/dVYM1jxvegivR65j0aN/4OXskIx
         ijPQ==
X-Gm-Message-State: ACrzQf0C9Hga7rNgHAfoLsZFn8jqGotoW+lF9mmEoDQp/CxwbiuL1oya
	YRgVP9vq43ziQslebpAgH1K4slsci/TZGyFc8Go9MrecVRXYaA==
X-Google-Smtp-Source: AMsMyM7XlQF3oweSSrfGS4LF5ubJ3dYB0LQ2PUJ9qHwZR8WPUEiOzTjoHGa2ffVdxgJ5ng8w7fmbbe0PkDzzSUmU4+Q=
X-Received: by 2002:a05:6902:705:b0:6ae:9d50:7181 with SMTP id
 k5-20020a056902070500b006ae9d507181mr7550231ybt.296.1663893733377; Thu, 22
 Sep 2022 17:42:13 -0700 (PDT)
MIME-Version: 1.0
References: <Yx7oXsjWwmgsaj/A@tucnak> <CAMZc-bxB2fVAxVbBmqOj-ixTJybuX-V9dT2FvjrQR-wWRR7HkQ@mail.gmail.com>
 <Yyl/BkcvjpWW3Jyz@tucnak> <YyyFs7w3npTxkci7@tucnak>
In-Reply-To: <YyyFs7w3npTxkci7@tucnak>
From: Hongtao Liu <crazylht@gmail.com>
Date: Fri, 23 Sep 2022 08:44:51 +0800
Message-ID: <CAMZc-bwcVERQC4ggav_g=y4wdz7c7vZAN3pGY3YC==+MOQLdZg@mail.gmail.com>
Subject: Re: [RFC PATCH] __trunc{tf,xf,df,sf,hf}bf2, __truncbfhf2 and __extendbfsf2
To: Jakub Jelinek <jakub@redhat.com>
Cc: Jonathan Wakely <jwakely@redhat.com>, "Joseph S. Myers" <joseph@codesourcery.com>, 
	Richard Earnshaw <richard.earnshaw@arm.com>, Kyrylo Tkachov <kyrylo.tkachov@arm.com>, 
	richard.sandiford@arm.com, gcc-patches@gcc.gnu.org
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,KAM_NUMSUBJECT,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Thu, Sep 22, 2022 at 11:56 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Sep 20, 2022 at 10:51:18AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Sep 20, 2022 at 11:35:07AM +0800, Hongtao Liu wrote:
> > > > The question is (mainly for aarch64, arm and x86 backend maintainers) if we
> > > > shouldn't support it, in the PR there is a partial patch to do so, but
> > > > the big question is if it should be supported as the __bf16 type those
> > > > 3 targets use with u6__bf16 mangling and remove those *_invalid_* cases
> > > > and add conversions to/from at least SFmode but probably also DFmode, TFmode
> > > > and XFmode on x86 and implement arithmetics on those through conversion to
> > > > SFmode, performing arithmetics there and conversion back.
> > > > Conversion from BFmode to SFmode is easy, left shift by 16 and ought to be
> > > > implemented inline, SFmode -> BFmode conversion is harder,
> > > > I think it is roughly:
> > > I'm not sure if there should be any floating point exceptions for
> > > BFmode operation.
> > > For x86, there's no floating point exceptions for AVX512_BF16 related
> > > instructions
> >
> > As long as __bf16 is just an extension, supporting or not supporting
> > exceptions on sNaNs is just fine I think, but I'm afraid it is different
> > for std::bfloat16_t.  If we claim we support it (define that type
> > in <stdfloat>, predefine __STD_BFLOAT16_TYPE__), then it needs to follow
> > ISO/IEC/IEEE 60559, and I'm afraid that means also exceptions and the like.
> > While the IEEE spec doesn't cover the exact bfloat16 format, C++ talks about
> > a format with these and these number of bits here and there that behaves
> > like in IEEE otherwise.
> > Whether we support std::bfloat16_t at all is our choice, if we do support
> > it, whether we support it with __bf16 underlying type or come up with
> > something different, it is up to us, and with -ffast-math/-Ofast etc.
> > we can certainly use hw instructions for it which don't raise exceptions.
> >
> > At least that is my limited understanding of it...
>
> I've been playing with this a little bit and here is a soft-fp version of
> IMHO everything we need for proper bfloat16 support.
> In particular, I think we need all the truncating conversions from other
> floating formats that a target with BFmode floating point support (currently
> arm, aarch64 and x86) has, truncating conversion from BFmode to HFmode
> (seems GCC when precision is the same considers conversions truncating)
> and an extension from BFmode to SFmode.  Extensions from BFmode to
> SF/DF/XF/TFmode are IMHO best implemented inside of GCC by performing
> BFmode to SFmode conversion first and then converting SFmode to those
> other formats, other arithmetics on BFmode should be implemented simply
> by widening to SFmode, doing arithmetics there and then converting back.
> The BF to SFmode extension can be also implemented simply by shifting
> the VCEd value up by 16 bits and VCEing the result if flags say
> sNaNs don't need to be handled, or IMHO if we use the extended result
> in some arithmetic operation that will handle the sNaN signaling +
> conversion into qNaN, similarly for SFmode to BFmode conversions
> we can use hw instructions if available and we don't care about sNaNs.
>
> The C FE has the advantage that it has excess precision support, there
> we should arrange for BFmode to be always promoted to SFmode excess
> precision, but C++ FE doesn't.
>
> Also, question to ARM/AArch64/x86 maintainers is if it is ok to
> add conversion and arithmetic support to the __bf16 type, or if
> that type should keep to be useless and there should be another
> type (some keyword or just float __attribute__((__mode__ (__BF__))))
> that we'd have that support for.  Whatever type we'd use as
> std::bfloat16_t should mangle as DFb16_ rather than u6__bf16 that
> __bf16 currently mangles to though.
>
> Thoughts on this?
x86 is ok to add conversion and arithmetic support, also for mange as DFb16_.
>
> And for Joseph, sure, the libgcc/soft-fp/ part should probably go
> into glibc first and be copied from there afterwards.
>
> Perhaps the __truncbfhf2 could be dropped and we could just on
> the compiler side emit shift left by 16 before calling __truncsfhf2.
>
> --- libgcc/soft-fp/brain.h.jj   2022-09-22 15:28:04.865171729 +0200
> +++ libgcc/soft-fp/brain.h      2022-09-22 15:35:11.970374554 +0200
> @@ -0,0 +1,172 @@
> +/* Software floating-point emulation.
> +   Definitions for Brain Floating Point format (bfloat16).
> +   Copyright (C) 1997-2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef SOFT_FP_BRAIN_H
> +#define SOFT_FP_BRAIN_H        1
> +
> +#if _FP_W_TYPE_SIZE < 32
> +# error "Here's a nickel kid.  Go buy yourself a real computer."
> +#endif
> +
> +#define _FP_FRACTBITS_B                (_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACTBITS_DW_B     (_FP_W_TYPE_SIZE)
> +
> +#define _FP_FRACBITS_B         8
> +#define _FP_FRACXBITS_B                (_FP_FRACTBITS_B - _FP_FRACBITS_B)
> +#define _FP_WFRACBITS_B                (_FP_WORKBITS + _FP_FRACBITS_B)
> +#define _FP_WFRACXBITS_B       (_FP_FRACTBITS_B - _FP_WFRACBITS_B)
> +#define _FP_EXPBITS_B          8
> +#define _FP_EXPBIAS_B          127
> +#define _FP_EXPMAX_B           255
> +
> +#define _FP_QNANBIT_B          ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
> +#define _FP_QNANBIT_SH_B       ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
> +#define _FP_IMPLBIT_B          ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
> +#define _FP_IMPLBIT_SH_B       ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
> +#define _FP_OVERFLOW_B         ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
> +
> +#define _FP_WFRACBITS_DW_B     (2 * _FP_WFRACBITS_B)
> +#define _FP_WFRACXBITS_DW_B    (_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
> +#define _FP_HIGHBIT_DW_B       \
> +  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
> +
> +/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
> +   chosen by the target machine.  */
> +
> +typedef float BFtype __attribute__ ((mode (BF)));
> +
> +union _FP_UNION_B
> +{
> +  BFtype flt;
> +  struct _FP_STRUCT_LAYOUT
> +  {
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +    unsigned sign : 1;
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +#else
> +    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
> +    unsigned exp  : _FP_EXPBITS_B;
> +    unsigned sign : 1;
> +#endif
> +  } bits;
> +};
> +
> +#define FP_DECL_B(X)           _FP_DECL (1, X)
> +#define FP_UNPACK_RAW_B(X, val)        _FP_UNPACK_RAW_1 (B, X, (val))
> +#define FP_UNPACK_RAW_BP(X, val)       _FP_UNPACK_RAW_1_P (B, X, (val))
> +#define FP_PACK_RAW_B(val, X)  _FP_PACK_RAW_1 (B, (val), X)
> +#define FP_PACK_RAW_BP(val, X)                 \
> +  do                                           \
> +    {                                          \
> +      if (!FP_INHIBIT_RESULTS)                 \
> +       _FP_PACK_RAW_1_P (B, (val), X);         \
> +    }                                          \
> +  while (0)
> +
> +#define FP_UNPACK_B(X, val)                    \
> +  do                                           \
> +    {                                          \
> +      _FP_UNPACK_RAW_1 (B, X, (val));          \
> +      _FP_UNPACK_CANONICAL (B, 1, X);          \
> +    }                                          \
> +  while (0)
> +
> +#define FP_UNPACK_BP(X, val)                   \
> +  do                                           \
> +    {                                          \
> +      _FP_UNPACK_RAW_1_P (B, X, (val));                \
> +      _FP_UNPACK_CANONICAL (B, 1, X);          \
> +    }                                          \
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_B(X, val)            \
> +  do                                           \
> +    {                                          \
> +      _FP_UNPACK_RAW_1 (B, X, (val));          \
> +      _FP_UNPACK_SEMIRAW (B, 1, X);            \
> +    }                                          \
> +  while (0)
> +
> +#define FP_UNPACK_SEMIRAW_BP(X, val)           \
> +  do                                           \
> +    {                                          \
> +      _FP_UNPACK_RAW_1_P (B, X, (val));                \
> +      _FP_UNPACK_SEMIRAW (B, 1, X);            \
> +    }                                          \
> +  while (0)
> +
> +#define FP_PACK_B(val, X)                      \
> +  do                                           \
> +    {                                          \
> +      _FP_PACK_CANONICAL (B, 1, X);            \
> +      _FP_PACK_RAW_1 (B, (val), X);            \
> +    }                                          \
> +  while (0)
> +
> +#define FP_PACK_BP(val, X)                     \
> +  do                                           \
> +    {                                          \
> +      _FP_PACK_CANONICAL (B, 1, X);            \
> +      if (!FP_INHIBIT_RESULTS)                 \
> +       _FP_PACK_RAW_1_P (B, (val), X);         \
> +    }                                          \
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_B(val, X)              \
> +  do                                           \
> +    {                                          \
> +      _FP_PACK_SEMIRAW (B, 1, X);              \
> +      _FP_PACK_RAW_1 (B, (val), X);            \
> +    }                                          \
> +  while (0)
> +
> +#define FP_PACK_SEMIRAW_BP(val, X)             \
> +  do                                           \
> +    {                                          \
> +      _FP_PACK_SEMIRAW (B, 1, X);              \
> +      if (!FP_INHIBIT_RESULTS)                 \
> +       _FP_PACK_RAW_1_P (B, (val), X);         \
> +    }                                          \
> +  while (0)
> +
> +#define FP_TO_INT_B(r, X, rsz, rsg)    _FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
> +#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)      \
> +  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
> +#define FP_FROM_INT_B(X, r, rs, rt)    _FP_FROM_INT (B, 1, X, (r), (rs), rt)
> +
> +/* BFmode arithmetic is not implemented.  */
> +
> +#define _FP_FRAC_HIGH_B(X)     _FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_RAW_B(X) _FP_FRAC_HIGH_1 (X)
> +#define _FP_FRAC_HIGH_DW_B(X)  _FP_FRAC_HIGH_1 (X)
> +
> +#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
> +
> +#endif /* !SOFT_FP_BRAIN_H */
> --- libgcc/soft-fp/truncsfbf2.c.jj      2022-09-22 15:43:46.345386049 +0200
> +++ libgcc/soft-fp/truncsfbf2.c 2022-09-22 16:02:19.940226518 +0200
> @@ -0,0 +1,48 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE single into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +BFtype
> +__truncsfbf2 (SFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_S (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_S (A, a);
> +  FP_TRUNC (B, S, 1, 1, R, A);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncbfhf2.c.jj      2022-09-22 16:13:28.894300765 +0200
> +++ libgcc/soft-fp/truncbfhf2.c 2022-09-22 17:12:11.459004531 +0200
> @@ -0,0 +1,75 @@
> +/* Software floating-point emulation.
> +   Truncate bfloat16 into IEEE half.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "half.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert BFtype to SFtype (lossless) and then
> +   truncate to HFtype.  */
> +
> +HFtype
> +__truncbfhf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  HFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  /* Optimize BFtype to SFtype conversion to simple left shift
> +     by 16 if possible, we don't need to raise exceptions on sNaN
> +     here as the SFtype to HFtype truncation should do that too.  */
> +  if (sizeof (BFtype) == 2
> +      && sizeof (unsigned short) == 2
> +      && sizeof (SFtype) == 4
> +      && sizeof (unsigned int) == 4)
> +    {
> +      union { BFtype a; unsigned short b; } u1;
> +      union { SFtype a; unsigned int b; } u2;
> +      u1.a = a;
> +      u2.b = (u1.b << 8) << 8;
> +      b = u2.a;
> +    }
> +  else
> +    {
> +      FP_UNPACK_RAW_B (A, a);
> +      FP_EXTEND (S, B, 1, 1, B, A);
> +      FP_PACK_RAW_S (b, B);
> +    }
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (H, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_H (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncxfbf2.c.jj      2022-09-22 15:45:56.211621629 +0200
> +++ libgcc/soft-fp/truncxfbf2.c 2022-09-22 16:02:03.205454405 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE extended into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "extended.h"
> +
> +BFtype
> +__truncxfbf2 (XFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_E (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_E (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, E, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, E, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunchfbf2.c.jj      2022-09-22 15:59:01.321931320 +0200
> +++ libgcc/soft-fp/trunchfbf2.c 2022-09-22 17:11:28.729588880 +0200
> @@ -0,0 +1,58 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE half into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "half.h"
> +#include "single.h"
> +
> +/* BFtype and HFtype are unordered, neither is a superset or subset
> +   of each other.  Convert HFtype to SFtype (lossless) and then
> +   truncate to BFtype.  */
> +
> +BFtype
> +__trunchfbf2 (HFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_S (B);
> +  FP_DECL_B (R);
> +  SFtype b;
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_RAW_H (A, a);
> +  FP_EXTEND (S, H, 1, 1, B, A);
> +  FP_PACK_RAW_S (b, B);
> +  FP_UNPACK_SEMIRAW_S (B, b);
> +  FP_TRUNC (B, S, 1, 1, R, B);
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/truncdfbf2.c.jj      2022-09-22 15:40:15.303253337 +0200
> +++ libgcc/soft-fp/truncdfbf2.c 2022-09-22 15:41:55.083897689 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE double into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "double.h"
> +
> +BFtype
> +__truncdfbf2 (DFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_D (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_D (A, a);
> +#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
> +  FP_TRUNC (B, D, 1, 2, R, A);
> +#else
> +  FP_TRUNC (B, D, 1, 1, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/trunctfbf2.c.jj      2022-09-22 15:44:14.924997754 +0200
> +++ libgcc/soft-fp/trunctfbf2.c 2022-09-22 15:44:45.694579708 +0200
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE quad into bfloat16.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "quad.h"
> +
> +BFtype
> +__trunctfbf2 (TFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_Q (A);
> +  FP_DECL_B (R);
> +  BFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_Q (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (B, Q, 1, 4, R, A);
> +#else
> +  FP_TRUNC (B, Q, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_B (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/soft-fp/extendbfsf2.c.jj     2022-09-22 16:27:01.378339625 +0200
> +++ libgcc/soft-fp/extendbfsf2.c        2022-09-22 16:27:46.379725593 +0200
> @@ -0,0 +1,49 @@
> +/* Software floating-point emulation.
> +   Return an bfloat16 converted to IEEE single
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define FP_NO_EXACT_UNDERFLOW
> +#include "soft-fp.h"
> +#include "brain.h"
> +#include "single.h"
> +
> +SFtype
> +__extendbfsf2 (BFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_B (A);
> +  FP_DECL_S (R);
> +  SFtype r;
> +
> +  FP_INIT_EXCEPTIONS;
> +  FP_UNPACK_RAW_B (A, a);
> +  FP_EXTEND (S, B, 1, 1, R, A);
> +  FP_PACK_RAW_S (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --- libgcc/config/i386/t-softfp.jj      2021-12-30 15:12:44.111138056 +0100
> +++ libgcc/config/i386/t-softfp 2022-09-22 16:38:31.639921214 +0200
> @@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
>  libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
>  LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
>
> -softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> -softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
> +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
> +                     tfbf xfbf dfbf sfbf hfbf bfhf
>
>  softfp_extras += eqhf2
>
> @@ -20,6 +21,8 @@ CFLAGS-truncsfhf2.c += -msse2
>  CFLAGS-truncdfhf2.c += -msse2
>  CFLAGS-truncxfhf2.c += -msse2
>  CFLAGS-trunctfhf2.c += -msse2
> +CFLAGS-truncbfhf2.c += -msse2
> +CFLAGS-trunchfbf2.c += -msse2
>
>  CFLAGS-eqhf2.c += -msse2
>  CFLAGS-_divhc3.c += -msse2
> --- libgcc/config/i386/libgcc-glibc.ver.jj      2022-01-11 23:11:23.723271422 +0100
> +++ libgcc/config/i386/libgcc-glibc.ver 2022-09-22 16:41:26.599448819 +0200
> @@ -214,3 +214,14 @@ GCC_12.0.0 {
>    __trunctfhf2
>    __truncxfhf2
>  }
> +
> +%inherit GCC_13.0.0 GCC_12.0.0
> +GCC_13.0.0 {
> +  __extendbfsf2
> +  __truncdfbf2
> +  __truncsfbf2
> +  __trunctfbf2
> +  __truncxfbf2
> +  __trunchfbf2
> +  __truncbfhf2
> +}
> --- libgcc/config/i386/64/sfp-machine.h.jj      2021-12-30 15:12:44.111138056 +0100
> +++ libgcc/config/i386/64/sfp-machine.h 2022-09-22 16:44:45.897627866 +0200
> @@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
>  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>
>  #define _FP_NANFRAC_H          _FP_QNANBIT_H
> +#define _FP_NANFRAC_B          _FP_QNANBIT_B
>  #define _FP_NANFRAC_S          _FP_QNANBIT_S
>  #define _FP_NANFRAC_D          _FP_QNANBIT_D
>  #define _FP_NANFRAC_E          _FP_QNANBIT_E, 0
> --- libgcc/config/i386/sfp-machine.h.jj 2021-12-30 15:12:44.111138056 +0100
> +++ libgcc/config/i386/sfp-machine.h    2022-09-22 16:46:16.130350681 +0200
> @@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
>  #define _FP_QNANNEGATEDP 0
>
>  #define _FP_NANSIGN_H          1
> +#define _FP_NANSIGN_B          1
>  #define _FP_NANSIGN_S          1
>  #define _FP_NANSIGN_D          1
>  #define _FP_NANSIGN_E          1
> --- libgcc/config/i386/32/sfp-machine.h.jj      2021-12-30 15:12:44.110138070 +0100
> +++ libgcc/config/i386/32/sfp-machine.h 2022-09-22 16:44:26.786898371 +0200
> @@ -87,6 +87,7 @@
>  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>
>  #define _FP_NANFRAC_H          _FP_QNANBIT_H
> +#define _FP_NANFRAC_B          _FP_QNANBIT_B
>  #define _FP_NANFRAC_S          _FP_QNANBIT_S
>  #define _FP_NANFRAC_D          _FP_QNANBIT_D, 0
>  /* Even if XFmode is 12byte,  we have to pad it to
>
>
>         Jakub


-- 
BR,
Hongtao