public inbox for gcc-patches@gcc.gnu.org
* Patch ping
@ 2023-01-30  9:50 Jakub Jelinek
  2023-01-30 23:07 ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2023-01-30  9:50 UTC (permalink / raw)
  To: Jason Merrill, Joseph S. Myers, Uros Bizjak, Jeff Law,
	Richard Biener, Richard Earnshaw, Kyrylo Tkachov,
	richard.sandiford
  Cc: gcc-patches

I'd like to ping a few pending patches:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607534.html
  - PR107846 - P1 - c-family: Account for integral promotions of left shifts for -Wshift-overflow warning

https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610285.html
  - PR108464 - P1 - file-prefix-map: Fix up -f*-prefix-map= (3 variants)

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606382.html
  - PR107703 - P3, ABI - libgcc, i386: Add __fix{,uns}bfti and __float{,un}tibf

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606973.html
  - PR107465 - P2 - c-family: Fix up -Wsign-compare BIT_NOT_EXPR handling

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607104.html
  - PR107465 - P2 - c-family: Incremental fix for -Wsign-compare BIT_NOT_EXPR handling

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607145.html
  - PR107558 - P2 - c++: Don't clear TREE_READONLY for -fmerge-all-constants for non-aggregates

https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608932.html
  - PR108079 - P2 - c, c++, cgraphunit: Prevent duplicated -Wunused-value warnings

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605965.html
  - ABI - aarch64: Add bfloat16_t support for aarch64 (enabling it in GCC 14
    will be harder)

Thanks

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Patch ping
  2023-01-30  9:50 Patch ping Jakub Jelinek
@ 2023-01-30 23:07 ` Richard Sandiford
  2023-02-01 10:27   ` AArch64 bfloat16 mangling Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2023-01-30 23:07 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jason Merrill, Joseph S. Myers, Uros Bizjak, Jeff Law,
	Richard Biener, Richard Earnshaw, Kyrylo Tkachov, gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605965.html
>   - ABI - aarch64: Add bfloat16_t support for aarch64 (enabling it in GCC 14
>     will be harder)

Sorry for the delay on this.  There's still an ongoing debate about
whether to keep the current AArch64 mangling or switch to the new one.

Thanks,
Richard


* AArch64 bfloat16 mangling
  2023-01-30 23:07 ` Richard Sandiford
@ 2023-02-01 10:27   ` Jakub Jelinek
  2023-03-09 17:14     ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2023-02-01 10:27 UTC (permalink / raw)
  To: Richard Sandiford, Richard Earnshaw, Kyrylo Tkachov; +Cc: gcc-patches

Hi!

On Mon, Jan 30, 2023 at 11:07:23PM +0000, Richard Sandiford wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605965.html
> >   - ABI - aarch64: Add bfloat16_t support for aarch64 (enabling it in GCC 14
> >     will be harder)
> 
> Sorry for the delay on this.  There's still an ongoing debate about
> whether to keep the current AArch64 mangling or switch to the new one.

If it helps, I'll try to repeat the options I see:
1) don't do anything right now; problem is if it is done later (GCC 14+),
   libstdc++ would need to conditionalize the std::bfloat16_t RTTI symbols,
   have them in one symbol version for x86 and in another for aarch64
2) similarly to x86 __bf16 would be the underlying type for std::bfloat16_t
   where the latter needs to act as usable extended floating point type with
   all arithmetics, mangling is DF16b which is how std::bfloat16_t should
   mangle according to the Itanium ABI pull request; decltype (0.0bf16) is
   __bf16; disadvantage is that existing code using __bf16 in argument
   passing and templates changes mangling
3) keep __bf16 as is with its u6__bf16 mangling and use for std::bfloat16_t
   a distinct type (the latter would be the bfloat16_type_node);
   decltype (0.0bf16) would be that new type which would mangle DF16b and
   would allow arithmetics/casts etc.  How exactly would the new type be
   named is up to you (__bfloat16_t, __bfloat16, __std_bfloat16_t,
   whatever else); in theory it could be created without a user accessible
   name as well; libstdc++ only uses decltype (0.0bf16) to get at it
4) like 3), including keeping the mangling of __bf16 as u6__bf16, but
   make also __bf16 a usable arithmetic type, not just a storage only type;
   for C++ FE it would be simply another non-standard type like say
   __float128 is on x86
5) like 2), but make the mangling of __bf16 depend on flag_abi_version;
   flag_abi_version >= 18 (aka GCC 13+ ABI) mangles it as DF16b,
   flag_abi_version < 18 mangles it as u6__bf16; the default for
   -fabi-compat-version= is I think GCC 8 ABI compatibility, so GCC normally
   emits mangling aliases, so say void foo (std::bfloat16_t) {} would
   mangle as _Z3fooDF16b and for a few years there would be
   an alias _Z3foou6__bf16 to it

Of course, it is possible I've missed some options.
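
To make the two candidate manglings concrete, here is a minimal sketch of how
the names in option 5 are put together.  The helpers mangle_unary_function and
vendor_type_code are hypothetical and written only for this illustration (they
are not GCC functions); they cover just the simplest case of a non-member
function void name (T) with one builtin parameter type.

```cpp
#include <string>

// Hypothetical helper, not a GCC API: Itanium-style mangled name of a
// non-member function `void name (T)` whose single parameter has the
// builtin type encoding `param_code`.
std::string mangle_unary_function(const std::string& name,
                                  const std::string& param_code) {
  // _Z <length of name> <name> <parameter type encoding>
  return "_Z" + std::to_string(name.size()) + name + param_code;
}

// Hypothetical helper: vendor extended builtin types are encoded as
// 'u' <length of type name> <type name>, which gives u6__bf16 for __bf16.
std::string vendor_type_code(const std::string& type_name) {
  return "u" + std::to_string(type_name.size()) + type_name;
}
```

With these, mangle_unary_function ("foo", "DF16b") yields _Z3fooDF16b and
mangle_unary_function ("foo", vendor_type_code ("__bf16")) yields
_Z3foou6__bf16, i.e. the symbol and the alias mentioned in option 5.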

	Jakub



* Re: AArch64 bfloat16 mangling
  2023-02-01 10:27   ` AArch64 bfloat16 mangling Jakub Jelinek
@ 2023-03-09 17:14     ` Richard Sandiford
  2023-03-10  8:37       ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2023-03-09 17:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Earnshaw, Kyrylo Tkachov, gcc-patches

Sorry for the slow response.

Jakub Jelinek <jakub@redhat.com> writes:
> Hi!
>
> On Mon, Jan 30, 2023 at 11:07:23PM +0000, Richard Sandiford wrote:
>> Jakub Jelinek <jakub@redhat.com> writes:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605965.html
>> >   - ABI - aarch64: Add bfloat16_t support for aarch64 (enabling it in GCC 14
>> >     will be harder)
>> 
>> Sorry for the delay on this.  There's still an ongoing debate about
>> whether to keep the current AArch64 mangling or switch to the new one.
>
> If it helps, I'll try to repeat the options I see:
> 1) don't do anything right now; problem is if it is done later (GCC 14+),
>    libstdc++ would need to conditionalize the std::bfloat16_t RTTI symbols,
>    have them in one symbol version for x86 and in another for aarch64
> 2) similarly to x86 __bf16 would be the underlying type for std::bfloat16_t
>    where the latter needs to act as usable extended floating point type with
>    all arithmetics, mangling is DF16b which is how std::bfloat16_t should
>    mangle according to the Itanium ABI pull request; decltype (0.0bf16) is
>    __bf16; disadvantage is that existing code using __bf16 in argument
>    passing and templates changes mangling
> 3) keep __bf16 as is with its u6__bf16 mangling and use for std::bfloat16_t
>    a distinct type (the latter would be the bfloat16_type_node);
>    decltype (0.0bf16) would be that new type which would mangle DF16b and
>    would allow arithmetics/casts etc.  How exactly would the new type be
>    named is up to you (__bfloat16_t, __bfloat16, __std_bfloat16_t,
>    whatever else); in theory it could be created without a user accessible
>    name as well; libstdc++ only uses decltype (0.0bf16) to get at it
> 4) like 3), including keeping the mangling of __bf16 as u6__bf16, but
>    make also __bf16 a usable arithmetic type, not just a storage only type;
>    for C++ FE it would be simply another non-standard type like say
>    __float128 is on x86
> 5) like 2), but make the mangling of __bf16 depend on flag_abi_version;
>    flag_abi_version >= 18 (aka GCC 13+ ABI) mangles it as DF16b,
>    flag_abi_version < 18 mangles it as u6__bf16; the default for
>    -fabi-compat-version= is I think GCC 8 ABI compatibility, so GCC normally
>    emits mangling aliases, so say void foo (std::bfloat16_t) {} would
>    mangle as _Z3fooDF16b and for a few years there would be
>    an alias _Z3foou6__bf16 to it
>
> Of course, it is possible I've missed some options.
>
> 	Jakub

We decided to keep the current mangling of __bf16 and use it for
std::bfloat16_t too.  __bf16 will become a non-standard arithmetic type.
This will be an explicit diversion from the Itanium ABI.

I think that's equivalent to your (2) without the part about following
the Itanium ABI.

Thanks,
Richard


* Re: AArch64 bfloat16 mangling
  2023-03-09 17:14     ` Richard Sandiford
@ 2023-03-10  8:37       ` Jakub Jelinek
  2023-03-10  8:43         ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2023-03-10  8:37 UTC (permalink / raw)
  To: Richard Earnshaw, Kyrylo Tkachov, richard.sandiford, Jason Merrill
  Cc: gcc-patches

On Thu, Mar 09, 2023 at 05:14:11PM +0000, Richard Sandiford wrote:
> We decided to keep the current mangling of __bf16 and use it for
> std::bfloat16_t too.  __bf16 will become a non-standard arithmetic type.
> This will be an explicit diversion from the Itanium ABI.
> 
> I think that's equivalent to your (2) without the part about following
> the Itanium ABI.

I'm afraid I have no idea how the above can work though.

Diversion from the Itanium ABI is doable, we have various examples
where we mangle things differently, say on powerpc* where
long double is mangled as g if it is IBM double double format (i.e.
Itanium __float128) while for long double the spec says e, and
as u9__ieee128 if it is IEEE quad.  __float128 also mangles as
u9__ieee128 and so does __ieee128.

The problem is if __bf16 needs to be treated differently from
decltype (0.0bf16) aka std::bfloat16_t (the former being a non-standard
arithmetic type, the latter being a C++23 extended floating-point type),
then they need to be distinct types.  And distinct types need to
mangle differently.  Consider
#include <stdfloat>
template <typename T>
void bar () {}
void baz ()
{
  bar<__bf16> ();
  bar<decltype (0.0bf16)> ();
  bar<std::bfloat16_t> ();
}
If __bf16 is distinct from the latter two, which are the same type,
then bar will be instantiated twice, once for each type; but if both
types are mangled the same, the compiler will emit two functions with
the same name and the assembler will reject that (or LTO might ICE etc.).
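
The same/distinct type behaviour can be tried on any target without __bf16.
The sketch below is only an analogy: AliasOfLong and instantiation_id are
names made up for this illustration, with long / AliasOfLong standing in for
two names of one type and long / long long standing in for two distinct
types.

```cpp
#include <type_traits>

// Every distinct T produces its own instantiation, and each instantiation
// owns a separate function-local static object with its own address.
template <typename T>
const void* instantiation_id() {
  static const char tag = 0;
  return &tag;
}

// Hypothetical stand-in for "a second name for the same type".
using AliasOfLong = long;

static_assert(std::is_same<long, AliasOfLong>::value,
              "an alias names the same type");
static_assert(!std::is_same<long, long long>::value,
              "long long is a distinct type, even where the sizes match");

// Same type: only one instantiation exists, so the addresses agree.
bool alias_shares_instantiation() {
  return instantiation_id<long>() == instantiation_id<AliasOfLong>();
}

// Distinct types: two instantiations, which need two different symbol names.
bool distinct_type_gets_own_instantiation() {
  return instantiation_id<long>() != instantiation_id<long long>();
}
```

Both functions return true.  The second case is the one that needs different
manglings: two instantiations must end up as two different symbols.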

Note, e.g.
void foo (__float128, __ieee128, long double, _Float128) {}
template <typename T>
void bar () {}
void baz ()
{
  bar <__float128> ();
  bar <__ieee128> ();
  bar <long double> ();
}
works on powerpc64le-linux with -mlong-double-128 -mabi=ieeelongdouble
because __float128, __ieee128 and long double types are in that case
the same type, not distinct, so e.g. bar is instantiated just once
(only _Float128 mangles differently above).  With
-mlong-double-128 -mabi=ibmlongdouble __float128 and __ieee128 are
the same (non-standard) type, while long double mangles differently
(g) and _Float128 too, so bar is instantiated twice.

So, either __bf16 should be also extended floating-point type
like decltype (0.0bf16) and std::bfloat16_t and in that case
it is fine if it mangles u6__bf16, or __bf16 will be a distinct
type from the latter two, __bf16 non-standard arithmetic type
while the latter two extended floating-point types, but then
they need to mangle differently, most likely u6__bf16 vs. DF16b.

	Jakub



* Re: AArch64 bfloat16 mangling
  2023-03-10  8:37       ` Jakub Jelinek
@ 2023-03-10  8:43         ` Richard Sandiford
  2023-03-10 11:30           ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2023-03-10  8:43 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Earnshaw, Kyrylo Tkachov, Jason Merrill, gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Thu, Mar 09, 2023 at 05:14:11PM +0000, Richard Sandiford wrote:
>> We decided to keep the current mangling of __bf16 and use it for
>> std::bfloat16_t too.  __bf16 will become a non-standard arithmetic type.
>> This will be an explicit diversion from the Itanium ABI.
>> 
>> I think that's equivalent to your (2) without the part about following
>> the Itanium ABI.
>
> I'm afraid I have no idea how the above can work though.
>
> Diversion from the Itanium ABI is doable, we have various examples
> where we mangle things differently, say on powerpc* where
> long double is mangled as g if it is IBM double double format (i.e.
> Itanium __float128) while for long double the spec says e, and
> as u9__ieee128 if it is IEEE quad.  __float128 also mangles as
> u9__ieee128 and so does __ieee128.
>
> The problem is if __bf16 needs to be treated differently from
> decltype (0.0bf16) aka std::bfloat16_t (the former being a non-standard
> arithmetic type, the latter being a C++23 extended floating-point type),
> then they need to be distinct types.  And distinct types need to
> mangle differently.  Consider
> #include <stdfloat>
> template <typename T>
> void bar () {}
> void baz ()
> {
>   bar<__bf16> ();
>   bar<decltype (0.0bf16)> ();
>   bar<std::bfloat16_t> ();
> }
> If __bf16 is distinct from the latter two, which are the same type,
> then bar will be instantiated twice, once for each type; but if both
> types are mangled the same, the compiler will emit two functions with
> the same name and the assembler will reject that (or LTO might ICE etc.).
>
> Note, e.g.
> void foo (__float128, __ieee128, long double, _Float128) {}
> template <typename T>
> void bar () {}
> void baz ()
> {
>   bar <__float128> ();
>   bar <__ieee128> ();
>   bar <long double> ();
> }
> works on powerpc64le-linux with -mlong-double-128 -mabi=ieeelongdouble
> because __float128, __ieee128 and long double types are in that case
> the same type, not distinct, so e.g. bar is instantiated just once
> (only _Float128 mangles differently above).  With
> -mlong-double-128 -mabi=ibmlongdouble __float128 and __ieee128 are
> the same (non-standard) type, while long double mangles differently
> (g) and _Float128 too, so bar is instantiated twice.
>
> So, either __bf16 should be also extended floating-point type
> like decltype (0.0bf16) and std::bfloat16_t and in that case
> it is fine if it mangles u6__bf16, or __bf16 will be a distinct
> type from the latter two,

Yeah, the former is what I meant.  The intention is that __bf16 and
std::bfloat16_t are the same type, not distinct types.

Richard

> __bf16 non-standard arithmetic type
> while the latter two extended floating-point types, but then
> they need to mangle differently, most likely u6__bf16 vs. DF16b.
>
> 	Jakub


* Re: AArch64 bfloat16 mangling
  2023-03-10  8:43         ` Richard Sandiford
@ 2023-03-10 11:30           ` Jakub Jelinek
  2023-03-10 11:50             ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2023-03-10 11:30 UTC (permalink / raw)
  To: Richard Earnshaw, Kyrylo Tkachov, Jason Merrill, gcc-patches,
	richard.sandiford

On Fri, Mar 10, 2023 at 08:43:02AM +0000, Richard Sandiford wrote:
> > So, either __bf16 should be also extended floating-point type
> > like decltype (0.0bf16) and std::bfloat16_t and in that case
> > it is fine if it mangles u6__bf16, or __bf16 will be a distinct
> > type from the latter two,
> 
> Yeah, the former is what I meant.  The intention is that __bf16 and
> std::bfloat16_t are the same type, not distinct types.

Ok, in that case here is a totally untested patch on top of
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606398.html
which is also needed (for aarch64 of course the i386 parts of the
patch which have been acked already don't matter but the 2 libgcc
new files are needed and the optabs change is too).

The reason why __floatdibf and __floatundibf are needed on aarch64
and not on x86 is that the latter has optabs for DI -> XF conversions
and so for DI -> BF uses DI -> XF -> BF where the first conversion
doesn't round/truncate anything.  On aarch64, however, the DI -> TF
conversion (TF being the narrowest mode which can hold all DI values
exactly) is done using a libcall and so GCC emits direct DI -> BF conversions.
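
Why the intermediate mode has to hold every DI value exactly can be shown with
a small sketch.  This is not the soft-fp code from the patch; the three
functions are hypothetical helpers written for this illustration, they assume
float is IEEE binary32 with round-to-nearest integer conversions, and they use
float as an example of an intermediate that is too narrow.

```cpp
#include <cstdint>
#include <cstring>

// Round a finite float to bfloat16 (round-to-nearest-even) and return the
// 16 raw bits.  NaN handling is deliberately left out of this sketch.
std::uint16_t float_to_bf16_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  std::uint32_t lsb = (u >> 16) & 1u;  // lowest bit that survives
  u += 0x7FFFu + lsb;                  // ties go to the even neighbour
  return static_cast<std::uint16_t>(u >> 16);
}

// Convert an unsigned 64-bit integer to bfloat16 bits with one rounding.
std::uint16_t u64_to_bf16_bits(std::uint64_t v) {
  if (v == 0) return 0;
  int msb = 63;
  while (((v >> msb) & 1u) == 0) --msb;  // index of the leading one bit
  std::uint64_t sig;                     // 8-bit significand, 0x80..0x100
  if (msb <= 7) {
    sig = v << (7 - msb);                // exactly representable
  } else {
    int shift = msb - 7;
    sig = v >> shift;
    std::uint64_t rem = v & ((std::uint64_t{1} << shift) - 1);
    std::uint64_t half = std::uint64_t{1} << (shift - 1);
    if (rem > half || (rem == half && (sig & 1u))) ++sig;
  }
  std::uint64_t biased_exp = 127u + static_cast<unsigned>(msb);
  // If rounding carried (sig == 0x100) the addition bumps the exponent.
  return static_cast<std::uint16_t>((biased_exp << 7) + (sig - 0x80u));
}

// Two roundings: first to float (24-bit significand), then to bfloat16.
std::uint16_t u64_to_bf16_bits_via_float(std::uint64_t v) {
  return float_to_bf16_bits(static_cast<float>(v));
}
```

For v = 2^40 + 2^32 + 1 the direct conversion rounds up and gives 0x5381,
while the two-step path first drops the low bit, then sees an exact tie and
rounds to even, giving 0x5380.  An intermediate that represents v exactly
cannot lose that low bit.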

Will test it momentarily (including the patch it depends on):

2023-03-10  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
	(aarch64_bf16_ptr_type_node): Adjust comment.
	* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
	bfloat16_type_node rather than aarch64_bf16_type_node.
	(aarch64_libgcc_floating_mode_supported_p,
	aarch64_scalar_mode_supported_p): Also support BFmode.
	(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
	(aarch64_invalid_binary_op): Remove BFmode related rejections.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
	* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
	(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
	aarch64_bf16_type_node.
	(aarch64_init_simd_builtin_types): Likewise.
	(aarch64_init_bf16_types): Likewise.  Don't create bfloat16_type_node,
	which is created in tree.cc already.
	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
gcc/testsuite/
	* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c:
	Don't expect one __bf16 related error.
libgcc/
	* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf dfbf sfbf hfbf.
	(softfp_extras): Add floatdibf floatundibf floattibf floatuntibf.
	* config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,t,h}fbf2.
	* config/aarch64/sfp-machine.h (_FP_NANFRAC_B, _FP_NANSIGN_B): Define.
	* soft-fp/floatundibf.c: New file.
	* soft-fp/floatdibf.c: New file.
libstdc++-v3/
	* config/abi/pre/gnu.ver (CXXABI_1.3.14): Also export __bf16 tinfos
	if it isn't mangled as DF16b but u6__bf16.
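
As a rough check of why the gnu.ver hunk needs new entries, the sketch below
tests the existing pattern against both typeinfo names.  It assumes the
version script patterns behave like POSIX shell globs and uses fnmatch as a
stand-in for whatever the linker does internally; glob_matches is a
hypothetical helper for this illustration.

```cpp
#include <fnmatch.h>

// Returns true when `symbol` matches the shell-style glob `pattern`.
bool glob_matches(const char* pattern, const char* symbol) {
  return fnmatch(pattern, symbol, 0) == 0;
}
```

Under that assumption glob_matches ("_ZTIDF[0-9]*[_bx]", "_ZTIDF16b") is true
but glob_matches ("_ZTIDF[0-9]*[_bx]", "_ZTIu6__bf16") is false, so the
u6__bf16 typeinfo symbols have to be listed separately.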

--- gcc/config/aarch64/aarch64.h.jj	2023-01-16 11:52:15.923736422 +0100
+++ gcc/config/aarch64/aarch64.h	2023-03-10 11:49:35.941436327 +0100
@@ -1237,9 +1237,8 @@ extern const char *aarch64_rewrite_mcpu
 extern GTY(()) tree aarch64_fp16_type_node;
 extern GTY(()) tree aarch64_fp16_ptr_type_node;
 
-/* This type is the user-visible __bf16, and a pointer to that type.  Defined
-   in aarch64-builtins.cc.  */
-extern GTY(()) tree aarch64_bf16_type_node;
+/* Pointer to the user-visible __bf16 type.  __bf16 itself is generic
+   bfloat16_type_node.  Defined in aarch64-builtins.cc.  */
 extern GTY(()) tree aarch64_bf16_ptr_type_node;
 
 /* The generic unwind code in libgcc does not initialize the frame pointer.
--- gcc/config/aarch64/aarch64-builtins.cc.jj	2023-01-16 11:52:15.913736570 +0100
+++ gcc/config/aarch64/aarch64-builtins.cc	2023-03-10 11:49:35.942436313 +0100
@@ -918,7 +918,6 @@ tree aarch64_fp16_type_node = NULL_TREE;
 tree aarch64_fp16_ptr_type_node = NULL_TREE;
 
 /* Back-end node type for brain float (bfloat) types.  */
-tree aarch64_bf16_type_node = NULL_TREE;
 tree aarch64_bf16_ptr_type_node = NULL_TREE;
 
 /* Wrapper around add_builtin_function.  NAME is the name of the built-in
@@ -1010,7 +1009,7 @@ aarch64_int_or_fp_type (machine_mode mod
     case E_DFmode:
       return double_type_node;
     case E_BFmode:
-      return aarch64_bf16_type_node;
+      return bfloat16_type_node;
     default:
       gcc_unreachable ();
     }
@@ -1124,8 +1123,8 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Float64x2_t].eltype = double_type_node;
 
   /* Init Bfloat vector types with underlying __bf16 type.  */
-  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
-  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
+  aarch64_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
+  aarch64_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
 
   for (i = 0; i < nelts; i++)
     {
@@ -1197,7 +1196,7 @@ aarch64_init_simd_builtin_scalar_types (
 					     "__builtin_aarch64_simd_poly128");
   (*lang_hooks.types.register_builtin_type) (intTI_type_node,
 					     "__builtin_aarch64_simd_ti");
-  (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node,
+  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
 					     "__builtin_aarch64_simd_bf");
   /* Unsigned integer types for various mode sizes.  */
   (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
@@ -1682,13 +1681,8 @@ aarch64_init_fp16_types (void)
 static void
 aarch64_init_bf16_types (void)
 {
-  aarch64_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (aarch64_bf16_type_node) = 16;
-  SET_TYPE_MODE (aarch64_bf16_type_node, BFmode);
-  layout_type (aarch64_bf16_type_node);
-
-  lang_hooks.types.register_builtin_type (aarch64_bf16_type_node, "__bf16");
-  aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node);
+  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+  aarch64_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
 }
 
 /* Pointer authentication builtins that will become NOP on legacy platform.
--- gcc/config/aarch64/aarch64.cc.jj	2023-02-08 18:40:20.779327223 +0100
+++ gcc/config/aarch64/aarch64.cc	2023-03-10 11:49:35.946436254 +0100
@@ -19858,7 +19858,7 @@ aarch64_gimplify_va_arg_expr (tree valis
 	  field_ptr_t = aarch64_fp16_ptr_type_node;
 	  break;
 	case E_BFmode:
-	  field_t = aarch64_bf16_type_node;
+	  field_t = bfloat16_type_node;
 	  field_ptr_t = aarch64_bf16_ptr_type_node;
 	  break;
 	case E_V2SImode:
@@ -26588,18 +26588,18 @@ aarch64_dwarf_poly_indeterminate_value (
 }
 
 /* Implement TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P - return TRUE
-   if MODE is HFmode, and punt to the generic implementation otherwise.  */
+   if MODE is [BH]Fmode, and punt to the generic implementation otherwise.  */
 
 static bool
 aarch64_libgcc_floating_mode_supported_p (scalar_float_mode mode)
 {
-  return (mode == HFmode
+  return ((mode == HFmode || mode == BFmode)
 	  ? true
 	  : default_libgcc_floating_mode_supported_p (mode));
 }
 
 /* Implement TARGET_SCALAR_MODE_SUPPORTED_P - return TRUE
-   if MODE is HFmode, and punt to the generic implementation otherwise.  */
+   if MODE is [BH]Fmode, and punt to the generic implementation otherwise.  */
 
 static bool
 aarch64_scalar_mode_supported_p (scalar_mode mode)
@@ -26607,7 +26607,7 @@ aarch64_scalar_mode_supported_p (scalar_
   if (DECIMAL_FLOAT_MODE_P (mode))
     return default_decimal_float_supported_p ();
 
-  return (mode == HFmode
+  return ((mode == HFmode || mode == BFmode)
 	  ? true
 	  : default_scalar_mode_supported_p (mode));
 }
@@ -27075,39 +27075,6 @@ aarch64_stack_protect_guard (void)
   return NULL_TREE;
 }
 
-/* Return the diagnostic message string if conversion from FROMTYPE to
-   TOTYPE is not allowed, NULL otherwise.  */
-
-static const char *
-aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
-{
-  if (element_mode (fromtype) != element_mode (totype))
-    {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<bfloat16_t%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<bfloat16_t%>");
-    }
-
-  /* Conversion allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-aarch64_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 /* Return the diagnostic message string if the binary operation OP is
    not permitted on TYPE1 and TYPE2, NULL otherwise.  */
 
@@ -27115,11 +27082,6 @@ static const char *
 aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
 			   const_tree type2)
 {
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
   if (VECTOR_TYPE_P (type1)
       && VECTOR_TYPE_P (type2)
       && !TYPE_INDIVISIBLE_P (type1)
@@ -27716,12 +27678,6 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE aarch64_mangle_type
 
-#undef TARGET_INVALID_CONVERSION
-#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion
-
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op
-
 #undef TARGET_INVALID_BINARY_OP
 #define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op
 
--- gcc/config/aarch64/aarch64-sve-builtins.def.jj	2023-01-16 11:52:15.918736496 +0100
+++ gcc/config/aarch64/aarch64-sve-builtins.def	2023-03-10 11:49:35.970435904 +0100
@@ -61,7 +61,7 @@ DEF_SVE_MODE (u64offset, none, svuint64_
 DEF_SVE_MODE (vnum, none, none, vectors)
 
 DEF_SVE_TYPE (svbool_t, 10, __SVBool_t, boolean_type_node)
-DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, aarch64_bf16_type_node)
+DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, bfloat16_type_node)
 DEF_SVE_TYPE (svfloat16_t, 13, __SVFloat16_t, aarch64_fp16_type_node)
 DEF_SVE_TYPE (svfloat32_t, 13, __SVFloat32_t, float_type_node)
 DEF_SVE_TYPE (svfloat64_t, 13, __SVFloat64_t, double_type_node)
--- gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c.jj	2020-01-31 19:18:02.603901390 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c	2023-03-10 12:13:46.754296831 +0100
@@ -18,7 +18,7 @@ f1 (svbool_t pg, svuint8_t u8, svuint16_
   svbfdot (f32, bf16, bf16);
   svbfdot (f32, 0, bf16); /* { dg-error {passing 'int' to argument 2 of 'svbfdot', which expects 'svbfloat16_t'} } */
   svbfdot (f32, f32, bf16); /* { dg-error {passing 'svfloat32_t' to argument 2 of 'svbfdot', which expects 'svbfloat16_t'} } */
-  svbfdot (f32, bf16, 0); /* { dg-error {invalid conversion to type 'bfloat16_t'} } */
+  svbfdot (f32, bf16, 0);
   svbfdot (f32, bf16, f32); /* { dg-error {passing 'svfloat32_t' to argument 3 of 'svbfdot', which expects 'svbfloat16_t'} } */
   svbfdot (f32, bf16, bf);
 }
--- libgcc/config/aarch64/t-softfp.jj	2022-11-14 13:35:34.527155682 +0100
+++ libgcc/config/aarch64/t-softfp	2023-03-10 12:19:58.668882041 +0100
@@ -1,9 +1,10 @@
 softfp_float_modes := tf
 softfp_int_modes := si di ti
-softfp_extensions := sftf dftf hftf
-softfp_truncations := tfsf tfdf tfhf
+softfp_extensions := sftf dftf hftf bfsf
+softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf
+softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+		 floatdibf floatundibf floattibf floatuntibf
 
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
 
--- libgcc/config/aarch64/libgcc-softfp.ver.jj	2023-01-16 11:52:16.633725959 +0100
+++ libgcc/config/aarch64/libgcc-softfp.ver	2023-03-10 12:11:44.144082714 +0100
@@ -26,3 +26,16 @@ GCC_11.0 {
   __mulhc3
   __trunctfhf2
 }
+
+%inherit GCC_13.0.0 GCC_11.0.0
+GCC_13.0.0 {
+  __extendbfsf2
+  __floatdibf
+  __floattibf
+  __floatundibf
+  __floatuntibf
+  __truncdfbf2
+  __truncsfbf2
+  __trunctfbf2
+  __trunchfbf2
+}
--- libgcc/config/aarch64/sfp-machine.h.jj	2023-01-16 11:52:16.633725959 +0100
+++ libgcc/config/aarch64/sfp-machine.h	2023-03-10 11:49:35.985435685 +0100
@@ -43,10 +43,12 @@ typedef int __gcc_CMPtype __attribute__
 #define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		((_FP_QNANBIT_H << 1) - 1)
+#define _FP_NANFRAC_B		((_FP_QNANBIT_B << 1) - 1)
 #define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
 #define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1)
 #define _FP_NANFRAC_Q		((_FP_QNANBIT_Q << 1) - 1), -1
 #define _FP_NANSIGN_H		0
+#define _FP_NANSIGN_B		0
 #define _FP_NANSIGN_S		0
 #define _FP_NANSIGN_D		0
 #define _FP_NANSIGN_Q		0
--- libgcc/soft-fp/floatundibf.c.jj	2023-03-10 12:10:40.143014939 +0100
+++ libgcc/soft-fp/floatundibf.c	2023-03-10 12:11:07.387618096 +0100
@@ -0,0 +1,45 @@
+/* Software floating-point emulation.
+   Convert a 64bit unsigned integer to bfloat16
+   Copyright (C) 2007-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+
+BFtype
+__floatundibf (UDItype i)
+{
+  FP_DECL_EX;
+  FP_DECL_B (A);
+  BFtype a;
+
+  FP_INIT_ROUNDMODE;
+  FP_FROM_INT_B (A, i, DI_BITS, UDItype);
+  FP_PACK_RAW_B (a, A);
+  FP_HANDLE_EXCEPTIONS;
+
+  return a;
+}
--- libgcc/soft-fp/floatdibf.c.jj	2023-03-10 12:08:56.752520872 +0100
+++ libgcc/soft-fp/floatdibf.c	2023-03-10 12:09:56.934644288 +0100
@@ -0,0 +1,45 @@
+/* Software floating-point emulation.
+   Convert a 64bit signed integer to bfloat16
+   Copyright (C) 2007-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+
+BFtype
+__floatdibf (DItype i)
+{
+  FP_DECL_EX;
+  FP_DECL_B (A);
+  BFtype a;
+
+  FP_INIT_ROUNDMODE;
+  FP_FROM_INT_B (A, i, DI_BITS, UDItype);
+  FP_PACK_RAW_B (a, A);
+  FP_HANDLE_EXCEPTIONS;
+
+  return a;
+}
--- libstdc++-v3/config/abi/pre/gnu.ver.jj	2023-03-07 18:57:13.135213321 +0100
+++ libstdc++-v3/config/abi/pre/gnu.ver	2023-03-10 11:52:27.870929478 +0100
@@ -2828,6 +2828,9 @@ CXXABI_1.3.14 {
     _ZTIDF[0-9]*[_bx];
     _ZTIPDF[0-9]*[_bx];
     _ZTIPKDF[0-9]*[_bx];
+    _ZTIu6__bf16;
+    _ZTIPu6__bf16;
+    _ZTIPKu6__bf16;
 
 } CXXABI_1.3.13;
 


	Jakub



* Re: AArch64 bfloat16 mangling
  2023-03-10 11:30           ` Jakub Jelinek
@ 2023-03-10 11:50             ` Richard Sandiford
  2023-03-10 15:35               ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2023-03-10 11:50 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Earnshaw, Kyrylo Tkachov, Jason Merrill, gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Fri, Mar 10, 2023 at 08:43:02AM +0000, Richard Sandiford wrote:
>> > So, either __bf16 should be also extended floating-point type
>> > like decltype (0.0bf16) and std::bfloat16_t and in that case
>> > it is fine if it mangles u6__bf16, or __bf16 will be a distinct
>> > type from the latter two,
>> 
>> Yeah, the former is what I meant.  The intention is that __bf16 and
>> std::bfloat16_t are the same type, not distinct types.
>
> Ok, in that case here is a totally untested patch on top of
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606398.html
> which is also needed (for aarch64 of course the i386 parts of the
> patch which have been acked already don't matter but the 2 libgcc
> new files are needed and the optabs change is too).

OK for the rest of that.

> The reason why __floatdibf and __floatundibf are needed on aarch64
> and not on x86 is that the latter has optabs for DI -> XF conversions
> and so for DI -> BF uses DI -> XF -> BF where the first conversion
> doesn't round/truncate anything.  While on aarch64 the DI -> TF conversion,
> where TF is the narrowest mode which can hold all DI values exactly,
> is done using a libcall and so GCC emits direct DI -> BF conversions.
>
> Will test it momentarily (including the patch it depends on):
>
> 2023-03-10  Jakub Jelinek  <jakub@redhat.com>
>
> gcc/
> 	* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
> 	(aarch64_bf16_ptr_type_node): Adjust comment.
> 	* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
> 	bfloat16_type_node rather than aarch64_bf16_type_node.
> 	(aarch64_libgcc_floating_mode_supported_p,
> 	aarch64_scalar_mode_supported_p): Also support BFmode.
> 	(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
> 	(aarch64_invalid_binary_op): Remove BFmode related rejections.
> 	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
> 	* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
> 	(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
> 	aarch64_bf16_type_node.
> 	(aarch64_init_simd_builtin_types): Likewise.
> 	(aarch64_init_bf16_types): Likewise.  Don't create bfloat16_type_node,
> 	which is created in tree.cc already.
> 	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
> gcc/testsuite/
> 	* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c:
> 	Don't expect one __bf16 related error.
> libgcc/
> 	* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
> 	(softfp_truncations): Add tfbf dfbf sfbf hfbf.
> 	(softfp_extras): Add floatdibf floatundibf floattibf floatuntibf.
> 	* config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
> 	__extendbfsf2 and __trunc{s,d,t,h}fbf2.
> 	* config/aarch64/sfp-machine.h (_FP_NANFRAC_B, _FP_NANSIGN_B): Define.
> 	* soft-fp/floatundibf.c: New file.
> 	* soft-fp/floatdibf.c: New file.
> libstdc++-v3/
> 	* config/abi/pre/gnu.ver (CXXABI_1.3.14): Also export __bf16 tinfos
> 	if it isn't mangled as DF16b but u6__bf16.

Thanks, looks great.  Nice to see all the - lines. :)

A naive question:

> --- libgcc/config/aarch64/t-softfp.jj	2022-11-14 13:35:34.527155682 +0100
> +++ libgcc/config/aarch64/t-softfp	2023-03-10 12:19:58.668882041 +0100
> @@ -1,9 +1,10 @@
>  softfp_float_modes := tf
>  softfp_int_modes := si di ti
> -softfp_extensions := sftf dftf hftf
> -softfp_truncations := tfsf tfdf tfhf
> +softfp_extensions := sftf dftf hftf bfsf
> +softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf

Is bfsf used for conversions in which sf is the ultimate target,
as opposed to operations that convert bf to sf and then do something
with the sf?  And so the libfunc is needed to raise exceptions, which in
more complex operations can be left to the following sf operation?

Do we still optimise to a shift for -ffinite-math-only?

Assuming so, the patch LGTM.  I'm not familiar enough with softfloat
to do a meaningful review of those parts, and I'm taking the versioning
changes on faith. :)

Thanks,
Richard

>  softfp_exclude_libgcc2 := n
> -softfp_extras := fixhfti fixunshfti floattihf floatuntihf
> +softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
> +		 floatdibf floatundibf floattibf floatuntibf
>  
>  TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
>  
> --- libgcc/config/aarch64/libgcc-softfp.ver.jj	2023-01-16 11:52:16.633725959 +0100
> +++ libgcc/config/aarch64/libgcc-softfp.ver	2023-03-10 12:11:44.144082714 +0100
> @@ -26,3 +26,16 @@ GCC_11.0 {
>    __mulhc3
>    __trunctfhf2
>  }
> +
> +%inherit GCC_13.0.0 GCC_11.0.0
> +GCC_13.0.0 {
> +  __extendbfsf2
> +  __floatdibf
> +  __floattibf
> +  __floatundibf
> +  __floatuntibf
> +  __truncdfbf2
> +  __truncsfbf2
> +  __trunctfbf2
> +  __trunchfbf2
> +}
> --- libgcc/config/aarch64/sfp-machine.h.jj	2023-01-16 11:52:16.633725959 +0100
> +++ libgcc/config/aarch64/sfp-machine.h	2023-03-10 11:49:35.985435685 +0100
> @@ -43,10 +43,12 @@ typedef int __gcc_CMPtype __attribute__
>  #define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>  
>  #define _FP_NANFRAC_H		((_FP_QNANBIT_H << 1) - 1)
> +#define _FP_NANFRAC_B		((_FP_QNANBIT_B << 1) - 1)
>  #define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
>  #define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1)
>  #define _FP_NANFRAC_Q		((_FP_QNANBIT_Q << 1) - 1), -1
>  #define _FP_NANSIGN_H		0
> +#define _FP_NANSIGN_B		0
>  #define _FP_NANSIGN_S		0
>  #define _FP_NANSIGN_D		0
>  #define _FP_NANSIGN_Q		0
> --- libgcc/soft-fp/floatundibf.c.jj	2023-03-10 12:10:40.143014939 +0100
> +++ libgcc/soft-fp/floatundibf.c	2023-03-10 12:11:07.387618096 +0100
> @@ -0,0 +1,45 @@
> +/* Software floating-point emulation.
> +   Convert a 64bit unsigned integer to bfloat16
> +   Copyright (C) 2007-2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +
> +BFtype
> +__floatundibf (UDItype i)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_B (A);
> +  BFtype a;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_FROM_INT_B (A, i, DI_BITS, UDItype);
> +  FP_PACK_RAW_B (a, A);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return a;
> +}
> --- libgcc/soft-fp/floatdibf.c.jj	2023-03-10 12:08:56.752520872 +0100
> +++ libgcc/soft-fp/floatdibf.c	2023-03-10 12:09:56.934644288 +0100
> @@ -0,0 +1,45 @@
> +/* Software floating-point emulation.
> +   Convert a 64bit signed integer to bfloat16
> +   Copyright (C) 2007-2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "brain.h"
> +
> +BFtype
> +__floatdibf (DItype i)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_B (A);
> +  BFtype a;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_FROM_INT_B (A, i, DI_BITS, UDItype);
> +  FP_PACK_RAW_B (a, A);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return a;
> +}
> --- libstdc++-v3/config/abi/pre/gnu.ver.jj	2023-03-07 18:57:13.135213321 +0100
> +++ libstdc++-v3/config/abi/pre/gnu.ver	2023-03-10 11:52:27.870929478 +0100
> @@ -2828,6 +2828,9 @@ CXXABI_1.3.14 {
>      _ZTIDF[0-9]*[_bx];
>      _ZTIPDF[0-9]*[_bx];
>      _ZTIPKDF[0-9]*[_bx];
> +    _ZTIu6__bf16;
> +    _ZTIPu6__bf16;
> +    _ZTIPKu6__bf16;
>  
>  } CXXABI_1.3.13;
>  
>
>
> 	Jakub


* Re: AArch64 bfloat16 mangling
  2023-03-10 11:50             ` Richard Sandiford
@ 2023-03-10 15:35               ` Jakub Jelinek
  2023-03-10 16:25                 ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2023-03-10 15:35 UTC (permalink / raw)
  To: Richard Earnshaw, Kyrylo Tkachov, Jason Merrill, gcc-patches,
	richard.sandiford

On Fri, Mar 10, 2023 at 11:50:39AM +0000, Richard Sandiford wrote:
> > Will test it momentarily (including the patch it depends on):

Note, testing still pending, I'm testing in a Fedora scratch build
and that is quite slow (lto bootstrap and the like).

> A naive question:
> 
> > --- libgcc/config/aarch64/t-softfp.jj	2022-11-14 13:35:34.527155682 +0100
> > +++ libgcc/config/aarch64/t-softfp	2023-03-10 12:19:58.668882041 +0100
> > @@ -1,9 +1,10 @@
> >  softfp_float_modes := tf
> >  softfp_int_modes := si di ti
> > -softfp_extensions := sftf dftf hftf
> > -softfp_truncations := tfsf tfdf tfhf
> > +softfp_extensions := sftf dftf hftf bfsf
> > +softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
> 
> Is bfsf used for conversions in which sf is the ultimate target,
> as opposed to operations that convert bf to sf and then do something
> with the sf?  And so the libfunc is needed to raise exceptions, which in
> more complex operations can be left to the following sf operation?
> 
> Do we still optimise to a shift for -ffinite-math-only?

Reminds me I should have added testcase coverage for PR107703, will post
it momentarily.

But, consider say:
template <typename T, typename F>
[[gnu::noipa]] T cvt (F f)
{
  return T (F (f));
}

void
foo ()
{
  cvt <_Float32, __bf16> (0.0bf16);
  cvt <_Float64, __bf16> (0.0bf16);
  cvt <_Float128, __bf16> (0.0bf16);
  cvt <signed char, __bf16> (0.0bf16);
  cvt <signed short, __bf16> (0.0bf16);
  cvt <int, __bf16> (0.0bf16);
  cvt <long long, __bf16> (0.0bf16);
  cvt <__int128, __bf16> (0.0bf16);
}

This emits on x86_64 -O2:
/usr/src/gcc/obj/gcc/cc1plus -quiet -O2 1111.C; grep call.*__ 1111.s
	call	__extendbfsf2
	call	__extendbfsf2
	call	__extendbfsf2
	call	__extendsftf2
	call	__fixsfti
where the first call, in cvt <_Float32, __bf16>, is really needed;
admittedly the other 2 __extendbfsf2 calls could be replaced by shifts but
aren't right now (we expand BF -> DF as BF -> SF -> DF, and because an sNaN
would already be diagnosed by the SF -> DF conversion if BF -> SF were done
with a shift, I think it would be ok; similarly for BF -> TF).  All the
others (BF -> ?I) are expanded as BF -> SF using a shift and then SF -> ?I.
With -O2 -ffast-math:
/usr/src/gcc/obj/gcc/cc1plus -quiet -O2 -ffast-math 1111.C; grep call.*__ 1111.s
	call	__extendsftf2
	call	__fixsfti
so all the BF -> SF conversions are then done using shifts.
And aarch64 is exactly the same:
./cc1plus -quiet -nostdinc -O2 1111.C; grep bl.*__[ef] 1111.s
	bl	__extendbfsf2
	bl	__extendbfsf2
	bl	__extendbfsf2
	bl	__extendsftf2
	bl	__fixsfti
./cc1plus -quiet -nostdinc -O2 -ffast-math 1111.C; grep bl.*__[ef] 1111.s
	bl	__extendsftf2
	bl	__fixsfti
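
For readers not familiar with the "shift" expansion mentioned above, here
is a minimal sketch of it in plain C.  It is not the code GCC emits; the
function name is made up for illustration, and it assumes float is IEEE
binary32 with the same byte order as uint32_t.  It only shows why the
shift is cheap and why it cannot stand in for __extendbfsf2 when sNaNs
matter: no exception is raised and a signaling NaN stays signaling.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a raw bfloat16 bit pattern to float by
   placing it in the top 16 bits of a binary32 value.  Every bfloat16
   value is exactly representable as float, so no rounding happens.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;

  memcpy (&f, &wide, sizeof f);   /* Reinterpret the bits, no conversion.  */
  return f;
}
```

For example, the bit pattern 0x3f80 widens to 1.0f and 0xc000 to -2.0f.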

> Assuming so, the patch LGTM.  I'm not familiar enough with softfloat
> to do a meaningful review of those parts, and I'm taking the versioning
> changes on faith. :)

The soft-fp new files (in both patches) are fairly mechanical:
for i in float{,un}{d,t}isf.c; do \
  sed 's/IEEE single/bfloat16/;s/single/brain/;s/SFtype/BFtype/;s/_S /_B /;s/sf /bf /' \
    $i > `echo $i | sed 's/sf.c/bf.c/'`
done
(well, I've created them by hand, so the Copyright lines differ, but
otherwise they are identical to what the above script would create).
So, there are no smarts in those, the soft-fp library already can handle
those formats.
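
As a rough illustration of what a direct DI -> BF conversion has to
compute, and why it cannot simply go through SF, here is a hand-written
sketch.  It is not the soft-fp code: it assumes round-to-nearest-even
only, whereas the libgcc routines honour the dynamic rounding mode and
raise exceptions, and both function names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert V straight to a bfloat16 bit pattern,
   rounding once, to nearest with ties to even.  */
static uint16_t
u64_to_bf16_bits (uint64_t v)
{
  if (v == 0)
    return 0;

  int msb = 63;                         /* Index of the leading one bit.  */
  while (!(v >> msb))
    msb--;

  uint32_t exp = 127 + (uint32_t) msb;  /* Biased exponent.  */
  uint32_t mant;                        /* 8-bit significand, implicit bit included.  */

  if (msb <= 7)
    mant = (uint32_t) (v << (7 - msb)); /* Exact, nothing to round.  */
  else
    {
      int shift = msb - 7;
      uint64_t half = UINT64_C (1) << (shift - 1);
      uint64_t rem = v & ((UINT64_C (1) << shift) - 1);

      mant = (uint32_t) (v >> shift);
      if (rem > half || (rem == half && (mant & 1)))
        mant++;
      if (mant == 0x100)                /* Rounding carried into a new bit.  */
        {
          mant = 0x80;
          exp++;
        }
    }
  return (uint16_t) ((exp << 7) | (mant & 0x7f));
}

/* Hypothetical helper: the same conversion done in two steps, first to
   float and then to bfloat16, i.e. with two roundings.  */
static uint16_t
u64_to_bf16_bits_via_float (uint64_t v)
{
  float f = (float) v;
  uint32_t w;

  memcpy (&w, &f, sizeof w);
  w += 0x7fff + ((w >> 16) & 1);        /* Round to nearest even; no NaNs here.  */
  return (uint16_t) (w >> 16);
}
```

With v = 2^32 + 2^24 + 1 the direct routine yields 0x4f81, while the
two-step one yields 0x4f80, because the first rounding drops the low bit
and turns the second rounding into an exact tie.  That is the reason the
intermediate mode has to hold all DI values exactly.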

	Jakub



* Re: AArch64 bfloat16 mangling
  2023-03-10 15:35               ` Jakub Jelinek
@ 2023-03-10 16:25                 ` Richard Sandiford
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Sandiford @ 2023-03-10 16:25 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Earnshaw, Kyrylo Tkachov, Jason Merrill, gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Fri, Mar 10, 2023 at 11:50:39AM +0000, Richard Sandiford wrote:
>> > Will test it momentarily (including the patch it depends on):
>
> Note, testing still pending, I'm testing in a Fedora scratch build
> and that is quite slow (lto bootstrap and the like).
>
>> A naive question:
>> 
>> > --- libgcc/config/aarch64/t-softfp.jj	2022-11-14 13:35:34.527155682 +0100
>> > +++ libgcc/config/aarch64/t-softfp	2023-03-10 12:19:58.668882041 +0100
>> > @@ -1,9 +1,10 @@
>> >  softfp_float_modes := tf
>> >  softfp_int_modes := si di ti
>> > -softfp_extensions := sftf dftf hftf
>> > -softfp_truncations := tfsf tfdf tfhf
>> > +softfp_extensions := sftf dftf hftf bfsf
>> > +softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
>> 
>> Is bfsf used for conversions in which sf is the ultimate target,
>> as opposed to operations that convert bf to sf and then do something
>> with the sf?  And so the libfunc is needed to raise exceptions, which in
>> more complex operations can be left to the following sf operation?
>> 
>> Do we still optimise to a shift for -ffinite-math-only?
>
> Reminds me I should have added testcase coverage for PR107703, will post
> it momentarily.
>
> But, consider say:
> template <typename T, typename F>
> [[gnu::noipa]] T cvt (F f)
> {
>   return T (F (f));
> }
>
> void
> foo ()
> {
>   cvt <_Float32, __bf16> (0.0bf16);
>   cvt <_Float64, __bf16> (0.0bf16);
>   cvt <_Float128, __bf16> (0.0bf16);
>   cvt <signed char, __bf16> (0.0bf16);
>   cvt <signed short, __bf16> (0.0bf16);
>   cvt <int, __bf16> (0.0bf16);
>   cvt <long long, __bf16> (0.0bf16);
>   cvt <__int128, __bf16> (0.0bf16);
> }
>
> This emits on x86_64 -O2:
> /usr/src/gcc/obj/gcc/cc1plus -quiet -O2 1111.C; grep call.*__ 1111.s
> 	call	__extendbfsf2
> 	call	__extendbfsf2
> 	call	__extendbfsf2
> 	call	__extendsftf2
> 	call	__fixsfti
> where the first call, in cvt <_Float32, __bf16>, is really needed;
> admittedly the other 2 __extendbfsf2 calls could be replaced by shifts but
> aren't right now (we expand BF -> DF as BF -> SF -> DF, and because an sNaN
> would already be diagnosed by the SF -> DF conversion if BF -> SF were done
> with a shift, I think it would be ok; similarly for BF -> TF).  All the
> others (BF -> ?I) are expanded as BF -> SF using a shift and then SF -> ?I.
> With -O2 -ffast-math:
> /usr/src/gcc/obj/gcc/cc1plus -quiet -O2 -ffast-math 1111.C; grep call.*__ 1111.s
> 	call	__extendsftf2
> 	call	__fixsfti
> so all the BF -> SF conversions are then done using shifts.
> And aarch64 is exactly the same:
> ./cc1plus -quiet -nostdinc -O2 1111.C; grep bl.*__[ef] 1111.s
> 	bl	__extendbfsf2
> 	bl	__extendbfsf2
> 	bl	__extendbfsf2
> 	bl	__extendsftf2
> 	bl	__fixsfti
> ./cc1plus -quiet -nostdinc -O2 -ffast-math 1111.C; grep bl.*__[ef] 1111.s
> 	bl	__extendsftf2
> 	bl	__fixsfti

Thanks, sounds good.  In some ways it's ironic that, in a bf->df
conversion, it's the bf->sf that needs a call, and the sf->df can
be done inline, given that one of the purposes of bf16 was to provide
cheap conversions to float.  And similarly that bf->sf is more expensive
than sf->df.  But that's not the patch's fault.

Rather than have an out-of-line call, would it be possible to synthesise
the checking inline by making bf->sf do a following sf->df conversion,
even when the df result is not used?  It would obviously need to be kept
alive somehow (not sure how).

Richard


Thread overview: 10+ messages
2023-01-30  9:50 Patch ping Jakub Jelinek
2023-01-30 23:07 ` Richard Sandiford
2023-02-01 10:27   ` AArch64 bfloat16 mangling Jakub Jelinek
2023-03-09 17:14     ` Richard Sandiford
2023-03-10  8:37       ` Jakub Jelinek
2023-03-10  8:43         ` Richard Sandiford
2023-03-10 11:30           ` Jakub Jelinek
2023-03-10 11:50             ` Richard Sandiford
2023-03-10 15:35               ` Jakub Jelinek
2023-03-10 16:25                 ` Richard Sandiford
