From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 3603138582BE for ; Sun, 13 Nov 2022 21:00:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3603138582BE Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668373229; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type; bh=8l/xu6iaNyRrxY0WQWr4e15Jc/Ptr7ilsgWxXHsf7WY=; b=cpxjcAxnMyRvofxqdO+g9YBnHd+CfEpiwZPdpz0kJ3FiRHvdeWgW782XWhxzXqu+QZ7dll zESzf5OXksExYSbfMIiaMLDgaQtE03qgkLbwnkeIRxkgfqshrMHIfrKqsT9v4hACeX+0TS 4noKeMpM1uyM1PD+IwD1+s7RIAV1pBU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-326-RLar5QZ6O_2VuNziGY4APA-1; Sun, 13 Nov 2022 16:00:26 -0500 X-MC-Unique: RLar5QZ6O_2VuNziGY4APA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 03EBC800B23; Sun, 13 Nov 2022 21:00:26 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.38]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 54F442166B2B; Sun, 13 Nov 2022 21:00:25 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 2ADL0JwV2804834 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Sun, 13 Nov 2022 22:00:20 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 2ADL0ITm2804833; Sun, 13 Nov 2022 22:00:18 +0100 Date: Sun, 13 Nov 2022 22:00:18 +0100 From: Jakub Jelinek To: Richard Earnshaw , Kyrylo Tkachov , richard.sandiford@arm.com Cc: gcc-patches@gcc.gnu.org Subject: [PATCH] aarch64: Add bfloat16_t support for aarch64 Message-ID: Reply-To: Jakub Jelinek MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-3.2 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_NUMSUBJECT,KAM_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi! x86_64/i686 has for a few weeks working std::bfloat16_t support, __bf16 there is no longer a storage only type, but can be used for arithmetics and is supported in libgcc and libstdc++. The following patch adds similar support for AArch64. Bootstrapped/regtested on aarch64-linux. Regressions are: +FAIL: 26_numerics/headers/cmath/functions_std_c++23.cc (test for excess errors) this one is something I need to look at: functions_std_c++23.cc:(.text._Z14test_functionsIDFb16_EvPT_PiPlPx[_Z14test_functionsIDFb16_EvPT_PiPlPx]+0x738): undefined reference to `__floatdibf' (4 times). I need to compare to x86, I believe we want to do a DI -> SF conversion followed by SF -> BF, but it is unclear why that isn't happening. +FAIL: gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c -march=armv8.2-a+sve -moverride=tune=none (test for errors, line 21) svbfdot (f32, bf16, 0); /* { dg-error {invalid conversion to type 'bfloat16_t'} } */ This test tests for something that no longer fails, so could be just adjusted. +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++11 scan-assembler \\t.global\\t_Z1fPu6__bf16 +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++11 scan-assembler \\t.global\\t_Z1gPu6__bf16S_ +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++11 scan-assembler \\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++14 scan-assembler \\t.global\\t_Z1fPu6__bf16 +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++14 scan-assembler \\t.global\\t_Z1gPu6__bf16S_ +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++14 scan-assembler \\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++17 scan-assembler \\t.global\\t_Z1fPu6__bf16 +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++17 scan-assembler \\t.global\\t_Z1gPu6__bf16S_ +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++17 scan-assembler \\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++20 scan-assembler \\t.global\\t_Z1fPu6__bf16 +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++20 scan-assembler \\t.global\\t_Z1gPu6__bf16S_ +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++20 scan-assembler \\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++2b scan-assembler \\t.global\\t_Z1fPu6__bf16 +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++2b scan-assembler \\t.global\\t_Z1gPu6__bf16S_ +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++2b scan-assembler \\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++98 scan-assembler \\t.global\\t_Z1fPu6__bf16 +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++98 scan-assembler \\t.global\\t_Z1gPu6__bf16S_ +FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C -std=c++98 scan-assembler \\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE These test the mangling, which changed from u6__bf16 to the standard DF16b. Now, while on x86 we change the mangling and behavior of __bf16, it doesn't need to be necessarily like that on aarch64 (although it would be nice for consistency), for C++ portable code would just use std::bfloat16_t type which is in libstdc++ defined to decltype(0.0bf16). So, if you want to keep previous mangling of __bf16 type or keep it storage only type, we can always register some other name (__bfloat16_t or whatever), make __bf16 and __bfloat16_t be distinct types (former aarch64_bf16_type_node in the compiler, the latter bfloat16_type_node) and thus have 0.0bf16 have the latter type and libstdc++ using it. 2022-11-13 Jakub Jelinek gcc/ * config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove. (aarch64_bf16_ptr_type_node): Adjust comment. * config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use bfloat16_type_node rather than aarch64_bf16_type_node. (aarch64_mangle_type): Mangle BFmode as DF16b. (aarch64_libgcc_floating_mode_supported_p, aarch64_scalar_mode_supported_p): Also support BFmode. (aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove. aarch64_invalid_binary_op): Remove BFmode related rejections. (TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine. * config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove. (aarch64_int_or_fp_type): Use bfloat16_type_node rather than aarch64_bf16_type_node. (aarch64_init_simd_builtin_types): Likewise. (aarch64_init_bf16_types): Likewise. Don't create bfloat16_type_node, which is created in tree.cc already. * config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise. libgcc/ * config/aarch64/t-softfp (softfp_extensions): Add bfsf. (softfp_truncations): Add tfbf dfbf sfbf hfbf. * config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export __extendbfsf2 and __trunc{s,d,t,h}fbf2. * config/aarch64/sfp-machine.h (_FP_NANFRAC_B, _FP_NANSIGN_B): Define. --- gcc/config/aarch64/aarch64.h +++ gcc/config/aarch64/aarch64.h @@ -1220,9 +1220,8 @@ extern const char *aarch64_rewrite_mcpu (int argc, const char **argv); extern GTY(()) tree aarch64_fp16_type_node; extern GTY(()) tree aarch64_fp16_ptr_type_node; -/* This type is the user-visible __bf16, and a pointer to that type. Defined - in aarch64-builtins.cc. */ -extern GTY(()) tree aarch64_bf16_type_node; +/* Pointer to the user-visible __bf16 type. __bf16 itself is generic + bfloat16_type_node. Defined in aarch64-builtins.cc. */ extern GTY(()) tree aarch64_bf16_ptr_type_node; /* The generic unwind code in libgcc does not initialize the frame pointer. --- gcc/config/aarch64/aarch64-builtins.cc +++ gcc/config/aarch64/aarch64-builtins.cc @@ -918,7 +918,6 @@ tree aarch64_fp16_type_node = NULL_TREE; tree aarch64_fp16_ptr_type_node = NULL_TREE; /* Back-end node type for brain float (bfloat) types. */ -tree aarch64_bf16_type_node = NULL_TREE; tree aarch64_bf16_ptr_type_node = NULL_TREE; /* Wrapper around add_builtin_function. NAME is the name of the built-in @@ -1010,7 +1009,7 @@ aarch64_int_or_fp_type (machine_mode mode, case E_DFmode: return double_type_node; case E_BFmode: - return aarch64_bf16_type_node; + return bfloat16_type_node; default: gcc_unreachable (); } @@ -1124,8 +1123,8 @@ aarch64_init_simd_builtin_types (void) aarch64_simd_types[Float64x2_t].eltype = double_type_node; /* Init Bfloat vector types with underlying __bf16 type. */ - aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node; - aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node; + aarch64_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node; + aarch64_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node; for (i = 0; i < nelts; i++) { @@ -1197,7 +1196,7 @@ aarch64_init_simd_builtin_scalar_types (void) "__builtin_aarch64_simd_poly128"); (*lang_hooks.types.register_builtin_type) (intTI_type_node, "__builtin_aarch64_simd_ti"); - (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node, + (*lang_hooks.types.register_builtin_type) (bfloat16_type_node, "__builtin_aarch64_simd_bf"); /* Unsigned integer types for various mode sizes. */ (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node, @@ -1682,13 +1681,8 @@ aarch64_init_fp16_types (void) static void aarch64_init_bf16_types (void) { - aarch64_bf16_type_node = make_node (REAL_TYPE); - TYPE_PRECISION (aarch64_bf16_type_node) = 16; - SET_TYPE_MODE (aarch64_bf16_type_node, BFmode); - layout_type (aarch64_bf16_type_node); - - lang_hooks.types.register_builtin_type (aarch64_bf16_type_node, "__bf16"); - aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node); + lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16"); + aarch64_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node); } /* Pointer authentication builtins that will become NOP on legacy platform. --- gcc/config/aarch64/aarch64.cc +++ gcc/config/aarch64/aarch64.cc @@ -19823,7 +19823,7 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, gimple_seq *pre_p, field_ptr_t = aarch64_fp16_ptr_type_node; break; case E_BFmode: - field_t = aarch64_bf16_type_node; + field_t = bfloat16_type_node; field_ptr_t = aarch64_bf16_ptr_type_node; break; case E_V2SImode: @@ -20730,7 +20730,7 @@ aarch64_mangle_type (const_tree type) if (TYPE_MAIN_VARIANT (type) == float16_type_node) return NULL; if (TYPE_MODE (type) == BFmode) - return "u6__bf16"; + return "DF16b"; else return "Dh"; } @@ -26428,18 +26428,18 @@ aarch64_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor, } /* Implement TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P - return TRUE - if MODE is HFmode, and punt to the generic implementation otherwise. */ + if MODE is [BH]Fmode, and punt to the generic implementation otherwise. */ static bool aarch64_libgcc_floating_mode_supported_p (scalar_float_mode mode) { - return (mode == HFmode + return ((mode == HFmode || mode == BFmode) ? true : default_libgcc_floating_mode_supported_p (mode)); } /* Implement TARGET_SCALAR_MODE_SUPPORTED_P - return TRUE - if MODE is HFmode, and punt to the generic implementation otherwise. */ + if MODE is [BH]Fmode, and punt to the generic implementation otherwise. */ static bool aarch64_scalar_mode_supported_p (scalar_mode mode) @@ -26447,7 +26447,7 @@ aarch64_scalar_mode_supported_p (scalar_mode mode) if (DECIMAL_FLOAT_MODE_P (mode)) return default_decimal_float_supported_p (); - return (mode == HFmode + return ((mode == HFmode || mode == BFmode) ? true : default_scalar_mode_supported_p (mode)); } @@ -26905,39 +26905,6 @@ aarch64_stack_protect_guard (void) return NULL_TREE; } -/* Return the diagnostic message string if conversion from FROMTYPE to - TOTYPE is not allowed, NULL otherwise. */ - -static const char * -aarch64_invalid_conversion (const_tree fromtype, const_tree totype) -{ - if (element_mode (fromtype) != element_mode (totype)) - { - /* Do no allow conversions to/from BFmode scalar types. */ - if (TYPE_MODE (fromtype) == BFmode) - return N_("invalid conversion from type %"); - if (TYPE_MODE (totype) == BFmode) - return N_("invalid conversion to type %"); - } - - /* Conversion allowed. */ - return NULL; -} - -/* Return the diagnostic message string if the unary operation OP is - not permitted on TYPE, NULL otherwise. */ - -static const char * -aarch64_invalid_unary_op (int op, const_tree type) -{ - /* Reject all single-operand operations on BFmode except for &. */ - if (element_mode (type) == BFmode && op != ADDR_EXPR) - return N_("operation not permitted on type %"); - - /* Operation allowed. */ - return NULL; -} - /* Return the diagnostic message string if the binary operation OP is not permitted on TYPE1 and TYPE2, NULL otherwise. */ @@ -26945,11 +26912,6 @@ static const char * aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree type2) { - /* Reject all 2-operand operations on BFmode. */ - if (element_mode (type1) == BFmode - || element_mode (type2) == BFmode) - return N_("operation not permitted on type %"); - if (VECTOR_TYPE_P (type1) && VECTOR_TYPE_P (type2) && !TYPE_INDIVISIBLE_P (type1) @@ -27546,12 +27508,6 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_MANGLE_TYPE #define TARGET_MANGLE_TYPE aarch64_mangle_type -#undef TARGET_INVALID_CONVERSION -#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion - -#undef TARGET_INVALID_UNARY_OP -#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op - #undef TARGET_INVALID_BINARY_OP #define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op --- gcc/config/aarch64/aarch64-sve-builtins.def +++ gcc/config/aarch64/aarch64-sve-builtins.def @@ -61,7 +61,7 @@ DEF_SVE_MODE (u64offset, none, svuint64_t, bytes) DEF_SVE_MODE (vnum, none, none, vectors) DEF_SVE_TYPE (svbool_t, 10, __SVBool_t, boolean_type_node) -DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, aarch64_bf16_type_node) +DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, bfloat16_type_node) DEF_SVE_TYPE (svfloat16_t, 13, __SVFloat16_t, aarch64_fp16_type_node) DEF_SVE_TYPE (svfloat32_t, 13, __SVFloat32_t, float_type_node) DEF_SVE_TYPE (svfloat64_t, 13, __SVFloat64_t, double_type_node) --- libgcc/config/aarch64/t-softfp +++ libgcc/config/aarch64/t-softfp @@ -1,7 +1,7 @@ softfp_float_modes := tf softfp_int_modes := si di ti -softfp_extensions := sftf dftf hftf -softfp_truncations := tfsf tfdf tfhf +softfp_extensions := sftf dftf hftf bfsf +softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf softfp_exclude_libgcc2 := n softfp_extras := fixhfti fixunshfti floattihf floatuntihf --- libgcc/config/aarch64/libgcc-softfp.ver +++ libgcc/config/aarch64/libgcc-softfp.ver @@ -26,3 +26,12 @@ GCC_11.0 { __mulhc3 __trunctfhf2 } + +%inherit GCC_13.0.0 GCC_11.0.0 +GCC_13.0.0 { + __extendbfsf2 + __truncdfbf2 + __truncsfbf2 + __trunctfbf2 + __trunchfbf2 +} --- libgcc/config/aarch64/sfp-machine.h +++ libgcc/config/aarch64/sfp-machine.h @@ -43,10 +43,12 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__))); #define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_2_udiv(Q,R,X,Y) #define _FP_NANFRAC_H ((_FP_QNANBIT_H << 1) - 1) +#define _FP_NANFRAC_B ((_FP_QNANBIT_B << 1) - 1) #define _FP_NANFRAC_S ((_FP_QNANBIT_S << 1) - 1) #define _FP_NANFRAC_D ((_FP_QNANBIT_D << 1) - 1) #define _FP_NANFRAC_Q ((_FP_QNANBIT_Q << 1) - 1), -1 #define _FP_NANSIGN_H 0 +#define _FP_NANSIGN_B 0 #define _FP_NANSIGN_S 0 #define _FP_NANSIGN_D 0 #define _FP_NANSIGN_Q 0 Jakub