From: Daniel Engel
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 31/34] Import float<->double conversion from the CM0 library
Date: Mon, 31 Oct 2022 08:45:26 -0700
Message-Id: <20221031154529.3627576-32-gnu@danielengel.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/eabi/fcast.S (__aeabi_d2f, __aeabi_f2d): New file.
	* config/arm/lib1funcs.S: #include eabi/fcast.S (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Added _arm_d2f and _arm_f2d.
---
 libgcc/config/arm/eabi/fcast.S | 256 +++++++++++++++++++++++++++++++++
 libgcc/config/arm/lib1funcs.S  |   1 +
 libgcc/config/arm/t-elf        |   2 +
 3 files changed, 259 insertions(+)
 create mode 100644 libgcc/config/arm/eabi/fcast.S

diff --git a/libgcc/config/arm/eabi/fcast.S b/libgcc/config/arm/eabi/fcast.S
new file mode 100644
index 00000000000..f0d1373d31a
--- /dev/null
+++ b/libgcc/config/arm/eabi/fcast.S
@@ -0,0 +1,256 @@
+/* fcast.S: Thumb-1 optimized 32- and 64-bit float conversions
+
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+
+#ifdef L_arm_f2d
+
+// double __aeabi_f2d(float)
+// Converts a single-precision float in $r0 to double-precision in $r1:$r0.
+// Rounding, overflow, and underflow are impossible.
+// INF and ZERO are returned unmodified.
+FUNC_START_SECTION aeabi_f2d .text.sorted.libgcc.fpcore.v.f2d
+FUNC_ALIAS extendsfdf2 aeabi_f2d
+    CFI_START_FUNCTION
+
+        // Save the sign.
+        lsrs    r1, r0, #31
+        lsls    r1, #31
+
+        // Set up registers for __fp_normalize2().
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Test for zero.
+        lsls    r0, #1
+        beq     LLSYM(__f2d_return)
+
+        // Split the exponent and mantissa into separate registers.
+        // This is the most efficient way to convert subnormals in the
+        //  single-precision form into normals in double-precision.
+        // This does add a leading implicit '1' to INF and NAN,
+        //  but that will be absorbed when the value is re-assembled.
+        movs    r2, r0
+        bl      SYM(__fp_normalize2) __PLT__
+
+        // Set up the exponent bias.  For INF/NAN values, the bias
+        //  is 1791 (2047 - 255 - 1), where the last '1' accounts
+        //  for the implicit '1' in the mantissa.
+        movs    r0, #3
+        lsls    r0, #9
+        adds    r0, #255
+
+        // Test for INF/NAN, promote exponent if necessary
+        cmp     r2, #255
+        beq     LLSYM(__f2d_indefinite)
+
+        // For normal values, the exponent bias is 895 (1023 - 127 - 1),
+        //  which is half of the prepared INF/NAN bias.
+        lsrs    r0, #1
+
+    LLSYM(__f2d_indefinite):
+        // Assemble exponent with bias correction.
+        adds    r2, r0
+        lsls    r2, #20
+        adds    r1, r2
+
+        // Assemble the high word of the mantissa.
+        lsrs    r0, r3, #11
+        add     r1, r0
+
+        // Remainder of the mantissa in the low word of the result.
+        lsls    r0, r3, #21
+
+    LLSYM(__f2d_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+FUNC_END extendsfdf2
+FUNC_END aeabi_f2d
+
+#endif /* L_arm_f2d */
+
+
+#if defined(L_arm_d2f) || defined(L_arm_truncdfsf2)
+
+// HACK: Build two separate implementations:
+//  * __aeabi_d2f() rounds to nearest per traditional IEEE-754 rules.
+//  * __truncdfsf2() rounds towards zero per GCC specification.
+// Presumably, a program will consistently use one ABI or the other,
+//  which means that code size will not be duplicated in practice.
+// Merging two versions with dynamic rounding would be rather hard.
+#ifdef L_arm_truncdfsf2
+    #define D2F_NAME truncdfsf2
+    #define D2F_SECTION .text.sorted.libgcc.fpcore.x.truncdfsf2
+#else
+    #define D2F_NAME aeabi_d2f
+    #define D2F_SECTION .text.sorted.libgcc.fpcore.w.d2f
+#endif
+
+// float __aeabi_d2f(double)
+// Converts a double-precision float in $r1:$r0 to single-precision in $r0.
+// Values out of range become ZERO or INF; returns the upper 23 bits of NAN.
+FUNC_START_SECTION D2F_NAME D2F_SECTION
+    CFI_START_FUNCTION
+
+        // Save the sign.
+        lsrs    r2, r1, #31
+        lsls    r2, #31
+        mov     ip, r2
+
+        // Isolate the exponent (11 bits).
+        lsls    r2, r1, #1
+        lsrs    r2, #21
+
+        // Isolate the mantissa.  It's safe to always add the implicit '1' --
+        //  even for subnormals -- since they will underflow in every case.
+        lsls    r1, #12
+        adds    r1, #1
+        rors    r1, r1
+        lsrs    r3, r0, #21
+        adds    r1, r3
+
+    #ifndef L_arm_truncdfsf2
+        // Fix the remainder.  Even though the mantissa already has 32 bits
+        //  of significance, this value still influences rounding ties.
+        lsls    r0, #11
+    #endif
+
+        // Test for INF/NAN (r3 = 2047)
+        mvns    r3, r2
+        lsrs    r3, #21
+        cmp     r3, r2
+        beq     LLSYM(__d2f_indefinite)
+
+        // Adjust exponent bias.  Offset is 127 - 1023, less 1 more since
+        //  __fp_assemble() expects the exponent relative to bit[30].
+        lsrs    r3, #1
+        subs    r2, r3
+        adds    r2, #126
+
+    #ifndef L_arm_truncdfsf2
+    LLSYM(__d2f_overflow):
+        // Use the standard formatting for overflow and underflow.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        b       SYM(__fp_assemble)
+                .cfi_restore_state
+
+    #else /* L_arm_truncdfsf2 */
+        // In theory, __truncdfsf2() could also push registers and branch to
+        //  __fp_assemble() after calculating the truncation shift and clearing
+        //  bits.  __fp_assemble() always rounds down if there is no remainder.
+        // However, after doing all of that work, the incremental cost to
+        //  finish assembling the return value is only 6 or 7 instructions
+        //  (depending on how __d2f_overflow() returns).
+        // This seems worthwhile to avoid linking in all of __fp_assemble().
+
+        // Test for INF.
+        cmp     r2, #254
+        bge     LLSYM(__d2f_overflow)
+
+    #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+        // Preserve inexact zero.
+        orrs    r0, r1
+    #endif
+
+        // HACK: Pre-empt the default round-to-nearest mode,
+        //  since GCC specifies rounding towards zero.
+        // Start by identifying subnormals by negative exponents.
+        asrs    r3, r2, #31
+        ands    r3, r2
+
+        // Clear the exponent field if the result is subnormal.
+        eors    r2, r3
+
+        // Add the subnormal shift to the nominal 8 bits of standard remainder.
+        // Also, saturate the low byte if the shift is larger than 32 bits.
+        // Anything larger would flush to zero anyway, and the shift
+        //  instructions only examine the low byte of the second operand.
+        // Basically:
+        //    x = (-x + 8 > 32) ? 255 : (-x + 8)
+        //    x = (x + 24 < 0)  ? 255 : (-x + 8)
+        //    x = (x + 24 < 0)  ? 255 : (-(x + 24) + 32)
+        adds    r3, #24
+        asrs    r0, r3, #31
+        subs    r3, #32
+        rsbs    r3, #0
+        orrs    r3, r0
+
+        // Clear the insignificant bits.
+        lsrs    r1, r3
+
+        // Combine the mantissa and the exponent.
+        lsls    r2, #23
+        adds    r0, r1, r2
+
+        // Combine with the saved sign.
+        add     r0, ip
+        RET
+
+    LLSYM(__d2f_overflow):
+        // Construct signed INF in $r0.
+        movs    r0, #255
+        lsls    r0, #23
+        add     r0, ip
+        RET
+
+    #endif /* L_arm_truncdfsf2 */
+
+    LLSYM(__d2f_indefinite):
+        // Test for INF.  If the mantissa, exclusive of the implicit '1',
+        //  is equal to '0', the result will be INF.
+        lsls    r3, r1, #1
+        orrs    r3, r0
+        beq     LLSYM(__d2f_overflow)
+
+        // TODO: Support for TRAP_NANS here.
+        // This will be double precision, not compatible with the current handler.
+
+        // Construct NAN with the upper 22 bits of the mantissa, setting bit[21]
+        //  to ensure a valid NAN without changing bit[22] (quiet)
+        subs    r2, #0xD
+        lsls    r0, r2, #20
+        lsrs    r1, #8
+        orrs    r0, r1
+
+    #if defined(STRICT_NANS) && STRICT_NANS
+        // Yes, the NAN was probably altered, but at least keep the sign...
+        add     r0, ip
+    #endif
+
+        RET
+
+    CFI_END_FUNCTION
+FUNC_END D2F_NAME
+
+#endif /* L_arm_d2f || L_arm_truncdfsf2 */
+
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 22619516eaf..28a5f4d5c86 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -2019,6 +2019,7 @@ LSYM(Lchange_\register):
 #include "eabi/fdiv.S"
 #include "eabi/ffixed.S"
 #include "eabi/ffloat.S"
+#include "eabi/fcast.S"
 #endif /* NOT_ISA_TARGET_32BIT */
 #include "eabi/lcmp.S"
 #endif /* !__symbian__ */
diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf
index 6b0bb642ef5..434a7a85598 100644
--- a/libgcc/config/arm/t-elf
+++ b/libgcc/config/arm/t-elf
@@ -106,6 +106,8 @@ LIB1ASMFUNCS += \
 	_arm_floatunsisf \
 	_arm_fixsfdi \
 	_arm_fixunssfdi \
+	_arm_d2f \
+	_arm_f2d \
 	_fp_exceptionf \
 	_fp_checknanf \
 	_fp_assemblef \
-- 
2.34.1
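
For reviewers who want to cross-check the results of __aeabi_f2d on a host machine, the widening described in the comments of fcast.S can be modelled in portable C. This is only a sketch under stated assumptions, not code from the patch: the names ref_f2d_bits and ref_f2d_selftest are hypothetical, values are passed as raw IEEE-754 bit patterns, and the subnormal loop merely stands in for __fp_normalize2(), whose implementation is not part of this message. The model drops the leading '1' before assembling the result, so it rebiases by 896 (1023 - 127), where the assembly uses 895 and lets the implicit '1' carry into the exponent field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model (not part of the patch): widen the bit
   pattern of an IEEE-754 binary32 value to the binary64 bit pattern. */
uint64_t ref_f2d_bits(uint32_t f)
{
    uint64_t sign = (uint64_t)(f >> 31) << 63;
    int32_t  exp  = (int32_t)((f >> 23) & 0xFF);
    uint32_t man  = f & 0x007FFFFFu;

    /* Signed zero: only the sign bit survives. */
    if (exp == 0 && man == 0)
        return sign;

    /* INF/NAN: all-ones exponent, mantissa bits moved up by 29. */
    if (exp == 255)
        return sign | (0x7FFull << 52) | ((uint64_t)man << 29);

    /* Subnormal input: shift until the leading '1' reaches bit 23.
       Assumption: this loop plays the role of __fp_normalize2(). */
    if (exp == 0) {
        exp = 1;
        while ((man & 0x00800000u) == 0) {
            man <<= 1;
            exp--;
        }
        man &= 0x007FFFFFu;   /* drop the now-explicit leading '1' */
    }

    /* Rebias from 127 to 1023; every finite float is a normal double. */
    return sign | ((uint64_t)(exp + 1023 - 127) << 52) | ((uint64_t)man << 29);
}

/* A few hand-computed bit patterns covering each branch above. */
void ref_f2d_selftest(void)
{
    assert(ref_f2d_bits(0x3F800000u) == 0x3FF0000000000000ull); /* 1.0     */
    assert(ref_f2d_bits(0xC0000000u) == 0xC000000000000000ull); /* -2.0    */
    assert(ref_f2d_bits(0x80000000u) == 0x8000000000000000ull); /* -0.0    */
    assert(ref_f2d_bits(0x7F800000u) == 0x7FF0000000000000ull); /* +INF    */
    assert(ref_f2d_bits(0x7FC00000u) == 0x7FF8000000000000ull); /* qNAN    */
    assert(ref_f2d_bits(0x00000001u) == 0x36A0000000000000ull); /* 2^-149  */
    assert(ref_f2d_bits(0x7F7FFFFFu) == 0x47EFFFFFE0000000ull); /* FLT_MAX */
}
```

Calling ref_f2d_selftest() from a test harness exercises the zero, normal, subnormal, INF, and NAN paths. Comparing ref_f2d_bits() against the $r1:$r0 pair returned by the assembly routine on target would be the natural next step, but that harness is outside the scope of this sketch.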