From: Daniel Engel
To: Richard Earnshaw , gcc-patches@gcc.gnu.org
Cc: Daniel Engel , Christophe Lyon
Subject: [PATCH v7 27/34] Import float multiplication from the CM0 library
Date: Mon, 31 Oct 2022 08:45:22 -0700
Message-Id: <20221031154529.3627576-28-gnu@danielengel.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/eabi/fmul.S (__mulsf3): New file.
	* config/arm/lib1funcs.S: #include eabi/fmul.S (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Moved _mulsf3 to global scope
	(this object was previously blocked on v6m builds).
---
 libgcc/config/arm/eabi/fmul.S | 215 ++++++++++++++++++++++++++++++++++
 libgcc/config/arm/lib1funcs.S |   1 +
 libgcc/config/arm/t-elf       |   3 +-
 3 files changed, 218 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/arm/eabi/fmul.S

diff --git a/libgcc/config/arm/eabi/fmul.S b/libgcc/config/arm/eabi/fmul.S
new file mode 100644
index 00000000000..4ebd5a66f47
--- /dev/null
+++ b/libgcc/config/arm/eabi/fmul.S
@@ -0,0 +1,215 @@
+/* fmul.S: Thumb-1 optimized 32-bit float multiplication
+
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+   . */
+
+
+#ifdef L_arm_mulsf3
+
+// float __aeabi_fmul(float, float)
+// Returns $r0 after multiplication by $r1.
+// Subsection ordering within fpcore keeps conditional branches within range.
+FUNC_START_SECTION aeabi_fmul .text.sorted.libgcc.fpcore.m.fmul
+FUNC_ALIAS mulsf3 aeabi_fmul
+    CFI_START_FUNCTION
+
+        // Standard registers, compatible with exception handling.
+        push { rT, lr }
+        .cfi_remember_state
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+        // Save the sign of the result.
+        movs rT, r1
+        eors rT, r0
+        lsrs rT, #31
+        lsls rT, #31
+        mov ip, rT
+
+        // Set up INF for comparison.
+        movs rT, #255
+        lsls rT, #24
+
+        // Check for multiplication by zero.
+        lsls r2, r0, #1
+        beq LLSYM(__fmul_zero1)
+
+        lsls r3, r1, #1
+        beq LLSYM(__fmul_zero2)
+
+        // Check for INF/NAN.
+        cmp r3, rT
+        bhs LLSYM(__fmul_special2)
+
+        cmp r2, rT
+        bhs LLSYM(__fmul_special1)
+
+        // Because neither operand is INF/NAN, the result will be finite.
+        // It is now safe to modify the original operand registers.
+        lsls r0, #9
+
+        // Isolate the first exponent. When normal, add back the implicit '1'.
+        // For normals, the MSB of the result is aligned in bit [31].
+        // Subnormal mantissas stay shifted left by one relative to normals,
+        // which offsets using exponent 0 where the true weight is 2^-126.
+        lsrs r2, #24
+        beq LLSYM(__fmul_normalize2)
+        adds r0, #1
+        rors r0, r0
+
+    LLSYM(__fmul_normalize2):
+        // IMPORTANT: exp10i() jumps in here!
+        // Repeat for the mantissa of the second operand.
+        // Short-circuit when the mantissa is 1.0, as the
+        // first mantissa is already prepared in $r0.
+        lsls r1, #9
+
+        // When normal, add back the implicit '1'.
+        lsrs r3, #24
+        beq LLSYM(__fmul_go)
+        adds r1, #1
+        rors r1, r1
+
+    LLSYM(__fmul_go):
+        // Calculate the final exponent, relative to bit [30].
+        adds rT, r2, r3
+        subs rT, #127
+
+    #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Short-circuit on multiplication by powers of 2.
+        lsls r3, r0, #1
+        beq LLSYM(__fmul_simple1)
+
+        lsls r3, r1, #1
+        beq LLSYM(__fmul_simple2)
+    #endif
+
+        // Save $ip across the call by adding the sign to the exponent in rT.
+        // (Alternatively, a separate register could be pushed and
+        // popped, but the four instructions here are equally fast
+        // without imposing on the stack.)
+        add rT, ip
+
+        // 32x32 unsigned multiplication, 64-bit result.
+        bl SYM(__umulsidi3) __PLT__
+
+        // Separate the saved exponent and sign.
+        sxth r2, rT
+        subs rT, r2
+        mov ip, rT
+
+        b SYM(__fp_assemble)
+
+    #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+    LLSYM(__fmul_simple2):
+        // Move the high bits of the result to $r1.
+        movs r1, r0
+
+    LLSYM(__fmul_simple1):
+        // Clear the remainder.
+        eors r0, r0
+
+        // Adjust mantissa to match the exponent, relative to bit [30].
+        subs r2, rT, #1
+        b SYM(__fp_assemble)
+    #endif
+
+    LLSYM(__fmul_zero1):
+        // $r0 is +/-0; set up to check $r1 for INF/NAN.
+        lsls r2, r1, #1
+
+    LLSYM(__fmul_zero2):
+    #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs r3, #(INFINITY_TIMES_ZERO)
+    #endif
+
+        // Check the non-zero operand for INF/NAN.
+        // If NAN, it should be returned.
+        // If INF, the result should be NAN.
+        // Otherwise, the result will be +/-0.
+        cmp r2, rT
+        beq SYM(__fp_exception)
+
+        // If the other operand is finite, the result is +/-0.
+        blo SYM(__fp_zero)
+
+    #if defined(STRICT_NANS) && STRICT_NANS
+        // Restore values that got mixed in zero testing, then go back
+        // to sort out which one is the NAN.
+        lsls r3, r1, #1
+        lsls r2, r0, #1
+    #elif defined(TRAP_NANS) && TRAP_NANS
+        // Return NAN with the sign bit cleared.
+        lsrs r0, r2, #1
+        b SYM(__fp_check_nan)
+    #else
+        // Return NAN with the sign bit cleared.
+        lsrs r0, r2, #1
+        pop { rT, pc }
+        .cfi_restore_state
+    #endif
+
+    LLSYM(__fmul_special2):
+        // $r1 is INF/NAN. In case of INF, check $r0 for NAN.
+        cmp r2, rT
+
+    #if defined(TRAP_NANS) && TRAP_NANS
+        // Force swap if $r0 is not NAN.
+        bls LLSYM(__fmul_swap)
+
+        // $r0 is NAN, keep if $r1 is INF
+        cmp r3, rT
+        beq LLSYM(__fmul_special1)
+
+        // Both are NAN, keep the smaller value (more likely to signal).
+        cmp r2, r3
+    #endif
+
+        // Prefer the NAN already in $r0.
+        // (If TRAP_NANS, this is the smaller NAN).
+        bhi LLSYM(__fmul_special1)
+
+    LLSYM(__fmul_swap):
+        movs r0, r1
+
+    LLSYM(__fmul_special1):
+        // $r0 is either INF or NAN. $r1 has already been examined.
+        // Flags are already set correctly.
+        lsls r2, r0, #1
+        cmp r2, rT
+        beq SYM(__fp_infinity)
+
+    #if defined(TRAP_NANS) && TRAP_NANS
+        b SYM(__fp_check_nan)
+    #else
+        pop { rT, pc }
+        .cfi_restore_state
+    #endif
+
+    CFI_END_FUNCTION
+FUNC_END mulsf3
+FUNC_END aeabi_fmul
+
+#endif /* L_arm_mulsf3 */
+
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index bfe3397d892..92245353442 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -2015,6 +2015,7 @@ LSYM(Lchange_\register):
 #include "eabi/fneg.S"
 #include "eabi/fadd.S"
 #include "eabi/futil.S"
+#include "eabi/fmul.S"
 #endif /* NOT_ISA_TARGET_32BIT */
 #include "eabi/lcmp.S"
 #endif /* !__symbian__ */
diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf
index c57d9ef50ac..682f273a1d2 100644
--- a/libgcc/config/arm/t-elf
+++ b/libgcc/config/arm/t-elf
@@ -10,7 +10,7 @@ THUMB1_ISA:=$(findstring __ARM_ARCH_ISA_THUMB 1,$(shell $(gcc_compile_bare) -dM
 # inclusion create when only multiplication is used, thus avoiding pulling in
 # useless division code.
 ifneq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA))
-LIB1ASMFUNCS += _arm_muldf3 _arm_mulsf3
+LIB1ASMFUNCS += _arm_muldf3
 endif
 endif # !__symbian__
 
@@ -26,6 +26,7 @@ LIB1ASMFUNCS += \
 	_ctzsi2 \
 	_paritysi2 \
 	_popcountsi2 \
+	_arm_mulsf3 \
 
 ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA))
 # Group 0B: WEAK overridable function objects built for v6m only.
-- 
2.34.1
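The operand set-up in `__aeabi_fmul` in the patch above uses a compact trick for restoring the implicit leading bit: after `lsls r0, #9` the low byte of the register is zero, so `adds r0, #1` makes it exactly 1, and `rors r0, r0` then rotates right by one, moving that 1 into bit 31. The C model below is a minimal sketch for illustration, not the libgcc implementation; the helper names `ror32`, `fmul_align`, `fmul_exponent` and `fmul_mant_is_pow2` are hypothetical, and it assumes (as the assembly does at this point) that zero, INF and NaN operands have already been filtered out.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n (mod 32), as Thumb-1 RORS does with a register amount. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical model of the mantissa set-up for one operand.
   Returns the biased exponent field and stores the aligned mantissa.
   Precondition (as in the assembly): bits is finite and non-zero. */
static uint32_t fmul_align(uint32_t bits, uint32_t *mant)
{
    uint32_t exp = (bits << 1) >> 24;   /* lsls r2, r0, #1 ; lsrs r2, #24 */
    uint32_t m = bits << 9;             /* lsls r0, #9: fraction in [31:9] */

    assert((bits << 1) != 0 && exp != 0xFFu);

    if (exp != 0) {
        m += 1;                         /* adds r0, #1: low byte is now 1 */
        m = ror32(m, m & 0xFFu);        /* rors r0, r0: the 1 moves to bit 31 */
    }
    /* Subnormals skip the rotate, so their fraction stays one bit higher
       than a normal's; keeping exponent 0 (not 1) offsets that. */
    *mant = m;
    return exp;
}

/* Biased exponent of the product as computed at __fmul_go
   (adds rT, r2, r3 ; subs rT, #127). */
static int32_t fmul_exponent(uint32_t exp_a, uint32_t exp_b)
{
    return (int32_t)(exp_a + exp_b) - 127;
}

/* The power-of-2 short-cut: nothing below bit 31 (lsls r3, r0, #1 is zero). */
static int fmul_mant_is_pow2(uint32_t mant)
{
    return (mant << 1) == 0;
}
```

With these definitions, 1.5f (0x3FC00000) yields biased exponent 127 and aligned mantissa 0xC0000000, while the smallest subnormal (0x00000001) yields exponent 0 and mantissa 0x00000200.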
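Around the call to `__umulsidi3`, `__aeabi_fmul` keeps the result sign and the exponent in the single register `rT`: `add rT, ip` adds the sign word (0 or 0x80000000) to the signed exponent, and afterwards `sxth` recovers the exponent from the low halfword while `subs` leaves the sign word. The C sketch below models that round trip under the assumption that the exponent fits in a signed 16-bit halfword; the function names are hypothetical, and the sign extension is written arithmetically rather than relying on an implementation-defined cast.

```c
#include <assert.h>
#include <stdint.h>

/* Sign of the product: XOR of the operand sign bits, left in bit 31
   (movs rT, r1 ; eors rT, r0 ; lsrs rT, #31 ; lsls rT, #31). */
static uint32_t fmul_sign(uint32_t a, uint32_t b)
{
    return ((a ^ b) >> 31) << 31;
}

/* add rT, ip: fold the sign word into the signed exponent by addition. */
static uint32_t pack_sign_exp(uint32_t sign, int32_t exp)
{
    assert((sign & 0x7FFFFFFFu) == 0);       /* sign is 0 or 0x80000000 */
    assert(exp >= -32768 && exp <= 32767);    /* must fit a signed halfword */
    return sign + (uint32_t)exp;
}

/* sxth r2, rT ; subs rT, r2: split the packed word back into its parts. */
static void unpack_sign_exp(uint32_t packed, uint32_t *sign, int32_t *exp)
{
    /* Sign-extend the low halfword portably. */
    int32_t e = (int32_t)((packed & 0xFFFFu) ^ 0x8000u) - 0x8000;
    *exp = e;
    *sign = packed - (uint32_t)e;
}
```

For example, packing sign 0x80000000 with exponent -127 gives 0x7FFFFF81, in which bit 31 is clear, yet unpacking still returns both parts exactly. The trick only needs the exponent to fit in 16 bits, which holds here because the sum of two finite biased exponents minus 127 stays within [-127, 381].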
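The special-case paths of `__aeabi_fmul` can be summarized as a decision procedure. This sketch models only the default build (neither `STRICT_NANS` nor `TRAP_NANS` defined) and reports which exit the assembly would take rather than final bit patterns, because the values produced by `__fp_zero`, `__fp_infinity` and `__fp_exception` are defined elsewhere in the library. The enum and function names are hypothetical. As in the assembly, both operands are shifted left by one so the sign drops out and any value at or above 0xFF000000 is INF or NaN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the exits of __aeabi_fmul. */
enum fmul_exit {
    FMUL_EXIT_PRODUCT,   /* both finite and non-zero: full multiply */
    FMUL_EXIT_ZERO,      /* signed zero via __fp_zero */
    FMUL_EXIT_INFINITY,  /* signed infinity via __fp_infinity */
    FMUL_EXIT_INVALID,   /* INF * 0 via __fp_exception */
    FMUL_EXIT_NAN        /* a NaN operand is returned in *nan_out */
};

/* Default build only: neither STRICT_NANS nor TRAP_NANS. */
static enum fmul_exit fmul_dispatch(uint32_t a, uint32_t b, uint32_t *nan_out)
{
    const uint32_t inf = 0xFF000000u;   /* INF with the sign shifted out */
    uint32_t a2 = a << 1;
    uint32_t b2 = b << 1;

    assert(nan_out != NULL);

    if (a2 == 0 || b2 == 0) {
        /* __fmul_zero1 / __fmul_zero2: examine the other operand. */
        uint32_t other = (a2 == 0) ? b2 : a2;
        if (other == inf)
            return FMUL_EXIT_INVALID;
        if (other < inf)
            return FMUL_EXIT_ZERO;
        *nan_out = other >> 1;          /* NaN with the sign bit cleared */
        return FMUL_EXIT_NAN;
    }

    if (b2 >= inf || a2 >= inf) {
        /* __fmul_special2 keeps a NaN in a, otherwise takes b;
           __fmul_special1 handles a when only a is special. */
        uint32_t r = (b2 >= inf && a2 <= inf) ? b : a;
        if ((r << 1) == inf)
            return FMUL_EXIT_INFINITY;
        *nan_out = r;                   /* NaN returned unchanged */
        return FMUL_EXIT_NAN;
    }

    return FMUL_EXIT_PRODUCT;
}
```

One detail this model makes visible: on the zero path the NaN comes back with its sign bit cleared (`lsrs r0, r2, #1`), whereas on the INF/NaN path it is returned unchanged.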