From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Engel
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 22/34] Import integer multiplication from the CM0 library
Date: Mon, 31 Oct 2022 08:45:17 -0700
Message-Id: <20221031154529.3627576-23-gnu@danielengel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

    * config/arm/eabi/lmul.S: New file for __muldi3(), __mulsidi3(),
    and __umulsidi3().
    * config/arm/lib1funcs.S: #include eabi/lmul.S (v6m only).
    * config/arm/t-elf: Add the new objects to LIB1ASMFUNCS.
---
 libgcc/config/arm/eabi/lmul.S | 218 ++++++++++++++++++++++++++++++++++
 libgcc/config/arm/lib1funcs.S |   1 +
 libgcc/config/arm/t-elf       |  13 +-
 3 files changed, 230 insertions(+), 2 deletions(-)
 create mode 100644 libgcc/config/arm/eabi/lmul.S

diff --git a/libgcc/config/arm/eabi/lmul.S b/libgcc/config/arm/eabi/lmul.S
new file mode 100644
index 00000000000..377e571bf09
--- /dev/null
+++ b/libgcc/config/arm/eabi/lmul.S
@@ -0,0 +1,218 @@
+/* lmul.S: Thumb-1 optimized 64-bit integer multiplication
+
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+
+#ifdef L_muldi3
+
+// long long __aeabi_lmul(long long, long long)
+// Returns the least significant 64 bits of a 64 bit multiplication.
+// Expects the two multiplicands in $r1:$r0 and $r3:$r2.
+// Returns the product in $r1:$r0 (does not distinguish signed types).
+// Uses $r4 and $r5 as scratch space.
+// Same parent section as __umulsidi3() to keep tail call branch within range.
+FUNC_START_SECTION muldi3 .text.sorted.libgcc.lmul.muldi3
+
+#ifndef __symbian__
+  FUNC_ALIAS aeabi_lmul muldi3
+#endif
+
+    CFI_START_FUNCTION
+
+    // $r1:$r0 = 0xDDDDCCCCBBBBAAAA
+    // $r3:$r2 = 0xZZZZYYYYXXXXWWWW
+
+    // The following operations that only affect the upper 64 bits
+    //  can be safely discarded:
+    //   DDDD * ZZZZ
+    //   DDDD * YYYY
+    //   DDDD * XXXX
+    //   CCCC * ZZZZ
+    //   CCCC * YYYY
+    //   BBBB * ZZZZ
+
+    // MAYBE: Test for multiply by ZERO on implementations with a 32-cycle
+    //  'muls' instruction, and skip over the operation in that case.
+
+    // (0xDDDDCCCC * 0xXXXXWWWW), free $r1
+    muls xxh, yyl
+
+    // (0xZZZZYYYY * 0xBBBBAAAA), free $r3
+    muls yyh, xxl
+    adds yyh, xxh
+
+    // Put the parameters in the correct form for umulsidi3().
+    movs xxh, yyl
+    b LLSYM(__mul_overflow)
+
+    CFI_END_FUNCTION
+FUNC_END muldi3
+
+#ifndef __symbian__
+  FUNC_END aeabi_lmul
+#endif
+
+#endif /* L_muldi3 */
+
+
+// The following implementation of __umulsidi3() integrates with __muldi3()
+//  above to allow the fast tail call while still preserving the extra
+//  hi-shifted bits of the result.  However, these extra bits add a few
+//  instructions not otherwise required when using only __umulsidi3().
+// Therefore, this block configures __umulsidi3() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+//  version adds the hi bits of __muldi3().  The standalone version must
+//  be declared WEAK, so that the combined version can supersede it and
+//  provide both symbols in programs that multiply long long values.
+// This means '_umulsidi3' should appear before '_muldi3' in LIB1ASMFUNCS.
+#if defined(L_muldi3) || defined(L_umulsidi3)
+
+#ifdef L_umulsidi3
+// unsigned long long __umulsidi3(unsigned int, unsigned int)
+// Returns all 64 bits of a 32 bit multiplication.
+// Expects the two multiplicands in $r0 and $r1.
+// Returns the product in $r1:$r0.
+// Uses $r2, $r3 and $ip as scratch space.
+WEAK_START_SECTION umulsidi3 .text.sorted.libgcc.lmul.umulsidi3
+    CFI_START_FUNCTION
+
+#else /* L_muldi3 */
+FUNC_ENTRY umulsidi3
+    CFI_START_FUNCTION
+
+    // 32x32 multiply with 64 bit result.
+    // Expand the multiply into 4 parts, since muls only returns 32 bits.
+    //         (a16h * b16h / 2^32)
+    //       + (a16h * b16l / 2^48) + (a16l * b16h / 2^48)
+    //       + (a16l * b16l / 2^64)
+
+    // MAYBE: Test for multiply by 0 on implementations with a 32-cycle
+    //  'muls' instruction, and skip over the operation in that case.
+
+    eors yyh, yyh
+
+  LLSYM(__mul_overflow):
+    mov ip, yyh
+
+#endif /* !L_umulsidi3 */
+
+    // a16h * b16h
+    lsrs r2, xxl, #16
+    lsrs r3, xxh, #16
+    muls r2, r3
+
+  #ifdef L_muldi3
+    add ip, r2
+  #else
+    mov ip, r2
+  #endif
+
+    // a16l * b16h; save a16h first!
+    lsrs r2, xxl, #16
+  #if (__ARM_ARCH >= 6)
+    uxth xxl, xxl
+  #else /* __ARM_ARCH < 6 */
+    lsls xxl, #16
+    lsrs xxl, #16
+  #endif
+    muls r3, xxl
+
+    // a16l * b16l
+  #if (__ARM_ARCH >= 6)
+    uxth xxh, xxh
+  #else /* __ARM_ARCH < 6 */
+    lsls xxh, #16
+    lsrs xxh, #16
+  #endif
+    muls xxl, xxh
+
+    // a16h * b16l
+    muls xxh, r2
+
+    // Distribute intermediate results.
+    eors r2, r2
+    adds xxh, r3
+    adcs r2, r2
+    lsls r3, xxh, #16
+    lsrs xxh, #16
+    lsls r2, #16
+    adds xxl, r3
+    adcs xxh, r2
+
+    // Add in the high bits.
+    add xxh, ip
+
+    RET
+
+    CFI_END_FUNCTION
+FUNC_END umulsidi3
+
+#endif /* L_muldi3 || L_umulsidi3 */
+
+
+#ifdef L_mulsidi3
+
+// long long mulsidi3(int, int)
+// Returns all 64 bits of a 32 bit signed multiplication.
+// Expects the two multiplicands in $r0 and $r1.
+// Returns the product in $r1:$r0.
+// Uses $r3, $r4 and $rT as scratch space.
+FUNC_START_SECTION mulsidi3 .text.sorted.libgcc.lmul.mulsidi3
+    CFI_START_FUNCTION
+
+    // Push registers for function call.
+    push { rT, lr }
+      .cfi_remember_state
+      .cfi_adjust_cfa_offset 8
+      .cfi_rel_offset rT, 0
+      .cfi_rel_offset lr, 4
+
+    // Save signs of the arguments.
+    asrs r3, r0, #31
+    asrs rT, r1, #31
+
+    // Absolute value of the arguments.
+    eors r0, r3
+    eors r1, rT
+    subs r0, r3
+    subs r1, rT
+
+    // Save sign of the result.
+    eors rT, r3
+
+    bl SYM(__umulsidi3) __PLT__
+
+    // Apply sign of the result.
+    eors xxl, rT
+    eors xxh, rT
+    subs xxl, rT
+    sbcs xxh, rT
+
+    pop { rT, pc }
+      .cfi_restore_state
+
+    CFI_END_FUNCTION
+FUNC_END mulsidi3
+
+#endif /* L_mulsidi3 */
+
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 51fb32e38aa..e828d53d732 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1578,6 +1578,7 @@ LSYM(Lover12):
 #define PEDANTIC_DIV0 (1)
 #include "eabi/idiv.S"
 #include "eabi/ldiv.S"
+#include "eabi/lmul.S"
 #endif /* NOT_ISA_TARGET_32BIT */
 
 /* ------------------------------------------------------------------------ */
diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf
index 4d430325fa1..eb1acd8d5a2 100644
--- a/libgcc/config/arm/t-elf
+++ b/libgcc/config/arm/t-elf
@@ -27,6 +27,13 @@ LIB1ASMFUNCS += \
     _paritysi2 \
     _popcountsi2 \
 
+ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA))
+# Group 0B: WEAK overridable function objects built for v6m only.
+LIB1ASMFUNCS += \
+    _muldi3 \
+
+endif
+
 # Group 1: Integer function objects.
 LIB1ASMFUNCS += \
@@ -51,11 +58,13 @@ LIB1ASMFUNCS += \
 ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA))
-# Group 1B: Integer functions built for v6m only.
+# Group 1B: Integer function objects built for v6m only.
 LIB1ASMFUNCS += \
     _divdi3 \
     _udivdi3 \
-
+    _mulsidi3 \
+    _umulsidi3 \
+
 endif
-- 
2.34.1
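
For reviewers who want to check the arithmetic without reading Thumb-1, the three routines in lmul.S can be modelled in C roughly as follows. This is only a sketch and not part of the patch: the sketch_* names are hypothetical, and the code assumes the usual two's complement conversions that GCC provides. It follows the same steps as the assembly: four 16x16 partial products with explicit carry handling for __umulsidi3(), the two discarded-high cross products for __muldi3(), and the sign-mask trick (asrs #31, eors, subs/sbcs) for __mulsidi3().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of __umulsidi3(): 32x32 -> 64 bit unsigned multiply
   using only 32-bit products of 16-bit halves, as 'muls' allows.  */
uint64_t sketch_umulsidi3(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint32_t hh = ah * bh;                  /* weight 2^32 */
    uint32_t hl = ah * bl;                  /* weight 2^16 */
    uint32_t lh = al * bh;                  /* weight 2^16 */
    uint32_t lo = al * bl;                  /* weight 2^0  */

    /* Sum the cross products; a carry out is worth 2^48 overall.  */
    uint32_t mid = hl + lh;
    uint32_t carry = (mid < hl);

    /* Fold the low half of 'mid' into the low word, keeping the carry.  */
    uint32_t lo_sum = lo + (mid << 16);
    uint32_t hi = hh + (mid >> 16) + (carry << 16) + (lo_sum < lo);

    return ((uint64_t)hi << 32) | lo_sum;
}

/* Hypothetical model of __muldi3(): low 64 bits of a 64x64 multiply.
   Only the low 32 bits of each cross product can reach the result.  */
uint64_t sketch_muldi3(uint64_t x, uint64_t y)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);
    uint32_t cross = xh * yl + yh * xl;     /* wraps modulo 2^32 */

    return sketch_umulsidi3(xl, yl) + ((uint64_t)cross << 32);
}

/* Hypothetical model of __mulsidi3(): multiply magnitudes, then apply
   the sign with an all-ones or all-zeros mask.  */
int64_t sketch_mulsidi3(int32_t a, int32_t b)
{
    uint32_t sa = 0u - (uint32_t)(a < 0);   /* like asrs r3, r0, #31 */
    uint32_t sb = 0u - (uint32_t)(b < 0);
    uint32_t ua = ((uint32_t)a ^ sa) - sa;  /* absolute value */
    uint32_t ub = ((uint32_t)b ^ sb) - sb;
    uint64_t sign = (sa ^ sb) ? UINT64_MAX : 0u;
    uint64_t p = sketch_umulsidi3(ua, ub);

    return (int64_t)((p ^ sign) - sign);    /* negate when signs differ */
}

/* Small self-check comparing the sketches against native arithmetic.  */
void sketch_self_check(void)
{
    uint64_t x = 0x123456789ABCDEF0ull, y = 0x0FEDCBA987654321ull;

    assert(sketch_umulsidi3(0xFFFFFFFFu, 0xFFFFFFFFu)
           == (uint64_t)0xFFFFFFFFu * 0xFFFFFFFFu);
    assert(sketch_muldi3(x, y) == x * y);
    assert(sketch_mulsidi3(-3, 7) == -21);
}
```

Calling sketch_self_check() from a host-side test program exercises all three models. The carry handling in sketch_umulsidi3() mirrors the "Distribute intermediate results" block in the assembly, which is the part most worth checking against the original.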