From: Daniel Engel
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
Date: Mon, 31 Oct 2022 08:44:55 -0700
Message-Id: <20221031154529.3627576-1-gnu@danielengel.com>

Hi Richard,

I am
re-submitting my libgcc patch from 2021:

    https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
    https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html

I believe I have finally made the stage1 window.

Regards,
Daniel

---

Changes since v6:

    * Rebased and tested with gcc-13

There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.

Clean master:

    # of expected passes            529397
    # of unexpected failures        41160
    # of unexpected successes       12
    # of expected failures          3442
    # of unresolved testcases       978
    # of unsupported tests          28993

Patched master:

    # of expected passes            529397
    # of unexpected failures        41160
    # of unexpected successes       12
    # of expected failures          3442
    # of unresolved testcases       978
    # of unsupported tests          28993

---

This patch series adds an assembly-language implementation of IEEE-754
compliant single-precision functions designed for the Cortex M0 (v6m)
architecture.  There are improvements to most of the EABI integer
functions as well.  This is the libgcc component of a larger library
project originally proposed in 2018:

    https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html

As one point of comparison, a test program [1] links 916 bytes from
libgcc with the patched toolchain vs 10276 bytes with the
gcc-arm-none-eabi-9-2020-q2 toolchain.  That's a size reduction of
about 91%.

I have extensive test vectors [2], and this patch series passes all of
them on an STM32F051.  These vectors were derived from UCB [3],
TestFloat [4], and IEEECC754 [5], plus many of my own generation.

There may be some follow-on projects worth discussing:

    * The library is currently integrated into the ARM v6s-m multilib
      only.  It is likely that some other architectures would benefit
      from these routines.  However, I have NOT profiled the existing
      implementations (ieee754-sf.S) to estimate where improvements
      may be found.

    * GCC currently lacks tests for some functions, such as
      __aeabi_[u]ldivmod().  There may be useful bits in [1] that can
      be integrated.
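As a rough illustration of the kind of check the last item has in mind,
the sketch below verifies the division identity n == q * d + r for
64-bit operands.  It is not taken from the patch series or from the GCC
testsuite: the function names (check_udivmod64, check_sdivmod64,
run_divmod_vectors) and the test vectors are hypothetical.  The
assumption is that on an arm-none-eabi target the 64-bit '/' and '%'
operators are lowered to calls into the __aeabi_[u]ldivmod() family,
so exercising the operators exercises the library routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: magnitude of a signed value without overflow. */
static uint64_t magnitude(int64_t x)
{
    return x < 0 ? 0u - (uint64_t)x : (uint64_t)x;
}

/* Returns 1 when n / d and n % d satisfy n == q * d + r with r < d.
   The volatile copies keep the compiler from folding the division at
   build time, so the runtime division path is exercised. */
int check_udivmod64(uint64_t n, uint64_t d)
{
    if (d == 0)
        return 0;                   /* undefined in C, rejected here */
    volatile uint64_t vn = n, vd = d;
    uint64_t q = vn / vd;
    uint64_t r = vn % vd;
    return r < d && q * d + r == n;
}

/* Signed variant: C99 truncates toward zero, so a nonzero remainder
   carries the sign of the dividend. */
int check_sdivmod64(int64_t n, int64_t d)
{
    if (d == 0 || (n == INT64_MIN && d == -1))
        return 0;                   /* undefined in C, rejected here */
    volatile int64_t vn = n, vd = d;
    int64_t q = vn / vd;
    int64_t r = vn % vd;
    int sign_ok = (r == 0) || ((r < 0) == (n < 0));
    return sign_ok && magnitude(r) < magnitude(d) && q * d + r == n;
}

/* Runs a small, hand-picked set of boundary vectors. */
void run_divmod_vectors(void)
{
    static const uint64_t u[][2] = {
        {0, 1}, {10, 3}, {UINT64_MAX, 1}, {UINT64_MAX, UINT64_MAX},
        {1, UINT64_MAX}, {0x100000000ull, 0xFFFFFFFFull},
        {UINT64_MAX, 0x100000001ull},
    };
    static const int64_t s[][2] = {
        {-7, 2}, {7, -2}, {-7, -2}, {INT64_MIN, 1},
        {INT64_MIN, INT64_MAX}, {INT64_MAX, INT64_MIN},
        {INT64_MIN, INT64_MIN},
    };
    for (size_t i = 0; i < sizeof u / sizeof u[0]; i++)
        assert(check_udivmod64(u[i][0], u[i][1]));
    for (size_t i = 0; i < sizeof s / sizeof s[0]; i++)
        assert(check_sdivmod64(s[i][0], s[i][1]));
}
```

Calling run_divmod_vectors() from a test's main() would be enough to
drive it.  The identity check only shows that quotient and remainder
are consistent with each other, so a real test would probably also
compare against known-good results.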
On Cortex M0, the library has (approximately) the following properties:

Function(s)                    Size (bytes)       Cycles           Stack Accuracy
__clzsi2                       50                 20               0     exact
__clzsi2 (OPTIMIZE_SIZE)       22                 51               0     exact
__clzdi2                       8+__clzsi2         4+__clzsi2       0     exact

__clrsbsi2                     8+__clzsi2         6+__clzsi2       0     exact
__clrsbdi2                     18+__clzsi2        (8..10)+__clzsi2 0     exact

__ctzsi2                       52                 21               0     exact
__ctzsi2 (OPTIMIZE_SIZE)       24                 52               0     exact
__ctzdi2                       8+__ctzsi2         5+__ctzsi2       0     exact

__ffssi2                       8                  6..(5+__ctzsi2)  0     exact
__ffsdi2                       14+__ctzsi2        9..(8+__ctzsi2)  0     exact

__popcountsi2                  52                 25               0     exact
__popcountsi2 (OPTIMIZE_SIZE)  14                 9..201           0     exact
__popcountdi2                  34+__popcountsi2   46               0     exact
__popcountdi2 (OPTIMIZE_SIZE)  12+__popcountsi2   17..401          0     exact

__paritysi2                    24                 14               0     exact
__paritysi2 (OPTIMIZE_SIZE)    16                 38               0     exact
__paritydi2                    2+__paritysi2      1+__paritysi2    0     exact

__umulsidi3                    44                 24               0     exact
__mulsidi3                     30+__umulsidi3     24+__umulsidi3   8     exact
__muldi3 (__aeabi_lmul)        10+__umulsidi3     6+__umulsidi3    0     exact

__ashldi3 (__aeabi_llsl)       22                 13               0     exact
__lshrdi3 (__aeabi_llsr)       22                 13               0     exact
__ashrdi3 (__aeabi_lasr)       22                 13               0     exact

__aeabi_lcmp                   20                 13               0     exact
__aeabi_ulcmp                  16                 10               0     exact

__udivsi3 (__aeabi_uidiv)      56                 72..385          0     < 1 lsb
__divsi3 (__aeabi_idiv)        38+__udivsi3       26+__udivsi3     8     < 1 lsb
__udivdi3 (__aeabi_uldiv)      164                103..1394        16    < 1 lsb
__udivdi3 (OPTIMIZE_SIZE)      142                120..1392        16    < 1 lsb
__divdi3 (__aeabi_ldiv)        54+__udivdi3       36+__udivdi3     32    < 1 lsb

__shared_float                 178
__shared_float (OPTIMIZE_SIZE) 154

__addsf3 (__aeabi_fadd)        116+__shared_float 31..76           8     <= 0.5 ulp
__addsf3 (OPTIMIZE_SIZE)       112+__shared_float 74               8     <= 0.5 ulp
__subsf3 (__aeabi_fsub)        6+__addsf3         3+__addsf3       8     <= 0.5 ulp
__aeabi_frsub                  8+__addsf3         6+__addsf3       8     <= 0.5 ulp

__mulsf3 (__aeabi_fmul)        112+__shared_float 73..97           8     <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)       96+__shared_float  93               8     <= 0.5 ulp

__divsf3 (__aeabi_fdiv)        132+__shared_float 83..361          8     <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)       120+__shared_float 263..359         8     <= 0.5 ulp

__cmpsf2/__lesf2/__ltsf2       72                 33               0     exact
__eqsf2/__nesf2                4+__cmpsf2         3+__cmpsf2       0     exact
__gesf2/__gtsf2                4+__cmpsf2         3+__cmpsf2       0     exact
__unordsf2 (__aeabi_fcmpun)    4+__cmpsf2         3+__cmpsf2       0     exact
__aeabi_fcmpeq                 4+__cmpsf2         3+__cmpsf2       0     exact
__aeabi_fcmpne                 4+__cmpsf2         3+__cmpsf2       0     exact
__aeabi_fcmplt                 4+__cmpsf2         3+__cmpsf2       0     exact
__aeabi_fcmple                 4+__cmpsf2         3+__cmpsf2       0     exact
__aeabi_fcmpge                 4+__cmpsf2         3+__cmpsf2       0     exact

__floatundisf (__aeabi_ul2f)   14+__shared_float  40..81           8     <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)  14+__shared_float  40..237          8     <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)   0+__floatundisf    1+__floatundisf  8     <= 0.5 ulp
__floatdisf (__aeabi_l2f)      14+__floatundisf   7+__floatundisf  8     <= 0.5 ulp
__floatsisf (__aeabi_i2f)      0+__floatdisf      1+__floatdisf    8     <= 0.5 ulp

__fixsfdi (__aeabi_f2lz)       74                 27..33           0     exact
__fixunssfdi (__aeabi_f2ulz)   4+__fixsfdi        3+__fixsfdi      0     exact
__fixsfsi (__aeabi_f2iz)       52                 19               0     exact
__fixsfsi (OPTIMIZE_SIZE)      4+__fixsfdi        3+__fixsfdi      0     exact
__fixunssfsi (__aeabi_f2uiz)   4+__fixsfsi        3+__fixsfsi      0     exact

__extendsfdf2 (__aeabi_f2d)    42+__shared_float  38               8     exact
__truncsfdf2 (__aeabi_f2d)     88                 34               8     exact
__aeabi_d2f                    56+__shared_float  54..58           8     <= 0.5 ulp
__aeabi_h2f                    34+__shared_float  34               8     exact
__aeabi_f2h                    84                 23..34           0     <= 0.5 ulp

Copyright assignment is on file with the FSF.
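For readers of the properties table: an entry such as "8+__clzsi2"
describes a thin wrapper whose size and cycle count are added on top of
the routine it calls.  The C sketch below shows that layering for the
64-bit leading-zero count.  It is only a portable illustration, not the
Thumb-1 assembly from the patch series; the names ref_clzsi2 and
ref_clzdi2 are hypothetical, and returning 32 (or 64) for a zero input
is an assumption made for this sketch, not a documented guarantee of
the library routines.

```c
#include <stdint.h>

/* Reference 32-bit count-leading-zeros using a binary search.
   Assumption for this sketch: a zero input yields 32. */
int ref_clzsi2(uint32_t x)
{
    int n = 0;

    if (x == 0)
        return 32;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* The 64-bit version only picks the word to examine and adds an
   offset; the real work is delegated to the 32-bit routine, which is
   why its cost is expressed as a small constant plus the callee. */
int ref_clzdi2(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);

    if (hi != 0)
        return ref_clzsi2(hi);
    return 32 + ref_clzsi2((uint32_t)x);
}
```

The same reading applies to the other composite rows, for example the
float comparison wrappers listed as "4+__cmpsf2".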
Thanks,
Daniel Engel

[1] // Test program for size comparison

    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division

        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64 bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction

        if (f != c) // float comparison
            y -= (long long int)d; // float casting
    }

[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html

---

Daniel Engel (34):
  Add and restructure function declaration macros
  Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
  Fix syntax warnings on conditional instructions
  Reorganize LIB1ASMFUNCS object wrapper macros
  Add the __HAVE_FEATURE_IT and IT() macros
  Refactor 'clz' functions into a new file
  Refactor 'ctz' functions into a new file
  Refactor 64-bit shift functions into a new file
  Import 'clz' functions from the CM0 library
  Import 'ctz' functions from the CM0 library
  Import 64-bit shift functions from the CM0 library
  Import 'clrsb' functions from the CM0 library
  Import 'ffs' functions from the CM0 library
  Import 'parity' functions from the CM0 library
  Import 'popcnt' functions from the CM0 library
  Refactor Thumb-1 64-bit comparison into a new file
  Import 64-bit comparison from CM0 library
  Merge Thumb-2 optimizations for 64-bit comparison
  Import 32-bit division from the CM0 library
  Refactor Thumb-1 64-bit division into a new file
  Import 64-bit division from the CM0 library
  Import integer multiplication from the CM0 library
  Refactor Thumb-1 float comparison into a new file
  Import float comparison from the CM0 library
  Refactor Thumb-1 float subtraction into a new file
  Import float addition and subtraction from the CM0 library
  Import float multiplication from the CM0 library
  Import float division from the CM0 library
  Import integer-to-float conversion from the CM0 library
  Import float-to-integer conversion from the CM0 library
  Import float<->double conversion from the CM0 library
  Import float<->__fp16 conversion from the CM0 library
  Drop single-precision Thumb-1 soft-float functions
  Add -mpure-code support to the CM0 functions.

 libgcc/Makefile.in              |   5 +-
 libgcc/config/arm/bpabi-lib.h   |  12 -
 libgcc/config/arm/bpabi-v6m.S   | 206 -----------
 libgcc/config/arm/bpabi.S       |  42 ---
 libgcc/config/arm/bpabi.c       |  42 ---
 libgcc/config/arm/clz2.S        | 371 ++++++++++++++++++++
 libgcc/config/arm/ctz2.S        | 349 ++++++++++++++++++
 libgcc/config/arm/eabi/fadd.S   | 324 +++++++++++++++++
 libgcc/config/arm/eabi/fcast.S  | 533 ++++++++++++++++++++++++++++
 libgcc/config/arm/eabi/fcmp.S   | 604 ++++++++++++++++++++++++++++++++
 libgcc/config/arm/eabi/fdiv.S   | 261 ++++++++++++++
 libgcc/config/arm/eabi/ffixed.S | 414 ++++++++++++++++++++++
 libgcc/config/arm/eabi/ffloat.S | 247 +++++++++++++
 libgcc/config/arm/eabi/fmul.S   | 215 ++++++++++++
 libgcc/config/arm/eabi/fneg.S   |  76 ++++
 libgcc/config/arm/eabi/fplib.h  |  80 +++++
 libgcc/config/arm/eabi/futil.S  | 418 ++++++++++++++++++++++
 libgcc/config/arm/eabi/idiv.S   | 299 ++++++++++++++++
 libgcc/config/arm/eabi/lcmp.S   | 187 ++++++++++
 libgcc/config/arm/eabi/ldiv.S   | 493 ++++++++++++++++++++++++++
 libgcc/config/arm/eabi/lmul.S   | 218 ++++++++++++
 libgcc/config/arm/eabi/lshift.S | 241 +++++++++++++
 libgcc/config/arm/fp16.c        |   4 +
 libgcc/config/arm/lib1funcs.S   | 549 ++++++++++-------------------
 libgcc/config/arm/parity.S      | 120 +++++++
 libgcc/config/arm/popcnt.S      | 212 +++++++++++
 libgcc/config/arm/t-bpabi       |  10 +-
 libgcc/config/arm/t-elf         | 138 +++++++-
 libgcc/config/arm/t-softfp      |   2 +
 29 files changed, 5997 insertions(+), 675 deletions(-)
 delete mode 100644 libgcc/config/arm/bpabi.c
 create mode 100644 libgcc/config/arm/clz2.S
 create mode 100644 libgcc/config/arm/ctz2.S
 create mode 100644 libgcc/config/arm/eabi/fadd.S
 create mode 100644 libgcc/config/arm/eabi/fcast.S
 create mode 100644 libgcc/config/arm/eabi/fcmp.S
 create mode 100644 libgcc/config/arm/eabi/fdiv.S
 create mode 100644 libgcc/config/arm/eabi/ffixed.S
 create mode 100644 libgcc/config/arm/eabi/ffloat.S
 create mode 100644 libgcc/config/arm/eabi/fmul.S
 create mode 100644 libgcc/config/arm/eabi/fneg.S
 create mode 100644 libgcc/config/arm/eabi/fplib.h
 create mode 100644 libgcc/config/arm/eabi/futil.S
 create mode 100644 libgcc/config/arm/eabi/idiv.S
 create mode 100644 libgcc/config/arm/eabi/lcmp.S
 create mode 100644 libgcc/config/arm/eabi/ldiv.S
 create mode 100644 libgcc/config/arm/eabi/lmul.S
 create mode 100644 libgcc/config/arm/eabi/lshift.S
 create mode 100644 libgcc/config/arm/parity.S
 create mode 100644 libgcc/config/arm/popcnt.S

-- 
2.34.1