From: "Daniel Engel"
To: gcc-patches@gcc.gnu.org
Cc: christophe.lyon@arm.com, Richard.Earnshaw@foss.arm.com
Subject: [PING] Re: [PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
Date: Tue, 15 Nov 2022 07:27:45 -0800
Message-Id: <6d704904-06bb-4c02-ae30-fcbc11b8d003@app.fastmail.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

Hello,

Is there still any interest in merging this
patch?

Thanks,
Daniel

On Mon, Oct 31, 2022, at 8:44 AM, Daniel Engel wrote:
> Hi Richard,
>
> I am re-submitting my libgcc patch from 2021:
>
>     https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
>     https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html
>
> I believe I have finally made the stage1 window.
>
> Regards,
> Daniel
>
> ---
>
> Changes since v6:
>
>     * Rebased and tested with gcc-13
>
> There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.
> Clean master:
>
>     # of expected passes            529397
>     # of unexpected failures        41160
>     # of unexpected successes       12
>     # of expected failures          3442
>     # of unresolved testcases       978
>     # of unsupported tests          28993
>
> Patched master:
>
>     # of expected passes            529397
>     # of unexpected failures        41160
>     # of unexpected successes       12
>     # of expected failures          3442
>     # of unresolved testcases       978
>     # of unsupported tests          28993
>
> ---
>
> This patch series adds an assembly-language implementation of IEEE-754 compliant
> single-precision functions designed for the Cortex M0 (v6m) architecture.  There
> are improvements to most of the EABI integer functions as well.  This is the
> libgcc component of a larger library project originally proposed in 2018:
>
>     https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
>
> As one point of comparison, a test program [1] links 916 bytes from libgcc with
> the patched toolchain vs 10276 bytes with gcc-arm-none-eabi-9-2020-q2 toolchain.
> That's roughly a 91% size reduction.
>
> I have extensive test vectors [2], and this patch passes all tests on an
> STM32F051.  These vectors were derived from UCB [3], Testfloat [4], and
> IEEECC754 [5], plus many of my own generation.
>
> There may be some follow-on projects worth discussing:
>
>   * The library is currently integrated into the ARM v6s-m multilib only.  It
>     is likely that some other architectures would benefit from these routines.
>     However, I have NOT profiled the existing implementations (ieee754-sf.S) to
>     estimate where improvements may be found.
>
>   * GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
>     There may be useful bits in [1] that can be integrated.
>
> On Cortex M0, the library has (approximately) the following properties:
>
> Function(s)                     Size (bytes)        Cycles            Stack  Accuracy
> __clzsi2                        50                  20                0      exact
> __clzsi2 (OPTIMIZE_SIZE)        22                  51                0      exact
> __clzdi2                        8+__clzsi2          4+__clzsi2        0      exact
>
> __clrsbsi2                      8+__clzsi2          6+__clzsi2        0      exact
> __clrsbdi2                      18+__clzsi2         (8..10)+__clzsi2  0      exact
>
> __ctzsi2                        52                  21                0      exact
> __ctzsi2 (OPTIMIZE_SIZE)        24                  52                0      exact
> __ctzdi2                        8+__ctzsi2          5+__ctzsi2        0      exact
>
> __ffssi2                        8                   6..(5+__ctzsi2)   0      exact
> __ffsdi2                        14+__ctzsi2         9..(8+__ctzsi2)   0      exact
>
> __popcountsi2                   52                  25                0      exact
> __popcountsi2 (OPTIMIZE_SIZE)   14                  9..201            0      exact
> __popcountdi2                   34+__popcountsi2    46                0      exact
> __popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2    17..401           0      exact
>
> __paritysi2                     24                  14                0      exact
> __paritysi2 (OPTIMIZE_SIZE)     16                  38                0      exact
> __paritydi2                     2+__paritysi2       1+__paritysi2     0      exact
>
> __umulsidi3                     44                  24                0      exact
> __mulsidi3                      30+__umulsidi3      24+__umulsidi3    8      exact
> __muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3     0      exact
> __ashldi3 (__aeabi_llsl)        22                  13                0      exact
> __lshrdi3 (__aeabi_llsr)        22                  13                0      exact
> __ashrdi3 (__aeabi_lasr)        22                  13                0      exact
>
> __aeabi_lcmp                    20                  13                0      exact
> __aeabi_ulcmp                   16                  10                0      exact
>
> __udivsi3 (__aeabi_uidiv)       56                  72..385           0      < 1 lsb
> __divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3      8      < 1 lsb
> __udivdi3 (__aeabi_uldiv)       164                 103..1394         16     < 1 lsb
> __udivdi3 (OPTIMIZE_SIZE)       142                 120..1392         16     < 1 lsb
> __divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3      32     < 1 lsb
>
> __shared_float                  178
> __shared_float (OPTIMIZE_SIZE)  154
>
> __addsf3 (__aeabi_fadd)         116+__shared_float  31..76            8      <= 0.5 ulp
> __addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74                8      <= 0.5 ulp
> __subsf3 (__aeabi_fsub)         6+__addsf3          3+__addsf3        8      <= 0.5 ulp
> __aeabi_frsub                   8+__addsf3          6+__addsf3        8      <= 0.5 ulp
> __mulsf3 (__aeabi_fmul)         112+__shared_float  73..97            8      <= 0.5 ulp
> __mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93                8      <= 0.5 ulp
> __divsf3 (__aeabi_fdiv)         132+__shared_float  83..361           8      <= 0.5 ulp
> __divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263..359          8      <= 0.5 ulp
>
> __cmpsf2/__lesf2/__ltsf2        72                  33                0      exact
> __eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2        0      exact
> __gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2        0      exact
> __unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2        0      exact
> __aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2        0      exact
> __aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2        0      exact
> __aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2        0      exact
> __aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2        0      exact
> __aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2        0      exact
>
> __floatundisf (__aeabi_ul2f)    14+__shared_float   40..81            8      <= 0.5 ulp
> __floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40..237           8      <= 0.5 ulp
> __floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf   8      <= 0.5 ulp
> __floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf   8      <= 0.5 ulp
> __floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf     8      <= 0.5 ulp
>
> __fixsfdi (__aeabi_f2lz)        74                  27..33            0      exact
> __fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi       0      exact
> __fixsfsi (__aeabi_f2iz)        52                  19                0      exact
> __fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi       0      exact
> __fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi       0      exact
>
> __extendsfdf2 (__aeabi_f2d)     42+__shared_float   38                8      exact
> __truncsfdf2 (__aeabi_f2d)      88                  34                8      exact
> __aeabi_d2f                     56+__shared_float   54..58            8      <= 0.5 ulp
> __aeabi_h2f                     34+__shared_float   34                8      exact
> __aeabi_f2h                     84                  23..34            0      <= 0.5 ulp
>
> Copyright assignment is on file with the FSF.
>
> Thanks,
> Daniel Engel
>
>
> [1] // Test program for size comparison
>
>     extern int main (void)
>     {
>         volatile int x = 1;
>         volatile unsigned long long int y = 10;
>         volatile long long int z = x / y; // 64-bit division
>
>         volatile float a = x; // 32-bit casting
>         volatile float b = y; // 64-bit casting
>         volatile float c = z / b; // float division
>         volatile float d = a + c; // float addition
>         volatile float e = c * b; // float multiplication
>         volatile float f = d - e - c; // float subtraction
>
>         if (f != c) // float comparison
>             y -= (long long int)d; // float casting
>     }
>
> [2] http://danielengel.com/cm0_test_vectors.tgz
> [3] http://www.netlib.org/fp/ucbtest.tgz
> [4] http://www.jhauser.us/arithmetic/TestFloat.html
> [5] http://win-www.uia.ac.be/u/cant/ieeecc754.html
>
> ---
>
> Daniel Engel (34):
>   Add and restructure function declaration macros
>   Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
>   Fix syntax warnings on conditional instructions
>   Reorganize LIB1ASMFUNCS object wrapper macros
>   Add the __HAVE_FEATURE_IT and IT() macros
>   Refactor 'clz' functions into a new file
>   Refactor 'ctz' functions into a new file
>   Refactor 64-bit shift functions into a new file
>   Import 'clz' functions from the CM0 library
>   Import 'ctz' functions from the CM0 library
>   Import 64-bit shift functions from the CM0 library
>   Import 'clrsb' functions from the CM0 library
>   Import 'ffs' functions from the CM0 library
>   Import 'parity' functions from the CM0 library
>   Import 'popcnt' functions from the CM0 library
>   Refactor Thumb-1 64-bit comparison into a new file
>   Import 64-bit comparison from CM0 library
>   Merge Thumb-2 optimizations for 64-bit comparison
>   Import 32-bit division from the CM0 library
>   Refactor Thumb-1 64-bit division into a new file
>   Import 64-bit division from the CM0 library
>   Import integer multiplication from the CM0 library
>   Refactor Thumb-1 float comparison into a new file
>   Import float comparison from the CM0 library
>   Refactor Thumb-1 float subtraction into a new file
>   Import float addition and subtraction from the CM0 library
>   Import float multiplication from the CM0 library
>   Import float division from the CM0 library
>   Import integer-to-float conversion from the CM0 library
>   Import float-to-integer conversion from the CM0 library
>   Import float<->double conversion from the CM0 library
>   Import float<->__fp16 conversion from the CM0 library
>   Drop single-precision Thumb-1 soft-float functions
>   Add -mpure-code support to the CM0 functions.
>
>  libgcc/Makefile.in              |   5 +-
>  libgcc/config/arm/bpabi-lib.h   |  12 -
>  libgcc/config/arm/bpabi-v6m.S   | 206 -----------
>  libgcc/config/arm/bpabi.S       |  42 ---
>  libgcc/config/arm/bpabi.c       |  42 ---
>  libgcc/config/arm/clz2.S        | 371 ++++++++++++++++++++
>  libgcc/config/arm/ctz2.S        | 349 ++++++++++++++++++
>  libgcc/config/arm/eabi/fadd.S   | 324 +++++++++++++++++
>  libgcc/config/arm/eabi/fcast.S  | 533 ++++++++++++++++++++++++++++
>  libgcc/config/arm/eabi/fcmp.S   | 604 ++++++++++++++++++++++++++++++++
>  libgcc/config/arm/eabi/fdiv.S   | 261 ++++++++++++++
>  libgcc/config/arm/eabi/ffixed.S | 414 ++++++++++++++++++++++
>  libgcc/config/arm/eabi/ffloat.S | 247 +++++++++++++
>  libgcc/config/arm/eabi/fmul.S   | 215 ++++++++++++
>  libgcc/config/arm/eabi/fneg.S   |  76 ++++
>  libgcc/config/arm/eabi/fplib.h  |  80 +++++
>  libgcc/config/arm/eabi/futil.S  | 418 ++++++++++++++++++++++
>  libgcc/config/arm/eabi/idiv.S   | 299 ++++++++++++++++
>  libgcc/config/arm/eabi/lcmp.S   | 187 ++++++++++
>  libgcc/config/arm/eabi/ldiv.S   | 493 ++++++++++++++++++++++++++
>  libgcc/config/arm/eabi/lmul.S   | 218 ++++++++++++
>  libgcc/config/arm/eabi/lshift.S | 241 +++++++++++++
>  libgcc/config/arm/fp16.c        |   4 +
>  libgcc/config/arm/lib1funcs.S   | 549 ++++++++++-------------------
>  libgcc/config/arm/parity.S      | 120 +++++++
>  libgcc/config/arm/popcnt.S      | 212 +++++++++++
>  libgcc/config/arm/t-bpabi       |  10 +-
>  libgcc/config/arm/t-elf         | 138 +++++++-
>  libgcc/config/arm/t-softfp      |   2 +
>  29 files changed, 5997 insertions(+), 675 deletions(-)
>  delete mode 100644 libgcc/config/arm/bpabi.c
>  create mode 100644 libgcc/config/arm/clz2.S
>  create mode 100644 libgcc/config/arm/ctz2.S
>  create mode 100644 libgcc/config/arm/eabi/fadd.S
>  create mode 100644 libgcc/config/arm/eabi/fcast.S
>  create mode 100644 libgcc/config/arm/eabi/fcmp.S
>  create mode 100644 libgcc/config/arm/eabi/fdiv.S
>  create mode 100644 libgcc/config/arm/eabi/ffixed.S
>  create mode 100644 libgcc/config/arm/eabi/ffloat.S
>  create mode 100644 libgcc/config/arm/eabi/fmul.S
>  create mode 100644 libgcc/config/arm/eabi/fneg.S
>  create mode 100644 libgcc/config/arm/eabi/fplib.h
>  create mode 100644 libgcc/config/arm/eabi/futil.S
>  create mode 100644 libgcc/config/arm/eabi/idiv.S
>  create mode 100644 libgcc/config/arm/eabi/lcmp.S
>  create mode 100644 libgcc/config/arm/eabi/ldiv.S
>  create mode 100644 libgcc/config/arm/eabi/lmul.S
>  create mode 100644 libgcc/config/arm/eabi/lshift.S
>  create mode 100644 libgcc/config/arm/parity.S
>  create mode 100644 libgcc/config/arm/popcnt.S
>
> --
> 2.34.1
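
For readers who want a quick reminder of what the 32-bit bit-counting helpers listed in the table compute, here is a minimal host-side C sketch of their semantics. It is not taken from the patch series and says nothing about how the Thumb-1 assembly is written. The ref_* names are hypothetical, and returning 32 for a zero argument in ref_clz32() and ref_ctz32() is a choice made for this sketch only, since libgcc leaves that case undefined for __clzsi2 and __ctzsi2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference models, for illustration only.  They mirror the
   documented behaviour of the libgcc helpers, not the patch's assembly. */

/* Leading zero bits (cf. __clzsi2).  Zero input: sketch-only choice of 32. */
static int ref_clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Trailing zero bits (cf. __ctzsi2).  Zero input: sketch-only choice of 32. */
static int ref_ctz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* One plus the index of the least significant set bit, or 0 (cf. __ffssi2). */
static int ref_ffs32(uint32_t x)
{
    return x ? ref_ctz32(x) + 1 : 0;
}

/* Number of set bits (cf. __popcountsi2); clears the lowest set bit per pass. */
static int ref_popcount32(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Number of set bits modulo 2 (cf. __paritysi2). */
static int ref_parity32(uint32_t x)
{
    return ref_popcount32(x) & 1;
}

/* Leading redundant sign bits (cf. __clrsbsi2): fold negative values onto
   non-negative ones, then count leading zeros minus the sign bit itself. */
static int ref_clrsb32(int32_t v)
{
    uint32_t x = (uint32_t)v;
    if (v < 0)
        x = ~x;
    return ref_clz32(x) - 1;
}
```

Models like these could be compared against the library entry points over a set of test vectors, which is one possible shape for the missing test coverage mentioned in the cover letter.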