From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>,
gcc-patches@gcc.gnu.org
Cc: Daniel Engel <gnu@danielengel.com>,
Christophe Lyon <christophe.lyon@linaro.org>
Subject: [PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
Date: Mon, 31 Oct 2022 08:44:55 -0700
Message-ID: <20221031154529.3627576-1-gnu@danielengel.com>
Hi Richard,
I am re-submitting my libgcc patch from 2021:
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html
I believe I have finally made the stage1 window.
Regards,
Daniel
---
Changes since v6:
* Rebased and tested with gcc-13
There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.
Clean master:
# of expected passes 529397
# of unexpected failures 41160
# of unexpected successes 12
# of expected failures 3442
# of unresolved testcases 978
# of unsupported tests 28993
Patched master:
# of expected passes 529397
# of unexpected failures 41160
# of unexpected successes 12
# of expected failures 3442
# of unresolved testcases 978
# of unsupported tests 28993
---
This patch series adds an assembly-language implementation of IEEE-754 compliant
single-precision functions designed for the Cortex M0 (v6m) architecture. There
are improvements to most of the EABI integer functions as well. This is the
libgcc component of a larger library project originally proposed in 2018:
https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
As one point of comparison, a test program [1] links 916 bytes from libgcc with
the patched toolchain vs 10276 bytes with the gcc-arm-none-eabi-9-2020-q2
toolchain. That is a size reduction of roughly 91%.
I have extensive test vectors [2], and this patch series passes all tests on an
STM32F051.
These vectors were derived from UCB [3], Testfloat [4], and IEEECC754 [5], plus
many of my own generation.
There may be some follow-on projects worth discussing:
* The library is currently integrated into the ARM v6s-m multilib only. It
is likely that some other architectures would benefit from these routines.
However, I have NOT profiled the existing implementations (ieee754-sf.S) to
estimate where improvements may be found.
* GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
There may be useful bits in [1] that can be integrated.
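As a rough illustration of what such a test could check, here is a minimal
sketch in C. It is not taken from the patch series or from [1]; the helper
names check_uldivmod() and check_ldivmod() are hypothetical. It relies on the
assumption that, on ARM EABI targets, GCC typically lowers a paired 64-bit
'/' and '%' to __aeabi_uldivmod() or __aeabi_ldivmod(); on other hosts the
same source simply exercises the native division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: verify n == q*d + r and r < d for unsigned operands.
   The volatile copies keep the compiler from folding the division away. */
static int check_uldivmod(uint64_t n, uint64_t d)
{
    volatile uint64_t vn = n, vd = d;
    uint64_t q = vn / vd;
    uint64_t r = vn % vd;
    return r < d && q * d + r == n;
}

/* Hypothetical helper for the signed case. C division truncates toward
   zero, so a nonzero remainder carries the sign of the dividend.
   Callers must avoid d == 0 and INT64_MIN / -1 (both undefined in C). */
static int check_ldivmod(int64_t n, int64_t d)
{
    volatile int64_t vn = n, vd = d;
    int64_t q = vn / vd;
    int64_t r = vn % vd;
    /* Magnitudes computed in unsigned arithmetic to avoid overflow. */
    uint64_t ar = r < 0 ? 0 - (uint64_t)r : (uint64_t)r;
    uint64_t ad = d < 0 ? 0 - (uint64_t)d : (uint64_t)d;
    return q * d + r == n
        && (r == 0 || (r < 0) == (n < 0))
        && ar < ad;
}
```

A test driver would call these helpers over a table of operand pairs and
report any pair for which a helper returns 0.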
On Cortex M0, the library has (approximately) the following properties:
Function(s)                     Size (bytes)        Cycles            Stack  Accuracy
__clzsi2                        50                  20                0      exact
__clzsi2 (OPTIMIZE_SIZE)        22                  51                0      exact
__clzdi2                        8+__clzsi2          4+__clzsi2        0      exact
__clrsbsi2                      8+__clzsi2          6+__clzsi2        0      exact
__clrsbdi2                      18+__clzsi2         (8..10)+__clzsi2  0      exact
__ctzsi2                        52                  21                0      exact
__ctzsi2 (OPTIMIZE_SIZE)        24                  52                0      exact
__ctzdi2                        8+__ctzsi2          5+__ctzsi2        0      exact
__ffssi2                        8                   6..(5+__ctzsi2)   0      exact
__ffsdi2                        14+__ctzsi2         9..(8+__ctzsi2)   0      exact
__popcountsi2                   52                  25                0      exact
__popcountsi2 (OPTIMIZE_SIZE)   14                  9..201            0      exact
__popcountdi2                   34+__popcountsi2    46                0      exact
__popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2    17..401           0      exact
__paritysi2                     24                  14                0      exact
__paritysi2 (OPTIMIZE_SIZE)     16                  38                0      exact
__paritydi2                     2+__paritysi2       1+__paritysi2     0      exact
__umulsidi3                     44                  24                0      exact
__mulsidi3                      30+__umulsidi3      24+__umulsidi3    8      exact
__muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3     0      exact
__ashldi3 (__aeabi_llsl)        22                  13                0      exact
__lshrdi3 (__aeabi_llsr)        22                  13                0      exact
__ashrdi3 (__aeabi_lasr)        22                  13                0      exact
__aeabi_lcmp                    20                  13                0      exact
__aeabi_ulcmp                   16                  10                0      exact
__udivsi3 (__aeabi_uidiv)       56                  72..385           0      < 1 lsb
__divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3      8      < 1 lsb
__udivdi3 (__aeabi_uldiv)       164                 103..1394         16     < 1 lsb
__udivdi3 (OPTIMIZE_SIZE)       142                 120..1392         16     < 1 lsb
__divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3      32     < 1 lsb
__shared_float                  178
__shared_float (OPTIMIZE_SIZE)  154
__addsf3 (__aeabi_fadd)         116+__shared_float  31..76            8      <= 0.5 ulp
__addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74                8      <= 0.5 ulp
__subsf3 (__aeabi_fsub)         6+__addsf3          3+__addsf3        8      <= 0.5 ulp
__aeabi_frsub                   8+__addsf3          6+__addsf3        8      <= 0.5 ulp
__mulsf3 (__aeabi_fmul)         112+__shared_float  73..97            8      <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93                8      <= 0.5 ulp
__divsf3 (__aeabi_fdiv)         132+__shared_float  83..361           8      <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263..359          8      <= 0.5 ulp
__cmpsf2/__lesf2/__ltsf2        72                  33                0      exact
__eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2        0      exact
__gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2        0      exact
__unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2        0      exact
__aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2        0      exact
__aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2        0      exact
__aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2        0      exact
__aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2        0      exact
__aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2        0      exact
__floatundisf (__aeabi_ul2f)    14+__shared_float   40..81            8      <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40..237           8      <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf   8      <= 0.5 ulp
__floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf   8      <= 0.5 ulp
__floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf     8      <= 0.5 ulp
__fixsfdi (__aeabi_f2lz)        74                  27..33            0      exact
__fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi       0      exact
__fixsfsi (__aeabi_f2iz)        52                  19                0      exact
__fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi       0      exact
__fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi       0      exact
__extendsfdf2 (__aeabi_f2d)     42+__shared_float   38                8      exact
__truncsfdf2 (__aeabi_f2d)      88                  34                8      exact
__aeabi_d2f                     56+__shared_float   54..58            8      <= 0.5 ulp
__aeabi_h2f                     34+__shared_float   34                8      exact
__aeabi_f2h                     84                  23..34            0      <= 0.5 ulp
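Entries of the form "N+__clzsi2" mean that the function is a thin wrapper:
it adds N bytes (or cycles) on top of the routine it calls. As a hedged
illustration of that composition, the following C reference model builds a
64-bit count-leading-zeros from a 32-bit one. It is only a sketch, not the
assembly from this series; the names ref_clzsi2() and ref_clzdi2() are
hypothetical, and returning 32 for a zero input is an assumption made here
(GCC documents the result of __clzsi2(0) as undefined).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of a 32-bit count-leading-zeros.
   Assumption: returns 32 for x == 0, which GCC leaves undefined. */
static int ref_clzsi2(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {   /* shift until the top bit is set */
        x <<= 1;
        n++;
    }
    return n;
}

/* 64-bit wrapper: pick the word holding the most significant set bit,
   then defer to the 32-bit routine. This is the "small constant plus
   __clzsi2" shape that the table describes. */
static int ref_clzdi2(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    if (hi != 0)
        return ref_clzsi2(hi);
    return 32 + ref_clzsi2((uint32_t)x);
}
```

The same pattern explains rows such as __ctzdi2 and __paritydi2, where the
64-bit entry point costs only a few bytes beyond its 32-bit counterpart.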
Copyright assignment is on file with the FSF.
Thanks,
Daniel Engel
[1] // Test program for size comparison
    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division
        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64-bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction
        if (f != c) // float comparison
            y -= (long long int)d; // float casting
    }
[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html
---
Daniel Engel (34):
Add and restructure function declaration macros
Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
Fix syntax warnings on conditional instructions
Reorganize LIB1ASMFUNCS object wrapper macros
Add the __HAVE_FEATURE_IT and IT() macros
Refactor 'clz' functions into a new file
Refactor 'ctz' functions into a new file
Refactor 64-bit shift functions into a new file
Import 'clz' functions from the CM0 library
Import 'ctz' functions from the CM0 library
Import 64-bit shift functions from the CM0 library
Import 'clrsb' functions from the CM0 library
Import 'ffs' functions from the CM0 library
Import 'parity' functions from the CM0 library
Import 'popcnt' functions from the CM0 library
Refactor Thumb-1 64-bit comparison into a new file
Import 64-bit comparison from CM0 library
Merge Thumb-2 optimizations for 64-bit comparison
Import 32-bit division from the CM0 library
Refactor Thumb-1 64-bit division into a new file
Import 64-bit division from the CM0 library
Import integer multiplication from the CM0 library
Refactor Thumb-1 float comparison into a new file
Import float comparison from the CM0 library
Refactor Thumb-1 float subtraction into a new file
Import float addition and subtraction from the CM0 library
Import float multiplication from the CM0 library
Import float division from the CM0 library
Import integer-to-float conversion from the CM0 library
Import float-to-integer conversion from the CM0 library
Import float<->double conversion from the CM0 library
Import float<->__fp16 conversion from the CM0 library
Drop single-precision Thumb-1 soft-float functions
Add -mpure-code support to the CM0 functions.
libgcc/Makefile.in | 5 +-
libgcc/config/arm/bpabi-lib.h | 12 -
libgcc/config/arm/bpabi-v6m.S | 206 -----------
libgcc/config/arm/bpabi.S | 42 ---
libgcc/config/arm/bpabi.c | 42 ---
libgcc/config/arm/clz2.S | 371 ++++++++++++++++++++
libgcc/config/arm/ctz2.S | 349 ++++++++++++++++++
libgcc/config/arm/eabi/fadd.S | 324 +++++++++++++++++
libgcc/config/arm/eabi/fcast.S | 533 ++++++++++++++++++++++++++++
libgcc/config/arm/eabi/fcmp.S | 604 ++++++++++++++++++++++++++++++++
libgcc/config/arm/eabi/fdiv.S | 261 ++++++++++++++
libgcc/config/arm/eabi/ffixed.S | 414 ++++++++++++++++++++++
libgcc/config/arm/eabi/ffloat.S | 247 +++++++++++++
libgcc/config/arm/eabi/fmul.S | 215 ++++++++++++
libgcc/config/arm/eabi/fneg.S | 76 ++++
libgcc/config/arm/eabi/fplib.h | 80 +++++
libgcc/config/arm/eabi/futil.S | 418 ++++++++++++++++++++++
libgcc/config/arm/eabi/idiv.S | 299 ++++++++++++++++
libgcc/config/arm/eabi/lcmp.S | 187 ++++++++++
libgcc/config/arm/eabi/ldiv.S | 493 ++++++++++++++++++++++++++
libgcc/config/arm/eabi/lmul.S | 218 ++++++++++++
libgcc/config/arm/eabi/lshift.S | 241 +++++++++++++
libgcc/config/arm/fp16.c | 4 +
libgcc/config/arm/lib1funcs.S | 549 ++++++++++-------------------
libgcc/config/arm/parity.S | 120 +++++++
libgcc/config/arm/popcnt.S | 212 +++++++++++
libgcc/config/arm/t-bpabi | 10 +-
libgcc/config/arm/t-elf | 138 +++++++-
libgcc/config/arm/t-softfp | 2 +
29 files changed, 5997 insertions(+), 675 deletions(-)
delete mode 100644 libgcc/config/arm/bpabi.c
create mode 100644 libgcc/config/arm/clz2.S
create mode 100644 libgcc/config/arm/ctz2.S
create mode 100644 libgcc/config/arm/eabi/fadd.S
create mode 100644 libgcc/config/arm/eabi/fcast.S
create mode 100644 libgcc/config/arm/eabi/fcmp.S
create mode 100644 libgcc/config/arm/eabi/fdiv.S
create mode 100644 libgcc/config/arm/eabi/ffixed.S
create mode 100644 libgcc/config/arm/eabi/ffloat.S
create mode 100644 libgcc/config/arm/eabi/fmul.S
create mode 100644 libgcc/config/arm/eabi/fneg.S
create mode 100644 libgcc/config/arm/eabi/fplib.h
create mode 100644 libgcc/config/arm/eabi/futil.S
create mode 100644 libgcc/config/arm/eabi/idiv.S
create mode 100644 libgcc/config/arm/eabi/lcmp.S
create mode 100644 libgcc/config/arm/eabi/ldiv.S
create mode 100644 libgcc/config/arm/eabi/lmul.S
create mode 100644 libgcc/config/arm/eabi/lshift.S
create mode 100644 libgcc/config/arm/parity.S
create mode 100644 libgcc/config/arm/popcnt.S
--
2.34.1