From: "Daniel Engel" <gnu@danielengel.com>
To: gcc-patches@gcc.gnu.org
Cc: christophe.lyon@arm.com, Richard.Earnshaw@foss.arm.com
Subject: [PING] Re: [PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
Date: Tue, 15 Nov 2022 07:27:45 -0800
Message-ID: <6d704904-06bb-4c02-ae30-fcbc11b8d003@app.fastmail.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
Hello,
Is there still any interest in merging this patch?
Thanks,
Daniel
On Mon, Oct 31, 2022, at 8:44 AM, Daniel Engel wrote:
> Hi Richard,
>
> I am re-submitting my libgcc patch from 2021:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html
>
> I believe I have finally made the stage1 window.
>
> Regards,
> Daniel
>
> ---
>
> Changes since v6:
>
> * Rebased and tested with gcc-13
>
> There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.
> Clean master:
>
> # of expected passes 529397
> # of unexpected failures 41160
> # of unexpected successes 12
> # of expected failures 3442
> # of unresolved testcases 978
> # of unsupported tests 28993
>
> Patched master:
>
> # of expected passes 529397
> # of unexpected failures 41160
> # of unexpected successes 12
> # of expected failures 3442
> # of unresolved testcases 978
> # of unsupported tests 28993
>
> ---
>
> This patch series adds an assembly-language implementation of IEEE-754
> compliant single-precision functions designed for the Cortex M0 (v6m)
> architecture. It also improves most of the EABI integer functions. This is
> the libgcc component of a larger library project originally proposed in 2018:
>
> https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
>
> As one point of comparison, a test program [1] links 916 bytes from libgcc with
> the patched toolchain vs. 10276 bytes with the gcc-arm-none-eabi-9-2020-q2
> toolchain. That's roughly a 91% size reduction.
>
> I have extensive test vectors [2], and this patch passes all tests on an
> STM32F051. These vectors were derived from UCB [3], TestFloat [4], and
> IEEECC754 [5], plus many of my own generation.
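The accuracy figures in this letter are stated in ulps (units in the last place). As a rough illustration of how a harness might compare a soft-float result against the expected value from a test vector, here is a small C sketch. `float_to_ordered` and `ulp_distance` are hypothetical helpers, not part of the patch or of the test vectors in [2].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto a monotonic integer line, so that
   adjacent representable floats differ by exactly 1 and +0 == -0. */
static int64_t float_to_ordered(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);           /* well-defined type pun */
    if (bits & 0x80000000u)                   /* negative: mirror around 0 */
        return -(int64_t)(bits & 0x7FFFFFFFu);
    return (int64_t)bits;
}

/* Hypothetical helper: distance in ulps between two non-NaN floats.
   0 means the results agree (signed zeros are treated as equal). */
static uint32_t ulp_distance(float a, float b)
{
    assert(a == a && b == b);                 /* NaN has no ulp distance */
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return (uint32_t)(d < 0 ? -d : d);
}
```

When a test vector supplies the correctly rounded (round-to-nearest-even) result, a routine that meets "<= 0.5 ulp" should produce a distance of 0 on every such vector.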
>
> There may be some follow-on projects worth discussing:
>
> * The library is currently integrated into the ARM v6s-m multilib only. It
> is likely that some other architectures would benefit from these routines.
> However, I have NOT profiled the existing implementations (ieee754-sf.S) to
> estimate where improvements may be found.
>
> * GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
> There may be useful bits in [1] that can be integrated.
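One way to approach that missing coverage is a self-checking C test. The sketch below is an assumption about how such a test might look, not an existing GCC testcase: on ARM EABI targets GCC lowers 64-bit integer `/` and `%` to library calls such as `__aeabi_uldivmod` and `__aeabi_ldivmod`, so checking the identity q*d + r == n with a bounded remainder exercises those routines indirectly. The helper names `check_udiv64` and `check_sdiv64` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned case: the quotient/remainder identity plus r < d. */
static int check_udiv64(uint64_t n, uint64_t d)
{
    assert(d != 0);
    uint64_t q = n / d;
    uint64_t r = n % d;
    return r < d && q * d + r == n;
}

/* Signed case: C99 division truncates toward zero, so a nonzero
   remainder takes the sign of the dividend. */
static int check_sdiv64(int64_t n, int64_t d)
{
    assert(d != 0 && !(n == INT64_MIN && d == -1));  /* avoid UB */
    int64_t q = n / d;
    int64_t r = n % d;
    int same_sign = (r == 0) || ((r < 0) == (n < 0));
    uint64_t abs_r = r < 0 ? -(uint64_t)r : (uint64_t)r;
    uint64_t abs_d = d < 0 ? -(uint64_t)d : (uint64_t)d;
    /* |q * d| <= |n|, so this product cannot overflow */
    return same_sign && abs_r < abs_d && q * d + r == n;
}
```

A real test would drive these checks with many operand pairs, including edge cases such as INT64_MIN, UINT64_MAX, and divisors just above 2^32.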
>
> On Cortex M0, the library has (approximately) the following properties:
>
> Function(s)                    Size (bytes)        Cycles            Stack  Accuracy
> __clzsi2                       50                  20                0      exact
> __clzsi2 (OPTIMIZE_SIZE)       22                  51                0      exact
> __clzdi2                       8+__clzsi2          4+__clzsi2        0      exact
>
> __clrsbsi2                     8+__clzsi2          6+__clzsi2        0      exact
> __clrsbdi2                     18+__clzsi2         (8..10)+__clzsi2  0      exact
>
> __ctzsi2                       52                  21                0      exact
> __ctzsi2 (OPTIMIZE_SIZE)       24                  52                0      exact
> __ctzdi2                       8+__ctzsi2          5+__ctzsi2        0      exact
>
> __ffssi2                       8                   6..(5+__ctzsi2)   0      exact
> __ffsdi2                       14+__ctzsi2         9..(8+__ctzsi2)   0      exact
>
> __popcountsi2                  52                  25                0      exact
> __popcountsi2 (OPTIMIZE_SIZE)  14                  9..201            0      exact
> __popcountdi2                  34+__popcountsi2    46                0      exact
> __popcountdi2 (OPTIMIZE_SIZE)  12+__popcountsi2    17..401           0      exact
>
> __paritysi2                    24                  14                0      exact
> __paritysi2 (OPTIMIZE_SIZE)    16                  38                0      exact
> __paritydi2                    2+__paritysi2       1+__paritysi2     0      exact
>
> __umulsidi3                    44                  24                0      exact
> __mulsidi3                     30+__umulsidi3      24+__umulsidi3    8      exact
> __muldi3 (__aeabi_lmul)        10+__umulsidi3      6+__umulsidi3     0      exact
> __ashldi3 (__aeabi_llsl)       22                  13                0      exact
> __lshrdi3 (__aeabi_llsr)       22                  13                0      exact
> __ashrdi3 (__aeabi_lasr)       22                  13                0      exact
>
> __aeabi_lcmp                   20                  13                0      exact
> __aeabi_ulcmp                  16                  10                0      exact
>
> __udivsi3 (__aeabi_uidiv)      56                  72..385           0      < 1 lsb
> __divsi3 (__aeabi_idiv)        38+__udivsi3        26+__udivsi3      8      < 1 lsb
> __udivdi3 (__aeabi_uldiv)      164                 103..1394         16     < 1 lsb
> __udivdi3 (OPTIMIZE_SIZE)      142                 120..1392         16     < 1 lsb
> __divdi3 (__aeabi_ldiv)        54+__udivdi3        36+__udivdi3      32     < 1 lsb
>
> __shared_float                 178
> __shared_float (OPTIMIZE_SIZE) 154
>
> __addsf3 (__aeabi_fadd)        116+__shared_float  31..76            8      <= 0.5 ulp
> __addsf3 (OPTIMIZE_SIZE)       112+__shared_float  74                8      <= 0.5 ulp
> __subsf3 (__aeabi_fsub)        6+__addsf3          3+__addsf3        8      <= 0.5 ulp
> __aeabi_frsub                  8+__addsf3          6+__addsf3        8      <= 0.5 ulp
> __mulsf3 (__aeabi_fmul)        112+__shared_float  73..97            8      <= 0.5 ulp
> __mulsf3 (OPTIMIZE_SIZE)       96+__shared_float   93                8      <= 0.5 ulp
> __divsf3 (__aeabi_fdiv)        132+__shared_float  83..361           8      <= 0.5 ulp
> __divsf3 (OPTIMIZE_SIZE)       120+__shared_float  263..359          8      <= 0.5 ulp
>
> __cmpsf2/__lesf2/__ltsf2       72                  33                0      exact
> __eqsf2/__nesf2                4+__cmpsf2          3+__cmpsf2        0      exact
> __gesf2/__gtsf2                4+__cmpsf2          3+__cmpsf2        0      exact
> __unordsf2 (__aeabi_fcmpun)    4+__cmpsf2          3+__cmpsf2        0      exact
> __aeabi_fcmpeq                 4+__cmpsf2          3+__cmpsf2        0      exact
> __aeabi_fcmpne                 4+__cmpsf2          3+__cmpsf2        0      exact
> __aeabi_fcmplt                 4+__cmpsf2          3+__cmpsf2        0      exact
> __aeabi_fcmple                 4+__cmpsf2          3+__cmpsf2        0      exact
> __aeabi_fcmpge                 4+__cmpsf2          3+__cmpsf2        0      exact
>
> __floatundisf (__aeabi_ul2f)   14+__shared_float   40..81            8      <= 0.5 ulp
> __floatundisf (OPTIMIZE_SIZE)  14+__shared_float   40..237           8      <= 0.5 ulp
> __floatunsisf (__aeabi_ui2f)   0+__floatundisf     1+__floatundisf   8      <= 0.5 ulp
> __floatdisf (__aeabi_l2f)      14+__floatundisf    7+__floatundisf   8      <= 0.5 ulp
> __floatsisf (__aeabi_i2f)      0+__floatdisf       1+__floatdisf     8      <= 0.5 ulp
>
> __fixsfdi (__aeabi_f2lz)       74                  27..33            0      exact
> __fixunssfdi (__aeabi_f2ulz)   4+__fixsfdi         3+__fixsfdi       0      exact
> __fixsfsi (__aeabi_f2iz)       52                  19                0      exact
> __fixsfsi (OPTIMIZE_SIZE)      4+__fixsfdi         3+__fixsfdi       0      exact
> __fixunssfsi (__aeabi_f2uiz)   4+__fixsfsi         3+__fixsfsi       0      exact
>
> __extendsfdf2 (__aeabi_f2d)    42+__shared_float   38                8      exact
> __truncsfdf2 (__aeabi_f2d)     88                  34                8      exact
> __aeabi_d2f                    56+__shared_float   54..58            8      <= 0.5 ulp
> __aeabi_h2f                    34+__shared_float   34                8      exact
> __aeabi_f2h                    84                  23..34            0      <= 0.5 ulp
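For cross-checking the integer bit-manipulation entries in the table above, the following portable C reference models follow the semantics that the GCC internals manual documents for the corresponding libgcc routines. They are a sketch with hypothetical `ref_*` names, not a transcription of the CM0 assembly. libgcc leaves `__clzsi2` and `__ctzsi2` undefined for a zero argument; these models simply choose to return 32. The popcount loop, which iterates once per set bit, only illustrates how a small routine can end up with a data-dependent cycle count like the 9..201 listed for `__popcountsi2 (OPTIMIZE_SIZE)`; the patch's actual code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros with a 5-step binary search.  libgcc leaves
   __clzsi2(0) undefined; this model chooses to return 32. */
static int ref_clz(uint32_t x)
{
    if (x == 0)
        return 32;
    int n = 0;
    if (x <= 0x0000FFFFu) { n += 16; x <<= 16; }
    if (x <= 0x00FFFFFFu) { n += 8;  x <<= 8;  }
    if (x <= 0x0FFFFFFFu) { n += 4;  x <<= 4;  }
    if (x <= 0x3FFFFFFFu) { n += 2;  x <<= 2;  }
    if (x <= 0x7FFFFFFFu) { n += 1; }
    return n;
}

/* Count trailing zeros: isolate the lowest set bit, then locate it.
   Like ref_clz, this model returns 32 for a zero argument. */
static int ref_ctz(uint32_t x)
{
    if (x == 0)
        return 32;
    return 31 - ref_clz(x & (0u - x));
}

/* __ffssi2 semantics: 1-based index of the lowest set bit, 0 if none. */
static int ref_ffs(uint32_t x)
{
    return x == 0 ? 0 : ref_ctz(x) + 1;
}

/* __clrsbsi2 semantics: number of bits after the sign bit that equal it.
   Inverting negative values turns redundant sign bits into leading zeros. */
static int ref_clrsb(int32_t v)
{
    uint32_t x = (uint32_t)v;
    if (x & 0x80000000u)
        x = ~x;
    return ref_clz(x) - 1;
}

/* __paritysi2 semantics: 1 if an odd number of bits are set. */
static int ref_parity(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (int)(x & 1u);
}

/* __popcountsi2 semantics.  This loop clears one set bit per iteration,
   so its running time depends on the input value. */
static int ref_popcount(uint32_t x)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1u;
        n++;
    }
    return n;
}
```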
>
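ARMv6-M provides neither a 32x32->64 multiply (UMULL) nor a divide instruction, so routines such as `__umulsidi3` and `__udivsi3` have to be composed from narrower operations. The C sketch below shows two textbook techniques for this, under hypothetical `sketch_*` names. It is not derived from the patch's assembly, whose actual algorithms (and, for example, its data-dependent 72..385 cycle range for `__udivsi3`) may differ considerably; this simple division model always runs 32 steps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: 32x32->64 unsigned multiply built from 16-bit
   halves, because ARMv6-M's MULS keeps only the low 32 bits. */
static uint64_t sketch_umulsidi3(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint32_t ll = al * bl;                 /* every 16x16 product fits */
    uint32_t lh = al * bh;                 /* in 32 bits               */
    uint32_t hl = ah * bl;
    uint32_t hh = ah * bh;

    uint32_t mid = lh + hl;                /* may wrap past 2^32 ...   */
    uint32_t mid_carry = mid < lh;         /* ... worth 2^48 overall   */

    uint32_t lo = ll + (mid << 16);
    uint32_t lo_carry = lo < ll;           /* carry into the high word */
    uint32_t hi = hh + (mid >> 16) + (mid_carry << 16) + lo_carry;

    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical sketch: restoring shift-subtract division, producing one
   quotient bit per step. */
static uint32_t sketch_udivsi3(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);                        /* divide-by-zero not modeled */
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        /* r < 2^(31-i) here, so the shift cannot overflow */
        r = (r << 1) | ((n >> i) & 1u);    /* bring down the next bit */
        if (r >= d) {                      /* trial subtraction fits */
            r -= d;
            q |= 1u << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```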
> Copyright assignment is on file with the FSF.
>
> Thanks,
> Daniel Engel
>
>
> [1] // Test program for size comparison
>
> extern int main (void)
> {
>     volatile int x = 1;
>     volatile unsigned long long int y = 10;
>     volatile long long int z = x / y; // 64-bit division
>
>     volatile float a = x; // 32-bit casting
>     volatile float b = y; // 64-bit casting
>     volatile float c = z / b; // float division
>     volatile float d = a + c; // float addition
>     volatile float e = c * b; // float multiplication
>     volatile float f = d - e - c; // float subtraction
>
>     if (f != c) // float comparison
>         y -= (long long int)d; // float casting
> }
>
> [2] http://danielengel.com/cm0_test_vectors.tgz
> [3] http://www.netlib.org/fp/ucbtest.tgz
> [4] http://www.jhauser.us/arithmetic/TestFloat.html
> [5] http://win-www.uia.ac.be/u/cant/ieeecc754.html
>
> ---
>
> Daniel Engel (34):
> Add and restructure function declaration macros
> Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
> Fix syntax warnings on conditional instructions
> Reorganize LIB1ASMFUNCS object wrapper macros
> Add the __HAVE_FEATURE_IT and IT() macros
> Refactor 'clz' functions into a new file
> Refactor 'ctz' functions into a new file
> Refactor 64-bit shift functions into a new file
> Import 'clz' functions from the CM0 library
> Import 'ctz' functions from the CM0 library
> Import 64-bit shift functions from the CM0 library
> Import 'clrsb' functions from the CM0 library
> Import 'ffs' functions from the CM0 library
> Import 'parity' functions from the CM0 library
> Import 'popcnt' functions from the CM0 library
> Refactor Thumb-1 64-bit comparison into a new file
> Import 64-bit comparison from CM0 library
> Merge Thumb-2 optimizations for 64-bit comparison
> Import 32-bit division from the CM0 library
> Refactor Thumb-1 64-bit division into a new file
> Import 64-bit division from the CM0 library
> Import integer multiplication from the CM0 library
> Refactor Thumb-1 float comparison into a new file
> Import float comparison from the CM0 library
> Refactor Thumb-1 float subtraction into a new file
> Import float addition and subtraction from the CM0 library
> Import float multiplication from the CM0 library
> Import float division from the CM0 library
> Import integer-to-float conversion from the CM0 library
> Import float-to-integer conversion from the CM0 library
> Import float<->double conversion from the CM0 library
> Import float<->__fp16 conversion from the CM0 library
> Drop single-precision Thumb-1 soft-float functions
> Add -mpure-code support to the CM0 functions.
>
> libgcc/Makefile.in | 5 +-
> libgcc/config/arm/bpabi-lib.h | 12 -
> libgcc/config/arm/bpabi-v6m.S | 206 -----------
> libgcc/config/arm/bpabi.S | 42 ---
> libgcc/config/arm/bpabi.c | 42 ---
> libgcc/config/arm/clz2.S | 371 ++++++++++++++++++++
> libgcc/config/arm/ctz2.S | 349 ++++++++++++++++++
> libgcc/config/arm/eabi/fadd.S | 324 +++++++++++++++++
> libgcc/config/arm/eabi/fcast.S | 533 ++++++++++++++++++++++++++++
> libgcc/config/arm/eabi/fcmp.S | 604 ++++++++++++++++++++++++++++++++
> libgcc/config/arm/eabi/fdiv.S | 261 ++++++++++++++
> libgcc/config/arm/eabi/ffixed.S | 414 ++++++++++++++++++++++
> libgcc/config/arm/eabi/ffloat.S | 247 +++++++++++++
> libgcc/config/arm/eabi/fmul.S | 215 ++++++++++++
> libgcc/config/arm/eabi/fneg.S | 76 ++++
> libgcc/config/arm/eabi/fplib.h | 80 +++++
> libgcc/config/arm/eabi/futil.S | 418 ++++++++++++++++++++++
> libgcc/config/arm/eabi/idiv.S | 299 ++++++++++++++++
> libgcc/config/arm/eabi/lcmp.S | 187 ++++++++++
> libgcc/config/arm/eabi/ldiv.S | 493 ++++++++++++++++++++++++++
> libgcc/config/arm/eabi/lmul.S | 218 ++++++++++++
> libgcc/config/arm/eabi/lshift.S | 241 +++++++++++++
> libgcc/config/arm/fp16.c | 4 +
> libgcc/config/arm/lib1funcs.S | 549 ++++++++++-------------------
> libgcc/config/arm/parity.S | 120 +++++++
> libgcc/config/arm/popcnt.S | 212 +++++++++++
> libgcc/config/arm/t-bpabi | 10 +-
> libgcc/config/arm/t-elf | 138 +++++++-
> libgcc/config/arm/t-softfp | 2 +
> 29 files changed, 5997 insertions(+), 675 deletions(-)
> delete mode 100644 libgcc/config/arm/bpabi.c
> create mode 100644 libgcc/config/arm/clz2.S
> create mode 100644 libgcc/config/arm/ctz2.S
> create mode 100644 libgcc/config/arm/eabi/fadd.S
> create mode 100644 libgcc/config/arm/eabi/fcast.S
> create mode 100644 libgcc/config/arm/eabi/fcmp.S
> create mode 100644 libgcc/config/arm/eabi/fdiv.S
> create mode 100644 libgcc/config/arm/eabi/ffixed.S
> create mode 100644 libgcc/config/arm/eabi/ffloat.S
> create mode 100644 libgcc/config/arm/eabi/fmul.S
> create mode 100644 libgcc/config/arm/eabi/fneg.S
> create mode 100644 libgcc/config/arm/eabi/fplib.h
> create mode 100644 libgcc/config/arm/eabi/futil.S
> create mode 100644 libgcc/config/arm/eabi/idiv.S
> create mode 100644 libgcc/config/arm/eabi/lcmp.S
> create mode 100644 libgcc/config/arm/eabi/ldiv.S
> create mode 100644 libgcc/config/arm/eabi/lmul.S
> create mode 100644 libgcc/config/arm/eabi/lshift.S
> create mode 100644 libgcc/config/arm/parity.S
> create mode 100644 libgcc/config/arm/popcnt.S
>
> --
> 2.34.1
Thread overview: 36+ messages
2022-10-31 15:44 Daniel Engel
2022-10-31 15:44 ` [PATCH v7 01/34] Add and restructure function declaration macros Daniel Engel
2022-10-31 15:44 ` [PATCH v7 02/34] Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY Daniel Engel
2022-10-31 15:44 ` [PATCH v7 03/34] Fix syntax warnings on conditional instructions Daniel Engel
2022-10-31 15:44 ` [PATCH v7 04/34] Reorganize LIB1ASMFUNCS object wrapper macros Daniel Engel
2022-10-31 15:45 ` [PATCH v7 05/34] Add the __HAVE_FEATURE_IT and IT() macros Daniel Engel
2022-10-31 15:45 ` [PATCH v7 06/34] Refactor 'clz' functions into a new file Daniel Engel
2022-10-31 15:45 ` [PATCH v7 07/34] Refactor 'ctz' " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 08/34] Refactor 64-bit shift " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 09/34] Import 'clz' functions from the CM0 library Daniel Engel
2022-10-31 15:45 ` [PATCH v7 10/34] Import 'ctz' " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 11/34] Import 64-bit shift " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 12/34] Import 'clrsb' " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 13/34] Import 'ffs' " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 14/34] Import 'parity' " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 15/34] Import 'popcnt' " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 16/34] Refactor Thumb-1 64-bit comparison into a new file Daniel Engel
2022-10-31 15:45 ` [PATCH v7 17/34] Import 64-bit comparison from CM0 library Daniel Engel
2022-10-31 15:45 ` [PATCH v7 18/34] Merge Thumb-2 optimizations for 64-bit comparison Daniel Engel
2022-10-31 15:45 ` [PATCH v7 19/34] Import 32-bit division from the CM0 library Daniel Engel
2022-10-31 15:45 ` [PATCH v7 20/34] Refactor Thumb-1 64-bit division into a new file Daniel Engel
2022-10-31 15:45 ` [PATCH v7 21/34] Import 64-bit division from the CM0 library Daniel Engel
2022-10-31 15:45 ` [PATCH v7 22/34] Import integer multiplication " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 23/34] Refactor Thumb-1 float comparison into a new file Daniel Engel
2022-10-31 15:45 ` [PATCH v7 24/34] Import float comparison from the CM0 library Daniel Engel
2022-10-31 15:45 ` [PATCH v7 25/34] Refactor Thumb-1 float subtraction into a new file Daniel Engel
2022-10-31 15:45 ` [PATCH v7 26/34] Import float addition and subtraction from the CM0 library Daniel Engel
2022-10-31 15:45 ` [PATCH v7 27/34] Import float multiplication " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 28/34] Import float division " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 29/34] Import integer-to-float conversion " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 30/34] Import float-to-integer " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 31/34] Import float<->double " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 32/34] Import float<->__fp16 " Daniel Engel
2022-10-31 15:45 ` [PATCH v7 33/34] Drop single-precision Thumb-1 soft-float functions Daniel Engel
2022-10-31 15:45 ` [PATCH v7 34/34] Add -mpure-code support to the CM0 functions Daniel Engel
2022-11-15 15:27 ` Daniel Engel [this message]