[PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0

public inbox for gcc-patches@gcc.gnu.org

From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>,
	gcc-patches@gcc.gnu.org
Cc: Daniel Engel <gnu@danielengel.com>,
	Christophe Lyon <christophe.lyon@linaro.org>
Subject: [PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
Date: Mon, 31 Oct 2022 08:44:55 -0700
Message-ID: <20221031154529.3627576-1-gnu@danielengel.com>

Hi Richard,

I am re-submitting my libgcc patch from 2021:

    https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
    https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html

I believe I have finally made the stage1 window. 

Regards,
Daniel

---

Changes since v6:

    * Rebased and tested with gcc-13

There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.
Clean master:

    # of expected passes            529397
    # of unexpected failures        41160
    # of unexpected successes       12
    # of expected failures          3442
    # of unresolved testcases       978
    # of unsupported tests          28993

Patched master:

    # of expected passes            529397
    # of unexpected failures        41160
    # of unexpected successes       12
    # of expected failures          3442
    # of unresolved testcases       978
    # of unsupported tests          28993

---

This patch series adds an assembly-language implementation of IEEE-754-compliant
single-precision functions designed for the Cortex M0 (v6m) architecture.  It also
improves most of the EABI integer functions.  This is the libgcc component of a
larger library project originally proposed in 2018:

    https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html

As one point of comparison, a test program [1] links 916 bytes from libgcc with
the patched toolchain vs 10276 bytes with the gcc-arm-none-eabi-9-2020-q2
toolchain.  That's roughly a 91% size reduction.

I have extensive test vectors [2], and this patch passes all of them on an
STM32F051.  These vectors were derived from UCB [3], TestFloat [4], and
IEEECC754 [5], plus many of my own.
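The vectors themselves are not reproduced here.  As a rough illustration of how
bit-exact vectors can be checked, the C sketch below compares each result
against an expected bit pattern and reports the distance in ulps on a mismatch.
The vector layout (struct fadd_vector), the helper names, and the four sample
vectors are assumptions for illustration, not the format of [2]; host_fadd()
uses the host FPU as a stand-in for the routine under test.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret binary32 bits without strict-aliasing violations. */
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

static int is_nan_bits(uint32_t u) { return (u & 0x7FFFFFFFu) > 0x7F800000u; }

/* Map non-NaN bit patterns onto a monotonic integer line, so adjacent
   representable values differ by exactly 1; -0.0 and +0.0 both map to 0. */
static int64_t ordered(uint32_t u)
{
    assert(!is_nan_bits(u));  /* NaNs have no ulp distance */
    return (u & 0x80000000u) ? -(int64_t)(u & 0x7FFFFFFFu) : (int64_t)u;
}

static int64_t ulp_distance(uint32_t a, uint32_t b)
{
    int64_t d = ordered(a) - ordered(b);
    return d < 0 ? -d : d;
}

/* Hypothetical vector layout: operand A, operand B, expected A+B (all bits). */
struct fadd_vector { uint32_t a, b, expected; };

static const struct fadd_vector fadd_vectors[] = {
    { 0x3F800000u, 0x3F800000u, 0x40000000u },  /* 1.0 + 1.0 = 2.0 */
    { 0x3F800000u, 0x33800000u, 0x3F800000u },  /* 1.0 + 2^-24: tie, rounds to even */
    { 0x3F800000u, 0x34000000u, 0x3F800001u },  /* 1.0 + 2^-23 is exact */
    { 0x7F7FFFFFu, 0x7F7FFFFFu, 0x7F800000u },  /* FLT_MAX + FLT_MAX -> +inf */
};

/* Stand-in for the routine under test (host FPU, round-to-nearest-even). */
static float host_fadd(float a, float b) { return a + b; }

/* Returns the number of vectors whose result bits differ from the expected bits. */
static int check_fadd(float (*fadd)(float, float),
                      const struct fadd_vector *v, size_t count)
{
    int failures = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t got = float_to_bits(fadd(bits_to_float(v[i].a), bits_to_float(v[i].b)));
        uint32_t want = v[i].expected;
        if (is_nan_bits(got) && is_nan_bits(want))
            continue;  /* NaN payloads are not required to match */
        if (got != want) {
            if (is_nan_bits(got) || is_nan_bits(want))
                printf("vector %zu: got %08" PRIX32 ", expected %08" PRIX32 "\n",
                       i, got, want);
            else
                printf("vector %zu: got %08" PRIX32 ", expected %08" PRIX32 " (%lld ulp)\n",
                       i, got, want, (long long)ulp_distance(got, want));
            failures++;
        }
    }
    return failures;
}
```

On target, a wrapper around __aeabi_fadd() could be passed instead of
host_fadd().  Comparing raw bits rather than float values keeps the check
sensitive to the sign of zero and to rounding in the last place.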

There may be some follow-on projects worth discussing:

    * The library is currently integrated into the ARM v6s-m multilib only.  It
    is likely that some other architectures would benefit from these routines.
    However, I have NOT profiled the existing implementations (ieee754-sf.S) to
    estimate where improvements may be found.

    * GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
    There may be useful bits in [1] that can be integrated.
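As one possible starting point for such tests, a host-side reference model of
__aeabi_uldivmod() could produce expected values.  The run-time ABI returns the
quotient in {r0, r1} and the remainder in {r2, r3}; the sketch below models that
pair as a struct and uses restoring shift-subtract division.  The names
ref_uldivmod and uldivmod_result are hypothetical, and the loop is not meant to
reflect the algorithm in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient and remainder as one value, mirroring the {r0,r1}/{r2,r3} pair. */
typedef struct {
    uint64_t quot;
    uint64_t rem;
} uldivmod_result;

/* Restoring shift-subtract division, one quotient bit per iteration. */
static uldivmod_result ref_uldivmod(uint64_t n, uint64_t d)
{
    /* On ARM, division by zero is reported through __aeabi_ldiv0; this
       model simply treats it as a precondition violation. */
    assert(d != 0);

    uldivmod_result r = { 0, 0 };
    for (int bit = 63; bit >= 0; bit--) {
        /* Keep the bit shifted out: when d > 2^63 the partial remainder
           briefly needs 65 bits. */
        uint64_t carry = r.rem >> 63;
        r.rem = (r.rem << 1) | ((n >> bit) & 1u);
        if (carry || r.rem >= d) {
            r.rem -= d;  /* wraps back into range when carry is set */
            r.quot |= (uint64_t)1 << bit;
        }
    }
    return r;
}
```

Comparing the target's __aeabi_uldivmod() results against ref_uldivmod() over
random and edge-case operands (d > 2^63, d > n, n = UINT64_MAX) would exercise
the paths most likely to hide carry bugs.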

On Cortex M0, the library has (approximately) the following properties:

Function(s)                     Size (bytes)        Cycles              Stack   Accuracy
__clzsi2                        50                  20                  0       exact
__clzsi2 (OPTIMIZE_SIZE)        22                  51                  0       exact
__clzdi2                        8+__clzsi2          4+__clzsi2          0       exact

__clrsbsi2                      8+__clzsi2          6+__clzsi2          0       exact
__clrsbdi2                      18+__clzsi2         (8..10)+__clzsi2    0       exact

__ctzsi2                        52                  21                  0       exact
__ctzsi2 (OPTIMIZE_SIZE)        24                  52                  0       exact
__ctzdi2                        8+__ctzsi2          5+__ctzsi2          0       exact

__ffssi2                        8                   6..(5+__ctzsi2)     0       exact
__ffsdi2                        14+__ctzsi2         9..(8+__ctzsi2)     0       exact

__popcountsi2                   52                  25                  0       exact
__popcountsi2 (OPTIMIZE_SIZE)   14                  9..201              0       exact
__popcountdi2                   34+__popcountsi2    46                  0       exact
__popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2    17..401             0       exact

__paritysi2                     24                  14                  0       exact
__paritysi2 (OPTIMIZE_SIZE)     16                  38                  0       exact
__paritydi2                     2+__paritysi2       1+__paritysi2       0       exact

__umulsidi3                     44                  24                  0       exact
__mulsidi3                      30+__umulsidi3      24+__umulsidi3      8       exact
__muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3       0       exact
__ashldi3 (__aeabi_llsl)        22                  13                  0       exact
__lshrdi3 (__aeabi_llsr)        22                  13                  0       exact
__ashrdi3 (__aeabi_lasr)        22                  13                  0       exact

__aeabi_lcmp                    20                  13                  0       exact
__aeabi_ulcmp                   16                  10                  0       exact

__udivsi3 (__aeabi_uidiv)       56                  72..385             0       < 1 lsb
__divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3        8       < 1 lsb
__udivdi3 (__aeabi_uldiv)       164                 103..1394           16      < 1 lsb
__udivdi3 (OPTIMIZE_SIZE)       142                 120..1392           16      < 1 lsb
__divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3        32      < 1 lsb

__shared_float                  178
__shared_float (OPTIMIZE_SIZE)  154

__addsf3 (__aeabi_fadd)         116+__shared_float  31..76              8       <= 0.5 ulp
__addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74                  8       <= 0.5 ulp
__subsf3 (__aeabi_fsub)         6+__addsf3          3+__addsf3          8       <= 0.5 ulp
__aeabi_frsub                   8+__addsf3          6+__addsf3          8       <= 0.5 ulp
__mulsf3 (__aeabi_fmul)         112+__shared_float  73..97              8       <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93                  8       <= 0.5 ulp
__divsf3 (__aeabi_fdiv)         132+__shared_float  83..361             8       <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263..359            8       <= 0.5 ulp

__cmpsf2/__lesf2/__ltsf2        72                  33                  0       exact
__eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2          0       exact
__gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2          0       exact
__unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2          0       exact

__floatundisf (__aeabi_ul2f)    14+__shared_float   40..81              8       <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40..237             8       <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf     8       <= 0.5 ulp
__floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf     8       <= 0.5 ulp
__floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf       8       <= 0.5 ulp

__fixsfdi (__aeabi_f2lz)        74                  27..33              0       exact
__fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi         0       exact
__fixsfsi (__aeabi_f2iz)        52                  19                  0       exact
__fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi         0       exact
__fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi         0       exact

__extendsfdf2 (__aeabi_f2d)     42+__shared_float   38                  8       exact
__truncsfdf2 (__aeabi_f2d)      88                  34                  8       exact
__aeabi_d2f                     56+__shared_float   54..58              8       <= 0.5 ulp
__aeabi_h2f                     34+__shared_float   34                  8       exact
__aeabi_f2h                     84                  23..34              0       <= 0.5 ulp
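For reviewers who want an executable definition of the integer bit-manipulation
and comparison entries above, here is a portable C reference model of several
of them.  The ref_* names are hypothetical, the loops favor clarity over speed,
and the behavior at zero for ref_clz (returns 32) and ref_ctz (rejected) is a
choice made for this sketch, not a statement about the patched routines.  The
lcmp models return exactly -1, 0 or +1, assuming callers only rely on the sign.

```c
#include <assert.h>
#include <stdint.h>

/* Leading zeros; this sketch defines ref_clz(0) = 32. */
static int ref_clz(uint32_t x)
{
    int n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && !(x & mask); mask >>= 1)
        n++;
    return n;
}

/* Trailing zeros; zero is outside this model's contract. */
static int ref_ctz(uint32_t x)
{
    assert(x != 0);
    int n = 0;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

/* ffs: 1-based index of the least significant set bit, or 0 if none. */
static int ref_ffs(uint32_t x) { return x ? ref_ctz(x) + 1 : 0; }

/* clrsb: copies of the sign bit following bit 31. */
static int ref_clrsb(int32_t x)
{
    uint32_t u = (uint32_t)x;
    uint32_t s = u ^ (0u - (u >> 31));  /* invert all bits if negative */
    return ref_clz(s) - 1;
}

/* popcount: each iteration clears the lowest set bit. */
static int ref_popcount(uint32_t x)
{
    int n = 0;
    for (; x; x &= x - 1)
        n++;
    return n;
}

static int ref_parity(uint32_t x) { return ref_popcount(x) & 1; }

/* A 64-bit variant composed from the 32-bit model. */
static int ref_clzdi(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    return hi ? ref_clz(hi) : 32 + ref_clz((uint32_t)x);
}

/* Three-way comparisons in the style of __aeabi_lcmp / __aeabi_ulcmp. */
static int ref_lcmp(int64_t a, int64_t b) { return (a > b) - (a < b); }
static int ref_ulcmp(uint64_t a, uint64_t b) { return (a > b) - (a < b); }
```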

Copyright assignment is on file with the FSF.

Thanks,
Daniel Engel


[1] // Test program for size comparison

    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division

        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64-bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction

        if (f != c) // float comparison
            y -= (long long int)d; // float casting
    }

[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html

---

Daniel Engel (34):
  Add and restructure function declaration macros
  Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
  Fix syntax warnings on conditional instructions
  Reorganize LIB1ASMFUNCS object wrapper macros
  Add the __HAVE_FEATURE_IT and IT() macros
  Refactor 'clz' functions into a new file
  Refactor 'ctz' functions into a new file
  Refactor 64-bit shift functions into a new file
  Import 'clz' functions from the CM0 library
  Import 'ctz' functions from the CM0 library
  Import 64-bit shift functions from the CM0 library
  Import 'clrsb' functions from the CM0 library
  Import 'ffs' functions from the CM0 library
  Import 'parity' functions from the CM0 library
  Import 'popcnt' functions from the CM0 library
  Refactor Thumb-1 64-bit comparison into a new file
  Import 64-bit comparison from CM0 library
  Merge Thumb-2 optimizations for 64-bit comparison
  Import 32-bit division from the CM0 library
  Refactor Thumb-1 64-bit division into a new file
  Import 64-bit division from the CM0 library
  Import integer multiplication from the CM0 library
  Refactor Thumb-1 float comparison into a new file
  Import float comparison from the CM0 library
  Refactor Thumb-1 float subtraction into a new file
  Import float addition and subtraction from the CM0 library
  Import float multiplication from the CM0 library
  Import float division from the CM0 library
  Import integer-to-float conversion from the CM0 library
  Import float-to-integer conversion from the CM0 library
  Import float<->double conversion from the CM0 library
  Import float<->__fp16 conversion from the CM0 library
  Drop single-precision Thumb-1 soft-float functions
  Add -mpure-code support to the CM0 functions.

 libgcc/Makefile.in              |   5 +-
 libgcc/config/arm/bpabi-lib.h   |  12 -
 libgcc/config/arm/bpabi-v6m.S   | 206 -----------
 libgcc/config/arm/bpabi.S       |  42 ---
 libgcc/config/arm/bpabi.c       |  42 ---
 libgcc/config/arm/clz2.S        | 371 ++++++++++++++++++++
 libgcc/config/arm/ctz2.S        | 349 ++++++++++++++++++
 libgcc/config/arm/eabi/fadd.S   | 324 +++++++++++++++++
 libgcc/config/arm/eabi/fcast.S  | 533 ++++++++++++++++++++++++++++
 libgcc/config/arm/eabi/fcmp.S   | 604 ++++++++++++++++++++++++++++++++
 libgcc/config/arm/eabi/fdiv.S   | 261 ++++++++++++++
 libgcc/config/arm/eabi/ffixed.S | 414 ++++++++++++++++++++++
 libgcc/config/arm/eabi/ffloat.S | 247 +++++++++++++
 libgcc/config/arm/eabi/fmul.S   | 215 ++++++++++++
 libgcc/config/arm/eabi/fneg.S   |  76 ++++
 libgcc/config/arm/eabi/fplib.h  |  80 +++++
 libgcc/config/arm/eabi/futil.S  | 418 ++++++++++++++++++++++
 libgcc/config/arm/eabi/idiv.S   | 299 ++++++++++++++++
 libgcc/config/arm/eabi/lcmp.S   | 187 ++++++++++
 libgcc/config/arm/eabi/ldiv.S   | 493 ++++++++++++++++++++++++++
 libgcc/config/arm/eabi/lmul.S   | 218 ++++++++++++
 libgcc/config/arm/eabi/lshift.S | 241 +++++++++++++
 libgcc/config/arm/fp16.c        |   4 +
 libgcc/config/arm/lib1funcs.S   | 549 ++++++++++-------------------
 libgcc/config/arm/parity.S      | 120 +++++++
 libgcc/config/arm/popcnt.S      | 212 +++++++++++
 libgcc/config/arm/t-bpabi       |  10 +-
 libgcc/config/arm/t-elf         | 138 +++++++-
 libgcc/config/arm/t-softfp      |   2 +
 29 files changed, 5997 insertions(+), 675 deletions(-)
 delete mode 100644 libgcc/config/arm/bpabi.c
 create mode 100644 libgcc/config/arm/clz2.S
 create mode 100644 libgcc/config/arm/ctz2.S
 create mode 100644 libgcc/config/arm/eabi/fadd.S
 create mode 100644 libgcc/config/arm/eabi/fcast.S
 create mode 100644 libgcc/config/arm/eabi/fcmp.S
 create mode 100644 libgcc/config/arm/eabi/fdiv.S
 create mode 100644 libgcc/config/arm/eabi/ffixed.S
 create mode 100644 libgcc/config/arm/eabi/ffloat.S
 create mode 100644 libgcc/config/arm/eabi/fmul.S
 create mode 100644 libgcc/config/arm/eabi/fneg.S
 create mode 100644 libgcc/config/arm/eabi/fplib.h
 create mode 100644 libgcc/config/arm/eabi/futil.S
 create mode 100644 libgcc/config/arm/eabi/idiv.S
 create mode 100644 libgcc/config/arm/eabi/lcmp.S
 create mode 100644 libgcc/config/arm/eabi/ldiv.S
 create mode 100644 libgcc/config/arm/eabi/lmul.S
 create mode 100644 libgcc/config/arm/eabi/lshift.S
 create mode 100644 libgcc/config/arm/parity.S
 create mode 100644 libgcc/config/arm/popcnt.S

-- 
2.34.1
