public inbox for gcc-patches@gcc.gnu.org
* [PATCH v6 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
@ 2021-12-27 19:04 Daniel Engel
From: Daniel Engel @ 2021-12-27 19:04 UTC
  To: Richard Earnshaw, gcc-patches; +Cc: Daniel Engel, Christophe Lyon

Hi Richard, 

I am re-submitting my libgcc patch from last year: 

    https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html 

I clearly missed the stage1 window again.  However, since the patch rebased 
cleanly onto gcc-12 with no regressions, and it's not quite stage4 yet, I 
figured submission is worth a chance. 

Regards,
Daniel

---

Changes since v5:

    * Rebased and tested with gcc-12

Regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}, clean master:

    # of expected passes            513596
    # of unexpected failures        38829
    # of unexpected successes       16
    # of expected failures          3450
    # of unresolved testcases       1108
    # of unsupported tests          28224

Patched master:

    # of expected passes            513596
    # of unexpected failures        38829
    # of unexpected successes       16
    # of expected failures          3450
    # of unresolved testcases       1108
    # of unsupported tests          28224

---

This patch series adds an assembly-language implementation of IEEE-754 compliant
single-precision functions designed for the Cortex M0 (v6m) architecture.  There
are improvements to most of the EABI integer functions as well.  This is the
libgcc component of a larger library project originally proposed in 2018:

    https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html

As one point of comparison, a test program [1] links 916 bytes from libgcc with
the patched toolchain vs. 10276 bytes with the gcc-arm-none-eabi-9-2020-q2
toolchain.  That's a size reduction of about 91%.

I have extensive test vectors [2], and this patch series passes all of them on an
STM32F051.  These vectors were derived from UCB [3], TestFloat [4], and
IEEECC754 [5], plus many that I generated myself.
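
For the basic operations, one host-side cross-check that does not depend on any
vector file is to compute the same operation in double precision and round the
result to single.  The sketch below assumes the host's float and double are
IEEE-754 binary32 and binary64 with round-to-nearest; the helper names are
hypothetical and are not part of the patch or of the vectors in [2].  For +, *,
and / on single-precision operands, rounding through double is expected to give
the correctly rounded single result, which is what a "<= 0.5 ulp" claim amounts
to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a float (memcpy avoids aliasing problems). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Return 1 if 'got' equals 'reference' rounded to single precision. */
static int matches_reference(float got, double reference)
{
    float want = (float)reference;
    if (want != want)               /* NaN expected: accept any NaN */
        return got != got;
    return float_bits(got) == float_bits(want);
}

/* Hypothetical per-operation checks; volatile forces a real float result. */
static int check_add(float a, float b)
{
    volatile float got = a + b;
    return matches_reference(got, (double)a + (double)b);
}

static int check_mul(float a, float b)
{
    volatile float got = a * b;
    return matches_reference(got, (double)a * (double)b);
}

static int check_div(float a, float b)
{
    volatile float got = a / b;
    return matches_reference(got, (double)a / (double)b);
}

/* Run a handful of hand-picked cases; returns the number of mismatches. */
int run_rounding_checks(void)
{
    static const float v[] = {
        0.0f, 1.0f, 3.0f, 0x1p-24f, 16777215.0f, 0x1p-126f, 0x1.fffffep127f
    };
    const size_t count = sizeof v / sizeof v[0];
    int failures = 0;

    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < count; j++) {
            failures += !check_add(v[i], v[j]);
            failures += !check_mul(v[i], v[j]);
            if (v[j] != 0.0f)       /* division by zero is left out here */
                failures += !check_div(v[i], v[j]);
        }
    }
    return failures;
}
```

On a soft-float target the float expressions become calls into libgcc, so the
same checks exercise the library routines instead of host hardware.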

There may be some follow-on projects worth discussing:

    * The library is currently integrated into the ARM v6s-m multilib only.  It
    is likely that some other architectures would benefit from these routines.
    However, I have NOT profiled the existing implementations (ieee754-sf.S) to
    estimate where improvements may be found.

    * GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
    There may be useful bits in [1] that can be integrated.
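
As a rough illustration of the kind of test the second item has in mind, the
sketch below checks 64-bit quotient and remainder against the identity
n == q * d + r.  On an ARM EABI target without hardware division, 64-bit '/'
and '%' are expected to be lowered to __aeabi_uldivmod() and
__aeabi_ldivmod(); on a host the same code simply exercises native division.
The function names and the choice of edge values are hypothetical, and this is
not taken from [1] or from the GCC testsuite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Check one unsigned pair; returns the number of failed properties. */
static int check_udivmod(uint64_t n, uint64_t d)
{
    assert(d != 0);
    volatile uint64_t vn = n, vd = d;   /* volatile defeats constant folding */
    uint64_t q = vn / vd;
    uint64_t r = vn % vd;
    int failures = 0;

    if (r >= d)
        failures++;                     /* remainder must be smaller than d */
    if (q * d + r != n)
        failures++;                     /* division identity */
    return failures;
}

/* Check one signed pair; C99 division truncates toward zero. */
static int check_sdivmod(int64_t n, int64_t d)
{
    assert(d != 0 && !(n == INT64_MIN && d == -1));
    volatile int64_t vn = n, vd = d;
    int64_t q = vn / vd;
    int64_t r = vn % vd;
    /* Magnitudes as unsigned values, so INT64_MIN does not overflow. */
    uint64_t abs_r = r < 0 ? 0 - (uint64_t)r : (uint64_t)r;
    uint64_t abs_d = d < 0 ? 0 - (uint64_t)d : (uint64_t)d;
    int failures = 0;

    if (abs_r >= abs_d)
        failures++;                     /* |r| < |d| */
    if (r != 0 && (r < 0) != (n < 0))
        failures++;                     /* remainder takes the sign of n */
    if (q * d + r != n)
        failures++;                     /* division identity */
    return failures;
}

/* Try every pair of edge values; returns the total number of failures. */
int run_divmod_checks(void)
{
    static const uint64_t u[] = {
        0, 1, 2, 3, 10, UINT32_MAX, UINT64_C(0x100000000),
        UINT64_C(0x100000001), INT64_MAX, UINT64_C(0x8000000000000000),
        UINT64_MAX
    };
    static const int64_t s[] = {
        0, 1, -1, 2, -2, 10, -10, INT32_MAX, INT32_MIN, INT64_MAX, INT64_MIN
    };
    int failures = 0;

    for (size_t i = 0; i < sizeof u / sizeof u[0]; i++)
        for (size_t j = 0; j < sizeof u / sizeof u[0]; j++)
            if (u[j] != 0)
                failures += check_udivmod(u[i], u[j]);

    for (size_t i = 0; i < sizeof s / sizeof s[0]; i++)
        for (size_t j = 0; j < sizeof s / sizeof s[0]; j++)
            if (s[j] != 0 && !(s[i] == INT64_MIN && s[j] == -1))
                failures += check_sdivmod(s[i], s[j]);

    return failures;
}
```

The identity check relies on 64-bit multiplication (__aeabi_lmul on this
target), so a failure could point at either routine.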

On Cortex M0, the library has (approximately) the following properties:

Function(s)                     Size (bytes)        Cycles              Stack   Accuracy
__clzsi2                        50                  20                  0       exact
__clzsi2 (OPTIMIZE_SIZE)        22                  51                  0       exact
__clzdi2                        8+__clzsi2          4+__clzsi2          0       exact

__clrsbsi2                      8+__clzsi2          6+__clzsi2          0       exact
__clrsbdi2                      18+__clzsi2         (8..10)+__clzsi2    0       exact

__ctzsi2                        52                  21                  0       exact
__ctzsi2 (OPTIMIZE_SIZE)        24                  52                  0       exact
__ctzdi2                        8+__ctzsi2          5+__ctzsi2          0       exact

__ffssi2                        8                   6..(5+__ctzsi2)     0       exact
__ffsdi2                        14+__ctzsi2         9..(8+__ctzsi2)     0       exact

__popcountsi2                   52                  25                  0       exact
__popcountsi2 (OPTIMIZE_SIZE)   14                  9..201              0       exact
__popcountdi2                   34+__popcountsi2    46                  0       exact
__popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2    17..401             0       exact

__paritysi2                     24                  14                  0       exact
__paritysi2 (OPTIMIZE_SIZE)     16                  38                  0       exact
__paritydi2                     2+__paritysi2       1+__paritysi2       0       exact

__umulsidi3                     44                  24                  0       exact
__mulsidi3                      30+__umulsidi3      24+__umulsidi3      8       exact
__muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3       0       exact
__ashldi3 (__aeabi_llsl)        22                  13                  0       exact
__lshrdi3 (__aeabi_llsr)        22                  13                  0       exact
__ashrdi3 (__aeabi_lasr)        22                  13                  0       exact

__aeabi_lcmp                    20                  13                  0       exact
__aeabi_ulcmp                   16                  10                  0       exact

__udivsi3 (__aeabi_uidiv)       56                  72..385             0       < 1 lsb
__divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3        8       < 1 lsb
__udivdi3 (__aeabi_uldiv)       164                 103..1394           16      < 1 lsb
__udivdi3 (OPTIMIZE_SIZE)       142                 120..1392           16      < 1 lsb
__divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3        32      < 1 lsb

__shared_float                  178
__shared_float (OPTIMIZE_SIZE)  154

__addsf3 (__aeabi_fadd)         116+__shared_float  31..76              8       <= 0.5 ulp
__addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74                  8       <= 0.5 ulp
__subsf3 (__aeabi_fsub)         6+__addsf3          3+__addsf3          8       <= 0.5 ulp
__aeabi_frsub                   8+__addsf3          6+__addsf3          8       <= 0.5 ulp
__mulsf3 (__aeabi_fmul)         112+__shared_float  73..97              8       <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93                  8       <= 0.5 ulp
__divsf3 (__aeabi_fdiv)         132+__shared_float  83..361             8       <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263..359            8       <= 0.5 ulp

__cmpsf2/__lesf2/__ltsf2        72                  33                  0       exact
__eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2          0       exact
__gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2          0       exact
__unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2          0       exact

__floatundisf (__aeabi_ul2f)    14+__shared_float   40..81              8       <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40..237             8       <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf     8       <= 0.5 ulp
__floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf     8       <= 0.5 ulp
__floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf       8       <= 0.5 ulp

__fixsfdi (__aeabi_f2lz)        74                  27..33              0       exact
__fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi         0       exact
__fixsfsi (__aeabi_f2iz)        52                  19                  0       exact
__fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi         0       exact
__fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi         0       exact

__extendsfdf2 (__aeabi_f2d)     42+__shared_float   38                  8       exact
__truncsfdf2 (__aeabi_f2d)      88                  34                  8       exact
__aeabi_d2f                     56+__shared_float   54..58              8       <= 0.5 ulp
__aeabi_h2f                     34+__shared_float   34                  8       exact
__aeabi_f2h                     84                  23..34              0       <= 0.5 ulp
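
Entries of the form "8+__clzsi2" give the cost of a wrapper on top of the
routine it builds on.  The C sketch below shows that layering for the 64-bit
leading-zero count.  It is a reference model only, not the assembly in the
patch, and the names clz32_ref and clz64_ref are hypothetical.  It returns 32
(or 64) for a zero argument, which is an assumption of the sketch: GCC documents
the result of __builtin_clz as undefined for zero.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros in 32 bits by binary search (reference model). */
static int clz32_ref(uint32_t x)
{
    int n = 0;

    if (x == 0)
        return 32;                      /* assumption: defined result for 0 */
    if ((x >> 16) == 0) { n += 16; x <<= 16; }
    if ((x >> 24) == 0) { n += 8;  x <<= 8;  }
    if ((x >> 28) == 0) { n += 4;  x <<= 4;  }
    if ((x >> 30) == 0) { n += 2;  x <<= 2;  }
    if ((x >> 31) == 0) { n += 1; }
    return n;
}

/* 64-bit wrapper: pick the word that holds the top set bit, then reuse
   the 32-bit routine.  Only this selection step is new code. */
static int clz64_ref(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);

    if (hi != 0)
        return clz32_ref(hi);
    return 32 + clz32_ref((uint32_t)x);
}

/* A few spot checks of both routines. */
void clz_self_test(void)
{
    assert(clz32_ref(1u) == 31);
    assert(clz32_ref(0x80000000u) == 0);
    assert(clz64_ref(1u) == 63);
    assert(clz64_ref(UINT64_C(1) << 32) == 31);
    assert(clz64_ref(0) == 64);
}
```

The wrapper adds a small fixed cost to whatever the 32-bit routine costs, which
is the relationship the "Size" and "Cycles" columns express.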

Copyright assignment is on file with the FSF.

Thanks,
Daniel Engel


[1] // Test program for size comparison

    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division

        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64 bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction

        if (f != c) // float comparison
            y -= (long long int)d; // float casting
    }

[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html

-- 
2.25.1



Thread overview: 35+ messages
2021-12-27 19:04 [PATCH v6 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0 Daniel Engel
2021-12-27 19:04 ` [PATCH v6 01/34] Add and restructure function declaration macros Daniel Engel
2021-12-27 19:04 ` [PATCH v6 02/34] Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY Daniel Engel
2021-12-27 19:04 ` [PATCH v6 03/34] Fix syntax warnings on conditional instructions Daniel Engel
2021-12-27 19:05 ` [PATCH v6 04/34] Reorganize LIB1ASMFUNCS object wrapper macros Daniel Engel
2021-12-27 19:05 ` [PATCH v6 05/34] Add the __HAVE_FEATURE_IT and IT() macros Daniel Engel
2021-12-27 19:05 ` [PATCH v6 06/34] Refactor 'clz' functions into a new file Daniel Engel
2021-12-27 19:05 ` [PATCH v6 07/34] Refactor 'ctz' " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 08/34] Refactor 64-bit shift " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 09/34] Import 'clz' functions from the CM0 library Daniel Engel
2021-12-27 19:05 ` [PATCH v6 10/34] Import 'ctz' " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 11/34] Import 64-bit shift " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 12/34] Import 'clrsb' " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 13/34] Import 'ffs' " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 14/34] Import 'parity' " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 15/34] Import 'popcnt' " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 16/34] Refactor Thumb-1 64-bit comparison into a new file Daniel Engel
2021-12-27 19:05 ` [PATCH v6 17/34] Import 64-bit comparison from CM0 library Daniel Engel
2021-12-27 19:05 ` [PATCH v6 18/34] Merge Thumb-2 optimizations for 64-bit comparison Daniel Engel
2021-12-27 19:05 ` [PATCH v6 19/34] Import 32-bit division from the CM0 library Daniel Engel
2021-12-27 19:05 ` [PATCH v6 20/34] Refactor Thumb-1 64-bit division into a new file Daniel Engel
2021-12-27 19:05 ` [PATCH v6 21/34] Import 64-bit division from the CM0 library Daniel Engel
2021-12-27 19:05 ` [PATCH v6 22/34] Import integer multiplication " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 23/34] Refactor Thumb-1 float comparison into a new file Daniel Engel
2021-12-27 19:05 ` [PATCH v6 24/34] Import float comparison from the CM0 library Daniel Engel
2021-12-27 19:05 ` [PATCH v6 25/34] Refactor Thumb-1 float subtraction into a new file Daniel Engel
2021-12-27 19:05 ` [PATCH v6 26/34] Import float addition and subtraction from the CM0 library Daniel Engel
2021-12-27 19:05 ` [PATCH v6 27/34] Import float multiplication " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 28/34] Import float division " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 29/34] Import integer-to-float conversion " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 30/34] Import float-to-integer " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 31/34] Import float<->double " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 32/34] Import float<->__fp16 " Daniel Engel
2021-12-27 19:05 ` [PATCH v6 33/34] Drop single-precision Thumb-1 soft-float functions Daniel Engel
2021-12-27 19:05 ` [PATCH v6 34/34] Add -mpure-code support to the CM0 functions Daniel Engel
