public inbox for gcc-patches@gcc.gnu.org
* [PATCH] libgcc: Thumb-1 Floating-Point Library for Cortex M0
@ 2020-11-12 23:04 Daniel Engel
  2020-11-26  9:14 ` Christophe Lyon
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Engel @ 2020-11-12 23:04 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7165 bytes --]

Hi, 

This patch adds an efficient assembly-language implementation of IEEE-754 compliant floating point routines for Cortex M0 EABI (v6m, thumb-1).  This is the libgcc portion of a larger library originally described in 2018:

    https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html

Since that time, I've separated the libm functions for submission to newlib.  The remaining libgcc functions in the attached patch have the following characteristics:

    Function(s)                     Size (bytes)        Cycles          Stack   Accuracy
    __clzsi2                        42                  23              0       exact
    __clzsi2 (OPTIMIZE_SIZE)        22                  55              0       exact
    __clzdi2                        8+__clzsi2          4+__clzsi2      0       exact
   
    __umulsidi3                     44                  24              0       exact
    __mulsidi3                      30+__umulsidi3      24+__umulsidi3  8       exact
    __muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3   0       exact
    __ashldi3 (__aeabi_llsl)        22                  13              0       exact
    __lshrdi3 (__aeabi_llsr)        22                  13              0       exact
    __ashrdi3 (__aeabi_lasr)        22                  13              0       exact
   
    __aeabi_lcmp                    20                  13              0       exact
    __aeabi_ulcmp                   16                  10              0       exact
   
    __udivsi3 (__aeabi_uidiv)       56                  72 – 385        0       < 1 lsb
    __divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3    8       < 1 lsb
    __udivdi3 (__aeabi_uldiv)       164                 103 – 1394      16      < 1 lsb
    __udivdi3 (OPTIMIZE_SIZE)       142                 120 – 1392      16      < 1 lsb
    __divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3    32      < 1 lsb
   
    __shared_float                  178        
    __shared_float (OPTIMIZE_SIZE)  154   
        
    __addsf3 (__aeabi_fadd)         116+__shared_float  31 – 76         8       <= 0.5 ulp
    __addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74              8       <= 0.5 ulp
    __subsf3 (__aeabi_fsub)         8+__addsf3          6+__addsf3      8       <= 0.5 ulp
    __aeabi_frsub                   8+__addsf3          6+__addsf3      8       <= 0.5 ulp
    __mulsf3 (__aeabi_fmul)         112+__shared_float  73 – 97         8       <= 0.5 ulp
    __mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93              8       <= 0.5 ulp
    __divsf3 (__aeabi_fdiv)         132+__shared_float  83 – 361        8       <= 0.5 ulp
    __divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263 – 359       8       <= 0.5 ulp
   
    __cmpsf2/__lesf2/__ltsf2        72                  33              0       exact
    __eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2      0       exact
    __gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2      0       exact
    __unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2      0       exact
    __aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2      0       exact
    __aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2      0       exact
    __aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2      0       exact
    __aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2      0       exact
    __aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2      0       exact
   
    __floatundisf (__aeabi_ul2f)    14+__shared_float   40 – 81         8       <= 0.5 ulp
    __floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40 – 237        8       <= 0.5 ulp
    __floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf 8       <= 0.5 ulp
    __floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf 8       <= 0.5 ulp
    __floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf   8       <= 0.5 ulp
   
    __fixsfdi (__aeabi_f2lz)        74                  27 – 33         0       exact
    __fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi     0       exact
    __fixsfsi (__aeabi_f2iz)        52                  19              0       exact
    __fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi     0       exact
    __fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi     0       exact
     
    __extendsfdf2 (__aeabi_f2d)     42+__shared_float   38              8       exact
    __aeabi_d2f                     56+__shared_float   54 – 58         8       <= 0.5 ulp
    __aeabi_h2f                     34+__shared_float   34              8       exact
    __aeabi_f2h                     84                  23 – 34         0       <= 0.5 ulp
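
For context, the integer and clz helpers in the first part of the table are reached through ordinary C code rather than explicit calls.  A rough sketch of expressions that should resolve to them on armv6-m is below; whether GCC inlines an operation or emits the libcall can vary with optimization level, so treat the mapping as approximate (the function name is mine, for illustration only):

    void integer_helpers_sketch (void)
    {
        volatile unsigned long long a = 0x123456789ULL;
        volatile unsigned long long b = 10;
        volatile int n = __builtin_clzll (a);       // __clzdi2
        volatile long long p = (long long)a * b;    // __muldi3 (__aeabi_lmul)
        volatile unsigned long long s = a << n;     // __ashldi3 (__aeabi_llsl)
        volatile unsigned long long q = a / b;      // __aeabi_uldivmod -> __udivdi3
    }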

Copyright assignment is on file with the FSF.  

I've built the gcc-arm-none-eabi cross-compiler using the 20201108 snapshot of GCC plus this patch, and successfully compiled a test program:

    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division
      
        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64-bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction
      
        if (f != c) // float comparison
            y -= (long long int)d; // float casting
    }

As one point of comparison, the test program links to 876 bytes of libgcc code from the patched toolchain, vs. 10276 bytes from the latest released gcc-arm-none-eabi-9-2020-q2 toolchain.  That's better than a 90% size reduction.

I have extensive test vectors, and the library passes them on an STM32F051.  The vectors were derived from UCB [1], TestFloat [2], and IEEECC754 [3] sources, plus some of my own creation.  Unfortunately, I'm not sure how "make check" should work for a cross-compiler runtime library.
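
To make that concrete: conceptually, each vector is a set of input bit patterns plus the expected result pattern, and a check is a bit-exact comparison on the target.  A minimal sketch of such a check (the struct layout, names, and harness below are illustrative only, not my actual vector format):

    #include <stdint.h>
    #include <string.h>

    struct fadd_vector { uint32_t a, b, expect; };

    /* Returns 1 if the soft-float addition matches the vector exactly. */
    static int check_fadd (const struct fadd_vector *v)
    {
        float a, b, r;
        uint32_t bits;
        memcpy (&a, &v->a, sizeof a);    /* reinterpret the raw bit patterns */
        memcpy (&b, &v->b, sizeof b);
        r = a + b;                       /* lowers to __aeabi_fadd on v6-m soft-float */
        memcpy (&bits, &r, sizeof bits);
        return bits == v->expect;        /* bit-exact, covering NAN/INF/zero cases */
    }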

Although I believe this patch can be incorporated as-is, there are at least two points that might bear discussion: 

* I'm not sure where or how they would be integrated, but I would be happy to provide sources for my test vectors.  

* The library is currently built for the ARM v6m architecture only.  It is likely that some of the other Cortex variants would benefit from these routines.  However, I would need some guidance on this to proceed without introducing regressions.  I do not currently have a test strategy for architectures beyond Cortex M0, and I have NOT profiled the existing thumb-2 implementations (ieee754-sf.S) for comparison.

I'm naturally hoping for some action on this patch before the Nov 16th deadline for GCC-11 stage 3.  Please review and advise.  

Thanks,
Daniel Engel

[1] http://www.netlib.org/fp/ucbtest.tgz
[2] http://www.jhauser.us/arithmetic/TestFloat.html
[3] http://win-www.uia.ac.be/u/cant/ieeecc754.html

[-- Attachment #2: cortex-m0-fplib-20201112.patch --]
[-- Type: application/octet-stream, Size: 133513 bytes --]

diff -ruN libgcc/config/arm/bpabi-v6m.S libgcc/config/arm/bpabi-v6m.S
--- libgcc/config/arm/bpabi-v6m.S	2020-11-08 14:32:11.000000000 -0800
+++ libgcc/config/arm/bpabi-v6m.S	2020-11-12 09:06:46.383424089 -0800
@@ -33,212 +33,6 @@
 	.eabi_attribute 25, 1
 #endif /* __ARM_EABI__ */
 
-#ifdef L_aeabi_lcmp
-
-FUNC_START aeabi_lcmp
-	cmp	xxh, yyh
-	beq	1f
-	bgt	2f
-	movs	r0, #1
-	negs	r0, r0
-	RET
-2:
-	movs	r0, #1
-	RET
-1:
-	subs	r0, xxl, yyl
-	beq	1f
-	bhi	2f
-	movs	r0, #1
-	negs	r0, r0
-	RET
-2:
-	movs	r0, #1
-1:
-	RET
-	FUNC_END aeabi_lcmp
-
-#endif /* L_aeabi_lcmp */
-	
-#ifdef L_aeabi_ulcmp
-
-FUNC_START aeabi_ulcmp
-	cmp	xxh, yyh
-	bne	1f
-	subs	r0, xxl, yyl
-	beq	2f
-1:
-	bcs	1f
-	movs	r0, #1
-	negs	r0, r0
-	RET
-1:
-	movs	r0, #1
-2:
-	RET
-	FUNC_END aeabi_ulcmp
-
-#endif /* L_aeabi_ulcmp */
-
-.macro test_div_by_zero signed
-	cmp	yyh, #0
-	bne	7f
-	cmp	yyl, #0
-	bne	7f
-	cmp	xxh, #0
-	.ifc	\signed, unsigned
-	bne	2f
-	cmp	xxl, #0
-2:
-	beq	3f
-	movs	xxh, #0
-	mvns	xxh, xxh		@ 0xffffffff
-	movs	xxl, xxh
-3:
-	.else
-	blt	6f
-	bgt	4f
-	cmp	xxl, #0
-	beq	5f
-4:	movs	xxl, #0
-	mvns	xxl, xxl		@ 0xffffffff
-	lsrs	xxh, xxl, #1		@ 0x7fffffff
-	b	5f
-6:	movs	xxh, #0x80
-	lsls	xxh, xxh, #24		@ 0x80000000
-	movs	xxl, #0
-5:
-	.endif
-	@ tailcalls are tricky on v6-m.
-	push	{r0, r1, r2}
-	ldr	r0, 1f
-	adr	r1, 1f
-	adds	r0, r1
-	str	r0, [sp, #8]
-	@ We know we are not on armv4t, so pop pc is safe.
-	pop	{r0, r1, pc}
-	.align	2
-1:
-	.word	__aeabi_ldiv0 - 1b
-7:
-.endm
-
-#ifdef L_aeabi_ldivmod
-
-FUNC_START aeabi_ldivmod
-	test_div_by_zero signed
-
-	push	{r0, r1}
-	mov	r0, sp
-	push	{r0, lr}
-	ldr	r0, [sp, #8]
-	bl	SYM(__gnu_ldivmod_helper)
-	ldr	r3, [sp, #4]
-	mov	lr, r3
-	add	sp, sp, #8
-	pop	{r2, r3}
-	RET
-	FUNC_END aeabi_ldivmod
-
-#endif /* L_aeabi_ldivmod */
-
-#ifdef L_aeabi_uldivmod
-
-FUNC_START aeabi_uldivmod
-	test_div_by_zero unsigned
-
-	push	{r0, r1}
-	mov	r0, sp
-	push	{r0, lr}
-	ldr	r0, [sp, #8]
-	bl	SYM(__udivmoddi4)
-	ldr	r3, [sp, #4]
-	mov	lr, r3
-	add	sp, sp, #8
-	pop	{r2, r3}
-	RET
-	FUNC_END aeabi_uldivmod
-	
-#endif /* L_aeabi_uldivmod */
-
-#ifdef L_arm_addsubsf3
-
-FUNC_START aeabi_frsub
-
-      push	{r4, lr}
-      movs	r4, #1
-      lsls	r4, #31
-      eors	r0, r0, r4
-      bl	__aeabi_fadd
-      pop	{r4, pc}
-
-      FUNC_END aeabi_frsub
-
-#endif /* L_arm_addsubsf3 */
-
-#ifdef L_arm_cmpsf2
-
-FUNC_START aeabi_cfrcmple
-
-	mov	ip, r0
-	movs	r0, r1
-	mov	r1, ip
-	b	6f
-
-FUNC_START aeabi_cfcmpeq
-FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq
-
-	@ The status-returning routines are required to preserve all
-	@ registers except ip, lr, and cpsr.
-6:	push	{r0, r1, r2, r3, r4, lr}
-	bl	__lesf2
-	@ Set the Z flag correctly, and the C flag unconditionally.
-	cmp	r0, #0
-	@ Clear the C flag if the return value was -1, indicating
-	@ that the first operand was smaller than the second.
-	bmi	1f
-	movs	r1, #0
-	cmn	r0, r1
-1:
-	pop	{r0, r1, r2, r3, r4, pc}
-
-	FUNC_END aeabi_cfcmple
-	FUNC_END aeabi_cfcmpeq
-	FUNC_END aeabi_cfrcmple
-
-FUNC_START	aeabi_fcmpeq
-
-	push	{r4, lr}
-	bl	__eqsf2
-	negs	r0, r0
-	adds	r0, r0, #1
-	pop	{r4, pc}
-
-	FUNC_END aeabi_fcmpeq
-
-.macro COMPARISON cond, helper, mode=sf2
-FUNC_START	aeabi_fcmp\cond
-
-	push	{r4, lr}
-	bl	__\helper\mode
-	cmp	r0, #0
-	b\cond	1f
-	movs	r0, #0
-	pop	{r4, pc}
-1:
-	movs	r0, #1
-	pop	{r4, pc}
-
-	FUNC_END aeabi_fcmp\cond
-.endm
-
-COMPARISON lt, le
-COMPARISON le, le
-COMPARISON gt, ge
-COMPARISON ge, ge
-
-#endif /* L_arm_cmpsf2 */
-
 #ifdef L_arm_addsubdf3
 
 FUNC_START aeabi_drsub
diff -ruN libgcc/config/arm/cm0/clz2.S libgcc/config/arm/cm0/clz2.S
--- libgcc/config/arm/cm0/clz2.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/clz2.S	2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,122 @@
+/* clz2.S: Cortex M0 optimized 'clz' functions 
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// int __clzdi2(long long)
+// Counts leading zeros in a 64 bit double word.
+// Expects the argument  in $r1:$r0.
+// Returns the result in $r0.
+// Uses $r2 and $r3 as scratch space.
+.section .text.libgcc.clz2,"x"
+CM0_FUNC_START clzdi2
+    CFI_START_FUNCTION
+
+        // Assume all the bits in the argument are zero.
+        movs    r2,     #64
+
+        // If the upper word is ZERO, calculate 32 + __clzsi2(lower).
+        cmp     r1,     #0
+        beq     LSYM(__clz16)
+
+        // The upper word is non-zero, so calculate __clzsi2(upper).
+        movs    r0,     r1
+
+        // Fall through.
+
+
+// int __clzsi2(int)
+// Counts leading zeros in a 32 bit word.
+// Expects the argument in $r0.
+// Returns the result in $r0.
+// Uses $r2 and $r3 as scratch space.
+CM0_FUNC_START clzsi2
+        // Assume all the bits in the argument are zero
+        movs    r2,     #32
+
+    LSYM(__clz16):
+        // Size optimized: 22 bytes, 51 clocks
+        // Speed optimized: 42 bytes, 23 clocks
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Binary search starts at half the word width.
+        movs    r3,     #16
+
+    LSYM(__clz_loop):
+        // Test the upper 'n' bits of the operand for ZERO.
+        movs    r1,     r0
+        lsrs    r1,     r3
+        beq     LSYM(__clz_skip)
+
+        // When the test fails, discard the lower bits of the register,
+        //  and deduct the count of discarded bits from the result.
+        movs    r0,     r1
+        subs    r2,     r3
+
+    LSYM(__clz_skip):
+        // Decrease the shift distance for the next test.
+        lsrs    r3,     #1
+        bne     LSYM(__clz_loop)
+  #else
+        // Unrolled binary search.
+        lsrs    r1,     r0,     #16
+        beq     LSYM(__clz8)
+        movs    r0,     r1
+        subs    r2,     #16
+
+    LSYM(__clz8):
+        lsrs    r1,     r0,     #8
+        beq     LSYM(__clz4)
+        movs    r0,     r1
+        subs    r2,     #8
+
+    LSYM(__clz4):
+        lsrs    r1,     r0,     #4
+        beq     LSYM(__clz2)
+        movs    r0,     r1
+        subs    r2,     #4
+
+    LSYM(__clz2):
+        lsrs    r1,     r0,     #2
+        beq     LSYM(__clz1)
+        movs    r0,     r1
+        subs    r2,     #2
+
+    LSYM(__clz1):
+        // Convert remainder {0,1,2,3} to {0,1,2,2} (no 'ldr' cache hit).
+        lsrs    r1,     r0,     #1
+        bics    r0,     r1
+  #endif
+
+        // Account for the remainder.
+        subs    r0,     r2,     r0
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END clzsi2
+CM0_FUNC_END clzdi2
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/div.S libgcc/config/arm/cm0/div.S
--- libgcc/config/arm/cm0/div.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/div.S	2020-11-10 21:33:20.981886999 -0800
@@ -0,0 +1,180 @@
+/* div.S: Cortex M0 optimized 32-bit integer division
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// int __aeabi_idiv0(int)
+// Helper function for division by 0.
+.section .text.libgcc.idiv0,"x"
+CM0_FUNC_START aeabi_idiv0
+    CFI_START_FUNCTION
+
+      #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+        svc     #(SVC_DIVISION_BY_ZERO)
+      #endif
+
+        // Return {0, numerator}.
+        movs    r1,     r0
+        eors    r0,     r0
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_idiv0
+
+
+// int __aeabi_idiv(int, int)
+// idiv_return __aeabi_idivmod(int, int)
+// Returns signed $r0 after division by $r1.
+// Also returns the signed remainder in $r1.
+.section .text.libgcc.idiv,"x"
+CM0_FUNC_START aeabi_idivmod
+FUNC_ALIAS aeabi_idiv aeabi_idivmod
+FUNC_ALIAS divsi3 aeabi_idivmod
+    CFI_START_FUNCTION
+
+        // Extend the sign of the denominator.
+        asrs    r3,     r1,     #31
+
+        // Absolute value of the denominator, abort on division by zero.
+        eors    r1,     r3
+        subs    r1,     r3
+        beq     SYM(__aeabi_idiv0)
+
+        // Absolute value of the numerator.
+        asrs    r2,     r0,     #31
+        eors    r0,     r2
+        subs    r0,     r2
+
+        // Keep the sign of the numerator in bit[31] (for the remainder).
+        // Save the XOR of the signs in bits[15:0] (for the quotient).
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        lsrs    rT,     r3,     #16
+        eors    rT,     r2
+
+        // Handle division as unsigned.
+        bl      LSYM(__internal_uidivmod)
+
+        // Set the sign of the remainder.
+        asrs    r2,     rT,     #31
+        eors    r1,     r2
+        subs    r1,     r2
+
+        // Set the sign of the quotient.
+        sxth    r3,     rT
+        eors    r0,     r3
+        subs    r0,     r3
+
+    LSYM(__idivmod_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END divsi3
+CM0_FUNC_END aeabi_idiv
+CM0_FUNC_END aeabi_idivmod
+
+
+// int __aeabi_uidiv(unsigned int, unsigned int)
+// idiv_return __aeabi_uidivmod(unsigned int, unsigned int)
+// Returns unsigned $r0 after division by $r1.
+// Also returns the remainder in $r1.
+.section .text.libgcc.uidiv,"x"
+CM0_FUNC_START aeabi_uidivmod
+FUNC_ALIAS aeabi_uidiv aeabi_uidivmod
+FUNC_ALIAS udivsi3 aeabi_uidivmod
+    CFI_START_FUNCTION
+
+        // Abort on division by zero.
+        tst     r1,     r1
+        beq     SYM(__aeabi_idiv0)
+
+  #if defined(OPTIMIZE_SPEED) && OPTIMIZE_SPEED
+        // MAYBE: Optimize division by a power of 2
+  #endif
+
+    LSYM(__internal_uidivmod):
+        // Pre division: Shift the denominator as far as possible left
+        //  without making it larger than the numerator.
+        // The loop is destructive, save a copy of the numerator.
+        mov     ip,     r0
+
+        // Set up binary search.
+        movs    r3,     #16
+        movs    r2,     #1
+
+    LSYM(__uidivmod_align):
+        // Prefer dividing the numerator to multiplying the denominator
+        //  (multiplying the denominator may result in overflow).
+        lsrs    r0,     r3
+        cmp     r0,     r1
+        blo     LSYM(__uidivmod_skip)
+
+        // Multiply the denominator and the result together.
+        lsls    r1,     r3
+        lsls    r2,     r3
+
+    LSYM(__uidivmod_skip):
+        // Restore the numerator, and iterate until search goes to 0.
+        mov     r0,     ip
+        lsrs    r3,     #1
+        bne     LSYM(__uidivmod_align)
+
+        // The result $r3 has been conveniently initialized to 0.
+        b       LSYM(__uidivmod_entry)
+
+    LSYM(__uidivmod_loop):
+        // Scale the denominator and the quotient together.
+        lsrs    r1,     #1
+        lsrs    r2,     #1
+        beq     LSYM(__uidivmod_return)
+
+    LSYM(__uidivmod_entry):
+        // Test if the denominator is smaller than the numerator.
+        cmp     r0,     r1
+        blo     LSYM(__uidivmod_loop)
+
+        // If the denominator is smaller, the next bit of the result is '1'.
+        // If the new remainder goes to 0, exit early.
+        adds    r3,     r2
+        subs    r0,     r1
+        bne     LSYM(__uidivmod_loop)
+
+    LSYM(__uidivmod_return):
+        mov     r1,     r0
+        mov     r0,     r3
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END udivsi3
+CM0_FUNC_END aeabi_uidiv
+CM0_FUNC_END aeabi_uidivmod
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fadd.S libgcc/config/arm/cm0/fadd.S
--- libgcc/config/arm/cm0/fadd.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fadd.S	2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,301 @@
+/* fadd.S: Cortex M0 optimized 32-bit float addition
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// float __aeabi_frsub(float, float)
+// Returns the floating point difference of $r1 - $r0 in $r0.
+.section .text.libgcc.frsub,"x"
+CM0_FUNC_START aeabi_frsub
+    CFI_START_FUNCTION
+
+      #if defined(STRICT_NANS) && STRICT_NANS
+        // Check if $r0 is NAN before modifying.
+        lsls    r2,     r0,     #1
+        movs    r3,     #255
+        lsls    r3,     #24
+
+        // Let fadd() find the NAN in the normal course of operation,
+        //  moving it to $r0 and checking the quiet/signaling bit.
+        cmp     r2,     r3
+        bhi     LSYM(__internal_fadd)
+      #endif
+
+        // Flip sign and run through fadd().
+        movs    r2,     #1
+        lsls    r2,     #31
+        adds    r0,     r2
+        b       LSYM(__internal_fadd)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_frsub
+
+
+// float __aeabi_fsub(float, float)
+// Returns the floating point difference of $r0 - $r1 in $r0.
+.section .text.libgcc.fsub,"x"
+CM0_FUNC_START aeabi_fsub
+FUNC_ALIAS subsf3 aeabi_fsub
+    CFI_START_FUNCTION
+
+      #if defined(STRICT_NANS) && STRICT_NANS
+        // Check if $r1 is NAN before modifying.
+        lsls    r2,     r1,     #1
+        movs    r3,     #255
+        lsls    r3,     #24
+
+        // Let fadd() find the NAN in the normal course of operation,
+        //  moving it to $r0 and checking the quiet/signaling bit.
+        cmp     r2,     r3
+        bhi     LSYM(__internal_fadd)
+      #endif
+
+        // Flip sign and run through fadd().
+        movs    r2,     #1
+        lsls    r2,     #31
+        adds    r1,     r2
+        b       LSYM(__internal_fadd)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END subsf3
+CM0_FUNC_END aeabi_fsub
+
+
+// float __aeabi_fadd(float, float)
+// Returns the floating point sum of $r0 + $r1 in $r0.
+.section .text.libgcc.fadd,"x"
+CM0_FUNC_START aeabi_fadd
+FUNC_ALIAS addsf3 aeabi_fadd
+    CFI_START_FUNCTION
+
+    LSYM(__internal_fadd):
+        // Standard registers, compatible with exception handling.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Drop the sign bit to compare absolute value.
+        lsls    r2,     r0,     #1
+        lsls    r3,     r1,     #1
+
+        // Save the logical difference of original values.
+        // This actually makes the following swap slightly faster.
+        eors    r1,     r0
+
+        // Compare exponents+mantissa.
+        // MAYBE: Speedup for equal values?  This would have to separately
+        //  check for NAN/INF and then either:
+        // * Increase the exponent by '1' (for multiply by 2), or
+        // * Return +0
+        cmp     r2,     r3
+        bhs     LSYM(__fadd_ordered)
+
+        // Reorder operands so the larger absolute value is in r2,
+        //  the corresponding original operand is in $r0,
+        //  and the smaller absolute value is in $r3.
+        movs    r3,     r2
+        eors    r0,     r1
+        lsls    r2,     r0,     #1
+
+    LSYM(__fadd_ordered):
+        // Extract the exponent of the larger operand.
+        // If INF/NAN, then it becomes an automatic result.
+        lsrs    r2,     #24
+        cmp     r2,     #255
+        beq     LSYM(__fadd_special)
+
+        // Save the sign of the result.
+        lsrs    rT,     r0,     #31
+        lsls    rT,     #31
+        mov     ip,     rT
+
+        // If the original value of $r1 was +/-0,
+        //  $r0 becomes the automatic result.
+        // Because $r0 is known to be a finite value, return directly.
+        // It's actually important that +/-0 not go through the normal
+        //  process, to keep "-0 +/- 0"  from being turned into +0.
+        cmp     r3,     #0
+        beq     LSYM(__fadd_zero)
+
+        // Extract the second exponent.
+        lsrs    r3,     #24
+
+        // Calculate the difference of exponents (always positive).
+        subs    r3,     r2,     r3
+
+      #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // If the smaller operand is more than 25 bits less significant
+        //  than the larger, the larger operand is an automatic result.
+        // The smaller operand can't affect the result, even after rounding.
+        cmp     r3,     #25
+        bhi     LSYM(__fadd_return)
+      #endif
+
+        // Isolate both mantissas, recovering the smaller.
+        lsls    rT,     r0,     #9
+        lsls    r0,     r1,     #9
+        eors    r0,     rT // 26
+
+        // If the larger operand is normal, restore the implicit '1'.
+        // If subnormal, the second operand will also be subnormal.
+        cmp     r2,     #0
+        beq     LSYM(__fadd_normal)
+        adds    rT,     #1
+        rors    rT,     rT
+
+        // If the smaller operand is also normal, restore the implicit '1'.
+        // If subnormal, the smaller operand effectively remains multiplied
+        //  by 2 w.r.t the first.  This compensates for subnormal exponents,
+        //  which are technically still -126, not -127.
+        cmp     r2,     r3
+        beq     LSYM(__fadd_normal)
+        adds    r0,     #1
+        rors    r0,     r0
+
+    LSYM(__fadd_normal):
+        // Provide a spare bit for overflow.
+        // Normal values will be aligned in bits [30:7]
+        // Subnormal values will be aligned in bits [30:8]
+        lsrs    rT,     #1
+        lsrs    r0,     #1
+
+        // If signs weren't matched, negate the smaller operand (branchless).
+        asrs    r1,     #31
+        eors    r0,     r1
+        subs    r0,     r1
+
+        // Keep a copy of the small mantissa for the remainder.
+        movs    r1,     r0
+
+        // Align the small mantissa for addition.
+        asrs    r1,     r3
+
+        // Isolate the remainder.
+        // NOTE: Given the various cases above, the remainder will only
+        //  be used as a boolean for rounding ties to even.  It is not
+        //  necessary to negate the remainder for subtraction operations.
+        rsbs    r3,     #0
+        adds    r3,     #32
+        lsls    r0,     r3
+
+        // Because operands are ordered, the result will never be negative.
+        // If the result of subtraction is 0, the overall result must be +0.
+        // If the overall result in $r1 is 0, then the remainder in $r0
+        //  must also be 0, so no register copy is necessary on return.
+        adds    r1,     rT
+        beq     LSYM(__fadd_return)
+
+        // The large operand was aligned in bits [29:7]...
+        // If the larger operand was normal, the implicit '1' went in bit [30].
+        //
+        // After addition, the MSB of the result may be in bit:
+        //    31,  if the result overflowed.
+        //    30,  the usual case.
+        //    29,  if there was a subtraction of operands with exponents
+        //          differing by more than 1.
+        //  < 28, if there was a subtraction of operands with exponents +/-1,
+        //  < 28, if both operands were subnormal.
+
+        // In the last case (both subnormal), the alignment shift will be 8,
+        //  the exponent will be 0, and no rounding is necessary.
+        cmp     r2,     #0
+        bne     SYM(__fp_assemble) // 46
+
+        // Subnormal overflow automatically forms the correct exponent.
+        lsrs    r0,     r1,     #8
+        add     r0,     ip
+
+    LSYM(__fadd_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    LSYM(__fadd_special):
+      #if defined(TRAP_NANS) && TRAP_NANS
+        // If $r1 is (also) NAN, force it in place of $r0.
+        // As the smaller NAN, it is more likely to be signaling.
+        movs    rT,     #255
+        lsls    rT,     #24
+        cmp     r3,     rT
+        bls     LSYM(__fadd_ordered2)
+
+        eors    r0,     r1
+      #endif
+
+    LSYM(__fadd_ordered2):
+        // There are several possible cases to consider here:
+        //  1. Any NAN/NAN combination
+        //  2. Any NAN/INF combination
+        //  3. Any NAN/value combination
+        //  4. INF/INF with matching signs
+        //  5. INF/INF with mismatched signs.
+        //  6. Any INF/value combination.
+        // In all cases but the case 5, it is safe to return $r0.
+        // In the special case, a new NAN must be constructed.
+        // First, check the mantissa to see if $r0 is NAN.
+        lsls    r2,     r0,     #9
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        bne     SYM(__fp_check_nan)
+      #else
+        bne     LSYM(__fadd_return)
+      #endif
+
+    LSYM(__fadd_zero):
+        // Next, check for an INF/value combination.
+        lsls    r2,     r1,     #1
+        bne     LSYM(__fadd_return)
+
+        // Finally, check for matching sign on INF/INF.
+        // Also accepts matching signs when +/-0 are added.
+        bcc     LSYM(__fadd_return)
+
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(SUBTRACTED_INFINITY)
+      #endif
+
+      #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+        // Restore original operands.
+        eors    r1,     r0
+      #endif
+
+        // Identify mismatched 0.
+        lsls    r2,     r0,     #1
+        bne     SYM(__fp_exception)
+
+        // Force mismatched 0 to +0.
+        eors    r0,     r0
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END addsf3
+CM0_FUNC_END aeabi_fadd
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fcmp.S libgcc/config/arm/cm0/fcmp.S
--- libgcc/config/arm/cm0/fcmp.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fcmp.S	2020-11-10 21:33:20.981886999 -0800
@@ -0,0 +1,555 @@
+/* fcmp.S: Cortex M0 optimized 32-bit float comparison
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// int __cmpsf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+//  * +1 if ($r0 > $r1), or either argument is NAN
+//  *  0 if ($r0 == $r1)
+//  * -1 if ($r0 < $r1)
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.cmpsf2,"x"
+CM0_FUNC_START cmpsf2
+FUNC_ALIAS lesf2 cmpsf2
+FUNC_ALIAS ltsf2 cmpsf2
+    CFI_START_FUNCTION
+
+        // Assumption: The 'libgcc' functions should raise exceptions.
+        movs    r2,     #(FCMP_UN_POSITIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY)
+
+// int,int __internal_cmpsf2(float, float, int)
+// Internal function expects a set of control flags in $r2.
+// If ordered, returns a comparison type { 0, 1, 2 } in $r3
+CM0_FUNC_START internal_cmpsf2
+
+        // When operand signs are considered, the comparison result falls
+        //  within one of the following quadrants:
+        //
+        // $r0  $r1  $r0-$r1* flags  result
+        //  +    +      >      C=0     GT
+        //  +    +      =      Z=1     EQ
+        //  +    +      <      C=1     LT
+        //  +    -      >      C=1     GT
+        //  +    -      =      C=1     GT
+        //  +    -      <      C=1     GT
+        //  -    +      >      C=0     LT
+        //  -    +      =      C=0     LT
+        //  -    +      <      C=0     LT
+        //  -    -      >      C=0     LT
+        //  -    -      =      Z=1     EQ
+        //  -    -      <      C=1     GT
+        //
+        // *When interpreted as a subtraction of unsigned integers
+        //
+        // From the table, it is clear that in the presence of any negative
+        //  operand, the natural result simply needs to be reversed.
+        // Save the 'N' flag for later use.
+        movs    r3,     r0
+        orrs    r3,     r1
+        mov     ip,     r3
+
+        // Keep the absolute value of the second argument for NAN testing.
+        lsls    r3,     r1,     #1
+
+        // With the absolute value of the second argument safely stored,
+        //  recycle $r1 to calculate the difference of the arguments.
+        subs    r1,     r0,     r1
+
+        // Save the 'C' flag for use later.
+        // Effectively shifts all the flags 1 bit left.
+        adcs    r2,     r2
+
+        // Absolute value of the first argument.
+        lsls    r0,     #1
+
+        // Identify the largest absolute value between the two arguments.
+        cmp     r0,     r3
+        bhs     LSYM(__fcmp_sorted)
+
+        // Keep the larger absolute value for NAN testing.
+        // NOTE: When the arguments are respectively a signaling NAN and a
+        //  quiet NAN, the quiet NAN has precedence.  This has consequences
+        //  if TRAP_NANS is enabled, but the flags indicate that exceptions
+        //  for quiet NANs should be suppressed.  After the signaling NAN is
+        //  discarded, no exception is raised, although it should have been.
+        // This could be avoided by using a fifth register to save both
+        //  arguments until the signaling bit can be tested, but that seems
+        //  like an excessive amount of ugly code for an ambiguous case.
+        movs    r0,     r3
+
+    LSYM(__fcmp_sorted):
+        // If $r3 is NAN, the result is unordered.
+        movs    r3,     #255
+        lsls    r3,     #24
+        cmp     r0,     r3
+        bhi     LSYM(__fcmp_unordered)
+
+        // Positive and negative zero must be considered equal.
+        // If the larger absolute value is +/-0, both must have been +/-0.
+        subs    r3,     r0,     #0
+        beq     LSYM(__fcmp_zero)
+
+        // Test for regular equality.
+        subs    r3,     r1,     #0
+        beq     LSYM(__fcmp_zero)
+
+        // Isolate the saved 'C', and invert if either argument was negative.
+        // Remembering that the original subtraction was $r1 - $r0,
+        //  the result will be 1 if 'C' was set (gt), or 0 for not 'C' (lt).
+        lsls    r3,     r2,     #31
+        add     r3,     ip
+        lsrs    r3,     #31
+
+        // HACK: Clear the 'C' bit
+        adds    r3,     #0
+
+    LSYM(__fcmp_zero):
+        // After everything is combined, the temp result will be
+        //  2 (gt), 1 (eq), or 0 (lt).
+        adcs    r3,     r3
+
+        // Return directly if the 3-way comparison flag is set.
+        // Also shifts the condition mask into bits[2:0].
+        lsrs    r2,     #2 // 26
+        bcs     LSYM(__fcmp_return)
+
+        // If the bit corresponding to the comparison result is set in the
+        //  acceptance mask, a '1' will fall out into the result.
+        movs    r0,     #1
+        lsrs    r2,     r3
+        ands    r0,     r2
+        RETx    lr // 33
+
+    LSYM(__fcmp_unordered):
+        // Set up the requested UNORDERED result.
+        // Remember the shift in the flags (above).
+        lsrs    r2,     #6
+
+  #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+        // TODO: ... The
+
+
+  #endif
+
+  #if defined(TRAP_NANS) && TRAP_NANS
+        // Always raise an exception if FCMP_RAISE_EXCEPTIONS was specified.
+        bcs     LSYM(__fcmp_trap)
+
+        // If FCMP_NO_EXCEPTIONS was specified, no exceptions on quiet NANs.
+        // The comparison flags are moot, so $r1 can serve as scratch space.
+        lsrs    r1,     r0,     #24
+        bcs     LSYM(__fcmp_return2)
+
+    LSYM(__fcmp_trap):
+        // Restore the NAN (sans sign) for an argument to the exception.
+        // As an IRQ, the handler restores all registers, including $r3.
+        // NOTE: The service handler may not return.
+        lsrs    r0,     #1
+        movs    r3,     #(UNORDERED_COMPARISON)
+        svc     #(SVC_TRAP_NAN)
+  #endif
+
+     LSYM(__fcmp_return2):
+        // HACK: Work around result register mapping.
+        // This could probably be eliminated by remapping the flags register.
+        movs    r3,     r2
+
+    LSYM(__fcmp_return):
+        // Finish setting up the result.
+        // The subtraction allows a negative result from an 8 bit set of flags.
+        //  (See the variations on the FCMP_UN parameter, above).
+        subs    r0,     r3,     #1
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END ltsf2
+CM0_FUNC_END lesf2
+CM0_FUNC_END cmpsf2
+
+
+// int __eqsf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+//  * -1 if ($r0 < $r1)
+//  *  0 if ($r0 == $r1)
+//  * +1 if ($r0 > $r1), or either argument is NAN
+// Uses $r2, $r3, and $ip as scratch space.
+CM0_FUNC_START eqsf2
+FUNC_ALIAS nesf2 eqsf2
+    CFI_START_FUNCTION
+
+        // Assumption: The 'libgcc' functions should raise exceptions.
+        movs    r2,     #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END nesf2
+CM0_FUNC_END eqsf2
+
+
+// int __gesf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+//  * -1 if ($r0 < $r1), or either argument is NAN
+//  *  0 if ($r0 == $r1)
+//  * +1 if ($r0 > $r1)
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.gesf2,"x"
+CM0_FUNC_START gesf2
+FUNC_ALIAS gtsf2 gesf2
+    CFI_START_FUNCTION
+
+        // Assumption: The 'libgcc' functions should raise exceptions.
+        movs    r2,     #(FCMP_UN_NEGATIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END gtsf2
+CM0_FUNC_END gesf2
+
+
+// int __aeabi_fcmpeq(float, float)
+// Returns '1' in $r0 if ($r0 == $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmpeq,"x"
+CM0_FUNC_START aeabi_fcmpeq
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_EQ)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpeq
+
+
+// int __aeabi_fcmpne(float, float) [non-standard]
+// Returns '1' in $r0 if ($r0 != $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmpne,"x"
+CM0_FUNC_START aeabi_fcmpne
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_NE)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpne
+
+
+// int __aeabi_fcmplt(float, float)
+// Returns '1' in $r0 if ($r0 < $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmplt,"x"
+CM0_FUNC_START aeabi_fcmplt
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_LT)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmplt
+
+
+// int __aeabi_fcmple(float, float)
+// Returns '1' in $r0 if ($r0 <= $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmple,"x"
+CM0_FUNC_START aeabi_fcmple
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_LE)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmple
+
+
+// int __aeabi_fcmpge(float, float)
+// Returns '1' in $r0 if ($r0 >= $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmpge,"x"
+CM0_FUNC_START aeabi_fcmpge
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_GE)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpge
+
+
+// int __aeabi_fcmpgt(float, float)
+// Returns '1' in $r0 if ($r0 > $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmpgt,"x"
+CM0_FUNC_START aeabi_fcmpgt
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_GT)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpgt
+
+
+// int __aeabi_fcmpun(float, float)
+// Returns '1' in $r0 if $r0 and $r1 are unordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmpun,"x"
+CM0_FUNC_START aeabi_fcmpun
+FUNC_ALIAS unordsf2 aeabi_fcmpun
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END unordsf2
+CM0_FUNC_END aeabi_fcmpun
+
+#if 0
+
+
+// void __aeabi_cfrcmple(float, float)
+// Reverse three-way compare of $r1 ? $r0, with result in the status flags:
+//  * 'Z' is set only when the operands are ordered and equal.
+//  * 'C' is clear only when the operands are ordered and $r0 > $r1.
+// Preserves all core registers except $ip, $lr, and the CPSR.
+.section .text.libgcc.cfrcmple,"x"
+CM0_FUNC_START aeabi_cfrcmple
+    CFI_START_FUNCTION
+
+        push    { r0-r3, lr }
+
+        // Save the current CFI state
+        .cfi_adjust_cfa_offset 20
+        .cfi_rel_offset r0, 0
+        .cfi_rel_offset r1, 4
+        .cfi_rel_offset r2, 8
+        .cfi_rel_offset r3, 12
+        .cfi_rel_offset lr, 16
+
+        // Reverse the order of the arguments.
+        ldr     r0,     [sp, #4]
+        ldr     r1,     [sp, #0]
+
+        // Don't just fall through into cfcmple(), else registers will get pushed twice.
+        b       SYM(__real_cfrcmple)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_cfrcmple
+
+
+// void __aeabi_cfcmpeq(float, float)
+// NOTE: This function only applies if __aeabi_cfcmple() can raise exceptions.
+// Three-way compare of $r0 ? $r1, with result in the status flags:
+//  * 'Z' is set only when the operands are ordered and equal.
+//  * 'C' is clear only when the operands are ordered and $r0 < $r1.
+// Preserves all core registers except $ip, $lr, and the CPSR.
+#if defined(TRAP_NANS) && TRAP_NANS
+  .section .text.libgcc.cfcmpeq,"x"
+  CM0_FUNC_START aeabi_cfcmpeq
+    CFI_START_FUNCTION
+
+        push    { r0-r3, lr }
+
+        // Save the current CFI state
+        .cfi_adjust_cfa_offset 20
+        .cfi_rel_offset r0, 0
+        .cfi_rel_offset r1, 4
+        .cfi_rel_offset r2, 8
+        .cfi_rel_offset r3, 12
+        .cfi_rel_offset lr, 16
+
+        // No exceptions on quiet NAN.
+        // On an unordered result, 'C' should be '1' and 'Z' should be '0'.
+        // A subtraction giving -1 sets these flags correctly.
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS)
+        b       LSYM(__real_cfcmpeq)
+
+    CFI_END_FUNCTION
+  CM0_FUNC_END aeabi_cfcmpeq
+#endif
+
+// void __aeabi_cfcmple(float, float)
+// Three-way compare of $r0 ? $r1, with result in the status flags:
+//  * 'Z' is set only when the operands are ordered and equal.
+//  * 'C' is clear only when the operands are ordered and $r0 < $r1.
+// Preserves all core registers except $ip, $lr, and the CPSR.
+.section .text.libgcc.cfcmple,"x"
+CM0_FUNC_START aeabi_cfcmple
+
+  // __aeabi_cfcmpeq() is defined separately when TRAP_NANS is enabled.
+  #if !defined(TRAP_NANS) || !TRAP_NANS
+    FUNC_ALIAS aeabi_cfcmpeq aeabi_cfcmple
+  #endif
+
+    CFI_START_FUNCTION
+
+        push    { r0-r3, lr }
+
+        // Save the current CFI state
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 20
+        .cfi_rel_offset r0, 0
+        .cfi_rel_offset r1, 4
+        .cfi_rel_offset r2, 8
+        .cfi_rel_offset r3, 12
+        .cfi_rel_offset lr, 16
+
+    LSYM(__real_cfrcmple):
+  #if defined(TRAP_NANS) && TRAP_NANS
+        // The result in $r0 will be ignored, but do raise exceptions.
+        // On an unordered result, 'C' should be '1' and 'Z' should be '0'.
+        // A subtraction giving -1 sets these flags correctly.
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS)
+  #endif
+
+    LSYM(__real_cfcmpeq):
+        // __internal_cmpsf2() always sets the APSR flags on return.
+        bl      LSYM(__internal_cmpsf2)
+
+        // Because __aeabi_cfcmpeq() wants the 'C' flag set on equal values,
+        //  magic is required.   For the possible intermediate values in $r3:
+        //  * 0b01 gives C = 0 and Z = 0 for $r0 < $r1
+        //  * 0b10 gives C = 1 and Z = 1 for $r0 == $r1
+        //  * 0b11 gives C = 1 and Z = 0 for $r0 > $r1 (or unordered)
+        cmp    r1,     #0
+
+        // Cleanup.
+        pop    { r0-r3, pc }
+        .cfi_restore_state
+
+    CFI_END_FUNCTION
+
+  #if !defined(TRAP_NANS) || !TRAP_NANS
+    CM0_FUNC_END aeabi_cfcmpeq
+  #endif
+
+CM0_FUNC_END aeabi_cfcmple
+
+
+// int isgreaterf(float, float)
+// Returns '1' in $r0 if ($r0 > $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.isgreaterf,"x"
+CM0_FUNC_START isgreaterf
+MATH_ALIAS isgreaterf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END isgreaterf
+CM0_FUNC_END isgreaterf
+
+
+// int isgreaterequalf(float, float)
+// Returns '1' in $r0 if ($r0 >= $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.isgreaterequalf,"x"
+CM0_FUNC_START isgreaterequalf
+MATH_ALIAS isgreaterequalf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END isgreaterequalf
+CM0_FUNC_END isgreaterequalf
+
+
+// int islessf(float, float)
+// Returns '1' in $r0 if ($r0 < $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.islessf,"x"
+CM0_FUNC_START islessf
+MATH_ALIAS islessf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END islessf
+CM0_FUNC_END islessf
+
+
+// int islessequalf(float, float)
+// Returns '1' in $r0 if ($r0 <= $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.islessequalf,"x"
+CM0_FUNC_START islessequalf
+MATH_ALIAS islessequalf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END islessequalf
+CM0_FUNC_END islessequalf
+
+
+// int islessgreaterf(float, float)
+// Returns '1' in $r0 if ($r0 != $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.islessgreaterf,"x"
+CM0_FUNC_START islessgreaterf
+MATH_ALIAS islessgreaterf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END islessgreaterf
+CM0_FUNC_END islessgreaterf
+
+
+// int isunorderedf(float, float)
+// Returns '1' in $r0 if $r0 and $r1 are unordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.isunorderedf,"x"
+CM0_FUNC_START isunorderedf
+MATH_ALIAS isunorderedf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END isunorderedf
+CM0_FUNC_END isunorderedf
+
+
+#endif
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fconv.S libgcc/config/arm/cm0/fconv.S
--- libgcc/config/arm/cm0/fconv.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fconv.S	2020-11-10 21:33:20.981886999 -0800
@@ -0,0 +1,346 @@
+/* fconv.S: Cortex M0 optimized 32- and 64-bit float conversions
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+// Reference: <libgcc/config/arm/fp16.c>
+
+// double __aeabi_f2d(float)
+// Converts a single-precision float in $r0 to double-precision in $r1:$r0.
+// Rounding, overflow, and underflow are impossible.
+// INF and ZERO are returned unmodified.
+.section .text.libgcc.f2d,"x"
+CM0_FUNC_START aeabi_f2d
+FUNC_ALIAS extendsfdf2 aeabi_f2d
+    CFI_START_FUNCTION
+
+        // Save the sign.
+        lsrs    r1,     r0,     #31
+        lsls    r1,     #31
+
+        // Set up registers for __fp_normalize2().
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Test for zero.
+        lsls    r0,     #1
+        beq     LSYM(__f2d_return) // 7
+
+        // Split the exponent and mantissa into separate registers.
+        // This is the most efficient way to convert subnormals in the
+        //  half-precision form into normals in single-precision.
+        // This does add a leading implicit '1' to INF and NAN,
+        //  but that will be absorbed when the value is re-assembled.
+        movs    r2,     r0
+        bl      SYM(__fp_normalize2) __PLT__ // +4+8
+
+        // Set up the exponent bias.  For INF/NAN values, the bias
+        //  is 1791 (2047 - 255 - 1), where the last '1' accounts
+        //  for the implicit '1' in the mantissa.
+        movs    r0,     #3
+        lsls    r0,     #9
+        adds    r0,     #255
+
+        // Test for INF/NAN, promote exponent if necessary
+        cmp     r2,     #255
+        beq     LSYM(__f2d_indefinite)
+
+        // For normal values, the exponent bias is 895 (1023 - 127 - 1),
+        //  which is half of the prepared INF/NAN bias.
+        lsrs    r0,     #1
+
+    LSYM(__f2d_indefinite):
+        // Assemble exponent with bias correction.
+        adds    r2,     r0
+        lsls    r2,     #20
+        adds    r1,     r2
+
+        // Assemble the high word of the mantissa.
+        lsrs    r0,     r3,     #11
+        add     r1,     r0
+
+        // Remainder of the mantissa in the low word of the result.
+        lsls    r0,     r3,     #21
+
+    LSYM(__f2d_return):
+        pop     { rT, pc } // 38
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END extendsfdf2
+CM0_FUNC_END aeabi_f2d
+
+
+// float __aeabi_d2f(double)
+// Converts a double-precision float in $r1:$r0 to single-precision in $r0.
+// Values out of range become ZERO or INF; returns the upper 23 bits of NAN.
+// Rounds to nearest, ties to even.  The ARM ABI does not appear to specify a
+//  rounding mode, so no problems here.  Unfortunately, GCC specifies rounding
+//  towards zero, which makes this implementation incompatible.
+// (It would be easy enough to truncate normal values, but single-precision
+//  subnormals would require a significantly more complex approach.)
+.section .text.libgcc.d2f,"x"
+CM0_FUNC_START aeabi_d2f
+// FUNC_ALIAS truncdfsf2 aeabi_d2f // incompatible
+    CFI_START_FUNCTION
+
+        // Save the sign.
+        lsrs    r2,     r1,     #31
+        lsls    r2,     #31
+        mov     ip,     r2
+
+        // Isolate the exponent (11 bits).
+        lsls    r2,     r1,     #1
+        lsrs    r2,     #21
+
+        // Isolate the mantissa.  It's safe to always add the implicit '1' --
+        //  even for subnormals -- since they will underflow in every case.
+        lsls    r1,     #12
+        adds    r1,     #1
+        rors    r1,     r1
+        lsrs    r3,     r0,     #21
+        adds    r1,     r3
+        lsls    r0,     #11 // 11
+
+        // Test for INF/NAN (r3 = 2047)
+        mvns    r3,     r2
+        lsrs    r3,     #21
+        cmp     r3,     r2
+        beq     LSYM(__d2f_indefinite)
+
+        // Adjust exponent bias.  Offset is 127 - 1023, less 1 more since
+        //  __fp_assemble() expects the exponent relative to bit[30].
+        lsrs    r3,     #1
+        subs    r2,     r3
+        adds    r2,     #126
+
+    LSYM(__d2f_assemble):
+        // Use the standard formatting for overflow and underflow.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        b       SYM(__fp_assemble) // 24-28 + 30
+                .cfi_restore_state
+
+    LSYM(__d2f_indefinite):
+        // Test for INF.  If the mantissa, exclusive of the implicit '1',
+        //  is equal to '0', the result will be INF.
+        lsls    r3,     r1,     #1
+        orrs    r3,     r0
+        beq     LSYM(__d2f_assemble) // 20
+
+        // Construct NAN with the upper 22 bits of the mantissa, setting bit[21]
+        //  to ensure a valid NAN without changing bit[22] (quiet)
+        subs    r2,     #0xD
+        lsls    r0,     r2,     #20
+        lsrs    r1,     #8
+        orrs    r0,     r1
+
+      #if defined(STRICT_NANS) && STRICT_NANS
+        add     r0,     ip
+      #endif
+
+        RETx    lr // 27
+
+    CFI_END_FUNCTION
+// CM0_FUNC_END truncdfsf2
+CM0_FUNC_END aeabi_d2f
+
+
+// float __aeabi_h2f(short hf)
+// Converts a half-precision float in $r0 to single-precision.
+// Rounding, overflow, and underflow conditions are impossible.
+// INF and ZERO are returned unmodified.
+.section .text.libgcc.h2f,"x"
+CM0_FUNC_START aeabi_h2f
+    CFI_START_FUNCTION
+
+        // Set up registers for __fp_normalize2().
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save the mantissa and exponent.
+        lsls    r2,     r0,     #17
+
+        // Isolate the sign.
+        lsrs    r0,     #15
+        lsls    r0,     #31
+
+        // Align the exponent at bit[24] for normalization.
+        // If zero, return the original sign.
+        lsrs    r2,     #3
+        beq     LSYM(__h2f_return) // 8
+
+        // Split the exponent and mantissa into separate registers.
+        // This is the most efficient way to convert subnormals in the
+        //  half-precision form into normals in single-precision.
+        // This does add a leading implicit '1' to INF and NAN,
+        //  but that will be absorbed when the value is re-assembled.
+        bl      SYM(__fp_normalize2) __PLT__ // +4+8
+
+        // Set up the exponent bias.  For INF/NAN values, the bias is 223,
+        //  where the last '1' accounts for the implicit '1' in the mantissa.
+        adds    r2,     #(255 - 31 - 1)
+
+        // Test for INF/NAN.
+        cmp     r2,     #254
+        beq     LSYM(__h2f_assemble)
+
+        // For normal values, the bias should have been 111.
+        // However, this adjustment now is faster than branching.
+        subs    r2,     #((255 - 31 - 1) - (127 - 15 - 1))
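+        // For example, 1.0 in half precision has a biased exponent of 15;
+        //  15 + 111 = 126, and the implicit '1' merged at bit[23] below
+        //  yields the single-precision exponent of 127.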
+
+    LSYM(__h2f_assemble):
+        // Combine exponent and sign.
+        lsls    r2,     #23
+        adds    r0,     r2
+
+        // Combine mantissa.
+        lsrs    r3,     #8
+        add     r0,     r3
+
+    LSYM(__h2f_return):
+        pop     { rT, pc } // 34
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_h2f
+
+
+// short __aeabi_f2h(float f)
+// Converts a single-precision float in $r0 to half-precision,
+//  rounding to nearest, ties to even.
+// Values out of range are forced to either ZERO or INF;
+//  returns the upper 12 bits of NAN.
+.section .text.libgcc.f2h,"x"
+CM0_FUNC_START aeabi_f2h
+    CFI_START_FUNCTION
+
+        // Set up the sign.
+        lsrs    r2,     r0,     #31
+        lsls    r2,     #15
+
+        // Save the exponent and mantissa.
+        // If ZERO, return the original sign.
+        lsls    r0,     #1
+        beq     LSYM(__f2h_return)
+
+        // Isolate the exponent, check for NAN.
+        lsrs    r1,     r0,     #24
+        cmp     r1,     #255
+        beq     LSYM(__f2h_indefinite)
+
+        // Check for overflow.
+        cmp     r1,     #(127 + 15)
+        bhi     LSYM(__f2h_overflow)
+
+        // Isolate the mantissa, adding back the implicit '1'.
+        lsls    r0,     #8
+        adds    r0,     #1
+        rors    r0,     r0 // 12
+
+        // Adjust exponent bias for half-precision, including '1' to
+        //  account for the mantissa's implicit '1'.
+        subs    r1,     #(127 - 15 + 1)
+        bmi     LSYM(__f2h_underflow)
+
+        // Combine the exponent and sign.
+        lsls    r1,     #10
+        adds    r2,     r1
+
+        // Split the mantissa (11 bits) and remainder (13 bits).
+        lsls    r3,     r0,     #12
+        lsrs    r0,     #21
+
+     LSYM(__f2h_round):
+        // If the carry bit is '0', always round down.
+        bcc     LSYM(__f2h_return)
+
+        // Carry was set.  If a tie (no remainder) and the
+        //  LSB of the result are '0', round down (to even).
+        lsls    r1,     r0,     #31
+        orrs    r1,     r3
+        beq     LSYM(__f2h_return)
+
+        // Round up, ties to even.
+        adds    r0,     #1
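+        // For example, when the discarded bits are exactly half a ULP
+        //  (carry set, all lower remainder bits clear), a result with an
+        //  LSB of '0' is kept as-is, while an LSB of '1' rounds up to the
+        //  next even value.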
+
+     LSYM(__f2h_return):
+        // Combine mantissa and exponent.
+        adds    r0,     r2
+        RETx    lr // 25 - 34
+
+    LSYM(__f2h_underflow):
+        // Align the remainder. The remainder consists of the last 12 bits
+        //  of the mantissa plus the magnitude of underflow.
+        movs    r3,     r0
+        adds    r1,     #12
+        lsls    r3,     r1
+
+        // Align the mantissa.  The MSB of the remainder must be
+        //  shifted out last, into the 'C' flag, for rounding.
+        subs    r1,     #33
+        rsbs    r1,     #0
+        lsrs    r0,     r1
+        b       LSYM(__f2h_round) // 25
+
+    LSYM(__f2h_overflow):
+        // Create single-precision INF from which to construct half-precision.
+        movs    r0,     #255
+        lsls    r0,     #24 // 13
+
+    LSYM(__f2h_indefinite):
+        // Check for INF.
+        lsls    r3,     r0,     #8
+        beq     LSYM(__f2h_infinite)
+
+        // Set bit[8] to ensure a valid NAN without changing bit[9] (quiet).
+        adds    r2,     #128
+        adds    r2,     #128
+
+    LSYM(__f2h_infinite):
+        // Construct the result from the upper 10 bits of the mantissa
+        //  and the lower 5 bits of the exponent.
+        lsls    r0,     #3
+        lsrs    r0,     #17
+
+        // Combine with the sign (and possibly NAN flag).
+        orrs    r0,     r2
+        RETx    lr // 23
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_f2h
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fdiv.S libgcc/config/arm/cm0/fdiv.S
--- libgcc/config/arm/cm0/fdiv.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fdiv.S	2020-11-12 09:46:26.939907002 -0800
@@ -0,0 +1,258 @@
+/* fdiv.S: Cortex M0 optimized 32-bit float division
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// float __aeabi_fdiv(float, float)
+// Returns $r0 after division by $r1.
+.section .text.libgcc.fdiv,"x"
+CM0_FUNC_START aeabi_fdiv
+FUNC_ALIAS divsf3 aeabi_fdiv
+    CFI_START_FUNCTION
+
+        // Standard registers, compatible with exception handling.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save the sign of the result.
+        movs    r3,     r1
+        eors    r3,     r0
+        lsrs    rT,     r3,     #31
+        lsls    rT,     #31
+        mov     ip,     rT
+
+        // Set up INF for comparison.
+        movs    rT,     #255
+        lsls    rT,     #24
+
+        // Check for divide by 0.  Automatically catches 0/0.
+        lsls    r2,     r1,     #1
+        beq     LSYM(__fdiv_by_zero)
+
+        // Check for INF/INF, or a number divided by itself.
+        lsls    r3,     #1
+        beq     LSYM(__fdiv_equal)
+
+        // Check the numerator for INF/NAN.
+        eors    r3,     r2
+        cmp     r3,     rT
+        bhs     LSYM(__fdiv_special1)
+
+        // Check the denominator for INF/NAN.
+        cmp     r2,     rT
+        bhs     LSYM(__fdiv_special2)
+
+        // Check the numerator for zero.
+        cmp     r3,     #0
+        beq     SYM(__fp_zero)
+
+        // No action if the numerator is subnormal.
+        //  The mantissa will normalize naturally in the division loop.
+        lsls    r0,     #9
+        lsrs    r1,     r3,     #24
+        beq     LSYM(__fdiv_denominator)
+
+        // Restore the numerator's implicit '1'.
+        adds    r0,     #1
+        rors    r0,     r0 // 26
+
+    LSYM(__fdiv_denominator):
+        // The denominator must be normalized and left aligned.
+        bl      SYM(__fp_normalize2) // +4+8
+
+        // 25 bits of precision will be sufficient.
+        movs    rT,     #64
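+        // (The quotient's leading bit enters at bit[6] and the loop in
+        //  __internal_fdiv stops once it reaches bit[31], i.e. at most
+        //  25 quotient bits are produced.)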
+
+        // Run division.
+        bl      SYM(__internal_fdiv) // 41
+        b       SYM(__fp_assemble)
+
+    LSYM(__fdiv_equal):
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(DIVISION_INF_BY_INF)
+      #endif
+
+        // The absolute values of both operands are equal, but not 0.
+        // If both operands are INF, create a new NAN.
+        cmp     r2,     rT
+        beq     SYM(__fp_exception)
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        // If both operands are NAN, return the NAN in $r0.
+        bhi     SYM(__fp_check_nan)
+      #else
+        bhi     LSYM(__fdiv_return)
+      #endif
+
+        // Return 1.0f, with appropriate sign.
+        movs    r0,     #127
+        lsls    r0,     #23
+        add     r0,     ip
+
+    LSYM(__fdiv_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    LSYM(__fdiv_special2):
+        // The denominator is either INF or NAN, numerator is neither.
+        // Also, the denominator is not equal to 0.
+        // If the denominator is INF, the result goes to 0.
+        beq     SYM(__fp_zero)
+
+        // The only other option is NAN, fall through to branch.
+        mov     r0,     r1
+
+    LSYM(__fdiv_special1):
+      #if defined(TRAP_NANS) && TRAP_NANS
+        // The numerator is INF or NAN.  If NAN, return it directly.
+        bne     SYM(__fp_check_nan)
+      #else
+        bne     LSYM(__fdiv_return)
+      #endif
+
+        // If INF, the result will be INF if the denominator is finite.
+        // The denominator here can be neither INF nor 0,
+        //  so fall through the divide-by-zero check to test it for NAN.
+        movs    r0,     r1
+
+    LSYM(__fdiv_by_zero):
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(DIVISION_0_BY_0)
+      #endif
+
+        // The denominator is 0.
+        // If the numerator is also 0, the result will be a new NAN.
+        // Otherwise the result will be INF, with the correct sign.
+        lsls    r2,     r0,     #1
+        beq     SYM(__fp_exception)
+
+        // The result should be NAN if the numerator is NAN.  Otherwise,
+        //  the result is INF regardless of the numerator value.
+        cmp     r2,     rT
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        bhi     SYM(__fp_check_nan)
+      #else
+        bhi     LSYM(__fdiv_return)
+      #endif
+
+        // Recreate INF with the correct sign.
+        b       SYM(__fp_infinity)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END divsf3
+CM0_FUNC_END aeabi_fdiv
+
+
+// Division helper, possibly to be shared with atan2.
+// Expects the numerator mantissa in $r0, exponent in $r1,
+//  plus the denominator mantissa in $r3, exponent in $r2, and
+//  a bit pattern in $rT that controls the result precision.
+// Returns quotient in $r1, exponent in $r2, pseudo remainder in $r0.
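+//
+// As a rough reference (not part of the build; names are illustrative),
+//  the quotient loop below behaves like this C sketch, where 'rem'/'den'
+//  hold the pre-aligned mantissas, 'exp' is the exponent in $r2, and
+//  'prec' is the single-bit precision control passed in $rT:
+//
+//      uint32_t quot = 0;
+//      for (;;) {
+//          quot <<= 1;
+//          if (quot & 0x80000000) break;       // required precision reached
+//          exp -= 1;  rem <<= 1;
+//          if (rem >= den) {
+//              quot += prec;  rem -= den;      // set the next quotient bit
+//              if (rem == 0) { quot <<= 1; break; }  // exact, short-circuit
+//          }
+//      }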
+.section .text.libgcc.fdiv2,"x"
+CM0_FUNC_START internal_fdiv
+    CFI_START_FUNCTION
+
+        // Initialize the exponent, relative to bit[30].
+        subs    r2,     r1,     r2
+
+    SYM(__internal_fdiv2):
+        // The exponent should be (expN - 127) - (expD - 127) + 127.
+        // An additional offset of 25 is required to account for the
+        //  minimum number of bits in the result (before rounding).
+        // However, drop '1' because the offset is relative to bit[30],
+        //  while the result is calculated relative to bit[31].
+        adds    r2,     #(127 + 25 - 1)
+
+      #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Dividing by a power of 2?
+        lsls    r1,     r3,     #1
+        beq     LSYM(__fdiv_simple) // 47
+      #endif
+
+        // Initialize the result.
+        eors    r1,     r1
+
+        // Clear the MSB, so that when the numerator is smaller than
+        //  the denominator, there is one bit free for a left shift.
+        // After a single shift, the numerator is guaranteed to be larger.
+        // The denominator ends up in r3, and the numerator ends up in r0,
+        //  so that the numerator serves as a pseudo-remainder in rounding.
+        // Shift the numerator one additional bit to compensate for the
+        //  pre-incrementing loop.
+        lsrs    r0,     #2
+        lsrs    r3,     #1 // 49
+
+    LSYM(__fdiv_loop):
+        // Once the MSB of the output reaches the MSB of the register,
+        //  the result has been calculated to the required precision.
+        lsls    r1,     #1
+        bmi     LSYM(__fdiv_break)
+
+        // Shift the numerator/remainder left to set up the next bit.
+        subs    r2,     #1
+        lsls    r0,     #1
+
+        // Test if the numerator/remainder is smaller than the denominator,
+        //  do nothing if it is.
+        cmp     r0,     r3
+        blo     LSYM(__fdiv_loop)
+
+        // If the numerator/remainder is greater or equal, set the next bit,
+        //  and subtract the denominator.
+        adds    r1,     rT
+        subs    r0,     r3
+
+        // Short-circuit if the remainder goes to 0.
+        // Even with the overhead of "subnormal" alignment,
+        //  this is usually much faster than continuing.
+        bne     LSYM(__fdiv_loop) // 11*25
+
+        // Compensate the alignment of the result.
+        // The remainder does not need compensation, it's already 0.
+        lsls    r1,     #1 // 61 + 202 (underflow)
+
+    LSYM(__fdiv_break):
+        RETx    lr  // 331 + 30,
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+    LSYM(__fdiv_simple):
+        // The numerator becomes the result, with a remainder of 0.
+        movs    r1,     r0
+        eors    r0,     r0
+        subs    r2,     #25
+        RETx    lr   // 53 + 30
+  #endif
+
+    CFI_END_FUNCTION
+CM0_FUNC_END internal_fdiv
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/ffixed.S libgcc/config/arm/cm0/ffixed.S
--- libgcc/config/arm/cm0/ffixed.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/ffixed.S	2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,340 @@
+/* ffixed.S: Cortex M0 optimized float->int conversion
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+// int __aeabi_f2iz(float)
+// Converts a float in $r0 to signed integer, rounding toward 0.
+// Values out of range are forced to either INT_MAX or INT_MIN.
+// NAN becomes zero.
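+//
+// Reference sketch of this contract in C (illustrative, not part of the build):
+//      int f2iz(float x)
+//      {
+//          if (isnan(x)) return 0;
+//          if (x >= 2147483648.0f) return INT_MAX;
+//          if (x < -2147483648.0f) return INT_MIN;
+//          return (int)x;   // in-range values truncate toward zero
+//      }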
+.section .text.libgcc.f2iz,"x"
+CM0_FUNC_START aeabi_f2iz
+FUNC_ALIAS fixsfsi aeabi_f2iz
+    CFI_START_FUNCTION
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Flag for signed conversion.
+        movs    r1,     #33
+        b       LSYM(__real_f2lz)
+  #else
+        // Flag for signed conversion.
+        movs    r3,     #1
+
+
+    LSYM(__real_f2iz):
+        // Isolate the sign of the result.
+        asrs    r1,     r0,     #31
+        lsls    r0,     #1
+
+  #if defined(FP_EXCEPTION) && FP_EXCEPTION
+        // Check for zero to avoid spurious underflow exception on -0.
+        beq     LSYM(__f2iz_return)
+  #endif
+
+        // Isolate the exponent.
+        lsrs    r2,     r0,     #24
+
+  #if defined(TRAP_NANS) && TRAP_NANS
+        // Test for NAN.
+        // Otherwise, NAN will be converted like +/-INF.
+        cmp     r2,     #255
+        beq     LSYM(__f2iz_nan)
+  #endif
+
+        // Extract the mantissa and restore the implicit '1'. Technically,
+        //  this is wrong for subnormals, but they flush to zero regardless.
+        lsls    r0,     #8
+        adds    r0,     #1
+        rors    r0,     r0
+
+        // Calculate mantissa alignment. Given the implicit '1' in bit[31]:
+        //  * An exponent less than 127 will automatically flush to 0.
+        //  * An exponent of 127 will result in a shift of 31.
+        //  * An exponent of 128 will result in a shift of 30.
+        //  *  ...
+        //  * An exponent of 157 will result in a shift of 1.
+        //  * An exponent of 158 will result in no shift at all.
+        //  * An exponent larger than 158 will result in overflow.
+        rsbs    r2,     #0
+        adds    r2,     #158
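+        // Worked example: 2.5f has a biased exponent of 128 and an aligned
+        //  mantissa of 0xA0000000; the resulting shift of 30 truncates the
+        //  value to 2.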
+
+        // When the shift is less than minimum, the result will overflow.
+        // The only signed value to fail this test is INT_MIN (0x80000000),
+        //  but it will be returned correctly from the overflow branch.
+        cmp     r2,     r3
+        blt     LSYM(__f2iz_overflow)
+
+        // If unsigned conversion of a negative value, also overflow.
+        // Would also catch -0.0f if not handled earlier.
+        cmn     r3,     r1
+        blt     LSYM(__f2iz_overflow)
+
+  #if defined(FP_EXCEPTION) && FP_EXCEPTION
+        // Save a copy for remainder testing
+        movs    r3,     r0
+  #endif
+
+        // Truncate the fraction.
+        lsrs    r0,     r2
+
+        // Two's complement negation, if applicable.
+        // Bonus: the sign in $r1 provides a suitable long long result.
+        eors    r0,     r1
+        subs    r0,     r1
+
+  #if defined(FP_EXCEPTION) && FP_EXCEPTION
+        // If any bits set in the remainder, raise FE_INEXACT
+        rsbs    r2,     #0
+        adds    r2,     #32
+        lsls    r3,     r2
+        bne     LSYM(__f2iz_inexact)
+  #endif
+
+    LSYM(__f2iz_return):
+        RETx    lr
+
+    LSYM(__f2iz_overflow):
+        // Positive unsigned integers (r1 == 0, r3 == 0), return 0xFFFFFFFF.
+        // Negative unsigned integers (r1 == -1, r3 == 0), return 0x00000000.
+        // Positive signed integers (r1 == 0, r3 == 1), return 0x7FFFFFFF.
+        // Negative signed integers (r1 == -1, r3 == 1), return 0x80000000.
+        // TODO: FE_INVALID exception, (but not for -2^31).
+        mvns    r0,     r1
+        lsls    r3,     #31
+        eors    r0,     r3
+        RETx    lr
+
+  #if defined(FP_EXCEPTION) && FP_EXCEPTION
+    LSYM(__f2iz_inexact):
+        // TODO: Another class of exceptions that doesn't overwrite $r0.
+        bkpt    #0
+
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(CAST_INEXACT)
+      #endif
+
+        b       SYM(__fp_exception)
+  #endif
+
+    LSYM(__f2iz_nan):
+        // Check for INF
+        lsls    r2,     r0,     #9
+        beq     LSYM(__f2iz_overflow)
+
+  #if defined(FP_EXCEPTION) && FP_EXCEPTION
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(CAST_UNDEFINED)
+      #endif
+
+        b       SYM(__fp_exception)
+  #else
+
+  #endif
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+
+        // TODO: Extend to long long
+
+        // TODO: bl  fp_check_nan
+      #endif
+
+        // Return long long 0 on NAN.
+        eors    r0,     r0
+        eors    r1,     r1
+        RETx    lr
+
+  #endif // !__OPTIMIZE_SIZE__
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fixsfsi
+CM0_FUNC_END aeabi_f2iz
+
+
+// unsigned int __aeabi_f2uiz(float)
+// Converts a float in $r0 to unsigned integer, rounding toward 0.
+// Values out of range are forced to UINT_MAX.
+// Negative values and NAN all become zero.
+.section .text.libgcc.f2uiz,"x"
+CM0_FUNC_START aeabi_f2uiz
+FUNC_ALIAS fixunssfsi aeabi_f2uiz
+    CFI_START_FUNCTION
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Flag for unsigned conversion.
+        movs    r1,     #32
+        b       LSYM(__real_f2lz)
+  #else
+        // Flag for unsigned conversion.
+        movs    r3,     #0
+        b       LSYM(__real_f2iz)
+  #endif // !__OPTIMIZE_SIZE__
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fixunssfsi
+CM0_FUNC_END aeabi_f2uiz
+
+
+// long long __aeabi_f2lz(float)
+// Converts a float in $r0 to a 64 bit integer in $r1:$r0, rounding toward 0.
+// Values out of range are forced to either INT64_MAX or INT64_MIN.
+// NAN becomes zero.
+.section .text.libgcc.f2lz,"x"
+CM0_FUNC_START aeabi_f2lz
+FUNC_ALIAS fixsfdi aeabi_f2lz
+    CFI_START_FUNCTION
+
+        movs    r1,     #1
+
+    LSYM(__real_f2lz):
+        // Split the sign of the result from the mantissa/exponent field.
+        // Handle +/-0 specially to avoid spurious exceptions.
+        asrs    r3,     r0,     #31
+        lsls    r0,     #1
+        beq     LSYM(__f2lz_zero)
+
+        // If unsigned conversion of a negative value, also overflow.
+        // Specifically, is the LSB of $r1 clear when $r3 is equal to '-1'?
+        //
+        // $r3 (sign)   >=     $r2 (flag)
+        // 0xFFFFFFFF   false   0x00000000
+        // 0x00000000   true    0x00000000
+        // 0xFFFFFFFF   true    0x80000000
+        // 0x00000000   true    0x80000000
+        //
+        // (NOTE: This test will also trap -0.0f, unless handled earlier.)
+/****/  lsls    r2,     r1,     #31
+        cmp     r3,     r2
+        blt     LSYM(__f2lz_overflow)
+
+        // Isolate the exponent.
+        lsrs    r2,     r0,     #24
+
+//   #if defined(TRAP_NANS) && TRAP_NANS
+//         // Test for NAN.
+//         // Otherwise, NAN will be converted like +/-INF.
+//         cmp     r2,     #255
+//         beq     LSYM(__f2lz_nan)
+//   #endif
+
+        // Calculate mantissa alignment. Given the implicit '1' in bit[31]:
+        //  * An exponent less than 127 will automatically flush to 0.
+        //  * An exponent of 127 will result in a shift of 63.
+        //  * An exponent of 128 will result in a shift of 62.
+        //  *  ...
+        //  * An exponent of 189 will result in a shift of 1.
+        //  * An exponent of 190 will result in no shift at all.
+        //  * An exponent larger than 190 will result in overflow
+        //     (189 in the case of signed integers).
+        rsbs    r2,     #0
+        adds    r2,     #190
+        // When the shift is less than minimum, the result will overflow.
+        // The only signed values to fail this test are INT64_MIN (and
+        //  INT_MIN via the size-optimized 32-bit path), but both are
+        //  returned correctly from the overflow branch.
+        cmp     r2,     r1
+        blt     LSYM(__f2lz_overflow)
+
+        // Extract the mantissa and restore the implicit '1'. Technically,
+        //  this is wrong for subnormals, but they flush to zero regardless.
+        lsls    r0,     #8
+        adds    r0,     #1
+        rors    r0,     r0
+
+        // Calculate the upper word.
+        // If the shift is greater than 32, gives an automatic '0'.
+/**/    movs    r1,     r0
+/**/    lsrs    r1,     r2
+
+        // Reduce the shift for the lower word.
+        // If the original shift was less than 32, the result may be split
+        //  between the upper and lower words.
+/**/    subs    r2,     #32 // 18
+/**/    blt     LSYM(__f2lz_split)
+
+        // Shift is still positive, keep moving right.
+        lsrs    r0,     r2
+
+        // TODO: Remainder test.
+        // $r1 is technically free, as long as it's zero by the time
+        //  this is over.
+
+    LSYM(__f2lz_return):
+        // Two's complement negation, if the original was negative.
+        eors    r0,     r3
+/**/    eors    r1,     r3
+        subs    r0,     r3
+/**/    sbcs    r1,     r3
+        RETx    lr // 27 - 33
+
+    LSYM(__f2lz_split):
+        // Shift was negative, calculate the remainder
+        rsbs    r2,     #0
+        lsls    r0,     r2
+        b       LSYM(__f2lz_return)
+
+    LSYM(__f2lz_zero):
+        eors    r1,     r1
+        RETx    lr
+
+    LSYM(__f2lz_overflow):
+        // Positive unsigned integers (r3 == 0, r1 == 0), return 0xFFFFFFFF.
+        // Negative unsigned integers (r3 == -1, r1 == 0), return 0x00000000.
+        // Positive signed integers (r3 == 0, r1 == 1), return 0x7FFFFFFF.
+        // Negative signed integers (r3 == -1, r1 == 1), return 0x80000000.
+        // TODO: FE_INVALID exception, (but not for -2^63).
+        mvns    r0,     r3
+
+        // For 32-bit signed results, flip the MSB of the low word
+        //  (producing INT_MAX or INT_MIN).
+/***/   lsls    r2,     r1,     #26
+        lsls    r1,     #31
+/***/   ands    r2,     r1
+/***/   eors    r0,     r2
+
+//    LSYM(__f2lz_zero):
+        eors    r1,     r0
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fixsfdi
+CM0_FUNC_END aeabi_f2lz
+
+
+// unsigned long long __aeabi_f2ulz(float)
+// Converts a float in $r0 to a 64 bit integer in $r1:$r0, rounding toward 0.
+// Values out of range are forced to UINT64_MAX.
+// Negative values and NAN all become zero.
+.section .text.libgcc.f2ulz,"x"
+CM0_FUNC_START aeabi_f2ulz
+FUNC_ALIAS fixunssfdi aeabi_f2ulz
+    CFI_START_FUNCTION
+
+        eors    r1,     r1
+        b       LSYM(__real_f2lz)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fixunssfdi
+CM0_FUNC_END aeabi_f2ulz
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/ffloat.S libgcc/config/arm/cm0/ffloat.S
--- libgcc/config/arm/cm0/ffloat.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/ffloat.S	2020-11-10 21:33:20.981886999 -0800
@@ -0,0 +1,96 @@
+/* ffloat.S: Cortex M0 optimized int->float conversion
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+// float __aeabi_i2f(int)
+// Converts a signed integer in $r0 to float.
+.section .text.libgcc.il2f,"x"
+CM0_FUNC_START aeabi_i2f
+FUNC_ALIAS floatsisf aeabi_i2f
+    CFI_START_FUNCTION
+
+        // Sign extension to long long.
+        asrs    r1,     r0,     #31
+
+// float __aeabi_l2f(long long)
+// Converts a signed 64-bit integer in $r1:$r0 to a float in $r0.
+CM0_FUNC_START aeabi_l2f
+FUNC_ALIAS floatdisf aeabi_l2f
+
+        // Save the sign.
+        asrs    r3,     r1,     #31
+
+        // Absolute value of the input.
+        eors    r0,     r3
+        eors    r1,     r3
+        subs    r0,     r3
+        sbcs    r1,     r3
+
+        b       LSYM(__internal_uil2f) // 8, 9
+
+    CFI_END_FUNCTION
+CM0_FUNC_END floatdisf
+CM0_FUNC_END aeabi_l2f
+CM0_FUNC_END floatsisf
+CM0_FUNC_END aeabi_i2f
+
+
+// float __aeabi_ui2f(unsigned)
+// Converts an unsigned integer in $r0 to float.
+.section .text.libgcc.uil2f,"x"
+CM0_FUNC_START aeabi_ui2f
+FUNC_ALIAS floatunsisf aeabi_ui2f
+    CFI_START_FUNCTION
+
+        // Convert to unsigned long long with upper bits of 0.
+        eors    r1,     r1
+
+// float __aeabi_ul2f(unsigned long long)
+// Converts an unsigned 64-bit integer in $r1:$r0 to a float in $r0.
+CM0_FUNC_START aeabi_ul2f
+FUNC_ALIAS floatundisf aeabi_ul2f
+
+        // Sign is always positive.
+        eors    r3,     r3
+
+    LSYM(__internal_uil2f):
+        // Default exponent, relative to bit[30] of $r1.
+        movs    r2,     #(189)
+
+        // Format the sign.
+        lsls    r3,     #31
+        mov     ip,     r3
+
+        push    { rT, lr }
+        b       SYM(__fp_assemble) // { 10, 11, 18, 19 } + 30-227
+
+    CFI_END_FUNCTION
+CM0_FUNC_END floatundisf
+CM0_FUNC_END aeabi_ul2f
+CM0_FUNC_END floatunsisf
+CM0_FUNC_END aeabi_ui2f
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fmul.S libgcc/config/arm/cm0/fmul.S
--- libgcc/config/arm/cm0/fmul.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fmul.S	2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,214 @@
+/* fmul.S: Cortex M0 optimized 32-bit float multiplication
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+// float __aeabi_fmul(float, float)
+// Returns $r0 after multiplication by $r1.
+.section .text.libgcc.fmul,"x"
+CM0_FUNC_START aeabi_fmul
+FUNC_ALIAS mulsf3 aeabi_fmul
+    CFI_START_FUNCTION
+
+        // Standard registers, compatible with exception handling.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save the sign of the result.
+        movs    rT,     r1
+        eors    rT,     r0
+        lsrs    rT,     #31
+        lsls    rT,     #31
+        mov     ip,     rT
+
+        // Set up INF for comparison.
+        movs    rT,     #255
+        lsls    rT,     #24
+
+        // Check for multiplication by zero.
+        lsls    r2,     r0,     #1
+        beq     LSYM(__fmul_zero1)
+
+        lsls    r3,     r1,     #1
+        beq     LSYM(__fmul_zero2)
+
+        // Check for INF/NAN.
+        cmp     r3,     rT
+        bhs     LSYM(__fmul_special2)
+
+        cmp     r2,     rT
+        bhs     LSYM(__fmul_special1)
+
+        // Because neither operand is INF/NAN, the result will be finite.
+        // It is now safe to modify the original operand registers.
+        lsls    r0,     #9
+
+        // Isolate the first exponent.  When normal, add back the implicit '1'.
+        // The result is always aligned with the MSB in bit [31].
+        // Subnormal mantissas remain effectively multiplied by 2x relative to
+        //  normals, but this works because the weight of a subnormal is -126.
+        lsrs    r2,     #24
+        beq     LSYM(__fmul_normalize2)
+        adds    r0,     #1
+        rors    r0,     r0
+
+    LSYM(__fmul_normalize2):
+        // IMPORTANT: exp10i() jumps in here!
+        // Repeat for the mantissa of the second operand.
+        // Short-circuit when the mantissa is 1.0, as the
+        //  first mantissa is already prepared in $r0
+        lsls    r1,     #9
+
+        // When normal, add back the implicit '1'.
+        lsrs    r3,     #24
+        beq     LSYM(__fmul_go)
+        adds    r1,     #1
+        rors    r1,     r1
+
+    LSYM(__fmul_go):
+        // Calculate the final exponent, relative to bit [30].
+        adds    rT,     r2,     r3
+        subs    rT,     #127 // 30
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Short-circuit on multiplication by powers of 2.
+        lsls    r3,     r0,     #1
+        beq     LSYM(__fmul_simple1)
+
+        lsls    r3,     r1,     #1
+        beq     LSYM(__fmul_simple2)
+  #endif
+
+        // Save $ip across the call.
+        // (Alternatively, could push/pop a separate register,
+        //  but the four instructions here are equally fast
+        //  without imposing on the stack.)
+        add     rT,     ip
+
+        // 32x32 unsigned multiplication, 64 bit result.
+        bl      SYM(__umulsidi3) __PLT__ // +22
+
+        // Separate the saved exponent and sign.
+        sxth    r2,     rT
+        subs    rT,     r2
+        mov     ip,     rT
+
+        b       SYM(__fp_assemble) // 62
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+    LSYM(__fmul_simple2):
+        // Move the high bits of the result to $r1.
+        movs    r1,     r0
+
+    LSYM(__fmul_simple1):
+        // Clear the remainder.
+        eors    r0,     r0
+
+        // Adjust mantissa to match the exponent, relative to bit[30].
+        subs    r2,     rT,     #1
+        b       SYM(__fp_assemble) // 42
+  #endif
+
+    LSYM(__fmul_zero1):
+        // $r0 was equal to 0, set up to check $r1 for INF/NAN.
+        lsls    r2,     r1,     #1
+
+    LSYM(__fmul_zero2):
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(INFINITY_TIMES_ZERO)
+      #endif
+
+        // Check the non-zero operand for INF/NAN.
+        // If NAN, it should be returned.
+        // If INF, the result should be NAN.
+        // Otherwise, the result will be +/-0.
+        cmp     r2,     rT
+        beq     SYM(__fp_exception)
+
+        // If the second operand is finite, the result is 0.
+        blo     SYM(__fp_zero)
+
+      #if defined(STRICT_NANS) && STRICT_NANS
+        // Restore values that got mixed in zero testing, then go back
+        //  to sort out which one is the NAN.
+        lsls    r3,     r1,     #1
+        lsls    r2,     r0,     #1
+      #elif defined(TRAP_NANS) && TRAP_NANS
+        // Return NAN with the sign bit cleared.
+        lsrs    r0,     r2,     #1
+        b       SYM(__fp_check_nan)
+      #else
+        lsrs    r0,     r2,     #1
+        // Return NAN with the sign bit cleared.
+        pop     { rT, pc }
+                .cfi_restore_state
+      #endif
+
+    LSYM(__fmul_special2):
+        // $r1 is INF/NAN.  In case of INF, check $r0 for NAN.
+        cmp     r2,     rT
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        // Force swap if $r0 is not NAN.
+        bls     LSYM(__fmul_swap)
+
+        // $r0 is NAN, keep if $r1 is INF
+        cmp     r3,     rT
+        beq     LSYM(__fmul_special1)
+
+        // Both are NAN, keep the smaller value (more likely to signal).
+        cmp     r2,     r3
+      #endif
+
+        // Prefer the NAN already in $r0.
+        //  (If TRAP_NANS, this is the smaller NAN).
+        bhi     LSYM(__fmul_special1)
+
+    LSYM(__fmul_swap):
+        movs    r0,     r1
+
+    LSYM(__fmul_special1):
+        // $r0 is either INF or NAN.  $r1 has already been examined.
+        // Flags are already set correctly.
+        lsls    r2,     r0,     #1
+        cmp     r2,     rT
+        beq     SYM(__fp_infinity)
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        b       SYM(__fp_check_nan)
+      #else
+        pop     { rT, pc }
+                .cfi_restore_state
+      #endif
+
+    CFI_END_FUNCTION
+CM0_FUNC_END mulsf3
+CM0_FUNC_END aeabi_fmul
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fneg.S libgcc/config/arm/cm0/fneg.S
--- libgcc/config/arm/cm0/fneg.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fneg.S	2020-11-10 21:33:20.985886867 -0800
@@ -0,0 +1,75 @@
+/* fneg.S: Cortex M0 optimized 32-bit float negation
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+// float __aeabi_fneg(float) [obsolete]
+// The argument and result are in $r0.
+// Uses $r1 and $r2 as scratch registers.
+.section .text.libgcc.fneg,"x"
+CM0_FUNC_START aeabi_fneg
+FUNC_ALIAS negsf2 aeabi_fneg
+    CFI_START_FUNCTION
+
+  #if (defined(STRICT_NANS) && STRICT_NANS) || \
+      (defined(TRAP_NANS) && TRAP_NANS)
+        // Check for NAN.
+        lsls    r1,     r0,     #1
+        movs    r2,     #255
+        lsls    r2,     #24
+        cmp     r1,     r2
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        blo     LSYM(__fneg_nan)
+      #else
+        blo     LSYM(__fneg_return)
+      #endif
+  #endif
+
+        // Flip the sign.
+        movs    r1,     #1
+        lsls    r1,     #31
+        eors    r0,     r1
+
+    LSYM(__fneg_return):
+        RETx    lr
+
+  #if defined(TRAP_NANS) && TRAP_NANS
+    LSYM(__fneg_nan):
+        // Set up registers for exception handling.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        b       SYM(__fp_check_nan)
+  #endif
+
+    CFI_END_FUNCTION
+CM0_FUNC_END negsf2
+CM0_FUNC_END aeabi_fneg
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fplib.h libgcc/config/arm/cm0/fplib.h
--- libgcc/config/arm/cm0/fplib.h	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fplib.h	2020-11-12 09:45:36.032217491 -0800
@@ -0,0 +1,80 @@
+/* fplib.h: Cortex M0 optimized float library header definitions
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifndef __CM0_FPLIB_H
+#define __CM0_FPLIB_H 
+
+/* Enable exception interrupt handler.  
+   Exception implementation is opportunistic, and not fully tested.  */
+#define TRAP_EXCEPTIONS (0)
+#define EXCEPTION_CODES (0)
+
+/* Perform extra checks to avoid modifying the sign bit of NANs */
+#define STRICT_NANS (0)
+
+/* Trap signaling NANs regardless of context. */
+#define TRAP_NANS (0)
+
+/* TODO: Define service numbers according to the handler requirements */ 
+#define SVC_TRAP_NAN (0)
+#define SVC_FP_EXCEPTION (0)
+#define SVC_DIVISION_BY_ZERO (0)
+
+/* Push extra registers when required for 64-bit stack alignment */
+#define DOUBLE_ALIGN_STACK (0)
+
+/* Define various exception codes.  These don't map to anything in particular */
+#define SUBTRACTED_INFINITY (20)
+#define INFINITY_TIMES_ZERO (21)
+#define DIVISION_0_BY_0 (22)
+#define DIVISION_INF_BY_INF (23)
+#define UNORDERED_COMPARISON (24)
+#define CAST_OVERFLOW (25)
+#define CAST_INEXACT (26)
+#define CAST_UNDEFINED (27)
+
+/* Exception control for quiet NANs.
+   If TRAP_NAN support is enabled, signaling NANs always raise exceptions. */
+.equ FCMP_RAISE_EXCEPTIONS, 16
+.equ FCMP_NO_EXCEPTIONS,    0
+
+/* These assignments are significant.  See implementation.
+   They must be shared for use in libm functions.  */
+.equ FCMP_3WAY, 1
+.equ FCMP_LT, 2
+.equ FCMP_EQ, 4
+.equ FCMP_GT, 8
+
+.equ FCMP_GE, (FCMP_EQ | FCMP_GT)
+.equ FCMP_LE, (FCMP_LT | FCMP_EQ)
+.equ FCMP_NE, (FCMP_LT | FCMP_GT)
+
+/* These flags affect the result of unordered comparisons.  See implementation.  */
+.equ FCMP_UN_THREE,     128
+.equ FCMP_UN_POSITIVE,  64
+.equ FCMP_UN_ZERO,      32
+.equ FCMP_UN_NEGATIVE,  0
+
+#endif /* __CM0_FPLIB_H */
diff -ruN libgcc/config/arm/cm0/futil.S libgcc/config/arm/cm0/futil.S
--- libgcc/config/arm/cm0/futil.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/futil.S	2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,407 @@
+/* futil.S: Cortex M0 optimized 32-bit common routines
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// Internal function, decomposes the unsigned float in $r2.
+// The exponent will be returned in $r2, the mantissa in $r3.
+// If subnormal, the mantissa will be normalized, so that
+//  the MSB of the mantissa (if any) will be aligned at bit[31].
+// Preserves $r0 and $r1, uses $rT as scratch space.
+.section .text.libgcc.normf,"x"
+CM0_FUNC_START fp_normalize2
+    CFI_START_FUNCTION
+
+        // Extract the mantissa.
+        lsls    r3,     r2,     #8
+
+        // Extract the exponent.
+        lsrs    r2,     #24
+        beq     SYM(__fp_lalign2)
+
+        // Restore the mantissa's implicit '1'.
+        adds    r3,     #1
+        rors    r3,     r3
+
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_normalize2
+
+
+// Internal function, aligns $r3 so the MSB is aligned in bit[31].
+// Simultaneously, subtracts the shift from the exponent in $r2.
+.section .text.libgcc.alignf,"x"
+CM0_FUNC_START fp_lalign2
+    CFI_START_FUNCTION
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Unroll the loop, similar to __clzsi2().
+        lsrs    rT,     r3,     #16
+        bne     LSYM(__align8)
+        subs    r2,     #16
+        lsls    r3,     #16
+
+    LSYM(__align8):
+        lsrs    rT,     r3,     #24
+        bne     LSYM(__align4)
+        subs    r2,     #8
+        lsls    r3,     #8
+
+    LSYM(__align4):
+        lsrs    rT,     r3,     #28
+        bne     LSYM(__align2)
+        subs    r2,     #4
+        lsls    r3,     #4 // 12
+  #endif
+
+    LSYM(__align2):
+        // Refresh the state of the N flag before entering the loop.
+        tst     r3,     r3
+
+    LSYM(__align_loop):
+        // Test before subtracting to compensate for the natural exponent.
+        // The largest subnormal should have an exponent of 0, not -1.
+        bmi     LSYM(__align_return)
+        subs    r2,     #1
+        lsls    r3,     #1
+        bne     LSYM(__align_loop) // 6 * 31
+
+        // Not just a subnormal... 0!  By design, this should never happen.
+        // All callers of this internal function filter 0 as a special case.
+        // Was there an uncontrolled jump from somewhere else?  Cosmic ray?
+        eors    r2,     r2
+
+      #ifdef DEBUG
+        bkpt    #0
+      #endif
+
+    LSYM(__align_return):
+        RETx    lr // 24 - 192 (size), 19 - 36
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_lalign2
+
+
+// Internal function to combine mantissa, exponent, and sign. No return.
+// Expects the unsigned result in $r1.  To avoid underflow (slower),
+//  the MSB should be in bits [31:29].
+// Expects any remainder bits of the unrounded result in $r0.
+// Expects the exponent in $r2.  The exponent must be relative to bit[30].
+// Expects the sign of the result (and only the sign) in $ip.
+// Returns a correctly rounded floating value in $r0.
+.section .text.libgcc.assemblef,"x"
+CM0_FUNC_START fp_assemble
+    CFI_START_FUNCTION
+
+        // Work around CFI branching limitations.
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+        // Examine the upper three bits [31:29] for underflow.
+        lsrs    r3,     r1,     #29
+        beq     LSYM(__fp_underflow)
+
+        // Convert bits [31:29] into an offset in the range of { 0, -1, -2 }.
+        // Right rotation aligns the MSB in bit [31], filling any LSBs with '0'.
+        lsrs    r3,     r1,     #1
+        mvns    r3,     r3
+        ands    r3,     r1
+        lsrs    r3,     #30
+        subs    r3,     #2
+        rors    r1,     r3
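+        // (Rotation counts of -1 and -2 work because only the low byte of
+        //  the count is used, modulo 32: 0xFF acts as ROR #31, i.e. a
+        //  one-bit left rotation.)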
+
+        // Update the exponent, assuming the final result will be normal.
+        // The new exponent is 1 less than actual, to compensate for the
+        //  eventual addition of the implicit '1' in the result.
+        // If the final exponent becomes negative, proceed directly to gradual
+        //  underflow, without bothering to search for the MSB.
+        adds    r2,     r3
+
+CM0_FUNC_START fp_assemble2
+        bmi     LSYM(__fp_subnormal)
+
+    LSYM(__fp_normal):
+        // Check for overflow (remember the implicit '1' to be added later).
+        cmp     r2,     #254
+        bge     SYM(__fp_overflow) // +13 underflow
+
+        // Save LSBs for the remainder. Position doesn't matter any more,
+        //  these are just tiebreakers for round-to-even.
+        lsls    rT,     r1,     #25
+
+        // Align the final result.
+        lsrs    r1,     #8
+
+    LSYM(__fp_round):
+        // If the carry bit is '0', always round down.
+        bcc     LSYM(__fp_return)
+
+        // The carry bit is '1'.  Round to nearest, ties to even.
+        // If either the saved remainder bits [6:0], the additional remainder
+        //  bits in $r1, or the final LSB is '1', round up.
+        lsls    r3,     r1,     #31
+        orrs    r3,     rT
+        orrs    r3,     r0
+        beq     LSYM(__fp_return)
+
+        // If rounding up overflows the result to 2.0, the result
+        //  is still correct, up to and including INF.
+        adds    r1,     #1
+
+    LSYM(__fp_return):
+        // Combine the mantissa and the exponent.
+        lsls    r2,     #23
+        adds    r0,     r1,     r2
+
+        // Combine with the saved sign.
+        // End of library call, return to user.
+        add     r0,     ip
+
+  #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+        // TODO: Underflow/inexact reporting IFF remainder
+  #endif
+
+        pop     { rT, pc } // +30 (typical)
+                .cfi_restore_state
+
+    LSYM(__fp_underflow):
+        // Set up to align the mantissa.
+        movs    r3,     r1 // 5
+        bne     LSYM(__fp_underflow2)
+
+        // MSB wasn't in the upper 32 bits, check the remainder.
+        // If the remainder is also zero, the result is +/-0.
+        movs    r3,     r0
+        beq     SYM(__fp_zero)
+
+        eors    r0,     r0
+        subs    r2,     #32
+
+    LSYM(__fp_underflow2):
+        // Save the pre-alignment exponent to align the remainder later.
+        movs    r1,     r2 // 9 - 11
+
+        // Align the mantissa with the MSB in bit[31].
+        bl      SYM(__fp_lalign2) // 37 - 207 (size), 32 - 51
+
+        // Calculate the actual remainder shift.
+        subs    rT,     r1,     r2
+
+        // Align the lower bits of the remainder.
+        movs    r1,     r0
+        lsls    r0,     rT
+
+        // Combine the upper bits of the remainder with the aligned value.
+        rsbs    rT,     #0
+        adds    rT,     #32
+        lsrs    r1,     rT
+        adds    r1,     r3
+
+        // The MSB is now aligned at bit[31] of $r1.
+        // If the net exponent is still positive, the result will be normal.
+        // Because this function is used by fmul(), there is a possibility
+        //  that the value is still wider than 24 bits; always round.
+        tst     r2,     r2
+        bpl     LSYM(__fp_normal)
+
+    LSYM(__fp_subnormal):
+        // The MSB is aligned at bit[31], with a net negative exponent.
+        // The mantissa will need to be shifted right by the absolute value of
+        //  the exponent, plus the normal shift of 8.
+
+        // If the negative shift is smaller than -25, there is no result,
+        //  no rounding, no anything.  Return signed zero.
+        // (Otherwise, the shift for result and remainder may wrap.)
+        adds    r2,     #25
+        bmi     SYM(__fp_inexact_zero)
+
+        // Save the extra bits for the remainder.
+        movs    rT,     r1
+        lsls    rT,     r2
+
+        // Shift the mantissa to create a subnormal.
+        // Just like normal, round to nearest, ties to even.
+        movs    r3,     #33
+        subs    r3,     r2
+        eors    r2,     r2
+
+        // This shift must be last, leaving the shifted LSB in the C flag.
+        lsrs    r1,     r3
+        b       LSYM(__fp_round)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_assemble2
+CM0_FUNC_END fp_assemble
+
+
+// Recreate INF with the appropriate sign.  No return.
+// Expects the sign of the result in $ip.
+.section .text.libgcc.infinityf,"x"
+CM0_FUNC_START fp_overflow
+    CFI_START_FUNCTION
+
+  #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+        // TODO: inexact/overflow exception
+  #endif
+
+CM0_FUNC_START fp_infinity
+
+        // Work around CFI branching limitations.
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+        movs    r0,     #255
+        lsls    r0,     #23
+        add     r0,     ip
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_infinity
+CM0_FUNC_END fp_overflow
+
+
+// Recreate 0 with the appropriate sign.  No return.
+// Expects the sign of the result in $ip.
+.section .text.libgcc.zerof,"x"
+CM0_FUNC_START fp_inexact_zero
+
+  #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+        // TODO: inexact/underflow exception
+  #endif
+
+CM0_FUNC_START fp_zero
+    CFI_START_FUNCTION
+
+        // Work around CFI branching limitations.
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+        // Return 0 with the correct sign.
+        mov     r0,     ip
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_zero
+CM0_FUNC_END fp_inexact_zero
+
+
+// Internal function to detect signaling NANs.  No return.
+// Uses $r2 as scratch space.
+.section .text.libgcc.checkf,"x"
+CM0_FUNC_START fp_check_nan2
+    CFI_START_FUNCTION
+
+        // Work around CFI branching limitations.
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+
+CM0_FUNC_START fp_check_nan
+
+        // Check for quiet NAN.
+        lsrs    r2,     r0,     #23
+        bcs     LSYM(__quiet_nan)
+
+        // Raise exception.  Preserves both $r0 and $r1.
+        svc     #(SVC_TRAP_NAN)
+
+        // Quiet the resulting NAN.
+        movs    r2,     #1
+        lsls    r2,     #22
+        orrs    r0,     r2
+
+    LSYM(__quiet_nan):
+        // End of library call, return to user.
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_check_nan
+CM0_FUNC_END fp_check_nan2
+
+
+// Internal function to report floating point exceptions.  No return.
+// Expects the original argument(s) in $r0 (possibly also $r1).
+// Expects a code that describes the exception in $r3.
+.section .text.libgcc.exceptf,"x"
+CM0_FUNC_START fp_exception
+    CFI_START_FUNCTION
+
+        // Work around CFI branching limitations.
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+        // Create a quiet NAN.
+        movs    r2,     #255
+        lsls    r2,     #1
+        adds    r2,     #1
+        lsls    r2,     #22
+
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        // Annotate the exception type in the NAN field.
+        // Make sure that the exception code is in the valid region.
+        lsls    rT,     r3,     #13
+        orrs    r2,     rT
+      #endif
+
+// Exception handler that expects the result already in $r2,
+//  typically when the result is not going to be NAN.
+CM0_FUNC_START fp_exception2
+
+      #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+        svc     #(SVC_FP_EXCEPTION)
+      #endif
+
+        // TODO: Save exception flags in a static variable.
+
+        // Set up the result, now that the argument isn't required any more.
+        movs    r0,     r2
+
+        // HACK: for sincosf(), with 2 parameters to return.
+        movs    r1,     r2
+
+        // End of library call, return to user.
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_exception2
+CM0_FUNC_END fp_exception
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/lcmp.S libgcc/config/arm/cm0/lcmp.S
--- libgcc/config/arm/cm0/lcmp.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/lcmp.S	2020-11-10 21:33:20.985886867 -0800
@@ -0,0 +1,96 @@
+/* lcmp.S: Cortex M0 optimized 64-bit integer comparison
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// int __aeabi_lcmp(long long, long long)
+// Compares the 64 bit signed values in $r1:$r0 and $r3:$r2.
+// Returns { -1, 0, +1 } in $r0 for ordering { <, ==, > }, respectively.
+.section .text.libgcc.lcmp,"x"
+CM0_FUNC_START aeabi_lcmp
+    CFI_START_FUNCTION
+
+        // Calculate the difference $r1:$r0 - $r3:$r2.
+        subs    r0,     r2
+        sbcs    r1,     r3
+
+        // With $r2 free, create a reference value without affecting flags.
+        mov     r2,     r3
+
+        // Finish the comparison.
+        blt     LSYM(__lcmp_lt)
+
+        // The reference difference ($r2 - $r3) will be +2 iff the first
+        //  argument is larger, otherwise $r2 remains equal to $r3.
+        adds    r2,     #2
+
+    LSYM(__lcmp_lt):
+        // Check for equality (all 64 bits).
+        orrs    r0,     r1
+        beq     LSYM(__lcmp_return)
+
+        // Convert the relative difference to an absolute value +/-1.
+        subs    r0,     r2,     r3
+        subs    r0,     #1
+
+    LSYM(__lcmp_return):
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_lcmp
+
+
+// int __aeabi_ulcmp(unsigned long long, unsigned long long)
+// Compares the 64 bit unsigned values in $r1:$r0 and $r3:$r2.
+// Returns { -1, 0, +1 } in $r0 for ordering { <, ==, > }, respectively.
+.section .text.libgcc.ulcmp,"x"
+CM0_FUNC_START aeabi_ulcmp
+    CFI_START_FUNCTION
+
+        // Calculate the 'C' flag.
+        subs    r0,     r2
+        sbcs    r1,     r3
+
+        // $r2 will contain -1 if the first value is smaller,
+        //  0 if the first value is larger or equal.
+        sbcs    r2,     r2
+
+        // Check for equality (all 64 bits).
+        orrs    r0,     r1
+        beq     LSYM(__ulcmp_return)
+
+        // $r0 should contain +1 or -1
+        movs    r0,     #1
+        orrs    r0,     r2
+
+    LSYM(__ulcmp_return):
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_ulcmp
+
+
+#endif // __BUILD_CM0_FPLIB
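
A minimal C reference for the two comparison routines in this file (behavioral sketch only; it ignores the flag tricks used above and is not part of the patch):

    int ref_lcmp(long long a, long long b)
    {
        return (a < b) ? -1 : (a > b) ? 1 : 0;
    }

    int ref_ulcmp(unsigned long long a, unsigned long long b)
    {
        return (a < b) ? -1 : (a > b) ? 1 : 0;
    }
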
diff -ruN libgcc/config/arm/cm0/ldiv.S libgcc/config/arm/cm0/ldiv.S
--- libgcc/config/arm/cm0/ldiv.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/ldiv.S	2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,413 @@
+/* ldiv.S: Cortex M0 optimized 64-bit integer division
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// long long __aeabi_ldiv0(long long)
+// Helper function for division by 0.
+.section .text.libgcc.ldiv0,"x"
+CM0_FUNC_START aeabi_ldiv0
+    CFI_START_FUNCTION
+
+      #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+        svc     #(SVC_DIVISION_BY_ZERO)
+      #endif
+
+        // Return { 0, numerator } for quotient and remainder.
+        movs    r2,     r0
+        movs    r3,     r1
+        eors    r0,     r0
+        eors    r1,     r1
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_ldiv0
+
+
+// long long __aeabi_ldiv(long long, long long)
+// lldiv_return __aeabi_ldivmod(long long, long long)
+// Returns signed $r1:$r0 after division by $r3:$r2.
+// Also returns the signed remainder in $r3:$r2.
+.section .text.libgcc.ldiv,"x"
+CM0_FUNC_START aeabi_ldivmod
+FUNC_ALIAS aeabi_ldiv aeabi_ldivmod
+FUNC_ALIAS divdi3 aeabi_ldivmod
+    CFI_START_FUNCTION
+
+        // Test the denominator for zero before pushing registers.
+        cmp     r2,     #0
+        bne     LSYM(__ldivmod_valid)
+
+        cmp     r3,     #0
+        beq     SYM(__aeabi_ldiv0)
+
+    LSYM(__ldivmod_valid):
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        push    { rP, rQ, rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 16
+                .cfi_rel_offset rP, 0
+                .cfi_rel_offset rQ, 4
+                .cfi_rel_offset rT, 8
+                .cfi_rel_offset lr, 12
+      #else
+        push    { rP, rQ, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 12
+                .cfi_rel_offset rP, 0
+                .cfi_rel_offset rQ, 4
+                .cfi_rel_offset lr, 8
+      #endif
+
+        // Absolute value of the numerator.
+        asrs    rP,     r1,     #31
+        eors    r0,     rP
+        eors    r1,     rP
+        subs    r0,     rP
+        sbcs    r1,     rP
+
+        // Absolute value of the denominator.
+        asrs    rQ,     r3,     #31
+        eors    r2,     rQ
+        eors    r3,     rQ
+        subs    r2,     rQ
+        sbcs    r3,     rQ
+
+        // Keep the XOR of signs for the quotient.
+        eors    rQ,     rP
+
+        // Handle division as unsigned.
+        bl      LSYM(__internal_uldivmod)
+
+        // Set the sign of the quotient.
+        eors    r0,     rQ
+        eors    r1,     rQ
+        subs    r0,     rQ
+        sbcs    r1,     rQ
+
+        // Set the sign of the remainder.
+        eors    r2,     rP
+        eors    r3,     rP
+        subs    r2,     rP
+        sbcs    r3,     rP
+
+    LSYM(__ldivmod_return):
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        pop     { rP, rQ, rT, pc }
+                .cfi_restore_state
+      #else
+        pop     { rP, rQ, pc }
+                .cfi_restore_state
+      #endif
+
+    CFI_END_FUNCTION
+CM0_FUNC_END divdi3
+CM0_FUNC_END aeabi_ldiv
+CM0_FUNC_END aeabi_ldivmod
+
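
The signed wrapper above reduces to unsigned division plus a sign fix-up on both results.  In C terms (sketch only, not part of the patch):

    /* With m == -1 when the value is negative (else 0), (x ^ m) - m == |x|;
       the same identity re-applies a sign mask afterwards. */
    static long long ref_apply_mask(long long x, long long m)
    {
        return (x ^ m) - m;
    }

The quotient takes the XOR of the two input signs (the mask kept in $rQ), while the remainder keeps the numerator's sign (the mask kept in $rP).
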
+
+// unsigned long long __aeabi_uldiv(unsigned long long, unsigned long long)
+// ulldiv_return __aeabi_uldivmod(unsigned long long, unsigned long long)
+// Returns unsigned $r1:$r0 after division by $r3:$r2.
+// Also returns the remainder in $r3:$r2.
+.section .text.libgcc.uldiv,"x"
+CM0_FUNC_START aeabi_uldivmod
+FUNC_ALIAS aeabi_uldiv aeabi_uldivmod
+FUNC_ALIAS udivdi3 aeabi_uldivmod
+    CFI_START_FUNCTION
+
+        // Test the denominator for zero before changing the stack.
+        cmp     r3,     #0
+        bne     LSYM(__internal_uldivmod)
+
+        cmp     r2,     #0
+        beq     SYM(__aeabi_ldiv0)
+
+  #if defined(OPTIMIZE_SPEED) && OPTIMIZE_SPEED
+        // MAYBE: Optimize division by a power of 2
+  #endif
+
+    LSYM(__internal_uldivmod):
+        push    { rP, rQ, rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 16
+                .cfi_rel_offset rP, 0
+                .cfi_rel_offset rQ, 4
+                .cfi_rel_offset rT, 8
+                .cfi_rel_offset lr, 12
+
+        // Set up denominator shift, assuming a single width result.
+        movs    rP,     #32
+
+        // If the upper word of the denominator is 0 ...
+        tst     r3,     r3
+        bne     LSYM(__uldivmod_setup) // 12
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // ... and the upper word of the numerator is also 0,
+        //  single width division will be at least twice as fast.
+        tst     r1,     r1
+        beq     LSYM(__uldivmod_small)
+  #endif
+
+        // ... and the lower word of the denominator is less than or equal
+        //     to the upper word of the numerator ...
+        cmp     r1,     r2
+        blo     LSYM(__uldivmod_setup)
+
+        //  ... then the result will be double width, at least 33 bits.
+        // Set up a flag in $rP to seed the shift for the second word.
+        movs    r3,     r2
+        eors    r2,     r2
+        adds    rP,     #64
+
+    LSYM(__uldivmod_setup):
+        // Pre division: Shift the denominator as far as possible left
+        //  without making it larger than the numerator.
+        // Since search is destructive, first save a copy of the numerator.
+        mov     ip,     r0
+        mov     lr,     r1
+
+        // Set up binary search.
+        movs    rQ,     #16
+        eors    rT,     rT // 21
+
+    LSYM(__uldivmod_align):
+        // Maintain a secondary shift $rT = 32 - $rQ, making the overlapping
+        //  shifts between low and high words easier to construct.
+        adds    rT,     rQ
+
+        // Prefer dividing the numerator to multiplying the denominator
+        //  (multiplying the denominator may result in overflow).
+        lsrs    r1,     rQ
+
+        // Compare the shifted numerator against the high word of the denominator.
+        cmp     r1,     r3
+        blo     LSYM(__uldivmod_skip)
+        bhi     LSYM(__uldivmod_shift)
+
+        // If the high bits are equal, construct the low bits for checking.
+        mov     r1,     lr
+        lsls    r1,     rT
+
+        lsrs    r0,     rQ
+        orrs    r1,     r0
+
+        cmp     r1,     r2
+        blo     LSYM(__uldivmod_skip)
+
+    LSYM(__uldivmod_shift):
+        // Scale the denominator and the result together.
+        subs    rP,     rQ
+
+        // If the reduced numerator is still larger than or equal to the
+        //  denominator, it is safe to shift the denominator left.
+        movs    r1,     r2
+        lsrs    r1,     rT
+        lsls    r3,     rQ
+
+        lsls    r2,     rQ
+        orrs    r3,     r1
+
+    LSYM(__uldivmod_skip):
+        // Restore the numerator.
+        mov     r0,     ip
+        mov     r1,     lr
+
+        // Iterate until the shift goes to 0.
+        lsrs    rQ,     #1
+        bne     LSYM(__uldivmod_align) // (12 to 23) * 5
+
+        // Initialize the result (zero).
+        mov     ip,     rQ
+
+        // HACK: Compensate for the first word test.
+        lsls    rP,     #6 // 2, 140
+
+    LSYM(__uldivmod_word2):
+        // Is there another word?
+        lsrs    rP,     #6
+        beq     LSYM(__uldivmod_return) // +4
+
+        // Shift the calculated result by 1 word.
+        mov     lr,     ip
+        mov     ip,     rQ
+
+        // Set up the MSB of the next word of the quotient
+        movs    rQ,     #1
+        rors    rQ,     rP
+        b     LSYM(__uldivmod_entry) // 9 * 2, 149
+
+    LSYM(__uldivmod_loop):
+        // Divide the denominator by 2.
+        // It could be slightly faster to multiply the numerator,
+        //  but that would require shifting the remainder at the end.
+        lsls    rT,     r3,     #31
+        lsrs    r3,     #1
+        lsrs    r2,     #1
+        adds    r2,     rT
+
+        // Step to the next bit of the result.
+        lsrs    rQ,     #1
+        beq     LSYM(__uldivmod_word2) // (19 * 32 + 2) * 2, 140+9+610+9+610+4+12
+
+    LSYM(__uldivmod_entry):
+        // Test if the denominator is smaller, high word first.
+        cmp     r1,     r3
+        blo     LSYM(__uldivmod_loop)
+        bhi     LSYM(__uldivmod_quotient)
+
+        cmp     r0,     r2
+        blo     LSYM(__uldivmod_loop)
+
+    LSYM(__uldivmod_quotient):
+        // Smaller denominator: the next bit of the quotient will be set.
+        add     ip,     rQ
+
+        // Subtract the denominator from the remainder.
+        // If the new remainder goes to 0, exit early.
+        subs    r0,     r2
+        sbcs    r1,     r3
+        bne     LSYM(__uldivmod_loop)
+
+        tst     r0,     r0
+        bne     LSYM(__uldivmod_loop)
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Check whether there's still a second word to calculate.
+        lsrs    rP,     #6
+        beq     LSYM(__uldivmod_return)
+
+        // If so, shift the result left by a full word.
+        mov     lr,     ip
+        mov     ip,     r1 // zero
+  #else
+        eors    rQ,     rQ
+        b       LSYM(__uldivmod_word2)
+  #endif
+
+    LSYM(__uldivmod_return):
+        // Move the remainder to the second half of the result.
+        movs    r2,     r0
+        movs    r3,     r1
+
+        // Move the quotient to the first half of the result.
+        mov     r0,     ip
+        mov     r1,     lr
+
+        pop     { rP, rQ, rT, pc } // + 12
+                .cfi_restore_state
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+    LSYM(__uldivmod_small):
+        // Arrange arguments for 32-bit division.
+        movs    r1,     r2
+        bl      LSYM(__internal_uidivmod) // 20
+
+        // Extend quotient and remainder to 64 bits, unsigned.
+        movs    r2,     r1
+        eors    r1,     r1
+        eors    r3,     r3
+        pop     { rP, rQ, rT, pc } // 31
+  #endif
+
+    CFI_END_FUNCTION
+CM0_FUNC_END udivdi3
+CM0_FUNC_END aeabi_uldiv
+CM0_FUNC_END aeabi_uldivmod
+
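
For reference, a compact C model of the quotient/remainder computed by the routine above.  It is plain bit-serial restoring division and does not mirror the pre-alignment search or the word-splitting optimizations, so it is a behavioral sketch only (assumes d != 0):

    typedef struct { unsigned long long q, r; } uqr64;

    static uqr64 ref_uldivmod(unsigned long long n, unsigned long long d)
    {
        uqr64 out = { 0, 0 };
        for (int i = 63; i >= 0; i--)
        {
            out.r = (out.r << 1) | ((n >> i) & 1);  /* bring down the next bit */
            if (out.r >= d)
            {
                out.r -= d;                         /* subtract the denominator */
                out.q |= 1ull << i;                 /* set this quotient bit */
            }
        }
        return out;
    }
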
+
+#if 0
+
+    LSYM(__internal_uldivmod):
+        push    { r0 - rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 32
+                .cfi_rel_offset r0, 0
+                .cfi_rel_offset r1, 4
+                .cfi_rel_offset r2, 8
+                .cfi_rel_offset r3, 12
+                .cfi_rel_offset rP, 16
+                .cfi_rel_offset rQ, 20
+                .cfi_rel_offset rT, 24
+                .cfi_rel_offset lr, 28
+
+        // Count leading zeros of the numerator
+        bl      SYM(__clzdi2) // 55
+        mov     rP,     r0
+
+        // Load denominator
+        add     r0,     sp,     #8
+        ldm     r0,     { r0, r1 }
+
+        // Count leading zeros of the denominator.
+        bl      SYM(__clzdi2) // 55
+
+        // If the numerator has more zeros than the denominator,
+        //  the result is { 0, numerator }
+        subs    rP,     r0,     rP
+        bhi     LSYM(__uldivmod_simple)
+
+        // Reload the denominator
+        add     r0,     sp,     #8
+        ldm     r0,     { r0, r1 }
+
+        // Shift the denominator
+        movs    r2,     rP
+        bl      SYM(__aeabi_llsl) // 14
+
+        // Reload the numerator as remainder.
+        pop     { r2, r3 }
+
+        // Discard the copy of the denominator on the stack.
+        add     sp,     #8
+
+        // Shift the first quotient bit into place
+
+        // Initialize the result.
+
+        // Main division loop.
+
+
+        // Copy the quotient to the result.
+        mov     r0,     ip
+        mov     r1,     lr
+
+        pop     { rP, rQ, rT, pc }
+                .cfi_restore_state
+
+
+
+    LSYM(__uldivmod_simple):
+        movs    r2,     r0
+        movs    r3,     r1
+        eors    r0,     r0
+        eors    r1,     r1
+
+#endif
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/lmul.S libgcc/config/arm/cm0/lmul.S
--- libgcc/config/arm/cm0/lmul.S	1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/lmul.S	2020-11-10 21:33:20.985886867 -0800
@@ -0,0 +1,294 @@
+/* lmul.S: Cortex M0 optimized 64-bit integer multiplication 
+
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// long long __aeabi_lmul(long long, long long)
+// Returns the least significant 64 bits of a 64 bit multiplication.
+// Expects the two multiplicands in $r1:$r0 and $r3:$r2.
+// Returns the product in $r1:$r0 (does not distinguish signed types).
+// Uses $r2, $r3 and $ip as scratch space.
+.section .text.libgcc.lmul,"x"
+CM0_FUNC_START aeabi_lmul
+FUNC_ALIAS muldi3 aeabi_lmul
+    CFI_START_FUNCTION
+
+        // $r1:$r0 = 0xDDDDCCCCBBBBAAAA
+        // $r3:$r2 = 0xZZZZYYYYXXXXWWWW
+
+        // The following operations that only affect the upper 64 bits
+        //  can be safely discarded:
+        //   DDDD * ZZZZ
+        //   DDDD * YYYY
+        //   DDDD * XXXX
+        //   CCCC * ZZZZ
+        //   CCCC * YYYY
+        //   BBBB * ZZZZ
+
+        // MAYBE: Test for multiply by ZERO on implementations with a 32-cycle
+        //  'muls' instruction, and skip over the operation in that case.
+
+    LSYM(__safe_muldi3):
+        // (0xDDDDCCCC * 0xXXXXWWWW), free $r1
+        muls    r1,     r2
+
+        // (0xZZZZYYYY * 0xBBBBAAAA), free $r3
+        muls    r3,     r0
+        add     r3,     r1
+
+        // Put the parameters in the correct form for umulsidi3().
+        movs    r1,     r2
+        b       LSYM(__internal_umulsidi3) // 7
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_lmul
+CM0_FUNC_END muldi3
+
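
Behaviorally, the routine above keeps only the low 64 bits of the full 128-bit product: the high*high product disappears entirely and the two cross products are retained only modulo 2^32.  A C sketch (illustration only, not part of the patch):

    static unsigned long long ref_muldi3(unsigned long long a, unsigned long long b)
    {
        unsigned int al = (unsigned int)a, ah = (unsigned int)(a >> 32);
        unsigned int bl = (unsigned int)b, bh = (unsigned int)(b >> 32);

        /* Cross terms wrap modulo 2^32, exactly like the two 'muls' above. */
        unsigned int high = ah * bl + al * bh;

        return (unsigned long long)al * bl + ((unsigned long long)high << 32);
    }
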
+// unsigned long long __umulsidi3(unsigned int, unsigned int)
+// Returns all 64 bits of a 32 bit multiplication.
+// Expects the two multiplicands in $r0 and $r1.
+// Returns the product in $r1:$r0.
+// Uses $r2, $r3 and $ip as scratch space.
+.section .text.libgcc.umulsidi3,"x"
+CM0_FUNC_START umulsidi3
+    CFI_START_FUNCTION
+
+        // 32x32 multiply with 64 bit result.
+        // Expand the multiply into 4 parts, since muls only returns 32 bits.
+        //         (a16h * b16h / 2^32)
+        //       + (a16h * b16l / 2^48) + (a16l * b16h / 2^48)
+        //       + (a16l * b16l / 2^64)
+
+        // MAYBE: Test for multiply by 0 on implementations with a 32-cycle
+        //  'muls' instruction, and skip over the operation in that case.
+
+    LSYM(__safe_umulsidi3):
+        eors    r3,     r3
+
+    LSYM(__internal_umulsidi3):
+        mov     ip,     r3
+
+        // a16h * b16h
+        lsrs    r2,     r0,     #16
+        lsrs    r3,     r1,     #16
+        muls    r2,     r3
+        add     ip,     r2
+
+        // a16l * b16h; save a16h first!
+        lsrs    r2,     r0,     #16
+        uxth    r0,     r0
+        muls    r3,     r0
+
+        // a16l * b16l
+        uxth    r1,     r1
+        muls    r0,     r1
+
+        // a16h * b16l
+        muls    r1,     r2
+
+        // Distribute intermediate results.
+        eors    r2,     r2
+        adds    r1,     r3
+        adcs    r2,     r2
+        lsls    r3,     r1,     #16
+        lsrs    r1,     #16
+        lsls    r2,     #16
+        adds    r0,     r3
+        adcs    r1,     r2
+
+        // Add in the remaining high bits.
+        add     r1,     ip
+        RETx    lr // 24
+
+    CFI_END_FUNCTION
+CM0_FUNC_END umulsidi3
+
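
The same decomposition in C, with one 16x16 partial product per 'muls' (behavioral sketch only, not part of the patch):

    static unsigned long long ref_umulsidi3(unsigned int a, unsigned int b)
    {
        unsigned long long ah = a >> 16, al = a & 0xFFFFu;
        unsigned long long bh = b >> 16, bl = b & 0xFFFFu;

        return ((ah * bh) << 32)
             + ((ah * bl + al * bh) << 16)
             + al * bl;
    }
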
+
+// long long __mulsidi3(int, int)
+// Returns all 64 bits of a 32 bit signed multiplication.
+// Expects the two multiplicands in $r0 and $r1.
+// Returns the product in $r1:$r0.
+// Uses $r3 and $rT as scratch space (plus the registers clobbered by __umulsidi3).
+.section .text.libgcc.mulsidi3,"x"
+CM0_FUNC_START mulsidi3
+    CFI_START_FUNCTION
+
+        // Push registers for the function call.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save signs of the arguments.
+        asrs    r3,     r0,     #31
+        asrs    rT,     r1,     #31
+
+        // Absolute value of the arguments.
+        eors    r0,     r3
+        eors    r1,     rT
+        subs    r0,     r3
+        subs    r1,     rT
+
+        // Save sign of the result.
+        eors    rT,     r3
+
+        bl      SYM(__umulsidi3) __PLT__ // 14+24
+
+        // Apply sign of the result.
+        eors    r0,     rT
+        eors    r1,     rT
+        subs    r0,     rT
+        sbcs    r1,     rT
+
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END mulsidi3
+
+
+// long long __aeabi_llsl(long long, int)
+// Logical shift left the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.libgcc.llsl,"x"
+CM0_FUNC_START aeabi_llsl
+FUNC_ALIAS ashldi3 aeabi_llsl
+    CFI_START_FUNCTION
+
+        // Save a copy for the remainder.
+        movs    r3,     r0
+
+        // Assume a simple shift.
+        lsls    r0,     r2
+        lsls    r1,     r2
+
+        // Test if the shift distance is larger than 1 word.
+        subs    r2,     #32
+        bhs     LSYM(__llsl_large)
+
+        // The remainder is opposite the main shift, (32 - x) bits.
+        rsbs    r2,     #0
+        lsrs    r3,     r2
+
+        // Cancel any remaining shift.
+        eors    r2,     r2
+
+    LSYM(__llsl_large):
+        // Apply any remaining shift
+        lsls    r3,     r2
+
+        // Merge remainder and result.
+        adds    r1,     r3
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END ashldi3
+CM0_FUNC_END aeabi_llsl
+
+
+// long long __aeabi_llsr(long long, int)
+// Logical shift right the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.libgcc.llsr,"x"
+CM0_FUNC_START aeabi_llsr
+FUNC_ALIAS lshrdi3 aeabi_llsr
+    CFI_START_FUNCTION
+
+        // Save a copy for the remainder.
+        movs    r3,     r1
+
+        // Assume a simple shift.
+        lsrs    r0,     r2
+        lsrs    r1,     r2
+
+        // Test if the shift distance is larger than 1 word.
+        subs    r2,     #32
+        bhs     LSYM(__llsr_large)
+
+        // The remainder is opposite the main shift, (32 - x) bits.
+        rsbs    r2,     #0
+        lsls    r3,     r2
+
+        // Cancel any remaining shift.
+        eors    r2,     r2
+
+    LSYM(__llsr_large):
+        // Apply any remaining shift
+        lsrs    r3,     r2
+
+        // Merge remainder and result.
+        adds    r0,     r3
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END lshrdi3
+CM0_FUNC_END aeabi_llsr
+
+
+// long long __aeabi_lasr(long long, int)
+// Arithmetic shift right the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.libgcc.lasr,"x"
+CM0_FUNC_START aeabi_lasr
+FUNC_ALIAS ashrdi3 aeabi_lasr
+    CFI_START_FUNCTION
+
+        // Save a copy for the remainder.
+        movs    r3,     r1
+
+        // Assume a simple shift.
+        lsrs    r0,     r2
+        asrs    r1,     r2
+
+        // Test if the shift distance is larger than 1 word.
+        subs    r2,     #32
+        bhs     LSYM(__lasr_large)
+
+        // The remainder is opposite the main shift, (32 - x) bits.
+        rsbs    r2,     #0
+        lsls    r3,     r2
+
+        // Cancel any remaining shift.
+        eors    r2,     r2
+
+    LSYM(__lasr_large):
+        // Apply any remaining shift
+        asrs    r3,     r2
+
+        // Merge remainder and result.
+        adds    r0,     r3
+        RETx    lr
+
+    CFI_END_FUNCTION
+CM0_FUNC_END ashrdi3
+CM0_FUNC_END aeabi_lasr
+
+
+#endif // __BUILD_CM0_FPLIB
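
All three 64-bit shifts in this file follow the same pattern: shift both words, merge the bits that cross the word boundary, and take a separate path once the count reaches 32.  A C sketch for the logical left shift (the right-shift variants mirror it; sketch only, not part of the patch):

    static unsigned long long ref_llsl(unsigned long long x, unsigned int n) /* n in 0..63 */
    {
        unsigned int lo = (unsigned int)x;
        unsigned int hi = (unsigned int)(x >> 32);

        if (n < 32)
        {
            hi = (hi << n) | (n ? lo >> (32 - n) : 0);  /* merge the carried bits */
            lo <<= n;
        }
        else
        {
            hi = lo << (n - 32);
            lo = 0;
        }
        return ((unsigned long long)hi << 32) | lo;
    }
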
diff -ruN libgcc/config/arm/lib1funcs.S libgcc/config/arm/lib1funcs.S
--- libgcc/config/arm/lib1funcs.S	2020-11-08 14:32:11.000000000 -0800
+++ libgcc/config/arm/lib1funcs.S	2020-11-12 10:13:44.383982884 -0800
@@ -1050,6 +1050,10 @@
 /* ------------------------------------------------------------------------ */
 /*		Start of the Real Functions				    */
 /* ------------------------------------------------------------------------ */
+
+/* Disable these functions for v6m in favor of the versions below */
+#ifndef NOT_ISA_TARGET_32BIT
+
 #ifdef L_udivsi3
 
 #if defined(__prefer_thumb__)
@@ -1507,6 +1511,8 @@
 	cfi_end	LSYM(Lend_div0)
 	FUNC_END div0
 #endif
+
+#endif /* NOT_ISA_TARGET_32BIT */
 	
 #endif /* L_dvmd_lnx */
 #ifdef L_clear_cache
@@ -1583,6 +1589,9 @@
    so for Reg value in (32...63) and (-1...-31) we will get zero (in the
    case of logical shifts) or the sign (for asr).  */
 
+/* Disable these functions for v6m in favor of the versions below */
+#ifndef NOT_ISA_TARGET_32BIT
+
 #ifdef __ARMEB__
 #define al	r1
 #define ah	r0
@@ -1820,6 +1829,8 @@
 #endif
 #endif /* L_clzdi2 */
 
+#endif /* NOT_ISA_TARGET_32BIT */
+
 #ifdef L_ctzsi2
 #ifdef NOT_ISA_TARGET_32BIT
 FUNC_START ctzsi2
@@ -2189,5 +2200,54 @@
 #include "bpabi.S"
 #else /* NOT_ISA_TARGET_32BIT */
 #include "bpabi-v6m.S"
+
+
+#include "cm0/fplib.h"
+
+/* Temp registers. */
+#define rP r4
+#define rQ r5
+#define rS r6
+#define rT r7
+
+.macro CM0_FUNC_START name
+.global SYM(__\name)
+.type SYM(__\name),function
+.thumb_func
+.align 1
+    SYM(__\name):
+.endm
+
+.macro CM0_FUNC_END name
+.size SYM(__\name), . - SYM(__\name)
+.endm
+
+.macro RETx x
+        bx      \x
+.endm
+
+/* Order the files so that cross-calls stay within the +/- 2KB range of 'b'.  */
+#define __BUILD_CM0_FPLIB
+
+#include "cm0/clz2.S"
+#include "cm0/lmul.S"
+#include "cm0/lcmp.S"
+#include "cm0/div.S"
+#include "cm0/ldiv.S"
+
+#include "cm0/fcmp.S"
+#include "cm0/fconv.S"
+#include "cm0/fneg.S"
+
+#include "cm0/fadd.S"
+#include "cm0/futil.S"
+#include "cm0/fmul.S"
+#include "cm0/fdiv.S"
+
+#include "cm0/ffloat.S"
+#include "cm0/ffixed.S"
+
+#undef __BUILD_CM0_FPLIB
+
 #endif /* NOT_ISA_TARGET_32BIT */
 #endif /* !__symbian__ */
