From: "Daniel Engel" <libgcc@danielengel.com>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] libgcc: Thumb-1 Floating-Point Library for Cortex M0
Date: Thu, 12 Nov 2020 15:04:01 -0800 [thread overview]
Message-ID: <3b36bc72-e92e-4372-8da4-43ade34d868b@www.fastmail.com> (raw)
[-- Attachment #1: Type: text/plain, Size: 7165 bytes --]
Hi,
This patch adds an efficient assembly-language implementation of IEEE-754 compliant floating-point routines for Cortex M0 EABI (v6m, thumb-1). This is the libgcc portion of a larger library originally described in 2018:
https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
Since that time, I've separated the libm functions for submission to newlib. The remaining libgcc functions in the attached patch have the following characteristics:
Function(s)                       Size (bytes)         Cycles             Stack   Accuracy
__clzsi2                          42                   23                 0       exact
__clzsi2 (OPTIMIZE_SIZE)          22                   55                 0       exact
__clzdi2                          8+__clzsi2           4+__clzsi2         0       exact
__umulsidi3                       44                   24                 0       exact
__mulsidi3                        30+__umulsidi3       24+__umulsidi3     8       exact
__muldi3 (__aeabi_lmul)           10+__umulsidi3       6+__umulsidi3      0       exact
__ashldi3 (__aeabi_llsl)          22                   13                 0       exact
__lshrdi3 (__aeabi_llsr)          22                   13                 0       exact
__ashrdi3 (__aeabi_lasr)          22                   13                 0       exact
__aeabi_lcmp                      20                   13                 0       exact
__aeabi_ulcmp                     16                   10                 0       exact
__udivsi3 (__aeabi_uidiv)         56                   72 – 385           0       < 1 lsb
__divsi3 (__aeabi_idiv)           38+__udivsi3         26+__udivsi3       8       < 1 lsb
__udivdi3 (__aeabi_uldiv)         164                  103 – 1394         16      < 1 lsb
__udivdi3 (OPTIMIZE_SIZE)         142                  120 – 1392         16      < 1 lsb
__divdi3 (__aeabi_ldiv)           54+__udivdi3         36+__udivdi3       32      < 1 lsb
__shared_float                    178
__shared_float (OPTIMIZE_SIZE)    154
__addsf3 (__aeabi_fadd)           116+__shared_float   31 – 76            8       <= 0.5 ulp
__addsf3 (OPTIMIZE_SIZE)          112+__shared_float   74                 8       <= 0.5 ulp
__subsf3 (__aeabi_fsub)           8+__addsf3           6+__addsf3         8       <= 0.5 ulp
__aeabi_frsub                     8+__addsf3           6+__addsf3         8       <= 0.5 ulp
__mulsf3 (__aeabi_fmul)           112+__shared_float   73 – 97            8       <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)          96+__shared_float    93                 8       <= 0.5 ulp
__divsf3 (__aeabi_fdiv)           132+__shared_float   83 – 361           8       <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)          120+__shared_float   263 – 359          8       <= 0.5 ulp
__cmpsf2/__lesf2/__ltsf2          72                   33                 0       exact
__eqsf2/__nesf2                   4+__cmpsf2           3+__cmpsf2         0       exact
__gesf2/__gtsf2                   4+__cmpsf2           3+__cmpsf2         0       exact
__unordsf2 (__aeabi_fcmpun)       4+__cmpsf2           3+__cmpsf2         0       exact
__aeabi_fcmpeq                    4+__cmpsf2           3+__cmpsf2         0       exact
__aeabi_fcmpne                    4+__cmpsf2           3+__cmpsf2         0       exact
__aeabi_fcmplt                    4+__cmpsf2           3+__cmpsf2         0       exact
__aeabi_fcmple                    4+__cmpsf2           3+__cmpsf2         0       exact
__aeabi_fcmpge                    4+__cmpsf2           3+__cmpsf2         0       exact
__floatundisf (__aeabi_ul2f)      14+__shared_float    40 – 81            8       <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)     14+__shared_float    40 – 237           8       <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)      0+__floatundisf      1+__floatundisf    8       <= 0.5 ulp
__floatdisf (__aeabi_l2f)         14+__floatundisf     7+__floatundisf    8       <= 0.5 ulp
__floatsisf (__aeabi_i2f)         0+__floatdisf        1+__floatdisf      8       <= 0.5 ulp
__fixsfdi (__aeabi_f2lz)          74                   27 – 33            0       exact
__fixunssfdi (__aeabi_f2ulz)      4+__fixsfdi          3+__fixsfdi        0       exact
__fixsfsi (__aeabi_f2iz)          52                   19                 0       exact
__fixsfsi (OPTIMIZE_SIZE)         4+__fixsfdi          3+__fixsfdi        0       exact
__fixunssfsi (__aeabi_f2uiz)      4+__fixsfsi          3+__fixsfsi        0       exact
__extendsfdf2 (__aeabi_f2d)       42+__shared_float    38                 8       exact
__aeabi_d2f                       56+__shared_float    54 – 58            8       <= 0.5 ulp
__aeabi_h2f                       34+__shared_float    34                 8       exact
__aeabi_f2h                       84                   23 – 34            0       <= 0.5 ulp
Copyright assignment is on file with the FSF.
I've built the gcc-arm-none-eabi cross-compiler using the 20201108 snapshot of GCC plus this patch, and successfully compiled a test program:
extern int main (void)
{
    volatile int x = 1;
    volatile unsigned long long int y = 10;
    volatile long long int z = x / y;    // 64-bit division
    volatile float a = x;                // 32-bit casting
    volatile float b = y;                // 64-bit casting
    volatile float c = z / b;            // float division
    volatile float d = a + c;            // float addition
    volatile float e = c * b;            // float multiplication
    volatile float f = d - e - c;        // float subtraction
    if (f != c)                          // float comparison
        y -= (long long int)d;           // float casting
}
As one point of comparison, the test program pulls in 876 bytes of libgcc code with the patched toolchain, versus 10276 bytes with the latest released gcc-arm-none-eabi-9-2020-q2 toolchain. That's a 90% size reduction.
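For anyone who wants to reproduce the comparison, something along these lines should be close (the flags below are illustrative, not the exact script I used; adjust the specs and paths for your own setup):

    arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -Os --specs=nosys.specs \
        -Wl,-Map=test.map test.c -o test.elf
    arm-none-eabi-size test.elf

The libgcc contribution is then the total of the libgcc symbols listed in the map file.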
I have extensive test vectors, and the library passes all of them on an STM32F051. The vectors were derived from the UCB [1], TestFloat [2], and IEEECC754 [3] suites, plus some of my own creation. Unfortunately, I'm not sure how "make check" should work for a cross-compiled runtime library.
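For illustration only, a single bit-exact check could look like the sketch below. The vector layout and function names are invented for this example and are not the actual format of my test suite. Comparing raw bit patterns instead of using '==' also catches NAN payloads and the sign of zero:

    #include <stdint.h>
    #include <string.h>

    /* One hypothetical vector: two operands and the expected result, as raw bits. */
    struct fvec { uint32_t a, b, expect; };

    static uint32_t to_bits (float f)
    {
        uint32_t u;
        memcpy (&u, &f, sizeof (u));
        return u;
    }

    static float to_float (uint32_t u)
    {
        float f;
        memcpy (&f, &u, sizeof (f));
        return f;
    }

    /* Returns 0 when the sum matches the expected bit pattern exactly. */
    static int check_fadd (const struct fvec *v)
    {
        float sum = to_float (v->a) + to_float (v->b);   /* calls __aeabi_fadd */
        return to_bits (sum) != v->expect;
    }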
Although I believe this patch can be incorporated as-is, there are at least two points that might bear discussion:
* I'm not sure where or how they would be integrated, but I would be happy to provide sources for my test vectors.
* The library is currently built for the ARM v6m architecture only. It is likely that some of the other Cortex variants would benefit from these routines. However, I would need some guidance on this to proceed without introducing regressions. I do not currently have a test strategy for architectures beyond Cortex M0, and I have NOT profiled the existing thumb-2 implementations (ieee754-sf.S) for comparison.
I'm naturally hoping for some action on this patch before the Nov 16th deadline for GCC-11 stage 3. Please review and advise.
Thanks,
Daniel Engel
[1] http://www.netlib.org/fp/ucbtest.tgz
[2] http://www.jhauser.us/arithmetic/TestFloat.html
[3] http://win-www.uia.ac.be/u/cant/ieeecc754.html
[-- Attachment #2: cortex-m0-fplib-20201112.patch --]
[-- Type: application/octet-stream, Size: 133513 bytes --]
diff -ruN libgcc/config/arm/bpabi-v6m.S libgcc/config/arm/bpabi-v6m.S
--- libgcc/config/arm/bpabi-v6m.S 2020-11-08 14:32:11.000000000 -0800
+++ libgcc/config/arm/bpabi-v6m.S 2020-11-12 09:06:46.383424089 -0800
@@ -33,212 +33,6 @@
.eabi_attribute 25, 1
#endif /* __ARM_EABI__ */
-#ifdef L_aeabi_lcmp
-
-FUNC_START aeabi_lcmp
- cmp xxh, yyh
- beq 1f
- bgt 2f
- movs r0, #1
- negs r0, r0
- RET
-2:
- movs r0, #1
- RET
-1:
- subs r0, xxl, yyl
- beq 1f
- bhi 2f
- movs r0, #1
- negs r0, r0
- RET
-2:
- movs r0, #1
-1:
- RET
- FUNC_END aeabi_lcmp
-
-#endif /* L_aeabi_lcmp */
-
-#ifdef L_aeabi_ulcmp
-
-FUNC_START aeabi_ulcmp
- cmp xxh, yyh
- bne 1f
- subs r0, xxl, yyl
- beq 2f
-1:
- bcs 1f
- movs r0, #1
- negs r0, r0
- RET
-1:
- movs r0, #1
-2:
- RET
- FUNC_END aeabi_ulcmp
-
-#endif /* L_aeabi_ulcmp */
-
-.macro test_div_by_zero signed
- cmp yyh, #0
- bne 7f
- cmp yyl, #0
- bne 7f
- cmp xxh, #0
- .ifc \signed, unsigned
- bne 2f
- cmp xxl, #0
-2:
- beq 3f
- movs xxh, #0
- mvns xxh, xxh @ 0xffffffff
- movs xxl, xxh
-3:
- .else
- blt 6f
- bgt 4f
- cmp xxl, #0
- beq 5f
-4: movs xxl, #0
- mvns xxl, xxl @ 0xffffffff
- lsrs xxh, xxl, #1 @ 0x7fffffff
- b 5f
-6: movs xxh, #0x80
- lsls xxh, xxh, #24 @ 0x80000000
- movs xxl, #0
-5:
- .endif
- @ tailcalls are tricky on v6-m.
- push {r0, r1, r2}
- ldr r0, 1f
- adr r1, 1f
- adds r0, r1
- str r0, [sp, #8]
- @ We know we are not on armv4t, so pop pc is safe.
- pop {r0, r1, pc}
- .align 2
-1:
- .word __aeabi_ldiv0 - 1b
-7:
-.endm
-
-#ifdef L_aeabi_ldivmod
-
-FUNC_START aeabi_ldivmod
- test_div_by_zero signed
-
- push {r0, r1}
- mov r0, sp
- push {r0, lr}
- ldr r0, [sp, #8]
- bl SYM(__gnu_ldivmod_helper)
- ldr r3, [sp, #4]
- mov lr, r3
- add sp, sp, #8
- pop {r2, r3}
- RET
- FUNC_END aeabi_ldivmod
-
-#endif /* L_aeabi_ldivmod */
-
-#ifdef L_aeabi_uldivmod
-
-FUNC_START aeabi_uldivmod
- test_div_by_zero unsigned
-
- push {r0, r1}
- mov r0, sp
- push {r0, lr}
- ldr r0, [sp, #8]
- bl SYM(__udivmoddi4)
- ldr r3, [sp, #4]
- mov lr, r3
- add sp, sp, #8
- pop {r2, r3}
- RET
- FUNC_END aeabi_uldivmod
-
-#endif /* L_aeabi_uldivmod */
-
-#ifdef L_arm_addsubsf3
-
-FUNC_START aeabi_frsub
-
- push {r4, lr}
- movs r4, #1
- lsls r4, #31
- eors r0, r0, r4
- bl __aeabi_fadd
- pop {r4, pc}
-
- FUNC_END aeabi_frsub
-
-#endif /* L_arm_addsubsf3 */
-
-#ifdef L_arm_cmpsf2
-
-FUNC_START aeabi_cfrcmple
-
- mov ip, r0
- movs r0, r1
- mov r1, ip
- b 6f
-
-FUNC_START aeabi_cfcmpeq
-FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq
-
- @ The status-returning routines are required to preserve all
- @ registers except ip, lr, and cpsr.
-6: push {r0, r1, r2, r3, r4, lr}
- bl __lesf2
- @ Set the Z flag correctly, and the C flag unconditionally.
- cmp r0, #0
- @ Clear the C flag if the return value was -1, indicating
- @ that the first operand was smaller than the second.
- bmi 1f
- movs r1, #0
- cmn r0, r1
-1:
- pop {r0, r1, r2, r3, r4, pc}
-
- FUNC_END aeabi_cfcmple
- FUNC_END aeabi_cfcmpeq
- FUNC_END aeabi_cfrcmple
-
-FUNC_START aeabi_fcmpeq
-
- push {r4, lr}
- bl __eqsf2
- negs r0, r0
- adds r0, r0, #1
- pop {r4, pc}
-
- FUNC_END aeabi_fcmpeq
-
-.macro COMPARISON cond, helper, mode=sf2
-FUNC_START aeabi_fcmp\cond
-
- push {r4, lr}
- bl __\helper\mode
- cmp r0, #0
- b\cond 1f
- movs r0, #0
- pop {r4, pc}
-1:
- movs r0, #1
- pop {r4, pc}
-
- FUNC_END aeabi_fcmp\cond
-.endm
-
-COMPARISON lt, le
-COMPARISON le, le
-COMPARISON gt, ge
-COMPARISON ge, ge
-
-#endif /* L_arm_cmpsf2 */
-
#ifdef L_arm_addsubdf3
FUNC_START aeabi_drsub
diff -ruN libgcc/config/arm/cm0/clz2.S libgcc/config/arm/cm0/clz2.S
--- libgcc/config/arm/cm0/clz2.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/clz2.S 2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,122 @@
+/* clz2.S: Cortex M0 optimized 'clz' functions
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// int __clzdi2(long long)
+// Counts leading zeros in a 64 bit double word.
+// Expects the argument in $r1:$r0.
+// Returns the result in $r0.
+// Uses $r2 and $r3 as scratch space.
+.section .text.libgcc.clz2,"x"
+CM0_FUNC_START clzdi2
+ CFI_START_FUNCTION
+
+ // Assume all the bits in the argument are zero.
+ movs r2, #64
+
+ // If the upper word is ZERO, calculate 32 + __clzsi2(lower).
+ cmp r1, #0
+ beq LSYM(__clz16)
+
+ // The upper word is non-zero, so calculate __clzsi2(upper).
+ movs r0, r1
+
+ // Fall through.
+
+
+// int __clzsi2(int)
+// Counts leading zeros in a 32 bit word.
+// Expects the argument in $r0.
+// Returns the result in $r0.
+// Uses $r2 and $r3 as scratch space.
+CM0_FUNC_START clzsi2
+ // Assume all the bits in the argument are zero
+ movs r2, #32
+
+ LSYM(__clz16):
+ // Size optimized: 22 bytes, 51 clocks
+ // Speed optimized: 42 bytes, 23 clocks
+
+ #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+ // Binary search starts at half the word width.
+ movs r3, #16
+
+ LSYM(__clz_loop):
+ // Test the upper 'n' bits of the operand for ZERO.
+ movs r1, r0
+ lsrs r1, r3
+ beq LSYM(__clz_skip)
+
+ // When the test fails, discard the lower bits of the register,
+ // and deduct the count of discarded bits from the result.
+ movs r0, r1
+ subs r2, r3
+
+ LSYM(__clz_skip):
+ // Decrease the shift distance for the next test.
+ lsrs r3, #1
+ bne LSYM(__clz_loop)
+ #else
+ // Unrolled binary search.
+ lsrs r1, r0, #16
+ beq LSYM(__clz8)
+ movs r0, r1
+ subs r2, #16
+
+ LSYM(__clz8):
+ lsrs r1, r0, #8
+ beq LSYM(__clz4)
+ movs r0, r1
+ subs r2, #8
+
+ LSYM(__clz4):
+ lsrs r1, r0, #4
+ beq LSYM(__clz2)
+ movs r0, r1
+ subs r2, #4
+
+ LSYM(__clz2):
+ lsrs r1, r0, #2
+ beq LSYM(__clz1)
+ movs r0, r1
+ subs r2, #2
+
+ LSYM(__clz1):
+ // Convert remainder {0,1,2,3} to {0,1,2,2} (no 'ldr' cache hit).
+ lsrs r1, r0, #1
+ bics r0, r1
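+	// (E.g. an input of 3 leaves r0 = 2 and r2 = 32, giving 32 - 2 = 30.)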
+ #endif
+
+ // Account for the remainder.
+ subs r0, r2, r0
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END clzsi2
+CM0_FUNC_END clzdi2
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/div.S libgcc/config/arm/cm0/div.S
--- libgcc/config/arm/cm0/div.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/div.S 2020-11-10 21:33:20.981886999 -0800
@@ -0,0 +1,180 @@
+/* div.S: Cortex M0 optimized 32-bit integer division
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// int __aeabi_idiv0(int)
+// Helper function for division by 0.
+.section .text.libgcc.idiv0,"x"
+CM0_FUNC_START aeabi_idiv0
+ CFI_START_FUNCTION
+
+ #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+ svc #(SVC_DIVISION_BY_ZERO)
+ #endif
+
+ // Return {0, numerator}.
+ movs r1, r0
+ eors r0, r0
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_idiv0
+
+
+// int __aeabi_idiv(int, int)
+// idiv_return __aeabi_idivmod(int, int)
+// Returns signed $r0 after division by $r1.
+// Also returns the signed remainder in $r1.
+.section .text.libgcc.idiv,"x"
+CM0_FUNC_START aeabi_idivmod
+FUNC_ALIAS aeabi_idiv aeabi_idivmod
+FUNC_ALIAS divsi3 aeabi_idivmod
+ CFI_START_FUNCTION
+
+ // Extend the sign of the denominator.
+ asrs r3, r1, #31
+
+ // Absolute value of the denominator, abort on division by zero.
+ eors r1, r3
+ subs r1, r3
+ beq SYM(__aeabi_idiv0)
+
+ // Absolute value of the numerator.
+ asrs r2, r0, #31
+ eors r0, r2
+ subs r0, r2
+
+ // Keep the sign of the numerator in bit[31] (for the remainder).
+ // Save the XOR of the signs in bits[15:0] (for the quotient).
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ lsrs rT, r3, #16
+ eors rT, r2
+
+ // Handle division as unsigned.
+ bl LSYM(__internal_uidivmod)
+
+ // Set the sign of the remainder.
+ asrs r2, rT, #31
+ eors r1, r2
+ subs r1, r2
+
+ // Set the sign of the quotient.
+ sxth r3, rT
+ eors r0, r3
+ subs r0, r3
+
+ LSYM(__idivmod_return):
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END divsi3
+CM0_FUNC_END aeabi_idiv
+CM0_FUNC_END aeabi_idivmod
+
+
+// int __aeabi_uidiv(unsigned int, unsigned int)
+// idiv_return __aeabi_uidivmod(unsigned int, unsigned int)
+// Returns unsigned $r0 after division by $r1.
+// Also returns the remainder in $r1.
+.section .text.libgcc.uidiv,"x"
+CM0_FUNC_START aeabi_uidivmod
+FUNC_ALIAS aeabi_uidiv aeabi_uidivmod
+FUNC_ALIAS udivsi3 aeabi_uidivmod
+ CFI_START_FUNCTION
+
+ // Abort on division by zero.
+ tst r1, r1
+ beq SYM(__aeabi_idiv0)
+
+ #if defined(OPTIMIZE_SPEED) && OPTIMIZE_SPEED
+ // MAYBE: Optimize division by a power of 2
+ #endif
+
+ LSYM(__internal_uidivmod):
+ // Pre division: Shift the denominator as far as possible left
+ // without making it larger than the numerator.
+ // The loop is destructive, save a copy of the numerator.
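+	// (For example, 100/9: the denominator aligns at 9<<3 = 72, since
+	// 9<<4 = 144 would overshoot; the first quotient bit is then 8,
+	// and 100/9 = 11 falls out as 8 + 2 + 1.)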
+ mov ip, r0
+
+ // Set up binary search.
+ movs r3, #16
+ movs r2, #1
+
+ LSYM(__uidivmod_align):
+	// Prefer dividing the numerator to multiplying the denominator
+ // (multiplying the denominator may result in overflow).
+ lsrs r0, r3
+ cmp r0, r1
+ blo LSYM(__uidivmod_skip)
+
+ // Multiply the denominator and the result together.
+ lsls r1, r3
+ lsls r2, r3
+
+ LSYM(__uidivmod_skip):
+ // Restore the numerator, and iterate until search goes to 0.
+ mov r0, ip
+ lsrs r3, #1
+ bne LSYM(__uidivmod_align)
+
+	// The result in $r3 has been conveniently initialized to 0.
+ b LSYM(__uidivmod_entry)
+
+ LSYM(__uidivmod_loop):
+ // Scale the denominator and the quotient together.
+ lsrs r1, #1
+ lsrs r2, #1
+ beq LSYM(__uidivmod_return)
+
+ LSYM(__uidivmod_entry):
+ // Test if the denominator is smaller than the numerator.
+ cmp r0, r1
+ blo LSYM(__uidivmod_loop)
+
+ // If the denominator is smaller, the next bit of the result is '1'.
+ // If the new remainder goes to 0, exit early.
+ adds r3, r2
+ subs r0, r1
+ bne LSYM(__uidivmod_loop)
+
+ LSYM(__uidivmod_return):
+ mov r1, r0
+ mov r0, r3
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END udivsi3
+CM0_FUNC_END aeabi_uidiv
+CM0_FUNC_END aeabi_uidivmod
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fadd.S libgcc/config/arm/cm0/fadd.S
--- libgcc/config/arm/cm0/fadd.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fadd.S 2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,301 @@
+/* fadd.S: Cortex M0 optimized 32-bit float addition
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// float __aeabi_frsub(float, float)
+// Returns the floating point difference of $r1 - $r0 in $r0.
+.section .text.libgcc.frsub,"x"
+CM0_FUNC_START aeabi_frsub
+ CFI_START_FUNCTION
+
+ #if defined(STRICT_NANS) && STRICT_NANS
+ // Check if $r0 is NAN before modifying.
+ lsls r2, r0, #1
+ movs r3, #255
+ lsls r3, #24
+
+ // Let fadd() find the NAN in the normal course of operation,
+ // moving it to $r0 and checking the quiet/signaling bit.
+ cmp r2, r3
+ bhi LSYM(__internal_fadd)
+ #endif
+
+ // Flip sign and run through fadd().
+ movs r2, #1
+ lsls r2, #31
+ adds r0, r2
+ b LSYM(__internal_fadd)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_frsub
+
+
+// float __aeabi_fsub(float, float)
+// Returns the floating point difference of $r0 - $r1 in $r0.
+.section .text.libgcc.fsub,"x"
+CM0_FUNC_START aeabi_fsub
+FUNC_ALIAS subsf3 aeabi_fsub
+ CFI_START_FUNCTION
+
+ #if defined(STRICT_NANS) && STRICT_NANS
+ // Check if $r1 is NAN before modifying.
+ lsls r2, r1, #1
+ movs r3, #255
+ lsls r3, #24
+
+ // Let fadd() find the NAN in the normal course of operation,
+ // moving it to $r0 and checking the quiet/signaling bit.
+ cmp r2, r3
+ bhi LSYM(__internal_fadd)
+ #endif
+
+ // Flip sign and run through fadd().
+ movs r2, #1
+ lsls r2, #31
+ adds r1, r2
+ b LSYM(__internal_fadd)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END subsf3
+CM0_FUNC_END aeabi_fsub
+
+
+// float __aeabi_fadd(float, float)
+// Returns the floating point sum of $r0 + $r1 in $r0.
+.section .text.libgcc.fadd,"x"
+CM0_FUNC_START aeabi_fadd
+FUNC_ALIAS addsf3 aeabi_fadd
+ CFI_START_FUNCTION
+
+ LSYM(__internal_fadd):
+ // Standard registers, compatible with exception handling.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Drop the sign bit to compare absolute value.
+ lsls r2, r0, #1
+ lsls r3, r1, #1
+
+ // Save the logical difference of original values.
+ // This actually makes the following swap slightly faster.
+ eors r1, r0
+
+ // Compare exponents+mantissa.
+ // MAYBE: Speedup for equal values? This would have to separately
+ // check for NAN/INF and then either:
+ // * Increase the exponent by '1' (for multiply by 2), or
+ // * Return +0
+ cmp r2, r3
+ bhs LSYM(__fadd_ordered)
+
+ // Reorder operands so the larger absolute value is in r2,
+ // the corresponding original operand is in $r0,
+ // and the smaller absolute value is in $r3.
+ movs r3, r2
+ eors r0, r1
+ lsls r2, r0, #1
+
+ LSYM(__fadd_ordered):
+ // Extract the exponent of the larger operand.
+ // If INF/NAN, then it becomes an automatic result.
+ lsrs r2, #24
+ cmp r2, #255
+ beq LSYM(__fadd_special)
+
+ // Save the sign of the result.
+ lsrs rT, r0, #31
+ lsls rT, #31
+ mov ip, rT
+
+	// If the original value of $r1 was +/-0,
+ // $r0 becomes the automatic result.
+ // Because $r0 is known to be a finite value, return directly.
+ // It's actually important that +/-0 not go through the normal
+ // process, to keep "-0 +/- 0" from being turned into +0.
+ cmp r3, #0
+ beq LSYM(__fadd_zero)
+
+ // Extract the second exponent.
+ lsrs r3, #24
+
+ // Calculate the difference of exponents (always positive).
+ subs r3, r2, r3
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // If the smaller operand is more than 25 bits less significant
+ // than the larger, the larger operand is an automatic result.
+ // The smaller operand can't affect the result, even after rounding.
+ cmp r3, #25
+ bhi LSYM(__fadd_return)
+ #endif
+
+ // Isolate both mantissas, recovering the smaller.
+ lsls rT, r0, #9
+ lsls r0, r1, #9
+ eors r0, rT // 26
+
+ // If the larger operand is normal, restore the implicit '1'.
+ // If subnormal, the second operand will also be subnormal.
+ cmp r2, #0
+ beq LSYM(__fadd_normal)
+ adds rT, #1
+ rors rT, rT
+
+ // If the smaller operand is also normal, restore the implicit '1'.
+ // If subnormal, the smaller operand effectively remains multiplied
+	// by 2 w.r.t. the first. This compensates for subnormal exponents,
+ // which are technically still -126, not -127.
+ cmp r2, r3
+ beq LSYM(__fadd_normal)
+ adds r0, #1
+ rors r0, r0
+
+ LSYM(__fadd_normal):
+ // Provide a spare bit for overflow.
+ // Normal values will be aligned in bits [30:7]
+ // Subnormal values will be aligned in bits [30:8]
+ lsrs rT, #1
+ lsrs r0, #1
+
+ // If signs weren't matched, negate the smaller operand (branchless).
+ asrs r1, #31
+ eors r0, r1
+ subs r0, r1
+
+ // Keep a copy of the small mantissa for the remainder.
+ movs r1, r0
+
+ // Align the small mantissa for addition.
+ asrs r1, r3
+
+ // Isolate the remainder.
+ // NOTE: Given the various cases above, the remainder will only
+ // be used as a boolean for rounding ties to even. It is not
+ // necessary to negate the remainder for subtraction operations.
+ rsbs r3, #0
+ adds r3, #32
+ lsls r0, r3
+
+ // Because operands are ordered, the result will never be negative.
+ // If the result of subtraction is 0, the overall result must be +0.
+ // If the overall result in $r1 is 0, then the remainder in $r0
+ // must also be 0, so no register copy is necessary on return.
+ adds r1, rT
+ beq LSYM(__fadd_return)
+
+ // The large operand was aligned in bits [29:7]...
+ // If the larger operand was normal, the implicit '1' went in bit [30].
+ //
+ // After addition, the MSB of the result may be in bit:
+ // 31, if the result overflowed.
+ // 30, the usual case.
+ // 29, if there was a subtraction of operands with exponents
+ // differing by more than 1.
+ // < 28, if there was a subtraction of operands with exponents +/-1,
+ // < 28, if both operands were subnormal.
+
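+	// (E.g. 1.0f - 0.9375f: exponents differ by 1, and the difference
+	// 0.0625f leaves its MSB in bit [26], so the result must be
+	// renormalized in __fp_assemble().)
+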
+ // In the last case (both subnormal), the alignment shift will be 8,
+ // the exponent will be 0, and no rounding is necessary.
+ cmp r2, #0
+ bne SYM(__fp_assemble) // 46
+
+ // Subnormal overflow automatically forms the correct exponent.
+ lsrs r0, r1, #8
+ add r0, ip
+
+ LSYM(__fadd_return):
+ pop { rT, pc }
+ .cfi_restore_state
+
+ LSYM(__fadd_special):
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // If $r1 is (also) NAN, force it in place of $r0.
+ // As the smaller NAN, it is more likely to be signaling.
+ movs rT, #255
+ lsls rT, #24
+ cmp r3, rT
+ bls LSYM(__fadd_ordered2)
+
+ eors r0, r1
+ #endif
+
+ LSYM(__fadd_ordered2):
+ // There are several possible cases to consider here:
+ // 1. Any NAN/NAN combination
+ // 2. Any NAN/INF combination
+ // 3. Any NAN/value combination
+ // 4. INF/INF with matching signs
+ // 5. INF/INF with mismatched signs.
+ // 6. Any INF/value combination.
+	// In all cases except case 5, it is safe to return $r0.
+ // In the special case, a new NAN must be constructed.
+ // First, check the mantissa to see if $r0 is NAN.
+ lsls r2, r0, #9
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ bne SYM(__fp_check_nan)
+ #else
+ bne LSYM(__fadd_return)
+ #endif
+
+ LSYM(__fadd_zero):
+ // Next, check for an INF/value combination.
+ lsls r2, r1, #1
+ bne LSYM(__fadd_return)
+
+ // Finally, check for matching sign on INF/INF.
+ // Also accepts matching signs when +/-0 are added.
+ bcc LSYM(__fadd_return)
+
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(SUBTRACTED_INFINITY)
+ #endif
+
+ #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+ // Restore original operands.
+ eors r1, r0
+ #endif
+
+ // Identify mismatched 0.
+ lsls r2, r0, #1
+ bne SYM(__fp_exception)
+
+ // Force mismatched 0 to +0.
+ eors r0, r0
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END addsf3
+CM0_FUNC_END aeabi_fadd
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fcmp.S libgcc/config/arm/cm0/fcmp.S
--- libgcc/config/arm/cm0/fcmp.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fcmp.S 2020-11-10 21:33:20.981886999 -0800
@@ -0,0 +1,555 @@
+/* fcmp.S: Cortex M0 optimized 32-bit float comparison
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// int __cmpsf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+// * +1 if ($r0 > $r1), or either argument is NAN
+// * 0 if ($r0 == $r1)
+// * -1 if ($r0 < $r1)
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.cmpsf2,"x"
+CM0_FUNC_START cmpsf2
+FUNC_ALIAS lesf2 cmpsf2
+FUNC_ALIAS ltsf2 cmpsf2
+ CFI_START_FUNCTION
+
+ // Assumption: The 'libgcc' functions should raise exceptions.
+ movs r2, #(FCMP_UN_POSITIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY)
+
+// int,int __internal_cmpsf2(float, float, int)
+// Internal function expects a set of control flags in $r2.
+// If ordered, returns a comparison type { 0, 1, 2 } in $r3
+CM0_FUNC_START internal_cmpsf2
+
+ // When operand signs are considered, the comparison result falls
+ // within one of the following quadrants:
+ //
+ // $r0 $r1 $r0-$r1* flags result
+ // + + > C=0 GT
+ // + + = Z=1 EQ
+ // + + < C=1 LT
+ // + - > C=1 GT
+ // + - = C=1 GT
+ // + - < C=1 GT
+ // - + > C=0 LT
+ // - + = C=0 LT
+ // - + < C=0 LT
+ // - - > C=0 LT
+ // - - = Z=1 EQ
+ // - - < C=1 GT
+ //
+	// *When interpreted as a subtraction of unsigned integers
+ //
+ // From the table, it is clear that in the presence of any negative
+ // operand, the natural result simply needs to be reversed.
+ // Save the 'N' flag for later use.
+ movs r3, r0
+ orrs r3, r1
+ mov ip, r3
+
+ // Keep the absolute value of the second argument for NAN testing.
+ lsls r3, r1, #1
+
+ // With the absolute value of the second argument safely stored,
+ // recycle $r1 to calculate the difference of the arguments.
+ subs r1, r0, r1
+
+ // Save the 'C' flag for use later.
+ // Effectively shifts all the flags 1 bit left.
+ adcs r2, r2
+
+ // Absolute value of the first argument.
+ lsls r0, #1
+
+ // Identify the largest absolute value between the two arguments.
+ cmp r0, r3
+ bhs LSYM(__fcmp_sorted)
+
+ // Keep the larger absolute value for NAN testing.
+ // NOTE: When the arguments are respectively a signaling NAN and a
+ // quiet NAN, the quiet NAN has precedence. This has consequences
+ // if TRAP_NANS is enabled, but the flags indicate that exceptions
+ // for quiet NANs should be suppressed. After the signaling NAN is
+ // discarded, no exception is raised, although it should have been.
+ // This could be avoided by using a fifth register to save both
+ // arguments until the signaling bit can be tested, but that seems
+ // like an excessive amount of ugly code for an ambiguous case.
+ movs r0, r3
+
+ LSYM(__fcmp_sorted):
+ // If $r3 is NAN, the result is unordered.
+ movs r3, #255
+ lsls r3, #24
+ cmp r0, r3
+ bhi LSYM(__fcmp_unordered)
+
+ // Positive and negative zero must be considered equal.
+ // If the larger absolute value is +/-0, both must have been +/-0.
+ subs r3, r0, #0
+ beq LSYM(__fcmp_zero)
+
+ // Test for regular equality.
+ subs r3, r1, #0
+ beq LSYM(__fcmp_zero)
+
+ // Isolate the saved 'C', and invert if either argument was negative.
+ // Remembering that the original subtraction was $r1 - $r0,
+ // the result will be 1 if 'C' was set (gt), or 0 for not 'C' (lt).
+ lsls r3, r2, #31
+ add r3, ip
+ lsrs r3, #31
+
+ // HACK: Clear the 'C' bit
+ adds r3, #0
+
+ LSYM(__fcmp_zero):
+ // After everything is combined, the temp result will be
+ // 2 (gt), 1 (eq), or 0 (lt).
+ adcs r3, r3
+
+ // Return directly if the 3-way comparison flag is set.
+ // Also shifts the condition mask into bits[2:0].
+ lsrs r2, #2 // 26
+ bcs LSYM(__fcmp_return)
+
+ // If the bit corresponding to the comparison result is set in the
+	// acceptance mask, a '1' will fall out into the result.
+ movs r0, #1
+ lsrs r2, r3
+ ands r0, r2
+ RETx lr // 33
+
+ LSYM(__fcmp_unordered):
+ // Set up the requested UNORDERED result.
+ // Remember the shift in the flags (above).
+ lsrs r2, #6
+
+ #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+	// TODO: ...
+ #endif
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // Always raise an exception if FCMP_RAISE_EXCEPTIONS was specified.
+ bcs LSYM(__fcmp_trap)
+
+ // If FCMP_NO_EXCEPTIONS was specified, no exceptions on quiet NANs.
+ // The comparison flags are moot, so $r1 can serve as scratch space.
+ lsrs r1, r0, #24
+ bcs LSYM(__fcmp_return2)
+
+ LSYM(__fcmp_trap):
+ // Restore the NAN (sans sign) for an argument to the exception.
+ // As an IRQ, the handler restores all registers, including $r3.
+ // NOTE: The service handler may not return.
+ lsrs r0, #1
+ movs r3, #(UNORDERED_COMPARISON)
+ svc #(SVC_TRAP_NAN)
+ #endif
+
+ LSYM(__fcmp_return2):
+ // HACK: Work around result register mapping.
+ // This could probably be eliminated by remapping the flags register.
+ movs r3, r2
+
+ LSYM(__fcmp_return):
+ // Finish setting up the result.
+ // The subtraction allows a negative result from an 8 bit set of flags.
+ // (See the variations on the FCMP_UN parameter, above).
+ subs r0, r3, #1
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END ltsf2
+CM0_FUNC_END lesf2
+CM0_FUNC_END cmpsf2
+
+
+// int __eqsf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+// * -1 if ($r0 < $r1)
+// * 0 if ($r0 == $r1)
+// * +1 if ($r0 > $r1), or either argument is NAN
+// Uses $r2, $r3, and $ip as scratch space.
+CM0_FUNC_START eqsf2
+FUNC_ALIAS nesf2 eqsf2
+ CFI_START_FUNCTION
+
+	// Assumption: Equality tests should not raise exceptions on quiet NAN.
+ movs r2, #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END nesf2
+CM0_FUNC_END eqsf2
+
+
+// int __gesf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+// * -1 if ($r0 < $r1), or either argument is NAN
+// * 0 if ($r0 == $r1)
+// * +1 if ($r0 > $r1)
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.gesf2,"x"
+CM0_FUNC_START gesf2
+FUNC_ALIAS gtsf2 gesf2
+ CFI_START_FUNCTION
+
+ // Assumption: The 'libgcc' functions should raise exceptions.
+ movs r2, #(FCMP_UN_NEGATIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END gtsf2
+CM0_FUNC_END gesf2
+
+
+// int __aeabi_fcmpeq(float, float)
+// Returns '1' in $r0 if ($r0 == $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmpeq,"x"
+CM0_FUNC_START aeabi_fcmpeq
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_EQ)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpeq
+
+
+// int __aeabi_fcmpne(float, float) [non-standard]
+// Returns '1' in $r0 if ($r0 != $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmpne,"x"
+CM0_FUNC_START aeabi_fcmpne
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_NE)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpne
+
+
+// int __aeabi_fcmplt(float, float)
+// Returns '1' in $r0 if ($r0 < $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmplt,"x"
+CM0_FUNC_START aeabi_fcmplt
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_LT)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmplt
+
+
+// int __aeabi_fcmple(float, float)
+// Returns '1' in $r0 if ($r0 <= $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmple,"x"
+CM0_FUNC_START aeabi_fcmple
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_LE)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmple
+
+
+// int __aeabi_fcmpge(float, float)
+// Returns '1' in $r0 if ($r0 >= $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmpge,"x"
+CM0_FUNC_START aeabi_fcmpge
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_GE)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpge
+
+
+// int __aeabi_fcmpgt(float, float)
+// Returns '1' in $r0 if ($r0 > $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmpgt,"x"
+CM0_FUNC_START aeabi_fcmpgt
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_GT)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpgt
+
+
+// int __aeabi_fcmpun(float, float)
+// Returns '1' in $r0 if $r0 and $r1 are unordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.fcmpun,"x"
+CM0_FUNC_START aeabi_fcmpun
+FUNC_ALIAS unordsf2 aeabi_fcmpun
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END unordsf2
+CM0_FUNC_END aeabi_fcmpun
+
+#if 0
+
+
+// void __aeabi_cfrcmple(float, float)
+// Reverse three-way compare of $r1 ? $r0, with result in the status flags:
+// * 'Z' is set only when the operands are ordered and equal.
+// * 'C' is clear only when the operands are ordered and $r0 > $r1.
+// Preserves all core registers except $ip, $lr, and the CPSR.
+.section .text.libgcc.cfrcmple,"x"
+CM0_FUNC_START aeabi_cfrcmple
+ CFI_START_FUNCTION
+
+ push { r0-r3, lr }
+
+ // Save the current CFI state
+ .cfi_adjust_cfa_offset 20
+ .cfi_rel_offset r0, 0
+ .cfi_rel_offset r1, 4
+ .cfi_rel_offset r2, 8
+ .cfi_rel_offset r3, 12
+ .cfi_rel_offset lr, 16
+
+ // Reverse the order of the arguments.
+ ldr r0, [sp, #4]
+ ldr r1, [sp, #0]
+
+ // Don't just fall through into cfcmple(), else registers will get pushed twice.
+	b LSYM(__real_cfrcmple)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_cfrcmple
+
+
+// void __aeabi_cfcmpeq(float, float)
+// NOTE: This function only applies if __aeabi_cfcmple() can raise exceptions.
+// Three-way compare of $r0 ? $r1, with result in the status flags:
+// * 'Z' is set only when the operands are ordered and equal.
+// * 'C' is clear only when the operands are ordered and $r0 < $r1.
+// Preserves all core registers except $ip, $lr, and the CPSR.
+#if defined(TRAP_NANS) && TRAP_NANS
+ .section .text.libgcc.cfcmpeq,"x"
+ CM0_FUNC_START aeabi_cfcmpeq
+ CFI_START_FUNCTION
+
+ push { r0-r3, lr }
+
+ // Save the current CFI state
+ .cfi_adjust_cfa_offset 20
+ .cfi_rel_offset r0, 0
+ .cfi_rel_offset r1, 4
+ .cfi_rel_offset r2, 8
+ .cfi_rel_offset r3, 12
+ .cfi_rel_offset lr, 16
+
+ // No exceptions on quiet NAN.
+ // On an unordered result, 'C' should be '1' and 'Z' should be '0'.
+ // A subtraction giving -1 sets these flags correctly.
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS)
+ b LSYM(__real_cfcmpeq)
+
+ CFI_END_FUNCTION
+ CM0_FUNC_END aeabi_cfcmpeq
+#endif
+
+// void __aeabi_cfcmple(float, float)
+// Three-way compare of $r0 ? $r1, with result in the status flags:
+// * 'Z' is set only when the operands are ordered and equal.
+// * 'C' is clear only when the operands are ordered and $r0 < $r1.
+// Preserves all core registers except $ip, $lr, and the CPSR.
+.section .text.libgcc.cfcmple,"x"
+CM0_FUNC_START aeabi_cfcmple
+
+ // __aeabi_cfcmpeq() is defined separately when TRAP_NANS is enabled.
+ #if !defined(TRAP_NANS) || !TRAP_NANS
+ FUNC_ALIAS aeabi_cfcmpeq aeabi_cfcmple
+ #endif
+
+ CFI_START_FUNCTION
+
+ push { r0-r3, lr }
+
+ // Save the current CFI state
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 20
+ .cfi_rel_offset r0, 0
+ .cfi_rel_offset r1, 4
+ .cfi_rel_offset r2, 8
+ .cfi_rel_offset r3, 12
+ .cfi_rel_offset lr, 16
+
+ LSYM(__real_cfrcmple):
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // The result in $r0 will be ignored, but do raise exceptions.
+ // On an unordered result, 'C' should be '1' and 'Z' should be '0'.
+ // A subtraction giving -1 sets these flags correctly.
+ movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS)
+ #endif
+
+ LSYM(__real_cfcmpeq):
+ // __internal_cmpsf2() always sets the APSR flags on return.
+	bl SYM(__internal_cmpsf2)
+
+ // Because __aeabi_cfcmpeq() wants the 'C' flag set on equal values,
+ // magic is required. For the possible intermediate values in $r3:
+ // * 0b01 gives C = 0 and Z = 0 for $r0 < $r1
+ // * 0b10 gives C = 1 and Z = 1 for $r0 == $r1
+ // * 0b11 gives C = 1 and Z = 0 for $r0 > $r1 (or unordered)
+ cmp r1, #0
+
+ // Cleanup.
+ pop { r0-r3, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+
+ #if !defined(TRAP_NANS) || !TRAP_NANS
+ CM0_FUNC_END aeabi_cfcmpeq
+ #endif
+
+CM0_FUNC_END aeabi_cfcmple
+
+
+// int isgreaterf(float, float)
+// Returns '1' in $r0 if ($r0 > $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.isgreaterf,"x"
+CM0_FUNC_START isgreaterf
+MATH_ALIAS isgreaterf
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END isgreaterf
+CM0_FUNC_END isgreaterf
+
+
+// int isgreaterequalf(float, float)
+// Returns '1' in $r0 if ($r0 >= $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.isgreaterequalf,"x"
+CM0_FUNC_START isgreaterequalf
+MATH_ALIAS isgreaterequalf
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END isgreaterequalf
+CM0_FUNC_END isgreaterequalf
+
+
+// int islessf(float, float)
+// Returns '1' in $r0 if ($r0 < $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.islessf,"x"
+CM0_FUNC_START islessf
+MATH_ALIAS islessf
+ CFI_START_FUNCTION
+
+	movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_LT)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END islessf
+CM0_FUNC_END islessf
+
+
+// int islessequalf(float, float)
+// Returns '1' in $r0 if ($r0 <= $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.islessequalf,"x"
+CM0_FUNC_START islessequalf
+MATH_ALIAS islessequalf
+ CFI_START_FUNCTION
+
+	movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_LT + FCMP_EQ)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END islessequalf
+CM0_FUNC_END islessequalf
+
+
+// int islessgreaterf(float, float)
+// Returns '1' in $r0 if ($r0 != $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.islessgreaterf,"x"
+CM0_FUNC_START islessgreaterf
+MATH_ALIAS islessgreaterf
+ CFI_START_FUNCTION
+
+	movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_LT + FCMP_GT)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END islessgreaterf
+CM0_FUNC_END islessgreaterf
+
+
+// int isunorderedf(float, float)
+// Returns '1' in $r0 if $r0 and $r1 are unordered.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libm.isunorderedf,"x"
+CM0_FUNC_START isunorderedf
+MATH_ALIAS isunorderedf
+ CFI_START_FUNCTION
+
+	movs r2, #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END isunorderedf
+CM0_FUNC_END isunorderedf
+
+
+#endif
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fconv.S libgcc/config/arm/cm0/fconv.S
--- libgcc/config/arm/cm0/fconv.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fconv.S 2020-11-10 21:33:20.981886999 -0800
@@ -0,0 +1,346 @@
+/* fconv.S: Cortex M0 optimized 32- and 64-bit float conversions
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+// Reference: <libgcc/config/arm/fp16.c>
+
+// double __aeabi_f2d(float)
+// Converts a single-precision float in $r0 to double-precision in $r1:$r0.
+// Rounding, overflow, and underflow are impossible.
+// INF and ZERO are returned unmodified.
+.section .text.libgcc.f2d,"x"
+CM0_FUNC_START aeabi_f2d
+FUNC_ALIAS extendsfdf2 aeabi_f2d
+ CFI_START_FUNCTION
+
+ // Save the sign.
+ lsrs r1, r0, #31
+ lsls r1, #31
+
+ // Set up registers for __fp_normalize2().
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Test for zero.
+ lsls r0, #1
+ beq LSYM(__f2d_return) // 7
+
+ // Split the exponent and mantissa into separate registers.
+ // This is the most efficient way to convert subnormals in the
+	// single-precision form into normals in double-precision.
+ // This does add a leading implicit '1' to INF and NAN,
+ // but that will be absorbed when the value is re-assembled.
+ movs r2, r0
+ bl SYM(__fp_normalize2) __PLT__ // +4+8
+
+ // Set up the exponent bias. For INF/NAN values, the bias
+ // is 1791 (2047 - 255 - 1), where the last '1' accounts
+ // for the implicit '1' in the mantissa.
+ movs r0, #3
+ lsls r0, #9
+ adds r0, #255
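+	// ((3 << 9) + 255 = 1791; halved below, it becomes 895 for normals.)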
+
+ // Test for INF/NAN, promote exponent if necessary
+ cmp r2, #255
+ beq LSYM(__f2d_indefinite)
+
+ // For normal values, the exponent bias is 895 (1023 - 127 - 1),
+ // which is half of the prepared INF/NAN bias.
+ lsrs r0, #1
+
+ LSYM(__f2d_indefinite):
+ // Assemble exponent with bias correction.
+ adds r2, r0
+ lsls r2, #20
+ adds r1, r2
+
+ // Assemble the high word of the mantissa.
+ lsrs r0, r3, #11
+ add r1, r0
+
+ // Remainder of the mantissa in the low word of the result.
+ lsls r0, r3, #21
+
+ LSYM(__f2d_return):
+ pop { rT, pc } // 38
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END extendsfdf2
+CM0_FUNC_END aeabi_f2d
+
+
+// float __aeabi_d2f(double)
+// Converts a double-precision float in $r1:$r0 to single-precision in $r0.
+// Values out of range become ZERO or INF; returns the upper 23 bits of NAN.
+// Rounds to nearest, ties to even. The ARM ABI does not appear to specify
+// a rounding mode, so round-to-nearest is acceptable there. Unfortunately,
+// GCC specifies rounding towards zero for truncdfsf2(), which makes this
+// implementation incompatible with that alias.
+// (It would be easy enough to truncate normal values, but single-precision
+// subnormals would require a significantly more complex approach.)
+.section .text.libgcc.d2f,"x"
+CM0_FUNC_START aeabi_d2f
+// FUNC_ALIAS truncdfsf2 aeabi_d2f // incompatible
+ CFI_START_FUNCTION
+
+ // Save the sign.
+ lsrs r2, r1, #31
+ lsls r2, #31
+ mov ip, r2
+
+ // Isolate the exponent (11 bits).
+ lsls r2, r1, #1
+ lsrs r2, #21
+
+ // Isolate the mantissa. It's safe to always add the implicit '1' --
+ // even for subnormals -- since they will underflow in every case.
+ lsls r1, #12
+ adds r1, #1
+ rors r1, r1
+ lsrs r3, r0, #21
+ adds r1, r3
+ lsls r0, #11 // 11
+
+ // Test for INF/NAN (r3 = 2047)
+ mvns r3, r2
+ lsrs r3, #21
+ cmp r3, r2
+ beq LSYM(__d2f_indefinite)
+
+ // Adjust exponent bias. Offset is 127 - 1023, less 1 more since
+ // __fp_assemble() expects the exponent relative to bit[30].
+ lsrs r3, #1
+ subs r2, r3
+ adds r2, #126
+
+ LSYM(__d2f_assemble):
+ // Use the standard formatting for overflow and underflow.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ b SYM(__fp_assemble) // 24-28 + 30
+ .cfi_restore_state
+
+ LSYM(__d2f_indefinite):
+ // Test for INF. If the mantissa, exclusive of the implicit '1',
+ // is equal to '0', the result will be INF.
+ lsls r3, r1, #1
+ orrs r3, r0
+ beq LSYM(__d2f_assemble) // 20
+
+ // Construct NAN with the upper 22 bits of the mantissa, setting bit[21]
+ // to ensure a valid NAN without changing bit[22] (quiet)
+ subs r2, #0xD
+ lsls r0, r2, #20
+ lsrs r1, #8
+ orrs r0, r1
+
+ #if defined(STRICT_NANS) && STRICT_NANS
+ add r0, ip
+ #endif
+
+ RETx lr // 27
+
+ CFI_END_FUNCTION
+// CM0_FUNC_END truncdfsf2
+CM0_FUNC_END aeabi_d2f
+
+
+// float __aeabi_h2f(short hf)
+// Converts a half-precision float in $r0 to single-precision.
+// Rounding, overflow, and underflow conditions are impossible.
+// INF and ZERO are returned unmodified.
+.section .text.libgcc.h2f,"x"
+CM0_FUNC_START aeabi_h2f
+ CFI_START_FUNCTION
+
+ // Set up registers for __fp_normalize2().
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Save the mantissa and exponent.
+ lsls r2, r0, #17
+
+ // Isolate the sign.
+ lsrs r0, #15
+ lsls r0, #31
+
+ // Align the exponent at bit[24] for normalization.
+ // If zero, return the original sign.
+ lsrs r2, #3
+ beq LSYM(__h2f_return) // 8
+
+ // Split the exponent and mantissa into separate registers.
+ // This is the most efficient way to convert subnormals in the
+ // half-precision form into normals in single-precision.
+ // This does add a leading implicit '1' to INF and NAN,
+ // but that will be absorbed when the value is re-assembled.
+ bl SYM(__fp_normalize2) __PLT__ // +4+8
+
+ // Set up the exponent bias. For INF/NAN values, the bias is 223,
+ // where the last '1' accounts for the implicit '1' in the mantissa.
+ adds r2, #(255 - 31 - 1)
+
+ // Test for INF/NAN.
+ cmp r2, #254
+ beq LSYM(__h2f_assemble)
+
+ // For normal values, the bias should have been 111.
+	// However, correcting it here is faster than branching earlier.
+ subs r2, #((255 - 31 - 1) - (127 - 15 - 1))
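+	// (Check: 223 - 112 = 111, the normal-value bias quoted above.)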
+
+ LSYM(__h2f_assemble):
+ // Combine exponent and sign.
+ lsls r2, #23
+ adds r0, r2
+
+ // Combine mantissa.
+ lsrs r3, #8
+ add r0, r3
+
+ LSYM(__h2f_return):
+ pop { rT, pc } // 34
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_h2f
+
+
+// short __aeabi_f2h(float f)
+// Converts a single-precision float in $r0 to half-precision,
+// rounding to nearest, ties to even.
+// Values out of range become ZERO or INF; returns the upper 12 bits of NAN.
+.section .text.libgcc.f2h,"x"
+CM0_FUNC_START aeabi_f2h
+ CFI_START_FUNCTION
+
+ // Set up the sign.
+ lsrs r2, r0, #31
+ lsls r2, #15
+
+ // Save the exponent and mantissa.
+ // If ZERO, return the original sign.
+ lsls r0, #1
+ beq LSYM(__f2h_return)
+
+ // Isolate the exponent, check for NAN.
+ lsrs r1, r0, #24
+ cmp r1, #255
+ beq LSYM(__f2h_indefinite)
+
+ // Check for overflow.
+ cmp r1, #(127 + 15)
+ bhi LSYM(__f2h_overflow)
+
+ // Isolate the mantissa, adding back the implicit '1'.
+ lsls r0, #8
+ adds r0, #1
+ rors r0, r0 // 12
+
+ // Adjust exponent bias for half-precision, including '1' to
+ // account for the mantissa's implicit '1'.
+ subs r1, #(127 - 15 + 1)
+ bmi LSYM(__f2h_underflow)
+
+ // Combine the exponent and sign.
+ lsls r1, #10
+ adds r2, r1
+
+ // Split the mantissa (11 bits) and remainder (13 bits).
+ lsls r3, r0, #12
+ lsrs r0, #21
+
+ LSYM(__f2h_round):
+ // If the carry bit is '0', always round down.
+ bcc LSYM(__f2h_return)
+
+ // Carry was set. If a tie (no remainder) and the
+ // LSB of the result are '0', round down (to even).
+ lsls r1, r0, #31
+ orrs r1, r3
+ beq LSYM(__f2h_return)
+
+ // Round up, ties to even.
+ adds r0, #1
+
+ LSYM(__f2h_return):
+ // Combine mantissa and exponent.
+ adds r0, r2
+ RETx lr // 25 - 34
+
+ LSYM(__f2h_underflow):
+ // Align the remainder. The remainder consists of the last 12 bits
+ // of the mantissa plus the magnitude of underflow.
+ movs r3, r0
+ adds r1, #12
+ lsls r3, r1
+
+ // Align the mantissa. The MSB of the remainder must be
+	// shifted out last into the 'C' flag for rounding.
+ subs r1, #33
+ rsbs r1, #0
+ lsrs r0, r1
+ b LSYM(__f2h_round) // 25
+
+ LSYM(__f2h_overflow):
+ // Create single-precision INF from which to construct half-precision.
+ movs r0, #255
+ lsls r0, #24 // 13
+
+ LSYM(__f2h_indefinite):
+ // Check for INF.
+ lsls r3, r0, #8
+ beq LSYM(__f2h_infinite)
+
+ // Set bit[8] to ensure a valid NAN without changing bit[9] (quiet).
+ adds r2, #128
+ adds r2, #128
+
+ LSYM(__f2h_infinite):
+ // Construct the result from the upper 22 bits of the mantissa
+ // and the lower 5 bits of the exponent.
+ lsls r0, #3
+ lsrs r0, #17
+
+ // Combine with the sign (and possibly NAN flag).
+ orrs r0, r2
+ RETx lr // 23
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_f2h
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fdiv.S libgcc/config/arm/cm0/fdiv.S
--- libgcc/config/arm/cm0/fdiv.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fdiv.S 2020-11-12 09:46:26.939907002 -0800
@@ -0,0 +1,258 @@
+/* fdiv.S: Cortex M0 optimized 32-bit float division
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// float __aeabi_fdiv(float, float)
+// Returns $r0 after division by $r1.
+.section .text.libgcc.fdiv,"x"
+CM0_FUNC_START aeabi_fdiv
+FUNC_ALIAS divsf3 aeabi_fdiv
+ CFI_START_FUNCTION
+
+ // Standard registers, compatible with exception handling.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+	// Save the sign of the result.
+ movs r3, r1
+ eors r3, r0
+ lsrs rT, r3, #31
+ lsls rT, #31
+ mov ip, rT
+
+ // Set up INF for comparison.
+ movs rT, #255
+ lsls rT, #24
+
+ // Check for divide by 0. Automatically catches 0/0.
+ lsls r2, r1, #1
+ beq LSYM(__fdiv_by_zero)
+
+ // Check for INF/INF, or a number divided by itself.
+ lsls r3, #1
+ beq LSYM(__fdiv_equal)
+
+ // Check the numerator for INF/NAN.
+ eors r3, r2
+ cmp r3, rT
+ bhs LSYM(__fdiv_special1)
+
+ // Check the denominator for INF/NAN.
+ cmp r2, rT
+ bhs LSYM(__fdiv_special2)
+
+ // Check the numerator for zero.
+ cmp r3, #0
+ beq SYM(__fp_zero)
+
+ // No action if the numerator is subnormal.
+ // The mantissa will normalize naturally in the division loop.
+ lsls r0, #9
+ lsrs r1, r3, #24
+ beq LSYM(__fdiv_denominator)
+
+ // Restore the numerator's implicit '1'.
+ adds r0, #1
+ rors r0, r0 // 26
+
+ LSYM(__fdiv_denominator):
+ // The denominator must be normalized and left aligned.
+ bl SYM(__fp_normalize2) // +4+8
+
+ // 25 bits of precision will be sufficient.
+ movs rT, #64
+
+ // Run division.
+ bl SYM(__internal_fdiv) // 41
+ b SYM(__fp_assemble)
+
+ LSYM(__fdiv_equal):
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(DIVISION_INF_BY_INF)
+ #endif
+
+ // The absolute value of both operands are equal, but not 0.
+ // If both operands are INF, create a new NAN.
+ cmp r2, rT
+ beq SYM(__fp_exception)
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // If both operands are NAN, return the NAN in $r0.
+ bhi SYM(__fp_check_nan)
+ #else
+ bhi LSYM(__fdiv_return)
+ #endif
+
+ // Return 1.0f, with appropriate sign.
+ movs r0, #127
+ lsls r0, #23
+ add r0, ip
+
+ LSYM(__fdiv_return):
+ pop { rT, pc }
+ .cfi_restore_state
+
+ LSYM(__fdiv_special2):
+ // The denominator is either INF or NAN, numerator is neither.
+ // Also, the denominator is not equal to 0.
+ // If the denominator is INF, the result goes to 0.
+ beq SYM(__fp_zero)
+
+ // The only other option is NAN, fall through to branch.
+ mov r0, r1
+
+ LSYM(__fdiv_special1):
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // The numerator is INF or NAN. If NAN, return it directly.
+ bne SYM(__fp_check_nan)
+ #else
+ bne LSYM(__fdiv_return)
+ #endif
+
+ // If INF, the result will be INF if the denominator is finite.
+ // The denominator won't be either INF or 0,
+ // so fall through the exception trap to check for NAN.
+ movs r0, r1
+
+ LSYM(__fdiv_by_zero):
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(DIVISION_0_BY_0)
+ #endif
+
+ // The denominator is 0.
+ // If the numerator is also 0, the result will be a new NAN.
+ // Otherwise the result will be INF, with the correct sign.
+ lsls r2, r0, #1
+ beq SYM(__fp_exception)
+
+ // The result should be NAN if the numerator is NAN. Otherwise,
+ // the result is INF regardless of the numerator value.
+ cmp r2, rT
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ bhi SYM(__fp_check_nan)
+ #else
+ bhi LSYM(__fdiv_return)
+ #endif
+
+ // Recreate INF with the correct sign.
+ b SYM(__fp_infinity)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END divsf3
+CM0_FUNC_END aeabi_fdiv
+
+
+// Division helper, possibly to be shared with atan2.
+// Expects the numerator mantissa in $r0, exponent in $r1,
+// plus the denominator mantissa in $r3, exponent in $r2, and
+// a bit pattern in $rT that controls the result precision.
+// Returns quotient in $r1, exponent in $r2, pseudo remainder in $r0.
+.section .text.libgcc.fdiv2,"x"
+CM0_FUNC_START internal_fdiv
+ CFI_START_FUNCTION
+
+ // Initialize the exponent, relative to bit[30].
+ subs r2, r1, r2
+
+ SYM(__internal_fdiv2):
+ // The exponent should be (expN - 127) - (expD - 127) + 127.
+ // An additional offset of 25 is required to account for the
+ // minimum number of bits in the result (before rounding).
+ // However, drop '1' because the offset is relative to bit[30],
+ // while the result is calculated relative to bit[31].
+ adds r2, #(127 + 25 - 1)
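+
+ // (For example, 1.0f / 2.0f enters with expN = 127, expD = 128,
+ // so $r2 = (127 - 128) + 151 = 150; the loop below then subtracts
+ // one for each bit shifted into the quotient.)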
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // Dividing by a power of 2?
+ lsls r1, r3, #1
+ beq LSYM(__fdiv_simple) // 47
+ #endif
+
+ // Initialize the result.
+ eors r1, r1
+
+ // Clear the MSB, so that when the numerator is smaller than
+ // the denominator, there is one bit free for a left shift.
+ // After a single shift, the numerator is guaranteed to be larger.
+ // The denominator ends up in r3, and the numerator ends up in r0,
+ // so that the numerator serves as a pseudo-remainder in rounding.
+ // Shift the numerator one additional bit to compensate for the
+ // pre-incrementing loop.
+ lsrs r0, #2
+ lsrs r3, #1 // 49
+
+ LSYM(__fdiv_loop):
+ // Once the MSB of the output reaches the MSB of the register,
+ // the result has been calculated to the required precision.
+ lsls r1, #1
+ bmi LSYM(__fdiv_break)
+
+ // Shift the numerator/remainder left to set up the next bit.
+ subs r2, #1
+ lsls r0, #1
+
+ // Test if the numerator/remainder is smaller than the denominator,
+ // do nothing if it is.
+ cmp r0, r3
+ blo LSYM(__fdiv_loop)
+
+ // If the numerator/remainder is greater or equal, set the next bit,
+ // and subtract the denominator.
+ adds r1, rT
+ subs r0, r3
+
+ // Short-circuit if the remainder goes to 0.
+ // Even with the overhead of "subnormal" alignment,
+ // this is usually much faster than continuing.
+ bne LSYM(__fdiv_loop) // 11*25
+
+ // Compensate the alignment of the result.
+ // The remainder does not need compensation, it's already 0.
+ lsls r1, #1 // 61 + 202 (underflow)
+
+ LSYM(__fdiv_break):
+ RETx lr // 331 + 30
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ LSYM(__fdiv_simple):
+ // The numerator becomes the result, with a remainder of 0.
+ movs r1, r0
+ eors r0, r0
+ subs r2, #25
+ RETx lr // 53 + 30
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END internal_fdiv
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/ffixed.S libgcc/config/arm/cm0/ffixed.S
--- libgcc/config/arm/cm0/ffixed.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/ffixed.S 2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,340 @@
+/* ffixed.S: Cortex M0 optimized float->int conversion
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+// int __aeabi_f2iz(float)
+// Converts a float in $r0 to signed integer, rounding toward 0.
+// Values out of range are forced to either INT_MAX or INT_MIN.
+// NAN becomes zero.
+.section .text.libgcc.f2iz,"x"
+CM0_FUNC_START aeabi_f2iz
+FUNC_ALIAS fixsfsi aeabi_f2iz
+ CFI_START_FUNCTION
+
+ #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+ // Flag for signed conversion.
+ movs r1, #33
+ b LSYM(__real_f2lz)
+ #else
+ // Flag for signed conversion.
+ movs r3, #1
+
+ LSYM(__real_f2iz):
+ // Isolate the sign of the result.
+ asrs r1, r0, #31
+ lsls r0, #1
+
+ #if defined(FP_EXCEPTION) && FP_EXCEPTION
+ // Check for zero to avoid spurious underflow exception on -0.
+ beq LSYM(__f2iz_return)
+ #endif
+
+ // Isolate the exponent.
+ lsrs r2, r0, #24
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // Test for NAN.
+ // Otherwise, NAN will be converted like +/-INF.
+ cmp r2, #255
+ beq LSYM(__f2iz_nan)
+ #endif
+
+ // Extract the mantissa and restore the implicit '1'. Technically,
+ // this is wrong for subnormals, but they flush to zero regardless.
+ lsls r0, #8
+ adds r0, #1
+ rors r0, r0
+
+ // Calculate mantissa alignment. Given the implicit '1' in bit[31]:
+ // * An exponent less than 127 will automatically flush to 0.
+ // * An exponent of 127 will result in a shift of 31.
+ // * An exponent of 128 will result in a shift of 30.
+ // * ...
+ // * An exponent of 157 will result in a shift of 1.
+ // * An exponent of 158 will result in no shift at all.
+ // * An exponent larger than 158 will result in overflow.
+ rsbs r2, #0
+ adds r2, #158
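+
+ // (For example, 2.5f = 0x40200000 has a biased exponent of 128,
+ // giving a shift of 158 - 128 = 30; the aligned mantissa
+ // 0xA0000000 >> 30 = 2, i.e. 2.5 truncated toward zero.)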
+
+ // When the shift is less than minimum, the result will overflow.
+ // The only signed value to fail this test is INT_MIN (0x80000000),
+ // but it will be returned correctly from the overflow branch.
+ cmp r2, r3
+ blt LSYM(__f2iz_overflow)
+
+ // If unsigned conversion of a negative value, also overflow.
+ // Would also catch -0.0f if not handled earlier.
+ cmn r3, r1
+ blt LSYM(__f2iz_overflow)
+
+ #if defined(FP_EXCEPTION) && FP_EXCEPTION
+ // Save a copy for remainder testing
+ movs r3, r0
+ #endif
+
+ // Truncate the fraction.
+ lsrs r0, r2
+
+ // Two's complement negation, if applicable.
+ // Bonus: the sign in $r1 provides a suitable long long result.
+ eors r0, r1
+ subs r0, r1
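+
+ // (With $r1 in { 0, -1 }: x ^ 0 - 0 == x, while
+ // x ^ -1 - -1 == ~x + 1 == -x; a branchless conditional negation.)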
+
+ #if defined(FP_EXCEPTION) && FP_EXCEPTION
+ // If any bits set in the remainder, raise FE_INEXACT
+ rsbs r2, #0
+ adds r2, #32
+ lsls r3, r2
+ bne LSYM(__f2iz_inexact)
+ #endif
+
+ LSYM(__f2iz_return):
+ RETx lr
+
+ LSYM(__f2iz_overflow):
+ // Positive unsigned integers (r1 == 0, r3 == 0), return 0xFFFFFFFF.
+ // Negative unsigned integers (r1 == -1, r3 == 0), return 0x00000000.
+ // Positive signed integers (r1 == 0, r3 == 1), return 0x7FFFFFFF.
+ // Negative signed integers (r1 == -1, r3 == 1), return 0x80000000.
+ // TODO: FE_INVALID exception, (but not for -2^31).
+ mvns r0, r1
+ lsls r3, #31
+ eors r0, r3
+ RETx lr
+
+ #if defined(FP_EXCEPTION) && FP_EXCEPTION
+ LSYM(__f2iz_inexact):
+ // TODO: Another class of exceptions that doesn't overwrite $r0.
+ bkpt #0
+
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(CAST_INEXACT)
+ #endif
+
+ b SYM(__fp_exception)
+ #endif
+
+ LSYM(__f2iz_nan):
+ // Check for INF
+ lsls r2, r0, #9
+ beq LSYM(__f2iz_overflow)
+
+ #if defined(FP_EXCEPTION) && FP_EXCEPTION
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(CAST_UNDEFINED)
+ #endif
+
+ b SYM(__fp_exception)
+ #endif
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+
+ // TODO: Extend to long long
+
+ // TODO: bl fp_check_nan
+ #endif
+
+ // Return long long 0 on NAN.
+ eors r0, r0
+ eors r1, r1
+ RETx lr
+
+ #endif // !__OPTIMIZE_SIZE__
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fixsfsi
+CM0_FUNC_END aeabi_f2iz
+
+
+// unsigned int __aeabi_f2uiz(float)
+// Converts a float in $r0 to unsigned integer, rounding toward 0.
+// Values out of range are forced to UINT_MAX.
+// Negative values and NAN all become zero.
+.section .text.libgcc.f2uiz,"x"
+CM0_FUNC_START aeabi_f2uiz
+FUNC_ALIAS fixunssfsi aeabi_f2uiz
+ CFI_START_FUNCTION
+
+ #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+ // Flag for unsigned conversion.
+ movs r1, #32
+ b LSYM(__real_f2lz)
+ #else
+ // Flag for unsigned conversion.
+ movs r3, #0
+ b LSYM(__real_f2iz)
+ #endif // !__OPTIMIZE_SIZE__
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fixunssfsi
+CM0_FUNC_END aeabi_f2uiz
+
+
+// long long __aeabi_f2lz(float)
+// Converts a float in $r0 to a 64 bit integer in $r1:$r0, rounding toward 0.
+// Values out of range are forced to either INT64_MAX or INT64_MIN.
+// NAN becomes zero.
+.section .text.libgcc.f2lz,"x"
+CM0_FUNC_START aeabi_f2lz
+FUNC_ALIAS fixsfdi aeabi_f2lz
+ CFI_START_FUNCTION
+
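+ // Flag for signed conversion.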
+ movs r1, #1
+
+ LSYM(__real_f2lz):
+ // Split the sign of the result from the mantissa/exponent field.
+ // Handle +/-0 specially to avoid spurious exceptions.
+ asrs r3, r0, #31
+ lsls r0, #1
+ beq LSYM(__f2lz_zero)
+
+ // If unsigned conversion of a negative value, also overflow.
+ // Specifically, overflow occurs when the LSB of $r1 (the 'signed'
+ // flag) is clear while $r3 (the sign) is equal to '-1':
+ //
+ // $r3 (sign) $r2 (flag << 31) $r3 >= $r2 (signed)
+ // 0xFFFFFFFF 0x00000000 false (overflow)
+ // 0x00000000 0x00000000 true
+ // 0xFFFFFFFF 0x80000000 true
+ // 0x00000000 0x80000000 true
+ //
+ // (NOTE: This test will also trap -0.0f, unless handled earlier.)
+ lsls r2, r1, #31
+ cmp r3, r2
+ blt LSYM(__f2lz_overflow)
+
+ // Isolate the exponent.
+ lsrs r2, r0, #24
+
+// #if defined(TRAP_NANS) && TRAP_NANS
+// // Test for NAN.
+// // Otherwise, NAN will be converted like +/-INF.
+// cmp r2, #255
+// beq LSYM(__f2lz_nan)
+// #endif
+
+ // Calculate mantissa alignment. Given the implicit '1' in bit[31]:
+ // * An exponent less than 127 will automatically flush to 0.
+ // * An exponent of 127 will result in a shift of 63.
+ // * An exponent of 128 will result in a shift of 62.
+ // * ...
+ // * An exponent of 189 will result in a shift of 1.
+ // * An exponent of 190 will result in no shift at all.
+ // * An exponent larger than 190 will result in overflow
+ // (189 in the case of signed integers).
+ rsbs r2, #0
+ adds r2, #190
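+
+ // (For example, 3.0f = 0x40400000 has a biased exponent of 128,
+ // giving a shift of 190 - 128 = 62; the aligned 64-bit mantissa
+ // 0xC000000000000000 >> 62 = 3.)
+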
+ // When the shift is less than minimum, the result will overflow.
+ // The only signed value to fail this test is INT_MIN (0x80000000),
+ // but it will be returned correctly from the overflow branch.
+ cmp r2, r1
+ blt LSYM(__f2lz_overflow)
+
+ // Extract the mantissa and restore the implicit '1'. Technically,
+ // this is wrong for subnormals, but they flush to zero regardless.
+ lsls r0, #8
+ adds r0, #1
+ rors r0, r0
+
+ // Calculate the upper word.
+ // If the shift is greater than 32, gives an automatic '0'.
+ movs r1, r0
+ lsrs r1, r2
+
+ // Reduce the shift for the lower word.
+ // If the original shift was less than 32, the result may be split
+ // between the upper and lower words.
+ subs r2, #32 // 18
+ blt LSYM(__f2lz_split)
+
+ // Shift is still positive, keep moving right.
+ lsrs r0, r2
+
+ // TODO: Remainder test.
+ // $r1 is technically free, as long as it's zero by the time
+ // this is over.
+
+ LSYM(__f2lz_return):
+ // Two's complement negation, if the original was negative.
+ eors r0, r3
+ eors r1, r3
+ subs r0, r3
+ sbcs r1, r3
+ RETx lr // 27 - 33
+
+ LSYM(__f2lz_split):
+ // Shift was negative, calculate the remainder
+ rsbs r2, #0
+ lsls r0, r2
+ b LSYM(__f2lz_return)
+
+ LSYM(__f2lz_zero):
+ eors r1, r1
+ RETx lr
+
+ LSYM(__f2lz_overflow):
+ // Positive unsigned integers (r3 == 0, r1 == 0), return 0xFFFFFFFF.
+ // Negative unsigned integers (r3 == -1, r1 == 0), return 0x00000000.
+ // Positive signed integers (r3 == 0, r1 == 1), return 0x7FFFFFFF.
+ // Negative signed integers (r3 == -1, r1 == 1), return 0x80000000.
+ // TODO: FE_INVALID exception, (but not for -2^63).
+ mvns r0, r3
+
+ // For 32-bit results
+ lsls r2, r1, #26
+ lsls r1, #31
+ ands r2, r1
+ eors r0, r2
+
+ eors r1, r0
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fixsfdi
+CM0_FUNC_END aeabi_f2lz
+
+
+// unsigned long long __aeabi_f2ulz(float)
+// Converts a float in $r0 to a 64 bit integer in $r1:$r0, rounding toward 0.
+// Values out of range are forced to UINT64_MAX.
+// Negative values and NAN all become zero.
+.section .text.libgcc.f2ulz,"x"
+CM0_FUNC_START aeabi_f2ulz
+FUNC_ALIAS fixunssfdi aeabi_f2ulz
+ CFI_START_FUNCTION
+
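+ // Flag for unsigned conversion.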
+ eors r1, r1
+ b LSYM(__real_f2lz)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fixunssfdi
+CM0_FUNC_END aeabi_f2ulz
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/ffloat.S libgcc/config/arm/cm0/ffloat.S
--- libgcc/config/arm/cm0/ffloat.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/ffloat.S 2020-11-10 21:33:20.981886999 -0800
@@ -0,0 +1,96 @@
+/* ffloat.S: Cortex M0 optimized int->float conversion
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+// float __aeabi_i2f(int)
+// Converts a signed integer in $r0 to float.
+.section .text.libgcc.il2f,"x"
+CM0_FUNC_START aeabi_i2f
+FUNC_ALIAS floatsisf aeabi_i2f
+ CFI_START_FUNCTION
+
+ // Sign extension to long long.
+ asrs r1, r0, #31
+
+// float __aeabi_l2f(long long)
+// Converts a signed 64-bit integer in $r1:$r0 to a float in $r0.
+CM0_FUNC_START aeabi_l2f
+FUNC_ALIAS floatdisf aeabi_l2f
+
+ // Save the sign.
+ asrs r3, r1, #31
+
+ // Absolute value of the input.
+ eors r0, r3
+ eors r1, r3
+ subs r0, r3
+ sbcs r1, r3
+
+ b LSYM(__internal_uil2f) // 8, 9
+
+ CFI_END_FUNCTION
+CM0_FUNC_END floatdisf
+CM0_FUNC_END aeabi_l2f
+CM0_FUNC_END floatsisf
+CM0_FUNC_END aeabi_i2f
+
+
+// float __aeabi_ui2f(unsigned)
+// Converts an unsigned integer in $r0 to float.
+.section .text.libgcc.uil2f,"x"
+CM0_FUNC_START aeabi_ui2f
+FUNC_ALIAS floatunsisf aeabi_ui2f
+ CFI_START_FUNCTION
+
+ // Convert to unsigned long long with upper bits of 0.
+ eors r1, r1
+
+// float __aeabi_ul2f(unsigned long long)
+// Converts an unsigned 64-bit integer in $r1:$r0 to a float in $r0.
+CM0_FUNC_START aeabi_ul2f
+FUNC_ALIAS floatundisf aeabi_ul2f
+
+ // Sign is always positive.
+ eors r3, r3
+
+ LSYM(__internal_uil2f):
+ // Default exponent, relative to bit[30] of $r1.
+ movs r2, #(189)
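+
+ // (A '1' in bit[30] of $r1 is bit 62 of the 64-bit value, so its
+ // weight is 2^62 and the biased exponent is 127 + 62 = 189.)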
+
+ // Format the sign.
+ lsls r3, #31
+ mov ip, r3
+
+ push { rT, lr }
+ b SYM(__fp_assemble) // { 10, 11, 18, 19 } + 30-227
+
+ CFI_END_FUNCTION
+CM0_FUNC_END floatundisf
+CM0_FUNC_END aeabi_ul2f
+CM0_FUNC_END floatunsisf
+CM0_FUNC_END aeabi_ui2f
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fmul.S libgcc/config/arm/cm0/fmul.S
--- libgcc/config/arm/cm0/fmul.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fmul.S 2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,214 @@
+/* fmul.S: Cortex M0 optimized 32-bit float multiplication
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+// float __aeabi_fmul(float, float)
+// Returns $r0 after multiplication by $r1.
+.section .text.libgcc.fmul,"x"
+CM0_FUNC_START aeabi_fmul
+FUNC_ALIAS mulsf3 aeabi_fmul
+ CFI_START_FUNCTION
+
+ // Standard registers, compatible with exception handling.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Save the sign of the result.
+ movs rT, r1
+ eors rT, r0
+ lsrs rT, #31
+ lsls rT, #31
+ mov ip, rT
+
+ // Set up INF for comparison.
+ movs rT, #255
+ lsls rT, #24
+
+ // Check for multiplication by zero.
+ lsls r2, r0, #1
+ beq LSYM(__fmul_zero1)
+
+ lsls r3, r1, #1
+ beq LSYM(__fmul_zero2)
+
+ // Check for INF/NAN.
+ cmp r3, rT
+ bhs LSYM(__fmul_special2)
+
+ cmp r2, rT
+ bhs LSYM(__fmul_special1)
+
+ // Because neither operand is INF/NAN, the result will be finite.
+ // It is now safe to modify the original operand registers.
+ lsls r0, #9
+
+ // Isolate the first exponent. When normal, add back the implicit '1'.
+ // The result is always aligned with the MSB in bit [31].
+ // Subnormal mantissas remain effectively multiplied by 2x relative to
+ // normals, but this works because the weight of a subnormal is -126.
+ lsrs r2, #24
+ beq LSYM(__fmul_normalize2)
+ adds r0, #1
+ rors r0, r0
+
+ LSYM(__fmul_normalize2):
+ // IMPORTANT: exp10i() jumps in here!
+ // Repeat for the mantissa of the second operand.
+ // Short-circuit when the mantissa is 1.0, as the
+ // first mantissa is already prepared in $r0
+ lsls r1, #9
+
+ // When normal, add back the implicit '1'.
+ lsrs r3, #24
+ beq LSYM(__fmul_go)
+ adds r1, #1
+ rors r1, r1
+
+ LSYM(__fmul_go):
+ // Calculate the final exponent, relative to bit [30].
+ adds rT, r2, r3
+ subs rT, #127 // 30
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // Short-circuit on multiplication by powers of 2.
+ lsls r3, r0, #1
+ beq LSYM(__fmul_simple1)
+
+ lsls r3, r1, #1
+ beq LSYM(__fmul_simple2)
+ #endif
+
+ // Save $ip across the call.
+ // (Alternatively, could push/pop a separate register, but the
+ // four instructions here are equally fast without touching the stack.)
+ add rT, ip
+
+ // 32x32 unsigned multiplication, 64 bit result.
+ bl SYM(__umulsidi3) __PLT__ // +22
+
+ // Separate the saved exponent and sign.
+ sxth r2, rT
+ subs rT, r2
+ mov ip, rT
+
+ b SYM(__fp_assemble) // 62
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ LSYM(__fmul_simple2):
+ // Move the high bits of the result to $r1.
+ movs r1, r0
+
+ LSYM(__fmul_simple1):
+ // Clear the remainder.
+ eors r0, r0
+
+ // Adjust mantissa to match the exponent, relative to bit[30].
+ subs r2, rT, #1
+ b SYM(__fp_assemble) // 42
+ #endif
+
+ LSYM(__fmul_zero1):
+ // $r0 was equal to 0, set up to check $r1 for INF/NAN.
+ lsls r2, r1, #1
+
+ LSYM(__fmul_zero2):
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(INFINITY_TIMES_ZERO)
+ #endif
+
+ // Check the non-zero operand for INF/NAN.
+ // If NAN, it should be returned.
+ // If INF, the result should be NAN.
+ // Otherwise, the result will be +/-0.
+ cmp r2, rT
+ beq SYM(__fp_exception)
+
+ // If the second operand is finite, the result is 0.
+ blo SYM(__fp_zero)
+
+ #if defined(STRICT_NANS) && STRICT_NANS
+ // Restore values that got mixed in zero testing, then go back
+ // to sort out which one is the NAN.
+ lsls r3, r1, #1
+ lsls r2, r0, #1
+ #elif defined(TRAP_NANS) && TRAP_NANS
+ // Return NAN with the sign bit cleared.
+ lsrs r0, r2, #1
+ b SYM(__fp_check_nan)
+ #else
+ // Return NAN with the sign bit cleared.
+ lsrs r0, r2, #1
+ pop { rT, pc }
+ .cfi_restore_state
+ #endif
+
+ LSYM(__fmul_special2):
+ // $r1 is INF/NAN. In case of INF, check $r0 for NAN.
+ cmp r2, rT
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // Force swap if $r0 is not NAN.
+ bls LSYM(__fmul_swap)
+
+ // $r0 is NAN, keep if $r1 is INF
+ cmp r3, rT
+ beq LSYM(__fmul_special1)
+
+ // Both are NAN, keep the smaller value (more likely to signal).
+ cmp r2, r3
+ #endif
+
+ // Prefer the NAN already in $r0.
+ // (If TRAP_NANS, this is the smaller NAN).
+ bhi LSYM(__fmul_special1)
+
+ LSYM(__fmul_swap):
+ movs r0, r1
+
+ LSYM(__fmul_special1):
+ // $r0 is either INF or NAN. $r1 has already been examined.
+ // Flags are already set correctly.
+ lsls r2, r0, #1
+ cmp r2, rT
+ beq SYM(__fp_infinity)
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ b SYM(__fp_check_nan)
+ #else
+ pop { rT, pc }
+ .cfi_restore_state
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END mulsf3
+CM0_FUNC_END aeabi_fmul
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fneg.S libgcc/config/arm/cm0/fneg.S
--- libgcc/config/arm/cm0/fneg.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fneg.S 2020-11-10 21:33:20.985886867 -0800
@@ -0,0 +1,75 @@
+/* fneg.S: Cortex M0 optimized 32-bit float negation
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+// float __aeabi_fneg(float) [obsolete]
+// The argument and result are in $r0.
+// Uses $r1 and $r2 as scratch registers.
+.section .text.libgcc.fneg,"x"
+CM0_FUNC_START aeabi_fneg
+FUNC_ALIAS negsf2 aeabi_fneg
+ CFI_START_FUNCTION
+
+ #if (defined(STRICT_NANS) && STRICT_NANS) || \
+ (defined(TRAP_NANS) && TRAP_NANS)
+ // Check for NAN.
+ lsls r1, r0, #1
+ movs r2, #255
+ lsls r2, #24
+ cmp r1, r2
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ bhi SYM(__fneg_nan)
+ #else
+ bhi LSYM(__fneg_return)
+ #endif
+ #endif
+
+ // Flip the sign.
+ movs r1, #1
+ lsls r1, #31
+ eors r0, r1
+
+ LSYM(__fneg_return):
+ RETx lr
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ LSYM(__fneg_nan):
+ // Set up registers for exception handling.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ b SYM(__fp_check_nan)
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END negsf2
+CM0_FUNC_END aeabi_fneg
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/fplib.h libgcc/config/arm/cm0/fplib.h
--- libgcc/config/arm/cm0/fplib.h 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/fplib.h 2020-11-12 09:45:36.032217491 -0800
@@ -0,0 +1,80 @@
+/* fplib.h: Cortex M0 optimized floating point library definitions
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifndef __CM0_FPLIB_H
+#define __CM0_FPLIB_H
+
+/* Enable exception interrupt handler.
+ Exception implementation is opportunistic, and not fully tested. */
+#define TRAP_EXCEPTIONS (0)
+#define EXCEPTION_CODES (0)
+
+/* Perform extra checks to avoid modifying the sign bit of NANs */
+#define STRICT_NANS (0)
+
+/* Trap signaling NANs regardless of context. */
+#define TRAP_NANS (0)
+
+/* TODO: Define service numbers according to the handler requirements */
+#define SVC_TRAP_NAN (0)
+#define SVC_FP_EXCEPTION (0)
+#define SVC_DIVISION_BY_ZERO (0)
+
+/* Push extra registers when required for 64-bit stack alignment */
+#define DOUBLE_ALIGN_STACK (0)
+
+/* Define various exception codes. These don't map to anything in particular */
+#define SUBTRACTED_INFINITY (20)
+#define INFINITY_TIMES_ZERO (21)
+#define DIVISION_0_BY_0 (22)
+#define DIVISION_INF_BY_INF (23)
+#define UNORDERED_COMPARISON (24)
+#define CAST_OVERFLOW (25)
+#define CAST_INEXACT (26)
+#define CAST_UNDEFINED (27)
+
+/* Exception control for quiet NANs.
+ If TRAP_NANS support is enabled, signaling NANs always raise exceptions. */
+.equ FCMP_RAISE_EXCEPTIONS, 16
+.equ FCMP_NO_EXCEPTIONS, 0
+
+/* These assignments are significant. See implementation.
+ They must be shared for use in libm functions. */
+.equ FCMP_3WAY, 1
+.equ FCMP_LT, 2
+.equ FCMP_EQ, 4
+.equ FCMP_GT, 8
+
+.equ FCMP_GE, (FCMP_EQ | FCMP_GT)
+.equ FCMP_LE, (FCMP_LT | FCMP_EQ)
+.equ FCMP_NE, (FCMP_LT | FCMP_GT)
+
+/* These flags affect the result of unordered comparisons. See implementation. */
+.equ FCMP_UN_THREE, 128
+.equ FCMP_UN_POSITIVE, 64
+.equ FCMP_UN_ZERO, 32
+.equ FCMP_UN_NEGATIVE, 0
+
+#endif /* __CM0_FPLIB_H */
diff -ruN libgcc/config/arm/cm0/futil.S libgcc/config/arm/cm0/futil.S
--- libgcc/config/arm/cm0/futil.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/futil.S 2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,407 @@
+/* futil.S: Cortex M0 optimized 32-bit common routines
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// Internal function, decomposes the unsigned float in $r2.
+// The exponent will be returned in $r2, the mantissa in $r3.
+// If subnormal, the mantissa will be normalized, so that
+// the MSB of the mantissa (if any) will be aligned at bit[31].
+// Preserves $r0 and $r1, uses $rT as scratch space.
+.section .text.libgcc.normf,"x"
+CM0_FUNC_START fp_normalize2
+ CFI_START_FUNCTION
+
+ // Extract the mantissa.
+ lsls r3, r2, #8
+
+ // Extract the exponent.
+ lsrs r2, #24
+ beq SYM(__fp_lalign2)
+
+ // Restore the mantissa's implicit '1'.
+ adds r3, #1
+ rors r3, r3
+
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_normalize2
+
+
+// Internal function, aligns $r3 so the MSB is aligned in bit[31].
+// Simultaneously, subtracts the shift from the exponent in $r2.
+.section .text.libgcc.alignf,"x"
+CM0_FUNC_START fp_lalign2
+ CFI_START_FUNCTION
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // Unroll the loop, similar to __clzsi2().
+ lsrs rT, r3, #16
+ bne LSYM(__align8)
+ subs r2, #16
+ lsls r3, #16
+
+ LSYM(__align8):
+ lsrs rT, r3, #24
+ bne LSYM(__align4)
+ subs r2, #8
+ lsls r3, #8
+
+ LSYM(__align4):
+ lsrs rT, r3, #28
+ bne LSYM(__align2)
+ subs r2, #4
+ lsls r3, #4 // 12
+ #endif
+
+ LSYM(__align2):
+ // Refresh the state of the N flag before entering the loop.
+ tst r3, r3
+
+ LSYM(__align_loop):
+ // Test before subtracting to compensate for the natural exponent.
+ // The largest subnormal should have an exponent of 0, not -1.
+ bmi LSYM(__align_return)
+ subs r2, #1
+ lsls r3, #1
+ bne LSYM(__align_loop) // 6 * 31
+
+ // Not just a subnormal... 0! By design, this should never happen.
+ // All callers of this internal function filter 0 as a special case.
+ // Was there an uncontrolled jump from somewhere else? Cosmic ray?
+ eors r2, r2
+
+ #ifdef DEBUG
+ bkpt #0
+ #endif
+
+ LSYM(__align_return):
+ RETx lr // 24 - 192 (size), 19 - 36
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_lalign2
+
+
+// Internal function to combine mantissa, exponent, and sign. No return.
+// Expects the unsigned result in $r1. To avoid underflow (slower),
+// the MSB should be in bits [31:29].
+// Expects any remainder bits of the unrounded result in $r0.
+// Expects the exponent in $r2. The exponent must be relative to bit[30].
+// Expects the sign of the result (and only the sign) in $ip.
+// Returns a correctly rounded floating value in $r0.
+.section .text.libgcc.assemblef,"x"
+CM0_FUNC_START fp_assemble
+ CFI_START_FUNCTION
+
+ // Work around CFI branching limitations.
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Examine the upper three bits [31:29] for underflow.
+ lsrs r3, r1, #29
+ beq LSYM(__fp_underflow)
+
+ // Convert bits [31:29] into an offset in the range of { 0, -1, -2 }.
+ // Right rotation aligns the MSB in bit [31], filling any LSBs with '0'.
+ lsrs r3, r1, #1
+ mvns r3, r3
+ ands r3, r1
+ lsrs r3, #30
+ subs r3, #2
+ rors r1, r3
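+
+ // ($r3 is now 0, -1, or -2; 'rors' only uses the low byte of the
+ // shift register, so rotating right by -1 or -2 is effectively a
+ // left rotation by 1 or 2 bits.)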
+
+ // Update the exponent, assuming the final result will be normal.
+ // The new exponent is 1 less than actual, to compensate for the
+ // eventual addition of the implicit '1' in the result.
+ // If the final exponent becomes negative, proceed directly to gradual
+ // underflow, without bothering to search for the MSB.
+ adds r2, r3
+
+CM0_FUNC_START fp_assemble2
+ bmi LSYM(__fp_subnormal)
+
+ LSYM(__fp_normal):
+ // Check for overflow (remember the implicit '1' to be added later).
+ cmp r2, #254
+ bge SYM(__fp_overflow) // +13 underflow
+
+ // Save LSBs for the remainder. Position doesn't matter any more,
+ // these are just tiebreakers for round-to-even.
+ lsls rT, r1, #25
+
+ // Align the final result.
+ lsrs r1, #8
+
+ LSYM(__fp_round):
+ // If carry bit is '0', always round down.
+ bcc LSYM(__fp_return)
+
+ // The carry bit is '1'. Round to nearest, ties to even.
+ // If either the saved remainder bits [6:0], the additional remainder
+ // bits in $r1, or the final LSB is '1', round up.
+ lsls r3, r1, #31
+ orrs r3, rT
+ orrs r3, r0
+ beq LSYM(__fp_return)
+
+ // If rounding up overflows the result to 2.0, the result
+ // is still correct, up to and including INF.
+ adds r1, #1
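+
+ // (Round to nearest, ties to even: a set carry (guard) bit with an
+ // all-zero remainder and an even LSB is a tie and rounds down;
+ // any remainder bit or an odd LSB rounds up.)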
+
+ LSYM(__fp_return):
+ // Combine the mantissa and the exponent.
+ lsls r2, #23
+ adds r0, r1, r2
+
+ // Combine with the saved sign.
+ // End of library call, return to user.
+ add r0, ip
+
+ #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+ // TODO: Underflow/inexact reporting IFF remainder
+ #endif
+
+ pop { rT, pc } // +30 (typical)
+ .cfi_restore_state
+
+ LSYM(__fp_underflow):
+ // Set up to align the mantissa.
+ movs r3, r1 // 5
+ bne LSYM(__fp_underflow2)
+
+ // MSB wasn't in the upper 32 bits, check the remainder.
+ // If the remainder is also zero, the result is +/-0.
+ movs r3, r0
+ beq SYM(__fp_zero)
+
+ eors r0, r0
+ subs r2, #32
+
+ LSYM(__fp_underflow2):
+ // Save the pre-alignment exponent to align the remainder later.
+ movs r1, r2 // 9 - 11
+
+ // Align the mantissa with the MSB in bit[31].
+ bl SYM(__fp_lalign2) // 37 - 207 (size), 32 - 51
+
+ // Calculate the actual remainder shift.
+ subs rT, r1, r2
+
+ // Align the lower bits of the remainder.
+ movs r1, r0
+ lsls r0, rT
+
+ // Combine the upper bits of the remainder with the aligned value.
+ rsbs rT, #0
+ adds rT, #32
+ lsrs r1, rT
+ adds r1, r3
+
+ // The MSB is now aligned at bit[31] of $r1.
+ // If the net exponent is still positive, the result will be normal.
+ // Because this function is used by fmul(), there is a possibility
+ // that the value is still wider than 24 bits; always round.
+ tst r2, r2
+ bpl LSYM(__fp_normal)
+
+ LSYM(__fp_subnormal):
+ // The MSB is aligned at bit[31], with a net negative exponent.
+ // The mantissa will need to be shifted right by the absolute value of
+ // the exponent, plus the normal shift of 8.
+
+ // If the negative shift is smaller than -25, there is no result,
+ // no rounding, no anything. Return signed zero.
+ // (Otherwise, the shift for result and remainder may wrap.)
+ adds r2, #25
+ bmi SYM(__fp_inexact_zero)
+
+ // Save the extra bits for the remainder.
+ movs rT, r1
+ lsls rT, r2
+
+ // Shift the mantissa to create a subnormal.
+ // Just like normal, round to nearest, ties to even.
+ movs r3, #33
+ subs r3, r2
+ eors r2, r2
+
+ // This shift must be last, leaving the shifted LSB in the C flag.
+ lsrs r1, r3
+ b LSYM(__fp_round)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_assemble2
+CM0_FUNC_END fp_assemble
+
+
+// Recreate INF with the appropriate sign. No return.
+// Expects the sign of the result in $ip.
+.section .text.libgcc.infinityf,"x"
+CM0_FUNC_START fp_overflow
+ CFI_START_FUNCTION
+
+ #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+ // TODO: inexact/overflow exception
+ #endif
+
+CM0_FUNC_START fp_infinity
+
+ // Work around CFI branching limitations.
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ movs r0, #255
+ lsls r0, #23
+ add r0, ip
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_infinity
+CM0_FUNC_END fp_overflow
+
+
+// Recreate 0 with the appropriate sign. No return.
+// Expects the sign of the result in $ip.
+.section .text.libgcc.zerof,"x"
+CM0_FUNC_START fp_inexact_zero
+
+ #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+ // TODO: inexact/underflow exception
+ #endif
+
+CM0_FUNC_START fp_zero
+ CFI_START_FUNCTION
+
+ // Work around CFI branching limitations.
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Return 0 with the correct sign.
+ mov r0, ip
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_zero
+CM0_FUNC_END fp_inexact_zero
+
+
+// Internal function to detect signaling NANs. No return.
+// Uses $r2 as scratch space.
+.section .text.libgcc.checkf,"x"
+CM0_FUNC_START fp_check_nan2
+ CFI_START_FUNCTION
+
+ // Work around CFI branching limitations.
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+
+CM0_FUNC_START fp_check_nan
+
+ // Check for quiet NAN.
+ lsrs r2, r0, #23
+ bcs LSYM(__quiet_nan)
+
+ // Raise exception. Preserves both $r0 and $r1.
+ svc #(SVC_TRAP_NAN)
+
+ // Quiet the resulting NAN.
+ movs r2, #1
+ lsls r2, #22
+ orrs r0, r2
+
+ LSYM(__quiet_nan):
+ // End of library call, return to user.
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_check_nan
+CM0_FUNC_END fp_check_nan2
+
+
+// Internal function to report floating point exceptions. No return.
+// Expects the original argument(s) in $r0 (possibly also $r1).
+// Expects a code that describes the exception in $r3.
+.section .text.libgcc.exceptf,"x"
+CM0_FUNC_START fp_exception
+ CFI_START_FUNCTION
+
+ // Work around CFI branching limitations.
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Create a quiet NAN.
+ movs r2, #255
+ lsls r2, #1
+ adds r2, #1
+ lsls r2, #22
+
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ // Annotate the exception type in the NAN field.
+ // Make sure that the exception code lands in the valid payload region.
+ lsls rT, r3, #13
+ orrs r2, rT
+ #endif
+
+// Exception handler that expects the result already in $r2,
+// typically when the result is not going to be NAN.
+CM0_FUNC_START fp_exception2
+
+ #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+ svc #(SVC_FP_EXCEPTION)
+ #endif
+
+ // TODO: Save exception flags in a static variable.
+
+ // Set up the result, now that the argument isn't required any more.
+ movs r0, r2
+
+ // HACK: for sincosf(), with 2 parameters to return.
+ movs r1, r2
+
+ // End of library call, return to user.
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_exception2
+CM0_FUNC_END fp_exception
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/lcmp.S libgcc/config/arm/cm0/lcmp.S
--- libgcc/config/arm/cm0/lcmp.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/lcmp.S 2020-11-10 21:33:20.985886867 -0800
@@ -0,0 +1,96 @@
+/* lcmp.S: Cortex M0 optimized 64-bit integer comparison
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// int __aeabi_lcmp(long long, long long)
+// Compares the 64 bit signed values in $r1:$r0 and $r3:$r2.
+// Returns { -1, 0, +1 } in $r0 for ordering { <, ==, > }, respectively.
+.section .text.libgcc.lcmp,"x"
+CM0_FUNC_START aeabi_lcmp
+ CFI_START_FUNCTION
+
+ // Calculate the difference $r1:$r0 - $r3:$r2.
+ subs r0, r2
+ sbcs r1, r3
+
+ // With $r2 free, create a reference value without affecting flags.
+ mov r2, r3
+
+ // Finish the comparison.
+ blt LSYM(__lcmp_lt)
+
+ // The reference difference ($r2 - $r3) will be +2 iff the first
+ // argument is larger, otherwise $r2 remains equal to $r3.
+ adds r2, #2
+
+ LSYM(__lcmp_lt):
+ // Check for equality (all 64 bits).
+ orrs r0, r1
+ beq LSYM(__lcmp_return)
+
+ // Convert the relative difference to a result of +/-1.
+ subs r0, r2, r3
+ subs r0, #1
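+
+ // (Trace: if the first value is larger, $r2 - $r3 = +2 and
+ // $r0 = +1; if smaller, $r2 == $r3 and $r0 = -1.)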
+
+ LSYM(__lcmp_return):
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_lcmp
+
+
+// int __aeabi_ulcmp(unsigned long long, unsigned long long)
+// Compares the 64 bit unsigned values in $r1:$r0 and $r3:$r2.
+// Returns { -1, 0, +1 } in $r0 for ordering { <, ==, > }, respectively.
+.section .text.libgcc.ulcmp,"x"
+CM0_FUNC_START aeabi_ulcmp
+ CFI_START_FUNCTION
+
+ // Calculate the 'C' flag.
+ subs r0, r2
+ sbcs r1, r3
+
+ // $r2 will contain -1 if the first value is smaller,
+ // 0 if the first value is larger or equal.
+ sbcs r2, r2
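+
+ // ('sbcs r2, r2' computes $r2 - $r2 - !C: the subtraction above
+ // leaves the carry clear on borrow (first value smaller), giving -1,
+ // and set otherwise, giving 0.)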
+
+ // Check for equality (all 64 bits).
+ orrs r0, r1
+ beq LSYM(__ulcmp_return)
+
+ // $r0 should contain +1 or -1.
+ movs r0, #1
+ orrs r0, r2
+
+ LSYM(__ulcmp_return):
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_ulcmp
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/ldiv.S libgcc/config/arm/cm0/ldiv.S
--- libgcc/config/arm/cm0/ldiv.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/ldiv.S 2020-11-12 09:46:26.943906976 -0800
@@ -0,0 +1,413 @@
+/* ldiv.S: Cortex M0 optimized 64-bit integer division
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// long long __aeabi_ldiv0(long long)
+// Helper function for division by 0.
+.section .text.libgcc.ldiv0,"x"
+CM0_FUNC_START aeabi_ldiv0
+ CFI_START_FUNCTION
+
+ #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+ svc #(SVC_DIVISION_BY_ZERO)
+ #endif
+
+ // Return { 0, numerator } for quotient and remainder.
+ movs r2, r0
+ movs r3, r1
+ eors r0, r0
+ eors r1, r1
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_ldiv0
+
+
+// long long __aeabi_ldiv(long long, long long)
+// lldiv_return __aeabi_ldivmod(long long, long long)
+// Returns signed $r1:$r0 after division by $r3:$r2.
+// Also returns the signed remainder in $r3:$r2.
+.section .text.libgcc.ldiv,"x"
+CM0_FUNC_START aeabi_ldivmod
+FUNC_ALIAS aeabi_ldiv aeabi_ldivmod
+FUNC_ALIAS divdi3 aeabi_ldivmod
+ CFI_START_FUNCTION
+
+ // Test the denominator for zero before pushing registers.
+ cmp r2, #0
+ bne LSYM(__ldivmod_valid)
+
+ cmp r3, #0
+ beq SYM(__aeabi_ldiv0)
+
+ LSYM(__ldivmod_valid):
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ push { rP, rQ, rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 16
+ .cfi_rel_offset rP, 0
+ .cfi_rel_offset rQ, 4
+ .cfi_rel_offset rT, 8
+ .cfi_rel_offset lr, 12
+ #else
+ push { rP, rQ, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 12
+ .cfi_rel_offset rP, 0
+ .cfi_rel_offset rQ, 4
+ .cfi_rel_offset lr, 8
+ #endif
+
+ // Absolute value of the numerator.
+ asrs rP, r1, #31
+ eors r0, rP
+ eors r1, rP
+ subs r0, rP
+ sbcs r1, rP
+
+ // Absolute value of the denominator.
+ asrs rQ, r3, #31
+ eors r2, rQ
+ eors r3, rQ
+ subs r2, rQ
+ sbcs r3, rQ
+
+ // Keep the XOR of signs for the quotient.
+ eors rQ, rP
+
+ // Handle division as unsigned.
+ bl LSYM(__internal_uldivmod)
+
+ // Set the sign of the quotient.
+ eors r0, rQ
+ eors r1, rQ
+ subs r0, rQ
+ sbcs r1, rQ
+
+ // Set the sign of the remainder.
+ eors r2, rP
+ eors r3, rP
+ subs r2, rP
+ sbcs r3, rP
+
+ LSYM(__ldivmod_return):
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ pop { rP, rQ, rT, pc }
+ .cfi_restore_state
+ #else
+ pop { rP, rQ, pc }
+ .cfi_restore_state
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END divdi3
+CM0_FUNC_END aeabi_ldiv
+CM0_FUNC_END aeabi_ldivmod
+
+
+// unsigned long long __aeabi_uldiv(unsigned long long, unsigned long long)
+// ulldiv_return __aeabi_uldivmod(unsigned long long, unsigned long long)
+// Returns unsigned $r1:$r0 after division by $r3:$r2.
+// Also returns the remainder in $r3:$r2.
+.section .text.libgcc.uldiv,"x"
+CM0_FUNC_START aeabi_uldivmod
+FUNC_ALIAS aeabi_uldiv aeabi_uldivmod
+FUNC_ALIAS udivdi3 aeabi_uldivmod
+ CFI_START_FUNCTION
+
+ // Test the denominator for zero before changing the stack.
+ cmp r3, #0
+ bne LSYM(__internal_uldivmod)
+
+ cmp r2, #0
+ beq SYM(__aeabi_ldiv0)
+
+ #if defined(OPTIMIZE_SPEED) && OPTIMIZE_SPEED
+ // MAYBE: Optimize division by a power of 2
+ #endif
+
+ LSYM(__internal_uldivmod):
+ push { rP, rQ, rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 16
+ .cfi_rel_offset rP, 0
+ .cfi_rel_offset rQ, 4
+ .cfi_rel_offset rT, 8
+ .cfi_rel_offset lr, 12
+
+ // Set up denominator shift, assuming a single width result.
+ movs rP, #32
+
+ // If the upper word of the denominator is 0 ...
+ tst r3, r3
+ bne LSYM(__uldivmod_setup) // 12
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // ... and the upper word of the numerator is also 0,
+ // single width division will be at least twice as fast.
+ tst r1, r1
+ beq LSYM(__uldivmod_small)
+ #endif
+
+ // ... and the lower word of the denominator is less than or equal
+ // to the upper word of the numerator ...
+ cmp r1, r2
+ blo LSYM(__uldivmod_setup)
+
+ // ... then the result will be double width, at least 33 bits.
+ // Set up a flag in $rP to seed the shift for the second word.
+ movs r3, r2
+ eors r2, r2
+ adds rP, #64
+
+ LSYM(__uldivmod_setup):
+ // Pre division: Shift the denominator as far as possible left
+ // without making it larger than the numerator.
+ // Since the search is destructive, first save a copy of the numerator.
+ mov ip, r0
+ mov lr, r1
+
+ // Set up binary search.
+ movs rQ, #16
+ eors rT, rT // 21
+
+ LSYM(__uldivmod_align):
+ // Maintain a secondary shift $rT = 32 - $rQ, making the overlapping
+ // shifts between low and high words easier to construct.
+ adds rT, rQ
+
+ // Prefer dividing the numerator over multiplying the denominator
+ // (multiplying the denominator may result in overflow).
+ lsrs r1, rQ
+
+ // Measure the high bits of denominator against the numerator.
+ cmp r1, r3
+ blo LSYM(__uldivmod_skip)
+ bhi LSYM(__uldivmod_shift)
+
+ // If the high bits are equal, construct the low bits for checking.
+ mov r1, lr
+ lsls r1, rT
+
+ lsrs r0, rQ
+ orrs r1, r0
+
+ cmp r1, r2
+ blo LSYM(__uldivmod_skip)
+
+ LSYM(__uldivmod_shift):
+ // Scale the denominator and the result together.
+ subs rP, rQ
+
+ // If the reduced numerator is still larger than or equal to the
+ // denominator, it is safe to shift the denominator left.
+ movs r1, r2
+ lsrs r1, rT
+ lsls r3, rQ
+
+ lsls r2, rQ
+ orrs r3, r1
+
+ LSYM(__uldivmod_skip):
+ // Restore the numerator.
+ mov r0, ip
+ mov r1, lr
+
+ // Iterate until the shift goes to 0.
+ lsrs rQ, #1
+ bne LSYM(__uldivmod_align) // (12 to 23) * 5
+
+ // Initialize the result (zero).
+ mov ip, rQ
+
+ // HACK: Compensate for the first word test.
+ lsls rP, #6 // 2, 140
+
+ LSYM(__uldivmod_word2):
+ // Is there another word?
+ lsrs rP, #6
+ beq LSYM(__uldivmod_return) // +4
+
+ // Shift the calculated result by 1 word.
+ mov lr, ip
+ mov ip, rQ
+
+ // Set up the MSB of the next word of the quotient
+ movs rQ, #1
+ rors rQ, rP
+ b LSYM(__uldivmod_entry) // 9 * 2, 149
+
+ LSYM(__uldivmod_loop):
+ // Divide the denominator by 2.
+ // It could be slightly faster to multiply the numerator,
+ // but that would require shifting the remainder at the end.
+ lsls rT, r3, #31
+ lsrs r3, #1
+ lsrs r2, #1
+ adds r2, rT
+
+ // Step to the next bit of the result.
+ lsrs rQ, #1
+ beq LSYM(__uldivmod_word2) // (19 * 32 + 2) * 2, 140+9+610+9+610+4+12
+
+ LSYM(__uldivmod_entry):
+ // Test if the denominator is smaller, high word first.
+ cmp r1, r3
+ blo LSYM(__uldivmod_loop)
+ bhi LSYM(__uldivmod_quotient)
+
+ cmp r0, r2
+ blo LSYM(__uldivmod_loop)
+
+ LSYM(__uldivmod_quotient):
+ // Smaller denominator: the next bit of the quotient will be set.
+ add ip, rQ
+
+ // Subtract the denominator from the remainder.
+ // If the new remainder goes to 0, exit early.
+ subs r0, r2
+ sbcs r1, r3
+ bne LSYM(__uldivmod_loop)
+
+ tst r0, r0
+ bne LSYM(__uldivmod_loop)
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // Check whether there's still a second word to calculate.
+ lsrs rP, #6
+ beq LSYM(__uldivmod_return)
+
+ // If so, shift the result left by a full word.
+ mov lr, ip
+ mov ip, r1 // zero
+ #else
+ eors rQ, rQ
+ b LSYM(__uldivmod_word2)
+ #endif
+
+ LSYM(__uldivmod_return):
+ // Move the remainder to the second half of the result.
+ movs r2, r0
+ movs r3, r1
+
+ // Move the quotient to the first half of the result.
+ mov r0, ip
+ mov r1, lr
+
+ pop { rP, rQ, rT, pc } // + 12
+ .cfi_restore_state
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ LSYM(__uldivmod_small):
+ // Arrange arguments for 32-bit division.
+ movs r1, r2
+ bl LSYM(__internal_uidivmod) // 20
+
+ // Extend quotient and remainder to 64 bits, unsigned.
+ movs r2, r1
+ eors r1, r1
+ eors r3, r3
+ pop { rP, rQ, rT, pc } // 31
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END udivdi3
+CM0_FUNC_END aeabi_uldiv
+CM0_FUNC_END aeabi_uldivmod
+
+
+#if 0
+
+ LSYM(__internal_uldivmod):
+ push { r0 - rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 32
+ .cfi_rel_offset r0, 0
+ .cfi_rel_offset r1, 4
+ .cfi_rel_offset r2, 8
+ .cfi_rel_offset r3, 12
+ .cfi_rel_offset rP, 16
+ .cfi_rel_offset rQ, 20
+ .cfi_rel_offset rT, 24
+ .cfi_rel_offset lr, 28
+
+ // Count leading zeros of the numerator
+ bl SYM(__clzdi2) // 55
+ mov rP, r0
+
+ // Load denominator
+ add r0, sp, #8
+ ldm r0, { r0, r1 }
+
+ // Count leading zeros of the denominator.
+ bl SYM(__clzdi2) // 55
+
+ // If the numerator has more zeros than the denominator,
+ // the result is { 0, numerator }
+ subs rP, r0, rP
+ bhi LSYM(__uldivmod_simple)
+
+ // Reload the denominator
+ add r0, sp, #8
+ ldm r0, { r0, r1 }
+
+ // Shift the denominator
+ movs r2, rP
+ bl SYM(__aeabi_llsl) // 14
+
+ // Reload the numerator as remainder.
+ pop { r2, r3 }
+
+ // Discard the copy of the denominator on the stack.
+ add sp, #8
+
+ // Shift the first quotient bit into place
+
+ // Initialize the result.
+
+ // Main division loop.
+
+
+ // Copy the quotient to the result.
+ mov r0, ip
+ mov r1, lr
+
+ pop { rP, rQ, rT, pc }
+ .cfi_restore_state
+
+
+
+ LSYM(__uldivmod_simple):
+ movs r2, r0
+ movs r3, r1
+ eors r0, r0
+ eors r1, r1
+
+#endif
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/cm0/lmul.S libgcc/config/arm/cm0/lmul.S
--- libgcc/config/arm/cm0/lmul.S 1969-12-31 16:00:00.000000000 -0800
+++ libgcc/config/arm/cm0/lmul.S 2020-11-10 21:33:20.985886867 -0800
@@ -0,0 +1,294 @@
+/* lmul.S: Cortex M0 optimized 64-bit integer multiplication
+
+ Copyright (C) 2018-2020 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef __BUILD_CM0_FPLIB // whole file
+
+
+// long long __aeabi_lmul(long long, long long)
+// Returns the least significant 64 bits of a 64x64 bit multiplication.
+// Expects the two multiplicands in $r1:$r0 and $r3:$r2.
+// Returns the product in $r1:$r0 (does not distinguish signed types).
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.lmul,"x"
+CM0_FUNC_START aeabi_lmul
+FUNC_ALIAS muldi3 aeabi_lmul
+ CFI_START_FUNCTION
+
+ // $r1:$r0 = 0xDDDDCCCCBBBBAAAA
+ // $r3:$r2 = 0xZZZZYYYYXXXXWWWW
+
+ // The following operations that only affect the upper 64 bits
+ // can be safely discarded:
+ // DDDD * ZZZZ
+ // DDDD * YYYY
+ // DDDD * XXXX
+ // CCCC * ZZZZ
+ // CCCC * YYYY
+ // BBBB * ZZZZ
+
+ // MAYBE: Test for multiply by ZERO on implementations with a 32-cycle
+ // 'muls' instruction, and skip over the operation in that case.
+
+ LSYM(__safe_muldi3):
+ // (0xDDDDCCCC * 0xXXXXWWWW), free $r1
+ muls r1, r2
+
+ // (0xZZZZYYYY * 0xBBBBAAAA), free $r3
+ muls r3, r0
+ add r3, r1
+
+ // Put the parameters in the correct form for umulsidi3().
+ movs r1, r2
+ b LSYM(__internal_umulsidi3) // 7
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_lmul
+CM0_FUNC_END muldi3
+
+// unsigned long long __umulsidi3(unsigned int, unsigned int)
+// Returns all 64 bits of a 32x32 bit multiplication.
+// Expects the two multiplicands in $r0 and $r1.
+// Returns the product in $r1:$r0.
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.libgcc.umulsidi3,"x"
+CM0_FUNC_START umulsidi3
+ CFI_START_FUNCTION
+
+ // 32x32 multiply with 64 bit result.
+ // Expand the multiply into 4 parts, since muls only returns 32 bits.
+ // (a16h * b16h / 2^32)
+ // + (a16h * b16l / 2^48) + (a16l * b16h / 2^48)
+ // + (a16l * b16l / 2^64)
+
+ // MAYBE: Test for multiply by 0 on implementations with a 32-cycle
+ // 'muls' instruction, and skip over the operation in that case.
+
+ LSYM(__safe_umulsidi3):
+ eors r3, r3
+
+ LSYM(__internal_umulsidi3):
+ mov ip, r3
+
+ // a16h * b16h
+ lsrs r2, r0, #16
+ lsrs r3, r1, #16
+ muls r2, r3
+ add ip, r2
+
+ // a16l * b16h; save a16h first!
+ lsrs r2, r0, #16
+ uxth r0, r0
+ muls r3, r0
+
+ // a16l * b16l
+ uxth r1, r1
+ muls r0, r1
+
+ // a16h * b16l
+ muls r1, r2
+
+ // Distribute intermediate results.
+ eors r2, r2
+ adds r1, r3
+ adcs r2, r2
+ lsls r3, r1, #16
+ lsrs r1, #16
+ lsls r2, #16
+ adds r0, r3
+ adcs r1, r2
+
+ // Add in the remaining high bits.
+ add r1, ip
+ RETx lr // 24
+
+ CFI_END_FUNCTION
+CM0_FUNC_END umulsidi3
+
+
+// long long __mulsidi3(int, int)
+// Returns all 64 bits of a 32x32 bit signed multiplication.
+// Expects the two multiplicands in $r0 and $r1.
+// Returns the product in $r1:$r0.
+// Uses $r3 and $rT as scratch space, plus the registers used by __umulsidi3().
+.section .text.libgcc.mulsidi3,"x"
+CM0_FUNC_START mulsidi3
+ CFI_START_FUNCTION
+
+ // Push registers for the function call.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Save signs of the arguments.
+ asrs r3, r0, #31
+ asrs rT, r1, #31
+
+ // Absolute value of the arguments.
+ eors r0, r3
+ eors r1, rT
+ subs r0, r3
+ subs r1, rT
+
+ // Save sign of the result.
+ eors rT, r3
+
+ bl SYM(__umulsidi3) __PLT__ // 14+24
+
+ // Apply sign of the result.
+ eors r0, rT
+ eors r1, rT
+ subs r0, rT
+ sbcs r1, rT
+
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END mulsidi3
+
+
+// long long __aeabi_llsl(long long, int)
+// Logical shift left the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
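+// (For example, a shift of 40 yields $r1 = $r0 << 8 and $r0 = 0,
+// while a shift of 8 yields $r1 = ($r1 << 8) | ($r0 >> 24).)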
+.section .text.libgcc.llsl,"x"
+CM0_FUNC_START aeabi_llsl
+FUNC_ALIAS ashldi3 aeabi_llsl
+ CFI_START_FUNCTION
+
+ // Save a copy for the remainder.
+ movs r3, r0
+
+ // Assume a simple shift.
+ lsls r0, r2
+ lsls r1, r2
+
+ // Test if the shift distance is larger than 1 word.
+ subs r2, #32
+ bhs LSYM(__llsl_large)
+
+ // The remainder is opposite the main shift, (32 - x) bits.
+ rsbs r2, #0
+ lsrs r3, r2
+
+ // Cancel any remaining shift.
+ eors r2, r2
+
+ LSYM(__llsl_large):
+ // Apply any remaining shift
+ lsls r3, r2
+
+ // Merge remainder and result.
+ adds r1, r3
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END ashldi3
+CM0_FUNC_END aeabi_llsl
+
+
+// long long __aeabi_llsr(long long, int)
+// Logical shift right the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.libgcc.llsr,"x"
+CM0_FUNC_START aeabi_llsr
+FUNC_ALIAS lshrdi3 aeabi_llsr
+ CFI_START_FUNCTION
+
+ // Save a copy for the remainder.
+ movs r3, r1
+
+ // Assume a simple shift.
+ lsrs r0, r2
+ lsrs r1, r2
+
+    // Test if the shift count is one full word (32 bits) or more.
+ subs r2, #32
+ bhs LSYM(__llsr_large)
+
+    // The remainder is shifted the opposite direction, by (32 - count) bits.
+ rsbs r2, #0
+ lsls r3, r2
+
+ // Cancel any remaining shift.
+ eors r2, r2
+
+ LSYM(__llsr_large):
+    // Apply any remaining shift.
+ lsrs r3, r2
+
+ // Merge remainder and result.
+ adds r0, r3
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END lshrdi3
+CM0_FUNC_END aeabi_llsr
+
+
+// long long __aeabi_lasr(long long, int)
+// Arithmetic shift right the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.libgcc.lasr,"x"
+CM0_FUNC_START aeabi_lasr
+FUNC_ALIAS ashrdi3 aeabi_lasr
+ CFI_START_FUNCTION
+
+ // Save a copy for the remainder.
+ movs r3, r1
+
+ // Assume a simple shift.
+ lsrs r0, r2
+ asrs r1, r2
+
+    // Test if the shift count is one full word (32 bits) or more.
+ subs r2, #32
+ bhs LSYM(__lasr_large)
+
+    // The remainder is shifted the opposite direction, by (32 - count) bits.
+ rsbs r2, #0
+ lsls r3, r2
+
+ // Cancel any remaining shift.
+ eors r2, r2
+
+ LSYM(__lasr_large):
+    // Apply any remaining shift.
+ asrs r3, r2
+
+ // Merge remainder and result.
+ adds r0, r3
+ RETx lr
+
+ CFI_END_FUNCTION
+CM0_FUNC_END ashrdi3
+CM0_FUNC_END aeabi_lasr
+
+
+#endif // __BUILD_CM0_FPLIB
diff -ruN libgcc/config/arm/lib1funcs.S libgcc/config/arm/lib1funcs.S
--- libgcc/config/arm/lib1funcs.S 2020-11-08 14:32:11.000000000 -0800
+++ libgcc/config/arm/lib1funcs.S 2020-11-12 10:13:44.383982884 -0800
@@ -1050,6 +1050,10 @@
/* ------------------------------------------------------------------------ */
/* Start of the Real Functions */
/* ------------------------------------------------------------------------ */
+
+/* Disable these functions for v6m in favor of the versions below.  */
+#ifndef NOT_ISA_TARGET_32BIT
+
#ifdef L_udivsi3
#if defined(__prefer_thumb__)
@@ -1507,6 +1511,8 @@
cfi_end LSYM(Lend_div0)
FUNC_END div0
#endif
+
+#endif /* NOT_ISA_TARGET_32BIT */
#endif /* L_dvmd_lnx */
#ifdef L_clear_cache
@@ -1583,6 +1589,9 @@
so for Reg value in (32...63) and (-1...-31) we will get zero (in the
case of logical shifts) or the sign (for asr). */
+/* Disable these functions for v6m in favor of the versions below.  */
+#ifndef NOT_ISA_TARGET_32BIT
+
#ifdef __ARMEB__
#define al r1
#define ah r0
@@ -1820,6 +1829,8 @@
#endif
#endif /* L_clzdi2 */
+#endif /* NOT_ISA_TARGET_32BIT */
+
#ifdef L_ctzsi2
#ifdef NOT_ISA_TARGET_32BIT
FUNC_START ctzsi2
@@ -2189,5 +2200,54 @@
#include "bpabi.S"
#else /* NOT_ISA_TARGET_32BIT */
#include "bpabi-v6m.S"
+
+
+#include "cm0/fplib.h"
+
+/* Temp registers. */
+#define rP r4
+#define rQ r5
+#define rS r6
+#define rT r7
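+
+/* (Per the AAPCS, r4-r7 are callee-saved; the cm0 sources push and pop
+   these before using them as scratch, e.g. rT in mulsidi3.)  */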
+
+.macro CM0_FUNC_START name
+.global SYM(__\name)
+.type SYM(__\name),function
+.thumb_func
+.align 1
+ SYM(__\name):
+.endm
+
+.macro CM0_FUNC_END name
+.size SYM(__\name), . - SYM(__\name)
+.endm
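+
+/* Usage sketch (illustrative): each cm0 routine is bracketed as
+     CM0_FUNC_START name ... CM0_FUNC_END name
+   so the assembler emits a global thumb function __name with correct
+   size information for the linker.  */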
+
+.macro RETx x
+ bx \x
+.endm
+
+/* Order files to keep branch targets within the +/- 2KB offset range of
+   the Thumb-1 'b' instruction.  */
+#define __BUILD_CM0_FPLIB
+
+#include "cm0/clz2.S"
+#include "cm0/lmul.S"
+#include "cm0/lcmp.S"
+#include "cm0/div.S"
+#include "cm0/ldiv.S"
+
+#include "cm0/fcmp.S"
+#include "cm0/fconv.S"
+#include "cm0/fneg.S"
+
+#include "cm0/fadd.S"
+#include "cm0/futil.S"
+#include "cm0/fmul.S"
+#include "cm0/fdiv.S"
+
+#include "cm0/ffloat.S"
+#include "cm0/ffixed.S"
+
+#undef __BUILD_CM0_FPLIB
+
#endif /* NOT_ISA_TARGET_32BIT */
#endif /* !__symbian__ */