public inbox for gcc-patches@gcc.gnu.org
From: "Daniel Engel" <libgcc@danielengel.com>
To: "Christophe Lyon" <christophe.lyon@linaro.org>
Cc: "gcc Patches" <gcc-patches@gcc.gnu.org>
Subject: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
Date: Wed, 06 Jan 2021 03:20:18 -0800	[thread overview]
Message-ID: <962a0e7d-f431-42ee-aa42-e4e4cc823a10@www.fastmail.com> (raw)
In-Reply-To: <CAKdteObdRur2SfYQ3TPQ7soRCsS9d++krfq9138jWvbu+JFLxA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 17488 bytes --]

Hi Christophe, 

On Wed, Dec 16, 2020, at 9:15 AM, Christophe Lyon wrote:
> On Wed, 2 Dec 2020 at 04:31, Daniel Engel <libgcc@danielengel.com> wrote:
> >
> > Hi Christophe,
> >
> > On Thu, Nov 26, 2020, at 1:14 AM, Christophe Lyon wrote:
> > > Hi,
> > >
> > > On Fri, 13 Nov 2020 at 00:03, Daniel Engel <libgcc@danielengel.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > This patch adds an efficient assembly-language implementation of IEEE-
> > > > 754 compliant floating point routines for Cortex M0 EABI (v6m, thumb-
> > > > 1).  This is the libgcc portion of a larger library originally
> > > > described in 2018:
> > > >
> > > >     https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
> > > >
> > > > Since that time, I've separated the libm functions for submission to
> > > > newlib.  The remaining libgcc functions in the attached patch have
> > > > the following characteristics:
> > > >
> > > >     Function(s)                     Size (bytes)        Cycles          Stack   Accuracy
> > > >     __clzsi2                        42                  23              0       exact
> > > >     __clzsi2 (OPTIMIZE_SIZE)        22                  55              0       exact
> > > >     __clzdi2                        8+__clzsi2          4+__clzsi2      0       exact
> > > >
> > > >     __umulsidi3                     44                  24              0       exact
> > > >     __mulsidi3                      30+__umulsidi3      24+__umulsidi3  8       exact
> > > >     __muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3   0       exact
> > > >     __ashldi3 (__aeabi_llsl)        22                  13              0       exact
> > > >     __lshrdi3 (__aeabi_llsr)        22                  13              0       exact
> > > >     __ashrdi3 (__aeabi_lasr)        22                  13              0       exact
> > > >
> > > >     __aeabi_lcmp                    20                  13              0       exact
> > > >     __aeabi_ulcmp                   16                  10              0       exact
> > > >
> > > >     __udivsi3 (__aeabi_uidiv)       56                  72 – 385        0       < 1 lsb
> > > >     __divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3    8       < 1 lsb
> > > >     __udivdi3 (__aeabi_uldiv)       164                 103 – 1394      16      < 1 lsb
> > > >     __udivdi3 (OPTIMIZE_SIZE)       142                 120 – 1392      16      < 1 lsb
> > > >     __divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3    32      < 1 lsb
> > > >
> > > >     __shared_float                  178
> > > >     __shared_float (OPTIMIZE_SIZE)  154
> > > >
> > > >     __addsf3 (__aeabi_fadd)         116+__shared_float  31 – 76         8       <= 0.5 ulp
> > > >     __addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74              8       <= 0.5 ulp
> > > >     __subsf3 (__aeabi_fsub)         8+__addsf3          6+__addsf3      8       <= 0.5 ulp
> > > >     __aeabi_frsub                   8+__addsf3          6+__addsf3      8       <= 0.5 ulp
> > > >     __mulsf3 (__aeabi_fmul)         112+__shared_float  73 – 97         8       <= 0.5 ulp
> > > >     __mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93              8       <= 0.5 ulp
> > > >     __divsf3 (__aeabi_fdiv)         132+__shared_float  83 – 361        8       <= 0.5 ulp
> > > >     __divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263 – 359       8       <= 0.5 ulp
> > > >
> > > >     __cmpsf2/__lesf2/__ltsf2        72                  33              0       exact
> > > >     __eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2      0       exact
> > > >     __gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2      0       exact
> > > >     __unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2      0       exact
> > > >     __aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2      0       exact
> > > >     __aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2      0       exact
> > > >     __aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2      0       exact
> > > >     __aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2      0       exact
> > > >     __aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2      0       exact
> > > >
> > > >     __floatundisf (__aeabi_ul2f)    14+__shared_float   40 – 81         8       <= 0.5 ulp
> > > >     __floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40 – 237        8       <= 0.5 ulp
> > > >     __floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf 8       <= 0.5 ulp
> > > >     __floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf 8       <= 0.5 ulp
> > > >     __floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf   8       <= 0.5 ulp
> > > >
> > > >     __fixsfdi (__aeabi_f2lz)        74                  27 – 33         0       exact
> > > >     __fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi     0       exact
> > > >     __fixsfsi (__aeabi_f2iz)        52                  19              0       exact
> > > >     __fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi     0       exact
> > > >     __fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi     0       exact
> > > >
> > > >     __extendsfdf2 (__aeabi_f2d)     42+__shared_float   38              8       exact
> > > >     __aeabi_d2f                     56+__shared_float   54 – 58         8       <= 0.5 ulp
> > > >     __aeabi_h2f                     34+__shared_float   34              8       exact
> > > >     __aeabi_f2h                     84                  23 – 34         0       <= 0.5 ulp
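
[Editorial note, not part of the original message: for readers less familiar
with the EABI lowering, the illustrative C fragment below shows how ordinary
soft-float arithmetic on a Cortex-M0 target reaches routines from the table
above.  The file name and function names are hypothetical; the flags match
the configuration discussed later in this thread.]

    /* demo.c -- illustrative only.  Build with something like:
     *   arm-none-eabi-gcc -mcpu=cortex-m0 -mfloat-abi=soft -O2 -c demo.c
     * and inspect the generated calls with arm-none-eabi-objdump -d demo.o
     */
    float scale(float a, float b)
    {
        return a * b + 1.0f;    /* emits calls to __aeabi_fmul and __aeabi_fadd */
    }

    int truncate_to_int(float a)
    {
        return (int)a;          /* emits a call to __aeabi_f2iz */
    }
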
> > > >
> > > > Copyright assignment is on file with the FSF.
> > > >
> > > > I've built the gcc-arm-none-eabi cross-compiler using the 20201108
> > > > snapshot of GCC plus this patch, and successfully compiled a test
> > > > program:
> > > >
> > > >     extern int main (void)
> > > >     {
> > > >         volatile int x = 1;
> > > >         volatile unsigned long long int y = 10;
> > > >         volatile long long int z = x / y; // 64-bit division
> > > >
> > > >         volatile float a = x; // 32-bit casting
> > > >         volatile float b = y; // 64 bit casting
> > > >         volatile float c = z / b; // float division
> > > >         volatile float d = a + c; // float addition
> > > >         volatile float e = c * b; // float multiplication
> > > >         volatile float f = d - e - c; // float subtraction
> > > >
> > > >         if (f != c) // float comparison
> > > >             y -= (long long int)d; // float casting
> > > >     }
> > > >
> > > > As one point of comparison, the test program links to 876 bytes of
> > > > libgcc code from the patched toolchain, vs 10276 bytes from the
> > > > latest released gcc-arm-none-eabi-9-2020-q2 toolchain.  That's a
> > > > 90% size reduction.
> > >
> > > This looks awesome!
> > >
> > > >
> > > > I have extensive test vectors, and have passed these tests on an
> > > > STM32F051.  These vectors were derived from UCB [1], Testfloat [2],
> > > > and IEEECC754 [3] sources, plus some of my own creation.
> > > > Unfortunately, I'm not sure how "make check" should work for a cross
> > > > compiler run time library.
> > > >
> > > > Although I believe this patch can be incorporated as-is, there are
> > > > at least two points that might bear discussion:
> > > >
> > > > * I'm not sure where or how they would be integrated, but I would be
> > > >   happy to provide sources for my test vectors.
> > > >
> > > > * The library is currently built for the ARM v6m architecture only.
> > > >   It is likely that some of the other Cortex variants would benefit
> > > >   from these routines.  However, I would need some guidance on this
> > > >   to proceed without introducing regressions.  I do not currently
> > > >   have a test strategy for architectures beyond Cortex M0, and I
> > > >   have NOT profiled the existing thumb-2 implementations (ieee754-
> > > >   sf.S) for comparison.
> > >
> > > I tried your patch, and I see many regressions in the GCC testsuite
> > > because many tests fail to link with errors like:
> > > ld: /gcc/thumb/v6-m/nofp/libgcc.a(_arm_cmpdf2.o): in function
> > > `__clzdi2':
> > > /libgcc/config/arm/cm0/clz2.S:39: multiple definition of
> > > `__clzdi2';/gcc/thumb/v6-m/nofp/libgcc.a(_thumb1_case_sqi.o):/libgcc/config/arm/cm0/clz2.S:39:
> > > first defined here
> > >
> > > This happens with a toolchain configured with --target arm-none-eabi,
> > > default cpu/fpu/mode,
> > > --enable-multilib --with-multilib-list=rmprofile and running the tests with
> > > -mthumb/-mcpu=cortex-m0/-mfloat-abi=soft/-march=armv6s-m
> > >
> > > Does it work for you?
> >
> > Thanks for the feedback.
> >
> > I'm afraid I'm quite ignorant as to the gcc test suite
> > infrastructure, so I don't know how to use the options you've shared
> > above.  I'm cross-compiling the Windows toolchain on Ubuntu.  Would
> > you mind sharing a full command line you would use for testing?  The
> > toolchain is built with the default options, which include
> > "--target arm-none-eabi".
> >
>
> Why put Windows in the picture? This seems unnecessarily
> complicated... I suggest you build your cross-toolchain on x86_64
> ubuntu and run it on x86_64 ubuntu (of course targeting arm)

Mostly because I had not previously committed the time to understand the
GCC regression test environment.  My company and personal computers both
run Windows.  I created an Ubuntu virtual machine for this project, and
I'd been trying to get by with the build scripts provided by the ARM
toolchain.  Clearly that was insufficient.

> The above options were GCC configure options, except for the last one
> which I used when running the tests.
>
> There is some documentation about how to run the GCC testsuite there:
> https://gcc.gnu.org/install/test.html

Thanks.  I was able to take this document, plus some additional pages
about constructing a combined tree with newlib, and put together a
working regression test.  GDB didn't want to build cleanly at first, so
eventually I gave up and disabled that part.

> Basically 'make check' should mostly work except for execution tests
> for which you'll need to teach DejaGnu how to run the generated
> programs on a real board or on a simulator.
>
> I didn't analyze your patch, I just submitted it to my validation
> system:
> https://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/r11-5993-g159b0bd9ce263dfb791eff5133b0ca0207201c84-cortex-m0-fplib-20201130.patch2/report-build-info.html
> - the red "regressed" items indicate regressions in the testsuite. You
>   can click on "log" to download the corresponding gcc.log
> - the dark-red "build broken" items indicate that the toolchain build
>   failed
> - the orange "interrupted" items indicate an infrastructure problem,
>   so you can ignore such cases
> - similarly, the dark-red "ref build failed" items indicate that the
>   reference build failed for some infrastructure reason
>
> for the arm-none-eabi target, several toolchain versions fail to
> build, some succeed. This is because I use different multilib
> configuration flags; it looks like the ones involving
> --with-multilib-list=rmprofile are broken with your patch.
>
> These ones should be reasonably easy to fix: no 'make check' involved.
> 
> For instance if you configure GCC with:
> --target arm-none-eabi --enable-multilib --with-multilib-list=rmprofile
> you should see the build failure.

So far, I have not found a cause for the build failures you are seeing.
The ARM toolchain script I was using before did build with the
'rmprofile' option.  With my current configure options, gcc builds
'rmprofile', 'aprofile', and even 'armeb'.  I did find a number of link
issues with 'make check' due to incorrect usage of the 'L_' defines in
LIB1ASMFUNCS.  These are fixed in the new version attached.

Returning to the build failures you logged, I do consistently see this
message in the logs [1]: "fatal error: cm0/fplib.h: No such file or
directory".  I recognize the file, since it's one of the new files in
my patch (the full sub-directory is libgcc/config/arm/cm0/fplib.h).
Do I have to format patches in some different way so that new files
get created?

Regression testing also showed that the previous patch was failing the
"arm/divzero" test because I wasn't providing the same arguments to
div0() as the existing implementation.  Having made that change, I think
the patch is clean.  (I don't think there is a strict specification for
div0(), and the changes add a non-trivial number of instructions, but
I'll hold that discussion for another time).
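
[Editorial note: for reference, a minimal sketch of the user-overridable
handlers involved.  The signatures below follow the ARM run-time ABI
document; the pass-through bodies are placeholders only, not the behaviour
of either implementation discussed above.]

    /* AEABI division-by-zero handlers that an application may override.
       Placeholder bodies: a real handler might trap, log, or spin.  */
    int __aeabi_idiv0(int return_value)
    {
        return return_value;
    }

    long long __aeabi_ldiv0(long long return_value)
    {
        return return_value;
    }
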

Do you have time to re-check this patch on your build system?

Thanks,
Daniel

[1] Line 36054: <https://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/r11-5993-g159b0bd9ce263dfb791eff5133b0ca0207201c84-cortex-m0-fplib-20201130.patch2/arm-none-eabi/build-rh70-arm-none-eabi-default-default-default-mthumb.-mcpu=cortex-m0.-mfloat-abi=soft.-march=armv6s-m.log.xz>

> 
> HTH
> 
> Christophe
> 
> > I did see similar errors once before.  It turned out then that I omitted
> > one of the ".S" files from the build.  My interpretation at that point
> > was that gcc had been searching multiple versions of "libgcc.a" and was
> > unable to merge the symbols.  In hindsight, that was a really bad
> > interpretation.  I was able to reproduce the error above by simply
> > adding a line like "volatile double m = 1.0; m += 2;".
> >
> > After reviewing the existing asm implementations more closely, I
> > believe that I have not been using the function guard macros (L_arm_*)
> > as intended.  The make script appears to compile "lib1funcs.S" dozens of
> > times -- once for each function guard macro listed in LIB1ASMFUNCS --
> > with the intent of generating a separate ".o" file for each function.
> > Because they were unguarded, my new library functions were duplicated
> > into every ".o" file, which caused the link errors you saw.
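
[Editorial note: a minimal sketch of that build mechanism, using hypothetical
function names rather than the real lib1funcs.S contents.  Each name listed
in LIB1ASMFUNCS causes one extra compilation of the same source file with a
single L_<name> macro defined, so each object file keeps exactly one guarded
function.]

    /* lib1funcs-sketch.c -- hypothetical illustration of the L_ guards.  */
    #ifdef L_example_add
    int __example_add(int a, int b) { return a + b; }   /* placeholder body */
    #endif

    #ifdef L_example_sub
    int __example_sub(int a, int b) { return a - b; }   /* placeholder body */
    #endif

    /* e.g.  cc -DL_example_add -c lib1funcs-sketch.c -o _example_add.o
             cc -DL_example_sub -c lib1funcs-sketch.c -o _example_sub.o  */
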
> >
> > I have attached an updated patch that implements the macros.
> >
> > However, I'm not sure whether my usage is really consistent with the
> > spirit of the make script.  If there's a README or HOWTO, I haven't
> > found it yet.  The following points summarize my concerns as I was
> > making these updates:
> >
> > 1.  While some of the new functions (e.g. __cmpsf2) are standalone,
> >     there is a common core in the new library shared by several related
> >     functions.  That keeps the library small.  For now, I've elected to
> >     group all of these related functions together in a single object
> >     file "_arm_addsubsf3.o" to protect the short branches (+/-2KB)
> >     within this unit.  Notice that I manually assigned section names in
> >     the code, so there still shouldn't be any unnecessary code linked in
> >     the final build.  Does the multiple-".o" files strategy predate
> >     "--gc-sections", or should I be trying harder to break these related
> >     functions into separate compilation units?
> >
> > 2.  I introduced a few new macro keywords for functions/groups (e.g.
> >     "_arm_f2h" and '_arm_f2h'.  My assumption is that some empty ".o"
> >     files compiled for the non-v6m architectures will be benign.
> >
> > 3.  The "t-elf" make script implies that __mulsf3() should not be
> >     compiled in thumb mode (it's inside a conditional), but this is one
> >     of the new functions.  Moot for now, since my __mulsf3() is grouped
> >     with the common core functions (see point 1) and is thus currently
> >     guarded by the "_arm_addsubsf3.o" macro.
> >
> > 4.  The advice (in "ieee754-sf.S") regarding WEAK symbols does not seem
> >     to be working.  I have defined __clzsi2() as a weak symbol to be
> >     overridden by the combined function __clzdi2().  I can also see
> >     (with "nm") that "clzsi2.o" is compiled before "clzdi2.o" in
> >     "libgcc.a".  Yet, the full __clzdi2() function (8 bytes larger) is
> >     always linked, even in programs that only call __clzsi2().  A minor
> >     annoyance at this point (see the sketch after this list).
> >
> > 5.  Is there a permutation of the makefile that compiles libgcc with
> >     __OPTIMIZE_SIZE__?  There are a few sections in the patch that can
> >     optimize either way, yet the final product only seems to have the
> >     "fast" code.  At this optimization level, the sample program above
> >     pulls in 1012 bytes of library code instead of 836. Perhaps this is
> >     meant to be controlled by the toolchain configuration step, but it
> >     doesn't follow that the optimization for the cross-compiler would
> >     automatically translate to the target runtime libraries.
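
[Editorial note: a hedged C analogue of the weak-symbol arrangement described
in point 4; the names are hypothetical stand-ins for the assembly
__clzsi2/__clzdi2 pair, not the library's actual sources.]

    /* clz_single.c -- standalone version, marked weak so the combined
       object can supersede it when both end up in the same archive.  */
    __attribute__((weak)) int demo_clz32(unsigned int x)
    {
        int n = 32;
        while (x) { x >>= 1; n--; }
        return n;
    }

    /* clz_combined.c -- combined version provides strong definitions of
       both symbols; the linker should prefer this demo_clz32 over the
       weak one above.  */
    int demo_clz32(unsigned int x)
    {
        int n = 32;
        while (x) { x >>= 1; n--; }
        return n;
    }

    int demo_clz64(unsigned long long x)
    {
        unsigned int hi = (unsigned int)(x >> 32);
        return hi ? demo_clz32(hi) : 32 + demo_clz32((unsigned int)x);
    }
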
> >
> > Thanks again,
> > Daniel
> >
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > >
> > > > I'm naturally hoping for some action on this patch before the Nov 16th deadline for GCC-11 stage 3.  Please review and advise.
> > > >
> > > > Thanks,
> > > > Daniel Engel
> > > >
> > > > [1] http://www.netlib.org/fp/ucbtest.tgz
> > > > [2] http://www.jhauser.us/arithmetic/TestFloat.html
> > > > [3] http://win-www.uia.ac.be/u/cant/ieeecc754.html
> > >
>

[-- Attachment #2: cortex-m0-fplib-20210105.patch --]
[-- Type: application/octet-stream, Size: 195994 bytes --]

diff -ruN gcc-11-20201220-clean/libgcc/config/arm/bpabi.S gcc-11-20201220/libgcc/config/arm/bpabi.S
--- gcc-11-20201220-clean/libgcc/config/arm/bpabi.S	2020-12-20 14:32:15.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/bpabi.S	2021-01-06 02:45:47.416262493 -0800
@@ -34,48 +34,6 @@
 	.eabi_attribute 25, 1
 #endif /* __ARM_EABI__ */
 
-#ifdef L_aeabi_lcmp
-
-ARM_FUNC_START aeabi_lcmp
-	cmp	xxh, yyh
-	do_it	lt
-	movlt	r0, #-1
-	do_it	gt
-	movgt	r0, #1
-	do_it	ne
-	RETc(ne)
-	subs	r0, xxl, yyl
-	do_it	lo
-	movlo	r0, #-1
-	do_it	hi
-	movhi	r0, #1
-	RET
-	FUNC_END aeabi_lcmp
-
-#endif /* L_aeabi_lcmp */
-	
-#ifdef L_aeabi_ulcmp
-
-ARM_FUNC_START aeabi_ulcmp
-	cmp	xxh, yyh
-	do_it	lo
-	movlo	r0, #-1
-	do_it	hi
-	movhi	r0, #1
-	do_it	ne
-	RETc(ne)
-	cmp	xxl, yyl
-	do_it	lo
-	movlo	r0, #-1
-	do_it	hi
-	movhi	r0, #1
-	do_it	eq
-	moveq	r0, #0
-	RET
-	FUNC_END aeabi_ulcmp
-
-#endif /* L_aeabi_ulcmp */
-
 .macro test_div_by_zero signed
 /* Tail-call to divide-by-zero handlers which may be overridden by the user,
    so unwinding works properly.  */
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/bpabi-v6m.S gcc-11-20201220/libgcc/config/arm/bpabi-v6m.S
--- gcc-11-20201220-clean/libgcc/config/arm/bpabi-v6m.S	2020-12-20 14:32:15.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/bpabi-v6m.S	2021-01-06 02:45:47.428262284 -0800
@@ -33,212 +33,6 @@
 	.eabi_attribute 25, 1
 #endif /* __ARM_EABI__ */
 
-#ifdef L_aeabi_lcmp
-
-FUNC_START aeabi_lcmp
-	cmp	xxh, yyh
-	beq	1f
-	bgt	2f
-	movs	r0, #1
-	negs	r0, r0
-	RET
-2:
-	movs	r0, #1
-	RET
-1:
-	subs	r0, xxl, yyl
-	beq	1f
-	bhi	2f
-	movs	r0, #1
-	negs	r0, r0
-	RET
-2:
-	movs	r0, #1
-1:
-	RET
-	FUNC_END aeabi_lcmp
-
-#endif /* L_aeabi_lcmp */
-	
-#ifdef L_aeabi_ulcmp
-
-FUNC_START aeabi_ulcmp
-	cmp	xxh, yyh
-	bne	1f
-	subs	r0, xxl, yyl
-	beq	2f
-1:
-	bcs	1f
-	movs	r0, #1
-	negs	r0, r0
-	RET
-1:
-	movs	r0, #1
-2:
-	RET
-	FUNC_END aeabi_ulcmp
-
-#endif /* L_aeabi_ulcmp */
-
-.macro test_div_by_zero signed
-	cmp	yyh, #0
-	bne	7f
-	cmp	yyl, #0
-	bne	7f
-	cmp	xxh, #0
-	.ifc	\signed, unsigned
-	bne	2f
-	cmp	xxl, #0
-2:
-	beq	3f
-	movs	xxh, #0
-	mvns	xxh, xxh		@ 0xffffffff
-	movs	xxl, xxh
-3:
-	.else
-	blt	6f
-	bgt	4f
-	cmp	xxl, #0
-	beq	5f
-4:	movs	xxl, #0
-	mvns	xxl, xxl		@ 0xffffffff
-	lsrs	xxh, xxl, #1		@ 0x7fffffff
-	b	5f
-6:	movs	xxh, #0x80
-	lsls	xxh, xxh, #24		@ 0x80000000
-	movs	xxl, #0
-5:
-	.endif
-	@ tailcalls are tricky on v6-m.
-	push	{r0, r1, r2}
-	ldr	r0, 1f
-	adr	r1, 1f
-	adds	r0, r1
-	str	r0, [sp, #8]
-	@ We know we are not on armv4t, so pop pc is safe.
-	pop	{r0, r1, pc}
-	.align	2
-1:
-	.word	__aeabi_ldiv0 - 1b
-7:
-.endm
-
-#ifdef L_aeabi_ldivmod
-
-FUNC_START aeabi_ldivmod
-	test_div_by_zero signed
-
-	push	{r0, r1}
-	mov	r0, sp
-	push	{r0, lr}
-	ldr	r0, [sp, #8]
-	bl	SYM(__gnu_ldivmod_helper)
-	ldr	r3, [sp, #4]
-	mov	lr, r3
-	add	sp, sp, #8
-	pop	{r2, r3}
-	RET
-	FUNC_END aeabi_ldivmod
-
-#endif /* L_aeabi_ldivmod */
-
-#ifdef L_aeabi_uldivmod
-
-FUNC_START aeabi_uldivmod
-	test_div_by_zero unsigned
-
-	push	{r0, r1}
-	mov	r0, sp
-	push	{r0, lr}
-	ldr	r0, [sp, #8]
-	bl	SYM(__udivmoddi4)
-	ldr	r3, [sp, #4]
-	mov	lr, r3
-	add	sp, sp, #8
-	pop	{r2, r3}
-	RET
-	FUNC_END aeabi_uldivmod
-	
-#endif /* L_aeabi_uldivmod */
-
-#ifdef L_arm_addsubsf3
-
-FUNC_START aeabi_frsub
-
-      push	{r4, lr}
-      movs	r4, #1
-      lsls	r4, #31
-      eors	r0, r0, r4
-      bl	__aeabi_fadd
-      pop	{r4, pc}
-
-      FUNC_END aeabi_frsub
-
-#endif /* L_arm_addsubsf3 */
-
-#ifdef L_arm_cmpsf2
-
-FUNC_START aeabi_cfrcmple
-
-	mov	ip, r0
-	movs	r0, r1
-	mov	r1, ip
-	b	6f
-
-FUNC_START aeabi_cfcmpeq
-FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq
-
-	@ The status-returning routines are required to preserve all
-	@ registers except ip, lr, and cpsr.
-6:	push	{r0, r1, r2, r3, r4, lr}
-	bl	__lesf2
-	@ Set the Z flag correctly, and the C flag unconditionally.
-	cmp	r0, #0
-	@ Clear the C flag if the return value was -1, indicating
-	@ that the first operand was smaller than the second.
-	bmi	1f
-	movs	r1, #0
-	cmn	r0, r1
-1:
-	pop	{r0, r1, r2, r3, r4, pc}
-
-	FUNC_END aeabi_cfcmple
-	FUNC_END aeabi_cfcmpeq
-	FUNC_END aeabi_cfrcmple
-
-FUNC_START	aeabi_fcmpeq
-
-	push	{r4, lr}
-	bl	__eqsf2
-	negs	r0, r0
-	adds	r0, r0, #1
-	pop	{r4, pc}
-
-	FUNC_END aeabi_fcmpeq
-
-.macro COMPARISON cond, helper, mode=sf2
-FUNC_START	aeabi_fcmp\cond
-
-	push	{r4, lr}
-	bl	__\helper\mode
-	cmp	r0, #0
-	b\cond	1f
-	movs	r0, #0
-	pop	{r4, pc}
-1:
-	movs	r0, #1
-	pop	{r4, pc}
-
-	FUNC_END aeabi_fcmp\cond
-.endm
-
-COMPARISON lt, le
-COMPARISON le, le
-COMPARISON gt, ge
-COMPARISON ge, ge
-
-#endif /* L_arm_cmpsf2 */
-
 #ifdef L_arm_addsubdf3
 
 FUNC_START aeabi_drsub
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/clz2.S gcc-11-20201220/libgcc/config/arm/cm0/clz2.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/clz2.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/clz2.S	2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,324 @@
+/* clz2.S: Cortex M0 optimized 'clz' functions 
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+
+#ifdef L_clzdi2
+
+// int __clzdi2(long long)
+// Counts leading zero bits in $r1:$r0.
+// Returns the result in $r0.
+.section .text.sorted.libgcc.clz2.clzdi2,"x"
+CM0_FUNC_START clzdi2
+    CFI_START_FUNCTION
+
+        // Moved here from lib1funcs.S
+        cmp     xxh,    #0
+        do_it   eq,     et
+        clzeq   r0,     xxl
+        clzne   r0,     xxh
+        addeq   r0,     #32
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END clzdi2
+
+#endif /* L_clzdi2 */
+
+
+#ifdef L_clzsi2
+
+// int __clzsi2(int)
+// Counts leading zero bits in $r0.
+// Returns the result in $r0.
+.section .text.sorted.libgcc.clz2.clzsi2,"x"
+CM0_FUNC_START clzsi2
+    CFI_START_FUNCTION
+
+        // Moved here from lib1funcs.S
+        clz     r0,     r0
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END clzsi2
+
+#endif /* L_clzsi2 */
+
+#else /* !__ARM_FEATURE_CLZ */
+
+#ifdef L_clzdi2
+
+// int __clzdi2(long long)
+// Counts leading zero bits in $r1:$r0.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+.section .text.sorted.libgcc.clz2.clzdi2,"x"
+CM0_FUNC_START clzdi2
+    CFI_START_FUNCTION
+
+  #if defined(__ARMEB__) && __ARMEB__
+        // Check if the upper word is zero.
+        cmp     r0,     #0
+
+        // The upper word is non-zero, so calculate __clzsi2(upper).
+        bne     SYM(__clzsi2)
+
+        // The upper word is zero, so calculate 32 + __clzsi2(lower).
+        movs    r2,     #64
+        movs    r0,     r1
+        b       SYM(__internal_clzsi2)
+        
+  #else /* !__ARMEB__ */
+        // Assume all the bits in the argument are zero.
+        movs    r2,     #64
+
+        // Check if the upper word is zero.
+        cmp     r1,     #0
+
+        // The upper word is zero, so calculate 32 + __clzsi2(lower).
+        beq     SYM(__internal_clzsi2)
+
+        // The upper word is non-zero, so set up __clzsi2(upper).
+        // Then fall through.
+        movs    r0,     r1
+        
+  #endif /* !__ARMEB__ */
+
+#endif /* L_clzdi2 */
+
+
+// The bitwise implementation of __clzdi2() tightly couples with __clzsi2(), 
+//  such that instructions must appear consecutively in the same memory 
+//  section for proper flow control.  However, this construction inhibits 
+//  the ability to discard __clzdi2() when only using __clzsi2().
+// Therefore, this block configures __clzsi2() for compilation twice.  
+// The first version is a minimal standalone implementation, and the second
+//  version is the continuation of __clzdi2().  The standalone version must
+//  be declared WEAK, so that the combined version can supersede it and  
+//  provide both symbols when required. 
+// '_clzsi2' should appear before '_clzdi2' in LIB1ASMFUNCS.
+#if defined(L_clzsi2) || defined(L_clzdi2)
+
+#ifdef L_clzsi2
+// int __clzsi2(int)
+// Counts leading zero bits in $r0.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+.section .text.sorted.libgcc.clz2.clzsi2,"x"
+CM0_WEAK_START clzsi2
+    CFI_START_FUNCTION
+
+#else /* L_clzdi2 */
+CM0_FUNC_START clzsi2
+
+#endif
+
+        // Assume all the bits in the argument are zero
+        movs    r2,     #32
+
+#ifdef L_clzsi2
+    CM0_WEAK_START internal_clzsi2
+#else /* L_clzdi2 */
+    CM0_FUNC_START internal_clzsi2
+#endif
+
+        // Size optimized: 22 bytes, 51 cycles 
+        // Speed optimized: 50 bytes, 20 cycles
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+
+        // Binary search starts at half the word width.
+        movs    r3,     #16
+
+    LSYM(__clz_loop):
+        // Test the upper 'n' bits of the operand for ZERO.
+        movs    r1,     r0
+        lsrs    r1,     r3
+        beq     LSYM(__clz_skip)
+
+        // When the test fails, discard the lower bits of the register,
+        //  and deduct the count of discarded bits from the result.
+        movs    r0,     r1
+        subs    r2,     r3
+
+    LSYM(__clz_skip):
+        // Decrease the shift distance for the next test.
+        lsrs    r3,     #1
+        bne     LSYM(__clz_loop)
+
+  #else /* __OPTIMIZE_SIZE__ */
+
+        // Unrolled binary search.
+        lsrs    r1,     r0,     #16
+        beq     LSYM(__clz8)
+        movs    r0,     r1
+        subs    r2,     #16
+
+    LSYM(__clz8):
+        lsrs    r1,     r0,     #8
+        beq     LSYM(__clz4)
+        movs    r0,     r1
+        subs    r2,     #8
+
+    LSYM(__clz4):
+        lsrs    r1,     r0,     #4
+        beq     LSYM(__clz2)
+        movs    r0,     r1
+        subs    r2,     #4
+
+    LSYM(__clz2):
+        // Load the remainder by index
+	adr     r1,     LSYM(__clz_remainder)
+        ldrb    r0,     [r1, r0]
+
+  #endif /* !__OPTIMIZE_SIZE__ */
+
+        // Account for the remainder.
+        subs    r0,     r2,     r0
+        RET
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        .align 2
+    LSYM(__clz_remainder):
+        .byte 0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4
+  #endif
+
+    CFI_END_FUNCTION
+CM0_FUNC_END clzsi2
+
+#ifdef L_clzdi2
+CM0_FUNC_END clzdi2
+#endif
+
+#endif /* L_clzsi2 || L_clzdi2 */
+
+#endif /* !__ARM_FEATURE_CLZ */
+
+
+#ifdef L_clrsbdi2
+
+// int __clrsbdi2(long long)
+// Counts the number of "redundant sign bits" in $r1:$r0.
+// Returns the result in $r0.
+// Uses $r2 and $r3 as scratch space.
+.section .text.sorted.libgcc.clz2.clrsbdi2,"x"
+CM0_FUNC_START clrsbdi2
+    CFI_START_FUNCTION
+
+  #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+        // Invert negative signs to keep counting zeros.
+        asrs    r3,     xxh,    #31
+        eors    xxl,    r3 
+        eors    xxh,    r3 
+
+        // Same as __clzdi2(), except that the 'C' flag is pre-calculated.
+        // Also note the trailing 'subs', since the last bit is not redundant.
+        do_it   eq,     et
+        clzeq   r0,     xxl
+        clzne   r0,     xxh
+        addeq   r0,     #32
+        subs    r0,     #1
+        RET
+
+  #else  /* !__ARM_FEATURE_CLZ */ 
+        // Result if all the bits in the argument are zero.
+        // Set it here to keep the flags clean after 'eors' below.  
+        movs    r2,     #31         
+
+        // Invert negative signs to keep counting zeros.
+        asrs    r3,     xxh,    #31
+        eors    xxh,    r3 
+
+    #if defined(__ARMEB__) && __ARMEB__
+        // If the upper word is non-zero, return '__clzsi2(upper) - 1'.
+        bne     SYM(__internal_clzsi2) 
+
+        // The upper word is zero, prepare the lower word.
+        movs    r0,     r1
+        eors    r0,     r3 
+
+    #else /* !__ARMEB__ */
+        // Save the lower word temporarily. 
+        // This somewhat awkward construction adds one cycle when the  
+        //  branch is not taken, but prevents a double-branch.   
+        eors    r3,     r0
+
+        // If the upper word is non-zero, return '__clzsi2(upper) - 1'.
+        movs    r0,     r1
+        bne    SYM(__internal_clzsi2)
+
+        // Restore the lower word. 
+        movs    r0,     r3 
+
+    #endif /* !__ARMEB__ */
+
+        // The upper word is zero, return '31 + __clzsi2(lower)'.
+        adds    r2,     #32
+        b       SYM(__internal_clzsi2)
+
+  #endif /* !__ARM_FEATURE_CLZ */ 
+
+    CFI_END_FUNCTION
+CM0_FUNC_END clrsbdi2
+
+#endif /* L_clrsbdi2 */
+
+
+#ifdef L_clrsbsi2
+
+// int __clrsbsi2(int)
+// Counts the number of "redundant sign bits" in $r0.  
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+.section .text.sorted.libgcc.clz2.clrsbsi2,"x"
+CM0_FUNC_START clrsbsi2 
+    CFI_START_FUNCTION
+
+        // Invert negative signs to keep counting zeros.
+        asrs    r2,     r0,    #31
+        eors    r0,     r2
+
+      #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+        // Count.  
+        clz     r0,     r0
+
+        // The result for a positive value will always be >= 1.  
+        // By definition, the last bit is not redundant. 
+        subs    r0,     #1
+        RET  
+
+      #else /* !__ARM_FEATURE_CLZ */
+        // Result if all the bits in the argument are zero.
+        // By definition, the last bit is not redundant. 
+        movs    r2,     #31
+        b       SYM(__internal_clzsi2)
+
+      #endif  /* !__ARM_FEATURE_CLZ */
+
+    CFI_END_FUNCTION
+CM0_FUNC_END clrsbsi2 
+
+#endif /* L_clrsbsi2 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/ctz2.S gcc-11-20201220/libgcc/config/arm/cm0/ctz2.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/ctz2.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/ctz2.S	2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,285 @@
+/* ctz2.S: Cortex M0 optimized 'ctz' functions
+
+   Copyright (C) 2020-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+// When the hardware 'clz' function is available, an efficient version 
+//  of __ctzsi2(x) can be created by calculating '31 - __clzsi2(lsb(x))', 
+//  where lsb(x) is 'x' with only the least-significant '1' bit set.  
+// The following offset applies to all of the functions in this file.   
+#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+  #define CTZ_RESULT_OFFSET 1
+#else 
+  #define CTZ_RESULT_OFFSET 0
+#endif 
+
+
+#ifdef L_ctzdi2
+
+// int __ctzdi2(long long)
+// Counts trailing zeros in a 64 bit double word.
+// Expects the argument  in $r1:$r0.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+.section .text.sorted.libgcc.ctz2.ctzdi2,"x"
+CM0_FUNC_START ctzdi2
+    CFI_START_FUNCTION
+
+      #if defined(__ARMEB__) && __ARMEB__
+        // Assume all the bits in the argument are zero.
+        movs    r2,    #(64 - CTZ_RESULT_OFFSET)
+        
+        // Check if the lower word is zero.
+        cmp     r1,     #0
+        
+        // The lower word is zero, so calculate 32 + __ctzsi2(upper).
+        beq     SYM(__internal_ctzsi2)
+
+        // The lower word is non-zero, so set up __ctzsi2(lower).
+        // Then fall through.
+        movs    r0,     r1
+        
+      #else /* !__ARMEB__ */
+        // Check if the lower word is zero.
+        cmp     r0,     #0
+        
+        // If the lower word is non-zero, result is just __ctzsi2(lower).
+        bne     SYM(__ctzsi2)
+
+        // The lower word is zero, so calculate 32 + __ctzsi2(upper).
+        movs    r2,    #(64 - CTZ_RESULT_OFFSET)
+        movs    r0,     r1
+        b       SYM(__internal_ctzsi2)
+        
+      #endif /* !__ARMEB__ */
+
+#endif /* L_ctzdi2 */
+
+
+// The bitwise implementation of __ctzdi2() tightly couples with __ctzsi2(),
+//  such that instructions must appear consecutively in the same memory
+//  section for proper flow control.  However, this construction inhibits
+//  the ability to discard __ctzdi2() when only using __ctzsi2().
+// Therefore, this block configures __ctzsi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+//  version is the continuation of __ctzdi2().  The standalone version must
+//  be declared WEAK, so that the combined version can supersede it and
+//  provide both symbols when required.
+// '_ctzsi2' should appear before '_ctzdi2' in LIB1ASMFUNCS.
+#if defined(L_ctzsi2) || defined(L_ctzdi2)
+
+#ifdef L_ctzsi2
+// int __ctzsi2(int)
+// Counts trailing zeros in a 32 bit word.
+// Expects the argument in $r0.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+.section .text.sorted.libgcc.ctz2.ctzsi2,"x"
+CM0_WEAK_START ctzsi2
+    CFI_START_FUNCTION
+
+#else /* L_ctzdi2 */
+CM0_FUNC_START ctzsi2
+
+#endif
+
+        // Assume all the bits in the argument are zero
+        movs    r2,     #(32 - CTZ_RESULT_OFFSET)
+
+#ifdef L_ctzsi2
+    CM0_WEAK_START internal_ctzsi2
+#else /* L_ctzdi2 */
+    CM0_FUNC_START internal_ctzsi2
+#endif
+
+  #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+
+        // Find the least-significant '1' bit of the argument. 
+        rsbs    r1,     r0,     #0
+        ands    r1,     r0
+        
+        // Maintain result compatibility with the software implementation.
+        // Technically, __ctzsi2(0) is undefined, but 32 seems better than -1.
+        //  (or possibly 31 if this is an intermediate result for __ctzdi2(0)).   
+        // The carry flag from 'rsbs' gives '-1' iff the argument was 'zero'.  
+        //  (NOTE: 'ands' with 0 shift bits does not change the carry flag.)
+        // After the jump, the final result will be '31 - (-1)'.   
+        sbcs    r0,     r0
+        beq     LSYM(__ctz_zero)
+
+        // Gives the number of '0' bits left of the least-significant '1'.  
+        clz     r0,     r1
+
+  #elif defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Size optimized: 24 bytes, 52 cycles
+        // Speed optimized: 52 bytes, 21 cycles
+
+        // Binary search starts at half the word width.
+        movs    r3,     #16
+
+    LSYM(__ctz_loop):
+        // Test the upper 'n' bits of the operand for ZERO.
+        movs    r1,     r0
+        
+        lsls    r1,     r3
+        beq     LSYM(__ctz_skip)
+
+        // When the test fails, discard the lower bits of the register,
+        //  and deduct the count of discarded bits from the result.
+        movs    r0,     r1
+        subs    r2,     r3
+
+    LSYM(__ctz_skip):
+        // Decrease the shift distance for the next test.
+        lsrs    r3,     #1
+        bne     LSYM(__ctz_loop)
+       
+        // Prepare the remainder.
+        lsrs    r0,     #31
+ 
+  #else /* !__OPTIMIZE_SIZE__ */
+ 
+        // Unrolled binary search.
+        lsls    r1,     r0,     #16
+        beq     LSYM(__ctz8)
+        movs    r0,     r1
+        subs    r2,     #16
+
+    LSYM(__ctz8):
+        lsls    r1,     r0,     #8
+        beq     LSYM(__ctz4)
+        movs    r0,     r1
+        subs    r2,     #8
+
+    LSYM(__ctz4):
+        lsls    r1,     r0,     #4
+        beq     LSYM(__ctz2)
+        movs    r0,     r1
+        subs    r2,     #4
+
+    LSYM(__ctz2):
+        // Load the remainder by index
+        lsrs    r0,     #28 
+        adr     r3,     LSYM(__ctz_remainder)
+        ldrb    r0,     [r3, r0]
+  
+  #endif /* !__OPTIMIZE_SIZE__ */ 
+
+    LSYM(__ctz_zero):
+        // Apply the remainder.
+        subs    r0,     r2,     r0
+        RET
+       
+  #if (!defined(__ARM_FEATURE_CLZ) || !__ARM_FEATURE_CLZ) && \
+      (!defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__)
+        .align 2
+    LSYM(__ctz_remainder):
+        .byte 0,4,3,4,2,4,3,4,1,4,3,4,2,4,3,4
+  #endif  
+ 
+    CFI_END_FUNCTION
+CM0_FUNC_END ctzsi2
+
+#ifdef L_ctzdi2
+CM0_FUNC_END ctzdi2
+#endif
+
+#endif /* L_ctzsi2 || L_ctzdi2 */
+
+ 
+#ifdef L_ffsdi2
+
+// int __ffsdi2(long long)
+// Return the index of the least significant 1-bit in $r1:r0, 
+//  or zero if $r1:r0 is zero.  The least significant bit is index 1.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+// Same section as __ctzsi2() for sake of the tail call branches.
+.section .text.sorted.libgcc.ctz2.ffsdi2,"x"
+CM0_FUNC_START ffsdi2
+    CFI_START_FUNCTION
+       
+        // Simplify branching by assuming a non-zero lower word.  
+        // For all such, ffssi2(x) == ctzsi2(x) + 1.  
+        movs    r2,    #(33 - CTZ_RESULT_OFFSET)
+        
+      #if defined(__ARMEB__) && __ARMEB__
+        // HACK: Save the upper word in a scratch register. 
+        movs    r3,     r0
+      
+        // Test the lower word.
+        movs    r0,     r1
+        bne     SYM(__internal_ctzsi2)
+
+        // Test the upper word.
+        movs    r2,    #(65 - CTZ_RESULT_OFFSET)
+        movs    r0,     r3
+        bne     SYM(__internal_ctzsi2)
+        
+      #else /* !__ARMEB__ */
+        // Test the lower word.
+        cmp     r0,     #0
+        bne     SYM(__internal_ctzsi2)
+
+        // Test the upper word.
+        movs    r2,    #(65 - CTZ_RESULT_OFFSET)
+        movs    r0,     r1
+        bne     SYM(__internal_ctzsi2)
+        
+      #endif /* !__ARMEB__ */
+
+        // Upper and lower words are both zero. 
+        RET
+        
+    CFI_END_FUNCTION
+CM0_FUNC_END ffsdi2
+   
+#endif /* L_ffsdi2 */
+
+
+#ifdef L_ffssi2 
+    
+// int __ffssi2(int)
+// Return the index of the least significant 1-bit in $r0, 
+//  or zero if $r0 is zero.  The least significant bit is index 1.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+// Same section as __ctzsi2() for sake of the tail call branches.
+.section .text.sorted.libgcc.ctz2.ffssi2,"x"
+CM0_FUNC_START ffssi2
+    CFI_START_FUNCTION
+
+        // Simplify branching by assuming a non-zero argument.  
+        // For all such, ffssi2(x) == ctzsi2(x) + 1.  
+        movs    r2,    #(33 - CTZ_RESULT_OFFSET)
+ 
+        // Test for zero, return unmodified.  
+        cmp     r0,     #0 
+        bne     SYM(__internal_ctzsi2)
+        RET
+ 
+    CFI_END_FUNCTION
+CM0_FUNC_END ffssi2
+
+#endif /* L_ffssi2 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fadd.S gcc-11-20201220/libgcc/config/arm/cm0/fadd.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fadd.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fadd.S	2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,324 @@
+/* fadd.S: Cortex M0 optimized 32-bit float addition
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef L_arm_frsubsf3
+
+// float __aeabi_frsub(float, float)
+// Returns the floating point difference of $r1 - $r0 in $r0.
+.section .text.sorted.libgcc.fpcore.b.frsub,"x"
+CM0_FUNC_START aeabi_frsub
+    CFI_START_FUNCTION
+
+      #if defined(STRICT_NANS) && STRICT_NANS
+        // Check if $r0 is NAN before modifying.
+        lsls    r2,     r0,     #1
+        movs    r3,     #255
+        lsls    r3,     #24
+
+        // Let fadd() find the NAN in the normal course of operation,
+        //  moving it to $r0 and checking the quiet/signaling bit.
+        cmp     r2,     r3
+        bhi     SYM(__aeabi_fadd)
+      #endif
+
+        // Flip sign and run through fadd().
+        movs    r2,     #1
+        lsls    r2,     #31
+        adds    r0,     r2
+        b       SYM(__aeabi_fadd)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_frsub
+
+#endif /* L_arm_frsubsf3 */
+
+
+#ifdef L_arm_addsubsf3 
+
+// float __aeabi_fsub(float, float)
+// Returns the floating point difference of $r0 - $r1 in $r0.
+.section .text.sorted.libgcc.fpcore.c.faddsub,"x"
+CM0_FUNC_START aeabi_fsub
+CM0_FUNC_ALIAS subsf3 aeabi_fsub
+    CFI_START_FUNCTION
+
+      #if defined(STRICT_NANS) && STRICT_NANS
+        // Check if $r1 is NAN before modifying.
+        lsls    r2,     r1,     #1
+        movs    r3,     #255
+        lsls    r3,     #24
+
+        // Let fadd() find the NAN in the normal course of operation,
+        //  moving it to $r0 and checking the quiet/signaling bit.
+        cmp     r2,     r3
+        bhi     SYM(__aeabi_fadd)
+      #endif
+
+        // Flip sign and fall into fadd().
+        movs    r2,     #1
+        lsls    r2,     #31
+        adds    r1,     r2
+
+#endif /* L_arm_addsubsf3 */
+
+
+// The execution of __subsf3() flows directly into __addsf3(), such that
+//  instructions must appear consecutively in the same memory section.
+//  However, this construction inhibits the ability to discard __subsf3()
+//  when only using __addsf3().
+// Therefore, this block configures __addsf3() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+//  version is the continuation of __subsf3().  The standalone version must
+//  be declared WEAK, so that the combined version can supersede it and
+//  provide both symbols when required.
+// '_arm_addsf3' should appear before '_arm_addsubsf3' in LIB1ASMFUNCS.
+#if defined(L_arm_addsf3) || defined(L_arm_addsubsf3) 
+
+#ifdef L_arm_addsf3
+// float __aeabi_fadd(float, float)
+// Returns the floating point sum of $r0 + $r1 in $r0.
+.section .text.sorted.libgcc.fpcore.c.fadd,"x"
+CM0_WEAK_START aeabi_fadd
+CM0_WEAK_ALIAS addsf3 aeabi_fadd
+    CFI_START_FUNCTION
+
+#else /* L_arm_addsubsf3 */
+CM0_FUNC_START aeabi_fadd
+CM0_FUNC_ALIAS addsf3 aeabi_fadd
+
+#endif
+ 
+        // Standard registers, compatible with exception handling.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Drop the sign bit to compare absolute value.
+        lsls    r2,     r0,     #1
+        lsls    r3,     r1,     #1
+
+        // Save the logical difference of original values.
+        // This actually makes the following swap slightly faster.
+        eors    r1,     r0
+
+        // Compare exponents+mantissa.
+        // MAYBE: Speedup for equal values?  This would have to separately
+        //  check for NAN/INF and then either:
+        // * Increase the exponent by '1' (for multiply by 2), or
+        // * Return +0
+        cmp     r2,     r3
+        bhs     LSYM(__fadd_ordered)
+
+        // Reorder operands so the larger absolute value is in r2,
+        //  the corresponding original operand is in $r0,
+        //  and the smaller absolute value is in $r3.
+        movs    r3,     r2
+        eors    r0,     r1
+        lsls    r2,     r0,     #1
+
+    LSYM(__fadd_ordered):
+        // Extract the exponent of the larger operand.
+        // If INF/NAN, then it becomes an automatic result.
+        lsrs    r2,     #24
+        cmp     r2,     #255
+        beq     LSYM(__fadd_special)
+
+        // Save the sign of the result.
+        lsrs    rT,     r0,     #31
+        lsls    rT,     #31
+        mov     ip,     rT
+
+        // If the original value of $r1 was +/-0,
+        //  $r0 becomes the automatic result.
+        // Because $r0 is known to be a finite value, return directly.
+        // It's actually important that +/-0 not go through the normal
+        //  process, to keep "-0 +/- 0"  from being turned into +0.
+        cmp     r3,     #0
+        beq     LSYM(__fadd_zero)
+
+        // Extract the second exponent.
+        lsrs    r3,     #24
+
+        // Calculate the difference of exponents (always positive).
+        subs    r3,     r2,     r3
+
+      #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // If the smaller operand is more than 25 bits less significant
+        //  than the larger, the larger operand is an automatic result.
+        // The smaller operand can't affect the result, even after rounding.
+        cmp     r3,     #25
+        bhi     LSYM(__fadd_return)
+      #endif
+
+        // Isolate both mantissas, recovering the smaller.
+        lsls    rT,     r0,     #9
+        lsls    r0,     r1,     #9
+        eors    r0,     rT
+
+        // If the larger operand is normal, restore the implicit '1'.
+        // If subnormal, the second operand will also be subnormal.
+        cmp     r2,     #0
+        beq     LSYM(__fadd_normal)
+        adds    rT,     #1
+        rors    rT,     rT
+
+        // If the smaller operand is also normal, restore the implicit '1'.
+        // If subnormal, the smaller operand effectively remains multiplied
+        //  by 2 w.r.t the first.  This compensates for subnormal exponents,
+        //  which are technically still -126, not -127.
+        cmp     r2,     r3
+        beq     LSYM(__fadd_normal)
+        adds    r0,     #1
+        rors    r0,     r0
+
+    LSYM(__fadd_normal):
+        // Provide a spare bit for overflow.
+        // Normal values will be aligned in bits [30:7]
+        // Subnormal values will be aligned in bits [30:8]
+        lsrs    rT,     #1
+        lsrs    r0,     #1
+
+        // If signs weren't matched, negate the smaller operand (branchless).
+        asrs    r1,     #31
+        eors    r0,     r1
+        subs    r0,     r1
+
+        // Keep a copy of the small mantissa for the remainder.
+        movs    r1,     r0
+
+        // Align the small mantissa for addition.
+        asrs    r1,     r3
+
+        // Isolate the remainder.
+        // NOTE: Given the various cases above, the remainder will only
+        //  be used as a boolean for rounding ties to even.  It is not
+        //  necessary to negate the remainder for subtraction operations.
+        rsbs    r3,     #0
+        adds    r3,     #32
+        lsls    r0,     r3
+
+        // Because operands are ordered, the result will never be negative.
+        // If the result of subtraction is 0, the overall result must be +0.
+        // If the overall result in $r1 is 0, then the remainder in $r0
+        //  must also be 0, so no register copy is necessary on return.
+        adds    r1,     rT
+        beq     LSYM(__fadd_return)
+
+        // The large operand was aligned in bits [29:7]...
+        // If the larger operand was normal, the implicit '1' went in bit [30].
+        //
+        // After addition, the MSB of the result may be in bit:
+        //    31,  if the result overflowed.
+        //    30,  the usual case.
+        //    29,  if there was a subtraction of operands with exponents
+        //          differing by more than 1.
+        //  < 28, if there was a subtraction of operands with exponents +/-1,
+        //  < 28, if both operands were subnormal.
+
+        // In the last case (both subnormal), the alignment shift will be 8,
+        //  the exponent will be 0, and no rounding is necessary.
+        cmp     r2,     #0
+        bne     SYM(__fp_assemble)
+
+        // Subnormal overflow automatically forms the correct exponent.
+        lsrs    r0,     r1,     #8
+        add     r0,     ip
+
+    LSYM(__fadd_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    LSYM(__fadd_special):
+      #if defined(TRAP_NANS) && TRAP_NANS
+        // If $r1 is (also) NAN, force it in place of $r0.
+        // As the smaller NAN, it is more likely to be signaling.
+        movs    rT,     #255
+        lsls    rT,     #24
+        cmp     r3,     rT
+        bls     LSYM(__fadd_ordered2)
+
+        eors    r0,     r1
+      #endif
+
+    LSYM(__fadd_ordered2):
+        // There are several possible cases to consider here:
+        //  1. Any NAN/NAN combination
+        //  2. Any NAN/INF combination
+        //  3. Any NAN/value combination
+        //  4. INF/INF with matching signs
+        //  5. INF/INF with mismatched signs.
+        //  6. Any INF/value combination.
+        // In all cases but the case 5, it is safe to return $r0.
+        // In the special case, a new NAN must be constructed.
+        // First, check the mantissa to see if $r0 is NAN.
+        lsls    r2,     r0,     #9
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        bne     SYM(__fp_check_nan)
+      #else
+        bne     LSYM(__fadd_return)
+      #endif
+
+    LSYM(__fadd_zero):
+        // Next, check for an INF/value combination.
+        lsls    r2,     r1,     #1
+        bne     LSYM(__fadd_return)
+
+        // Finally, check for matching sign on INF/INF.
+        // Also accepts matching signs when +/-0 are added.
+        bcc     LSYM(__fadd_return)
+
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(SUBTRACTED_INFINITY)
+      #endif
+
+      #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+        // Restore original operands.
+        eors    r1,     r0
+      #endif
+
+        // Identify mismatched 0.
+        lsls    r2,     r0,     #1
+        bne     SYM(__fp_exception)
+
+        // Force mismatched 0 to +0.
+        eors    r0,     r0
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END addsf3
+CM0_FUNC_END aeabi_fadd
+
+#ifdef L_arm_addsubsf3
+CM0_FUNC_END subsf3
+CM0_FUNC_END aeabi_fsub
+#endif
+
+#endif /* L_arm_addsf3 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fcmp.S gcc-11-20201220/libgcc/config/arm/cm0/fcmp.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fcmp.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fcmp.S	2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,634 @@
+/* fcmp.S: Cortex M0 optimized 32-bit float comparison
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef L_arm_cmpsf2
+
+// int __cmpsf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+//  * +1 if ($r0 > $r1), or either argument is NAN
+//  *  0 if ($r0 == $r1)
+//  * -1 if ($r0 < $r1)
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.sorted.libgcc.fcmp.cmpsf2,"x"
+CM0_FUNC_START cmpsf2
+CM0_FUNC_ALIAS lesf2 cmpsf2
+CM0_FUNC_ALIAS ltsf2 cmpsf2
+    CFI_START_FUNCTION
+
+        // Assumption: The 'libgcc' functions should raise exceptions.
+        movs    r2,     #(FCMP_UN_POSITIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY)
+
+// int,int __internal_cmpsf2(float, float, int)
+// Internal function expects a set of control flags in $r2.
+// If ordered, returns a comparison type { 0, 1, 2 } in $r3
+CM0_FUNC_START internal_cmpsf2
+
+        // When operand signs are considered, the comparison result falls
+        //  within one of the following quadrants:
+        //
+        // $r0  $r1  $r0-$r1* flags  result
+        //  +    +      >      C=0     GT
+        //  +    +      =      Z=1     EQ
+        //  +    +      <      C=1     LT
+        //  +    -      >      C=1     GT
+        //  +    -      =      C=1     GT
+        //  +    -      <      C=1     GT
+        //  -    +      >      C=0     LT
+        //  -    +      =      C=0     LT
+        //  -    +      <      C=0     LT
+        //  -    -      >      C=0     LT
+        //  -    -      =      Z=1     EQ
+        //  -    -      <      C=1     GT
+        //
+        //  *When interpreted as a subtraction of unsigned integers
+        //
+        // From the table, it is clear that in the presence of any negative
+        //  operand, the natural result simply needs to be reversed.
+        // Save the 'N' flag for later use.
+        movs    r3,     r0
+        orrs    r3,     r1
+        mov     ip,     r3
+
+        // Keep the absolute value of the second argument for NAN testing.
+        lsls    r3,     r1,     #1
+
+        // With the absolute value of the second argument safely stored,
+        //  recycle $r1 to calculate the difference of the arguments.
+        subs    r1,     r0,     r1
+
+        // Save the 'C' flag for use later.
+        // Effectively shifts all the flags 1 bit left.
+        adcs    r2,     r2
+
+        // Absolute value of the first argument.
+        lsls    r0,     #1
+
+        // Identify the largest absolute value between the two arguments.
+        cmp     r0,     r3
+        bhs     LSYM(__fcmp_sorted)
+
+        // Keep the larger absolute value for NAN testing.
+        // NOTE: When the arguments are respectively a signaling NAN and a
+        //  quiet NAN, the quiet NAN has precedence.  This has consequences
+        //  if TRAP_NANS is enabled, but the flags indicate that exceptions
+        //  for quiet NANs should be suppressed.  After the signaling NAN is
+        //  discarded, no exception is raised, although it should have been.
+        // This could be avoided by using a fifth register to save both
+        //  arguments until the signaling bit can be tested, but that seems
+        //  like an excessive amount of ugly code for an ambiguous case.
+        movs    r0,     r3
+
+    LSYM(__fcmp_sorted):
+        // If $r3 is NAN, the result is unordered.
+        movs    r3,     #255
+        lsls    r3,     #24
+        cmp     r0,     r3
+        bhi     LSYM(__fcmp_unordered)
+
+        // Positive and negative zero must be considered equal.
+        // If the larger absolute value is +/-0, both must have been +/-0.
+        subs    r3,     r0,     #0
+        beq     LSYM(__fcmp_zero)
+
+        // Test for regular equality.
+        subs    r3,     r1,     #0
+        beq     LSYM(__fcmp_zero)
+
+        // Isolate the saved 'C', and invert if either argument was negative.
+        // Remembering that the original subtraction was $r1 - $r0,
+        //  the result will be 1 if 'C' was set (gt), or 0 for not 'C' (lt).
+        lsls    r3,     r2,     #31
+        add     r3,     ip
+        lsrs    r3,     #31
+
+        // HACK: Force the 'C' bit clear, 
+        //  since bit[30] of $r3 may vary with the operands.
+        adds    r3,     #0
+
+    LSYM(__fcmp_zero):
+        // After everything is combined, the temp result will be
+        //  2 (gt), 1 (eq), or 0 (lt).
+        adcs    r3,     r3
+
+        // Short-circuit return if the 3-way comparison flag is set.
+        // Otherwise, shifts the condition mask into bits[2:0].
+        lsrs    r2,     #2
+        bcs     LSYM(__fcmp_return)
+
+        // If the bit corresponding to the comparison result is set in the
+        //  acceptance mask, a '1' will fall out into the result.
+        movs    r0,     #1
+        lsrs    r2,     r3
+        ands    r0,     r2
+        RET
+
+    LSYM(__fcmp_unordered):
+        // Set up the requested UNORDERED result.
+        // Remember the shift in the flags (above).
+        lsrs    r2,     #6
+
+  #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+        // TODO: Raise the appropriate exception for the unordered result.
+
+
+  #endif
+
+  #if defined(TRAP_NANS) && TRAP_NANS
+        // Always raise an exception if FCMP_RAISE_EXCEPTIONS was specified.
+        bcs     LSYM(__fcmp_trap)
+
+        // If FCMP_NO_EXCEPTIONS was specified, no exceptions on quiet NANs.
+        // The comparison flags are moot, so $r1 can serve as scratch space.
+        lsrs    r1,     r0,     #24
+        bcs     LSYM(__fcmp_return2)
+
+    LSYM(__fcmp_trap):
+        // Restore the NAN (sans sign) for an argument to the exception.
+        // As an IRQ, the handler restores all registers, including $r3.
+        // NOTE: The service handler may not return.
+        lsrs    r0,     #1
+        movs    r3,     #(UNORDERED_COMPARISON)
+        svc     #(SVC_TRAP_NAN)
+  #endif
+
+     LSYM(__fcmp_return2):
+        // HACK: Work around result register mapping.
+        // This could probably be eliminated by remapping the flags register.
+        movs    r3,     r2
+
+    LSYM(__fcmp_return):
+        // Finish setting up the result.
+        // Constant subtraction allows a negative result while keeping the 
+        //  $r2 flag control word within 8 bits, particularly for FCMP_UN*.  
+        // This operation also happens to set the 'Z' and 'C' flags correctly
+        //  per the requirements of __aeabi_cfcmple() et al.
+        subs    r0,     r3,     #1
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END ltsf2
+CM0_FUNC_END lesf2
+CM0_FUNC_END cmpsf2
+
+#endif /* L_arm_cmpsf2 */ 
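For reference, here is a rough C model of the flag-driven dispatch that __internal_cmpsf2() implements above.  The acceptance mask and 3-way flag are sketched with illustrative bit positions only; they are not the actual FCMP_* constants, and NAN trapping is omitted.

    /* Illustrative model of __internal_cmpsf2() result selection
       (assumed bit layout; not the real FCMP_* definitions). */
    #include <math.h>

    enum { CMP_LT = 0, CMP_EQ = 1, CMP_GT = 2 };

    static int fcmp_model(float a, float b, int accept_mask,
                          int three_way, int unordered_result)
    {
        if (isnan(a) || isnan(b))
            return unordered_result;

        int order = (a < b) ? CMP_LT : (a == b) ? CMP_EQ : CMP_GT;

        if (three_way)
            return order - 1;                /* -1 / 0 / +1, as __cmpsf2() */

        /* Shift the acceptance mask by the ordering; bit 0 is the result. */
        return (accept_mask >> order) & 1;
    }

In this model, __aeabi_fcmpge() corresponds to accept_mask = (1 << CMP_GT) | (1 << CMP_EQ) with three_way = 0, while __cmpsf2() corresponds to three_way = 1 and unordered_result = +1.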
+
+
+#ifdef L_arm_eqsf2
+
+// int __eqsf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+//  * -1 if ($r0 < $r1)
+//  *  0 if ($r0 == $r1)
+//  * +1 if ($r0 > $r1), or either argument is NAN
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.eqsf2,"x"
+CM0_FUNC_START eqsf2
+CM0_FUNC_ALIAS nesf2 eqsf2
+    CFI_START_FUNCTION
+
+        // Assumption: The 'libgcc' functions should raise exceptions.
+        movs    r2,     #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END nesf2
+CM0_FUNC_END eqsf2
+
+#endif /* L_arm_eqsf2 */
+
+
+#ifdef L_arm_gesf2
+
+// int __gesf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+//  * -1 if ($r0 < $r1), or either argument is NAN
+//  *  0 if ($r0 == $r1)
+//  * +1 if ($r0 > $r1)
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.gesf2,"x"
+CM0_FUNC_START gesf2
+CM0_FUNC_ALIAS gtsf2 gesf2
+    CFI_START_FUNCTION
+
+        // Assumption: The 'libgcc' functions should raise exceptions.
+        movs    r2,     #(FCMP_UN_NEGATIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END gtsf2
+CM0_FUNC_END gesf2
+
+#endif /* L_arm_gesf2 */
+
+
+#ifdef L_arm_fcmpeq
+
+// int __aeabi_fcmpeq(float, float)
+// Returns '1' in $r0 if ($r0 == $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.fcmpeq,"x"
+CM0_FUNC_START aeabi_fcmpeq
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_EQ)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpeq
+
+#endif /* L_arm_fcmpeq */
+
+
+#ifdef L_arm_fcmpne
+
+// int __aeabi_fcmpne(float, float) [non-standard]
+// Returns '1' in $r0 if ($r0 != $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.fcmpne,"x"
+CM0_FUNC_START aeabi_fcmpne
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_NE)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpne
+
+#endif /* L_arm_fcmpne */
+
+
+#ifdef L_arm_fcmplt
+
+// int __aeabi_fcmplt(float, float)
+// Returns '1' in $r0 if ($r0 < $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.fcmplt,"x"
+CM0_FUNC_START aeabi_fcmplt
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_LT)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmplt
+
+#endif /* L_arm_fcmplt */
+
+
+#ifdef L_arm_fcmple
+
+// int __aeabi_fcmple(float, float)
+// Returns '1' in $r0 if ($r0 <= $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.fcmple,"x"
+CM0_FUNC_START aeabi_fcmple
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_LE)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmple
+
+#endif /* L_arm_fcmple */
+
+
+#ifdef L_arm_fcmpge
+
+// int __aeabi_fcmpge(float, float)
+// Returns '1' in $r0 if ($r0 >= $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.fcmpge,"x"
+CM0_FUNC_START aeabi_fcmpge
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_GE)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpge
+
+#endif /* L_arm_fcmpge */
+
+
+#ifdef L_arm_fcmpgt
+
+// int __aeabi_fcmpgt(float, float)
+// Returns '1' in $r0 if ($r0 > $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.fcmpgt,"x"
+CM0_FUNC_START aeabi_fcmpgt
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_GT)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpgt
+
+#endif /* L_arm_fcmpgt */
+
+
+#ifdef L_arm_unordsf2
+
+// int __aeabi_fcmpun(float, float)
+// Returns '1' in $r0 if $r0 and $r1 are unordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.unordsf2,"x"
+CM0_FUNC_START aeabi_fcmpun
+CM0_FUNC_ALIAS unordsf2 aeabi_fcmpun
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END unordsf2
+CM0_FUNC_END aeabi_fcmpun
+
+#endif /* L_arm_unordsf2 */
+
+
+#ifdef L_arm_cfrcmple
+
+// void __aeabi_cfrcmple(float, float)
+// Reverse three-way compare of $r1 ? $r0, with result in the status flags:
+//  * 'Z' is set only when the operands are ordered and equal.
+//  * 'C' is clear only when the operands are ordered and $r0 > $r1.
+// Preserves all core registers except $ip, $lr, and the CPSR.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.cfrcmple,"x"
+CM0_FUNC_START aeabi_cfrcmple
+    CFI_START_FUNCTION
+
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        push    { r0 - r3, rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 24
+                .cfi_rel_offset r0, 0
+                .cfi_rel_offset r1, 4
+                .cfi_rel_offset r2, 8
+                .cfi_rel_offset r3, 12
+                .cfi_rel_offset rT, 16
+                .cfi_rel_offset lr, 20
+      #else
+        push    { r0 - r3, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 20
+                .cfi_rel_offset r0, 0
+                .cfi_rel_offset r1, 4
+                .cfi_rel_offset r2, 8
+                .cfi_rel_offset r3, 12
+                .cfi_rel_offset lr, 16
+      #endif
+
+        // Reverse the operands.
+        movs    r0,     r1
+        ldr     r1,     [sp, #0]
+
+        // Don't just fall through, else registers will get pushed twice.
+        b       SYM(__internal_cfrcmple)
+
+        // MAYBE: 
+        // It might be better to pass original order arguments and swap 
+        //  the result instead.  Cleaner for STRICT_NAN trapping too.
+        //  Is 4 cycles worth 6 bytes?
+        // For example: 
+        //  $r2 = (FCMP_UN_NEGATIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY) 
+        //  movs    r1,    #1  
+        //  subs    r1,    r3
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_cfrcmple
+
+#endif /* L_arm_cfrcmple */
+
+
+#if defined(L_arm_cfcmple) || \
+   (defined(L_arm_cfcmpeq) && defined(TRAP_NANS) && TRAP_NANS)
+
+#ifdef L_arm_cfcmple
+.section .text.sorted.libgcc.fcmp.cfcmple,"x"
+  #define CFCMPLE_NAME aeabi_cfcmple
+#else
+.section .text.sorted.libgcc.fcmp.cfcmpeq,"x"
+  #define CFCMPLE_NAME aeabi_cfcmpeq 
+#endif
+
+// void __aeabi_cfcmple(float, float)
+// void __aeabi_cfcmpeq(float, float)
+// NOTE: These functions are only distinct if __aeabi_cfcmple() can raise exceptions.
+// Three-way compare of $r0 ? $r1, with result in the status flags:
+//  * 'Z' is set only when the operands are ordered and equal.
+//  * 'C' is clear only when the operands are ordered and $r0 < $r1.
+// Preserves all core registers except $ip, $lr, and the CPSR.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+CM0_FUNC_START CFCMPLE_NAME 
+
+  // __aeabi_cfcmpeq() is defined separately when TRAP_NANS is enabled.
+  #if !defined(TRAP_NANS) || !TRAP_NANS
+    CM0_FUNC_ALIAS aeabi_cfcmpeq aeabi_cfcmple
+  #endif
+
+    CFI_START_FUNCTION
+
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        push    { r0 - r3, rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 24
+                .cfi_rel_offset r0, 0
+                .cfi_rel_offset r1, 4
+                .cfi_rel_offset r2, 8
+                .cfi_rel_offset r3, 12
+                .cfi_rel_offset rT, 16
+                .cfi_rel_offset lr, 20
+      #else
+        push    { r0 - r3, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 20
+                .cfi_rel_offset r0, 0
+                .cfi_rel_offset r1, 4
+                .cfi_rel_offset r2, 8
+                .cfi_rel_offset r3, 12
+                .cfi_rel_offset lr, 16
+      #endif
+
+  #ifdef L_arm_cfcmple 
+    CM0_FUNC_START internal_cfrcmple
+        // Even though the result in $r0 will be discarded, the 3-way 
+        //  subtraction of '-1' that generates this result happens to 
+        //  set 'C' and 'Z' perfectly.  Unordered results group with '>'.
+        // This happens to be the same control word as __cmpsf2(), meaning 
+        //  that __cmpsf2() is a potential branch target.  However, 
+        //  the choice to set a redundant control word and branch to
+        //  __internal_cmpsf2() makes this compiled object more robust
+        //  against linking with 'foreign' __cmpsf2() implementations.
+        movs    r2,     #(FCMP_UN_POSITIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY)
+  #else /* L_arm_cfcmpeq */ 
+    CM0_FUNC_START internal_cfrcmpeq
+        // No exceptions on quiet NAN.
+        movs    r2,     #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY)
+  #endif 
+
+        bl      SYM(__internal_cmpsf2)
+
+        // Clean up all working registers.
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        pop     { r0 - r3, rT, pc }
+                .cfi_restore_state
+      #else
+        pop     { r0 - r3, pc }
+                .cfi_restore_state
+      #endif
+
+    CFI_END_FUNCTION
+
+  #if !defined(TRAP_NANS) || !TRAP_NANS
+    CM0_FUNC_END aeabi_cfcmpeq
+  #endif
+
+CM0_FUNC_END CFCMPLE_NAME 
+
+#endif /* L_arm_cfcmple || L_arm_cfcmpeq */
+
+
+// C99 libm functions
+#if 0
+
+// int isgreaterf(float, float)
+// Returns '1' in $r0 if ($r0 > $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.isgtf,"x"
+CM0_FUNC_START isgreaterf
+MATH_ALIAS isgreaterf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END isgreaterf
+CM0_FUNC_END isgreaterf
+
+
+// int isgreaterequalf(float, float)
+// Returns '1' in $r0 if ($r0 >= $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.isgef,"x"
+CM0_FUNC_START isgreaterequalf
+MATH_ALIAS isgreaterequalf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END isgreaterequalf
+CM0_FUNC_END isgreaterequalf
+
+
+// int islessf(float, float)
+// Returns '1' in $r0 if ($r0 < $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.isltf,"x"
+CM0_FUNC_START islessf
+MATH_ALIAS islessf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_LT)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END islessf
+CM0_FUNC_END islessf
+
+
+// int islessequalf(float, float)
+// Returns '1' in $r0 if ($r0 <= $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.islef,"x"
+CM0_FUNC_START islessequalf
+MATH_ALIAS islessequalf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_LE)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END islessequalf
+CM0_FUNC_END islessequalf
+
+
+// int islessgreaterf(float, float)
+// Returns '1' in $r0 if ($r0 != $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.isnef,"x"
+CM0_FUNC_START islessgreaterf
+MATH_ALIAS islessgreaterf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_NE)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END islessgreaterf
+CM0_FUNC_END islessgreaterf
+
+
+// int isunorderedf(float, float)
+// Returns '1' in $r0 if either $r0 or $r1 is NAN (unordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range. 
+.section .text.sorted.libgcc.fcmp.isunf,"x"
+CM0_FUNC_START isunorderedf
+MATH_ALIAS isunorderedf
+    CFI_START_FUNCTION
+
+        movs    r2,     #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS)
+        b       SYM(__internal_cmpsf2)
+
+    CFI_END_FUNCTION
+MATH_END isunorderedf
+CM0_FUNC_END isunorderedf
+
+#endif /* 0 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fconv.S gcc-11-20201220/libgcc/config/arm/cm0/fconv.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fconv.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fconv.S	2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,429 @@
+/* fconv.S: Cortex M0 optimized 32- and 64-bit float conversions
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef L_arm_f2d
+
+// double __aeabi_f2d(float)
+// Converts a single-precision float in $r0 to double-precision in $r1:$r0.
+// Rounding, overflow, and underflow are impossible.
+// INF and ZERO are returned unmodified.
+.section .text.sorted.libgcc.fpcore.v.extendsfdf2,"x"
+CM0_FUNC_START aeabi_f2d
+CM0_FUNC_ALIAS extendsfdf2 aeabi_f2d
+    CFI_START_FUNCTION
+
+        // Save the sign.
+        lsrs    r1,     r0,     #31
+        lsls    r1,     #31
+
+        // Set up registers for __fp_normalize2().
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Test for zero.
+        lsls    r0,     #1
+        beq     LSYM(__f2d_return)
+
+        // Split the exponent and mantissa into separate registers.
+        // This is the most efficient way to convert subnormals in the
+        //  single-precision form into normals in double-precision.
+        // This does add a leading implicit '1' to INF and NAN,
+        //  but that will be absorbed when the value is re-assembled.
+        movs    r2,     r0
+        bl      SYM(__fp_normalize2) __PLT__
+
+        // Set up the exponent bias.  For INF/NAN values, the bias
+        //  is 1791 (2047 - 255 - 1), where the last '1' accounts
+        //  for the implicit '1' in the mantissa.
+        movs    r0,     #3
+        lsls    r0,     #9
+        adds    r0,     #255
+
+        // Test for INF/NAN, promote exponent if necessary
+        cmp     r2,     #255
+        beq     LSYM(__f2d_indefinite)
+
+        // For normal values, the exponent bias is 895 (1023 - 127 - 1),
+        //  which is half of the prepared INF/NAN bias.
+        lsrs    r0,     #1
+
+    LSYM(__f2d_indefinite):
+        // Assemble exponent with bias correction.
+        adds    r2,     r0
+        lsls    r2,     #20
+        adds    r1,     r2
+
+        // Assemble the high word of the mantissa.
+        lsrs    r0,     r3,     #11
+        add     r1,     r0
+
+        // Remainder of the mantissa in the low word of the result.
+        lsls    r0,     r3,     #21
+
+    LSYM(__f2d_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END extendsfdf2
+CM0_FUNC_END aeabi_f2d
+
+#endif /* L_arm_f2d */
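For normal, nonzero inputs the conversion above reduces to re-biasing the exponent and left-justifying the mantissa.  A rough C sketch of that case only (subnormals, INF and NAN take the __fp_normalize2() path; the assembly adds 895 rather than 896 because it keeps the leading '1' explicit in the mantissa):

    #include <stdint.h>
    #include <string.h>

    /* Sketch: single -> double, normal values only. */
    static double f2d_normal_sketch(float f)
    {
        uint32_t u;  memcpy(&u, &f, sizeof u);

        uint64_t sign = (uint64_t)(u >> 31) << 63;
        uint64_t exp  = ((u >> 23) & 0xFF) + (1023 - 127);   /* re-bias by 896 */
        uint64_t mant = (uint64_t)(u & 0x7FFFFF) << 29;      /* 23 -> 52 bits */

        uint64_t bits = sign | (exp << 52) | mant;
        double d;  memcpy(&d, &bits, sizeof d);
        return d;
    }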
+
+
+#if defined(L_arm_d2f)
+// TODO: not tested || defined(L_arm_truncdfsf2)
+
+// HACK: Build two separate implementations:
+//  * __aeabi_d2f() rounds to nearest per traditional IEEE-754 rules.
+//  * __truncdfsf2() rounds towards zero per GCC specification.  
+// Presumably, a program will consistently use one ABI or the other, 
+//  which means that this code will not be duplicated in practice.  
+// Merging the two versions with dynamic rounding would be rather hard. 
+#ifdef L_arm_truncdfsf2
+  #define D2F_NAME truncdfsf2 
+#else
+  #define D2F_NAME aeabi_d2f
+#endif
+
+// float __aeabi_d2f(double)
+// Converts a double-precision float in $r1:$r0 to single-precision in $r0.
+// Values out of range become ZERO or INF; returns the upper 23 bits of NAN.
+.section .text.sorted.libgcc.fpcore.w.truncdfsf2,"x"
+CM0_FUNC_START D2F_NAME
+    CFI_START_FUNCTION
+
+        // Save the sign.
+        lsrs    r2,     r1,     #31
+        lsls    r2,     #31
+        mov     ip,     r2
+
+        // Isolate the exponent (11 bits).
+        lsls    r2,     r1,     #1
+        lsrs    r2,     #21
+
+        // Isolate the mantissa.  It's safe to always add the implicit '1' --
+        //  even for subnormals -- since they will underflow in every case.
+        lsls    r1,     #12
+        adds    r1,     #1
+        rors    r1,     r1
+        lsrs    r3,     r0,     #21
+        adds    r1,     r3
+
+  #ifndef L_arm_truncdfsf2 
+        // Fix the remainder.  Even though the mantissa already has 32 bits
+        //  of significance, this value still influences rounding ties.  
+        lsls    r0,     #11
+  #endif 
+
+        // Test for INF/NAN (r3 = 2047)
+        mvns    r3,     r2
+        lsrs    r3,     #21
+        cmp     r3,     r2
+        beq     LSYM(__d2f_indefinite)
+
+        // Adjust exponent bias.  Offset is 127 - 1023, less 1 more since
+        //  __fp_assemble() expects the exponent relative to bit[30].
+        lsrs    r3,     #1
+        subs    r2,     r3
+        adds    r2,     #126
+
+  #ifndef L_arm_truncdfsf2 
+    LSYM(__d2f_overflow):
+        // Use the standard formatting for overflow and underflow.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        b       SYM(__fp_assemble)
+                .cfi_restore_state
+
+  #else /* L_arm_truncdfsf2 */
+        // In theory, __truncdfsf2() could also push registers and branch to
+        //  __fp_assemble() after calculating the truncation shift and clearing
+        //  bits.  __fp_assemble() always rounds down if there is no remainder.  
+        // However, after doing all of that work, the incremental cost to  
+        //  finish assembling the return value is only 6 or 7 instructions
+        //  (depending on how __d2f_overflow() returns).
+        // This seems worthwhile to avoid linking in all of __fp_assemble(). 
+
+        // Test for INF. 
+        cmp     r2,     #254 
+        bge     LSYM(__d2f_overflow)
+
+        // HACK: Pre-empt the default round-to-nearest mode, 
+        //  since GCC specifies rounding towards zero. 
+        // Start by identifying subnormals by negative exponents. 
+        asrs    r3,     r2,     #31
+        ands    r3,     r2
+
+        // Clear the standard exponent field for subnormals. 
+        eors    r2,     r3
+
+        // Add the subnormal shift to the nominal 8 bits.
+        rsbs    r3,     #0
+        adds    r3,     #8
+
+        // Clamp the shift to a single word (branchless).  
+        // Anything larger would have flushed to zero anyway.
+        lsls    r3,     #27 
+        lsrs    r3,     #27
+
+      #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+        // Preserve inexact zero. 
+        orrs    r0,     r1
+      #endif
+
+        // Clear the insignificant bits.
+        lsrs    r1,     r3 
+
+        // Combine the mantissa and the exponent.
+        // TODO: Test for inexact zero after adding. 
+        lsls    r2,     #23
+        adds    r0,     r1,     r2
+
+        // Combine with the saved sign.
+        add     r0,     ip
+        RET 
+
+    LSYM(__d2f_overflow):
+        // Construct signed INF in $r0.
+        movs    r0,     #255 
+        lsls    r0,     #23
+        add     r0,     ip
+        RET 
+
+  #endif /* L_arm_truncdfsf2 */
+
+    LSYM(__d2f_indefinite):
+        // Test for INF.  If the mantissa, exclusive of the implicit '1',
+        //  is equal to '0', the result will be INF.
+        lsls    r3,     r1,     #1
+        orrs    r3,     r0
+        beq     LSYM(__d2f_overflow)
+
+        // TODO: Support for TRAP_NANS here. 
+        // This will be double precision, not compatible with the current handler. 
+
+        // Construct NAN with the upper 22 bits of the mantissa, setting bit[21]
+        //  to ensure a valid NAN without changing bit[22] (quiet)
+        subs    r2,     #0xD
+        lsls    r0,     r2,     #20
+        lsrs    r1,     #8
+        orrs    r0,     r1
+
+      #if defined(STRICT_NANS) && STRICT_NANS
+        // Yes, the NAN has already been altered, but at least keep the sign... 
+        add     r0,     ip
+      #endif
+
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END D2F_NAME
+
+#endif /* L_arm_d2f || L_arm_truncdfsf2 */
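To illustrate the round-toward-zero behavior that separates __truncdfsf2() from __aeabi_d2f() above, here is a rough C sketch for inputs whose single-precision result is normal (overflow, subnormals, INF and NAN follow the special paths shown in the assembly).  Dropping the low 29 mantissa bits of the sign-magnitude encoding is exactly truncation toward zero:

    #include <stdint.h>
    #include <string.h>

    /* Sketch: double -> float, rounding toward zero, normal results only. */
    static float d2f_truncate_sketch(double d)
    {
        uint64_t u;  memcpy(&u, &d, sizeof u);

        uint32_t sign = (uint32_t)(u >> 63) << 31;
        uint32_t exp  = (uint32_t)((u >> 52) & 0x7FF) - 1023 + 127;  /* re-bias */
        uint32_t mant = (uint32_t)(u >> 29) & 0x7FFFFF;   /* keep the top 23 bits */

        uint32_t bits = sign | (exp << 23) | mant;
        float f;  memcpy(&f, &bits, sizeof f);
        return f;
    }

A plain C cast, (float)d, rounds to nearest under the default mode, which is why the two entry points cannot share one body without a dynamic rounding check.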
+
+
+#ifdef L_arm_h2f 
+
+// float __aeabi_h2f(short hf)
+// Converts a half-precision float in $r0 to single-precision.
+// Rounding, overflow, and underflow conditions are impossible.
+// INF and ZERO are returned unmodified.
+.section .text.sorted.libgcc.h2f,"x"
+CM0_FUNC_START aeabi_h2f
+    CFI_START_FUNCTION
+
+        // Set up registers for __fp_normalize2().
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save the mantissa and exponent.
+        lsls    r2,     r0,     #17
+
+        // Isolate the sign.
+        lsrs    r0,     #15
+        lsls    r0,     #31
+
+        // Align the exponent at bit[24] for normalization.
+        // If zero, return the original sign.
+        lsrs    r2,     #3
+        beq     LSYM(__h2f_return)
+
+        // Split the exponent and mantissa into separate registers.
+        // This is the most efficient way to convert subnormals in the
+        //  half-precision form into normals in single-precision.
+        // This does add a leading implicit '1' to INF and NAN,
+        //  but that will be absorbed when the value is re-assembled.
+        bl      SYM(__fp_normalize2) __PLT__
+
+        // Set up the exponent bias.  For INF/NAN values, the bias is 223,
+        //  where the last '1' accounts for the implicit '1' in the mantissa.
+        adds    r2,     #(255 - 31 - 1)
+
+        // Test for INF/NAN.
+        cmp     r2,     #254
+        beq     LSYM(__h2f_assemble)
+
+        // For normal values, the bias should have been 111.
+        // However, this adjustment now is faster than branching.
+        subs    r2,     #((255 - 31 - 1) - (127 - 15 - 1))
+
+    LSYM(__h2f_assemble):
+        // Combine exponent and sign.
+        lsls    r2,     #23
+        adds    r0,     r2
+
+        // Combine mantissa.
+        lsrs    r3,     #8
+        add     r0,     r3
+
+    LSYM(__h2f_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_h2f
+
+#endif /* L_arm_h2f */
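The bias constants above (223 for INF/NAN, 111 for normals) are the standard re-bias values less one, the extra '1' accounting for the explicit leading bit kept in the mantissa after normalization.  A rough C sketch of the normal-value case without that trick (zero, subnormals, INF and NAN take the paths shown above):

    #include <stdint.h>
    #include <string.h>

    /* Sketch: half -> single, normal half-precision inputs only. */
    static float h2f_normal_sketch(uint16_t h)
    {
        uint32_t sign = (uint32_t)(h >> 15) << 31;
        uint32_t exp  = ((h >> 10) & 0x1F) + (127 - 15);   /* re-bias by 112 */
        uint32_t mant = (uint32_t)(h & 0x3FF) << 13;       /* 10 -> 23 bits */

        uint32_t bits = sign | (exp << 23) | mant;
        float f;  memcpy(&f, &bits, sizeof f);
        return f;
    }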
+
+
+#ifdef L_arm_f2h
+
+// short __aeabi_f2h(float f)
+// Converts a single-precision float in $r1 to half-precision,
+//  rounding to nearest, ties to even.
+// Values out of range are forced to either ZERO or INF;
+//  returns the upper 12 bits of NAN.
+.section .text.sorted.libgcc.f2h,"x"
+CM0_FUNC_START aeabi_f2h
+    CFI_START_FUNCTION
+
+        // Set up the sign.
+        lsrs    r2,     r0,     #31
+        lsls    r2,     #15
+
+        // Save the exponent and mantissa.
+        // If ZERO, return the original sign.
+        lsls    r0,     #1
+        beq     LSYM(__f2h_return)
+
+        // Isolate the exponent, check for NAN.
+        lsrs    r1,     r0,     #24
+        cmp     r1,     #255
+        beq     LSYM(__f2h_indefinite)
+
+        // Check for overflow.
+        cmp     r1,     #(127 + 15)
+        bhi     LSYM(__f2h_overflow)
+
+        // Isolate the mantissa, adding back the implicit '1'.
+        lsls    r0,     #8
+        adds    r0,     #1
+        rors    r0,     r0
+
+        // Adjust exponent bias for half-precision, including '1' to
+        //  account for the mantissa's implicit '1'.
+        subs    r1,     #(127 - 15 + 1)
+        bmi     LSYM(__f2h_underflow)
+
+        // Combine the exponent and sign.
+        lsls    r1,     #10
+        adds    r2,     r1
+
+        // Split the mantissa (11 bits) and remainder (13 bits).
+        lsls    r3,     r0,     #12
+        lsrs    r0,     #21
+
+     LSYM(__f2h_round):
+        // If the carry bit is '0', always round down.
+        bcc     LSYM(__f2h_return)
+
+        // Carry was set.  If a tie (no remainder) and the
+        //  LSB of the result are '0', round down (to even).
+        lsls    r1,     r0,     #31
+        orrs    r1,     r3
+        beq     LSYM(__f2h_return)
+
+        // Round up, ties to even.
+        adds    r0,     #1
+
+     LSYM(__f2h_return):
+        // Combine mantissa and exponent.
+        adds    r0,     r2
+        RET
+
+    LSYM(__f2h_underflow):
+        // Align the remainder. The remainder consists of the last 12 bits
+        //  of the mantissa plus the magnitude of underflow.
+        movs    r3,     r0
+        adds    r1,     #12
+        lsls    r3,     r1
+
+        // Align the mantissa.  The MSB of the remainder must be
+        //  shifted out last, into the 'C' flag for rounding.
+        subs    r1,     #33
+        rsbs    r1,     #0
+        lsrs    r0,     r1
+        b       LSYM(__f2h_round)
+
+    LSYM(__f2h_overflow):
+        // Create single-precision INF from which to construct half-precision.
+        movs    r0,     #255
+        lsls    r0,     #24
+
+    LSYM(__f2h_indefinite):
+        // Check for INF.
+        lsls    r3,     r0,     #8
+        beq     LSYM(__f2h_infinite)
+
+        // Set bit[8] to ensure a valid NAN without changing bit[9] (quiet).
+        adds    r2,     #128
+        adds    r2,     #128
+
+    LSYM(__f2h_infinite):
+        // Construct the result from the upper 10 bits of the mantissa
+        //  and the lower 5 bits of the exponent.
+        lsls    r0,     #3
+        lsrs    r0,     #17
+
+        // Combine with the sign (and possibly NAN flag).
+        orrs    r0,     r2
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_f2h
+
+#endif  /* L_arm_f2h */
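The rounding step above implements round-to-nearest with ties-to-even: the carry flag carries the first dropped bit, and $r3 holds the remaining dropped bits (the sticky value).  A rough C sketch of that rule, assuming those two quantities have already been extracted:

    #include <stdint.h>

    /* Sketch: round-to-nearest, ties-to-even.
       'mant' is the truncated result, 'guard' the first dropped bit,
       'sticky' nonzero if any lower dropped bit was set. */
    static uint32_t round_nearest_even(uint32_t mant, unsigned guard,
                                       uint32_t sticky)
    {
        if (!guard)
            return mant;            /* below halfway: round down */
        if (sticky || (mant & 1))
            return mant + 1;        /* above halfway, or a tie with odd LSB */
        return mant;                /* exact tie, result already even */
    }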
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fdiv.S gcc-11-20201220/libgcc/config/arm/cm0/fdiv.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fdiv.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fdiv.S	2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,257 @@
+/* fdiv.S: Cortex M0 optimized 32-bit float division
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef L_arm_divsf3
+
+// float __aeabi_fdiv(float, float)
+// Returns $r0 after division by $r1.
+.section .text.sorted.libgcc.fpcore.n.fdiv,"x"
+CM0_FUNC_START aeabi_fdiv
+CM0_FUNC_ALIAS divsf3 aeabi_fdiv
+    CFI_START_FUNCTION
+
+        // Standard registers, compatible with exception handling.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save for the sign of the result.
+        movs    r3,     r1
+        eors    r3,     r0
+        lsrs    rT,     r3,     #31
+        lsls    rT,     #31
+        mov     ip,     rT
+
+        // Set up INF for comparison.
+        movs    rT,     #255
+        lsls    rT,     #24
+
+        // Check for divide by 0.  Automatically catches 0/0.
+        lsls    r2,     r1,     #1
+        beq     LSYM(__fdiv_by_zero)
+
+        // Check for INF/INF, or a number divided by itself.
+        lsls    r3,     #1
+        beq     LSYM(__fdiv_equal)
+
+        // Check the numerator for INF/NAN.
+        eors    r3,     r2
+        cmp     r3,     rT
+        bhs     LSYM(__fdiv_special1)
+
+        // Check the denominator for INF/NAN.
+        cmp     r2,     rT
+        bhs     LSYM(__fdiv_special2)
+
+        // Check the numerator for zero.
+        cmp     r3,     #0
+        beq     SYM(__fp_zero)
+
+        // No action if the numerator is subnormal.
+        //  The mantissa will normalize naturally in the division loop.
+        lsls    r0,     #9
+        lsrs    r1,     r3,     #24
+        beq     LSYM(__fdiv_denominator)
+
+        // Restore the numerator's implicit '1'.
+        adds    r0,     #1
+        rors    r0,     r0
+
+    LSYM(__fdiv_denominator):
+        // The denominator must be normalized and left aligned.
+        bl      SYM(__fp_normalize2)
+
+        // 25 bits of precision will be sufficient.
+        movs    rT,     #64
+
+        // Run division.
+        bl      SYM(__internal_fdiv_loop)
+        b       SYM(__fp_assemble)
+
+    LSYM(__fdiv_equal):
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(DIVISION_INF_BY_INF)
+      #endif
+
+        // The absolute value of both operands are equal, but not 0.
+        // If both operands are INF, create a new NAN.
+        cmp     r2,     rT
+        beq     SYM(__fp_exception)
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        // If both operands are NAN, return the NAN in $r0.
+        bhi     SYM(__fp_check_nan)
+      #else
+        bhi     LSYM(__fdiv_return)
+      #endif
+
+        // Return 1.0f, with appropriate sign.
+        movs    r0,     #127
+        lsls    r0,     #23
+        add     r0,     ip
+
+    LSYM(__fdiv_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    LSYM(__fdiv_special2):
+        // The denominator is either INF or NAN, numerator is neither.
+        // Also, the denominator is not equal to 0.
+        // If the denominator is INF, the result goes to 0.
+        beq     SYM(__fp_zero)
+
+        // The only other option is NAN, fall through to branch.
+        mov     r0,     r1
+
+    LSYM(__fdiv_special1):
+      #if defined(TRAP_NANS) && TRAP_NANS
+        // The numerator is INF or NAN.  If NAN, return it directly.
+        bne     SYM(__fp_check_nan)
+      #else
+        bne     LSYM(__fdiv_return)
+      #endif
+
+        // If INF, the result will be INF if the denominator is finite.
+        // The denominator won't be either INF or 0,
+        //  so fall through the exception trap to check for NAN.
+        movs    r0,     r1
+
+    LSYM(__fdiv_by_zero):
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(DIVISION_0_BY_0)
+      #endif
+
+        // The denominator is 0.
+        // If the numerator is also 0, the result will be a new NAN.
+        // Otherwise the result will be INF, with the correct sign.
+        lsls    r2,     r0,     #1
+        beq     SYM(__fp_exception)
+
+        // The result should be NAN if the numerator is NAN.  Otherwise,
+        //  the result is INF regardless of the numerator value.
+        cmp     r2,     rT
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        bhi     SYM(__fp_check_nan)
+      #else
+        bhi     LSYM(__fdiv_return)
+      #endif
+
+        // Recreate INF with the correct sign.
+        b       SYM(__fp_infinity)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END divsf3
+CM0_FUNC_END aeabi_fdiv
+
+
+// Division helper, possibly to be shared with atan2.
+// Expects the numerator mantissa in $r0, exponent in $r1,
+//  plus the denominator mantissa in $r3, exponent in $r2, and
+//  a bit pattern in $rT that controls the result precision.
+// Returns quotient in $r1, exponent in $r2, pseudo remainder in $r0.
+.section .text.sorted.libgcc.fpcore.o.fdiv2,"x"
+CM0_FUNC_START internal_fdiv_loop
+    CFI_START_FUNCTION
+
+        // Initialize the exponent, relative to bit[30].
+        subs    r2,     r1,     r2
+
+    SYM(__internal_fdiv_loop2):
+        // The exponent should be (expN - 127) - (expD - 127) + 127.
+        // An additional offset of 25 is required to account for the
+        //  minimum number of bits in the result (before rounding).
+        // However, drop '1' because the offset is relative to bit[30],
+        //  while the result is calculated relative to bit[31].
+        adds    r2,     #(127 + 25 - 1)
+
+      #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Dividing by a power of 2?
+        lsls    r1,     r3,     #1
+        beq     LSYM(__fdiv_simple)
+      #endif
+
+        // Initialize the result.
+        eors    r1,     r1
+
+        // Clear the MSB, so that when the numerator is smaller than
+        //  the denominator, there is one bit free for a left shift.
+        // After a single shift, the numerator is guaranteed to be larger.
+        // The denominator ends up in r3, and the numerator ends up in r0,
+        //  so that the numerator serves as a pseudo-remainder in rounding.
+        // Shift the numerator one additional bit to compensate for the
+        //  pre-incrementing loop.
+        lsrs    r0,     #2
+        lsrs    r3,     #1
+
+    LSYM(__fdiv_loop):
+        // Once the MSB of the output reaches the MSB of the register,
+        //  the result has been calculated to the required precision.
+        lsls    r1,     #1
+        bmi     LSYM(__fdiv_break)
+
+        // Shift the numerator/remainder left to set up the next bit.
+        subs    r2,     #1
+        lsls    r0,     #1
+
+        // Test if the numerator/remainder is smaller than the denominator,
+        //  do nothing if it is.
+        cmp     r0,     r3
+        blo     LSYM(__fdiv_loop)
+
+        // If the numerator/remainder is greater or equal, set the next bit,
+        //  and subtract the denominator.
+        adds    r1,     rT
+        subs    r0,     r3
+
+        // Short-circuit if the remainder goes to 0.
+        // Even with the overhead of "subnormal" alignment,
+        //  this is usually much faster than continuing.
+        bne     LSYM(__fdiv_loop)
+
+        // Compensate the alignment of the result.
+        // The remainder does not need compensation, it's already 0.
+        lsls    r1,     #1
+
+    LSYM(__fdiv_break):
+        RET
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+    LSYM(__fdiv_simple):
+        // The numerator becomes the result, with a remainder of 0.
+        movs    r1,     r0
+        eors    r0,     r0
+        subs    r2,     #25
+        RET
+  #endif
+
+    CFI_END_FUNCTION
+CM0_FUNC_END internal_fdiv_loop
+
+#endif /* L_arm_divsf3 */
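The quotient loop above is plain restoring division: one result bit per iteration, with the leftover numerator kept as a pseudo-remainder so that __fp_assemble() can round correctly.  A rough C sketch of the algorithm only (the $rT bit pattern and the pre-incrementing shift are register tricks and are omitted; the denominator is assumed below 2^30 and the numerator below twice the denominator so the shifts cannot overflow):

    #include <stdint.h>

    /* Sketch: restoring division producing 'bits' quotient bits. */
    static uint32_t fdiv_loop_sketch(uint32_t num, uint32_t den, int bits,
                                     uint32_t *rem)
    {
        uint32_t quo = 0;

        for (int i = 0; i < bits; i++) {
            quo <<= 1;
            if (num >= den) {       /* set the next bit and subtract */
                quo |= 1;
                num -= den;
            }
            num <<= 1;              /* line up the next quotient bit */
            if (num == 0) {         /* early exit: remainder is exactly 0 */
                quo <<= (bits - 1 - i);
                break;
            }
        }

        *rem = num;                 /* nonzero means the result is inexact */
        return quo;
    }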
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/ffixed.S gcc-11-20201220/libgcc/config/arm/cm0/ffixed.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/ffixed.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/ffixed.S	2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,342 @@
+/* ffixed.S: Cortex M0 optimized float->int conversion
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef L_arm_fixsfsi
+
+// int __aeabi_f2iz(float)
+// Converts a float in $r0 to signed integer, rounding toward 0.
+// Values out of range are forced to either INT_MAX or INT_MIN.
+// NAN becomes zero.
+.section .text.sorted.libgcc.fpcore.r.fixsfsi,"x"
+CM0_FUNC_START aeabi_f2iz
+CM0_FUNC_ALIAS fixsfsi aeabi_f2iz
+    CFI_START_FUNCTION
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Flag for signed conversion.
+        movs    r1,     #33
+        b       LSYM(__real_f2lz)
+
+  #else /* !__OPTIMIZE_SIZE__ */
+        // Flag for signed conversion.
+        movs    r3,     #1
+
+    LSYM(__real_f2iz):
+        // Isolate the sign of the result.
+        asrs    r1,     r0,     #31
+        lsls    r0,     #1
+
+  #if defined(FP_EXCEPTION) && FP_EXCEPTION
+        // Check for zero to avoid spurious underflow exception on -0.
+        beq     LSYM(__f2iz_return)
+  #endif
+
+        // Isolate the exponent.
+        lsrs    r2,     r0,     #24
+
+  #if defined(TRAP_NANS) && TRAP_NANS
+        // Test for NAN.
+        // Otherwise, NAN will be converted like +/-INF.
+        cmp     r2,     #255
+        beq     LSYM(__f2iz_nan)
+  #endif
+
+        // Extract the mantissa and restore the implicit '1'. Technically,
+        //  this is wrong for subnormals, but they flush to zero regardless.
+        lsls    r0,     #8
+        adds    r0,     #1
+        rors    r0,     r0
+
+        // Calculate mantissa alignment. Given the implicit '1' in bit[31]:
+        //  * An exponent less than 127 will automatically flush to 0.
+        //  * An exponent of 127 will result in a shift of 31.
+        //  * An exponent of 128 will result in a shift of 30.
+        //  *  ...
+        //  * An exponent of 157 will result in a shift of 1.
+        //  * An exponent of 158 will result in no shift at all.
+        //  * An exponent larger than 158 will result in overflow.
+        rsbs    r2,     #0
+        adds    r2,     #158
+
+        // When the shift is less than minimum, the result will overflow.
+        // The only signed value to fail this test is INT_MIN (0x80000000),
+        //  but it will be returned correctly from the overflow branch.
+        cmp     r2,     r3
+        blt     LSYM(__f2iz_overflow)
+
+        // If unsigned conversion of a negative value, also overflow.
+        // Would also catch -0.0f if not handled earlier.
+        cmn     r3,     r1
+        blt     LSYM(__f2iz_overflow)
+
+  #if defined(FP_EXCEPTION) && FP_EXCEPTION
+        // Save a copy for remainder testing
+        movs    r3,     r0
+  #endif
+
+        // Truncate the fraction.
+        lsrs    r0,     r2
+
+        // Two's complement negation, if applicable.
+        // Bonus: the sign in $r1 provides a suitable long long result.
+        eors    r0,     r1
+        subs    r0,     r1
+
+  #if defined(FP_EXCEPTION) && FP_EXCEPTION
+        // If any bits set in the remainder, raise FE_INEXACT
+        rsbs    r2,     #0
+        adds    r2,     #32
+        lsls    r3,     r2
+        bne     LSYM(__f2iz_inexact)
+  #endif
+
+    LSYM(__f2iz_return):
+        RET
+
+    LSYM(__f2iz_overflow):
+        // Positive unsigned integers (r1 == 0, r3 == 0), return 0xFFFFFFFF.
+        // Negative unsigned integers (r1 == -1, r3 == 0), return 0x00000000.
+        // Positive signed integers (r1 == 0, r3 == 1), return 0x7FFFFFFF.
+        // Negative signed integers (r1 == -1, r3 == 1), return 0x80000000.
+        // TODO: FE_INVALID exception, (but not for -2^31).
+        mvns    r0,     r1
+        lsls    r3,     #31
+        eors    r0,     r3
+        RET
+
+  #if defined(FP_EXCEPTION) && FP_EXCEPTION
+    LSYM(__f2iz_inexact):
+        // TODO: Another class of exceptions that doesn't overwrite $r0.
+        bkpt    #0
+
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(CAST_INEXACT)
+      #endif
+
+        b       SYM(__fp_exception)
+  #endif
+
+    LSYM(__f2iz_nan):
+        // Check for INF
+        lsls    r2,     r0,     #9
+        beq     LSYM(__f2iz_overflow)
+
+  #if defined(FP_EXCEPTION) && FP_EXCEPTION
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(CAST_UNDEFINED)
+      #endif
+
+        b       SYM(__fp_exception)
+  #else
+
+  #endif
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+
+        // TODO: Extend to long long
+
+        // TODO: bl  fp_check_nan
+      #endif
+
+        // Return long long 0 on NAN.
+        eors    r0,     r0
+        eors    r1,     r1
+        RET
+
+  #endif /* !__OPTIMIZE_SIZE__ */
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fixsfsi
+CM0_FUNC_END aeabi_f2iz
+
+
+// unsigned int __aeabi_f2uiz(float)
+// Converts a float in $r0 to unsigned integer, rounding toward 0.
+// Values out of range are forced to UINT_MAX.
+// Negative values and NAN all become zero.
+.section .text.sorted.libgcc.fpcore.s.fixunssfsi,"x"
+CM0_FUNC_START aeabi_f2uiz
+CM0_FUNC_ALIAS fixunssfsi aeabi_f2uiz
+    CFI_START_FUNCTION
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Flag for unsigned conversion.
+        movs    r1,     #32
+        b       LSYM(__real_f2lz)
+
+  #else /* !__OPTIMIZE_SIZE__ */
+        // Flag for unsigned conversion.
+        movs    r3,     #0
+        b       LSYM(__real_f2iz)
+
+  #endif /* !__OPTIMIZE_SIZE__ */
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fixunssfsi
+CM0_FUNC_END aeabi_f2uiz
+
+
+// long long aeabi_f2lz(float)
+// Converts a float in $r0 to a 64 bit integer in $r1:$r0, rounding toward 0.
+// Values out of range are forced to either INT64_MAX or INT64_MIN.
+// NAN becomes zero.
+.section .text.sorted.libgcc.fpcore.t.fixsfdi,"x"
+CM0_FUNC_START aeabi_f2lz
+CM0_FUNC_ALIAS fixsfdi aeabi_f2lz
+    CFI_START_FUNCTION
+
+        movs    r1,     #1
+
+    LSYM(__real_f2lz):
+        // Split the sign of the result from the mantissa/exponent field.
+        // Handle +/-0 specially to avoid spurious exceptions.
+        asrs    r3,     r0,     #31
+        lsls    r0,     #1
+        beq     LSYM(__f2lz_zero)
+
+        // If unsigned conversion of a negative value, also overflow.
+        // Specifically, is the LSB of $r1 clear when $r3 is equal to '-1'?
+        //
+        // $r3 (sign)   >=     $r2 (flag)
+        // 0xFFFFFFFF   false   0x00000000
+        // 0x00000000   true    0x00000000
+        // 0xFFFFFFFF   true    0x80000000
+        // 0x00000000   true    0x80000000
+        //
+        // (NOTE: This test will also trap -0.0f, unless handled earlier.)
+        lsls    r2,     r1,     #31
+        cmp     r3,     r2
+        blt     LSYM(__f2lz_overflow)
+
+        // Isolate the exponent.
+        lsrs    r2,     r0,     #24
+
+//   #if defined(TRAP_NANS) && TRAP_NANS
+//         // Test for NAN.
+//         // Otherwise, NAN will be converted like +/-INF.
+//         cmp     r2,     #255
+//         beq     LSYM(__f2lz_nan)
+//   #endif
+
+        // Calculate mantissa alignment. Given the implicit '1' in bit[31]:
+        //  * An exponent less than 127 will automatically flush to 0.
+        //  * An exponent of 127 will result in a shift of 63.
+        //  * An exponent of 128 will result in a shift of 62.
+        //  *  ...
+        //  * An exponent of 189 will result in a shift of 1.
+        //  * An exponent of 190 will result in no shift at all.
+        //  * An exponent larger than 190 will result in overflow
+        //     (189 in the case of signed integers).
+        rsbs    r2,     #0
+        adds    r2,     #190
+        // When the shift is less than minimum, the result will overflow.
+        // The only signed value to fail this test is INT_MIN (0x80000000),
+        //  but it will be returned correctly from the overflow branch.
+        cmp     r2,     r1
+        blt     LSYM(__f2lz_overflow)
+
+        // Extract the mantissa and restore the implicit '1'. Technically,
+        //  this is wrong for subnormals, but they flush to zero regardless.
+        lsls    r0,     #8
+        adds    r0,     #1
+        rors    r0,     r0
+
+        // Calculate the upper word.
+        // If the shift is greater than 32, gives an automatic '0'.
+        movs    r1,     r0
+        lsrs    r1,     r2
+
+        // Reduce the shift for the lower word.
+        // If the original shift was less than 32, the result may be split
+        //  between the upper and lower words.
+        subs    r2,     #32
+        blt     LSYM(__f2lz_split)
+
+        // Shift is still positive, keep moving right.
+        lsrs    r0,     r2
+
+        // TODO: Remainder test.
+        // $r1 is technically free, as long as it's zero by the time
+        //  this is over.
+
+    LSYM(__f2lz_return):
+        // Two's complement negation, if the original was negative.
+        eors    r0,     r3
+        eors    r1,     r3
+        subs    r0,     r3
+        sbcs    r1,     r3
+        RET
+
+    LSYM(__f2lz_split):
+        // Shift was negative, calculate the remainder
+        rsbs    r2,     #0
+        lsls    r0,     r2
+        b       LSYM(__f2lz_return)
+
+    LSYM(__f2lz_zero):
+        eors    r1,     r1
+        RET
+
+    LSYM(__f2lz_overflow):
+        // Positive unsigned integers (r3 == 0, r1 == 0), return 0xFFFFFFFF.
+        // Negative unsigned integers (r3 == -1, r1 == 0), return 0x00000000.
+        // Positive signed integers (r3 == 0, r1 == 1), return 0x7FFFFFFF.
+        // Negative signed integers (r3 == -1, r1 == 1), return 0x80000000.
+        // TODO: FE_INVALID exception, (but not for -2^63).
+        mvns    r0,     r3
+
+        // For 32-bit results
+        lsls    r2,     r1,     #26
+        lsls    r1,     #31
+        ands    r2,     r1
+        eors    r0,     r2
+
+//    LSYM(__f2lz_zero):
+        eors    r1,     r0
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fixsfdi
+CM0_FUNC_END aeabi_f2lz
+
+
+// unsigned long long __aeabi_f2ulz(float)
+// Converts a float in $r0 to a 64 bit integer in $r1:$r0, rounding toward 0.
+// Values out of range are forced to UINT64_MAX.
+// Negative values and NAN all become zero.
+.section .text.sorted.libgcc.fpcore.u.fixunssfdi,"x"
+CM0_FUNC_START aeabi_f2ulz
+CM0_FUNC_ALIAS fixunssfdi aeabi_f2ulz
+    CFI_START_FUNCTION
+
+        eors    r1,     r1
+        b       LSYM(__real_f2lz)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fixunssfdi
+CM0_FUNC_END aeabi_f2ulz
+
+#endif /* L_arm_fixsfsi */
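The alignment arithmetic used by both conversions above (158 - exponent for 32-bit results, 190 - exponent for 64-bit) is the distance from the implicit '1' at bit[31] down to the integer's binary point.  A rough C sketch of the 32-bit signed case for finite, in-range inputs only (overflow, INF and NAN take the branches shown in the assembly):

    #include <stdint.h>
    #include <string.h>

    /* Sketch: float -> int32, truncating toward zero, |f| < 2^31 assumed. */
    static int32_t f2iz_sketch(float f)
    {
        uint32_t u;  memcpy(&u, &f, sizeof u);

        int32_t  sign  = -(int32_t)(u >> 31);                  /* 0 or -1 */
        int32_t  exp   = (u >> 23) & 0xFF;
        uint32_t mant  = ((u & 0x7FFFFFu) << 8) | 0x80000000u; /* 1.m at bit 31 */
        int32_t  shift = 158 - exp;                            /* 127 + 31 - exp */

        if (shift > 31)                 /* |f| < 1 (and subnormals) flush to 0 */
            return 0;

        uint32_t magnitude = mant >> shift;
        return ((int32_t)magnitude ^ sign) - sign;             /* apply the sign */
    }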
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/ffloat.S gcc-11-20201220/libgcc/config/arm/cm0/ffloat.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/ffloat.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/ffloat.S	2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,224 @@
+/* ffloat.S: Cortex M0 optimized int->float conversion
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+  
+#ifdef L_arm_floatsisf
+ 
+// float __aeabi_i2f(int)
+// Converts a signed integer in $r0 to float.
+.section .text.sorted.libgcc.fpcore.p.floatsisf,"x"
+
+// On little-endian cores (including all Cortex-M), __floatsisf() can be
+//  implemented as below in 5 instructions.  However, it can also be
+//  implemented by prefixing a single instruction to __floatdisf().
+// A memory savings of 4 instructions at a cost of only 2 execution cycles
+//  seems reasonable enough.  Plus, the trade-off only happens in programs
+//  that require both __floatsisf() and __floatdisf().  Programs only using
+//  __floatsisf() always get the smallest version.  
+// When the combined version is provided, this standalone version
+//  must be declared WEAK, so that the combined version can supersede it.
+// '_arm_floatsisf' should appear before '_arm_floatdisf' in LIB1ASMFUNCS.
+#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+CM0_WEAK_START aeabi_i2f
+CM0_WEAK_ALIAS floatsisf aeabi_i2f
+#else /* !__OPTIMIZE_SIZE__ */ 
+CM0_FUNC_START aeabi_i2f
+CM0_FUNC_ALIAS floatsisf aeabi_i2f
+#endif /* !__OPTIMIZE_SIZE__ */
+    CFI_START_FUNCTION
+
+        // Save the sign.
+        asrs    r3,     r0,     #31
+
+        // Absolute value of the input. 
+        eors    r0,     r3
+        subs    r0,     r3
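+        // (This is the branchless absolute-value idiom, in C terms:
+        //      s = x >> 31;  abs = (x ^ s) - s;
+        //  shown for illustration only; the sign saved in $r3 is consumed
+        //  later when the result is assembled.)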
+
+        // Sign extension to long long unsigned.
+        eors    r1,     r1
+        b       SYM(__internal_uil2f_noswap)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END floatsisf
+CM0_FUNC_END aeabi_i2f
+
+#endif /* L_arm_floatsisf */
+
+
+#ifdef L_arm_floatdisf
+
+// float __aeabi_l2f(long long)
+// Converts a signed 64-bit integer in $r1:$r0 to a float in $r0.
+.section .text.sorted.libgcc.fpcore.p.floatdisf,"x"
+
+// See comments for __floatsisf() above. 
+#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+CM0_FUNC_START aeabi_i2f
+CM0_FUNC_ALIAS floatsisf aeabi_i2f
+    CFI_START_FUNCTION
+
+      #if defined(__ARMEB__) && __ARMEB__ 
+        // __floatdisf() expects a big-endian lower word in $r1.
+        movs    xxl,    r0
+      #endif  
+
+        // Sign extension to long long signed.
+        asrs    xxh,    xxl,    #31 
+
+#endif /* __OPTIMIZE_SIZE__ */
+
+CM0_FUNC_START aeabi_l2f
+CM0_FUNC_ALIAS floatdisf aeabi_l2f
+
+#if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+    CFI_START_FUNCTION
+#endif
+
+        // Save the sign.
+        asrs    r3,     xxh,     #31
+
+        // Absolute value of the input.  
+        // Could this be arranged in big-endian mode so that this block also
+        //  swapped the input words?  Maybe.  But, since neither 'eors' nor
+        //  'sbcs' allows a third destination register, it seems unlikely to
+        //  save more than one cycle.  Also, the size of __floatdisf() and 
+        //  __floatundisf() together would increase by two instructions. 
+        eors    xxl,    r3
+        eors    xxh,    r3
+        subs    xxl,    r3
+        sbcs    xxh,    r3
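+        // (Equivalent C sketch of the 64-bit absolute value, illustration only:
+        //      s = x >> 63;                    /* 0 or -1, sign-extended */
+        //      abs = ((unsigned long long)x ^ s) - s;
+        //  done here word by word with the paired eors/subs/sbcs above.)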
+
+        b       SYM(__internal_uil2f)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END floatdisf
+CM0_FUNC_END aeabi_l2f
+
+#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+CM0_FUNC_END floatsisf
+CM0_FUNC_END aeabi_i2f
+#endif 
+
+#endif /* L_arm_floatdisf */
+
+
+#ifdef L_arm_floatunsisf
+
+// float __aeabi_ui2f(unsigned)
+// Converts an unsigned integer in $r0 to float.
+.section .text.sorted.libgcc.fpcore.q.floatunsisf,"x"
+CM0_FUNC_START aeabi_ui2f
+CM0_FUNC_ALIAS floatunsisf aeabi_ui2f
+    CFI_START_FUNCTION
+
+      #if defined(__ARMEB__) && __ARMEB__ 
+        // In big-endian mode, function flow breaks down.  __floatundisf() 
+        //  wants to swap word order, but __floatunsisf() does not.  The
+        //  choice is between leaving these arguments un-swapped and
+        //  branching, or canceling out the word swap in advance.
+        // The branching version would require one extra instruction to 
+        //  clear the sign ($r3) because of __floatdisf() dependencies.
+        // While the branching version is technically one cycle faster 
+        //  on the Cortex-M0 pipeline, branchless just feels better.
+
+        // Thus, __floatundisf() expects a big-endian lower word in $r1.
+        movs    xxl,    r0
+      #endif  
+
+        // Extend to unsigned long long and fall through.
+        eors    xxh,    xxh 
+
+#endif /* L_arm_floatunsisf */
+
+
+// The execution of __floatunsisf() flows directly into __floatundisf(), such
+//  that instructions must appear consecutively in the same memory section
+//  for proper flow control.  However, this construction inhibits the ability
+//  to discard __floatunsisf() when only using __floatundisf().
+// Therefore, this block configures __floatundisf() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+//  version is the continuation of __floatunsisf().  The standalone version
+//  must be declared WEAK, so that the combined version can supersede it
+//  and provide both symbols when required.
+// '_arm_floatundisf' should appear before '_arm_floatunsisf' in LIB1ASMFUNCS.
+#if defined(L_arm_floatunsisf) || defined(L_arm_floatundisf)
+
+#ifdef L_arm_floatundisf
+// float __aeabi_ul2f(unsigned long long)
+// Converts an unsigned 64-bit integer in $r1:$r0 to a float in $r0.
+.section .text.sorted.libgcc.fpcore.q.floatundisf,"x"
+CM0_WEAK_START aeabi_ul2f
+CM0_WEAK_ALIAS floatundisf aeabi_ul2f
+    CFI_START_FUNCTION
+
+#else 
+CM0_FUNC_START aeabi_ul2f
+CM0_FUNC_ALIAS floatundisf aeabi_ul2f
+
+#endif
+
+        // Sign is always positive.
+        eors    r3,     r3
+
+#ifdef L_arm_floatundisf 
+    CM0_WEAK_START internal_uil2f
+#else /* L_arm_floatunsisf */
+    CM0_FUNC_START internal_uil2f
+#endif
+      #if defined(__ARMEB__) && __ARMEB__ 
+        // Swap word order for register compatibility with __fp_assemble().
+        // Could this be optimized by re-defining __fp_assemble()?  Maybe.  
+        // But the ramifications of dynamic register assignment on all 
+        //  the other callers of __fp_assemble() would be enormous.    
+        eors    r0,     r1  
+        eors    r1,     r0  
+        eors    r0,     r1  
+      #endif 
+
+#ifdef L_arm_floatundisf 
+    CM0_WEAK_START internal_uil2f_noswap
+#else /* L_arm_floatunsisf */
+    CM0_FUNC_START internal_uil2f_noswap
+#endif
+        // Default exponent, relative to bit[30] of $r1.
+        movs    r2,     #(127 - 1 + 63)
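+        // (Worked out, for reference: bit[30] of $r1 is bit 62 = 63 - 1 of
+        //  the 64-bit integer, so its biased exponent is 127 + 63 - 1.)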
+
+        // Format the sign.
+        lsls    r3,     #31
+        mov     ip,     r3
+
+        push    { rT, lr }
+        b       SYM(__fp_assemble)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END floatundisf
+CM0_FUNC_END aeabi_ul2f
+
+#ifdef L_arm_floatunsisf
+CM0_FUNC_END floatunsisf
+CM0_FUNC_END aeabi_ui2f
+#endif 
+
+#endif /* L_arm_floatunsisf || L_arm_floatundisf */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fmul.S gcc-11-20201220/libgcc/config/arm/cm0/fmul.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fmul.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fmul.S	2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,215 @@
+/* fmul.S: Cortex M0 optimized 32-bit float multiplication
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef L_arm_mulsf3
+
+// float __aeabi_fmul(float, float)
+// Returns $r0 after multiplication by $r1.
+.section .text.sorted.libgcc.fpcore.m.fmul,"x"
+CM0_FUNC_START aeabi_fmul
+CM0_FUNC_ALIAS mulsf3 aeabi_fmul
+    CFI_START_FUNCTION
+
+        // Standard registers, compatible with exception handling.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save the sign of the result.
+        movs    rT,     r1
+        eors    rT,     r0
+        lsrs    rT,     #31
+        lsls    rT,     #31
+        mov     ip,     rT
+
+        // Set up INF for comparison.
+        movs    rT,     #255
+        lsls    rT,     #24
+
+        // Check for multiplication by zero.
+        lsls    r2,     r0,     #1
+        beq     LSYM(__fmul_zero1)
+
+        lsls    r3,     r1,     #1
+        beq     LSYM(__fmul_zero2)
+
+        // Check for INF/NAN.
+        cmp     r3,     rT
+        bhs     LSYM(__fmul_special2)
+
+        cmp     r2,     rT
+        bhs     LSYM(__fmul_special1)
+
+        // Because neither operand is INF/NAN, the result will be finite.
+        // It is now safe to modify the original operand registers.
+        lsls    r0,     #9
+
+        // Isolate the first exponent.  When normal, add back the implicit '1'.
+        // The result is always aligned with the MSB in bit [31].
+        // Subnormal mantissas remain effectively multiplied by 2x relative to
+        //  normals, but this works because the weight of a subnormal is -126.
+        lsrs    r2,     #24
+        beq     LSYM(__fmul_normalize2)
+        adds    r0,     #1
+        rors    r0,     r0
+
+    LSYM(__fmul_normalize2):
+        // IMPORTANT: exp10i() jumps in here!
+        // Repeat for the mantissa of the second operand.
+        // Short-circuit when the mantissa is 1.0, as the
+        //  first mantissa is already prepared in $r0.
+        lsls    r1,     #9
+
+        // When normal, add back the implicit '1'.
+        lsrs    r3,     #24
+        beq     LSYM(__fmul_go)
+        adds    r1,     #1
+        rors    r1,     r1
+
+    LSYM(__fmul_go):
+        // Calculate the final exponent, relative to bit [30].
+        adds    rT,     r2,     r3
+        subs    rT,     #127
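+        // (For reference: with biased operand exponents e1 and e2, the biased
+        //  exponent of the product is e1 + e2 - 127, which is exactly the
+        //  rT = r2 + r3 - 127 computed above.)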
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Short-circuit on multiplication by powers of 2.
+        lsls    r3,     r0,     #1
+        beq     LSYM(__fmul_simple1)
+
+        lsls    r3,     r1,     #1
+        beq     LSYM(__fmul_simple2)
+  #endif
+
+        // Save $ip across the call.
+        // (Alternatively, a separate register could be pushed/popped,
+        //  but the four instructions here are equally fast
+        //  without imposing on the stack.)
+        add     rT,     ip
+
+        // 32x32 unsigned multiplication, 64 bit result.
+        bl      SYM(__umulsidi3) __PLT__
+
+        // Separate the saved exponent and sign.
+        sxth    r2,     rT
+        subs    rT,     r2
+        mov     ip,     rT
+
+        b       SYM(__fp_assemble)
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+    LSYM(__fmul_simple2):
+        // Move the high bits of the result to $r1.
+        movs    r1,     r0
+
+    LSYM(__fmul_simple1):
+        // Clear the remainder.
+        eors    r0,     r0
+
+        // Adjust mantissa to match the exponent, relative to bit[30].
+        subs    r2,     rT,     #1
+        b       SYM(__fp_assemble)
+  #endif
+
+    LSYM(__fmul_zero1):
+        // $r0 was equal to 0, set up to check $r1 for INF/NAN.
+        lsls    r2,     r1,     #1
+
+    LSYM(__fmul_zero2):
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3,     #(INFINITY_TIMES_ZERO)
+      #endif
+
+        // Check the non-zero operand for INF/NAN.
+        // If NAN, it should be returned.
+        // If INF, the result should be NAN.
+        // Otherwise, the result will be +/-0.
+        cmp     r2,     rT
+        beq     SYM(__fp_exception)
+
+        // If the second operand is finite, the result is 0.
+        blo     SYM(__fp_zero)
+
+      #if defined(STRICT_NANS) && STRICT_NANS
+        // Restore values that got mixed in zero testing, then go back
+        //  to sort out which one is the NAN.
+        lsls    r3,     r1,     #1
+        lsls    r2,     r0,     #1
+      #elif defined(TRAP_NANS) && TRAP_NANS
+        // Return NAN with the sign bit cleared.
+        lsrs    r0,     r2,     #1
+        b       SYM(__fp_check_nan)
+      #else
+        // Return NAN with the sign bit cleared.
+        lsrs    r0,     r2,     #1
+        pop     { rT, pc }
+                .cfi_restore_state
+      #endif
+
+    LSYM(__fmul_special2):
+        // $r1 is INF/NAN.  In case of INF, check $r0 for NAN.
+        cmp     r2,     rT
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        // Force swap if $r0 is not NAN.
+        bls     LSYM(__fmul_swap)
+
+        // $r0 is NAN, keep if $r1 is INF
+        cmp     r3,     rT
+        beq     LSYM(__fmul_special1)
+
+        // Both are NAN, keep the smaller value (more likely to signal).
+        cmp     r2,     r3
+      #endif
+
+        // Prefer the NAN already in $r0.
+        //  (If TRAP_NANS, this is the smaller NAN).
+        bhi     LSYM(__fmul_special1)
+
+    LSYM(__fmul_swap):
+        movs    r0,     r1
+
+    LSYM(__fmul_special1):
+        // $r0 is either INF or NAN.  $r1 has already been examined.
+        // Flags are already set correctly.
+        lsls    r2,     r0,     #1
+        cmp     r2,     rT
+        beq     SYM(__fp_infinity)
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        b       SYM(__fp_check_nan)
+      #else
+        pop     { rT, pc }
+                .cfi_restore_state
+      #endif
+
+    CFI_END_FUNCTION
+CM0_FUNC_END mulsf3
+CM0_FUNC_END aeabi_fmul
+
+#endif /* L_arm_mulsf3 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fneg.S gcc-11-20201220/libgcc/config/arm/cm0/fneg.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fneg.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fneg.S	2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,76 @@
+/* fneg.S: Cortex M0 optimized 32-bit float negation
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef L_arm_negsf2
+
+// float __aeabi_fneg(float) [obsolete]
+// The argument and result are in $r0.
+// Uses $r1 and $r2 as scratch registers.
+.section .text.sorted.libgcc.fpcore.a.fneg,"x"
+CM0_FUNC_START aeabi_fneg
+CM0_FUNC_ALIAS negsf2 aeabi_fneg
+    CFI_START_FUNCTION
+
+  #if (defined(STRICT_NANS) && STRICT_NANS) || \
+      (defined(TRAP_NANS) && TRAP_NANS)
+        // Check for NAN.
+        lsls    r1,     r0,     #1
+        movs    r2,     #255
+        lsls    r2,     #24
+        cmp     r1,     r2
+
+      #if defined(TRAP_NANS) && TRAP_NANS
+        blo     SYM(__fneg_nan)
+      #else
+        blo     LSYM(__fneg_return)
+      #endif
+  #endif
+
+        // Flip the sign.
+        movs    r1,     #1
+        lsls    r1,     #31
+        eors    r0,     r1
+
+    LSYM(__fneg_return):
+        RET
+
+  #if defined(TRAP_NANS) && TRAP_NANS
+    LSYM(__fneg_nan):
+        // Set up registers for exception handling.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        b       SYM(__fp_check_nan)
+  #endif
+
+    CFI_END_FUNCTION
+CM0_FUNC_END negsf2
+CM0_FUNC_END aeabi_fneg
+
+#endif /* L_arm_negsf2 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fplib.h gcc-11-20201220/libgcc/config/arm/cm0/fplib.h
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fplib.h	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fplib.h	2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,83 @@
+/* fplib.h: Cortex M0 optimized float library header definitions
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifndef __CM0_FPLIB_H
+#define __CM0_FPLIB_H 
+
+/* Enable exception interrupt handler.  
+   Exception implementation is opportunistic, and not fully tested.  */
+#define TRAP_EXCEPTIONS (0)
+#define EXCEPTION_CODES (0)
+
+/* Perform extra checks to avoid modifying the sign bit of NANs */
+#define STRICT_NANS (0)
+
+/* Trap signaling NANs regardless of context. */
+#define TRAP_NANS (0)
+
+/* TODO: Define service numbers according to the handler requirements */ 
+#define SVC_TRAP_NAN (0)
+#define SVC_FP_EXCEPTION (0)
+#define SVC_DIVISION_BY_ZERO (0)
+
+/* Push extra registers when required for 64-bit stack alignment */
+#define DOUBLE_ALIGN_STACK (1)
+
+/* Manipulate *div0() parameters to meet the ARM runtime ABI specification. */
+#define PEDANTIC_DIV0 (1)
+
+/* Define various exception codes.  These don't map to anything in particular */
+#define SUBTRACTED_INFINITY (20)
+#define INFINITY_TIMES_ZERO (21)
+#define DIVISION_0_BY_0 (22)
+#define DIVISION_INF_BY_INF (23)
+#define UNORDERED_COMPARISON (24)
+#define CAST_OVERFLOW (25)
+#define CAST_INEXACT (26)
+#define CAST_UNDEFINED (27)
+
+/* Exception control for quiet NANs.
+   If TRAP_NAN support is enabled, signaling NANs always raise exceptions. */
+#define FCMP_RAISE_EXCEPTIONS   16
+#define FCMP_NO_EXCEPTIONS      0
+
+/* The bit indexes in these assignments are significant.  See implementation.
+   They are shared publicly for eventual use by newlib.  */
+#define FCMP_3WAY           (1)
+#define FCMP_LT             (2)
+#define FCMP_EQ             (4)
+#define FCMP_GT             (8)
+
+#define FCMP_GE             (FCMP_EQ | FCMP_GT)
+#define FCMP_LE             (FCMP_LT | FCMP_EQ)
+#define FCMP_NE             (FCMP_LT | FCMP_GT)
+
+/* These flags affect the result of unordered comparisons.  See implementation.  */
+#define FCMP_UN_THREE       (128)
+#define FCMP_UN_POSITIVE    (64)
+#define FCMP_UN_ZERO        (32)
+#define FCMP_UN_NEGATIVE    (0)
+
+#endif /* __CM0_FPLIB_H */
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/futil.S gcc-11-20201220/libgcc/config/arm/cm0/futil.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/futil.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/futil.S	2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,407 @@
+/* futil.S: Cortex M0 optimized 32-bit common routines
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+   
+#ifdef L_arm_addsubsf3
+ 
+// Internal function, decomposes the unsigned float in $r2.
+// The exponent will be returned in $r2, the mantissa in $r3.
+// If subnormal, the mantissa will be normalized, so that
+//  the MSB of the mantissa (if any) will be aligned at bit[31].
+// Preserves $r0 and $r1, uses $rT as scratch space.
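+// A rough C sketch of the decomposition, for illustration only ('bits' is
+//  the value in $r2 on entry):
+//     exp  = bits >> 24;                           /* biased exponent -> $r2 */
+//     mant = bits << 8;                            /* fraction field  -> $r3 */
+//     if (exp) mant = (mant >> 1) | (1u << 31);    /* restore implicit '1' */
+//     else     /* subnormal: left-align mant, adjusting exp (__fp_lalign2) */ ;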
+.section .text.sorted.libgcc.fpcore.y.normf,"x"
+CM0_FUNC_START fp_normalize2
+    CFI_START_FUNCTION
+
+        // Extract the mantissa.
+        lsls    r3,     r2,     #8
+
+        // Extract the exponent.
+        lsrs    r2,     #24
+        beq     SYM(__fp_lalign2)
+
+        // Restore the mantissa's implicit '1'.
+        adds    r3,     #1
+        rors    r3,     r3
+
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_normalize2
+
+
+// Internal function, aligns $r3 so the MSB is aligned in bit[31].
+// Simultaneously, subtracts the shift from the exponent in $r2
+.section .text.sorted.libgcc.fpcore.z.alignf,"x"
+CM0_FUNC_START fp_lalign2
+    CFI_START_FUNCTION
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Unroll the loop, similar to __clzsi2().
+        lsrs    rT,     r3,     #16
+        bne     LSYM(__align8)
+        subs    r2,     #16
+        lsls    r3,     #16
+
+    LSYM(__align8):
+        lsrs    rT,     r3,     #24
+        bne     LSYM(__align4)
+        subs    r2,     #8
+        lsls    r3,     #8
+
+    LSYM(__align4):
+        lsrs    rT,     r3,     #28
+        bne     LSYM(__align2)
+        subs    r2,     #4
+        lsls    r3,     #4
+  #endif
+
+    LSYM(__align2):
+        // Refresh the state of the N flag before entering the loop.
+        tst     r3,     r3
+
+    LSYM(__align_loop):
+        // Test before subtracting to compensate for the natural exponent.
+        // The largest subnormal should have an exponent of 0, not -1.
+        bmi     LSYM(__align_return)
+        subs    r2,     #1
+        lsls    r3,     #1
+        bne     LSYM(__align_loop)
+
+        // Not just a subnormal... 0!  By design, this should never happen.
+        // All callers of this internal function filter 0 as a special case.
+        // Was there an uncontrolled jump from somewhere else?  Cosmic ray?
+        eors    r2,     r2
+
+      #ifdef DEBUG
+        bkpt    #0
+      #endif
+
+    LSYM(__align_return):
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_lalign2
+
+
+// Internal function to combine mantissa, exponent, and sign. No return.
+// Expects the unsigned result in $r1.  To avoid underflow (slower),
+//  the MSB should be in bits [31:29].
+// Expects any remainder bits of the unrounded result in $r0.
+// Expects the exponent in $r2.  The exponent must be relative to bit[30].
+// Expects the sign of the result (and only the sign) in $ip.
+// Returns a correctly rounded floating value in $r0.
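+// The rounding step below (__fp_round) is round-to-nearest, ties-to-even.
+//  A rough C sketch, for illustration only, where 'mant' is the value in $r1
+//  before the final shift and 'rem' is the extra remainder bits in $r0:
+//     result = mant >> 8;
+//     if (mant & 0x80)                        /* halfway bit (the carry) */
+//         if ((mant & 0x7F) || rem || (result & 1))
+//             result += 1;                    /* round up / break tie to even */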
+.section .text.sorted.libgcc.fpcore.g.assemblef,"x"
+CM0_FUNC_START fp_assemble
+    CFI_START_FUNCTION
+
+        // Work around CFI branching limitations.
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+        // Examine the upper three bits [31:29] for underflow.
+        lsrs    r3,     r1,     #29
+        beq     LSYM(__fp_underflow)
+
+        // Convert bits [31:29] into an offset in the range of { 0, -1, -2 }.
+        // Right rotation aligns the MSB in bit [31], filling any LSBs with '0'.
+        lsrs    r3,     r1,     #1
+        mvns    r3,     r3
+        ands    r3,     r1
+        lsrs    r3,     #30
+        subs    r3,     #2
+        rors    r1,     r3
+
+        // Update the exponent, assuming the final result will be normal.
+        // The new exponent is 1 less than actual, to compensate for the
+        //  eventual addition of the implicit '1' in the result.
+        // If the final exponent becomes negative, proceed directly to gradual
+        //  underflow, without bothering to search for the MSB.
+        adds    r2,     r3
+
+CM0_FUNC_START fp_assemble2
+        bmi     LSYM(__fp_subnormal)
+
+    LSYM(__fp_normal):
+        // Check for overflow (remember the implicit '1' to be added later).
+        cmp     r2,     #254
+        bge     SYM(__fp_overflow)
+
+        // Save LSBs for the remainder. Position doesn't matter any more,
+        //  these are just tiebreakers for round-to-even.
+        lsls    rT,     r1,     #25
+
+        // Align the final result.
+        lsrs    r1,     #8
+
+    LSYM(__fp_round):
+        // If carry bit is '0', always round down.
+        bcc     LSYM(__fp_return)
+
+        // The carry bit is '1'.  Round to nearest, ties to even.
+        // If either the saved remainder bits [6:0], the additional remainder
+        //  bits in $r1, or the final LSB is '1', round up.
+        lsls    r3,     r1,     #31
+        orrs    r3,     rT
+        orrs    r3,     r0
+        beq     LSYM(__fp_return)
+
+        // If rounding up overflows, then the mantissa result becomes 2.0, 
+        //  which yields the correct return value up to and including INF. 
+        adds    r1,     #1
+
+    LSYM(__fp_return):
+        // Combine the mantissa and the exponent.
+        lsls    r2,     #23
+        adds    r0,     r1,     r2
+
+        // Combine with the saved sign.
+        // End of library call, return to user.
+        add     r0,     ip
+
+  #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+        // TODO: Underflow/inexact reporting IFF remainder
+  #endif
+
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    LSYM(__fp_underflow):
+        // Set up to align the mantissa.
+        movs    r3,     r1
+        bne     LSYM(__fp_underflow2)
+
+        // MSB wasn't in the upper 32 bits, check the remainder.
+        // If the remainder is also zero, the result is +/-0.
+        movs    r3,     r0
+        beq     SYM(__fp_zero)
+
+        eors    r0,     r0
+        subs    r2,     #32
+
+    LSYM(__fp_underflow2):
+        // Save the pre-alignment exponent to align the remainder later.
+        movs    r1,     r2
+
+        // Align the mantissa with the MSB in bit[31].
+        bl      SYM(__fp_lalign2)
+
+        // Calculate the actual remainder shift.
+        subs    rT,     r1,     r2
+
+        // Align the lower bits of the remainder.
+        movs    r1,     r0
+        lsls    r0,     rT
+
+        // Combine the upper bits of the remainder with the aligned value.
+        rsbs    rT,     #0
+        adds    rT,     #32
+        lsrs    r1,     rT
+        adds    r1,     r3
+
+        // The MSB is now aligned at bit[31] of $r1.
+        // If the net exponent is still positive, the result will be normal.
+        // Because this function is used by fmul(), there is a possibility
+        //  that the value is still wider than 24 bits; always round.
+        tst     r2,     r2
+        bpl     LSYM(__fp_normal)
+
+    LSYM(__fp_subnormal):
+        // The MSB is aligned at bit[31], with a net negative exponent.
+        // The mantissa will need to be shifted right by the absolute value of
+        //  the exponent, plus the normal shift of 8.
+
+        // If the negative shift is smaller than -25, there is no result,
+        //  no rounding, no anything.  Return signed zero.
+        // (Otherwise, the shift for result and remainder may wrap.)
+        adds    r2,     #25
+        bmi     SYM(__fp_inexact_zero)
+
+        // Save the extra bits for the remainder.
+        movs    rT,     r1
+        lsls    rT,     r2
+
+        // Shift the mantissa to create a subnormal.
+        // Just like normal, round to nearest, ties to even.
+        movs    r3,     #33
+        subs    r3,     r2
+        eors    r2,     r2
+
+        // This shift must be last, leaving the shifted LSB in the C flag.
+        lsrs    r1,     r3
+        b       LSYM(__fp_round)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_assemble2
+CM0_FUNC_END fp_assemble
+
+
+// Recreate INF with the appropriate sign.  No return.
+// Expects the sign of the result in $ip.
+.section .text.sorted.libgcc.fpcore.h.infinityf,"x"
+CM0_FUNC_START fp_overflow
+    CFI_START_FUNCTION
+
+  #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+        // TODO: inexact/overflow exception
+  #endif
+
+CM0_FUNC_START fp_infinity
+
+        // Work around CFI branching limitations.
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+        movs    r0,     #255
+        lsls    r0,     #23
+        add     r0,     ip
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_infinity
+CM0_FUNC_END fp_overflow
+
+
+// Recreate 0 with the appropriate sign.  No return.
+// Expects the sign of the result in $ip.
+.section .text.sorted.libgcc.fpcore.i.zerof,"x"
+CM0_FUNC_START fp_inexact_zero
+
+  #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+        // TODO: inexact/underflow exception
+  #endif
+
+CM0_FUNC_START fp_zero
+    CFI_START_FUNCTION
+
+        // Work around CFI branching limitations.
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+        // Return 0 with the correct sign.
+        mov     r0,     ip
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_zero
+CM0_FUNC_END fp_inexact_zero
+
+
+// Internal function to detect signaling NANs.  No return.
+// Uses $r2 as scratch space.
+.section .text.sorted.libgcc.fpcore.j.checkf,"x"
+CM0_FUNC_START fp_check_nan2
+    CFI_START_FUNCTION
+
+        // Work around CFI branching limitations.
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+
+CM0_FUNC_START fp_check_nan
+
+        // Check for quiet NAN.
+        lsrs    r2,     r0,     #23
+        bcs     LSYM(__quiet_nan)
+
+        // Raise exception.  Preserves both $r0 and $r1.
+        svc     #(SVC_TRAP_NAN)
+
+        // Quiet the resulting NAN.
+        movs    r2,     #1
+        lsls    r2,     #22
+        orrs    r0,     r2
+
+    LSYM(__quiet_nan):
+        // End of library call, return to user.
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_check_nan
+CM0_FUNC_END fp_check_nan2
+
+
+// Internal function to report floating point exceptions.  No return.
+// Expects the original argument(s) in $r0 (possibly also $r1).
+// Expects a code that describes the exception in $r3.
+.section .text.sorted.libgcc.fpcore.k.exceptf,"x"
+CM0_FUNC_START fp_exception
+    CFI_START_FUNCTION
+
+        // Work around CFI branching limitations.
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+        // Create a quiet NAN.
+        movs    r2,     #255
+        lsls    r2,     #1
+        adds    r2,     #1
+        lsls    r2,     #22
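+        // (Worked out: ((255 << 1) + 1) << 22 = 0x1FF << 22 = 0x7FC00000,
+        //  i.e. an all-ones exponent with the quiet bit set.)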
+
+      #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        // Annotate the exception type in the NAN field.
+        // Make sure that the exception code is in the valid region.
+        lsls    rT,     r3,     #13
+        orrs    r2,     rT
+      #endif
+
+// Exception handler that expects the result already in $r2,
+//  typically when the result is not going to be NAN.
+CM0_FUNC_START fp_exception2
+
+      #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+        svc     #(SVC_FP_EXCEPTION)
+      #endif
+
+        // TODO: Save exception flags in a static variable.
+
+        // Set up the result, now that the argument isn't required any more.
+        movs    r0,     r2
+
+        // HACK: for sincosf(), with 2 parameters to return.
+        movs    r1,     r2
+
+        // End of library call, return to user.
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END fp_exception2
+CM0_FUNC_END fp_exception
+
+#endif /* L_arm_addsubsf3 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/idiv.S gcc-11-20201220/libgcc/config/arm/cm0/idiv.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/idiv.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/idiv.S	2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,288 @@
+/* idiv.S: Cortex M0 optimized 32-bit integer division
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#if 0
+ 
+// int __aeabi_idiv0(int)
+// Helper function for division by 0.
+.section .text.sorted.libgcc.idiv0,"x"
+CM0_WEAK_START aeabi_idiv0
+CM0_FUNC_ALIAS cm0_idiv0 aeabi_idiv0
+    CFI_START_FUNCTION
+
+      #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+        svc     #(SVC_DIVISION_BY_ZERO)
+      #endif
+
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END cm0_idiv0
+CM0_FUNC_END aeabi_idiv0
+
+#endif /* L_dvmd_tls */
+
+
+#ifdef L_divsi3
+
+// int __aeabi_idiv(int, int)
+// idiv_return __aeabi_idivmod(int, int)
+// Returns signed $r0 after division by $r1.
+// Also returns the signed remainder in $r1.
+// Same parent section as __udivsi3() to keep branches within range.
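+// The signed case is reduced to the unsigned divider below; a loose C sketch
+//  of that reduction (illustration only; the real code uses unsigned math):
+//     q = (unsigned)abs(n) / (unsigned)abs(d);
+//     r = (unsigned)abs(n) % (unsigned)abs(d);
+//     if ((n ^ d) < 0) q = -q;          /* quotient sign = XOR of the signs */
+//     if (n < 0)       r = -r;          /* remainder follows the numerator */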
+.section .text.sorted.libgcc.idiv.divsi3,"x"
+CM0_FUNC_START aeabi_idivmod
+CM0_FUNC_ALIAS aeabi_idiv aeabi_idivmod
+CM0_FUNC_ALIAS divsi3 aeabi_idivmod
+    CFI_START_FUNCTION
+
+        // Extend signs.
+        asrs    r2,     r0,     #31
+        asrs    r3,     r1,     #31
+
+        // Absolute value of the denominator, abort on division by zero.
+        eors    r1,     r3
+        subs    r1,     r3
+      #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+        beq     LSYM(__idivmod_zero)
+      #else 
+        beq     SYM(__uidivmod_zero)
+      #endif
+
+        // Absolute value of the numerator.
+        eors    r0,     r2
+        subs    r0,     r2
+
+        // Keep the sign of the numerator in bit[31] (for the remainder).
+        // Save the XOR of the signs in bits[15:0] (for the quotient).
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        lsrs    rT,     r3,     #16
+        eors    rT,     r2
+
+        // Handle division as unsigned.
+        bl      SYM(__uidivmod_nonzero) __PLT__ 
+
+        // Set the sign of the remainder.
+        asrs    r2,     rT,     #31
+        eors    r1,     r2
+        subs    r1,     r2
+
+        // Set the sign of the quotient.
+        sxth    r3,     rT
+        eors    r0,     r3
+        subs    r0,     r3
+
+    LSYM(__idivmod_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+  #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+    LSYM(__idivmod_zero):
+        // Set up the *div0() parameter specified in the ARM runtime ABI: 
+        //  * 0 if the numerator is 0,  
+        //  * Or, the largest value of the type manipulated by the calling 
+        //     division function if the numerator is positive,
+        //  * Or, the least value of the type manipulated by the calling
+        //     division function if the numerator is negative. 
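+        // (In C terms, roughly:
+        //      param = (n == 0) ? 0 : ((n > 0) ? INT_MAX : INT_MIN);
+        //  where n is the signed numerator still in $r0 and $r2 = n >> 31.)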
+        subs    r1,     r0
+        orrs    r0,     r1
+        asrs    r0,     #31
+        lsrs    r0,     #1 
+        eors    r0,     r2 
+
+        // At least the __aeabi_idiv0() call is common.
+        b       SYM(__uidivmod_zero2)        
+  #endif /* PEDANTIC_DIV0 */
+
+    CFI_END_FUNCTION
+CM0_FUNC_END divsi3
+CM0_FUNC_END aeabi_idiv
+CM0_FUNC_END aeabi_idivmod
+
+#endif /* L_divsi3 */
+
+
+#ifdef L_udivsi3
+
+// int __aeabi_uidiv(unsigned int, unsigned int)
+// idiv_return __aeabi_uidivmod(unsigned int, unsigned int)
+// Returns unsigned $r0 after division by $r1.
+// Also returns the remainder in $r1.
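+// The divider below is a shift-and-subtract (restoring) division; a compact
+//  C sketch of the same idea, for illustration only (assumes d != 0):
+//     unsigned q = 0, bit = 1;
+//     while (d <= (n >> 1)) { d <<= 1; bit <<= 1; }    /* align denominator */
+//     while (bit) {
+//         if (n >= d) { n -= d; q |= bit; }
+//         d >>= 1; bit >>= 1;
+//     }
+//     /* quotient in q, remainder in n */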
+.section .text.sorted.libgcc.idiv.udivsi3,"x"
+CM0_FUNC_START aeabi_uidivmod
+CM0_FUNC_ALIAS aeabi_uidiv aeabi_uidivmod
+CM0_FUNC_ALIAS udivsi3 aeabi_uidivmod
+    CFI_START_FUNCTION
+
+        // Abort on division by zero.
+        tst     r1,     r1
+      #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+        beq     LSYM(__uidivmod_zero)
+      #else 
+        beq     SYM(__uidivmod_zero)
+      #endif
+
+  #if defined(OPTIMIZE_SPEED) && OPTIMIZE_SPEED
+        // MAYBE: Optimize division by a power of 2
+  #endif
+
+    // Public symbol for the sake of divsi3().  
+    CM0_FUNC_START uidivmod_nonzero 
+        // Pre division: Shift the denominator as far as possible left
+        //  without making it larger than the numerator.
+        // The loop is destructive, save a copy of the numerator.
+        mov     ip,     r0
+
+        // Set up binary search.
+        movs    r3,     #16
+        movs    r2,     #1
+
+    LSYM(__uidivmod_align):
+        // Prefer dividing the numerator to multiplying the denominator
+        //  (multiplying the denominator may result in overflow).
+        lsrs    r0,     r3
+        cmp     r0,     r1
+        blo     LSYM(__uidivmod_skip)
+
+        // Multiply the denominator and the result together.
+        lsls    r1,     r3
+        lsls    r2,     r3
+
+    LSYM(__uidivmod_skip):
+        // Restore the numerator, and iterate until search goes to 0.
+        mov     r0,     ip
+        lsrs    r3,     #1
+        bne     LSYM(__uidivmod_align)
+
+        // The result in $r3 has been conveniently initialized to 0.
+        b       LSYM(__uidivmod_entry)
+
+    LSYM(__uidivmod_loop):
+        // Scale the denominator and the quotient together.
+        lsrs    r1,     #1
+        lsrs    r2,     #1
+        beq     LSYM(__uidivmod_return)
+
+    LSYM(__uidivmod_entry):
+        // Test if the denominator is smaller than the numerator.
+        cmp     r0,     r1
+        blo     LSYM(__uidivmod_loop)
+
+        // If the denominator is smaller, the next bit of the result is '1'.
+        // If the new remainder goes to 0, exit early.
+        adds    r3,     r2
+        subs    r0,     r1
+        bne     LSYM(__uidivmod_loop)
+
+    LSYM(__uidivmod_return):
+        mov     r1,     r0
+        mov     r0,     r3
+        RET
+
+  #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+    LSYM(__uidivmod_zero):
+        // Set up the *div0() parameter specified in the ARM runtime ABI: 
+        //  * 0 if the numerator is 0,  
+        //  * Or, the largest value of the type manipulated by the calling 
+        //     division function if the numerator is positive.
+        subs    r1,     r0
+        orrs    r0,     r1
+        asrs    r0,     #31
+
+    CM0_FUNC_START uidivmod_zero2
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+      #else 
+        push    { lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 4
+                .cfi_rel_offset lr, 0
+      #endif 
+
+        // Since GCC implements __aeabi_idiv0() as a weak overridable function,
+        //  this call must be prepared for a jump beyond +/- 2 KB.
+        // NOTE: __aeabi_idiv0() can't be implemented as a tail call, since any
+        //  non-trivial override will (likely) corrupt a remainder in $r1.
+        bl      SYM(__aeabi_idiv0) __PLT__
+                
+        // Since the input to __aeabi_idiv0() was INF, there really isn't any
+        //  choice in which of the recommended *divmod() patterns to follow.  
+        // Clear the remainder to complete {INF, 0}.
+        eors    r1,     r1
+
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        pop     { rT, pc }
+                .cfi_restore_state
+      #else 
+        pop     { pc }
+                .cfi_restore_state
+      #endif 
+        
+  #else /* !PEDANTIC_DIV0 */   
+    CM0_FUNC_START uidivmod_zero  
+        // NOTE: The following code sets up a return pair of {0, numerator},   
+        //  the second preference given by the ARM runtime ABI specification.   
+        // The pedantic version is 18 bytes larger between __aeabi_idiv() and
+        //  __aeabi_uidiv().  However, this version does not conform to the
+        //  out-of-line parameter requirements given for __aeabi_idiv0(), and
+        //  also does not pass 'gcc/testsuite/gcc.target/arm/divzero.c'.
+        
+        // Since the numerator may be overwritten by __aeabi_idiv0(), save now.
+        // Afterwards, it can be restored directly as the remainder.
+        push    { r0, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset r0, 0
+                .cfi_rel_offset lr, 4
+
+        // Set up the quotient (not ABI compliant).
+        eors    r0,     r0
+
+        // Since GCC implements div0() as a weak overridable function,
+        //  this call must be prepared for a jump beyond +/- 2 KB.
+        bl      SYM(__aeabi_idiv0) __PLT__
+
+        // Restore the remainder and return.
+        pop     { r1, pc }
+                .cfi_restore_state
+      
+  #endif /* !PEDANTIC_DIV0 */
+  
+    CFI_END_FUNCTION
+CM0_FUNC_END udivsi3
+CM0_FUNC_END aeabi_uidiv
+CM0_FUNC_END aeabi_uidivmod
+
+#endif /* L_udivsi3 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/lcmp.S gcc-11-20201220/libgcc/config/arm/cm0/lcmp.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/lcmp.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/lcmp.S	2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,136 @@
+/* lcmp.S: Cortex M0 optimized 64-bit integer comparison
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+   
+#if defined(L_arm_lcmp) || defined(L_cmpdi2)   
+
+#ifdef L_arm_lcmp
+.section .text.sorted.libgcc.lcmp,"x"
+  #define LCMP_NAME aeabi_lcmp
+#else
+.section .text.sorted.libgcc.cmpdi2,"x"
+  #define LCMP_NAME cmpdi2 
+#endif
+
+// int __aeabi_lcmp(long long, long long)
+// int __cmpdi2(long long, long long)
+// Compares the 64 bit signed values in $r1:$r0 and $r3:$r2.
+// lcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively.
+// cmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively.
+// Object file duplication assumes typical programs follow one runtime ABI.
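+// Equivalent C for the required return values (illustration only):
+//     int lcmp(long long a, long long b)
+//         { return (a < b) ? -1 : ((a > b) ? 1 : 0); }
+//     int cmpdi2(long long a, long long b) { return lcmp(a, b) + 1; }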
+CM0_FUNC_START LCMP_NAME
+    CFI_START_FUNCTION
+
+        // Calculate the difference $r1:$r0 - $r3:$r2.
+        subs    xxl,    yyl 
+        sbcs    xxh,    yyh 
+
+        // With $r2 free, create a reference offset without affecting flags.
+        // Originally implemented as 'mov r2, r3' for ARM architectures 6+
+        //  with unified syntax.  However, this resulted in a compiler error
+        //  for thumb-1: "MOV Rd, Rs with two low registers not permitted".
+        // Since unified syntax deprecates the "cpy" instruction, shouldn't
+        //  there be a backwards-compatible translation in the assembler?
+        cpy     r2,     r3
+
+        // Finish the comparison.
+        blt     LSYM(__lcmp_lt)
+
+        // The reference offset ($r2 - $r3) will be +2 iff the first
+        //  argument is larger, otherwise the reference offset remains 0.
+        adds    r2,     #2
+
+    LSYM(__lcmp_lt):
+        // Check for zero equality (all 64 bits).
+        // It doesn't matter which register was originally "hi". 
+        orrs    r0,     r1
+        beq     LSYM(__lcmp_return)
+
+        // Convert the relative offset to an absolute value +/-1.
+        subs    r0,     r2,     r3
+        subs    r0,     #1
+
+    LSYM(__lcmp_return):
+      #ifdef L_cmpdi2 
+        // Shift to the correct output specification.
+        adds    r0,     #1
+      #endif 
+
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END LCMP_NAME 
+
+#endif /* L_arm_lcmp || L_cmpdi2 */
+
+
+#if defined(L_arm_ulcmp) || defined(L_ucmpdi2)
+
+#ifdef L_arm_ulcmp
+.section .text.sorted.libgcc.ulcmp,"x"
+  #define ULCMP_NAME aeabi_ulcmp
+#else
+.section .text.sorted.libgcc.ucmpdi2,"x"
+  #define ULCMP_NAME ucmpdi2 
+#endif
+
+// int __aeabi_ulcmp(unsigned long long, unsigned long long)
+// int __ucmpdi2(unsigned long long, unsigned long long)
+// Compares the 64 bit unsigned values in $r1:$r0 and $r3:$r2.
+// ulcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively.
+// ucmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively.
+// Object file duplication assumes typical programs follow one runtime ABI.
+CM0_FUNC_START ULCMP_NAME 
+    CFI_START_FUNCTION
+
+        // Calculate the 'C' flag.
+        subs    xxl,    yyl 
+        sbcs    xxh,    yyh 
+
+        // Capture the carry flag.
+        // $r2 will contain -1 if the first value is smaller,
+        //  0 if the first value is larger or equal.
+        sbcs    r2,     r2
+
+        // Check for zero equality (all 64 bits).
+        // It doesn't matter which register was originally "hi". 
+        orrs    r0,     r1
+        beq     LSYM(__ulcmp_return)
+
+        // $r0 should contain +1 or -1
+        movs    r0,     #1
+        orrs    r0,     r2
+
+    LSYM(__ulcmp_return):
+      #ifdef L_ucmpdi2 
+        adds    r0,     #1
+      #endif 
+
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END ULCMP_NAME
+
+#endif /* L_arm_ulcmp || L_ucmpdi2 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/ldiv.S gcc-11-20201220/libgcc/config/arm/cm0/ldiv.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/ldiv.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/ldiv.S	2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,482 @@
+/* ldiv.S: Cortex M0 optimized 64-bit integer division
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#if 0
+
+// long long __aeabi_ldiv0(long long)
+// Helper function for division by 0.
+.section .text.sorted.libgcc.ldiv0,"x"
+CM0_WEAK_START aeabi_ldiv0
+    CFI_START_FUNCTION
+
+      #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+        svc     #(SVC_DIVISION_BY_ZERO)
+      #endif
+
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_ldiv0
+
+#endif /* L_dvmd_tls */
+
+
+#ifdef L_divdi3 
+
+// long long __aeabi_ldiv(long long, long long)
+// lldiv_return __aeabi_ldivmod(long long, long long)
+// Returns signed $r1:$r0 after division by $r3:$r2.
+// Also returns the remainder in $r3:$r2.
+// Same parent section as __udivdi3() to keep branches within range.
+.section .text.sorted.libgcc.ldiv.divdi3,"x"
+CM0_FUNC_START aeabi_ldivmod
+CM0_FUNC_ALIAS aeabi_ldiv aeabi_ldivmod
+CM0_FUNC_ALIAS divdi3 aeabi_ldivmod
+    CFI_START_FUNCTION
+
+        // Test the denominator for zero before pushing registers.
+        cmp     yyl,    #0
+        bne     LSYM(__ldivmod_valid)
+
+        cmp     yyh,    #0
+      #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+        beq     LSYM(__ldivmod_zero)
+      #else
+        beq     SYM(__uldivmod_zero)
+      #endif
+
+    LSYM(__ldivmod_valid):
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        push    { rP, rQ, rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 16
+                .cfi_rel_offset rP, 0
+                .cfi_rel_offset rQ, 4
+                .cfi_rel_offset rT, 8
+                .cfi_rel_offset lr, 12
+      #else
+        push    { rP, rQ, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 12
+                .cfi_rel_offset rP, 0
+                .cfi_rel_offset rQ, 4
+                .cfi_rel_offset lr, 8
+      #endif
+
+        // Absolute value of the numerator.
+        asrs    rP,     xxh,    #31
+        eors    xxl,    rP
+        eors    xxh,    rP
+        subs    xxl,    rP
+        sbcs    xxh,    rP
+
+        // Absolute value of the denominator.
+        asrs    rQ,     yyh,    #31
+        eors    yyl,    rQ
+        eors    yyh,    rQ
+        subs    yyl,    rQ
+        sbcs    yyh,    rQ
+
+        // Keep the XOR of signs for the quotient.
+        eors    rQ,     rP
+
+        // Handle division as unsigned.
+        bl      SYM(__uldivmod_nonzero) __PLT__
+
+        // Set the sign of the quotient.
+        eors    xxl,    rQ
+        eors    xxh,    rQ
+        subs    xxl,    rQ
+        sbcs    xxh,    rQ
+
+        // Set the sign of the remainder.
+        eors    yyl,    rP
+        eors    yyh,    rP
+        subs    yyl,    rP
+        sbcs    yyh,    rP
+
+    LSYM(__ldivmod_return):
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        pop     { rP, rQ, rT, pc }
+                .cfi_restore_state
+      #else
+        pop     { rP, rQ, pc }
+                .cfi_restore_state
+      #endif
+
+  #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+    LSYM(__ldivmod_zero):
+        // Save the sign of the numerator.
+        asrs    yyl,     xxh,    #31
+
+        // Set up the *div0() parameter specified in the ARM runtime ABI:
+        //  * 0 if the numerator is 0,
+        //  * Or, the largest value of the type manipulated by the calling
+        //     division function if the numerator is positive,
+        //  * Or, the least value of the type manipulated by the calling
+        //     division function if the numerator is negative.
+        rsbs    xxl,    #0
+        sbcs    yyh,    xxh
+        orrs    xxh,    yyh
+        asrs    xxl,    xxh,   #31
+        lsrs    xxh,    xxl,   #1
+        eors    xxh,    yyl
+        eors    xxl,    yyl 
+
+        // At least the __aeabi_ldiv0() call is common.
+        b       SYM(__uldivmod_zero2)
+  #endif /* PEDANTIC_DIV0 */
+
+    CFI_END_FUNCTION
+CM0_FUNC_END divdi3
+CM0_FUNC_END aeabi_ldiv
+CM0_FUNC_END aeabi_ldivmod
+
+#endif /* L_divdi3 */
+
+
+#ifdef L_udivdi3 
+
+// unsigned long long __aeabi_uldiv(unsigned long long, unsigned long long)
+// ulldiv_return __aeabi_uldivmod(unsigned long long, unsigned long long)
+// Returns unsigned $r1:$r0 after division by $r3:$r2.
+// Also returns the remainder in $r3:$r2.
+.section .text.sorted.libgcc.ldiv.udivdi3,"x"
+CM0_FUNC_START aeabi_uldivmod
+CM0_FUNC_ALIAS aeabi_uldiv aeabi_uldivmod
+CM0_FUNC_ALIAS udivdi3 aeabi_uldivmod
+    CFI_START_FUNCTION
+
+        // Test the denominator for zero before changing the stack.
+        cmp     yyh,    #0
+        bne     SYM(__uldivmod_nonzero)
+
+        cmp     yyl,    #0
+      #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+        beq     LSYM(__uldivmod_zero)
+      #else
+        beq     SYM(__uldivmod_zero)
+      #endif
+
+  #if defined(OPTIMIZE_SPEED) && OPTIMIZE_SPEED
+        // MAYBE: Optimize division by a power of 2
+  #endif
+
+    CM0_FUNC_START uldivmod_nonzero
+        push    { rP, rQ, rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 16
+                .cfi_rel_offset rP, 0
+                .cfi_rel_offset rQ, 4
+                .cfi_rel_offset rT, 8
+                .cfi_rel_offset lr, 12
+
+        // Set up denominator shift, assuming a single width result.
+        movs    rP,     #32
+
+        // If the upper word of the denominator is 0 ...
+        tst     yyh,    yyh
+        bne     LSYM(__uldivmod_setup)
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // ... and the upper word of the numerator is also 0,
+        //  single width division will be at least twice as fast.
+        tst     xxh,    xxh
+        beq     LSYM(__uldivmod_small)
+  #endif
+
+        // ... and the lower word of the denominator is less than or equal
+        //     to the upper word of the numerator ...
+        cmp     xxh,    yyl
+        blo     LSYM(__uldivmod_setup)
+
+        //  ... then the result will be double width, at least 33 bits.
+        // Set up a flag in $rP to seed the shift for the second word.
+        movs    yyh,    yyl
+        eors    yyl,    yyl
+        adds    rP,     #64
+
+    LSYM(__uldivmod_setup):
+        // Pre division: Shift the denominator as far as possible left
+        //  without making it larger than the numerator.
+        // Since search is destructive, first save a copy of the numerator.
+        mov     ip,     xxl
+        mov     lr,     xxh
+
+        // Set up binary search.
+        movs    rQ,     #16
+        eors    rT,     rT
+
+    LSYM(__uldivmod_align):
+        // Maintain a secondary shift $rT = 32 - $rQ, making the overlapping
+        //  shifts between low and high words easier to construct.
+        adds    rT,     rQ
+
+        // Prefer dividing the numerator to multiplying the denominator
+        //  (multiplying the denominator may result in overflow).
+        lsrs    xxh,    rQ
+
+        // Measure the high bits of denominator against the numerator.
+        cmp     xxh,    yyh
+        blo     LSYM(__uldivmod_skip)
+        bhi     LSYM(__uldivmod_shift)
+
+        // If the high bits are equal, construct the low bits for checking.
+        mov     xxh,    lr
+        lsls    xxh,    rT
+
+        lsrs    xxl,    rQ
+        orrs    xxh,    xxl
+
+        cmp     xxh,    yyl
+        blo     LSYM(__uldivmod_skip)
+
+    LSYM(__uldivmod_shift):
+        // Scale the denominator and the result together.
+        subs    rP,     rQ
+
+        // If the reduced numerator is still larger than or equal to the
+        //  denominator, it is safe to shift the denominator left.
+        movs    xxh,    yyl
+        lsrs    xxh,    rT
+        lsls    yyh,    rQ
+
+        lsls    yyl,    rQ
+        orrs    yyh,    xxh
+
+    LSYM(__uldivmod_skip):
+        // Restore the numerator.
+        mov     xxl,    ip
+        mov     xxh,    lr
+
+        // Iterate until the shift goes to 0.
+        lsrs    rQ,     #1
+        bne     LSYM(__uldivmod_align)
+
+        // Initialize the result (zero).
+        mov     ip,     rQ
+
+        // HACK: Compensate for the first word test.
+        lsls    rP,     #6
+
+    LSYM(__uldivmod_word2):
+        // Is there another word?
+        lsrs    rP,     #6
+        beq     LSYM(__uldivmod_return)
+
+        // Shift the calculated result by 1 word.
+        mov     lr,     ip
+        mov     ip,     rQ
+
+        // Set up the MSB of the next word of the quotient
+        movs    rQ,     #1
+        rors    rQ,     rP
+        b     LSYM(__uldivmod_entry)
+
+    LSYM(__uldivmod_loop):
+        // Divide the denominator by 2.
+        // It could be slightly faster to multiply the numerator,
+        //  but that would require shifting the remainder at the end.
+        lsls    rT,     yyh,    #31
+        lsrs    yyh,    #1
+        lsrs    yyl,    #1
+        adds    yyl,    rT
+
+        // Step to the next bit of the result.
+        lsrs    rQ,     #1
+        beq     LSYM(__uldivmod_word2)
+
+    LSYM(__uldivmod_entry):
+        // Test if the denominator is smaller, high word first.
+        cmp     xxh,    yyh
+        blo     LSYM(__uldivmod_loop)
+        bhi     LSYM(__uldivmod_quotient)
+
+        cmp     xxl,    yyl
+        blo     LSYM(__uldivmod_loop)
+
+    LSYM(__uldivmod_quotient):
+        // Smaller denominator: the next bit of the quotient will be set.
+        add     ip,     rQ
+
+        // Subtract the denominator from the remainder.
+        // If the new remainder goes to 0, exit early.
+        subs    xxl,    yyl
+        sbcs    xxh,    yyh
+        bne     LSYM(__uldivmod_loop)
+
+        tst     xxl,    xxl
+        bne     LSYM(__uldivmod_loop)
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Check whether there's still a second word to calculate.
+        lsrs    rP,     #6
+        beq     LSYM(__uldivmod_return)
+
+        // If so, shift the result left by a full word.
+        mov     lr,     ip
+        mov     ip,     xxh // zero
+  #else
+        eors    rQ,     rQ
+        b       LSYM(__uldivmod_word2)
+  #endif
+
+    LSYM(__uldivmod_return):
+        // Move the remainder to the second half of the result.
+        movs    yyl,    xxl
+        movs    yyh,    xxh
+
+        // Move the quotient to the first half of the result.
+        mov     xxl,    ip
+        mov     xxh,    lr
+
+        pop     { rP, rQ, rT, pc }
+                .cfi_restore_state
+
+  #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+    LSYM(__uldivmod_zero):
+        // Set up the *div0() parameter specified in the ARM runtime ABI:
+        //  * 0 if the numerator is 0,
+        //  * Or, the largest value of the type manipulated by the calling
+        //     division function if the numerator is positive.
+        subs    yyl,    xxl
+        sbcs    yyh,    xxh
+        orrs    xxh,    yyh
+        asrs    xxh,    #31
+        movs    xxl,    xxh
+
+    CM0_FUNC_START uldivmod_zero2
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+      #else
+        push    { lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 4
+                .cfi_rel_offset lr, 0
+      #endif
+
+        // Since GCC implements __aeabi_ldiv0() as a weak overridable function,
+        //  this call must be prepared for a jump beyond +/- 2 KB.
+        // NOTE: __aeabi_ldiv0() can't be implemented as a tail call, since any
+        //  non-trivial override will (likely) corrupt a remainder in $r3:$r2.
+        bl      SYM(__aeabi_ldiv0) __PLT__
+
+        // Since the input to __aeabi_ldiv0() was INF, there really isn't any
+        //  choice in which of the recommended *divmod() patterns to follow.
+        // Clear the remainder to complete {INF, 0}.
+        eors    yyl,    yyl
+        eors    yyh,    yyh
+
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        pop     { rT, pc }
+                .cfi_restore_state
+      #else
+        pop     { pc }
+                .cfi_restore_state
+      #endif
+
+  #else /* !PEDANTIC_DIV0 */
+    CM0_FUNC_START uldivmod_zero
+        // NOTE: The following code sets up a return pair of {0, numerator},
+        //  the second preference given by the ARM runtime ABI specification.
+        // The pedantic version is 30 bytes larger between __aeabi_ldiv() and
+        //  __aeabi_uldiv().  However, this version does not conform to the
+        //  out-of-line parameter requirements given for __aeabi_ldiv0(), and
+        //  also does not pass 'gcc/testsuite/gcc.target/arm/divzero.c'.
+
+        // Since the numerator may be overwritten by __aeabi_ldiv0(), save it now.
+        // Afterwards, the saved words can be restored directly as the remainder.
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        push    { r0, r1, rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 16
+                .cfi_rel_offset xxl,0
+                .cfi_rel_offset xxh,4
+                .cfi_rel_offset rT, 8
+                .cfi_rel_offset lr, 12
+      #else
+        push    { r0, r1, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 12
+                .cfi_rel_offset xxl,0
+                .cfi_rel_offset xxh,4
+                .cfi_rel_offset lr, 8
+      #endif
+
+        // Set up the quotient.
+        eors    xxl,    xxl
+        eors    xxh,    xxh
+
+        // Since GCC implements div0() as a weak overridable function,
+        //  this call must be prepared for a jump beyond +/- 2 KB.
+        bl      SYM(__aeabi_ldiv0) __PLT__
+
+        // Restore the remainder and return.  
+      #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+        pop     { r2, r3, rT, pc }
+                .cfi_restore_state
+      #else
+        pop     { r2, r3, pc }
+                .cfi_restore_state
+      #endif
+  #endif /* !PEDANTIC_DIV0 */
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+    LSYM(__uldivmod_small):
+        // Arrange operands for (much faster) 32-bit division.
+      #if defined(__ARMEB__) && __ARMEB__
+        movs    r0,     r1
+        movs    r1,     r3
+      #else 
+        movs    r1,     r2 
+      #endif 
+
+        bl      SYM(__uidivmod_nonzero) __PLT__
+
+        // Arrange results back into 64-bit format. 
+      #if defined(__ARMEB__) && __ARMEB__
+        movs    r3,     r1
+        movs    r1,     r0
+      #else 
+        movs    r2,     r1
+      #endif
+ 
+        // Extend quotient and remainder to 64 bits, unsigned.
+        eors    xxh,    xxh
+        eors    yyh,    yyh
+        pop     { rP, rQ, rT, pc }
+  #endif
+
+    CFI_END_FUNCTION
+CM0_FUNC_END udivdi3
+CM0_FUNC_END aeabi_uldiv
+CM0_FUNC_END aeabi_uldivmod
+
+#endif /* L_udivdi3 */
+
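
A note for reviewers following the flow above: the routine first aligns the
denominator under the numerator (the destructive binary search on the shift
amount stands in for a clz, which v6m lacks as an instruction), then runs a
restoring shift-and-subtract loop producing one quotient bit per iteration.
Below is a minimal C sketch of that flow; the names are hypothetical and the
alignment uses __builtin_clzll for brevity, so it is illustrative only and
not part of the patch (for a zero denominator the assembly instead raises
__aeabi_ldiv0()).

    #include <stdint.h>
    #include <stdio.h>

    typedef struct { uint64_t quot, rem; } udiv64_t;   /* hypothetical */

    static udiv64_t udiv64_sketch (uint64_t num, uint64_t den)
    {
        udiv64_t r = { 0, num };
        if (den == 0 || num == 0)
            return r;    /* the assembly calls __aeabi_ldiv0() when den == 0 */

        /* Align the denominator just under the numerator. */
        int shift = __builtin_clzll (den) - __builtin_clzll (num);
        if (shift < 0)
            shift = 0;
        den <<= shift;

        /* Restoring long division: one quotient bit per step. */
        for (int i = 0; i <= shift; i++) {
            r.quot <<= 1;
            if (r.rem >= den) {
                r.rem -= den;
                r.quot |= 1;
            }
            den >>= 1;
        }
        return r;
    }

    int main (void)
    {
        udiv64_t r = udiv64_sketch (1000000000000ULL, 7);
        printf ("%llu rem %llu\n",
                (unsigned long long) r.quot, (unsigned long long) r.rem);
        return 0;
    }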
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/lmul.S gcc-11-20201220/libgcc/config/arm/cm0/lmul.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/lmul.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/lmul.S	2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,213 @@
+/* lmul.S: Cortex M0 optimized 64-bit integer multiplication 
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+   
+#ifdef L_muldi3 
+
+// long long __aeabi_lmul(long long, long long)
+// Returns the least significant 64 bits of a 64x64 bit multiplication.
+// Expects the two multiplicands in $r1:$r0 and $r3:$r2.
+// Returns the product in $r1:$r0 (does not distinguish signed types).
+// Uses $r4 and $r5 as scratch space.
+.section .text.sorted.libgcc.lmul.muldi3,"x"
+CM0_FUNC_START aeabi_lmul
+CM0_FUNC_ALIAS muldi3 aeabi_lmul
+    CFI_START_FUNCTION
+
+        // $r1:$r0 = 0xDDDDCCCCBBBBAAAA
+        // $r3:$r2 = 0xZZZZYYYYXXXXWWWW
+
+        // The following operations that only affect the upper 64 bits
+        //  can be safely discarded:
+        //   DDDD * ZZZZ
+        //   DDDD * YYYY
+        //   DDDD * XXXX
+        //   CCCC * ZZZZ
+        //   CCCC * YYYY
+        //   BBBB * ZZZZ
+
+        // MAYBE: Test for multiply by ZERO on implementations with a 32-cycle
+        //  'muls' instruction, and skip over the operation in that case.
+
+        // (0xDDDDCCCC * 0xXXXXWWWW), free $r1
+        muls    xxh,    yyl 
+
+        // (0xZZZZYYYY * 0xBBBBAAAA), free $r3
+        muls    yyh,    xxl 
+        adds    yyh,    xxh 
+
+        // Put the parameters in the correct form for umulsidi3().
+        movs    xxh,    yyl 
+        b       LSYM(__mul_overflow)
+
+    CFI_END_FUNCTION
+CM0_FUNC_END aeabi_lmul
+CM0_FUNC_END muldi3
+
+#endif /* L_muldi3 */
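
For reference, the partial-product reduction listed in the comments above is
the standard low-half trick: only the full low-word product and the low 32
bits of the two cross terms survive in the bottom 64 bits of the result.  A
C sketch with hypothetical names (not part of the patch):

    #include <stdint.h>

    /* Low 64 bits of a 64 x 64 product built from 32-bit pieces. */
    static uint64_t lmul_sketch (uint64_t a, uint64_t b)
    {
        uint32_t al = (uint32_t) a, ah = (uint32_t) (a >> 32);
        uint32_t bl = (uint32_t) b, bh = (uint32_t) (b >> 32);

        /* Full 32 x 32 -> 64 product of the low words (__umulsidi3 here). */
        uint64_t low = (uint64_t) al * bl;

        /* The cross terms only matter modulo 2^32; ah*bh and the upper
           halves of the cross terms are discarded, as listed above. */
        uint32_t high = ah * bl + al * bh;

        return low + ((uint64_t) high << 32);
    }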
+
+
+// The following implementation of __umulsidi3() integrates with __muldi3()
+//  above to allow the fast tail call while still preserving the extra  
+//  hi-shifted bits of the result.  However, these extra bits add a few 
+//  instructions not otherwise required when using only __umulsidi3().
+// Therefore, this block configures __umulsidi3() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+//  version adds the hi bits of __muldi3().  The standalone version must
+//  be declared WEAK, so that the combined version can supersede it and
+//  provide both symbols in programs that multiply long doubles.
+// This means '_umulsidi3' should appear before '_muldi3' in LIB1ASMFUNCS.
+#if defined(L_muldi3) || defined(L_umulsidi3)
+
+#ifdef L_umulsidi3
+// unsigned long long __umulsidi3(unsigned int, unsigned int)
+// Returns all 64 bits of a 32x32 bit multiplication.
+// Expects the two multiplicands in $r0 and $r1.
+// Returns the product in $r1:$r0.
+// Uses $r3, $r4 and $ip as scratch space.
+.section .text.sorted.libgcc.lmul.umulsidi3,"x"
+CM0_WEAK_START umulsidi3
+    CFI_START_FUNCTION
+
+#else /* L_muldi3 */
+CM0_FUNC_START umulsidi3
+    CFI_START_FUNCTION
+
+        // 32x32 multiply with 64 bit result.
+        // Expand the multiply into 4 parts, since muls only returns 32 bits.
+        //         (a16h * b16h / 2^32)
+        //       + (a16h * b16l / 2^48) + (a16l * b16h / 2^48)
+        //       + (a16l * b16l / 2^64)
+
+        // MAYBE: Test for multiply by 0 on implementations with a 32-cycle
+        //  'muls' instruction, and skip over the operation in that case.
+
+        eors    yyh,    yyh 
+
+    LSYM(__mul_overflow):
+        mov     ip,     yyh 
+
+#endif /* !L_muldi3 */ 
+
+        // a16h * b16h
+        lsrs    r2,     xxl,    #16
+        lsrs    r3,     xxh,    #16
+        muls    r2,     r3
+
+      #ifdef L_muldi3 
+        add     ip,     r2
+      #else 
+        mov     ip,     r2
+      #endif 
+
+        // a16l * b16h; save a16h first!
+        lsrs    r2,     xxl,    #16
+    #if (__ARM_ARCH >= 6)    
+        uxth    xxl,    xxl
+    #else /* __ARM_ARCH < 6 */
+        lsls    xxl,    #16
+        lsrs    xxl,    #16 
+    #endif  
+        muls    r3,     xxl
+
+        // a16l * b16l
+    #if (__ARM_ARCH >= 6)    
+        uxth    xxh,    xxh 
+    #else /* __ARM_ARCH < 6 */
+        lsls    xxh,    #16
+        lsrs    xxh,    #16 
+    #endif  
+        muls    xxl,    xxh 
+
+        // a16h * b16l
+        muls    xxh,    r2
+
+        // Distribute intermediate results.
+        eors    r2,     r2
+        adds    xxh,    r3
+        adcs    r2,     r2
+        lsls    r3,     xxh,    #16
+        lsrs    xxh,    #16
+        lsls    r2,     #16
+        adds    xxl,    r3
+        adcs    xxh,    r2
+
+        // Add in the high bits.
+        add     xxh,     ip
+
+        RET
+
+    CFI_END_FUNCTION
+CM0_FUNC_END umulsidi3
+
+#endif /* L_muldi3 || L_umulsidi3 */
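
The expansion above is the schoolbook split into four 16 x 16 products,
since the Thumb-1 'muls' instruction only returns the low 32 bits.  A C
sketch of the same arithmetic (illustrative names, not part of the patch):

    #include <stdint.h>

    static uint64_t umull_sketch (uint32_t a, uint32_t b)
    {
        uint32_t al = a & 0xFFFF, ah = a >> 16;
        uint32_t bl = b & 0xFFFF, bh = b >> 16;

        uint32_t ll = al * bl;              /* bits  0..31 */
        uint32_t lh = al * bh;              /* bits 16..47 */
        uint32_t hl = ah * bl;              /* bits 16..47 */
        uint32_t hh = ah * bh;              /* bits 32..63 */

        /* The sum of the middle terms can carry into bit 32; the assembly
           catches that carry with an adcs into a scratch register. */
        uint64_t mid = (uint64_t) lh + hl;

        return ((uint64_t) hh << 32) + (mid << 16) + ll;
    }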
+
+
+#ifdef L_mulsidi3
+
+// long long __mulsidi3(int, int)
+// Returns all 64 bits of a 32x32 bit signed multiplication.
+// Expects the two multiplicands in $r0 and $r1.
+// Returns the product in $r1:$r0.
+// Uses $r3, $r4 and $rT as scratch space.
+.section .text.sorted.libgcc.lmul.mulsidi3,"x"
+CM0_FUNC_START mulsidi3
+    CFI_START_FUNCTION
+
+        // Push registers for function call.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save signs of the arguments.
+        asrs    r3,     r0,     #31
+        asrs    rT,     r1,     #31
+
+        // Absolute value of the arguments.
+        eors    r0,     r3
+        eors    r1,     rT
+        subs    r0,     r3
+        subs    r1,     rT
+
+        // Save sign of the result.
+        eors    rT,     r3
+
+        bl      SYM(__umulsidi3) __PLT__
+
+        // Apply sign of the result.
+        eors    xxl,     rT
+        eors    xxh,     rT
+        subs    xxl,     rT
+        sbcs    xxh,     rT
+
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+CM0_FUNC_END mulsidi3
+
+#endif /* L_mulsidi3 */
+
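
The signed wrapper above follows the usual pattern: strip the signs,
multiply unsigned, then conditionally negate using the XOR of the two sign
masks.  A C sketch with hypothetical names, using a native 64-bit multiply
where the assembly calls __umulsidi3():

    #include <stdint.h>

    static int64_t mulsidi3_sketch (int32_t a, int32_t b)
    {
        uint32_t sa = -(uint32_t) (a < 0);   /* 0 or 0xFFFFFFFF, like asrs #31 */
        uint32_t sb = -(uint32_t) (b < 0);

        uint32_t ua = ((uint32_t) a ^ sa) - sa;   /* absolute values */
        uint32_t ub = ((uint32_t) b ^ sb) - sb;

        uint64_t p = (uint64_t) ua * ub;          /* __umulsidi3() in the patch */

        uint64_t s = -(uint64_t) ((sa ^ sb) & 1); /* 64-bit result sign mask */
        return (int64_t) ((p ^ s) - s);           /* conditional negate */
    }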
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/lshift.S gcc-11-20201220/libgcc/config/arm/cm0/lshift.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/lshift.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/lshift.S	2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,202 @@
+/* lshift.S: Cortex M0 optimized 64-bit integer shift 
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+   
+
+#ifdef L_ashldi3
+
+// long long __aeabi_llsl(long long, int)
+// Logical shift left the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.sorted.libgcc.ashldi3,"x"
+CM0_FUNC_START aeabi_llsl
+CM0_FUNC_ALIAS ashldi3 aeabi_llsl
+    CFI_START_FUNCTION
+
+  #if defined(__thumb__) && __thumb__
+
+        // Save a copy for the remainder.
+        movs    r3,     xxl 
+
+        // Assume a simple shift.
+        lsls    xxl,    r2
+        lsls    xxh,    r2
+
+        // Test if the shift distance is larger than 1 word.
+        subs    r2,     #32
+        bhs     LSYM(__llsl_large)
+
+        // The remainder is opposite the main shift, (32 - x) bits.
+        rsbs    r2,     #0
+        lsrs    r3,     r2
+
+        // Cancel any remaining shift.
+        eors    r2,     r2
+
+    LSYM(__llsl_large):
+        // Apply any remaining shift
+        lsls    r3,     r2
+
+        // Merge remainder and result.
+        adds    xxh,    r3
+        RET
+
+  #else /* !__thumb__ */
+
+        // Moved here from lib1funcs.S
+        subs    r3,     r2,     #32
+        rsb     ip,     r2,     #32
+        movmi   xxh,    xxh,    lsl r2
+        movpl   xxh,    xxl,    lsl r3
+        orrmi   xxh,    xxh,    xxl,    lsr ip
+        mov     xxl,    xxl,    lsl r2
+        RET
+
+  #endif /* !__thumb__ */
+
+    CFI_END_FUNCTION
+CM0_FUNC_END ashldi3
+CM0_FUNC_END aeabi_llsl
+
+#endif /* L_ashldi3 */
+
+
+#ifdef L_lshrdi3
+
+// long long __aeabi_llsr(long long, int)
+// Logical shift right the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.sorted.libgcc.lshrdi3,"x"
+CM0_FUNC_START aeabi_llsr
+CM0_FUNC_ALIAS lshrdi3 aeabi_llsr
+    CFI_START_FUNCTION
+
+  #if defined(__thumb__) && __thumb__
+
+        // Save a copy for the remainder.
+        movs    r3,     xxh 
+
+        // Assume a simple shift.
+        lsrs    xxl,    r2
+        lsrs    xxh,    r2
+
+        // Test if the shift distance is larger than 1 word.
+        subs    r2,     #32
+        bhs     LSYM(__llsr_large)
+
+        // The remainder is opposite the main shift, (32 - x) bits.
+        rsbs    r2,     #0
+        lsls    r3,     r2
+
+        // Cancel any remaining shift.
+        eors    r2,     r2
+
+    LSYM(__llsr_large):
+        // Apply any remaining shift
+        lsrs    r3,     r2
+
+        // Merge remainder and result.
+        adds    xxl,    r3
+        RET
+
+  #else /* !__thumb__ */
+
+        // Moved here from lib1funcs.S
+        subs    r3,     r2,     #32
+        rsb     ip,     r2,     #32
+        movmi   xxl,    xxl,    lsr r2
+        movpl   xxl,    xxh,    lsr r3
+        orrmi   xxl,    xxl,    xxh,    lsl ip
+        mov     xxh,    xxh,    lsr r2
+        RET
+
+  #endif /* !__thumb__ */
+
+
+    CFI_END_FUNCTION
+CM0_FUNC_END lshrdi3
+CM0_FUNC_END aeabi_llsr
+
+#endif /* L_lshrdi3 */
+
+
+#ifdef L_ashrdi3
+
+// long long __aeabi_lasr(long long, int)
+// Arithmetic shift right the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.sorted.libgcc.ashrdi3,"x"
+CM0_FUNC_START aeabi_lasr
+CM0_FUNC_ALIAS ashrdi3 aeabi_lasr
+    CFI_START_FUNCTION
+
+  #if defined(__thumb__) && __thumb__
+
+        // Save a copy for the remainder.
+        movs    r3,     xxh 
+
+        // Assume a simple shift.
+        lsrs    xxl,    r2
+        asrs    xxh,    r2
+
+        // Test if the shift distance is larger than 1 word.
+        subs    r2,     #32
+        bhs     LSYM(__lasr_large)
+
+        // The remainder is opposite the main shift, (32 - x) bits.
+        rsbs    r2,     #0
+        lsls    r3,     r2
+
+        // Cancel any remaining shift.
+        eors    r2,     r2
+
+    LSYM(__lasr_large):
+        // Apply any remaining shift
+        asrs    r3,     r2
+
+        // Merge remainder and result.
+        adds    xxl,    r3
+        RET
+
+  #else /* !__thumb__ */
+
+        // Moved here from lib1funcs.S
+        subs    r3,     r2,     #32
+        rsb     ip,     r2,     #32
+        movmi   xxl,    xxl,    lsr r2
+        movpl   xxl,    xxh,    asr r3
+        orrmi   xxl,    xxl,    xxh,    lsl ip
+        mov     xxh,    xxh,    asr r2
+        RET
+
+  #endif /* !__thumb__ */  
+
+    CFI_END_FUNCTION
+CM0_FUNC_END ashrdi3
+CM0_FUNC_END aeabi_lasr
+
+#endif /* L_ashrdi3 */
+
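
All three shift routines above follow the pattern described in their
comments: perform the naive per-word shift, then fold in the (32 - n)-bit
cross-word remainder, with the count-minus-32 test handling the wide case.
A C sketch of the llsl flow (the assembly additionally relies on ARM
register shifts by 32 or more producing zero, which plain C cannot express
without the explicit branches below; names are illustrative):

    #include <stdint.h>

    /* Valid for counts 0..63, matching the stated guarantee. */
    static uint64_t llsl_sketch (uint32_t lo, uint32_t hi, unsigned n)
    {
        if (n == 0)
            return ((uint64_t) hi << 32) | lo;

        if (n < 32) {
            hi = (hi << n) | (lo >> (32 - n));   /* shift plus remainder */
            lo <<= n;
        } else {
            hi = lo << (n - 32);                 /* whole low word moves up */
            lo = 0;
        }
        return ((uint64_t) hi << 32) | lo;
    }

The llsr and lasr routines are mirror images, substituting a logical or
arithmetic right shift on the high word.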
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/parity.S gcc-11-20201220/libgcc/config/arm/cm0/parity.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/parity.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/parity.S	2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,122 @@
+/* parity.S: Cortex M0 optimized parity functions
+
+   Copyright (C) 2020-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_paritydi2
+   
+// int __paritydi2(long long)
+// Returns '0' if the number of bits set in $r1:$r0 is even, and '1' otherwise.
+// Returns the result in $r0.
+.section .text.sorted.libgcc.paritydi2,"x"
+CM0_FUNC_START paritydi2
+    CFI_START_FUNCTION
+    
+        // Combine the upper and lower words, then fall through. 
+        // Byte-endianness does not matter for this function.  
+        eors    r0,     r1
+
+#endif /* L_paritydi2 */ 
+
+
+// The implementation of __paritydi2() tightly couples with __paritysi2(),
+//  such that instructions must appear consecutively in the same memory
+//  section for proper flow control.  However, this construction inhibits
+//  the ability to discard __paritydi2() when only using __paritysi2().
+// Therefore, this block configures __paritysi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+//  version is the continuation of __paritydi2().  The standalone version must
+//  be declared WEAK, so that the combined version can supersede it and
+//  provide both symbols when required.
+// '_paritysi2' should appear before '_paritydi2' in LIB1ASMFUNCS.
+#if defined(L_paritysi2) || defined(L_paritydi2) 
+
+#ifdef L_paritysi2            
+// int __paritysi2(int)
+// Returns '0' if the number of bits set in $r0 is even, and '1' otherwise.
+// Returns the result in $r0.
+// Uses $r2 as scratch space.
+.section .text.sorted.libgcc.paritysi2,"x"
+CM0_WEAK_START paritysi2
+    CFI_START_FUNCTION
+
+#else /* L_paritydi2 */
+CM0_FUNC_START paritysi2
+
+#endif
+
+  #if defined(__thumb__) && __thumb__
+    #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+
+        // Size optimized: 16 bytes, 40 cycles
+        // Speed optimized: 24 bytes, 14 cycles
+        movs    r2,     #16 
+        
+    LSYM(__parity_loop):
+        // Calculate the parity of successively smaller half-words into the MSB.  
+        movs    r1,     r0 
+        lsls    r1,     r2 
+        eors    r0,     r1 
+        lsrs    r2,     #1 
+        bne     LSYM(__parity_loop)
+   
+    #else /* !__OPTIMIZE_SIZE__ */
+        
+        // Unroll the loop.  The 'libgcc' reference C implementation replaces 
+        //  the x2 and the x1 shifts with a constant.  However, since it takes 
+        //  4 cycles to load, index, and mask the constant result, it doesn't 
+        //  cost anything to keep shifting (and saves a few bytes).  
+        lsls    r1,     r0,     #16 
+        eors    r0,     r1 
+        lsls    r1,     r0,     #8 
+        eors    r0,     r1 
+        lsls    r1,     r0,     #4 
+        eors    r0,     r1 
+        lsls    r1,     r0,     #2 
+        eors    r0,     r1 
+        lsls    r1,     r0,     #1 
+        eors    r0,     r1 
+        
+    #endif /* !__OPTIMIZE_SIZE__ */
+  #else /* !__thumb__ */
+   
+        eors    r0,    r0,     r0,     lsl #16
+        eors    r0,    r0,     r0,     lsl #8
+        eors    r0,    r0,     r0,     lsl #4
+        eors    r0,    r0,     r0,     lsl #2
+        eors    r0,    r0,     r0,     lsl #1
+
+  #endif /* !__thumb__ */
+ 
+        lsrs    r0,     #31 
+        RET
+        
+    CFI_END_FUNCTION
+CM0_FUNC_END paritysi2
+
+#ifdef L_paritydi2
+CM0_FUNC_END paritydi2
+#endif 
+
+#endif /* L_paritysi2 || L_paritydi2 */
+
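
Both parity variants above fold the word onto itself so that after the last
step the parity of all 32 bits sits in the MSB, which the final 'lsrs #31'
extracts; __paritydi2 simply XORs the two halves first.  A C sketch with
illustrative names:

    #include <stdint.h>

    static unsigned parity32_sketch (uint32_t x)
    {
        x ^= x << 16;    /* each fold accumulates parity toward higher bits */
        x ^= x << 8;
        x ^= x << 4;
        x ^= x << 2;
        x ^= x << 1;
        return x >> 31;  /* parity of all 32 bits ends up in the MSB */
    }

    static unsigned parity64_sketch (uint64_t x)
    {
        return parity32_sketch ((uint32_t) x ^ (uint32_t) (x >> 32));
    }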
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/popcnt.S gcc-11-20201220/libgcc/config/arm/cm0/popcnt.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/popcnt.S	1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/popcnt.S	2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,199 @@
+/* popcnt.S: Cortex M0 optimized popcount functions
+
+   Copyright (C) 2020-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_popcountdi2
+   
+// int __popcountdi2(long long)
+// Returns the number of bits set in $r1:$r0.
+// Returns the result in $r0.
+.section .text.sorted.libgcc.popcountdi2,"x"
+CM0_FUNC_START popcountdi2
+    CFI_START_FUNCTION
+    
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Initialize the result.
+        // Compensate for the two extra loops (one for each word)
+        //  required to detect zero arguments.  
+        movs    r2,     #2
+
+    LSYM(__popcountd_loop):
+        // Same as __popcounts_loop below, except for $r1.
+        subs    r2,     #1
+        subs    r3,     r1,     #1
+        ands    r1,     r3 
+        bcs     LSYM(__popcountd_loop)
+        
+        // Repeat the operation for the second word.  
+        b       LSYM(__popcounts_loop)
+
+  #else /* !__OPTIMIZE_SIZE__ */
+        // Load the one-bit alternating mask.
+        ldr     r3,     LSYM(__popcount_1b)
+
+        // Reduce the second word. 
+        lsrs    r2,     r1,     #1
+        ands    r2,     r3
+        subs    r1,     r2 
+
+        // Reduce the first word. 
+        lsrs    r2,     r0,     #1
+        ands    r2,     r3
+        subs    r0,     r2 
+
+        // Load the two-bit alternating mask. 
+        ldr     r3,     LSYM(__popcount_2b)
+
+        // Reduce the second word.
+        lsrs    r2,     r1,     #2
+        ands    r2,     r3
+        ands    r1,     r3
+        adds    r1,     r2
+
+        // Reduce the first word. 
+        lsrs    r2,     r0,     #2
+        ands    r2,     r3
+        ands    r0,     r3
+        adds    r0,     r2    
+
+        // There will be a maximum of 8 bits in each 4-bit field.   
+        // Jump into the single word flow to combine and complete.
+        b       LSYM(__popcounts_merge)
+
+  #endif /* !__OPTIMIZE_SIZE__ */
+#endif /* L_popcountdi2 */ 
+
+
+// The implementation of __popcountdi2() tightly couples with __popcountsi2(),
+//  such that instructions must appear consecutively in the same memory
+//  section for proper flow control.  However, this construction inhibits
+//  the ability to discard __popcountdi2() when only using __popcountsi2().
+// Therefore, this block configures __popcountsi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+//  version is the continuation of __popcountdi2().  The standalone version must
+//  be declared WEAK, so that the combined version can supersede it and
+//  provide both symbols when required.
+// '_popcountsi2' should appear before '_popcountdi2' in LIB1ASMFUNCS.
+#if defined(L_popcountsi2) || defined(L_popcountdi2) 
+
+#ifdef L_popcountsi2            
+// int __popcountsi2(int)
+// Returns the number of bits set in $r0.
+// Returns the result in $r0.
+// Uses $r2 as scratch space.
+.section .text.sorted.libgcc.popcountsi2,"x"
+CM0_WEAK_START popcountsi2
+    CFI_START_FUNCTION
+
+#else /* L_popcountdi2 */
+CM0_FUNC_START popcountsi2
+
+#endif
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Initialize the result.
+        // Compensate for the extra loop required to detect zero.
+        movs    r2,     #1
+
+        // Kernighan's algorithm for __popcount(x): 
+        //     for (c = 0; x; c++)
+        //         x &= x - 1;
+
+    LSYM(__popcounts_loop):
+        // Every loop counts for a '1' set in the argument.  
+        // Count down since it's easier to initialize positive compensation, 
+        //  and the negation before function return is free.  
+        subs    r2,     #1
+
+        // Clear one bit per loop.  
+        subs    r3,     r0,     #1
+        ands    r0,     r3 
+
+        // If this is a test for zero, it will be impossible to distinguish
+        //  between zero and one bits set: both terminate after one loop.  
+        // Instead, subtraction underflow flags when zero entered the loop.
+        bcs     LSYM(__popcountd_loop)
+       
+        // Invert the result, since we have been counting negative.   
+        rsbs    r0,     r2,     #0 
+        RET
+
+  #else /* !__OPTIMIZE_SIZE__ */
+
+        // Load the one-bit alternating mask.
+        ldr     r3,     LSYM(__popcount_1b)
+
+        // Reduce the word. 
+        lsrs    r1,     r0,     #1
+        ands    r1,     r3
+        subs    r0,     r1 
+
+        // Load the two-bit alternating mask. 
+        ldr     r3,     LSYM(__popcount_2b)
+
+        // Reduce the word. 
+        lsrs    r1,     r0,     #2
+        ands    r0,     r3
+        ands    r1,     r3
+    LSYM(__popcounts_merge):
+        adds    r0,     r1
+
+        // Load the four-bit alternating mask.  
+        ldr     r3,     LSYM(__popcount_4b)
+
+        // Reduce the word. 
+        lsrs    r1,     r0,     #4
+        ands    r0,     r3
+        ands    r1,     r3
+        adds    r0,     r1
+
+        // Accumulate individual byte sums into the MSB.
+        lsls    r1,     r0,     #8
+        adds    r0,     r1 
+        lsls    r1,     r0,     #16
+        adds    r0,     r1
+
+        // Isolate the cumulative sum.
+        lsrs    r0,     #24
+        RET
+
+        .align 2
+    LSYM(__popcount_1b):
+        .word 0x55555555
+    LSYM(__popcount_2b):
+        .word 0x33333333
+    LSYM(__popcount_4b):
+        .word 0x0F0F0F0F
+        
+  #endif /* !__OPTIMIZE_SIZE__ */
+
+    CFI_END_FUNCTION
+CM0_FUNC_END popcountsi2
+
+#ifdef L_popcountdi2
+CM0_FUNC_END popcountdi2
+#endif
+
+#endif /* L_popcountsi2 || L_popcountdi2 */
+
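
The two popcount paths above are Kernighan's clear-the-lowest-set-bit loop
(size-optimized) and the classic mask-and-add reduction into 2-, 4-, then
8-bit fields followed by a shift-add byte accumulation (speed-optimized).
C sketches of both, with illustrative names:

    #include <stdint.h>

    /* Size path: one iteration per set bit. */
    static int popcount32_loop (uint32_t x)
    {
        int c = 0;
        while (x) {
            x &= x - 1;      /* clear the lowest set bit */
            c++;
        }
        return c;
    }

    /* Speed path: reduction with the 0x55/0x33/0x0F masks, then accumulate
       the per-byte sums into the most significant byte and shift down. */
    static int popcount32_swar (uint32_t x)
    {
        x -= (x >> 1) & 0x55555555u;
        x  = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
        x  = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);
        x += x << 8;         /* byte sums accumulate upward ...   */
        x += x << 16;        /* ... into the most significant byte */
        return x >> 24;
    }

__popcountdi2 feeds the second word through the same masks before the merge.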
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/lib1funcs.S gcc-11-20201220/libgcc/config/arm/lib1funcs.S
--- gcc-11-20201220-clean/libgcc/config/arm/lib1funcs.S	2020-12-20 14:32:15.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/lib1funcs.S	2021-01-06 02:45:47.436262144 -0800
@@ -1050,6 +1050,10 @@
 /* ------------------------------------------------------------------------ */
 /*		Start of the Real Functions				    */
 /* ------------------------------------------------------------------------ */
+
+/* Disable these functions for v6m in favor of the versions below */
+#ifndef NOT_ISA_TARGET_32BIT
+
 #ifdef L_udivsi3
 
 #if defined(__prefer_thumb__)
@@ -1455,6 +1459,8 @@
 	DIV_FUNC_END modsi3 signed
 
 #endif /* L_modsi3 */
+#endif /* NOT_ISA_TARGET_32BIT */
+
 /* ------------------------------------------------------------------------ */
 #ifdef L_dvmd_tls
 
@@ -1472,7 +1478,8 @@
 	FUNC_END div0
 #endif
 	
-#endif /* L_divmodsi_tools */
+#endif /* L_dvmd_tls */
+
 /* ------------------------------------------------------------------------ */
 #ifdef L_dvmd_lnx
 @ GNU/Linux division-by zero handler.  Used in place of L_dvmd_tls
@@ -1509,6 +1516,7 @@
 #endif
 	
 #endif /* L_dvmd_lnx */
+
 #ifdef L_clear_cache
 #if defined __ARM_EABI__ && defined __linux__
 @ EABI GNU/Linux call to cacheflush syscall.
@@ -1584,305 +1592,12 @@
    case of logical shifts) or the sign (for asr).  */
 
 #ifdef __ARMEB__
-#define al	r1
-#define ah	r0
-#else
-#define al	r0
-#define ah	r1
-#endif
-
-/* Prevent __aeabi double-word shifts from being produced on SymbianOS.  */
-#ifndef __symbian__
-
-#ifdef L_lshrdi3
-
-	FUNC_START lshrdi3
-	FUNC_ALIAS aeabi_llsr lshrdi3
-	
-#ifdef __thumb__
-	lsrs	al, r2
-	movs	r3, ah
-	lsrs	ah, r2
-	mov	ip, r3
-	subs	r2, #32
-	lsrs	r3, r2
-	orrs	al, r3
-	negs	r2, r2
-	mov	r3, ip
-	lsls	r3, r2
-	orrs	al, r3
-	RET
+#define al      r1
+#define ah      r0
 #else
-	subs	r3, r2, #32
-	rsb	ip, r2, #32
-	movmi	al, al, lsr r2
-	movpl	al, ah, lsr r3
-	orrmi	al, al, ah, lsl ip
-	mov	ah, ah, lsr r2
-	RET
-#endif
-	FUNC_END aeabi_llsr
-	FUNC_END lshrdi3
-
-#endif
-	
-#ifdef L_ashrdi3
-	
-	FUNC_START ashrdi3
-	FUNC_ALIAS aeabi_lasr ashrdi3
-	
-#ifdef __thumb__
-	lsrs	al, r2
-	movs	r3, ah
-	asrs	ah, r2
-	subs	r2, #32
-	@ If r2 is negative at this point the following step would OR
-	@ the sign bit into all of AL.  That's not what we want...
-	bmi	1f
-	mov	ip, r3
-	asrs	r3, r2
-	orrs	al, r3
-	mov	r3, ip
-1:
-	negs	r2, r2
-	lsls	r3, r2
-	orrs	al, r3
-	RET
-#else
-	subs	r3, r2, #32
-	rsb	ip, r2, #32
-	movmi	al, al, lsr r2
-	movpl	al, ah, asr r3
-	orrmi	al, al, ah, lsl ip
-	mov	ah, ah, asr r2
-	RET
-#endif
-
-	FUNC_END aeabi_lasr
-	FUNC_END ashrdi3
-
-#endif
-
-#ifdef L_ashldi3
-
-	FUNC_START ashldi3
-	FUNC_ALIAS aeabi_llsl ashldi3
-	
-#ifdef __thumb__
-	lsls	ah, r2
-	movs	r3, al
-	lsls	al, r2
-	mov	ip, r3
-	subs	r2, #32
-	lsls	r3, r2
-	orrs	ah, r3
-	negs	r2, r2
-	mov	r3, ip
-	lsrs	r3, r2
-	orrs	ah, r3
-	RET
-#else
-	subs	r3, r2, #32
-	rsb	ip, r2, #32
-	movmi	ah, ah, lsl r2
-	movpl	ah, al, lsl r3
-	orrmi	ah, ah, al, lsr ip
-	mov	al, al, lsl r2
-	RET
+#define al      r0
+#define ah      r1
 #endif
-	FUNC_END aeabi_llsl
-	FUNC_END ashldi3
-
-#endif
-
-#endif /* __symbian__ */
-
-#ifdef L_clzsi2
-#ifdef NOT_ISA_TARGET_32BIT
-FUNC_START clzsi2
-	movs	r1, #28
-	movs	r3, #1
-	lsls	r3, r3, #16
-	cmp	r0, r3 /* 0x10000 */
-	bcc	2f
-	lsrs	r0, r0, #16
-	subs	r1, r1, #16
-2:	lsrs	r3, r3, #8
-	cmp	r0, r3 /* #0x100 */
-	bcc	2f
-	lsrs	r0, r0, #8
-	subs	r1, r1, #8
-2:	lsrs	r3, r3, #4
-	cmp	r0, r3 /* #0x10 */
-	bcc	2f
-	lsrs	r0, r0, #4
-	subs	r1, r1, #4
-2:	adr	r2, 1f
-	ldrb	r0, [r2, r0]
-	adds	r0, r0, r1
-	bx lr
-.align 2
-1:
-.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
-	FUNC_END clzsi2
-#else
-ARM_FUNC_START clzsi2
-# if defined (__ARM_FEATURE_CLZ)
-	clz	r0, r0
-	RET
-# else
-	mov	r1, #28
-	cmp	r0, #0x10000
-	do_it	cs, t
-	movcs	r0, r0, lsr #16
-	subcs	r1, r1, #16
-	cmp	r0, #0x100
-	do_it	cs, t
-	movcs	r0, r0, lsr #8
-	subcs	r1, r1, #8
-	cmp	r0, #0x10
-	do_it	cs, t
-	movcs	r0, r0, lsr #4
-	subcs	r1, r1, #4
-	adr	r2, 1f
-	ldrb	r0, [r2, r0]
-	add	r0, r0, r1
-	RET
-.align 2
-1:
-.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
-# endif /* !defined (__ARM_FEATURE_CLZ) */
-	FUNC_END clzsi2
-#endif
-#endif /* L_clzsi2 */
-
-#ifdef L_clzdi2
-#if !defined (__ARM_FEATURE_CLZ)
-
-# ifdef NOT_ISA_TARGET_32BIT
-FUNC_START clzdi2
-	push	{r4, lr}
-	cmp	xxh, #0
-	bne	1f
-#  ifdef __ARMEB__
-	movs	r0, xxl
-	bl	__clzsi2
-	adds	r0, r0, #32
-	b 2f
-1:
-	bl	__clzsi2
-#  else
-	bl	__clzsi2
-	adds	r0, r0, #32
-	b 2f
-1:
-	movs	r0, xxh
-	bl	__clzsi2
-#  endif
-2:
-	pop	{r4, pc}
-# else /* NOT_ISA_TARGET_32BIT */
-ARM_FUNC_START clzdi2
-	do_push	{r4, lr}
-	cmp	xxh, #0
-	bne	1f
-#  ifdef __ARMEB__
-	mov	r0, xxl
-	bl	__clzsi2
-	add	r0, r0, #32
-	b 2f
-1:
-	bl	__clzsi2
-#  else
-	bl	__clzsi2
-	add	r0, r0, #32
-	b 2f
-1:
-	mov	r0, xxh
-	bl	__clzsi2
-#  endif
-2:
-	RETLDM	r4
-	FUNC_END clzdi2
-# endif /* NOT_ISA_TARGET_32BIT */
-
-#else /* defined (__ARM_FEATURE_CLZ) */
-
-ARM_FUNC_START clzdi2
-	cmp	xxh, #0
-	do_it	eq, et
-	clzeq	r0, xxl
-	clzne	r0, xxh
-	addeq	r0, r0, #32
-	RET
-	FUNC_END clzdi2
-
-#endif
-#endif /* L_clzdi2 */
-
-#ifdef L_ctzsi2
-#ifdef NOT_ISA_TARGET_32BIT
-FUNC_START ctzsi2
-	negs	r1, r0
-	ands	r0, r0, r1
-	movs	r1, #28
-	movs	r3, #1
-	lsls	r3, r3, #16
-	cmp	r0, r3 /* 0x10000 */
-	bcc	2f
-	lsrs	r0, r0, #16
-	subs	r1, r1, #16
-2:	lsrs	r3, r3, #8
-	cmp	r0, r3 /* #0x100 */
-	bcc	2f
-	lsrs	r0, r0, #8
-	subs	r1, r1, #8
-2:	lsrs	r3, r3, #4
-	cmp	r0, r3 /* #0x10 */
-	bcc	2f
-	lsrs	r0, r0, #4
-	subs	r1, r1, #4
-2:	adr	r2, 1f
-	ldrb	r0, [r2, r0]
-	subs	r0, r0, r1
-	bx lr
-.align 2
-1:
-.byte	27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31
-	FUNC_END ctzsi2
-#else
-ARM_FUNC_START ctzsi2
-	rsb	r1, r0, #0
-	and	r0, r0, r1
-# if defined (__ARM_FEATURE_CLZ)
-	clz	r0, r0
-	rsb	r0, r0, #31
-	RET
-# else
-	mov	r1, #28
-	cmp	r0, #0x10000
-	do_it	cs, t
-	movcs	r0, r0, lsr #16
-	subcs	r1, r1, #16
-	cmp	r0, #0x100
-	do_it	cs, t
-	movcs	r0, r0, lsr #8
-	subcs	r1, r1, #8
-	cmp	r0, #0x10
-	do_it	cs, t
-	movcs	r0, r0, lsr #4
-	subcs	r1, r1, #4
-	adr	r2, 1f
-	ldrb	r0, [r2, r0]
-	sub	r0, r0, r1
-	RET
-.align 2
-1:
-.byte	27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31
-# endif /* !defined (__ARM_FEATURE_CLZ) */
-	FUNC_END ctzsi2
-#endif
-#endif /* L_clzsi2 */
 
 /* ------------------------------------------------------------------------ */
 /* These next two sections are here despite the fact that they contain Thumb 
@@ -2190,4 +1905,77 @@
 #else /* NOT_ISA_TARGET_32BIT */
 #include "bpabi-v6m.S"
 #endif /* NOT_ISA_TARGET_32BIT */
+
+
+/* Temp registers. */
+#define rP r4
+#define rQ r5
+#define rS r6
+#define rT r7
+
+.macro CM0_FUNC_START name
+.global SYM(__\name)
+.type SYM(__\name),function
+THUMB_CODE
+THUMB_FUNC
+.align 1
+    SYM(__\name):
+.endm
+
+.macro CM0_WEAK_START name 
+.weak SYM(__\name)
+CM0_FUNC_START \name 
+.endm
+
+.macro CM0_FUNC_ALIAS new old
+.global	SYM (__\new)
+.thumb_set SYM (__\new), SYM (__\old)
+.endm
+
+.macro CM0_WEAK_ALIAS new old
+.weak SYM(__\new)
+CM0_FUNC_ALIAS \new \old 
+.endm
+
+.macro CM0_FUNC_END name
+.size SYM(__\name), . - SYM(__\name)
+.endm
+
+#include "cm0/fplib.h"
+
+/* These have no conflicts with existing ARM implementations, 
+    so these files can be built for all architectures. */
+#include "cm0/ctz2.S"
+#include "cm0/clz2.S"
+#include "cm0/lcmp.S"
+#include "cm0/lmul.S"
+#include "cm0/lshift.S"
+#include "cm0/parity.S"
+#include "cm0/popcnt.S"
+
+#ifdef NOT_ISA_TARGET_32BIT 
+
+/* These have existing ARM implementations that may be preferred 
+    for non-v6m architectures.  For example, use of the hardware 
+    instructions for 'clz' and 'umull'/'smull'.  Comprehensive 
+    integration may be possible in the future. */
+#include "cm0/idiv.S"
+#include "cm0/ldiv.S"
+
+#include "cm0/fcmp.S"
+
+/* Section names in the following files are selected to maximize 
+    the utility of +/- 256 byte conditional branches. */
+#include "cm0/fneg.S"
+#include "cm0/fadd.S"
+#include "cm0/futil.S"
+#include "cm0/fmul.S"
+#include "cm0/fdiv.S"
+
+#include "cm0/ffloat.S"
+#include "cm0/ffixed.S"
+#include "cm0/fconv.S"
+
+#endif /* NOT_ISA_TARGET_32BIT */ 
+
 #endif /* !__symbian__ */
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/t-elf gcc-11-20201220/libgcc/config/arm/t-elf
--- gcc-11-20201220-clean/libgcc/config/arm/t-elf	2020-12-20 14:32:15.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/t-elf	2021-01-06 02:45:47.436262144 -0800
@@ -10,23 +10,31 @@
 # inclusion create when only multiplication is used, thus avoiding pulling in
 # useless division code.
 ifneq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA))
-LIB1ASMFUNCS += _arm_muldf3 _arm_mulsf3
+LIB1ASMFUNCS += _arm_muldf3
 endif
 endif # !__symbian__
 
+
+# Preferred WEAK implementations should appear first.  See implementation notes.
+LIB1ASMFUNCS += _arm_mulsf3 _arm_addsf3 _umulsidi3 _arm_floatsisf _arm_floatundisf \
+	_clzsi2 _ctzsi2 _ffssi2 _clrsbsi2 _paritysi2 _popcountsi2 
+
+
 # For most CPUs we have an assembly soft-float implementations.
-# However this is not true for ARMv6M.  Here we want to use the soft-fp C
-# implementation.  The soft-fp code is only build for ARMv6M.  This pulls
-# in the asm implementation for other CPUs.
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func \
-	_call_via_rX _interwork_call_via_rX \
-	_lshrdi3 _ashrdi3 _ashldi3 \
+LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _udivdi3 _divdi3 \
+	_dvmd_tls _bb_init_func _call_via_rX _interwork_call_via_rX \
+	_lshrdi3 _ashrdi3 _ashldi3 _mulsidi3 _muldi3 \
+	_arm_lcmp _cmpdi2 _arm_ulcmp _ucmpdi2 \
 	_arm_negdf2 _arm_addsubdf3 _arm_muldivdf3 _arm_cmpdf2 _arm_unorddf2 \
-	_arm_fixdfsi _arm_fixunsdfsi \
-	_arm_truncdfsf2 _arm_negsf2 _arm_addsubsf3 _arm_muldivsf3 \
-	_arm_cmpsf2 _arm_unordsf2 _arm_fixsfsi _arm_fixunssfsi \
-	_arm_floatdidf _arm_floatdisf _arm_floatundidf _arm_floatundisf \
-	_clzsi2 _clzdi2 _ctzsi2
+	_arm_fixdfsi _arm_fixunsdfsi _arm_fixsfsi _arm_fixunssfsi \
+	_arm_f2h _arm_h2f _arm_d2f _arm_f2d _arm_truncdfsf2 \
+	_arm_negsf2 _arm_addsubsf3 _arm_frsubsf3 _arm_divsf3 _arm_muldivsf3 \
+	_arm_cmpsf2 _arm_unordsf2 _arm_eqsf2 _arm_gesf2 \
+ 	_arm_fcmpeq _arm_fcmpne _arm_fcmplt _arm_fcmple _arm_fcmpge _arm_fcmpgt \
+	_arm_cfcmpeq _arm_cfcmple _arm_cfrcmple \
+	_arm_floatdidf _arm_floatundidf _arm_floatdisf _arm_floatunsisf \
+	_clzdi2 _ctzdi2 _ffsdi2 _clrsbdi2 _paritydi2 _popcountdi2 
+
 
 # Currently there is a bug somewhere in GCC's alias analysis
 # or scheduling code that is breaking _fpmul_parts in fp-bit.c.

Thread overview: 26+ messages
2020-11-12 23:04 [PATCH] " Daniel Engel
2020-11-26  9:14 ` Christophe Lyon
2020-12-02  3:32   ` Daniel Engel
2020-12-16 17:15     ` Christophe Lyon
2021-01-06 11:20       ` Daniel Engel [this message]
2021-01-06 17:05         ` [PATCH v3] " Richard Earnshaw
2021-01-07  0:59           ` Daniel Engel
2021-01-07 12:56             ` Richard Earnshaw
2021-01-07 13:27               ` Christophe Lyon
2021-01-07 16:44                 ` Richard Earnshaw
2021-01-09 12:28               ` Daniel Engel
2021-01-09 13:09                 ` Christophe Lyon
2021-01-09 18:04                   ` Daniel Engel
2021-01-11 14:49                     ` Richard Earnshaw
2021-01-09 18:48                   ` Daniel Engel
2021-01-11 16:07                   ` Christophe Lyon
2021-01-11 16:18                     ` Daniel Engel
2021-01-11 16:39                       ` Christophe Lyon
2021-01-15 11:40                         ` Daniel Engel
2021-01-15 12:30                           ` Christophe Lyon
2021-01-16 16:14                             ` Daniel Engel
2021-01-21 10:29                               ` Christophe Lyon
2021-01-21 20:35                                 ` Daniel Engel
2021-01-22 18:28                                   ` Christophe Lyon
2021-01-25 17:48                                     ` Christophe Lyon
2021-01-25 23:36                                       ` Daniel Engel
