From: "Daniel Engel" <libgcc@danielengel.com>
To: "Christophe Lyon" <christophe.lyon@linaro.org>
Cc: "gcc Patches" <gcc-patches@gcc.gnu.org>
Subject: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
Date: Wed, 06 Jan 2021 03:20:18 -0800
Message-ID: <962a0e7d-f431-42ee-aa42-e4e4cc823a10@www.fastmail.com>
In-Reply-To: <CAKdteObdRur2SfYQ3TPQ7soRCsS9d++krfq9138jWvbu+JFLxA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 17488 bytes --]
Hi Christophe,
On Wed, Dec 16, 2020, at 9:15 AM, Christophe Lyon wrote:
> On Wed, 2 Dec 2020 at 04:31, Daniel Engel <libgcc@danielengel.com> wrote:
> >
> > Hi Christophe,
> >
> > On Thu, Nov 26, 2020, at 1:14 AM, Christophe Lyon wrote:
> > > Hi,
> > >
> > > On Fri, 13 Nov 2020 at 00:03, Daniel Engel <libgcc@danielengel.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > This patch adds an efficient assembly-language implementation of IEEE-
> > > > 754 compliant floating point routines for Cortex M0 EABI (v6m, thumb-
> > > > 1). This is the libgcc portion of a larger library originally
> > > > described in 2018:
> > > >
> > > > https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
> > > >
> > > > Since that time, I've separated the libm functions for submission to
> > > > newlib. The remaining libgcc functions in the attached patch have
> > > > the following characteristics:
> > > >
> > > > Function(s)                     Size (bytes)        Cycles           Stack  Accuracy
> > > > __clzsi2                        42                  23               0      exact
> > > > __clzsi2 (OPTIMIZE_SIZE)        22                  55               0      exact
> > > > __clzdi2                        8+__clzsi2          4+__clzsi2       0      exact
> > > >
> > > > __umulsidi3                     44                  24               0      exact
> > > > __mulsidi3                      30+__umulsidi3      24+__umulsidi3   8      exact
> > > > __muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3    0      exact
> > > > __ashldi3 (__aeabi_llsl)        22                  13               0      exact
> > > > __lshrdi3 (__aeabi_llsr)        22                  13               0      exact
> > > > __ashrdi3 (__aeabi_lasr)        22                  13               0      exact
> > > >
> > > > __aeabi_lcmp                    20                  13               0      exact
> > > > __aeabi_ulcmp                   16                  10               0      exact
> > > >
> > > > __udivsi3 (__aeabi_uidiv)       56                  72 – 385         0      < 1 lsb
> > > > __divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3     8      < 1 lsb
> > > > __udivdi3 (__aeabi_uldiv)       164                 103 – 1394       16     < 1 lsb
> > > > __udivdi3 (OPTIMIZE_SIZE)       142                 120 – 1392       16     < 1 lsb
> > > > __divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3     32     < 1 lsb
> > > >
> > > > __shared_float                  178
> > > > __shared_float (OPTIMIZE_SIZE)  154
> > > >
> > > > __addsf3 (__aeabi_fadd)         116+__shared_float  31 – 76          8      <= 0.5 ulp
> > > > __addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74               8      <= 0.5 ulp
> > > > __subsf3 (__aeabi_fsub)         8+__addsf3          6+__addsf3       8      <= 0.5 ulp
> > > > __aeabi_frsub                   8+__addsf3          6+__addsf3       8      <= 0.5 ulp
> > > > __mulsf3 (__aeabi_fmul)         112+__shared_float  73 – 97          8      <= 0.5 ulp
> > > > __mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93               8      <= 0.5 ulp
> > > > __divsf3 (__aeabi_fdiv)         132+__shared_float  83 – 361         8      <= 0.5 ulp
> > > > __divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263 – 359        8      <= 0.5 ulp
> > > >
> > > > __cmpsf2/__lesf2/__ltsf2        72                  33               0      exact
> > > > __eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2       0      exact
> > > > __gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2       0      exact
> > > > __unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2       0      exact
> > > > __aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2       0      exact
> > > > __aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2       0      exact
> > > > __aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2       0      exact
> > > > __aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2       0      exact
> > > > __aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2       0      exact
> > > >
> > > > __floatundisf (__aeabi_ul2f)    14+__shared_float   40 – 81          8      <= 0.5 ulp
> > > > __floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40 – 237         8      <= 0.5 ulp
> > > > __floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf  8      <= 0.5 ulp
> > > > __floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf  8      <= 0.5 ulp
> > > > __floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf    8      <= 0.5 ulp
> > > >
> > > > __fixsfdi (__aeabi_f2lz)        74                  27 – 33          0      exact
> > > > __fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi      0      exact
> > > > __fixsfsi (__aeabi_f2iz)        52                  19               0      exact
> > > > __fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi      0      exact
> > > > __fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi      0      exact
> > > >
> > > > __extendsfdf2 (__aeabi_f2d)     42+__shared_float   38               8      exact
> > > > __aeabi_d2f                     56+__shared_float   54 – 58          8      <= 0.5 ulp
> > > > __aeabi_h2f                     34+__shared_float   34               8      exact
> > > > __aeabi_f2h                     84                  23 – 34          0      <= 0.5 ulp
> > > >
> > > > Copyright assignment is on file with the FSF.
> > > >
> > > > I've built the gcc-arm-none-eabi cross-compiler using the 20201108
> > > > snapshot of GCC plus this patch, and successfully compiled a test
> > > > program:
> > > >
> > > > extern int main (void)
> > > > {
> > > > volatile int x = 1;
> > > > volatile unsigned long long int y = 10;
> > > > volatile long long int z = x / y; // 64-bit division
> > > >
> > > > volatile float a = x; // 32-bit casting
> > > > volatile float b = y; // 64 bit casting
> > > > volatile float c = z / b; // float division
> > > > volatile float d = a + c; // float addition
> > > > volatile float e = c * b; // float multiplication
> > > > volatile float f = d - e - c; // float subtraction
> > > >
> > > > if (f != c) // float comparison
> > > > y -= (long long int)d; // float casting
> > > > }
> > > >
> > > > As one point of comparison, the test program links to 876 bytes of
> > > > libgcc code from the patched toolchain, vs 10276 bytes from the
> > > > latest released gcc-arm-none-eabi-9-2020-q2 toolchain. That's a
> > > > 90% size reduction.
> > >
> > > This looks awesome!
> > >
> > > >
> > > > I have extensive test vectors, and have passed these tests on an
> > > > STM32F051. These vectors were derived from UCB [1], Testfloat [2],
> > > > and IEEECC754 [3] sources, plus some of my own creation.
> > > > Unfortunately, I'm not sure how "make check" should work for a cross
> > > > compiler run time library.
> > > >
> > > > Although I believe this patch can be incorporated as-is, there are
> > > > at least two points that might bear discussion:
> > > >
> > > > * I'm not sure where or how they would be integrated, but I would be
> > > > happy to provide sources for my test vectors.
> > > >
> > > > * The library is currently built for the ARM v6m architecture only.
> > > > It is likely that some of the other Cortex variants would benefit
> > > > from these routines. However, I would need some guidance on this
> > > > to proceed without introducing regressions. I do not currently
> > > > have a test strategy for architectures beyond Cortex M0, and I
> > > > have NOT profiled the existing thumb-2 implementations (ieee754-
> > > > sf.S) for comparison.
> > >
> > > I tried your patch, and I see many regressions in the GCC testsuite
> > > because many tests fail to link with errors like:
> > > ld: /gcc/thumb/v6-m/nofp/libgcc.a(_arm_cmpdf2.o): in function
> > > `__clzdi2':
> > > /libgcc/config/arm/cm0/clz2.S:39: multiple definition of
> > > `__clzdi2';/gcc/thumb/v6-m/nofp/libgcc.a(_thumb1_case_sqi.o):/libgcc/config/arm/cm0/clz2.S:39:
> > > first defined here
> > >
> > > This happens with a toolchain configured with --target arm-none-eabi,
> > > default cpu/fpu/mode,
> > > --enable-multilib --with-multilib-list=rmprofile and running the tests with
> > > -mthumb/-mcpu=cortex-m0/-mfloat-abi=soft/-march=armv6s-m
> > >
> > > Does it work for you?
> >
> > Thanks for the feedback.
> >
> > I'm afraid I'm quite ignorant as to the gcc test suite
> > infrastructure, so I don't know how to use the options you've shared
> > above. I'm cross-compiling the Windows toolchain on Ubuntu. Would
> > you mind sharing a full command line you would use for testing? The
> > toolchain is built with the default options, which includes
> > "--target arm-none-eabi".
> >
>
> Why put Windows in the picture? This seems unnecessarily
> complicated... I suggest you build your cross-toolchain on x86_64
> ubuntu and run it on x86_64 ubuntu (of course targeting arm)
Mostly because I had not previously committed the time to understand the
GCC regression test environment. My company and personal computers both
run Windows. I created an Ubuntu virtual machine for this project, and
I'd been trying to get by with the build scripts provided by the ARM
toolchain. Clearly that was insufficient.
> The above options were GCC configure options, except for the last one
> which I used when running the tests.
>
> There is some documentation about how to run the GCC testsuite there:
> https://gcc.gnu.org/install/test.html
Thanks. I was able to take this document, plus some additional pages
about constructing a combined tree with newlib, and put together a
working regression test. GDB didn't want to build cleanly at first, so
eventually I gave up and disabled that part.
> Basically 'make check' should mostly work except for execution tests
> for which you'll need to teach DejaGnu how to run the generated
> programs on a real board or on a simulator.
>
> I didn't analyze your patch, I just submitted it to my validation
> system:
> https://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/r11-5993-g159b0bd9ce263dfb791eff5133b0ca0207201c84-cortex-m0-fplib-20201130.patch2/report-build-info.html
> - the red "regressed" items indicate regressions in the testsuite. You
> can click on "log" to download the corresponding gcc.log
> - the dark-red "build broken" items indicate that the toolchain build
> failed
> - the orange "interrupted" items indicate an infrastructure problem,
> so you can ignore such cases
> - similarly the dark red "ref build failed" indicate that the
> reference build failed for some infrastructure reason
>
> for the arm-none-eabi target, several toolchain versions fail to
> build, some succeed. This is because I use different multilib
> configuration flags; it looks like the ones involving
> --with-multilib-list=rmprofile are broken with your patch.
>
> These ones should be reasonably easy to fix: no 'make check' involved.
>
> For instance if you configure GCC with:
> --target arm-none-eabi --enable-multilib --with-multilib-list=rmprofile
> you should see the build failure.
So far, I have not found a cause for the build failures you are seeing.
The ARM toolchain script I was using before did build with the
'rmprofile' option. With my current configure options, gcc builds
'rmprofile', 'aprofile', and even 'armeb'. I did find a number of link
issues with 'make check' due to incorrect usage of the 'L_' defines in
LIB1ASMFUNCS. These are fixed in the new version attached.
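To make sure I now understand the convention correctly, here is a small
C sketch of the guard pattern as I've applied it (illustrative only --
the real code is assembly, and the function bodies below are just C
analogues of what each guarded block contains). The build compiles the
same source once per entry in LIB1ASMFUNCS, defining one 'L_' macro each
time, so each object file receives only the code inside its own guard:

    /* Sketch only.  Listing "_clzsi2 _clzdi2" in LIB1ASMFUNCS causes two
       compilations of this source, with L_clzsi2 and L_clzdi2 defined
       respectively, producing _clzsi2.o and _clzdi2.o.  */

    int __clzsi2 (unsigned int x);

    #ifdef L_clzsi2
    /* Emitted only into _clzsi2.o: count leading zeros by binary search,
       roughly what the size-optimized loop in clz2.S does.  */
    int __clzsi2 (unsigned int x)
    {
      int n = 32;
      for (int shift = 16; shift != 0; shift >>= 1)
        {
          unsigned int hi = x >> shift;
          if (hi != 0)
            {
              x = hi;         /* discard the low bits */
              n -= shift;     /* that many leading zeros are gone */
            }
        }
      return n - (int) x;     /* x is now 0 or 1 */
    }
    #endif /* L_clzsi2 */

    #ifdef L_clzdi2
    /* Emitted only into _clzdi2.o.  */
    int __clzdi2 (unsigned long long x)
    {
      unsigned int hi = (unsigned int) (x >> 32);
      return hi ? __clzsi2 (hi) : 32 + __clzsi2 ((unsigned int) x);
    }
    #endif /* L_clzdi2 */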
Returning to the build failures you logged, I do consistently see this
message in the logs [1]: "fatal error: cm0/fplib.h: No such file or
directory". I recognize the file, since it's one of the new files in
my patch (the full sub-directory is libgcc/config/arm/cm0/fplib.h).
Do I have to format patches in some different way so that new files
get created?
Regression testing also showed that the previous patch was failing the
"arm/divzero" test because I wasn't providing the same arguments to
div0() as the existing implementation. Having made that change, I think
the patch is clean. (I don't think there is a strict specification for
div0(), and the changes add a non-trivial number of instructions, but
I'll hold that discussion for another time).
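For reference, the handler signatures I matched are the ones implied by
the existing lib1funcs.S code and my reading of the run-time ABI --
roughly the following, where the default handlers simply return the
value they were given (a sketch of my understanding, not a normative
definition):

    /* Divide-by-zero hooks, as I understand them: the quotient that
       would have been returned is passed in, and the default handlers
       just pass it back.  A program may override these.  */
    int __aeabi_idiv0 (int return_value);
    long long __aeabi_ldiv0 (long long return_value);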
Do you have time to re-check this patch on your build system?
Thanks,
Daniel
[1] Line 36054: <https://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/r11-5993-g159b0bd9ce263dfb791eff5133b0ca0207201c84-cortex-m0-fplib-20201130.patch2/arm-none-eabi/build-rh70-arm-none-eabi-default-default-default-mthumb.-mcpu=cortex-m0.-mfloat-abi=soft.-march=armv6s-m.log.xz>
>
> HTH
>
> Christophe
>
> > I did see similar errors once before. It turned out then that I omitted
> > one of the ".S" files from the build. My interpretation at that point
> > was that gcc had been searching multiple versions of "libgcc.a" and
> > had been unable to merge the symbols. In hindsight, that was a really bad
> > interpretation. I was able to reproduce the error above by simply
> > adding a line like "volatile double m = 1.0; m += 2;".
> >
> > After reviewing the existing asm implementations more closely, I
> > believe that I have not been using the function guard macros (L_arm_*)
> > as intended. The make script appears to compile "lib1funcs.S" dozens of
> > times -- once for each function guard macro listed in LIB1ASMFUNCS --
> > with the intent of generating a separate ".o" file for each function.
> > Because they were unguarded, my new library functions were duplicated
> > into every ".o" file, which caused the link errors you saw.
> >
> > I have attached an updated patch that implements the macros.
> >
> > However, I'm not sure whether my usage is really consistent with the
> > spirit of the make script. If there's a README or HOWTO, I haven't
> > found it yet. The following points summarize my concerns as I was
> > making these updates:
> >
> > 1. While some of the new functions (e.g. __cmpsf2) are standalone,
> > there is a common core in the new library shared by several related
> > functions. That keeps the library small. For now, I've elected to
> > group all of these related functions together in a single object
> > file "_arm_addsubsf3.o" to protect the short branches (+/-2KB)
> > within this unit. Notice that I manually assigned section names in
> > the code, so there still shouldn't be any unnecessary code linked in
> > the final build. Does the multiple-".o" files strategy predate
> > "-gc-sections", or should I be trying harder to break these related
> > functions into separate compilation units?
> >
> > 2. I introduced a few new macro keywords for functions/groups (e.g.
> > "_arm_f2h" and '_arm_f2h'. My assumption is that some empty ".o"
> > files compiled for the non-v6m architectures will be benign.
> >
> > 3. The "t-elf" make script implies that __mulsf3() should not be
> > compiled in thumb mode (it's inside a conditional), but this is one
> > of the new functions. Moot for now, since my __mulsf3() is grouped
> > with the common core functions (see point 1) and is thus currently
> > guarded by the "_arm_addsubsf3.o" macro.
> >
> > 4. The advice (in "ieee754-sf.S") regarding WEAK symbols does not seem
> > to be working. I have defined __clzsi2() as a weak symbol to be
> > overridden by the combined function __clzdi2(). I can also see
> > (with "nm") that "clzsi2.o" is compiled before "clzdi2.o" in
> > "libgcc.a". Yet, the full __clzdi2() function (8 bytes larger) is
> > always linked, even in programs that only call __clzsi2(). A minor
> > annoyance at this point.
> >
> > 5. Is there a permutation of the makefile that compiles libgcc with
> > __OPTIMIZE_SIZE__? There are a few sections in the patch that can
> > optimize either way, yet the final product only seems to have the
> > "fast" code. At this optimization level, the sample program above
> > pulls in 1012 bytes of library code instead of 836. Perhaps this is
> > meant to be controlled by the toolchain configuration step, but it
> > doesn't follow that the optimization for the cross-compiler would
> > automatically translate to the target runtime libraries.
> >
> > Thanks again,
> > Daniel
> >
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > >
> > > > I'm naturally hoping for some action on this patch before the Nov 16th deadline for GCC-11 stage 3. Please review and advise.
> > > >
> > > > Thanks,
> > > > Daniel Engel
> > > >
> > > > [1] http://www.netlib.org/fp/ucbtest.tgz
> > > > [2] http://www.jhauser.us/arithmetic/TestFloat.html
> > > > [3] http://win-www.uia.ac.be/u/cant/ieeecc754.html
> > >
>
[-- Attachment #2: cortex-m0-fplib-20210105.patch --]
[-- Type: application/octet-stream, Size: 195994 bytes --]
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/bpabi.S gcc-11-20201220/libgcc/config/arm/bpabi.S
--- gcc-11-20201220-clean/libgcc/config/arm/bpabi.S 2020-12-20 14:32:15.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/bpabi.S 2021-01-06 02:45:47.416262493 -0800
@@ -34,48 +34,6 @@
.eabi_attribute 25, 1
#endif /* __ARM_EABI__ */
-#ifdef L_aeabi_lcmp
-
-ARM_FUNC_START aeabi_lcmp
- cmp xxh, yyh
- do_it lt
- movlt r0, #-1
- do_it gt
- movgt r0, #1
- do_it ne
- RETc(ne)
- subs r0, xxl, yyl
- do_it lo
- movlo r0, #-1
- do_it hi
- movhi r0, #1
- RET
- FUNC_END aeabi_lcmp
-
-#endif /* L_aeabi_lcmp */
-
-#ifdef L_aeabi_ulcmp
-
-ARM_FUNC_START aeabi_ulcmp
- cmp xxh, yyh
- do_it lo
- movlo r0, #-1
- do_it hi
- movhi r0, #1
- do_it ne
- RETc(ne)
- cmp xxl, yyl
- do_it lo
- movlo r0, #-1
- do_it hi
- movhi r0, #1
- do_it eq
- moveq r0, #0
- RET
- FUNC_END aeabi_ulcmp
-
-#endif /* L_aeabi_ulcmp */
-
.macro test_div_by_zero signed
/* Tail-call to divide-by-zero handlers which may be overridden by the user,
so unwinding works properly. */
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/bpabi-v6m.S gcc-11-20201220/libgcc/config/arm/bpabi-v6m.S
--- gcc-11-20201220-clean/libgcc/config/arm/bpabi-v6m.S 2020-12-20 14:32:15.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/bpabi-v6m.S 2021-01-06 02:45:47.428262284 -0800
@@ -33,212 +33,6 @@
.eabi_attribute 25, 1
#endif /* __ARM_EABI__ */
-#ifdef L_aeabi_lcmp
-
-FUNC_START aeabi_lcmp
- cmp xxh, yyh
- beq 1f
- bgt 2f
- movs r0, #1
- negs r0, r0
- RET
-2:
- movs r0, #1
- RET
-1:
- subs r0, xxl, yyl
- beq 1f
- bhi 2f
- movs r0, #1
- negs r0, r0
- RET
-2:
- movs r0, #1
-1:
- RET
- FUNC_END aeabi_lcmp
-
-#endif /* L_aeabi_lcmp */
-
-#ifdef L_aeabi_ulcmp
-
-FUNC_START aeabi_ulcmp
- cmp xxh, yyh
- bne 1f
- subs r0, xxl, yyl
- beq 2f
-1:
- bcs 1f
- movs r0, #1
- negs r0, r0
- RET
-1:
- movs r0, #1
-2:
- RET
- FUNC_END aeabi_ulcmp
-
-#endif /* L_aeabi_ulcmp */
-
-.macro test_div_by_zero signed
- cmp yyh, #0
- bne 7f
- cmp yyl, #0
- bne 7f
- cmp xxh, #0
- .ifc \signed, unsigned
- bne 2f
- cmp xxl, #0
-2:
- beq 3f
- movs xxh, #0
- mvns xxh, xxh @ 0xffffffff
- movs xxl, xxh
-3:
- .else
- blt 6f
- bgt 4f
- cmp xxl, #0
- beq 5f
-4: movs xxl, #0
- mvns xxl, xxl @ 0xffffffff
- lsrs xxh, xxl, #1 @ 0x7fffffff
- b 5f
-6: movs xxh, #0x80
- lsls xxh, xxh, #24 @ 0x80000000
- movs xxl, #0
-5:
- .endif
- @ tailcalls are tricky on v6-m.
- push {r0, r1, r2}
- ldr r0, 1f
- adr r1, 1f
- adds r0, r1
- str r0, [sp, #8]
- @ We know we are not on armv4t, so pop pc is safe.
- pop {r0, r1, pc}
- .align 2
-1:
- .word __aeabi_ldiv0 - 1b
-7:
-.endm
-
-#ifdef L_aeabi_ldivmod
-
-FUNC_START aeabi_ldivmod
- test_div_by_zero signed
-
- push {r0, r1}
- mov r0, sp
- push {r0, lr}
- ldr r0, [sp, #8]
- bl SYM(__gnu_ldivmod_helper)
- ldr r3, [sp, #4]
- mov lr, r3
- add sp, sp, #8
- pop {r2, r3}
- RET
- FUNC_END aeabi_ldivmod
-
-#endif /* L_aeabi_ldivmod */
-
-#ifdef L_aeabi_uldivmod
-
-FUNC_START aeabi_uldivmod
- test_div_by_zero unsigned
-
- push {r0, r1}
- mov r0, sp
- push {r0, lr}
- ldr r0, [sp, #8]
- bl SYM(__udivmoddi4)
- ldr r3, [sp, #4]
- mov lr, r3
- add sp, sp, #8
- pop {r2, r3}
- RET
- FUNC_END aeabi_uldivmod
-
-#endif /* L_aeabi_uldivmod */
-
-#ifdef L_arm_addsubsf3
-
-FUNC_START aeabi_frsub
-
- push {r4, lr}
- movs r4, #1
- lsls r4, #31
- eors r0, r0, r4
- bl __aeabi_fadd
- pop {r4, pc}
-
- FUNC_END aeabi_frsub
-
-#endif /* L_arm_addsubsf3 */
-
-#ifdef L_arm_cmpsf2
-
-FUNC_START aeabi_cfrcmple
-
- mov ip, r0
- movs r0, r1
- mov r1, ip
- b 6f
-
-FUNC_START aeabi_cfcmpeq
-FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq
-
- @ The status-returning routines are required to preserve all
- @ registers except ip, lr, and cpsr.
-6: push {r0, r1, r2, r3, r4, lr}
- bl __lesf2
- @ Set the Z flag correctly, and the C flag unconditionally.
- cmp r0, #0
- @ Clear the C flag if the return value was -1, indicating
- @ that the first operand was smaller than the second.
- bmi 1f
- movs r1, #0
- cmn r0, r1
-1:
- pop {r0, r1, r2, r3, r4, pc}
-
- FUNC_END aeabi_cfcmple
- FUNC_END aeabi_cfcmpeq
- FUNC_END aeabi_cfrcmple
-
-FUNC_START aeabi_fcmpeq
-
- push {r4, lr}
- bl __eqsf2
- negs r0, r0
- adds r0, r0, #1
- pop {r4, pc}
-
- FUNC_END aeabi_fcmpeq
-
-.macro COMPARISON cond, helper, mode=sf2
-FUNC_START aeabi_fcmp\cond
-
- push {r4, lr}
- bl __\helper\mode
- cmp r0, #0
- b\cond 1f
- movs r0, #0
- pop {r4, pc}
-1:
- movs r0, #1
- pop {r4, pc}
-
- FUNC_END aeabi_fcmp\cond
-.endm
-
-COMPARISON lt, le
-COMPARISON le, le
-COMPARISON gt, ge
-COMPARISON ge, ge
-
-#endif /* L_arm_cmpsf2 */
-
#ifdef L_arm_addsubdf3
FUNC_START aeabi_drsub
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/clz2.S gcc-11-20201220/libgcc/config/arm/cm0/clz2.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/clz2.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/clz2.S 2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,324 @@
+/* clz2.S: Cortex M0 optimized 'clz' functions
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+
+#ifdef L_clzdi2
+
+// int __clzdi2(long long)
+// Counts leading zero bits in $r1:$r0.
+// Returns the result in $r0.
+.section .text.sorted.libgcc.clz2.clzdi2,"x"
+CM0_FUNC_START clzdi2
+ CFI_START_FUNCTION
+
+ // Moved here from lib1funcs.S
+ cmp xxh, #0
+ do_it eq, et
+ clzeq r0, xxl
+ clzne r0, xxh
+ addeq r0, #32
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END clzdi2
+
+#endif /* L_clzdi2 */
+
+
+#ifdef L_clzsi2
+
+// int __clzsi2(int)
+// Counts leading zero bits in $r0.
+// Returns the result in $r0.
+.section .text.sorted.libgcc.clz2.clzsi2,"x"
+CM0_FUNC_START clzsi2
+ CFI_START_FUNCTION
+
+ // Moved here from lib1funcs.S
+ clz r0, r0
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END clzsi2
+
+#endif /* L_clzsi2 */
+
+#else /* !__ARM_FEATURE_CLZ */
+
+#ifdef L_clzdi2
+
+// int __clzdi2(long long)
+// Counts leading zero bits in $r1:$r0.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+.section .text.sorted.libgcc.clz2.clzdi2,"x"
+CM0_FUNC_START clzdi2
+ CFI_START_FUNCTION
+
+ #if defined(__ARMEB__) && __ARMEB__
+ // Check if the upper word is zero.
+ cmp r0, #0
+
+ // The upper word is non-zero, so calculate __clzsi2(upper).
+ bne SYM(__clzsi2)
+
+ // The upper word is zero, so calculate 32 + __clzsi2(lower).
+ movs r2, #64
+ movs r0, r1
+ b SYM(__internal_clzsi2)
+
+ #else /* !__ARMEB__ */
+ // Assume all the bits in the argument are zero.
+ movs r2, #64
+
+ // Check if the upper word is zero.
+ cmp r1, #0
+
+ // The upper word is zero, so calculate 32 + __clzsi2(lower).
+ beq SYM(__internal_clzsi2)
+
+ // The upper word is non-zero, so set up __clzsi2(upper).
+ // Then fall through.
+ movs r0, r1
+
+ #endif /* !__ARMEB__ */
+
+#endif /* L_clzdi2 */
+
+
+// The bitwise implementation of __clzdi2() tightly couples with __clzsi2(),
+// such that instructions must appear consecutively in the same memory
+// section for proper flow control. However, this construction inhibits
+// the ability to discard __clzdi2() when only using __clzsi2().
+// Therefore, this block configures __clzsi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __clzdi2(). The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols when required.
+// '_clzsi2' should appear before '_clzdi2' in LIB1ASMFUNCS.
+#if defined(L_clzsi2) || defined(L_clzdi2)
+
+#ifdef L_clzsi2
+// int __clzsi2(int)
+// Counts leading zero bits in $r0.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+.section .text.sorted.libgcc.clz2.clzsi2,"x"
+CM0_WEAK_START clzsi2
+ CFI_START_FUNCTION
+
+#else /* L_clzdi2 */
+CM0_FUNC_START clzsi2
+
+#endif
+
+ // Assume all the bits in the argument are zero
+ movs r2, #32
+
+#ifdef L_clzsi2
+ CM0_WEAK_START internal_clzsi2
+#else /* L_clzdi2 */
+ CM0_FUNC_START internal_clzsi2
+#endif
+
+ // Size optimized: 22 bytes, 51 cycles
+ // Speed optimized: 50 bytes, 20 cycles
+
+ #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+
+ // Binary search starts at half the word width.
+ movs r3, #16
+
+ LSYM(__clz_loop):
+ // Test the upper 'n' bits of the operand for ZERO.
+ movs r1, r0
+ lsrs r1, r3
+ beq LSYM(__clz_skip)
+
+ // When the test fails, discard the lower bits of the register,
+ // and deduct the count of discarded bits from the result.
+ movs r0, r1
+ subs r2, r3
+
+ LSYM(__clz_skip):
+ // Decrease the shift distance for the next test.
+ lsrs r3, #1
+ bne LSYM(__clz_loop)
+
+ #else /* __OPTIMIZE_SIZE__ */
+
+ // Unrolled binary search.
+ lsrs r1, r0, #16
+ beq LSYM(__clz8)
+ movs r0, r1
+ subs r2, #16
+
+ LSYM(__clz8):
+ lsrs r1, r0, #8
+ beq LSYM(__clz4)
+ movs r0, r1
+ subs r2, #8
+
+ LSYM(__clz4):
+ lsrs r1, r0, #4
+ beq LSYM(__clz2)
+ movs r0, r1
+ subs r2, #4
+
+ LSYM(__clz2):
+ // Load the remainder by index
+ adr r1, LSYM(__clz_remainder)
+ ldrb r0, [r1, r0]
+
+ #endif /* !__OPTIMIZE_SIZE__ */
+
+ // Account for the remainder.
+ subs r0, r2, r0
+ RET
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ .align 2
+ LSYM(__clz_remainder):
+ .byte 0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END clzsi2
+
+#ifdef L_clzdi2
+CM0_FUNC_END clzdi2
+#endif
+
+#endif /* L_clzsi2 || L_clzdi2 */
+
+#endif /* !__ARM_FEATURE_CLZ */
+
+
+#ifdef L_clrsbdi2
+
+// int __clrsbdi2(long long)
+// Counts the number of "redundant sign bits" in $r1:$r0.
+// Returns the result in $r0.
+// Uses $r2 and $r3 as scratch space.
+.section .text.sorted.libgcc.clz2.clrsbdi2,"x"
+CM0_FUNC_START clrsbdi2
+ CFI_START_FUNCTION
+
+ #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+ // Invert negative signs to keep counting zeros.
+ asrs r3, xxh, #31
+ eors xxl, r3
+ eors xxh, r3
+
+ // Same as __clzdi2(), except that the 'C' flag is pre-calculated.
+ // Also, a trailing 'subs' is added, since the last bit is not redundant.
+ do_it eq, et
+ clzeq r0, xxl
+ clzne r0, xxh
+ addeq r0, #32
+ subs r0, #1
+ RET
+
+ #else /* !__ARM_FEATURE_CLZ */
+ // Result if all the bits in the argument are zero.
+ // Set it here to keep the flags clean after 'eors' below.
+ movs r2, #31
+
+ // Invert negative signs to keep counting zeros.
+ asrs r3, xxh, #31
+ eors xxh, r3
+
+ #if defined(__ARMEB__) && __ARMEB__
+ // If the upper word is non-zero, return '__clzsi2(upper) - 1'.
+ bne SYM(__internal_clzsi2)
+
+ // The upper word is zero, prepare the lower word.
+ movs r0, r1
+ eors r0, r3
+
+ #else /* !__ARMEB__ */
+ // Save the lower word temporarily.
+ // This somewhat awkward construction adds one cycle when the
+ // branch is not taken, but prevents a double-branch.
+ eors r3, r0
+
+ // If the upper word is non-zero, return '__clzsi2(upper) - 1'.
+ movs r0, r1
+ bne SYM(__internal_clzsi2)
+
+ // Restore the lower word.
+ movs r0, r3
+
+ #endif /* !__ARMEB__ */
+
+ // The upper word is zero, return '31 + __clzsi2(lower)'.
+ adds r2, #32
+ b SYM(__internal_clzsi2)
+
+ #endif /* !__ARM_FEATURE_CLZ */
+
+ CFI_END_FUNCTION
+CM0_FUNC_END clrsbdi2
+
+#endif /* L_clrsbdi2 */
+
+
+#ifdef L_clrsbsi2
+
+// int __clrsbsi2(int)
+// Counts the number of "redundant sign bits" in $r0.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+.section .text.sorted.libgcc.clz2.clrsbsi2,"x"
+CM0_FUNC_START clrsbsi2
+ CFI_START_FUNCTION
+
+ // Invert negative signs to keep counting zeros.
+ asrs r2, r0, #31
+ eors r0, r2
+
+ #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+ // Count.
+ clz r0, r0
+
+ // The result for a positive value will always be >= 1.
+ // By definition, the last bit is not redundant.
+ subs r0, #1
+ RET
+
+ #else /* !__ARM_FEATURE_CLZ */
+ // Result if all the bits in the argument are zero.
+ // By definition, the last bit is not redundant.
+ movs r2, #31
+ b SYM(__internal_clzsi2)
+
+ #endif /* !__ARM_FEATURE_CLZ */
+
+ CFI_END_FUNCTION
+CM0_FUNC_END clrsbsi2
+
+#endif /* L_clrsbsi2 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/ctz2.S gcc-11-20201220/libgcc/config/arm/cm0/ctz2.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/ctz2.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/ctz2.S 2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,285 @@
+/* ctz2.S: Cortex M0 optimized 'ctz' functions
+
+ Copyright (C) 2020-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+// When the hardware 'clz' function is available, an efficient version
+// of __ctzsi2(x) can be created by calculating '31 - __clzsi2(lsb(x))',
+// where lsb(x) is 'x' with only the least-significant '1' bit set.
+// The following offset applies to all of the functions in this file.
+#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+ #define CTZ_RESULT_OFFSET 1
+#else
+ #define CTZ_RESULT_OFFSET 0
+#endif
+
+
+#ifdef L_ctzdi2
+
+// int __ctzdi2(long long)
+// Counts trailing zeros in a 64 bit double word.
+// Expects the argument in $r1:$r0.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+.section .text.sorted.libgcc.ctz2.ctzdi2,"x"
+CM0_FUNC_START ctzdi2
+ CFI_START_FUNCTION
+
+ #if defined(__ARMEB__) && __ARMEB__
+ // Assume all the bits in the argument are zero.
+ movs r2, #(64 - CTZ_RESULT_OFFSET)
+
+ // Check if the lower word is zero.
+ cmp r1, #0
+
+ // The lower word is zero, so calculate 32 + __ctzsi2(upper).
+ beq SYM(__internal_ctzsi2)
+
+ // The lower word is non-zero, so set up __ctzsi2(lower).
+ // Then fall through.
+ movs r0, r1
+
+ #else /* !__ARMEB__ */
+ // Check if the lower word is zero.
+ cmp r0, #0
+
+ // If the lower word is non-zero, result is just __ctzsi2(lower).
+ bne SYM(__ctzsi2)
+
+ // The lower word is zero, so calculate 32 + __ctzsi2(upper).
+ movs r2, #(64 - CTZ_RESULT_OFFSET)
+ movs r0, r1
+ b SYM(__internal_ctzsi2)
+
+ #endif /* !__ARMEB__ */
+
+#endif /* L_ctzdi2 */
+
+
+// The bitwise implementation of __ctzdi2() tightly couples with __ctzsi2(),
+// such that instructions must appear consecutively in the same memory
+// section for proper flow control. However, this construction inhibits
+// the ability to discard __ctzdi2() when only using __ctzsi2().
+// Therefore, this block configures __ctzsi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __ctzdi2(). The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols when required.
+// '_ctzsi2' should appear before '_ctzdi2' in LIB1ASMFUNCS.
+#if defined(L_ctzsi2) || defined(L_ctzdi2)
+
+#ifdef L_ctzsi2
+// int __ctzsi2(int)
+// Counts trailing zeros in a 32 bit word.
+// Expects the argument in $r0.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+.section .text.sorted.libgcc.ctz2.ctzsi2,"x"
+CM0_WEAK_START ctzsi2
+ CFI_START_FUNCTION
+
+#else /* L_ctzdi2 */
+CM0_FUNC_START ctzsi2
+
+#endif
+
+ // Assume all the bits in the argument are zero
+ movs r2, #(32 - CTZ_RESULT_OFFSET)
+
+#ifdef L_ctzsi2
+ CM0_WEAK_START internal_ctzsi2
+#else /* L_ctzdi2 */
+ CM0_FUNC_START internal_ctzsi2
+#endif
+
+ #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+
+ // Find the least-significant '1' bit of the argument.
+ rsbs r1, r0, #0
+ ands r1, r0
+
+ // Maintain result compatibility with the software implementation.
+ // Technically, __ctzsi2(0) is undefined, but 32 seems better than -1.
+ // (or possibly 31 if this is an intermediate result for __ctzdi2(0)).
+ // The carry flag from 'rsbs' gives '-1' iff the argument was 'zero'.
+ // (NOTE: 'ands' with 0 shift bits does not change the carry flag.)
+ // After the jump, the final result will be '31 - (-1)'.
+ sbcs r0, r0
+ beq LSYM(__ctz_zero)
+
+ // Gives the number of '0' bits left of the least-significant '1'.
+ clz r0, r1
+
+ #elif defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+ // Size optimized: 24 bytes, 52 cycles
+ // Speed optimized: 52 bytes, 21 cycles
+
+ // Binary search starts at half the word width.
+ movs r3, #16
+
+ LSYM(__ctz_loop):
+ // Test the upper 'n' bits of the operand for ZERO.
+ movs r1, r0
+
+ lsls r1, r3
+ beq LSYM(__ctz_skip)
+
+ // When the test fails, discard the lower bits of the register,
+ // and deduct the count of discarded bits from the result.
+ movs r0, r1
+ subs r2, r3
+
+ LSYM(__ctz_skip):
+ // Decrease the shift distance for the next test.
+ lsrs r3, #1
+ bne LSYM(__ctz_loop)
+
+ // Prepare the remainder.
+ lsrs r0, #31
+
+ #else /* !__OPTIMIZE_SIZE__ */
+
+ // Unrolled binary search.
+ lsls r1, r0, #16
+ beq LSYM(__ctz8)
+ movs r0, r1
+ subs r2, #16
+
+ LSYM(__ctz8):
+ lsls r1, r0, #8
+ beq LSYM(__ctz4)
+ movs r0, r1
+ subs r2, #8
+
+ LSYM(__ctz4):
+ lsls r1, r0, #4
+ beq LSYM(__ctz2)
+ movs r0, r1
+ subs r2, #4
+
+ LSYM(__ctz2):
+ // Load the remainder by index
+ lsrs r0, #28
+ adr r3, LSYM(__ctz_remainder)
+ ldrb r0, [r3, r0]
+
+ #endif /* !__OPTIMIZE_SIZE__ */
+
+ LSYM(__ctz_zero):
+ // Apply the remainder.
+ subs r0, r2, r0
+ RET
+
+ #if (!defined(__ARM_FEATURE_CLZ) || !__ARM_FEATURE_CLZ) && \
+ (!defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__)
+ .align 2
+ LSYM(__ctz_remainder):
+ .byte 0,4,3,4,2,4,3,4,1,4,3,4,2,4,3,4
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END ctzsi2
+
+#ifdef L_ctzdi2
+CM0_FUNC_END ctzdi2
+#endif
+
+#endif /* L_ctzsi2 || L_ctzdi2 */
+
+
+#ifdef L_ffsdi2
+
+// int __ffsdi2(long long)
+// Return the index of the least significant 1-bit in $r1:$r0,
+// or zero if $r1:$r0 is zero. The least significant bit is index 1.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+// Same section as __ctzsi2() for sake of the tail call branches.
+.section .text.sorted.libgcc.ctz2.ffsdi2,"x"
+CM0_FUNC_START ffsdi2
+ CFI_START_FUNCTION
+
+ // Simplify branching by assuming a non-zero lower word.
+ // For all such, ffssi2(x) == ctzsi2(x) + 1.
+ movs r2, #(33 - CTZ_RESULT_OFFSET)
+
+ #if defined(__ARMEB__) && __ARMEB__
+ // HACK: Save the upper word in a scratch register.
+ movs r3, r0
+
+ // Test the lower word.
+ movs r0, r1
+ bne SYM(__internal_ctzsi2)
+
+ // Test the upper word.
+ movs r2, #(65 - CTZ_RESULT_OFFSET)
+ movs r0, r3
+ bne SYM(__internal_ctzsi2)
+
+ #else /* !__ARMEB__ */
+ // Test the lower word.
+ cmp r0, #0
+ bne SYM(__internal_ctzsi2)
+
+ // Test the upper word.
+ movs r2, #(65 - CTZ_RESULT_OFFSET)
+ movs r0, r1
+ bne SYM(__internal_ctzsi2)
+
+ #endif /* !__ARMEB__ */
+
+ // Upper and lower words are both zero.
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END ffsdi2
+
+#endif /* L_ffsdi2 */
+
+
+#ifdef L_ffssi2
+
+// int __ffssi2(int)
+// Return the index of the least significant 1-bit in $r0,
+// or zero if $r0 is zero. The least significant bit is index 1.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+// Same section as __ctzsi2() for sake of the tail call branches.
+.section .text.sorted.libgcc.ctz2.ffssi2,"x"
+CM0_FUNC_START ffssi2
+ CFI_START_FUNCTION
+
+ // Simplify branching by assuming a non-zero argument.
+ // For all such, ffssi2(x) == ctzsi2(x) + 1.
+ movs r2, #(33 - CTZ_RESULT_OFFSET)
+
+ // Test for zero, return unmodified.
+ cmp r0, #0
+ bne SYM(__internal_ctzsi2)
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END ffssi2
+
+#endif /* L_ffssi2 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fadd.S gcc-11-20201220/libgcc/config/arm/cm0/fadd.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fadd.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fadd.S 2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,324 @@
+/* fadd.S: Cortex M0 optimized 32-bit float addition
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_frsubsf3
+
+// float __aeabi_frsub(float, float)
+// Returns the floating point difference of $r1 - $r0 in $r0.
+.section .text.sorted.libgcc.fpcore.b.frsub,"x"
+CM0_FUNC_START aeabi_frsub
+ CFI_START_FUNCTION
+
+ #if defined(STRICT_NANS) && STRICT_NANS
+ // Check if $r0 is NAN before modifying.
+ lsls r2, r0, #1
+ movs r3, #255
+ lsls r3, #24
+
+ // Let fadd() find the NAN in the normal course of operation,
+ // moving it to $r0 and checking the quiet/signaling bit.
+ cmp r2, r3
+ bhi SYM(__aeabi_fadd)
+ #endif
+
+ // Flip sign and run through fadd().
+ movs r2, #1
+ lsls r2, #31
+ adds r0, r2
+ b SYM(__aeabi_fadd)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_frsub
+
+#endif /* L_arm_frsubsf3 */
+
+
+#ifdef L_arm_addsubsf3
+
+// float __aeabi_fsub(float, float)
+// Returns the floating point difference of $r0 - $r1 in $r0.
+.section .text.sorted.libgcc.fpcore.c.faddsub,"x"
+CM0_FUNC_START aeabi_fsub
+CM0_FUNC_ALIAS subsf3 aeabi_fsub
+ CFI_START_FUNCTION
+
+ #if defined(STRICT_NANS) && STRICT_NANS
+ // Check if $r1 is NAN before modifying.
+ lsls r2, r1, #1
+ movs r3, #255
+ lsls r3, #24
+
+ // Let fadd() find the NAN in the normal course of operation,
+ // moving it to $r0 and checking the quiet/signaling bit.
+ cmp r2, r3
+ bhi SYM(__aeabi_fadd)
+ #endif
+
+ // Flip sign and fall into fadd().
+ movs r2, #1
+ lsls r2, #31
+ adds r1, r2
+
+#endif /* L_arm_addsubsf3 */
+
+
+// The execution of __subsf3() flows directly into __addsf3(), such that
+// instructions must appear consecutively in the same memory section.
+// However, this construction inhibits the ability to discard __subsf3()
+// when only using __addsf3().
+// Therefore, this block configures __addsf3() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __subsf3(). The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols when required.
+// '_arm_addsf3' should appear before '_arm_addsubsf3' in LIB1ASMFUNCS.
+#if defined(L_arm_addsf3) || defined(L_arm_addsubsf3)
+
+#ifdef L_arm_addsf3
+// float __aeabi_fadd(float, float)
+// Returns the floating point sum of $r0 + $r1 in $r0.
+.section .text.sorted.libgcc.fpcore.c.fadd,"x"
+CM0_WEAK_START aeabi_fadd
+CM0_WEAK_ALIAS addsf3 aeabi_fadd
+ CFI_START_FUNCTION
+
+#else /* L_arm_addsubsf3 */
+CM0_FUNC_START aeabi_fadd
+CM0_FUNC_ALIAS addsf3 aeabi_fadd
+
+#endif
+
+ // Standard registers, compatible with exception handling.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Drop the sign bit to compare absolute value.
+ lsls r2, r0, #1
+ lsls r3, r1, #1
+
+ // Save the logical difference of original values.
+ // This actually makes the following swap slightly faster.
+ eors r1, r0
+
+ // Compare exponents+mantissa.
+ // MAYBE: Speedup for equal values? This would have to separately
+ // check for NAN/INF and then either:
+ // * Increase the exponent by '1' (for multiply by 2), or
+ // * Return +0
+ cmp r2, r3
+ bhs LSYM(__fadd_ordered)
+
+ // Reorder operands so the larger absolute value is in r2,
+ // the corresponding original operand is in $r0,
+ // and the smaller absolute value is in $r3.
+ movs r3, r2
+ eors r0, r1
+ lsls r2, r0, #1
+
+ LSYM(__fadd_ordered):
+ // Extract the exponent of the larger operand.
+ // If INF/NAN, then it becomes an automatic result.
+ lsrs r2, #24
+ cmp r2, #255
+ beq LSYM(__fadd_special)
+
+ // Save the sign of the result.
+ lsrs rT, r0, #31
+ lsls rT, #31
+ mov ip, rT
+
+ // If the original value of $r1 was +/-0,
+ // $r0 becomes the automatic result.
+ // Because $r0 is known to be a finite value, return directly.
+ // It's actually important that +/-0 not go through the normal
+ // process, to keep "-0 +/- 0" from being turned into +0.
+ cmp r3, #0
+ beq LSYM(__fadd_zero)
+
+ // Extract the second exponent.
+ lsrs r3, #24
+
+ // Calculate the difference of exponents (always positive).
+ subs r3, r2, r3
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // If the smaller operand is more than 25 bits less significant
+ // than the larger, the larger operand is an automatic result.
+ // The smaller operand can't affect the result, even after rounding.
+ cmp r3, #25
+ bhi LSYM(__fadd_return)
+ #endif
+
+ // Isolate both mantissas, recovering the smaller.
+ lsls rT, r0, #9
+ lsls r0, r1, #9
+ eors r0, rT
+
+ // If the larger operand is normal, restore the implicit '1'.
+ // If subnormal, the second operand will also be subnormal.
+ cmp r2, #0
+ beq LSYM(__fadd_normal)
+ adds rT, #1
+ rors rT, rT
+
+ // If the smaller operand is also normal, restore the implicit '1'.
+ // If subnormal, the smaller operand effectively remains multiplied
+ // by 2 w.r.t the first. This compensates for subnormal exponents,
+ // which are technically still -126, not -127.
+ cmp r2, r3
+ beq LSYM(__fadd_normal)
+ adds r0, #1
+ rors r0, r0
+
+ LSYM(__fadd_normal):
+ // Provide a spare bit for overflow.
+ // Normal values will be aligned in bits [30:7]
+ // Subnormal values will be aligned in bits [30:8]
+ lsrs rT, #1
+ lsrs r0, #1
+
+ // If signs weren't matched, negate the smaller operand (branchless).
+ asrs r1, #31
+ eors r0, r1
+ subs r0, r1
+
+ // Keep a copy of the small mantissa for the remainder.
+ movs r1, r0
+
+ // Align the small mantissa for addition.
+ asrs r1, r3
+
+ // Isolate the remainder.
+ // NOTE: Given the various cases above, the remainder will only
+ // be used as a boolean for rounding ties to even. It is not
+ // necessary to negate the remainder for subtraction operations.
+ rsbs r3, #0
+ adds r3, #32
+ lsls r0, r3
+
+ // Because operands are ordered, the result will never be negative.
+ // If the result of subtraction is 0, the overall result must be +0.
+ // If the overall result in $r1 is 0, then the remainder in $r0
+ // must also be 0, so no register copy is necessary on return.
+ adds r1, rT
+ beq LSYM(__fadd_return)
+
+ // The large operand was aligned in bits [29:7]...
+ // If the larger operand was normal, the implicit '1' went in bit [30].
+ //
+ // After addition, the MSB of the result may be in bit:
+ // 31, if the result overflowed.
+ // 30, the usual case.
+ // 29, if there was a subtraction of operands with exponents
+ // differing by more than 1.
+ // < 28, if there was a subtraction of operands with exponents +/-1,
+ // < 28, if both operands were subnormal.
+
+ // In the last case (both subnormal), the alignment shift will be 8,
+ // the exponent will be 0, and no rounding is necessary.
+ cmp r2, #0
+ bne SYM(__fp_assemble)
+
+ // Subnormal overflow automatically forms the correct exponent.
+ lsrs r0, r1, #8
+ add r0, ip
+
+ LSYM(__fadd_return):
+ pop { rT, pc }
+ .cfi_restore_state
+
+ LSYM(__fadd_special):
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // If $r1 is (also) NAN, force it in place of $r0.
+ // As the smaller NAN, it is more likely to be signaling.
+ movs rT, #255
+ lsls rT, #24
+ cmp r3, rT
+ bls LSYM(__fadd_ordered2)
+
+ eors r0, r1
+ #endif
+
+ LSYM(__fadd_ordered2):
+ // There are several possible cases to consider here:
+ // 1. Any NAN/NAN combination
+ // 2. Any NAN/INF combination
+ // 3. Any NAN/value combination
+ // 4. INF/INF with matching signs
+ // 5. INF/INF with mismatched signs.
+ // 6. Any INF/value combination.
+ // In all cases but the case 5, it is safe to return $r0.
+ // In the special case, a new NAN must be constructed.
+ // First, check the mantissa to see if $r0 is NAN.
+ lsls r2, r0, #9
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ bne SYM(__fp_check_nan)
+ #else
+ bne LSYM(__fadd_return)
+ #endif
+
+ LSYM(__fadd_zero):
+ // Next, check for an INF/value combination.
+ lsls r2, r1, #1
+ bne LSYM(__fadd_return)
+
+ // Finally, check for matching sign on INF/INF.
+ // Also accepts matching signs when +/-0 are added.
+ bcc LSYM(__fadd_return)
+
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(SUBTRACTED_INFINITY)
+ #endif
+
+ #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+ // Restore original operands.
+ eors r1, r0
+ #endif
+
+ // Identify mismatched 0.
+ lsls r2, r0, #1
+ bne SYM(__fp_exception)
+
+ // Force mismatched 0 to +0.
+ eors r0, r0
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END addsf3
+CM0_FUNC_END aeabi_fadd
+
+#ifdef L_arm_addsubsf3
+CM0_FUNC_END subsf3
+CM0_FUNC_END aeabi_fsub
+#endif
+
+#endif /* L_arm_addsf3 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fcmp.S gcc-11-20201220/libgcc/config/arm/cm0/fcmp.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fcmp.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fcmp.S 2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,634 @@
+/* fcmp.S: Cortex M0 optimized 32-bit float comparison
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_cmpsf2
+
+// int __cmpsf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+// * +1 if ($r0 > $r1), or either argument is NAN
+// * 0 if ($r0 == $r1)
+// * -1 if ($r0 < $r1)
+// Uses $r2, $r3, and $ip as scratch space.
+.section .text.sorted.libgcc.fcmp.cmpsf2,"x"
+CM0_FUNC_START cmpsf2
+CM0_FUNC_ALIAS lesf2 cmpsf2
+CM0_FUNC_ALIAS ltsf2 cmpsf2
+ CFI_START_FUNCTION
+
+ // Assumption: The 'libgcc' functions should raise exceptions.
+ movs r2, #(FCMP_UN_POSITIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY)
+
+// int,int __internal_cmpsf2(float, float, int)
+// Internal function expects a set of control flags in $r2.
+// If ordered, returns a comparison type { 0, 1, 2 } in $r3
+CM0_FUNC_START internal_cmpsf2
+
+ // When operand signs are considered, the comparison result falls
+ // within one of the following quadrants:
+ //
+ // $r0 $r1 $r0-$r1* flags result
+ // + + > C=0 GT
+ // + + = Z=1 EQ
+ // + + < C=1 LT
+ // + - > C=1 GT
+ // + - = C=1 GT
+ // + - < C=1 GT
+ // - + > C=0 LT
+ // - + = C=0 LT
+ // - + < C=0 LT
+ // - - > C=0 LT
+ // - - = Z=1 EQ
+ // - - < C=1 GT
+ //
+ // *When interpreted as a subtraction of unsigned integers
+ //
+ // From the table, it is clear that in the presence of any negative
+ // operand, the natural result simply needs to be reversed.
+ // Save the 'N' flag for later use.
+ movs r3, r0
+ orrs r3, r1
+ mov ip, r3
+
+ // Keep the absolute value of the second argument for NAN testing.
+ lsls r3, r1, #1
+
+ // With the absolute value of the second argument safely stored,
+ // recycle $r1 to calculate the difference of the arguments.
+ subs r1, r0, r1
+
+ // Save the 'C' flag for use later.
+ // Effectively shifts all the flags 1 bit left.
+ adcs r2, r2
+
+ // Absolute value of the first argument.
+ lsls r0, #1
+
+ // Identify the largest absolute value between the two arguments.
+ cmp r0, r3
+ bhs LSYM(__fcmp_sorted)
+
+ // Keep the larger absolute value for NAN testing.
+ // NOTE: When the arguments are respectively a signaling NAN and a
+ // quiet NAN, the quiet NAN has precedence. This has consequences
+ // if TRAP_NANS is enabled, but the flags indicate that exceptions
+ // for quiet NANs should be suppressed. After the signaling NAN is
+ // discarded, no exception is raised, although it should have been.
+ // This could be avoided by using a fifth register to save both
+ // arguments until the signaling bit can be tested, but that seems
+ // like an excessive amount of ugly code for an ambiguous case.
+ movs r0, r3
+
+ LSYM(__fcmp_sorted):
+ // If $r3 is NAN, the result is unordered.
+ movs r3, #255
+ lsls r3, #24
+ cmp r0, r3
+ bhi LSYM(__fcmp_unordered)
+
+ // Positive and negative zero must be considered equal.
+ // If the larger absolute value is +/-0, both must have been +/-0.
+ subs r3, r0, #0
+ beq LSYM(__fcmp_zero)
+
+ // Test for regular equality.
+ subs r3, r1, #0
+ beq LSYM(__fcmp_zero)
+
+ // Isolate the saved 'C', and invert if either argument was negative.
+ // Remembering that the original subtraction was $r1 - $r0,
+ // the result will be 1 if 'C' was set (gt), or 0 for not 'C' (lt).
+ lsls r3, r2, #31
+ add r3, ip
+ lsrs r3, #31
+
+ // HACK: Force the 'C' bit clear,
+ // since bit[30] of $r3 may vary with the operands.
+ adds r3, #0
+
+ LSYM(__fcmp_zero):
+ // After everything is combined, the temp result will be
+ // 2 (gt), 1 (eq), or 0 (lt).
+ adcs r3, r3
+
+ // Short-circuit return if the 3-way comparison flag is set.
+ // Otherwise, shifts the condition mask into bits[2:0].
+ lsrs r2, #2
+ bcs LSYM(__fcmp_return)
+
+ // If the bit corresponding to the comparison result is set in the
+ // acceptance mask, a '1' will fall out into the result.
+ movs r0, #1
+ lsrs r2, r3
+ ands r0, r2
+ RET
+
+ LSYM(__fcmp_unordered):
+ // Set up the requested UNORDERED result.
+ // Remember the shift in the flags (above).
+ lsrs r2, #6
+
+ #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+ // TODO: ... The
+
+
+ #endif
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // Always raise an exception if FCMP_RAISE_EXCEPTIONS was specified.
+ bcs LSYM(__fcmp_trap)
+
+ // If FCMP_NO_EXCEPTIONS was specified, no exceptions on quiet NANs.
+ // The comparison flags are moot, so $r1 can serve as scratch space.
+ lsrs r1, r0, #24
+ bcs LSYM(__fcmp_return2)
+
+ LSYM(__fcmp_trap):
+ // Restore the NAN (sans sign) for an argument to the exception.
+ // As an IRQ, the handler restores all registers, including $r3.
+ // NOTE: The service handler may not return.
+ lsrs r0, #1
+ movs r3, #(UNORDERED_COMPARISON)
+ svc #(SVC_TRAP_NAN)
+ #endif
+
+ LSYM(__fcmp_return2):
+ // HACK: Work around result register mapping.
+ // This could probably be eliminated by remapping the flags register.
+ movs r3, r2
+
+ LSYM(__fcmp_return):
+ // Finish setting up the result.
+ // Constant subtraction allows a negative result while keeping the
+ // $r2 flag control word within 8 bits, particularly for FCMP_UN*.
+ // This operation also happens to set the 'Z' and 'C' flags correctly
+ // per the requirements of __aeabi_cfcmple() et al.
+ subs r0, r3, #1
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END ltsf2
+CM0_FUNC_END lesf2
+CM0_FUNC_END cmpsf2
+
+#endif /* L_arm_cmpsf2 */
+
+
+#ifdef L_arm_eqsf2
+
+// int __eqsf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+// * -1 if ($r0 < $r1)
+// * 0 if ($r0 == $r1)
+// * +1 if ($r0 > $r1), or either argument is NAN
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.eqsf2,"x"
+CM0_FUNC_START eqsf2
+CM0_FUNC_ALIAS nesf2 eqsf2
+ CFI_START_FUNCTION
+
+ // Assumption: Equality comparisons are quiet and should not raise exceptions.
+ movs r2, #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END nesf2
+CM0_FUNC_END eqsf2
+
+#endif /* L_arm_eqsf2 */
+
+
+#ifdef L_arm_gesf2
+
+// int __gesf2(float, float)
+// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html>
+// Returns the three-way comparison result of $r0 with $r1:
+// * -1 if ($r0 < $r1), or either argument is NAN
+// * 0 if ($r0 == $r1)
+// * +1 if ($r0 > $r1)
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.gesf2,"x"
+CM0_FUNC_START gesf2
+CM0_FUNC_ALIAS gtsf2 gesf2
+ CFI_START_FUNCTION
+
+ // Assumption: The 'libgcc' functions should raise exceptions.
+ movs r2, #(FCMP_UN_NEGATIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END gtsf2
+CM0_FUNC_END gesf2
+
+#endif /* L_arm_gesf2 */
+
+
+#ifdef L_arm_fcmpeq
+
+// int __aeabi_fcmpeq(float, float)
+// Returns '1' in $r0 if ($r0 == $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.fcmpeq,"x"
+CM0_FUNC_START aeabi_fcmpeq
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_EQ)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpeq
+
+#endif /* L_arm_fcmpeq */
+
+
+#ifdef L_arm_fcmpne
+
+// int __aeabi_fcmpne(float, float) [non-standard]
+// Returns '1' in $r0 if ($r0 != $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.fcmpne,"x"
+CM0_FUNC_START aeabi_fcmpne
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_NE)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpne
+
+#endif /* L_arm_fcmpne */
+
+
+#ifdef L_arm_fcmplt
+
+// int __aeabi_fcmplt(float, float)
+// Returns '1' in $r0 if ($r0 < $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.fcmplt,"x"
+CM0_FUNC_START aeabi_fcmplt
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_LT)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmplt
+
+#endif /* L_arm_fcmplt */
+
+
+#ifdef L_arm_fcmple
+
+// int __aeabi_fcmple(float, float)
+// Returns '1' in $r0 if ($r0 <= $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.fcmple,"x"
+CM0_FUNC_START aeabi_fcmple
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_LE)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmple
+
+#endif /* L_arm_fcmple */
+
+
+#ifdef L_arm_fcmpge
+
+// int __aeabi_fcmpge(float, float)
+// Returns '1' in $r0 if ($r0 >= $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.fcmpge,"x"
+CM0_FUNC_START aeabi_fcmpge
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_GE)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpge
+
+#endif /* L_arm_fcmpge */
+
+
+#ifdef L_arm_fcmpgt
+
+// int __aeabi_fcmpgt(float, float)
+// Returns '1' in $r0 if ($r0 > $r1) (ordered).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.fcmpgt,"x"
+CM0_FUNC_START aeabi_fcmpgt
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_GT)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_fcmpgt
+
+#endif /* L_arm_fcmpgt */
+
+
+#ifdef L_arm_unordsf2
+
+// int __aeabi_fcmpun(float, float)
+// Returns '1' in $r0 if $r0 and $r1 are unordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.unordsf2,"x"
+CM0_FUNC_START aeabi_fcmpun
+CM0_FUNC_ALIAS unordsf2 aeabi_fcmpun
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END unordsf2
+CM0_FUNC_END aeabi_fcmpun
+
+#endif /* L_arm_unordsf2 */
+
+
+#ifdef L_arm_cfrcmple
+
+// void __aeabi_cfrcmple(float, float)
+// Reverse three-way compare of $r1 ? $r0, with result in the status flags:
+// * 'Z' is set only when the operands are ordered and equal.
+// * 'C' is clear only when the operands are ordered and $r0 > $r1.
+// Preserves all core registers except $ip, $lr, and the CPSR.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.cfrcmple,"x"
+CM0_FUNC_START aeabi_cfrcmple
+ CFI_START_FUNCTION
+
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ push { r0 - r3, rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 24
+ .cfi_rel_offset r0, 0
+ .cfi_rel_offset r1, 4
+ .cfi_rel_offset r2, 8
+ .cfi_rel_offset r3, 12
+ .cfi_rel_offset rT, 16
+ .cfi_rel_offset lr, 20
+ #else
+ push { r0 - r3, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 20
+ .cfi_rel_offset r0, 0
+ .cfi_rel_offset r1, 4
+ .cfi_rel_offset r2, 8
+ .cfi_rel_offset r3, 12
+ .cfi_rel_offset lr, 16
+ #endif
+
+ // Reverse the operands.
+ movs r0, r1
+ ldr r1, [sp, #0]
+
+ // Don't just fall through, else registers will get pushed twice.
+ b SYM(__internal_cfrcmple)
+
+ // MAYBE:
+ // It might be better to pass original order arguments and swap
+ // the result instead. Cleaner for STRICT_NAN trapping too.
+ // Is 4 cycles worth 6 bytes?
+ // For example:
+ // $r2 = (FCMP_UN_NEGATIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY)
+ // movs r1, #1
+ // subs r1, r3
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_cfrcmple
+
+#endif /* L_arm_cfrcmple */
+
+
+#if defined(L_arm_cfcmple) || \
+ (defined(L_arm_cfcmpeq) && defined(TRAP_NANS) && TRAP_NANS)
+
+#ifdef L_arm_cfcmple
+.section .text.sorted.libgcc.fcmp.cfcmple,"x"
+ #define CFCMPLE_NAME aeabi_cfcmple
+#else
+.section .text.sorted.libgcc.fcmp.cfcmpeq,"x"
+ #define CFCMPLE_NAME aeabi_cfcmpeq
+#endif
+
+// void __aeabi_cfcmple(float, float)
+// void __aeabi_cfcmpeq(float, float)
+// NOTE: These functions are only distinct if __aeabi_cfcmple() can raise exceptions.
+// Three-way compare of $r0 ? $r1, with result in the status flags:
+// * 'Z' is set only when the operands are ordered and equal.
+// * 'C' is clear only when the operands are ordered and $r0 < $r1.
+// Preserves all core registers except $ip, $lr, and the CPSR.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+CM0_FUNC_START CFCMPLE_NAME
+
+ // __aeabi_cfcmpeq() is defined separately when TRAP_NANS is enabled.
+ #if !defined(TRAP_NANS) || !TRAP_NANS
+ CM0_FUNC_ALIAS aeabi_cfcmpeq aeabi_cfcmple
+ #endif
+
+ CFI_START_FUNCTION
+
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ push { r0 - r3, rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 24
+ .cfi_rel_offset r0, 0
+ .cfi_rel_offset r1, 4
+ .cfi_rel_offset r2, 8
+ .cfi_rel_offset r3, 12
+ .cfi_rel_offset rT, 16
+ .cfi_rel_offset lr, 20
+ #else
+ push { r0 - r3, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 20
+ .cfi_rel_offset r0, 0
+ .cfi_rel_offset r1, 4
+ .cfi_rel_offset r2, 8
+ .cfi_rel_offset r3, 12
+ .cfi_rel_offset lr, 16
+ #endif
+
+ #ifdef L_arm_cfcmple
+ CM0_FUNC_START internal_cfrcmple
+ // Even though the result in $r0 will be discarded, the 3-way
+ // subtraction of '-1' that generates this result happens to
+ // set 'C' and 'Z' perfectly. Unordered results group with '>'.
+ // This happens to be the same control word as __cmpsf2(), meaning
+ // that __cmpsf2() is a potential branch target. However,
+ // the choice to set a redundant control word and branch to
+ // __internal_cmpsf2() makes this compiled object more robust
+ // against linking with 'foreign' __cmpsf2() implementations.
+ movs r2, #(FCMP_UN_POSITIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY)
+ #else /* L_arm_cfcmpeq */
+ CM0_FUNC_START internal_cfrcmpeq
+ // No exceptions on quiet NAN.
+ movs r2, #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY)
+ #endif
+
+ bl SYM(__internal_cmpsf2)
+
+ // Clean up all working registers.
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ pop { r0 - r3, rT, pc }
+ .cfi_restore_state
+ #else
+ pop { r0 - r3, pc }
+ .cfi_restore_state
+ #endif
+
+ CFI_END_FUNCTION
+
+ #if !defined(TRAP_NANS) || !TRAP_NANS
+ CM0_FUNC_END aeabi_cfcmpeq
+ #endif
+
+CM0_FUNC_END CFCMPLE_NAME
+
+#endif /* L_arm_cfcmple || L_arm_cfcmpeq */
+
+
+// C99 libm functions
+#if 0
+
+// int isgreaterf(float, float)
+// Returns '1' in $r0 if ($r0 > $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.isgtf,"x"
+CM0_FUNC_START isgreaterf
+MATH_ALIAS isgreaterf
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END isgreaterf
+CM0_FUNC_END isgreaterf
+
+
+// int isgreaterequalf(float, float)
+// Returns '1' in $r0 if ($r0 >= $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.isgef,"x"
+CM0_FUNC_START isgreaterequalf
+MATH_ALIAS isgreaterequalf
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END isgreaterequalf
+CM0_FUNC_END isgreaterequalf
+
+
+// int islessf(float, float)
+// Returns '1' in $r0 if ($r0 < $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.isltf,"x"
+CM0_FUNC_START islessf
+MATH_ALIAS islessf
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_LT)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END islessf
+CM0_FUNC_END islessf
+
+
+// int islessequalf(float, float)
+// Returns '1' in $r0 if ($r0 <= $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.islef,"x"
+CM0_FUNC_START islessequalf
+MATH_ALIAS islessequalf
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_LT + FCMP_EQ)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END islessequalf
+CM0_FUNC_END islessequalf
+
+
+// int islessgreaterf(float, float)
+// Returns '1' in $r0 if ($r0 != $r1) and both $r0 and $r1 are ordered.
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.isnef,"x"
+CM0_FUNC_START islessgreaterf
+MATH_ALIAS islessgreaterf
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_LT + FCMP_GT)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END islessgreaterf
+CM0_FUNC_END islessgreaterf
+
+
+// int isunorderedf(float, float)
+// Returns '1' in $r0 if $r0 and $r1 are unordered (i.e. either is NAN).
+// Uses $r2, $r3, and $ip as scratch space.
+// Same parent section as __cmpsf2() to keep tail call branch within range.
+.section .text.sorted.libgcc.fcmp.isunf,"x"
+CM0_FUNC_START isunorderedf
+MATH_ALIAS isunorderedf
+ CFI_START_FUNCTION
+
+ movs r2, #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS)
+ b SYM(__internal_cmpsf2)
+
+ CFI_END_FUNCTION
+MATH_END isunorderedf
+CM0_FUNC_END isunorderedf
+
+#endif /* 0 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fconv.S gcc-11-20201220/libgcc/config/arm/cm0/fconv.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fconv.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fconv.S 2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,429 @@
+/* fconv.S: Cortex M0 optimized 32- and 64-bit float conversions
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_f2d
+
+// double __aeabi_f2d(float)
+// Converts a single-precision float in $r0 to double-precision in $r1:$r0.
+// Rounding, overflow, and underflow are impossible.
+// INF and ZERO are returned unmodified.
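+// As an illustrative walk-through (not additional semantics): converting
+// 1.0f (0x3F800000) extracts exponent 127, adds the normal bias of 895 to
+// give 1022, and the mantissa's implicit '1' then carries one more into the
+// exponent field, yielding 1023 -- i.e. the double 0x3FF0000000000000.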
+.section .text.sorted.libgcc.fpcore.v.extendsfdf2,"x"
+CM0_FUNC_START aeabi_f2d
+CM0_FUNC_ALIAS extendsfdf2 aeabi_f2d
+ CFI_START_FUNCTION
+
+ // Save the sign.
+ lsrs r1, r0, #31
+ lsls r1, #31
+
+ // Set up registers for __fp_normalize2().
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Test for zero.
+ lsls r0, #1
+ beq LSYM(__f2d_return)
+
+ // Split the exponent and mantissa into separate registers.
+ // This is the most efficient way to convert subnormals in the
+ // single-precision form into normals in double-precision.
+ // This does add a leading implicit '1' to INF and NAN,
+ // but that will be absorbed when the value is re-assembled.
+ movs r2, r0
+ bl SYM(__fp_normalize2) __PLT__
+
+ // Set up the exponent bias. For INF/NAN values, the bias
+ // is 1791 (2047 - 255 - 1), where the last '1' accounts
+ // for the implicit '1' in the mantissa.
+ movs r0, #3
+ lsls r0, #9
+ adds r0, #255
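+ // (i.e. $r0 = (3 << 9) + 255 = 1791)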
+
+ // Test for INF/NAN, promote exponent if necessary
+ cmp r2, #255
+ beq LSYM(__f2d_indefinite)
+
+ // For normal values, the exponent bias is 895 (1023 - 127 - 1),
+ // which is half of the prepared INF/NAN bias.
+ lsrs r0, #1
+
+ LSYM(__f2d_indefinite):
+ // Assemble exponent with bias correction.
+ adds r2, r0
+ lsls r2, #20
+ adds r1, r2
+
+ // Assemble the high word of the mantissa.
+ lsrs r0, r3, #11
+ add r1, r0
+
+ // Remainder of the mantissa in the low word of the result.
+ lsls r0, r3, #21
+
+ LSYM(__f2d_return):
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END extendsfdf2
+CM0_FUNC_END aeabi_f2d
+
+#endif /* L_arm_f2d */
+
+
+#if defined(L_arm_d2f)
+// TODO: not tested || defined(L_arm_truncdfsf2)
+
+// HACK: Build two separate implementations:
+// * __aeabi_d2f() rounds to nearest per traditional IEEE-754 rules.
+// * __truncdfsf2() rounds towards zero per GCC specification.
+// Presumably, a program will consistently use one ABI or the other,
+// which means that this code will not be duplicated in practice.
+// Merging the two versions with dynamic rounding would be rather hard.
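+// For example (values chosen for illustration): the double 1 + 3*2^-25
+// (0x3FF0000018000000) lies 0.75 ULP above 1.0f, so round-to-nearest gives
+// 0x3F800001 while rounding towards zero gives 0x3F800000.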
+#ifdef L_arm_truncdfsf2
+ #define D2F_NAME truncdfsf2
+#else
+ #define D2F_NAME aeabi_d2f
+#endif
+
+// float __aeabi_d2f(double)
+// Converts a double-precision float in $r1:$r0 to single-precision in $r0.
+// Values out of range become ZERO or INF; returns the upper 23 bits of NAN.
+.section .text.sorted.libgcc.fpcore.w.truncdfsf2,"x"
+CM0_FUNC_START D2F_NAME
+ CFI_START_FUNCTION
+
+ // Save the sign.
+ lsrs r2, r1, #31
+ lsls r2, #31
+ mov ip, r2
+
+ // Isolate the exponent (11 bits).
+ lsls r2, r1, #1
+ lsrs r2, #21
+
+ // Isolate the mantissa. It's safe to always add the implicit '1' --
+ // even for subnormals -- since they will underflow in every case.
+ lsls r1, #12
+ adds r1, #1
+ rors r1, r1
+ lsrs r3, r0, #21
+ adds r1, r3
+
+ #ifndef L_arm_truncdfsf2
+ // Fix the remainder. Even though the mantissa already has 32 bits
+ // of significance, this value still influences rounding ties.
+ lsls r0, #11
+ #endif
+
+ // Test for INF/NAN (r3 = 2047)
+ mvns r3, r2
+ lsrs r3, #21
+ cmp r3, r2
+ beq LSYM(__d2f_indefinite)
+
+ // Adjust exponent bias. Offset is 127 - 1023, less 1 more since
+ // __fp_assemble() expects the exponent relative to bit[30].
+ lsrs r3, #1
+ subs r2, r3
+ adds r2, #126
+
+ #ifndef L_arm_truncdfsf2
+ LSYM(__d2f_overflow):
+ // Use the standard formatting for overflow and underflow.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ b SYM(__fp_assemble)
+ .cfi_restore_state
+
+ #else /* L_arm_truncdfsf2 */
+ // In theory, __truncdfsf2() could also push registers and branch to
+ // __fp_assemble() after calculating the truncation shift and clearing
+ // bits. __fp_assemble() always rounds down if there is no remainder.
+ // However, after doing all of that work, the incremental cost to
+ // finish assembling the return value is only 6 or 7 instructions
+ // (depending on how __d2f_overflow() returns).
+ // This seems worthwhile to avoid linking in all of __fp_assemble().
+
+ // Test for INF.
+ cmp r2, #254
+ bge LSYM(__d2f_overflow)
+
+ // HACK: Pre-empt the default round-to-nearest mode,
+ // since GCC specifies rounding towards zero.
+ // Start by identifying subnormals by negative exponents.
+ asrs r3, r2, #31
+ ands r3, r2
+
+ // Clear the standard exponent field for subnormals.
+ eors r2, r3
+
+ // Add the subnormal shift to the nominal 8 bits.
+ rsbs r3, #0
+ adds r3, #8
+
+ // Clamp the shift to a single word (branchless).
+ // Anything larger would have flushed to zero anyway.
+ lsls r3, #27
+ lsrs r3, #27
+
+ #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+ // Preserve inexact zero.
+ orrs r0, r1
+ #endif
+
+ // Clear the insignificant bits.
+ lsrs r1, r3
+
+ // Combine the mantissa and the exponent.
+ // TODO: Test for inexact zero after adding.
+ lsls r2, #23
+ adds r0, r1, r2
+
+ // Combine with the saved sign.
+ add r0, ip
+ RET
+
+ LSYM(__d2f_overflow):
+ // Construct signed INF in $r0.
+ movs r0, #255
+ lsls r0, #23
+ add r0, ip
+ RET
+
+ #endif /* L_arm_truncdfsf2 */
+
+ LSYM(__d2f_indefinite):
+ // Test for INF. If the mantissa, exclusive of the implicit '1',
+ // is equal to '0', the result will be INF.
+ lsls r3, r1, #1
+ orrs r3, r0
+ beq LSYM(__d2f_overflow)
+
+ // TODO: Support for TRAP_NANS here.
+ // This will be double precision, not compatible with the current handler.
+
+ // Construct NAN with the upper 22 bits of the mantissa, setting bit[21]
+ // to ensure a valid NAN without changing bit[22] (quiet)
+ subs r2, #0xD
+ lsls r0, r2, #20
+ lsrs r1, #8
+ orrs r0, r1
+
+ #if defined(STRICT_NANS) && STRICT_NANS
+ // Yes, the NAN has already been altered, but at least keep the sign...
+ add r0, ip
+ #endif
+
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END D2F_NAME
+
+#endif /* L_arm_d2f || L_arm_truncdfsf2 */
+
+
+#ifdef L_arm_h2f
+
+// float __aeabi_h2f(short hf)
+// Converts a half-precision float in $r0 to single-precision.
+// Rounding, overflow, and underflow conditions are impossible.
+// INF and ZERO are returned unmodified.
+.section .text.sorted.libgcc.h2f,"x"
+CM0_FUNC_START aeabi_h2f
+ CFI_START_FUNCTION
+
+ // Set up registers for __fp_normalize2().
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Save the mantissa and exponent.
+ lsls r2, r0, #17
+
+ // Isolate the sign.
+ lsrs r0, #15
+ lsls r0, #31
+
+ // Align the exponent at bit[24] for normalization.
+ // If zero, return the original sign.
+ lsrs r2, #3
+ beq LSYM(__h2f_return)
+
+ // Split the exponent and mantissa into separate registers.
+ // This is the most efficient way to convert subnormals in the
+ // half-precision form into normals in single-precision.
+ // This does add a leading implicit '1' to INF and NAN,
+ // but that will be absorbed when the value is re-assembled.
+ bl SYM(__fp_normalize2) __PLT__
+
+ // Set up the exponent bias. For INF/NAN values, the bias is 223,
+ // where the last '1' accounts for the implicit '1' in the mantissa.
+ adds r2, #(255 - 31 - 1)
+
+ // Test for INF/NAN.
+ cmp r2, #254
+ beq LSYM(__h2f_assemble)
+
+ // For normal values, the bias should have been 111.
+ // However, this adjustment now is faster than branching.
+ subs r2, #((255 - 31 - 1) - (127 - 15 - 1))
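+ // For example (illustrative): half-precision 1.0 (0x3C00) has exponent 15;
+ // 15 + 111 = 126, and the implicit '1' of the mantissa carries one more
+ // into the exponent field below, giving 127 -- i.e. 0x3F800000 (1.0f).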
+
+ LSYM(__h2f_assemble):
+ // Combine exponent and sign.
+ lsls r2, #23
+ adds r0, r2
+
+ // Combine mantissa.
+ lsrs r3, #8
+ add r0, r3
+
+ LSYM(__h2f_return):
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_h2f
+
+#endif /* L_arm_h2f */
+
+
+#ifdef L_arm_f2h
+
+// short __aeabi_f2h(float f)
+// Converts a single-precision float in $r0 to half-precision,
+// rounding to nearest, ties to even.
+// Values out of range are forced to either ZERO or INF;
+// NAN values return the upper 12 bits of the NAN.
+.section .text.sorted.libgcc.f2h,"x"
+CM0_FUNC_START aeabi_f2h
+ CFI_START_FUNCTION
+
+ // Set up the sign.
+ lsrs r2, r0, #31
+ lsls r2, #15
+
+ // Save the exponent and mantissa.
+ // If ZERO, return the original sign.
+ lsls r0, #1
+ beq LSYM(__f2h_return)
+
+ // Isolate the exponent, check for NAN.
+ lsrs r1, r0, #24
+ cmp r1, #255
+ beq LSYM(__f2h_indefinite)
+
+ // Check for overflow.
+ cmp r1, #(127 + 15)
+ bhi LSYM(__f2h_overflow)
+
+ // Isolate the mantissa, adding back the implicit '1'.
+ lsls r0, #8
+ adds r0, #1
+ rors r0, r0
+
+ // Adjust exponent bias for half-precision, including '1' to
+ // account for the mantissa's implicit '1'.
+ subs r1, #(127 - 15 + 1)
+ bmi LSYM(__f2h_underflow)
+
+ // Combine the exponent and sign.
+ lsls r1, #10
+ adds r2, r1
+
+ // Split the mantissa (11 bits) and remainder (13 bits).
+ lsls r3, r0, #12
+ lsrs r0, #21
+
+ LSYM(__f2h_round):
+ // If the carry bit is '0', always round down.
+ bcc LSYM(__f2h_return)
+
+ // Carry was set.  If this is a tie (no further remainder) and the
+ // LSB of the result is '0', round down (to even).
+ lsls r1, r0, #31
+ orrs r1, r3
+ beq LSYM(__f2h_return)
+
+ // Round up, ties to even.
+ adds r0, #1
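+ // For example (illustrative): 0x3F801000 (1 + 2^-11) sits exactly halfway
+ // between the half-precision values 0x3C00 and 0x3C01 and rounds to the
+ // even encoding 0x3C00, while 0x3F803000 (1 + 3*2^-11) rounds to 0x3C02.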
+
+ LSYM(__f2h_return):
+ // Combine mantissa and exponent.
+ adds r0, r2
+ RET
+
+ LSYM(__f2h_underflow):
+ // Align the remainder. The remainder consists of the last 12 bits
+ // of the mantissa plus the magnitude of underflow.
+ movs r3, r0
+ adds r1, #12
+ lsls r3, r1
+
+ // Align the mantissa. The MSB of the remainder must be
+ // shifted out last, into the 'C' flag, for rounding.
+ subs r1, #33
+ rsbs r1, #0
+ lsrs r0, r1
+ b LSYM(__f2h_round)
+
+ LSYM(__f2h_overflow):
+ // Create single-precision INF from which to construct half-precision.
+ movs r0, #255
+ lsls r0, #24
+
+ LSYM(__f2h_indefinite):
+ // Check for INF.
+ lsls r3, r0, #8
+ beq LSYM(__f2h_infinite)
+
+ // Set bit[8] to ensure a valid NAN without changing bit[9] (quiet).
+ adds r2, #128
+ adds r2, #128
+
+ LSYM(__f2h_infinite):
+ // Construct the result from the upper 22 bits of the mantissa
+ // and the lower 5 bits of the exponent.
+ lsls r0, #3
+ lsrs r0, #17
+
+ // Combine with the sign (and possibly NAN flag).
+ orrs r0, r2
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_f2h
+
+#endif /* L_arm_f2h */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fdiv.S gcc-11-20201220/libgcc/config/arm/cm0/fdiv.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fdiv.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fdiv.S 2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,257 @@
+/* fdiv.S: Cortex M0 optimized 32-bit float division
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_divsf3
+
+// float __aeabi_fdiv(float, float)
+// Returns $r0 after division by $r1.
+.section .text.sorted.libgcc.fpcore.n.fdiv,"x"
+CM0_FUNC_START aeabi_fdiv
+CM0_FUNC_ALIAS divsf3 aeabi_fdiv
+ CFI_START_FUNCTION
+
+ // Standard registers, compatible with exception handling.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Save for the sign of the result.
+ movs r3, r1
+ eors r3, r0
+ lsrs rT, r3, #31
+ lsls rT, #31
+ mov ip, rT
+
+ // Set up INF for comparison.
+ movs rT, #255
+ lsls rT, #24
+
+ // Check for divide by 0. Automatically catches 0/0.
+ lsls r2, r1, #1
+ beq LSYM(__fdiv_by_zero)
+
+ // Check for INF/INF, or a number divided by itself.
+ lsls r3, #1
+ beq LSYM(__fdiv_equal)
+
+ // Check the numerator for INF/NAN.
+ eors r3, r2
+ cmp r3, rT
+ bhs LSYM(__fdiv_special1)
+
+ // Check the denominator for INF/NAN.
+ cmp r2, rT
+ bhs LSYM(__fdiv_special2)
+
+ // Check the numerator for zero.
+ cmp r3, #0
+ beq SYM(__fp_zero)
+
+ // No action if the numerator is subnormal.
+ // The mantissa will normalize naturally in the division loop.
+ lsls r0, #9
+ lsrs r1, r3, #24
+ beq LSYM(__fdiv_denominator)
+
+ // Restore the numerator's implicit '1'.
+ adds r0, #1
+ rors r0, r0
+
+ LSYM(__fdiv_denominator):
+ // The denominator must be normalized and left aligned.
+ bl SYM(__fp_normalize2)
+
+ // 25 bits of precision will be sufficient.
+ movs rT, #64
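+ // (The leading quotient bit enters at bit[6], and the loop in
+ // __internal_fdiv_loop exits once it reaches bit[31], so roughly
+ // 25 result bits are generated.)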
+
+ // Run division.
+ bl SYM(__internal_fdiv_loop)
+ b SYM(__fp_assemble)
+
+ LSYM(__fdiv_equal):
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(DIVISION_INF_BY_INF)
+ #endif
+
+ // The absolute values of both operands are equal, but not 0.
+ // If both operands are INF, create a new NAN.
+ cmp r2, rT
+ beq SYM(__fp_exception)
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // If both operands are NAN, return the NAN in $r0.
+ bhi SYM(__fp_check_nan)
+ #else
+ bhi LSYM(__fdiv_return)
+ #endif
+
+ // Return 1.0f, with appropriate sign.
+ movs r0, #127
+ lsls r0, #23
+ add r0, ip
+
+ LSYM(__fdiv_return):
+ pop { rT, pc }
+ .cfi_restore_state
+
+ LSYM(__fdiv_special2):
+ // The denominator is either INF or NAN, numerator is neither.
+ // Also, the denominator is not equal to 0.
+ // If the denominator is INF, the result goes to 0.
+ beq SYM(__fp_zero)
+
+ // The only other option is NAN, fall through to branch.
+ mov r0, r1
+
+ LSYM(__fdiv_special1):
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // The numerator is INF or NAN. If NAN, return it directly.
+ bne SYM(__fp_check_nan)
+ #else
+ bne LSYM(__fdiv_return)
+ #endif
+
+ // If INF, the result will be INF if the denominator is finite.
+ // The denominator is neither INF nor 0,
+ // so fall through the exception trap to check for NAN.
+ movs r0, r1
+
+ LSYM(__fdiv_by_zero):
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(DIVISION_0_BY_0)
+ #endif
+
+ // The denominator is 0.
+ // If the numerator is also 0, the result will be a new NAN.
+ // Otherwise the result will be INF, with the correct sign.
+ lsls r2, r0, #1
+ beq SYM(__fp_exception)
+
+ // The result should be NAN if the numerator is NAN. Otherwise,
+ // the result is INF regardless of the numerator value.
+ cmp r2, rT
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ bhi SYM(__fp_check_nan)
+ #else
+ bhi LSYM(__fdiv_return)
+ #endif
+
+ // Recreate INF with the correct sign.
+ b SYM(__fp_infinity)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END divsf3
+CM0_FUNC_END aeabi_fdiv
+
+
+// Division helper, possibly to be shared with atan2.
+// Expects the numerator mantissa in $r0, exponent in $r1,
+// plus the denominator mantissa in $r3, exponent in $r2, and
+// a bit pattern in $rT that controls the result precision.
+// Returns quotient in $r1, exponent in $r2, pseudo remainder in $r0.
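+// Rough C model of the loop below (illustrative only; 'quo', 'rem', 'den',
+// and 'seed' stand in for $r1, $r0, $r3, and $rT, and the zero-remainder
+// early exit is omitted):
+//     while (!(quo & 0x80000000)) {
+//         quo <<= 1;  rem <<= 1;  exp--;
+//         if (rem >= den) { quo += seed; rem -= den; }
+//     }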
+.section .text.sorted.libgcc.fpcore.o.fdiv2,"x"
+CM0_FUNC_START internal_fdiv_loop
+ CFI_START_FUNCTION
+
+ // Initialize the exponent, relative to bit[30].
+ subs r2, r1, r2
+
+ SYM(__internal_fdiv_loop2):
+ // The exponent should be (expN - 127) - (expD - 127) + 127.
+ // An additional offset of 25 is required to account for the
+ // minimum number of bits in the result (before rounding).
+ // However, drop '1' because the offset is relative to bit[30],
+ // while the result is calculated relative to bit[31].
+ adds r2, #(127 + 25 - 1)
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // Dividing by a power of 2?
+ lsls r1, r3, #1
+ beq LSYM(__fdiv_simple)
+ #endif
+
+ // Initialize the result.
+ eors r1, r1
+
+ // Clear the MSB, so that when the numerator is smaller than
+ // the denominator, there is one bit free for a left shift.
+ // After a single shift, the numerator is guaranteed to be larger.
+ // The denominator ends up in r3, and the numerator ends up in r0,
+ // so that the numerator serves as a pseudo-remainder in rounding.
+ // Shift the numerator one additional bit to compensate for the
+ // pre-incrementing loop.
+ lsrs r0, #2
+ lsrs r3, #1
+
+ LSYM(__fdiv_loop):
+ // Once the MSB of the output reaches the MSB of the register,
+ // the result has been calculated to the required precision.
+ lsls r1, #1
+ bmi LSYM(__fdiv_break)
+
+ // Shift the numerator/remainder left to set up the next bit.
+ subs r2, #1
+ lsls r0, #1
+
+ // Test if the numerator/remainder is smaller than the denominator,
+ // do nothing if it is.
+ cmp r0, r3
+ blo LSYM(__fdiv_loop)
+
+ // If the numerator/remainder is greater or equal, set the next bit,
+ // and subtract the denominator.
+ adds r1, rT
+ subs r0, r3
+
+ // Short-circuit if the remainder goes to 0.
+ // Even with the overhead of "subnormal" alignment,
+ // this is usually much faster than continuing.
+ bne LSYM(__fdiv_loop)
+
+ // Compensate the alignment of the result.
+ // The remainder does not need compensation, it's already 0.
+ lsls r1, #1
+
+ LSYM(__fdiv_break):
+ RET
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ LSYM(__fdiv_simple):
+ // The numerator becomes the result, with a remainder of 0.
+ movs r1, r0
+ eors r0, r0
+ subs r2, #25
+ RET
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END internal_fdiv_loop
+
+#endif /* L_arm_divsf3 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/ffixed.S gcc-11-20201220/libgcc/config/arm/cm0/ffixed.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/ffixed.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/ffixed.S 2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,342 @@
+/* ffixed.S: Cortex M0 optimized float->int conversion
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_fixsfsi
+
+// int __aeabi_f2iz(float)
+// Converts a float in $r0 to signed integer, rounding toward 0.
+// Values out of range are forced to either INT_MAX or INT_MIN.
+// NAN becomes zero.
+.section .text.sorted.libgcc.fpcore.r.fixsfsi,"x"
+CM0_FUNC_START aeabi_f2iz
+CM0_FUNC_ALIAS fixsfsi aeabi_f2iz
+ CFI_START_FUNCTION
+
+ #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+ // Flag for 32-bit signed conversion.
+ movs r1, #33
+ b LSYM(__real_f2lz)
+
+ #else /* !__OPTIMIZE_SIZE__ */
+ // Flag for signed conversion.
+ movs r3, #1
+
+ LSYM(__real_f2iz):
+ // Isolate the sign of the result.
+ asrs r1, r0, #31
+ lsls r0, #1
+
+ #if defined(FP_EXCEPTION) && FP_EXCEPTION
+ // Check for zero to avoid spurious underflow exception on -0.
+ beq LSYM(__f2iz_return)
+ #endif
+
+ // Isolate the exponent.
+ lsrs r2, r0, #24
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // Test for NAN.
+ // Otherwise, NAN will be converted like +/-INF.
+ cmp r2, #255
+ beq LSYM(__f2iz_nan)
+ #endif
+
+ // Extract the mantissa and restore the implicit '1'. Technically,
+ // this is wrong for subnormals, but they flush to zero regardless.
+ lsls r0, #8
+ adds r0, #1
+ rors r0, r0
+
+ // Calculate mantissa alignment. Given the implicit '1' in bit[31]:
+ // * An exponent less than 127 will automatically flush to 0.
+ // * An exponent of 127 will result in a shift of 31.
+ // * An exponent of 128 will result in a shift of 30.
+ // * ...
+ // * An exponent of 157 will result in a shift of 1.
+ // * An exponent of 158 will result in no shift at all.
+ // * An exponent larger than 158 will result in overflow.
+ rsbs r2, #0
+ adds r2, #158
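+ // For example (illustrative): 100.5f (0x42C90000) has exponent 133, so the
+ // shift is 158 - 133 = 25; the aligned mantissa 0xC9000000 >> 25 = 100,
+ // discarding the fractional bits as required.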
+
+ // When the shift is less than minimum, the result will overflow.
+ // The only signed value to fail this test is INT_MIN (0x80000000),
+ // but it will be returned correctly from the overflow branch.
+ cmp r2, r3
+ blt LSYM(__f2iz_overflow)
+
+ // If unsigned conversion of a negative value, also overflow.
+ // Would also catch -0.0f if not handled earlier.
+ cmn r3, r1
+ blt LSYM(__f2iz_overflow)
+
+ #if defined(FP_EXCEPTION) && FP_EXCEPTION
+ // Save a copy for remainder testing
+ movs r3, r0
+ #endif
+
+ // Truncate the fraction.
+ lsrs r0, r2
+
+ // Two's complement negation, if applicable.
+ // Bonus: the sign in $r1 provides a suitable long long result.
+ eors r0, r1
+ subs r0, r1
+
+ #if defined(FP_EXCEPTION) && FP_EXCEPTION
+ // If any bits set in the remainder, raise FE_INEXACT
+ rsbs r2, #0
+ adds r2, #32
+ lsls r3, r2
+ bne LSYM(__f2iz_inexact)
+ #endif
+
+ LSYM(__f2iz_return):
+ RET
+
+ LSYM(__f2iz_overflow):
+ // Positive unsigned integers (r1 == 0, r3 == 0), return 0xFFFFFFFF.
+ // Negative unsigned integers (r1 == -1, r3 == 0), return 0x00000000.
+ // Positive signed integers (r1 == 0, r3 == 1), return 0x7FFFFFFF.
+ // Negative signed integers (r1 == -1, r3 == 1), return 0x80000000.
+ // TODO: FE_INVALID exception, (but not for -2^31).
+ mvns r0, r1
+ lsls r3, #31
+ eors r0, r3
+ RET
+
+ #if defined(FP_EXCEPTION) && FP_EXCEPTION
+ LSYM(__f2iz_inexact):
+ // TODO: Another class of exceptions that doesn't overwrite $r0.
+ bkpt #0
+
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(CAST_INEXACT)
+ #endif
+
+ b SYM(__fp_exception)
+ #endif
+
+ LSYM(__f2iz_nan):
+ // Check for INF
+ lsls r2, r0, #9
+ beq LSYM(__f2iz_overflow)
+
+ #if defined(FP_EXCEPTION) && FP_EXCEPTION
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(CAST_UNDEFINED)
+ #endif
+
+ b SYM(__fp_exception)
+ #else
+
+ #endif
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+
+ // TODO: Extend to long long
+
+ // TODO: bl fp_check_nan
+ #endif
+
+ // Return long long 0 on NAN.
+ eors r0, r0
+ eors r1, r1
+ RET
+
+ #endif /* !__OPTIMIZE_SIZE__ */
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fixsfsi
+CM0_FUNC_END aeabi_f2iz
+
+
+// unsigned int __aeabi_f2uiz(float)
+// Converts a float in $r0 to unsigned integer, rounding toward 0.
+// Values out of range are forced to UINT_MAX.
+// Negative values and NAN all become zero.
+.section .text.sorted.libgcc.fpcore.s.fixunssfsi,"x"
+CM0_FUNC_START aeabi_f2uiz
+CM0_FUNC_ALIAS fixunssfsi aeabi_f2uiz
+ CFI_START_FUNCTION
+
+ #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+ // Flag for 32-bit unsigned conversion.
+ movs r1, #32
+ b LSYM(__real_f2lz)
+
+ #else /* !__OPTIMIZE_SIZE__ */
+ // Flag for unsigned conversion.
+ movs r3, #0
+ b LSYM(__real_f2iz)
+
+ #endif /* !__OPTIMIZE_SIZE__ */
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fixunssfsi
+CM0_FUNC_END aeabi_f2uiz
+
+
+// long long __aeabi_f2lz(float)
+// Converts a float in $r0 to a 64 bit integer in $r1:$r0, rounding toward 0.
+// Values out of range are forced to either INT64_MAX or INT64_MIN.
+// NAN becomes zero.
+.section .text.sorted.libgcc.fpcore.t.fixsfdi,"x"
+CM0_FUNC_START aeabi_f2lz
+CM0_FUNC_ALIAS fixsfdi aeabi_f2lz
+ CFI_START_FUNCTION
+
+ movs r1, #1
+
+ LSYM(__real_f2lz):
+ // Split the sign of the result from the mantissa/exponent field.
+ // Handle +/-0 specially to avoid spurious exceptions.
+ asrs r3, r0, #31
+ lsls r0, #1
+ beq LSYM(__f2lz_zero)
+
+ // If unsigned conversion of a negative value, also overflow.
+ // Specifically, is the LSB of $r1 clear when $r3 is equal to '-1'?
+ //
+ //   $r3 (sign)    $r2 (flag)    $r3 >= $r2 (signed)
+ //   0xFFFFFFFF    0x00000000    false (overflow)
+ //   0x00000000    0x00000000    true
+ //   0xFFFFFFFF    0x80000000    true
+ //   0x00000000    0x80000000    true
+ //
+ // (NOTE: This test will also trap -0.0f, unless handled earlier.)
+ lsls r2, r1, #31
+ cmp r3, r2
+ blt LSYM(__f2lz_overflow)
+
+ // Isolate the exponent.
+ lsrs r2, r0, #24
+
+// #if defined(TRAP_NANS) && TRAP_NANS
+// // Test for NAN.
+// // Otherwise, NAN will be converted like +/-INF.
+// cmp r2, #255
+// beq LSYM(__f2lz_nan)
+// #endif
+
+ // Calculate mantissa alignment. Given the implicit '1' in bit[31]:
+ // * An exponent less than 127 will automatically flush to 0.
+ // * An exponent of 127 will result in a shift of 63.
+ // * An exponent of 128 will result in a shift of 62.
+ // * ...
+ // * An exponent of 189 will result in a shift of 1.
+ // * An exponent of 190 will result in no shift at all.
+ // * An exponent larger than 190 will result in overflow
+ // (189 in the case of signed integers).
+ rsbs r2, #0
+ adds r2, #190
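+ // For example (illustrative): 2^40 (0x53800000) has exponent 167, giving a
+ // shift of 190 - 167 = 23; the upper word becomes 0x80000000 >> 23 = 0x100
+ // and the lower word 0, i.e. exactly 2^40.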
+ // When the shift is less than minimum, the result will overflow.
+ // The only signed value to fail this test is INT_MIN (0x80000000),
+ // but it will be returned correctly from the overflow branch.
+ cmp r2, r1
+ blt LSYM(__f2lz_overflow)
+
+ // Extract the mantissa and restore the implicit '1'. Technically,
+ // this is wrong for subnormals, but they flush to zero regardless.
+ lsls r0, #8
+ adds r0, #1
+ rors r0, r0
+
+ // Calculate the upper word.
+ // If the shift is greater than 32, gives an automatic '0'.
+ movs r1, r0
+ lsrs r1, r2
+
+ // Reduce the shift for the lower word.
+ // If the original shift was less than 32, the result may be split
+ // between the upper and lower words.
+ subs r2, #32
+ blt LSYM(__f2lz_split)
+
+ // Shift is still positive, keep moving right.
+ lsrs r0, r2
+
+ // TODO: Remainder test.
+ // $r1 is technically free, as long as it's zero by the time
+ // this is over.
+
+ LSYM(__f2lz_return):
+ // Two's complement negation, if the original was negative.
+ eors r0, r3
+ eors r1, r3
+ subs r0, r3
+ sbcs r1, r3
+ RET
+
+ LSYM(__f2lz_split):
+ // Shift was negative, calculate the remainder
+ rsbs r2, #0
+ lsls r0, r2
+ b LSYM(__f2lz_return)
+
+ LSYM(__f2lz_zero):
+ eors r1, r1
+ RET
+
+ LSYM(__f2lz_overflow):
+ // Positive unsigned integers (r3 == 0, r1 == 0), return 0xFFFFFFFF.
+ // Negative unsigned integers (r3 == -1, r1 == 0), return 0x00000000.
+ // Positive signed integers (r3 == 0, r1 == 1), return 0x7FFFFFFF.
+ // Negative signed integers (r3 == -1, r1 == 1), return 0x80000000.
+ // TODO: FE_INVALID exception, (but not for -2^63).
+ mvns r0, r3
+
+ // For 32-bit results
+ lsls r2, r1, #26
+ lsls r1, #31
+ ands r2, r1
+ eors r0, r2
+
+// LSYM(__f2lz_zero):
+ eors r1, r0
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fixsfdi
+CM0_FUNC_END aeabi_f2lz
+
+
+// unsigned long long __aeabi_f2ulz(float)
+// Converts a float in $r0 to a 64 bit integer in $r1:$r0, rounding toward 0.
+// Values out of range are forced to UINT64_MAX.
+// Negative values and NAN all become zero.
+.section .text.sorted.libgcc.fpcore.u.fixunssfdi,"x"
+CM0_FUNC_START aeabi_f2ulz
+CM0_FUNC_ALIAS fixunssfdi aeabi_f2ulz
+ CFI_START_FUNCTION
+
+ eors r1, r1
+ b LSYM(__real_f2lz)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fixunssfdi
+CM0_FUNC_END aeabi_f2ulz
+
+#endif /* L_arm_fixsfsi */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/ffloat.S gcc-11-20201220/libgcc/config/arm/cm0/ffloat.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/ffloat.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/ffloat.S 2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,224 @@
+/* ffloat.S: Cortex M0 optimized int->float conversion
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_floatsisf
+
+// float __aeabi_i2f(int)
+// Converts a signed integer in $r0 to float.
+.section .text.sorted.libgcc.fpcore.p.floatsisf,"x"
+
+// On little-endian cores (including all Cortex-M), __floatsisf() can be
+// implemented as below in 5 instructions. However, it can also be
+// implemented by prefixing a single instruction to __floatdisf().
+// A memory savings of 4 instructions at a cost of only 2 execution cycles
+// seems reasonable enough. Plus, the trade-off only happens in programs
+// that require both __floatsisf() and __floatdisf(). Programs only using
+// __floatsisf() always get the smallest version.
+// When the combined version is provided, this standalone version
+// must be declared WEAK, so that the combined version can supersede it.
+// '_arm_floatsisf' should appear before '_arm_floatdisf' in LIB1ASMFUNCS.
+#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+CM0_WEAK_START aeabi_i2f
+CM0_WEAK_ALIAS floatsisf aeabi_i2f
+#else /* !__OPTIMIZE_SIZE__ */
+CM0_FUNC_START aeabi_i2f
+CM0_FUNC_ALIAS floatsisf aeabi_i2f
+#endif /* !__OPTIMIZE_SIZE__ */
+ CFI_START_FUNCTION
+
+ // Save the sign.
+ asrs r3, r0, #31
+
+ // Absolute value of the input.
+ eors r0, r3
+ subs r0, r3
+
+ // Zero-extend the absolute value to unsigned long long.
+ eors r1, r1
+ b SYM(__internal_uil2f_noswap)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END floatsisf
+CM0_FUNC_END aeabi_i2f
+
+#endif /* L_arm_floatsisf */
+
+
+#ifdef L_arm_floatdisf
+
+// float __aeabi_l2f(long long)
+// Converts a signed 64-bit integer in $r1:$r0 to a float in $r0.
+.section .text.sorted.libgcc.fpcore.p.floatdisf,"x"
+
+// See comments for __floatsisf() above.
+#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+CM0_FUNC_START aeabi_i2f
+CM0_FUNC_ALIAS floatsisf aeabi_i2f
+ CFI_START_FUNCTION
+
+ #if defined(__ARMEB__) && __ARMEB__
+ // __floatdisf() expects a big-endian lower word in $r1.
+ movs xxl, r0
+ #endif
+
+ // Sign extension to long long signed.
+ asrs xxh, xxl, #31
+
+#endif /* __OPTIMIZE_SIZE__ */
+
+CM0_FUNC_START aeabi_l2f
+CM0_FUNC_ALIAS floatdisf aeabi_l2f
+
+#if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ CFI_START_FUNCTION
+#endif
+
+ // Save the sign.
+ asrs r3, xxh, #31
+
+ // Absolute value of the input.
+ // Could this be arranged in big-endian mode so that this block also
+ // swapped the input words? Maybe. But, since neither 'eors' nor
+ // 'sbcs' allow a third destination register, it seems unlikely to
+ // save more than one cycle. Also, the size of __floatdisf() and
+ // __floatundisf() together would increase by two instructions.
+ eors xxl, r3
+ eors xxh, r3
+ subs xxl, r3
+ sbcs xxh, r3
+
+ b SYM(__internal_uil2f)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END floatdisf
+CM0_FUNC_END aeabi_l2f
+
+#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+CM0_FUNC_END floatsisf
+CM0_FUNC_END aeabi_i2f
+#endif
+
+#endif /* L_arm_floatdisf */
+
+
+#ifdef L_arm_floatunsisf
+
+// float __aeabi_ui2f(unsigned)
+// Converts an unsigned integer in $r0 to float.
+.section .text.sorted.libgcc.fpcore.q.floatunsisf,"x"
+CM0_FUNC_START aeabi_ui2f
+CM0_FUNC_ALIAS floatunsisf aeabi_ui2f
+ CFI_START_FUNCTION
+
+ #if defined(__ARMEB__) && __ARMEB__
+ // In big-endian mode, function flow breaks down. __floatundisf()
+ // wants to swap word order, but __floatunsisf() does not.
+ // The choice is between leaving these arguments un-swapped and
+ // branching, or canceling out the word swap in advance.
+ // The branching version would require one extra instruction to
+ // clear the sign ($r3) because of __floatdisf() dependencies.
+ // While the branching version is technically one cycle faster
+ // on the Cortex-M0 pipeline, branchless just feels better.
+
+ // Thus, __floatundisf() expects a big-endian lower word in $r1.
+ movs xxl, r0
+ #endif
+
+ // Extend to unsigned long long and fall through.
+ eors xxh, xxh
+
+#endif /* L_arm_floatunsisf */
+
+
+// The execution of __floatunsisf() flows directly into __floatundisf(), such
+// that instructions must appear consecutively in the same memory section
+// for proper flow control. However, this construction inhibits the ability
+// to discard __floatunsisf() when only using __floatundisf().
+// Therefore, this block configures __floatundisf() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __floatunsisf(). The standalone version
+// must be declared WEAK, so that the combined version can supersede it
+// and provide both symbols when required.
+// '_arm_floatundisf' should appear before '_arm_floatunsisf' in LIB1ASMFUNCS.
+#if defined(L_arm_floatunsisf) || defined(L_arm_floatundisf)
+
+#ifdef L_arm_floatundisf
+// float __aeabi_ul2f(unsigned long long)
+// Converts an unsigned 64-bit integer in $r1:$r0 to a float in $r0.
+.section .text.sorted.libgcc.fpcore.q.floatundisf,"x"
+CM0_WEAK_START aeabi_ul2f
+CM0_WEAK_ALIAS floatundisf aeabi_ul2f
+ CFI_START_FUNCTION
+
+#else
+CM0_FUNC_START aeabi_ul2f
+CM0_FUNC_ALIAS floatundisf aeabi_ul2f
+
+#endif
+
+ // Sign is always positive.
+ eors r3, r3
+
+#ifdef L_arm_floatundisf
+ CM0_WEAK_START internal_uil2f
+#else /* L_arm_floatunsisf */
+ CM0_FUNC_START internal_uil2f
+#endif
+ #if defined(__ARMEB__) && __ARMEB__
+ // Swap word order for register compatibility with __fp_assemble().
+ // Could this be optimized by re-defining __fp_assemble()? Maybe.
+ // But the ramifications of dynamic register assignment on all
+ // the other callers of __fp_assemble() would be enormous.
+ eors r0, r1
+ eors r1, r0
+ eors r0, r1
+ #endif
+
+#ifdef L_arm_floatundisf
+ CM0_WEAK_START internal_uil2f_noswap
+#else /* L_arm_floatunsisf */
+ CM0_FUNC_START internal_uil2f_noswap
+#endif
+ // Default exponent, relative to bit[30] of $r1.
+ movs r2, #(127 - 1 + 63)
+
+ // Format the sign.
+ lsls r3, #31
+ mov ip, r3
+
+ push { rT, lr }
+ b SYM(__fp_assemble)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END floatundisf
+CM0_FUNC_END aeabi_ul2f
+
+#ifdef L_arm_floatunsisf
+CM0_FUNC_END floatunsisf
+CM0_FUNC_END aeabi_ui2f
+#endif
+
+#endif /* L_arm_floatunsisf || L_arm_floatundisf */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fmul.S gcc-11-20201220/libgcc/config/arm/cm0/fmul.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fmul.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fmul.S 2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,215 @@
+/* fmul.S: Cortex M0 optimized 32-bit float multiplication
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_mulsf3
+
+// float __aeabi_fmul(float, float)
+// Returns $r0 after multiplication by $r1.
+.section .text.sorted.libgcc.fpcore.m.fmul,"x"
+CM0_FUNC_START aeabi_fmul
+CM0_FUNC_ALIAS mulsf3 aeabi_fmul
+ CFI_START_FUNCTION
+
+ // Standard registers, compatible with exception handling.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Save the sign of the result.
+ movs rT, r1
+ eors rT, r0
+ lsrs rT, #31
+ lsls rT, #31
+ mov ip, rT
+
+ // Set up INF for comparison.
+ movs rT, #255
+ lsls rT, #24
+
+ // Check for multiplication by zero.
+ lsls r2, r0, #1
+ beq LSYM(__fmul_zero1)
+
+ lsls r3, r1, #1
+ beq LSYM(__fmul_zero2)
+
+ // Check for INF/NAN.
+ cmp r3, rT
+ bhs LSYM(__fmul_special2)
+
+ cmp r2, rT
+ bhs LSYM(__fmul_special1)
+
+ // Because neither operand is INF/NAN, the result will be finite.
+ // It is now safe to modify the original operand registers.
+ lsls r0, #9
+
+ // Isolate the first exponent. When normal, add back the implicit '1'.
+ // The result is always aligned with the MSB in bit [31].
+ // Subnormal mantissas remain effectively multiplied by 2x relative to
+ // normals, but this works because the weight of a subnormal is -126.
+ lsrs r2, #24
+ beq LSYM(__fmul_normalize2)
+ adds r0, #1
+ rors r0, r0
+
+ LSYM(__fmul_normalize2):
+ // IMPORTANT: exp10i() jumps in here!
+ // Repeat for the mantissa of the second operand.
+ // Short-circuit when the mantissa is 1.0, as the
+ // first mantissa is already prepared in $r0
+ lsls r1, #9
+
+ // When normal, add back the implicit '1'.
+ lsrs r3, #24
+ beq LSYM(__fmul_go)
+ adds r1, #1
+ rors r1, r1
+
+ LSYM(__fmul_go):
+ // Calculate the final exponent, relative to bit [30].
+ adds rT, r2, r3
+ subs rT, #127
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // Short-circuit on multiplication by powers of 2.
+ lsls r3, r0, #1
+ beq LSYM(__fmul_simple1)
+
+ lsls r3, r1, #1
+ beq LSYM(__fmul_simple2)
+ #endif
+
+ // Save $ip across the call.
+ // (Alternatively, a separate register could be pushed/popped, but the
+ // four instructions here are equally fast without imposing on the stack.)
+ add rT, ip
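+ // (This works because the exponent, relative to bit[30], always fits in a
+ // signed 16-bit value, so the sign bit can share $rT: the 'sxth' below
+ // recovers the exponent and the subtraction leaves just the sign.)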
+
+ // 32x32 unsigned multiplication, 64 bit result.
+ bl SYM(__umulsidi3) __PLT__
+
+ // Separate the saved exponent and sign.
+ sxth r2, rT
+ subs rT, r2
+ mov ip, rT
+
+ b SYM(__fp_assemble)
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ LSYM(__fmul_simple2):
+ // Move the high bits of the result to $r1.
+ movs r1, r0
+
+ LSYM(__fmul_simple1):
+ // Clear the remainder.
+ eors r0, r0
+
+ // Adjust mantissa to match the exponent, relative to bit[30].
+ subs r2, rT, #1
+ b SYM(__fp_assemble)
+ #endif
+
+ LSYM(__fmul_zero1):
+ // $r0 was equal to 0, set up to check $r1 for INF/NAN.
+ lsls r2, r1, #1
+
+ LSYM(__fmul_zero2):
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ movs r3, #(INFINITY_TIMES_ZERO)
+ #endif
+
+ // Check the non-zero operand for INF/NAN.
+ // If NAN, it should be returned.
+ // If INF, the result should be NAN.
+ // Otherwise, the result will be +/-0.
+ cmp r2, rT
+ beq SYM(__fp_exception)
+
+ // If the second operand is finite, the result is 0.
+ blo SYM(__fp_zero)
+
+ #if defined(STRICT_NANS) && STRICT_NANS
+ // Restore values that got mixed in zero testing, then go back
+ // to sort out which one is the NAN.
+ lsls r3, r1, #1
+ lsls r2, r0, #1
+ #elif defined(TRAP_NANS) && TRAP_NANS
+ // Return NAN with the sign bit cleared.
+ lsrs r0, r2, #1
+ b SYM(__fp_check_nan)
+ #else
+ lsrs r0, r2, #1
+ // Return NAN with the sign bit cleared.
+ pop { rT, pc }
+ .cfi_restore_state
+ #endif
+
+ LSYM(__fmul_special2):
+ // $r1 is INF/NAN. In case of INF, check $r0 for NAN.
+ cmp r2, rT
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ // Force swap if $r0 is not NAN.
+ bls LSYM(__fmul_swap)
+
+ // $r0 is NAN, keep if $r1 is INF
+ cmp r3, rT
+ beq LSYM(__fmul_special1)
+
+ // Both are NAN, keep the smaller value (more likely to signal).
+ cmp r2, r3
+ #endif
+
+ // Prefer the NAN already in $r0.
+ // (If TRAP_NANS, this is the smaller NAN).
+ bhi LSYM(__fmul_special1)
+
+ LSYM(__fmul_swap):
+ movs r0, r1
+
+ LSYM(__fmul_special1):
+ // $r0 is either INF or NAN. $r1 has already been examined.
+ // Flags are already set correctly.
+ lsls r2, r0, #1
+ cmp r2, rT
+ beq SYM(__fp_infinity)
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ b SYM(__fp_check_nan)
+ #else
+ pop { rT, pc }
+ .cfi_restore_state
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END mulsf3
+CM0_FUNC_END aeabi_fmul
+
+#endif /* L_arm_mulsf3 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fneg.S gcc-11-20201220/libgcc/config/arm/cm0/fneg.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fneg.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fneg.S 2021-01-06 02:45:47.428262284 -0800
@@ -0,0 +1,76 @@
+/* fneg.S: Cortex M0 optimized 32-bit float negation
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_negsf2
+
+// float __aeabi_fneg(float) [obsolete]
+// The argument and result are in $r0.
+// Uses $r1 and $r2 as scratch registers.
+.section .text.sorted.libgcc.fpcore.a.fneg,"x"
+CM0_FUNC_START aeabi_fneg
+CM0_FUNC_ALIAS negsf2 aeabi_fneg
+ CFI_START_FUNCTION
+
+ #if (defined(STRICT_NANS) && STRICT_NANS) || \
+ (defined(TRAP_NANS) && TRAP_NANS)
+ // Check for NAN.
+ lsls r1, r0, #1
+ movs r2, #255
+ lsls r2, #24
+ cmp r1, r2
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ bhi SYM(__fneg_nan)
+ #else
+ bhi LSYM(__fneg_return)
+ #endif
+ #endif
+
+ // Flip the sign.
+ movs r1, #1
+ lsls r1, #31
+ eors r0, r1
+
+ LSYM(__fneg_return):
+ RET
+
+ #if defined(TRAP_NANS) && TRAP_NANS
+ LSYM(__fneg_nan):
+ // Set up registers for exception handling.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ b SYM(__fp_check_nan)
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END negsf2
+CM0_FUNC_END aeabi_fneg
+
+#endif /* L_arm_negsf2 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/fplib.h gcc-11-20201220/libgcc/config/arm/cm0/fplib.h
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/fplib.h 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/fplib.h 2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,83 @@
+/* fplib.h: Cortex M0 optimized 32-bit float library definitions
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifndef __CM0_FPLIB_H
+#define __CM0_FPLIB_H
+
+/* Enable exception interrupt handler.
+ Exception implementation is opportunistic, and not fully tested. */
+#define TRAP_EXCEPTIONS (0)
+#define EXCEPTION_CODES (0)
+
+/* Perform extra checks to avoid modifying the sign bit of NANs */
+#define STRICT_NANS (0)
+
+/* Trap signaling NANs regardless of context. */
+#define TRAP_NANS (0)
+
+/* TODO: Define service numbers according to the handler requirements */
+#define SVC_TRAP_NAN (0)
+#define SVC_FP_EXCEPTION (0)
+#define SVC_DIVISION_BY_ZERO (0)
+
+/* Push extra registers when required for 64-bit stack alignment */
+#define DOUBLE_ALIGN_STACK (1)
+
+/* Manipulate *div0() parameters to meet the ARM runtime ABI specification. */
+#define PEDANTIC_DIV0 (1)
+
+/* Define various exception codes. These don't map to anything in particular */
+#define SUBTRACTED_INFINITY (20)
+#define INFINITY_TIMES_ZERO (21)
+#define DIVISION_0_BY_0 (22)
+#define DIVISION_INF_BY_INF (23)
+#define UNORDERED_COMPARISON (24)
+#define CAST_OVERFLOW (25)
+#define CAST_INEXACT (26)
+#define CAST_UNDEFINED (27)
+
+/* Exception control for quiet NANs.
+ If TRAP_NAN support is enabled, signaling NANs always raise exceptions. */
+#define FCMP_RAISE_EXCEPTIONS 16
+#define FCMP_NO_EXCEPTIONS 0
+
+/* The bit indexes in these assignments are significant. See implementation.
+ They are shared publicly for eventual use by newlib. */
+#define FCMP_3WAY (1)
+#define FCMP_LT (2)
+#define FCMP_EQ (4)
+#define FCMP_GT (8)
+
+#define FCMP_GE (FCMP_EQ | FCMP_GT)
+#define FCMP_LE (FCMP_LT | FCMP_EQ)
+#define FCMP_NE (FCMP_LT | FCMP_GT)
+
+/* These flags affect the result of unordered comparisons. See implementation. */
+#define FCMP_UN_THREE (128)
+#define FCMP_UN_POSITIVE (64)
+#define FCMP_UN_ZERO (32)
+#define FCMP_UN_NEGATIVE (0)
+
+#endif /* __CM0_FPLIB_H */
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/futil.S gcc-11-20201220/libgcc/config/arm/cm0/futil.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/futil.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/futil.S 2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,407 @@
+/* futil.S: Cortex M0 optimized 32-bit common routines
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_addsubsf3
+
+// Internal function, decomposes the unsigned float in $r2.
+// The exponent will be returned in $r2, the mantissa in $r3.
+// If subnormal, the mantissa will be normalized, so that
+// the MSB of the mantissa (if any) will be aligned at bit[31].
+// Preserves $r0 and $r1, uses $rT as scratch space.
+.section .text.sorted.libgcc.fpcore.y.normf,"x"
+CM0_FUNC_START fp_normalize2
+ CFI_START_FUNCTION
+
+ // Extract the mantissa.
+ lsls r3, r2, #8
+
+ // Extract the exponent.
+ lsrs r2, #24
+ beq SYM(__fp_lalign2)
+
+ // Restore the mantissa's implicit '1'.
+ adds r3, #1
+ rors r3, r3
+
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_normalize2
+
+
+// Internal function, aligns $r3 so the MSB is in bit[31].
+// Simultaneously, subtracts the shift distance from the exponent in $r2.
+.section .text.sorted.libgcc.fpcore.z.alignf,"x"
+CM0_FUNC_START fp_lalign2
+ CFI_START_FUNCTION
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // Unroll the loop, similar to __clzsi2().
+ lsrs rT, r3, #16
+ bne LSYM(__align8)
+ subs r2, #16
+ lsls r3, #16
+
+ LSYM(__align8):
+ lsrs rT, r3, #24
+ bne LSYM(__align4)
+ subs r2, #8
+ lsls r3, #8
+
+ LSYM(__align4):
+ lsrs rT, r3, #28
+ bne LSYM(__align2)
+ subs r2, #4
+ lsls r3, #4
+ #endif
+
+ LSYM(__align2):
+ // Refresh the state of the N flag before entering the loop.
+ tst r3, r3
+
+ LSYM(__align_loop):
+ // Test before subtracting to compensate for the natural exponent.
+ // The largest subnormal should have an exponent of 0, not -1.
+ bmi LSYM(__align_return)
+ subs r2, #1
+ lsls r3, #1
+ bne LSYM(__align_loop)
+
+ // Not just a subnormal... 0! By design, this should never happen.
+ // All callers of this internal function filter 0 as a special case.
+ // Was there an uncontrolled jump from somewhere else? Cosmic ray?
+ eors r2, r2
+
+ #ifdef DEBUG
+ bkpt #0
+ #endif
+
+ LSYM(__align_return):
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_lalign2
+
+
+// Internal function to combine mantissa, exponent, and sign. No return.
+// Expects the unsigned result in $r1. To avoid the (slower) underflow
+// path, the MSB should be in bits [31:29].
+// Expects any remainder bits of the unrounded result in $r0.
+// Expects the exponent in $r2. The exponent must be relative to bit[30].
+// Expects the sign of the result (and only the sign) in $ip.
+// Returns a correctly rounded floating-point value in $r0.
+.section .text.sorted.libgcc.fpcore.g.assemblef,"x"
+CM0_FUNC_START fp_assemble
+ CFI_START_FUNCTION
+
+ // Work around CFI branching limitations.
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Examine the upper three bits [31:29] for underflow.
+ lsrs r3, r1, #29
+ beq LSYM(__fp_underflow)
+
+ // Convert bits [31:29] into an offset in the range of { 0, -1, -2 }.
+ // Right rotation aligns the MSB in bit [31], filling any LSBs with '0'.
+ lsrs r3, r1, #1
+ mvns r3, r3
+ ands r3, r1
+ lsrs r3, #30
+ subs r3, #2
+ rors r1, r3
+
+ // Update the exponent, assuming the final result will be normal.
+ // The new exponent is 1 less than actual, to compensate for the
+ // eventual addition of the implicit '1' in the result.
+ // If the final exponent becomes negative, proceed directly to gradual
+ // underflow, without bothering to search for the MSB.
+ adds r2, r3
+
+CM0_FUNC_START fp_assemble2
+ bmi LSYM(__fp_subnormal)
+
+ LSYM(__fp_normal):
+ // Check for overflow (remember the implicit '1' to be added later).
+ cmp r2, #254
+ bge SYM(__fp_overflow)
+
+ // Save LSBs for the remainder. Position doesn't matter any more,
+ // these are just tiebreakers for round-to-even.
+ lsls rT, r1, #25
+
+ // Align the final result.
+ lsrs r1, #8
+
+ LSYM(__fp_round):
+ // If carry bit is '0', always round down.
+ bcc LSYM(__fp_return)
+
+ // The carry bit is '1'. Round to nearest, ties to even.
+ // If either the saved remainder bits [6:0], the additional remainder
+ // bits in $r0, or the final LSB is '1', round up.
+ lsls r3, r1, #31
+ orrs r3, rT
+ orrs r3, r0
+ beq LSYM(__fp_return)
+
+ // If rounding up overflows, then the mantissa result becomes 2.0,
+ // which yields the correct return value up to and including INF.
+ adds r1, #1
+
+ LSYM(__fp_return):
+ // Combine the mantissa and the exponent.
+ lsls r2, #23
+ adds r0, r1, r2
+
+ // Combine with the saved sign.
+ // End of library call, return to user.
+ add r0, ip
+
+ #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+ // TODO: Underflow/inexact reporting IFF remainder
+ #endif
+
+ pop { rT, pc }
+ .cfi_restore_state
+
+ LSYM(__fp_underflow):
+ // Set up to align the mantissa.
+ movs r3, r1
+ bne LSYM(__fp_underflow2)
+
+ // MSB wasn't in the upper 32 bits, check the remainder.
+ // If the remainder is also zero, the result is +/-0.
+ movs r3, r0
+ beq SYM(__fp_zero)
+
+ eors r0, r0
+ subs r2, #32
+
+ LSYM(__fp_underflow2):
+ // Save the pre-alignment exponent to align the remainder later.
+ movs r1, r2
+
+ // Align the mantissa with the MSB in bit[31].
+ bl SYM(__fp_lalign2)
+
+ // Calculate the actual remainder shift.
+ subs rT, r1, r2
+
+ // Align the lower bits of the remainder.
+ movs r1, r0
+ lsls r0, rT
+
+ // Combine the upper bits of the remainder with the aligned value.
+ rsbs rT, #0
+ adds rT, #32
+ lsrs r1, rT
+ adds r1, r3
+
+ // The MSB is now aligned at bit[31] of $r1.
+ // If the net exponent is still positive, the result will be normal.
+ // Because this function is used by fmul(), there is a possibility
+ // that the value is still wider than 24 bits; always round.
+ tst r2, r2
+ bpl LSYM(__fp_normal)
+
+ LSYM(__fp_subnormal):
+ // The MSB is aligned at bit[31], with a net negative exponent.
+ // The mantissa will need to be shifted right by the absolute value of
+ // the exponent, plus the normal shift of 8.
+
+ // If the negative shift is smaller than -25, there is no result,
+ // no rounding, no anything. Return signed zero.
+ // (Otherwise, the shift for result and remainder may wrap.)
+ adds r2, #25
+ bmi SYM(__fp_inexact_zero)
+
+ // Save the extra bits for the remainder.
+ movs rT, r1
+ lsls rT, r2
+
+ // Shift the mantissa to create a subnormal.
+ // Just like normal, round to nearest, ties to even.
+ movs r3, #33
+ subs r3, r2
+ eors r2, r2
+
+ // This shift must be last, leaving the shifted LSB in the C flag.
+ lsrs r1, r3
+ b LSYM(__fp_round)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_assemble2
+CM0_FUNC_END fp_assemble
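For reference, the round-to-nearest-even step in fp_assemble above follows the
usual guard/sticky rule.  A minimal C sketch (illustrative only; the names are
invented and nothing below is part of the patch):

    /* Round a truncated mantissa to nearest, ties to even.
       'guard' is the bit shifted out last (the carry flag in the assembly),
       'sticky' ORs together all lower remainder bits, and 'lsb' is
       bit 0 of the truncated mantissa. */
    unsigned int round_nearest_even (unsigned int mantissa, unsigned int guard,
                                     unsigned int sticky, unsigned int lsb)
    {
      if (guard && (sticky | lsb))
        mantissa += 1;   /* a carry out of the mantissa lands in the exponent,
                            which remains correct up to and including INF */
      return mantissa;
    }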
+
+
+// Recreate INF with the appropriate sign. No return.
+// Expects the sign of the result in $ip.
+.section .text.sorted.libgcc.fpcore.h.infinityf,"x"
+CM0_FUNC_START fp_overflow
+ CFI_START_FUNCTION
+
+ #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+ // TODO: inexact/overflow exception
+ #endif
+
+CM0_FUNC_START fp_infinity
+
+ // Work around CFI branching limitations.
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ movs r0, #255
+ lsls r0, #23
+ add r0, ip
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_infinity
+CM0_FUNC_END fp_overflow
+
+
+// Recreate 0 with the appropriate sign. No return.
+// Expects the sign of the result in $ip.
+.section .text.sorted.libgcc.fpcore.i.zerof,"x"
+CM0_FUNC_START fp_inexact_zero
+
+ #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS
+ // TODO: inexact/underflow exception
+ #endif
+
+CM0_FUNC_START fp_zero
+ CFI_START_FUNCTION
+
+ // Work around CFI branching limitations.
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Return 0 with the correct sign.
+ mov r0, ip
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_zero
+CM0_FUNC_END fp_inexact_zero
+
+
+// Internal function to detect signaling NANs. No return.
+// Uses $r2 as scratch space.
+.section .text.sorted.libgcc.fpcore.j.checkf,"x"
+CM0_FUNC_START fp_check_nan2
+ CFI_START_FUNCTION
+
+ // Work around CFI branching limitations.
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+
+CM0_FUNC_START fp_check_nan
+
+ // Check for quiet NAN.
+ lsrs r2, r0, #23
+ bcs LSYM(__quiet_nan)
+
+ // Raise exception. Preserves both $r0 and $r1.
+ svc #(SVC_TRAP_NAN)
+
+ // Quiet the resulting NAN.
+ movs r2, #1
+ lsls r2, #22
+ orrs r0, r2
+
+ LSYM(__quiet_nan):
+ // End of library call, return to user.
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_check_nan
+CM0_FUNC_END fp_check_nan2
+
+
+// Internal function to report floating point exceptions. No return.
+// Expects the original argument(s) in $r0 (possibly also $r1).
+// Expects a code that describes the exception in $r3.
+.section .text.sorted.libgcc.fpcore.k.exceptf,"x"
+CM0_FUNC_START fp_exception
+ CFI_START_FUNCTION
+
+ // Work around CFI branching limitations.
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Create a quiet NAN.
+ movs r2, #255
+ lsls r2, #1
+ adds r2, #1
+ lsls r2, #22
+
+ #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+ // Annotate the exception type in the NAN field.
+ // Make sure that the exception is in the valid region
+ lsls rT, r3, #13
+ orrs r2, rT
+ #endif
+
+// Exception handler that expects the result already in $r2,
+// typically when the result is not going to be NAN.
+CM0_FUNC_START fp_exception2
+
+ #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+ svc #(SVC_FP_EXCEPTION)
+ #endif
+
+ // TODO: Save exception flags in a static variable.
+
+ // Set up the result, now that the argument isn't required any more.
+ movs r0, r2
+
+ // HACK: for sincosf(), with 2 parameters to return.
+ movs r1, r2
+
+ // End of library call, return to user.
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END fp_exception2
+CM0_FUNC_END fp_exception
+
+#endif /* L_arm_addsubsf3 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/idiv.S gcc-11-20201220/libgcc/config/arm/cm0/idiv.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/idiv.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/idiv.S 2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,288 @@
+/* div.S: Cortex M0 optimized 32-bit integer division
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#if 0
+
+// int __aeabi_idiv0(int)
+// Helper function for division by 0.
+.section .text.sorted.libgcc.idiv0,"x"
+CM0_WEAK_START aeabi_idiv0
+CM0_FUNC_ALIAS cm0_idiv0 aeabi_idiv0
+ CFI_START_FUNCTION
+
+ #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+ svc #(SVC_DIVISION_BY_ZERO)
+ #endif
+
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END cm0_idiv0
+CM0_FUNC_END aeabi_idiv0
+
+#endif /* L_dvmd_tls */
+
+
+#ifdef L_divsi3
+
+// int __aeabi_idiv(int, int)
+// idiv_return __aeabi_idivmod(int, int)
+// Returns signed $r0 after division by $r1.
+// Also returns the signed remainder in $r1.
+// Same parent section as __divsi3() to keep branches within range.
+.section .text.sorted.libgcc.idiv.divsi3,"x"
+CM0_FUNC_START aeabi_idivmod
+CM0_FUNC_ALIAS aeabi_idiv aeabi_idivmod
+CM0_FUNC_ALIAS divsi3 aeabi_idivmod
+ CFI_START_FUNCTION
+
+ // Extend signs.
+ asrs r2, r0, #31
+ asrs r3, r1, #31
+
+ // Absolute value of the denominator, abort on division by zero.
+ eors r1, r3
+ subs r1, r3
+ #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+ beq LSYM(__idivmod_zero)
+ #else
+ beq SYM(__uidivmod_zero)
+ #endif
+
+ // Absolute value of the numerator.
+ eors r0, r2
+ subs r0, r2
+
+ // Keep the sign of the numerator in bit[31] (for the remainder).
+ // Save the XOR of the signs in bits[15:0] (for the quotient).
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ lsrs rT, r3, #16
+ eors rT, r2
+
+ // Handle division as unsigned.
+ bl SYM(__uidivmod_nonzero) __PLT__
+
+ // Set the sign of the remainder.
+ asrs r2, rT, #31
+ eors r1, r2
+ subs r1, r2
+
+ // Set the sign of the quotient.
+ sxth r3, rT
+ eors r0, r3
+ subs r0, r3
+
+ LSYM(__idivmod_return):
+ pop { rT, pc }
+ .cfi_restore_state
+
+ #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+ LSYM(__idivmod_zero):
+ // Set up the *div0() parameter specified in the ARM runtime ABI:
+ // * 0 if the numerator is 0,
+ // * Or, the largest value of the type manipulated by the calling
+ // division function if the numerator is positive,
+ // * Or, the least value of the type manipulated by the calling
+ // division function if the numerator is negative.
+ subs r1, r0
+ orrs r0, r1
+ asrs r0, #31
+ lsrs r0, #1
+ eors r0, r2
+
+ // At least the __aeabi_idiv0() call is common.
+ b SYM(__uidivmod_zero2)
+ #endif /* PEDANTIC_DIV0 */
+
+ CFI_END_FUNCTION
+CM0_FUNC_END divsi3
+CM0_FUNC_END aeabi_idiv
+CM0_FUNC_END aeabi_idivmod
+
+#endif /* L_divsi3 */
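The sign handling in __aeabi_idivmod() above (and the same pattern in
__aeabi_ldivmod() and __mulsidi3() later in the patch) is the usual branch-free
absolute-value/sign-restore idiom, and the PEDANTIC_DIV0 block constructs the
out-of-line parameter required by the run-time ABI.  Roughly, in C
(illustrative only; the function names are invented and not part of the patch):

    #include <limits.h>

    /* Branch-free |x| via (x ^ s) - s, where s = x >> 31 (0 or -1,
       assuming the arithmetic right shift GCC uses on ARM).
       'den' must be nonzero here; zero is filtered earlier. */
    int divsi3_sketch (int num, int den)
    {
      unsigned int ns = (unsigned int) (num >> 31);
      unsigned int ds = (unsigned int) (den >> 31);
      unsigned int q = (((unsigned int) num ^ ns) - ns)    /* |num| */
                     / (((unsigned int) den ^ ds) - ds);   /* |den| */
      unsigned int qs = ns ^ ds;                           /* quotient sign */
      return (int) ((q ^ qs) - qs);
    }

    /* Value handed to __aeabi_idiv0() when dividing by zero. */
    int idiv0_argument (int num)
    {
      return num == 0 ? 0 : (num > 0 ? INT_MAX : INT_MIN);
    }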
+
+
+#ifdef L_udivsi3
+
+// int __aeabi_uidiv(unsigned int, unsigned int)
+// idiv_return __aeabi_uidivmod(unsigned int, unsigned int)
+// Returns unsigned $r0 after division by $r1.
+// Also returns the remainder in $r1.
+.section .text.sorted.libgcc.idiv.udivsi3,"x"
+CM0_FUNC_START aeabi_uidivmod
+CM0_FUNC_ALIAS aeabi_uidiv aeabi_uidivmod
+CM0_FUNC_ALIAS udivsi3 aeabi_uidivmod
+ CFI_START_FUNCTION
+
+ // Abort on division by zero.
+ tst r1, r1
+ #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+ beq LSYM(__uidivmod_zero)
+ #else
+ beq SYM(__uidivmod_zero)
+ #endif
+
+ #if defined(OPTIMIZE_SPEED) && OPTIMIZE_SPEED
+ // MAYBE: Optimize division by a power of 2
+ #endif
+
+ // Public symbol for the sake of divsi3().
+ CM0_FUNC_START uidivmod_nonzero
+ // Pre division: Shift the denominator as far as possible left
+ // without making it larger than the numerator.
+ // The loop is destructive, save a copy of the numerator.
+ mov ip, r0
+
+ // Set up binary search.
+ movs r3, #16
+ movs r2, #1
+
+ LSYM(__uidivmod_align):
+ // Prefer dividing the numerator to multiplying the denominator
+ // (multiplying the denominator may result in overflow).
+ lsrs r0, r3
+ cmp r0, r1
+ blo LSYM(__uidivmod_skip)
+
+ // Multiply the denominator and the result together.
+ lsls r1, r3
+ lsls r2, r3
+
+ LSYM(__uidivmod_skip):
+ // Restore the numerator, and iterate until search goes to 0.
+ mov r0, ip
+ lsrs r3, #1
+ bne LSYM(__uidivmod_align)
+
+ // The result in $r3 has been conveniently initialized to 0.
+ b LSYM(__uidivmod_entry)
+
+ LSYM(__uidivmod_loop):
+ // Scale the denominator and the quotient together.
+ lsrs r1, #1
+ lsrs r2, #1
+ beq LSYM(__uidivmod_return)
+
+ LSYM(__uidivmod_entry):
+ // Test if the denominator is smaller than the numerator.
+ cmp r0, r1
+ blo LSYM(__uidivmod_loop)
+
+ // If the denominator is smaller, the next bit of the result is '1'.
+ // If the new remainder goes to 0, exit early.
+ adds r3, r2
+ subs r0, r1
+ bne LSYM(__uidivmod_loop)
+
+ LSYM(__uidivmod_return):
+ mov r1, r0
+ mov r0, r3
+ RET
+
+ #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+ LSYM(__uidivmod_zero):
+ // Set up the *div0() parameter specified in the ARM runtime ABI:
+ // * 0 if the numerator is 0,
+ // * Or, the largest value of the type manipulated by the calling
+ // division function if the numerator is positive.
+ subs r1, r0
+ orrs r0, r1
+ asrs r0, #31
+
+ CM0_FUNC_START uidivmod_zero2
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+ #else
+ push { lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 4
+ .cfi_rel_offset lr, 0
+ #endif
+
+ // Since GCC implements __aeabi_idiv0() as a weak overridable function,
+ // this call must be prepared for a jump beyond +/- 2 KB.
+ // NOTE: __aeabi_idiv0() can't be implemented as a tail call, since any
+ // non-trivial override will (likely) corrupt a remainder in $r1.
+ bl SYM(__aeabi_idiv0) __PLT__
+
+ // Since the input to __aeabi_idiv0() was INF, there really isn't any
+ // choice in which of the recommended *divmod() patterns to follow.
+ // Clear the remainder to complete {INF, 0}.
+ eors r1, r1
+
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ pop { rT, pc }
+ .cfi_restore_state
+ #else
+ pop { pc }
+ .cfi_restore_state
+ #endif
+
+ #else /* !PEDANTIC_DIV0 */
+ CM0_FUNC_START uidivmod_zero
+ // NOTE: The following code sets up a return pair of {0, numerator},
+ // the second preference given by the ARM runtime ABI specification.
+ // The pedantic version is 18 bytes larger between __aeabi_idiv() and
+ // __aeabi_uidiv(). However, this version does not conform to the
+ // out-of-line parameter requirements given for __aeabi_idiv0(), and
+ // also does not pass 'gcc/testsuite/gcc.target/arm/divzero.c'.
+
+ // Since the numerator may be overwritten by __aeabi_idiv0(), save now.
+ // Afterwards, it can be restored directly as the remainder.
+ push { r0, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset r0, 0
+ .cfi_rel_offset lr, 4
+
+ // Set up the quotient (not ABI compliant).
+ eors r0, r0
+
+ // Since GCC implements div0() as a weak overridable function,
+ // this call must be prepared for a jump beyond +/- 2 KB.
+ bl SYM(__aeabi_idiv0) __PLT__
+
+ // Restore the remainder and return.
+ pop { r1, pc }
+ .cfi_restore_state
+
+ #endif /* !PEDANTIC_DIV0 */
+
+ CFI_END_FUNCTION
+CM0_FUNC_END udivsi3
+CM0_FUNC_END aeabi_uidiv
+CM0_FUNC_END aeabi_uidivmod
+
+#endif /* L_udivsi3 */
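Ignoring the binary-search alignment and the early exit, the core of
__uidivmod_nonzero() above is plain restoring division.  A C sketch
(illustrative only; 'udiv32' is an invented name, not part of the patch):

    /* Restoring division; 'den' must be nonzero (the assembly filters
       zero before reaching this point). */
    unsigned int udiv32 (unsigned int num, unsigned int den, unsigned int *rem)
    {
      unsigned int quo = 0, bit = 1;

      /* Shift the denominator as far left as possible without
         making it larger than the numerator. */
      while (!(den & 0x80000000) && (den << 1) <= num)
        {
          den <<= 1;
          bit <<= 1;
        }

      /* One quotient bit per step. */
      while (bit)
        {
          if (num >= den)
            {
              num -= den;
              quo |= bit;
            }
          den >>= 1;
          bit >>= 1;
        }

      *rem = num;
      return quo;
    }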
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/lcmp.S gcc-11-20201220/libgcc/config/arm/cm0/lcmp.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/lcmp.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/lcmp.S 2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,136 @@
+/* lcmp.S: Cortex M0 optimized 64-bit integer comparison
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#if defined(L_arm_lcmp) || defined(L_cmpdi2)
+
+#ifdef L_arm_lcmp
+.section .text.sorted.libgcc.lcmp,"x"
+ #define LCMP_NAME aeabi_lcmp
+#else
+.section .text.sorted.libgcc.cmpdi2,"x"
+ #define LCMP_NAME cmpdi2
+#endif
+
+// int __aeabi_lcmp(long long, long long)
+// int __cmpdi2(long long, long long)
+// Compares the 64 bit signed values in $r1:$r0 and $r3:$r2.
+// lcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively.
+// cmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively.
+// Object file duplication assumes typical programs follow one runtime ABI.
+CM0_FUNC_START LCMP_NAME
+ CFI_START_FUNCTION
+
+ // Calculate the difference $r1:$r0 - $r3:$r2.
+ subs xxl, yyl
+ sbcs xxh, yyh
+
+ // With $r2 free, create a reference offset without affecting flags.
+ // Originally implemented as 'mov r2, r3' for ARM architectures 6+
+ // with unified syntax. However, this resulted in a compiler error
+ // for thumb-1: "MOV Rd, Rs with two low registers not permitted".
+ // Since unified syntax deprecates the "cpy" instruction, shouldn't
+ // there be a backwards-compatible translation in the assembler?
+ cpy r2, r3
+
+ // Finish the comparison.
+ blt LSYM(__lcmp_lt)
+
+ // The reference offset ($r2 - $r3) will be +2 iff the first
+ // argument is larger, otherwise the reference offset remains 0.
+ adds r2, #2
+
+ LSYM(__lcmp_lt):
+ // Check for zero equality (all 64 bits).
+ // It doesn't matter which register was originally "hi".
+ orrs r0, r1
+ beq LSYM(__lcmp_return)
+
+ // Convert the relative offset to an absolute value +/-1.
+ subs r0, r2, r3
+ subs r0, #1
+
+ LSYM(__lcmp_return):
+ #ifdef L_cmpdi2
+ // Shift to the correct output specification.
+ adds r0, #1
+ #endif
+
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END LCMP_NAME
+
+#endif /* L_arm_lcmp || L_cmpdi2 */
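The two return conventions noted above differ only by an offset; in C
(illustrative only, not part of the patch):

    /* __aeabi_lcmp(): { -1, 0, +1 } for { <, ==, > }. */
    int lcmp_sketch (long long a, long long b)
    {
      return (a < b) ? -1 : (a > b) ? 1 : 0;
    }

    /* __cmpdi2(): the same ordering, shifted to { 0, 1, 2 }. */
    int cmpdi2_sketch (long long a, long long b)
    {
      return lcmp_sketch (a, b) + 1;
    }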
+
+
+#if defined(L_arm_ulcmp) || defined(L_ucmpdi2)
+
+#ifdef L_arm_ulcmp
+.section .text.sorted.libgcc.ulcmp,"x"
+ #define ULCMP_NAME aeabi_ulcmp
+#else
+.section .text.sorted.libgcc.ucmpdi2,"x"
+ #define ULCMP_NAME ucmpdi2
+#endif
+
+// int __aeabi_ulcmp(unsigned long long, unsigned long long)
+// int __ucmpdi2(unsigned long long, unsigned long long)
+// Compares the 64 bit unsigned values in $r1:$r0 and $r3:$r2.
+// ulcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively.
+// ucmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively.
+// Object file duplication assumes typical programs follow one runtime ABI.
+CM0_FUNC_START ULCMP_NAME
+ CFI_START_FUNCTION
+
+ // Calculate the 'C' flag.
+ subs xxl, yyl
+ sbcs xxh, yyh
+
+ // Capture the carry flag.
+ // $r2 will contain -1 if the first value is smaller,
+ // 0 if the first value is larger or equal.
+ sbcs r2, r2
+
+ // Check for zero equality (all 64 bits).
+ // It doesn't matter which register was originally "hi".
+ orrs r0, r1
+ beq LSYM(__ulcmp_return)
+
+ // $r0 should contain +1 or -1
+ movs r0, #1
+ orrs r0, r2
+
+ LSYM(__ulcmp_return):
+ #ifdef L_ucmpdi2
+ adds r0, #1
+ #endif
+
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END ULCMP_NAME
+
+#endif /* L_arm_ulcmp || L_ucmpdi2 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/ldiv.S gcc-11-20201220/libgcc/config/arm/cm0/ldiv.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/ldiv.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/ldiv.S 2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,482 @@
+/* ldiv.S: Cortex M0 optimized 64-bit integer division
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#if 0
+
+// long long __aeabi_ldiv0(long long)
+// Helper function for division by 0.
+.section .text.sorted.libgcc.ldiv0,"x"
+CM0_WEAK_START aeabi_ldiv0
+ CFI_START_FUNCTION
+
+ #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS
+ svc #(SVC_DIVISION_BY_ZERO)
+ #endif
+
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_ldiv0
+
+#endif /* L_dvmd_tls */
+
+
+#ifdef L_divdi3
+
+// long long __aeabi_ldiv(long long, long long)
+// lldiv_return __aeabi_ldivmod(long long, long long)
+// Returns signed $r1:$r0 after division by $r3:$r2.
+// Also returns the remainder in $r3:$r2.
+// Same parent section as __divsi3() to keep branches within range.
+.section .text.sorted.libgcc.ldiv.divdi3,"x"
+CM0_FUNC_START aeabi_ldivmod
+CM0_FUNC_ALIAS aeabi_ldiv aeabi_ldivmod
+CM0_FUNC_ALIAS divdi3 aeabi_ldivmod
+ CFI_START_FUNCTION
+
+ // Test the denominator for zero before pushing registers.
+ cmp yyl, #0
+ bne LSYM(__ldivmod_valid)
+
+ cmp yyh, #0
+ #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+ beq LSYM(__ldivmod_zero)
+ #else
+ beq SYM(__uldivmod_zero)
+ #endif
+
+ LSYM(__ldivmod_valid):
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ push { rP, rQ, rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 16
+ .cfi_rel_offset rP, 0
+ .cfi_rel_offset rQ, 4
+ .cfi_rel_offset rT, 8
+ .cfi_rel_offset lr, 12
+ #else
+ push { rP, rQ, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 12
+ .cfi_rel_offset rP, 0
+ .cfi_rel_offset rQ, 4
+ .cfi_rel_offset lr, 8
+ #endif
+
+ // Absolute value of the numerator.
+ asrs rP, xxh, #31
+ eors xxl, rP
+ eors xxh, rP
+ subs xxl, rP
+ sbcs xxh, rP
+
+ // Absolute value of the denominator.
+ asrs rQ, yyh, #31
+ eors yyl, rQ
+ eors yyh, rQ
+ subs yyl, rQ
+ sbcs yyh, rQ
+
+ // Keep the XOR of signs for the quotient.
+ eors rQ, rP
+
+ // Handle division as unsigned.
+ bl SYM(__uldivmod_nonzero) __PLT__
+
+ // Set the sign of the quotient.
+ eors xxl, rQ
+ eors xxh, rQ
+ subs xxl, rQ
+ sbcs xxh, rQ
+
+ // Set the sign of the remainder.
+ eors yyl, rP
+ eors yyh, rP
+ subs yyl, rP
+ sbcs yyh, rP
+
+ LSYM(__ldivmod_return):
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ pop { rP, rQ, rT, pc }
+ .cfi_restore_state
+ #else
+ pop { rP, rQ, pc }
+ .cfi_restore_state
+ #endif
+
+ #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+ LSYM(__ldivmod_zero):
+ // Save the sign of the numerator.
+ asrs yyl, xxh, #31
+
+ // Set up the *div0() parameter specified in the ARM runtime ABI:
+ // * 0 if the numerator is 0,
+ // * Or, the largest value of the type manipulated by the calling
+ // division function if the numerator is positive,
+ // * Or, the least value of the type manipulated by the calling
+ // division function if the numerator is negative.
+ rsbs xxl, #0
+ sbcs yyh, xxh
+ orrs xxh, yyh
+ asrs xxl, xxh, #31
+ lsrs xxh, xxl, #1
+ eors xxh, yyl
+ eors xxl, yyl
+
+ // At least the __aeabi_ldiv0() call is common.
+ b SYM(__uldivmod_zero2)
+ #endif /* PEDANTIC_DIV0 */
+
+ CFI_END_FUNCTION
+CM0_FUNC_END divdi3
+CM0_FUNC_END aeabi_ldiv
+CM0_FUNC_END aeabi_ldivmod
+
+#endif /* L_divdi3 */
+
+
+#ifdef L_udivdi3
+
+// unsigned long long __aeabi_uldiv(unsigned long long, unsigned long long)
+// ulldiv_return __aeabi_uldivmod(unsigned long long, unsigned long long)
+// Returns unsigned $r1:$r0 after division by $r3:$r2.
+// Also returns the remainder in $r3:$r2.
+.section .text.sorted.libgcc.ldiv.udivdi3,"x"
+CM0_FUNC_START aeabi_uldivmod
+CM0_FUNC_ALIAS aeabi_uldiv aeabi_uldivmod
+CM0_FUNC_ALIAS udivdi3 aeabi_uldivmod
+ CFI_START_FUNCTION
+
+ // Test the denominator for zero before changing the stack.
+ cmp yyh, #0
+ bne SYM(__uldivmod_nonzero)
+
+ cmp yyl, #0
+ #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+ beq LSYM(__uldivmod_zero)
+ #else
+ beq SYM(__uldivmod_zero)
+ #endif
+
+ #if defined(OPTIMIZE_SPEED) && OPTIMIZE_SPEED
+ // MAYBE: Optimize division by a power of 2
+ #endif
+
+ CM0_FUNC_START uldivmod_nonzero
+ push { rP, rQ, rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 16
+ .cfi_rel_offset rP, 0
+ .cfi_rel_offset rQ, 4
+ .cfi_rel_offset rT, 8
+ .cfi_rel_offset lr, 12
+
+ // Set up denominator shift, assuming a single width result.
+ movs rP, #32
+
+ // If the upper word of the denominator is 0 ...
+ tst yyh, yyh
+ bne LSYM(__uldivmod_setup)
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // ... and the upper word of the numerator is also 0,
+ // single width division will be at least twice as fast.
+ tst xxh, xxh
+ beq LSYM(__uldivmod_small)
+ #endif
+
+ // ... and the lower word of the denominator is less than or equal
+ // to the upper word of the numerator ...
+ cmp xxh, yyl
+ blo LSYM(__uldivmod_setup)
+
+ // ... then the result will be double width, at least 33 bits.
+ // Set up a flag in $rP to seed the shift for the second word.
+ movs yyh, yyl
+ eors yyl, yyl
+ adds rP, #64
+
+ LSYM(__uldivmod_setup):
+ // Pre division: Shift the denominator as far as possible left
+ // without making it larger than the numerator.
+ // Since the search is destructive, first save a copy of the numerator.
+ mov ip, xxl
+ mov lr, xxh
+
+ // Set up binary search.
+ movs rQ, #16
+ eors rT, rT
+
+ LSYM(__uldivmod_align):
+ // Maintain a secondary shift $rT = 32 - $rQ, making the overlapping
+ // shifts between low and high words easier to construct.
+ adds rT, rQ
+
+ // Prefer dividing the numerator to multiplying the denominator
+ // (multiplying the denominator may result in overflow).
+ lsrs xxh, rQ
+
+ // Measure the high bits of denominator against the numerator.
+ cmp xxh, yyh
+ blo LSYM(__uldivmod_skip)
+ bhi LSYM(__uldivmod_shift)
+
+ // If the high bits are equal, construct the low bits for checking.
+ mov xxh, lr
+ lsls xxh, rT
+
+ lsrs xxl, rQ
+ orrs xxh, xxl
+
+ cmp xxh, yyl
+ blo LSYM(__uldivmod_skip)
+
+ LSYM(__uldivmod_shift):
+ // Scale the denominator and the result together.
+ subs rP, rQ
+
+ // If the reduced numerator is still larger than or equal to the
+ // denominator, it is safe to shift the denominator left.
+ movs xxh, yyl
+ lsrs xxh, rT
+ lsls yyh, rQ
+
+ lsls yyl, rQ
+ orrs yyh, xxh
+
+ LSYM(__uldivmod_skip):
+ // Restore the numerator.
+ mov xxl, ip
+ mov xxh, lr
+
+ // Iterate until the shift goes to 0.
+ lsrs rQ, #1
+ bne LSYM(__uldivmod_align)
+
+ // Initialize the result (zero).
+ mov ip, rQ
+
+ // HACK: Compensate for the first word test.
+ lsls rP, #6
+
+ LSYM(__uldivmod_word2):
+ // Is there another word?
+ lsrs rP, #6
+ beq LSYM(__uldivmod_return)
+
+ // Shift the calculated result by 1 word.
+ mov lr, ip
+ mov ip, rQ
+
+ // Set up the MSB of the next word of the quotient
+ movs rQ, #1
+ rors rQ, rP
+ b LSYM(__uldivmod_entry)
+
+ LSYM(__uldivmod_loop):
+ // Divide the denominator by 2.
+ // It could be slightly faster to multiply the numerator,
+ // but that would require shifting the remainder at the end.
+ lsls rT, yyh, #31
+ lsrs yyh, #1
+ lsrs yyl, #1
+ adds yyl, rT
+
+ // Step to the next bit of the result.
+ lsrs rQ, #1
+ beq LSYM(__uldivmod_word2)
+
+ LSYM(__uldivmod_entry):
+ // Test if the denominator is smaller, high word first.
+ cmp xxh, yyh
+ blo LSYM(__uldivmod_loop)
+ bhi LSYM(__uldivmod_quotient)
+
+ cmp xxl, yyl
+ blo LSYM(__uldivmod_loop)
+
+ LSYM(__uldivmod_quotient):
+ // Smaller denominator: the next bit of the quotient will be set.
+ add ip, rQ
+
+ // Subtract the denominator from the remainder.
+ // If the new remainder goes to 0, exit early.
+ subs xxl, yyl
+ sbcs xxh, yyh
+ bne LSYM(__uldivmod_loop)
+
+ tst xxl, xxl
+ bne LSYM(__uldivmod_loop)
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ // Check whether there's still a second word to calculate.
+ lsrs rP, #6
+ beq LSYM(__uldivmod_return)
+
+ // If so, shift the result left by a full word.
+ mov lr, ip
+ mov ip, xxh // zero
+ #else
+ eors rQ, rQ
+ b LSYM(__uldivmod_word2)
+ #endif
+
+ LSYM(__uldivmod_return):
+ // Move the remainder to the second half of the result.
+ movs yyl, xxl
+ movs yyh, xxh
+
+ // Move the quotient to the first half of the result.
+ mov xxl, ip
+ mov xxh, lr
+
+ pop { rP, rQ, rT, pc }
+ .cfi_restore_state
+
+ #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0
+ LSYM(__uldivmod_zero):
+ // Set up the *div0() parameter specified in the ARM runtime ABI:
+ // * 0 if the numerator is 0,
+ // * Or, the largest value of the type manipulated by the calling
+ // division function if the numerator is positive.
+ subs yyl, xxl
+ sbcs yyh, xxh
+ orrs xxh, yyh
+ asrs xxh, #31
+ movs xxl, xxh
+
+ CM0_FUNC_START uldivmod_zero2
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+ #else
+ push { lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 4
+ .cfi_rel_offset lr, 0
+ #endif
+
+ // Since GCC implements __aeabi_ldiv0() as a weak overridable function,
+ // this call must be prepared for a jump beyond +/- 2 KB.
+ // NOTE: __aeabi_ldiv0() can't be implemented as a tail call, since any
+ // non-trivial override will (likely) corrupt a remainder in $r3:$r2.
+ bl SYM(__aeabi_ldiv0) __PLT__
+
+ // Since the input to __aeabi_ldiv0() was INF, there really isn't any
+ // choice in which of the recommended *divmod() patterns to follow.
+ // Clear the remainder to complete {INF, 0}.
+ eors yyl, yyl
+ eors yyh, yyh
+
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ pop { rT, pc }
+ .cfi_restore_state
+ #else
+ pop { pc }
+ .cfi_restore_state
+ #endif
+
+ #else /* !PEDANTIC_DIV0 */
+ CM0_FUNC_START uldivmod_zero
+ // NOTE: The following code sets up a return pair of {0, numerator},
+ // the second preference given by the ARM runtime ABI specification.
+ // The pedantic version is 30 bytes larger between __aeabi_ldiv() and
+ // __aeabi_uldiv(). However, this version does not conform to the
+ // out-of-line parameter requirements given for __aeabi_ldiv0(), and
+ // also does not pass 'gcc/testsuite/gcc.target/arm/divzero.c'.
+
+ // Since the numerator may be overwritten by __aeabi_ldiv0(), save now.
+ // Afterwards, they can be restored directly as the remainder.
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ push { r0, r1, rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 16
+ .cfi_rel_offset xxl,0
+ .cfi_rel_offset xxh,4
+ .cfi_rel_offset rT, 8
+ .cfi_rel_offset lr, 12
+ #else
+ push { r0, r1, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 12
+ .cfi_rel_offset xxl,0
+ .cfi_rel_offset xxh,4
+ .cfi_rel_offset lr, 8
+ #endif
+
+ // Set up the quotient.
+ eors xxl, xxl
+ eors xxh, xxh
+
+ // Since GCC implements div0() as a weak overridable function,
+ // this call must be prepared for a jump beyond +/- 2 KB.
+ bl SYM(__aeabi_ldiv0) __PLT__
+
+ // Restore the remainder and return.
+ #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK
+ pop { r2, r3, rT, pc }
+ .cfi_restore_state
+ #else
+ pop { r2, r3, pc }
+ .cfi_restore_state
+ #endif
+ #endif /* !PEDANTIC_DIV0 */
+
+ #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+ LSYM(__uldivmod_small):
+ // Arrange operands for (much faster) 32-bit division.
+ #if defined(__ARMEB__) && __ARMEB__
+ movs r0, r1
+ movs r1, r3
+ #else
+ movs r1, r2
+ #endif
+
+ bl SYM(__uidivmod_nonzero) __PLT__
+
+ // Arrange results back into 64-bit format.
+ #if defined(__ARMEB__) && __ARMEB__
+ movs r3, r1
+ movs r1, r0
+ #else
+ movs r2, r1
+ #endif
+
+ // Extend quotient and remainder to 64 bits, unsigned.
+ eors xxh, xxh
+ eors yyh, yyh
+ pop { rP, rQ, rT, pc }
+ #endif
+
+ CFI_END_FUNCTION
+CM0_FUNC_END udivdi3
+CM0_FUNC_END aeabi_uldiv
+CM0_FUNC_END aeabi_uldivmod
+
+#endif /* L_udivdi3 */
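The small-operand fast path above (taken when both high words are zero and the
build is not optimized for size) simply defers to the 32-bit routine and
zero-extends; roughly (illustrative only; 'udiv32' is the invented helper from
the earlier sketch):

    unsigned long long udiv64_small (unsigned int num, unsigned int den,
                                     unsigned long long *rem)
    {
      unsigned int r;
      unsigned long long quo = udiv32 (num, den, &r);   /* 32-bit division */
      *rem = r;                                         /* zero-extended remainder */
      return quo;                                       /* zero-extended quotient */
    }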
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/lmul.S gcc-11-20201220/libgcc/config/arm/cm0/lmul.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/lmul.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/lmul.S 2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,213 @@
+/* lmul.S: Cortex M0 optimized 64-bit integer multiplication
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_muldi3
+
+// long long __aeabi_lmul(long long, long long)
+// Returns the least significant 64 bits of a 64 bit multiplication.
+// Expects the two multiplicands in $r1:$r0 and $r3:$r2.
+// Returns the product in $r1:$r0 (does not distinguish signed types).
+// Uses $r4 and $r5 as scratch space.
+.section .text.sorted.libgcc.lmul.muldi3,"x"
+CM0_FUNC_START aeabi_lmul
+CM0_FUNC_ALIAS muldi3 aeabi_lmul
+ CFI_START_FUNCTION
+
+ // $r1:$r0 = 0xDDDDCCCCBBBBAAAA
+ // $r3:$r2 = 0xZZZZYYYYXXXXWWWW
+
+ // The following operations that only affect the upper 64 bits
+ // can be safely discarded:
+ // DDDD * ZZZZ
+ // DDDD * YYYY
+ // DDDD * XXXX
+ // CCCC * ZZZZ
+ // CCCC * YYYY
+ // BBBB * ZZZZ
+
+ // MAYBE: Test for multiply by ZERO on implementations with a 32-cycle
+ // 'muls' instruction, and skip over the operation in that case.
+
+ // (0xDDDDCCCC * 0xXXXXWWWW), free $r1
+ muls xxh, yyl
+
+ // (0xZZZZYYYY * 0xBBBBAAAA), free $r3
+ muls yyh, xxl
+ adds yyh, xxh
+
+ // Put the parameters in the correct form for umulsidi3().
+ movs xxh, yyl
+ b LSYM(__mul_overflow)
+
+ CFI_END_FUNCTION
+CM0_FUNC_END aeabi_lmul
+CM0_FUNC_END muldi3
+
+#endif /* L_muldi3 */
+
+
+// The following implementation of __umulsidi3() integrates with __muldi3()
+// above to allow the fast tail call while still preserving the extra
+// hi-shifted bits of the result. However, these extra bits add a few
+// instructions not otherwise required when using only __umulsidi3().
+// Therefore, this block configures __umulsidi3() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version adds the hi bits of __muldi3(). The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols in programs that multiply long doubles.
+// This means '_umulsidi3' should appear before '_muldi3' in LIB1ASMFUNCS.
+#if defined(L_muldi3) || defined(L_umulsidi3)
+
+#ifdef L_umulsidi3
+// unsigned long long __umulsidi3(unsigned int, unsigned int)
+// Returns all 64 bits of a 32 bit multiplication.
+// Expects the two multiplicands in $r0 and $r1.
+// Returns the product in $r1:$r0.
+// Uses $r3, $r4 and $ip as scratch space.
+.section .text.sorted.libgcc.lmul.umulsidi3,"x"
+CM0_WEAK_START umulsidi3
+ CFI_START_FUNCTION
+
+#else /* L_muldi3 */
+CM0_FUNC_START umulsidi3
+ CFI_START_FUNCTION
+
+ // 32x32 multiply with 64 bit result.
+ // Expand the multiply into 4 parts, since muls only returns 32 bits.
+ // ((a16h * b16h) << 32)
+ //   + ((a16h * b16l) << 16) + ((a16l * b16h) << 16)
+ //   + (a16l * b16l)
+
+ // MAYBE: Test for multiply by 0 on implementations with a 32-cycle
+ // 'muls' instruction, and skip over the operation in that case.
+
+ eors yyh, yyh
+
+ LSYM(__mul_overflow):
+ mov ip, yyh
+
+#endif /* !L_muldi3 */
+
+ // a16h * b16h
+ lsrs r2, xxl, #16
+ lsrs r3, xxh, #16
+ muls r2, r3
+
+ #ifdef L_muldi3
+ add ip, r2
+ #else
+ mov ip, r2
+ #endif
+
+ // a16l * b16h; save a16h first!
+ lsrs r2, xxl, #16
+ #if (__ARM_ARCH >= 6)
+ uxth xxl, xxl
+ #else /* __ARM_ARCH < 6 */
+ lsls xxl, #16
+ lsrs xxl, #16
+ #endif
+ muls r3, xxl
+
+ // a16l * b16l
+ #if (__ARM_ARCH >= 6)
+ uxth xxh, xxh
+ #else /* __ARM_ARCH < 6 */
+ lsls xxh, #16
+ lsrs xxh, #16
+ #endif
+ muls xxl, xxh
+
+ // a16h * b16l
+ muls xxh, r2
+
+ // Distribute intermediate results.
+ eors r2, r2
+ adds xxh, r3
+ adcs r2, r2
+ lsls r3, xxh, #16
+ lsrs xxh, #16
+ lsls r2, #16
+ adds xxl, r3
+ adcs xxh, r2
+
+ // Add in the high bits.
+ add xxh, ip
+
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END umulsidi3
+
+#endif /* L_muldi3 || L_umulsidi3 */
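The 16-bit partial-product expansion above corresponds to the following C
sketch (illustrative only; the name is invented and not part of the patch).
__aeabi_lmul() first accumulates the two 32-bit cross products of the 64-bit
operands and feeds them into this expansion through the __mul_overflow entry:

    unsigned long long umulsidi3_sketch (unsigned int a, unsigned int b)
    {
      unsigned int ah = a >> 16, al = a & 0xFFFF;
      unsigned int bh = b >> 16, bl = b & 0xFFFF;

      /* Each 16x16 product fits in 32 bits; widen before shifting. */
      unsigned long long r = (unsigned long long) (ah * bh) << 32;
      r += (unsigned long long) (ah * bl) << 16;
      r += (unsigned long long) (al * bh) << 16;
      r += al * bl;
      return r;
    }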
+
+
+#ifdef L_mulsidi3
+
+// long long mulsidi3(int, int)
+// Returns all 64 bits of a 32 bit signed multiplication.
+// Expects the two multiplicands in $r0 and $r1.
+// Returns the product in $r1:$r0.
+// Uses $r3, $r4 and $rT as scratch space.
+.section .text.sorted.libgcc.lmul.mulsidi3,"x"
+CM0_FUNC_START mulsidi3
+ CFI_START_FUNCTION
+
+ // Push registers for function call.
+ push { rT, lr }
+ .cfi_remember_state
+ .cfi_adjust_cfa_offset 8
+ .cfi_rel_offset rT, 0
+ .cfi_rel_offset lr, 4
+
+ // Save signs of the arguments.
+ asrs r3, r0, #31
+ asrs rT, r1, #31
+
+ // Absolute value of the arguments.
+ eors r0, r3
+ eors r1, rT
+ subs r0, r3
+ subs r1, rT
+
+ // Save sign of the result.
+ eors rT, r3
+
+ bl SYM(__umulsidi3) __PLT__
+
+ // Apply sign of the result.
+ eors xxl, rT
+ eors xxh, rT
+ subs xxl, rT
+ sbcs xxh, rT
+
+ pop { rT, pc }
+ .cfi_restore_state
+
+ CFI_END_FUNCTION
+CM0_FUNC_END mulsidi3
+
+#endif /* L_mulsidi3 */
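The signed variant above uses the same absolute-value/sign-restore pattern as
the signed division wrappers; roughly (illustrative only; 'umulsidi3_sketch'
is the invented helper from the previous sketch):

    long long mulsidi3_sketch (int a, int b)
    {
      unsigned int as = (unsigned int) (a >> 31);   /* arithmetic shift: 0 or -1 */
      unsigned int bs = (unsigned int) (b >> 31);
      unsigned long long p =
        umulsidi3_sketch (((unsigned int) a ^ as) - as,    /* |a| */
                          ((unsigned int) b ^ bs) - bs);   /* |b| */
      return (as ^ bs) ? -(long long) p : (long long) p;
    }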
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/lshift.S gcc-11-20201220/libgcc/config/arm/cm0/lshift.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/lshift.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/lshift.S 2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,202 @@
+/* lshift.S: Cortex M0 optimized 64-bit integer shift
+
+ Copyright (C) 2018-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_ashldi3
+
+// long long __aeabi_llsl(long long, int)
+// Logical shift left the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.sorted.libgcc.ashldi3,"x"
+CM0_FUNC_START aeabi_llsl
+CM0_FUNC_ALIAS ashldi3 aeabi_llsl
+ CFI_START_FUNCTION
+
+ #if defined(__thumb__) && __thumb__
+
+ // Save a copy for the remainder.
+ movs r3, xxl
+
+ // Assume a simple shift.
+ lsls xxl, r2
+ lsls xxh, r2
+
+ // Test if the shift distance is larger than 1 word.
+ subs r2, #32
+ bhs LSYM(__llsl_large)
+
+ // The remainder is opposite the main shift, (32 - x) bits.
+ rsbs r2, #0
+ lsrs r3, r2
+
+ // Cancel any remaining shift.
+ eors r2, r2
+
+ LSYM(__llsl_large):
+ // Apply any remaining shift
+ lsls r3, r2
+
+ // Merge remainder and result.
+ adds xxh, r3
+ RET
+
+ #else /* !__thumb__ */
+
+ // Moved here from lib1funcs.S
+ subs r3, r2, #32
+ rsb ip, r2, #32
+ movmi xxh, xxh, lsl r2
+ movpl xxh, xxl, lsl r3
+ orrmi xxh, xxh, xxl, lsr ip
+ mov xxl, xxl, lsl r2
+ RET
+
+ #endif /* !__thumb__ */
+
+ CFI_END_FUNCTION
+CM0_FUNC_END ashldi3
+CM0_FUNC_END aeabi_llsl
+
+#endif /* L_ashldi3 */
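The thumb-1 path above splits the 64-bit shift into a word shift plus a
cross-word remainder; the same decomposition in C (illustrative only, valid
for counts 0..63 like the assembly):

    unsigned long long ashldi3_sketch (unsigned long long x, unsigned int n)
    {
      unsigned int lo = (unsigned int) x;
      unsigned int hi = (unsigned int) (x >> 32);

      if (n < 32)
        {
          /* The remainder is the opposite shift, (32 - n) bits. */
          hi = (hi << n) | (n ? lo >> (32 - n) : 0);
          lo <<= n;
        }
      else
        {
          hi = lo << (n - 32);
          lo = 0;
        }
      return ((unsigned long long) hi << 32) | lo;
    }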
+
+
+#ifdef L_lshrdi3
+
+// long long __aeabi_llsr(long long, int)
+// Logical shift right the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.sorted.libgcc.lshrdi3,"x"
+CM0_FUNC_START aeabi_llsr
+CM0_FUNC_ALIAS lshrdi3 aeabi_llsr
+ CFI_START_FUNCTION
+
+ #if defined(__thumb__) && __thumb__
+
+ // Save a copy for the remainder.
+ movs r3, xxh
+
+ // Assume a simple shift.
+ lsrs xxl, r2
+ lsrs xxh, r2
+
+ // Test if the shift distance is larger than 1 word.
+ subs r2, #32
+ bhs LSYM(__llsr_large)
+
+ // The remainder is opposite the main shift, (32 - x) bits.
+ rsbs r2, #0
+ lsls r3, r2
+
+ // Cancel any remaining shift.
+ eors r2, r2
+
+ LSYM(__llsr_large):
+ // Apply any remaining shift
+ lsrs r3, r2
+
+ // Merge remainder and result.
+ adds xxl, r3
+ RET
+
+ #else /* !__thumb__ */
+
+ // Moved here from lib1funcs.S
+ subs r3, r2, #32
+ rsb ip, r2, #32
+ movmi xxl, xxl, lsr r2
+ movpl xxl, xxh, lsr r3
+ orrmi xxl, xxl, xxh, lsl ip
+ mov xxh, xxh, lsr r2
+ RET
+
+ #endif /* !__thumb__ */
+
+
+ CFI_END_FUNCTION
+CM0_FUNC_END lshrdi3
+CM0_FUNC_END aeabi_llsr
+
+#endif /* L_lshrdi3 */
+
+
+#ifdef L_ashrdi3
+
+// long long __aeabi_lasr(long long, int)
+// Arithmetic shift right the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+.section .text.sorted.libgcc.ashrdi3,"x"
+CM0_FUNC_START aeabi_lasr
+CM0_FUNC_ALIAS ashrdi3 aeabi_lasr
+ CFI_START_FUNCTION
+
+ #if defined(__thumb__) && __thumb__
+
+ // Save a copy for the remainder.
+ movs r3, xxh
+
+ // Assume a simple shift.
+ lsrs xxl, r2
+ asrs xxh, r2
+
+ // Test if the shift distance is larger than 1 word.
+ subs r2, #32
+ bhs LSYM(__lasr_large)
+
+ // The remainder is opposite the main shift, (32 - x) bits.
+ rsbs r2, #0
+ lsls r3, r2
+
+ // Cancel any remaining shift.
+ eors r2, r2
+
+ LSYM(__lasr_large):
+ // Apply any remaining shift
+ asrs r3, r2
+
+ // Merge remainder and result.
+ adds xxl, r3
+ RET
+
+ #else /* !__thumb__ */
+
+ // Moved here from lib1funcs.S
+ subs r3, r2, #32
+ rsb ip, r2, #32
+ movmi xxl, xxl, lsr r2
+ movpl xxl, xxh, asr r3
+ orrmi xxl, xxl, xxh, lsl ip
+ mov xxh, xxh, asr r2
+ RET
+
+ #endif /* !__thumb__ */
+
+ CFI_END_FUNCTION
+CM0_FUNC_END ashrdi3
+CM0_FUNC_END aeabi_lasr
+
+#endif /* L_ashrdi3 */
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/parity.S gcc-11-20201220/libgcc/config/arm/cm0/parity.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/parity.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/parity.S 2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,122 @@
+/* parity.S: Cortex M0 optimized parity functions
+
+ Copyright (C) 2020-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_paritydi2
+
+// int __paritydi2(long long)
+// Returns '0' if the number of bits set in $r1:$r0 is even, and '1' otherwise.
+// Returns the result in $r0.
+.section .text.sorted.libgcc.paritydi2,"x"
+CM0_FUNC_START paritydi2
+ CFI_START_FUNCTION
+
+ // Combine the upper and lower words, then fall through.
+ // Byte-endianness does not matter for this function.
+ eors r0, r1
+
+#endif /* L_paritydi2 */
+
+
+// The implementation of __paritydi2() tightly couples with __paritysi2(),
+// such that instructions must appear consecutively in the same memory
+// section for proper flow control. However, this construction inhibits
+// the ability to discard __paritydi2() when only using __paritysi2().
+// Therefore, this block configures __paritysi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __paritydi2(). The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols when required.
+// '_paritysi2' should appear before '_paritydi2' in LIB1ASMFUNCS.
+#if defined(L_paritysi2) || defined(L_paritydi2)
+
+#ifdef L_paritysi2
+// int __paritysi2(int)
+// Returns '0' if the number of bits set in $r0 is even, and '1' otherwise.
+// Returns the result in $r0.
+// Uses $r2 as scratch space.
+.section .text.sorted.libgcc.paritysi2,"x"
+CM0_WEAK_START paritysi2
+ CFI_START_FUNCTION
+
+#else /* L_paritydi2 */
+CM0_FUNC_START paritysi2
+
+#endif
+
+ #if defined(__thumb__) && __thumb__
+ #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+
+ // Size optimized: 16 bytes, 40 cycles
+ // Speed optimized: 24 bytes, 14 cycles
+ movs r2, #16
+
+ LSYM(__parity_loop):
+ // Calculate the parity of successively smaller half-words into the MSB.
+ movs r1, r0
+ lsls r1, r2
+ eors r0, r1
+ lsrs r2, #1
+ bne LSYM(__parity_loop)
+
+ #else /* !__OPTIMIZE_SIZE__ */
+
+ // Unroll the loop. The 'libgcc' reference C implementation replaces
+ // the x2 and the x1 shifts with a constant. However, since it takes
+ // 4 cycles to load, index, and mask the constant result, it doesn't
+ // cost anything to keep shifting (and saves a few bytes).
+ lsls r1, r0, #16
+ eors r0, r1
+ lsls r1, r0, #8
+ eors r0, r1
+ lsls r1, r0, #4
+ eors r0, r1
+ lsls r1, r0, #2
+ eors r0, r1
+ lsls r1, r0, #1
+ eors r0, r1
+
+ #endif /* !__OPTIMIZE_SIZE__ */
+ #else /* !__thumb__ */
+
+ eors r0, r0, r0, lsl #16
+ eors r0, r0, r0, lsl #8
+ eors r0, r0, r0, lsl #4
+ eors r0, r0, r0, lsl #2
+ eors r0, r0, r0, lsl #1
+
+ #endif /* !__thumb__ */
+
+ lsrs r0, #31
+ RET
+
+ CFI_END_FUNCTION
+CM0_FUNC_END paritysi2
+
+#ifdef L_paritydi2
+CM0_FUNC_END paritydi2
+#endif
+
+#endif /* L_paritysi2 || L_paritydi2 */
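The unrolled (speed-optimized) path above folds the parity of the word into
the MSB with successive shift/XOR steps; in C (illustrative only, not part of
the patch):

    int paritysi2_sketch (unsigned int x)
    {
      x ^= x << 16;      /* bit 31 now holds the parity of bits 31 and 15 */
      x ^= x << 8;
      x ^= x << 4;
      x ^= x << 2;
      x ^= x << 1;       /* bit 31 now holds the parity of all 32 bits */
      return x >> 31;
    }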
+
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/cm0/popcnt.S gcc-11-20201220/libgcc/config/arm/cm0/popcnt.S
--- gcc-11-20201220-clean/libgcc/config/arm/cm0/popcnt.S 1969-12-31 16:00:00.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/cm0/popcnt.S 2021-01-06 02:45:47.432262214 -0800
@@ -0,0 +1,199 @@
+/* popcnt.S: Cortex M0 optimized popcount functions
+
+ Copyright (C) 2020-2021 Free Software Foundation, Inc.
+ Contributed by Daniel Engel (gnu@danielengel.com)
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_popcountdi2
+
+// int __popcountdi2(long long)
+// Returns the number of bits set in $r1:$r0.
+// Returns the result in $r0.
+.section .text.sorted.libgcc.popcountdi2,"x"
+CM0_FUNC_START popcountdi2
+ CFI_START_FUNCTION
+
+ #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+ // Initialize the result.
+ // Compensate for the two extra loop iterations (one for each word)
+ // required to detect zero arguments.
+ movs r2, #2
+
+ LSYM(__popcountd_loop):
+ // Same as __popcounts_loop below, except for $r1.
+ subs r2, #1
+ subs r3, r1, #1
+ ands r1, r3
+ bcs LSYM(__popcountd_loop)
+
+ // Repeat the operation for the second word.
+ b LSYM(__popcounts_loop)
+
+ #else /* !__OPTIMIZE_SIZE__ */
+ // Load the one-bit alternating mask.
+ ldr r3, LSYM(__popcount_1b)
+
+ // Reduce the second word.
+ lsrs r2, r1, #1
+ ands r2, r3
+ subs r1, r2
+
+ // Reduce the first word.
+ lsrs r2, r0, #1
+ ands r2, r3
+ subs r0, r2
+
+ // Load the two-bit alternating mask.
+ ldr r3, LSYM(__popcount_2b)
+
+ // Reduce the second word.
+ lsrs r2, r1, #2
+ ands r2, r3
+ ands r1, r3
+ adds r1, r2
+
+ // Reduce the first word.
+ lsrs r2, r0, #2
+ ands r2, r3
+ ands r0, r3
+ adds r0, r2
+
+ // There will be a maximum of 8 bits in each 4-bit field.
+ // Jump into the single word flow to combine and complete.
+ b LSYM(__popcounts_merge)
+
+ #endif /* !__OPTIMIZE_SIZE__ */
+#endif /* L_popcountdi2 */
+
+
+// The implementation of __popcountdi2() tightly couples with __popcountsi2(),
+// such that instructions must appear consecutively in the same memory
+// section for proper flow control. However, this construction inhibits
+// the ability to discard __popcountdi2() when only using __popcountsi2().
+// Therefore, this block configures __popcountsi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __popcountdi2(). The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols when required.
+// '_popcountsi2' should appear before '_popcountdi2' in LIB1ASMFUNCS.
+#if defined(L_popcountsi2) || defined(L_popcountdi2)
+
+#ifdef L_popcountsi2
+// int __popcountsi2(int)
+// Returns the number of bits set in $r0.
+// Returns the result in $r0.
+// Uses $r2 as scratch space.
+.section .text.sorted.libgcc.popcountsi2,"x"
+CM0_WEAK_START popcountsi2
+ CFI_START_FUNCTION
+
+#else /* L_popcountdi2 */
+CM0_FUNC_START popcountsi2
+
+#endif
+
+ #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+ // Initialize the result.
+ // Compensate for the extra loop iteration required to detect zero.
+ movs r2, #1
+
+ // Kernighan's algorithm for __popcount(x):
+ // for (c = 0; x; c++)
+ // x &= x - 1;
+
+ LSYM(__popcounts_loop):
+ // Each loop iteration accounts for one '1' bit set in the argument.
+ // Count down, since the positive compensation is easier to initialize
+ // and the negation before returning is free.
+ subs r2, #1
+
+ // Clear one bit per loop.
+ subs r3, r0, #1
+ ands r0, r3
+
+ // A simple test for zero could not distinguish an argument of zero from
+ // an argument with exactly one bit set: both terminate after one loop.
+ // Instead, the borrow from the subtraction flags when zero entered the loop.
+ bcs LSYM(__popcounts_loop)
+
+ // Negate the result, since we have been counting downward.
+ rsbs r0, r2, #0
+ RET
+
+ #else /* !__OPTIMIZE_SIZE__ */
+
+ // Load the one-bit alternating mask.
+ ldr r3, LSYM(__popcount_1b)
+
+ // Reduce the word.
+ lsrs r1, r0, #1
+ ands r1, r3
+ subs r0, r1
+
+ // Load the two-bit alternating mask.
+ ldr r3, LSYM(__popcount_2b)
+
+ // Reduce the word.
+ lsrs r1, r0, #2
+ ands r0, r3
+ ands r1, r3
+ LSYM(__popcounts_merge):
+ adds r0, r1
+
+ // Load the four-bit alternating mask.
+ ldr r3, LSYM(__popcount_4b)
+
+ // Reduce the word.
+ lsrs r1, r0, #4
+ ands r0, r3
+ ands r1, r3
+ adds r0, r1
+
+ // Accumulate individual byte sums into the MSB.
+ lsls r1, r0, #8
+ adds r0, r1
+ lsls r1, r0, #16
+ adds r0, r1
+
+ // Isolate the cumulative sum.
+ lsrs r0, #24
+ RET
+
+ .align 2
+ LSYM(__popcount_1b):
+ .word 0x55555555
+ LSYM(__popcount_2b):
+ .word 0x33333333
+ LSYM(__popcount_4b):
+ .word 0x0F0F0F0F
+
+ #endif /* !__OPTIMIZE_SIZE__ */
+
+ CFI_END_FUNCTION
+CM0_FUNC_END popcountsi2
+
+#ifdef L_popcountdi2
+CM0_FUNC_END popcountdi2
+#endif
+
+#endif /* L_popcountsi2 || L_popcountdi2 */
+
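For reference, the two strategies above correspond roughly to the following C sketch (not part of the patch; assumes a 32-bit unsigned int and mirrors the masks and shift sequence the assembly loads):

    /* Size-optimized path: Kernighan's loop clears one set bit per pass.  */
    int popcount_small (unsigned int x)
    {
      int c;
      for (c = 0; x; c++)
        x &= x - 1;                                   /* clear lowest set bit */
      return c;
    }

    /* Speed-optimized path: SWAR reduction with the same three masks,
       followed by the shift-and-add byte accumulation.  */
    int popcount_fast (unsigned int x)
    {
      x = x - ((x >> 1) & 0x55555555);                  /* 2-bit counts, 0..2 */
      x = (x & 0x33333333) + ((x >> 2) & 0x33333333);   /* 4-bit counts, 0..4 */
      x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);   /* 8-bit counts, 0..8 */
      x += x << 8;                                      /* sum bytes ...      */
      x += x << 16;                                     /* ... into the MSB   */
      return x >> 24;                                   /* isolate the sum    */
    }

The 64-bit entry point simply runs the first two reduction steps on each word before merging into the shared flow; the 4-bit fields then hold at most 8, which still fits.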
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/lib1funcs.S gcc-11-20201220/libgcc/config/arm/lib1funcs.S
--- gcc-11-20201220-clean/libgcc/config/arm/lib1funcs.S 2020-12-20 14:32:15.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/lib1funcs.S 2021-01-06 02:45:47.436262144 -0800
@@ -1050,6 +1050,10 @@
/* ------------------------------------------------------------------------ */
/* Start of the Real Functions */
/* ------------------------------------------------------------------------ */
+
+/* Disable these functions for v6m in favor of the versions below */
+#ifndef NOT_ISA_TARGET_32BIT
+
#ifdef L_udivsi3
#if defined(__prefer_thumb__)
@@ -1455,6 +1459,8 @@
DIV_FUNC_END modsi3 signed
#endif /* L_modsi3 */
+#endif /* NOT_ISA_TARGET_32BIT */
+
/* ------------------------------------------------------------------------ */
#ifdef L_dvmd_tls
@@ -1472,7 +1478,8 @@
FUNC_END div0
#endif
-#endif /* L_divmodsi_tools */
+#endif /* L_dvmd_tls */
+
/* ------------------------------------------------------------------------ */
#ifdef L_dvmd_lnx
@ GNU/Linux division-by zero handler. Used in place of L_dvmd_tls
@@ -1509,6 +1516,7 @@
#endif
#endif /* L_dvmd_lnx */
+
#ifdef L_clear_cache
#if defined __ARM_EABI__ && defined __linux__
@ EABI GNU/Linux call to cacheflush syscall.
@@ -1584,305 +1592,12 @@
case of logical shifts) or the sign (for asr). */
#ifdef __ARMEB__
-#define al r1
-#define ah r0
-#else
-#define al r0
-#define ah r1
-#endif
-
-/* Prevent __aeabi double-word shifts from being produced on SymbianOS. */
-#ifndef __symbian__
-
-#ifdef L_lshrdi3
-
- FUNC_START lshrdi3
- FUNC_ALIAS aeabi_llsr lshrdi3
-
-#ifdef __thumb__
- lsrs al, r2
- movs r3, ah
- lsrs ah, r2
- mov ip, r3
- subs r2, #32
- lsrs r3, r2
- orrs al, r3
- negs r2, r2
- mov r3, ip
- lsls r3, r2
- orrs al, r3
- RET
+#define al r1
+#define ah r0
#else
- subs r3, r2, #32
- rsb ip, r2, #32
- movmi al, al, lsr r2
- movpl al, ah, lsr r3
- orrmi al, al, ah, lsl ip
- mov ah, ah, lsr r2
- RET
-#endif
- FUNC_END aeabi_llsr
- FUNC_END lshrdi3
-
-#endif
-
-#ifdef L_ashrdi3
-
- FUNC_START ashrdi3
- FUNC_ALIAS aeabi_lasr ashrdi3
-
-#ifdef __thumb__
- lsrs al, r2
- movs r3, ah
- asrs ah, r2
- subs r2, #32
- @ If r2 is negative at this point the following step would OR
- @ the sign bit into all of AL. That's not what we want...
- bmi 1f
- mov ip, r3
- asrs r3, r2
- orrs al, r3
- mov r3, ip
-1:
- negs r2, r2
- lsls r3, r2
- orrs al, r3
- RET
-#else
- subs r3, r2, #32
- rsb ip, r2, #32
- movmi al, al, lsr r2
- movpl al, ah, asr r3
- orrmi al, al, ah, lsl ip
- mov ah, ah, asr r2
- RET
-#endif
-
- FUNC_END aeabi_lasr
- FUNC_END ashrdi3
-
-#endif
-
-#ifdef L_ashldi3
-
- FUNC_START ashldi3
- FUNC_ALIAS aeabi_llsl ashldi3
-
-#ifdef __thumb__
- lsls ah, r2
- movs r3, al
- lsls al, r2
- mov ip, r3
- subs r2, #32
- lsls r3, r2
- orrs ah, r3
- negs r2, r2
- mov r3, ip
- lsrs r3, r2
- orrs ah, r3
- RET
-#else
- subs r3, r2, #32
- rsb ip, r2, #32
- movmi ah, ah, lsl r2
- movpl ah, al, lsl r3
- orrmi ah, ah, al, lsr ip
- mov al, al, lsl r2
- RET
+#define al r0
+#define ah r1
#endif
- FUNC_END aeabi_llsl
- FUNC_END ashldi3
-
-#endif
-
-#endif /* __symbian__ */
-
-#ifdef L_clzsi2
-#ifdef NOT_ISA_TARGET_32BIT
-FUNC_START clzsi2
- movs r1, #28
- movs r3, #1
- lsls r3, r3, #16
- cmp r0, r3 /* 0x10000 */
- bcc 2f
- lsrs r0, r0, #16
- subs r1, r1, #16
-2: lsrs r3, r3, #8
- cmp r0, r3 /* #0x100 */
- bcc 2f
- lsrs r0, r0, #8
- subs r1, r1, #8
-2: lsrs r3, r3, #4
- cmp r0, r3 /* #0x10 */
- bcc 2f
- lsrs r0, r0, #4
- subs r1, r1, #4
-2: adr r2, 1f
- ldrb r0, [r2, r0]
- adds r0, r0, r1
- bx lr
-.align 2
-1:
-.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
- FUNC_END clzsi2
-#else
-ARM_FUNC_START clzsi2
-# if defined (__ARM_FEATURE_CLZ)
- clz r0, r0
- RET
-# else
- mov r1, #28
- cmp r0, #0x10000
- do_it cs, t
- movcs r0, r0, lsr #16
- subcs r1, r1, #16
- cmp r0, #0x100
- do_it cs, t
- movcs r0, r0, lsr #8
- subcs r1, r1, #8
- cmp r0, #0x10
- do_it cs, t
- movcs r0, r0, lsr #4
- subcs r1, r1, #4
- adr r2, 1f
- ldrb r0, [r2, r0]
- add r0, r0, r1
- RET
-.align 2
-1:
-.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
-# endif /* !defined (__ARM_FEATURE_CLZ) */
- FUNC_END clzsi2
-#endif
-#endif /* L_clzsi2 */
-
-#ifdef L_clzdi2
-#if !defined (__ARM_FEATURE_CLZ)
-
-# ifdef NOT_ISA_TARGET_32BIT
-FUNC_START clzdi2
- push {r4, lr}
- cmp xxh, #0
- bne 1f
-# ifdef __ARMEB__
- movs r0, xxl
- bl __clzsi2
- adds r0, r0, #32
- b 2f
-1:
- bl __clzsi2
-# else
- bl __clzsi2
- adds r0, r0, #32
- b 2f
-1:
- movs r0, xxh
- bl __clzsi2
-# endif
-2:
- pop {r4, pc}
-# else /* NOT_ISA_TARGET_32BIT */
-ARM_FUNC_START clzdi2
- do_push {r4, lr}
- cmp xxh, #0
- bne 1f
-# ifdef __ARMEB__
- mov r0, xxl
- bl __clzsi2
- add r0, r0, #32
- b 2f
-1:
- bl __clzsi2
-# else
- bl __clzsi2
- add r0, r0, #32
- b 2f
-1:
- mov r0, xxh
- bl __clzsi2
-# endif
-2:
- RETLDM r4
- FUNC_END clzdi2
-# endif /* NOT_ISA_TARGET_32BIT */
-
-#else /* defined (__ARM_FEATURE_CLZ) */
-
-ARM_FUNC_START clzdi2
- cmp xxh, #0
- do_it eq, et
- clzeq r0, xxl
- clzne r0, xxh
- addeq r0, r0, #32
- RET
- FUNC_END clzdi2
-
-#endif
-#endif /* L_clzdi2 */
-
-#ifdef L_ctzsi2
-#ifdef NOT_ISA_TARGET_32BIT
-FUNC_START ctzsi2
- negs r1, r0
- ands r0, r0, r1
- movs r1, #28
- movs r3, #1
- lsls r3, r3, #16
- cmp r0, r3 /* 0x10000 */
- bcc 2f
- lsrs r0, r0, #16
- subs r1, r1, #16
-2: lsrs r3, r3, #8
- cmp r0, r3 /* #0x100 */
- bcc 2f
- lsrs r0, r0, #8
- subs r1, r1, #8
-2: lsrs r3, r3, #4
- cmp r0, r3 /* #0x10 */
- bcc 2f
- lsrs r0, r0, #4
- subs r1, r1, #4
-2: adr r2, 1f
- ldrb r0, [r2, r0]
- subs r0, r0, r1
- bx lr
-.align 2
-1:
-.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31
- FUNC_END ctzsi2
-#else
-ARM_FUNC_START ctzsi2
- rsb r1, r0, #0
- and r0, r0, r1
-# if defined (__ARM_FEATURE_CLZ)
- clz r0, r0
- rsb r0, r0, #31
- RET
-# else
- mov r1, #28
- cmp r0, #0x10000
- do_it cs, t
- movcs r0, r0, lsr #16
- subcs r1, r1, #16
- cmp r0, #0x100
- do_it cs, t
- movcs r0, r0, lsr #8
- subcs r1, r1, #8
- cmp r0, #0x10
- do_it cs, t
- movcs r0, r0, lsr #4
- subcs r1, r1, #4
- adr r2, 1f
- ldrb r0, [r2, r0]
- sub r0, r0, r1
- RET
-.align 2
-1:
-.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31
-# endif /* !defined (__ARM_FEATURE_CLZ) */
- FUNC_END ctzsi2
-#endif
-#endif /* L_clzsi2 */
/* ------------------------------------------------------------------------ */
/* These next two sections are here despite the fact that they contain Thumb
@@ -2190,4 +1905,77 @@
#else /* NOT_ISA_TARGET_32BIT */
#include "bpabi-v6m.S"
#endif /* NOT_ISA_TARGET_32BIT */
+
+
+/* Temp registers. */
+#define rP r4
+#define rQ r5
+#define rS r6
+#define rT r7
+
+.macro CM0_FUNC_START name
+.global SYM(__\name)
+.type SYM(__\name),function
+THUMB_CODE
+THUMB_FUNC
+.align 1
+ SYM(__\name):
+.endm
+
+.macro CM0_WEAK_START name
+.weak SYM(__\name)
+CM0_FUNC_START \name
+.endm
+
+.macro CM0_FUNC_ALIAS new old
+.global SYM (__\new)
+.thumb_set SYM (__\new), SYM (__\old)
+.endm
+
+.macro CM0_WEAK_ALIAS new old
+.weak SYM(__\new)
+CM0_FUNC_ALIAS \new \old
+.endm
+
+.macro CM0_FUNC_END name
+.size SYM(__\name), . - SYM(__\name)
+.endm
+
+#include "cm0/fplib.h"
+
+/* These have no conflicts with existing ARM implementations,
+ so these files can be built for all architectures. */
+#include "cm0/ctz2.S"
+#include "cm0/clz2.S"
+#include "cm0/lcmp.S"
+#include "cm0/lmul.S"
+#include "cm0/lshift.S"
+#include "cm0/parity.S"
+#include "cm0/popcnt.S"
+
+#ifdef NOT_ISA_TARGET_32BIT
+
+/* These have existing ARM implementations that may be preferred
+ for non-v6m architectures, for example versions that use the
+ hardware 'clz' and 'umull'/'smull' instructions. Comprehensive
+ integration may be possible in the future. */
+#include "cm0/idiv.S"
+#include "cm0/ldiv.S"
+
+#include "cm0/fcmp.S"
+
+/* Section names in the following files are selected to maximize
+ the utility of +/- 256 byte conditional branches. */
+#include "cm0/fneg.S"
+#include "cm0/fadd.S"
+#include "cm0/futil.S"
+#include "cm0/fmul.S"
+#include "cm0/fdiv.S"
+
+#include "cm0/ffloat.S"
+#include "cm0/ffixed.S"
+#include "cm0/fconv.S"
+
+#endif /* NOT_ISA_TARGET_32BIT */
+
#endif /* !__symbian__ */
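The CM0_WEAK_START/CM0_FUNC_START pairing above relies on ordinary weak-symbol resolution. In C terms the link-time behavior is roughly the sketch below (hypothetical names; the two definitions live in separate object files, analogous to the standalone '_popcountsi2' and combined '_popcountdi2' members):

    /* standalone.c -- analogous to the WEAK standalone entry point.  */
    int __attribute__ ((weak)) demo_count (unsigned int x)
    {
      int c = 0;
      while (x) { x &= x - 1; c++; }    /* minimal 32-bit-only version */
      return c;
    }

    /* combined.c -- analogous to the combined 64-bit member, which also
       provides a strong definition of the same symbol.  */
    int demo_count (unsigned int x)     /* strong: supersedes the weak copy */
    {
      int c = 0;
      while (x) { x &= x - 1; c++; }    /* shared continuation code */
      return c;
    }

When only the 32-bit symbol is referenced, the small weak object suffices; when the combined object is linked in, its strong definition takes over. This mirrors the patch's requirement that the WEAK standalone members appear before the combined ones in LIB1ASMFUNCS.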
diff -ruN gcc-11-20201220-clean/libgcc/config/arm/t-elf gcc-11-20201220/libgcc/config/arm/t-elf
--- gcc-11-20201220-clean/libgcc/config/arm/t-elf 2020-12-20 14:32:15.000000000 -0800
+++ gcc-11-20201220/libgcc/config/arm/t-elf 2021-01-06 02:45:47.436262144 -0800
@@ -10,23 +10,31 @@
# inclusion create when only multiplication is used, thus avoiding pulling in
# useless division code.
ifneq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA))
-LIB1ASMFUNCS += _arm_muldf3 _arm_mulsf3
+LIB1ASMFUNCS += _arm_muldf3
endif
endif # !__symbian__
+
+# Preferred WEAK implementations should appear first. See implementation notes.
+LIB1ASMFUNCS += _arm_mulsf3 _arm_addsf3 _umulsidi3 _arm_floatsisf _arm_floatundisf \
+ _clzsi2 _ctzsi2 _ffssi2 _clrsbsi2 _paritysi2 _popcountsi2
+
+
# For most CPUs we have an assembly soft-float implementations.
-# However this is not true for ARMv6M. Here we want to use the soft-fp C
-# implementation. The soft-fp code is only build for ARMv6M. This pulls
-# in the asm implementation for other CPUs.
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func \
- _call_via_rX _interwork_call_via_rX \
- _lshrdi3 _ashrdi3 _ashldi3 \
+LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _udivdi3 _divdi3 \
+ _dvmd_tls _bb_init_func _call_via_rX _interwork_call_via_rX \
+ _lshrdi3 _ashrdi3 _ashldi3 _mulsidi3 _muldi3 \
+ _arm_lcmp _cmpdi2 _arm_ulcmp _ucmpdi2 \
_arm_negdf2 _arm_addsubdf3 _arm_muldivdf3 _arm_cmpdf2 _arm_unorddf2 \
- _arm_fixdfsi _arm_fixunsdfsi \
- _arm_truncdfsf2 _arm_negsf2 _arm_addsubsf3 _arm_muldivsf3 \
- _arm_cmpsf2 _arm_unordsf2 _arm_fixsfsi _arm_fixunssfsi \
- _arm_floatdidf _arm_floatdisf _arm_floatundidf _arm_floatundisf \
- _clzsi2 _clzdi2 _ctzsi2
+ _arm_fixdfsi _arm_fixunsdfsi _arm_fixsfsi _arm_fixunssfsi \
+ _arm_f2h _arm_h2f _arm_d2f _arm_f2d _arm_truncdfsf2 \
+ _arm_negsf2 _arm_addsubsf3 _arm_frsubsf3 _arm_divsf3 _arm_muldivsf3 \
+ _arm_cmpsf2 _arm_unordsf2 _arm_eqsf2 _arm_gesf2 \
+ _arm_fcmpeq _arm_fcmpne _arm_fcmplt _arm_fcmple _arm_fcmpge _arm_fcmpgt \
+ _arm_cfcmpeq _arm_cfcmple _arm_cfrcmple \
+ _arm_floatdidf _arm_floatundidf _arm_floatdisf _arm_floatunsisf \
+ _clzdi2 _ctzdi2 _ffsdi2 _clrsbdi2 _paritydi2 _popcountdi2
+
# Currently there is a bug somewhere in GCC's alias analysis
# or scheduling code that is breaking _fpmul_parts in fp-bit.c.