From: Christophe Lyon
Date: Thu, 26 Nov 2020 10:14:09 +0100
Subject: Re: [PATCH] libgcc: Thumb-1 Floating-Point Library for Cortex M0
To: Daniel Engel
Cc: gcc Patches

Hi,

On Fri, 13 Nov 2020 at 00:03, Daniel Engel wrote:
>
> Hi,
>
> This patch adds an efficient assembly-language implementation of IEEE-754 compliant floating-point routines for Cortex M0 EABI (v6m, thumb-1).  This is the libgcc portion of a larger library originally described in 2018:
>
> https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
>
> Since that time, I've separated the libm functions for submission to newlib.  The remaining libgcc functions in the attached patch have the following characteristics:
>
> Function(s)                     Size (bytes)        Cycles           Stack  Accuracy
> __clzsi2                        42                  23               0      exact
> __clzsi2 (OPTIMIZE_SIZE)        22                  55               0      exact
> __clzdi2                        8+__clzsi2          4+__clzsi2       0      exact
>
> __umulsidi3                     44                  24               0      exact
> __mulsidi3                      30+__umulsidi3      24+__umulsidi3   8      exact
> __muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3    0      exact
> __ashldi3 (__aeabi_llsl)        22                  13               0      exact
> __lshrdi3 (__aeabi_llsr)        22                  13               0      exact
> __ashrdi3 (__aeabi_lasr)        22                  13               0      exact
>
> __aeabi_lcmp                    20                  13               0      exact
> __aeabi_ulcmp                   16                  10               0      exact
>
> __udivsi3 (__aeabi_uidiv)       56                  72 – 385         0      < 1 lsb
> __divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3     8      < 1 lsb
> __udivdi3 (__aeabi_uldiv)       164                 103 – 1394       16     < 1 lsb
> __udivdi3 (OPTIMIZE_SIZE)       142                 120 – 1392       16     < 1 lsb
> __divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3     32     < 1 lsb
>
> __shared_float                  178
> __shared_float (OPTIMIZE_SIZE)  154
>
> __addsf3 (__aeabi_fadd)         116+__shared_float  31 – 76          8      <= 0.5 ulp
> __addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74               8      <= 0.5 ulp
> __subsf3 (__aeabi_fsub)         8+__addsf3          6+__addsf3       8      <= 0.5 ulp
> __aeabi_frsub                   8+__addsf3          6+__addsf3       8      <= 0.5 ulp
> __mulsf3 (__aeabi_fmul)         112+__shared_float  73 – 97          8      <= 0.5 ulp
> __mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93               8      <= 0.5 ulp
> __divsf3 (__aeabi_fdiv)         132+__shared_float  83 – 361         8      <= 0.5 ulp
> __divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263 – 359        8      <= 0.5 ulp
>
> __cmpsf2/__lesf2/__ltsf2        72                  33               0      exact
> __eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2       0      exact
> __gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2       0      exact
> __unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2       0      exact
> __aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2       0      exact
> __aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2       0      exact
> __aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2       0      exact
> __aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2       0      exact
> __aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2       0      exact
>
> __floatundisf (__aeabi_ul2f)    14+__shared_float   40 – 81          8      <= 0.5 ulp
> __floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40 – 237         8      <= 0.5 ulp
> __floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf  8      <= 0.5 ulp
> __floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf  8      <= 0.5 ulp
> __floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf    8      <= 0.5 ulp
>
> __fixsfdi (__aeabi_f2lz)        74                  27 – 33          0      exact
> __fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi      0      exact
> __fixsfsi (__aeabi_f2iz)        52                  19               0      exact
> __fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi      0      exact
> __fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi      0      exact
>
> __extendsfdf2 (__aeabi_f2d)     42+__shared_float   38               8      exact
> __aeabi_d2f                     56+__shared_float   54 – 58          8      <= 0.5 ulp
> __aeabi_h2f                     34+__shared_float   34               8      exact
> __aeabi_f2h                     84                  23 – 34          0      <= 0.5 ulp
>
> Copyright assignment is on file with the FSF.
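Several rows above (__mulsidi3, __muldi3) are built on __umulsidi3.  ARMv6-M's MULS keeps only the low 32 bits of a product, so a 32x32 to 64-bit multiply has to be assembled from smaller pieces.  As a minimal portable C sketch of one common way to do that (four 16x16-bit partial products), assuming nothing about how the patch's assembly actually schedules it; the name demo_umulsidi3 is hypothetical:

```c
#include <stdint.h>

/* Sketch only: a 32x32 -> 64-bit unsigned multiply built from 16x16-bit
   partial products, the usual approach on a core whose MULS keeps only
   the low 32 bits of a product (as on ARMv6-M).  demo_umulsidi3 is a
   hypothetical name, not the libgcc entry point. */
uint64_t demo_umulsidi3(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Each 16x16 product fits in 32 bits, so nothing is lost here. */
    uint32_t ll = a_lo * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hl = a_hi * b_lo;
    uint32_t hh = a_hi * b_hi;

    /* Bits 16..31 of the result: the high half of ll plus the low halves
       of the two cross terms.  The sum is below 3 * 2^16, so the carry
       into bit 32 survives in mid's upper bits. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    uint32_t lo = (mid << 16) | (ll & 0xFFFFu);
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}
```

On a host compiler the result can be checked directly against the native (uint64_t)a * b, which makes it a convenient reference when validating a hand-written assembly version.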
>
> I've built the gcc-arm-none-eabi cross-compiler using the 20201108 snapshot of GCC plus this patch, and successfully compiled a test program:
>
> extern int main (void)
> {
>   volatile int x = 1;
>   volatile unsigned long long int y = 10;
>   volatile long long int z = x / y;  // 64-bit division
>
>   volatile float a = x;              // 32-bit casting
>   volatile float b = y;              // 64-bit casting
>   volatile float c = z / b;          // float division
>   volatile float d = a + c;          // float addition
>   volatile float e = c * b;          // float multiplication
>   volatile float f = d - e - c;      // float subtraction
>
>   if (f != c)                        // float comparison
>     y -= (long long int)d;           // float casting
> }
>
> As one point of comparison, the test program links to 876 bytes of libgcc code from the patched toolchain, vs 10276 bytes from the latest released gcc-arm-none-eabi-9-2020-q2 toolchain.  That's a 90% size reduction.

This looks awesome!

>
> I have extensive test vectors, and have passed these tests on an STM32F051.  These vectors were derived from UCB [1], TestFloat [2], and IEEECC754 [3] sources, plus some of my own creation.  Unfortunately, I'm not sure how "make check" should work for a cross-compiler runtime library.
>
> Although I believe this patch can be incorporated as-is, there are at least two points that might bear discussion:
>
> * I'm not sure where or how they would be integrated, but I would be happy to provide sources for my test vectors.
>
> * The library is currently built for the ARM v6m architecture only.  It is likely that some of the other Cortex variants would benefit from these routines.  However, I would need some guidance on this to proceed without introducing regressions.  I do not currently have a test strategy for architectures beyond Cortex M0, and I have NOT profiled the existing thumb-2 implementations (ieee754-sf.S) for comparison.
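On the test-vector question: since the accuracy column separates "exact" from "<= 0.5 ulp" (correct rounding), vectors are most useful when results are compared bit-for-bit rather than within a tolerance.  A minimal C harness sketch follows, assuming a hypothetical vector layout of raw IEEE-754 single-precision bit patterns (two operands and the expected sum); it is not the file format of the UCB, TestFloat, or IEEECC754 sources, and struct fadd_vector and run_fadd_vectors are illustrative names only.  Built with -mfloat-abi=soft for Cortex-M0, the float addition is lowered to a call to __aeabi_fadd, so the same code exercises the library routine.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical vector format: raw IEEE-754 single-precision bit patterns,
   so the check is exact (including the sign of zero), not a tolerance.
   NaN results are deliberately not covered: a bitwise compare would
   reject a valid NaN whose payload differs from the expected one. */
struct fadd_vector { uint32_t a, b, expected; };

static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);   /* type-pun without aliasing problems */
    return f;
}

static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Returns how many vectors produced a result that differs bit-for-bit. */
int run_fadd_vectors(const struct fadd_vector *v, int n)
{
    int failures = 0;
    for (int i = 0; i < n; i++) {
        /* volatile keeps the compiler from folding the add at build time */
        volatile float a = bits_to_float(v[i].a);
        volatile float b = bits_to_float(v[i].b);
        float sum = a + b;      /* __aeabi_fadd on a soft-float target */
        if (float_to_bits(sum) != v[i].expected)
            failures++;
    }
    return failures;
}
```

For example, the vector {0x3F800000, 0x33800000, 0x3F800000} checks that 1.0f + 2^-24, an exact tie, rounds to even (back to 1.0f), which is the kind of case a tolerance-based check would miss.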
I tried your patch, and I see many regressions in the GCC testsuite because many tests fail to link with errors like:

  ld: /gcc/thumb/v6-m/nofp/libgcc.a(_arm_cmpdf2.o): in function `__clzdi2':
  /libgcc/config/arm/cm0/clz2.S:39: multiple definition of `__clzdi2';
  /gcc/thumb/v6-m/nofp/libgcc.a(_thumb1_case_sqi.o):/libgcc/config/arm/cm0/clz2.S:39: first defined here

This happens with a toolchain configured with --target arm-none-eabi, default cpu/fpu/mode, --enable-multilib --with-multilib-list=rmprofile and running the tests with -mthumb/-mcpu=cortex-m0/-mfloat-abi=soft/-march=armv6s-m.

Does it work for you?

Thanks,

Christophe

>
> I'm naturally hoping for some action on this patch before the Nov 16th deadline for GCC-11 stage 3.  Please review and advise.
>
> Thanks,
> Daniel Engel
>
> [1] http://www.netlib.org/fp/ucbtest.tgz
> [2] http://www.jhauser.us/arithmetic/TestFloat.html
> [3] http://win-www.uia.ac.be/u/cant/ieeecc754.html
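Regarding the multiple-definition error: one plausible reading, offered only as a hypothesis, is that clz2.S ends up inside every object built from the lib1funcs source.  libgcc builds each name listed in LIB1ASMFUNCS by compiling the same LIB1ASMSRC once with -DL_<name>, so each routine has to sit inside its own #ifdef L_<name> block; an unguarded definition would then land in every object, which would match the clash between _arm_cmpdf2.o and _thumb1_case_sqi.o.  The C sketch below imitates that one-symbol-per-object pattern with hypothetical names (demo_clzsi2, demo_clzdi2) and a hypothetical fallback that builds everything when no selector is given.  It also mirrors the table's structure of __clzdi2 as a small wrapper around __clzsi2.

```c
#include <stdint.h>

/* Hypothetical standalone fallback: if no L_<name> selector was passed on
   the command line, build every routine so the file works as one unit. */
#if !defined(L_clzsi2) && !defined(L_clzdi2)
#define L_clzsi2
#define L_clzdi2
#endif

/* Hypothetical names, chosen so they cannot collide with libgcc's real
   __clzsi2/__clzdi2.  The prototype lets the 64-bit wrapper compile even
   when only L_clzdi2 is defined. */
int demo_clzsi2(uint32_t x);
int demo_clzdi2(uint64_t x);

#ifdef L_clzsi2
/* Leading-zero count by binary search.  Returns 32 for zero; libgcc
   documents the zero case of its own routines as undefined. */
int demo_clzsi2(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    if ((x >> 16) == 0) { n += 16; x <<= 16; }
    if ((x >> 24) == 0) { n += 8;  x <<= 8;  }
    if ((x >> 28) == 0) { n += 4;  x <<= 4;  }
    if ((x >> 30) == 0) { n += 2;  x <<= 2;  }
    if ((x >> 31) == 0) { n += 1; }
    return n;
}
#endif

#ifdef L_clzdi2
/* 64-bit count as a thin wrapper over the 32-bit one. */
int demo_clzdi2(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    return hi ? demo_clzsi2(hi) : 32 + demo_clzsi2((uint32_t)x);
}
#endif
```

Built with only -DL_clzdi2, this file would define demo_clzdi2 alone and leave demo_clzsi2 as an external reference; that one-symbol-per-object property is what the link error suggests may have been lost.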