From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <3b36bc72-e92e-4372-8da4-43ade34d868b@www.fastmail.com>
In-Reply-To: <3b36bc72-e92e-4372-8da4-43ade34d868b@www.fastmail.com>
From: Christophe Lyon <christophe.lyon@linaro.org>
Date: Thu, 26 Nov 2020 10:14:09 +0100
Message-ID: <CAKdteOa7s_uFsLdP4WNOUO3nBd1Bspqbx3pX1nf95Zdsgu4R1A@mail.gmail.com>
Subject: Re: [PATCH] libgcc: Thumb-1 Floating-Point Library for Cortex M0
To: Daniel Engel <libgcc@danielengel.com>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi,

On Fri, 13 Nov 2020 at 00:03, Daniel Engel <libgcc@danielengel.com> wrote:
>
> Hi,
>
> This patch adds an efficient assembly-language implementation of IEEE-754
> compliant floating point routines for Cortex M0 EABI (v6m, thumb-1).  This
> is the libgcc portion of a larger library originally described in 2018:
>
>     https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
>
> Since that time, I've separated the libm functions for submission to
> newlib.  The remaining libgcc functions in the attached patch have the
> following characteristics:
>
>     Function(s)                     Size (bytes)        Cycles          Stack   Accuracy
>     __clzsi2                        42                  23              0       exact
>     __clzsi2 (OPTIMIZE_SIZE)        22                  55              0       exact
>     __clzdi2                        8+__clzsi2          4+__clzsi2      0       exact
>
>     __umulsidi3                     44                  24              0       exact
>     __mulsidi3                      30+__umulsidi3      24+__umulsidi3  8       exact
>     __muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3   0       exact
>     __ashldi3 (__aeabi_llsl)        22                  13              0       exact
>     __lshrdi3 (__aeabi_llsr)        22                  13              0       exact
>     __ashrdi3 (__aeabi_lasr)        22                  13              0       exact
>
>     __aeabi_lcmp                    20                  13              0       exact
>     __aeabi_ulcmp                   16                  10              0       exact
>
>     __udivsi3 (__aeabi_uidiv)       56                  72 – 385        0       < 1 lsb
>     __divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3    8       < 1 lsb
>     __udivdi3 (__aeabi_uldiv)       164                 103 – 1394      16      < 1 lsb
>     __udivdi3 (OPTIMIZE_SIZE)       142                 120 – 1392      16      < 1 lsb
>     __divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3    32      < 1 lsb
>
>     __shared_float                  178
>     __shared_float (OPTIMIZE_SIZE)  154
>
>     __addsf3 (__aeabi_fadd)         116+__shared_float  31 – 76         8       <= 0.5 ulp
>     __addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74              8       <= 0.5 ulp
>     __subsf3 (__aeabi_fsub)         8+__addsf3          6+__addsf3      8       <= 0.5 ulp
>     __aeabi_frsub                   8+__addsf3          6+__addsf3      8       <= 0.5 ulp
>     __mulsf3 (__aeabi_fmul)         112+__shared_float  73 – 97         8       <= 0.5 ulp
>     __mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93              8       <= 0.5 ulp
>     __divsf3 (__aeabi_fdiv)         132+__shared_float  83 – 361        8       <= 0.5 ulp
>     __divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263 – 359       8       <= 0.5 ulp
>
>     __cmpsf2/__lesf2/__ltsf2        72                  33              0       exact
>     __eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2      0       exact
>     __gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2      0       exact
>     __unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2      0       exact
>     __aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2      0       exact
>     __aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2      0       exact
>     __aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2      0       exact
>     __aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2      0       exact
>     __aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2      0       exact
>
>     __floatundisf (__aeabi_ul2f)    14+__shared_float   40 – 81         8       <= 0.5 ulp
>     __floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40 – 237        8       <= 0.5 ulp
>     __floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf 8       <= 0.5 ulp
>     __floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf 8       <= 0.5 ulp
>     __floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf   8       <= 0.5 ulp
>
>     __fixsfdi (__aeabi_f2lz)        74                  27 – 33         0       exact
>     __fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi     0       exact
>     __fixsfsi (__aeabi_f2iz)        52                  19              0       exact
>     __fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi     0       exact
>     __fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi     0       exact
>
>     __extendsfdf2 (__aeabi_f2d)     42+__shared_float   38              8       exact
>     __aeabi_d2f                     56+__shared_float   54 – 58         8       <= 0.5 ulp
>     __aeabi_h2f                     34+__shared_float   34              8       exact
>     __aeabi_f2h                     84                  23 – 34         0       <= 0.5 ulp
>
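
On the Accuracy column above: an error of at most 0.5 ulp (unit in the last place) means the result is the correctly rounded one. The following is a minimal host-side sketch of how such an error could be measured for single-precision multiplication. It is not taken from the patch or from its test vectors. The helper names `ulp_of` and `mul_error_in_ulp` are made up for illustration, and the code assumes IEEE-754 binary32 `float` and binary64 `double`, with finite, normal inputs and results.

```c
#include <stdint.h>
#include <string.h>

/* Size of one ulp at |x|: the gap between |x| and the next representable
   float of larger magnitude.
   Assumption: x is finite, normal, and below FLT_MAX. */
double ulp_of(float x)
{
    uint32_t bits;
    float next;

    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;            /* clear the sign bit: work on |x| */
    memcpy(&x, &bits, sizeof x);
    bits += 1;                      /* next representable magnitude */
    memcpy(&next, &bits, sizeof next);
    return (double)next - (double)x;
}

/* Error of a candidate single-precision product, in ulps of the candidate.
   The product of two 24-bit significands fits in the 53-bit significand
   of a double, so the double product is exact and serves as reference. */
double mul_error_in_ulp(float a, float b, float candidate)
{
    double exact = (double)a * (double)b;
    double diff = (double)candidate - exact;

    if (diff < 0)
        diff = -diff;
    return diff / ulp_of(candidate);
}
```

On a target, `candidate` would be the value returned by the routine under test (for example `__aeabi_fmul`), and the check would be that `mul_error_in_ulp` never exceeds 0.5.
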
> Copyright assignment is on file with the FSF.
>
> I've built the gcc-arm-none-eabi cross-compiler using the 20201108
> snapshot of GCC plus this patch, and successfully compiled a test program:
>
>     extern int main (void)
>     {
>         volatile int x = 1;
>         volatile unsigned long long int y = 10;
>         volatile long long int z = x / y; // 64-bit division
>
>         volatile float a = x; // 32-bit casting
>         volatile float b = y; // 64 bit casting
>         volatile float c = z / b; // float division
>         volatile float d = a + c; // float addition
>         volatile float e = c * b; // float multiplication
>         volatile float f = d - e - c; // float subtraction
>
>         if (f != c) // float comparison
>             y -= (long long int)d; // float casting
>     }
>
> As one point of comparison, the test program links to 876 bytes of libgcc
> code from the patched toolchain, vs. 10276 bytes from the latest released
> gcc-arm-none-eabi-9-2020-q2 toolchain.  That's a size reduction of over 90%.
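
The percentage follows directly from the two byte counts quoted above. As a small worked check (the function name `size_reduction_percent` is only illustrative and is not part of the patch):

```c
/* Percentage by which `after` is smaller than `before`.
   Assumes before > 0 and after <= before. */
double size_reduction_percent(unsigned before, unsigned after)
{
    return 100.0 * (double)(before - after) / (double)before;
}
```

With the figures from the message, `size_reduction_percent(10276, 876)` evaluates to about 91.5.
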

This looks awesome!

>
> I have extensive test vectors, and have passed these tests on an
> STM32F051.  These vectors were derived from UCB [1], Testfloat [2], and
> IEEECC754 [3] sources, plus some of my own creation.  Unfortunately, I'm
> not sure how "make check" should work for a cross compiler run time library.
>
> Although I believe this patch can be incorporated as-is, there are at
> least two points that might bear discussion:
>
> * I'm not sure where or how they would be integrated, but I would be
>   happy to provide sources for my test vectors.
>
> * The library is currently built for the ARM v6m architecture only.  It
>   is likely that some of the other Cortex variants would benefit from
>   these routines.  However, I would need some guidance on this to proceed
>   without introducing regressions.  I do not currently have a test
>   strategy for architectures beyond Cortex M0, and I have NOT profiled the
>   existing thumb-2 implementations (ieee754-sf.S) for comparison.

I tried your patch, and I see many regressions in the GCC testsuite
because many tests fail to link with errors like:
ld: /gcc/thumb/v6-m/nofp/libgcc.a(_arm_cmpdf2.o): in function `__clzdi2':
/libgcc/config/arm/cm0/clz2.S:39: multiple definition of `__clzdi2';
/gcc/thumb/v6-m/nofp/libgcc.a(_thumb1_case_sqi.o):/libgcc/config/arm/cm0/clz2.S:39: first defined here

This happens with a toolchain configured with --target arm-none-eabi,
default cpu/fpu/mode, --enable-multilib --with-multilib-list=rmprofile,
and running the tests with
-mthumb/-mcpu=cortex-m0/-mfloat-abi=soft/-march=armv6s-m

Does it work for you?
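
One possible reading of the error, not confirmed anywhere in this thread: libgcc builds its assembly source once per entry in LIB1ASMFUNCS, each time with a different -DL_<name>, and each routine is expected to sit behind its own `#ifdef L_<name>` guard so that it lands in exactly one archive member. If the contents of clz2.S were emitted without such a guard, every member built from that source (here `_arm_cmpdf2.o` and `_thumb1_case_sqi.o`) would carry its own `__clzdi2`. The guard pattern is sketched below in C rather than assembly; the `demo_` names and `L_demo_` macros are hypothetical and only mirror the convention.

```c
/* Normally the build passes exactly one -DL_<name> per object file;
   it is defined here only so that the sketch compiles on its own. */
#define L_demo_clzdi2

#ifdef L_demo_clzdi2
/* Count leading zero bits of a 64-bit value (64 for an input of 0). */
int demo_clzdi2(unsigned long long x)
{
    int n = 0;

    if (x == 0)
        return 64;
    while ((x & (1ULL << 63)) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
#endif /* L_demo_clzdi2 */

#ifdef L_demo_cmpdf2
/* Not compiled in this configuration: this object provides only
   demo_clzdi2, so no other archive member repeats its definition. */
int demo_cmpdf2(double a, double b)
{
    return (a > b) - (a < b);
}
#endif /* L_demo_cmpdf2 */
```

Compiling the same file with a different `L_demo_` macro would produce an object containing only the other routine, which is what keeps archive members from defining the same symbol twice.
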

Thanks,

Christophe

>
> I'm naturally hoping for some action on this patch before the Nov 16th
> deadline for GCC-11 stage 3.  Please review and advise.
>
> Thanks,
> Daniel Engel
>
> [1] http://www.netlib.org/fp/ucbtest.tgz
> [2] http://www.jhauser.us/arithmetic/TestFloat.html
> [3] http://win-www.uia.ac.be/u/cant/ieeecc754.html