From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>, gcc-patches@gcc.gnu.org
Cc: Daniel Engel <gnu@danielengel.com>,
        Christophe Lyon <christophe.lyon@linaro.org>
Subject: [PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
Date: Mon, 31 Oct 2022 08:44:55 -0700
Message-Id: <20221031154529.3627576-1-gnu@danielengel.com>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: <gcc-patches.gcc.gnu.org>

Hi Richard,

I am re-submitting my libgcc patch from 2021:

    https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
    https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html

I believe I have finally made the stage1 window. 

Regards,
Daniel

---

Changes since v6:

    * Rebased and tested with gcc-13

There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.
Clean master:

    # of expected passes            529397
    # of unexpected failures        41160
    # of unexpected successes       12
    # of expected failures          3442
    # of unresolved testcases       978
    # of unsupported tests          28993

Patched master:

    # of expected passes            529397
    # of unexpected failures        41160
    # of unexpected successes       12
    # of expected failures          3442
    # of unresolved testcases       978
    # of unsupported tests          28993

---

This patch series adds an assembly-language implementation of IEEE-754 compliant
single-precision functions designed for the Cortex M0 (v6m) architecture.  There
are improvements to most of the EABI integer functions as well.  This is the
libgcc component of a larger library project originally proposed in 2018:

    https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html

As one point of comparison, a test program [1] links 916 bytes from libgcc with
the patched toolchain vs. 10276 bytes with the gcc-arm-none-eabi-9-2020-q2
toolchain.  That's a size reduction of about 91%.

I have extensive test vectors [2], and this patch series passes all tests on an
STM32F051.  These vectors were derived from UCB [3], TestFloat [4], and
IEEECC754 [5], plus many that I generated myself.

There may be some follow-on projects worth discussing:

    * The library is currently integrated into the ARM v6s-m multilib only.  It
    is likely that some other architectures would benefit from these routines.
    However, I have NOT profiled the existing implementations (ieee754-sf.S) to
    estimate where improvements may be found.

    * GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
    There may be useful bits in [1] that can be integrated.

On Cortex M0, the library has (approximately) the following properties:

Function(s)                     Size (bytes)        Cycles              Stack   Accuracy
__clzsi2                        50                  20                  0       exact
__clzsi2 (OPTIMIZE_SIZE)        22                  51                  0       exact
__clzdi2                        8+__clzsi2          4+__clzsi2          0       exact

__clrsbsi2                      8+__clzsi2          6+__clzsi2          0       exact
__clrsbdi2                      18+__clzsi2         (8..10)+__clzsi2    0       exact

__ctzsi2                        52                  21                  0       exact
__ctzsi2 (OPTIMIZE_SIZE)        24                  52                  0       exact
__ctzdi2                        8+__ctzsi2          5+__ctzsi2          0       exact

__ffssi2                        8                   6..(5+__ctzsi2)     0       exact
__ffsdi2                        14+__ctzsi2         9..(8+__ctzsi2)     0       exact

__popcountsi2                   52                  25                  0       exact
__popcountsi2 (OPTIMIZE_SIZE)   14                  9..201              0       exact
__popcountdi2                   34+__popcountsi2    46                  0       exact
__popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2    17..401             0       exact

__paritysi2                     24                  14                  0       exact
__paritysi2 (OPTIMIZE_SIZE)     16                  38                  0       exact
__paritydi2                     2+__paritysi2       1+__paritysi2       0       exact

__umulsidi3                     44                  24                  0       exact
__mulsidi3                      30+__umulsidi3      24+__umulsidi3      8       exact
__muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3       0       exact
__ashldi3 (__aeabi_llsl)        22                  13                  0       exact
__lshrdi3 (__aeabi_llsr)        22                  13                  0       exact
__ashrdi3 (__aeabi_lasr)        22                  13                  0       exact

__aeabi_lcmp                    20                  13                  0       exact
__aeabi_ulcmp                   16                  10                  0       exact

__udivsi3 (__aeabi_uidiv)       56                  72..385             0       < 1 lsb
__divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3        8       < 1 lsb
__udivdi3 (__aeabi_uldiv)       164                 103..1394           16      < 1 lsb
__udivdi3 (OPTIMIZE_SIZE)       142                 120..1392           16      < 1 lsb
__divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3        32      < 1 lsb

__shared_float                  178
__shared_float (OPTIMIZE_SIZE)  154

__addsf3 (__aeabi_fadd)         116+__shared_float  31..76              8       <= 0.5 ulp
__addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74                  8       <= 0.5 ulp
__subsf3 (__aeabi_fsub)         6+__addsf3          3+__addsf3          8       <= 0.5 ulp
__aeabi_frsub                   8+__addsf3          6+__addsf3          8       <= 0.5 ulp
__mulsf3 (__aeabi_fmul)         112+__shared_float  73..97              8       <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93                  8       <= 0.5 ulp
__divsf3 (__aeabi_fdiv)         132+__shared_float  83..361             8       <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263..359            8       <= 0.5 ulp

__cmpsf2/__lesf2/__ltsf2        72                  33                  0       exact
__eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2          0       exact
__gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2          0       exact
__unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2          0       exact

__floatundisf (__aeabi_ul2f)    14+__shared_float   40..81              8       <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40..237             8       <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf     8       <= 0.5 ulp
__floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf     8       <= 0.5 ulp
__floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf       8       <= 0.5 ulp

__fixsfdi (__aeabi_f2lz)        74                  27..33              0       exact
__fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi         0       exact
__fixsfsi (__aeabi_f2iz)        52                  19                  0       exact
__fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi         0       exact
__fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi         0       exact

__extendsfdf2 (__aeabi_f2d)     42+__shared_float   38                  8       exact
__truncsfdf2 (__aeabi_f2d)      88                  34                  8       exact
__aeabi_d2f                     56+__shared_float   54..58              8       <= 0.5 ulp
__aeabi_h2f                     34+__shared_float   34                  8       exact
__aeabi_f2h                     84                  23..34              0       <= 0.5 ulp

Copyright assignment is on file with the FSF.

Thanks,
Daniel Engel


[1] // Test program for size comparison

    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division

        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64 bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction

        if (f != c) // float comparison
            y -= (long long int)d; // float casting
    }

[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html

---

Daniel Engel (34):
  Add and restructure function declaration macros
  Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
  Fix syntax warnings on conditional instructions
  Reorganize LIB1ASMFUNCS object wrapper macros
  Add the __HAVE_FEATURE_IT and IT() macros
  Refactor 'clz' functions into a new file
  Refactor 'ctz' functions into a new file
  Refactor 64-bit shift functions into a new file
  Import 'clz' functions from the CM0 library
  Import 'ctz' functions from the CM0 library
  Import 64-bit shift functions from the CM0 library
  Import 'clrsb' functions from the CM0 library
  Import 'ffs' functions from the CM0 library
  Import 'parity' functions from the CM0 library
  Import 'popcnt' functions from the CM0 library
  Refactor Thumb-1 64-bit comparison into a new file
  Import 64-bit comparison from CM0 library
  Merge Thumb-2 optimizations for 64-bit comparison
  Import 32-bit division from the CM0 library
  Refactor Thumb-1 64-bit division into a new file
  Import 64-bit division from the CM0 library
  Import integer multiplication from the CM0 library
  Refactor Thumb-1 float comparison into a new file
  Import float comparison from the CM0 library
  Refactor Thumb-1 float subtraction into a new file
  Import float addition and subtraction from the CM0 library
  Import float multiplication from the CM0 library
  Import float division from the CM0 library
  Import integer-to-float conversion from the CM0 library
  Import float-to-integer conversion from the CM0 library
  Import float<->double conversion from the CM0 library
  Import float<->__fp16 conversion from the CM0 library
  Drop single-precision Thumb-1 soft-float functions
  Add -mpure-code support to the CM0 functions.

 libgcc/Makefile.in              |   5 +-
 libgcc/config/arm/bpabi-lib.h   |  12 -
 libgcc/config/arm/bpabi-v6m.S   | 206 -----------
 libgcc/config/arm/bpabi.S       |  42 ---
 libgcc/config/arm/bpabi.c       |  42 ---
 libgcc/config/arm/clz2.S        | 371 ++++++++++++++++++++
 libgcc/config/arm/ctz2.S        | 349 ++++++++++++++++++
 libgcc/config/arm/eabi/fadd.S   | 324 +++++++++++++++++
 libgcc/config/arm/eabi/fcast.S  | 533 ++++++++++++++++++++++++++++
 libgcc/config/arm/eabi/fcmp.S   | 604 ++++++++++++++++++++++++++++++++
 libgcc/config/arm/eabi/fdiv.S   | 261 ++++++++++++++
 libgcc/config/arm/eabi/ffixed.S | 414 ++++++++++++++++++++++
 libgcc/config/arm/eabi/ffloat.S | 247 +++++++++++++
 libgcc/config/arm/eabi/fmul.S   | 215 ++++++++++++
 libgcc/config/arm/eabi/fneg.S   |  76 ++++
 libgcc/config/arm/eabi/fplib.h  |  80 +++++
 libgcc/config/arm/eabi/futil.S  | 418 ++++++++++++++++++++++
 libgcc/config/arm/eabi/idiv.S   | 299 ++++++++++++++++
 libgcc/config/arm/eabi/lcmp.S   | 187 ++++++++++
 libgcc/config/arm/eabi/ldiv.S   | 493 ++++++++++++++++++++++++++
 libgcc/config/arm/eabi/lmul.S   | 218 ++++++++++++
 libgcc/config/arm/eabi/lshift.S | 241 +++++++++++++
 libgcc/config/arm/fp16.c        |   4 +
 libgcc/config/arm/lib1funcs.S   | 549 ++++++++++-------------------
 libgcc/config/arm/parity.S      | 120 +++++++
 libgcc/config/arm/popcnt.S      | 212 +++++++++++
 libgcc/config/arm/t-bpabi       |  10 +-
 libgcc/config/arm/t-elf         | 138 +++++++-
 libgcc/config/arm/t-softfp      |   2 +
 29 files changed, 5997 insertions(+), 675 deletions(-)
 delete mode 100644 libgcc/config/arm/bpabi.c
 create mode 100644 libgcc/config/arm/clz2.S
 create mode 100644 libgcc/config/arm/ctz2.S
 create mode 100644 libgcc/config/arm/eabi/fadd.S
 create mode 100644 libgcc/config/arm/eabi/fcast.S
 create mode 100644 libgcc/config/arm/eabi/fcmp.S
 create mode 100644 libgcc/config/arm/eabi/fdiv.S
 create mode 100644 libgcc/config/arm/eabi/ffixed.S
 create mode 100644 libgcc/config/arm/eabi/ffloat.S
 create mode 100644 libgcc/config/arm/eabi/fmul.S
 create mode 100644 libgcc/config/arm/eabi/fneg.S
 create mode 100644 libgcc/config/arm/eabi/fplib.h
 create mode 100644 libgcc/config/arm/eabi/futil.S
 create mode 100644 libgcc/config/arm/eabi/idiv.S
 create mode 100644 libgcc/config/arm/eabi/lcmp.S
 create mode 100644 libgcc/config/arm/eabi/ldiv.S
 create mode 100644 libgcc/config/arm/eabi/lmul.S
 create mode 100644 libgcc/config/arm/eabi/lshift.S
 create mode 100644 libgcc/config/arm/parity.S
 create mode 100644 libgcc/config/arm/popcnt.S

-- 
2.34.1