From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gnu@danielengel.com>
Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27])
	by sourceware.org (Postfix) with ESMTPS id 6F1BA389839B
	for <gcc-patches@gcc.gnu.org>; Tue, 15 Nov 2022 15:25:54 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6F1BA389839B
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com
Received: from compute4.internal (compute4.nyi.internal [10.202.2.44])
	by mailout.nyi.internal (Postfix) with ESMTP id 41F715C0241;
	Tue, 15 Nov 2022 10:25:49 -0500 (EST)
Received: from imap41 ([10.202.2.91])
  by compute4.internal (MEProxy); Tue, 15 Nov 2022 10:25:49 -0500
Feedback-ID: i791144d6:Fastmail
Received: by mailuser.nyi.internal (Postfix, from userid 501)
	id C90F6234007B; Tue, 15 Nov 2022 10:25:48 -0500 (EST)
X-Mailer: MessagingEngine.com Webmail Interface
User-Agent: Cyrus-JMAP/3.7.0-alpha0-1115-g8b801eadce-fm-20221102.001-g8b801ead
Mime-Version: 1.0
Message-Id: <6d704904-06bb-4c02-ae30-fcbc11b8d003@app.fastmail.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>
Date: Tue, 15 Nov 2022 07:27:45 -0800
From: "Daniel Engel" <gnu@danielengel.com>
To: gcc-patches@gcc.gnu.org
Cc: christophe.lyon@arm.com, Richard.Earnshaw@foss.arm.com
Subject: [PING] Re: [PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex
 M0
Content-Type: text/plain
X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,JMQ_SPF_NEUTRAL,KAM_INFOUSMEBIZ,KAM_NUMSUBJECT,KAM_SHORT,RCVD_IN_DNSWL_LOW,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Hello, 

Is there still any interest in merging this patch? 

Thanks,
Daniel


On Mon, Oct 31, 2022, at 8:44 AM, Daniel Engel wrote:
> Hi Richard,
>
> I am re-submitting my libgcc patch from 2021:
>
>     https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
>     https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html
>
> I believe I have finally made the stage1 window. 
>
> Regards,
> Daniel
>
> ---
>
> Changes since v6:
>
>     * Rebased and tested with gcc-13
>
> There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.
> Clean master:
>
>     # of expected passes            529397
>     # of unexpected failures        41160
>     # of unexpected successes       12
>     # of expected failures          3442
>     # of unresolved testcases       978
>     # of unsupported tests          28993
>
> Patched master:
>
>     # of expected passes            529397
>     # of unexpected failures        41160
>     # of unexpected successes       12
>     # of expected failures          3442
>     # of unresolved testcases       978
>     # of unsupported tests          28993
>
> ---
>
> This patch series adds an assembly-language implementation of IEEE-754 compliant
> single-precision functions designed for the Cortex M0 (v6m) architecture.  There
> are improvements to most of the EABI integer functions as well.  This is the
> libgcc component of a larger library project originally proposed in 2018:
>
>     https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
>
> As one point of comparison, a test program [1] links 916 bytes from libgcc with
> the patched toolchain vs 10276 bytes with the gcc-arm-none-eabi-9-2020-q2
> toolchain.  That's a 91% size reduction.
>
> I have extensive test vectors [2], and this patch passes all tests on an
> STM32F051.  These vectors were derived from UCB [3], Testfloat [4], and
> IEEECC754 [5], plus many of my own generation.
>
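
To illustrate what checking a single multiplication vector might involve, here
is a minimal sketch in C.  It is not taken from the test vectors in [2], and the
names float_bits and fmul_is_correctly_rounded are hypothetical.  It assumes the
machine running the check uses IEEE-754 binary32 for float and binary64 for
double, with the default round-to-nearest mode.  Two 24-bit significands
multiply to at most 48 bits, so the product is exact in double, and converting
it back to float yields the correctly rounded result (error <= 0.5 ulp).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw IEEE-754 encoding of a float.  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Hypothetical check: is GOT the correctly rounded product of A and B?
   Assumes IEEE-754 binary32/binary64 and round-to-nearest on the
   machine that runs the check.  */
int
fmul_is_correctly_rounded (float a, float b, float got)
{
  /* Exact: a 24-bit by 24-bit significand product fits in 53 bits.  */
  double exact = (double) a * (double) b;

  /* A single rounding step, from the exact product down to float.  */
  float expected = (float) exact;

  /* Where a NaN is expected, accept any NaN encoding.  */
  if (expected != expected)
    return got != got;

  /* Compare encodings so that +0.0 and -0.0 are told apart.  */
  return float_bits (got) == float_bits (expected);
}
```

On the target, GOT would be the value returned by the routine under test, for
example the result of a * b compiled for a soft-float Thumb-1 multilib.
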
> There may be some follow-on projects worth discussing:
>
>     * The library is currently integrated into the ARM v6s-m multilib only.  It
>     is likely that some other architectures would benefit from these routines.
>     However, I have NOT profiled the existing implementations (ieee754-sf.S) to
>     estimate where improvements may be found.
>
>     * GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
>     There may be useful bits in [1] that can be integrated.
>
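
A test for the 64-bit division helpers mentioned above could start from the
division identity.  The following is a minimal sketch, not part of the patch or
of [1]; the names check_udivmod and check_sdivmod are hypothetical.  It assumes
that on an ARM EABI target without a hardware divider the compiler lowers the
64-bit / and % operators to calls to __aeabi_uldivmod and __aeabi_ldivmod; on
other hosts the same code simply exercises the native division.

```c
#include <stdint.h>

/* Hypothetical check for unsigned 64-bit division: returns 1 when the
   quotient and remainder satisfy n == q * d + r with r < d.  */
int
check_udivmod (uint64_t n, uint64_t d)
{
  if (d == 0)
    return 1;                   /* Undefined in C, so nothing to check.  */

  uint64_t q = n / d;
  uint64_t r = n % d;

  return r < d && q * d + r == n;
}

/* Hypothetical check for signed 64-bit division.  C99 truncates toward
   zero, so a nonzero remainder takes the sign of the numerator.  */
int
check_sdivmod (int64_t n, int64_t d)
{
  if (d == 0 || (n == INT64_MIN && d == -1))
    return 1;                   /* Undefined in C, so nothing to check.  */

  int64_t q = n / d;
  int64_t r = n % d;

  /* Magnitudes as unsigned values, so that INT64_MIN is handled.  */
  uint64_t abs_r = r < 0 ? 0 - (uint64_t) r : (uint64_t) r;
  uint64_t abs_d = d < 0 ? 0 - (uint64_t) d : (uint64_t) d;

  if (abs_r >= abs_d)
    return 0;
  if (r != 0 && (r < 0) != (n < 0))
    return 0;

  /* |q * d| <= |n|, so this cannot overflow.  */
  return q * d + r == n;
}
```

Feeding both helpers boundary values (0, 1, all-ones, INT64_MIN, and operands
that differ widely in magnitude) would cover the early-exit and full-length
paths of a shift-and-subtract divider.
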
> On Cortex M0, the library has (approximately) the following properties:
>
> Function(s)                     Size (bytes)        Cycles              Stack   Accuracy
> __clzsi2                        50                  20                  0       exact
> __clzsi2 (OPTIMIZE_SIZE)        22                  51                  0       exact
> __clzdi2                        8+__clzsi2          4+__clzsi2          0       exact
>
> __clrsbsi2                      8+__clzsi2          6+__clzsi2          0       exact
> __clrsbdi2                      18+__clzsi2         (8..10)+__clzsi2    0       exact
>
> __ctzsi2                        52                  21                  0       exact
> __ctzsi2 (OPTIMIZE_SIZE)        24                  52                  0       exact
> __ctzdi2                        8+__ctzsi2          5+__ctzsi2          0       exact
>
> __ffssi2                        8                   6..(5+__ctzsi2)     0       exact
> __ffsdi2                        14+__ctzsi2         9..(8+__ctzsi2)     0       exact
>
> __popcountsi2                   52                  25                  0       exact
> __popcountsi2 (OPTIMIZE_SIZE)   14                  9..201              0       exact
> __popcountdi2                   34+__popcountsi2    46                  0       exact
> __popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2    17..401             0       exact
>
> __paritysi2                     24                  14                  0       exact
> __paritysi2 (OPTIMIZE_SIZE)     16                  38                  0       exact
> __paritydi2                     2+__paritysi2       1+__paritysi2       0       exact
>
> __umulsidi3                     44                  24                  0       exact
> __mulsidi3                      30+__umulsidi3      24+__umulsidi3      8       exact
> __muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3       0       exact
> __ashldi3 (__aeabi_llsl)        22                  13                  0       exact
> __lshrdi3 (__aeabi_llsr)        22                  13                  0       exact
> __ashrdi3 (__aeabi_lasr)        22                  13                  0       exact
>
> __aeabi_lcmp                    20                  13                  0       exact
> __aeabi_ulcmp                   16                  10                  0       exact
>
> __udivsi3 (__aeabi_uidiv)       56                  72..385             0       < 1 lsb
> __divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3        8       < 1 lsb
> __udivdi3 (__aeabi_uldiv)       164                 103..1394           16      < 1 lsb
> __udivdi3 (OPTIMIZE_SIZE)       142                 120..1392           16      < 1 lsb
> __divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3        32      < 1 lsb
>
> __shared_float                  178
> __shared_float (OPTIMIZE_SIZE)  154
>
> __addsf3 (__aeabi_fadd)         116+__shared_float  31..76              8       <= 0.5 ulp
> __addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74                  8       <= 0.5 ulp
> __subsf3 (__aeabi_fsub)         6+__addsf3          3+__addsf3          8       <= 0.5 ulp
> __aeabi_frsub                   8+__addsf3          6+__addsf3          8       <= 0.5 ulp
> __mulsf3 (__aeabi_fmul)         112+__shared_float  73..97              8       <= 0.5 ulp
> __mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93                  8       <= 0.5 ulp
> __divsf3 (__aeabi_fdiv)         132+__shared_float  83..361             8       <= 0.5 ulp
> __divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263..359            8       <= 0.5 ulp
>
> __cmpsf2/__lesf2/__ltsf2        72                  33                  0       exact
> __eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2          0       exact
> __gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2          0       exact
> __unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2          0       exact
> __aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2          0       exact
> __aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2          0       exact
> __aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2          0       exact
> __aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2          0       exact
> __aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2          0       exact
>
> __floatundisf (__aeabi_ul2f)    14+__shared_float   40..81              8       <= 0.5 ulp
> __floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40..237             8       <= 0.5 ulp
> __floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf     8       <= 0.5 ulp
> __floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf     8       <= 0.5 ulp
> __floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf       8       <= 0.5 ulp
>
> __fixsfdi (__aeabi_f2lz)        74                  27..33              0       exact
> __fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi         0       exact
> __fixsfsi (__aeabi_f2iz)        52                  19                  0       exact
> __fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi         0       exact
> __fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi         0       exact
>
> __extendsfdf2 (__aeabi_f2d)     42+__shared_float   38                  8       exact
> __truncsfdf2 (__aeabi_f2d)      88                  34                  8       exact
> __aeabi_d2f                     56+__shared_float   54..58              8       <= 0.5 ulp
> __aeabi_h2f                     34+__shared_float   34                  8       exact
> __aeabi_f2h                     84                  23..34              0       <= 0.5 ulp
>
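
For the bit-counting rows in the table, a portable C reference model is one way
to cross-check the results of an assembly routine.  The sketch below is not the
CM0 implementation, and the ref_* names are hypothetical.  It follows the
semantics documented for the corresponding libgcc routines.  libgcc leaves the
clz and ctz results for a zero argument undefined; returning 32 in that case is
an arbitrary choice made by this sketch.

```c
#include <stdint.h>

/* Number of leading zero bits (model of __clzsi2).  */
int
ref_clz32 (uint32_t x)
{
  int n = 0;
  if (x == 0)
    return 32;                  /* Sketch-only choice, see above.  */
  while (!(x & 0x80000000u))
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Number of trailing zero bits (model of __ctzsi2).  */
int
ref_ctz32 (uint32_t x)
{
  int n = 0;
  if (x == 0)
    return 32;                  /* Sketch-only choice, see above.  */
  while (!(x & 1u))
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* One-based index of the least significant set bit, or 0 if none
   (model of __ffssi2).  */
int
ref_ffs32 (uint32_t x)
{
  return x ? ref_ctz32 (x) + 1 : 0;
}

/* Number of set bits (model of __popcountsi2).  */
int
ref_popcount32 (uint32_t x)
{
  int n = 0;
  while (x)
    {
      x &= x - 1;               /* Clear the lowest set bit.  */
      n++;
    }
  return n;
}

/* Number of set bits modulo 2 (model of __paritysi2).  */
int
ref_parity32 (uint32_t x)
{
  return ref_popcount32 (x) & 1;
}

/* Number of redundant sign bits after the sign bit itself
   (model of __clrsbsi2).  */
int
ref_clrsb32 (int32_t x)
{
  uint32_t u = (uint32_t) x;
  if (x < 0)
    u = ~u;
  return ref_clz32 (u) - 1;
}
```

A harness on the target could walk single-bit and all-ones patterns and compare
each library result against the matching ref_* value.
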
> Copyright assignment is on file with the FSF.
>
> Thanks,
> Daniel Engel
>
>
> [1] // Test program for size comparison
>
>     extern int main (void)
>     {
>         volatile int x = 1;
>         volatile unsigned long long int y = 10;
>         volatile long long int z = x / y; // 64-bit division
>
>         volatile float a = x; // 32-bit casting
>         volatile float b = y; // 64-bit casting
>         volatile float c = z / b; // float division
>         volatile float d = a + c; // float addition
>         volatile float e = c * b; // float multiplication
>         volatile float f = d - e - c; // float subtraction
>
>         if (f != c) // float comparison
>             y -= (long long int)d; // float casting
>     }
>
> [2] http://danielengel.com/cm0_test_vectors.tgz
> [3] http://www.netlib.org/fp/ucbtest.tgz
> [4] http://www.jhauser.us/arithmetic/TestFloat.html
> [5] http://win-www.uia.ac.be/u/cant/ieeecc754.html
>
> ---
>
> Daniel Engel (34):
>   Add and restructure function declaration macros
>   Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
>   Fix syntax warnings on conditional instructions
>   Reorganize LIB1ASMFUNCS object wrapper macros
>   Add the __HAVE_FEATURE_IT and IT() macros
>   Refactor 'clz' functions into a new file
>   Refactor 'ctz' functions into a new file
>   Refactor 64-bit shift functions into a new file
>   Import 'clz' functions from the CM0 library
>   Import 'ctz' functions from the CM0 library
>   Import 64-bit shift functions from the CM0 library
>   Import 'clrsb' functions from the CM0 library
>   Import 'ffs' functions from the CM0 library
>   Import 'parity' functions from the CM0 library
>   Import 'popcnt' functions from the CM0 library
>   Refactor Thumb-1 64-bit comparison into a new file
>   Import 64-bit comparison from CM0 library
>   Merge Thumb-2 optimizations for 64-bit comparison
>   Import 32-bit division from the CM0 library
>   Refactor Thumb-1 64-bit division into a new file
>   Import 64-bit division from the CM0 library
>   Import integer multiplication from the CM0 library
>   Refactor Thumb-1 float comparison into a new file
>   Import float comparison from the CM0 library
>   Refactor Thumb-1 float subtraction into a new file
>   Import float addition and subtraction from the CM0 library
>   Import float multiplication from the CM0 library
>   Import float division from the CM0 library
>   Import integer-to-float conversion from the CM0 library
>   Import float-to-integer conversion from the CM0 library
>   Import float<->double conversion from the CM0 library
>   Import float<->__fp16 conversion from the CM0 library
>   Drop single-precision Thumb-1 soft-float functions
>   Add -mpure-code support to the CM0 functions.
>
>  libgcc/Makefile.in              |   5 +-
>  libgcc/config/arm/bpabi-lib.h   |  12 -
>  libgcc/config/arm/bpabi-v6m.S   | 206 -----------
>  libgcc/config/arm/bpabi.S       |  42 ---
>  libgcc/config/arm/bpabi.c       |  42 ---
>  libgcc/config/arm/clz2.S        | 371 ++++++++++++++++++++
>  libgcc/config/arm/ctz2.S        | 349 ++++++++++++++++++
>  libgcc/config/arm/eabi/fadd.S   | 324 +++++++++++++++++
>  libgcc/config/arm/eabi/fcast.S  | 533 ++++++++++++++++++++++++++++
>  libgcc/config/arm/eabi/fcmp.S   | 604 ++++++++++++++++++++++++++++++++
>  libgcc/config/arm/eabi/fdiv.S   | 261 ++++++++++++++
>  libgcc/config/arm/eabi/ffixed.S | 414 ++++++++++++++++++++++
>  libgcc/config/arm/eabi/ffloat.S | 247 +++++++++++++
>  libgcc/config/arm/eabi/fmul.S   | 215 ++++++++++++
>  libgcc/config/arm/eabi/fneg.S   |  76 ++++
>  libgcc/config/arm/eabi/fplib.h  |  80 +++++
>  libgcc/config/arm/eabi/futil.S  | 418 ++++++++++++++++++++++
>  libgcc/config/arm/eabi/idiv.S   | 299 ++++++++++++++++
>  libgcc/config/arm/eabi/lcmp.S   | 187 ++++++++++
>  libgcc/config/arm/eabi/ldiv.S   | 493 ++++++++++++++++++++++++++
>  libgcc/config/arm/eabi/lmul.S   | 218 ++++++++++++
>  libgcc/config/arm/eabi/lshift.S | 241 +++++++++++++
>  libgcc/config/arm/fp16.c        |   4 +
>  libgcc/config/arm/lib1funcs.S   | 549 ++++++++++-------------------
>  libgcc/config/arm/parity.S      | 120 +++++++
>  libgcc/config/arm/popcnt.S      | 212 +++++++++++
>  libgcc/config/arm/t-bpabi       |  10 +-
>  libgcc/config/arm/t-elf         | 138 +++++++-
>  libgcc/config/arm/t-softfp      |   2 +
>  29 files changed, 5997 insertions(+), 675 deletions(-)
>  delete mode 100644 libgcc/config/arm/bpabi.c
>  create mode 100644 libgcc/config/arm/clz2.S
>  create mode 100644 libgcc/config/arm/ctz2.S
>  create mode 100644 libgcc/config/arm/eabi/fadd.S
>  create mode 100644 libgcc/config/arm/eabi/fcast.S
>  create mode 100644 libgcc/config/arm/eabi/fcmp.S
>  create mode 100644 libgcc/config/arm/eabi/fdiv.S
>  create mode 100644 libgcc/config/arm/eabi/ffixed.S
>  create mode 100644 libgcc/config/arm/eabi/ffloat.S
>  create mode 100644 libgcc/config/arm/eabi/fmul.S
>  create mode 100644 libgcc/config/arm/eabi/fneg.S
>  create mode 100644 libgcc/config/arm/eabi/fplib.h
>  create mode 100644 libgcc/config/arm/eabi/futil.S
>  create mode 100644 libgcc/config/arm/eabi/idiv.S
>  create mode 100644 libgcc/config/arm/eabi/lcmp.S
>  create mode 100644 libgcc/config/arm/eabi/ldiv.S
>  create mode 100644 libgcc/config/arm/eabi/lmul.S
>  create mode 100644 libgcc/config/arm/eabi/lshift.S
>  create mode 100644 libgcc/config/arm/parity.S
>  create mode 100644 libgcc/config/arm/popcnt.S
>
> -- 
> 2.34.1