From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 24 Jun 2020 08:22:22 +0200
Message-Id: <mw8sgddlup.fsf@tomate.loria.fr>
From: Paul Zimmermann <Paul.Zimmermann@inria.fr>
To: Paul E Murphy <murphyp@linux.ibm.com>
CC: libc-alpha@sourceware.org
In-reply-to: <a5cf1511-d1d1-39b2-ea28-fd84eb5c5361@linux.ibm.com> (message
 from Paul E Murphy on Mon, 22 Jun 2020 08:59:08 -0500)
Subject: Re: faster expf128
References: <mwmu4v4b42.fsf@tomate.loria.fr>
 <a5cf1511-d1d1-39b2-ea28-fd84eb5c5361@linux.ibm.com>

       Dear Paul,

thank you for your feedback.

> From: Paul E Murphy <murphyp@linux.ibm.com>
> Date: Mon, 22 Jun 2020 08:59:08 -0500
> 
> On 6/22/20 6:02 AM, Paul Zimmermann wrote:
> > I have written some expf128 for x86_64 that is more than 10 times faster than
> > the current glibc/libquadmath code [1] (see slide 21 of [2]).
> 
> I would highly recommend running the benchmarks against ppc64le or s390x 
> before replacing the existing implementation.  I think it would improve 
> the code to have more explicit separation between implementations 
> optimized for soft and hardfp if performance cannot be rectified.  I 
> think much of the float128 support assumes the underlying machine does 
> not natively support binary128.

I forgot to say that my code is intended mainly for machines that do not
provide hardware float128 support. However, I did compare it with the glibc
expf128 on gcc135.fsffrance.org (ppc64le GNU/Linux), and the results are
below. You can reproduce them with the code from [1]. My implementation
takes about 27% less time (0.143s instead of 0.195s), but is slightly less
accurate (999585 instead of 999999 correct roundings out of 1000000). One
caveat though: I did not find a way to set the inexact flag efficiently,
so my code does not set it.

glibc function (with hardware float128):

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DUSE_GLIBC -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ ./a.out 
GNU libc version: 2.28
GNU libc release: stable
correct roundings: 999999/1000000 max err=1 ulp(s)
maximal error for
x=-4.2166924211009987727735597908208042e+00
y=1.47473419221889191873789731438093288e-02
z=1.47473419221889191873789731438093303e-02

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DTIMINGS -DUSE_GLIBC -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ time ./a.out 
GNU libc version: 2.28
GNU libc release: stable
s=1.09651217175878924483994909720534935e+09

real	0m0.195s
user	0m0.194s
sys	0m0.000s

my implementation:

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ ./a.out 
correct roundings: 999585/1000000 max err=1 ulp(s)
maximal error for
x=-9.88703896394271837099996910948152675e+00
y=5.08292305698879224291515174794000669e-05
z=5.08292305698879224291515174794000728e-05

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DTIMINGS -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ time ./a.out 
s=1.09651217175878924483994909720534935e+09

real	0m0.143s
user	0m0.142s
sys	0m0.000s
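
A minimal sketch of how a "correct roundings: N/M max err=K ulp(s)" tally
like the ones above can be computed. It is not the code from [1]. It
assumes a correctly rounded reference value is available for each input,
uses double instead of binary128 to stay portable, and all names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Distance in ulps between two finite doubles of the same sign,
   obtained by subtracting their bit patterns. */
uint64_t ulp_distance(double a, double b)
{
    uint64_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    assert((ia >> 63) == (ib >> 63));   /* same sign required */
    return ia > ib ? ia - ib : ib - ia;
}

/* Running totals: exact matches, number of tests, largest error seen. */
struct tally {
    unsigned long correct;
    unsigned long total;
    uint64_t max_err;
};

/* Record one comparison of a computed value against its correctly
   rounded reference. */
void tally_add(struct tally *t, double computed, double reference)
{
    uint64_t err = ulp_distance(computed, reference);
    t->total++;
    if (err == 0)
        t->correct++;
    if (err > t->max_err)
        t->max_err = err;
}
```

A result counts as correctly rounded only when the distance is 0, so a
maximal error of 1 ulp means every miss was off by exactly one unit in
the last place.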

> > Before making a proper patch for glibc, I'd like to make sure it fits the
> > glibc requirements. In particular, the table size is 16kb. Is that ok?
> > If too large, what table size would be ok?
> 
> I think that is acceptable.  The current tables for expf128 probably 
> aren't much smaller, if I recall correctly.

OK, then I will prepare a patch once glibc 2.32 is out.

Best regards,
Paul

[1] https://homepages.loria.fr/PZimmermann/glibc-contrib/