faster expf128

* faster expf128
@ 2020-06-22 11:02 Paul Zimmermann
  2020-06-22 13:59 ` Paul E Murphy
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Zimmermann @ 2020-06-22 11:02 UTC
  To: libc-alpha

       Hi,

I have written some expf128 for x86_64 that is more than 10 times faster than
the current glibc/libquadmath code [1] (see slide 21 of [2]).

Before making a proper patch for glibc, I'd like to make sure it fits the
glibc requirements. In particular, the table size is 16kb. Is that ok?
If too large, what table size would be ok?

Best regards,
Paul

[1] https://homepages.loria.fr/PZimmermann/glibc-contrib/
[2] https://members.loria.fr/PZimmermann/talks/quad.pdf

* Re: faster expf128
  2020-06-22 11:02 faster expf128 Paul Zimmermann
@ 2020-06-22 13:59 ` Paul E Murphy
  2020-06-22 21:18   ` Joseph Myers
  2020-06-24  6:22   ` Paul Zimmermann
  0 siblings, 2 replies; 7+ messages in thread
From: Paul E Murphy @ 2020-06-22 13:59 UTC
  To: Paul Zimmermann, libc-alpha

On 6/22/20 6:02 AM, Paul Zimmermann wrote:
> I have written some expf128 for x86_64 that is more than 10 times faster than
> the current glibc/libquadmath code [1] (see slide 21 of [2]).

I would highly recommend running the benchmarks against ppc64le or s390x 
before replacing the existing implementation.  I think it would improve 
the code to have more explicit separation between implementations 
optimized for soft and hardfp if performance cannot be rectified.  I 
think much of the float128 support assumes the underlying machine does 
not natively support binary128.

> 
> Before making a proper patch for glibc, I'd like to make sure it fits the
> glibc requirements. In particular, the table size is 16kb. Is that ok?
> If too large, what table size would be ok?

I think that is acceptable.  The current tables for expf128 probably 
aren't much smaller, if I recall correctly.

* Re: faster expf128
  2020-06-22 13:59 ` Paul E Murphy
@ 2020-06-22 21:18   ` Joseph Myers
  2020-06-24  6:22   ` Paul Zimmermann
  1 sibling, 0 replies; 7+ messages in thread
From: Joseph Myers @ 2020-06-22 21:18 UTC
  To: Paul E Murphy; +Cc: Paul Zimmermann, libc-alpha

On Mon, 22 Jun 2020, Paul E Murphy via Libc-alpha wrote:

> On 6/22/20 6:02 AM, Paul Zimmermann wrote:
> > I have written some expf128 for x86_64 that is more than 10 times faster
> > than
> > the current glibc/libquadmath code [1] (see slide 21 of [2]).
> 
> I would highly recommend running the benchmarks against ppc64le or s390x
> before replacing the existing implementation.  I think it would improve the

Specifically, ppc64le *on POWER9*, as POWER9 and s390x are the two 
supported configurations with hardware support for binary128, and where 
it's thus plausible that an implementation based on floating-point 
operations is faster than one using only integer operations.

As documented in the manual, glibc supports _Float128 for powerpc64le, 
x86_64, x86, ia64, aarch64, alpha, mips64, riscv, s390 and sparc (most of 
those are architectures where long double has binary128 format; for 
powerpc64le the long double format depends on compiler options and 
configuration, and for x86_64, x86 and ia64 _Float128 is always a 
different format from long double).

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: faster expf128
  2020-06-22 13:59 ` Paul E Murphy
  2020-06-22 21:18   ` Joseph Myers
@ 2020-06-24  6:22   ` Paul Zimmermann
  2020-06-24 17:39     ` Joseph Myers
  1 sibling, 1 reply; 7+ messages in thread
From: Paul Zimmermann @ 2020-06-24  6:22 UTC
  To: Paul E Murphy; +Cc: libc-alpha

       Dear Paul,

thank you for your feedback.

> From: Paul E Murphy <murphyp@linux.ibm.com>
> Date: Mon, 22 Jun 2020 08:59:08 -0500
> 
> On 6/22/20 6:02 AM, Paul Zimmermann wrote:
> > I have written some expf128 for x86_64 that is more than 10 times faster than
> > the current glibc/libquadmath code [1] (see slide 21 of [2]).
> 
> I would highly recommend running the benchmarks against ppc64le or s390x 
> before replacing the existing implementation.  I think it would improve 
> the code to have more explicit separation between implementations 
> optimized for soft and hardfp if performance cannot be rectified.  I 
> think much of the float128 support assumes the underlying machine does 
> not natively support binary128.

I forgot to say my code is intended mainly for machines that do not provide
hardware float128 support. However I did compare with the glibc
expf128 on gcc135.fsffrance.org (ppc64le GNU/Linux) and below are the
results. You can reproduce them with the code from [1]. We see that
my implementation is about 27% faster, but slightly less accurate
(999585 instead of 999999 correct rounding over 1000000). One caveat
though: I did not find how to efficiently set the inexact flag, thus
it is not set in my code.

glibc function (with hardware float128):

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DUSE_GLIBC -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ ./a.out 
GNU libc version: 2.28
GNU libc release: stable
correct roundings: 999999/1000000 max err=1 ulp(s)
maximal error for
x=-4.2166924211009987727735597908208042e+00
y=1.47473419221889191873789731438093288e-02
z=1.47473419221889191873789731438093303e-02

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DTIMINGS -DUSE_GLIBC -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ time ./a.out 
GNU libc version: 2.28
GNU libc release: stable
s=1.09651217175878924483994909720534935e+09

real	0m0.195s
user	0m0.194s
sys	0m0.000s

my implementation:

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ ./a.out 
correct roundings: 999585/1000000 max err=1 ulp(s)
maximal error for
x=-9.88703896394271837099996910948152675e+00
y=5.08292305698879224291515174794000669e-05
z=5.08292305698879224291515174794000728e-05

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DTIMINGS -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ time ./a.out 
s=1.09651217175878924483994909720534935e+09

real	0m0.143s
user	0m0.142s
sys	0m0.000s
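
The "about 27% faster" figure can be checked against the two "real" timings
above. As a worked calculation, assuming the figure refers to the relative
reduction in wall-clock time (the symbols t_glibc and t_new are introduced
here only for illustration):

```latex
\[
\frac{t_{\text{glibc}} - t_{\text{new}}}{t_{\text{glibc}}}
  = \frac{0.195 - 0.143}{0.195}
  = \frac{0.052}{0.195} \approx 0.267,
\qquad
\frac{t_{\text{glibc}}}{t_{\text{new}}}
  = \frac{0.195}{0.143} \approx 1.36 .
\]
```

Expressed as a throughput ratio, the same timings give roughly 1.36 times the
speed. The accuracy gap is 999999 - 999585 = 414 inputs out of 1000000, that
is, about 0.04% of the tested inputs.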

> > Before making a proper patch for glibc, I'd like to make sure it fits the
> > glibc requirements. In particular, the table size is 16kb. Is that ok?
> > If too large, what table size would be ok?
> 
> I think that is acceptable.  The current tables for expf128 probably 
> aren't much smaller, if I recall correctly.

ok, then I will prepare a patch, once glibc 2.32 is out.

Best regards,
Paul

[1] https://homepages.loria.fr/PZimmermann/glibc-contrib/


* Re: faster expf128
  2020-06-24  6:22   ` Paul Zimmermann
@ 2020-06-24 17:39     ` Joseph Myers
  2020-06-26 10:09       ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Myers @ 2020-06-24 17:39 UTC
  To: Paul Zimmermann; +Cc: Paul E Murphy, libc-alpha

On Wed, 24 Jun 2020, Paul Zimmermann wrote:

> I forgot to say my code is intended mainly for machines that do not provide
> hardware float128 support. However I did compare with the glibc
> expf128 on gcc135.fsffrance.org (ppc64le GNU/Linux) and below are the
> results. You can reproduce them with the code from [1]. We see that
> my implementation is about 27% faster, but slightly less accurate
> (999585 instead of 999999 correct rounding over 1000000). One caveat
> though: I did not find how to efficiently set the inexact flag, thus
> it is not set in my code.

There is no expectation that most libm functions (other than those such as 
fma and sqrt that are fully defined by the corresponding IEEE operations) 
set inexact correctly.  (It is necessary to set other exceptions 
correctly, and the glibc testsuite verifies that; for expf128, that's 
invalid (for signaling NaNs), overflow and underflow.)

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: faster expf128
  2020-06-24 17:39     ` Joseph Myers
@ 2020-06-26 10:09       ` Szabolcs Nagy
  2020-06-29 22:39         ` Joseph Myers
  0 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2020-06-26 10:09 UTC
  To: Joseph Myers; +Cc: Paul Zimmermann, Paul E Murphy, libc-alpha

The 06/24/2020 17:39, Joseph Myers wrote:
> On Wed, 24 Jun 2020, Paul Zimmermann wrote:
> 
> > I forgot to say my code is intended mainly for machines that do not provide
> > hardware float128 support. However I did compare with the glibc
> > expf128 on gcc135.fsffrance.org (ppc64le GNU/Linux) and below are the
> > results. You can reproduce them with the code from [1]. We see that
> > my implementation is about 27% faster, but slightly less accurate
> > (999585 instead of 999999 correct rounding over 1000000). One caveat
> > though: I did not find how to efficiently set the inexact flag, thus
> > it is not set in my code.
> 
> There is no expectation that most libm functions (other than those such as 
> fma and sqrt that are fully defined by the corresponding IEEE operations) 
> set inexact correctly.  (It is necessary to set other exceptions 
> correctly, and the glibc testsuite verifies that; for expf128, that's 
> invalid (for signaling NaNs), overflow and underflow.)

and presumably target specific exceptions are not considered.
(e.g. input denormal on x86)

* Re: faster expf128
  2020-06-26 10:09       ` Szabolcs Nagy
@ 2020-06-29 22:39         ` Joseph Myers
  0 siblings, 0 replies; 7+ messages in thread
From: Joseph Myers @ 2020-06-29 22:39 UTC
  To: Szabolcs Nagy; +Cc: Paul E Murphy, libc-alpha

On Fri, 26 Jun 2020, Szabolcs Nagy wrote:

> > There is no expectation that most libm functions (other than those such as 
> > fma and sqrt that are fully defined by the corresponding IEEE operations) 
> > set inexact correctly.  (It is necessary to set other exceptions 
> > correctly, and the glibc testsuite verifies that; for expf128, that's 
> > invalid (for signaling NaNs), overflow and underflow.)
> 
> and presumably target specific exceptions are not considered.
> (e.g. input denormal on x86)

Indeed, there are no expectations that libm functions do anything in 
particular with those.  (Those <fenv.h> functions that manipulate the 
whole environment, such as fegetenv and fesetenv, should still save and 
restore such architecture-specific state.)

-- 
Joseph S. Myers
joseph@codesourcery.com
