* [PATCH v2 8/9] i386: Move hypot implementation to C @ 2021-10-28 13:43 Wilco Dijkstra 2021-11-01 17:09 ` Adhemerval Zanella 0 siblings, 1 reply; 5+ messages in thread From: Wilco Dijkstra @ 2021-10-28 13:43 UTC (permalink / raw) To: Adhemerval Zanella; +Cc: 'GNU C Library' Hi Adhemerval, +double +__ieee754_hypot (double x, double y) +{ + if ((isinf (x) || isinf (y)) + && !issignaling (x) && !issignaling (y)) + return INFINITY; + if (isnan (x) || isnan (y)) + return x + y; + + long double lx = x; + long double ly = y; + double r = math_narrow_eval (sqrtl (lx * lx + ly * ly)); + math_check_force_underflow_nonneg (r); + return r; +} This looks very generic, so wouldn't it be possible to move to sysdeps/ieee754/ldbl-96? Wilco ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 8/9] i386: Move hypot implementation to C 2021-10-28 13:43 [PATCH v2 8/9] i386: Move hypot implementation to C Wilco Dijkstra @ 2021-11-01 17:09 ` Adhemerval Zanella 0 siblings, 0 replies; 5+ messages in thread From: Adhemerval Zanella @ 2021-11-01 17:09 UTC (permalink / raw) To: Wilco Dijkstra; +Cc: 'GNU C Library' On 28/10/2021 10:43, Wilco Dijkstra wrote: > Hi Adhemerval, > > +double > +__ieee754_hypot (double x, double y) > +{ > + if ((isinf (x) || isinf (y)) > + && !issignaling (x) && !issignaling (y)) > + return INFINITY; > + if (isnan (x) || isnan (y)) > + return x + y; > + > + long double lx = x; > + long double ly = y; > + double r = math_narrow_eval (sqrtl (lx * lx + ly * ly)); > + math_check_force_underflow_nonneg (r); > + return r; > +} > > This looks very generic, so wouldn't it be possible to move to > sysdeps/ieee754/ldbl-96? I am not sure this does make sense since ldbl-96 only contains symbols related to 'long double' and adding symbols for other floating point types is not the best code organization imho. Another reason is it really dependable of the underlying hardware, since 'long double' might be implemented with software emulation and using float/double operation might yield better performance. ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 0/9] Improve hypot() @ 2021-10-25 11:57 Adhemerval Zanella 2021-10-25 11:57 ` [PATCH v2 8/9] i386: Move hypot implementation to C Adhemerval Zanella 0 siblings, 1 reply; 5+ messages in thread From: Adhemerval Zanella @ 2021-10-25 11:57 UTC (permalink / raw) To: libc-alpha This patchset uses a different algorithm for hypot() based on the 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1]. This method is also used by Julia language. The motivation for this change are: 1. The new algorithm is more precise without minimum performance difference. 2. It allows to consolidate the implementations, since it favor floating-point operations over integer ones (as done for powerpc). This might be a boost for some architectures as well. The current hypot() implementation seems already to be bounded to a maximum of 1 ulp of error, however the new proposed algorithm shows an slight precision improvement by showing more correctly rounded results. With a random 1e9 inputs for different float format I see: - An improvement from 3427362 to 18457 results with 1 ulp of error for Binary64. - An improvement from 233442 to 1274 results with 1 ulp of error for Binary96 (x86_64). - An improvement from 453045 to 1294 results with 1 ulp of error for Binary96 (x86_64). Also for the maximal known error master shows (in ulps, with corresponding inputs), determined with [3]: binary32 0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77 binary64 0.987 -0x0.5a934b7eac967p-1022,-0x0.b5265a7e06b82p-1022 binary96 0.981 0x1.73f339f61eda21dp-16384l,0x2.e45f9f9500877e2p-16384l binary128 0.985 -0x2.d8311789103b76133ea1d5bc38c4p-16384,-0x1.6d85492006d7dcc6cc52938684p-16384 With the new implementation: binary32 0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77 [same] binary64 0.792 0x0.603e52daf0bfdp-1022,-0x0.a622d0a9a433bp-1022 binary96 0.584 -0x2.97b86706043d619p+7240l,0x1.8256bdd12d2e163ep+7240l binary128 0.749 0x2.2d5faf4036d6e68566f01054612p-8192,0x3.5738e8e2505f5d1fc2973716f05p-8192 Along with the newer implementation, the last patch also optimizes the call with similar work done for other math functions by removing the POSIX error handling (which is a big performance boost in some cases and architectures). I have adapted the dbl-64, ldbl-96, and ldbl-128, the flt-32 is not required since it calls the dbl-64 one. I have not adapated ldbl-128ibm since the format has a lot of caveats and IBM aims to move to ldbl-128. [1] https://arxiv.org/pdf/1904.09481.pdf [2] https://github.com/JuliaLang/julia/commit/4a046009a3362ab5e17d369641dbbc9657eb680c [3] https://gitlab.inria.fr/zimmerma/math_accuracy/-/blob/master/binary64/check_sample2.c --- Changes from v1: - Improved benchmarks to have the exponents of the two arguments close together. - Fixed typos. - Fixed i386 C implementation. --- Adhemerval Zanella (9): benchtests: Make hypot input random benchtests: Add hypotf math: Simplify hypotf implementation math: Use an improved algorithm for hypot (dbl-64) math: Use an improved algorithm for hypotl (ldbl-96) math: Use an improved algorithm for hypotl (ldbl-128) math: Remove powerpc e_hypot i386: Move hypot implementation to C math: Remove the error handling wrapper from hypot and hypotf benchtests/Makefile | 2 +- benchtests/hypot-inputs | 1015 ++++++++++++++++- benchtests/hypotf-inputs | 1007 ++++++++++++++++ math/Versions | 2 + math/w_hypot.c | 8 + math/w_hypot_compat.c | 13 +- math/w_hypotf.c | 8 + math/w_hypotf_compat.c | 6 +- sysdeps/i386/fpu/e_hypot.S | 75 -- sysdeps/i386/fpu/e_hypot.c | 56 + sysdeps/i386/fpu/e_hypotf.S | 64 -- sysdeps/ieee754/dbl-64/e_hypot.c | 247 ++-- sysdeps/ieee754/dbl-64/w_hypot.c | 1 + sysdeps/ieee754/flt-32/e_hypotf.c | 72 +- sysdeps/ieee754/flt-32/w_hypotf.c | 1 + sysdeps/ieee754/ldbl-128/e_hypotl.c | 222 ++-- sysdeps/ieee754/ldbl-96/e_hypotl.c | 227 ++-- sysdeps/mach/hurd/i386/libm.abilist | 2 + sysdeps/powerpc/fpu/e_hypot.c | 87 -- sysdeps/powerpc/fpu/e_hypotf.c | 78 -- .../powerpc32/power4/fpu/multiarch/Makefile | 5 +- .../power4/fpu/multiarch/e_hypot-power7.c | 23 - .../power4/fpu/multiarch/e_hypot-ppc32.c | 23 - .../powerpc32/power4/fpu/multiarch/e_hypot.c | 33 - .../power4/fpu/multiarch/e_hypotf-power7.c | 23 - .../power4/fpu/multiarch/e_hypotf-ppc32.c | 23 - .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 - sysdeps/unix/sysv/linux/aarch64/libm.abilist | 2 + sysdeps/unix/sysv/linux/alpha/libm.abilist | 2 + sysdeps/unix/sysv/linux/arm/be/libm.abilist | 2 + sysdeps/unix/sysv/linux/arm/le/libm.abilist | 2 + sysdeps/unix/sysv/linux/hppa/libm.abilist | 2 + sysdeps/unix/sysv/linux/i386/libm.abilist | 2 + .../sysv/linux/m68k/coldfire/libm.abilist | 2 + .../unix/sysv/linux/m68k/m680x0/libm.abilist | 2 + .../sysv/linux/microblaze/be/libm.abilist | 2 + .../sysv/linux/microblaze/le/libm.abilist | 2 + .../unix/sysv/linux/mips/mips32/libm.abilist | 2 + .../unix/sysv/linux/mips/mips64/libm.abilist | 2 + sysdeps/unix/sysv/linux/nios2/libm.abilist | 2 + .../linux/powerpc/powerpc32/fpu/libm.abilist | 2 + .../powerpc/powerpc32/nofpu/libm.abilist | 2 + .../linux/powerpc/powerpc64/be/libm.abilist | 2 + .../linux/powerpc/powerpc64/le/libm.abilist | 2 + .../unix/sysv/linux/s390/s390-32/libm.abilist | 2 + .../unix/sysv/linux/s390/s390-64/libm.abilist | 2 + sysdeps/unix/sysv/linux/sh/be/libm.abilist | 2 + sysdeps/unix/sysv/linux/sh/le/libm.abilist | 2 + .../sysv/linux/sparc/sparc32/libm.abilist | 2 + .../sysv/linux/sparc/sparc64/libm.abilist | 2 + .../unix/sysv/linux/x86_64/64/libm.abilist | 2 + .../unix/sysv/linux/x86_64/x32/libm.abilist | 2 + 52 files changed, 2487 insertions(+), 919 deletions(-) create mode 100644 benchtests/hypotf-inputs create mode 100644 math/w_hypot.c create mode 100644 math/w_hypotf.c delete mode 100644 sysdeps/i386/fpu/e_hypot.S create mode 100644 sysdeps/i386/fpu/e_hypot.c delete mode 100644 sysdeps/i386/fpu/e_hypotf.S create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c -- 2.32.0 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 8/9] i386: Move hypot implementation to C 2021-10-25 11:57 [PATCH v2 0/9] Improve hypot() Adhemerval Zanella @ 2021-10-25 11:57 ` Adhemerval Zanella 2021-10-25 17:10 ` Joseph Myers 0 siblings, 1 reply; 5+ messages in thread From: Adhemerval Zanella @ 2021-10-25 11:57 UTC (permalink / raw) To: libc-alpha The generic hypotf is slight slower, mostly due the tricks the assembly does to optimize the isinf/isnan/issignaling. The generic hypot is way slower, since the optimized implementation uses the i386 default excessive precision to issue the operation directly. A similar implementation is provided instead of using the generic implementation: Checked on i686-linux-gnu. --- sysdeps/i386/fpu/e_hypot.S | 75 ------------------------------------- sysdeps/i386/fpu/e_hypot.c | 44 ++++++++++++++++++++++ sysdeps/i386/fpu/e_hypotf.S | 64 ------------------------------- 3 files changed, 44 insertions(+), 139 deletions(-) delete mode 100644 sysdeps/i386/fpu/e_hypot.S create mode 100644 sysdeps/i386/fpu/e_hypot.c delete mode 100644 sysdeps/i386/fpu/e_hypotf.S diff --git a/sysdeps/i386/fpu/e_hypot.S b/sysdeps/i386/fpu/e_hypot.S deleted file mode 100644 index f2c956b77a..0000000000 --- a/sysdeps/i386/fpu/e_hypot.S +++ /dev/null @@ -1,75 +0,0 @@ -/* Compute the hypothenuse of X and Y. - Copyright (C) 1998-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <sysdep.h> -#include <i386-math-asm.h> -#include <libm-alias-finite.h> - -DEFINE_DBL_MIN - -#ifdef PIC -# define MO(op) op##@GOTOFF(%edx) -#else -# define MO(op) op -#endif - - .text -ENTRY(__ieee754_hypot) -#ifdef PIC - LOAD_PIC_REG (dx) -#endif - fldl 4(%esp) // x - fxam - fnstsw - fldl 12(%esp) // y : x - movb %ah, %ch - fxam - fnstsw - movb %ah, %al - orb %ch, %ah - sahf - jc 1f - fmul %st(0) // y * y : x - fxch // x : y * y - fmul %st(0) // x * x : y * y - faddp // x * x + y * y - fsqrt - DBL_NARROW_EVAL_UFLOW_NONNEG -2: ret - - // We have to test whether any of the parameters is Inf. - // In this case the result is infinity. -1: andb $0x45, %al - cmpb $5, %al - je 3f // jump if y is Inf - andb $0x45, %ch - cmpb $5, %ch - jne 4f // jump if x is not Inf - fxch -3: fstp %st(1) - fabs - jmp 2b - -4: testb $1, %al - jnz 5f // y is NaN - fxch -5: fstp %st(1) - jmp 2b - -END(__ieee754_hypot) -libm_alias_finite (__ieee754_hypot, __hypot) diff --git a/sysdeps/i386/fpu/e_hypot.c b/sysdeps/i386/fpu/e_hypot.c new file mode 100644 index 0000000000..d3473f2a6a --- /dev/null +++ b/sysdeps/i386/fpu/e_hypot.c @@ -0,0 +1,44 @@ +/* Euclidean distance function. Double/Binary64 i386 version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <math.h> +#include <math_private.h> +#include <math-underflow.h> +#include <math-narrow-eval.h> +#include <libm-alias-finite.h> +#include <math_config.h> + +/* The i386 allows to use the default excess of precision to optimize the + hypot implementation, since internal multiplication and sqrt is carried + with 80-bit FP type. */ +double +__ieee754_hypot (double x, double y) +{ + if ((isinf (x) || isinf (y)) + && !issignaling (x) && !issignaling (y)) + return INFINITY; + if (isnan (x) || isnan (y)) + return x + y; + + long double lx = x; + long double ly = y; + double r = math_narrow_eval (sqrtl (lx * lx + ly * ly)); + math_check_force_underflow_nonneg (r); + return r; +} +libm_alias_finite (__ieee754_hypot, __hypot) diff --git a/sysdeps/i386/fpu/e_hypotf.S b/sysdeps/i386/fpu/e_hypotf.S deleted file mode 100644 index cec5d15403..0000000000 --- a/sysdeps/i386/fpu/e_hypotf.S +++ /dev/null @@ -1,64 +0,0 @@ -/* Compute the hypothenuse of X and Y. - Copyright (C) 1998-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <sysdep.h> -#include <i386-math-asm.h> -#include <libm-alias-finite.h> - - .text -ENTRY(__ieee754_hypotf) - flds 4(%esp) // x - fxam - fnstsw - flds 8(%esp) // y : x - movb %ah, %ch - fxam - fnstsw - movb %ah, %al - orb %ch, %ah - sahf - jc 1f - fmul %st(0) // y * y : x - fxch // x : y * y - fmul %st(0) // x * x : y * y - faddp // x * x + y * y - fsqrt - FLT_NARROW_EVAL -2: ret - - // We have to test whether any of the parameters is Inf. - // In this case the result is infinity. -1: andb $0x45, %al - cmpb $5, %al - je 3f // jump if y is Inf - andb $0x45, %ch - cmpb $5, %ch - jne 4f // jump if x is not Inf - fxch -3: fstp %st(1) - fabs - jmp 2b - -4: testb $1, %al - jnz 5f // y is NaN - fxch -5: fstp %st(1) - jmp 2b - -END(__ieee754_hypotf) -libm_alias_finite (__ieee754_hypotf, __hypotf) -- 2.32.0 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 8/9] i386: Move hypot implementation to C 2021-10-25 11:57 ` [PATCH v2 8/9] i386: Move hypot implementation to C Adhemerval Zanella @ 2021-10-25 17:10 ` Joseph Myers 2021-11-01 17:23 ` Adhemerval Zanella 0 siblings, 1 reply; 5+ messages in thread From: Joseph Myers @ 2021-10-25 17:10 UTC (permalink / raw) To: Adhemerval Zanella; +Cc: libc-alpha On Mon, 25 Oct 2021, Adhemerval Zanella via Libc-alpha wrote: > + double r = math_narrow_eval (sqrtl (lx * lx + ly * ly)); I'd expect math_narrow_eval to be called on the result cast to double, not on a value of long double type. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 8/9] i386: Move hypot implementation to C 2021-10-25 17:10 ` Joseph Myers @ 2021-11-01 17:23 ` Adhemerval Zanella 0 siblings, 0 replies; 5+ messages in thread From: Adhemerval Zanella @ 2021-11-01 17:23 UTC (permalink / raw) To: Joseph Myers; +Cc: libc-alpha On 25/10/2021 14:10, Joseph Myers wrote: > On Mon, 25 Oct 2021, Adhemerval Zanella via Libc-alpha wrote: > >> + double r = math_narrow_eval (sqrtl (lx * lx + ly * ly)); > > I'd expect math_narrow_eval to be called on the result cast to double, not > on a value of long double type. > Ack. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-11-01 17:23 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-10-28 13:43 [PATCH v2 8/9] i386: Move hypot implementation to C Wilco Dijkstra 2021-11-01 17:09 ` Adhemerval Zanella -- strict thread matches above, loose matches on Subject: below -- 2021-10-25 11:57 [PATCH v2 0/9] Improve hypot() Adhemerval Zanella 2021-10-25 11:57 ` [PATCH v2 8/9] i386: Move hypot implementation to C Adhemerval Zanella 2021-10-25 17:10 ` Joseph Myers 2021-11-01 17:23 ` Adhemerval Zanella
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).