[PATCH v5 00/12] Improve hypot

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* [PATCH v5 00/12] Improve hypot
@ 2021-12-07 19:03 Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 01/12] math: Simplify hypotf implementation Adhemerval Zanella
                   ` (13 more replies)
  0 siblings, 14 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

This patchset add a different algorithm along with Wilco [1] performance
improvements.  The default implementation is based on the 'An Improved 
Algorithm for hypot(a,b)' by Carlos F. Borges [2] with some fixes and
improvements (although the dbl-64 one uses a slight different approach
when fast FMA is avaliable).  This method is also used by Julia language
runtime [2].

The motivation for this change are:

  1. Use a newer algotihm that favor FP operations over interger ones
     (which tends to show better performance specially on newer
     processor with multiple FP units).  It also allows consolidate
     the multiple implementation (for instance, the powerpc one).

  2. The new algorithm is more precise without minimum performance
     difference.

The current hypot() implementation seems already to be bounded to a
maximum of 1 ulp of error, however the new proposed algorithm shows an
slight precision improvement by showing more correctly rounded results.

With a random 1e9 inputs for different float format I see:

  - An improvement from 3427362 to 18457 results with 1 ulp of
    error for Binary64.
  - An improvement from 233442 to 1274 results with 1 ulp of
    error for Binary96 (x86_64).
  - An improvement from 453045 to 1294 results with 1 ulp of
    error for Binary96 (x86_64).

Also for the maximal known error master shows (in ulps, with
corresponding inputs), determined with [3]:

  binary32  0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77
  binary64  0.987 -0x0.5a934b7eac967p-1022,-0x0.b5265a7e06b82p-1022
  binary96  0.981
0x1.73f339f61eda21dp-16384l,0x2.e45f9f9500877e2p-16384l
  binary128 0.985
-0x2.d8311789103b76133ea1d5bc38c4p-16384,-0x1.6d85492006d7dcc6cc52938684p-16384

With the new implementation:

  binary32  0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77 [same]
  binary64  0.792 0x0.603e52daf0bfdp-1022,-0x0.a622d0a9a433bp-1022
  binary96  0.584
-0x2.97b86706043d619p+7240l,0x1.8256bdd12d2e163ep+7240l
  binary128 0.749
0x2.2d5faf4036d6e68566f01054612p-8192,0x3.5738e8e2505f5d1fc2973716f05p-8192

If FMA is uses the binary64 shows a slight worse precision:

I have adapted the dbl-64, ldbl-96, and ldbl-128, the flt-32 is not
required since it calls the dbl-64 one.  I have not adapated ldbl-128ibm
since the format has a lot of caveats and IBM aims to move to ldbl-128.

[1] https://sourceware.org/pipermail/libc-alpha/2021-November/133523.html
[1] https://arxiv.org/pdf/1904.09481.pdf
[2] https://github.com/JuliaLang/julia/commit/4a046009a3362ab5e17d369641dbbc9657eb680c
[3] https://gitlab.inria.fr/zimmerma/math_accuracy/-/blob/master/binary64/check_sample2.c

---
Changes from v5:
 * Add missing cast on generic hypotf.
 * Add missing math_narrow_eval on generic hypot.
 * Fixed return value on generic ldbl-96 hypotl.
 * Fixed return value on generic ldbl-128 hypotl.
 * Add POWER10 performance values.
 * Add missing cast on i686 hypotf.
 * Rewrite math-use-builtinds-fmax.h and math-use-builtinds-fmin.h
   based on Joseph's suggestions.
---

Adhemerval Zanella (9):
  math: Simplify hypotf implementation
  math: Use an improved algorithm for hypotl (ldbl-96)
  math: Use an improved algorithm for hypotl (ldbl-128)
  i386: Move hypot implementation to C
  math: Remove powerpc e_hypot
  math: Add math-use-builtinds-fmax.h
  math: Add math-use-builtinds-fmin.h
  aarch64: Add math-use-builtins-f{max,min}.h
  math: Remove the error handling wrapper from hypot and hypotf

Wilco Dijkstra (3):
  math: Use an improved algorithm for hypot (dbl-64)
  math: Improve hypot performance with FMA
  math: Use fmin/fmax on hypot

 math/Versions                                 |   2 +
 math/s_fmax_template.c                        |   5 +
 math/s_fmin_template.c                        |   6 +-
 math/w_hypot.c                                |   8 +
 math/w_hypot_compat.c                         |  13 +-
 math/w_hypotf.c                               |   8 +
 math/w_hypotf_compat.c                        |   6 +-
 sysdeps/aarch64/fpu/math-use-builtins-fmax.h  |   4 +
 sysdeps/aarch64/fpu/math-use-builtins-fmin.h  |   4 +
 sysdeps/aarch64/fpu/s_fmax.c                  |  28 --
 sysdeps/aarch64/fpu/s_fmaxf.c                 |  28 --
 sysdeps/aarch64/fpu/s_fmin.c                  |  28 --
 sysdeps/aarch64/fpu/s_fminf.c                 |  28 --
 sysdeps/generic/math-type-macros-double.h     |   1 +
 sysdeps/generic/math-type-macros-float.h      |   1 +
 sysdeps/generic/math-type-macros-float128.h   |   1 +
 sysdeps/generic/math-type-macros-ldouble.h    |   1 +
 sysdeps/generic/math-use-builtins-fmax.h      |   4 +
 sysdeps/generic/math-use-builtins-fmin.h      |   4 +
 sysdeps/generic/math-use-builtins.h           |   2 +
 sysdeps/i386/fpu/e_hypot.S                    |  75 -----
 sysdeps/i386/fpu/e_hypot.c                    |  58 ++++
 sysdeps/i386/fpu/e_hypotf.S                   |  64 -----
 sysdeps/ieee754/dbl-64/e_hypot.c              | 270 ++++++++----------
 sysdeps/ieee754/dbl-64/w_hypot.c              |   1 +
 sysdeps/ieee754/flt-32/e_hypotf.c             |  79 ++---
 sysdeps/ieee754/flt-32/math_config.h          |   9 +
 sysdeps/ieee754/flt-32/w_hypotf.c             |   1 +
 sysdeps/ieee754/ldbl-128/e_hypotl.c           | 226 +++++++--------
 sysdeps/ieee754/ldbl-96/e_hypotl.c            | 231 +++++++--------
 sysdeps/mach/hurd/i386/libm.abilist           |   2 +
 sysdeps/powerpc/fpu/e_hypot.c                 |  87 ------
 sysdeps/powerpc/fpu/e_hypotf.c                |  78 -----
 .../powerpc32/power4/fpu/multiarch/Makefile   |   5 +-
 .../power4/fpu/multiarch/e_hypot-power7.c     |  23 --
 .../power4/fpu/multiarch/e_hypot-ppc32.c      |  23 --
 .../powerpc32/power4/fpu/multiarch/e_hypot.c  |  33 ---
 .../power4/fpu/multiarch/e_hypotf-power7.c    |  23 --
 .../power4/fpu/multiarch/e_hypotf-ppc32.c     |  23 --
 .../powerpc32/power4/fpu/multiarch/e_hypotf.c |  33 ---
 sysdeps/unix/sysv/linux/aarch64/libm.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libm.abilist    |   2 +
 sysdeps/unix/sysv/linux/arm/be/libm.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libm.abilist   |   2 +
 sysdeps/unix/sysv/linux/hppa/libm.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libm.abilist     |   2 +
 .../sysv/linux/m68k/coldfire/libm.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libm.abilist  |   2 +
 .../sysv/linux/microblaze/be/libm.abilist     |   2 +
 .../sysv/linux/microblaze/le/libm.abilist     |   2 +
 .../unix/sysv/linux/mips/mips32/libm.abilist  |   2 +
 .../unix/sysv/linux/mips/mips64/libm.abilist  |   2 +
 sysdeps/unix/sysv/linux/nios2/libm.abilist    |   2 +
 .../linux/powerpc/powerpc32/fpu/libm.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libm.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libm.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libm.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libm.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libm.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libm.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libm.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libm.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libm.abilist     |   2 +
 .../unix/sysv/linux/x86_64/64/libm.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libm.abilist   |   2 +
 65 files changed, 547 insertions(+), 1029 deletions(-)
 create mode 100644 math/w_hypot.c
 create mode 100644 math/w_hypotf.c
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c
 create mode 100644 sysdeps/generic/math-use-builtins-fmax.h
 create mode 100644 sysdeps/generic/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/i386/fpu/e_hypot.S
 create mode 100644 sysdeps/i386/fpu/e_hypot.c
 delete mode 100644 sysdeps/i386/fpu/e_hypotf.S
 create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c
 create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c

-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 01/12] math: Simplify hypotf implementation
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

Use a more optimized comparison for check for NaN and infinite and
add an inlined issignaling implementation for float.  With gcc it
results in 2 FP comparisons.

The file Copyright is also changed to use  GPL, the implementation was
completely changed by 7c10fd3515f to use double precision instead of
scaling and this change removes all the GET_FLOAT_WORD usage.

Checked on x86_64-linux-gnu.
---
 sysdeps/ieee754/flt-32/e_hypotf.c    | 64 +++++++++++++---------------
 sysdeps/ieee754/flt-32/math_config.h |  9 ++++
 2 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/sysdeps/ieee754/flt-32/e_hypotf.c b/sysdeps/ieee754/flt-32/e_hypotf.c
index e770947dc1..436e3c2da8 100644
--- a/sysdeps/ieee754/flt-32/e_hypotf.c
+++ b/sysdeps/ieee754/flt-32/e_hypotf.c
@@ -1,46 +1,40 @@
-/* e_hypotf.c -- float version of e_hypot.c.
- */
+/* Euclidean distance function.  Float/Binary32 version.
+   Copyright (C) 2012-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
 
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
 
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <libm-alias-finite.h>
 #include <math.h>
+#include "math_config.h"
+#include <math-narrow-eval.h>
 #include <math_private.h>
-#include <libm-alias-finite.h>
 
 float
-__ieee754_hypotf(float x, float y)
+__ieee754_hypotf (float x, float y)
 {
-	double d_x, d_y;
-	int32_t ha, hb;
-
-	GET_FLOAT_WORD(ha,x);
-	ha &= 0x7fffffff;
-	GET_FLOAT_WORD(hb,y);
-	hb &= 0x7fffffff;
-	if (ha == 0x7f800000 && !issignaling (y))
-	  return fabsf(x);
-	else if (hb == 0x7f800000 && !issignaling (x))
-	  return fabsf(y);
-	else if (ha > 0x7f800000 || hb > 0x7f800000)
-	  return fabsf(x) * fabsf(y);
-	else if (ha == 0)
-	  return fabsf(y);
-	else if (hb == 0)
-	  return fabsf(x);
-
-	d_x = (double) x;
-	d_y = (double) y;
+   if (!isfinite(x) || !isfinite(y))
+     {
+       if ((isinf (x) || isinf (y))
+	   && !issignalingf_inline (x) && !issignalingf_inline (y))
+	 return INFINITY;
+       return x + y;
+     }
 
-	return (float) sqrt(d_x * d_x + d_y * d_y);
+  return math_narrow_eval ((float) sqrt ((double) x * (double) x
+					 + (double) y * (double) y));
 }
 #ifndef __ieee754_hypotf
 libm_alias_finite (__ieee754_hypotf, __hypotf)
diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index 513454a297..daa8a82f99 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -101,6 +101,15 @@ asdouble (uint64_t i)
   return u.f;
 }
 
+static inline int
+issignalingf_inline (float x)
+{
+  uint32_t ix = asuint (x);
+  if (HIGH_ORDER_BIT_IS_SET_FOR_SNAN)
+    return (ix & 0x7fc00000) == 0x7fc00000;
+  return 2 * (ix ^ 0x00400000) > 2 * 0x7fc00000UL;
+}
+
 #define NOINLINE __attribute__ ((noinline))
 
 attribute_hidden float __math_oflowf (uint32_t);
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 02/12] math: Use an improved algorithm for hypot (dbl-64)
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 01/12] math: Simplify hypotf implementation Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 03/12] math: Improve hypot performance with FMA Adhemerval Zanella
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

This implementation is based on the 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the
following changes:

 - Handle qNaN and sNaN.
 - Tune the 'widely varying operands' to avoid spurious underflow
   due the multiplication and fix the return value for upwards
   rounding mode.
 - Handle required underflow exception for denormal results.

The main advantage of the new algorithm is its precision: with a
random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc
current implementation shows around 0.34% results with an error of
1 ulp (3424869 results) while the new implementation only shows
0.002% of total (18851).

The performance result are also only slight worse than current
implementation.  On x86_64 (Ryzen 5900X) with gcc 12:

Before:

  "hypot": {
   "workload-random": {
    "duration": 3.73319e+09,
    "iterations": 1.12e+08,
    "reciprocal-throughput": 22.8737,
    "latency": 43.7904,
    "max-throughput": 4.37184e+07,
    "min-throughput": 2.28361e+07
   }
  }

After:

  "hypot": {
   "workload-random": {
    "duration": 3.7597e+09,
    "iterations": 9.8e+07,
    "reciprocal-throughput": 23.7547,
    "latency": 52.9739,
    "max-throughput": 4.2097e+07,
    "min-throughput": 1.88772e+07
   }
  }

Co-Authored-By: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

[1] https://arxiv.org/pdf/1904.09481.pdf
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 235 ++++++++++++-------------------
 1 file changed, 92 insertions(+), 143 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 9ec4c1ced0..75bce2df4e 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -1,164 +1,113 @@
-/* @(#)e_hypot.c 5.1 93/09/24 */
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypot(x,y)
- *
- * Method :
- *	If (assume round-to-nearest) z=x*x+y*y
- *	has error less than sqrt(2)/2 ulp, than
- *	sqrt(z) has error less than 1 ulp (exercise).
- *
- *	So, compute sqrt(x*x+y*y) with some care as
- *	follows to get the error below 1 ulp:
- *
- *	Assume x>y>0;
- *	(if possible, set rounding to round-to-nearest)
- *	1. if x > 2y  use
- *		x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- *	where x1 = x with lower 32 bits cleared, x2 = x-x1; else
- *	2. if x <= 2y use
- *		t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- *	where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
- *	y1= y with lower 32 bits chopped, y2 = y-y1.
- *
- *	NOTE: scaling may be necessary if some argument is too
- *	      large or too tiny
- *
- * Special cases:
- *	hypot(x,y) is INF if x or y is +INF or -INF; else
- *	hypot(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- *	hypot(x,y) returns sqrt(x^2+y^2) with error less
- *	than 1 ulps (units in the last place)
- */
+/* Euclidean distance function.  Double/Binary64 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* The implementation uses a correction based on 'An Improved Algorithm for
+   hypot(a,b)' by Carlos F. Borges [1] usingthe MyHypot3 with the following
+   changes:
+
+   - Handle qNaN and sNaN.
+   - Tune the 'widely varying operands' to avoid spurious underflow
+     due the multiplication and fix the return value for upwards
+     rounding mode.
+   - Handle required underflow exception for subnormal results.
+
+   The expected ULP is ~0.792.
+
+   [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
 #include <math.h>
 #include <math_private.h>
 #include <math-underflow.h>
+#include <math-narrow-eval.h>
 #include <libm-alias-finite.h>
+#include "math_config.h"
 
-double
-__ieee754_hypot (double x, double y)
+#define SCALE     0x1p-600
+#define LARGE_VAL 0x1p+511
+#define TINY_VAL  0x1p-459
+#define EPS       0x1p-54
+
+/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
+   and squaring ax, ay and (ax - ay) does not overflow or underflow.  */
+static inline double
+kernel (double ax, double ay)
 {
-  double a, b, t1, t2, y1, y2, w;
-  int32_t j, k, ha, hb;
-
-  GET_HIGH_WORD (ha, x);
-  ha &= 0x7fffffff;
-  GET_HIGH_WORD (hb, y);
-  hb &= 0x7fffffff;
-  if (hb > ha)
+  double t1, t2;
+  double h = sqrt (ax * ax + ay * ay);
+  if (h <= 2.0 * ay)
     {
-      a = y; b = x; j = ha; ha = hb; hb = j;
+      double delta = h - ay;
+      t1 = ax * (2.0 * delta - ax);
+      t2 = (delta - 2.0 * (ax - ay)) * delta;
     }
   else
     {
-      a = x; b = y;
-    }
-  SET_HIGH_WORD (a, ha);        /* a <- |a| */
-  SET_HIGH_WORD (b, hb);        /* b <- |b| */
-  if ((ha - hb) > 0x3c00000)
-    {
-      return a + b;
-    }                                       /* x/y > 2**60 */
-  k = 0;
-  if (__glibc_unlikely (ha > 0x5f300000))                  /* a>2**500 */
-    {
-      if (ha >= 0x7ff00000)             /* Inf or NaN */
-	{
-	  uint32_t low;
-	  w = a + b;                    /* for sNaN */
-	  if (issignaling (a) || issignaling (b))
-	    return w;
-	  GET_LOW_WORD (low, a);
-	  if (((ha & 0xfffff) | low) == 0)
-	    w = a;
-	  GET_LOW_WORD (low, b);
-	  if (((hb ^ 0x7ff00000) | low) == 0)
-	    w = b;
-	  return w;
-	}
-      /* scale a and b by 2**-600 */
-      ha -= 0x25800000; hb -= 0x25800000;  k += 600;
-      SET_HIGH_WORD (a, ha);
-      SET_HIGH_WORD (b, hb);
-    }
-  if (__builtin_expect (hb < 0x23d00000, 0))            /* b < 2**-450 */
-    {
-      if (hb <= 0x000fffff)             /* subnormal b or 0 */
-	{
-	  uint32_t low;
-	  GET_LOW_WORD (low, b);
-	  if ((hb | low) == 0)
-	    return a;
-	  t1 = 0;
-	  SET_HIGH_WORD (t1, 0x7fd00000);       /* t1=2^1022 */
-	  b *= t1;
-	  a *= t1;
-	  k -= 1022;
-	  GET_HIGH_WORD (ha, a);
-	  GET_HIGH_WORD (hb, b);
-	  if (hb > ha)
-	    {
-	      t1 = a;
-	      a = b;
-	      b = t1;
-	      j = ha;
-	      ha = hb;
-	      hb = j;
-	    }
-	}
-      else                      /* scale a and b by 2^600 */
-	{
-	  ha += 0x25800000;             /* a *= 2^600 */
-	  hb += 0x25800000;             /* b *= 2^600 */
-	  k -= 600;
-	  SET_HIGH_WORD (a, ha);
-	  SET_HIGH_WORD (b, hb);
-	}
+      double delta = h - ax;
+      t1 = 2.0 * delta * (ax - 2.0 * ay);
+      t2 = (4.0 * delta - ay) * ay + delta * delta;
     }
-  /* medium size a and b */
-  w = a - b;
-  if (w > b)
+
+  h -= (t1 + t2) / (2.0 * h);
+  return h;
+}
+
+double
+__ieee754_hypot (double x, double y)
+{
+  if (!isfinite(x) || !isfinite(y))
     {
-      t1 = 0;
-      SET_HIGH_WORD (t1, ha);
-      t2 = a - t1;
-      w = sqrt (t1 * t1 - (b * (-b) - t2 * (a + t1)));
+      if ((isinf (x) || isinf (y))
+	  && !issignaling_inline (x) && !issignaling_inline (y))
+	return INFINITY;
+      return x + y;
     }
-  else
+
+  x = fabs (x);
+  y = fabs (y);
+
+  double ax = x < y ? y : x;
+  double ay = x < y ? x : y;
+
+  /* If ax is huge, scale both inputs down.  */
+  if (__glibc_unlikely (ax > LARGE_VAL))
     {
-      a = a + a;
-      y1 = 0;
-      SET_HIGH_WORD (y1, hb);
-      y2 = b - y1;
-      t1 = 0;
-      SET_HIGH_WORD (t1, ha + 0x00100000);
-      t2 = a - t1;
-      w = sqrt (t1 * y1 - (w * (-w) - (t1 * y2 + t2 * b)));
+      if (__glibc_unlikely (ay <= ax * EPS))
+	return math_narrow_eval (ax + ay);
+
+      return math_narrow_eval (kernel (ax * SCALE, ay * SCALE) / SCALE);
     }
-  if (k != 0)
+
+  /* If ay is tiny, scale both inputs up.  */
+  if (__glibc_unlikely (ay < TINY_VAL))
     {
-      uint32_t high;
-      t1 = 1.0;
-      GET_HIGH_WORD (high, t1);
-      SET_HIGH_WORD (t1, high + (k << 20));
-      w *= t1;
-      math_check_force_underflow_nonneg (w);
-      return w;
+      if (__glibc_unlikely (ax >= ay / EPS))
+	return math_narrow_eval (ax + ay);
+
+      ax = math_narrow_eval (kernel (ax / SCALE, ay / SCALE) * SCALE);
+      math_check_force_underflow_nonneg (ax);
+      return ax;
     }
-  else
-    return w;
+
+  /* Common case: ax is not huge and ay is not tiny.  */
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return ax + ay;
+
+  return kernel (ax, ay);
 }
 #ifndef __ieee754_hypot
 libm_alias_finite (__ieee754_hypot, __hypot)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 03/12] math: Improve hypot performance with FMA
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 01/12] math: Simplify hypotf implementation Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency.  The non-fma version has 30% higher throughput and 10%
higher latency.

Max ULP error is 0.949 with fma and 0.792 without fma.

Passes GLIBC testsuite.
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 75bce2df4e..6fedf0d61f 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -26,7 +26,11 @@
      rounding mode.
    - Handle required underflow exception for subnormal results.
 
-   The expected ULP is ~0.792.
+   The expected ULP is ~0.792 or ~0.948 if FMA is used.  For FMA, the
+   correction is not used and the error of sqrt (x^2 + y^2) is below 1 ULP
+   if x^2 + y^2 is computed with less than 0.707 ULP error.  If |x| >= |2y|,
+   fma (x, x, y^2) has ~0.625 ULP.  If |x| < |2y|, fma (|2x|, |y|, (x - y)^2)
+   has ~0.625 ULP.
 
    [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
@@ -48,6 +52,16 @@ static inline double
 kernel (double ax, double ay)
 {
   double t1, t2;
+#ifdef __FP_FAST_FMA
+  t1 = ay + ay;
+  t2 = ax - ay;
+
+  if (t1 >= ax)
+    return sqrt (fma (t1, ax, t2 * t2));
+  else
+    return sqrt (fma (ax, ax, ay * ay));
+
+#else
   double h = sqrt (ax * ax + ay * ay);
   if (h <= 2.0 * ay)
     {
@@ -64,6 +78,7 @@ kernel (double ax, double ay)
 
   h -= (t1 + t2) / (2.0 * h);
   return h;
+#endif
 }
 
 double
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 04/12] math: Use an improved algorithm for hypotl (ldbl-96)
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (2 preceding siblings ...)
  2021-12-07 19:03 ` [PATCH v5 03/12] math: Improve hypot performance with FMA Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:

 - Handle qNaN and sNaN.
 - Tune the 'widely varying operands' to avoid spurious underflow
   due the multiplication and fix the return value for upwards
   rounding mode.
 - Handle required underflow exception for subnormal results.

The main advantage of the new algorithm is its precision.  With a
random 1e8 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.02% results with an error of
1 ulp (23158 results) while the new implementation only shows
0.0001% of total (111).

[1] https://arxiv.org/pdf/1904.09481.pdf
---
 sysdeps/ieee754/ldbl-96/e_hypotl.c | 231 ++++++++++++-----------------
 1 file changed, 98 insertions(+), 133 deletions(-)

diff --git a/sysdeps/ieee754/ldbl-96/e_hypotl.c b/sysdeps/ieee754/ldbl-96/e_hypotl.c
index 44e72353c0..b3799244c6 100644
--- a/sysdeps/ieee754/ldbl-96/e_hypotl.c
+++ b/sysdeps/ieee754/ldbl-96/e_hypotl.c
@@ -1,142 +1,107 @@
-/* e_hypotl.c -- long double version of e_hypot.c.
- */
-
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypotl(x,y)
- *
- * Method :
- *	If (assume round-to-nearest) z=x*x+y*y
- *	has error less than sqrt(2)/2 ulp, than
- *	sqrt(z) has error less than 1 ulp (exercise).
- *
- *	So, compute sqrt(x*x+y*y) with some care as
- *	follows to get the error below 1 ulp:
- *
- *	Assume x>y>0;
- *	(if possible, set rounding to round-to-nearest)
- *	1. if x > 2y  use
- *		x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- *	where x1 = x with lower 32 bits cleared, x2 = x-x1; else
- *	2. if x <= 2y use
- *		t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- *	where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
- *	y1= y with lower 32 bits chopped, y2 = y-y1.
- *
- *	NOTE: scaling may be necessary if some argument is too
- *	      large or too tiny
- *
- * Special cases:
- *	hypot(x,y) is INF if x or y is +INF or -INF; else
- *	hypot(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- *	hypot(x,y) returns sqrt(x^2+y^2) with error less
- *	than 1 ulps (units in the last place)
- */
+/* Euclidean distance function.  Long Double/Binary96 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This implementation is based on 'An Improved Algorithm for hypot(a,b)' by
+   Carlos F. Borges [1] using the MyHypot3 with the following changes:
+
+   - Handle qNaN and sNaN.
+   - Tune the 'widely varying operands' to avoid spurious underflow
+     due the multiplication and fix the return value for upwards
+     rounding mode.
+   - Handle required underflow exception for subnormal results.
+
+   [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
 #include <math.h>
 #include <math_private.h>
 #include <math-underflow.h>
 #include <libm-alias-finite.h>
 
-long double __ieee754_hypotl(long double x, long double y)
+#define SCALE      0x8p-8257L
+#define LARGE_VAL  0xb.504f333f9de6484p+8188L
+#define TINY_VAL   0x8p-8194L
+#define EPS        0x8p-68L
+
+/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
+   and squaring ax, ay and (ax - ay) does not overflow or underflow.  */
+static inline long double
+kernel (long double ax, long double ay)
 {
-	long double a,b,t1,t2,y1,y2,w;
-	uint32_t j,k,ea,eb;
-
-	GET_LDOUBLE_EXP(ea,x);
-	ea &= 0x7fff;
-	GET_LDOUBLE_EXP(eb,y);
-	eb &= 0x7fff;
-	if(eb > ea) {a=y;b=x;j=ea; ea=eb;eb=j;} else {a=x;b=y;}
-	SET_LDOUBLE_EXP(a,ea);	/* a <- |a| */
-	SET_LDOUBLE_EXP(b,eb);	/* b <- |b| */
-	if((ea-eb)>0x46) {return a+b;} /* x/y > 2**70 */
-	k=0;
-	if(__builtin_expect(ea > 0x5f3f,0)) {	/* a>2**8000 */
-	   if(ea == 0x7fff) {	/* Inf or NaN */
-	       uint32_t exp __attribute__ ((unused));
-	       uint32_t high,low;
-	       w = a+b;			/* for sNaN */
-	       if (issignaling (a) || issignaling (b))
-		 return w;
-	       GET_LDOUBLE_WORDS(exp,high,low,a);
-	       if(((high&0x7fffffff)|low)==0) w = a;
-	       GET_LDOUBLE_WORDS(exp,high,low,b);
-	       if(((eb^0x7fff)|(high&0x7fffffff)|low)==0) w = b;
-	       return w;
-	   }
-	   /* scale a and b by 2**-9600 */
-	   ea -= 0x2580; eb -= 0x2580;	k += 9600;
-	   SET_LDOUBLE_EXP(a,ea);
-	   SET_LDOUBLE_EXP(b,eb);
-	}
-	if(__builtin_expect(eb < 0x20bf, 0)) {	/* b < 2**-8000 */
-	    if(eb == 0) {	/* subnormal b or 0 */
-		uint32_t exp __attribute__ ((unused));
-		uint32_t high,low;
-		GET_LDOUBLE_WORDS(exp,high,low,b);
-		if((high|low)==0) return a;
-		SET_LDOUBLE_WORDS(t1, 0x7ffd, 0x80000000, 0); /* t1=2^16382 */
-		b *= t1;
-		a *= t1;
-		k -= 16382;
-		GET_LDOUBLE_EXP (ea, a);
-		GET_LDOUBLE_EXP (eb, b);
-		if (eb > ea)
-		  {
-		    t1 = a;
-		    a = b;
-		    b = t1;
-		    j = ea;
-		    ea = eb;
-		    eb = j;
-		  }
-	    } else {		/* scale a and b by 2^9600 */
-		ea += 0x2580;	/* a *= 2^9600 */
-		eb += 0x2580;	/* b *= 2^9600 */
-		k -= 9600;
-		SET_LDOUBLE_EXP(a,ea);
-		SET_LDOUBLE_EXP(b,eb);
-	    }
-	}
-    /* medium size a and b */
-	w = a-b;
-	if (w>b) {
-	    uint32_t high;
-	    GET_LDOUBLE_MSW(high,a);
-	    SET_LDOUBLE_WORDS(t1,ea,high,0);
-	    t2 = a-t1;
-	    w  = sqrtl(t1*t1-(b*(-b)-t2*(a+t1)));
-	} else {
-	    uint32_t high;
-	    GET_LDOUBLE_MSW(high,b);
-	    a  = a+a;
-	    SET_LDOUBLE_WORDS(y1,eb,high,0);
-	    y2 = b - y1;
-	    GET_LDOUBLE_MSW(high,a);
-	    SET_LDOUBLE_WORDS(t1,ea+1,high,0);
-	    t2 = a - t1;
-	    w  = sqrtl(t1*y1-(w*(-w)-(t1*y2+t2*b)));
-	}
-	if(k!=0) {
-	    uint32_t exp;
-	    t1 = 1.0;
-	    GET_LDOUBLE_EXP(exp,t1);
-	    SET_LDOUBLE_EXP(t1,exp+k);
-	    w *= t1;
-	    math_check_force_underflow_nonneg (w);
-	    return w;
-	} else return w;
+  long double t1, t2;
+  long double h = sqrtl (ax * ax + ay * ay);
+  if (h <= 2.0L * ay)
+    {
+      long double delta = h - ay;
+      t1 = ax * (2.0L * delta - ax);
+      t2 = (delta - 2.0L * (ax - ay)) * delta;
+    }
+  else
+    {
+      long double delta = h - ax;
+      t1 = 2.0L * delta * (ax - 2.0L * ay);
+      t2 = (4.0L * delta - ay) * ay + delta * delta;
+    }
+
+  h -= (t1 + t2) / (2.0L * h);
+  return h;
+}
+
+long double
+__ieee754_hypotl (long double x, long double y)
+{
+  if (!isfinite(x) || !isfinite(y))
+    {
+      if ((isinf (x) || isinf (y))
+	  && !issignaling (x) && !issignaling (y))
+	return INFINITY;
+      return x + y;
+    }
+
+  x = fabsl (x);
+  y = fabsl (y);
+
+  long double ax = x < y ? y : x;
+  long double ay = x < y ? x : y;
+
+  /* If ax is huge, scale both inputs down.  */
+  if (__glibc_unlikely (ax > LARGE_VAL))
+    {
+      if (__glibc_unlikely (ay <= ax * EPS))
+	return ax + ay;
+
+      return kernel (ax * SCALE, ay * SCALE) / SCALE;
+    }
+
+  /* If ay is tiny, scale both inputs up.  */
+  if (__glibc_unlikely (ay < TINY_VAL))
+    {
+      if (__glibc_unlikely (ax >= ay / EPS))
+	return ax + ay;;
+
+      ax = kernel (ax / SCALE, ay / SCALE) * SCALE;
+      math_check_force_underflow_nonneg (ax);
+      return ax;
+    }
+
+  /* Common case: ax is not huge and ay is not tiny.  */
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return ax + ay;
+
+  return kernel (ax, ay);
 }
 libm_alias_finite (__ieee754_hypotl, __hypotl)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 05/12] math: Use an improved algorithm for hypotl (ldbl-128)
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (3 preceding siblings ...)
  2021-12-07 19:03 ` [PATCH v5 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 06/12] i386: Move hypot implementation to C Adhemerval Zanella
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:

  - Handle qNaN and sNaN.
  - Tune the 'widely varying operands' to avoid spurious underflow
    due the multiplication and fix the return value for upwards
    rounding mode.
  - Handle required underflow exception for subnormal results.

The main advantage of the new algorithm is its precision.  With a
random 1e9 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.05% results with an error of
1 ulp (453266 results) while the new implementation only shows
0.0001% of total (1280).

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

[1] https://arxiv.org/pdf/1904.09481.pdf
---
 sysdeps/ieee754/ldbl-128/e_hypotl.c | 226 ++++++++++++----------------
 1 file changed, 96 insertions(+), 130 deletions(-)

diff --git a/sysdeps/ieee754/ldbl-128/e_hypotl.c b/sysdeps/ieee754/ldbl-128/e_hypotl.c
index cd4fdbc4a6..43affd9167 100644
--- a/sysdeps/ieee754/ldbl-128/e_hypotl.c
+++ b/sysdeps/ieee754/ldbl-128/e_hypotl.c
@@ -1,141 +1,107 @@
-/* e_hypotl.c -- long double version of e_hypot.c.
- */
-
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypotl(x,y)
- *
- * Method :
- *	If (assume round-to-nearest) z=x*x+y*y
- *	has error less than sqrtl(2)/2 ulp, than
- *	sqrtl(z) has error less than 1 ulp (exercise).
- *
- *	So, compute sqrtl(x*x+y*y) with some care as
- *	follows to get the error below 1 ulp:
- *
- *	Assume x>y>0;
- *	(if possible, set rounding to round-to-nearest)
- *	1. if x > 2y  use
- *		x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- *	where x1 = x with lower 64 bits cleared, x2 = x-x1; else
- *	2. if x <= 2y use
- *		t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- *	where t1 = 2x with lower 64 bits cleared, t2 = 2x-t1,
- *	y1= y with lower 64 bits chopped, y2 = y-y1.
- *
- *	NOTE: scaling may be necessary if some argument is too
- *	      large or too tiny
- *
- * Special cases:
- *	hypotl(x,y) is INF if x or y is +INF or -INF; else
- *	hypotl(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- *	hypotl(x,y) returns sqrtl(x^2+y^2) with error less
- *	than 1 ulps (units in the last place)
- */
+/* Euclidean distance function.  Long Double/Binary128 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This implementation is based on 'An Improved Algorithm for hypot(a,b)' by
+   Carlos F. Borges [1] using the MyHypot3 with the following changes:
+
+   - Handle qNaN and sNaN.
+   - Tune the 'widely varying operands' to avoid spurious underflow
+     due the multiplication and fix the return value for upwards
+     rounding mode.
+   - Handle required underflow exception for subnormal results.
+
+   [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
 #include <math.h>
 #include <math_private.h>
 #include <math-underflow.h>
 #include <libm-alias-finite.h>
 
+#define SCALE      L(0x1p-8303)
+#define LARGE_VAL  L(0x1.6a09e667f3bcc908b2fb1366ea95p+8191)
+#define TINY_VAL   L(0x1p-8191)
+#define EPS        L(0x1p-114)
+
+/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
+   and squaring ax, ay and (ax - ay) does not overflow or underflow.  */
+static inline _Float128
+kernel (_Float128 ax, _Float128 ay)
+{
+  _Float128 t1, t2;
+  _Float128 h = sqrtl (ax * ax + ay * ay);
+  if (h <= L(2.0) * ay)
+    {
+      _Float128 delta = h - ay;
+      t1 = ax * (L(2.0) * delta - ax);
+      t2 = (delta - L(2.0) * (ax - ay)) * delta;
+    }
+  else
+    {
+      _Float128 delta = h - ax;
+      t1 = L(2.0) * delta * (ax - L(2.0) * ay);
+      t2 = (L(4.0) * delta - ay) * ay + delta * delta;
+    }
+
+  h -= (t1 + t2) / (L(2.0) * h);
+  return h;
+}
+
 _Float128
 __ieee754_hypotl(_Float128 x, _Float128 y)
 {
-	_Float128 a,b,t1,t2,y1,y2,w;
-	int64_t j,k,ha,hb;
-
-	GET_LDOUBLE_MSW64(ha,x);
-	ha &= 0x7fffffffffffffffLL;
-	GET_LDOUBLE_MSW64(hb,y);
-	hb &= 0x7fffffffffffffffLL;
-	if(hb > ha) {a=y;b=x;j=ha; ha=hb;hb=j;} else {a=x;b=y;}
-	SET_LDOUBLE_MSW64(a,ha);	/* a <- |a| */
-	SET_LDOUBLE_MSW64(b,hb);	/* b <- |b| */
-	if((ha-hb)>0x78000000000000LL) {return a+b;} /* x/y > 2**120 */
-	k=0;
-	if(ha > 0x5f3f000000000000LL) {	/* a>2**8000 */
-	   if(ha >= 0x7fff000000000000LL) {	/* Inf or NaN */
-	       uint64_t low;
-	       w = a+b;			/* for sNaN */
-	       if (issignaling (a) || issignaling (b))
-		 return w;
-	       GET_LDOUBLE_LSW64(low,a);
-	       if(((ha&0xffffffffffffLL)|low)==0) w = a;
-	       GET_LDOUBLE_LSW64(low,b);
-	       if(((hb^0x7fff000000000000LL)|low)==0) w = b;
-	       return w;
-	   }
-	   /* scale a and b by 2**-9600 */
-	   ha -= 0x2580000000000000LL;
-	   hb -= 0x2580000000000000LL;	k += 9600;
-	   SET_LDOUBLE_MSW64(a,ha);
-	   SET_LDOUBLE_MSW64(b,hb);
-	}
-	if(hb < 0x20bf000000000000LL) {	/* b < 2**-8000 */
-	    if(hb <= 0x0000ffffffffffffLL) {	/* subnormal b or 0 */
-		uint64_t low;
-		GET_LDOUBLE_LSW64(low,b);
-		if((hb|low)==0) return a;
-		t1=0;
-		SET_LDOUBLE_MSW64(t1,0x7ffd000000000000LL); /* t1=2^16382 */
-		b *= t1;
-		a *= t1;
-		k -= 16382;
-		GET_LDOUBLE_MSW64 (ha, a);
-		GET_LDOUBLE_MSW64 (hb, b);
-		if (hb > ha)
-		  {
-		    t1 = a;
-		    a = b;
-		    b = t1;
-		    j = ha;
-		    ha = hb;
-		    hb = j;
-		  }
-	    } else {		/* scale a and b by 2^9600 */
-		ha += 0x2580000000000000LL;	/* a *= 2^9600 */
-		hb += 0x2580000000000000LL;	/* b *= 2^9600 */
-		k -= 9600;
-		SET_LDOUBLE_MSW64(a,ha);
-		SET_LDOUBLE_MSW64(b,hb);
-	    }
-	}
-    /* medium size a and b */
-	w = a-b;
-	if (w>b) {
-	    t1 = 0;
-	    SET_LDOUBLE_MSW64(t1,ha);
-	    t2 = a-t1;
-	    w  = sqrtl(t1*t1-(b*(-b)-t2*(a+t1)));
-	} else {
-	    a  = a+a;
-	    y1 = 0;
-	    SET_LDOUBLE_MSW64(y1,hb);
-	    y2 = b - y1;
-	    t1 = 0;
-	    SET_LDOUBLE_MSW64(t1,ha+0x0001000000000000LL);
-	    t2 = a - t1;
-	    w  = sqrtl(t1*y1-(w*(-w)-(t1*y2+t2*b)));
-	}
-	if(k!=0) {
-	    uint64_t high;
-	    t1 = 1;
-	    GET_LDOUBLE_MSW64(high,t1);
-	    SET_LDOUBLE_MSW64(t1,high+(k<<48));
-	    w *= t1;
-	    math_check_force_underflow_nonneg (w);
-	    return w;
-	} else return w;
+  if (!isfinite(x) || !isfinite(y))
+    {
+      if ((isinf (x) || isinf (y))
+	  && !issignaling (x) && !issignaling (y))
+	return INFINITY;
+      return x + y;
+    }
+
+  x = fabsl (x);
+  y = fabsl (y);
+
+  _Float128 ax = x < y ? y : x;
+  _Float128 ay = x < y ? x : y;
+
+  /* If ax is huge, scale both inputs down.  */
+  if (__glibc_unlikely (ax > LARGE_VAL))
+    {
+      if (__glibc_unlikely (ay <= ax * EPS))
+	return ax + ay;
+
+      return kernel (ax * SCALE, ay * SCALE) / SCALE;
+    }
+
+  /* If ay is tiny, scale both inputs up.  */
+  if (__glibc_unlikely (ay < TINY_VAL))
+    {
+      if (__glibc_unlikely (ax >= ay / EPS))
+	return ax + ay;
+
+      ax = kernel (ax / SCALE, ay / SCALE) * SCALE;
+      math_check_force_underflow_nonneg (ax);
+      return ax;
+    }
+
+  /* Common case: ax is not huge and ay is not tiny.  */
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return ax + ay;
+
+  return kernel (ax, ay);
 }
 libm_alias_finite (__ieee754_hypotl, __hypotl)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 06/12] i386: Move hypot implementation to C
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (4 preceding siblings ...)
  2021-12-07 19:03 ` [PATCH v5 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 07/12] math: Remove powerpc e_hypot Adhemerval Zanella
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

The generic hypotf is slight slower, mostly due the tricks the assembly
does to optimize the isinf/isnan/issignaling.  The generic hypot is way
slower, since the optimized implementation uses the i386 default
excessive precision to issue the operation directly.  A similar
implementation is provided instead of using the generic implementation:

Checked on i686-linux-gnu.
---
 sysdeps/i386/fpu/e_hypot.S  | 75 -------------------------------------
 sysdeps/i386/fpu/e_hypot.c  | 46 +++++++++++++++++++++++
 sysdeps/i386/fpu/e_hypotf.S | 64 -------------------------------
 3 files changed, 46 insertions(+), 139 deletions(-)
 delete mode 100644 sysdeps/i386/fpu/e_hypot.S
 create mode 100644 sysdeps/i386/fpu/e_hypot.c
 delete mode 100644 sysdeps/i386/fpu/e_hypotf.S

diff --git a/sysdeps/i386/fpu/e_hypot.S b/sysdeps/i386/fpu/e_hypot.S
deleted file mode 100644
index f2c956b77a..0000000000
--- a/sysdeps/i386/fpu/e_hypot.S
+++ /dev/null
@@ -1,75 +0,0 @@
-/* Compute the hypothenuse of X and Y.
-   Copyright (C) 1998-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-#include <i386-math-asm.h>
-#include <libm-alias-finite.h>
-
-DEFINE_DBL_MIN
-
-#ifdef PIC
-# define MO(op) op##@GOTOFF(%edx)
-#else
-# define MO(op) op
-#endif
-
-	.text
-ENTRY(__ieee754_hypot)
-#ifdef  PIC
-	LOAD_PIC_REG (dx)
-#endif
-	fldl	4(%esp)		// x
-	fxam
-	fnstsw
-	fldl	12(%esp)	// y : x
-	movb	%ah, %ch
-	fxam
-	fnstsw
-	movb	%ah, %al
-	orb	%ch, %ah
-	sahf
-	jc	1f
-	fmul	%st(0)		// y * y : x
-	fxch			// x : y * y
-	fmul	%st(0)		// x * x : y * y
-	faddp			// x * x + y * y
-	fsqrt
-	DBL_NARROW_EVAL_UFLOW_NONNEG
-2:	ret
-
-	// We have to test whether any of the parameters is Inf.
-	// In this case the result is infinity.
-1:	andb	$0x45, %al
-	cmpb	$5, %al
-	je	3f		// jump if y is Inf
-	andb	$0x45, %ch
-	cmpb	$5, %ch
-	jne	4f		// jump if x is not Inf
-	fxch
-3:	fstp	%st(1)
-	fabs
-	jmp	2b
-
-4:	testb	$1, %al
-	jnz	5f		// y is NaN
-	fxch
-5:	fstp	%st(1)
-	jmp	2b
-
-END(__ieee754_hypot)
-libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/i386/fpu/e_hypot.c b/sysdeps/i386/fpu/e_hypot.c
new file mode 100644
index 0000000000..cc13714ccb
--- /dev/null
+++ b/sysdeps/i386/fpu/e_hypot.c
@@ -0,0 +1,46 @@
+/* Euclidean distance function.  Double/Binary64 i386 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+#include <math_private.h>
+#include <math-underflow.h>
+#include <math-narrow-eval.h>
+#include <libm-alias-finite.h>
+#include <math_config.h>
+
+/* The i386 allows to use the default excess of precision to optimize the
+   hypot implementation, since internal multiplication and sqrt is carried
+   with 80-bit FP type.  */
+double
+__ieee754_hypot (double x, double y)
+{
+  if (!isfinite (x) || !isfinite (y))
+    {
+      if ((isinf (x) || isinf (y))
+	  && !issignaling (x) && !issignaling (y))
+	return INFINITY;
+      return x + y;
+    }
+
+  long double lx = x;
+  long double ly = y;
+  double r = math_narrow_eval ((double) sqrtl (lx * lx + ly * ly));
+  math_check_force_underflow_nonneg (r);
+  return r;
+}
+libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/i386/fpu/e_hypotf.S b/sysdeps/i386/fpu/e_hypotf.S
deleted file mode 100644
index cec5d15403..0000000000
--- a/sysdeps/i386/fpu/e_hypotf.S
+++ /dev/null
@@ -1,64 +0,0 @@
-/* Compute the hypothenuse of X and Y.
-   Copyright (C) 1998-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-#include <i386-math-asm.h>
-#include <libm-alias-finite.h>
-
-	.text
-ENTRY(__ieee754_hypotf)
-	flds	4(%esp)		// x
-	fxam
-	fnstsw
-	flds	8(%esp)		// y : x
-	movb	%ah, %ch
-	fxam
-	fnstsw
-	movb	%ah, %al
-	orb	%ch, %ah
-	sahf
-	jc	1f
-	fmul	%st(0)		// y * y : x
-	fxch			// x : y * y
-	fmul	%st(0)		// x * x : y * y
-	faddp			// x * x + y * y
-	fsqrt
-	FLT_NARROW_EVAL
-2:	ret
-
-	// We have to test whether any of the parameters is Inf.
-	// In this case the result is infinity.
-1:	andb	$0x45, %al
-	cmpb	$5, %al
-	je	3f		// jump if y is Inf
-	andb	$0x45, %ch
-	cmpb	$5, %ch
-	jne	4f		// jump if x is not Inf
-	fxch
-3:	fstp	%st(1)
-	fabs
-	jmp	2b
-
-4:	testb	$1, %al
-	jnz	5f		// y is NaN
-	fxch
-5:	fstp	%st(1)
-	jmp	2b
-
-END(__ieee754_hypotf)
-libm_alias_finite (__ieee754_hypotf, __hypotf)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 07/12] math: Remove powerpc e_hypot
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (5 preceding siblings ...)
  2021-12-07 19:03 ` [PATCH v5 06/12] i386: Move hypot implementation to C Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:42   ` Paul A. Clarke
  2021-12-07 19:03 ` [PATCH v5 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

The generic implementation is shows only slight worse performance:

POWER10    reciprocal-throughput    latency
master                   8.28478    13.7253
new hypot                7.21945    13.1933

POWER9     reciprocal-throughput    latency
master                   13.4024    14.0967
new hypot                14.8479    15.8061

POWER8     reciprocal-throughput    latency
master                   15.5767    16.8885
new hypot                16.5371    18.4057

One way to improve might to make gcc generate xsmaxdp/xsmindp for
fmax/fmin (it onl does for -ffast-math, clang does for default
options).

Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
---
 sysdeps/powerpc/fpu/e_hypot.c                 | 87 -------------------
 sysdeps/powerpc/fpu/e_hypotf.c                | 78 -----------------
 .../powerpc32/power4/fpu/multiarch/Makefile   |  5 +-
 .../power4/fpu/multiarch/e_hypot-power7.c     | 23 -----
 .../power4/fpu/multiarch/e_hypot-ppc32.c      | 23 -----
 .../powerpc32/power4/fpu/multiarch/e_hypot.c  | 33 -------
 .../power4/fpu/multiarch/e_hypotf-power7.c    | 23 -----
 .../power4/fpu/multiarch/e_hypotf-ppc32.c     | 23 -----
 .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 -------
 9 files changed, 1 insertion(+), 327 deletions(-)
 delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c

diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c
deleted file mode 100644
index f96c589bbd..0000000000
--- a/sysdeps/powerpc/fpu/e_hypot.c
+++ /dev/null
@@ -1,87 +0,0 @@
-/* Pythagorean addition using doubles
-   Copyright (C) 2011-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Library General Public License as
-   published by the Free Software Foundation; either version 2 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Library General Public License for more details.
-
-   You should have received a copy of the GNU Library General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math-underflow.h>
-#include <stdint.h>
-#include <libm-alias-finite.h>
-
-/* __ieee754_hypot(x,y)
- *
- * This a FP only version without any FP->INT conversion.
- * It is similar to default C version, making appropriates
- * overflow and underflows checks as well scaling when it
- * is needed.
- */
-
-double
-__ieee754_hypot (double x, double y)
-{
-  if ((isinf (x) || isinf (y))
-      && !issignaling (x) && !issignaling (y))
-    return INFINITY;
-  if (isnan (x) || isnan (y))
-    return x + y;
-
-  x = fabs (x);
-  y = fabs (y);
-
-  if (y > x)
-    {
-      double t = x;
-      x = y;
-      y = t;
-    }
-  if (y == 0.0)
-    return x;
-
-  /* if y is higher enough, y * 2^60 might overflow. The tests if
-     y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the
-     appropriate check to avoid the overflow exception generation.  */
-  if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60))
-    return x + y;
-
-  if (x > 0x1p+500)
-    {
-      x *= 0x1p-600;
-      y *= 0x1p-600;
-      return sqrt (x * x + y * y) / 0x1p-600;
-    }
-  if (y < 0x1p-500)
-    {
-      if (y <= 0x0.fffffffffffffp-1022)
-	{
-	  x *= 0x1p+1022;
-	  y *= 0x1p+1022;
-	  double ret = sqrt (x * x + y * y) / 0x1p+1022;
-	  math_check_force_underflow_nonneg (ret);
-	  return ret;
-	}
-      else
-	{
-	  x *= 0x1p+600;
-	  y *= 0x1p+600;
-	  return sqrt (x * x + y * y) / 0x1p+600;
-	}
-    }
-  return sqrt (x * x + y * y);
-}
-#ifndef __ieee754_hypot
-libm_alias_finite (__ieee754_hypot, __hypot)
-#endif
diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c
deleted file mode 100644
index fa201dda51..0000000000
--- a/sysdeps/powerpc/fpu/e_hypotf.c
+++ /dev/null
@@ -1,78 +0,0 @@
-/* Pythagorean addition using floats
-   Copyright (C) 2011-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Library General Public License as
-   published by the Free Software Foundation; either version 2 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Library General Public License for more details.
-
-   You should have received a copy of the GNU Library General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <stdint.h>
-#include <libm-alias-finite.h>
-
-/* __ieee754_hypotf(x,y)
-
-   This a FP only version without any FP->INT conversion.
-   It is similar to default C version, making appropriates
-   overflow and underflows checks as using double precision
-   instead of scaling.  */
-
-#ifdef _ARCH_PWR7
-/* POWER7 isinf and isnan optimizations are fast. */
-# define TEST_INF_NAN(x, y)                                      \
-   if ((isinff(x) || isinff(y))					 \
-       && !issignaling (x) && !issignaling (y))			 \
-     return INFINITY;                                            \
-   if (isnanf(x) || isnanf(y))                                   \
-     return x + y;
-# else
-/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are
- * costly (especially for POWER6). */
-# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2)                         \
- do {                                                            \
-   ieee_float_shape_type gf_u1;                                  \
-   ieee_float_shape_type gf_u2;                                  \
-   gf_u1.value = (f1);                                           \
-   gf_u2.value = (f2);                                           \
-   (i1) = gf_u1.word & 0x7fffffff;                               \
-   (i2) = gf_u2.word & 0x7fffffff;                               \
- } while (0)
-
-# define TEST_INF_NAN(x, y)                                      \
- do {                                                            \
-   uint32_t hx, hy;                                              \
-   GET_TWO_FLOAT_WORD(x, y, hx, hy);                             \
-   if (hy > hx) {                                                \
-     uint32_t ht = hx; hx = hy; hy = ht;                         \
-   }                                                             \
-   if (hx >= 0x7f800000) {                                       \
-     if ((hx == 0x7f800000 || hy == 0x7f800000)			 \
-	 && !issignaling (x) && !issignaling (y))		 \
-       return INFINITY;                                          \
-     return x + y;						 \
-   }                                                             \
- } while (0)
-#endif
-
-
-float
-__ieee754_hypotf (float x, float y)
-{
-  TEST_INF_NAN (x, y);
-
-  return sqrt ((double) x * x + (double) y * y);
-}
-#ifndef __ieee754_hypotf
-libm_alias_finite (__ieee754_hypotf, __hypotf)
-#endif
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
index 60f2c95532..1de0f9b350 100644
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
@@ -15,8 +15,7 @@ libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \
 			s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \
 			s_modff-power5+ s_modff-ppc32 s_logbl-power7 \
 			s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \
-			s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \
-			e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32
+			s_logbf-power7 s_logbf-ppc32
 
 CFLAGS-s_llrintf-power6.c += -mcpu=power6
 CFLAGS-s_llrintf-ppc32.c += -mcpu=power4
@@ -35,8 +34,6 @@ CFLAGS-s_modff-power5+.c = -mcpu=power5+
 CFLAGS-s_logbl-power7.c = -mcpu=power7
 CFLAGS-s_logb-power7.c = -mcpu=power7
 CFLAGS-s_logbf-power7.c = -mcpu=power7
-CFLAGS-e_hypot-power7.c = -mcpu=power7
-CFLAGS-e_hypotf-power7.c = -mcpu=power7
 
 # These files quiet sNaNs in a way that is optimized away without
 # -fsignaling-nans.
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
deleted file mode 100644
index 382b4a0b27..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee_hypot() POWER7 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypot __ieee754_hypot_power7
-
-#include <sysdeps/powerpc/fpu/e_hypot.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
deleted file mode 100644
index abb14d5469..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee_hypot() PowerPC32 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypot __ieee754_hypot_ppc32
-
-#include <sysdeps/powerpc/fpu/e_hypot.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
deleted file mode 100644
index a16efa350c..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* Multiple versions of ieee754_hypot.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math_ldbl_opt.h>
-#include <libm-alias-finite.h>
-#include "init-arch.h"
-
-extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden;
-extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden;
-
-libc_ifunc (__ieee754_hypot,
-	    (hwcap & PPC_FEATURE_ARCH_2_06)
-	    ? __ieee754_hypot_power7
-            : __ieee754_hypot_ppc32);
-
-libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
deleted file mode 100644
index f8a26ff22f..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee754_hypot POWER7 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypotf __ieee754_hypotf_power7
-
-#include <sysdeps/powerpc/fpu/e_hypotf.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
deleted file mode 100644
index b13f8c9db2..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee_hypot() PowerPC32 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypotf __ieee754_hypotf_ppc32
-
-#include <sysdeps/ieee754/flt-32/e_hypotf.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
deleted file mode 100644
index 1e72605db8..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* Multiple versions of ieee754_hypotf.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math_ldbl_opt.h>
-#include <libm-alias-finite.h>
-#include "init-arch.h"
-
-extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden;
-extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden;
-
-libc_ifunc (__ieee754_hypotf,
-	    (hwcap & PPC_FEATURE_ARCH_2_06)
-	    ? __ieee754_hypotf_power7
-            : __ieee754_hypotf_ppc32);
-
-libm_alias_finite (__ieee754_hypotf, __hypotf)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 08/12] math: Add math-use-builtinds-fmax.h
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (6 preceding siblings ...)
  2021-12-07 19:03 ` [PATCH v5 07/12] math: Remove powerpc e_hypot Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 09/12] math: Add math-use-builtinds-fmin.h Adhemerval Zanella
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

It allows the architecture to use the builtin instead of generic
implementation.
---
 math/s_fmax_template.c                      | 5 +++++
 sysdeps/generic/math-type-macros-double.h   | 1 +
 sysdeps/generic/math-type-macros-float.h    | 1 +
 sysdeps/generic/math-type-macros-float128.h | 1 +
 sysdeps/generic/math-type-macros-ldouble.h  | 1 +
 sysdeps/generic/math-use-builtins-fmax.h    | 4 ++++
 sysdeps/generic/math-use-builtins.h         | 1 +
 7 files changed, 14 insertions(+)
 create mode 100644 sysdeps/generic/math-use-builtins-fmax.h

diff --git a/math/s_fmax_template.c b/math/s_fmax_template.c
index d817406f04..4a88266e71 100644
--- a/math/s_fmax_template.c
+++ b/math/s_fmax_template.c
@@ -17,10 +17,14 @@
    <https://www.gnu.org/licenses/>.  */
 
 #include <math.h>
+#include <math-use-builtins.h>
 
 FLOAT
 M_DECL_FUNC (__fmax) (FLOAT x, FLOAT y)
 {
+#if M_USE_BUILTIN (FMAX)
+  return M_SUF (__builtin_fmax) (x, y);
+#else
   if (isgreaterequal (x, y))
     return x;
   else if (isless (x, y))
@@ -29,6 +33,7 @@ M_DECL_FUNC (__fmax) (FLOAT x, FLOAT y)
     return x + y;
   else
     return isnan (y) ? x : y;
+#endif
 }
 
 declare_mgen_alias (__fmax, fmax);
diff --git a/sysdeps/generic/math-type-macros-double.h b/sysdeps/generic/math-type-macros-double.h
index 8d2d8362cc..e29112d814 100644
--- a/sysdeps/generic/math-type-macros-double.h
+++ b/sysdeps/generic/math-type-macros-double.h
@@ -26,6 +26,7 @@
 #define FLOAT double
 #define CFLOAT _Complex double
 #define M_STRTO_NAN __strtod_nan
+#define M_USE_BUILTIN(c) USE_ ##c ##_BUILTIN
 
 #include <libm-alias-double.h>
 #include <math-nan-payload-double.h>
diff --git a/sysdeps/generic/math-type-macros-float.h b/sysdeps/generic/math-type-macros-float.h
index fb84d62909..5ba7615500 100644
--- a/sysdeps/generic/math-type-macros-float.h
+++ b/sysdeps/generic/math-type-macros-float.h
@@ -25,6 +25,7 @@
 #define FLOAT float
 #define CFLOAT _Complex float
 #define M_STRTO_NAN __strtof_nan
+#define M_USE_BUILTIN(c) USE_ ##c ##F_BUILTIN
 
 /* Standard/GNU macro literals do not exist for the float type.  Use
    the double macro constants.  */
diff --git a/sysdeps/generic/math-type-macros-float128.h b/sysdeps/generic/math-type-macros-float128.h
index 5c190606f5..de4de6e555 100644
--- a/sysdeps/generic/math-type-macros-float128.h
+++ b/sysdeps/generic/math-type-macros-float128.h
@@ -24,6 +24,7 @@
 #define M_SUF(c) c ## f128
 #define FLOAT _Float128
 #define M_STRTO_NAN __strtof128_nan
+#define M_USE_BUILTIN(c) USE_ ##c ##F128_BUILTIN
 
 #define CFLOAT __CFLOAT128
 
diff --git a/sysdeps/generic/math-type-macros-ldouble.h b/sysdeps/generic/math-type-macros-ldouble.h
index a2f282500e..c3edfcea73 100644
--- a/sysdeps/generic/math-type-macros-ldouble.h
+++ b/sysdeps/generic/math-type-macros-ldouble.h
@@ -26,6 +26,7 @@
 #define FLOAT long double
 #define CFLOAT _Complex long double
 #define M_STRTO_NAN __strtold_nan
+#define M_USE_BUILTIN(c) USE_ ##c ##L_BUILTIN
 
 #include <libm-alias-ldouble.h>
 #include <math-nan-payload-ldouble.h>
diff --git a/sysdeps/generic/math-use-builtins-fmax.h b/sysdeps/generic/math-use-builtins-fmax.h
new file mode 100644
index 0000000000..8fc4efca6a
--- /dev/null
+++ b/sysdeps/generic/math-use-builtins-fmax.h
@@ -0,0 +1,4 @@
+#define USE_FMAX_BUILTIN 0
+#define USE_FMAXF_BUILTIN 0
+#define USE_FMAXL_BUILTIN 0
+#define USE_FMAXF128_BUILTIN 0
diff --git a/sysdeps/generic/math-use-builtins.h b/sysdeps/generic/math-use-builtins.h
index 19d2d1cf3c..e07bba242f 100644
--- a/sysdeps/generic/math-use-builtins.h
+++ b/sysdeps/generic/math-use-builtins.h
@@ -34,5 +34,6 @@
 #include <math-use-builtins-copysign.h>
 #include <math-use-builtins-sqrt.h>
 #include <math-use-builtins-fma.h>
+#include <math-use-builtins-fmax.h>
 
 #endif /* MATH_USE_BUILTINS_H  */
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 09/12] math: Add math-use-builtinds-fmin.h
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (7 preceding siblings ...)
  2021-12-07 19:03 ` [PATCH v5 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

It allows the architecture to use the builtin instead of generic
implementation.
---
 math/s_fmin_template.c                   | 6 +++++-
 sysdeps/generic/math-use-builtins-fmin.h | 4 ++++
 sysdeps/generic/math-use-builtins.h      | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/generic/math-use-builtins-fmin.h

diff --git a/math/s_fmin_template.c b/math/s_fmin_template.c
index 565a836266..d673704289 100644
--- a/math/s_fmin_template.c
+++ b/math/s_fmin_template.c
@@ -17,11 +17,14 @@
    <https://www.gnu.org/licenses/>.  */
 
 #include <math.h>
-
+#include <math-use-builtins.h>
 
 FLOAT
 M_DECL_FUNC (__fmin) (FLOAT x, FLOAT y)
 {
+#if M_USE_BUILTIN (FMIN)
+  return M_SUF (__builtin_fmax) (x, y);
+#else
   if (islessequal (x, y))
     return x;
   else if (isgreater (x, y))
@@ -30,5 +33,6 @@ M_DECL_FUNC (__fmin) (FLOAT x, FLOAT y)
     return x + y;
   else
     return isnan (y) ? x : y;
+#endif
 }
 declare_mgen_alias (__fmin, fmin);
diff --git a/sysdeps/generic/math-use-builtins-fmin.h b/sysdeps/generic/math-use-builtins-fmin.h
new file mode 100644
index 0000000000..d2383ce00c
--- /dev/null
+++ b/sysdeps/generic/math-use-builtins-fmin.h
@@ -0,0 +1,4 @@
+#define USE_FMIN_BUILTIN 0
+#define USE_FMINF_BUILTIN 0
+#define USE_FMINL_BUILTIN 0
+#define USE_FMINF128_BUILTIN 0
diff --git a/sysdeps/generic/math-use-builtins.h b/sysdeps/generic/math-use-builtins.h
index e07bba242f..24fba47575 100644
--- a/sysdeps/generic/math-use-builtins.h
+++ b/sysdeps/generic/math-use-builtins.h
@@ -35,5 +35,6 @@
 #include <math-use-builtins-sqrt.h>
 #include <math-use-builtins-fma.h>
 #include <math-use-builtins-fmax.h>
+#include <math-use-builtins-fmin.h>
 
 #endif /* MATH_USE_BUILTINS_H  */
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 10/12] aarch64: Add math-use-builtins-f{max,min}.h
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (8 preceding siblings ...)
  2021-12-07 19:03 ` [PATCH v5 09/12] math: Add math-use-builtinds-fmin.h Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

It allows to remove the arch-specific implementations.
---
 sysdeps/aarch64/fpu/math-use-builtins-fmax.h |  4 +++
 sysdeps/aarch64/fpu/math-use-builtins-fmin.h |  4 +++
 sysdeps/aarch64/fpu/s_fmax.c                 | 28 --------------------
 sysdeps/aarch64/fpu/s_fmaxf.c                | 28 --------------------
 sysdeps/aarch64/fpu/s_fmin.c                 | 28 --------------------
 sysdeps/aarch64/fpu/s_fminf.c                | 28 --------------------
 6 files changed, 8 insertions(+), 112 deletions(-)
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c

diff --git a/sysdeps/aarch64/fpu/math-use-builtins-fmax.h b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h
new file mode 100644
index 0000000000..3ba31c8f83
--- /dev/null
+++ b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h
@@ -0,0 +1,4 @@
+#define USE_FMAX_BUILTIN 1
+#define USE_FMAXF_BUILTIN 1
+#define USE_FMAXL_BUILTIN 0
+#define USE_FMAXF128_BUILTIN 0
diff --git a/sysdeps/aarch64/fpu/math-use-builtins-fmin.h b/sysdeps/aarch64/fpu/math-use-builtins-fmin.h
new file mode 100644
index 0000000000..7fd6b45fce
--- /dev/null
+++ b/sysdeps/aarch64/fpu/math-use-builtins-fmin.h
@@ -0,0 +1,4 @@
+#define USE_FMIN_BUILTIN 1
+#define USE_FMINF_BUILTIN 1
+#define USE_FMINL_BUILTIN 0
+#define USE_FMINF128_BUILTIN 0
diff --git a/sysdeps/aarch64/fpu/s_fmax.c b/sysdeps/aarch64/fpu/s_fmax.c
deleted file mode 100644
index 7e3593fbda..0000000000
--- a/sysdeps/aarch64/fpu/s_fmax.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 2011-2021 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <libm-alias-double.h>
-
-double
-__fmax (double x, double y)
-{
-  return __builtin_fmax (x, y);
-}
-
-libm_alias_double (__fmax, fmax)
diff --git a/sysdeps/aarch64/fpu/s_fmaxf.c b/sysdeps/aarch64/fpu/s_fmaxf.c
deleted file mode 100644
index eb4c469ef8..0000000000
--- a/sysdeps/aarch64/fpu/s_fmaxf.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 2011-2021 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <libm-alias-float.h>
-
-float
-__fmaxf (float x, float y)
-{
-  return __builtin_fmaxf (x, y);
-}
-
-libm_alias_float (__fmax, fmax)
diff --git a/sysdeps/aarch64/fpu/s_fmin.c b/sysdeps/aarch64/fpu/s_fmin.c
deleted file mode 100644
index efdee1ed39..0000000000
--- a/sysdeps/aarch64/fpu/s_fmin.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 1996-2021 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <libm-alias-double.h>
-
-double
-__fmin (double x, double y)
-{
-  return __builtin_fmin (x, y);
-}
-
-libm_alias_double (__fmin, fmin)
diff --git a/sysdeps/aarch64/fpu/s_fminf.c b/sysdeps/aarch64/fpu/s_fminf.c
deleted file mode 100644
index 3665fabb6b..0000000000
--- a/sysdeps/aarch64/fpu/s_fminf.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 2011-2021 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <libm-alias-float.h>
-
-float
-__fminf (float x, float y)
-{
-  return __builtin_fminf (x, y);
-}
-
-libm_alias_float (__fmin, fmin)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 11/12] math: Use fmin/fmax on hypot
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (9 preceding siblings ...)
  2021-12-07 19:03 ` [PATCH v5 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-07 19:03 ` [PATCH v5 12/12] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

It optimizes for architectures that provides fast builtins.

Checked on aarch64-linux-gnu.
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 6fedf0d61f..0bdab989e4 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -38,6 +38,7 @@
 #include <math_private.h>
 #include <math-underflow.h>
 #include <math-narrow-eval.h>
+#include <math-use-builtins.h>
 #include <libm-alias-finite.h>
 #include "math_config.h"
 
@@ -95,8 +96,8 @@ __ieee754_hypot (double x, double y)
   x = fabs (x);
   y = fabs (y);
 
-  double ax = x < y ? y : x;
-  double ay = x < y ? x : y;
+  double ax = USE_FMAX_BUILTIN ? fmax (x, y) : (x < y ? y : x);
+  double ay = USE_FMIN_BUILTIN ? fmin (x, y) : (x < y ? x : y);
 
   /* If ax is huge, scale both inputs down.  */
   if (__glibc_unlikely (ax > LARGE_VAL))
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 12/12] math: Remove the error handling wrapper from hypot and hypotf
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (10 preceding siblings ...)
  2021-12-07 19:03 ` [PATCH v5 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
@ 2021-12-07 19:03 ` Adhemerval Zanella
  2021-12-08  5:51 ` [PATCH v5 00/12] Improve hypot Paul Zimmermann
  2021-12-10 19:31 ` Adhemerval Zanella
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 19:03 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

The error handling is moved to sysdeps/ieee754 version with no SVID
support.  The compatibility symbol versions still use the wrapper with
SVID error handling around the new code.  There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
---
 math/Versions                                 |  2 ++
 math/w_hypot.c                                |  8 ++++++
 math/w_hypot_compat.c                         | 13 +++++++---
 math/w_hypotf.c                               |  8 ++++++
 math/w_hypotf_compat.c                        |  6 ++---
 sysdeps/i386/fpu/e_hypot.c                    | 14 ++++++++++-
 sysdeps/ieee754/dbl-64/e_hypot.c              | 25 ++++++++++++++++---
 sysdeps/ieee754/dbl-64/w_hypot.c              |  1 +
 sysdeps/ieee754/flt-32/e_hypotf.c             | 21 ++++++++++++----
 sysdeps/ieee754/flt-32/w_hypotf.c             |  1 +
 sysdeps/mach/hurd/i386/libm.abilist           |  2 ++
 sysdeps/unix/sysv/linux/aarch64/libm.abilist  |  2 ++
 sysdeps/unix/sysv/linux/alpha/libm.abilist    |  2 ++
 sysdeps/unix/sysv/linux/arm/be/libm.abilist   |  2 ++
 sysdeps/unix/sysv/linux/arm/le/libm.abilist   |  2 ++
 sysdeps/unix/sysv/linux/hppa/libm.abilist     |  2 ++
 sysdeps/unix/sysv/linux/i386/libm.abilist     |  2 ++
 .../sysv/linux/m68k/coldfire/libm.abilist     |  2 ++
 .../unix/sysv/linux/m68k/m680x0/libm.abilist  |  2 ++
 .../sysv/linux/microblaze/be/libm.abilist     |  2 ++
 .../sysv/linux/microblaze/le/libm.abilist     |  2 ++
 .../unix/sysv/linux/mips/mips32/libm.abilist  |  2 ++
 .../unix/sysv/linux/mips/mips64/libm.abilist  |  2 ++
 sysdeps/unix/sysv/linux/nios2/libm.abilist    |  2 ++
 .../linux/powerpc/powerpc32/fpu/libm.abilist  |  2 ++
 .../powerpc/powerpc32/nofpu/libm.abilist      |  2 ++
 .../linux/powerpc/powerpc64/be/libm.abilist   |  2 ++
 .../linux/powerpc/powerpc64/le/libm.abilist   |  2 ++
 .../unix/sysv/linux/s390/s390-32/libm.abilist |  2 ++
 .../unix/sysv/linux/s390/s390-64/libm.abilist |  2 ++
 sysdeps/unix/sysv/linux/sh/be/libm.abilist    |  2 ++
 sysdeps/unix/sysv/linux/sh/le/libm.abilist    |  2 ++
 .../sysv/linux/sparc/sparc32/libm.abilist     |  2 ++
 .../sysv/linux/sparc/sparc64/libm.abilist     |  2 ++
 .../unix/sysv/linux/x86_64/64/libm.abilist    |  2 ++
 .../unix/sysv/linux/x86_64/x32/libm.abilist   |  2 ++
 36 files changed, 135 insertions(+), 16 deletions(-)
 create mode 100644 math/w_hypot.c
 create mode 100644 math/w_hypotf.c
 create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c
 create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c

diff --git a/math/Versions b/math/Versions
index a4b5405ddc..fef7a08c3b 100644
--- a/math/Versions
+++ b/math/Versions
@@ -628,5 +628,7 @@ libm {
     fminimum_numf64x; fminimum_numf128;
     fminimum_magf64x; fminimum_magf128;
     fminimum_mag_numf64x; fminimum_mag_numf128;
+    # No SVID compatible error handling.
+    hypotf; hypot;
   }
 }
diff --git a/math/w_hypot.c b/math/w_hypot.c
new file mode 100644
index 0000000000..66f723a896
--- /dev/null
+++ b/math/w_hypot.c
@@ -0,0 +1,8 @@
+#include <math-type-macros-float.h>
+#undef __USE_WRAPPER_TEMPLATE
+#define __USE_WRAPPER_TEMPLATE 1
+#undef declare_mgen_alias
+#define declare_mgen_alias(a, b)
+#include <w_hypot_template.c>
+versioned_symbol (libm, __hypot, hypot, GLIBC_2_35);
+libm_alias_float_other (__hypot, hypot)
diff --git a/math/w_hypot_compat.c b/math/w_hypot_compat.c
index f07039cc51..ec983a4ab8 100644
--- a/math/w_hypot_compat.c
+++ b/math/w_hypot_compat.c
@@ -20,9 +20,9 @@
 #include <libm-alias-double.h>
 
 
-#if LIBM_SVID_COMPAT
+#if LIBM_SVID_COMPAT && SHLIB_COMPAT (libm, GLIBC_2_0, GLIBC_2_35)
 double
-__hypot (double x, double y)
+__hypot_compat (double x, double y)
 {
 	double z = __ieee754_hypot(x,y);
 	if(__builtin_expect(!isfinite(z), 0)
@@ -31,5 +31,12 @@ __hypot (double x, double y)
 
 	return z;
 }
-libm_alias_double (__hypot, hypot)
+compat_symbol (libm, __hypot_compat, hypot, GLIBC_2_0);
+# ifdef NO_LONG_DOUBLE
+weak_alias (__hypot_compat, hypotl)
+# endif
+# ifdef LONG_DOUBLE_COMPAT
+LONG_DOUBLE_COMPAT_CHOOSE_libm_hypotl (
+  compat_symbol (libm, __hypot_compat, hypotl, FIRST_VERSION_libm_hypotl), );
+# endif
 #endif
diff --git a/math/w_hypotf.c b/math/w_hypotf.c
new file mode 100644
index 0000000000..b15a9b06d0
--- /dev/null
+++ b/math/w_hypotf.c
@@ -0,0 +1,8 @@
+#include <math-type-macros-float.h>
+#undef __USE_WRAPPER_TEMPLATE
+#define __USE_WRAPPER_TEMPLATE 1
+#undef declare_mgen_alias
+#define declare_mgen_alias(a, b)
+#include <w_hypot_template.c>
+versioned_symbol (libm, __hypotf, hypotf, GLIBC_2_35);
+libm_alias_float_other (__hypotf, hypotf)
diff --git a/math/w_hypotf_compat.c b/math/w_hypotf_compat.c
index 05898d3420..2bde4553b0 100644
--- a/math/w_hypotf_compat.c
+++ b/math/w_hypotf_compat.c
@@ -22,9 +22,9 @@
 #include <libm-alias-float.h>
 
 
-#if LIBM_SVID_COMPAT
+#if LIBM_SVID_COMPAT && SHLIB_COMPAT (libm, GLIBC_2_0, GLIBC_2_35)
 float
-__hypotf(float x, float y)
+__hypotf_compat (float x, float y)
 {
 	float z = __ieee754_hypotf(x,y);
 	if(__builtin_expect(!isfinite(z), 0)
@@ -34,5 +34,5 @@ __hypotf(float x, float y)
 
 	return z;
 }
-libm_alias_float (__hypot, hypot)
+compat_symbol (libm, __hypotf_compat, hypotf, GLIBC_2_0);
 #endif
diff --git a/sysdeps/i386/fpu/e_hypot.c b/sysdeps/i386/fpu/e_hypot.c
index cc13714ccb..da0d3da66d 100644
--- a/sysdeps/i386/fpu/e_hypot.c
+++ b/sysdeps/i386/fpu/e_hypot.c
@@ -20,14 +20,17 @@
 #include <math_private.h>
 #include <math-underflow.h>
 #include <math-narrow-eval.h>
+#include <math-svid-compat.h>
 #include <libm-alias-finite.h>
+#include <libm-alias-double.h>
 #include <math_config.h>
+#include <errno.h>
 
 /* The i386 allows to use the default excess of precision to optimize the
    hypot implementation, since internal multiplication and sqrt is carried
    with 80-bit FP type.  */
 double
-__ieee754_hypot (double x, double y)
+__hypot (double x, double y)
 {
   if (!isfinite (x) || !isfinite (y))
     {
@@ -41,6 +44,15 @@ __ieee754_hypot (double x, double y)
   long double ly = y;
   double r = math_narrow_eval ((double) sqrtl (lx * lx + ly * ly));
   math_check_force_underflow_nonneg (r);
+  if (isinf (r))
+    __set_errno (ERANGE);
   return r;
 }
+strong_alias (__hypot, __ieee754_hypot)
+#if LIBM_SVID_COMPAT
+versioned_symbol (libm, __hypot, hypot, GLIBC_2_35);
 libm_alias_finite (__ieee754_hypot, __hypot)
+libm_alias_double_other (__hypot, hypot)
+#else
+libm_alias_double (__hypot, hypot)
+#endif
diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 0bdab989e4..a822ec2065 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -34,12 +34,15 @@
 
    [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
+#include <errno.h>
 #include <math.h>
 #include <math_private.h>
 #include <math-underflow.h>
 #include <math-narrow-eval.h>
 #include <math-use-builtins.h>
+#include <math-svid-compat.h>
 #include <libm-alias-finite.h>
+#include <libm-alias-double.h>
 #include "math_config.h"
 
 #define SCALE     0x1p-600
@@ -47,6 +50,14 @@
 #define TINY_VAL  0x1p-459
 #define EPS       0x1p-54
 
+static inline double
+handle_errno (double r)
+{
+  if (isinf (r))
+    __set_errno (ERANGE);
+  return r;
+}
+
 /* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
    and squaring ax, ay and (ax - ay) does not overflow or underflow.  */
 static inline double
@@ -83,7 +94,7 @@ kernel (double ax, double ay)
 }
 
 double
-__ieee754_hypot (double x, double y)
+__hypot (double x, double y)
 {
   if (!isfinite(x) || !isfinite(y))
     {
@@ -103,9 +114,10 @@ __ieee754_hypot (double x, double y)
   if (__glibc_unlikely (ax > LARGE_VAL))
     {
       if (__glibc_unlikely (ay <= ax * EPS))
-	return math_narrow_eval (ax + ay);
+	return handle_errno (math_narrow_eval (ax + ay));
 
-      return math_narrow_eval (kernel (ax * SCALE, ay * SCALE) / SCALE);
+      return handle_errno (math_narrow_eval (kernel (ax * SCALE, ay * SCALE)
+					     / SCALE));
     }
 
   /* If ay is tiny, scale both inputs up.  */
@@ -125,6 +137,11 @@ __ieee754_hypot (double x, double y)
 
   return kernel (ax, ay);
 }
-#ifndef __ieee754_hypot
+strong_alias (__hypot, __ieee754_hypot)
 libm_alias_finite (__ieee754_hypot, __hypot)
+#if LIBM_SVID_COMPAT
+versioned_symbol (libm, __hypot, hypot, GLIBC_2_35);
+libm_alias_double_other (__hypot, hypot)
+#else
+libm_alias_double (__hypot, hypot)
 #endif
diff --git a/sysdeps/ieee754/dbl-64/w_hypot.c b/sysdeps/ieee754/dbl-64/w_hypot.c
new file mode 100644
index 0000000000..1cc8931700
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/w_hypot.c
@@ -0,0 +1 @@
+/* Not needed.  */
diff --git a/sysdeps/ieee754/flt-32/e_hypotf.c b/sysdeps/ieee754/flt-32/e_hypotf.c
index 436e3c2da8..3c53cceddb 100644
--- a/sysdeps/ieee754/flt-32/e_hypotf.c
+++ b/sysdeps/ieee754/flt-32/e_hypotf.c
@@ -16,14 +16,17 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
+#include <errno.h>
 #include <libm-alias-finite.h>
+#include <libm-alias-float.h>
+#include <math-svid-compat.h>
 #include <math.h>
 #include "math_config.h"
 #include <math-narrow-eval.h>
 #include <math_private.h>
 
 float
-__ieee754_hypotf (float x, float y)
+__hypotf (float x, float y)
 {
    if (!isfinite(x) || !isfinite(y))
      {
@@ -33,9 +36,17 @@ __ieee754_hypotf (float x, float y)
        return x + y;
      }
 
-  return math_narrow_eval ((float) sqrt ((double) x * (double) x
-					 + (double) y * (double) y));
+  float r = math_narrow_eval ((float) sqrt ((double) x * (double) x
+					    + (double) y * (double) y));
+  if (!isfinite (r))
+    __set_errno (ERANGE);
+  return r;
 }
-#ifndef __ieee754_hypotf
-libm_alias_finite (__ieee754_hypotf, __hypotf)
+strong_alias (__hypotf, __ieee754_hypotf)
+#if LIBM_SVID_COMPAT
+versioned_symbol (libm, __hypotf, hypotf, GLIBC_2_35);
+libm_alias_float_other (__hypot, hypot)
+#else
+libm_alias_float (__hypot, hypot)
 #endif
+libm_alias_finite (__ieee754_hypotf, __hypotf)
diff --git a/sysdeps/ieee754/flt-32/w_hypotf.c b/sysdeps/ieee754/flt-32/w_hypotf.c
new file mode 100644
index 0000000000..1cc8931700
--- /dev/null
+++ b/sysdeps/ieee754/flt-32/w_hypotf.c
@@ -0,0 +1 @@
+/* Not needed.  */
diff --git a/sysdeps/mach/hurd/i386/libm.abilist b/sysdeps/mach/hurd/i386/libm.abilist
index abf91bd142..8f40ddb150 100644
--- a/sysdeps/mach/hurd/i386/libm.abilist
+++ b/sysdeps/mach/hurd/i386/libm.abilist
@@ -1179,3 +1179,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libm.abilist b/sysdeps/unix/sysv/linux/aarch64/libm.abilist
index 1cef7d3db7..c2e3c6453e 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libm.abilist
@@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/alpha/libm.abilist b/sysdeps/unix/sysv/linux/alpha/libm.abilist
index 59d51021fa..4f85b6180f 100644
--- a/sysdeps/unix/sysv/linux/alpha/libm.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libm.abilist
@@ -1201,6 +1201,8 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libm.abilist b/sysdeps/unix/sysv/linux/arm/be/libm.abilist
index 44666ad7cd..36190add84 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libm.abilist
@@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 _LIB_VERSION D 0x4
 GLIBC_2.4 __clog10 F
 GLIBC_2.4 __clog10f F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libm.abilist b/sysdeps/unix/sysv/linux/arm/le/libm.abilist
index 44666ad7cd..36190add84 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libm.abilist
@@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 _LIB_VERSION D 0x4
 GLIBC_2.4 __clog10 F
 GLIBC_2.4 __clog10f F
diff --git a/sysdeps/unix/sysv/linux/hppa/libm.abilist b/sysdeps/unix/sysv/linux/hppa/libm.abilist
index 35d316a720..b5dd4e851f 100644
--- a/sysdeps/unix/sysv/linux/hppa/libm.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/i386/libm.abilist b/sysdeps/unix/sysv/linux/i386/libm.abilist
index ef99b3e104..5d89aaa08e 100644
--- a/sysdeps/unix/sysv/linux/i386/libm.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libm.abilist
@@ -1186,3 +1186,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist
index 44666ad7cd..36190add84 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist
@@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 _LIB_VERSION D 0x4
 GLIBC_2.4 __clog10 F
 GLIBC_2.4 __clog10f F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist
index 58316c96ae..e7cd739a54 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist
@@ -882,3 +882,5 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist
index b5e5da0272..274ecff630 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist
@@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist
index b5e5da0272..274ecff630 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist
@@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist b/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist
index 4113d3170d..08b902118d 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist b/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist
index 18fe9cc57a..09bb3bd75b 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist
@@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/nios2/libm.abilist b/sysdeps/unix/sysv/linux/nios2/libm.abilist
index 3a2b34ecc2..11abbb5668 100644
--- a/sysdeps/unix/sysv/linux/nios2/libm.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libm.abilist
@@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist
index 740cc8f55b..1688809c36 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist
@@ -888,6 +888,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist
index 16fb30566b..e880cebd78 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist
@@ -887,6 +887,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist
index ad4b98c09a..033385dfc1 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist
@@ -881,6 +881,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist
index 955765051c..7923d428bc 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist
@@ -1316,3 +1316,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist
index 1f5bd7754d..9a84163089 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist
@@ -1145,6 +1145,8 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist
index 0b18481f39..174bde4fa0 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist
@@ -1145,6 +1145,8 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libm.abilist b/sysdeps/unix/sysv/linux/sh/be/libm.abilist
index f525a9e77e..1e1324d667 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libm.abilist b/sysdeps/unix/sysv/linux/sh/le/libm.abilist
index f525a9e77e..1e1324d667 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist
index 727d1ce707..217e6eff7f 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist
@@ -1152,6 +1152,8 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist
index 0f57574523..6b53b0c59f 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist
@@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
index 574789e061..dbefbc3a1a 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
@@ -1177,3 +1177,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist
index 1dc89b304d..8001d0f219 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist
@@ -1177,3 +1177,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 07/12] math: Remove powerpc e_hypot
  2021-12-07 19:03 ` [PATCH v5 07/12] math: Remove powerpc e_hypot Adhemerval Zanella
@ 2021-12-07 19:42   ` Paul A. Clarke
  0 siblings, 0 replies; 16+ messages in thread
From: Paul A. Clarke @ 2021-12-07 19:42 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha, Paul Zimmermann, Wilco Dijkstra, tuliom

On Tue, Dec 07, 2021 at 04:03:48PM -0300, Adhemerval Zanella via Libc-alpha wrote:
> The generic implementation is shows only slight worse performance:
> 
> POWER10    reciprocal-throughput    latency
> master                   8.28478    13.7253
> new hypot                7.21945    13.1933
> 
> POWER9     reciprocal-throughput    latency
> master                   13.4024    14.0967
> new hypot                14.8479    15.8061
> 
> POWER8     reciprocal-throughput    latency
> master                   15.5767    16.8885
> new hypot                16.5371    18.4057
> 
> One way to improve might to make gcc generate xsmaxdp/xsmindp for
> fmax/fmin (it onl does for -ffast-math, clang does for default
> options).

I opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103605.

> Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
> (power9).

Presuming the numbers above number hold for this "v5", I have no objections
to merging these changes.

For v4:
Tested-by: Paul A. Clarke <pc@us.ibm.com>

PC

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 00/12] Improve hypot
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (11 preceding siblings ...)
  2021-12-07 19:03 ` [PATCH v5 12/12] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella
@ 2021-12-08  5:51 ` Paul Zimmermann
  2021-12-10 19:31 ` Adhemerval Zanella
  13 siblings, 0 replies; 16+ messages in thread
From: Paul Zimmermann @ 2021-12-08  5:51 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha, Wilco.Dijkstra

       Dear Adhemerval,

some of the typos I reported in the v4 are not fixed in v5:

https://sourceware.org/pipermail/libc-alpha/2021-December/133588.html

Best regards,
Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 00/12] Improve hypot
  2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
                   ` (12 preceding siblings ...)
  2021-12-08  5:51 ` [PATCH v5 00/12] Improve hypot Paul Zimmermann
@ 2021-12-10 19:31 ` Adhemerval Zanella
  13 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-12-10 19:31 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

If no one oposes it I will commit this, this is moslty a fixup from the
v4 already reviewed by Wilco.

On 07/12/2021 16:03, Adhemerval Zanella wrote:
> This patchset add a different algorithm along with Wilco [1] performance
> improvements.  The default implementation is based on the 'An Improved 
> Algorithm for hypot(a,b)' by Carlos F. Borges [2] with some fixes and
> improvements (although the dbl-64 one uses a slight different approach
> when fast FMA is avaliable).  This method is also used by Julia language
> runtime [2].
> 
> The motivation for this change are:
> 
>   1. Use a newer algotihm that favor FP operations over interger ones
>      (which tends to show better performance specially on newer
>      processor with multiple FP units).  It also allows consolidate
>      the multiple implementation (for instance, the powerpc one).
> 
>   2. The new algorithm is more precise without minimum performance
>      difference.
> 
> The current hypot() implementation seems already to be bounded to a
> maximum of 1 ulp of error, however the new proposed algorithm shows an
> slight precision improvement by showing more correctly rounded results.
> 
> With a random 1e9 inputs for different float format I see:
> 
>   - An improvement from 3427362 to 18457 results with 1 ulp of
>     error for Binary64.
>   - An improvement from 233442 to 1274 results with 1 ulp of
>     error for Binary96 (x86_64).
>   - An improvement from 453045 to 1294 results with 1 ulp of
>     error for Binary96 (x86_64).
> 
> Also for the maximal known error master shows (in ulps, with
> corresponding inputs), determined with [3]:
> 
>   binary32  0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77
>   binary64  0.987 -0x0.5a934b7eac967p-1022,-0x0.b5265a7e06b82p-1022
>   binary96  0.981
> 0x1.73f339f61eda21dp-16384l,0x2.e45f9f9500877e2p-16384l
>   binary128 0.985
> -0x2.d8311789103b76133ea1d5bc38c4p-16384,-0x1.6d85492006d7dcc6cc52938684p-16384
> 
> With the new implementation:
> 
>   binary32  0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77 [same]
>   binary64  0.792 0x0.603e52daf0bfdp-1022,-0x0.a622d0a9a433bp-1022
>   binary96  0.584
> -0x2.97b86706043d619p+7240l,0x1.8256bdd12d2e163ep+7240l
>   binary128 0.749
> 0x2.2d5faf4036d6e68566f01054612p-8192,0x3.5738e8e2505f5d1fc2973716f05p-8192
> 
> If FMA is uses the binary64 shows a slight worse precision:
> 
> I have adapted the dbl-64, ldbl-96, and ldbl-128, the flt-32 is not
> required since it calls the dbl-64 one.  I have not adapated ldbl-128ibm
> since the format has a lot of caveats and IBM aims to move to ldbl-128.
> 
> [1] https://sourceware.org/pipermail/libc-alpha/2021-November/133523.html
> [1] https://arxiv.org/pdf/1904.09481.pdf
> [2] https://github.com/JuliaLang/julia/commit/4a046009a3362ab5e17d369641dbbc9657eb680c
> [3] https://gitlab.inria.fr/zimmerma/math_accuracy/-/blob/master/binary64/check_sample2.c
> 
> ---
> Changes from v5:
>  * Add missing cast on generic hypotf.
>  * Add missing math_narrow_eval on generic hypot.
>  * Fixed return value on generic ldbl-96 hypotl.
>  * Fixed return value on generic ldbl-128 hypotl.
>  * Add POWER10 performance values.
>  * Add missing cast on i686 hypotf.
>  * Rewrite math-use-builtinds-fmax.h and math-use-builtinds-fmin.h
>    based on Joseph's suggestions.
> ---
> 
> Adhemerval Zanella (9):
>   math: Simplify hypotf implementation
>   math: Use an improved algorithm for hypotl (ldbl-96)
>   math: Use an improved algorithm for hypotl (ldbl-128)
>   i386: Move hypot implementation to C
>   math: Remove powerpc e_hypot
>   math: Add math-use-builtinds-fmax.h
>   math: Add math-use-builtinds-fmin.h
>   aarch64: Add math-use-builtins-f{max,min}.h
>   math: Remove the error handling wrapper from hypot and hypotf
> 
> Wilco Dijkstra (3):
>   math: Use an improved algorithm for hypot (dbl-64)
>   math: Improve hypot performance with FMA
>   math: Use fmin/fmax on hypot
> 
>  math/Versions                                 |   2 +
>  math/s_fmax_template.c                        |   5 +
>  math/s_fmin_template.c                        |   6 +-
>  math/w_hypot.c                                |   8 +
>  math/w_hypot_compat.c                         |  13 +-
>  math/w_hypotf.c                               |   8 +
>  math/w_hypotf_compat.c                        |   6 +-
>  sysdeps/aarch64/fpu/math-use-builtins-fmax.h  |   4 +
>  sysdeps/aarch64/fpu/math-use-builtins-fmin.h  |   4 +
>  sysdeps/aarch64/fpu/s_fmax.c                  |  28 --
>  sysdeps/aarch64/fpu/s_fmaxf.c                 |  28 --
>  sysdeps/aarch64/fpu/s_fmin.c                  |  28 --
>  sysdeps/aarch64/fpu/s_fminf.c                 |  28 --
>  sysdeps/generic/math-type-macros-double.h     |   1 +
>  sysdeps/generic/math-type-macros-float.h      |   1 +
>  sysdeps/generic/math-type-macros-float128.h   |   1 +
>  sysdeps/generic/math-type-macros-ldouble.h    |   1 +
>  sysdeps/generic/math-use-builtins-fmax.h      |   4 +
>  sysdeps/generic/math-use-builtins-fmin.h      |   4 +
>  sysdeps/generic/math-use-builtins.h           |   2 +
>  sysdeps/i386/fpu/e_hypot.S                    |  75 -----
>  sysdeps/i386/fpu/e_hypot.c                    |  58 ++++
>  sysdeps/i386/fpu/e_hypotf.S                   |  64 -----
>  sysdeps/ieee754/dbl-64/e_hypot.c              | 270 ++++++++----------
>  sysdeps/ieee754/dbl-64/w_hypot.c              |   1 +
>  sysdeps/ieee754/flt-32/e_hypotf.c             |  79 ++---
>  sysdeps/ieee754/flt-32/math_config.h          |   9 +
>  sysdeps/ieee754/flt-32/w_hypotf.c             |   1 +
>  sysdeps/ieee754/ldbl-128/e_hypotl.c           | 226 +++++++--------
>  sysdeps/ieee754/ldbl-96/e_hypotl.c            | 231 +++++++--------
>  sysdeps/mach/hurd/i386/libm.abilist           |   2 +
>  sysdeps/powerpc/fpu/e_hypot.c                 |  87 ------
>  sysdeps/powerpc/fpu/e_hypotf.c                |  78 -----
>  .../powerpc32/power4/fpu/multiarch/Makefile   |   5 +-
>  .../power4/fpu/multiarch/e_hypot-power7.c     |  23 --
>  .../power4/fpu/multiarch/e_hypot-ppc32.c      |  23 --
>  .../powerpc32/power4/fpu/multiarch/e_hypot.c  |  33 ---
>  .../power4/fpu/multiarch/e_hypotf-power7.c    |  23 --
>  .../power4/fpu/multiarch/e_hypotf-ppc32.c     |  23 --
>  .../powerpc32/power4/fpu/multiarch/e_hypotf.c |  33 ---
>  sysdeps/unix/sysv/linux/aarch64/libm.abilist  |   2 +
>  sysdeps/unix/sysv/linux/alpha/libm.abilist    |   2 +
>  sysdeps/unix/sysv/linux/arm/be/libm.abilist   |   2 +
>  sysdeps/unix/sysv/linux/arm/le/libm.abilist   |   2 +
>  sysdeps/unix/sysv/linux/hppa/libm.abilist     |   2 +
>  sysdeps/unix/sysv/linux/i386/libm.abilist     |   2 +
>  .../sysv/linux/m68k/coldfire/libm.abilist     |   2 +
>  .../unix/sysv/linux/m68k/m680x0/libm.abilist  |   2 +
>  .../sysv/linux/microblaze/be/libm.abilist     |   2 +
>  .../sysv/linux/microblaze/le/libm.abilist     |   2 +
>  .../unix/sysv/linux/mips/mips32/libm.abilist  |   2 +
>  .../unix/sysv/linux/mips/mips64/libm.abilist  |   2 +
>  sysdeps/unix/sysv/linux/nios2/libm.abilist    |   2 +
>  .../linux/powerpc/powerpc32/fpu/libm.abilist  |   2 +
>  .../powerpc/powerpc32/nofpu/libm.abilist      |   2 +
>  .../linux/powerpc/powerpc64/be/libm.abilist   |   2 +
>  .../linux/powerpc/powerpc64/le/libm.abilist   |   2 +
>  .../unix/sysv/linux/s390/s390-32/libm.abilist |   2 +
>  .../unix/sysv/linux/s390/s390-64/libm.abilist |   2 +
>  sysdeps/unix/sysv/linux/sh/be/libm.abilist    |   2 +
>  sysdeps/unix/sysv/linux/sh/le/libm.abilist    |   2 +
>  .../sysv/linux/sparc/sparc32/libm.abilist     |   2 +
>  .../sysv/linux/sparc/sparc64/libm.abilist     |   2 +
>  .../unix/sysv/linux/x86_64/64/libm.abilist    |   2 +
>  .../unix/sysv/linux/x86_64/x32/libm.abilist   |   2 +
>  65 files changed, 547 insertions(+), 1029 deletions(-)
>  create mode 100644 math/w_hypot.c
>  create mode 100644 math/w_hypotf.c
>  create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h
>  create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h
>  delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c
>  delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c
>  delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c
>  delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c
>  create mode 100644 sysdeps/generic/math-use-builtins-fmax.h
>  create mode 100644 sysdeps/generic/math-use-builtins-fmin.h
>  delete mode 100644 sysdeps/i386/fpu/e_hypot.S
>  create mode 100644 sysdeps/i386/fpu/e_hypot.c
>  delete mode 100644 sysdeps/i386/fpu/e_hypotf.S
>  create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c
>  create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c
>  delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
>  delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-12-10 19:31 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-07 19:03 [PATCH v5 00/12] Improve hypot Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 01/12] math: Simplify hypotf implementation Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 03/12] math: Improve hypot performance with FMA Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 06/12] i386: Move hypot implementation to C Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 07/12] math: Remove powerpc e_hypot Adhemerval Zanella
2021-12-07 19:42   ` Paul A. Clarke
2021-12-07 19:03 ` [PATCH v5 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 09/12] math: Add math-use-builtinds-fmin.h Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 12/12] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella
2021-12-08  5:51 ` [PATCH v5 00/12] Improve hypot Paul Zimmermann
2021-12-10 19:31 ` Adhemerval Zanella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).