[PATCH v4 00/12] Improve hypot

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* [PATCH v4 00/12] Improve hypot
@ 2021-12-03  0:00 Adhemerval Zanella
  2021-12-03  0:00 ` [PATCH v4 01/12] math: Simplify hypotf implementation Adhemerval Zanella
                   ` (12 more replies)
  0 siblings, 13 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:00 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

This patchset add a different algorithm along with Wilco [1] performance
improvements.  The default implementation is based on the 'An Improved 
Algorithm for hypot(a,b)' by Carlos F. Borges [2] with some fixes and
improvements (although the dbl-64 one uses a slight different approach
when fast FMA is avaliable).  This method is also used by Julia language
runtime [2].

The motivation for this change are:

  1. Use a newer algotihm that favor FP operations over interger ones
     (which tends to show better performance specially on newer
     processor with multiple FP units).  It also allows consolidate
     the multiple implementation (for instance, the powerpc one).

  2. The new algorithm is more precise without minimum performance
     difference.

The current hypot() implementation seems already to be bounded to a
maximum of 1 ulp of error, however the new proposed algorithm shows an
slight precision improvement by showing more correctly rounded results.

With a random 1e9 inputs for different float format I see:

  - An improvement from 3427362 to 18457 results with 1 ulp of
    error for Binary64.
  - An improvement from 233442 to 1274 results with 1 ulp of
    error for Binary96 (x86_64).
  - An improvement from 453045 to 1294 results with 1 ulp of
    error for Binary96 (x86_64).

Also for the maximal known error master shows (in ulps, with
corresponding inputs), determined with [3]:

  binary32  0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77
  binary64  0.987 -0x0.5a934b7eac967p-1022,-0x0.b5265a7e06b82p-1022
  binary96  0.981 0x1.73f339f61eda21dp-16384l,0x2.e45f9f9500877e2p-16384l
  binary128 0.985 -0x2.d8311789103b76133ea1d5bc38c4p-16384,-0x1.6d85492006d7dcc6cc52938684p-16384

With the new implementation:

  binary32  0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77 [same]
  binary64  0.792 0x0.603e52daf0bfdp-1022,-0x0.a622d0a9a433bp-1022
  binary96  0.584 -0x2.97b86706043d619p+7240l,0x1.8256bdd12d2e163ep+7240l
  binary128 0.749 0x2.2d5faf4036d6e68566f01054612p-8192,0x3.5738e8e2505f5d1fc2973716f05p-8192

If FMA is uses the binary64 shows a slight worse precision:

I have adapted the dbl-64, ldbl-96, and ldbl-128, the flt-32 is not
required since it calls the dbl-64 one.  I have not adapated ldbl-128ibm
since the format has a lot of caveats and IBM aims to move to ldbl-128.

[1] https://sourceware.org/pipermail/libc-alpha/2021-November/133523.html
[1] https://arxiv.org/pdf/1904.09481.pdf
[2] https://github.com/JuliaLang/julia/commit/4a046009a3362ab5e17d369641dbbc9657eb680c
[3] https://gitlab.inria.fr/zimmerma/math_accuracy/-/blob/master/binary64/check_sample2.c

Adhemerval Zanella (9):
  math: Simplify hypotf implementation
  math: Use an improved algorithm for hypotl (ldbl-96)
  math: Use an improved algorithm for hypotl (ldbl-128)
  math: Remove powerpc e_hypot
  i386: Move hypot implementation to C
  math: Add math-use-builtinds-fmax.h
  math: Add math-use-builtinds-fmin.h
  aarch64: Add math-use-builtins-f{max,min}.h
  math: Remove the error handling wrapper from hypot and hypotf

Wilco Dijkstra (3):
  math: Use an improved algorithm for hypot (dbl-64)
  math: Improve hypot performance with FMA
  math: Use fmin/fmax on hypot

 math/Versions                                 |   2 +
 math/s_fmax_template.c                        |  27 ++
 math/s_fmin_template.c                        |  27 ++
 math/w_hypot.c                                |   8 +
 math/w_hypot_compat.c                         |  13 +-
 math/w_hypotf.c                               |   8 +
 math/w_hypotf_compat.c                        |   6 +-
 sysdeps/aarch64/fpu/math-use-builtins-fmax.h  |   4 +
 sysdeps/aarch64/fpu/math-use-builtins-fmin.h  |   4 +
 sysdeps/aarch64/fpu/s_fmax.c                  |  28 --
 sysdeps/aarch64/fpu/s_fmaxf.c                 |  28 --
 sysdeps/aarch64/fpu/s_fmin.c                  |  28 --
 sysdeps/aarch64/fpu/s_fminf.c                 |  28 --
 sysdeps/generic/math-use-builtins-fmax.h      |   4 +
 sysdeps/generic/math-use-builtins-fmin.h      |   4 +
 sysdeps/generic/math-use-builtins.h           |   2 +
 sysdeps/i386/fpu/e_hypot.S                    |  75 -----
 sysdeps/i386/fpu/e_hypot.c                    |  58 ++++
 sysdeps/i386/fpu/e_hypotf.S                   |  64 -----
 sysdeps/ieee754/dbl-64/e_hypot.c              | 269 ++++++++----------
 sysdeps/ieee754/dbl-64/w_hypot.c              |   1 +
 sysdeps/ieee754/flt-32/e_hypotf.c             |  79 ++---
 sysdeps/ieee754/flt-32/math_config.h          |   9 +
 sysdeps/ieee754/flt-32/w_hypotf.c             |   1 +
 sysdeps/ieee754/ldbl-128/e_hypotl.c           | 226 +++++++--------
 sysdeps/ieee754/ldbl-96/e_hypotl.c            | 231 +++++++--------
 sysdeps/mach/hurd/i386/libm.abilist           |   2 +
 sysdeps/powerpc/fpu/e_hypot.c                 |  87 ------
 sysdeps/powerpc/fpu/e_hypotf.c                |  78 -----
 .../powerpc32/power4/fpu/multiarch/Makefile   |   5 +-
 .../power4/fpu/multiarch/e_hypot-power7.c     |  23 --
 .../power4/fpu/multiarch/e_hypot-ppc32.c      |  23 --
 .../powerpc32/power4/fpu/multiarch/e_hypot.c  |  33 ---
 .../power4/fpu/multiarch/e_hypotf-power7.c    |  23 --
 .../power4/fpu/multiarch/e_hypotf-ppc32.c     |  23 --
 .../powerpc32/power4/fpu/multiarch/e_hypotf.c |  33 ---
 sysdeps/unix/sysv/linux/aarch64/libm.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libm.abilist    |   2 +
 sysdeps/unix/sysv/linux/arm/be/libm.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libm.abilist   |   2 +
 sysdeps/unix/sysv/linux/hppa/libm.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libm.abilist     |   2 +
 .../sysv/linux/m68k/coldfire/libm.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libm.abilist  |   2 +
 .../sysv/linux/microblaze/be/libm.abilist     |   2 +
 .../sysv/linux/microblaze/le/libm.abilist     |   2 +
 .../unix/sysv/linux/mips/mips32/libm.abilist  |   2 +
 .../unix/sysv/linux/mips/mips64/libm.abilist  |   2 +
 sysdeps/unix/sysv/linux/nios2/libm.abilist    |   2 +
 .../linux/powerpc/powerpc32/fpu/libm.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libm.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libm.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libm.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libm.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libm.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libm.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libm.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libm.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libm.abilist     |   2 +
 .../unix/sysv/linux/x86_64/64/libm.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libm.abilist   |   2 +
 61 files changed, 586 insertions(+), 1028 deletions(-)
 create mode 100644 math/w_hypot.c
 create mode 100644 math/w_hypotf.c
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c
 create mode 100644 sysdeps/generic/math-use-builtins-fmax.h
 create mode 100644 sysdeps/generic/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/i386/fpu/e_hypot.S
 create mode 100644 sysdeps/i386/fpu/e_hypot.c
 delete mode 100644 sysdeps/i386/fpu/e_hypotf.S
 create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c
 create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c

-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 01/12] math: Simplify hypotf implementation
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
@ 2021-12-03  0:00 ` Adhemerval Zanella
  2021-12-03 13:23   ` Wilco Dijkstra
  2021-12-03  0:00 ` [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:00 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

Use a more optimized comparison for check for NaN and infinite and
add an inlined issignaling implementation for float.  With gcc it
results in 2 FP comparisons.

The file Copyright is also changed to use  GPL, the implementation was
completely changed by 7c10fd3515f to use double precision instead of
scaling and this change removes all the GET_FLOAT_WORD usage.

Checked on x86_64-linux-gnu.
---
 sysdeps/ieee754/flt-32/e_hypotf.c    | 64 +++++++++++++---------------
 sysdeps/ieee754/flt-32/math_config.h |  9 ++++
 2 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/sysdeps/ieee754/flt-32/e_hypotf.c b/sysdeps/ieee754/flt-32/e_hypotf.c
index e770947dc1..1d082fe36c 100644
--- a/sysdeps/ieee754/flt-32/e_hypotf.c
+++ b/sysdeps/ieee754/flt-32/e_hypotf.c
@@ -1,46 +1,40 @@
-/* e_hypotf.c -- float version of e_hypot.c.
- */
+/* Euclidean distance function.  Float/Binary32 version.
+   Copyright (C) 2012-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
 
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
 
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <libm-alias-finite.h>
 #include <math.h>
+#include "math_config.h"
+#include <math-narrow-eval.h>
 #include <math_private.h>
-#include <libm-alias-finite.h>
 
 float
-__ieee754_hypotf(float x, float y)
+__ieee754_hypotf (float x, float y)
 {
-	double d_x, d_y;
-	int32_t ha, hb;
-
-	GET_FLOAT_WORD(ha,x);
-	ha &= 0x7fffffff;
-	GET_FLOAT_WORD(hb,y);
-	hb &= 0x7fffffff;
-	if (ha == 0x7f800000 && !issignaling (y))
-	  return fabsf(x);
-	else if (hb == 0x7f800000 && !issignaling (x))
-	  return fabsf(y);
-	else if (ha > 0x7f800000 || hb > 0x7f800000)
-	  return fabsf(x) * fabsf(y);
-	else if (ha == 0)
-	  return fabsf(y);
-	else if (hb == 0)
-	  return fabsf(x);
-
-	d_x = (double) x;
-	d_y = (double) y;
+   if (!isfinite(x) || !isfinite(y))
+     {
+       if ((isinf (x) || isinf (y))
+	   && !issignalingf_inline (x) && !issignalingf_inline (y))
+	 return INFINITY;
+       return x + y;
+     }
 
-	return (float) sqrt(d_x * d_x + d_y * d_y);
+  return math_narrow_eval (sqrt ((double) x * (double) x
+				 + (double) y * (double) y));
 }
 #ifndef __ieee754_hypotf
 libm_alias_finite (__ieee754_hypotf, __hypotf)
diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index 513454a297..daa8a82f99 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -101,6 +101,15 @@ asdouble (uint64_t i)
   return u.f;
 }
 
+static inline int
+issignalingf_inline (float x)
+{
+  uint32_t ix = asuint (x);
+  if (HIGH_ORDER_BIT_IS_SET_FOR_SNAN)
+    return (ix & 0x7fc00000) == 0x7fc00000;
+  return 2 * (ix ^ 0x00400000) > 2 * 0x7fc00000UL;
+}
+
 #define NOINLINE __attribute__ ((noinline))
 
 attribute_hidden float __math_oflowf (uint32_t);
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64)
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
  2021-12-03  0:00 ` [PATCH v4 01/12] math: Simplify hypotf implementation Adhemerval Zanella
@ 2021-12-03  0:00 ` Adhemerval Zanella
  2021-12-03 13:41   ` Wilco Dijkstra
  2021-12-03  0:00 ` [PATCH v4 03/12] math: Improve hypot performance with FMA Adhemerval Zanella
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:00 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

This implementation is based on the 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the
following changes:

 - Handle qNaN and sNaN.
 - Tune the 'widely varying operands' to avoid spurious underflow
   due the multiplication and fix the return value for upwards
   rounding mode.
 - Handle required underflow exception for denormal results.

The main advantage of the new algorithm is its precision: with a
random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc
current implementation shows around 0.34% results with an error of
1 ulp (3424869 results) while the new implementation only shows
0.002% of total (18851).

The performance result are also only slight worse than current
implementation.  On x86_64 (Ryzen 5900X) with gcc 12:

Before:

  "hypot": {
   "workload-random": {
    "duration": 3.73319e+09,
    "iterations": 1.12e+08,
    "reciprocal-throughput": 22.8737,
    "latency": 43.7904,
    "max-throughput": 4.37184e+07,
    "min-throughput": 2.28361e+07
   }
  }

After:

  "hypot": {
   "workload-random": {
    "duration": 3.7597e+09,
    "iterations": 9.8e+07,
    "reciprocal-throughput": 23.7547,
    "latency": 52.9739,
    "max-throughput": 4.2097e+07,
    "min-throughput": 1.88772e+07
   }
  }

Co-Authored-By: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

[1] https://arxiv.org/pdf/1904.09481.pdf
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 235 ++++++++++++-------------------
 1 file changed, 92 insertions(+), 143 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 9ec4c1ced0..274b14b57e 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -1,164 +1,113 @@
-/* @(#)e_hypot.c 5.1 93/09/24 */
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypot(x,y)
- *
- * Method :
- *	If (assume round-to-nearest) z=x*x+y*y
- *	has error less than sqrt(2)/2 ulp, than
- *	sqrt(z) has error less than 1 ulp (exercise).
- *
- *	So, compute sqrt(x*x+y*y) with some care as
- *	follows to get the error below 1 ulp:
- *
- *	Assume x>y>0;
- *	(if possible, set rounding to round-to-nearest)
- *	1. if x > 2y  use
- *		x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- *	where x1 = x with lower 32 bits cleared, x2 = x-x1; else
- *	2. if x <= 2y use
- *		t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- *	where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
- *	y1= y with lower 32 bits chopped, y2 = y-y1.
- *
- *	NOTE: scaling may be necessary if some argument is too
- *	      large or too tiny
- *
- * Special cases:
- *	hypot(x,y) is INF if x or y is +INF or -INF; else
- *	hypot(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- *	hypot(x,y) returns sqrt(x^2+y^2) with error less
- *	than 1 ulps (units in the last place)
- */
+/* Euclidean distance function.  Double/Binary64 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* The implementation uses a correction based on 'An Improved Algorithm for
+   hypot(a,b)' by Carlos F. Borges [1] usingthe MyHypot3 with the following
+   changes:
+
+   - Handle qNaN and sNaN.
+   - Tune the 'widely varying operands' to avoid spurious underflow
+     due the multiplication and fix the return value for upwards
+     rounding mode.
+   - Handle required underflow exception for subnormal results.
+
+   The expected ULP is ~0.792.
+
+   [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
 #include <math.h>
 #include <math_private.h>
 #include <math-underflow.h>
+#include <math-narrow-eval.h>
 #include <libm-alias-finite.h>
+#include "math_config.h"
 
-double
-__ieee754_hypot (double x, double y)
+#define SCALE     0x1p-600
+#define LARGE_VAL 0x1p+511
+#define TINY_VAL  0x1p-459
+#define EPS       0x1p-54
+
+/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
+   and squaring ax, ay and (ax - ay) does not overflow or underflow.  */
+static inline double
+kernel (double ax, double ay)
 {
-  double a, b, t1, t2, y1, y2, w;
-  int32_t j, k, ha, hb;
-
-  GET_HIGH_WORD (ha, x);
-  ha &= 0x7fffffff;
-  GET_HIGH_WORD (hb, y);
-  hb &= 0x7fffffff;
-  if (hb > ha)
+  double t1, t2;
+  double h = sqrt (ax * ax + ay * ay);
+  if (h <= 2.0 * ay)
     {
-      a = y; b = x; j = ha; ha = hb; hb = j;
+      double delta = h - ay;
+      t1 = ax * (2.0 * delta - ax);
+      t2 = (delta - 2.0 * (ax - ay)) * delta;
     }
   else
     {
-      a = x; b = y;
-    }
-  SET_HIGH_WORD (a, ha);        /* a <- |a| */
-  SET_HIGH_WORD (b, hb);        /* b <- |b| */
-  if ((ha - hb) > 0x3c00000)
-    {
-      return a + b;
-    }                                       /* x/y > 2**60 */
-  k = 0;
-  if (__glibc_unlikely (ha > 0x5f300000))                  /* a>2**500 */
-    {
-      if (ha >= 0x7ff00000)             /* Inf or NaN */
-	{
-	  uint32_t low;
-	  w = a + b;                    /* for sNaN */
-	  if (issignaling (a) || issignaling (b))
-	    return w;
-	  GET_LOW_WORD (low, a);
-	  if (((ha & 0xfffff) | low) == 0)
-	    w = a;
-	  GET_LOW_WORD (low, b);
-	  if (((hb ^ 0x7ff00000) | low) == 0)
-	    w = b;
-	  return w;
-	}
-      /* scale a and b by 2**-600 */
-      ha -= 0x25800000; hb -= 0x25800000;  k += 600;
-      SET_HIGH_WORD (a, ha);
-      SET_HIGH_WORD (b, hb);
-    }
-  if (__builtin_expect (hb < 0x23d00000, 0))            /* b < 2**-450 */
-    {
-      if (hb <= 0x000fffff)             /* subnormal b or 0 */
-	{
-	  uint32_t low;
-	  GET_LOW_WORD (low, b);
-	  if ((hb | low) == 0)
-	    return a;
-	  t1 = 0;
-	  SET_HIGH_WORD (t1, 0x7fd00000);       /* t1=2^1022 */
-	  b *= t1;
-	  a *= t1;
-	  k -= 1022;
-	  GET_HIGH_WORD (ha, a);
-	  GET_HIGH_WORD (hb, b);
-	  if (hb > ha)
-	    {
-	      t1 = a;
-	      a = b;
-	      b = t1;
-	      j = ha;
-	      ha = hb;
-	      hb = j;
-	    }
-	}
-      else                      /* scale a and b by 2^600 */
-	{
-	  ha += 0x25800000;             /* a *= 2^600 */
-	  hb += 0x25800000;             /* b *= 2^600 */
-	  k -= 600;
-	  SET_HIGH_WORD (a, ha);
-	  SET_HIGH_WORD (b, hb);
-	}
+      double delta = h - ax;
+      t1 = 2.0 * delta * (ax - 2.0 * ay);
+      t2 = (4.0 * delta - ay) * ay + delta * delta;
     }
-  /* medium size a and b */
-  w = a - b;
-  if (w > b)
+
+  h -= (t1 + t2) / (2.0 * h);
+  return h;
+}
+
+double
+__ieee754_hypot (double x, double y)
+{
+  if (!isfinite(x) || !isfinite(y))
     {
-      t1 = 0;
-      SET_HIGH_WORD (t1, ha);
-      t2 = a - t1;
-      w = sqrt (t1 * t1 - (b * (-b) - t2 * (a + t1)));
+      if ((isinf (x) || isinf (y))
+	  && !issignaling_inline (x) && !issignaling_inline (y))
+	return INFINITY;
+      return x + y;
     }
-  else
+
+  x = fabs (x);
+  y = fabs (y);
+
+  double ax = x < y ? y : x;
+  double ay = x < y ? x : y;
+
+  /* If ax is huge, scale both inputs down.  */
+  if (__glibc_unlikely (ax > LARGE_VAL))
     {
-      a = a + a;
-      y1 = 0;
-      SET_HIGH_WORD (y1, hb);
-      y2 = b - y1;
-      t1 = 0;
-      SET_HIGH_WORD (t1, ha + 0x00100000);
-      t2 = a - t1;
-      w = sqrt (t1 * y1 - (w * (-w) - (t1 * y2 + t2 * b)));
+      if (__glibc_unlikely (ay <= ax * EPS))
+	return ax + ay;
+ 
+      return kernel (ax * SCALE, ay * SCALE) / SCALE;
     }
-  if (k != 0)
+
+  /* If ay is tiny, scale both inputs up.  */
+  if (__glibc_unlikely (ay < TINY_VAL))
     {
-      uint32_t high;
-      t1 = 1.0;
-      GET_HIGH_WORD (high, t1);
-      SET_HIGH_WORD (t1, high + (k << 20));
-      w *= t1;
-      math_check_force_underflow_nonneg (w);
-      return w;
+      if (__glibc_unlikely (ax >= ay / EPS))
+	return math_narrow_eval (ax + ay);
+
+      ax = math_narrow_eval (kernel (ax / SCALE, ay / SCALE) * SCALE);
+      math_check_force_underflow_nonneg (ax);
+      return ax;
     }
-  else
-    return w;
+
+  /* Common case: ax is not huge and ay is not tiny.  */
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return ax + ay;
+
+  return kernel (ax, ay);
 }
 #ifndef __ieee754_hypot
 libm_alias_finite (__ieee754_hypot, __hypot)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 03/12] math: Improve hypot performance with FMA
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
  2021-12-03  0:00 ` [PATCH v4 01/12] math: Simplify hypotf implementation Adhemerval Zanella
  2021-12-03  0:00 ` [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
@ 2021-12-03  0:00 ` Adhemerval Zanella
  2021-12-03 13:44   ` Wilco Dijkstra
  2021-12-03  0:00 ` [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:00 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency.  The non-fma version has 30% higher throughput and 10%
higher latency.

Max ULP error is 0.949 with fma and 0.792 without fma.

Passes GLIBC testsuite.
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 274b14b57e..f53061badc 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -26,7 +26,11 @@
      rounding mode.
    - Handle required underflow exception for subnormal results.
 
-   The expected ULP is ~0.792.
+   The expected ULP is ~0.792 or ~0.948 if FMA is used.  For FMA, the
+   correction is not used and the error of sqrt (x^2 + y^2) is below 1 ULP
+   if x^2 + y^2 is computed with less than 0.707 ULP error.  If |x| >= |2y|,
+   fma (x, x, y^2) has ~0.625 ULP.  If |x| < |2y|, fma (|2x|, |y|, (x - y)^2)
+   has ~0.625 ULP.
 
    [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
@@ -48,6 +52,16 @@ static inline double
 kernel (double ax, double ay)
 {
   double t1, t2;
+#ifdef __FP_FAST_FMA
+  t1 = ay + ay;
+  t2 = ax - ay;
+
+  if (t1 >= ax)
+    return sqrt (fma (t1, ax, t2 * t2));
+  else
+    return sqrt (fma (ax, ax, ay * ay));
+
+#else
   double h = sqrt (ax * ax + ay * ay);
   if (h <= 2.0 * ay)
     {
@@ -64,6 +78,7 @@ kernel (double ax, double ay)
 
   h -= (t1 + t2) / (2.0 * h);
   return h;
+#endif
 }
 
 double
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96)
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
                   ` (2 preceding siblings ...)
  2021-12-03  0:00 ` [PATCH v4 03/12] math: Improve hypot performance with FMA Adhemerval Zanella
@ 2021-12-03  0:00 ` Adhemerval Zanella
  2021-12-06 12:00   ` Wilco Dijkstra
  2021-12-03  0:00 ` [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:00 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:

 - Handle qNaN and sNaN.
 - Tune the 'widely varying operands' to avoid spurious underflow
   due the multiplication and fix the return value for upwards
   rounding mode.
 - Handle required underflow exception for subnormal results.

The main advantage of the new algorithm is its precision.  With a
random 1e8 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.02% results with an error of
1 ulp (23158 results) while the new implementation only shows
0.0001% of total (111).

[1] https://arxiv.org/pdf/1904.09481.pdf
---
 sysdeps/ieee754/ldbl-96/e_hypotl.c | 231 ++++++++++++-----------------
 1 file changed, 98 insertions(+), 133 deletions(-)

diff --git a/sysdeps/ieee754/ldbl-96/e_hypotl.c b/sysdeps/ieee754/ldbl-96/e_hypotl.c
index 44e72353c0..b32fad757f 100644
--- a/sysdeps/ieee754/ldbl-96/e_hypotl.c
+++ b/sysdeps/ieee754/ldbl-96/e_hypotl.c
@@ -1,142 +1,107 @@
-/* e_hypotl.c -- long double version of e_hypot.c.
- */
-
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypotl(x,y)
- *
- * Method :
- *	If (assume round-to-nearest) z=x*x+y*y
- *	has error less than sqrt(2)/2 ulp, than
- *	sqrt(z) has error less than 1 ulp (exercise).
- *
- *	So, compute sqrt(x*x+y*y) with some care as
- *	follows to get the error below 1 ulp:
- *
- *	Assume x>y>0;
- *	(if possible, set rounding to round-to-nearest)
- *	1. if x > 2y  use
- *		x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- *	where x1 = x with lower 32 bits cleared, x2 = x-x1; else
- *	2. if x <= 2y use
- *		t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- *	where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
- *	y1= y with lower 32 bits chopped, y2 = y-y1.
- *
- *	NOTE: scaling may be necessary if some argument is too
- *	      large or too tiny
- *
- * Special cases:
- *	hypot(x,y) is INF if x or y is +INF or -INF; else
- *	hypot(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- *	hypot(x,y) returns sqrt(x^2+y^2) with error less
- *	than 1 ulps (units in the last place)
- */
+/* Euclidean distance function.  Long Double/Binary96 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This implementation is based on 'An Improved Algorithm for hypot(a,b)' by
+   Carlos F. Borges [1] using the MyHypot3 with the following changes:
+
+   - Handle qNaN and sNaN.
+   - Tune the 'widely varying operands' to avoid spurious underflow
+     due the multiplication and fix the return value for upwards
+     rounding mode.
+   - Handle required underflow exception for subnormal results.
+
+   [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
 #include <math.h>
 #include <math_private.h>
 #include <math-underflow.h>
 #include <libm-alias-finite.h>
 
-long double __ieee754_hypotl(long double x, long double y)
+#define SCALE      0x8p-8257L
+#define LARGE_VAL  0xb.504f333f9de6484p+8188L
+#define TINY_VAL   0x8p-8194L
+#define EPS        0x8p-68L
+
+/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
+   and squaring ax, ay and (ax - ay) does not overflow or underflow.  */
+static inline long double
+kernel (long double ax, long double ay)
 {
-	long double a,b,t1,t2,y1,y2,w;
-	uint32_t j,k,ea,eb;
-
-	GET_LDOUBLE_EXP(ea,x);
-	ea &= 0x7fff;
-	GET_LDOUBLE_EXP(eb,y);
-	eb &= 0x7fff;
-	if(eb > ea) {a=y;b=x;j=ea; ea=eb;eb=j;} else {a=x;b=y;}
-	SET_LDOUBLE_EXP(a,ea);	/* a <- |a| */
-	SET_LDOUBLE_EXP(b,eb);	/* b <- |b| */
-	if((ea-eb)>0x46) {return a+b;} /* x/y > 2**70 */
-	k=0;
-	if(__builtin_expect(ea > 0x5f3f,0)) {	/* a>2**8000 */
-	   if(ea == 0x7fff) {	/* Inf or NaN */
-	       uint32_t exp __attribute__ ((unused));
-	       uint32_t high,low;
-	       w = a+b;			/* for sNaN */
-	       if (issignaling (a) || issignaling (b))
-		 return w;
-	       GET_LDOUBLE_WORDS(exp,high,low,a);
-	       if(((high&0x7fffffff)|low)==0) w = a;
-	       GET_LDOUBLE_WORDS(exp,high,low,b);
-	       if(((eb^0x7fff)|(high&0x7fffffff)|low)==0) w = b;
-	       return w;
-	   }
-	   /* scale a and b by 2**-9600 */
-	   ea -= 0x2580; eb -= 0x2580;	k += 9600;
-	   SET_LDOUBLE_EXP(a,ea);
-	   SET_LDOUBLE_EXP(b,eb);
-	}
-	if(__builtin_expect(eb < 0x20bf, 0)) {	/* b < 2**-8000 */
-	    if(eb == 0) {	/* subnormal b or 0 */
-		uint32_t exp __attribute__ ((unused));
-		uint32_t high,low;
-		GET_LDOUBLE_WORDS(exp,high,low,b);
-		if((high|low)==0) return a;
-		SET_LDOUBLE_WORDS(t1, 0x7ffd, 0x80000000, 0); /* t1=2^16382 */
-		b *= t1;
-		a *= t1;
-		k -= 16382;
-		GET_LDOUBLE_EXP (ea, a);
-		GET_LDOUBLE_EXP (eb, b);
-		if (eb > ea)
-		  {
-		    t1 = a;
-		    a = b;
-		    b = t1;
-		    j = ea;
-		    ea = eb;
-		    eb = j;
-		  }
-	    } else {		/* scale a and b by 2^9600 */
-		ea += 0x2580;	/* a *= 2^9600 */
-		eb += 0x2580;	/* b *= 2^9600 */
-		k -= 9600;
-		SET_LDOUBLE_EXP(a,ea);
-		SET_LDOUBLE_EXP(b,eb);
-	    }
-	}
-    /* medium size a and b */
-	w = a-b;
-	if (w>b) {
-	    uint32_t high;
-	    GET_LDOUBLE_MSW(high,a);
-	    SET_LDOUBLE_WORDS(t1,ea,high,0);
-	    t2 = a-t1;
-	    w  = sqrtl(t1*t1-(b*(-b)-t2*(a+t1)));
-	} else {
-	    uint32_t high;
-	    GET_LDOUBLE_MSW(high,b);
-	    a  = a+a;
-	    SET_LDOUBLE_WORDS(y1,eb,high,0);
-	    y2 = b - y1;
-	    GET_LDOUBLE_MSW(high,a);
-	    SET_LDOUBLE_WORDS(t1,ea+1,high,0);
-	    t2 = a - t1;
-	    w  = sqrtl(t1*y1-(w*(-w)-(t1*y2+t2*b)));
-	}
-	if(k!=0) {
-	    uint32_t exp;
-	    t1 = 1.0;
-	    GET_LDOUBLE_EXP(exp,t1);
-	    SET_LDOUBLE_EXP(t1,exp+k);
-	    w *= t1;
-	    math_check_force_underflow_nonneg (w);
-	    return w;
-	} else return w;
+  long double t1, t2;
+  long double h = sqrtl (ax * ax + ay * ay);
+  if (h <= 2.0L * ay)
+    {
+      long double delta = h - ay;
+      t1 = ax * (2.0L * delta - ax);
+      t2 = (delta - 2.0L * (ax - ay)) * delta;
+    }
+  else
+    {
+      long double delta = h - ax;
+      t1 = 2.0L * delta * (ax - 2.0L * ay);
+      t2 = (4.0L * delta - ay) * ay + delta * delta;
+    }
+
+  h -= (t1 + t2) / (2.0L * h);
+  return h;
+}
+
+long double
+__ieee754_hypotl (long double x, long double y)
+{
+  if (!isfinite(x) || !isfinite(y))
+    {
+      if ((isinf (x) || isinf (y))
+	  && !issignaling (x) && !issignaling (y))
+	return INFINITY;
+      return x + y;
+    }
+
+  x = fabsl (x);
+  y = fabsl (y);
+
+  long double ax = x < y ? y : x;
+  long double ay = x < y ? x : y;
+
+  /* If ax is huge, scale both inputs down.  */
+  if (__glibc_unlikely (ax > LARGE_VAL))
+    {
+      if (__glibc_unlikely (ay <= ax * EPS))
+	return ax + ay;
+
+      return kernel (ax * SCALE, ay * SCALE) / SCALE;
+    }
+
+  /* If ay is tiny, scale both inputs up.  */
+  if (__glibc_unlikely (ay < TINY_VAL))
+    {
+      if (__glibc_unlikely (ax >= ay / EPS))
+	return ax;
+
+      ax = kernel (ax / SCALE, ay / SCALE) * SCALE;
+      math_check_force_underflow_nonneg (ax);
+      return ax;
+    }
+
+  /* Common case: ax is not huge and ay is not tiny.  */
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return ax + ay;
+
+  return kernel (ax, ay);
 }
 libm_alias_finite (__ieee754_hypotl, __hypotl)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128)
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
                   ` (3 preceding siblings ...)
  2021-12-03  0:00 ` [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
@ 2021-12-03  0:00 ` Adhemerval Zanella
  2021-12-06 11:58   ` Wilco Dijkstra
  2021-12-03  0:00 ` [PATCH v4 06/12] math: Remove powerpc e_hypot Adhemerval Zanella
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:00 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:

  - Handle qNaN and sNaN.
  - Tune the 'widely varying operands' to avoid spurious underflow
    due the multiplication and fix the return value for upwards
    rounding mode.
  - Handle required underflow exception for subnormal results.

The main advantage of the new algorithm is its precision.  With a
random 1e9 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.05% results with an error of
1 ulp (453266 results) while the new implementation only shows
0.0001% of total (1280).

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

[1] https://arxiv.org/pdf/1904.09481.pdf
---
 sysdeps/ieee754/ldbl-128/e_hypotl.c | 226 ++++++++++++----------------
 1 file changed, 96 insertions(+), 130 deletions(-)

diff --git a/sysdeps/ieee754/ldbl-128/e_hypotl.c b/sysdeps/ieee754/ldbl-128/e_hypotl.c
index cd4fdbc4a6..022fa9aaf7 100644
--- a/sysdeps/ieee754/ldbl-128/e_hypotl.c
+++ b/sysdeps/ieee754/ldbl-128/e_hypotl.c
@@ -1,141 +1,107 @@
-/* e_hypotl.c -- long double version of e_hypot.c.
- */
-
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypotl(x,y)
- *
- * Method :
- *	If (assume round-to-nearest) z=x*x+y*y
- *	has error less than sqrtl(2)/2 ulp, than
- *	sqrtl(z) has error less than 1 ulp (exercise).
- *
- *	So, compute sqrtl(x*x+y*y) with some care as
- *	follows to get the error below 1 ulp:
- *
- *	Assume x>y>0;
- *	(if possible, set rounding to round-to-nearest)
- *	1. if x > 2y  use
- *		x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- *	where x1 = x with lower 64 bits cleared, x2 = x-x1; else
- *	2. if x <= 2y use
- *		t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- *	where t1 = 2x with lower 64 bits cleared, t2 = 2x-t1,
- *	y1= y with lower 64 bits chopped, y2 = y-y1.
- *
- *	NOTE: scaling may be necessary if some argument is too
- *	      large or too tiny
- *
- * Special cases:
- *	hypotl(x,y) is INF if x or y is +INF or -INF; else
- *	hypotl(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- *	hypotl(x,y) returns sqrtl(x^2+y^2) with error less
- *	than 1 ulps (units in the last place)
- */
+/* Euclidean distance function.  Long Double/Binary128 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This implementation is based on 'An Improved Algorithm for hypot(a,b)' by
+   Carlos F. Borges [1] using the MyHypot3 with the following changes:
+
+   - Handle qNaN and sNaN.
+   - Tune the 'widely varying operands' to avoid spurious underflow
+     due the multiplication and fix the return value for upwards
+     rounding mode.
+   - Handle required underflow exception for subnormal results.
+
+   [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
 #include <math.h>
 #include <math_private.h>
 #include <math-underflow.h>
 #include <libm-alias-finite.h>
 
+#define SCALE      L(0x1p-8303)
+#define LARGE_VAL  L(0x1.6a09e667f3bcc908b2fb1366ea95p+8191)
+#define TINY_VAL   L(0x1p-8191)
+#define EPS        L(0x1p-114)
+
+/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
+   and squaring ax, ay and (ax - ay) does not overflow or underflow.  */
+static inline _Float128
+kernel (_Float128 ax, _Float128 ay)
+{
+  _Float128 t1, t2;
+  _Float128 h = sqrtl (ax * ax + ay * ay);
+  if (h <= L(2.0) * ay)
+    {
+      _Float128 delta = h - ay;
+      t1 = ax * (L(2.0) * delta - ax);
+      t2 = (delta - L(2.0) * (ax - ay)) * delta;
+    }
+  else
+    {
+      _Float128 delta = h - ax;
+      t1 = L(2.0) * delta * (ax - L(2.0) * ay);
+      t2 = (L(4.0) * delta - ay) * ay + delta * delta;
+    }
+
+  h -= (t1 + t2) / (L(2.0) * h);
+  return h;
+}
+
 _Float128
 __ieee754_hypotl(_Float128 x, _Float128 y)
 {
-	_Float128 a,b,t1,t2,y1,y2,w;
-	int64_t j,k,ha,hb;
-
-	GET_LDOUBLE_MSW64(ha,x);
-	ha &= 0x7fffffffffffffffLL;
-	GET_LDOUBLE_MSW64(hb,y);
-	hb &= 0x7fffffffffffffffLL;
-	if(hb > ha) {a=y;b=x;j=ha; ha=hb;hb=j;} else {a=x;b=y;}
-	SET_LDOUBLE_MSW64(a,ha);	/* a <- |a| */
-	SET_LDOUBLE_MSW64(b,hb);	/* b <- |b| */
-	if((ha-hb)>0x78000000000000LL) {return a+b;} /* x/y > 2**120 */
-	k=0;
-	if(ha > 0x5f3f000000000000LL) {	/* a>2**8000 */
-	   if(ha >= 0x7fff000000000000LL) {	/* Inf or NaN */
-	       uint64_t low;
-	       w = a+b;			/* for sNaN */
-	       if (issignaling (a) || issignaling (b))
-		 return w;
-	       GET_LDOUBLE_LSW64(low,a);
-	       if(((ha&0xffffffffffffLL)|low)==0) w = a;
-	       GET_LDOUBLE_LSW64(low,b);
-	       if(((hb^0x7fff000000000000LL)|low)==0) w = b;
-	       return w;
-	   }
-	   /* scale a and b by 2**-9600 */
-	   ha -= 0x2580000000000000LL;
-	   hb -= 0x2580000000000000LL;	k += 9600;
-	   SET_LDOUBLE_MSW64(a,ha);
-	   SET_LDOUBLE_MSW64(b,hb);
-	}
-	if(hb < 0x20bf000000000000LL) {	/* b < 2**-8000 */
-	    if(hb <= 0x0000ffffffffffffLL) {	/* subnormal b or 0 */
-		uint64_t low;
-		GET_LDOUBLE_LSW64(low,b);
-		if((hb|low)==0) return a;
-		t1=0;
-		SET_LDOUBLE_MSW64(t1,0x7ffd000000000000LL); /* t1=2^16382 */
-		b *= t1;
-		a *= t1;
-		k -= 16382;
-		GET_LDOUBLE_MSW64 (ha, a);
-		GET_LDOUBLE_MSW64 (hb, b);
-		if (hb > ha)
-		  {
-		    t1 = a;
-		    a = b;
-		    b = t1;
-		    j = ha;
-		    ha = hb;
-		    hb = j;
-		  }
-	    } else {		/* scale a and b by 2^9600 */
-		ha += 0x2580000000000000LL;	/* a *= 2^9600 */
-		hb += 0x2580000000000000LL;	/* b *= 2^9600 */
-		k -= 9600;
-		SET_LDOUBLE_MSW64(a,ha);
-		SET_LDOUBLE_MSW64(b,hb);
-	    }
-	}
-    /* medium size a and b */
-	w = a-b;
-	if (w>b) {
-	    t1 = 0;
-	    SET_LDOUBLE_MSW64(t1,ha);
-	    t2 = a-t1;
-	    w  = sqrtl(t1*t1-(b*(-b)-t2*(a+t1)));
-	} else {
-	    a  = a+a;
-	    y1 = 0;
-	    SET_LDOUBLE_MSW64(y1,hb);
-	    y2 = b - y1;
-	    t1 = 0;
-	    SET_LDOUBLE_MSW64(t1,ha+0x0001000000000000LL);
-	    t2 = a - t1;
-	    w  = sqrtl(t1*y1-(w*(-w)-(t1*y2+t2*b)));
-	}
-	if(k!=0) {
-	    uint64_t high;
-	    t1 = 1;
-	    GET_LDOUBLE_MSW64(high,t1);
-	    SET_LDOUBLE_MSW64(t1,high+(k<<48));
-	    w *= t1;
-	    math_check_force_underflow_nonneg (w);
-	    return w;
-	} else return w;
+  if (!isfinite(x) || !isfinite(y))
+    {
+      if ((isinf (x) || isinf (y))
+	  && !issignaling (x) && !issignaling (y))
+	return INFINITY;
+      return x + y;
+    }
+
+  x = fabsl (x);
+  y = fabsl (y);
+
+  _Float128 ax = x < y ? y : x;
+  _Float128 ay = x < y ? x : y;
+
+  /* If ax is huge, scale both inputs down.  */
+  if (__glibc_unlikely (ax > LARGE_VAL))
+    {
+      if (__glibc_unlikely (ay <= ax * EPS))
+	return ax + ay;
+
+      return kernel (ax * SCALE, ay * SCALE) / SCALE;
+    }
+
+  /* If ay is tiny, scale both inputs up.  */
+  if (__glibc_unlikely (ay < TINY_VAL))
+    {
+      if (__glibc_unlikely (ax >= ay / EPS))
+	return ax;
+
+      ax = kernel (ax / SCALE, ay / SCALE) * SCALE;
+      math_check_force_underflow_nonneg (ax);
+      return ax;
+    }
+
+  /* Common case: ax is not huge and ay is not tiny.  */
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return ax + ay;
+
+  return kernel (ax, ay);
 }
 libm_alias_finite (__ieee754_hypotl, __hypotl)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 06/12] math: Remove powerpc e_hypot
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
                   ` (4 preceding siblings ...)
  2021-12-03  0:00 ` [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
@ 2021-12-03  0:00 ` Adhemerval Zanella
  2021-12-06 17:12   ` Adhemerval Zanella
  2021-12-03  0:00 ` [PATCH v4 07/12] i386: Move hypot implementation to C Adhemerval Zanella
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:00 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

The generic implementation is shows only slight worse performance:

POWER9     reciprocal-throughput    latency
master                   13.4024    14.0967
new hypot                14.8479    15.8061

POWER8     reciprocal-throughput    latency
master                   15.5767    16.8885
new hypot                16.5371    18.4057

One way to improve might to make gcc generate xsmaxdp/xsmindp for
fmax/fmin (it onl does for -ffast-math, clang does for default
options).

Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
---
 sysdeps/powerpc/fpu/e_hypot.c                 | 87 -------------------
 sysdeps/powerpc/fpu/e_hypotf.c                | 78 -----------------
 .../powerpc32/power4/fpu/multiarch/Makefile   |  5 +-
 .../power4/fpu/multiarch/e_hypot-power7.c     | 23 -----
 .../power4/fpu/multiarch/e_hypot-ppc32.c      | 23 -----
 .../powerpc32/power4/fpu/multiarch/e_hypot.c  | 33 -------
 .../power4/fpu/multiarch/e_hypotf-power7.c    | 23 -----
 .../power4/fpu/multiarch/e_hypotf-ppc32.c     | 23 -----
 .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 -------
 9 files changed, 1 insertion(+), 327 deletions(-)
 delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c

diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c
deleted file mode 100644
index f96c589bbd..0000000000
--- a/sysdeps/powerpc/fpu/e_hypot.c
+++ /dev/null
@@ -1,87 +0,0 @@
-/* Pythagorean addition using doubles
-   Copyright (C) 2011-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Library General Public License as
-   published by the Free Software Foundation; either version 2 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Library General Public License for more details.
-
-   You should have received a copy of the GNU Library General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math-underflow.h>
-#include <stdint.h>
-#include <libm-alias-finite.h>
-
-/* __ieee754_hypot(x,y)
- *
- * This a FP only version without any FP->INT conversion.
- * It is similar to default C version, making appropriates
- * overflow and underflows checks as well scaling when it
- * is needed.
- */
-
-double
-__ieee754_hypot (double x, double y)
-{
-  if ((isinf (x) || isinf (y))
-      && !issignaling (x) && !issignaling (y))
-    return INFINITY;
-  if (isnan (x) || isnan (y))
-    return x + y;
-
-  x = fabs (x);
-  y = fabs (y);
-
-  if (y > x)
-    {
-      double t = x;
-      x = y;
-      y = t;
-    }
-  if (y == 0.0)
-    return x;
-
-  /* if y is higher enough, y * 2^60 might overflow. The tests if
-     y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the
-     appropriate check to avoid the overflow exception generation.  */
-  if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60))
-    return x + y;
-
-  if (x > 0x1p+500)
-    {
-      x *= 0x1p-600;
-      y *= 0x1p-600;
-      return sqrt (x * x + y * y) / 0x1p-600;
-    }
-  if (y < 0x1p-500)
-    {
-      if (y <= 0x0.fffffffffffffp-1022)
-	{
-	  x *= 0x1p+1022;
-	  y *= 0x1p+1022;
-	  double ret = sqrt (x * x + y * y) / 0x1p+1022;
-	  math_check_force_underflow_nonneg (ret);
-	  return ret;
-	}
-      else
-	{
-	  x *= 0x1p+600;
-	  y *= 0x1p+600;
-	  return sqrt (x * x + y * y) / 0x1p+600;
-	}
-    }
-  return sqrt (x * x + y * y);
-}
-#ifndef __ieee754_hypot
-libm_alias_finite (__ieee754_hypot, __hypot)
-#endif
diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c
deleted file mode 100644
index fa201dda51..0000000000
--- a/sysdeps/powerpc/fpu/e_hypotf.c
+++ /dev/null
@@ -1,78 +0,0 @@
-/* Pythagorean addition using floats
-   Copyright (C) 2011-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Library General Public License as
-   published by the Free Software Foundation; either version 2 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Library General Public License for more details.
-
-   You should have received a copy of the GNU Library General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <stdint.h>
-#include <libm-alias-finite.h>
-
-/* __ieee754_hypotf(x,y)
-
-   This a FP only version without any FP->INT conversion.
-   It is similar to default C version, making appropriates
-   overflow and underflows checks as using double precision
-   instead of scaling.  */
-
-#ifdef _ARCH_PWR7
-/* POWER7 isinf and isnan optimizations are fast. */
-# define TEST_INF_NAN(x, y)                                      \
-   if ((isinff(x) || isinff(y))					 \
-       && !issignaling (x) && !issignaling (y))			 \
-     return INFINITY;                                            \
-   if (isnanf(x) || isnanf(y))                                   \
-     return x + y;
-# else
-/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are
- * costly (especially for POWER6). */
-# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2)                         \
- do {                                                            \
-   ieee_float_shape_type gf_u1;                                  \
-   ieee_float_shape_type gf_u2;                                  \
-   gf_u1.value = (f1);                                           \
-   gf_u2.value = (f2);                                           \
-   (i1) = gf_u1.word & 0x7fffffff;                               \
-   (i2) = gf_u2.word & 0x7fffffff;                               \
- } while (0)
-
-# define TEST_INF_NAN(x, y)                                      \
- do {                                                            \
-   uint32_t hx, hy;                                              \
-   GET_TWO_FLOAT_WORD(x, y, hx, hy);                             \
-   if (hy > hx) {                                                \
-     uint32_t ht = hx; hx = hy; hy = ht;                         \
-   }                                                             \
-   if (hx >= 0x7f800000) {                                       \
-     if ((hx == 0x7f800000 || hy == 0x7f800000)			 \
-	 && !issignaling (x) && !issignaling (y))		 \
-       return INFINITY;                                          \
-     return x + y;						 \
-   }                                                             \
- } while (0)
-#endif
-
-
-float
-__ieee754_hypotf (float x, float y)
-{
-  TEST_INF_NAN (x, y);
-
-  return sqrt ((double) x * x + (double) y * y);
-}
-#ifndef __ieee754_hypotf
-libm_alias_finite (__ieee754_hypotf, __hypotf)
-#endif
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
index 60f2c95532..1de0f9b350 100644
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
@@ -15,8 +15,7 @@ libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \
 			s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \
 			s_modff-power5+ s_modff-ppc32 s_logbl-power7 \
 			s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \
-			s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \
-			e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32
+			s_logbf-power7 s_logbf-ppc32
 
 CFLAGS-s_llrintf-power6.c += -mcpu=power6
 CFLAGS-s_llrintf-ppc32.c += -mcpu=power4
@@ -35,8 +34,6 @@ CFLAGS-s_modff-power5+.c = -mcpu=power5+
 CFLAGS-s_logbl-power7.c = -mcpu=power7
 CFLAGS-s_logb-power7.c = -mcpu=power7
 CFLAGS-s_logbf-power7.c = -mcpu=power7
-CFLAGS-e_hypot-power7.c = -mcpu=power7
-CFLAGS-e_hypotf-power7.c = -mcpu=power7
 
 # These files quiet sNaNs in a way that is optimized away without
 # -fsignaling-nans.
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
deleted file mode 100644
index 382b4a0b27..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee_hypot() POWER7 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypot __ieee754_hypot_power7
-
-#include <sysdeps/powerpc/fpu/e_hypot.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
deleted file mode 100644
index abb14d5469..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee_hypot() PowerPC32 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypot __ieee754_hypot_ppc32
-
-#include <sysdeps/powerpc/fpu/e_hypot.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
deleted file mode 100644
index a16efa350c..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* Multiple versions of ieee754_hypot.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math_ldbl_opt.h>
-#include <libm-alias-finite.h>
-#include "init-arch.h"
-
-extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden;
-extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden;
-
-libc_ifunc (__ieee754_hypot,
-	    (hwcap & PPC_FEATURE_ARCH_2_06)
-	    ? __ieee754_hypot_power7
-            : __ieee754_hypot_ppc32);
-
-libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
deleted file mode 100644
index f8a26ff22f..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee754_hypot POWER7 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypotf __ieee754_hypotf_power7
-
-#include <sysdeps/powerpc/fpu/e_hypotf.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
deleted file mode 100644
index b13f8c9db2..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee_hypot() PowerPC32 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypotf __ieee754_hypotf_ppc32
-
-#include <sysdeps/ieee754/flt-32/e_hypotf.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
deleted file mode 100644
index 1e72605db8..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* Multiple versions of ieee754_hypotf.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math_ldbl_opt.h>
-#include <libm-alias-finite.h>
-#include "init-arch.h"
-
-extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden;
-extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden;
-
-libc_ifunc (__ieee754_hypotf,
-	    (hwcap & PPC_FEATURE_ARCH_2_06)
-	    ? __ieee754_hypotf_power7
-            : __ieee754_hypotf_ppc32);
-
-libm_alias_finite (__ieee754_hypotf, __hypotf)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 07/12] i386: Move hypot implementation to C
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
                   ` (5 preceding siblings ...)
  2021-12-03  0:00 ` [PATCH v4 06/12] math: Remove powerpc e_hypot Adhemerval Zanella
@ 2021-12-03  0:00 ` Adhemerval Zanella
  2021-12-03 14:51   ` Wilco Dijkstra
  2021-12-03  0:00 ` [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:00 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

The generic hypotf is slight slower, mostly due the tricks the assembly
does to optimize the isinf/isnan/issignaling.  The generic hypot is way
slower, since the optimized implementation uses the i386 default
excessive precision to issue the operation directly.  A similar
implementation is provided instead of using the generic implementation:

Checked on i686-linux-gnu.
---
 sysdeps/i386/fpu/e_hypot.S  | 75 -------------------------------------
 sysdeps/i386/fpu/e_hypot.c  | 46 +++++++++++++++++++++++
 sysdeps/i386/fpu/e_hypotf.S | 64 -------------------------------
 3 files changed, 46 insertions(+), 139 deletions(-)
 delete mode 100644 sysdeps/i386/fpu/e_hypot.S
 create mode 100644 sysdeps/i386/fpu/e_hypot.c
 delete mode 100644 sysdeps/i386/fpu/e_hypotf.S

diff --git a/sysdeps/i386/fpu/e_hypot.S b/sysdeps/i386/fpu/e_hypot.S
deleted file mode 100644
index f2c956b77a..0000000000
--- a/sysdeps/i386/fpu/e_hypot.S
+++ /dev/null
@@ -1,75 +0,0 @@
-/* Compute the hypothenuse of X and Y.
-   Copyright (C) 1998-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-#include <i386-math-asm.h>
-#include <libm-alias-finite.h>
-
-DEFINE_DBL_MIN
-
-#ifdef PIC
-# define MO(op) op##@GOTOFF(%edx)
-#else
-# define MO(op) op
-#endif
-
-	.text
-ENTRY(__ieee754_hypot)
-#ifdef  PIC
-	LOAD_PIC_REG (dx)
-#endif
-	fldl	4(%esp)		// x
-	fxam
-	fnstsw
-	fldl	12(%esp)	// y : x
-	movb	%ah, %ch
-	fxam
-	fnstsw
-	movb	%ah, %al
-	orb	%ch, %ah
-	sahf
-	jc	1f
-	fmul	%st(0)		// y * y : x
-	fxch			// x : y * y
-	fmul	%st(0)		// x * x : y * y
-	faddp			// x * x + y * y
-	fsqrt
-	DBL_NARROW_EVAL_UFLOW_NONNEG
-2:	ret
-
-	// We have to test whether any of the parameters is Inf.
-	// In this case the result is infinity.
-1:	andb	$0x45, %al
-	cmpb	$5, %al
-	je	3f		// jump if y is Inf
-	andb	$0x45, %ch
-	cmpb	$5, %ch
-	jne	4f		// jump if x is not Inf
-	fxch
-3:	fstp	%st(1)
-	fabs
-	jmp	2b
-
-4:	testb	$1, %al
-	jnz	5f		// y is NaN
-	fxch
-5:	fstp	%st(1)
-	jmp	2b
-
-END(__ieee754_hypot)
-libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/i386/fpu/e_hypot.c b/sysdeps/i386/fpu/e_hypot.c
new file mode 100644
index 0000000000..b7c068e734
--- /dev/null
+++ b/sysdeps/i386/fpu/e_hypot.c
@@ -0,0 +1,46 @@
+/* Euclidean distance function.  Double/Binary64 i386 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+#include <math_private.h>
+#include <math-underflow.h>
+#include <math-narrow-eval.h>
+#include <libm-alias-finite.h>
+#include <math_config.h>
+
+/* The i386 allows to use the default excess of precision to optimize the
+   hypot implementation, since internal multiplication and sqrt is carried
+   with 80-bit FP type.  */
+double
+__ieee754_hypot (double x, double y)
+{
+  if (!isfinite (x) || !isfinite (y))
+    {
+      if ((isinf (x) || isinf (y))
+	  && !issignaling (x) && !issignaling (y))
+	return INFINITY;
+      return x + y;
+    }
+
+  long double lx = x;
+  long double ly = y;
+  double r = math_narrow_eval (sqrtl (lx * lx + ly * ly));
+  math_check_force_underflow_nonneg (r);
+  return r;
+}
+libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/i386/fpu/e_hypotf.S b/sysdeps/i386/fpu/e_hypotf.S
deleted file mode 100644
index cec5d15403..0000000000
--- a/sysdeps/i386/fpu/e_hypotf.S
+++ /dev/null
@@ -1,64 +0,0 @@
-/* Compute the hypothenuse of X and Y.
-   Copyright (C) 1998-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-#include <i386-math-asm.h>
-#include <libm-alias-finite.h>
-
-	.text
-ENTRY(__ieee754_hypotf)
-	flds	4(%esp)		// x
-	fxam
-	fnstsw
-	flds	8(%esp)		// y : x
-	movb	%ah, %ch
-	fxam
-	fnstsw
-	movb	%ah, %al
-	orb	%ch, %ah
-	sahf
-	jc	1f
-	fmul	%st(0)		// y * y : x
-	fxch			// x : y * y
-	fmul	%st(0)		// x * x : y * y
-	faddp			// x * x + y * y
-	fsqrt
-	FLT_NARROW_EVAL
-2:	ret
-
-	// We have to test whether any of the parameters is Inf.
-	// In this case the result is infinity.
-1:	andb	$0x45, %al
-	cmpb	$5, %al
-	je	3f		// jump if y is Inf
-	andb	$0x45, %ch
-	cmpb	$5, %ch
-	jne	4f		// jump if x is not Inf
-	fxch
-3:	fstp	%st(1)
-	fabs
-	jmp	2b
-
-4:	testb	$1, %al
-	jnz	5f		// y is NaN
-	fxch
-5:	fstp	%st(1)
-	jmp	2b
-
-END(__ieee754_hypotf)
-libm_alias_finite (__ieee754_hypotf, __hypotf)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
                   ` (6 preceding siblings ...)
  2021-12-03  0:00 ` [PATCH v4 07/12] i386: Move hypot implementation to C Adhemerval Zanella
@ 2021-12-03  0:00 ` Adhemerval Zanella
  2021-12-06 11:52   ` Wilco Dijkstra
  2021-12-06 21:11   ` Joseph Myers
  2021-12-03  0:01 ` [PATCH v4 09/12] math: Add math-use-builtinds-fmin.h Adhemerval Zanella
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:00 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

It allows the architecture to use the builtin instead of generic
implementation.
---
 math/s_fmax_template.c                   | 27 ++++++++++++++++++++++++
 sysdeps/generic/math-use-builtins-fmax.h |  4 ++++
 sysdeps/generic/math-use-builtins.h      |  1 +
 3 files changed, 32 insertions(+)
 create mode 100644 sysdeps/generic/math-use-builtins-fmax.h

diff --git a/math/s_fmax_template.c b/math/s_fmax_template.c
index d817406f04..4417fdb283 100644
--- a/math/s_fmax_template.c
+++ b/math/s_fmax_template.c
@@ -17,10 +17,37 @@
    <https://www.gnu.org/licenses/>.  */
 
 #include <math.h>
+#include <math-use-builtins.h>
+
+#if __HAVE_FLOAT128
+# define USE_BUILTIN_F128 , _Float128 : USE_FMAXF128_BUILTIN
+# define BUILTIN_F128     , _Float128 :__builtin_fmaxf128
+#else
+# define USE_BUILTIN_F128
+# define BUILTIN_F128
+#endif
+
+#define USE_BUILTIN(X, Y)                   \
+  _Generic((X),                             \
+	   float       : USE_FMAXF_BUILTIN, \
+	   double      : USE_FMAX_BUILTIN,  \
+	   long double : USE_FMAXL_BUILTIN  \
+	   USE_BUILTIN_F128)
+
+#define BUILTIN(X, Y)                       \
+  _Generic((X),                             \
+	   float       : __builtin_fmaxf,   \
+	   double      : __builtin_fmax,    \
+	   long double : __builtin_fmaxl    \
+	   BUILTIN_F128)                    \
+  (X, Y)
 
 FLOAT
 M_DECL_FUNC (__fmax) (FLOAT x, FLOAT y)
 {
+  if (USE_BUILTIN (x, y))
+    return BUILTIN (x, y);
+
   if (isgreaterequal (x, y))
     return x;
   else if (isless (x, y))
diff --git a/sysdeps/generic/math-use-builtins-fmax.h b/sysdeps/generic/math-use-builtins-fmax.h
new file mode 100644
index 0000000000..8fc4efca6a
--- /dev/null
+++ b/sysdeps/generic/math-use-builtins-fmax.h
@@ -0,0 +1,4 @@
+#define USE_FMAX_BUILTIN 0
+#define USE_FMAXF_BUILTIN 0
+#define USE_FMAXL_BUILTIN 0
+#define USE_FMAXF128_BUILTIN 0
diff --git a/sysdeps/generic/math-use-builtins.h b/sysdeps/generic/math-use-builtins.h
index 19d2d1cf3c..e07bba242f 100644
--- a/sysdeps/generic/math-use-builtins.h
+++ b/sysdeps/generic/math-use-builtins.h
@@ -34,5 +34,6 @@
 #include <math-use-builtins-copysign.h>
 #include <math-use-builtins-sqrt.h>
 #include <math-use-builtins-fma.h>
+#include <math-use-builtins-fmax.h>
 
 #endif /* MATH_USE_BUILTINS_H  */
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 09/12] math: Add math-use-builtinds-fmin.h
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
                   ` (7 preceding siblings ...)
  2021-12-03  0:00 ` [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
@ 2021-12-03  0:01 ` Adhemerval Zanella
  2021-12-06 11:50   ` Wilco Dijkstra
  2021-12-03  0:01 ` [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:01 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

It allows the architecture to use the builtin instead of generic
implementation.
---
 math/s_fmin_template.c                   | 27 ++++++++++++++++++++++++
 sysdeps/generic/math-use-builtins-fmin.h |  4 ++++
 sysdeps/generic/math-use-builtins.h      |  1 +
 3 files changed, 32 insertions(+)
 create mode 100644 sysdeps/generic/math-use-builtins-fmin.h

diff --git a/math/s_fmin_template.c b/math/s_fmin_template.c
index 565a836266..27c2382f59 100644
--- a/math/s_fmin_template.c
+++ b/math/s_fmin_template.c
@@ -17,11 +17,38 @@
    <https://www.gnu.org/licenses/>.  */
 
 #include <math.h>
+#include <math-use-builtins.h>
+
+#if __HAVE_FLOAT128
+# define USE_BUILTIN_F128 , _Float128 : USE_FMAXF128_BUILTIN
+# define BUILTIN_F128     , _Float128 : __builtin_fminf128
+#else
+# define USE_BUILTIN_F128
+# define BUILTIN_F128
+#endif
+
+#define USE_BUILTIN(X, Y)                   \
+  _Generic((X),                             \
+	   float       : USE_FMAXF_BUILTIN, \
+	   double      : USE_FMAX_BUILTIN,  \
+	   long double : USE_FMAXL_BUILTIN  \
+	   USE_BUILTIN_F128)
+
+#define BUILTIN(X, Y)                       \
+  _Generic((X),                             \
+	   float       : __builtin_fminf,   \
+	   double      : __builtin_fmin,    \
+	   long double : __builtin_fminl    \
+	   BUILTIN_F128)                    \
+  (X, Y)
 
 
 FLOAT
 M_DECL_FUNC (__fmin) (FLOAT x, FLOAT y)
 {
+  if (USE_BUILTIN (x, y))
+    return BUILTIN (x, y);
+
   if (islessequal (x, y))
     return x;
   else if (isgreater (x, y))
diff --git a/sysdeps/generic/math-use-builtins-fmin.h b/sysdeps/generic/math-use-builtins-fmin.h
new file mode 100644
index 0000000000..d2383ce00c
--- /dev/null
+++ b/sysdeps/generic/math-use-builtins-fmin.h
@@ -0,0 +1,4 @@
+#define USE_FMIN_BUILTIN 0
+#define USE_FMINF_BUILTIN 0
+#define USE_FMINL_BUILTIN 0
+#define USE_FMINF128_BUILTIN 0
diff --git a/sysdeps/generic/math-use-builtins.h b/sysdeps/generic/math-use-builtins.h
index e07bba242f..24fba47575 100644
--- a/sysdeps/generic/math-use-builtins.h
+++ b/sysdeps/generic/math-use-builtins.h
@@ -35,5 +35,6 @@
 #include <math-use-builtins-sqrt.h>
 #include <math-use-builtins-fma.h>
 #include <math-use-builtins-fmax.h>
+#include <math-use-builtins-fmin.h>
 
 #endif /* MATH_USE_BUILTINS_H  */
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
                   ` (8 preceding siblings ...)
  2021-12-03  0:01 ` [PATCH v4 09/12] math: Add math-use-builtinds-fmin.h Adhemerval Zanella
@ 2021-12-03  0:01 ` Adhemerval Zanella
  2021-12-06 11:46   ` Wilco Dijkstra
  2021-12-03  0:01 ` [PATCH v4 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:01 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

It allows to remove the arch-specific implementations.
---
 sysdeps/aarch64/fpu/math-use-builtins-fmax.h |  4 +++
 sysdeps/aarch64/fpu/math-use-builtins-fmin.h |  4 +++
 sysdeps/aarch64/fpu/s_fmax.c                 | 28 --------------------
 sysdeps/aarch64/fpu/s_fmaxf.c                | 28 --------------------
 sysdeps/aarch64/fpu/s_fmin.c                 | 28 --------------------
 sysdeps/aarch64/fpu/s_fminf.c                | 28 --------------------
 6 files changed, 8 insertions(+), 112 deletions(-)
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c

diff --git a/sysdeps/aarch64/fpu/math-use-builtins-fmax.h b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h
new file mode 100644
index 0000000000..6b9e7b7692
--- /dev/null
+++ b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h
@@ -0,0 +1,4 @@
+#define USE_FMAX_BUILTIN 1
+#define USE_FMAXF_BUILTIN 10
+#define USE_FMAXL_BUILTIN 0
+#define USE_FMAXF128_BUILTIN 0
diff --git a/sysdeps/aarch64/fpu/math-use-builtins-fmin.h b/sysdeps/aarch64/fpu/math-use-builtins-fmin.h
new file mode 100644
index 0000000000..7fd6b45fce
--- /dev/null
+++ b/sysdeps/aarch64/fpu/math-use-builtins-fmin.h
@@ -0,0 +1,4 @@
+#define USE_FMIN_BUILTIN 1
+#define USE_FMINF_BUILTIN 1
+#define USE_FMINL_BUILTIN 0
+#define USE_FMINF128_BUILTIN 0
diff --git a/sysdeps/aarch64/fpu/s_fmax.c b/sysdeps/aarch64/fpu/s_fmax.c
deleted file mode 100644
index 7e3593fbda..0000000000
--- a/sysdeps/aarch64/fpu/s_fmax.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 2011-2021 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <libm-alias-double.h>
-
-double
-__fmax (double x, double y)
-{
-  return __builtin_fmax (x, y);
-}
-
-libm_alias_double (__fmax, fmax)
diff --git a/sysdeps/aarch64/fpu/s_fmaxf.c b/sysdeps/aarch64/fpu/s_fmaxf.c
deleted file mode 100644
index eb4c469ef8..0000000000
--- a/sysdeps/aarch64/fpu/s_fmaxf.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 2011-2021 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <libm-alias-float.h>
-
-float
-__fmaxf (float x, float y)
-{
-  return __builtin_fmaxf (x, y);
-}
-
-libm_alias_float (__fmax, fmax)
diff --git a/sysdeps/aarch64/fpu/s_fmin.c b/sysdeps/aarch64/fpu/s_fmin.c
deleted file mode 100644
index efdee1ed39..0000000000
--- a/sysdeps/aarch64/fpu/s_fmin.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 1996-2021 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <libm-alias-double.h>
-
-double
-__fmin (double x, double y)
-{
-  return __builtin_fmin (x, y);
-}
-
-libm_alias_double (__fmin, fmin)
diff --git a/sysdeps/aarch64/fpu/s_fminf.c b/sysdeps/aarch64/fpu/s_fminf.c
deleted file mode 100644
index 3665fabb6b..0000000000
--- a/sysdeps/aarch64/fpu/s_fminf.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 2011-2021 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <libm-alias-float.h>
-
-float
-__fminf (float x, float y)
-{
-  return __builtin_fminf (x, y);
-}
-
-libm_alias_float (__fmin, fmin)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 11/12] math: Use fmin/fmax on hypot
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
                   ` (9 preceding siblings ...)
  2021-12-03  0:01 ` [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
@ 2021-12-03  0:01 ` Adhemerval Zanella
  2021-12-06 11:44   ` Wilco Dijkstra
  2021-12-03  0:01 ` [PATCH v4 12/12] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella
  2021-12-03  8:51 ` [PATCH v4 00/12] Improve hypot Paul Zimmermann
  12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:01 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

It optimizes for architectures that provides fast builtins.

Checked on aarch64-linux-gnu.
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index f53061badc..ce51784d27 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -38,6 +38,7 @@
 #include <math_private.h>
 #include <math-underflow.h>
 #include <math-narrow-eval.h>
+#include <math-use-builtins.h>
 #include <libm-alias-finite.h>
 #include "math_config.h"
 
@@ -95,8 +96,8 @@ __ieee754_hypot (double x, double y)
   x = fabs (x);
   y = fabs (y);
 
-  double ax = x < y ? y : x;
-  double ay = x < y ? x : y;
+  double ax = USE_FMAX_BUILTIN ? fmax (x, y) : (x < y ? y : x);
+  double ay = USE_FMIN_BUILTIN ? fmin (x, y) : (x < y ? x : y);
 
   /* If ax is huge, scale both inputs down.  */
   if (__glibc_unlikely (ax > LARGE_VAL))
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 12/12] math: Remove the error handling wrapper from hypot and hypotf
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
                   ` (10 preceding siblings ...)
  2021-12-03  0:01 ` [PATCH v4 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
@ 2021-12-03  0:01 ` Adhemerval Zanella
  2021-12-03  8:51 ` [PATCH v4 00/12] Improve hypot Paul Zimmermann
  12 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03  0:01 UTC (permalink / raw)
  To: libc-alpha, Paul Zimmermann, Wilco Dijkstra

The error handling is moved to sysdeps/ieee754 version with no SVID
support.  The compatibility symbol versions still use the wrapper with
SVID error handling around the new code.  There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
---
 math/Versions                                 |  2 ++
 math/w_hypot.c                                |  8 ++++++
 math/w_hypot_compat.c                         | 13 +++++++---
 math/w_hypotf.c                               |  8 ++++++
 math/w_hypotf_compat.c                        |  6 ++---
 sysdeps/i386/fpu/e_hypot.c                    | 14 +++++++++-
 sysdeps/ieee754/dbl-64/e_hypot.c              | 26 +++++++++++++++----
 sysdeps/ieee754/dbl-64/w_hypot.c              |  1 +
 sysdeps/ieee754/flt-32/e_hypotf.c             | 21 +++++++++++----
 sysdeps/ieee754/flt-32/w_hypotf.c             |  1 +
 sysdeps/mach/hurd/i386/libm.abilist           |  2 ++
 sysdeps/unix/sysv/linux/aarch64/libm.abilist  |  2 ++
 sysdeps/unix/sysv/linux/alpha/libm.abilist    |  2 ++
 sysdeps/unix/sysv/linux/arm/be/libm.abilist   |  2 ++
 sysdeps/unix/sysv/linux/arm/le/libm.abilist   |  2 ++
 sysdeps/unix/sysv/linux/hppa/libm.abilist     |  2 ++
 sysdeps/unix/sysv/linux/i386/libm.abilist     |  2 ++
 .../sysv/linux/m68k/coldfire/libm.abilist     |  2 ++
 .../unix/sysv/linux/m68k/m680x0/libm.abilist  |  2 ++
 .../sysv/linux/microblaze/be/libm.abilist     |  2 ++
 .../sysv/linux/microblaze/le/libm.abilist     |  2 ++
 .../unix/sysv/linux/mips/mips32/libm.abilist  |  2 ++
 .../unix/sysv/linux/mips/mips64/libm.abilist  |  2 ++
 sysdeps/unix/sysv/linux/nios2/libm.abilist    |  2 ++
 .../linux/powerpc/powerpc32/fpu/libm.abilist  |  2 ++
 .../powerpc/powerpc32/nofpu/libm.abilist      |  2 ++
 .../linux/powerpc/powerpc64/be/libm.abilist   |  2 ++
 .../linux/powerpc/powerpc64/le/libm.abilist   |  2 ++
 .../unix/sysv/linux/s390/s390-32/libm.abilist |  2 ++
 .../unix/sysv/linux/s390/s390-64/libm.abilist |  2 ++
 sysdeps/unix/sysv/linux/sh/be/libm.abilist    |  2 ++
 sysdeps/unix/sysv/linux/sh/le/libm.abilist    |  2 ++
 .../sysv/linux/sparc/sparc32/libm.abilist     |  2 ++
 .../sysv/linux/sparc/sparc64/libm.abilist     |  2 ++
 .../unix/sysv/linux/x86_64/64/libm.abilist    |  2 ++
 .../unix/sysv/linux/x86_64/x32/libm.abilist   |  2 ++
 36 files changed, 135 insertions(+), 17 deletions(-)
 create mode 100644 math/w_hypot.c
 create mode 100644 math/w_hypotf.c
 create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c
 create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c

diff --git a/math/Versions b/math/Versions
index a4b5405ddc..fef7a08c3b 100644
--- a/math/Versions
+++ b/math/Versions
@@ -628,5 +628,7 @@ libm {
     fminimum_numf64x; fminimum_numf128;
     fminimum_magf64x; fminimum_magf128;
     fminimum_mag_numf64x; fminimum_mag_numf128;
+    # No SVID compatible error handling.
+    hypotf; hypot;
   }
 }
diff --git a/math/w_hypot.c b/math/w_hypot.c
new file mode 100644
index 0000000000..66f723a896
--- /dev/null
+++ b/math/w_hypot.c
@@ -0,0 +1,8 @@
+#include <math-type-macros-float.h>
+#undef __USE_WRAPPER_TEMPLATE
+#define __USE_WRAPPER_TEMPLATE 1
+#undef declare_mgen_alias
+#define declare_mgen_alias(a, b)
+#include <w_hypot_template.c>
+versioned_symbol (libm, __hypot, hypot, GLIBC_2_35);
+libm_alias_float_other (__hypot, hypot)
diff --git a/math/w_hypot_compat.c b/math/w_hypot_compat.c
index f07039cc51..ec983a4ab8 100644
--- a/math/w_hypot_compat.c
+++ b/math/w_hypot_compat.c
@@ -20,9 +20,9 @@
 #include <libm-alias-double.h>
 
 
-#if LIBM_SVID_COMPAT
+#if LIBM_SVID_COMPAT && SHLIB_COMPAT (libm, GLIBC_2_0, GLIBC_2_35)
 double
-__hypot (double x, double y)
+__hypot_compat (double x, double y)
 {
 	double z = __ieee754_hypot(x,y);
 	if(__builtin_expect(!isfinite(z), 0)
@@ -31,5 +31,12 @@ __hypot (double x, double y)
 
 	return z;
 }
-libm_alias_double (__hypot, hypot)
+compat_symbol (libm, __hypot_compat, hypot, GLIBC_2_0);
+# ifdef NO_LONG_DOUBLE
+weak_alias (__hypot_compat, hypotl)
+# endif
+# ifdef LONG_DOUBLE_COMPAT
+LONG_DOUBLE_COMPAT_CHOOSE_libm_hypotl (
+  compat_symbol (libm, __hypot_compat, hypotl, FIRST_VERSION_libm_hypotl), );
+# endif
 #endif
diff --git a/math/w_hypotf.c b/math/w_hypotf.c
new file mode 100644
index 0000000000..b15a9b06d0
--- /dev/null
+++ b/math/w_hypotf.c
@@ -0,0 +1,8 @@
+#include <math-type-macros-float.h>
+#undef __USE_WRAPPER_TEMPLATE
+#define __USE_WRAPPER_TEMPLATE 1
+#undef declare_mgen_alias
+#define declare_mgen_alias(a, b)
+#include <w_hypot_template.c>
+versioned_symbol (libm, __hypotf, hypotf, GLIBC_2_35);
+libm_alias_float_other (__hypotf, hypotf)
diff --git a/math/w_hypotf_compat.c b/math/w_hypotf_compat.c
index 05898d3420..2bde4553b0 100644
--- a/math/w_hypotf_compat.c
+++ b/math/w_hypotf_compat.c
@@ -22,9 +22,9 @@
 #include <libm-alias-float.h>
 
 
-#if LIBM_SVID_COMPAT
+#if LIBM_SVID_COMPAT && SHLIB_COMPAT (libm, GLIBC_2_0, GLIBC_2_35)
 float
-__hypotf(float x, float y)
+__hypotf_compat (float x, float y)
 {
 	float z = __ieee754_hypotf(x,y);
 	if(__builtin_expect(!isfinite(z), 0)
@@ -34,5 +34,5 @@ __hypotf(float x, float y)
 
 	return z;
 }
-libm_alias_float (__hypot, hypot)
+compat_symbol (libm, __hypotf_compat, hypotf, GLIBC_2_0);
 #endif
diff --git a/sysdeps/i386/fpu/e_hypot.c b/sysdeps/i386/fpu/e_hypot.c
index b7c068e734..a0c5734b68 100644
--- a/sysdeps/i386/fpu/e_hypot.c
+++ b/sysdeps/i386/fpu/e_hypot.c
@@ -20,14 +20,17 @@
 #include <math_private.h>
 #include <math-underflow.h>
 #include <math-narrow-eval.h>
+#include <math-svid-compat.h>
 #include <libm-alias-finite.h>
+#include <libm-alias-double.h>
 #include <math_config.h>
+#include <errno.h>
 
 /* The i386 allows to use the default excess of precision to optimize the
    hypot implementation, since internal multiplication and sqrt is carried
    with 80-bit FP type.  */
 double
-__ieee754_hypot (double x, double y)
+__hypot (double x, double y)
 {
   if (!isfinite (x) || !isfinite (y))
     {
@@ -41,6 +44,15 @@ __ieee754_hypot (double x, double y)
   long double ly = y;
   double r = math_narrow_eval (sqrtl (lx * lx + ly * ly));
   math_check_force_underflow_nonneg (r);
+  if (isinf (r))
+    __set_errno (ERANGE);
   return r;
 }
+strong_alias (__hypot, __ieee754_hypot)
+#if LIBM_SVID_COMPAT
+versioned_symbol (libm, __hypot, hypot, GLIBC_2_35);
 libm_alias_finite (__ieee754_hypot, __hypot)
+libm_alias_double_other (__hypot, hypot)
+#else
+libm_alias_double (__hypot, hypot)
+#endif
diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index ce51784d27..54f936c82b 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -34,12 +34,15 @@
 
    [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
+#include <errno.h>
 #include <math.h>
 #include <math_private.h>
 #include <math-underflow.h>
 #include <math-narrow-eval.h>
 #include <math-use-builtins.h>
+#include <math-svid-compat.h>
 #include <libm-alias-finite.h>
+#include <libm-alias-double.h>
 #include "math_config.h"
 
 #define SCALE     0x1p-600
@@ -47,6 +50,14 @@
 #define TINY_VAL  0x1p-459
 #define EPS       0x1p-54
 
+static inline double
+handle_errno (double r)
+{
+  if (isinf (r))
+    __set_errno (ERANGE);
+  return r;
+}
+
 /* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
    and squaring ax, ay and (ax - ay) does not overflow or underflow.  */
 static inline double
@@ -83,7 +94,7 @@ kernel (double ax, double ay)
 }
 
 double
-__ieee754_hypot (double x, double y)
+__hypot (double x, double y)
 {
   if (!isfinite(x) || !isfinite(y))
     {
@@ -103,9 +114,9 @@ __ieee754_hypot (double x, double y)
   if (__glibc_unlikely (ax > LARGE_VAL))
     {
       if (__glibc_unlikely (ay <= ax * EPS))
-	return ax + ay;
- 
-      return kernel (ax * SCALE, ay * SCALE) / SCALE;
+	return handle_errno (ax + ay);
+
+      return handle_errno (kernel (ax * SCALE, ay * SCALE) / SCALE);
     }
 
   /* If ay is tiny, scale both inputs up.  */
@@ -125,6 +136,11 @@ __ieee754_hypot (double x, double y)
 
   return kernel (ax, ay);
 }
-#ifndef __ieee754_hypot
+strong_alias (__hypot, __ieee754_hypot)
 libm_alias_finite (__ieee754_hypot, __hypot)
+#if LIBM_SVID_COMPAT
+versioned_symbol (libm, __hypot, hypot, GLIBC_2_35);
+libm_alias_double_other (__hypot, hypot)
+#else
+libm_alias_double (__hypot, hypot)
 #endif
diff --git a/sysdeps/ieee754/dbl-64/w_hypot.c b/sysdeps/ieee754/dbl-64/w_hypot.c
new file mode 100644
index 0000000000..1cc8931700
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/w_hypot.c
@@ -0,0 +1 @@
+/* Not needed.  */
diff --git a/sysdeps/ieee754/flt-32/e_hypotf.c b/sysdeps/ieee754/flt-32/e_hypotf.c
index 1d082fe36c..809c00c11e 100644
--- a/sysdeps/ieee754/flt-32/e_hypotf.c
+++ b/sysdeps/ieee754/flt-32/e_hypotf.c
@@ -16,14 +16,17 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
+#include <errno.h>
 #include <libm-alias-finite.h>
+#include <libm-alias-float.h>
+#include <math-svid-compat.h>
 #include <math.h>
 #include "math_config.h"
 #include <math-narrow-eval.h>
 #include <math_private.h>
 
 float
-__ieee754_hypotf (float x, float y)
+__hypotf (float x, float y)
 {
    if (!isfinite(x) || !isfinite(y))
      {
@@ -33,9 +36,17 @@ __ieee754_hypotf (float x, float y)
        return x + y;
      }
 
-  return math_narrow_eval (sqrt ((double) x * (double) x
-				 + (double) y * (double) y));
+  float r = math_narrow_eval (sqrt ((double) x * (double) x
+				     + (double) y * (double) y));
+  if (!isfinite (r))
+    __set_errno (ERANGE);
+  return r;
 }
-#ifndef __ieee754_hypotf
-libm_alias_finite (__ieee754_hypotf, __hypotf)
+strong_alias (__hypotf, __ieee754_hypotf)
+#if LIBM_SVID_COMPAT
+versioned_symbol (libm, __hypotf, hypotf, GLIBC_2_35);
+libm_alias_float_other (__hypot, hypot)
+#else
+libm_alias_float (__hypot, hypot)
 #endif
+libm_alias_finite (__ieee754_hypotf, __hypotf)
diff --git a/sysdeps/ieee754/flt-32/w_hypotf.c b/sysdeps/ieee754/flt-32/w_hypotf.c
new file mode 100644
index 0000000000..1cc8931700
--- /dev/null
+++ b/sysdeps/ieee754/flt-32/w_hypotf.c
@@ -0,0 +1 @@
+/* Not needed.  */
diff --git a/sysdeps/mach/hurd/i386/libm.abilist b/sysdeps/mach/hurd/i386/libm.abilist
index abf91bd142..8f40ddb150 100644
--- a/sysdeps/mach/hurd/i386/libm.abilist
+++ b/sysdeps/mach/hurd/i386/libm.abilist
@@ -1179,3 +1179,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libm.abilist b/sysdeps/unix/sysv/linux/aarch64/libm.abilist
index 1cef7d3db7..c2e3c6453e 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libm.abilist
@@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/alpha/libm.abilist b/sysdeps/unix/sysv/linux/alpha/libm.abilist
index 59d51021fa..4f85b6180f 100644
--- a/sysdeps/unix/sysv/linux/alpha/libm.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libm.abilist
@@ -1201,6 +1201,8 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libm.abilist b/sysdeps/unix/sysv/linux/arm/be/libm.abilist
index 44666ad7cd..36190add84 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libm.abilist
@@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 _LIB_VERSION D 0x4
 GLIBC_2.4 __clog10 F
 GLIBC_2.4 __clog10f F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libm.abilist b/sysdeps/unix/sysv/linux/arm/le/libm.abilist
index 44666ad7cd..36190add84 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libm.abilist
@@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 _LIB_VERSION D 0x4
 GLIBC_2.4 __clog10 F
 GLIBC_2.4 __clog10f F
diff --git a/sysdeps/unix/sysv/linux/hppa/libm.abilist b/sysdeps/unix/sysv/linux/hppa/libm.abilist
index 35d316a720..b5dd4e851f 100644
--- a/sysdeps/unix/sysv/linux/hppa/libm.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/i386/libm.abilist b/sysdeps/unix/sysv/linux/i386/libm.abilist
index ef99b3e104..5d89aaa08e 100644
--- a/sysdeps/unix/sysv/linux/i386/libm.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libm.abilist
@@ -1186,3 +1186,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist
index 44666ad7cd..36190add84 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist
@@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 _LIB_VERSION D 0x4
 GLIBC_2.4 __clog10 F
 GLIBC_2.4 __clog10f F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist
index 58316c96ae..e7cd739a54 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist
@@ -882,3 +882,5 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist
index b5e5da0272..274ecff630 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist
@@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist
index b5e5da0272..274ecff630 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist
@@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist b/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist
index 4113d3170d..08b902118d 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist b/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist
index 18fe9cc57a..09bb3bd75b 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist
@@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/nios2/libm.abilist b/sysdeps/unix/sysv/linux/nios2/libm.abilist
index 3a2b34ecc2..11abbb5668 100644
--- a/sysdeps/unix/sysv/linux/nios2/libm.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libm.abilist
@@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist
index 740cc8f55b..1688809c36 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist
@@ -888,6 +888,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist
index 16fb30566b..e880cebd78 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist
@@ -887,6 +887,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist
index ad4b98c09a..033385dfc1 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist
@@ -881,6 +881,8 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist
index 955765051c..7923d428bc 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist
@@ -1316,3 +1316,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist
index 1f5bd7754d..9a84163089 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist
@@ -1145,6 +1145,8 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist
index 0b18481f39..174bde4fa0 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist
@@ -1145,6 +1145,8 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libm.abilist b/sysdeps/unix/sysv/linux/sh/be/libm.abilist
index f525a9e77e..1e1324d667 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libm.abilist b/sysdeps/unix/sysv/linux/sh/le/libm.abilist
index f525a9e77e..1e1324d667 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist
index 727d1ce707..217e6eff7f 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist
@@ -1152,6 +1152,8 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
 GLIBC_2.4 __clog10l F
 GLIBC_2.4 __finitel F
 GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist
index 0f57574523..6b53b0c59f 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist
@@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
index 574789e061..dbefbc3a1a 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
@@ -1177,3 +1177,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist
index 1dc89b304d..8001d0f219 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist
@@ -1177,3 +1177,5 @@ GLIBC_2.35 fminimumf64x F
 GLIBC_2.35 fminimuml F
 GLIBC_2.35 fsqrt F
 GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
-- 
2.32.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 00/12] Improve hypot
  2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
                   ` (11 preceding siblings ...)
  2021-12-03  0:01 ` [PATCH v4 12/12] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella
@ 2021-12-03  8:51 ` Paul Zimmermann
  2021-12-06 12:36   ` Adhemerval Zanella
  12 siblings, 1 reply; 36+ messages in thread
From: Paul Zimmermann @ 2021-12-03  8:51 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha, Wilco.Dijkstra

       Dear Adhemerval,

a few typos in your commit message:

> when fast FMA is avaliable
should be "available"

> Use a newer algotihm
should be "algorithm"

interger -> integer

>  - An improvement from 233442 to 1274 results with 1 ulp of
>    error for Binary96 (x86_64).
>  - An improvement from 453045 to 1294 results with 1 ulp of
>    error for Binary96 (x86_64).

should the latter be binary128?

> If FMA is uses
should be "used"

adapated -> adapted

> If FMA is uses the binary64 shows a slight worse precision:

I confirm, I find:

   binary64 0.948812 -0x0.5a22c27a3893p-1022,0x0.9cfea180c00dap-1022

Best regards,
Paul

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 01/12] math: Simplify hypotf implementation
  2021-12-03  0:00 ` [PATCH v4 01/12] math: Simplify hypotf implementation Adhemerval Zanella
@ 2021-12-03 13:23   ` Wilco Dijkstra
  2021-12-03 19:44     ` Adhemerval Zanella
  0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-03 13:23 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Paul Zimmermann

Hi Adhemerval,

-       return (float) sqrt(d_x * d_x + d_y * d_y);
+  return math_narrow_eval (sqrt ((double) x * (double) x
+                                + (double) y * (double) y));

This still requires a cast to float after the sqrt, otherwise math_narrow_eval
has no effect. OK with that change.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64)
  2021-12-03  0:00 ` [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
@ 2021-12-03 13:41   ` Wilco Dijkstra
  2021-12-03 19:44     ` Adhemerval Zanella
  0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-03 13:41 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Paul Zimmermann

Hi Adhemeval,

+  /* If ax is huge, scale both inputs down.  */
+  if (__glibc_unlikely (ax > LARGE_VAL))
     {
+      if (__glibc_unlikely (ay <= ax * EPS))
+       return ax + ay;
+ 
+      return kernel (ax * SCALE, ay * SCALE) / SCALE;
     }

Both of these returns can overflow and thus should have a math_narrow_eval
(only the case where we can't overflow or underflow doesn't). OK with that change.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 03/12] math: Improve hypot performance with FMA
  2021-12-03  0:00 ` [PATCH v4 03/12] math: Improve hypot performance with FMA Adhemerval Zanella
@ 2021-12-03 13:44   ` Wilco Dijkstra
  0 siblings, 0 replies; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-03 13:44 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Paul Zimmermann

Hi,

Looks good to me.

Wilco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 07/12] i386: Move hypot implementation to C
  2021-12-03  0:00 ` [PATCH v4 07/12] i386: Move hypot implementation to C Adhemerval Zanella
@ 2021-12-03 14:51   ` Wilco Dijkstra
  2021-12-06 12:26     ` Adhemerval Zanella
  0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-03 14:51 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Paul Zimmermann

Hi Adhemerval,

+  double r = math_narrow_eval (sqrtl (lx * lx + ly * ly));

Same problem here, this needs a cast to double.

Wilco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 01/12] math: Simplify hypotf implementation
  2021-12-03 13:23   ` Wilco Dijkstra
@ 2021-12-03 19:44     ` Adhemerval Zanella
  0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 19:44 UTC (permalink / raw)
  To: Wilco Dijkstra, libc-alpha, Paul Zimmermann



On 03/12/2021 10:23, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
> -       return (float) sqrt(d_x * d_x + d_y * d_y);
> +  return math_narrow_eval (sqrt ((double) x * (double) x
> +                                + (double) y * (double) y));
> 
> This still requires a cast to float after the sqrt, otherwise math_narrow_eval
> has no effect. OK with that change.

Ack.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64)
  2021-12-03 13:41   ` Wilco Dijkstra
@ 2021-12-03 19:44     ` Adhemerval Zanella
  0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 19:44 UTC (permalink / raw)
  To: Wilco Dijkstra, libc-alpha, Paul Zimmermann



On 03/12/2021 10:41, Wilco Dijkstra wrote:
> Hi Adhemeval,
> 
> +  /* If ax is huge, scale both inputs down.  */
> +  if (__glibc_unlikely (ax > LARGE_VAL))
>      {
> +      if (__glibc_unlikely (ay <= ax * EPS))
> +       return ax + ay;
> + 
> +      return kernel (ax * SCALE, ay * SCALE) / SCALE;
>      }
> 
> Both of these returns can overflow and thus should have a math_narrow_eval
> (only the case where we can't overflow or underflow doesn't). OK with that change.

Ack.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 11/12] math: Use fmin/fmax on hypot
  2021-12-03  0:01 ` [PATCH v4 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
@ 2021-12-06 11:44   ` Wilco Dijkstra
  0 siblings, 0 replies; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 11:44 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Paul Zimmermann

Hi Adhemerval,

@@ -95,8 +96,8 @@ __ieee754_hypot (double x, double y)
   x = fabs (x);
   y = fabs (y);
 
-  double ax = x < y ? y : x;
-  double ay = x < y ? x : y;
+  double ax = USE_FMAX_BUILTIN ? fmax (x, y) : (x < y ? y : x);
+  double ay = USE_FMIN_BUILTIN ? fmin (x, y) : (x < y ? x : y);
 
Looks good.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h
  2021-12-03  0:01 ` [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
@ 2021-12-06 11:46   ` Wilco Dijkstra
  2021-12-06 12:35     ` Adhemerval Zanella
  0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 11:46 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Paul Zimmermann

Hi Adhemerval,

+++ b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h
@@ -0,0 +1,4 @@
+#define USE_FMAX_BUILTIN 1
+#define USE_FMAXF_BUILTIN 10

Typo: 1 rather than 10

Looks good otherwise.

Wilco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 09/12] math: Add math-use-builtinds-fmin.h
  2021-12-03  0:01 ` [PATCH v4 09/12] math: Add math-use-builtinds-fmin.h Adhemerval Zanella
@ 2021-12-06 11:50   ` Wilco Dijkstra
  0 siblings, 0 replies; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 11:50 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Paul Zimmermann

Hi Adhemerval,

 math/s_fmin_template.c                   | 27 ++++++++++++++++++++++++
 sysdeps/generic/math-use-builtins-fmin.h |  4 ++++
 sysdeps/generic/math-use-builtins.h      |  1 +
 3 files changed, 32 insertions(+)
 create mode 100644 sysdeps/generic/math-use-builtins-fmin.h

Looks good to me.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h
  2021-12-03  0:00 ` [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
@ 2021-12-06 11:52   ` Wilco Dijkstra
  2021-12-06 21:11   ` Joseph Myers
  1 sibling, 0 replies; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 11:52 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Paul Zimmermann

Hi Adhemerval,

 math/s_fmax_template.c                   | 27 ++++++++++++++++++++++++
 sysdeps/generic/math-use-builtins-fmax.h |  4 ++++
 sysdeps/generic/math-use-builtins.h      |  1 +
 3 files changed, 32 insertions(+)
 create mode 100644 sysdeps/generic/math-use-builtins-fmax.h

Looks good.

Wilco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128)
  2021-12-03  0:00 ` [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
@ 2021-12-06 11:58   ` Wilco Dijkstra
  2021-12-06 12:22     ` Adhemerval Zanella
  0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 11:58 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Paul Zimmermann

Hi Adhemerval,


+  /* If ay is tiny, scale both inputs up.  */
+  if (__glibc_unlikely (ay < TINY_VAL))
+    {
+      if (__glibc_unlikely (ax >= ay / EPS))
+       return ax;

Why not return ax + ay like in the other special cases? It's more accurate for rounding
upwards, and it's best to keep the code as similar as possible as the double version.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96)
  2021-12-03  0:00 ` [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
@ 2021-12-06 12:00   ` Wilco Dijkstra
  2021-12-06 12:21     ` Adhemerval Zanella
  0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 12:00 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Paul Zimmermann

Hi Adhemerval,


+  /* If ay is tiny, scale both inputs up.  */
+  if (__glibc_unlikely (ay < TINY_VAL))
+    {
+      if (__glibc_unlikely (ax >= ay / EPS))
+       return ax;

Same here - return ax + ay.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96)
  2021-12-06 12:00   ` Wilco Dijkstra
@ 2021-12-06 12:21     ` Adhemerval Zanella
  0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 12:21 UTC (permalink / raw)
  To: Wilco Dijkstra, libc-alpha, Paul Zimmermann



On 06/12/2021 09:00, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
> 
> +  /* If ay is tiny, scale both inputs up.  */
> +  if (__glibc_unlikely (ay < TINY_VAL))
> +    {
> +      if (__glibc_unlikely (ax >= ay / EPS))
> +       return ax;
> 
> Same here - return ax + ay.

Ack, I have fixed it.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128)
  2021-12-06 11:58   ` Wilco Dijkstra
@ 2021-12-06 12:22     ` Adhemerval Zanella
  0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 12:22 UTC (permalink / raw)
  To: Wilco Dijkstra, libc-alpha, Paul Zimmermann



On 06/12/2021 08:58, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
> 
> +  /* If ay is tiny, scale both inputs up.  */
> +  if (__glibc_unlikely (ay < TINY_VAL))
> +    {
> +      if (__glibc_unlikely (ax >= ay / EPS))
> +       return ax;
> 
> Why not return ax + ay like in the other special cases? It's more accurate for rounding
> upwards, and it's best to keep the code as similar as possible as the double version.

Ack, I have fix it.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 07/12] i386: Move hypot implementation to C
  2021-12-03 14:51   ` Wilco Dijkstra
@ 2021-12-06 12:26     ` Adhemerval Zanella
  0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 12:26 UTC (permalink / raw)
  To: Wilco Dijkstra, libc-alpha, Paul Zimmermann



On 03/12/2021 11:51, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
> +  double r = math_narrow_eval (sqrtl (lx * lx + ly * ly));
> 
> Same problem here, this needs a cast to double.
> 
> Wilco
> 

Ack.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h
  2021-12-06 11:46   ` Wilco Dijkstra
@ 2021-12-06 12:35     ` Adhemerval Zanella
  0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 12:35 UTC (permalink / raw)
  To: Wilco Dijkstra, libc-alpha, Paul Zimmermann



On 06/12/2021 08:46, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
> +++ b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h
> @@ -0,0 +1,4 @@
> +#define USE_FMAX_BUILTIN 1
> +#define USE_FMAXF_BUILTIN 10
> 
> Typo: 1 rather than 10

Oops, I have fixed it.

> 
> Looks good otherwise.
> 
> Wilco
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 00/12] Improve hypot
  2021-12-03  8:51 ` [PATCH v4 00/12] Improve hypot Paul Zimmermann
@ 2021-12-06 12:36   ` Adhemerval Zanella
  0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 12:36 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: libc-alpha, Wilco.Dijkstra



On 03/12/2021 05:51, Paul Zimmermann wrote:
>        Dear Adhemerval,
> 
> a few typos in your commit message:
> 
>> when fast FMA is avaliable
> should be "available"
> 
>> Use a newer algotihm
> should be "algorithm"
> 
> interger -> integer
> 
>>  - An improvement from 233442 to 1274 results with 1 ulp of
>>    error for Binary96 (x86_64).
>>  - An improvement from 453045 to 1294 results with 1 ulp of
>>    error for Binary96 (x86_64).
> 
> should the latter be binary128?

Yeah, indeed.

> 
>> If FMA is uses
> should be "used"
> 
> adapated -> adapted
> 
>> If FMA is uses the binary64 shows a slight worse precision:
> 
> I confirm, I find:
> 
>    binary64 0.948812 -0x0.5a22c27a3893p-1022,0x0.9cfea180c00dap-1022
> 
> Best regards,
> Paul
> 

Thanks for checking on it.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 06/12] math: Remove powerpc e_hypot
  2021-12-03  0:00 ` [PATCH v4 06/12] math: Remove powerpc e_hypot Adhemerval Zanella
@ 2021-12-06 17:12   ` Adhemerval Zanella
  2021-12-06 21:29     ` Paul A. Clarke
  0 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 17:12 UTC (permalink / raw)
  To: libc-alpha, Tulio Magno Quites Machado Filho, Paul A. Clarke



On 02/12/2021 21:00, Adhemerval Zanella wrote:
> The generic implementation is shows only slight worse performance:
> 
> POWER9     reciprocal-throughput    latency
> master                   13.4024    14.0967
> new hypot                14.8479    15.8061
> 
> POWER8     reciprocal-throughput    latency
> master                   15.5767    16.8885
> new hypot                16.5371    18.4057
> 
> One way to improve might to make gcc generate xsmaxdp/xsmindp for
> fmax/fmin (it onl does for -ffast-math, clang does for default
> options).
> 
> Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
> (power9).

Hi Tulio/Paul,

This the only missing patch in this set and I would like to check with you,
powerpc maintainers, that if it would be ok to push it.  The resulting
performance difference, including the latest one that removes the wrappers,
is slight better:

         
POWER9           reciprocal-throughput     latency
master                          13.4024    14.0967
new hypot                       11.9206    13.9871

POWER8            reciprocal-throughput    latency
master                          15.5767    16.8885
new hypot                       15.3541    18.0856


The POWER8 çatency difference seems to be due branch misprediction 
in the max/min selection.  In fact, if I use xsmaxdp/xsmindp on the
USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8:

POWER8            reciprocal-throughput    latency
xsmaxdp/xsmindp                 12.8959    16.2082

POWER9 is not affected (I don't see any performance difference by
using xsmaxdp/xsmindp).

The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math
for some reason, clang use them on default -O2 option.


> ---
>  sysdeps/powerpc/fpu/e_hypot.c                 | 87 -------------------
>  sysdeps/powerpc/fpu/e_hypotf.c                | 78 -----------------
>  .../powerpc32/power4/fpu/multiarch/Makefile   |  5 +-
>  .../power4/fpu/multiarch/e_hypot-power7.c     | 23 -----
>  .../power4/fpu/multiarch/e_hypot-ppc32.c      | 23 -----
>  .../powerpc32/power4/fpu/multiarch/e_hypot.c  | 33 -------
>  .../power4/fpu/multiarch/e_hypotf-power7.c    | 23 -----
>  .../power4/fpu/multiarch/e_hypotf-ppc32.c     | 23 -----
>  .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 -------
>  9 files changed, 1 insertion(+), 327 deletions(-)
>  delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
>  delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
> 
> diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c
> deleted file mode 100644
> index f96c589bbd..0000000000
> --- a/sysdeps/powerpc/fpu/e_hypot.c
> +++ /dev/null
> @@ -1,87 +0,0 @@
> -/* Pythagorean addition using doubles
> -   Copyright (C) 2011-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Library General Public License as
> -   published by the Free Software Foundation; either version 2 of the
> -   License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Library General Public License for more details.
> -
> -   You should have received a copy of the GNU Library General Public
> -   License along with the GNU C Library; see the file COPYING.LIB.  If
> -   not, see <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <math-underflow.h>
> -#include <stdint.h>
> -#include <libm-alias-finite.h>
> -
> -/* __ieee754_hypot(x,y)
> - *
> - * This a FP only version without any FP->INT conversion.
> - * It is similar to default C version, making appropriates
> - * overflow and underflows checks as well scaling when it
> - * is needed.
> - */
> -
> -double
> -__ieee754_hypot (double x, double y)
> -{
> -  if ((isinf (x) || isinf (y))
> -      && !issignaling (x) && !issignaling (y))
> -    return INFINITY;
> -  if (isnan (x) || isnan (y))
> -    return x + y;
> -
> -  x = fabs (x);
> -  y = fabs (y);
> -
> -  if (y > x)
> -    {
> -      double t = x;
> -      x = y;
> -      y = t;
> -    }
> -  if (y == 0.0)
> -    return x;
> -
> -  /* if y is higher enough, y * 2^60 might overflow. The tests if
> -     y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the
> -     appropriate check to avoid the overflow exception generation.  */
> -  if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60))
> -    return x + y;
> -
> -  if (x > 0x1p+500)
> -    {
> -      x *= 0x1p-600;
> -      y *= 0x1p-600;
> -      return sqrt (x * x + y * y) / 0x1p-600;
> -    }
> -  if (y < 0x1p-500)
> -    {
> -      if (y <= 0x0.fffffffffffffp-1022)
> -	{
> -	  x *= 0x1p+1022;
> -	  y *= 0x1p+1022;
> -	  double ret = sqrt (x * x + y * y) / 0x1p+1022;
> -	  math_check_force_underflow_nonneg (ret);
> -	  return ret;
> -	}
> -      else
> -	{
> -	  x *= 0x1p+600;
> -	  y *= 0x1p+600;
> -	  return sqrt (x * x + y * y) / 0x1p+600;
> -	}
> -    }
> -  return sqrt (x * x + y * y);
> -}
> -#ifndef __ieee754_hypot
> -libm_alias_finite (__ieee754_hypot, __hypot)
> -#endif
> diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c
> deleted file mode 100644
> index fa201dda51..0000000000
> --- a/sysdeps/powerpc/fpu/e_hypotf.c
> +++ /dev/null
> @@ -1,78 +0,0 @@
> -/* Pythagorean addition using floats
> -   Copyright (C) 2011-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Library General Public License as
> -   published by the Free Software Foundation; either version 2 of the
> -   License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Library General Public License for more details.
> -
> -   You should have received a copy of the GNU Library General Public
> -   License along with the GNU C Library; see the file COPYING.LIB.  If
> -   not, see <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <stdint.h>
> -#include <libm-alias-finite.h>
> -
> -/* __ieee754_hypotf(x,y)
> -
> -   This a FP only version without any FP->INT conversion.
> -   It is similar to default C version, making appropriates
> -   overflow and underflows checks as using double precision
> -   instead of scaling.  */
> -
> -#ifdef _ARCH_PWR7
> -/* POWER7 isinf and isnan optimizations are fast. */
> -# define TEST_INF_NAN(x, y)                                      \
> -   if ((isinff(x) || isinff(y))					 \
> -       && !issignaling (x) && !issignaling (y))			 \
> -     return INFINITY;                                            \
> -   if (isnanf(x) || isnanf(y))                                   \
> -     return x + y;
> -# else
> -/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are
> - * costly (especially for POWER6). */
> -# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2)                         \
> - do {                                                            \
> -   ieee_float_shape_type gf_u1;                                  \
> -   ieee_float_shape_type gf_u2;                                  \
> -   gf_u1.value = (f1);                                           \
> -   gf_u2.value = (f2);                                           \
> -   (i1) = gf_u1.word & 0x7fffffff;                               \
> -   (i2) = gf_u2.word & 0x7fffffff;                               \
> - } while (0)
> -
> -# define TEST_INF_NAN(x, y)                                      \
> - do {                                                            \
> -   uint32_t hx, hy;                                              \
> -   GET_TWO_FLOAT_WORD(x, y, hx, hy);                             \
> -   if (hy > hx) {                                                \
> -     uint32_t ht = hx; hx = hy; hy = ht;                         \
> -   }                                                             \
> -   if (hx >= 0x7f800000) {                                       \
> -     if ((hx == 0x7f800000 || hy == 0x7f800000)			 \
> -	 && !issignaling (x) && !issignaling (y))		 \
> -       return INFINITY;                                          \
> -     return x + y;						 \
> -   }                                                             \
> - } while (0)
> -#endif
> -
> -
> -float
> -__ieee754_hypotf (float x, float y)
> -{
> -  TEST_INF_NAN (x, y);
> -
> -  return sqrt ((double) x * x + (double) y * y);
> -}
> -#ifndef __ieee754_hypotf
> -libm_alias_finite (__ieee754_hypotf, __hypotf)
> -#endif
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
> index 60f2c95532..1de0f9b350 100644
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
> +++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
> @@ -15,8 +15,7 @@ libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \
>  			s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \
>  			s_modff-power5+ s_modff-ppc32 s_logbl-power7 \
>  			s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \
> -			s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \
> -			e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32
> +			s_logbf-power7 s_logbf-ppc32
>  
>  CFLAGS-s_llrintf-power6.c += -mcpu=power6
>  CFLAGS-s_llrintf-ppc32.c += -mcpu=power4
> @@ -35,8 +34,6 @@ CFLAGS-s_modff-power5+.c = -mcpu=power5+
>  CFLAGS-s_logbl-power7.c = -mcpu=power7
>  CFLAGS-s_logb-power7.c = -mcpu=power7
>  CFLAGS-s_logbf-power7.c = -mcpu=power7
> -CFLAGS-e_hypot-power7.c = -mcpu=power7
> -CFLAGS-e_hypotf-power7.c = -mcpu=power7
>  
>  # These files quiet sNaNs in a way that is optimized away without
>  # -fsignaling-nans.
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
> deleted file mode 100644
> index 382b4a0b27..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee_hypot() POWER7 version.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypot __ieee754_hypot_power7
> -
> -#include <sysdeps/powerpc/fpu/e_hypot.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
> deleted file mode 100644
> index abb14d5469..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee_hypot() PowerPC32 version.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypot __ieee754_hypot_ppc32
> -
> -#include <sysdeps/powerpc/fpu/e_hypot.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
> deleted file mode 100644
> index a16efa350c..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
> +++ /dev/null
> @@ -1,33 +0,0 @@
> -/* Multiple versions of ieee754_hypot.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <math_ldbl_opt.h>
> -#include <libm-alias-finite.h>
> -#include "init-arch.h"
> -
> -extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden;
> -extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden;
> -
> -libc_ifunc (__ieee754_hypot,
> -	    (hwcap & PPC_FEATURE_ARCH_2_06)
> -	    ? __ieee754_hypot_power7
> -            : __ieee754_hypot_ppc32);
> -
> -libm_alias_finite (__ieee754_hypot, __hypot)
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
> deleted file mode 100644
> index f8a26ff22f..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee754_hypot POWER7 version.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypotf __ieee754_hypotf_power7
> -
> -#include <sysdeps/powerpc/fpu/e_hypotf.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
> deleted file mode 100644
> index b13f8c9db2..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee_hypot() PowerPC32 version.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypotf __ieee754_hypotf_ppc32
> -
> -#include <sysdeps/ieee754/flt-32/e_hypotf.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
> deleted file mode 100644
> index 1e72605db8..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
> +++ /dev/null
> @@ -1,33 +0,0 @@
> -/* Multiple versions of ieee754_hypotf.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <math_ldbl_opt.h>
> -#include <libm-alias-finite.h>
> -#include "init-arch.h"
> -
> -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden;
> -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden;
> -
> -libc_ifunc (__ieee754_hypotf,
> -	    (hwcap & PPC_FEATURE_ARCH_2_06)
> -	    ? __ieee754_hypotf_power7
> -            : __ieee754_hypotf_ppc32);
> -
> -libm_alias_finite (__ieee754_hypotf, __hypotf)
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h
  2021-12-03  0:00 ` [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
  2021-12-06 11:52   ` Wilco Dijkstra
@ 2021-12-06 21:11   ` Joseph Myers
  2021-12-07 13:21     ` Adhemerval Zanella
  1 sibling, 1 reply; 36+ messages in thread
From: Joseph Myers @ 2021-12-06 21:11 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha, Paul Zimmermann, Wilco Dijkstra

On Thu, 2 Dec 2021, Adhemerval Zanella via Libc-alpha wrote:

> +#if __HAVE_FLOAT128
> +# define USE_BUILTIN_F128 , _Float128 : USE_FMAXF128_BUILTIN
> +# define BUILTIN_F128     , _Float128 :__builtin_fmaxf128
> +#else
> +# define USE_BUILTIN_F128
> +# define BUILTIN_F128
> +#endif
> +
> +#define USE_BUILTIN(X, Y)                   \
> +  _Generic((X),                             \
> +	   float       : USE_FMAXF_BUILTIN, \
> +	   double      : USE_FMAX_BUILTIN,  \
> +	   long double : USE_FMAXL_BUILTIN  \
> +	   USE_BUILTIN_F128)
> +
> +#define BUILTIN(X, Y)                       \
> +  _Generic((X),                             \
> +	   float       : __builtin_fmaxf,   \
> +	   double      : __builtin_fmax,    \
> +	   long double : __builtin_fmaxl    \
> +	   BUILTIN_F128)                    \
> +  (X, Y)

You should not hardcode cases for different types in type-generic 
templates.  This would fail if we add support for _Float16 functions in 
future, for example.

Use M_SUF to call built-in functions (possibly via defining M_FMAX in 
math-type-macros.h, similar to the other M_* macros there.

Do something similar for USE_*_BUILTIN (I suggest defining M_USE_BUILTIN 
in the per-type headers used by math-type-macros.h, so you can then call 
M_USE_BUILTIN (FMAX).

You'll probably need to use #if with the M_USE_BUILTIN call (which should 
be OK after the above changes), because GCC only added __builtin_fmaxfN 
functions in commit ee5fd23a481f510528e00f4c988ed0e6a71218c2 (GCC 8 and 
later) but older GCC versions are supported for building for some 
architectures with _Float128 support, and if __builtin_fmaxf128 gets 
referenced inside if (0) with older GCC it would be an undeclared 
identifier.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 06/12] math: Remove powerpc e_hypot
  2021-12-06 17:12   ` Adhemerval Zanella
@ 2021-12-06 21:29     ` Paul A. Clarke
  2021-12-07 13:19       ` Adhemerval Zanella
  0 siblings, 1 reply; 36+ messages in thread
From: Paul A. Clarke @ 2021-12-06 21:29 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha, Tulio Magno Quites Machado Filho

On Mon, Dec 06, 2021 at 02:12:27PM -0300, Adhemerval Zanella wrote:
> On 02/12/2021 21:00, Adhemerval Zanella wrote:
> > The generic implementation is shows only slight worse performance:
> > 
> > POWER9     reciprocal-throughput    latency
> > master                   13.4024    14.0967
> > new hypot                14.8479    15.8061
> > 
> > POWER8     reciprocal-throughput    latency
> > master                   15.5767    16.8885
> > new hypot                16.5371    18.4057
> > 
> > One way to improve might to make gcc generate xsmaxdp/xsmindp for
> > fmax/fmin (it onl does for -ffast-math, clang does for default
> > options).
> > 
> > Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
> > (power9).
> 
> Hi Tulio/Paul,
> 
> This the only missing patch in this set and I would like to check with you,
> powerpc maintainers, that if it would be ok to push it.  The resulting
> performance difference, including the latest one that removes the wrappers,
> is slight better:
> 
>          
> POWER9           reciprocal-throughput     latency
> master                          13.4024    14.0967
> new hypot                       11.9206    13.9871
> 
> POWER8            reciprocal-throughput    latency
> master                          15.5767    16.8885
> new hypot                       15.3541    18.0856

For Power10 / master:

  "hypot": {
   "workload-random": {
    "duration": 5.28242e+08,
    "iterations": 4.8e+07,
    "reciprocal-throughput": 8.28478,
    "latency": 13.7253,
    "max-throughput": 1.20703e+08,
    "min-throughput": 7.2858e+07
   }
  }

For Power10 / new hypot:

  "hypot": {
   "workload-random": {
    "duration": 5.30731e+08,
    "iterations": 5.2e+07,
    "reciprocal-throughput": 7.21945,
    "latency": 13.1933,
    "max-throughput": 1.38515e+08,
    "min-throughput": 7.57963e+07
   }
  }

> The POWER8 çatency difference seems to be due branch misprediction 
> in the max/min selection.  In fact, if I use xsmaxdp/xsmindp on the
> USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8:
> 
> POWER8            reciprocal-throughput    latency
> xsmaxdp/xsmindp                 12.8959    16.2082
> 
> POWER9 is not affected (I don't see any performance difference by
> using xsmaxdp/xsmindp).
> 
> The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math
> for some reason, clang use them on default -O2 option.

I believe clang may be wrong here, in that the instructions do not
properly handle NaN for the fmin/fmax semantics without -ffast-math.

PC

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 06/12] math: Remove powerpc e_hypot
  2021-12-06 21:29     ` Paul A. Clarke
@ 2021-12-07 13:19       ` Adhemerval Zanella
  0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 13:19 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: libc-alpha, Tulio Magno Quites Machado Filho



On 06/12/2021 18:29, Paul A. Clarke wrote:
> On Mon, Dec 06, 2021 at 02:12:27PM -0300, Adhemerval Zanella wrote:
>> On 02/12/2021 21:00, Adhemerval Zanella wrote:
>>> The generic implementation is shows only slight worse performance:
>>>
>>> POWER9     reciprocal-throughput    latency
>>> master                   13.4024    14.0967
>>> new hypot                14.8479    15.8061
>>>
>>> POWER8     reciprocal-throughput    latency
>>> master                   15.5767    16.8885
>>> new hypot                16.5371    18.4057
>>>
>>> One way to improve might to make gcc generate xsmaxdp/xsmindp for
>>> fmax/fmin (it onl does for -ffast-math, clang does for default
>>> options).
>>>
>>> Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
>>> (power9).
>>
>> Hi Tulio/Paul,
>>
>> This the only missing patch in this set and I would like to check with you,
>> powerpc maintainers, that if it would be ok to push it.  The resulting
>> performance difference, including the latest one that removes the wrappers,
>> is slight better:
>>
>>          
>> POWER9           reciprocal-throughput     latency
>> master                          13.4024    14.0967
>> new hypot                       11.9206    13.9871
>>
>> POWER8            reciprocal-throughput    latency
>> master                          15.5767    16.8885
>> new hypot                       15.3541    18.0856
> 
> For Power10 / master:
> 
>   "hypot": {
>    "workload-random": {
>     "duration": 5.28242e+08,
>     "iterations": 4.8e+07,
>     "reciprocal-throughput": 8.28478,
>     "latency": 13.7253,
>     "max-throughput": 1.20703e+08,
>     "min-throughput": 7.2858e+07
>    }
>   }
> 
> For Power10 / new hypot:
> 
>   "hypot": {
>    "workload-random": {
>     "duration": 5.30731e+08,
>     "iterations": 5.2e+07,
>     "reciprocal-throughput": 7.21945,
>     "latency": 13.1933,
>     "max-throughput": 1.38515e+08,
>     "min-throughput": 7.57963e+07
>    }
>   }

So I assume it would be ok to push this patch based on the power10
performance numbers, right?

> 
>> The POWER8 çatency difference seems to be due branch misprediction 
>> in the max/min selection.  In fact, if I use xsmaxdp/xsmindp on the
>> USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8:
>>
>> POWER8            reciprocal-throughput    latency
>> xsmaxdp/xsmindp                 12.8959    16.2082
>>
>> POWER9 is not affected (I don't see any performance difference by
>> using xsmaxdp/xsmindp).
>>
>> The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math
>> for some reason, clang use them on default -O2 option.
> 
> I believe clang may be wrong here, in that the instructions do not
> properly handle NaN for the fmin/fmax semantics without -ffast-math.

At least on power8/power9 I don't see any test-double-{fmax,fmin} 
regressions if I use xsmaxdp/xsmindp instead of default implementation.
And libm-test-{fmax,fmin}.inc does extensively tests both default
and signaling NaNs.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h
  2021-12-06 21:11   ` Joseph Myers
@ 2021-12-07 13:21     ` Adhemerval Zanella
  0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 13:21 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha, Paul Zimmermann, Wilco Dijkstra



On 06/12/2021 18:11, Joseph Myers wrote:
> On Thu, 2 Dec 2021, Adhemerval Zanella via Libc-alpha wrote:
> 
>> +#if __HAVE_FLOAT128
>> +# define USE_BUILTIN_F128 , _Float128 : USE_FMAXF128_BUILTIN
>> +# define BUILTIN_F128     , _Float128 :__builtin_fmaxf128
>> +#else
>> +# define USE_BUILTIN_F128
>> +# define BUILTIN_F128
>> +#endif
>> +
>> +#define USE_BUILTIN(X, Y)                   \
>> +  _Generic((X),                             \
>> +	   float       : USE_FMAXF_BUILTIN, \
>> +	   double      : USE_FMAX_BUILTIN,  \
>> +	   long double : USE_FMAXL_BUILTIN  \
>> +	   USE_BUILTIN_F128)
>> +
>> +#define BUILTIN(X, Y)                       \
>> +  _Generic((X),                             \
>> +	   float       : __builtin_fmaxf,   \
>> +	   double      : __builtin_fmax,    \
>> +	   long double : __builtin_fmaxl    \
>> +	   BUILTIN_F128)                    \
>> +  (X, Y)
> 
> You should not hardcode cases for different types in type-generic 
> templates.  This would fail if we add support for _Float16 functions in 
> future, for example.
> 
> Use M_SUF to call built-in functions (possibly via defining M_FMAX in 
> math-type-macros.h, similar to the other M_* macros there.
> 
> Do something similar for USE_*_BUILTIN (I suggest defining M_USE_BUILTIN 
> in the per-type headers used by math-type-macros.h, so you can then call 
> M_USE_BUILTIN (FMAX).

Indeed it is cleaner, I don't see the need of just M_FMAX.

> 
> You'll probably need to use #if with the M_USE_BUILTIN call (which should 
> be OK after the above changes), because GCC only added __builtin_fmaxfN 
> functions in commit ee5fd23a481f510528e00f4c988ed0e6a71218c2 (GCC 8 and 
> later) but older GCC versions are supported for building for some 
> architectures with _Float128 support, and if __builtin_fmaxf128 gets 
> referenced inside if (0) with older GCC it would be an undeclared 
> identifier.
>

Right, but currently we use declare_mgen_alias for fmax/fmin, so I am not
sure it we really need to take care if __builtin_fmaxfN support.

For _Float128 I will need to check if this does incur in any issue, I 
am currently building some older gcc toolchain to check it.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2021-12-07 13:21 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-03  0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
2021-12-03  0:00 ` [PATCH v4 01/12] math: Simplify hypotf implementation Adhemerval Zanella
2021-12-03 13:23   ` Wilco Dijkstra
2021-12-03 19:44     ` Adhemerval Zanella
2021-12-03  0:00 ` [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
2021-12-03 13:41   ` Wilco Dijkstra
2021-12-03 19:44     ` Adhemerval Zanella
2021-12-03  0:00 ` [PATCH v4 03/12] math: Improve hypot performance with FMA Adhemerval Zanella
2021-12-03 13:44   ` Wilco Dijkstra
2021-12-03  0:00 ` [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
2021-12-06 12:00   ` Wilco Dijkstra
2021-12-06 12:21     ` Adhemerval Zanella
2021-12-03  0:00 ` [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
2021-12-06 11:58   ` Wilco Dijkstra
2021-12-06 12:22     ` Adhemerval Zanella
2021-12-03  0:00 ` [PATCH v4 06/12] math: Remove powerpc e_hypot Adhemerval Zanella
2021-12-06 17:12   ` Adhemerval Zanella
2021-12-06 21:29     ` Paul A. Clarke
2021-12-07 13:19       ` Adhemerval Zanella
2021-12-03  0:00 ` [PATCH v4 07/12] i386: Move hypot implementation to C Adhemerval Zanella
2021-12-03 14:51   ` Wilco Dijkstra
2021-12-06 12:26     ` Adhemerval Zanella
2021-12-03  0:00 ` [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
2021-12-06 11:52   ` Wilco Dijkstra
2021-12-06 21:11   ` Joseph Myers
2021-12-07 13:21     ` Adhemerval Zanella
2021-12-03  0:01 ` [PATCH v4 09/12] math: Add math-use-builtinds-fmin.h Adhemerval Zanella
2021-12-06 11:50   ` Wilco Dijkstra
2021-12-03  0:01 ` [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
2021-12-06 11:46   ` Wilco Dijkstra
2021-12-06 12:35     ` Adhemerval Zanella
2021-12-03  0:01 ` [PATCH v4 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
2021-12-06 11:44   ` Wilco Dijkstra
2021-12-03  0:01 ` [PATCH v4 12/12] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella
2021-12-03  8:51 ` [PATCH v4 00/12] Improve hypot Paul Zimmermann
2021-12-06 12:36   ` Adhemerval Zanella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).