* [PATCH v4 01/12] math: Simplify hypotf implementation
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
@ 2021-12-03 0:00 ` Adhemerval Zanella
2021-12-03 13:23 ` Wilco Dijkstra
2021-12-03 0:00 ` [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
` (11 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:00 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
Use a more optimized comparison for check for NaN and infinite and
add an inlined issignaling implementation for float. With gcc it
results in 2 FP comparisons.
The file Copyright is also changed to use GPL, the implementation was
completely changed by 7c10fd3515f to use double precision instead of
scaling and this change removes all the GET_FLOAT_WORD usage.
Checked on x86_64-linux-gnu.
---
sysdeps/ieee754/flt-32/e_hypotf.c | 64 +++++++++++++---------------
sysdeps/ieee754/flt-32/math_config.h | 9 ++++
2 files changed, 38 insertions(+), 35 deletions(-)
diff --git a/sysdeps/ieee754/flt-32/e_hypotf.c b/sysdeps/ieee754/flt-32/e_hypotf.c
index e770947dc1..1d082fe36c 100644
--- a/sysdeps/ieee754/flt-32/e_hypotf.c
+++ b/sysdeps/ieee754/flt-32/e_hypotf.c
@@ -1,46 +1,40 @@
-/* e_hypotf.c -- float version of e_hypot.c.
- */
+/* Euclidean distance function. Float/Binary32 version.
+ Copyright (C) 2012-2021 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <libm-alias-finite.h>
#include <math.h>
+#include "math_config.h"
+#include <math-narrow-eval.h>
#include <math_private.h>
-#include <libm-alias-finite.h>
float
-__ieee754_hypotf(float x, float y)
+__ieee754_hypotf (float x, float y)
{
- double d_x, d_y;
- int32_t ha, hb;
-
- GET_FLOAT_WORD(ha,x);
- ha &= 0x7fffffff;
- GET_FLOAT_WORD(hb,y);
- hb &= 0x7fffffff;
- if (ha == 0x7f800000 && !issignaling (y))
- return fabsf(x);
- else if (hb == 0x7f800000 && !issignaling (x))
- return fabsf(y);
- else if (ha > 0x7f800000 || hb > 0x7f800000)
- return fabsf(x) * fabsf(y);
- else if (ha == 0)
- return fabsf(y);
- else if (hb == 0)
- return fabsf(x);
-
- d_x = (double) x;
- d_y = (double) y;
+ if (!isfinite(x) || !isfinite(y))
+ {
+ if ((isinf (x) || isinf (y))
+ && !issignalingf_inline (x) && !issignalingf_inline (y))
+ return INFINITY;
+ return x + y;
+ }
- return (float) sqrt(d_x * d_x + d_y * d_y);
+ return math_narrow_eval (sqrt ((double) x * (double) x
+ + (double) y * (double) y));
}
#ifndef __ieee754_hypotf
libm_alias_finite (__ieee754_hypotf, __hypotf)
diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index 513454a297..daa8a82f99 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -101,6 +101,15 @@ asdouble (uint64_t i)
return u.f;
}
+static inline int
+issignalingf_inline (float x)
+{
+ uint32_t ix = asuint (x);
+ if (HIGH_ORDER_BIT_IS_SET_FOR_SNAN)
+ return (ix & 0x7fc00000) == 0x7fc00000;
+ return 2 * (ix ^ 0x00400000) > 2 * 0x7fc00000UL;
+}
+
#define NOINLINE __attribute__ ((noinline))
attribute_hidden float __math_oflowf (uint32_t);
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 01/12] math: Simplify hypotf implementation
2021-12-03 0:00 ` [PATCH v4 01/12] math: Simplify hypotf implementation Adhemerval Zanella
@ 2021-12-03 13:23 ` Wilco Dijkstra
2021-12-03 19:44 ` Adhemerval Zanella
0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-03 13:23 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha, Paul Zimmermann
Hi Adhemerval,
- return (float) sqrt(d_x * d_x + d_y * d_y);
+ return math_narrow_eval (sqrt ((double) x * (double) x
+ + (double) y * (double) y));
This still requires a cast to float after the sqrt, otherwise math_narrow_eval
has no effect. OK with that change.
Cheers,
Wilco
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 01/12] math: Simplify hypotf implementation
2021-12-03 13:23 ` Wilco Dijkstra
@ 2021-12-03 19:44 ` Adhemerval Zanella
0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 19:44 UTC (permalink / raw)
To: Wilco Dijkstra, libc-alpha, Paul Zimmermann
On 03/12/2021 10:23, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
> - return (float) sqrt(d_x * d_x + d_y * d_y);
> + return math_narrow_eval (sqrt ((double) x * (double) x
> + + (double) y * (double) y));
>
> This still requires a cast to float after the sqrt, otherwise math_narrow_eval
> has no effect. OK with that change.
Ack.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64)
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
2021-12-03 0:00 ` [PATCH v4 01/12] math: Simplify hypotf implementation Adhemerval Zanella
@ 2021-12-03 0:00 ` Adhemerval Zanella
2021-12-03 13:41 ` Wilco Dijkstra
2021-12-03 0:00 ` [PATCH v4 03/12] math: Improve hypot performance with FMA Adhemerval Zanella
` (10 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:00 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
This implementation is based on the 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the
following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for denormal results.
The main advantage of the new algorithm is its precision: with a
random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc
current implementation shows around 0.34% results with an error of
1 ulp (3424869 results) while the new implementation only shows
0.002% of total (18851).
The performance result are also only slight worse than current
implementation. On x86_64 (Ryzen 5900X) with gcc 12:
Before:
"hypot": {
"workload-random": {
"duration": 3.73319e+09,
"iterations": 1.12e+08,
"reciprocal-throughput": 22.8737,
"latency": 43.7904,
"max-throughput": 4.37184e+07,
"min-throughput": 2.28361e+07
}
}
After:
"hypot": {
"workload-random": {
"duration": 3.7597e+09,
"iterations": 9.8e+07,
"reciprocal-throughput": 23.7547,
"latency": 52.9739,
"max-throughput": 4.2097e+07,
"min-throughput": 1.88772e+07
}
}
Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
[1] https://arxiv.org/pdf/1904.09481.pdf
---
sysdeps/ieee754/dbl-64/e_hypot.c | 235 ++++++++++++-------------------
1 file changed, 92 insertions(+), 143 deletions(-)
diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 9ec4c1ced0..274b14b57e 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -1,164 +1,113 @@
-/* @(#)e_hypot.c 5.1 93/09/24 */
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypot(x,y)
- *
- * Method :
- * If (assume round-to-nearest) z=x*x+y*y
- * has error less than sqrt(2)/2 ulp, than
- * sqrt(z) has error less than 1 ulp (exercise).
- *
- * So, compute sqrt(x*x+y*y) with some care as
- * follows to get the error below 1 ulp:
- *
- * Assume x>y>0;
- * (if possible, set rounding to round-to-nearest)
- * 1. if x > 2y use
- * x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- * where x1 = x with lower 32 bits cleared, x2 = x-x1; else
- * 2. if x <= 2y use
- * t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- * where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
- * y1= y with lower 32 bits chopped, y2 = y-y1.
- *
- * NOTE: scaling may be necessary if some argument is too
- * large or too tiny
- *
- * Special cases:
- * hypot(x,y) is INF if x or y is +INF or -INF; else
- * hypot(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- * hypot(x,y) returns sqrt(x^2+y^2) with error less
- * than 1 ulps (units in the last place)
- */
+/* Euclidean distance function. Double/Binary64 version.
+ Copyright (C) 2021 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* The implementation uses a correction based on 'An Improved Algorithm for
+ hypot(a,b)' by Carlos F. Borges [1] usingthe MyHypot3 with the following
+ changes:
+
+ - Handle qNaN and sNaN.
+ - Tune the 'widely varying operands' to avoid spurious underflow
+ due the multiplication and fix the return value for upwards
+ rounding mode.
+ - Handle required underflow exception for subnormal results.
+
+ The expected ULP is ~0.792.
+
+ [1] https://arxiv.org/pdf/1904.09481.pdf */
#include <math.h>
#include <math_private.h>
#include <math-underflow.h>
+#include <math-narrow-eval.h>
#include <libm-alias-finite.h>
+#include "math_config.h"
-double
-__ieee754_hypot (double x, double y)
+#define SCALE 0x1p-600
+#define LARGE_VAL 0x1p+511
+#define TINY_VAL 0x1p-459
+#define EPS 0x1p-54
+
+/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
+ and squaring ax, ay and (ax - ay) does not overflow or underflow. */
+static inline double
+kernel (double ax, double ay)
{
- double a, b, t1, t2, y1, y2, w;
- int32_t j, k, ha, hb;
-
- GET_HIGH_WORD (ha, x);
- ha &= 0x7fffffff;
- GET_HIGH_WORD (hb, y);
- hb &= 0x7fffffff;
- if (hb > ha)
+ double t1, t2;
+ double h = sqrt (ax * ax + ay * ay);
+ if (h <= 2.0 * ay)
{
- a = y; b = x; j = ha; ha = hb; hb = j;
+ double delta = h - ay;
+ t1 = ax * (2.0 * delta - ax);
+ t2 = (delta - 2.0 * (ax - ay)) * delta;
}
else
{
- a = x; b = y;
- }
- SET_HIGH_WORD (a, ha); /* a <- |a| */
- SET_HIGH_WORD (b, hb); /* b <- |b| */
- if ((ha - hb) > 0x3c00000)
- {
- return a + b;
- } /* x/y > 2**60 */
- k = 0;
- if (__glibc_unlikely (ha > 0x5f300000)) /* a>2**500 */
- {
- if (ha >= 0x7ff00000) /* Inf or NaN */
- {
- uint32_t low;
- w = a + b; /* for sNaN */
- if (issignaling (a) || issignaling (b))
- return w;
- GET_LOW_WORD (low, a);
- if (((ha & 0xfffff) | low) == 0)
- w = a;
- GET_LOW_WORD (low, b);
- if (((hb ^ 0x7ff00000) | low) == 0)
- w = b;
- return w;
- }
- /* scale a and b by 2**-600 */
- ha -= 0x25800000; hb -= 0x25800000; k += 600;
- SET_HIGH_WORD (a, ha);
- SET_HIGH_WORD (b, hb);
- }
- if (__builtin_expect (hb < 0x23d00000, 0)) /* b < 2**-450 */
- {
- if (hb <= 0x000fffff) /* subnormal b or 0 */
- {
- uint32_t low;
- GET_LOW_WORD (low, b);
- if ((hb | low) == 0)
- return a;
- t1 = 0;
- SET_HIGH_WORD (t1, 0x7fd00000); /* t1=2^1022 */
- b *= t1;
- a *= t1;
- k -= 1022;
- GET_HIGH_WORD (ha, a);
- GET_HIGH_WORD (hb, b);
- if (hb > ha)
- {
- t1 = a;
- a = b;
- b = t1;
- j = ha;
- ha = hb;
- hb = j;
- }
- }
- else /* scale a and b by 2^600 */
- {
- ha += 0x25800000; /* a *= 2^600 */
- hb += 0x25800000; /* b *= 2^600 */
- k -= 600;
- SET_HIGH_WORD (a, ha);
- SET_HIGH_WORD (b, hb);
- }
+ double delta = h - ax;
+ t1 = 2.0 * delta * (ax - 2.0 * ay);
+ t2 = (4.0 * delta - ay) * ay + delta * delta;
}
- /* medium size a and b */
- w = a - b;
- if (w > b)
+
+ h -= (t1 + t2) / (2.0 * h);
+ return h;
+}
+
+double
+__ieee754_hypot (double x, double y)
+{
+ if (!isfinite(x) || !isfinite(y))
{
- t1 = 0;
- SET_HIGH_WORD (t1, ha);
- t2 = a - t1;
- w = sqrt (t1 * t1 - (b * (-b) - t2 * (a + t1)));
+ if ((isinf (x) || isinf (y))
+ && !issignaling_inline (x) && !issignaling_inline (y))
+ return INFINITY;
+ return x + y;
}
- else
+
+ x = fabs (x);
+ y = fabs (y);
+
+ double ax = x < y ? y : x;
+ double ay = x < y ? x : y;
+
+ /* If ax is huge, scale both inputs down. */
+ if (__glibc_unlikely (ax > LARGE_VAL))
{
- a = a + a;
- y1 = 0;
- SET_HIGH_WORD (y1, hb);
- y2 = b - y1;
- t1 = 0;
- SET_HIGH_WORD (t1, ha + 0x00100000);
- t2 = a - t1;
- w = sqrt (t1 * y1 - (w * (-w) - (t1 * y2 + t2 * b)));
+ if (__glibc_unlikely (ay <= ax * EPS))
+ return ax + ay;
+
+ return kernel (ax * SCALE, ay * SCALE) / SCALE;
}
- if (k != 0)
+
+ /* If ay is tiny, scale both inputs up. */
+ if (__glibc_unlikely (ay < TINY_VAL))
{
- uint32_t high;
- t1 = 1.0;
- GET_HIGH_WORD (high, t1);
- SET_HIGH_WORD (t1, high + (k << 20));
- w *= t1;
- math_check_force_underflow_nonneg (w);
- return w;
+ if (__glibc_unlikely (ax >= ay / EPS))
+ return math_narrow_eval (ax + ay);
+
+ ax = math_narrow_eval (kernel (ax / SCALE, ay / SCALE) * SCALE);
+ math_check_force_underflow_nonneg (ax);
+ return ax;
}
- else
- return w;
+
+ /* Common case: ax is not huge and ay is not tiny. */
+ if (__glibc_unlikely (ay <= ax * EPS))
+ return ax + ay;
+
+ return kernel (ax, ay);
}
#ifndef __ieee754_hypot
libm_alias_finite (__ieee754_hypot, __hypot)
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64)
2021-12-03 0:00 ` [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
@ 2021-12-03 13:41 ` Wilco Dijkstra
2021-12-03 19:44 ` Adhemerval Zanella
0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-03 13:41 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha, Paul Zimmermann
Hi Adhemeval,
+ /* If ax is huge, scale both inputs down. */
+ if (__glibc_unlikely (ax > LARGE_VAL))
{
+ if (__glibc_unlikely (ay <= ax * EPS))
+ return ax + ay;
+
+ return kernel (ax * SCALE, ay * SCALE) / SCALE;
}
Both of these returns can overflow and thus should have a math_narrow_eval
(only the case where we can't overflow or underflow doesn't). OK with that change.
Cheers,
Wilco
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64)
2021-12-03 13:41 ` Wilco Dijkstra
@ 2021-12-03 19:44 ` Adhemerval Zanella
0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 19:44 UTC (permalink / raw)
To: Wilco Dijkstra, libc-alpha, Paul Zimmermann
On 03/12/2021 10:41, Wilco Dijkstra wrote:
> Hi Adhemeval,
>
> + /* If ax is huge, scale both inputs down. */
> + if (__glibc_unlikely (ax > LARGE_VAL))
> {
> + if (__glibc_unlikely (ay <= ax * EPS))
> + return ax + ay;
> +
> + return kernel (ax * SCALE, ay * SCALE) / SCALE;
> }
>
> Both of these returns can overflow and thus should have a math_narrow_eval
> (only the case where we can't overflow or underflow doesn't). OK with that change.
Ack.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 03/12] math: Improve hypot performance with FMA
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
2021-12-03 0:00 ` [PATCH v4 01/12] math: Simplify hypotf implementation Adhemerval Zanella
2021-12-03 0:00 ` [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
@ 2021-12-03 0:00 ` Adhemerval Zanella
2021-12-03 13:44 ` Wilco Dijkstra
2021-12-03 0:00 ` [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
` (9 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:00 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency. The non-fma version has 30% higher throughput and 10%
higher latency.
Max ULP error is 0.949 with fma and 0.792 without fma.
Passes GLIBC testsuite.
---
sysdeps/ieee754/dbl-64/e_hypot.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 274b14b57e..f53061badc 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -26,7 +26,11 @@
rounding mode.
- Handle required underflow exception for subnormal results.
- The expected ULP is ~0.792.
+ The expected ULP is ~0.792 or ~0.948 if FMA is used. For FMA, the
+ correction is not used and the error of sqrt (x^2 + y^2) is below 1 ULP
+ if x^2 + y^2 is computed with less than 0.707 ULP error. If |x| >= |2y|,
+ fma (x, x, y^2) has ~0.625 ULP. If |x| < |2y|, fma (|2x|, |y|, (x - y)^2)
+ has ~0.625 ULP.
[1] https://arxiv.org/pdf/1904.09481.pdf */
@@ -48,6 +52,16 @@ static inline double
kernel (double ax, double ay)
{
double t1, t2;
+#ifdef __FP_FAST_FMA
+ t1 = ay + ay;
+ t2 = ax - ay;
+
+ if (t1 >= ax)
+ return sqrt (fma (t1, ax, t2 * t2));
+ else
+ return sqrt (fma (ax, ax, ay * ay));
+
+#else
double h = sqrt (ax * ax + ay * ay);
if (h <= 2.0 * ay)
{
@@ -64,6 +78,7 @@ kernel (double ax, double ay)
h -= (t1 + t2) / (2.0 * h);
return h;
+#endif
}
double
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96)
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
` (2 preceding siblings ...)
2021-12-03 0:00 ` [PATCH v4 03/12] math: Improve hypot performance with FMA Adhemerval Zanella
@ 2021-12-03 0:00 ` Adhemerval Zanella
2021-12-06 12:00 ` Wilco Dijkstra
2021-12-03 0:00 ` [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
` (8 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:00 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With a
random 1e8 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.02% results with an error of
1 ulp (23158 results) while the new implementation only shows
0.0001% of total (111).
[1] https://arxiv.org/pdf/1904.09481.pdf
---
sysdeps/ieee754/ldbl-96/e_hypotl.c | 231 ++++++++++++-----------------
1 file changed, 98 insertions(+), 133 deletions(-)
diff --git a/sysdeps/ieee754/ldbl-96/e_hypotl.c b/sysdeps/ieee754/ldbl-96/e_hypotl.c
index 44e72353c0..b32fad757f 100644
--- a/sysdeps/ieee754/ldbl-96/e_hypotl.c
+++ b/sysdeps/ieee754/ldbl-96/e_hypotl.c
@@ -1,142 +1,107 @@
-/* e_hypotl.c -- long double version of e_hypot.c.
- */
-
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypotl(x,y)
- *
- * Method :
- * If (assume round-to-nearest) z=x*x+y*y
- * has error less than sqrt(2)/2 ulp, than
- * sqrt(z) has error less than 1 ulp (exercise).
- *
- * So, compute sqrt(x*x+y*y) with some care as
- * follows to get the error below 1 ulp:
- *
- * Assume x>y>0;
- * (if possible, set rounding to round-to-nearest)
- * 1. if x > 2y use
- * x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- * where x1 = x with lower 32 bits cleared, x2 = x-x1; else
- * 2. if x <= 2y use
- * t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- * where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
- * y1= y with lower 32 bits chopped, y2 = y-y1.
- *
- * NOTE: scaling may be necessary if some argument is too
- * large or too tiny
- *
- * Special cases:
- * hypot(x,y) is INF if x or y is +INF or -INF; else
- * hypot(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- * hypot(x,y) returns sqrt(x^2+y^2) with error less
- * than 1 ulps (units in the last place)
- */
+/* Euclidean distance function. Long Double/Binary96 version.
+ Copyright (C) 2021 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* This implementation is based on 'An Improved Algorithm for hypot(a,b)' by
+ Carlos F. Borges [1] using the MyHypot3 with the following changes:
+
+ - Handle qNaN and sNaN.
+ - Tune the 'widely varying operands' to avoid spurious underflow
+ due the multiplication and fix the return value for upwards
+ rounding mode.
+ - Handle required underflow exception for subnormal results.
+
+ [1] https://arxiv.org/pdf/1904.09481.pdf */
#include <math.h>
#include <math_private.h>
#include <math-underflow.h>
#include <libm-alias-finite.h>
-long double __ieee754_hypotl(long double x, long double y)
+#define SCALE 0x8p-8257L
+#define LARGE_VAL 0xb.504f333f9de6484p+8188L
+#define TINY_VAL 0x8p-8194L
+#define EPS 0x8p-68L
+
+/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
+ and squaring ax, ay and (ax - ay) does not overflow or underflow. */
+static inline long double
+kernel (long double ax, long double ay)
{
- long double a,b,t1,t2,y1,y2,w;
- uint32_t j,k,ea,eb;
-
- GET_LDOUBLE_EXP(ea,x);
- ea &= 0x7fff;
- GET_LDOUBLE_EXP(eb,y);
- eb &= 0x7fff;
- if(eb > ea) {a=y;b=x;j=ea; ea=eb;eb=j;} else {a=x;b=y;}
- SET_LDOUBLE_EXP(a,ea); /* a <- |a| */
- SET_LDOUBLE_EXP(b,eb); /* b <- |b| */
- if((ea-eb)>0x46) {return a+b;} /* x/y > 2**70 */
- k=0;
- if(__builtin_expect(ea > 0x5f3f,0)) { /* a>2**8000 */
- if(ea == 0x7fff) { /* Inf or NaN */
- uint32_t exp __attribute__ ((unused));
- uint32_t high,low;
- w = a+b; /* for sNaN */
- if (issignaling (a) || issignaling (b))
- return w;
- GET_LDOUBLE_WORDS(exp,high,low,a);
- if(((high&0x7fffffff)|low)==0) w = a;
- GET_LDOUBLE_WORDS(exp,high,low,b);
- if(((eb^0x7fff)|(high&0x7fffffff)|low)==0) w = b;
- return w;
- }
- /* scale a and b by 2**-9600 */
- ea -= 0x2580; eb -= 0x2580; k += 9600;
- SET_LDOUBLE_EXP(a,ea);
- SET_LDOUBLE_EXP(b,eb);
- }
- if(__builtin_expect(eb < 0x20bf, 0)) { /* b < 2**-8000 */
- if(eb == 0) { /* subnormal b or 0 */
- uint32_t exp __attribute__ ((unused));
- uint32_t high,low;
- GET_LDOUBLE_WORDS(exp,high,low,b);
- if((high|low)==0) return a;
- SET_LDOUBLE_WORDS(t1, 0x7ffd, 0x80000000, 0); /* t1=2^16382 */
- b *= t1;
- a *= t1;
- k -= 16382;
- GET_LDOUBLE_EXP (ea, a);
- GET_LDOUBLE_EXP (eb, b);
- if (eb > ea)
- {
- t1 = a;
- a = b;
- b = t1;
- j = ea;
- ea = eb;
- eb = j;
- }
- } else { /* scale a and b by 2^9600 */
- ea += 0x2580; /* a *= 2^9600 */
- eb += 0x2580; /* b *= 2^9600 */
- k -= 9600;
- SET_LDOUBLE_EXP(a,ea);
- SET_LDOUBLE_EXP(b,eb);
- }
- }
- /* medium size a and b */
- w = a-b;
- if (w>b) {
- uint32_t high;
- GET_LDOUBLE_MSW(high,a);
- SET_LDOUBLE_WORDS(t1,ea,high,0);
- t2 = a-t1;
- w = sqrtl(t1*t1-(b*(-b)-t2*(a+t1)));
- } else {
- uint32_t high;
- GET_LDOUBLE_MSW(high,b);
- a = a+a;
- SET_LDOUBLE_WORDS(y1,eb,high,0);
- y2 = b - y1;
- GET_LDOUBLE_MSW(high,a);
- SET_LDOUBLE_WORDS(t1,ea+1,high,0);
- t2 = a - t1;
- w = sqrtl(t1*y1-(w*(-w)-(t1*y2+t2*b)));
- }
- if(k!=0) {
- uint32_t exp;
- t1 = 1.0;
- GET_LDOUBLE_EXP(exp,t1);
- SET_LDOUBLE_EXP(t1,exp+k);
- w *= t1;
- math_check_force_underflow_nonneg (w);
- return w;
- } else return w;
+ long double t1, t2;
+ long double h = sqrtl (ax * ax + ay * ay);
+ if (h <= 2.0L * ay)
+ {
+ long double delta = h - ay;
+ t1 = ax * (2.0L * delta - ax);
+ t2 = (delta - 2.0L * (ax - ay)) * delta;
+ }
+ else
+ {
+ long double delta = h - ax;
+ t1 = 2.0L * delta * (ax - 2.0L * ay);
+ t2 = (4.0L * delta - ay) * ay + delta * delta;
+ }
+
+ h -= (t1 + t2) / (2.0L * h);
+ return h;
+}
+
+long double
+__ieee754_hypotl (long double x, long double y)
+{
+ if (!isfinite(x) || !isfinite(y))
+ {
+ if ((isinf (x) || isinf (y))
+ && !issignaling (x) && !issignaling (y))
+ return INFINITY;
+ return x + y;
+ }
+
+ x = fabsl (x);
+ y = fabsl (y);
+
+ long double ax = x < y ? y : x;
+ long double ay = x < y ? x : y;
+
+ /* If ax is huge, scale both inputs down. */
+ if (__glibc_unlikely (ax > LARGE_VAL))
+ {
+ if (__glibc_unlikely (ay <= ax * EPS))
+ return ax + ay;
+
+ return kernel (ax * SCALE, ay * SCALE) / SCALE;
+ }
+
+ /* If ay is tiny, scale both inputs up. */
+ if (__glibc_unlikely (ay < TINY_VAL))
+ {
+ if (__glibc_unlikely (ax >= ay / EPS))
+ return ax;
+
+ ax = kernel (ax / SCALE, ay / SCALE) * SCALE;
+ math_check_force_underflow_nonneg (ax);
+ return ax;
+ }
+
+ /* Common case: ax is not huge and ay is not tiny. */
+ if (__glibc_unlikely (ay <= ax * EPS))
+ return ax + ay;
+
+ return kernel (ax, ay);
}
libm_alias_finite (__ieee754_hypotl, __hypotl)
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96)
2021-12-03 0:00 ` [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
@ 2021-12-06 12:00 ` Wilco Dijkstra
2021-12-06 12:21 ` Adhemerval Zanella
0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 12:00 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha, Paul Zimmermann
Hi Adhemerval,
+ /* If ay is tiny, scale both inputs up. */
+ if (__glibc_unlikely (ay < TINY_VAL))
+ {
+ if (__glibc_unlikely (ax >= ay / EPS))
+ return ax;
Same here - return ax + ay.
Cheers,
Wilco
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96)
2021-12-06 12:00 ` Wilco Dijkstra
@ 2021-12-06 12:21 ` Adhemerval Zanella
0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 12:21 UTC (permalink / raw)
To: Wilco Dijkstra, libc-alpha, Paul Zimmermann
On 06/12/2021 09:00, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
>
> + /* If ay is tiny, scale both inputs up. */
> + if (__glibc_unlikely (ay < TINY_VAL))
> + {
> + if (__glibc_unlikely (ax >= ay / EPS))
> + return ax;
>
> Same here - return ax + ay.
Ack, I have fixed it.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128)
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
` (3 preceding siblings ...)
2021-12-03 0:00 ` [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
@ 2021-12-03 0:00 ` Adhemerval Zanella
2021-12-06 11:58 ` Wilco Dijkstra
2021-12-03 0:00 ` [PATCH v4 06/12] math: Remove powerpc e_hypot Adhemerval Zanella
` (7 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:00 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With a
random 1e9 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.05% results with an error of
1 ulp (453266 results) while the new implementation only shows
0.0001% of total (1280).
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
[1] https://arxiv.org/pdf/1904.09481.pdf
---
sysdeps/ieee754/ldbl-128/e_hypotl.c | 226 ++++++++++++----------------
1 file changed, 96 insertions(+), 130 deletions(-)
diff --git a/sysdeps/ieee754/ldbl-128/e_hypotl.c b/sysdeps/ieee754/ldbl-128/e_hypotl.c
index cd4fdbc4a6..022fa9aaf7 100644
--- a/sysdeps/ieee754/ldbl-128/e_hypotl.c
+++ b/sysdeps/ieee754/ldbl-128/e_hypotl.c
@@ -1,141 +1,107 @@
-/* e_hypotl.c -- long double version of e_hypot.c.
- */
-
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypotl(x,y)
- *
- * Method :
- * If (assume round-to-nearest) z=x*x+y*y
- * has error less than sqrtl(2)/2 ulp, than
- * sqrtl(z) has error less than 1 ulp (exercise).
- *
- * So, compute sqrtl(x*x+y*y) with some care as
- * follows to get the error below 1 ulp:
- *
- * Assume x>y>0;
- * (if possible, set rounding to round-to-nearest)
- * 1. if x > 2y use
- * x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- * where x1 = x with lower 64 bits cleared, x2 = x-x1; else
- * 2. if x <= 2y use
- * t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- * where t1 = 2x with lower 64 bits cleared, t2 = 2x-t1,
- * y1= y with lower 64 bits chopped, y2 = y-y1.
- *
- * NOTE: scaling may be necessary if some argument is too
- * large or too tiny
- *
- * Special cases:
- * hypotl(x,y) is INF if x or y is +INF or -INF; else
- * hypotl(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- * hypotl(x,y) returns sqrtl(x^2+y^2) with error less
- * than 1 ulps (units in the last place)
- */
+/* Euclidean distance function. Long Double/Binary128 version.
+ Copyright (C) 2021 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* This implementation is based on 'An Improved Algorithm for hypot(a,b)' by
+ Carlos F. Borges [1] using the MyHypot3 with the following changes:
+
+ - Handle qNaN and sNaN.
+ - Tune the 'widely varying operands' to avoid spurious underflow
+ due the multiplication and fix the return value for upwards
+ rounding mode.
+ - Handle required underflow exception for subnormal results.
+
+ [1] https://arxiv.org/pdf/1904.09481.pdf */
#include <math.h>
#include <math_private.h>
#include <math-underflow.h>
#include <libm-alias-finite.h>
+#define SCALE L(0x1p-8303)
+#define LARGE_VAL L(0x1.6a09e667f3bcc908b2fb1366ea95p+8191)
+#define TINY_VAL L(0x1p-8191)
+#define EPS L(0x1p-114)
+
+/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
+ and squaring ax, ay and (ax - ay) does not overflow or underflow. */
+static inline _Float128
+kernel (_Float128 ax, _Float128 ay)
+{
+ _Float128 t1, t2;
+ _Float128 h = sqrtl (ax * ax + ay * ay);
+ if (h <= L(2.0) * ay)
+ {
+ _Float128 delta = h - ay;
+ t1 = ax * (L(2.0) * delta - ax);
+ t2 = (delta - L(2.0) * (ax - ay)) * delta;
+ }
+ else
+ {
+ _Float128 delta = h - ax;
+ t1 = L(2.0) * delta * (ax - L(2.0) * ay);
+ t2 = (L(4.0) * delta - ay) * ay + delta * delta;
+ }
+
+ h -= (t1 + t2) / (L(2.0) * h);
+ return h;
+}
+
_Float128
__ieee754_hypotl(_Float128 x, _Float128 y)
{
- _Float128 a,b,t1,t2,y1,y2,w;
- int64_t j,k,ha,hb;
-
- GET_LDOUBLE_MSW64(ha,x);
- ha &= 0x7fffffffffffffffLL;
- GET_LDOUBLE_MSW64(hb,y);
- hb &= 0x7fffffffffffffffLL;
- if(hb > ha) {a=y;b=x;j=ha; ha=hb;hb=j;} else {a=x;b=y;}
- SET_LDOUBLE_MSW64(a,ha); /* a <- |a| */
- SET_LDOUBLE_MSW64(b,hb); /* b <- |b| */
- if((ha-hb)>0x78000000000000LL) {return a+b;} /* x/y > 2**120 */
- k=0;
- if(ha > 0x5f3f000000000000LL) { /* a>2**8000 */
- if(ha >= 0x7fff000000000000LL) { /* Inf or NaN */
- uint64_t low;
- w = a+b; /* for sNaN */
- if (issignaling (a) || issignaling (b))
- return w;
- GET_LDOUBLE_LSW64(low,a);
- if(((ha&0xffffffffffffLL)|low)==0) w = a;
- GET_LDOUBLE_LSW64(low,b);
- if(((hb^0x7fff000000000000LL)|low)==0) w = b;
- return w;
- }
- /* scale a and b by 2**-9600 */
- ha -= 0x2580000000000000LL;
- hb -= 0x2580000000000000LL; k += 9600;
- SET_LDOUBLE_MSW64(a,ha);
- SET_LDOUBLE_MSW64(b,hb);
- }
- if(hb < 0x20bf000000000000LL) { /* b < 2**-8000 */
- if(hb <= 0x0000ffffffffffffLL) { /* subnormal b or 0 */
- uint64_t low;
- GET_LDOUBLE_LSW64(low,b);
- if((hb|low)==0) return a;
- t1=0;
- SET_LDOUBLE_MSW64(t1,0x7ffd000000000000LL); /* t1=2^16382 */
- b *= t1;
- a *= t1;
- k -= 16382;
- GET_LDOUBLE_MSW64 (ha, a);
- GET_LDOUBLE_MSW64 (hb, b);
- if (hb > ha)
- {
- t1 = a;
- a = b;
- b = t1;
- j = ha;
- ha = hb;
- hb = j;
- }
- } else { /* scale a and b by 2^9600 */
- ha += 0x2580000000000000LL; /* a *= 2^9600 */
- hb += 0x2580000000000000LL; /* b *= 2^9600 */
- k -= 9600;
- SET_LDOUBLE_MSW64(a,ha);
- SET_LDOUBLE_MSW64(b,hb);
- }
- }
- /* medium size a and b */
- w = a-b;
- if (w>b) {
- t1 = 0;
- SET_LDOUBLE_MSW64(t1,ha);
- t2 = a-t1;
- w = sqrtl(t1*t1-(b*(-b)-t2*(a+t1)));
- } else {
- a = a+a;
- y1 = 0;
- SET_LDOUBLE_MSW64(y1,hb);
- y2 = b - y1;
- t1 = 0;
- SET_LDOUBLE_MSW64(t1,ha+0x0001000000000000LL);
- t2 = a - t1;
- w = sqrtl(t1*y1-(w*(-w)-(t1*y2+t2*b)));
- }
- if(k!=0) {
- uint64_t high;
- t1 = 1;
- GET_LDOUBLE_MSW64(high,t1);
- SET_LDOUBLE_MSW64(t1,high+(k<<48));
- w *= t1;
- math_check_force_underflow_nonneg (w);
- return w;
- } else return w;
+ if (!isfinite(x) || !isfinite(y))
+ {
+ if ((isinf (x) || isinf (y))
+ && !issignaling (x) && !issignaling (y))
+ return INFINITY;
+ return x + y;
+ }
+
+ x = fabsl (x);
+ y = fabsl (y);
+
+ _Float128 ax = x < y ? y : x;
+ _Float128 ay = x < y ? x : y;
+
+ /* If ax is huge, scale both inputs down. */
+ if (__glibc_unlikely (ax > LARGE_VAL))
+ {
+ if (__glibc_unlikely (ay <= ax * EPS))
+ return ax + ay;
+
+ return kernel (ax * SCALE, ay * SCALE) / SCALE;
+ }
+
+ /* If ay is tiny, scale both inputs up. */
+ if (__glibc_unlikely (ay < TINY_VAL))
+ {
+ if (__glibc_unlikely (ax >= ay / EPS))
+ return ax;
+
+ ax = kernel (ax / SCALE, ay / SCALE) * SCALE;
+ math_check_force_underflow_nonneg (ax);
+ return ax;
+ }
+
+ /* Common case: ax is not huge and ay is not tiny. */
+ if (__glibc_unlikely (ay <= ax * EPS))
+ return ax + ay;
+
+ return kernel (ax, ay);
}
libm_alias_finite (__ieee754_hypotl, __hypotl)
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128)
2021-12-03 0:00 ` [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
@ 2021-12-06 11:58 ` Wilco Dijkstra
2021-12-06 12:22 ` Adhemerval Zanella
0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 11:58 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha, Paul Zimmermann
Hi Adhemerval,
+ /* If ay is tiny, scale both inputs up. */
+ if (__glibc_unlikely (ay < TINY_VAL))
+ {
+ if (__glibc_unlikely (ax >= ay / EPS))
+ return ax;
Why not return ax + ay like in the other special cases? It's more accurate for rounding
upwards, and it's best to keep the code as similar as possible as the double version.
Cheers,
Wilco
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128)
2021-12-06 11:58 ` Wilco Dijkstra
@ 2021-12-06 12:22 ` Adhemerval Zanella
0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 12:22 UTC (permalink / raw)
To: Wilco Dijkstra, libc-alpha, Paul Zimmermann
On 06/12/2021 08:58, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
>
> + /* If ay is tiny, scale both inputs up. */
> + if (__glibc_unlikely (ay < TINY_VAL))
> + {
> + if (__glibc_unlikely (ax >= ay / EPS))
> + return ax;
>
> Why not return ax + ay like in the other special cases? It's more accurate for rounding
> upwards, and it's best to keep the code as similar as possible as the double version.
Ack, I have fix it.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 06/12] math: Remove powerpc e_hypot
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
` (4 preceding siblings ...)
2021-12-03 0:00 ` [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
@ 2021-12-03 0:00 ` Adhemerval Zanella
2021-12-06 17:12 ` Adhemerval Zanella
2021-12-03 0:00 ` [PATCH v4 07/12] i386: Move hypot implementation to C Adhemerval Zanella
` (6 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:00 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
The generic implementation is shows only slight worse performance:
POWER9 reciprocal-throughput latency
master 13.4024 14.0967
new hypot 14.8479 15.8061
POWER8 reciprocal-throughput latency
master 15.5767 16.8885
new hypot 16.5371 18.4057
One way to improve might to make gcc generate xsmaxdp/xsmindp for
fmax/fmin (it onl does for -ffast-math, clang does for default
options).
Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
---
sysdeps/powerpc/fpu/e_hypot.c | 87 -------------------
sysdeps/powerpc/fpu/e_hypotf.c | 78 -----------------
.../powerpc32/power4/fpu/multiarch/Makefile | 5 +-
.../power4/fpu/multiarch/e_hypot-power7.c | 23 -----
.../power4/fpu/multiarch/e_hypot-ppc32.c | 23 -----
.../powerpc32/power4/fpu/multiarch/e_hypot.c | 33 -------
.../power4/fpu/multiarch/e_hypotf-power7.c | 23 -----
.../power4/fpu/multiarch/e_hypotf-ppc32.c | 23 -----
.../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 -------
9 files changed, 1 insertion(+), 327 deletions(-)
delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c
deleted file mode 100644
index f96c589bbd..0000000000
--- a/sysdeps/powerpc/fpu/e_hypot.c
+++ /dev/null
@@ -1,87 +0,0 @@
-/* Pythagorean addition using doubles
- Copyright (C) 2011-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Library General Public License as
- published by the Free Software Foundation; either version 2 of the
- License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Library General Public License for more details.
-
- You should have received a copy of the GNU Library General Public
- License along with the GNU C Library; see the file COPYING.LIB. If
- not, see <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-#include <math_private.h>
-#include <math-underflow.h>
-#include <stdint.h>
-#include <libm-alias-finite.h>
-
-/* __ieee754_hypot(x,y)
- *
- * This a FP only version without any FP->INT conversion.
- * It is similar to default C version, making appropriates
- * overflow and underflows checks as well scaling when it
- * is needed.
- */
-
-double
-__ieee754_hypot (double x, double y)
-{
- if ((isinf (x) || isinf (y))
- && !issignaling (x) && !issignaling (y))
- return INFINITY;
- if (isnan (x) || isnan (y))
- return x + y;
-
- x = fabs (x);
- y = fabs (y);
-
- if (y > x)
- {
- double t = x;
- x = y;
- y = t;
- }
- if (y == 0.0)
- return x;
-
- /* if y is higher enough, y * 2^60 might overflow. The tests if
- y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the
- appropriate check to avoid the overflow exception generation. */
- if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60))
- return x + y;
-
- if (x > 0x1p+500)
- {
- x *= 0x1p-600;
- y *= 0x1p-600;
- return sqrt (x * x + y * y) / 0x1p-600;
- }
- if (y < 0x1p-500)
- {
- if (y <= 0x0.fffffffffffffp-1022)
- {
- x *= 0x1p+1022;
- y *= 0x1p+1022;
- double ret = sqrt (x * x + y * y) / 0x1p+1022;
- math_check_force_underflow_nonneg (ret);
- return ret;
- }
- else
- {
- x *= 0x1p+600;
- y *= 0x1p+600;
- return sqrt (x * x + y * y) / 0x1p+600;
- }
- }
- return sqrt (x * x + y * y);
-}
-#ifndef __ieee754_hypot
-libm_alias_finite (__ieee754_hypot, __hypot)
-#endif
diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c
deleted file mode 100644
index fa201dda51..0000000000
--- a/sysdeps/powerpc/fpu/e_hypotf.c
+++ /dev/null
@@ -1,78 +0,0 @@
-/* Pythagorean addition using floats
- Copyright (C) 2011-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Library General Public License as
- published by the Free Software Foundation; either version 2 of the
- License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Library General Public License for more details.
-
- You should have received a copy of the GNU Library General Public
- License along with the GNU C Library; see the file COPYING.LIB. If
- not, see <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-#include <math_private.h>
-#include <stdint.h>
-#include <libm-alias-finite.h>
-
-/* __ieee754_hypotf(x,y)
-
- This a FP only version without any FP->INT conversion.
- It is similar to default C version, making appropriates
- overflow and underflows checks as using double precision
- instead of scaling. */
-
-#ifdef _ARCH_PWR7
-/* POWER7 isinf and isnan optimizations are fast. */
-# define TEST_INF_NAN(x, y) \
- if ((isinff(x) || isinff(y)) \
- && !issignaling (x) && !issignaling (y)) \
- return INFINITY; \
- if (isnanf(x) || isnanf(y)) \
- return x + y;
-# else
-/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are
- * costly (especially for POWER6). */
-# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2) \
- do { \
- ieee_float_shape_type gf_u1; \
- ieee_float_shape_type gf_u2; \
- gf_u1.value = (f1); \
- gf_u2.value = (f2); \
- (i1) = gf_u1.word & 0x7fffffff; \
- (i2) = gf_u2.word & 0x7fffffff; \
- } while (0)
-
-# define TEST_INF_NAN(x, y) \
- do { \
- uint32_t hx, hy; \
- GET_TWO_FLOAT_WORD(x, y, hx, hy); \
- if (hy > hx) { \
- uint32_t ht = hx; hx = hy; hy = ht; \
- } \
- if (hx >= 0x7f800000) { \
- if ((hx == 0x7f800000 || hy == 0x7f800000) \
- && !issignaling (x) && !issignaling (y)) \
- return INFINITY; \
- return x + y; \
- } \
- } while (0)
-#endif
-
-
-float
-__ieee754_hypotf (float x, float y)
-{
- TEST_INF_NAN (x, y);
-
- return sqrt ((double) x * x + (double) y * y);
-}
-#ifndef __ieee754_hypotf
-libm_alias_finite (__ieee754_hypotf, __hypotf)
-#endif
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
index 60f2c95532..1de0f9b350 100644
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
@@ -15,8 +15,7 @@ libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \
s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \
s_modff-power5+ s_modff-ppc32 s_logbl-power7 \
s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \
- s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \
- e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32
+ s_logbf-power7 s_logbf-ppc32
CFLAGS-s_llrintf-power6.c += -mcpu=power6
CFLAGS-s_llrintf-ppc32.c += -mcpu=power4
@@ -35,8 +34,6 @@ CFLAGS-s_modff-power5+.c = -mcpu=power5+
CFLAGS-s_logbl-power7.c = -mcpu=power7
CFLAGS-s_logb-power7.c = -mcpu=power7
CFLAGS-s_logbf-power7.c = -mcpu=power7
-CFLAGS-e_hypot-power7.c = -mcpu=power7
-CFLAGS-e_hypotf-power7.c = -mcpu=power7
# These files quiet sNaNs in a way that is optimized away without
# -fsignaling-nans.
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
deleted file mode 100644
index 382b4a0b27..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee_hypot() POWER7 version.
- Copyright (C) 2013-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-
-#define __ieee754_hypot __ieee754_hypot_power7
-
-#include <sysdeps/powerpc/fpu/e_hypot.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
deleted file mode 100644
index abb14d5469..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee_hypot() PowerPC32 version.
- Copyright (C) 2013-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-
-#define __ieee754_hypot __ieee754_hypot_ppc32
-
-#include <sysdeps/powerpc/fpu/e_hypot.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
deleted file mode 100644
index a16efa350c..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* Multiple versions of ieee754_hypot.
- Copyright (C) 2013-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-#include <math_private.h>
-#include <math_ldbl_opt.h>
-#include <libm-alias-finite.h>
-#include "init-arch.h"
-
-extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden;
-extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden;
-
-libc_ifunc (__ieee754_hypot,
- (hwcap & PPC_FEATURE_ARCH_2_06)
- ? __ieee754_hypot_power7
- : __ieee754_hypot_ppc32);
-
-libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
deleted file mode 100644
index f8a26ff22f..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee754_hypot POWER7 version.
- Copyright (C) 2013-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-
-#define __ieee754_hypotf __ieee754_hypotf_power7
-
-#include <sysdeps/powerpc/fpu/e_hypotf.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
deleted file mode 100644
index b13f8c9db2..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* __ieee_hypot() PowerPC32 version.
- Copyright (C) 2013-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-
-#define __ieee754_hypotf __ieee754_hypotf_ppc32
-
-#include <sysdeps/ieee754/flt-32/e_hypotf.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
deleted file mode 100644
index 1e72605db8..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* Multiple versions of ieee754_hypotf.
- Copyright (C) 2013-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-#include <math_private.h>
-#include <math_ldbl_opt.h>
-#include <libm-alias-finite.h>
-#include "init-arch.h"
-
-extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden;
-extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden;
-
-libc_ifunc (__ieee754_hypotf,
- (hwcap & PPC_FEATURE_ARCH_2_06)
- ? __ieee754_hypotf_power7
- : __ieee754_hypotf_ppc32);
-
-libm_alias_finite (__ieee754_hypotf, __hypotf)
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 06/12] math: Remove powerpc e_hypot
2021-12-03 0:00 ` [PATCH v4 06/12] math: Remove powerpc e_hypot Adhemerval Zanella
@ 2021-12-06 17:12 ` Adhemerval Zanella
2021-12-06 21:29 ` Paul A. Clarke
0 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 17:12 UTC (permalink / raw)
To: libc-alpha, Tulio Magno Quites Machado Filho, Paul A. Clarke
On 02/12/2021 21:00, Adhemerval Zanella wrote:
> The generic implementation is shows only slight worse performance:
>
> POWER9 reciprocal-throughput latency
> master 13.4024 14.0967
> new hypot 14.8479 15.8061
>
> POWER8 reciprocal-throughput latency
> master 15.5767 16.8885
> new hypot 16.5371 18.4057
>
> One way to improve might to make gcc generate xsmaxdp/xsmindp for
> fmax/fmin (it onl does for -ffast-math, clang does for default
> options).
>
> Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
> (power9).
Hi Tulio/Paul,
This the only missing patch in this set and I would like to check with you,
powerpc maintainers, that if it would be ok to push it. The resulting
performance difference, including the latest one that removes the wrappers,
is slight better:
POWER9 reciprocal-throughput latency
master 13.4024 14.0967
new hypot 11.9206 13.9871
POWER8 reciprocal-throughput latency
master 15.5767 16.8885
new hypot 15.3541 18.0856
The POWER8 çatency difference seems to be due branch misprediction
in the max/min selection. In fact, if I use xsmaxdp/xsmindp on the
USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8:
POWER8 reciprocal-throughput latency
xsmaxdp/xsmindp 12.8959 16.2082
POWER9 is not affected (I don't see any performance difference by
using xsmaxdp/xsmindp).
The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math
for some reason, clang use them on default -O2 option.
> ---
> sysdeps/powerpc/fpu/e_hypot.c | 87 -------------------
> sysdeps/powerpc/fpu/e_hypotf.c | 78 -----------------
> .../powerpc32/power4/fpu/multiarch/Makefile | 5 +-
> .../power4/fpu/multiarch/e_hypot-power7.c | 23 -----
> .../power4/fpu/multiarch/e_hypot-ppc32.c | 23 -----
> .../powerpc32/power4/fpu/multiarch/e_hypot.c | 33 -------
> .../power4/fpu/multiarch/e_hypotf-power7.c | 23 -----
> .../power4/fpu/multiarch/e_hypotf-ppc32.c | 23 -----
> .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 -------
> 9 files changed, 1 insertion(+), 327 deletions(-)
> delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
> delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
> delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
> delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
> delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
> delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
> delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
> delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
>
> diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c
> deleted file mode 100644
> index f96c589bbd..0000000000
> --- a/sysdeps/powerpc/fpu/e_hypot.c
> +++ /dev/null
> @@ -1,87 +0,0 @@
> -/* Pythagorean addition using doubles
> - Copyright (C) 2011-2021 Free Software Foundation, Inc.
> - This file is part of the GNU C Library
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Library General Public License as
> - published by the Free Software Foundation; either version 2 of the
> - License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Library General Public License for more details.
> -
> - You should have received a copy of the GNU Library General Public
> - License along with the GNU C Library; see the file COPYING.LIB. If
> - not, see <https://www.gnu.org/licenses/>. */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <math-underflow.h>
> -#include <stdint.h>
> -#include <libm-alias-finite.h>
> -
> -/* __ieee754_hypot(x,y)
> - *
> - * This a FP only version without any FP->INT conversion.
> - * It is similar to default C version, making appropriates
> - * overflow and underflows checks as well scaling when it
> - * is needed.
> - */
> -
> -double
> -__ieee754_hypot (double x, double y)
> -{
> - if ((isinf (x) || isinf (y))
> - && !issignaling (x) && !issignaling (y))
> - return INFINITY;
> - if (isnan (x) || isnan (y))
> - return x + y;
> -
> - x = fabs (x);
> - y = fabs (y);
> -
> - if (y > x)
> - {
> - double t = x;
> - x = y;
> - y = t;
> - }
> - if (y == 0.0)
> - return x;
> -
> - /* if y is higher enough, y * 2^60 might overflow. The tests if
> - y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the
> - appropriate check to avoid the overflow exception generation. */
> - if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60))
> - return x + y;
> -
> - if (x > 0x1p+500)
> - {
> - x *= 0x1p-600;
> - y *= 0x1p-600;
> - return sqrt (x * x + y * y) / 0x1p-600;
> - }
> - if (y < 0x1p-500)
> - {
> - if (y <= 0x0.fffffffffffffp-1022)
> - {
> - x *= 0x1p+1022;
> - y *= 0x1p+1022;
> - double ret = sqrt (x * x + y * y) / 0x1p+1022;
> - math_check_force_underflow_nonneg (ret);
> - return ret;
> - }
> - else
> - {
> - x *= 0x1p+600;
> - y *= 0x1p+600;
> - return sqrt (x * x + y * y) / 0x1p+600;
> - }
> - }
> - return sqrt (x * x + y * y);
> -}
> -#ifndef __ieee754_hypot
> -libm_alias_finite (__ieee754_hypot, __hypot)
> -#endif
> diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c
> deleted file mode 100644
> index fa201dda51..0000000000
> --- a/sysdeps/powerpc/fpu/e_hypotf.c
> +++ /dev/null
> @@ -1,78 +0,0 @@
> -/* Pythagorean addition using floats
> - Copyright (C) 2011-2021 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Library General Public License as
> - published by the Free Software Foundation; either version 2 of the
> - License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Library General Public License for more details.
> -
> - You should have received a copy of the GNU Library General Public
> - License along with the GNU C Library; see the file COPYING.LIB. If
> - not, see <https://www.gnu.org/licenses/>. */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <stdint.h>
> -#include <libm-alias-finite.h>
> -
> -/* __ieee754_hypotf(x,y)
> -
> - This a FP only version without any FP->INT conversion.
> - It is similar to default C version, making appropriates
> - overflow and underflows checks as using double precision
> - instead of scaling. */
> -
> -#ifdef _ARCH_PWR7
> -/* POWER7 isinf and isnan optimizations are fast. */
> -# define TEST_INF_NAN(x, y) \
> - if ((isinff(x) || isinff(y)) \
> - && !issignaling (x) && !issignaling (y)) \
> - return INFINITY; \
> - if (isnanf(x) || isnanf(y)) \
> - return x + y;
> -# else
> -/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are
> - * costly (especially for POWER6). */
> -# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2) \
> - do { \
> - ieee_float_shape_type gf_u1; \
> - ieee_float_shape_type gf_u2; \
> - gf_u1.value = (f1); \
> - gf_u2.value = (f2); \
> - (i1) = gf_u1.word & 0x7fffffff; \
> - (i2) = gf_u2.word & 0x7fffffff; \
> - } while (0)
> -
> -# define TEST_INF_NAN(x, y) \
> - do { \
> - uint32_t hx, hy; \
> - GET_TWO_FLOAT_WORD(x, y, hx, hy); \
> - if (hy > hx) { \
> - uint32_t ht = hx; hx = hy; hy = ht; \
> - } \
> - if (hx >= 0x7f800000) { \
> - if ((hx == 0x7f800000 || hy == 0x7f800000) \
> - && !issignaling (x) && !issignaling (y)) \
> - return INFINITY; \
> - return x + y; \
> - } \
> - } while (0)
> -#endif
> -
> -
> -float
> -__ieee754_hypotf (float x, float y)
> -{
> - TEST_INF_NAN (x, y);
> -
> - return sqrt ((double) x * x + (double) y * y);
> -}
> -#ifndef __ieee754_hypotf
> -libm_alias_finite (__ieee754_hypotf, __hypotf)
> -#endif
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
> index 60f2c95532..1de0f9b350 100644
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
> +++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
> @@ -15,8 +15,7 @@ libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \
> s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \
> s_modff-power5+ s_modff-ppc32 s_logbl-power7 \
> s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \
> - s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \
> - e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32
> + s_logbf-power7 s_logbf-ppc32
>
> CFLAGS-s_llrintf-power6.c += -mcpu=power6
> CFLAGS-s_llrintf-ppc32.c += -mcpu=power4
> @@ -35,8 +34,6 @@ CFLAGS-s_modff-power5+.c = -mcpu=power5+
> CFLAGS-s_logbl-power7.c = -mcpu=power7
> CFLAGS-s_logb-power7.c = -mcpu=power7
> CFLAGS-s_logbf-power7.c = -mcpu=power7
> -CFLAGS-e_hypot-power7.c = -mcpu=power7
> -CFLAGS-e_hypotf-power7.c = -mcpu=power7
>
> # These files quiet sNaNs in a way that is optimized away without
> # -fsignaling-nans.
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
> deleted file mode 100644
> index 382b4a0b27..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee_hypot() POWER7 version.
> - Copyright (C) 2013-2021 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypot __ieee754_hypot_power7
> -
> -#include <sysdeps/powerpc/fpu/e_hypot.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
> deleted file mode 100644
> index abb14d5469..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee_hypot() PowerPC32 version.
> - Copyright (C) 2013-2021 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypot __ieee754_hypot_ppc32
> -
> -#include <sysdeps/powerpc/fpu/e_hypot.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
> deleted file mode 100644
> index a16efa350c..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
> +++ /dev/null
> @@ -1,33 +0,0 @@
> -/* Multiple versions of ieee754_hypot.
> - Copyright (C) 2013-2021 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <math_ldbl_opt.h>
> -#include <libm-alias-finite.h>
> -#include "init-arch.h"
> -
> -extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden;
> -extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden;
> -
> -libc_ifunc (__ieee754_hypot,
> - (hwcap & PPC_FEATURE_ARCH_2_06)
> - ? __ieee754_hypot_power7
> - : __ieee754_hypot_ppc32);
> -
> -libm_alias_finite (__ieee754_hypot, __hypot)
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
> deleted file mode 100644
> index f8a26ff22f..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee754_hypot POWER7 version.
> - Copyright (C) 2013-2021 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypotf __ieee754_hypotf_power7
> -
> -#include <sysdeps/powerpc/fpu/e_hypotf.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
> deleted file mode 100644
> index b13f8c9db2..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee_hypot() PowerPC32 version.
> - Copyright (C) 2013-2021 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypotf __ieee754_hypotf_ppc32
> -
> -#include <sysdeps/ieee754/flt-32/e_hypotf.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
> deleted file mode 100644
> index 1e72605db8..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
> +++ /dev/null
> @@ -1,33 +0,0 @@
> -/* Multiple versions of ieee754_hypotf.
> - Copyright (C) 2013-2021 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <math_ldbl_opt.h>
> -#include <libm-alias-finite.h>
> -#include "init-arch.h"
> -
> -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden;
> -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden;
> -
> -libc_ifunc (__ieee754_hypotf,
> - (hwcap & PPC_FEATURE_ARCH_2_06)
> - ? __ieee754_hypotf_power7
> - : __ieee754_hypotf_ppc32);
> -
> -libm_alias_finite (__ieee754_hypotf, __hypotf)
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 06/12] math: Remove powerpc e_hypot
2021-12-06 17:12 ` Adhemerval Zanella
@ 2021-12-06 21:29 ` Paul A. Clarke
2021-12-07 13:19 ` Adhemerval Zanella
0 siblings, 1 reply; 36+ messages in thread
From: Paul A. Clarke @ 2021-12-06 21:29 UTC (permalink / raw)
To: Adhemerval Zanella; +Cc: libc-alpha, Tulio Magno Quites Machado Filho
On Mon, Dec 06, 2021 at 02:12:27PM -0300, Adhemerval Zanella wrote:
> On 02/12/2021 21:00, Adhemerval Zanella wrote:
> > The generic implementation is shows only slight worse performance:
> >
> > POWER9 reciprocal-throughput latency
> > master 13.4024 14.0967
> > new hypot 14.8479 15.8061
> >
> > POWER8 reciprocal-throughput latency
> > master 15.5767 16.8885
> > new hypot 16.5371 18.4057
> >
> > One way to improve might to make gcc generate xsmaxdp/xsmindp for
> > fmax/fmin (it onl does for -ffast-math, clang does for default
> > options).
> >
> > Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
> > (power9).
>
> Hi Tulio/Paul,
>
> This the only missing patch in this set and I would like to check with you,
> powerpc maintainers, that if it would be ok to push it. The resulting
> performance difference, including the latest one that removes the wrappers,
> is slight better:
>
>
> POWER9 reciprocal-throughput latency
> master 13.4024 14.0967
> new hypot 11.9206 13.9871
>
> POWER8 reciprocal-throughput latency
> master 15.5767 16.8885
> new hypot 15.3541 18.0856
For Power10 / master:
"hypot": {
"workload-random": {
"duration": 5.28242e+08,
"iterations": 4.8e+07,
"reciprocal-throughput": 8.28478,
"latency": 13.7253,
"max-throughput": 1.20703e+08,
"min-throughput": 7.2858e+07
}
}
For Power10 / new hypot:
"hypot": {
"workload-random": {
"duration": 5.30731e+08,
"iterations": 5.2e+07,
"reciprocal-throughput": 7.21945,
"latency": 13.1933,
"max-throughput": 1.38515e+08,
"min-throughput": 7.57963e+07
}
}
> The POWER8 çatency difference seems to be due branch misprediction
> in the max/min selection. In fact, if I use xsmaxdp/xsmindp on the
> USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8:
>
> POWER8 reciprocal-throughput latency
> xsmaxdp/xsmindp 12.8959 16.2082
>
> POWER9 is not affected (I don't see any performance difference by
> using xsmaxdp/xsmindp).
>
> The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math
> for some reason, clang use them on default -O2 option.
I believe clang may be wrong here, in that the instructions do not
properly handle NaN for the fmin/fmax semantics without -ffast-math.
PC
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 06/12] math: Remove powerpc e_hypot
2021-12-06 21:29 ` Paul A. Clarke
@ 2021-12-07 13:19 ` Adhemerval Zanella
0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 13:19 UTC (permalink / raw)
To: Paul A. Clarke; +Cc: libc-alpha, Tulio Magno Quites Machado Filho
On 06/12/2021 18:29, Paul A. Clarke wrote:
> On Mon, Dec 06, 2021 at 02:12:27PM -0300, Adhemerval Zanella wrote:
>> On 02/12/2021 21:00, Adhemerval Zanella wrote:
>>> The generic implementation is shows only slight worse performance:
>>>
>>> POWER9 reciprocal-throughput latency
>>> master 13.4024 14.0967
>>> new hypot 14.8479 15.8061
>>>
>>> POWER8 reciprocal-throughput latency
>>> master 15.5767 16.8885
>>> new hypot 16.5371 18.4057
>>>
>>> One way to improve might to make gcc generate xsmaxdp/xsmindp for
>>> fmax/fmin (it onl does for -ffast-math, clang does for default
>>> options).
>>>
>>> Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
>>> (power9).
>>
>> Hi Tulio/Paul,
>>
>> This the only missing patch in this set and I would like to check with you,
>> powerpc maintainers, that if it would be ok to push it. The resulting
>> performance difference, including the latest one that removes the wrappers,
>> is slight better:
>>
>>
>> POWER9 reciprocal-throughput latency
>> master 13.4024 14.0967
>> new hypot 11.9206 13.9871
>>
>> POWER8 reciprocal-throughput latency
>> master 15.5767 16.8885
>> new hypot 15.3541 18.0856
>
> For Power10 / master:
>
> "hypot": {
> "workload-random": {
> "duration": 5.28242e+08,
> "iterations": 4.8e+07,
> "reciprocal-throughput": 8.28478,
> "latency": 13.7253,
> "max-throughput": 1.20703e+08,
> "min-throughput": 7.2858e+07
> }
> }
>
> For Power10 / new hypot:
>
> "hypot": {
> "workload-random": {
> "duration": 5.30731e+08,
> "iterations": 5.2e+07,
> "reciprocal-throughput": 7.21945,
> "latency": 13.1933,
> "max-throughput": 1.38515e+08,
> "min-throughput": 7.57963e+07
> }
> }
So I assume it would be ok to push this patch based on the power10
performance numbers, right?
>
>> The POWER8 çatency difference seems to be due branch misprediction
>> in the max/min selection. In fact, if I use xsmaxdp/xsmindp on the
>> USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8:
>>
>> POWER8 reciprocal-throughput latency
>> xsmaxdp/xsmindp 12.8959 16.2082
>>
>> POWER9 is not affected (I don't see any performance difference by
>> using xsmaxdp/xsmindp).
>>
>> The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math
>> for some reason, clang use them on default -O2 option.
>
> I believe clang may be wrong here, in that the instructions do not
> properly handle NaN for the fmin/fmax semantics without -ffast-math.
At least on power8/power9 I don't see any test-double-{fmax,fmin}
regressions if I use xsmaxdp/xsmindp instead of default implementation.
And libm-test-{fmax,fmin}.inc does extensively tests both default
and signaling NaNs.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 07/12] i386: Move hypot implementation to C
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
` (5 preceding siblings ...)
2021-12-03 0:00 ` [PATCH v4 06/12] math: Remove powerpc e_hypot Adhemerval Zanella
@ 2021-12-03 0:00 ` Adhemerval Zanella
2021-12-03 14:51 ` Wilco Dijkstra
2021-12-03 0:00 ` [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
` (5 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:00 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
The generic hypotf is slight slower, mostly due the tricks the assembly
does to optimize the isinf/isnan/issignaling. The generic hypot is way
slower, since the optimized implementation uses the i386 default
excessive precision to issue the operation directly. A similar
implementation is provided instead of using the generic implementation:
Checked on i686-linux-gnu.
---
sysdeps/i386/fpu/e_hypot.S | 75 -------------------------------------
sysdeps/i386/fpu/e_hypot.c | 46 +++++++++++++++++++++++
sysdeps/i386/fpu/e_hypotf.S | 64 -------------------------------
3 files changed, 46 insertions(+), 139 deletions(-)
delete mode 100644 sysdeps/i386/fpu/e_hypot.S
create mode 100644 sysdeps/i386/fpu/e_hypot.c
delete mode 100644 sysdeps/i386/fpu/e_hypotf.S
diff --git a/sysdeps/i386/fpu/e_hypot.S b/sysdeps/i386/fpu/e_hypot.S
deleted file mode 100644
index f2c956b77a..0000000000
--- a/sysdeps/i386/fpu/e_hypot.S
+++ /dev/null
@@ -1,75 +0,0 @@
-/* Compute the hypothenuse of X and Y.
- Copyright (C) 1998-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <sysdep.h>
-#include <i386-math-asm.h>
-#include <libm-alias-finite.h>
-
-DEFINE_DBL_MIN
-
-#ifdef PIC
-# define MO(op) op##@GOTOFF(%edx)
-#else
-# define MO(op) op
-#endif
-
- .text
-ENTRY(__ieee754_hypot)
-#ifdef PIC
- LOAD_PIC_REG (dx)
-#endif
- fldl 4(%esp) // x
- fxam
- fnstsw
- fldl 12(%esp) // y : x
- movb %ah, %ch
- fxam
- fnstsw
- movb %ah, %al
- orb %ch, %ah
- sahf
- jc 1f
- fmul %st(0) // y * y : x
- fxch // x : y * y
- fmul %st(0) // x * x : y * y
- faddp // x * x + y * y
- fsqrt
- DBL_NARROW_EVAL_UFLOW_NONNEG
-2: ret
-
- // We have to test whether any of the parameters is Inf.
- // In this case the result is infinity.
-1: andb $0x45, %al
- cmpb $5, %al
- je 3f // jump if y is Inf
- andb $0x45, %ch
- cmpb $5, %ch
- jne 4f // jump if x is not Inf
- fxch
-3: fstp %st(1)
- fabs
- jmp 2b
-
-4: testb $1, %al
- jnz 5f // y is NaN
- fxch
-5: fstp %st(1)
- jmp 2b
-
-END(__ieee754_hypot)
-libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/i386/fpu/e_hypot.c b/sysdeps/i386/fpu/e_hypot.c
new file mode 100644
index 0000000000..b7c068e734
--- /dev/null
+++ b/sysdeps/i386/fpu/e_hypot.c
@@ -0,0 +1,46 @@
+/* Euclidean distance function. Double/Binary64 i386 version.
+ Copyright (C) 2021 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <math.h>
+#include <math_private.h>
+#include <math-underflow.h>
+#include <math-narrow-eval.h>
+#include <libm-alias-finite.h>
+#include <math_config.h>
+
+/* The i386 allows to use the default excess of precision to optimize the
+ hypot implementation, since internal multiplication and sqrt is carried
+ with 80-bit FP type. */
+double
+__ieee754_hypot (double x, double y)
+{
+ if (!isfinite (x) || !isfinite (y))
+ {
+ if ((isinf (x) || isinf (y))
+ && !issignaling (x) && !issignaling (y))
+ return INFINITY;
+ return x + y;
+ }
+
+ long double lx = x;
+ long double ly = y;
+ double r = math_narrow_eval (sqrtl (lx * lx + ly * ly));
+ math_check_force_underflow_nonneg (r);
+ return r;
+}
+libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/i386/fpu/e_hypotf.S b/sysdeps/i386/fpu/e_hypotf.S
deleted file mode 100644
index cec5d15403..0000000000
--- a/sysdeps/i386/fpu/e_hypotf.S
+++ /dev/null
@@ -1,64 +0,0 @@
-/* Compute the hypothenuse of X and Y.
- Copyright (C) 1998-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <sysdep.h>
-#include <i386-math-asm.h>
-#include <libm-alias-finite.h>
-
- .text
-ENTRY(__ieee754_hypotf)
- flds 4(%esp) // x
- fxam
- fnstsw
- flds 8(%esp) // y : x
- movb %ah, %ch
- fxam
- fnstsw
- movb %ah, %al
- orb %ch, %ah
- sahf
- jc 1f
- fmul %st(0) // y * y : x
- fxch // x : y * y
- fmul %st(0) // x * x : y * y
- faddp // x * x + y * y
- fsqrt
- FLT_NARROW_EVAL
-2: ret
-
- // We have to test whether any of the parameters is Inf.
- // In this case the result is infinity.
-1: andb $0x45, %al
- cmpb $5, %al
- je 3f // jump if y is Inf
- andb $0x45, %ch
- cmpb $5, %ch
- jne 4f // jump if x is not Inf
- fxch
-3: fstp %st(1)
- fabs
- jmp 2b
-
-4: testb $1, %al
- jnz 5f // y is NaN
- fxch
-5: fstp %st(1)
- jmp 2b
-
-END(__ieee754_hypotf)
-libm_alias_finite (__ieee754_hypotf, __hypotf)
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
` (6 preceding siblings ...)
2021-12-03 0:00 ` [PATCH v4 07/12] i386: Move hypot implementation to C Adhemerval Zanella
@ 2021-12-03 0:00 ` Adhemerval Zanella
2021-12-06 11:52 ` Wilco Dijkstra
2021-12-06 21:11 ` Joseph Myers
2021-12-03 0:01 ` [PATCH v4 09/12] math: Add math-use-builtinds-fmin.h Adhemerval Zanella
` (4 subsequent siblings)
12 siblings, 2 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:00 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
It allows the architecture to use the builtin instead of generic
implementation.
---
math/s_fmax_template.c | 27 ++++++++++++++++++++++++
sysdeps/generic/math-use-builtins-fmax.h | 4 ++++
sysdeps/generic/math-use-builtins.h | 1 +
3 files changed, 32 insertions(+)
create mode 100644 sysdeps/generic/math-use-builtins-fmax.h
diff --git a/math/s_fmax_template.c b/math/s_fmax_template.c
index d817406f04..4417fdb283 100644
--- a/math/s_fmax_template.c
+++ b/math/s_fmax_template.c
@@ -17,10 +17,37 @@
<https://www.gnu.org/licenses/>. */
#include <math.h>
+#include <math-use-builtins.h>
+
+#if __HAVE_FLOAT128
+# define USE_BUILTIN_F128 , _Float128 : USE_FMAXF128_BUILTIN
+# define BUILTIN_F128 , _Float128 :__builtin_fmaxf128
+#else
+# define USE_BUILTIN_F128
+# define BUILTIN_F128
+#endif
+
+#define USE_BUILTIN(X, Y) \
+ _Generic((X), \
+ float : USE_FMAXF_BUILTIN, \
+ double : USE_FMAX_BUILTIN, \
+ long double : USE_FMAXL_BUILTIN \
+ USE_BUILTIN_F128)
+
+#define BUILTIN(X, Y) \
+ _Generic((X), \
+ float : __builtin_fmaxf, \
+ double : __builtin_fmax, \
+ long double : __builtin_fmaxl \
+ BUILTIN_F128) \
+ (X, Y)
FLOAT
M_DECL_FUNC (__fmax) (FLOAT x, FLOAT y)
{
+ if (USE_BUILTIN (x, y))
+ return BUILTIN (x, y);
+
if (isgreaterequal (x, y))
return x;
else if (isless (x, y))
diff --git a/sysdeps/generic/math-use-builtins-fmax.h b/sysdeps/generic/math-use-builtins-fmax.h
new file mode 100644
index 0000000000..8fc4efca6a
--- /dev/null
+++ b/sysdeps/generic/math-use-builtins-fmax.h
@@ -0,0 +1,4 @@
+#define USE_FMAX_BUILTIN 0
+#define USE_FMAXF_BUILTIN 0
+#define USE_FMAXL_BUILTIN 0
+#define USE_FMAXF128_BUILTIN 0
diff --git a/sysdeps/generic/math-use-builtins.h b/sysdeps/generic/math-use-builtins.h
index 19d2d1cf3c..e07bba242f 100644
--- a/sysdeps/generic/math-use-builtins.h
+++ b/sysdeps/generic/math-use-builtins.h
@@ -34,5 +34,6 @@
#include <math-use-builtins-copysign.h>
#include <math-use-builtins-sqrt.h>
#include <math-use-builtins-fma.h>
+#include <math-use-builtins-fmax.h>
#endif /* MATH_USE_BUILTINS_H */
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h
2021-12-03 0:00 ` [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
@ 2021-12-06 11:52 ` Wilco Dijkstra
2021-12-06 21:11 ` Joseph Myers
1 sibling, 0 replies; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 11:52 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha, Paul Zimmermann
Hi Adhemerval,
math/s_fmax_template.c | 27 ++++++++++++++++++++++++
sysdeps/generic/math-use-builtins-fmax.h | 4 ++++
sysdeps/generic/math-use-builtins.h | 1 +
3 files changed, 32 insertions(+)
create mode 100644 sysdeps/generic/math-use-builtins-fmax.h
Looks good.
Wilco
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h
2021-12-03 0:00 ` [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
2021-12-06 11:52 ` Wilco Dijkstra
@ 2021-12-06 21:11 ` Joseph Myers
2021-12-07 13:21 ` Adhemerval Zanella
1 sibling, 1 reply; 36+ messages in thread
From: Joseph Myers @ 2021-12-06 21:11 UTC (permalink / raw)
To: Adhemerval Zanella; +Cc: libc-alpha, Paul Zimmermann, Wilco Dijkstra
On Thu, 2 Dec 2021, Adhemerval Zanella via Libc-alpha wrote:
> +#if __HAVE_FLOAT128
> +# define USE_BUILTIN_F128 , _Float128 : USE_FMAXF128_BUILTIN
> +# define BUILTIN_F128 , _Float128 :__builtin_fmaxf128
> +#else
> +# define USE_BUILTIN_F128
> +# define BUILTIN_F128
> +#endif
> +
> +#define USE_BUILTIN(X, Y) \
> + _Generic((X), \
> + float : USE_FMAXF_BUILTIN, \
> + double : USE_FMAX_BUILTIN, \
> + long double : USE_FMAXL_BUILTIN \
> + USE_BUILTIN_F128)
> +
> +#define BUILTIN(X, Y) \
> + _Generic((X), \
> + float : __builtin_fmaxf, \
> + double : __builtin_fmax, \
> + long double : __builtin_fmaxl \
> + BUILTIN_F128) \
> + (X, Y)
You should not hardcode cases for different types in type-generic
templates. This would fail if we add support for _Float16 functions in
future, for example.
Use M_SUF to call built-in functions (possibly via defining M_FMAX in
math-type-macros.h, similar to the other M_* macros there.
Do something similar for USE_*_BUILTIN (I suggest defining M_USE_BUILTIN
in the per-type headers used by math-type-macros.h, so you can then call
M_USE_BUILTIN (FMAX).
You'll probably need to use #if with the M_USE_BUILTIN call (which should
be OK after the above changes), because GCC only added __builtin_fmaxfN
functions in commit ee5fd23a481f510528e00f4c988ed0e6a71218c2 (GCC 8 and
later) but older GCC versions are supported for building for some
architectures with _Float128 support, and if __builtin_fmaxf128 gets
referenced inside if (0) with older GCC it would be an undeclared
identifier.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h
2021-12-06 21:11 ` Joseph Myers
@ 2021-12-07 13:21 ` Adhemerval Zanella
0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-07 13:21 UTC (permalink / raw)
To: Joseph Myers; +Cc: libc-alpha, Paul Zimmermann, Wilco Dijkstra
On 06/12/2021 18:11, Joseph Myers wrote:
> On Thu, 2 Dec 2021, Adhemerval Zanella via Libc-alpha wrote:
>
>> +#if __HAVE_FLOAT128
>> +# define USE_BUILTIN_F128 , _Float128 : USE_FMAXF128_BUILTIN
>> +# define BUILTIN_F128 , _Float128 :__builtin_fmaxf128
>> +#else
>> +# define USE_BUILTIN_F128
>> +# define BUILTIN_F128
>> +#endif
>> +
>> +#define USE_BUILTIN(X, Y) \
>> + _Generic((X), \
>> + float : USE_FMAXF_BUILTIN, \
>> + double : USE_FMAX_BUILTIN, \
>> + long double : USE_FMAXL_BUILTIN \
>> + USE_BUILTIN_F128)
>> +
>> +#define BUILTIN(X, Y) \
>> + _Generic((X), \
>> + float : __builtin_fmaxf, \
>> + double : __builtin_fmax, \
>> + long double : __builtin_fmaxl \
>> + BUILTIN_F128) \
>> + (X, Y)
>
> You should not hardcode cases for different types in type-generic
> templates. This would fail if we add support for _Float16 functions in
> future, for example.
>
> Use M_SUF to call built-in functions (possibly via defining M_FMAX in
> math-type-macros.h, similar to the other M_* macros there.
>
> Do something similar for USE_*_BUILTIN (I suggest defining M_USE_BUILTIN
> in the per-type headers used by math-type-macros.h, so you can then call
> M_USE_BUILTIN (FMAX).
Indeed it is cleaner, I don't see the need of just M_FMAX.
>
> You'll probably need to use #if with the M_USE_BUILTIN call (which should
> be OK after the above changes), because GCC only added __builtin_fmaxfN
> functions in commit ee5fd23a481f510528e00f4c988ed0e6a71218c2 (GCC 8 and
> later) but older GCC versions are supported for building for some
> architectures with _Float128 support, and if __builtin_fmaxf128 gets
> referenced inside if (0) with older GCC it would be an undeclared
> identifier.
>
Right, but currently we use declare_mgen_alias for fmax/fmin, so I am not
sure it we really need to take care if __builtin_fmaxfN support.
For _Float128 I will need to check if this does incur in any issue, I
am currently building some older gcc toolchain to check it.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 09/12] math: Add math-use-builtinds-fmin.h
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
` (7 preceding siblings ...)
2021-12-03 0:00 ` [PATCH v4 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
@ 2021-12-03 0:01 ` Adhemerval Zanella
2021-12-06 11:50 ` Wilco Dijkstra
2021-12-03 0:01 ` [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
` (3 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:01 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
It allows the architecture to use the builtin instead of generic
implementation.
---
math/s_fmin_template.c | 27 ++++++++++++++++++++++++
sysdeps/generic/math-use-builtins-fmin.h | 4 ++++
sysdeps/generic/math-use-builtins.h | 1 +
3 files changed, 32 insertions(+)
create mode 100644 sysdeps/generic/math-use-builtins-fmin.h
diff --git a/math/s_fmin_template.c b/math/s_fmin_template.c
index 565a836266..27c2382f59 100644
--- a/math/s_fmin_template.c
+++ b/math/s_fmin_template.c
@@ -17,11 +17,38 @@
<https://www.gnu.org/licenses/>. */
#include <math.h>
+#include <math-use-builtins.h>
+
+#if __HAVE_FLOAT128
+# define USE_BUILTIN_F128 , _Float128 : USE_FMAXF128_BUILTIN
+# define BUILTIN_F128 , _Float128 : __builtin_fminf128
+#else
+# define USE_BUILTIN_F128
+# define BUILTIN_F128
+#endif
+
+#define USE_BUILTIN(X, Y) \
+ _Generic((X), \
+ float : USE_FMAXF_BUILTIN, \
+ double : USE_FMAX_BUILTIN, \
+ long double : USE_FMAXL_BUILTIN \
+ USE_BUILTIN_F128)
+
+#define BUILTIN(X, Y) \
+ _Generic((X), \
+ float : __builtin_fminf, \
+ double : __builtin_fmin, \
+ long double : __builtin_fminl \
+ BUILTIN_F128) \
+ (X, Y)
FLOAT
M_DECL_FUNC (__fmin) (FLOAT x, FLOAT y)
{
+ if (USE_BUILTIN (x, y))
+ return BUILTIN (x, y);
+
if (islessequal (x, y))
return x;
else if (isgreater (x, y))
diff --git a/sysdeps/generic/math-use-builtins-fmin.h b/sysdeps/generic/math-use-builtins-fmin.h
new file mode 100644
index 0000000000..d2383ce00c
--- /dev/null
+++ b/sysdeps/generic/math-use-builtins-fmin.h
@@ -0,0 +1,4 @@
+#define USE_FMIN_BUILTIN 0
+#define USE_FMINF_BUILTIN 0
+#define USE_FMINL_BUILTIN 0
+#define USE_FMINF128_BUILTIN 0
diff --git a/sysdeps/generic/math-use-builtins.h b/sysdeps/generic/math-use-builtins.h
index e07bba242f..24fba47575 100644
--- a/sysdeps/generic/math-use-builtins.h
+++ b/sysdeps/generic/math-use-builtins.h
@@ -35,5 +35,6 @@
#include <math-use-builtins-sqrt.h>
#include <math-use-builtins-fma.h>
#include <math-use-builtins-fmax.h>
+#include <math-use-builtins-fmin.h>
#endif /* MATH_USE_BUILTINS_H */
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
` (8 preceding siblings ...)
2021-12-03 0:01 ` [PATCH v4 09/12] math: Add math-use-builtinds-fmin.h Adhemerval Zanella
@ 2021-12-03 0:01 ` Adhemerval Zanella
2021-12-06 11:46 ` Wilco Dijkstra
2021-12-03 0:01 ` [PATCH v4 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
` (2 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:01 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
It allows to remove the arch-specific implementations.
---
sysdeps/aarch64/fpu/math-use-builtins-fmax.h | 4 +++
sysdeps/aarch64/fpu/math-use-builtins-fmin.h | 4 +++
sysdeps/aarch64/fpu/s_fmax.c | 28 --------------------
sysdeps/aarch64/fpu/s_fmaxf.c | 28 --------------------
sysdeps/aarch64/fpu/s_fmin.c | 28 --------------------
sysdeps/aarch64/fpu/s_fminf.c | 28 --------------------
6 files changed, 8 insertions(+), 112 deletions(-)
create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h
create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h
delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c
delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c
delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c
delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c
diff --git a/sysdeps/aarch64/fpu/math-use-builtins-fmax.h b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h
new file mode 100644
index 0000000000..6b9e7b7692
--- /dev/null
+++ b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h
@@ -0,0 +1,4 @@
+#define USE_FMAX_BUILTIN 1
+#define USE_FMAXF_BUILTIN 10
+#define USE_FMAXL_BUILTIN 0
+#define USE_FMAXF128_BUILTIN 0
diff --git a/sysdeps/aarch64/fpu/math-use-builtins-fmin.h b/sysdeps/aarch64/fpu/math-use-builtins-fmin.h
new file mode 100644
index 0000000000..7fd6b45fce
--- /dev/null
+++ b/sysdeps/aarch64/fpu/math-use-builtins-fmin.h
@@ -0,0 +1,4 @@
+#define USE_FMIN_BUILTIN 1
+#define USE_FMINF_BUILTIN 1
+#define USE_FMINL_BUILTIN 0
+#define USE_FMINF128_BUILTIN 0
diff --git a/sysdeps/aarch64/fpu/s_fmax.c b/sysdeps/aarch64/fpu/s_fmax.c
deleted file mode 100644
index 7e3593fbda..0000000000
--- a/sysdeps/aarch64/fpu/s_fmax.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 2011-2021 Free Software Foundation, Inc.
-
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of the
- License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-#include <libm-alias-double.h>
-
-double
-__fmax (double x, double y)
-{
- return __builtin_fmax (x, y);
-}
-
-libm_alias_double (__fmax, fmax)
diff --git a/sysdeps/aarch64/fpu/s_fmaxf.c b/sysdeps/aarch64/fpu/s_fmaxf.c
deleted file mode 100644
index eb4c469ef8..0000000000
--- a/sysdeps/aarch64/fpu/s_fmaxf.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 2011-2021 Free Software Foundation, Inc.
-
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of the
- License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-#include <libm-alias-float.h>
-
-float
-__fmaxf (float x, float y)
-{
- return __builtin_fmaxf (x, y);
-}
-
-libm_alias_float (__fmax, fmax)
diff --git a/sysdeps/aarch64/fpu/s_fmin.c b/sysdeps/aarch64/fpu/s_fmin.c
deleted file mode 100644
index efdee1ed39..0000000000
--- a/sysdeps/aarch64/fpu/s_fmin.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 1996-2021 Free Software Foundation, Inc.
-
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-#include <libm-alias-double.h>
-
-double
-__fmin (double x, double y)
-{
- return __builtin_fmin (x, y);
-}
-
-libm_alias_double (__fmin, fmin)
diff --git a/sysdeps/aarch64/fpu/s_fminf.c b/sysdeps/aarch64/fpu/s_fminf.c
deleted file mode 100644
index 3665fabb6b..0000000000
--- a/sysdeps/aarch64/fpu/s_fminf.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 2011-2021 Free Software Foundation, Inc.
-
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of the
- License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <math.h>
-#include <libm-alias-float.h>
-
-float
-__fminf (float x, float y)
-{
- return __builtin_fminf (x, y);
-}
-
-libm_alias_float (__fmin, fmin)
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h
2021-12-03 0:01 ` [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
@ 2021-12-06 11:46 ` Wilco Dijkstra
2021-12-06 12:35 ` Adhemerval Zanella
0 siblings, 1 reply; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 11:46 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha, Paul Zimmermann
Hi Adhemerval,
+++ b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h
@@ -0,0 +1,4 @@
+#define USE_FMAX_BUILTIN 1
+#define USE_FMAXF_BUILTIN 10
Typo: 1 rather than 10
Looks good otherwise.
Wilco
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h
2021-12-06 11:46 ` Wilco Dijkstra
@ 2021-12-06 12:35 ` Adhemerval Zanella
0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 12:35 UTC (permalink / raw)
To: Wilco Dijkstra, libc-alpha, Paul Zimmermann
On 06/12/2021 08:46, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
> +++ b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h
> @@ -0,0 +1,4 @@
> +#define USE_FMAX_BUILTIN 1
> +#define USE_FMAXF_BUILTIN 10
>
> Typo: 1 rather than 10
Oops, I have fixed it.
>
> Looks good otherwise.
>
> Wilco
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 11/12] math: Use fmin/fmax on hypot
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
` (9 preceding siblings ...)
2021-12-03 0:01 ` [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
@ 2021-12-03 0:01 ` Adhemerval Zanella
2021-12-06 11:44 ` Wilco Dijkstra
2021-12-03 0:01 ` [PATCH v4 12/12] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella
2021-12-03 8:51 ` [PATCH v4 00/12] Improve hypot Paul Zimmermann
12 siblings, 1 reply; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:01 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It optimizes for architectures that provides fast builtins.
Checked on aarch64-linux-gnu.
---
sysdeps/ieee754/dbl-64/e_hypot.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index f53061badc..ce51784d27 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -38,6 +38,7 @@
#include <math_private.h>
#include <math-underflow.h>
#include <math-narrow-eval.h>
+#include <math-use-builtins.h>
#include <libm-alias-finite.h>
#include "math_config.h"
@@ -95,8 +96,8 @@ __ieee754_hypot (double x, double y)
x = fabs (x);
y = fabs (y);
- double ax = x < y ? y : x;
- double ay = x < y ? x : y;
+ double ax = USE_FMAX_BUILTIN ? fmax (x, y) : (x < y ? y : x);
+ double ay = USE_FMIN_BUILTIN ? fmin (x, y) : (x < y ? x : y);
/* If ax is huge, scale both inputs down. */
if (__glibc_unlikely (ax > LARGE_VAL))
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 11/12] math: Use fmin/fmax on hypot
2021-12-03 0:01 ` [PATCH v4 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
@ 2021-12-06 11:44 ` Wilco Dijkstra
0 siblings, 0 replies; 36+ messages in thread
From: Wilco Dijkstra @ 2021-12-06 11:44 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha, Paul Zimmermann
Hi Adhemerval,
@@ -95,8 +96,8 @@ __ieee754_hypot (double x, double y)
x = fabs (x);
y = fabs (y);
- double ax = x < y ? y : x;
- double ay = x < y ? x : y;
+ double ax = USE_FMAX_BUILTIN ? fmax (x, y) : (x < y ? y : x);
+ double ay = USE_FMIN_BUILTIN ? fmin (x, y) : (x < y ? x : y);
Looks good.
Cheers,
Wilco
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4 12/12] math: Remove the error handling wrapper from hypot and hypotf
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
` (10 preceding siblings ...)
2021-12-03 0:01 ` [PATCH v4 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
@ 2021-12-03 0:01 ` Adhemerval Zanella
2021-12-03 8:51 ` [PATCH v4 00/12] Improve hypot Paul Zimmermann
12 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-03 0:01 UTC (permalink / raw)
To: libc-alpha, Paul Zimmermann, Wilco Dijkstra
The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper with
SVID error handling around the new code. There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
---
math/Versions | 2 ++
math/w_hypot.c | 8 ++++++
math/w_hypot_compat.c | 13 +++++++---
math/w_hypotf.c | 8 ++++++
math/w_hypotf_compat.c | 6 ++---
sysdeps/i386/fpu/e_hypot.c | 14 +++++++++-
sysdeps/ieee754/dbl-64/e_hypot.c | 26 +++++++++++++++----
sysdeps/ieee754/dbl-64/w_hypot.c | 1 +
sysdeps/ieee754/flt-32/e_hypotf.c | 21 +++++++++++----
sysdeps/ieee754/flt-32/w_hypotf.c | 1 +
sysdeps/mach/hurd/i386/libm.abilist | 2 ++
sysdeps/unix/sysv/linux/aarch64/libm.abilist | 2 ++
sysdeps/unix/sysv/linux/alpha/libm.abilist | 2 ++
sysdeps/unix/sysv/linux/arm/be/libm.abilist | 2 ++
sysdeps/unix/sysv/linux/arm/le/libm.abilist | 2 ++
sysdeps/unix/sysv/linux/hppa/libm.abilist | 2 ++
sysdeps/unix/sysv/linux/i386/libm.abilist | 2 ++
.../sysv/linux/m68k/coldfire/libm.abilist | 2 ++
.../unix/sysv/linux/m68k/m680x0/libm.abilist | 2 ++
.../sysv/linux/microblaze/be/libm.abilist | 2 ++
.../sysv/linux/microblaze/le/libm.abilist | 2 ++
.../unix/sysv/linux/mips/mips32/libm.abilist | 2 ++
.../unix/sysv/linux/mips/mips64/libm.abilist | 2 ++
sysdeps/unix/sysv/linux/nios2/libm.abilist | 2 ++
.../linux/powerpc/powerpc32/fpu/libm.abilist | 2 ++
.../powerpc/powerpc32/nofpu/libm.abilist | 2 ++
.../linux/powerpc/powerpc64/be/libm.abilist | 2 ++
.../linux/powerpc/powerpc64/le/libm.abilist | 2 ++
.../unix/sysv/linux/s390/s390-32/libm.abilist | 2 ++
.../unix/sysv/linux/s390/s390-64/libm.abilist | 2 ++
sysdeps/unix/sysv/linux/sh/be/libm.abilist | 2 ++
sysdeps/unix/sysv/linux/sh/le/libm.abilist | 2 ++
.../sysv/linux/sparc/sparc32/libm.abilist | 2 ++
.../sysv/linux/sparc/sparc64/libm.abilist | 2 ++
.../unix/sysv/linux/x86_64/64/libm.abilist | 2 ++
.../unix/sysv/linux/x86_64/x32/libm.abilist | 2 ++
36 files changed, 135 insertions(+), 17 deletions(-)
create mode 100644 math/w_hypot.c
create mode 100644 math/w_hypotf.c
create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c
create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c
diff --git a/math/Versions b/math/Versions
index a4b5405ddc..fef7a08c3b 100644
--- a/math/Versions
+++ b/math/Versions
@@ -628,5 +628,7 @@ libm {
fminimum_numf64x; fminimum_numf128;
fminimum_magf64x; fminimum_magf128;
fminimum_mag_numf64x; fminimum_mag_numf128;
+ # No SVID compatible error handling.
+ hypotf; hypot;
}
}
diff --git a/math/w_hypot.c b/math/w_hypot.c
new file mode 100644
index 0000000000..66f723a896
--- /dev/null
+++ b/math/w_hypot.c
@@ -0,0 +1,8 @@
+#include <math-type-macros-float.h>
+#undef __USE_WRAPPER_TEMPLATE
+#define __USE_WRAPPER_TEMPLATE 1
+#undef declare_mgen_alias
+#define declare_mgen_alias(a, b)
+#include <w_hypot_template.c>
+versioned_symbol (libm, __hypot, hypot, GLIBC_2_35);
+libm_alias_float_other (__hypot, hypot)
diff --git a/math/w_hypot_compat.c b/math/w_hypot_compat.c
index f07039cc51..ec983a4ab8 100644
--- a/math/w_hypot_compat.c
+++ b/math/w_hypot_compat.c
@@ -20,9 +20,9 @@
#include <libm-alias-double.h>
-#if LIBM_SVID_COMPAT
+#if LIBM_SVID_COMPAT && SHLIB_COMPAT (libm, GLIBC_2_0, GLIBC_2_35)
double
-__hypot (double x, double y)
+__hypot_compat (double x, double y)
{
double z = __ieee754_hypot(x,y);
if(__builtin_expect(!isfinite(z), 0)
@@ -31,5 +31,12 @@ __hypot (double x, double y)
return z;
}
-libm_alias_double (__hypot, hypot)
+compat_symbol (libm, __hypot_compat, hypot, GLIBC_2_0);
+# ifdef NO_LONG_DOUBLE
+weak_alias (__hypot_compat, hypotl)
+# endif
+# ifdef LONG_DOUBLE_COMPAT
+LONG_DOUBLE_COMPAT_CHOOSE_libm_hypotl (
+ compat_symbol (libm, __hypot_compat, hypotl, FIRST_VERSION_libm_hypotl), );
+# endif
#endif
diff --git a/math/w_hypotf.c b/math/w_hypotf.c
new file mode 100644
index 0000000000..b15a9b06d0
--- /dev/null
+++ b/math/w_hypotf.c
@@ -0,0 +1,8 @@
+#include <math-type-macros-float.h>
+#undef __USE_WRAPPER_TEMPLATE
+#define __USE_WRAPPER_TEMPLATE 1
+#undef declare_mgen_alias
+#define declare_mgen_alias(a, b)
+#include <w_hypot_template.c>
+versioned_symbol (libm, __hypotf, hypotf, GLIBC_2_35);
+libm_alias_float_other (__hypotf, hypotf)
diff --git a/math/w_hypotf_compat.c b/math/w_hypotf_compat.c
index 05898d3420..2bde4553b0 100644
--- a/math/w_hypotf_compat.c
+++ b/math/w_hypotf_compat.c
@@ -22,9 +22,9 @@
#include <libm-alias-float.h>
-#if LIBM_SVID_COMPAT
+#if LIBM_SVID_COMPAT && SHLIB_COMPAT (libm, GLIBC_2_0, GLIBC_2_35)
float
-__hypotf(float x, float y)
+__hypotf_compat (float x, float y)
{
float z = __ieee754_hypotf(x,y);
if(__builtin_expect(!isfinite(z), 0)
@@ -34,5 +34,5 @@ __hypotf(float x, float y)
return z;
}
-libm_alias_float (__hypot, hypot)
+compat_symbol (libm, __hypotf_compat, hypotf, GLIBC_2_0);
#endif
diff --git a/sysdeps/i386/fpu/e_hypot.c b/sysdeps/i386/fpu/e_hypot.c
index b7c068e734..a0c5734b68 100644
--- a/sysdeps/i386/fpu/e_hypot.c
+++ b/sysdeps/i386/fpu/e_hypot.c
@@ -20,14 +20,17 @@
#include <math_private.h>
#include <math-underflow.h>
#include <math-narrow-eval.h>
+#include <math-svid-compat.h>
#include <libm-alias-finite.h>
+#include <libm-alias-double.h>
#include <math_config.h>
+#include <errno.h>
/* The i386 allows to use the default excess of precision to optimize the
hypot implementation, since internal multiplication and sqrt is carried
with 80-bit FP type. */
double
-__ieee754_hypot (double x, double y)
+__hypot (double x, double y)
{
if (!isfinite (x) || !isfinite (y))
{
@@ -41,6 +44,15 @@ __ieee754_hypot (double x, double y)
long double ly = y;
double r = math_narrow_eval (sqrtl (lx * lx + ly * ly));
math_check_force_underflow_nonneg (r);
+ if (isinf (r))
+ __set_errno (ERANGE);
return r;
}
+strong_alias (__hypot, __ieee754_hypot)
+#if LIBM_SVID_COMPAT
+versioned_symbol (libm, __hypot, hypot, GLIBC_2_35);
libm_alias_finite (__ieee754_hypot, __hypot)
+libm_alias_double_other (__hypot, hypot)
+#else
+libm_alias_double (__hypot, hypot)
+#endif
diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index ce51784d27..54f936c82b 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -34,12 +34,15 @@
[1] https://arxiv.org/pdf/1904.09481.pdf */
+#include <errno.h>
#include <math.h>
#include <math_private.h>
#include <math-underflow.h>
#include <math-narrow-eval.h>
#include <math-use-builtins.h>
+#include <math-svid-compat.h>
#include <libm-alias-finite.h>
+#include <libm-alias-double.h>
#include "math_config.h"
#define SCALE 0x1p-600
@@ -47,6 +50,14 @@
#define TINY_VAL 0x1p-459
#define EPS 0x1p-54
+static inline double
+handle_errno (double r)
+{
+ if (isinf (r))
+ __set_errno (ERANGE);
+ return r;
+}
+
/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0
and squaring ax, ay and (ax - ay) does not overflow or underflow. */
static inline double
@@ -83,7 +94,7 @@ kernel (double ax, double ay)
}
double
-__ieee754_hypot (double x, double y)
+__hypot (double x, double y)
{
if (!isfinite(x) || !isfinite(y))
{
@@ -103,9 +114,9 @@ __ieee754_hypot (double x, double y)
if (__glibc_unlikely (ax > LARGE_VAL))
{
if (__glibc_unlikely (ay <= ax * EPS))
- return ax + ay;
-
- return kernel (ax * SCALE, ay * SCALE) / SCALE;
+ return handle_errno (ax + ay);
+
+ return handle_errno (kernel (ax * SCALE, ay * SCALE) / SCALE);
}
/* If ay is tiny, scale both inputs up. */
@@ -125,6 +136,11 @@ __ieee754_hypot (double x, double y)
return kernel (ax, ay);
}
-#ifndef __ieee754_hypot
+strong_alias (__hypot, __ieee754_hypot)
libm_alias_finite (__ieee754_hypot, __hypot)
+#if LIBM_SVID_COMPAT
+versioned_symbol (libm, __hypot, hypot, GLIBC_2_35);
+libm_alias_double_other (__hypot, hypot)
+#else
+libm_alias_double (__hypot, hypot)
#endif
diff --git a/sysdeps/ieee754/dbl-64/w_hypot.c b/sysdeps/ieee754/dbl-64/w_hypot.c
new file mode 100644
index 0000000000..1cc8931700
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/w_hypot.c
@@ -0,0 +1 @@
+/* Not needed. */
diff --git a/sysdeps/ieee754/flt-32/e_hypotf.c b/sysdeps/ieee754/flt-32/e_hypotf.c
index 1d082fe36c..809c00c11e 100644
--- a/sysdeps/ieee754/flt-32/e_hypotf.c
+++ b/sysdeps/ieee754/flt-32/e_hypotf.c
@@ -16,14 +16,17 @@
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */
+#include <errno.h>
#include <libm-alias-finite.h>
+#include <libm-alias-float.h>
+#include <math-svid-compat.h>
#include <math.h>
#include "math_config.h"
#include <math-narrow-eval.h>
#include <math_private.h>
float
-__ieee754_hypotf (float x, float y)
+__hypotf (float x, float y)
{
if (!isfinite(x) || !isfinite(y))
{
@@ -33,9 +36,17 @@ __ieee754_hypotf (float x, float y)
return x + y;
}
- return math_narrow_eval (sqrt ((double) x * (double) x
- + (double) y * (double) y));
+ float r = math_narrow_eval (sqrt ((double) x * (double) x
+ + (double) y * (double) y));
+ if (!isfinite (r))
+ __set_errno (ERANGE);
+ return r;
}
-#ifndef __ieee754_hypotf
-libm_alias_finite (__ieee754_hypotf, __hypotf)
+strong_alias (__hypotf, __ieee754_hypotf)
+#if LIBM_SVID_COMPAT
+versioned_symbol (libm, __hypotf, hypotf, GLIBC_2_35);
+libm_alias_float_other (__hypot, hypot)
+#else
+libm_alias_float (__hypot, hypot)
#endif
+libm_alias_finite (__ieee754_hypotf, __hypotf)
diff --git a/sysdeps/ieee754/flt-32/w_hypotf.c b/sysdeps/ieee754/flt-32/w_hypotf.c
new file mode 100644
index 0000000000..1cc8931700
--- /dev/null
+++ b/sysdeps/ieee754/flt-32/w_hypotf.c
@@ -0,0 +1 @@
+/* Not needed. */
diff --git a/sysdeps/mach/hurd/i386/libm.abilist b/sysdeps/mach/hurd/i386/libm.abilist
index abf91bd142..8f40ddb150 100644
--- a/sysdeps/mach/hurd/i386/libm.abilist
+++ b/sysdeps/mach/hurd/i386/libm.abilist
@@ -1179,3 +1179,5 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libm.abilist b/sysdeps/unix/sysv/linux/aarch64/libm.abilist
index 1cef7d3db7..c2e3c6453e 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libm.abilist
@@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/alpha/libm.abilist b/sysdeps/unix/sysv/linux/alpha/libm.abilist
index 59d51021fa..4f85b6180f 100644
--- a/sysdeps/unix/sysv/linux/alpha/libm.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libm.abilist
@@ -1201,6 +1201,8 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 __clog10l F
GLIBC_2.4 __finitel F
GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libm.abilist b/sysdeps/unix/sysv/linux/arm/be/libm.abilist
index 44666ad7cd..36190add84 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libm.abilist
@@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 _LIB_VERSION D 0x4
GLIBC_2.4 __clog10 F
GLIBC_2.4 __clog10f F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libm.abilist b/sysdeps/unix/sysv/linux/arm/le/libm.abilist
index 44666ad7cd..36190add84 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libm.abilist
@@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 _LIB_VERSION D 0x4
GLIBC_2.4 __clog10 F
GLIBC_2.4 __clog10f F
diff --git a/sysdeps/unix/sysv/linux/hppa/libm.abilist b/sysdeps/unix/sysv/linux/hppa/libm.abilist
index 35d316a720..b5dd4e851f 100644
--- a/sysdeps/unix/sysv/linux/hppa/libm.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/i386/libm.abilist b/sysdeps/unix/sysv/linux/i386/libm.abilist
index ef99b3e104..5d89aaa08e 100644
--- a/sysdeps/unix/sysv/linux/i386/libm.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libm.abilist
@@ -1186,3 +1186,5 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist
index 44666ad7cd..36190add84 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist
@@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 _LIB_VERSION D 0x4
GLIBC_2.4 __clog10 F
GLIBC_2.4 __clog10f F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist
index 58316c96ae..e7cd739a54 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist
@@ -882,3 +882,5 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist
index b5e5da0272..274ecff630 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist
@@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist
index b5e5da0272..274ecff630 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist
@@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist b/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist
index 4113d3170d..08b902118d 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist b/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist
index 18fe9cc57a..09bb3bd75b 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist
@@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/nios2/libm.abilist b/sysdeps/unix/sysv/linux/nios2/libm.abilist
index 3a2b34ecc2..11abbb5668 100644
--- a/sysdeps/unix/sysv/linux/nios2/libm.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libm.abilist
@@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist
index 740cc8f55b..1688809c36 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist
@@ -888,6 +888,8 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 __clog10l F
GLIBC_2.4 __finitel F
GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist
index 16fb30566b..e880cebd78 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist
@@ -887,6 +887,8 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 __clog10l F
GLIBC_2.4 __finitel F
GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist
index ad4b98c09a..033385dfc1 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist
@@ -881,6 +881,8 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 __clog10l F
GLIBC_2.4 __finitel F
GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist
index 955765051c..7923d428bc 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist
@@ -1316,3 +1316,5 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist
index 1f5bd7754d..9a84163089 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist
@@ -1145,6 +1145,8 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 __clog10l F
GLIBC_2.4 __finitel F
GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist
index 0b18481f39..174bde4fa0 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist
@@ -1145,6 +1145,8 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 __clog10l F
GLIBC_2.4 __finitel F
GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libm.abilist b/sysdeps/unix/sysv/linux/sh/be/libm.abilist
index f525a9e77e..1e1324d667 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libm.abilist b/sysdeps/unix/sysv/linux/sh/le/libm.abilist
index f525a9e77e..1e1324d667 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libm.abilist
@@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 exp2l F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist
index 727d1ce707..217e6eff7f 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist
@@ -1152,6 +1152,8 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
GLIBC_2.4 __clog10l F
GLIBC_2.4 __finitel F
GLIBC_2.4 __fpclassifyl F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist
index 0f57574523..6b53b0c59f 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist
@@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
index 574789e061..dbefbc3a1a 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
@@ -1177,3 +1177,5 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist
index 1dc89b304d..8001d0f219 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist
@@ -1177,3 +1177,5 @@ GLIBC_2.35 fminimumf64x F
GLIBC_2.35 fminimuml F
GLIBC_2.35 fsqrt F
GLIBC_2.35 fsqrtl F
+GLIBC_2.35 hypot F
+GLIBC_2.35 hypotf F
--
2.32.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 00/12] Improve hypot
2021-12-03 0:00 [PATCH v4 00/12] Improve hypot Adhemerval Zanella
` (11 preceding siblings ...)
2021-12-03 0:01 ` [PATCH v4 12/12] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella
@ 2021-12-03 8:51 ` Paul Zimmermann
2021-12-06 12:36 ` Adhemerval Zanella
12 siblings, 1 reply; 36+ messages in thread
From: Paul Zimmermann @ 2021-12-03 8:51 UTC (permalink / raw)
To: Adhemerval Zanella; +Cc: libc-alpha, Wilco.Dijkstra
Dear Adhemerval,
a few typos in your commit message:
> when fast FMA is avaliable
should be "available"
> Use a newer algotihm
should be "algorithm"
interger -> integer
> - An improvement from 233442 to 1274 results with 1 ulp of
> error for Binary96 (x86_64).
> - An improvement from 453045 to 1294 results with 1 ulp of
> error for Binary96 (x86_64).
should the latter be binary128?
> If FMA is uses
should be "used"
adapated -> adapted
> If FMA is uses the binary64 shows a slight worse precision:
I confirm, I find:
binary64 0.948812 -0x0.5a22c27a3893p-1022,0x0.9cfea180c00dap-1022
Best regards,
Paul
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 00/12] Improve hypot
2021-12-03 8:51 ` [PATCH v4 00/12] Improve hypot Paul Zimmermann
@ 2021-12-06 12:36 ` Adhemerval Zanella
0 siblings, 0 replies; 36+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 12:36 UTC (permalink / raw)
To: Paul Zimmermann; +Cc: libc-alpha, Wilco.Dijkstra
On 03/12/2021 05:51, Paul Zimmermann wrote:
> Dear Adhemerval,
>
> a few typos in your commit message:
>
>> when fast FMA is avaliable
> should be "available"
>
>> Use a newer algotihm
> should be "algorithm"
>
> interger -> integer
>
>> - An improvement from 233442 to 1274 results with 1 ulp of
>> error for Binary96 (x86_64).
>> - An improvement from 453045 to 1294 results with 1 ulp of
>> error for Binary96 (x86_64).
>
> should the latter be binary128?
Yeah, indeed.
>
>> If FMA is uses
> should be "used"
>
> adapated -> adapted
>
>> If FMA is uses the binary64 shows a slight worse precision:
>
> I confirm, I find:
>
> binary64 0.948812 -0x0.5a22c27a3893p-1022,0x0.9cfea180c00dap-1022
>
> Best regards,
> Paul
>
Thanks for checking on it.
^ permalink raw reply [flat|nested] 36+ messages in thread