public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v2 4/9] math: Use an improved algorithm for hypot (dbl-64)
@ 2021-10-28 15:19 Wilco Dijkstra
  2021-10-28 17:19 ` Joseph Myers
  0 siblings, 1 reply; 5+ messages in thread
From: Wilco Dijkstra @ 2021-10-28 15:19 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: 'GNU C Library'

Hi Adhemerval,

On AArch64 the new version is ~25% slower even after removing wrappers.
Using the faster checks for special cases helps here too of course. However
there is another issue: issignaling is not inlined, forcing a frame to be created
plus several callee-saves. This fixes it:

  if (!isfinite(x) || !isfinite(y))
  {
    if ((isinf (x) || isinf (y))
        && !issignaling_inline (x) && !issignaling_inline (y))
      return INFINITY;
    return x + y;
  }

This gave a 20% speedup, so it is now only a little slower than before.
We could rename the inline function to __issignaling I suppose, and then
it should be used as long as you include math_config.h.

+  h = math_narrow_eval (h * scale);
+  math_check_force_underflow_nonneg (h);
+  return h;

I don't think you need math_check_force_underflow_nonneg at all given
h * scale should already set the underflow flag correctly. Removing this
gives performance within ~2% of the original.

Wilco

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 4/9] math: Use an improved algorithm for hypot (dbl-64)
  2021-10-28 15:19 [PATCH v2 4/9] math: Use an improved algorithm for hypot (dbl-64) Wilco Dijkstra
@ 2021-10-28 17:19 ` Joseph Myers
  2021-10-29 12:38   ` Wilco Dijkstra
  0 siblings, 1 reply; 5+ messages in thread
From: Joseph Myers @ 2021-10-28 17:19 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Adhemerval Zanella, 'GNU C Library'

On Thu, 28 Oct 2021, Wilco Dijkstra via Libc-alpha wrote:

> +  h = math_narrow_eval (h * scale);
> +  math_check_force_underflow_nonneg (h);
> +  return h;
> 
> I don't think you need math_check_force_underflow_nonneg at all given
> h * scale should already set the underflow flag correctly. Removing this
> gives performance within ~2% of the original.

Scaling down doesn't set the underflow flag if the scaling down was exact 
(even though the actual hypot operation is inexact and so should raise 
underflow).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 4/9] math: Use an improved algorithm for hypot (dbl-64)
  2021-10-28 17:19 ` Joseph Myers
@ 2021-10-29 12:38   ` Wilco Dijkstra
  2021-10-29 16:20     ` Joseph Myers
  0 siblings, 1 reply; 5+ messages in thread
From: Wilco Dijkstra @ 2021-10-29 12:38 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Adhemerval Zanella, 'GNU C Library'

Hi Joseph,

> Scaling down doesn't set the underflow flag if the scaling down was exact 
> (even though the actual hypot operation is inexact and so should raise 
> underflow).

That's true, but doesn't that apply to the float version too? The convert from
double to float might also be precise (and denormal) even if earlier calculations
were inexact.

Either way, any special case checks that are required should only be done if scale
is not equal to 1.0, like:

  h = math_narrow_eval (h * scale);
  if (fastpath)
    return h;
  math_check_force_underflow_nonneg (h);
  return handle_errno (h);

Wilco

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 4/9] math: Use an improved algorithm for hypot (dbl-64)
  2021-10-29 12:38   ` Wilco Dijkstra
@ 2021-10-29 16:20     ` Joseph Myers
  0 siblings, 0 replies; 5+ messages in thread
From: Joseph Myers @ 2021-10-29 16:20 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GNU C Library'

On Fri, 29 Oct 2021, Wilco Dijkstra via Libc-alpha wrote:

> Hi Joseph,
> 
> > Scaling down doesn't set the underflow flag if the scaling down was exact 
> > (even though the actual hypot operation is inexact and so should raise 
> > underflow).
> 
> That's true, but doesn't that apply to the float version too? The convert from
> double to float might also be precise (and denormal) even if earlier calculations
> were inexact.

The only case where hypotf can have a mathematical result that is tiny and 
inexact is when both arguments are in the subnormal range for float and 
the implementation in terms of double then involves computing the (double) 
square root of a number M * 2^-298, for some (exact) M in the range [2, 
2^46), and then converting that (double) square root to float.

The square root of an integer in the range [2, 2^46) cannot be close 
enough to an integer for the result of rounding it in either direction to 
53 bits (so at least 30 fractional bits) to be an integer, so it is not 
possible for the conversion from double to float to be exact when the 
mathematical result of hypotf is tiny and inexact.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 4/9] math: Use an improved algorithm for hypot (dbl-64)
  2021-10-25 11:57 [PATCH v2 0/9] Improve hypot() Adhemerval Zanella
@ 2021-10-25 11:57 ` Adhemerval Zanella
  0 siblings, 0 replies; 5+ messages in thread
From: Adhemerval Zanella @ 2021-10-25 11:57 UTC (permalink / raw)
  To: libc-alpha

This implementation is based on the 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the
following changes:

 - Handle qNaN and sNaN.
 - Tune the 'widely varying operands' to avoid spurious underflow
   due the multiplication and fix the return value for upwards
   rounding mode.
 - Handle required underflow exception for denormal results.

The main advantage of the new algorithm is its precision: with a
random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc
current implementation shows around 0.34% results with an error of
1 ulp (3424869 results) while the new implementation only shows
0.002% of total (18851).

The performance result are also only slight worse than current
implementation.  On x86_64 (Ryzen 5900X) with gcc 10.3.1:

Before:

  "hypot": {
   "workload-random": {
    "duration": 3.73205e+09,
    "iterations": 1.1e+08,
    "reciprocal-throughput": 23.3032,
    "latency": 44.5523,
    "max-throughput": 4.29127e+07,
    "min-throughput": 2.24455e+07
   }
  }

After:

  "hypot": {
   "workload-random": {
    "duration": 3.74903e+09,
    "iterations": 1.04e+08,
    "reciprocal-throughput": 22.3537,
    "latency": 49.743,
    "max-throughput": 4.47353e+07,
    "min-throughput": 2.01033e+07
   }
  }

Co-Authored-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

[1] https://arxiv.org/pdf/1904.09481.pdf
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 225 ++++++++++++-------------------
 1 file changed, 86 insertions(+), 139 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 9ec4c1ced0..231fb0d70f 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -1,164 +1,111 @@
-/* @(#)e_hypot.c 5.1 93/09/24 */
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
+/* Euclidean distance function.  Double/Binary64 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
 
-/* __ieee754_hypot(x,y)
- *
- * Method :
- *	If (assume round-to-nearest) z=x*x+y*y
- *	has error less than sqrt(2)/2 ulp, than
- *	sqrt(z) has error less than 1 ulp (exercise).
- *
- *	So, compute sqrt(x*x+y*y) with some care as
- *	follows to get the error below 1 ulp:
- *
- *	Assume x>y>0;
- *	(if possible, set rounding to round-to-nearest)
- *	1. if x > 2y  use
- *		x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- *	where x1 = x with lower 32 bits cleared, x2 = x-x1; else
- *	2. if x <= 2y use
- *		t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- *	where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
- *	y1= y with lower 32 bits chopped, y2 = y-y1.
- *
- *	NOTE: scaling may be necessary if some argument is too
- *	      large or too tiny
- *
- * Special cases:
- *	hypot(x,y) is INF if x or y is +INF or -INF; else
- *	hypot(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- *	hypot(x,y) returns sqrt(x^2+y^2) with error less
- *	than 1 ulps (units in the last place)
- */
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This implementation is based on 'An Improved Algorithm for hypot(a,b)' by
+   Carlos F. Borges [1] using the MyHypot3 with the following changes:
+
+   - Handle qNaN and sNaN.
+   - Tune the 'widely varying operands' to avoid spurious underflow
+     due the multiplication and fix the return value for upwards
+     rounding mode.
+   - Handle required underflow exception for subnormal results.
+
+   [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
 #include <math.h>
 #include <math_private.h>
 #include <math-underflow.h>
+#include <math-narrow-eval.h>
 #include <libm-alias-finite.h>
+#include <math_config.h>
+
+/* sqrt (DBL_EPSILON / 2.0)  */
+#define SQRT_EPS_DIV_2     0x1.6a09e667f3bcdp-27
+/* DBL_MIN / (sqrt (DBL_EPSILON / 2.0))   */
+#define DBL_MIN_THRESHOLD  0x1.6a09e667f3bcdp-996
+/* eps (double) * sqrt (DBL_MIN))  */
+#define SCALE              0x1p-563
+/* 1 / eps (sqrt (DBL_MIN)  */
+#define INV_SCALE          0x1p+563
+/* sqrt (DBL_MAX)  */
+#define SQRT_DBL_MAX       0x1.6a09e667f3bccp+511
+/* sqrt (DBL_MIN)  */
+#define SQRT_DBL_MIN       0x1p-511
 
 double
 __ieee754_hypot (double x, double y)
 {
-  double a, b, t1, t2, y1, y2, w;
-  int32_t j, k, ha, hb;
+  if ((isinf (x) || isinf (y))
+      && !issignaling (x) && !issignaling (y))
+    return INFINITY;
+  if (isnan (x) || isnan (y))
+    return x + y;
 
-  GET_HIGH_WORD (ha, x);
-  ha &= 0x7fffffff;
-  GET_HIGH_WORD (hb, y);
-  hb &= 0x7fffffff;
-  if (hb > ha)
-    {
-      a = y; b = x; j = ha; ha = hb; hb = j;
-    }
-  else
+  double ax = fabs (x);
+  double ay = fabs (y);
+  if (ay > ax)
     {
-      a = x; b = y;
+      double tmp = ax;
+      ax = ay;
+      ay = tmp;
     }
-  SET_HIGH_WORD (a, ha);        /* a <- |a| */
-  SET_HIGH_WORD (b, hb);        /* b <- |b| */
-  if ((ha - hb) > 0x3c00000)
-    {
-      return a + b;
-    }                                       /* x/y > 2**60 */
-  k = 0;
-  if (__glibc_unlikely (ha > 0x5f300000))                  /* a>2**500 */
-    {
-      if (ha >= 0x7ff00000)             /* Inf or NaN */
-	{
-	  uint32_t low;
-	  w = a + b;                    /* for sNaN */
-	  if (issignaling (a) || issignaling (b))
-	    return w;
-	  GET_LOW_WORD (low, a);
-	  if (((ha & 0xfffff) | low) == 0)
-	    w = a;
-	  GET_LOW_WORD (low, b);
-	  if (((hb ^ 0x7ff00000) | low) == 0)
-	    w = b;
-	  return w;
-	}
-      /* scale a and b by 2**-600 */
-      ha -= 0x25800000; hb -= 0x25800000;  k += 600;
-      SET_HIGH_WORD (a, ha);
-      SET_HIGH_WORD (b, hb);
-    }
-  if (__builtin_expect (hb < 0x23d00000, 0))            /* b < 2**-450 */
+
+  /* Widely varying operands.  The DBL_MIN_THRESHOLD check is used to avoid
+     a spurious underflow from the multiplication.  */
+  if (ax >= DBL_MIN_THRESHOLD && ay <= ax * SQRT_EPS_DIV_2)
+    return (ay == 0.0) ? ax : math_narrow_eval (ax + DBL_TRUE_MIN);
+
+  double scale = SCALE;
+  if (ax > SQRT_DBL_MAX)
     {
-      if (hb <= 0x000fffff)             /* subnormal b or 0 */
-	{
-	  uint32_t low;
-	  GET_LOW_WORD (low, b);
-	  if ((hb | low) == 0)
-	    return a;
-	  t1 = 0;
-	  SET_HIGH_WORD (t1, 0x7fd00000);       /* t1=2^1022 */
-	  b *= t1;
-	  a *= t1;
-	  k -= 1022;
-	  GET_HIGH_WORD (ha, a);
-	  GET_HIGH_WORD (hb, b);
-	  if (hb > ha)
-	    {
-	      t1 = a;
-	      a = b;
-	      b = t1;
-	      j = ha;
-	      ha = hb;
-	      hb = j;
-	    }
-	}
-      else                      /* scale a and b by 2^600 */
-	{
-	  ha += 0x25800000;             /* a *= 2^600 */
-	  hb += 0x25800000;             /* b *= 2^600 */
-	  k -= 600;
-	  SET_HIGH_WORD (a, ha);
-	  SET_HIGH_WORD (b, hb);
-	}
+      ax *= scale;
+      ay *= scale;
+      scale = INV_SCALE;
     }
-  /* medium size a and b */
-  w = a - b;
-  if (w > b)
+  else if (ay < SQRT_DBL_MIN)
     {
-      t1 = 0;
-      SET_HIGH_WORD (t1, ha);
-      t2 = a - t1;
-      w = sqrt (t1 * t1 - (b * (-b) - t2 * (a + t1)));
+      ax /= scale;
+      ay /= scale;
     }
   else
+    scale = 1.0;
+
+  double h = sqrt (ax * ax + ay * ay);
+
+  double t1, t2;
+  if (h == 0.0)
+    return h;
+  else if (h <= 2.0 * ay)
     {
-      a = a + a;
-      y1 = 0;
-      SET_HIGH_WORD (y1, hb);
-      y2 = b - y1;
-      t1 = 0;
-      SET_HIGH_WORD (t1, ha + 0x00100000);
-      t2 = a - t1;
-      w = sqrt (t1 * y1 - (w * (-w) - (t1 * y2 + t2 * b)));
+      double delta = h - ay;
+      t1 = ax * (2.0 * delta - ax);
+      t2 = (delta - 2.0 * (ax - ay)) * delta;
     }
-  if (k != 0)
+  else
     {
-      uint32_t high;
-      t1 = 1.0;
-      GET_HIGH_WORD (high, t1);
-      SET_HIGH_WORD (t1, high + (k << 20));
-      w *= t1;
-      math_check_force_underflow_nonneg (w);
-      return w;
+      double delta = h - ax;
+      t1 = 2.0 * delta * (ax - 2 * ay);
+      t2 = (4.0 * delta - ay) * ay + delta * delta;
     }
-  else
-    return w;
+  h -= (t1 + t2) / (2.0 * h);
+  h = math_narrow_eval (h * scale);
+  math_check_force_underflow_nonneg (h);
+  return h;
 }
 #ifndef __ieee754_hypot
 libm_alias_finite (__ieee754_hypot, __hypot)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-29 16:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-28 15:19 [PATCH v2 4/9] math: Use an improved algorithm for hypot (dbl-64) Wilco Dijkstra
2021-10-28 17:19 ` Joseph Myers
2021-10-29 12:38   ` Wilco Dijkstra
2021-10-29 16:20     ` Joseph Myers
  -- strict thread matches above, loose matches on Subject: below --
2021-10-25 11:57 [PATCH v2 0/9] Improve hypot() Adhemerval Zanella
2021-10-25 11:57 ` [PATCH v2 4/9] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).