public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 1/2] ieee754: Remove slow paths from asin and acos
@ 2020-01-27 10:45 Anssi Hannula
  2020-01-27 11:25 ` [PATCH 2/2] ieee754: Remove unused __sin32 and __cos32 Anssi Hannula
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Anssi Hannula @ 2020-01-27 10:45 UTC (permalink / raw)
  To: libc-alpha

asin and acos have slow paths for rounding the last bit that cause some
calls to be 500-1500x slower than average calls.

These slow paths are rare, a test of a trillion (1.000.000.000.000)
random inputs between -1 and 1 showed 32870 slow calls for acos and 4473
for asin, with most occurrences between -1.0 .. -0.9 and 0.9 .. 1.0.

The slow paths claim correct rounding and use __sin32() and __cos32()
(which compare two result candidates and return the closest one) as the
final step, with the second result candidate (res1) having a small offset
applied from res. This suggests that res and res1 are intended to be 1
ULP apart (which makes sense for rounding), barring bugs, allowing us to
pick either one and still remain within 1 ULP of the exact result.

Remove the slow paths as the accuracy is better than 1 ULP even without
them, which is enough for glibc.

Also remove code comments claiming correctly rounded results.

After slow path removal, checking the accuracy of 14.400.000.000 random
asin() and acos() inputs showed only three incorrectly rounded
(error > 0.5 ULP) results:
- asin(-0x1.ee2b43286db75p-1) (0.500002 ULP, same as before)
- asin(-0x1.f692ba202abcp-4)  (0.500003 ULP, same as before)
- asin(-0x1.9915e876fc062p-1) (0.50000000001 ULP, previously exact)
The first two had the same error even before this commit, and they did
not use the slow path at all.

Checking 4934 known randomly found previously-slow-path asin inputs
shows 25 calls with incorrectly rounded results, with a maximum error of
0.500000002 ULP (for 0x1.fcd5742999ab8p-1). The previous slow-path code
rounded all these inputs correctly (error < 0.5 ULP).
The observed average speed increase was 130x.

Checking 36240 known randomly found previously-slow-path acos inputs
shows 42 calls with incorrectly rounded results, with a maximum error of
0.500000008 ULP (for 0x1.f63845056f35ep-1). The previous "exact"
slow-path code showed 34 calls with incorrectly rounded results, with the
same maximum error of 0.500000008 ULP (for 0x1.f63845056f35ep-1).
The observed average speed increase was 130x.

The functions could likely be trimmed more while keeping acceptable
accuracy, but this at least gets rid of the egregiously slow cases.

Tested on x86_64.
---
 sysdeps/ieee754/dbl-64/e_asin.c | 76 +++++++--------------------------
 1 file changed, 15 insertions(+), 61 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/e_asin.c b/sysdeps/ieee754/dbl-64/e_asin.c
index eac3d27fda..8a3b26f664 100644
--- a/sysdeps/ieee754/dbl-64/e_asin.c
+++ b/sysdeps/ieee754/dbl-64/e_asin.c
@@ -25,12 +25,6 @@
 /*               doasin.c sincos32.c dosincos.c mpa.c             */
 /*               sincos.tbl  asincos.tbl  powtwo.tbl root.tbl     */
 /*                                                                */
-/* Ultimate asin/acos routines. Given an IEEE double machine      */
-/* number x, compute the correctly rounded value of               */
-/* arcsin(x)or arccos(x)  according to the function called.       */
-/* Assumption: Machine arithmetic operations are performed in     */
-/* round to nearest mode of IEEE 754 standard.                    */
-/*                                                                */
 /******************************************************************/
 #include "endian.h"
 #include "mydefs.h"
@@ -53,13 +47,7 @@ void __doasin(double x, double dx, double w[]);
 void __dubsin(double x, double dx, double v[]);
 void __dubcos(double x, double dx, double v[]);
 void __docos(double x, double dx, double v[]);
-double __sin32(double x, double res, double res1);
-double __cos32(double x, double res, double res1);
 
-/***************************************************************************/
-/* An ultimate asin routine. Given an IEEE double machine number x         */
-/* it computes the correctly rounded (to nearest) value of arcsin(x)       */
-/***************************************************************************/
 double
 SECTION
 __ieee754_asin(double x){
@@ -100,13 +88,7 @@ __ieee754_asin(double x){
       if (res == res+1.00014*cor) return res;
       else {
 	__doasin(x,0,w);
-	if (w[0]==(w[0]+1.00000001*w[1])) return w[0];
-	else {
-	  y=fabs(x);
-	  res=fabs(w[0]);
-	  res1=fabs(w[0]+1.1*w[1]);
-	  return (m>0)?__sin32(y,res,res1):-__sin32(y,res,res1);
-	}
+	return w[0];
       }
     }
   }
@@ -137,8 +119,7 @@ __ieee754_asin(double x){
 	if (z>1.0e-27) return (m>0)?min(res,res1):-min(res,res1);
 	else if (z<-1.0e-27) return (m>0)?max(res,res1):-max(res,res1);
 	else {
-	  y=fabs(x);
-	  return (m>0)?__sin32(y,res,res1):-__sin32(y,res,res1);
+	  return (m>0)?res:-res;
 	}
       }
     }
@@ -170,8 +151,7 @@ __ieee754_asin(double x){
 	if (z>1.0e-27) return (m>0)?min(res,res1):-min(res,res1);
 	else if (z<-1.0e-27) return (m>0)?max(res,res1):-max(res,res1);
 	else {
-	  y=fabs(x);
-	  return (m>0)?__sin32(y,res,res1):-__sin32(y,res,res1);
+	  return (m>0)?res:-res;
 	}
       }
     }
@@ -205,8 +185,7 @@ __ieee754_asin(double x){
 	if (z>1.0e-27) return (m>0)?min(res,res1):-min(res,res1);
 	else if (z<-1.0e-27) return (m>0)?max(res,res1):-max(res,res1);
 	else {
-	  y=fabs(x);
-	  return (m>0)?__sin32(y,res,res1):-__sin32(y,res,res1);
+	  return (m>0)?res:-res;
 	}
       }
     }
@@ -243,8 +222,7 @@ __ieee754_asin(double x){
 	if (z>1.0e-27) return (m>0)?min(res,res1):-min(res,res1);
 	else if (z<-1.0e-27) return (m>0)?max(res,res1):-max(res,res1);
 	else {
-	  y=fabs(x);
-	  return (m>0)?__sin32(y,res,res1):-__sin32(y,res,res1);
+	  return (m>0)?res:-res;
 	}
       }
     }
@@ -282,8 +260,7 @@ __ieee754_asin(double x){
 	if (z>1.0e-27) return (m>0)?min(res,res1):-min(res,res1);
 	else if (z<-1.0e-27) return (m>0)?max(res,res1):-max(res,res1);
 	else {
-	  y=fabs(x);
-	  return (m>0)?__sin32(y,res,res1):-__sin32(y,res,res1);
+	  return (m>0)?res:-res;
 	}
       }
     }
@@ -313,13 +290,7 @@ __ieee754_asin(double x){
       res1=hp0.x-2.0*w[0];
       cor=((hp0.x-res1)-2.0*w[0])+(hp1.x-2.0*w[1]);
       res = res1+cor;
-      cor = (res1-res)+cor;
-      if (res==(res+1.0000001*cor)) return (m>0)?res:-res;
-      else {
-	y=fabs(x);
-	res1=res+1.1*cor;
-	return (m>0)?__sin32(y,res,res1):-__sin32(y,res,res1);
-      }
+      return (m>0)?res:-res;
     }
   }    /*   else  if (k < 0x3ff00000)    */
   /*---------------------------- |x|>=1 -------------------------------*/
@@ -388,12 +359,7 @@ __ieee754_acos(double x)
 	r=hp0.x-w[0];
 	cor=((hp0.x-r)-w[0])+(hp1.x-w[1]);
 	res=r+cor;
-	cor=(r-res)+cor;
-	if (res ==(res +1.00000001*cor)) return res;
-	else {
-	  res1=res+1.1*cor;
-	  return __cos32(x,res,res1);
-	}
+	return res;
       }
     }
   }    /*   else  if (k < 0x3fc00000)    */
@@ -429,7 +395,7 @@ __ieee754_acos(double x)
 	z=(w[0]-x)+w[1];
 	if (z>1.0e-27) return max(res,res1);
 	else if (z<-1.0e-27) return min(res,res1);
-	else return __cos32(x,res,res1);
+	else return res;
       }
     }
   }    /*   else  if (k < 0x3fe00000)    */
@@ -464,7 +430,7 @@ __ieee754_acos(double x)
        z=(w[0]-x)+w[1];
        if (z>1.0e-27) return max(res,res1);
        else if (z<-1.0e-27) return min(res,res1);
-       else return __cos32(x,res,res1);
+       else return res;
      }
    }
   }    /*   else  if (k < 0x3fe80000)    */
@@ -499,7 +465,7 @@ __ieee754_acos(double x)
 	z=(w[0]-x)+w[1];
 	if (z>1.0e-27) return max(res,res1);
 	else if (z<-1.0e-27) return min(res,res1);
-	else return __cos32(x,res,res1);
+	else return res;
       }
     }
   }    /*   else  if (k < 0x3fed8000)    */
@@ -535,7 +501,7 @@ __ieee754_acos(double x)
 	z=(w[0]-x)+w[1];
 	if (z>1.0e-27) return max(res,res1);
 	else if (z<-1.0e-27) return min(res,res1);
-	else return __cos32(x,res,res1);
+	else return res;
       }
     }
   }    /*   else  if (k < 0x3fee8000)    */
@@ -571,7 +537,7 @@ __ieee754_acos(double x)
        z=(w[0]-x)+w[1];
        if (z>1.0e-27) return max(res,res1);
        else if (z<-1.0e-27) return min(res,res1);
-       else return __cos32(x,res,res1);
+       else return res;
      }
    }
   }    /*   else  if (k < 0x3fef0000)    */
@@ -602,13 +568,7 @@ __ieee754_acos(double x)
 	res1=hp0.x-w[0];
 	cor=((hp0.x-res1)-w[0])+(hp1.x-w[1]);
 	res = res1+cor;
-	cor = (res1-res)+cor;
-	if (res==(res+1.000001*cor)) return (res+res);
-	else {
-	  res=res+res;
-	  res1=res+1.2*cor;
-	  return __cos32(x,res,res1);
-	}
+	return (res+res);
       }
     }
     else {
@@ -620,13 +580,7 @@ __ieee754_acos(double x)
 	cc=(y-c)+cc;
 	__doasin(c,cc,w);
 	res = w[0];
-	cor=w[1];
-	if (res==(res+1.000001*cor)) return (res+res);
-	else {
-	  res=res+res;
-	  res1=res+1.2*cor;
-	  return __cos32(x,res,res1);
-	}
+	return (res+res);
       }
     }
   }    /*   else  if (k < 0x3ff00000)    */
-- 
2.21.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] ieee754: Remove unused __sin32 and __cos32
  2020-01-27 10:45 [PATCH 1/2] ieee754: Remove slow paths from asin and acos Anssi Hannula
@ 2020-01-27 11:25 ` Anssi Hannula
  2020-01-27 13:12 ` [PATCH 1/2] ieee754: Remove slow paths from asin and acos Szabolcs Nagy
  2020-12-18  7:09 ` [COMMITTED] " Siddhesh Poyarekar
  2 siblings, 0 replies; 8+ messages in thread
From: Anssi Hannula @ 2020-01-27 11:25 UTC (permalink / raw)
  To: libc-alpha

The __sin32 and __cos32 functions were only used in the now removed slow
path of asin and acos.
---
 manual/probes.texi                           | 14 -----
 sysdeps/generic/math_private.h               |  2 -
 sysdeps/ieee754/dbl-64/sincos32.c            | 62 --------------------
 sysdeps/x86_64/fpu/multiarch/e_asin-fma.c    |  2 -
 sysdeps/x86_64/fpu/multiarch/e_asin-fma4.c   |  2 -
 sysdeps/x86_64/fpu/multiarch/sincos32-fma.c  |  2 -
 sysdeps/x86_64/fpu/multiarch/sincos32-fma4.c |  2 -
 7 files changed, 86 deletions(-)

diff --git a/manual/probes.texi b/manual/probes.texi
index 0ea560ed78..61254bc430 100644
--- a/manual/probes.texi
+++ b/manual/probes.texi
@@ -311,20 +311,6 @@ input that results in multiple precision computation with precision
 is the computed result.
 @end deftp
 
-@deftp Probe slowasin (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{asin} function is called with
-an input that results in multiple precision computation with precision
-32.  Argument @var{$arg1} is the input to the function and @var{$arg2}
-is the computed result.
-@end deftp
-
-@deftp Probe slowacos (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{acos} function is called with
-an input that results in multiple precision computation with precision
-32.  Argument @var{$arg1} is the input to the function and @var{$arg2}
-is the computed result.
-@end deftp
-
 @deftp Probe slowsin (double @var{$arg1}, double @var{$arg2})
 This probe is triggered when the @code{sin} function is called with an
 input that results in multiple precision computation with precision
diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index 9296324d24..10a6fde28d 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -202,8 +202,6 @@ extern int __branred (double __x, double *__a, double *__aa);
 extern void __doasin (double __x, double __dx, double __v[]);
 extern void __dubsin (double __x, double __dx, double __v[]);
 extern void __dubcos (double __x, double __dx, double __v[]);
-extern double __sin32 (double __x, double __res, double __res1);
-extern double __cos32 (double __x, double __res, double __res1);
 extern double __mpsin (double __x, double __dx, bool __range_reduce);
 extern double __mpcos (double __x, double __dx, bool __range_reduce);
 extern void __docos (double __x, double __dx, double __v[]);
diff --git a/sysdeps/ieee754/dbl-64/sincos32.c b/sysdeps/ieee754/dbl-64/sincos32.c
index c00e5d1a0f..a28932dffe 100644
--- a/sysdeps/ieee754/dbl-64/sincos32.c
+++ b/sysdeps/ieee754/dbl-64/sincos32.c
@@ -127,68 +127,6 @@ __c32 (mp_no *x, mp_no *y, mp_no *z, int p)
   __cpy (&s, z, p);
 }
 
-/* Receive double x and two double results of sin(x) and return result which is
-   more accurate, computing sin(x) with multi precision routine c32.  */
-double
-SECTION
-__sin32 (double x, double res, double res1)
-{
-  int p;
-  mp_no a, b, c;
-  p = 32;
-  __dbl_mp (res, &a, p);
-  __dbl_mp (0.5 * (res1 - res), &b, p);
-  __add (&a, &b, &c, p);
-  if (x > 0.8)
-    {
-      __sub (&hp, &c, &a, p);
-      __c32 (&a, &b, &c, p);
-    }
-  else
-    __c32 (&c, &a, &b, p);	/* b=sin(0.5*(res+res1))  */
-  __dbl_mp (x, &c, p);		/* c = x  */
-  __sub (&b, &c, &a, p);
-  /* if a > 0 return min (res, res1), otherwise return max (res, res1).  */
-  if ((a.d[0] > 0 && res >= res1) || (a.d[0] <= 0 && res <= res1))
-    res = res1;
-  LIBC_PROBE (slowasin, 2, &res, &x);
-  return res;
-}
-
-/* Receive double x and two double results of cos(x) and return result which is
-   more accurate, computing cos(x) with multi precision routine c32.  */
-double
-SECTION
-__cos32 (double x, double res, double res1)
-{
-  int p;
-  mp_no a, b, c;
-  p = 32;
-  __dbl_mp (res, &a, p);
-  __dbl_mp (0.5 * (res1 - res), &b, p);
-  __add (&a, &b, &c, p);
-  if (x > 2.4)
-    {
-      __sub (&pi, &c, &a, p);
-      __c32 (&a, &b, &c, p);
-      b.d[0] = -b.d[0];
-    }
-  else if (x > 0.8)
-    {
-      __sub (&hp, &c, &a, p);
-      __c32 (&a, &c, &b, p);
-    }
-  else
-    __c32 (&c, &b, &a, p);	/* b=cos(0.5*(res+res1))  */
-  __dbl_mp (x, &c, p);		/* c = x                  */
-  __sub (&b, &c, &a, p);
-  /* if a > 0 return max (res, res1), otherwise return min (res, res1).  */
-  if ((a.d[0] > 0 && res <= res1) || (a.d[0] <= 0 && res >= res1))
-    res = res1;
-  LIBC_PROBE (slowacos, 2, &res, &x);
-  return res;
-}
-
 /* Compute sin() of double-length number (X + DX) as Multi Precision number and
    return result as double.  If REDUCE_RANGE is true, X is assumed to be the
    original input and DX is ignored.  */
diff --git a/sysdeps/x86_64/fpu/multiarch/e_asin-fma.c b/sysdeps/x86_64/fpu/multiarch/e_asin-fma.c
index 50e9c64247..1e3767bf2d 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_asin-fma.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_asin-fma.c
@@ -1,11 +1,9 @@
 #define __ieee754_acos __ieee754_acos_fma
 #define __ieee754_asin __ieee754_asin_fma
-#define __cos32 __cos32_fma
 #define __doasin __doasin_fma
 #define __docos __docos_fma
 #define __dubcos __dubcos_fma
 #define __dubsin __dubsin_fma
-#define __sin32 __sin32_fma
 #define SECTION __attribute__ ((section (".text.fma")))
 
 #include <sysdeps/ieee754/dbl-64/e_asin.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_asin-fma4.c b/sysdeps/x86_64/fpu/multiarch/e_asin-fma4.c
index 2657c31f49..0965556a01 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_asin-fma4.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_asin-fma4.c
@@ -1,11 +1,9 @@
 #define __ieee754_acos __ieee754_acos_fma4
 #define __ieee754_asin __ieee754_asin_fma4
-#define __cos32 __cos32_fma4
 #define __doasin __doasin_fma4
 #define __docos __docos_fma4
 #define __dubcos __dubcos_fma4
 #define __dubsin __dubsin_fma4
-#define __sin32 __sin32_fma4
 #define SECTION __attribute__ ((section (".text.fma4")))
 
 #include <sysdeps/ieee754/dbl-64/e_asin.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/sincos32-fma.c b/sysdeps/x86_64/fpu/multiarch/sincos32-fma.c
index dcd44bc5e8..4152b84835 100644
--- a/sysdeps/x86_64/fpu/multiarch/sincos32-fma.c
+++ b/sysdeps/x86_64/fpu/multiarch/sincos32-fma.c
@@ -1,5 +1,3 @@
-#define __cos32 __cos32_fma
-#define __sin32 __sin32_fma
 #define __c32 __c32_fma
 #define __mpsin __mpsin_fma
 #define __mpsin1 __mpsin1_fma
diff --git a/sysdeps/x86_64/fpu/multiarch/sincos32-fma4.c b/sysdeps/x86_64/fpu/multiarch/sincos32-fma4.c
index ebbfa18cca..643eedf138 100644
--- a/sysdeps/x86_64/fpu/multiarch/sincos32-fma4.c
+++ b/sysdeps/x86_64/fpu/multiarch/sincos32-fma4.c
@@ -1,5 +1,3 @@
-#define __cos32 __cos32_fma4
-#define __sin32 __sin32_fma4
 #define __c32 __c32_fma4
 #define __mpsin __mpsin_fma4
 #define __mpsin1 __mpsin1_fma4
-- 
2.21.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] ieee754: Remove slow paths from asin and acos
  2020-01-27 10:45 [PATCH 1/2] ieee754: Remove slow paths from asin and acos Anssi Hannula
  2020-01-27 11:25 ` [PATCH 2/2] ieee754: Remove unused __sin32 and __cos32 Anssi Hannula
@ 2020-01-27 13:12 ` Szabolcs Nagy
  2020-01-28 10:51   ` Anssi Hannula
  2020-03-24 23:15   ` Joseph Myers
  2020-12-18  7:09 ` [COMMITTED] " Siddhesh Poyarekar
  2 siblings, 2 replies; 8+ messages in thread
From: Szabolcs Nagy @ 2020-01-27 13:12 UTC (permalink / raw)
  To: Anssi Hannula, libc-alpha; +Cc: nd

On 27/01/2020 10:45, Anssi Hannula wrote:
> asin and acos have slow paths for rounding the last bit that cause some
> calls to be 500-1500x slower than average calls.
> 
> These slow paths are rare, a test of a trillion (1.000.000.000.000)
> random inputs between -1 and 1 showed 32870 slow calls for acos and 4473
> for asin, with most occurrences between -1.0 .. -0.9 and 0.9 .. 1.0.
> 
> The slow paths claim correct rounding and use __sin32() and __cos32()
> (which compare two result candidates and return the closest one) as the
> final step, with the second result candidate (res1) having a small offset
> applied from res. This suggests that res and res1 are intended to be 1
> ULP apart (which makes sense for rounding), barring bugs, allowing us to
> pick either one and still remain within 1 ULP of the exact result.
> 
> Remove the slow paths as the accuracy is better than 1 ULP even without
> them, which is enough for glibc.
> 
> Also remove code comments claiming correctly rounded results.

the patch looks good to me.

> 
> After slow path removal, checking the accuracy of 14.400.000.000 random
> asin() and acos() inputs showed only three incorrectly rounded
> (error > 0.5 ULP) results:
> - asin(-0x1.ee2b43286db75p-1) (0.500002 ULP, same as before)
> - asin(-0x1.f692ba202abcp-4)  (0.500003 ULP, same as before)
> - asin(-0x1.9915e876fc062p-1) (0.50000000001 ULP, previously exact)
> The first two had the same error even before this commit, and they did
> not use the slow path at all.
> 
> Checking 4934 known randomly found previously-slow-path asin inputs
> shows 25 calls with incorrectly rounded results, with a maximum error of
> 0.500000002 ULP (for 0x1.fcd5742999ab8p-1). The previous slow-path code
> rounded all these inputs correctly (error < 0.5 ULP).
> The observed average speed increase was 130x.
> 
> Checking 36240 known randomly found previously-slow-path acos inputs
> shows 42 calls with incorrectly rounded results, with a maximum error of
> 0.500000008 ULP (for 0x1.f63845056f35ep-1). The previous "exact"
> slow-path code showed 34 calls with incorrectly rounded results, with the
> same maximum error of 0.500000008 ULP (for 0x1.f63845056f35ep-1).
> The observed average speed increase was 130x.
> 
> The functions could likely be trimmed more while keeping acceptable
> accuracy, but this at least gets rid of the egregiously slow cases.
> 
> Tested on x86_64.
> ---
...
>  double
>  SECTION
>  __ieee754_asin(double x){
> @@ -100,13 +88,7 @@ __ieee754_asin(double x){
>        if (res == res+1.00014*cor) return res;
>        else {
>  	__doasin(x,0,w);
> -	if (w[0]==(w[0]+1.00000001*w[1])) return w[0];
> -	else {
> -	  y=fabs(x);
> -	  res=fabs(w[0]);
> -	  res1=fabs(w[0]+1.1*w[1]);
> -	  return (m>0)?__sin32(y,res,res1):-__sin32(y,res,res1);
> -	}
> +	return w[0];

if we assume the code is correctly rounded (though your tests
show it isnt), then we can derive error bounds from the
checks, e.g.

  if (res == res+1.00014*cor) return res;

here means that the computed result is res + cor, where
|cor| < 0.5 ulp (in nearest rounding mode), correct rounding
would require the exact result res + xcor with
|xcor| < 1.00014*|cor| < .50007 ulp.

so in this case i think that earlier if can be removed too
with a comment saying that worst case error is expected to
be .50007 ulp in nearest rounding mode.

it would be nice to do similar analysis in other cases
and have some comments in the code about expected error
bounds. (but if that seems too much work i'm not against
the proposed patch.)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] ieee754: Remove slow paths from asin and acos
  2020-01-27 13:12 ` [PATCH 1/2] ieee754: Remove slow paths from asin and acos Szabolcs Nagy
@ 2020-01-28 10:51   ` Anssi Hannula
  2020-03-24 23:15   ` Joseph Myers
  1 sibling, 0 replies; 8+ messages in thread
From: Anssi Hannula @ 2020-01-28 10:51 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-alpha; +Cc: nd

On 27.1.2020 14.51, Szabolcs Nagy wrote:
> On 27/01/2020 10:45, Anssi Hannula wrote:
>> asin and acos have slow paths for rounding the last bit that cause some
>> calls to be 500-1500x slower than average calls.
>>
>> These slow paths are rare, a test of a trillion (1.000.000.000.000)
>> random inputs between -1 and 1 showed 32870 slow calls for acos and 4473
>> for asin, with most occurrences between -1.0 .. -0.9 and 0.9 .. 1.0.
>>
>> The slow paths claim correct rounding and use __sin32() and __cos32()
>> (which compare two result candidates and return the closest one) as the
>> final step, with the second result candidate (res1) having a small offset
>> applied from res. This suggests that res and res1 are intended to be 1
>> ULP apart (which makes sense for rounding), barring bugs, allowing us to
>> pick either one and still remain within 1 ULP of the exact result.
>>
>> Remove the slow paths as the accuracy is better than 1 ULP even without
>> them, which is enough for glibc.
>>
>> Also remove code comments claiming correctly rounded results.
> the patch looks good to me.
>
>> After slow path removal, checking the accuracy of 14.400.000.000 random
>> asin() and acos() inputs showed only three incorrectly rounded
>> (error > 0.5 ULP) results:
>> - asin(-0x1.ee2b43286db75p-1) (0.500002 ULP, same as before)
>> - asin(-0x1.f692ba202abcp-4)  (0.500003 ULP, same as before)
>> - asin(-0x1.9915e876fc062p-1) (0.50000000001 ULP, previously exact)
>> The first two had the same error even before this commit, and they did
>> not use the slow path at all.
>>
>> Checking 4934 known randomly found previously-slow-path asin inputs
>> shows 25 calls with incorrectly rounded results, with a maximum error of
>> 0.500000002 ULP (for 0x1.fcd5742999ab8p-1). The previous slow-path code
>> rounded all these inputs correctly (error < 0.5 ULP).
>> The observed average speed increase was 130x.
>>
>> Checking 36240 known randomly found previously-slow-path acos inputs
>> shows 42 calls with incorrectly rounded results, with a maximum error of
>> 0.500000008 ULP (for 0x1.f63845056f35ep-1). The previous "exact"
>> slow-path code showed 34 calls with incorrectly rounded results, with the
>> same maximum error of 0.500000008 ULP (for 0x1.f63845056f35ep-1).
>> The observed average speed increase was 130x.
>>
>> The functions could likely be trimmed more while keeping acceptable
>> accuracy, but this at least gets rid of the egregiously slow cases.
>>
>> Tested on x86_64.
>> ---
> ...
>>  double
>>  SECTION
>>  __ieee754_asin(double x){
>> @@ -100,13 +88,7 @@ __ieee754_asin(double x){
>>        if (res == res+1.00014*cor) return res;
>>        else {
>>  	__doasin(x,0,w);
>> -	if (w[0]==(w[0]+1.00000001*w[1])) return w[0];
>> -	else {
>> -	  y=fabs(x);
>> -	  res=fabs(w[0]);
>> -	  res1=fabs(w[0]+1.1*w[1]);
>> -	  return (m>0)?__sin32(y,res,res1):-__sin32(y,res,res1);
>> -	}
>> +	return w[0];
> if we assume the code is correctly rounded (though your tests
> show it isnt), then we can derive error bounds from the
> checks, e.g.
>
>   if (res == res+1.00014*cor) return res;
>
> here means that the computed result is res + cor, where
> |cor| < 0.5 ulp (in nearest rounding mode), correct rounding
> would require the exact result res + xcor with
> |xcor| < 1.00014*|cor| < .50007 ulp.
>
> so in this case i think that earlier if can be removed too
> with a comment saying that worst case error is expected to
> be .50007 ulp in nearest rounding mode.

Indeed, I suspected as much.
I wasn't really comfortable trying to derive error bounds (and/or remove
more code) as the original results were not 100% accurate and I'm not
too much of an expert on this.

> it would be nice to do similar analysis in other cases
> and have some comments in the code about expected error
> bounds. (but if that seems too much work i'm not against
> the proposed patch.)


I'd prefer to keep the patch as-is as long as that is acceptable, as
I've spent a bit too much time on this already :)

Thanks for taking a look.

-- 
Anssi Hannula / Bitwise Oy
+358 503803997

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] ieee754: Remove slow paths from asin and acos
  2020-01-27 13:12 ` [PATCH 1/2] ieee754: Remove slow paths from asin and acos Szabolcs Nagy
  2020-01-28 10:51   ` Anssi Hannula
@ 2020-03-24 23:15   ` Joseph Myers
  1 sibling, 0 replies; 8+ messages in thread
From: Joseph Myers @ 2020-03-24 23:15 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Anssi Hannula, libc-alpha, nd

On Mon, 27 Jan 2020, Szabolcs Nagy wrote:

> On 27/01/2020 10:45, Anssi Hannula wrote:
> > asin and acos have slow paths for rounding the last bit that cause some
> > calls to be 500-1500x slower than average calls.
> > 
> > These slow paths are rare, a test of a trillion (1.000.000.000.000)
> > random inputs between -1 and 1 showed 32870 slow calls for acos and 4473
> > for asin, with most occurrences between -1.0 .. -0.9 and 0.9 .. 1.0.
> > 
> > The slow paths claim correct rounding and use __sin32() and __cos32()
> > (which compare two result candidates and return the closest one) as the
> > final step, with the second result candidate (res1) having a small offset
> > applied from res. This suggests that res and res1 are intended to be 1
> > ULP apart (which makes sense for rounding), barring bugs, allowing us to
> > pick either one and still remain within 1 ULP of the exact result.
> > 
> > Remove the slow paths as the accuracy is better than 1 ULP even without
> > them, which is enough for glibc.
> > 
> > Also remove code comments claiming correctly rounded results.
> 
> the patch looks good to me.

Yes, please commit.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [COMMITTED] ieee754: Remove slow paths from asin and acos
  2020-01-27 10:45 [PATCH 1/2] ieee754: Remove slow paths from asin and acos Anssi Hannula
  2020-01-27 11:25 ` [PATCH 2/2] ieee754: Remove unused __sin32 and __cos32 Anssi Hannula
  2020-01-27 13:12 ` [PATCH 1/2] ieee754: Remove slow paths from asin and acos Szabolcs Nagy
@ 2020-12-18  7:09 ` Siddhesh Poyarekar
  2020-12-18  8:28   ` Paul Zimmermann
  2 siblings, 1 reply; 8+ messages in thread
From: Siddhesh Poyarekar @ 2020-12-18  7:09 UTC (permalink / raw)
  To: libc-alpha; +Cc: Anssi Hannula, joseph

Looks like this patchset was approved but never committed.  I have
pushed it now:

https://sourceware.org/pipermail/glibc-cvs/2020q4/071274.html
https://sourceware.org/pipermail/glibc-cvs/2020q4/071275.html

Siddhesh


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [COMMITTED] ieee754: Remove slow paths from asin and acos
  2020-12-18  7:09 ` [COMMITTED] " Siddhesh Poyarekar
@ 2020-12-18  8:28   ` Paul Zimmermann
  2020-12-18  9:26     ` Siddhesh Poyarekar
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Zimmermann @ 2020-12-18  8:28 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha, joseph

       Hi,

> Date: Fri, 18 Dec 2020 12:39:26 +0530
> From: Siddhesh Poyarekar via Libc-alpha <libc-alpha@sourceware.org>
> 
> Looks like this patchset was approved but never committed.  I have
> pushed it now:
> 
> https://sourceware.org/pipermail/glibc-cvs/2020q4/071274.html
> https://sourceware.org/pipermail/glibc-cvs/2020q4/071275.html
> 
> Siddhesh

note that acos() was not correctly rounded before (x=0x1.fffff3634acd6p-1
gave an error of 0.5000000044534775 ulps [1]).

I did run "make bench" in 2.32 and after those patches.

2.32:
  "acos": {
   "": {
    "duration": 3.63669e+09,
    "iterations": 2.43e+07,
    "max": 892.2,
    "min": 15.357,
    "mean": 149.658
   },
  "asin": {
   "": {
    "duration": 3.58039e+09,
    "iterations": 2.5e+07,
    "max": 488.568,
    "min": 15.231,
    "mean": 143.216
   },

after:
  "acos": {
   "": {
    "duration": 3.65234e+09,
    "iterations": 2.43e+07,
    "max": 1015,
    "min": 15.456,
    "mean": 150.302
   },
  "asin": {
   "": {
    "duration": 3.61159e+09,
    "iterations": 2.5e+07,
    "max": 1011.85,
    "min": 16.131,
    "mean": 144.464
   },

It does not seem the maximal time did decrease. Can someone else check?

Paul

[1] https://members.loria.fr/PZimmermann/papers/accuracy.pdf


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [COMMITTED] ieee754: Remove slow paths from asin and acos
  2020-12-18  8:28   ` Paul Zimmermann
@ 2020-12-18  9:26     ` Siddhesh Poyarekar
  0 siblings, 0 replies; 8+ messages in thread
From: Siddhesh Poyarekar @ 2020-12-18  9:26 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: libc-alpha, joseph

On 12/18/20 1:58 PM, Paul Zimmermann wrote:
>         Hi,
> 
>> Date: Fri, 18 Dec 2020 12:39:26 +0530
>> From: Siddhesh Poyarekar via Libc-alpha <libc-alpha@sourceware.org>
>>
>> Looks like this patchset was approved but never committed.  I have
>> pushed it now:
>>
>> https://sourceware.org/pipermail/glibc-cvs/2020q4/071274.html
>> https://sourceware.org/pipermail/glibc-cvs/2020q4/071275.html
>>
>> Siddhesh
> 
> note that acos() was not correctly rounded before (x=0x1.fffff3634acd6p-1
> gave an error of 0.5000000044534775 ulps [1]).

That seems within an acceptable range for glibc; some years ago we gave 
up trying to be correctly rounded.

> I did run "make bench" in 2.32 and after those patches.
> 
> 2.32:
>    "acos": {
>     "": {
>      "duration": 3.63669e+09,
>      "iterations": 2.43e+07,
>      "max": 892.2,
>      "min": 15.357,
>      "mean": 149.658
>     },
>    "asin": {
>     "": {
>      "duration": 3.58039e+09,
>      "iterations": 2.5e+07,
>      "max": 488.568,
>      "min": 15.231,
>      "mean": 143.216
>     },
> 
> after:
>    "acos": {
>     "": {
>      "duration": 3.65234e+09,
>      "iterations": 2.43e+07,
>      "max": 1015,
>      "min": 15.456,
>      "mean": 150.302
>     },
>    "asin": {
>     "": {
>      "duration": 3.61159e+09,
>      "iterations": 2.5e+07,
>      "max": 1011.85,
>      "min": 16.131,
>      "mean": 144.464
>     },
> 
> It does not seem the maximal time did decrease. Can someone else check?
> 

That may just be due to the slow inputs not being represented in the 
synthetic workload in the benchmark; IIRC I had auto-generated a random 
set for the non-mp paths.

Siddhesh

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-12-18  9:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-27 10:45 [PATCH 1/2] ieee754: Remove slow paths from asin and acos Anssi Hannula
2020-01-27 11:25 ` [PATCH 2/2] ieee754: Remove unused __sin32 and __cos32 Anssi Hannula
2020-01-27 13:12 ` [PATCH 1/2] ieee754: Remove slow paths from asin and acos Szabolcs Nagy
2020-01-28 10:51   ` Anssi Hannula
2020-03-24 23:15   ` Joseph Myers
2020-12-18  7:09 ` [COMMITTED] " Siddhesh Poyarekar
2020-12-18  8:28   ` Paul Zimmermann
2020-12-18  9:26     ` Siddhesh Poyarekar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).