[PATCH 2/6] Remove slow paths from sin/cos

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 2/6] Remove slow paths from sin/cos
@ 2018-03-09 15:46 Wilco Dijkstra
  2018-03-09 16:17 ` Siddhesh Poyarekar
  0 siblings, 1 reply; 14+ messages in thread
From: Wilco Dijkstra @ 2018-03-09 15:46 UTC (permalink / raw)
  To: libc-alpha; +Cc: nd

This patch removes 2nd of the 3 range reduction cases and defer to the final one.
Input values above 2^27 are extremely rare, so this case doesn't need to as be
optimized as smaller inputs.

ChangeLog:
2018-03-09  Wilco Dijkstra  <wdijkstr@arm.com>

	* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_2): Remove function.
	(do_sincos_2): Likewise.
	(__sin): Remove middle range reduction case.
	(__cos): Likewise.
	* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Remove middle range
	reduction case.

--
diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index 9673a461ac592fc2bf3babc755dae336312e4c56..1f98e29278183d1fccd7c2b3fd467d6b16c245ed 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -362,80 +362,6 @@ do_sincos_1 (double a, double da, double x, int4 n, bool shift_quadrant)
   return retval;
 }
 
-static inline int4
-__always_inline
-reduce_sincos_2 (double x, double *a, double *da)
-{
-  mynumber v;
-
-  double t = (x * hpinv + toint);
-  double xn = t - toint;
-  v.x = t;
-  double xn1 = (xn + 8.0e22) - 8.0e22;
-  double xn2 = xn - xn1;
-  double y = ((((x - xn1 * mp1) - xn1 * mp2) - xn2 * mp1) - xn2 * mp2);
-  int4 n = v.i[LOW_HALF] & 3;
-  double db = xn1 * pp3;
-  t = y - db;
-  db = (y - t) - db;
-  db = (db - xn2 * pp3) - xn * pp4;
-  double b = t + db;
-  db = (t - b) + db;
-
-  *a = b;
-  *da = db;
-
-  return n;
-}
-
-/* Compute sin (A + DA).  cos can be computed by passing SHIFT_QUADRANT as
-   true, which results in shifting the quadrant N clockwise.  */
-static double
-__always_inline
-do_sincos_2 (double a, double da, double x, int4 n, bool shift_quadrant)
-{
-  double res, retval, cor, xx;
-
-  double eps = 1.0e-24;
-
-  int4 k = (n + shift_quadrant) & 3;
-
-  switch (k)
-    {
-    case 2:
-      a = -a;
-      da = -da;
-      /* Fall through.  */
-    case 0:
-      xx = a * a;
-      if (xx < 0.01588)
-	{
-	  /* Taylor series.  */
-	  res = TAYLOR_SIN (xx, a, da, cor);
-	  cor = 1.02 * cor + __copysign (eps, cor);
-	  retval = (res == res + cor) ? res : bsloww (a, da, x, n);
-	}
-      else
-	{
-	  res = do_sin (a, da, &cor);
-	  cor = 1.035 * cor + __copysign (eps, cor);
-	  retval = ((res == res + cor) ? __copysign (res, a)
-		    : bsloww1 (a, da, x, n));
-	}
-      break;
-
-    case 1:
-    case 3:
-      res = do_cos (a, da, &cor);
-      cor = 1.025 * cor + __copysign (eps, cor);
-      retval = ((res == res + cor) ? ((n & 2) ? -res : res)
-		: bsloww2 (a, da, x, n));
-      break;
-    }
-
-  return retval;
-}
-
 /*******************************************************************/
 /* An ultimate sin routine. Given an IEEE double machine number x   */
 /* it computes the correctly rounded (to nearest) value of sin(x)  */
@@ -498,16 +424,7 @@ __sin (double x)
       retval = do_sincos_1 (a, da, x, n, false);
     }				/*   else  if (k <  0x419921FB )    */
 
-/*---------------------105414350 <|x|< 281474976710656 --------------------*/
-  else if (k < 0x42F00000)
-    {
-      double a, da;
-
-      int4 n = reduce_sincos_2 (x, &a, &da);
-      retval = do_sincos_2 (a, da, x, n, false);
-    }				/*   else  if (k <  0x42F00000 )   */
-
-/* -----------------281474976710656 <|x| <2^1024----------------------------*/
+/* --------------------105414350 <|x| <2^1024------------------------------*/
   else if (k < 0x7ff00000)
     retval = reduce_and_compute (x, false);
 
@@ -584,15 +501,7 @@ __cos (double x)
       retval = do_sincos_1 (a, da, x, n, true);
     }				/*   else  if (k <  0x419921FB )    */
 
-  else if (k < 0x42F00000)
-    {
-      double a, da;
-
-      int4 n = reduce_sincos_2 (x, &a, &da);
-      retval = do_sincos_2 (a, da, x, n, true);
-    }				/*   else  if (k <  0x42F00000 )    */
-
-  /* 281474976710656 <|x| <2^1024 */
+  /* 105414350 <|x| <2^1024 */
   else if (k < 0x7ff00000)
     retval = reduce_and_compute (x, true);
 
diff --git a/sysdeps/ieee754/dbl-64/s_sincos.c b/sysdeps/ieee754/dbl-64/s_sincos.c
index e1977ea7e93c32cca5369677f23e68f8f797a9f4..a9af8ce526bfe78c06cfafa65de0815ec69585c5 100644
--- a/sysdeps/ieee754/dbl-64/s_sincos.c
+++ b/sysdeps/ieee754/dbl-64/s_sincos.c
@@ -86,16 +86,6 @@ __sincos (double x, double *sinx, double *cosx)
 
       return;
     }
-  if (k < 0x42F00000)
-    {
-      double a, da;
-      int4 n = reduce_sincos_2 (x, &a, &da);
-
-      *sinx = do_sincos_2 (a, da, x, n, false);
-      *cosx = do_sincos_2 (a, da, x, n, true);
-
-      return;
-    }
   if (k < 0x7ff00000)
     {
       reduce_and_compute_sincos (x, sinx, cosx);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-09 15:46 [PATCH 2/6] Remove slow paths from sin/cos Wilco Dijkstra
@ 2018-03-09 16:17 ` Siddhesh Poyarekar
  2018-03-09 18:19   ` Ondřej Bílka
  0 siblings, 1 reply; 14+ messages in thread
From: Siddhesh Poyarekar @ 2018-03-09 16:17 UTC (permalink / raw)
  To: Wilco Dijkstra, libc-alpha; +Cc: nd

On Friday 09 March 2018 09:16 PM, Wilco Dijkstra wrote:
> This patch removes 2nd of the 3 range reduction cases and defer to the final one.
> Input values above 2^27 are extremely rare, so this case doesn't need to as be
> optimized as smaller inputs.

Please elaborate on two more points:

- The basis for your claim that values above 2^27 are extremely rare,
  i.e. papers that you may have referred that conclude this and/or
  applications you may have profiled

- The quantum of performance loss due to dropping the 2nd case.

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-09 16:17 ` Siddhesh Poyarekar
@ 2018-03-09 18:19   ` Ondřej Bílka
  2018-03-09 18:52     ` Zack Weinberg
  2018-03-09 19:06     ` Wilco Dijkstra
  0 siblings, 2 replies; 14+ messages in thread
From: Ondřej Bílka @ 2018-03-09 18:19 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Wilco Dijkstra, libc-alpha, nd

On Fri, Mar 09, 2018 at 09:47:09PM +0530, Siddhesh Poyarekar wrote:
> On Friday 09 March 2018 09:16 PM, Wilco Dijkstra wrote:
> > This patch removes 2nd of the 3 range reduction cases and defer to the final one.
> > Input values above 2^27 are extremely rare, so this case doesn't need to as be
> > optimized as smaller inputs.
> 
> Please elaborate on two more points:
> 
> - The basis for your claim that values above 2^27 are extremely rare,
>   i.e. papers that you may have referred that conclude this and/or
>   applications you may have profiled
> 
Main reason is that for these inputs accuracy doesn't make a sense.
There is already millions bigger error caused by limited precision when
rounding input to float.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-09 18:19   ` Ondřej Bílka
@ 2018-03-09 18:52     ` Zack Weinberg
  2018-03-09 23:05       ` Steve Ellcey
  2018-03-09 19:06     ` Wilco Dijkstra
  1 sibling, 1 reply; 14+ messages in thread
From: Zack Weinberg @ 2018-03-09 18:52 UTC (permalink / raw)
  To: Ondřej Bílka; +Cc: Siddhesh Poyarekar, Wilco Dijkstra, libc-alpha, nd

On Fri, Mar 9, 2018 at 1:19 PM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Fri, Mar 09, 2018 at 09:47:09PM +0530, Siddhesh Poyarekar wrote:
>> On Friday 09 March 2018 09:16 PM, Wilco Dijkstra wrote:
>> > This patch removes 2nd of the 3 range reduction cases and defer to the final one.
>> > Input values above 2^27 are extremely rare, so this case doesn't need to as be
>> > optimized as smaller inputs.
>>
>> Please elaborate on two more points:
>>
>> - The basis for your claim that values above 2^27 are extremely rare,
>>   i.e. papers that you may have referred that conclude this and/or
>>   applications you may have profiled
>>
> Main reason is that for these inputs accuracy doesn't make a sense.
> There is already millions bigger error caused by limited precision when
> rounding input to float.

More concretely, for IEEE single, the gap between representable values
is bigger than 2π for values whose exponent is 2^26 or above.  Since
the sine function is periodic over 2π, that means the result of sinf()
is effectively meaningless for any input at least that big - _any
value_ within the input period could have been rounded to the
representable, so the "true" answer could be anything.  (I am tempted
to suggest that we return NaN for inputs this large, without even
bothering to do argument reduction.)

For double, this happens at 2^55, and for x86-64 long double it
happens at 2^66.  I _think_ x86-64 long double is 80-bit IEEE
extended, not quad, but I could be wrong.

zw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-09 18:19   ` Ondřej Bílka
  2018-03-09 18:52     ` Zack Weinberg
@ 2018-03-09 19:06     ` Wilco Dijkstra
  2018-03-09 19:31       ` Siddhesh Poyarekar
  1 sibling, 1 reply; 14+ messages in thread
From: Wilco Dijkstra @ 2018-03-09 19:06 UTC (permalink / raw)
  To: Ondřej Bílka, Siddhesh Poyarekar; +Cc: libc-alpha, nd

Ondřej Bílka wrote:
> On Fri, Mar 09, 2018 at 09:47:09PM +0530, Siddhesh Poyarekar wrote:
>> On Friday 09 March 2018 09:16 PM, Wilco Dijkstra wrote:
>> > This patch removes 2nd of the 3 range reduction cases and defer to the final one.
>> > Input values above 2^27 are extremely rare, so this case doesn't need to as be
>> > optimized as smaller inputs.
>> 
>> Please elaborate on two more points:
>> 
>> - The basis for your claim that values above 2^27 are extremely rare,
>>   i.e. papers that you may have referred that conclude this and/or
>>   applications you may have profiled
>> 
> Main reason is that for these inputs accuracy doesn't make a sense.
> There is already millions bigger error caused by limited precision when
> rounding input to float.

Exactly, it's rare for sin/cos input to be > 10, largest I've ever seen was around 1000. 
There is a case to be made for making wildly out or range inputs an error or just return 0.0
rather than using insanely accurate range reduction (the result is going to be equally wrong
either way), but that's a different discussion.

Fast range reduction of up to 2^27 is a huge overkill, we could get a significant speedup by
reducing this range.

The 2nd level range reduction (2^27 till 2^48) is so pointless that there is absolutely no
reason keep it. There is about a 2.3x slowdown in this range due to __branred being
braindead slow.

Basically I can only see a reason for 3 levels of range reduction if the first range reduction
only handles a tiny range (say up to 10), the final range reduction is extremely slow and we
can find proof of an application relying on fast range reduction up to say 1000 or 1000000.

I think a 3-4x speedup of __branred should be feasible using a much simpler algorithm and
integer arithmetic. This will not only cover a much wider range but then we will also be sure
it works correctly!

Wilco

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-09 19:06     ` Wilco Dijkstra
@ 2018-03-09 19:31       ` Siddhesh Poyarekar
  2018-03-12 18:09         ` Wilco Dijkstra
  0 siblings, 1 reply; 14+ messages in thread
From: Siddhesh Poyarekar @ 2018-03-09 19:31 UTC (permalink / raw)
  To: Wilco Dijkstra, Ondřej Bílka; +Cc: libc-alpha, nd

On Saturday 10 March 2018 12:36 AM, Wilco Dijkstra wrote:
> Exactly, it's rare for sin/cos input to be > 10, largest I've ever seen was around 1000. 
> There is a case to be made for making wildly out or range inputs an error or just return 0.0
> rather than using insanely accurate range reduction (the result is going to be equally wrong
> either way), but that's a different discussion.

To be clear, I do not object to the patchset, it is the right way
forward.  I would like to see clearer documentation as to why we decided
to drop this path in this patch and what the impact of that is.  In that
sense, it would be nice to have a better rationale in the commit log
than "I've never seen inputs that big".

> Fast range reduction of up to 2^27 is a huge overkill, we could get a significant speedup by
> reducing this range.
>
> The 2nd level range reduction (2^27 till 2^48) is so pointless that there is absolutely no
> reason keep it. There is about a 2.3x slowdown in this range due to __branred being
> braindead slow.

Thanks, all this should be part of the commit log.  However, from what I
understand, this particular patch will result in a regression (since
IIRC reduce_and_compute is slower) but the following patch will fix that
and improve things.

Siddhesh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-09 18:52     ` Zack Weinberg
@ 2018-03-09 23:05       ` Steve Ellcey
  2018-03-10  0:52         ` Joseph Myers
  0 siblings, 1 reply; 14+ messages in thread
From: Steve Ellcey @ 2018-03-09 23:05 UTC (permalink / raw)
  To: Zack Weinberg, Ondřej Bílka
  Cc: Siddhesh Poyarekar, Wilco Dijkstra, libc-alpha, nd

On Fri, 2018-03-09 at 13:52 -0500, Zack Weinberg wrote:
>Â 
> More concretely, for IEEE single, the gap between representable values
> is bigger than 2Ï€ for values whose exponent is 2^26 or above.Â Â Since
> the sine function is periodic over 2Ï€, that means the result of sinf()
> is effectively meaningless for any input at least that big - _any
> value_ within the input period could have been rounded to the
> representable, so the "true" answer could be anything.Â Â (I am tempted
> to suggest that we return NaN for inputs this large, without even
> bothering to do argument reduction.)
> 
> For double, this happens at 2^55, and for x86-64 long double it
> happens at 2^66.Â Â I _think_ x86-64 long double is 80-bit IEEE
> extended, not quad, but I could be wrong.
> 
> zw

And yet we have tests for large numbers inÂ auto-libm-test-out-sin
and auto-libm-test-out-cos. Â I was testing a different vector sin/cos
implementation on aarch64 and I got some big diffs on very large
output. Â Maybe we shouldn't test for these values if they are not
of any value.

testing double (vector length 2)
Failure: Test: sin_vlen2 (-0x2p+64)
Result:
Â is:Â Â Â Â Â Â Â Â Â -0.0000000000000000e+00Â Â -0x0.0000000000000p+0
Â should be:Â Â Â 4.7183876212354675e-02Â Â Â 0x1.8287c2a760e04p-5
Â difference:Â Â 4.7183876212354675e-02Â Â Â 0x1.8287c2a760e04p-5
Â ulpÂ Â Â Â Â Â Â :Â Â 6799913194491396.0000
Â max.ulpÂ Â Â :Â Â 1.0000
Failure: Test: sin_vlen2 (-0x3.3de320f6be87ep+1020)
Result:
Â is:Â Â Â Â Â Â Â Â Â -0.0000000000000000e+00Â Â -0x0.0000000000000p+0
Â should be:Â Â -9.9219562491895441e-01Â Â -0x1.fc0110a085bafp-1
Â difference:Â Â 9.9219562491895441e-01Â Â Â 0x1.fc0110a085bafp-1
Â ulpÂ Â Â Â Â Â Â :Â Â 8936903693327279.0000
Â max.ulpÂ Â Â :Â Â 1.0000


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-09 23:05       ` Steve Ellcey
@ 2018-03-10  0:52         ` Joseph Myers
  2018-03-12 15:36           ` Wilco Dijkstra
  0 siblings, 1 reply; 14+ messages in thread
From: Joseph Myers @ 2018-03-10  0:52 UTC (permalink / raw)
  To: Steve Ellcey
  Cc: Zack Weinberg, Ondřej Bílka, Siddhesh Poyarekar,
	Wilco Dijkstra, libc-alpha, nd

[-- Attachment #1: Type: text/plain, Size: 615 bytes --]

On Fri, 9 Mar 2018, Steve Ellcey wrote:

> And yet we have tests for large numbers inÂ auto-libm-test-out-sin
> and auto-libm-test-out-cos. Â I was testing a different vector sin/cos

Yes.  A floating-point number represents a particular real number, not an 
interval, so all the usual accuracy goals (of results within a few ulps of 
the correct answer) apply for large inputs (but performance is not a 
concern for those inputs).  Only for IBM long double is this relaxed, to 
treat values not representable in 106 mantissa bits as if they do 
represent intervals.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-10  0:52         ` Joseph Myers
@ 2018-03-12 15:36           ` Wilco Dijkstra
  2018-03-12 15:46             ` Zack Weinberg
  2018-03-12 16:10             ` Joseph Myers
  0 siblings, 2 replies; 14+ messages in thread
From: Wilco Dijkstra @ 2018-03-12 15:36 UTC (permalink / raw)
  To: Joseph Myers, Steve Ellcey
  Cc: Zack Weinberg, Ondřej Bílka, Siddhesh Poyarekar,
	libc-alpha, nd

Joseph Myers wrote:

> Yes.  A floating-point number represents a particular real number, not an 
> interval, so all the usual accuracy goals (of results within a few ulps of 
> the correct answer) apply for large inputs (but performance is not a 
> concern for those inputs).  Only for IBM long double is this relaxed, to 
> treat values not representable in 106 mantissa bits as if they do 
> represent intervals.

Though these patches keep the ULP accuracy across the full range as is, 
we could agree on higher ULP errors for large/huge range reduction cases 
in the future. The main complexity is for certain rare inputs which happen to
be extremely close to an integer multiple of PI/2, and those few cases mean
you need significant extra work to guarantee 0.5 ULP error bound on range
reduction.

Wilco

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-12 15:36           ` Wilco Dijkstra
@ 2018-03-12 15:46             ` Zack Weinberg
  2018-03-12 16:10             ` Joseph Myers
  1 sibling, 0 replies; 14+ messages in thread
From: Zack Weinberg @ 2018-03-12 15:46 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: Joseph Myers, Steve Ellcey, Ondřej Bílka,
	Siddhesh Poyarekar, libc-alpha, nd

On Mon, Mar 12, 2018 at 11:36 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Joseph Myers wrote:
>
>> Yes.  A floating-point number represents a particular real number, not an
>> interval, so all the usual accuracy goals (of results within a few ulps of
>> the correct answer) apply for large inputs (but performance is not a
>> concern for those inputs).  Only for IBM long double is this relaxed, to
>> treat values not representable in 106 mantissa bits as if they do
>> represent intervals.
>
> Though these patches keep the ULP accuracy across the full range as is,
> we could agree on higher ULP errors for large/huge range reduction cases
> in the future. The main complexity is for certain rare inputs which happen to
> be extremely close to an integer multiple of PI/2, and those few cases mean
> you need significant extra work to guarantee 0.5 ULP error bound on range
> reduction.

I think 0.5 ULP error bound for values close to _small_ integer
multiples of pi/2 is worth preserving.  I'm not sure what the right
cutoff is, but if we need to do that "significant extra work" for,
like, values close to 10*pi and the most natural way to implement that
also gives us the same error bound for values close to 1e6*pi, then I
don't see why we shouldn't keep it for both.

zw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-12 15:36           ` Wilco Dijkstra
  2018-03-12 15:46             ` Zack Weinberg
@ 2018-03-12 16:10             ` Joseph Myers
  2018-03-12 21:13               ` Wilco Dijkstra
  1 sibling, 1 reply; 14+ messages in thread
From: Joseph Myers @ 2018-03-12 16:10 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: Steve Ellcey, Zack Weinberg, Ondřej Bílka,
	Siddhesh Poyarekar, libc-alpha, nd

On Mon, 12 Mar 2018, Wilco Dijkstra wrote:

> Though these patches keep the ULP accuracy across the full range as is, 
> we could agree on higher ULP errors for large/huge range reduction cases 
> in the future. The main complexity is for certain rare inputs which happen to
> be extremely close to an integer multiple of PI/2, and those few cases mean
> you need significant extra work to guarantee 0.5 ULP error bound on range
> reduction.

I don't think the work for having an error bound not much more than 0.5ulp 
on the final result of sin/cos for large arguments is significantly 
different from the work for having an error bound of say 3ulp (the 
testsuite has a global maximum of 9ulp (16ulp for IBM long double) beyond 
which it will not accept errors even if those large errors are listed in 
libm-test-ulps files - that bound is simply based on the errors 
empirically observed at present, for functions other than Bessel functions 
and cpow which are known to have cases with much larger errors).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-09 19:31       ` Siddhesh Poyarekar
@ 2018-03-12 18:09         ` Wilco Dijkstra
  2018-03-13  8:53           ` Siddhesh Poyarekar
  0 siblings, 1 reply; 14+ messages in thread
From: Wilco Dijkstra @ 2018-03-12 18:09 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Ondřej Bílka; +Cc: libc-alpha, nd

Siddhesh Poyarekar wrote:
> On Saturday 10 March 2018 12:36 AM, Wilco Dijkstra wrote:
>> Exactly, it's rare for sin/cos input to be > 10, largest I've ever seen was around 1000.
>> There is a case to be made for making wildly out or range inputs an error or just return 0.0
>> rather than using insanely accurate range reduction (the result is going to be equally wrong
>> either way), but that's a different discussion.
>
> To be clear, I do not object to the patchset, it is the right way
> forward.  I would like to see clearer documentation as to why we decided
> to drop this path in this patch and what the impact of that is.  In that
> sense, it would be nice to have a better rationale in the commit log
> than "I've never seen inputs that big".

There are multiple reasons, correctness (the implementation wasn't accurate enough),
premature optimization, complexity, etc. If we are to spend time on developing a better
range reduction, the time is best spent on (a) common cases, ie. small values, and
(b) on a much faster general range reducer (which could be designed to be faster on
smaller inputs rather than be constant-time time __branred).

A lot of the speedup is achieved by just making the code simpler. If we want to support 3-level
range reduction, there must be a really good reason for it (I've not seen 3-level reduction used
anywhere else).

>> Fast range reduction of up to 2^27 is a huge overkill, we could get a significant speedup by
>> reducing this range.
>>
>> The 2nd level range reduction (2^27 till 2^48) is so pointless that there is absolutely no
>> reason keep it. There is about a 2.3x slowdown in this range due to __branred being
>> braindead slow.
>
> Thanks, all this should be part of the commit log.  However, from what I
> understand, this particular patch will result in a regression (since
> IIRC reduce_and_compute is slower) but the following patch will fix that
> and improve things.

I'll improve the commit log in the next version (assuming there are comments on the actual
patch!).

Wilco

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-12 16:10             ` Joseph Myers
@ 2018-03-12 21:13               ` Wilco Dijkstra
  0 siblings, 0 replies; 14+ messages in thread
From: Wilco Dijkstra @ 2018-03-12 21:13 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Steve Ellcey, Zack Weinberg, Ondřej Bílka,
	Siddhesh Poyarekar, libc-alpha, nd

Joseph Myers wrote:
> On Mon, 12 Mar 2018, Wilco Dijkstra wrote:
>
>> Though these patches keep the ULP accuracy across the full range as is, 
>> we could agree on higher ULP errors for large/huge range reduction cases 
>> in the future. The main complexity is for certain rare inputs which happen to
>> be extremely close to an integer multiple of PI/2, and those few cases mean
>> you need significant extra work to guarantee 0.5 ULP error bound on range
>> reduction.
>
> I don't think the work for having an error bound not much more than 0.5ulp 
> on the final result of sin/cos for large arguments is significantly 
> different from the work for having an error bound of say 3ulp (the 
> testsuite has a global maximum of 9ulp (16ulp for IBM long double) beyond 
> which it will not accept errors even if those large errors are listed in 
> libm-test-ulps files - that bound is simply based on the errors 
> empirically observed at present, for functions other than Bessel functions 
> and cpow which are known to have cases with much larger errors).

Having to do another reduction step can increase latency significantly. To give
an example, I had to increase accuracy of the first-level range reduction because
a handful huge multiples of PI (6-7 digits) had 2-3ULP reduction error. The additional
accuracy means range reduction became 50% slower while being way too accurate
for almost all inputs.

Obviously it would be possible redesign it from scratch using a much smaller input
range and FMA when available etc.

Wilco

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] Remove slow paths from sin/cos
  2018-03-12 18:09         ` Wilco Dijkstra
@ 2018-03-13  8:53           ` Siddhesh Poyarekar
  0 siblings, 0 replies; 14+ messages in thread
From: Siddhesh Poyarekar @ 2018-03-13  8:53 UTC (permalink / raw)
  To: Wilco Dijkstra, Ondřej Bílka; +Cc: libc-alpha, nd

On Monday 12 March 2018 11:38 PM, Wilco Dijkstra wrote:
> I'll improve the commit log in the next version (assuming there are comments on the actual
> patch!).

Overall the patchset looks good to me but I don't think I'll have time
to take a closer look any time this week or the next.  In any case I
will defer to Joseph for final approval of the patchset given his more
extensive experience in the area.

Siddhesh

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-03-13  8:53 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-09 15:46 [PATCH 2/6] Remove slow paths from sin/cos Wilco Dijkstra
2018-03-09 16:17 ` Siddhesh Poyarekar
2018-03-09 18:19   ` Ondřej Bílka
2018-03-09 18:52     ` Zack Weinberg
2018-03-09 23:05       ` Steve Ellcey
2018-03-10  0:52         ` Joseph Myers
2018-03-12 15:36           ` Wilco Dijkstra
2018-03-12 15:46             ` Zack Weinberg
2018-03-12 16:10             ` Joseph Myers
2018-03-12 21:13               ` Wilco Dijkstra
2018-03-09 19:06     ` Wilco Dijkstra
2018-03-09 19:31       ` Siddhesh Poyarekar
2018-03-12 18:09         ` Wilco Dijkstra
2018-03-13  8:53           ` Siddhesh Poyarekar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).