[PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469

public inbox for libc-alpha@sourceware.org

* [PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469
  2016-08-23 18:23 [PATCH 0/5] sin/cos/sincos cleanups Siddhesh Poyarekar
  2016-08-23 18:23 ` [PATCH 1/5] Consolidate reduce_and_compute code Siddhesh Poyarekar
  2016-08-23 18:23 ` [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos Siddhesh Poyarekar
@ 2016-08-23 18:23 ` Siddhesh Poyarekar
  2016-08-30  3:12   ` [PING][PATCH " Siddhesh Poyarekar
  2016-09-01 16:35   ` [PATCH " Joseph Myers
  2016-08-23 18:23 ` [PATCH 5/5] Inline all support functions for sin and cos Siddhesh Poyarekar
  2016-08-23 18:23 ` [PATCH 3/5] Consolidate input partitioning into do_cos and do_sin Siddhesh Poyarekar
  4 siblings, 2 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-23 18:23 UTC
  To: libc-alpha

The existing open-coded computation looks slightly different from
DO_SIN but, on closer examination, should give exactly the same result.
Drop it in favour of a call to DO_SIN.

	* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Use DO_SIN.
---
 sysdeps/ieee754/dbl-64/s_sin.c | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index e03c75a..82f9345 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -441,7 +441,7 @@ SECTION
 #endif
 __sin (double x)
 {
-  double xx, res, t, cor, y, s, c, sn, ssn, cs, ccs;
+  double xx, res, t, cor;
   mynumber u;
   int4 k, m;
   double retval = 0;
@@ -471,23 +471,8 @@ __sin (double x)
 /*---------------------------- 0.25<|x|< 0.855469---------------------- */
   else if (k < 0x3feb6000)
     {
-      u.x = big + fabs (x);
-      y = fabs (x) - (u.x - big);
-      y = (x > 0 ? y : -y);
-
-      xx = y * y;
-      s = y + y * xx * (sn3 + xx * sn5);
-      c = xx * (cs2 + xx * (cs4 + xx * cs6));
-      SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs);
-      if (m <= 0)
-        {
-          sn = -sn;
-	  ssn = -ssn;
-	}
-      cor = (ssn + s * ccs - sn * c) + cs * s;
-      res = sn + cor;
-      cor = (sn - res) + cor;
-      retval = (res == res + 1.096 * cor) ? res : slow1 (x);
+      res = do_sin (x, 0, &cor);
+      retval = (res == res + 1.096 * cor) ? (m > 0 ? res : -res) : slow1 (x);
     }				/*   else  if (k < 0x3feb6000)    */
 
 /*----------------------- 0.855469  <|x|<2.426265  ----------------------*/
-- 
2.7.4

* [PATCH 0/5] sin/cos/sincos cleanups
@ 2016-08-23 18:23 Siddhesh Poyarekar
  2016-08-23 18:23 ` [PATCH 1/5] Consolidate reduce_and_compute code Siddhesh Poyarekar
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-23 18:23 UTC
  To: libc-alpha

Hi,

Here is another set of patches to clean up the sin/cos code further.  The focus
of these patches is to consolidate and simplify the code in order to reduce
duplication and make the computation more consistent.  For example, some
places use fabs(x) while others use if (x > 0) {... x ...} else {... -x ...}.
As a final patch, I inlined all of the support functions.

The cumulative effect of these patches is a 16% improvement in sincos in the
min case and 3% in the mean case in the microbenchmark.  sin regresses by 4% in
the min case and is largely unchanged in the mean case.  cos is faster by 3%
in the min case and unchanged in the mean case.  In addition to the
microbenchmark, I also ran SPEC2006, which shows about a 2% improvement on
aarch64 and a similar (about 1.5%) improvement on x86_64.

Tested on x86_64 and aarch64 to verify that there are no regressions.  There is
further scope for consolidation in these functions and I intend to continue
working on them on top of these changes.  While the primary benefit will be
more readable code, I also expect the changes to have a positive impact on
performance, especially for sincos.

Siddhesh

Siddhesh Poyarekar (5):
  Consolidate reduce_and_compute code
  Use fabs(x) instead of branching on signedness of input to sin and cos
  Consolidate input partitioning into do_cos and do_sin
  Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469
  Inline all support functions for sin and cos

 sysdeps/ieee754/dbl-64/s_sin.c | 420 ++++++++++++++++-------------------------
 1 file changed, 158 insertions(+), 262 deletions(-)

-- 
2.7.4

* [PATCH 1/5] Consolidate reduce_and_compute code
  2016-08-23 18:23 [PATCH 0/5] sin/cos/sincos cleanups Siddhesh Poyarekar
@ 2016-08-23 18:23 ` Siddhesh Poyarekar
  2016-08-24  1:52   ` Adhemerval Zanella
  2016-08-29 16:03   ` Joseph Myers
  2016-08-23 18:23 ` [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos Siddhesh Poyarekar
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-23 18:23 UTC
  To: libc-alpha

This patch reshuffles the reduce_and_compute code so that its structure
matches that of similar code elsewhere in s_sin.c and s_sincos.c.  This
is the first step in an attempt to consolidate the functions in s_sin.c
and reduce code duplication, making the code easier to read and
possibly also easier for the compiler to optimize.
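
The consolidation relies on a * a being unchanged when a is negated, so case 2 can flip the signs of a and da and fall through into case 0. The sketch below compares the two dispatch shapes, using hypothetical stand-ins path_small, path_large and path_odd for bsloww, bsloww1 and bsloww2. The explicit "Fall through." comment is an addition in the sketch (GCC's -Wimplicit-fallthrough recognises it); the patch itself uses a bare fall-through.

```c
#include <assert.h>

/* Hypothetical stand-ins for bsloww, bsloww1 and bsloww2: each adds a
   distinct offset so a caller can tell which path ran with which
   arguments.  */
static double path_small (double a, double da) { return 1000.0 + a + da; }
static double path_large (double a, double da) { return 2000.0 + a + da; }
static double path_odd (double a, double da) { return 3000.0 + a + da; }

/* Shape of reduce_and_compute before PATCH 1/5.  */
double
dispatch_old (double a, double da, unsigned int k)
{
  double retval = 0;
  switch (k % 4)
    {
    case 0:
      if (a * a < 0.01588)
        retval = path_small (a, da);
      else
        retval = path_large (a, da);
      break;
    case 2:
      if (a * a < 0.01588)
        retval = path_small (-a, -da);
      else
        retval = path_large (-a, -da);
      break;
    case 1:
    case 3:
      retval = path_odd (a, da);
      break;
    }
  return retval;
}

/* Shape after PATCH 1/5: negating A does not change A * A, so case 2 can
   flip the signs and reuse the code of case 0.  */
double
dispatch_new (double a, double da, unsigned int k)
{
  double retval = 0;
  switch (k % 4)
    {
    case 2:
      a = -a;
      da = -da;
      /* Fall through.  */
    case 0:
      if (a * a < 0.01588)
        retval = path_small (a, da);
      else
        retval = path_large (a, da);
      break;
    case 1:
    case 3:
      retval = path_odd (a, da);
      break;
    }
  return retval;
}
```

For every k in 0..3, dispatch_old and dispatch_new should return the same value for the same a and da.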

	* sysdeps/ieee754/dbl-64/s_sin.c (reduce_and_compute):
	Consolidate switch cases 0 and 2.
---
 sysdeps/ieee754/dbl-64/s_sin.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index 7c9a079..e1ee7a9 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -249,23 +249,20 @@ reduce_and_compute (double x, unsigned int k)
   k = (n + k) % 4;
   switch (k)
     {
-      case 0:
-	if (a * a < 0.01588)
-	  retval = bsloww (a, da, x, n);
-	else
-	  retval = bsloww1 (a, da, x, n);
-	break;
-      case 2:
-	if (a * a < 0.01588)
-	  retval = bsloww (-a, -da, x, n);
-	else
-	  retval = bsloww1 (-a, -da, x, n);
-	break;
+    case 2:
+      a = -a;
+      da = -da;
+    case 0:
+      if (a * a < 0.01588)
+	retval = bsloww (a, da, x, n);
+      else
+	retval = bsloww1 (a, da, x, n);
+      break;
 
-      case 1:
-      case 3:
-	retval = bsloww2 (a, da, x, n);
-	break;
+    case 1:
+    case 3:
+      retval = bsloww2 (a, da, x, n);
+      break;
     }
   return retval;
 }
-- 
2.7.4

* [PATCH 5/5] Inline all support functions for sin and cos
  2016-08-23 18:23 [PATCH 0/5] sin/cos/sincos cleanups Siddhesh Poyarekar
                   ` (2 preceding siblings ...)
  2016-08-23 18:23 ` [PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469 Siddhesh Poyarekar
@ 2016-08-23 18:23 ` Siddhesh Poyarekar
  2016-08-30  3:12   ` [PING][PATCH " Siddhesh Poyarekar
                     ` (2 more replies)
  2016-08-23 18:23 ` [PATCH 3/5] Consolidate input partitioning into do_cos and do_sin Siddhesh Poyarekar
  4 siblings, 3 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-23 18:23 UTC
  To: libc-alpha

The support functions for sin and cos have a lot of identical
functionality, so inlining them gives a pretty decent jump in
performance: about 19% in the sincos function.  On SPEC2006 this
translates to an improvement of about 2.1% in the tonto benchmark.

	* sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Mark as inline.
	(do_cos_slow): Likewise.
	(do_sin): Likewise.
	(do_sin_slow): Likewise.
	(slow): Likewise.
	(slow1): Likewise.
	(slow2): Likewise.
	(sloww): Likewise.
	(sloww1): Likewise.
	(sloww2): Likewise.
	(bsloww): Likewise.
	(bsloww1): Likewise.
	(bsloww2): Likewise.
	(cslow2): Likewise.
---
 sysdeps/ieee754/dbl-64/s_sin.c | 52 +++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index 82f9345..c20ef4d 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -145,7 +145,8 @@ static double cslow2 (double x);
    of the number by combining the sin and cos of X (as computed by a variation
    of the Taylor series) with the values looked up from the sin/cos table to
    get the result in RES and a correction value in COR.  */
-static double
+static inline double
+__always_inline
 do_cos (double x, double dx, double *corp)
 {
   mynumber u;
@@ -170,7 +171,8 @@ do_cos (double x, double dx, double *corp)
 
 /* A more precise variant of DO_COS.  EPS is the adjustment to the correction
    COR.  */
-static double
+static inline double
+__always_inline
 do_cos_slow (double x, double dx, double eps, double *corp)
 {
   mynumber u;
@@ -205,7 +207,8 @@ do_cos_slow (double x, double dx, double eps, double *corp)
    the number by combining the sin and cos of X (as computed by a variation of
    the Taylor series) with the values looked up from the sin/cos table to get
    the result in RES and a correction value in COR.  */
-static double
+static inline double
+__always_inline
 do_sin (double x, double dx, double *corp)
 {
   mynumber u;
@@ -229,7 +232,8 @@ do_sin (double x, double dx, double *corp)
 
 /* A more precise variant of DO_SIN.  EPS is the adjustment to the correction
    COR.  */
-static double
+static inline double
+__always_inline
 do_sin_slow (double x, double dx, double eps, double *corp)
 {
   mynumber u;
@@ -615,8 +619,8 @@ __cos (double x)
 /* precision  and if still doesn't accurate enough by mpsin   or dubsin */
 /************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 slow (double x)
 {
   double res, cor, w[2];
@@ -636,8 +640,8 @@ slow (double x)
 /* and if result still doesn't accurate enough by mpsin   or dubsin            */
 /*******************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 slow1 (double x)
 {
   double w[2], cor, res;
@@ -657,8 +661,8 @@ slow1 (double x)
 /*  Routine compute sin(x) for   0.855469  <|x|<2.426265  by  __sincostab.tbl  */
 /* and if result still doesn't accurate enough by mpsin   or dubsin       */
 /**************************************************************************/
-static double
-SECTION
+static inline double
+__always_inline
 slow2 (double x)
 {
   double w[2], y, y1, y2, cor, res;
@@ -686,8 +690,8 @@ slow2 (double x)
 /* result.And if result not accurate enough routine calls mpsin1 or dubsin */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 sloww (double x, double dx, double orig, int k)
 {
   double y, t, res, cor, w[2], a, da, xn;
@@ -747,8 +751,8 @@ sloww (double x, double dx, double orig, int k)
 /* accurate enough routine calls  mpsin1   or dubsin                       */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 sloww1 (double x, double dx, double orig, int k)
 {
   double w[2], cor, res;
@@ -777,8 +781,8 @@ sloww1 (double x, double dx, double orig, int k)
 /* accurate enough routine calls  mpsin1   or dubsin                       */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 sloww2 (double x, double dx, double orig, int n)
 {
   double w[2], cor, res;
@@ -808,8 +812,8 @@ sloww2 (double x, double dx, double orig, int n)
 /* result.And if result not accurate enough routine calls other routines    */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 bsloww (double x, double dx, double orig, int n)
 {
   double res, cor, w[2], a, da;
@@ -837,8 +841,8 @@ bsloww (double x, double dx, double orig, int n)
 /* And if result not  accurate enough routine calls  other routines         */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 bsloww1 (double x, double dx, double orig, int n)
 {
   double w[2], cor, res;
@@ -865,8 +869,8 @@ bsloww1 (double x, double dx, double orig, int n)
 /* And if result not accurate enough routine calls  other routines          */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 bsloww2 (double x, double dx, double orig, int n)
 {
   double w[2], cor, res;
@@ -891,8 +895,8 @@ bsloww2 (double x, double dx, double orig, int n)
 /* precision  and if still doesn't accurate enough by mpcos   or docos  */
 /************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 cslow2 (double x)
 {
   double w[2], cor, res;
-- 
2.7.4

* [PATCH 3/5] Consolidate input partitioning into do_cos and do_sin
  2016-08-23 18:23 [PATCH 0/5] sin/cos/sincos cleanups Siddhesh Poyarekar
                   ` (3 preceding siblings ...)
  2016-08-23 18:23 ` [PATCH 5/5] Inline all support functions for sin and cos Siddhesh Poyarekar
@ 2016-08-23 18:23 ` Siddhesh Poyarekar
  2016-08-30  3:12   ` [PING][PATCH " Siddhesh Poyarekar
  2016-09-01 16:23   ` [PATCH " Joseph Myers
  4 siblings, 2 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-23 18:23 UTC
  To: libc-alpha

All calls to do_cos and do_sin (and their _slow variants) are preceded
by code that partitions x into a larger double that gives an offset into
the sincos table and a smaller double that is used in a polynomial
computation.  Move this partitioning into do_cos and do_sin themselves
to reduce code duplication.
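
The partitioning step can be sketched as follows. This is an illustration, not the glibc code: SPLIT_BIG, struct split and split_input are hypothetical names, and the constant 1.5 * 2^45 is chosen here to produce the 1/128 grid described in the comments rather than copied from glibc's headers. The sketch follows the do_sin prologue (dx kept separate); do_cos instead adds dx straight into the remainder.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical splitting constant 1.5 * 2^45.  Doubles in [2^45, 2^46) are
   spaced 2^-7 apart, so SPLIT_BIG + |x| rounds |x| to the nearest multiple
   of 1/128.  Assumes strict double arithmetic (FLT_EVAL_METHOD == 0, as on
   x86_64 or aarch64) and no -ffast-math, which could fold the add and
   subtract away.  */
static const double SPLIT_BIG = 0x1.8p45;

struct split
{
  int index;  /* table row: the table point times 128 */
  double a;   /* the table point, a multiple of 1/128 */
  double y;   /* remainder fed to the polynomial, |y| <= 1/256 */
  double dx;  /* low-order tail, with the sign of X folded in */
};

/* Roughly the prologue that PATCH 3/5 moves into do_sin: fold the sign of
   X into DX, then split |X| into A + Y.  glibc derives the table row from
   the bits of u.x rather than by multiplying.  */
struct split
split_input (double x, double dx)
{
  struct split r;
  assert (fabs (x) < 1.0);     /* the sketch only covers |x| < 1 */
  if (x <= 0)
    dx = -dx;
  r.a = (SPLIT_BIG + fabs (x)) - SPLIT_BIG;
  r.y = fabs (x) - r.a;        /* exact for |x| < 1: the difference is representable */
  r.index = (int) (r.a * 128.0);
  r.dx = dx;
  return r;
}
```

For x = 0.3 the sketch picks row 38 (a = 0.296875) and leaves a remainder of about 0.003125; x = -0.3 selects the same row and remainder with dx negated.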

	* sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Accept X and DX as input
	arguments.  Consolidate input partitioning from callers here.
	(do_cos_slow): Likewise.
	(do_sin): Likewise.
	(do_sin_slow): Likewise.
	(do_sincos_1): Remove the no longer necessary input partitioning.
	(do_sincos_2): Likewise.
	(__sin): Likewise.
	(__cos): Likewise.
	(slow1): Likewise.
	(slow2): Likewise.
	(sloww1): Likewise.
	(sloww2): Likewise.
	(bsloww1): Likewise.
	(bsloww2): Likewise.
	(cslow2): Likewise.
---
 sysdeps/ieee754/dbl-64/s_sin.c | 191 ++++++++++++++++++-----------------------
 1 file changed, 82 insertions(+), 109 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index 7f6cd09..e03c75a 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -141,14 +141,21 @@ static double bsloww2 (double x, double dx, double orig, int n);
 int __branred (double x, double *a, double *aa);
 static double cslow2 (double x);
 
-/* Given a number partitioned into U and X such that U is an index into the
-   sin/cos table, this macro computes the cosine of the number by combining
-   the sin and cos of X (as computed by a variation of the Taylor series) with
-   the values looked up from the sin/cos table to get the result in RES and a
-   correction value in COR.  */
+/* Given a number partitioned into X and DX, this function computes the cosine
+   of the number by combining the sin and cos of X (as computed by a variation
+   of the Taylor series) with the values looked up from the sin/cos table to
+   get the result in RES and a correction value in COR.  */
 static double
-do_cos (mynumber u, double x, double *corp)
+do_cos (double x, double dx, double *corp)
 {
+  mynumber u;
+
+  if (x < 0)
+    dx = -dx;
+
+  u.x = big + fabs (x);
+  x = fabs (x) - (u.x - big) + dx;
+
   double xx, s, sn, ssn, c, cs, ccs, res, cor;
   xx = x * x;
   s = x + x * xx * (sn3 + xx * sn5);
@@ -161,11 +168,19 @@ do_cos (mynumber u, double x, double *corp)
   return res;
 }
 
-/* A more precise variant of DO_COS where the number is partitioned into U, X
-   and DX.  EPS is the adjustment to the correction COR.  */
+/* A more precise variant of DO_COS.  EPS is the adjustment to the correction
+   COR.  */
 static double
-do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
+do_cos_slow (double x, double dx, double eps, double *corp)
 {
+  mynumber u;
+
+  if (x <= 0)
+    dx = -dx;
+
+  u.x = big + fabs (x);
+  x = fabs (x) - (u.x - big);
+
   double xx, y, x1, x2, e1, e2, res, cor;
   double s, sn, ssn, c, cs, ccs;
   xx = x * x;
@@ -186,14 +201,20 @@ do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
   return res;
 }
 
-/* Given a number partitioned into U and X and DX such that U is an index into
-   the sin/cos table, this macro computes the sine of the number by combining
-   the sin and cos of X (as computed by a variation of the Taylor series) with
-   the values looked up from the sin/cos table to get the result in RES and a
-   correction value in COR.  */
+/* Given a number partitioned into X and DX, this function computes the sine of
+   the number by combining the sin and cos of X (as computed by a variation of
+   the Taylor series) with the values looked up from the sin/cos table to get
+   the result in RES and a correction value in COR.  */
 static double
-do_sin (mynumber u, double x, double dx, double *corp)
+do_sin (double x, double dx, double *corp)
 {
+  mynumber u;
+
+  if (x <= 0)
+    dx = -dx;
+  u.x = big + fabs (x);
+  x = fabs (x) - (u.x - big);
+
   double xx, s, sn, ssn, c, cs, ccs, cor, res;
   xx = x * x;
   s = x + (dx + x * xx * (sn3 + xx * sn5));
@@ -206,11 +227,18 @@ do_sin (mynumber u, double x, double dx, double *corp)
   return res;
 }
 
-/* A more precise variant of res = do_sin where the number is partitioned into U, X
-   and DX.  EPS is the adjustment to the correction COR.  */
+/* A more precise variant of DO_SIN.  EPS is the adjustment to the correction
+   COR.  */
 static double
-do_sin_slow (mynumber u, double x, double dx, double eps, double *corp)
+do_sin_slow (double x, double dx, double eps, double *corp)
 {
+  mynumber u;
+
+  if (x <= 0)
+    dx = -dx;
+  u.x = big + fabs (x);
+  x = fabs (x) - (u.x - big);
+
   double xx, y, x1, x2, c1, c2, res, cor;
   double s, sn, ssn, c, cs, ccs;
   xx = x * x;
@@ -288,8 +316,7 @@ static double
 __always_inline
 do_sincos_1 (double a, double da, double x, int4 n, int4 k)
 {
-  double xx, retval, res, cor, y;
-  mynumber u;
+  double xx, retval, res, cor;
   double eps = fabs (x) * 1.2e-30;
 
   int k1 = (n + k) & 3;
@@ -309,10 +336,7 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
 	}
       else
 	{
-	  double db = (a > 0 ? da : -da);
-	  u.x = big + fabs (a);
-	  y = fabs (a) - (u.x - big);
-	  res = do_sin (u, y, db, &cor);
+	  res = do_sin (a, da, &cor);
 	  cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
 	  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
 		    : sloww1 (a, da, x, k));
@@ -321,16 +345,11 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
 
     case 1:
     case 3:
-	{
-	  double db = (a > 0 ? da : -da);
-	  u.x = big + fabs (a);
-	  y = fabs (a) - (u.x - big) + db;
-	  res = do_cos (u, y, &cor);
-	  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
-	  retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
-		    : sloww2 (a, da, x, n));
-	  break;
-	}
+      res = do_cos (a, da, &cor);
+      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
+      retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
+		: sloww2 (a, da, x, n));
+      break;
     }
 
   return retval;
@@ -369,7 +388,6 @@ __always_inline
 do_sincos_2 (double a, double da, double x, int4 n, int4 k)
 {
   double res, retval, cor, xx;
-  mynumber u;
 
   double eps = 1.0e-24;
 
@@ -392,10 +410,7 @@ do_sincos_2 (double a, double da, double x, int4 n, int4 k)
 	}
       else
 	{
-	  double db = (a > 0 ? da : -da);
-	  u.x = big + fabs (a);
-	  double y = fabs (a) - (u.x - big);
-	  res = do_sin (u, y, db, &cor);
+	  res = do_sin (a, da, &cor);
 	  cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
 	  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
 		    : bsloww1 (a, da, x, n));
@@ -404,16 +419,11 @@ do_sincos_2 (double a, double da, double x, int4 n, int4 k)
 
     case 1:
     case 3:
-	{
-	  double db = (a > 0 ? da : -da);
-	  u.x = big + fabs (a);
-	  double y = fabs (a) - (u.x - big) + db;
-	  res = do_cos (u, y, &cor);
-	  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
-	  retval = ((res == res + cor) ? ((n & 2) ? -res : res)
-		    : bsloww2 (a, da, x, n));
-	  break;
-	}
+      res = do_cos (a, da, &cor);
+      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
+      retval = ((res == res + cor) ? ((n & 2) ? -res : res)
+		: bsloww2 (a, da, x, n));
+      break;
     }
 
   return retval;
@@ -485,11 +495,7 @@ __sin (double x)
     {
 
       t = hp0 - fabs (x);
-      u.x = big + fabs (t);
-      y = fabs (t) - (u.x - big);
-      y = ((t >= 0) ? hp1 : -hp1) + y;
-
-      res = do_cos (u, y, &cor);
+      res = do_cos (t, hp1, &cor);
       retval = (res == res + 1.020 * cor) ? ((m > 0) ? res : -res) : slow2 (x);
     }				/*   else  if (k < 0x400368fd)    */
 
@@ -561,10 +567,7 @@ __cos (double x)
 
   else if (k < 0x3feb6000)
     {				/* 2^-27 < |x| < 0.855469 */
-      y = fabs (x);
-      u.x = big + y;
-      y = y - (u.x - big);
-      res = do_cos (u, y, &cor);
+      res = do_cos (x, 0, &cor);
       retval = (res == res + 1.020 * cor) ? res : cslow2 (x);
     }				/*   else  if (k < 0x3feb6000)    */
 
@@ -582,10 +585,7 @@ __cos (double x)
 	}
       else
 	{
-	  double db = (a > 0 ? da : -da);
-	  u.x = big + fabs (a);
-	  y = fabs (a) - (u.x - big);
-	  res = do_sin (u, y, db, &cor);
+	  res = do_sin (a, da, &cor);
 	  cor = (cor > 0) ? 1.035 * cor + 1.0e-31 : 1.035 * cor - 1.0e-31;
 	  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
 		    : sloww1 (a, da, x, 1));
@@ -655,12 +655,9 @@ static double
 SECTION
 slow1 (double x)
 {
-  mynumber u;
-  double w[2], y, cor, res;
-  y = fabs (x);
-  u.x = big + y;
-  y = y - (u.x - big);
-  res = do_sin_slow (u, y, 0, 0, &cor);
+  double w[2], cor, res;
+
+  res = do_sin_slow (x, 0, 0, &cor);
   if (res == res + cor)
     return (x > 0) ? res : -res;
 
@@ -679,15 +676,10 @@ static double
 SECTION
 slow2 (double x)
 {
-  mynumber u;
-  double w[2], y, y1, y2, cor, res, del;
+  double w[2], y, y1, y2, cor, res;
 
   double t = hp0 - fabs (x);
-  u.x = big + fabs (t);
-  y = fabs (t) - (u.x - big);
-  del = (t >= 0) ? hp1 : -hp1;
-
-  res = do_cos_slow (u, y, del, 0, &cor);
+  res = do_cos_slow (t, hp1, 0, &cor);
   if (res == res + cor)
     return (x > 0) ? res : -res;
 
@@ -774,17 +766,14 @@ static double
 SECTION
 sloww1 (double x, double dx, double orig, int k)
 {
-  mynumber u;
-  double w[2], y, cor, res;
+  double w[2], cor, res;
 
-  u.x = big + fabs (x);
-  y = fabs (x) - (u.x - big);
-  dx = (x > 0 ? dx : -dx);
-  res = do_sin_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
+  res = do_sin_slow (x, dx, 3.1e-30 * fabs (orig), &cor);
 
   if (res == res + cor)
     return (x > 0) ? res : -res;
 
+  dx = (x > 0 ? dx : -dx);
   __dubsin (fabs (x), dx, w);
 
   double eps = 1.1e-30 * fabs (orig);
@@ -807,17 +796,14 @@ static double
 SECTION
 sloww2 (double x, double dx, double orig, int n)
 {
-  mynumber u;
-  double w[2], y, cor, res;
+  double w[2], cor, res;
 
-  u.x = big + fabs (x);
-  y = fabs (x) - (u.x - big);
-  dx = (x > 0 ? dx : -dx);
-  res = do_cos_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
+  res = do_cos_slow (x, dx, 3.1e-30 * fabs (orig), &cor);
 
   if (res == res + cor)
     return (n & 2) ? -res : res;
 
+  dx = x > 0 ? dx : -dx;
   __docos (fabs (x), dx, w);
 
   double eps = 1.1e-30 * fabs (orig);
@@ -870,17 +856,13 @@ static double
 SECTION
 bsloww1 (double x, double dx, double orig, int n)
 {
-  mynumber u;
-  double w[2], y, cor, res;
+  double w[2], cor, res;
 
-  y = fabs (x);
-  u.x = big + y;
-  y = y - (u.x - big);
-  dx = (x > 0) ? dx : -dx;
-  res = do_sin_slow (u, y, dx, 1.1e-24, &cor);
+  res = do_sin_slow (x, dx, 1.1e-24, &cor);
   if (res == res + cor)
     return (x > 0) ? res : -res;
 
+  dx = (x > 0) ? dx : -dx;
   __dubsin (fabs (x), dx, w);
 
   cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
@@ -902,17 +884,13 @@ static double
 SECTION
 bsloww2 (double x, double dx, double orig, int n)
 {
-  mynumber u;
-  double w[2], y, cor, res;
+  double w[2], cor, res;
 
-  y = fabs (x);
-  u.x = big + y;
-  y = y - (u.x - big);
-  dx = (x > 0) ? dx : -dx;
-  res = do_cos_slow (u, y, dx, 1.1e-24, &cor);
+  res = do_cos_slow (x, dx, 1.1e-24, &cor);
   if (res == res + cor)
     return (n & 2) ? -res : res;
 
+  dx = (x > 0) ? dx : -dx;
   __docos (fabs (x), dx, w);
 
   cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
@@ -932,18 +910,13 @@ static double
 SECTION
 cslow2 (double x)
 {
-  mynumber u;
-  double w[2], y, cor, res;
+  double w[2], cor, res;
 
-  y = fabs (x);
-  u.x = big + y;
-  y = y - (u.x - big);
-  res = do_cos_slow (u, y, 0, 0, &cor);
+  res = do_cos_slow (x, 0, 0, &cor);
   if (res == res + cor)
     return res;
 
-  y = fabs (x);
-  __docos (y, 0, w);
+  __docos (fabs (x), 0, w);
   if (w[0] == w[0] + 1.000000005 * w[1])
     return w[0];
 
-- 
2.7.4

* [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos
  2016-08-23 18:23 [PATCH 0/5] sin/cos/sincos cleanups Siddhesh Poyarekar
  2016-08-23 18:23 ` [PATCH 1/5] Consolidate reduce_and_compute code Siddhesh Poyarekar
@ 2016-08-23 18:23 ` Siddhesh Poyarekar
  2016-08-23 20:53   ` Manfred
  2016-08-29 16:19   ` Joseph Myers
  2016-08-23 18:23 ` [PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469 Siddhesh Poyarekar
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-23 18:23 UTC
  To: libc-alpha

The sin and cos code is inconsistent in how it gets the absolute value
of X: in some places it branches on the sign, while in others it uses
fabs.  fabs seems to be the better choice in most cases because it
avoids a branch.  Similarly, this patch tries to make it easier for the
compiler to emit conditional assignment instructions (like fcsel on
aarch64) where it can, by isolating conditional assignment constructs
from the rest of the expression.

A further benefit of this change is that it exposes common constructs
across functions, which can then be consolidated in future patches.
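
A minimal sketch of the two rewrites described above, with hypothetical function names (the real code is inline in s_sin.c). Each before/after pair computes the same values. Whether the fabs and ternary forms actually compile to branch-free code such as fcsel depends on the target and compiler options, so that part is the motivation rather than a guarantee.

```c
#include <assert.h>
#include <math.h>

/* Before PATCH 2/5 (as in do_sincos_1): branch on the sign of A and flip
   both A and DA so that the rest of the code sees |A|.  */
double
prep_branchy (double a, double da, double *db)
{
  if (!(a > 0))
    {
      a = -a;
      da = -da;
    }
  *db = da;
  return a;
}

/* After: fabs only clears the sign bit, and the sign of DA is picked by a
   standalone ternary that the compiler may turn into a select.  */
double
prep_fabs (double a, double da, double *db)
{
  *db = (a > 0) ? da : -da;
  return fabs (a);
}

/* Correction adjustment in do_sin_slow/do_cos_slow, before and after:
   only the sign of EPS depends on the condition.  x - eps and x + (-eps)
   round the same exact value, so the results are identical.  */
double
adjust_branchy (double cor, double eps)
{
  if (cor > 0)
    cor = 1.0005 * cor + eps;
  else
    cor = 1.0005 * cor - eps;
  return cor;
}

double
adjust_select (double cor, double eps)
{
  return 1.0005 * cor + ((cor > 0) ? eps : -eps);
}
```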

	* sysdeps/ieee754/dbl-64/s_sin.c (do_cos_slow): Use ternary
	instead of if/else.
	(do_sin_slow): Likewise.
	(do_sincos_1): Use fabs instead of if/else.
	(do_sincos_2): Likewise.
	(__sin): Likewise.
	(__cos): Likewise.
	(slow2): Likewise.
	(sloww): Likewise.
	(sloww1): Likewise.  Drop argument M.
	(sloww2): Use fabs instead of if/else.
	(bsloww): Likewise.
	(bsloww1): Likewise.
	(bsloww2): Likewise.
---
 sysdeps/ieee754/dbl-64/s_sin.c | 233 +++++++++++++++--------------------------
 1 file changed, 85 insertions(+), 148 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index e1ee7a9..7f6cd09 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -133,7 +133,7 @@ static double slow (double x);
 static double slow1 (double x);
 static double slow2 (double x);
 static double sloww (double x, double dx, double orig, int n);
-static double sloww1 (double x, double dx, double orig, int m, int n);
+static double sloww1 (double x, double dx, double orig, int n);
 static double sloww2 (double x, double dx, double orig, int n);
 static double bsloww (double x, double dx, double orig, int n);
 static double bsloww1 (double x, double dx, double orig, int n);
@@ -181,10 +181,7 @@ do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
   cor = cor + ((cs - y) - e1 * x1);
   res = y + cor;
   cor = (y - res) + cor;
-  if (cor > 0)
-    cor = 1.0005 * cor + eps;
-  else
-    cor = 1.0005 * cor - eps;
+  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
   *corp = cor;
   return res;
 }
@@ -229,10 +226,7 @@ do_sin_slow (mynumber u, double x, double dx, double eps, double *corp)
   cor = cor + ((sn - y) + c1 * x1);
   res = y + cor;
   cor = (y - res) + cor;
-  if (cor > 0)
-    cor = 1.0005 * cor + eps;
-  else
-    cor = 1.0005 * cor - eps;
+  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
   *corp = cor;
   return res;
 }
@@ -296,7 +290,6 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
 {
   double xx, retval, res, cor, y;
   mynumber u;
-  int m;
   double eps = fabs (x) * 1.2e-30;
 
   int k1 = (n + k) & 3;
@@ -316,37 +309,28 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
 	}
       else
 	{
-	  if (a > 0)
-	    m = 1;
-	  else
-	    {
-	      m = 0;
-	      a = -a;
-	      da = -da;
-	    }
-	  u.x = big + a;
-	  y = a - (u.x - big);
-	  res = do_sin (u, y, da, &cor);
+	  double db = (a > 0 ? da : -da);
+	  u.x = big + fabs (a);
+	  y = fabs (a) - (u.x - big);
+	  res = do_sin (u, y, db, &cor);
 	  cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
-	  retval = ((res == res + cor) ? ((m) ? res : -res)
-		    : sloww1 (a, da, x, m, k));
+	  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
+		    : sloww1 (a, da, x, k));
 	}
       break;
 
     case 1:
     case 3:
-      if (a < 0)
 	{
-	  a = -a;
-	  da = -da;
+	  double db = (a > 0 ? da : -da);
+	  u.x = big + fabs (a);
+	  y = fabs (a) - (u.x - big) + db;
+	  res = do_cos (u, y, &cor);
+	  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
+	  retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
+		    : sloww2 (a, da, x, n));
+	  break;
 	}
-      u.x = big + a;
-      y = a - (u.x - big) + da;
-      res = do_cos (u, y, &cor);
-      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
-      retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
-		: sloww2 (a, da, x, n));
-      break;
     }
 
   return retval;
@@ -408,43 +392,28 @@ do_sincos_2 (double a, double da, double x, int4 n, int4 k)
 	}
       else
 	{
-	  double t, db, y;
-	  int m;
-	  if (a > 0)
-	    {
-	      m = 1;
-	      t = a;
-	      db = da;
-	    }
-	  else
-	    {
-	      m = 0;
-	      t = -a;
-	      db = -da;
-	    }
-	  u.x = big + t;
-	  y = t - (u.x - big);
+	  double db = (a > 0 ? da : -da);
+	  u.x = big + fabs (a);
+	  double y = fabs (a) - (u.x - big);
 	  res = do_sin (u, y, db, &cor);
 	  cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
-	  retval = ((res == res + cor) ? ((m) ? res : -res)
+	  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
 		    : bsloww1 (a, da, x, n));
 	}
       break;
 
     case 1:
     case 3:
-      if (a < 0)
 	{
-	  a = -a;
-	  da = -da;
+	  double db = (a > 0 ? da : -da);
+	  u.x = big + fabs (a);
+	  double y = fabs (a) - (u.x - big) + db;
+	  res = do_cos (u, y, &cor);
+	  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
+	  retval = ((res == res + cor) ? ((n & 2) ? -res : res)
+		    : bsloww2 (a, da, x, n));
+	  break;
 	}
-      u.x = big + a;
-      double y = a - (u.x - big) + da;
-      res = do_cos (u, y, &cor);
-      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
-      retval = ((res == res + cor) ? ((n & 2) ? -res : res)
-		: bsloww2 (a, da, x, n));
-      break;
     }
 
   return retval;
@@ -492,8 +461,10 @@ __sin (double x)
 /*---------------------------- 0.25<|x|< 0.855469---------------------- */
   else if (k < 0x3feb6000)
     {
-      u.x = (m > 0) ? big + x : big - x;
-      y = (m > 0) ? x - (u.x - big) : x + (u.x - big);
+      u.x = big + fabs (x);
+      y = fabs (x) - (u.x - big);
+      y = (x > 0 ? y : -y);
+
       xx = y * y;
       s = y + y * xx * (sn3 + xx * sn5);
       c = xx * (cs2 + xx * (cs4 + xx * cs6));
@@ -513,17 +484,11 @@ __sin (double x)
   else if (k < 0x400368fd)
     {
 
-      y = (m > 0) ? hp0 - x : hp0 + x;
-      if (y >= 0)
-	{
-	  u.x = big + y;
-	  y = (y - (u.x - big)) + hp1;
-	}
-      else
-	{
-	  u.x = big - y;
-	  y = (-hp1) - (y + (u.x - big));
-	}
+      t = hp0 - fabs (x);
+      u.x = big + fabs (t);
+      y = fabs (t) - (u.x - big);
+      y = ((t >= 0) ? hp1 : -hp1) + y;
+
       res = do_cos (u, y, &cor);
       retval = (res == res + 1.020 * cor) ? ((m > 0) ? res : -res) : slow2 (x);
     }				/*   else  if (k < 0x400368fd)    */
@@ -617,22 +582,13 @@ __cos (double x)
 	}
       else
 	{
-	  if (a > 0)
-	    {
-	      m = 1;
-	    }
-	  else
-	    {
-	      m = 0;
-	      a = -a;
-	      da = -da;
-	    }
-	  u.x = big + a;
-	  y = a - (u.x - big);
-	  res = do_sin (u, y, da, &cor);
+	  double db = (a > 0 ? da : -da);
+	  u.x = big + fabs (a);
+	  y = fabs (a) - (u.x - big);
+	  res = do_sin (u, y, db, &cor);
 	  cor = (cor > 0) ? 1.035 * cor + 1.0e-31 : 1.035 * cor - 1.0e-31;
-	  retval = ((res == res + cor) ? ((m) ? res : -res)
-		    : sloww1 (a, da, x, m, 1));
+	  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
+		    : sloww1 (a, da, x, 1));
 	}
 
     }				/*   else  if (k < 0x400368fd)    */
@@ -726,20 +682,11 @@ slow2 (double x)
   mynumber u;
   double w[2], y, y1, y2, cor, res, del;
 
-  y = fabs (x);
-  y = hp0 - y;
-  if (y >= 0)
-    {
-      u.x = big + y;
-      y = y - (u.x - big);
-      del = hp1;
-    }
-  else
-    {
-      u.x = big - y;
-      y = -(y + (u.x - big));
-      del = -hp1;
-    }
+  double t = hp0 - fabs (x);
+  u.x = big + fabs (t);
+  y = fabs (t) - (u.x - big);
+  del = (t >= 0) ? hp1 : -hp1;
+
   res = do_cos_slow (u, y, del, 0, &cor);
   if (res == res + cor)
     return (x > 0) ? res : -res;
@@ -771,19 +718,18 @@ sloww (double x, double dx, double orig, int k)
   int4 n;
   res = TAYLOR_SLOW (x, dx, cor);
 
-  if (cor > 0)
-    cor = 1.0005 * cor + fabs (orig) * 3.1e-30;
-  else
-    cor = 1.0005 * cor - fabs (orig) * 3.1e-30;
+  double eps = fabs (orig) * 3.1e-30;
+
+  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
 
   if (res == res + cor)
     return res;
 
-  (x > 0) ? __dubsin (x, dx, w) : __dubsin (-x, -dx, w);
-  if (w[1] > 0)
-    cor = 1.000000001 * w[1] + fabs (orig) * 1.1e-30;
-  else
-    cor = 1.000000001 * w[1] - fabs (orig) * 1.1e-30;
+  a = fabs (x);
+  da = (x > 0) ? dx : -dx;
+  __dubsin (a, da, w);
+  eps = fabs (orig) * 1.1e-30;
+  cor = 1.000000001 * w[1] + ((w[1] > 0) ? eps : -eps);
 
   if (w[0] == w[0] + cor)
     return (x > 0) ? w[0] : -w[0];
@@ -805,11 +751,11 @@ sloww (double x, double dx, double orig, int k)
       a = -a;
       da = -da;
     }
-  (a > 0) ? __dubsin (a, da, w) : __dubsin (-a, -da, w);
-  if (w[1] > 0)
-    cor = 1.000000001 * w[1] + fabs (orig) * 1.1e-40;
-  else
-    cor = 1.000000001 * w[1] - fabs (orig) * 1.1e-40;
+  x = fabs (a);
+  dx = (a > 0) ? da : -da;
+  __dubsin (x, dx, w);
+  eps = fabs (orig) * 1.1e-40;
+  cor = 1.000000001 * w[1] + ((w[1] > 0) ? eps : -eps);
 
   if (w[0] == w[0] + cor)
     return (a > 0) ? w[0] : -w[0];
@@ -826,27 +772,26 @@ sloww (double x, double dx, double orig, int k)
 
 static double
 SECTION
-sloww1 (double x, double dx, double orig, int m, int k)
+sloww1 (double x, double dx, double orig, int k)
 {
   mynumber u;
   double w[2], y, cor, res;
 
-  u.x = big + x;
-  y = x - (u.x - big);
+  u.x = big + fabs (x);
+  y = fabs (x) - (u.x - big);
+  dx = (x > 0 ? dx : -dx);
   res = do_sin_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
 
   if (res == res + cor)
-    return (m > 0) ? res : -res;
+    return (x > 0) ? res : -res;
 
-  __dubsin (x, dx, w);
+  __dubsin (fabs (x), dx, w);
 
-  if (w[1] > 0)
-    cor = 1.000000005 * w[1] + 1.1e-30 * fabs (orig);
-  else
-    cor = 1.000000005 * w[1] - 1.1e-30 * fabs (orig);
+  double eps = 1.1e-30 * fabs (orig);
+  cor = 1.000000005 * w[1] + ((w[1] > 0) ? eps : -eps);
 
   if (w[0] == w[0] + cor)
-    return (m > 0) ? w[0] : -w[0];
+    return (x > 0) ? w[0] : -w[0];
 
   return (k == 1) ? __mpcos (orig, 0, true) : __mpsin (orig, 0, true);
 }
@@ -865,19 +810,18 @@ sloww2 (double x, double dx, double orig, int n)
   mynumber u;
   double w[2], y, cor, res;
 
-  u.x = big + x;
-  y = x - (u.x - big);
+  u.x = big + fabs (x);
+  y = fabs (x) - (u.x - big);
+  dx = (x > 0 ? dx : -dx);
   res = do_cos_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
 
   if (res == res + cor)
     return (n & 2) ? -res : res;
 
-  __docos (x, dx, w);
+  __docos (fabs (x), dx, w);
 
-  if (w[1] > 0)
-    cor = 1.000000005 * w[1] + 1.1e-30 * fabs (orig);
-  else
-    cor = 1.000000005 * w[1] - 1.1e-30 * fabs (orig);
+  double eps = 1.1e-30 * fabs (orig);
+  cor = 1.000000005 * w[1] + ((w[1] > 0) ? eps : -eps);
 
   if (w[0] == w[0] + cor)
     return (n & 2) ? -w[0] : w[0];
@@ -897,18 +841,17 @@ static double
 SECTION
 bsloww (double x, double dx, double orig, int n)
 {
-  double res, cor, w[2];
+  double res, cor, w[2], a, da;
 
   res = TAYLOR_SLOW (x, dx, cor);
-  cor = (cor > 0) ? 1.0005 * cor + 1.1e-24 : 1.0005 * cor - 1.1e-24;
+  cor = 1.0005 * cor + ((cor > 0) ? 1.1e-24 : -1.1e-24);
   if (res == res + cor)
     return res;
 
-  (x > 0) ? __dubsin (x, dx, w) : __dubsin (-x, -dx, w);
-  if (w[1] > 0)
-    cor = 1.000000001 * w[1] + 1.1e-24;
-  else
-    cor = 1.000000001 * w[1] - 1.1e-24;
+  a = fabs (x);
+  da = (x > 0) ? dx : -dx;
+  __dubsin (a, da, w);
+  cor = 1.000000001 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
 
   if (w[0] == w[0] + cor)
     return (x > 0) ? w[0] : -w[0];
@@ -940,10 +883,7 @@ bsloww1 (double x, double dx, double orig, int n)
 
   __dubsin (fabs (x), dx, w);
 
-  if (w[1] > 0)
-    cor = 1.000000005 * w[1] + 1.1e-24;
-  else
-    cor = 1.000000005 * w[1] - 1.1e-24;
+  cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
 
   if (w[0] == w[0] + cor)
     return (x > 0) ? w[0] : -w[0];
@@ -975,10 +915,7 @@ bsloww2 (double x, double dx, double orig, int n)
 
   __docos (fabs (x), dx, w);
 
-  if (w[1] > 0)
-    cor = 1.000000005 * w[1] + 1.1e-24;
-  else
-    cor = 1.000000005 * w[1] - 1.1e-24;
+  cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
 
   if (w[0] == w[0] + cor)
     return (n & 2) ? -w[0] : w[0];
-- 
2.7.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos
  2016-08-23 18:23 ` [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos Siddhesh Poyarekar
@ 2016-08-23 20:53   ` Manfred
  2016-08-23 23:05     ` Joseph Myers
  2016-08-24  2:50     ` Siddhesh Poyarekar
  2016-08-29 16:19   ` Joseph Myers
  1 sibling, 2 replies; 22+ messages in thread
From: Manfred @ 2016-08-23 20:53 UTC (permalink / raw)
  To: libc-alpha



On 08/23/2016 08:22 PM, Siddhesh Poyarekar wrote:

> @@ -181,10 +181,7 @@ do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
>    cor = cor + ((cs - y) - e1 * x1);
>    res = y + cor;
>    cor = (y - res) + cor;
> -  if (cor > 0)
> -    cor = 1.0005 * cor + eps;
> -  else
> -    cor = 1.0005 * cor - eps;
> +  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
>    *corp = cor;
>    return res;

If eps is known to be >=0 then
 > +  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
might be written as
cor = 1.0005 * cor + copysign(eps, cor);

Similarly to fabs(), copysign() avoids a branch - or a potential one 
from the ternary.


* Re: [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos
  2016-08-23 20:53   ` Manfred
@ 2016-08-23 23:05     ` Joseph Myers
  2016-08-24  2:50     ` Siddhesh Poyarekar
  1 sibling, 0 replies; 22+ messages in thread
From: Joseph Myers @ 2016-08-23 23:05 UTC (permalink / raw)
  To: Manfred; +Cc: libc-alpha

On Tue, 23 Aug 2016, Manfred wrote:

> If eps is known to be >=0 then
> > +  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
> might be written as
> cor = 1.0005 * cor + copysign(eps, cor);
> 
> Similarly to fabs(), copysign() avoids a branch - or a potential one from the
> ternary.

That should be __copysign for namespace reasons (though they should 
generally do the same thing because of inlines in math_private.h).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 1/5] Consolidate reduce_and_compute code
  2016-08-23 18:23 ` [PATCH 1/5] Consolidate reduce_and_compute code Siddhesh Poyarekar
@ 2016-08-24  1:52   ` Adhemerval Zanella
  2016-08-29 16:03   ` Joseph Myers
  1 sibling, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2016-08-24  1:52 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

LGTM, this is mostly indentation.

> Em 23 de ago de 2016, às 15:22, Siddhesh Poyarekar <siddhesh@sourceware.org> escreveu:
> 
> This patch reshuffles the reduce_and_compute code so that the
> structure matches other code structures of the same type elsewhere in
> s_sin.c and s_sincos.c.  This is the beginning of an attempt to
> consolidate and reduce code duplication in functions in s_sin.c to
> make it easier to read and possibly also easier for the compiler to
> optimize.
> 
>    * sysdeps/ieee754/dbl-64/s_sin.c (reduce_and_compute):
>    Consolidate switch cases 0 and 2.
> ---
> sysdeps/ieee754/dbl-64/s_sin.c | 29 +++++++++++++----------------
> 1 file changed, 13 insertions(+), 16 deletions(-)
> 
> diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
> index 7c9a079..e1ee7a9 100644
> --- a/sysdeps/ieee754/dbl-64/s_sin.c
> +++ b/sysdeps/ieee754/dbl-64/s_sin.c
> @@ -249,23 +249,20 @@ reduce_and_compute (double x, unsigned int k)
>   k = (n + k) % 4;
>   switch (k)
>     {
> -      case 0:
> -    if (a * a < 0.01588)
> -      retval = bsloww (a, da, x, n);
> -    else
> -      retval = bsloww1 (a, da, x, n);
> -    break;
> -      case 2:
> -    if (a * a < 0.01588)
> -      retval = bsloww (-a, -da, x, n);
> -    else
> -      retval = bsloww1 (-a, -da, x, n);
> -    break;
> +    case 2:
> +      a = -a;
> +      da = -da;
> +    case 0:
> +      if (a * a < 0.01588)
> +    retval = bsloww (a, da, x, n);
> +      else
> +    retval = bsloww1 (a, da, x, n);
> +      break;
> 
> -      case 1:
> -      case 3:
> -    retval = bsloww2 (a, da, x, n);
> -    break;
> +    case 1:
> +    case 3:
> +      retval = bsloww2 (a, da, x, n);
> +      break;
>     }
>   return retval;
> }
> -- 
> 2.7.4
> 


* Re: [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos
  2016-08-23 20:53   ` Manfred
  2016-08-23 23:05     ` Joseph Myers
@ 2016-08-24  2:50     ` Siddhesh Poyarekar
  1 sibling, 0 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-24  2:50 UTC (permalink / raw)
  To: Manfred, libc-alpha

On Wednesday 24 August 2016 02:23 AM, Manfred wrote:
> If eps is known to be >=0 then
>> +  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
> might be written as
> cor = 1.0005 * cor + copysign(eps, cor);
>
> Similarly to fabs(), copysign() avoids a branch - or a potential one
> from the ternary.

Thanks, there are a lot of places in the code that can benefit from 
this, so I'll post a separate patch to clean it all up.

Siddhesh


* Re: [PATCH 1/5] Consolidate reduce_and_compute code
  2016-08-23 18:23 ` [PATCH 1/5] Consolidate reduce_and_compute code Siddhesh Poyarekar
  2016-08-24  1:52   ` Adhemerval Zanella
@ 2016-08-29 16:03   ` Joseph Myers
  2016-08-30  9:26     ` Siddhesh Poyarekar
  1 sibling, 1 reply; 22+ messages in thread
From: Joseph Myers @ 2016-08-29 16:03 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

On Tue, 23 Aug 2016, Siddhesh Poyarekar wrote:

> +    case 2:
> +      a = -a;
> +      da = -da;
> +    case 0:

OK with a comment on this fallthrough (we might want to use the 
-Wimplicit-fallthrough being proposed for GCC 7...).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos
  2016-08-23 18:23 ` [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos Siddhesh Poyarekar
  2016-08-23 20:53   ` Manfred
@ 2016-08-29 16:19   ` Joseph Myers
  1 sibling, 0 replies; 22+ messages in thread
From: Joseph Myers @ 2016-08-29 16:19 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

On Tue, 23 Aug 2016, Siddhesh Poyarekar wrote:

> The sin and cos code is inconsistent about its use of fabs to get the
> absolute value of X where in some places it conditionalizes the code
> while in others it uses fabs.  fabs seems to be a better candidate in
> most cases because it avoids a branch.  Similarly there is an attempt
> to make it easier for the compiler to emit conditional assignment
> instructions (like fcsel on aarch64) where it can, by isolating
> conditional assignment constructs from the rest of the expression.
> 
> A further benefit of this change is to identify common constructs
> across functions and consolidate them in future patches.
> 
> 	* sysdeps/ieee754/dbl-64/s_sin.c (do_cos_slow): Use ternary
> 	instead of if/else.
> 	(do_sin_slow): Likewise.
> 	(do_sincos_1): Use fabs instead of if/else.
> 	(do_sincos_2): Likewise.
> 	(__sin): Likewise.
> 	(__cos): Likewise.
> 	(slow2): Likewise.
> 	(sloww): Likewise.
> 	(sloww1): Likewise.  Drop argument M.
> 	(sloww2): Use fabs instead of if/else.
> 	(bsloww): Likewise.
> 	(bsloww1): Likewise.
> 	(bsloww2): Likewise.

OK.

-- 
Joseph S. Myers
joseph@codesourcery.com


* [PING][PATCH 3/5] Consolidate input partitioning into do_cos and do_sin
  2016-08-23 18:23 ` [PATCH 3/5] Consolidate input partitioning into do_cos and do_sin Siddhesh Poyarekar
@ 2016-08-30  3:12   ` Siddhesh Poyarekar
  2016-09-01 16:23   ` [PATCH " Joseph Myers
  1 sibling, 0 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-30  3:12 UTC (permalink / raw)
  To: libc-alpha

Ping!

On Tuesday 23 August 2016 11:52 PM, Siddhesh Poyarekar wrote:
> All calls to do_cos are preceded by code that partitions x into a
> larger double that gives an offset into the sincos table and a smaller
> double that is used in a polynomial computation.  Consolidate all of
> them into do_cos and do_sin to reduce code duplication.
> 
> 	* sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Accept X and DX as input
> 	arguments.  Consolidate input partitioning from callers here.
> 	(do_cos_slow): Likewise.
> 	(do_sin): Likewise.
> 	(do_sin_slow): Likewise.
> 	(do_sincos_1): Remove the no longer necessary input partitioning.
> 	(do_sincos_2): Likewise.
> 	(__sin): Likewise.
> 	(__cos): Likewise.
> 	(slow1): Likewise.
> 	(slow2): Likewise.
> 	(sloww1): Likewise.
> 	(sloww2): Likewise.
> 	(bsloww1): Likewise.
> 	(bsloww2): Likewise.
> 	(cslow2): Likewise.
> ---
>  sysdeps/ieee754/dbl-64/s_sin.c | 191 ++++++++++++++++++-----------------------
>  1 file changed, 82 insertions(+), 109 deletions(-)
> 
> diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
> index 7f6cd09..e03c75a 100644
> --- a/sysdeps/ieee754/dbl-64/s_sin.c
> +++ b/sysdeps/ieee754/dbl-64/s_sin.c
> @@ -141,14 +141,21 @@ static double bsloww2 (double x, double dx, double orig, int n);
>  int __branred (double x, double *a, double *aa);
>  static double cslow2 (double x);
>  
> -/* Given a number partitioned into U and X such that U is an index into the
> -   sin/cos table, this macro computes the cosine of the number by combining
> -   the sin and cos of X (as computed by a variation of the Taylor series) with
> -   the values looked up from the sin/cos table to get the result in RES and a
> -   correction value in COR.  */
> +/* Given a number partitioned into X and DX, this function computes the cosine
> +   of the number by combining the sin and cos of X (as computed by a variation
> +   of the Taylor series) with the values looked up from the sin/cos table to
> +   get the result in RES and a correction value in COR.  */
>  static double
> -do_cos (mynumber u, double x, double *corp)
> +do_cos (double x, double dx, double *corp)
>  {
> +  mynumber u;
> +
> +  if (x < 0)
> +    dx = -dx;
> +
> +  u.x = big + fabs (x);
> +  x = fabs (x) - (u.x - big) + dx;
> +
>    double xx, s, sn, ssn, c, cs, ccs, res, cor;
>    xx = x * x;
>    s = x + x * xx * (sn3 + xx * sn5);
> @@ -161,11 +168,19 @@ do_cos (mynumber u, double x, double *corp)
>    return res;
>  }
>  
> -/* A more precise variant of DO_COS where the number is partitioned into U, X
> -   and DX.  EPS is the adjustment to the correction COR.  */
> +/* A more precise variant of DO_COS.  EPS is the adjustment to the correction
> +   COR.  */
>  static double
> -do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
> +do_cos_slow (double x, double dx, double eps, double *corp)
>  {
> +  mynumber u;
> +
> +  if (x <= 0)
> +    dx = -dx;
> +
> +  u.x = big + fabs (x);
> +  x = fabs (x) - (u.x - big);
> +
>    double xx, y, x1, x2, e1, e2, res, cor;
>    double s, sn, ssn, c, cs, ccs;
>    xx = x * x;
> @@ -186,14 +201,20 @@ do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
>    return res;
>  }
>  
> -/* Given a number partitioned into U and X and DX such that U is an index into
> -   the sin/cos table, this macro computes the sine of the number by combining
> -   the sin and cos of X (as computed by a variation of the Taylor series) with
> -   the values looked up from the sin/cos table to get the result in RES and a
> -   correction value in COR.  */
> +/* Given a number partitioned into X and DX, this function computes the sine of
> +   the number by combining the sin and cos of X (as computed by a variation of
> +   the Taylor series) with the values looked up from the sin/cos table to get
> +   the result in RES and a correction value in COR.  */
>  static double
> -do_sin (mynumber u, double x, double dx, double *corp)
> +do_sin (double x, double dx, double *corp)
>  {
> +  mynumber u;
> +
> +  if (x <= 0)
> +    dx = -dx;
> +  u.x = big + fabs (x);
> +  x = fabs (x) - (u.x - big);
> +
>    double xx, s, sn, ssn, c, cs, ccs, cor, res;
>    xx = x * x;
>    s = x + (dx + x * xx * (sn3 + xx * sn5));
> @@ -206,11 +227,18 @@ do_sin (mynumber u, double x, double dx, double *corp)
>    return res;
>  }
>  
> -/* A more precise variant of res = do_sin where the number is partitioned into U, X
> -   and DX.  EPS is the adjustment to the correction COR.  */
> +/* A more precise variant of DO_SIN.  EPS is the adjustment to the correction
> +   COR.  */
>  static double
> -do_sin_slow (mynumber u, double x, double dx, double eps, double *corp)
> +do_sin_slow (double x, double dx, double eps, double *corp)
>  {
> +  mynumber u;
> +
> +  if (x <= 0)
> +    dx = -dx;
> +  u.x = big + fabs (x);
> +  x = fabs (x) - (u.x - big);
> +
>    double xx, y, x1, x2, c1, c2, res, cor;
>    double s, sn, ssn, c, cs, ccs;
>    xx = x * x;
> @@ -288,8 +316,7 @@ static double
>  __always_inline
>  do_sincos_1 (double a, double da, double x, int4 n, int4 k)
>  {
> -  double xx, retval, res, cor, y;
> -  mynumber u;
> +  double xx, retval, res, cor;
>    double eps = fabs (x) * 1.2e-30;
>  
>    int k1 = (n + k) & 3;
> @@ -309,10 +336,7 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
>  	}
>        else
>  	{
> -	  double db = (a > 0 ? da : -da);
> -	  u.x = big + fabs (a);
> -	  y = fabs (a) - (u.x - big);
> -	  res = do_sin (u, y, db, &cor);
> +	  res = do_sin (a, da, &cor);
>  	  cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
>  	  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
>  		    : sloww1 (a, da, x, k));
> @@ -321,16 +345,11 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
>  
>      case 1:
>      case 3:
> -	{
> -	  double db = (a > 0 ? da : -da);
> -	  u.x = big + fabs (a);
> -	  y = fabs (a) - (u.x - big) + db;
> -	  res = do_cos (u, y, &cor);
> -	  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
> -	  retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
> -		    : sloww2 (a, da, x, n));
> -	  break;
> -	}
> +      res = do_cos (a, da, &cor);
> +      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
> +      retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
> +		: sloww2 (a, da, x, n));
> +      break;
>      }
>  
>    return retval;
> @@ -369,7 +388,6 @@ __always_inline
>  do_sincos_2 (double a, double da, double x, int4 n, int4 k)
>  {
>    double res, retval, cor, xx;
> -  mynumber u;
>  
>    double eps = 1.0e-24;
>  
> @@ -392,10 +410,7 @@ do_sincos_2 (double a, double da, double x, int4 n, int4 k)
>  	}
>        else
>  	{
> -	  double db = (a > 0 ? da : -da);
> -	  u.x = big + fabs (a);
> -	  double y = fabs (a) - (u.x - big);
> -	  res = do_sin (u, y, db, &cor);
> +	  res = do_sin (a, da, &cor);
>  	  cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
>  	  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
>  		    : bsloww1 (a, da, x, n));
> @@ -404,16 +419,11 @@ do_sincos_2 (double a, double da, double x, int4 n, int4 k)
>  
>      case 1:
>      case 3:
> -	{
> -	  double db = (a > 0 ? da : -da);
> -	  u.x = big + fabs (a);
> -	  double y = fabs (a) - (u.x - big) + db;
> -	  res = do_cos (u, y, &cor);
> -	  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
> -	  retval = ((res == res + cor) ? ((n & 2) ? -res : res)
> -		    : bsloww2 (a, da, x, n));
> -	  break;
> -	}
> +      res = do_cos (a, da, &cor);
> +      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
> +      retval = ((res == res + cor) ? ((n & 2) ? -res : res)
> +		: bsloww2 (a, da, x, n));
> +      break;
>      }
>  
>    return retval;
> @@ -485,11 +495,7 @@ __sin (double x)
>      {
>  
>        t = hp0 - fabs (x);
> -      u.x = big + fabs (t);
> -      y = fabs (t) - (u.x - big);
> -      y = ((t >= 0) ? hp1 : -hp1) + y;
> -
> -      res = do_cos (u, y, &cor);
> +      res = do_cos (t, hp1, &cor);
>        retval = (res == res + 1.020 * cor) ? ((m > 0) ? res : -res) : slow2 (x);
>      }				/*   else  if (k < 0x400368fd)    */
>  
> @@ -561,10 +567,7 @@ __cos (double x)
>  
>    else if (k < 0x3feb6000)
>      {				/* 2^-27 < |x| < 0.855469 */
> -      y = fabs (x);
> -      u.x = big + y;
> -      y = y - (u.x - big);
> -      res = do_cos (u, y, &cor);
> +      res = do_cos (x, 0, &cor);
>        retval = (res == res + 1.020 * cor) ? res : cslow2 (x);
>      }				/*   else  if (k < 0x3feb6000)    */
>  
> @@ -582,10 +585,7 @@ __cos (double x)
>  	}
>        else
>  	{
> -	  double db = (a > 0 ? da : -da);
> -	  u.x = big + fabs (a);
> -	  y = fabs (a) - (u.x - big);
> -	  res = do_sin (u, y, db, &cor);
> +	  res = do_sin (a, da, &cor);
>  	  cor = (cor > 0) ? 1.035 * cor + 1.0e-31 : 1.035 * cor - 1.0e-31;
>  	  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
>  		    : sloww1 (a, da, x, 1));
> @@ -655,12 +655,9 @@ static double
>  SECTION
>  slow1 (double x)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> -  y = fabs (x);
> -  u.x = big + y;
> -  y = y - (u.x - big);
> -  res = do_sin_slow (u, y, 0, 0, &cor);
> +  double w[2], cor, res;
> +
> +  res = do_sin_slow (x, 0, 0, &cor);
>    if (res == res + cor)
>      return (x > 0) ? res : -res;
>  
> @@ -679,15 +676,10 @@ static double
>  SECTION
>  slow2 (double x)
>  {
> -  mynumber u;
> -  double w[2], y, y1, y2, cor, res, del;
> +  double w[2], y, y1, y2, cor, res;
>  
>    double t = hp0 - fabs (x);
> -  u.x = big + fabs (t);
> -  y = fabs (t) - (u.x - big);
> -  del = (t >= 0) ? hp1 : -hp1;
> -
> -  res = do_cos_slow (u, y, del, 0, &cor);
> +  res = do_cos_slow (t, hp1, 0, &cor);
>    if (res == res + cor)
>      return (x > 0) ? res : -res;
>  
> @@ -774,17 +766,14 @@ static double
>  SECTION
>  sloww1 (double x, double dx, double orig, int k)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> +  double w[2], cor, res;
>  
> -  u.x = big + fabs (x);
> -  y = fabs (x) - (u.x - big);
> -  dx = (x > 0 ? dx : -dx);
> -  res = do_sin_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
> +  res = do_sin_slow (x, dx, 3.1e-30 * fabs (orig), &cor);
>  
>    if (res == res + cor)
>      return (x > 0) ? res : -res;
>  
> +  dx = (x > 0 ? dx : -dx);
>    __dubsin (fabs (x), dx, w);
>  
>    double eps = 1.1e-30 * fabs (orig);
> @@ -807,17 +796,14 @@ static double
>  SECTION
>  sloww2 (double x, double dx, double orig, int n)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> +  double w[2], cor, res;
>  
> -  u.x = big + fabs (x);
> -  y = fabs (x) - (u.x - big);
> -  dx = (x > 0 ? dx : -dx);
> -  res = do_cos_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
> +  res = do_cos_slow (x, dx, 3.1e-30 * fabs (orig), &cor);
>  
>    if (res == res + cor)
>      return (n & 2) ? -res : res;
>  
> +  dx = x > 0 ? dx : -dx;
>    __docos (fabs (x), dx, w);
>  
>    double eps = 1.1e-30 * fabs (orig);
> @@ -870,17 +856,13 @@ static double
>  SECTION
>  bsloww1 (double x, double dx, double orig, int n)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> +  double w[2], cor, res;
>  
> -  y = fabs (x);
> -  u.x = big + y;
> -  y = y - (u.x - big);
> -  dx = (x > 0) ? dx : -dx;
> -  res = do_sin_slow (u, y, dx, 1.1e-24, &cor);
> +  res = do_sin_slow (x, dx, 1.1e-24, &cor);
>    if (res == res + cor)
>      return (x > 0) ? res : -res;
>  
> +  dx = (x > 0) ? dx : -dx;
>    __dubsin (fabs (x), dx, w);
>  
>    cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
> @@ -902,17 +884,13 @@ static double
>  SECTION
>  bsloww2 (double x, double dx, double orig, int n)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> +  double w[2], cor, res;
>  
> -  y = fabs (x);
> -  u.x = big + y;
> -  y = y - (u.x - big);
> -  dx = (x > 0) ? dx : -dx;
> -  res = do_cos_slow (u, y, dx, 1.1e-24, &cor);
> +  res = do_cos_slow (x, dx, 1.1e-24, &cor);
>    if (res == res + cor)
>      return (n & 2) ? -res : res;
>  
> +  dx = (x > 0) ? dx : -dx;
>    __docos (fabs (x), dx, w);
>  
>    cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
> @@ -932,18 +910,13 @@ static double
>  SECTION
>  cslow2 (double x)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> +  double w[2], cor, res;
>  
> -  y = fabs (x);
> -  u.x = big + y;
> -  y = y - (u.x - big);
> -  res = do_cos_slow (u, y, 0, 0, &cor);
> +  res = do_cos_slow (x, 0, 0, &cor);
>    if (res == res + cor)
>      return res;
>  
> -  y = fabs (x);
> -  __docos (y, 0, w);
> +  __docos (fabs (x), 0, w);
>    if (w[0] == w[0] + 1.000000005 * w[1])
>      return w[0];
>  
> 


* [PING][PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469
  2016-08-23 18:23 ` [PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469 Siddhesh Poyarekar
@ 2016-08-30  3:12   ` Siddhesh Poyarekar
  2016-09-01 16:35   ` [PATCH " Joseph Myers
  1 sibling, 0 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-30  3:12 UTC (permalink / raw)
  To: libc-alpha

Ping!

On Tuesday 23 August 2016 11:52 PM, Siddhesh Poyarekar wrote:
> The old code looks slightly different from DO_SIN but, on closer
> examination, should give exactly the same result.  Drop it in favour
> of the DO_SIN function call.
> 
> 	* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Use DO_SIN.
> ---
>  sysdeps/ieee754/dbl-64/s_sin.c | 21 +++------------------
>  1 file changed, 3 insertions(+), 18 deletions(-)
> 
> diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
> index e03c75a..82f9345 100644
> --- a/sysdeps/ieee754/dbl-64/s_sin.c
> +++ b/sysdeps/ieee754/dbl-64/s_sin.c
> @@ -441,7 +441,7 @@ SECTION
>  #endif
>  __sin (double x)
>  {
> -  double xx, res, t, cor, y, s, c, sn, ssn, cs, ccs;
> +  double xx, res, t, cor;
>    mynumber u;
>    int4 k, m;
>    double retval = 0;
> @@ -471,23 +471,8 @@ __sin (double x)
>  /*---------------------------- 0.25<|x|< 0.855469---------------------- */
>    else if (k < 0x3feb6000)
>      {
> -      u.x = big + fabs (x);
> -      y = fabs (x) - (u.x - big);
> -      y = (x > 0 ? y : -y);
> -
> -      xx = y * y;
> -      s = y + y * xx * (sn3 + xx * sn5);
> -      c = xx * (cs2 + xx * (cs4 + xx * cs6));
> -      SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs);
> -      if (m <= 0)
> -        {
> -          sn = -sn;
> -	  ssn = -ssn;
> -	}
> -      cor = (ssn + s * ccs - sn * c) + cs * s;
> -      res = sn + cor;
> -      cor = (sn - res) + cor;
> -      retval = (res == res + 1.096 * cor) ? res : slow1 (x);
> +      res = do_sin (x, 0, &cor);
> +      retval = (res == res + 1.096 * cor) ? (m > 0 ? res : -res) : slow1 (x);
>      }				/*   else  if (k < 0x3feb6000)    */
>  
>  /*----------------------- 0.855469  <|x|<2.426265  ----------------------*/
> 


* [PING][PATCH 5/5] Inline all support functions for sin and cos
  2016-08-23 18:23 ` [PATCH 5/5] Inline all support functions for sin and cos Siddhesh Poyarekar
@ 2016-08-30  3:12   ` Siddhesh Poyarekar
  2016-08-30  7:53   ` [PATCH " Andreas Schwab
  2016-09-01 16:36   ` Joseph Myers
  2 siblings, 0 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-30  3:12 UTC (permalink / raw)
  To: libc-alpha

Ping!

On Tuesday 23 August 2016 11:52 PM, Siddhesh Poyarekar wrote:
> The support functions for sin and cos have a lot of identical
> functionality, so inlining them gives a pretty decent jump in
> performance: ~19% in the sincos function.  On SPEC2006 this
> translates to about 2.1% in the tonto test.
> 
> 	* sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Mark as inline.
> 	(do_cos_slow): Likewise.
> 	(do_sin): Likewise.
> 	(do_sin_slow): Likewise.
> 	(slow): Likewise.
> 	(slow1): Likewise.
> 	(slow2): Likewise.
> 	(sloww): Likewise.
> 	(sloww1): Likewise.
> 	(sloww2): Likewise.
> 	(bsloww): Likewise.
> 	(bsloww1): Likewise.
> 	(bsloww2): Likewise.
> 	(cslow2): Likewise.
> ---
>  sysdeps/ieee754/dbl-64/s_sin.c | 52 +++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 24 deletions(-)
> 
> diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
> index 82f9345..c20ef4d 100644
> --- a/sysdeps/ieee754/dbl-64/s_sin.c
> +++ b/sysdeps/ieee754/dbl-64/s_sin.c
> @@ -145,7 +145,8 @@ static double cslow2 (double x);
>     of the number by combining the sin and cos of X (as computed by a variation
>     of the Taylor series) with the values looked up from the sin/cos table to
>     get the result in RES and a correction value in COR.  */
> -static double
> +static inline double
> +__always_inline
>  do_cos (double x, double dx, double *corp)
>  {
>    mynumber u;
> @@ -170,7 +171,8 @@ do_cos (double x, double dx, double *corp)
>  
>  /* A more precise variant of DO_COS.  EPS is the adjustment to the correction
>     COR.  */
> -static double
> +static inline double
> +__always_inline
>  do_cos_slow (double x, double dx, double eps, double *corp)
>  {
>    mynumber u;
> @@ -205,7 +207,8 @@ do_cos_slow (double x, double dx, double eps, double *corp)
>     the number by combining the sin and cos of X (as computed by a variation of
>     the Taylor series) with the values looked up from the sin/cos table to get
>     the result in RES and a correction value in COR.  */
> -static double
> +static inline double
> +__always_inline
>  do_sin (double x, double dx, double *corp)
>  {
>    mynumber u;
> @@ -229,7 +232,8 @@ do_sin (double x, double dx, double *corp)
>  
>  /* A more precise variant of DO_SIN.  EPS is the adjustment to the correction
>     COR.  */
> -static double
> +static inline double
> +__always_inline
>  do_sin_slow (double x, double dx, double eps, double *corp)
>  {
>    mynumber u;
> @@ -615,8 +619,8 @@ __cos (double x)
>  /* precision  and if still doesn't accurate enough by mpsin   or dubsin */
>  /************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  slow (double x)
>  {
>    double res, cor, w[2];
> @@ -636,8 +640,8 @@ slow (double x)
>  /* and if result still doesn't accurate enough by mpsin   or dubsin            */
>  /*******************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  slow1 (double x)
>  {
>    double w[2], cor, res;
> @@ -657,8 +661,8 @@ slow1 (double x)
>  /*  Routine compute sin(x) for   0.855469  <|x|<2.426265  by  __sincostab.tbl  */
>  /* and if result still doesn't accurate enough by mpsin   or dubsin       */
>  /**************************************************************************/
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  slow2 (double x)
>  {
>    double w[2], y, y1, y2, cor, res;
> @@ -686,8 +690,8 @@ slow2 (double x)
>  /* result.And if result not accurate enough routine calls mpsin1 or dubsin */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  sloww (double x, double dx, double orig, int k)
>  {
>    double y, t, res, cor, w[2], a, da, xn;
> @@ -747,8 +751,8 @@ sloww (double x, double dx, double orig, int k)
>  /* accurate enough routine calls  mpsin1   or dubsin                       */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  sloww1 (double x, double dx, double orig, int k)
>  {
>    double w[2], cor, res;
> @@ -777,8 +781,8 @@ sloww1 (double x, double dx, double orig, int k)
>  /* accurate enough routine calls  mpsin1   or dubsin                       */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  sloww2 (double x, double dx, double orig, int n)
>  {
>    double w[2], cor, res;
> @@ -808,8 +812,8 @@ sloww2 (double x, double dx, double orig, int n)
>  /* result.And if result not accurate enough routine calls other routines    */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  bsloww (double x, double dx, double orig, int n)
>  {
>    double res, cor, w[2], a, da;
> @@ -837,8 +841,8 @@ bsloww (double x, double dx, double orig, int n)
>  /* And if result not  accurate enough routine calls  other routines         */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  bsloww1 (double x, double dx, double orig, int n)
>  {
>    double w[2], cor, res;
> @@ -865,8 +869,8 @@ bsloww1 (double x, double dx, double orig, int n)
>  /* And if result not accurate enough routine calls  other routines          */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  bsloww2 (double x, double dx, double orig, int n)
>  {
>    double w[2], cor, res;
> @@ -891,8 +895,8 @@ bsloww2 (double x, double dx, double orig, int n)
>  /* precision  and if still doesn't accurate enough by mpcos   or docos  */
>  /************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  cslow2 (double x)
>  {
>    double w[2], cor, res;
> 


* Re: [PATCH 5/5] Inline all support functions for sin and cos
  2016-08-23 18:23 ` [PATCH 5/5] Inline all support functions for sin and cos Siddhesh Poyarekar
  2016-08-30  3:12   ` [PING][PATCH " Siddhesh Poyarekar
@ 2016-08-30  7:53   ` Andreas Schwab
  2016-08-30  7:59     ` Ramana Radhakrishnan
  2016-08-30  8:48     ` Siddhesh Poyarekar
  2016-09-01 16:36   ` Joseph Myers
  2 siblings, 2 replies; 22+ messages in thread
From: Andreas Schwab @ 2016-08-30  7:53 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

On Aug 23 2016, Siddhesh Poyarekar <siddhesh@sourceware.org> wrote:

> The support functions for sin and cos have a lot of identical
> functionality, so inlining them gives a pretty decent jump in
> functionality: ~19% in the sincos function.  On SPEC2006 this

What is the metric of functionality?

> translates to about 2.1% in the tonto test.

What does "tonto test" mean?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: [PATCH 5/5] Inline all support functions for sin and cos
  2016-08-30  7:53   ` [PATCH " Andreas Schwab
@ 2016-08-30  7:59     ` Ramana Radhakrishnan
  2016-08-30  8:48     ` Siddhesh Poyarekar
  1 sibling, 0 replies; 22+ messages in thread
From: Ramana Radhakrishnan @ 2016-08-30  7:59 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Siddhesh Poyarekar, GNU C Library

On Tue, Aug 30, 2016 at 8:52 AM, Andreas Schwab <schwab@suse.de> wrote:
> On Aug 23 2016, Siddhesh Poyarekar <siddhesh@sourceware.org> wrote:
>
>> The support functions for sin and cos have a lot of identical
>> functionality, so inlining them gives a pretty decent jump in
>> functionality: ~19% in the sincos function.  On SPEC2006 this
>
> What is the metric of functionality?
>
>> translates to about 2.1% in the tonto test.
>
> What does "tonto test" mean?

https://www.spec.org/cpu2006/Docs/465.tonto.html



>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


* Re: [PATCH 5/5] Inline all support functions for sin and cos
  2016-08-30  7:53   ` [PATCH " Andreas Schwab
  2016-08-30  7:59     ` Ramana Radhakrishnan
@ 2016-08-30  8:48     ` Siddhesh Poyarekar
  1 sibling, 0 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-30  8:48 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: libc-alpha



On Tuesday 30 August 2016 01:22 PM, Andreas Schwab wrote:
>> The support functions for sin and cos have a lot of identical
>> functionality, so inlining them gives a pretty decent jump in
>> functionality: ~19% in the sincos function.  On SPEC2006 this
> What is the metric of functionality?

Sorry, that was a typo: it should read "a pretty decent jump in
performance", i.e. ~19% in the sincos function microbenchmark in benchtests.

>> translates to about 2.1% in the tonto test.
> What does "tonto test" mean?

The tonto test is part of the SPEC CPU2006 benchmark suite, and it spends
a little under half of its execution time in sincos and its child functions.

Siddhesh


* Re: [PATCH 1/5] Consolidate reduce_and_compute code
  2016-08-29 16:03   ` Joseph Myers
@ 2016-08-30  9:26     ` Siddhesh Poyarekar
  0 siblings, 0 replies; 22+ messages in thread
From: Siddhesh Poyarekar @ 2016-08-30  9:26 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha



On Monday 29 August 2016 09:33 PM, Joseph Myers wrote:
> OK with a comment on this fallthrough (we might want to use the 
> -Wimplicit-fallthrough being proposed for GCC 7...).
>

I pushed a separate commit with the fallthrough comment for both switch
blocks in s_sin.c.

Siddhesh


* Re: [PATCH 3/5] Consolidate input partitioning into do_cos and do_sin
  2016-08-23 18:23 ` [PATCH 3/5] Consolidate input partitioning into do_cos and do_sin Siddhesh Poyarekar
  2016-08-30  3:12   ` [PING][PATCH " Siddhesh Poyarekar
@ 2016-09-01 16:23   ` Joseph Myers
  1 sibling, 0 replies; 22+ messages in thread
From: Joseph Myers @ 2016-09-01 16:23 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

On Tue, 23 Aug 2016, Siddhesh Poyarekar wrote:

> All calls to do_cos are preceded by code that partitions x into a
> larger double that gives an offset into the sincos table and a smaller
> double that is used in a polynomial computation.  Consolidate all of
> them into do_cos and do_sin to reduce code duplication.
> 
> 	* sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Accept X and DX as input
> 	arguments.  Consolidate input partitioning from callers here.
> 	(do_cos_slow): Likewise.
> 	(do_sin): Likewise.
> 	(do_sin_slow): Likewise.
> 	(do_sincos_1): Remove the no longer necessary input partitioning.
> 	(do_sincos_2): Likewise.
> 	(__sin): Likewise.
> 	(__cos): Likewise.
> 	(slow1): Likewise.
> 	(slow2): Likewise.
> 	(sloww1): Likewise.
> 	(sloww2): Likewise.
> 	(bsloww1): Likewise.
> 	(bsloww2): Likewise.
> 	(cslow2): Likewise.

OK.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469
  2016-08-23 18:23 ` [PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469 Siddhesh Poyarekar
  2016-08-30  3:12   ` [PING][PATCH " Siddhesh Poyarekar
@ 2016-09-01 16:35   ` Joseph Myers
  1 sibling, 0 replies; 22+ messages in thread
From: Joseph Myers @ 2016-09-01 16:35 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

On Tue, 23 Aug 2016, Siddhesh Poyarekar wrote:

> The only code looks slightly different from DO_SIN but on closer
> examination, should give exactly the same result.  Drop it in favour
> of the DO_SIN function call.
> 
> 	* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Use DO_SIN.

OK, but it's do_sin not DO_SIN; uppercasing only applies when referring to 
the value of a variable.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 5/5] Inline all support functions for sin and cos
  2016-08-23 18:23 ` [PATCH 5/5] Inline all support functions for sin and cos Siddhesh Poyarekar
  2016-08-30  3:12   ` [PING][PATCH " Siddhesh Poyarekar
  2016-08-30  7:53   ` [PATCH " Andreas Schwab
@ 2016-09-01 16:36   ` Joseph Myers
  2 siblings, 0 replies; 22+ messages in thread
From: Joseph Myers @ 2016-09-01 16:36 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

On Tue, 23 Aug 2016, Siddhesh Poyarekar wrote:

> The support functions for sin and cos have a lot of identical
> functionality, so inlining them gives a pretty decent jump in
> functionality: ~19% in the sincos function.  On SPEC2006 this
> translates to about 2.1% in the tonto test.
> 
> 	* sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Mark as inline.
> 	(do_cos_slow): Likewise.
> 	(do_sin): Likewise.
> 	(do_sin_slow): Likewise.
> 	(slow): Likewise.
> 	(slow1): Likewise.
> 	(slow2): Likewise.
> 	(sloww): Likewise.
> 	(sloww1): Likewise.
> 	(sloww2): Likewise.
> 	(bsloww): Likewise.
> 	(bsloww1): Likewise.
> 	(bsloww2): Likewise.
> 	(cslow2): Likewise.

OK.

-- 
Joseph S. Myers
joseph@codesourcery.com


Thread overview: 22+ messages
2016-08-23 18:23 [PATCH 0/5] sin/cos/sincos cleanups Siddhesh Poyarekar
2016-08-23 18:23 ` [PATCH 1/5] Consolidate reduce_and_compute code Siddhesh Poyarekar
2016-08-24  1:52   ` Adhemerval Zanella
2016-08-29 16:03   ` Joseph Myers
2016-08-30  9:26     ` Siddhesh Poyarekar
2016-08-23 18:23 ` [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos Siddhesh Poyarekar
2016-08-23 20:53   ` Manfred
2016-08-23 23:05     ` Joseph Myers
2016-08-24  2:50     ` Siddhesh Poyarekar
2016-08-29 16:19   ` Joseph Myers
2016-08-23 18:23 ` [PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469 Siddhesh Poyarekar
2016-08-30  3:12   ` [PING][PATCH " Siddhesh Poyarekar
2016-09-01 16:35   ` [PATCH " Joseph Myers
2016-08-23 18:23 ` [PATCH 5/5] Inline all support functions for sin and cos Siddhesh Poyarekar
2016-08-30  3:12   ` [PING][PATCH " Siddhesh Poyarekar
2016-08-30  7:53   ` [PATCH " Andreas Schwab
2016-08-30  7:59     ` Ramana Radhakrishnan
2016-08-30  8:48     ` Siddhesh Poyarekar
2016-09-01 16:36   ` Joseph Myers
2016-08-23 18:23 ` [PATCH 3/5] Consolidate input partitioning into do_cos and do_sin Siddhesh Poyarekar
2016-08-30  3:12   ` [PING][PATCH " Siddhesh Poyarekar
2016-09-01 16:23   ` [PATCH " Joseph Myers
