[PATCH 0/3] Updates to the new math code

public inbox for newlib@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 0/3] Updates to the new math code
@ 2018-07-04 15:52 Szabolcs Nagy
  2018-07-04 15:53 ` [PATCH 1/3] Fix code style and comments of " Szabolcs Nagy
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Szabolcs Nagy @ 2018-07-04 15:52 UTC (permalink / raw)
  To: newlib; +Cc: nd

There are some modifications and bug fixes in the Arm Optimized
Routines repo and i'd like to sync newlib with it.

Szabolcs Nagy (3):
   Fix code style and comments of new math code
   Change the return type of converttoint and document the semantics
   Fix large ulp error in pow without fma very near 1.0

  newlib/libm/common/exp.c         | 22 +++++++++++++------
  newlib/libm/common/exp2.c        | 18 +++++++++++-----
  newlib/libm/common/log.c         | 46 ++++++++++++++++++++++------------------
  newlib/libm/common/log2.c        | 31 ++++++++++++++-------------
  newlib/libm/common/math_config.h | 44 ++++++++++++++++++++++++++++++++------
  newlib/libm/common/pow.c         | 35 ++++++++++++++++++++++--------
  newlib/libm/common/sincosf.c     | 16 +++++++-------
  newlib/libm/common/sinf.c        | 12 +++++------
  8 files changed, 147 insertions(+), 77 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] Fix code style and comments of new math code
  2018-07-04 15:52 [PATCH 0/3] Updates to the new math code Szabolcs Nagy
@ 2018-07-04 15:53 ` Szabolcs Nagy
  2018-07-04 15:54 ` [PATCH 2/3] Change the return type of converttoint and document the, semantics Szabolcs Nagy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Szabolcs Nagy @ 2018-07-04 15:53 UTC (permalink / raw)
  To: newlib; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: 0001-Fix-code-style-and-comments-of-new-math-code.patch --]
[-- Type: text/x-patch, Size: 20044 bytes --]

From 02f0bb54ac75cb780c09968a2e78d3df060ac932 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Tue, 3 Jul 2018 12:54:36 +0100
Subject: [PATCH 1/3] Fix code style and comments of new math code

Synchronize code style and comments with Arm Optimized Routines, there
are no code changes in this patch.  This ensures different projects using
the same code have consistent code style so bug fix patches can be applied
more easily.
---
 newlib/libm/common/exp.c         | 22 +++++++++++++------
 newlib/libm/common/exp2.c        | 18 +++++++++++-----
 newlib/libm/common/log.c         | 46 ++++++++++++++++++++++------------------
 newlib/libm/common/log2.c        | 31 ++++++++++++++-------------
 newlib/libm/common/math_config.h | 34 ++++++++++++++++++++++++-----
 newlib/libm/common/pow.c         | 29 +++++++++++++++++++------
 newlib/libm/common/sincosf.c     | 16 +++++++-------
 newlib/libm/common/sinf.c        | 12 +++++------
 8 files changed, 134 insertions(+), 74 deletions(-)

diff --git a/newlib/libm/common/exp.c b/newlib/libm/common/exp.c
index f295b6173..157473de8 100644
--- a/newlib/libm/common/exp.c
+++ b/newlib/libm/common/exp.c
@@ -45,6 +45,13 @@
 #define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
 #define C6 __exp_data.poly[9 - EXP_POLY_ORDER]
 
+/* Handle cases that may overflow or underflow when computing the result that
+   is scale*(1+TMP) without intermediate rounding.  The bit representation of
+   scale is in SBITS, however it has a computed exponent that may have
+   overflown into the sign bit so that needs to be adjusted before using it as
+   a double.  (int32_t)KI is the k used in the argument reduction and exponent
+   adjustment of scale, positive k here means the result may overflow and
+   negative k means the result may underflow.  */
 static inline double
 specialcase (double_t tmp, uint64_t sbits, uint64_t ki)
 {
@@ -83,6 +90,7 @@ specialcase (double_t tmp, uint64_t sbits, uint64_t ki)
   return check_uflow (y);
 }
 
+/* Top 12 bits of a double (sign and exponent bits).  */
 static inline uint32_t
 top12 (double x)
 {
@@ -136,29 +144,29 @@ exp (double x)
   ki = asuint64 (kd);
   kd -= Shift;
 #endif
-  r = x + kd*NegLn2hiN + kd*NegLn2loN;
+  r = x + kd * NegLn2hiN + kd * NegLn2loN;
   /* 2^(k/N) ~= scale * (1 + tail).  */
-  idx = 2*(ki % N);
+  idx = 2 * (ki % N);
   top = ki << (52 - EXP_TABLE_BITS);
   tail = asdouble (T[idx]);
   /* This is only a valid scale when -1023*N < k < 1024*N.  */
   sbits = T[idx + 1] + top;
   /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (tail + exp(r) - 1).  */
   /* Evaluation is optimized assuming superscalar pipelined execution.  */
-  r2 = r*r;
+  r2 = r * r;
   /* Without fma the worst case error is 0.25/N ulp larger.  */
   /* Worst case error is less than 0.5+1.11/N+(abs poly error * 2^53) ulp.  */
 #if EXP_POLY_ORDER == 4
-  tmp = tail + r + r2*C2 + r*r2*(C3 + r*C4);
+  tmp = tail + r + r2 * C2 + r * r2 * (C3 + r * C4);
 #elif EXP_POLY_ORDER == 5
-  tmp = tail + r + r2*(C2 + r*C3) + r2*r2*(C4 + r*C5);
+  tmp = tail + r + r2 * (C2 + r * C3) + r2 * r2 * (C4 + r * C5);
 #elif EXP_POLY_ORDER == 6
-  tmp = tail + r + r2*(0.5 + r*C3) + r2*r2*(C4 + r*C5 + r2*C6);
+  tmp = tail + r + r2 * (0.5 + r * C3) + r2 * r2 * (C4 + r * C5 + r2 * C6);
 #endif
   if (unlikely (abstop == 0))
     return specialcase (tmp, sbits, ki);
   scale = asdouble (sbits);
-  /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there
+  /* Note: tmp == 0 or |tmp| > 2^-65 and scale > 2^-739, so there
      is no spurious underflow here even without fma.  */
   return scale + scale * tmp;
 }
diff --git a/newlib/libm/common/exp2.c b/newlib/libm/common/exp2.c
index 6579e16fd..d4883d8f6 100644
--- a/newlib/libm/common/exp2.c
+++ b/newlib/libm/common/exp2.c
@@ -43,6 +43,13 @@
 #define C5 __exp_data.exp2_poly[4]
 #define C6 __exp_data.exp2_poly[5]
 
+/* Handle cases that may overflow or underflow when computing the result that
+   is scale*(1+TMP) without intermediate rounding.  The bit representation of
+   scale is in SBITS, however it has a computed exponent that may have
+   overflown into the sign bit so that needs to be adjusted before using it as
+   a double.  (int32_t)KI is the k used in the argument reduction and exponent
+   adjustment of scale, positive k here means the result may overflow and
+   negative k means the result may underflow.  */
 static inline double
 specialcase (double_t tmp, uint64_t sbits, uint64_t ki)
 {
@@ -81,6 +88,7 @@ specialcase (double_t tmp, uint64_t sbits, uint64_t ki)
   return check_uflow (y);
 }
 
+/* Top 12 bits of a double (sign and exponent bits).  */
 static inline uint32_t
 top12 (double x)
 {
@@ -125,22 +133,22 @@ exp2 (double x)
   kd -= Shift; /* k/N for int k.  */
   r = x - kd;
   /* 2^(k/N) ~= scale * (1 + tail).  */
-  idx = 2*(ki % N);
+  idx = 2 * (ki % N);
   top = ki << (52 - EXP_TABLE_BITS);
   tail = asdouble (T[idx]);
   /* This is only a valid scale when -1023*N < k < 1024*N.  */
   sbits = T[idx + 1] + top;
   /* exp2(x) = 2^(k/N) * 2^r ~= scale + scale * (tail + 2^r - 1).  */
   /* Evaluation is optimized assuming superscalar pipelined execution.  */
-  r2 = r*r;
+  r2 = r * r;
   /* Without fma the worst case error is 0.5/N ulp larger.  */
   /* Worst case error is less than 0.5+0.86/N+(abs poly error * 2^53) ulp.  */
 #if EXP2_POLY_ORDER == 4
-  tmp = tail + r*C1 + r2*C2 + r*r2*(C3 + r*C4);
+  tmp = tail + r * C1 + r2 * C2 + r * r2 * (C3 + r * C4);
 #elif EXP2_POLY_ORDER == 5
-  tmp = tail + r*C1 + r2*(C2 + r*C3) + r2*r2*(C4 + r*C5);
+  tmp = tail + r * C1 + r2 * (C2 + r * C3) + r2 * r2 * (C4 + r * C5);
 #elif EXP2_POLY_ORDER == 6
-  tmp = tail + r*C1 + r2*(0.5 + r*C3) + r2*r2*(C4 + r*C5 + r2*C6);
+  tmp = tail + r * C1 + r2 * (0.5 + r * C3) + r2 * r2 * (C4 + r * C5 + r2 * C6);
 #endif
   if (unlikely (abstop == 0))
     return specialcase (tmp, sbits, ki);
diff --git a/newlib/libm/common/log.c b/newlib/libm/common/log.c
index 32752f55c..2e350749f 100644
--- a/newlib/libm/common/log.c
+++ b/newlib/libm/common/log.c
@@ -42,6 +42,7 @@
 #define N (1 << LOG_TABLE_BITS)
 #define OFF 0x3fe6000000000000
 
+/* Top 16 bits of a double.  */
 static inline uint32_t
 top16 (double x)
 {
@@ -74,44 +75,44 @@ log (double x)
       if (WANT_ROUNDING && unlikely (ix == asuint64 (1.0)))
 	return 0;
       r = x - 1.0;
-      r2 = r*r;
-      r3 = r*r2;
+      r2 = r * r;
+      r3 = r * r2;
 #if LOG_POLY1_ORDER == 10
       /* Worst-case error is around 0.516 ULP.  */
-      y = r3*(B[1] + r*B[2] + r2*B[3]
-	      + r3*(B[4] + r*B[5] + r2*B[6] + r3*(B[7] + r*B[8])));
-      w = B[0]*r2; /* B[0] == -0.5.  */
+      y = r3 * (B[1] + r * B[2] + r2 * B[3]
+		+ r3 * (B[4] + r * B[5] + r2 * B[6] + r3 * (B[7] + r * B[8])));
+      w = B[0] * r2; /* B[0] == -0.5.  */
       hi = r + w;
       y += r - hi + w;
       y += hi;
 #elif LOG_POLY1_ORDER == 11
       /* Worst-case error is around 0.516 ULP.  */
-      y = r3*(B[1] + r*B[2]
-	      + r2*(B[3] + r*B[4] + r2*B[5]
-		    + r3*(B[6] + r*B[7] + r2*B[8] + r3*B[9])));
-      w = B[0]*r2; /* B[0] == -0.5.  */
+      y = r3 * (B[1] + r * B[2]
+		+ r2 * (B[3] + r * B[4] + r2 * B[5]
+			+ r3 * (B[6] + r * B[7] + r2 * B[8] + r3 * B[9])));
+      w = B[0] * r2; /* B[0] == -0.5.  */
       hi = r + w;
       y += r - hi + w;
       y += hi;
 #elif LOG_POLY1_ORDER == 12
-      y = r3*(B[1] + r*B[2] + r2*B[3]
-	      + r3*(B[4] + r*B[5] + r2*B[6]
-		    + r3*(B[7] + r*B[8] + r2*B[9] + r3*B[10])));
+      y = r3 * (B[1] + r * B[2] + r2 * B[3]
+		+ r3 * (B[4] + r * B[5] + r2 * B[6]
+			+ r3 * (B[7] + r * B[8] + r2 * B[9] + r3 * B[10])));
 # if N <= 64
       /* Worst-case error is around 0.532 ULP.  */
-      w = B[0]*r2; /* B[0] == -0.5.  */
+      w = B[0] * r2; /* B[0] == -0.5.  */
       hi = r + w;
       y += r - hi + w;
       y += hi;
 # else
       /* Worst-case error is around 0.507 ULP.  */
-      w = r*0x1p27;
+      w = r * 0x1p27;
       double_t rhi = r + w - w;
       double_t rlo = r - rhi;
-      w = rhi*rhi*B[0]; /* B[0] == -0.5.  */
+      w = rhi * rhi * B[0]; /* B[0] == -0.5.  */
       hi = r + w;
       lo = r - hi + w;
-      lo += B[0]*rlo*(rhi + r);
+      lo += B[0] * rlo * (rhi + r);
       y += lo;
       y += hi;
 # endif
@@ -150,14 +151,14 @@ log (double x)
   r = fma (z, invc, -1.0);
 #else
   /* rounding error: 0x1p-55/N + 0x1p-66.  */
-  r = (z - T2[i].chi - T2[i].clo)*invc;
+  r = (z - T2[i].chi - T2[i].clo) * invc;
 #endif
   kd = (double_t) k;
 
   /* hi + lo = r + log(c) + k*Ln2.  */
-  w = kd*Ln2hi + logc;
+  w = kd * Ln2hi + logc;
   hi = w + r;
-  lo = w - hi + r + kd*Ln2lo;
+  lo = w - hi + r + kd * Ln2lo;
 
   /* log(x) = lo + (log1p(r) - r) + hi.  */
   r2 = r * r; /* rounding error: 0x1p-54/N^2.  */
@@ -166,9 +167,12 @@ log (double x)
      Worst case error if |y| > 0x1p-4:
      0.5 + 2.06/N + abs-poly-error*2^56 ULP (+ 0.001 ULP without fma).  */
 #if LOG_POLY_ORDER == 6
-  y = lo + r2*A[0] + r*r2*(A[1] + r*A[2] + r2*(A[3] + r*A[4])) + hi;
+  y = lo + r2 * A[0] + r * r2 * (A[1] + r * A[2] + r2 * (A[3] + r * A[4])) + hi;
 #elif LOG_POLY_ORDER == 7
-  y = lo + r2*(A[0] + r*A[1] + r2*(A[2] + r*A[3]) + r2*r2*(A[4] + r*A[5])) + hi;
+  y = lo
+      + r2 * (A[0] + r * A[1] + r2 * (A[2] + r * A[3])
+	      + r2 * r2 * (A[4] + r * A[5]))
+      + hi;
 #endif
   return y;
 }
diff --git a/newlib/libm/common/log2.c b/newlib/libm/common/log2.c
index 64617de6c..a2da93e74 100644
--- a/newlib/libm/common/log2.c
+++ b/newlib/libm/common/log2.c
@@ -42,6 +42,7 @@
 #define N (1 << LOG2_TABLE_BITS)
 #define OFF 0x3fe6000000000000
 
+/* Top 16 bits of a double.  */
 static inline uint32_t
 top16 (double x)
 {
@@ -72,24 +73,24 @@ double
 	return 0;
       r = x - 1.0;
 #if __HAVE_FAST_FMA
-      hi = r*InvLn2hi;
-      lo = r*InvLn2lo + fma (r, InvLn2hi, -hi);
+      hi = r * InvLn2hi;
+      lo = r * InvLn2lo + fma (r, InvLn2hi, -hi);
 #else
       double_t rhi, rlo;
-      rhi = asdouble (asuint64 (r) & -1ULL<<32);
+      rhi = asdouble (asuint64 (r) & -1ULL << 32);
       rlo = r - rhi;
-      hi = rhi*InvLn2hi;
-      lo = rlo*InvLn2hi + r*InvLn2lo;
+      hi = rhi * InvLn2hi;
+      lo = rlo * InvLn2hi + r * InvLn2lo;
 #endif
       r2 = r * r; /* rounding error: 0x1p-62.  */
       r4 = r2 * r2;
 #if LOG2_POLY1_ORDER == 11
       /* Worst-case error is less than 0.54 ULP (0.55 ULP without fma).  */
-      p = r2*(B[0] + r*B[1]);
+      p = r2 * (B[0] + r * B[1]);
       y = hi + p;
       lo += hi - y + p;
-      lo += r4*(B[2] + r*B[3] + r2*(B[4] + r*B[5])
-		+ r4*(B[6] + r*B[7] + r2*(B[8] + r*B[9])));
+      lo += r4 * (B[2] + r * B[3] + r2 * (B[4] + r * B[5])
+		  + r4 * (B[6] + r * B[7] + r2 * (B[8] + r * B[9])));
       y += lo;
 #endif
       return y;
@@ -125,16 +126,16 @@ double
 #if __HAVE_FAST_FMA
   /* rounding error: 0x1p-55/N.  */
   r = fma (z, invc, -1.0);
-  t1 = r*InvLn2hi;
-  t2 = r*InvLn2lo + fma (r, InvLn2hi, -t1);
+  t1 = r * InvLn2hi;
+  t2 = r * InvLn2lo + fma (r, InvLn2hi, -t1);
 #else
   double_t rhi, rlo;
   /* rounding error: 0x1p-55/N + 0x1p-65.  */
-  r = (z - T2[i].chi - T2[i].clo)*invc;
+  r = (z - T2[i].chi - T2[i].clo) * invc;
   rhi = asdouble (asuint64 (r) & -1ULL << 32);
   rlo = r - rhi;
-  t1 = rhi*InvLn2hi;
-  t2 = rlo*InvLn2hi + r*InvLn2lo;
+  t1 = rhi * InvLn2hi;
+  t2 = rlo * InvLn2hi + r * InvLn2lo;
 #endif
 
   /* hi + lo = r/ln2 + log2(c) + k.  */
@@ -149,8 +150,8 @@ double
 #if LOG2_POLY_ORDER == 7
   /* Worst-case error if |y| > 0x1p-4: 0.547 ULP (0.550 ULP without fma).
      ~ 0.5 + 2/N/ln2 + abs-poly-error*0x1p56 ULP (+ 0.003 ULP without fma).  */
-  p = A[0] + r*A[1] + r2*(A[2] + r*A[3]) + r4*(A[4] + r*A[5]);
-  y = lo + r2*p + hi;
+  p = A[0] + r * A[1] + r2 * (A[2] + r * A[3]) + r4 * (A[4] + r * A[5]);
+  y = lo + r2 * p + hi;
 #endif
   return y;
 }
diff --git a/newlib/libm/common/math_config.h b/newlib/libm/common/math_config.h
index 3a8cebbe3..aec9cd0d6 100644
--- a/newlib/libm/common/math_config.h
+++ b/newlib/libm/common/math_config.h
@@ -233,28 +233,48 @@ eval_as_double (double x)
 # define unlikely(x) (x)
 #endif
 
-/* Error handling tail calls for special cases, with sign argument.  */
+/* Error handling tail calls for special cases, with a sign argument.
+   The sign of the return value is set if the argument is non-zero.  */
+
+/* The result overflows.  */
 HIDDEN float __math_oflowf (uint32_t);
+/* The result underflows to 0 in nearest rounding mode.  */
 HIDDEN float __math_uflowf (uint32_t);
+/* The result underflows to 0 in some directed rounding mode only.  */
 HIDDEN float __math_may_uflowf (uint32_t);
+/* Division by zero.  */
 HIDDEN float __math_divzerof (uint32_t);
+/* The result overflows.  */
 HIDDEN double __math_oflow (uint32_t);
+/* The result underflows to 0 in nearest rounding mode.  */
 HIDDEN double __math_uflow (uint32_t);
+/* The result underflows to 0 in some directed rounding mode only.  */
 HIDDEN double __math_may_uflow (uint32_t);
+/* Division by zero.  */
 HIDDEN double __math_divzero (uint32_t);
+
 /* Error handling using input checking.  */
+
+/* Invalid input unless it is a quiet NaN.  */
 HIDDEN float __math_invalidf (float);
+/* Invalid input unless it is a quiet NaN.  */
 HIDDEN double __math_invalid (double);
+
 /* Error handling using output checking, only for errno setting.  */
+
+/* Check if the result overflowed to infinity.  */
 HIDDEN double __math_check_oflow (double);
+/* Check if the result underflowed to 0.  */
 HIDDEN double __math_check_uflow (double);
 
+/* Check if the result overflowed to infinity.  */
 static inline double
 check_oflow (double x)
 {
   return WANT_ERRNO ? __math_check_oflow (x) : x;
 }
 
+/* Check if the result underflowed to 0.  */
 static inline double
 check_uflow (double x)
 {
@@ -324,7 +344,8 @@ extern const struct powf_log2_data
 #define EXP_USE_TOINT_NARROW 0
 #define EXP2_POLY_ORDER 5
 #define EXP2_POLY_WIDE 0
-extern const struct exp_data {
+extern const struct exp_data
+{
   double invln2N;
   double shift;
   double negln2hiN;
@@ -338,7 +359,8 @@ extern const struct exp_data {
 #define LOG_TABLE_BITS 7
 #define LOG_POLY_ORDER 6
 #define LOG_POLY1_ORDER 12
-extern const struct log_data {
+extern const struct log_data
+{
   double ln2hi;
   double ln2lo;
   double poly[LOG_POLY_ORDER - 1]; /* First coefficient is 1.  */
@@ -352,7 +374,8 @@ extern const struct log_data {
 #define LOG2_TABLE_BITS 6
 #define LOG2_POLY_ORDER 7
 #define LOG2_POLY1_ORDER 11
-extern const struct log2_data {
+extern const struct log2_data
+{
   double invln2hi;
   double invln2lo;
   double poly[LOG2_POLY_ORDER - 1];
@@ -365,7 +388,8 @@ extern const struct log2_data {
 
 #define POW_LOG_TABLE_BITS 7
 #define POW_LOG_POLY_ORDER 8
-extern const struct pow_log_data {
+extern const struct pow_log_data
+{
   double ln2hi;
   double ln2lo;
   double poly[POW_LOG_POLY_ORDER - 1]; /* First coefficient is 1.  */
diff --git a/newlib/libm/common/pow.c b/newlib/libm/common/pow.c
index da7081cb4..7d8060751 100644
--- a/newlib/libm/common/pow.c
+++ b/newlib/libm/common/pow.c
@@ -46,12 +46,16 @@ ulperr_exp: 0.509 ULP (ULP error of exp, 0.511 ULP without fma)
 #define N (1 << POW_LOG_TABLE_BITS)
 #define OFF 0x3fe6955500000000
 
+/* Top 12 bits of a double (sign and exponent bits).  */
 static inline uint32_t
 top12 (double x)
 {
   return asuint64 (x) >> 52;
 }
 
+/* Compute y+tail = log(x) where the rounded result is y and tail has about
+   additional 15 bits precision.  The bit representation of x if in ix, but
+   normalized in the subnormal range using sign bit too for the exponent.  */
 static inline double_t
 log_inline (uint64_t ix, double_t *tail)
 {
@@ -111,7 +115,8 @@ log_inline (uint64_t ix, double_t *tail)
 #endif
   /* p = log1p(r) - r - A[0]*r*r.  */
 #if POW_LOG_POLY_ORDER == 8
-  p = ar3*(A[1] + r*A[2] + ar2*(A[3] + r*A[4] + ar2*(A[5] + r*A[6])));
+  p = (ar3
+       * (A[1] + r * A[2] + ar2 * (A[3] + r * A[4] + ar2 * (A[5] + r * A[6]))));
 #endif
   lo = lo1 + lo2 + lo3 + lo4 + p;
   y = hi + lo;
@@ -133,6 +138,13 @@ log_inline (uint64_t ix, double_t *tail)
 #define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
 #define C6 __exp_data.poly[9 - EXP_POLY_ORDER]
 
+/* Handle cases that may overflow or underflow when computing the result that
+   is scale*(1+TMP) without intermediate rounding.  The bit representation of
+   scale is in SBITS, however it has a computed exponent that may have
+   overflown into the sign bit so that needs to be adjusted before using it as
+   a double.  (int32_t)KI is the k used in the argument reduction and exponent
+   adjustment of scale, positive k here means the result may overflow and
+   negative k means the result may underflow.  */
 static inline double
 specialcase (double_t tmp, uint64_t sbits, uint64_t ki)
 {
@@ -176,6 +188,8 @@ specialcase (double_t tmp, uint64_t sbits, uint64_t ki)
 
 #define SIGN_BIAS (0x800 << EXP_TABLE_BITS)
 
+/* Computes sign*exp(x+xtail) where |xtail| < 2^-8/N and |xtail| <= |x|.
+   The sign_bias argument is SIGN_BIAS or 0 and sets the sign to -1 or 1.  */
 static inline double
 exp_inline (double x, double xtail, uint32_t sign_bias)
 {
@@ -223,26 +237,26 @@ exp_inline (double x, double xtail, uint32_t sign_bias)
   ki = asuint64 (kd);
   kd -= Shift;
 #endif
-  r = x + kd*NegLn2hiN + kd*NegLn2loN;
+  r = x + kd * NegLn2hiN + kd * NegLn2loN;
   /* The code assumes 2^-200 < |xtail| < 2^-8/N.  */
   r += xtail;
   /* 2^(k/N) ~= scale * (1 + tail).  */
-  idx = 2*(ki % N);
+  idx = 2 * (ki % N);
   top = (ki + sign_bias) << (52 - EXP_TABLE_BITS);
   tail = asdouble (T[idx]);
   /* This is only a valid scale when -1023*N < k < 1024*N.  */
   sbits = T[idx + 1] + top;
   /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (tail + exp(r) - 1).  */
   /* Evaluation is optimized assuming superscalar pipelined execution.  */
-  r2 = r*r;
+  r2 = r * r;
   /* Without fma the worst case error is 0.25/N ulp larger.  */
   /* Worst case error is less than 0.5+1.11/N+(abs poly error * 2^53) ulp.  */
 #if EXP_POLY_ORDER == 4
-  tmp = tail + r + r2*C2 + r*r2*(C3 + r*C4);
+  tmp = tail + r + r2 * C2 + r * r2 * (C3 + r * C4);
 #elif EXP_POLY_ORDER == 5
-  tmp = tail + r + r2*(C2 + r*C3) + r2*r2*(C4 + r*C5);
+  tmp = tail + r + r2 * (C2 + r * C3) + r2 * r2 * (C4 + r * C5);
 #elif EXP_POLY_ORDER == 6
-  tmp = tail + r + r2*(0.5 + r*C3) + r2*r2*(C4 + r*C5 + r2*C6);
+  tmp = tail + r + r2 * (0.5 + r * C3) + r2 * r2 * (C4 + r * C5 + r2 * C6);
 #endif
   if (unlikely (abstop == 0))
     return specialcase (tmp, sbits, ki);
@@ -268,6 +282,7 @@ checkint (uint64_t iy)
   return 2;
 }
 
+/* Returns 1 if input is the bit representation of 0, infinity or nan.  */
 static inline int
 zeroinfnan (uint64_t i)
 {
diff --git a/newlib/libm/common/sincosf.c b/newlib/libm/common/sincosf.c
index 85f12642f..2ead353e8 100644
--- a/newlib/libm/common/sincosf.c
+++ b/newlib/libm/common/sincosf.c
@@ -50,14 +50,14 @@ sincosf (float y, float *sinp, float *cosp)
       double x2 = x * x;
 
       if (unlikely (abstop12 (y) < abstop12 (0x1p-12f)))
-      {
-	if (unlikely (abstop12 (y) < abstop12 (0x1p-126f)))
-	  /* Force underflow for tiny y.  */
-	  force_eval_float (x2);
-	*sinp = y;
-	*cosp = 1.0f;
-	return;
-      }
+	{
+	  if (unlikely (abstop12 (y) < abstop12 (0x1p-126f)))
+	    /* Force underflow for tiny y.  */
+	    force_eval_float (x2);
+	  *sinp = y;
+	  *cosp = 1.0f;
+	  return;
+	}
 
       sincosf_poly (x, x2, p, 0, sinp, cosp);
     }
diff --git a/newlib/libm/common/sinf.c b/newlib/libm/common/sinf.c
index c1baf4237..715bdc8d0 100644
--- a/newlib/libm/common/sinf.c
+++ b/newlib/libm/common/sinf.c
@@ -49,12 +49,12 @@ sinf (float y)
       s = x * x;
 
       if (unlikely (abstop12 (y) < abstop12 (0x1p-12f)))
-      {
-	if (unlikely (abstop12 (y) < abstop12 (0x1p-126f)))
-	  /* Force underflow for tiny y.  */
-	  force_eval_float (s);
-	return y;
-      }
+	{
+	  if (unlikely (abstop12 (y) < abstop12 (0x1p-126f)))
+	    /* Force underflow for tiny y.  */
+	    force_eval_float (s);
+	  return y;
+	}
 
       return sinf_poly (x, s, p, 0);
     }
-- 
2.14.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/3] Change the return type of converttoint and document the, semantics
  2018-07-04 15:52 [PATCH 0/3] Updates to the new math code Szabolcs Nagy
  2018-07-04 15:53 ` [PATCH 1/3] Fix code style and comments of " Szabolcs Nagy
@ 2018-07-04 15:54 ` Szabolcs Nagy
  2018-07-04 16:57 ` [PATCH 3/3] Fix large ulp error in pow without fma very near 1.0 Szabolcs Nagy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Szabolcs Nagy @ 2018-07-04 15:54 UTC (permalink / raw)
  To: newlib; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: 0002-Change-the-return-type-of-converttoint-and-document-.patch --]
[-- Type: text/x-patch, Size: 2249 bytes --]

From 99ee950a35cdca2f19ac0b9692d4ad8077cdaad9 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Wed, 4 Jul 2018 11:09:39 +0100
Subject: [PATCH 2/3] Change the return type of converttoint and document the
 semantics

The roundtoint and converttoint internal functions are only called with small
values, so 32 bit result is enough for converttoint and it is a signed int
conversion so the natural return type is int32_t.

The original idea was to help the compiler keeping the result in uint64_t,
then it's clear that no sign extension is needed and there is no accidental
undefined or implementation defined signed int arithmetics.

But it turns out gcc does a good job with inlining so changing the type has
no overhead and the semantics of the conversion is less surprising this way.
Since we want to allow the asuint64 (x + 0x1.8p52) style conversion, the top
bits were never usable and the existing code ensures that only the bottom
32 bits of the conversion result are used.

In newlib with default settings only aarch64 is affected and there is no
significant code generation change with gcc after the patch.
---
 newlib/libm/common/math_config.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/newlib/libm/common/math_config.h b/newlib/libm/common/math_config.h
index aec9cd0d6..220ebd6ef 100644
--- a/newlib/libm/common/math_config.h
+++ b/newlib/libm/common/math_config.h
@@ -62,15 +62,23 @@
 #endif
 
 #if HAVE_FAST_ROUND
+/* When set, the roundtoint and converttoint functions are provided with
+   the semantics documented below.  */
 # define TOINT_INTRINSICS 1
 
+/* Round x to nearest int in all rounding modes, ties have to be rounded
+   consistently with converttoint so the results match.  If the result
+   would be outside of [-2^31, 2^31-1] then the semantics is unspecified.  */
 static inline double_t
 roundtoint (double_t x)
 {
   return round (x);
 }
 
-static inline uint64_t
+/* Convert x to nearest int in all rounding modes, ties have to be rounded
+   consistently with roundtoint.  If the result is not representible in an
+   int32_t then the semantics is unspecified.  */
+static inline int32_t
 converttoint (double_t x)
 {
 # if HAVE_FAST_LROUND
-- 
2.14.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 3/3] Fix large ulp error in pow without fma very near 1.0
  2018-07-04 15:52 [PATCH 0/3] Updates to the new math code Szabolcs Nagy
  2018-07-04 15:53 ` [PATCH 1/3] Fix code style and comments of " Szabolcs Nagy
  2018-07-04 15:54 ` [PATCH 2/3] Change the return type of converttoint and document the, semantics Szabolcs Nagy
@ 2018-07-04 16:57 ` Szabolcs Nagy
  2018-07-05  8:34 ` [PATCH] Fix namespace issues in sinf, cosf and sincosf Szabolcs Nagy
  2018-07-05  9:08 ` [PATCH 0/3] Updates to the new math code Corinna Vinschen
  4 siblings, 0 replies; 9+ messages in thread
From: Szabolcs Nagy @ 2018-07-04 16:57 UTC (permalink / raw)
  To: newlib; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: 0003-Fix-large-ulp-error-in-pow-without-fma-very-near-1.0.patch --]
[-- Type: text/x-patch, Size: 1510 bytes --]

From 068894aab1088aeda0a411d7f609ba38928881ff Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Tue, 3 Jul 2018 13:05:31 +0100
Subject: [PATCH 3/3] Fix large ulp error in pow without fma very near 1.0

The !__HAVE_FAST_FMA code path split r = z/c - 1 into r = rhi + rlo such
that when z = 1-tiny and c = 1 then rlo and rhi could have much larger
magnitude than r which later caused large rounding errors.

So do a nearest rounding instead of truncation at the split.

In newlib with default settings this was observable on some arm targets
that enable the new math code but has no fma.
---
 newlib/libm/common/pow.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/newlib/libm/common/pow.c b/newlib/libm/common/pow.c
index 7d8060751..4863821a5 100644
--- a/newlib/libm/common/pow.c
+++ b/newlib/libm/common/pow.c
@@ -79,11 +79,13 @@ log_inline (uint64_t ix, double_t *tail)
   logc = T[i].logc;
   logctail = T[i].logctail;
 
-  /* r = z/c - 1, arranged to be exact.  */
+  /* Note: 1/c is j/N or j/N/2 where j is an integer in [N,2N) and
+     |z/c - 1| < 1/N, so r = z/c - 1 is exactly representible.  */
 #if __HAVE_FAST_FMA
   r = fma (z, invc, -1.0);
 #else
-  double_t zhi = asdouble (iz & (-1ULL << 32));
+  /* Split z such that rhi, rlo and rhi*rhi are exact and |rlo| <= |r|.  */
+  double_t zhi = asdouble ((iz + (1ULL << 31)) & (-1ULL << 32));
   double_t zlo = z - zhi;
   double_t rhi = zhi * invc - 1.0;
   double_t rlo = zlo * invc;
-- 
2.14.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] Fix namespace issues in sinf, cosf and sincosf
  2018-07-04 15:52 [PATCH 0/3] Updates to the new math code Szabolcs Nagy
                   ` (2 preceding siblings ...)
  2018-07-04 16:57 ` [PATCH 3/3] Fix large ulp error in pow without fma very near 1.0 Szabolcs Nagy
@ 2018-07-05  8:34 ` Szabolcs Nagy
  2018-07-05  9:08 ` [PATCH 0/3] Updates to the new math code Corinna Vinschen
  4 siblings, 0 replies; 9+ messages in thread
From: Szabolcs Nagy @ 2018-07-05  8:34 UTC (permalink / raw)
  To: newlib; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 441 bytes --]

On 04/07/18 16:49, Szabolcs Nagy wrote:
> There are some modifications and bug fixes in the Arm Optimized
> Routines repo and i'd like to sync newlib with it.
> 
> Szabolcs Nagy (3):
>  Â  Fix code style and comments of new math code
>  Â  Change the return type of converttoint and document the semantics
>  Â  Fix large ulp error in pow without fma very near 1.0
> 

sorry i forgot to add this one when sending out the cover letter.

[-- Attachment #2: 0001-Fix-namespace-issues-in-sinf-cosf-and-sincosf.patch --]
[-- Type: text/x-patch, Size: 6218 bytes --]

From 9af28fe277fb5123916a60d89e72c5f01ddc6cc6 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Wed, 4 Jul 2018 17:52:36 +0100
Subject: [PATCH] Fix namespace issues in sinf, cosf and sincosf

Use const sincos_t for clarity instead of making the typedef const.
Use __inv_pi4 and __sincosf_table to avoid namespace issues with
static linking.
---
 newlib/libm/common/cosf.c         |  6 +++---
 newlib/libm/common/sincosf.c      |  6 +++---
 newlib/libm/common/sincosf.h      | 15 ++++++++-------
 newlib/libm/common/sincosf_data.c |  4 ++--
 newlib/libm/common/sinf.c         |  6 +++---
 5 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/newlib/libm/common/cosf.c b/newlib/libm/common/cosf.c
index aac0a9aee..f87186c68 100644
--- a/newlib/libm/common/cosf.c
+++ b/newlib/libm/common/cosf.c
@@ -43,7 +43,7 @@ cosf (float y)
   double x = y;
   double s;
   int n;
-  sincos_t *p = &sincosf_table[0];
+  const sincos_t *p = &__sincosf_table[0];
 
   if (abstop12 (y) < abstop12 (pio4))
     {
@@ -62,7 +62,7 @@ cosf (float y)
       s = p->sign[n & 3];
 
       if (n & 2)
-	p = &sincosf_table[1];
+	p = &__sincosf_table[1];
 
       return sinf_poly (x * s, x * x, p, n ^ 1);
     }
@@ -77,7 +77,7 @@ cosf (float y)
       s = p->sign[(n + sign) & 3];
 
       if ((n + sign) & 2)
-	p = &sincosf_table[1];
+	p = &__sincosf_table[1];
 
       return sinf_poly (x * s, x * x, p, n ^ 1);
     }
diff --git a/newlib/libm/common/sincosf.c b/newlib/libm/common/sincosf.c
index 2ead353e8..65dd05e6e 100644
--- a/newlib/libm/common/sincosf.c
+++ b/newlib/libm/common/sincosf.c
@@ -43,7 +43,7 @@ sincosf (float y, float *sinp, float *cosp)
   double x = y;
   double s;
   int n;
-  sincos_t *p = &sincosf_table[0];
+  const sincos_t *p = &__sincosf_table[0];
 
   if (abstop12 (y) < abstop12 (pio4))
     {
@@ -69,7 +69,7 @@ sincosf (float y, float *sinp, float *cosp)
       s = p->sign[n & 3];
 
       if (n & 2)
-	p = &sincosf_table[1];
+	p = &__sincosf_table[1];
 
       sincosf_poly (x * s, x * x, p, n, sinp, cosp);
     }
@@ -84,7 +84,7 @@ sincosf (float y, float *sinp, float *cosp)
       s = p->sign[(n + sign) & 3];
 
       if ((n + sign) & 2)
-	p = &sincosf_table[1];
+	p = &__sincosf_table[1];
 
       sincosf_poly (x * s, x * x, p, n, sinp, cosp);
     }
diff --git a/newlib/libm/common/sincosf.h b/newlib/libm/common/sincosf.h
index 955a73312..27f24ffe3 100644
--- a/newlib/libm/common/sincosf.h
+++ b/newlib/libm/common/sincosf.h
@@ -33,15 +33,15 @@ static const double pi64 = 0x1.921FB54442D18p-62;
 /* PI / 4.  */
 static const double pio4 = 0x1.921FB54442D18p-1;
 
-typedef const struct
+typedef struct
 {
   double sign[4];
   double hpi_inv, hpi, c0, c1, c2, c3, c4, s1, s2, s3;
 } sincos_t;
 
-extern sincos_t sincosf_table[2] HIDDEN;
+extern const sincos_t __sincosf_table[2] HIDDEN;
 
-extern const uint32_t inv_pio4[] HIDDEN;
+extern const uint32_t __inv_pio4[] HIDDEN;
 
 /* abstop12 assumes floating point reinterpret is fast by default.
    If floating point comparisons are faster, define PREFER_FLOAT_COMPARISON.  */
@@ -63,7 +63,8 @@ abstop12 (float x)
    polynomial P and store the results in SINP and COSP.  N is the quadrant,
    if odd the cosine and sine polynomials are swapped.  */
 static inline void
-sincosf_poly (double x, double x2, sincos_t *p, int n, float *sinp, float *cosp)
+sincosf_poly (double x, double x2, const sincos_t *p, int n, float *sinp,
+	      float *cosp)
 {
   double x3, x4, x5, x6, s, c, c1, c2, s1;
 
@@ -91,7 +92,7 @@ sincosf_poly (double x, double x2, sincos_t *p, int n, float *sinp, float *cosp)
 /* Return the sine of inputs X and X2 (X squared) using the polynomial P.
    N is the quadrant, and if odd the cosine polynomial is used.  */
 static inline float
-sinf_poly (double x, double x2, sincos_t *p, int n)
+sinf_poly (double x, double x2, const sincos_t *p, int n)
 {
   double x3, x4, x6, x7, s, c, c1, c2, s1;
 
@@ -126,7 +127,7 @@ sinf_poly (double x, double x2, sincos_t *p, int n)
    Use round/lround if inlined, otherwise convert to int.  To avoid inaccuracies
    introduced by truncating negative values, compute the quadrant * 2^24.  */
 static inline double
-reduce_fast (double x, sincos_t *p, int *np)
+reduce_fast (double x, const sincos_t *p, int *np)
 {
   double r;
 #if TOINT_INTRINSICS
@@ -151,7 +152,7 @@ reduce_fast (double x, sincos_t *p, int *np)
 static inline double
 reduce_large (uint32_t xi, int *np)
 {
-  const uint32_t *arr = &inv_pio4[(xi >> 26) & 15];
+  const uint32_t *arr = &__inv_pio4[(xi >> 26) & 15];
   int shift = (xi >> 23) & 7;
   uint64_t n, res0, res1, res2;
 
diff --git a/newlib/libm/common/sincosf_data.c b/newlib/libm/common/sincosf_data.c
index 069ee7bf6..47fe960c1 100644
--- a/newlib/libm/common/sincosf_data.c
+++ b/newlib/libm/common/sincosf_data.c
@@ -34,7 +34,7 @@
 
 /* The constants and polynomials for sine and cosine.  The 2nd entry
    computes -cos (x) rather than cos (x) to get negation for free.  */
-sincos_t sincosf_table[2] =
+const sincos_t __sincosf_table[2] =
 {
   {
     { 1.0, -1.0, -1.0, 1.0 },
@@ -74,7 +74,7 @@ sincos_t sincosf_table[2] =
 
 /* Table with 4/PI to 192 bit precision.  To avoid unaligned accesses
    only 8 new bits are added per entry, making the table 4 times larger.  */
-const uint32_t inv_pio4[24] =
+const uint32_t __inv_pio4[24] =
 {
   0xa2,       0xa2f9,	  0xa2f983,   0xa2f9836e,
   0xf9836e4e, 0x836e4e44, 0x6e4e4415, 0x4e441529,
diff --git a/newlib/libm/common/sinf.c b/newlib/libm/common/sinf.c
index 715bdc8d0..c2e61039b 100644
--- a/newlib/libm/common/sinf.c
+++ b/newlib/libm/common/sinf.c
@@ -42,7 +42,7 @@ sinf (float y)
   double x = y;
   double s;
   int n;
-  sincos_t *p = &sincosf_table[0];
+  const sincos_t *p = &__sincosf_table[0];
 
   if (abstop12 (y) < abstop12 (pio4))
     {
@@ -66,7 +66,7 @@ sinf (float y)
       s = p->sign[n & 3];
 
       if (n & 2)
-	p = &sincosf_table[1];
+	p = &__sincosf_table[1];
 
       return sinf_poly (x * s, x * x, p, n);
     }
@@ -81,7 +81,7 @@ sinf (float y)
       s = p->sign[(n + sign) & 3];
 
       if ((n + sign) & 2)
-	p = &sincosf_table[1];
+	p = &__sincosf_table[1];
 
       return sinf_poly (x * s, x * x, p, n);
     }
-- 
2.14.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] Updates to the new math code
  2018-07-04 15:52 [PATCH 0/3] Updates to the new math code Szabolcs Nagy
                   ` (3 preceding siblings ...)
  2018-07-05  8:34 ` [PATCH] Fix namespace issues in sinf, cosf and sincosf Szabolcs Nagy
@ 2018-07-05  9:08 ` Corinna Vinschen
  2018-07-05 10:41   ` Szabolcs Nagy
  4 siblings, 1 reply; 9+ messages in thread
From: Corinna Vinschen @ 2018-07-05  9:08 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: newlib, nd

[-- Attachment #1: Type: text/plain, Size: 2081 bytes --]

Hi Szabolcs,

On Jul  4 16:49, Szabolcs Nagy wrote:
> There are some modifications and bug fixes in the Arm Optimized
> Routines repo and i'd like to sync newlib with it.
> 
> Szabolcs Nagy (3):
>   Fix code style and comments of new math code
>   Change the return type of converttoint and document the semantics
>   Fix large ulp error in pow without fma very near 1.0
> 
>  newlib/libm/common/exp.c         | 22 +++++++++++++------
>  newlib/libm/common/exp2.c        | 18 +++++++++++-----
>  newlib/libm/common/log.c         | 46 ++++++++++++++++++++++------------------
>  newlib/libm/common/log2.c        | 31 ++++++++++++++-------------
>  newlib/libm/common/math_config.h | 44 ++++++++++++++++++++++++++++++++------
>  newlib/libm/common/pow.c         | 35 ++++++++++++++++++++++--------
>  newlib/libm/common/sincosf.c     | 16 +++++++-------
>  newlib/libm/common/sinf.c        | 12 +++++------
>  8 files changed, 147 insertions(+), 77 deletions(-)

while you're working on this, I have three questions:

1. There's __HAVE_FAST_FMA defined in libc/include/machine/ieeefp.h and
   HAVE_FAST_ROUND/HAVE_FAST_LROUND defined in libm/common/math_config.h.
   Wouldn't it make more sense to define all of them in one file?
   Not sure which one, but libm/common/math_config.h looks right to me.

2. You're changing converttoint to return int32_t, but the #else
   branch is casting to long:

    # if HAVE_FAST_LROUND
      return lround (x);
    # else
      return (long) round (x);
    # endif
   
   This looks wrong to me independent of the return type of
   converttoint.  Shouldn't that cast to an int of fixed size, i.e.,
   int32_t?

3. Along the same lines, in newlib/libm/common/sf_exp.c there's the
   following expression:

    #if TOINT_INTRINSICS
      kd = roundtoint (z);
      ki = converttoint (z);
    #elif TOINT_RINT
      kd = rint (z);
      ki = (long) kd;
      ^^^^^^^^^^^^^^^

   Shouldn't this cast to int32_t as well?


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] Updates to the new math code
  2018-07-05  9:08 ` [PATCH 0/3] Updates to the new math code Corinna Vinschen
@ 2018-07-05 10:41   ` Szabolcs Nagy
  2018-07-05 11:35     ` Szabolcs Nagy
  0 siblings, 1 reply; 9+ messages in thread
From: Szabolcs Nagy @ 2018-07-05 10:41 UTC (permalink / raw)
  To: newlib; +Cc: nd

On 05/07/18 09:48, Corinna Vinschen wrote:
> Hi Szabolcs,
> 
> On Jul  4 16:49, Szabolcs Nagy wrote:
>> There are some modifications and bug fixes in the Arm Optimized
>> Routines repo and i'd like to sync newlib with it.
>>
>> Szabolcs Nagy (3):
>>    Fix code style and comments of new math code
>>    Change the return type of converttoint and document the semantics
>>    Fix large ulp error in pow without fma very near 1.0
>>
>>   newlib/libm/common/exp.c         | 22 +++++++++++++------
>>   newlib/libm/common/exp2.c        | 18 +++++++++++-----
>>   newlib/libm/common/log.c         | 46 ++++++++++++++++++++++------------------
>>   newlib/libm/common/log2.c        | 31 ++++++++++++++-------------
>>   newlib/libm/common/math_config.h | 44 ++++++++++++++++++++++++++++++++------
>>   newlib/libm/common/pow.c         | 35 ++++++++++++++++++++++--------
>>   newlib/libm/common/sincosf.c     | 16 +++++++-------
>>   newlib/libm/common/sinf.c        | 12 +++++------
>>   8 files changed, 147 insertions(+), 77 deletions(-)
> 
> while you're working on this, I have three questions:
> 
> 1. There's __HAVE_FAST_FMA defined in libc/include/machine/ieeefp.h and
>     HAVE_FAST_ROUND/HAVE_FAST_LROUND defined in libm/common/math_config.h.
>     Wouldn't it make more sense to define all of them in one file?
>     Not sure which one, but libm/common/math_config.h looks right to me.
> 

ok, i can move HAVE_FAST_FMA (without __ to be consistent
with the others) to math_config.h

> 2. You're changing converttoint to return int32_t, but the #else
>     branch is casting to long:
> 
>      # if HAVE_FAST_LROUND
>        return lround (x);
>      # else
>        return (long) round (x);
>      # endif
>     
>     This looks wrong to me independent of the return type of
>     converttoint.  Shouldn't that cast to an int of fixed size, i.e.,
>     int32_t?
> 

the (long) cast was used so it's the same operation as
lround (assuming -fno-math-errno semantics) since the
compiler has a builtin for that which targets may inline,
but i guess (int32_t) would work too (since out of bound
conversion is either undefined or unspecified).

> 3. Along the same lines, in newlib/libm/common/sf_exp.c there's the
>     following expression:
> 
>      #if TOINT_INTRINSICS
>        kd = roundtoint (z);
>        ki = converttoint (z);
>      #elif TOINT_RINT
>        kd = rint (z);
>        ki = (long) kd;
>        ^^^^^^^^^^^^^^^
> 
>     Shouldn't this cast to int32_t as well?
> 

likewise.

i'll experiment and if int32_t works i'll change it.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] Updates to the new math code
  2018-07-05 10:41   ` Szabolcs Nagy
@ 2018-07-05 11:35     ` Szabolcs Nagy
  2018-07-05 12:07       ` Corinna Vinschen
  0 siblings, 1 reply; 9+ messages in thread
From: Szabolcs Nagy @ 2018-07-05 11:35 UTC (permalink / raw)
  To: newlib; +Cc: nd

On 05/07/18 11:01, Szabolcs Nagy wrote:
> On 05/07/18 09:48, Corinna Vinschen wrote:
>> 2. You're changing converttoint to return int32_t, but the #else
>> Â Â Â  branch is casting to long:
>>
>> Â Â Â Â  # if HAVE_FAST_LROUND
>> Â Â Â Â Â Â  return lround (x);
>> Â Â Â Â  # else
>> Â Â Â Â Â Â  return (long) round (x);
>> Â Â Â Â  # endif
>> Â Â Â  This looks wrong to me independent of the return type of
>> Â Â Â  converttoint.Â  Shouldn't that cast to an int of fixed size, i.e.,
>> Â Â Â  int32_t?
>>
> 
> the (long) cast was used so it's the same operation as
> lround (assuming -fno-math-errno semantics) since the
> compiler has a builtin for that which targets may inline,
> but i guess (int32_t) would work too (since out of bound
> conversion is either undefined or unspecified).
> 

the int32_t cast changes code generation with clang (it adds
sign extension instructions) it seems compilers can figure out

  (uint64_t)(int32_t)(long)round() << 32;

but not

  (uint64_t)(int32_t)round() << 32;

because the former lowers into a double->int64 conversion
on 64bit systems the latter lowers into double->int32
conversion and then the int32 requires a sign extend.

so i'll keep the code as is for now.

>> 3. Along the same lines, in newlib/libm/common/sf_exp.c there's the
>> Â Â Â  following expression:
>>
>> Â Â Â Â  #if TOINT_INTRINSICS
>> Â Â Â Â Â Â  kd = roundtoint (z);
>> Â Â Â Â Â Â  ki = converttoint (z);
>> Â Â Â Â  #elif TOINT_RINT
>> Â Â Â Â Â Â  kd = rint (z);
>> Â Â Â Â Â Â  ki = (long) kd;
>> Â Â Â Â Â Â  ^^^^^^^^^^^^^^^
>>
>> Â Â Â  Shouldn't this cast to int32_t as well?
>>
> 
> likewise.
> 
> i'll experiment and if int32_t works i'll change it.

i think i should remove this part and only use TOINT_INTRINSICS
vs !TOINT_INTRINSICS instead of various TOINT_* macros that
nobody uses.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] Updates to the new math code
  2018-07-05 11:35     ` Szabolcs Nagy
@ 2018-07-05 12:07       ` Corinna Vinschen
  0 siblings, 0 replies; 9+ messages in thread
From: Corinna Vinschen @ 2018-07-05 12:07 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: newlib, nd

[-- Attachment #1: Type: text/plain, Size: 2154 bytes --]

On Jul  5 12:21, Szabolcs Nagy wrote:
> On 05/07/18 11:01, Szabolcs Nagy wrote:
> > On 05/07/18 09:48, Corinna Vinschen wrote:
> > > 2. You're changing converttoint to return int32_t, but the #else
> > >     branch is casting to long:
> > > 
> > >      # if HAVE_FAST_LROUND
> > >        return lround (x);
> > >      # else
> > >        return (long) round (x);
> > >      # endif
> > >     This looks wrong to me independent of the return type of
> > >     converttoint.  Shouldn't that cast to an int of fixed size, i.e.,
> > >     int32_t?
> > > 
> > 
> > the (long) cast was used so it's the same operation as
> > lround (assuming -fno-math-errno semantics) since the
> > compiler has a builtin for that which targets may inline,
> > but i guess (int32_t) would work too (since out of bound
> > conversion is either undefined or unspecified).
> > 
> 
> the int32_t cast changes code generation with clang (it adds
> sign extension instructions) it seems compilers can figure out
> 
>  (uint64_t)(int32_t)(long)round() << 32;
> 
> but not
> 
>  (uint64_t)(int32_t)round() << 32;
> 
> because the former lowers into a double->int64 conversion
> on 64bit systems the latter lowers into double->int32
> conversion and then the int32 requires a sign extend.
> 
> so i'll keep the code as is for now.
> 
> > > 3. Along the same lines, in newlib/libm/common/sf_exp.c there's the
> > >     following expression:
> > > 
> > >      #if TOINT_INTRINSICS
> > >        kd = roundtoint (z);
> > >        ki = converttoint (z);
> > >      #elif TOINT_RINT
> > >        kd = rint (z);
> > >        ki = (long) kd;
> > >        ^^^^^^^^^^^^^^^
> > > 
> > >     Shouldn't this cast to int32_t as well?
> > > 
> > 
> > likewise.
> > 
> > i'll experiment and if int32_t works i'll change it.
> 
> i think i should remove this part and only use TOINT_INTRINSICS
> vs !TOINT_INTRINSICS instead of various TOINT_* macros that
> nobody uses.

Sounds good.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-07-05 12:04 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-04 15:52 [PATCH 0/3] Updates to the new math code Szabolcs Nagy
2018-07-04 15:53 ` [PATCH 1/3] Fix code style and comments of " Szabolcs Nagy
2018-07-04 15:54 ` [PATCH 2/3] Change the return type of converttoint and document the, semantics Szabolcs Nagy
2018-07-04 16:57 ` [PATCH 3/3] Fix large ulp error in pow without fma very near 1.0 Szabolcs Nagy
2018-07-05  8:34 ` [PATCH] Fix namespace issues in sinf, cosf and sincosf Szabolcs Nagy
2018-07-05  9:08 ` [PATCH 0/3] Updates to the new math code Corinna Vinschen
2018-07-05 10:41   ` Szabolcs Nagy
2018-07-05 11:35     ` Szabolcs Nagy
2018-07-05 12:07       ` Corinna Vinschen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).