[PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.

public inbox for libc-alpha@sourceware.org

* [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
@ 2017-12-08 23:08 Patrick McGehearty
  2017-12-11  8:14 ` Siddhesh Poyarekar
  2017-12-14  1:28 ` Joseph Myers
  0 siblings, 2 replies; 44+ messages in thread
From: Patrick McGehearty @ 2017-12-08 23:08 UTC
  To: libc-alpha

Version 8 of proposed patch.

Renamed ln2_32hi2 and ln2_32lo2 to be ln2_64hi and ln2_64lo.
Revised comments to more accurately describe these constants.
Revised constants t2, t3, t4, t5 to better match values of 1/n factorial.
The change eliminated a 1 ulp error in 942 of the 40 million values tested.
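
As an illustration of the polynomial these constants feed (a minimal sketch, not part of the patch): the values of T2 through T5 below are copied from eexp.tbl, and the expression grouping follows the computation of yy.y in e_exp.c. The helper name expm1_poly and the argument bound in the assert are assumptions made for this example only.

```c
#include <assert.h>

/* Coefficients copied from the patch's eexp.tbl; each is intended to be
   the double nearest to 1/n!.  */
static const double T2 = 0x1.5555555555555p-3;   /* 1/3! */
static const double T3 = 0x1.5555555555555p-5;   /* 1/4! */
static const double T4 = 0x1.1111111111111p-7;   /* 1/5! */
static const double T5 = 0x1.6c16c16c16c17p-10;  /* 1/6! */

/* Hypothetical helper (not in the patch): evaluates
   z + z^2/2 + T2*z^3 + T3*z^4 + T4*z^5 + T5*z^6 with the same grouping
   that e_exp.c uses for yy.y.  */
static double
expm1_poly (double z)
{
  /* Assumed bound for this sketch: the patch appears to use the
     polynomial only for small arguments, below roughly ln2/64.  */
  assert (z > -0.011 && z < 0.011);
  double t = z * z;
  return z + (t * (0.5 + z * T2) + (t * t) * (T3 + z * T4 + t * T5));
}
```

With correctly rounded division, each constant should compare equal to the corresponding 1.0/6.0, 1.0/24.0, 1.0/120.0 and 1.0/720.0.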

Version 7 of proposed patch.

Fixed formatting issue in sysdeps/ieee754/dbl-64/e_exp.c

Version 6 of proposed patch.

Fixed error in patch revision.
Cleaned up formatting of return () and location of '+' for line breaks.
Fixed comments in eexp.tbl. Adjusted 3 values in eexp.tbl to be correctly
rounded in ulp as computed by quad precision.

Modified e_exp.c and eexp.tbl to use table of 64 intervals instead of
32 intervals for computing exp(x). That change reduced the differences
from the prior ieee754 exp(x) to 16 in 10,000 from 29 in 10,000. Also
reduced the make check differences for exp to 1 from 3. No observed
change in performance for using the larger table on either x86 or Sparc.
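
The 64-interval argument reduction can be sketched as follows. This is an illustration only, not the patch itself: the constants are copied from eexp.tbl, while the struct exp_reduction type, the reduce_exp_arg name and the range check are hypothetical. The idea is that exp(x) = 2^m * 2^(j/64) * exp(z), with j selecting one of the 64 table entries and z small enough for a short polynomial.

```c
#include <assert.h>

/* Constants copied from the patch's eexp.tbl.  */
static const double INVLN2_64 = 0x1.71547652b82fep+6;  /* 64/ln2 */
static const double LN2_64HI = 0x1.62e42fee00000p-7;   /* high bits of ln2/64 */
static const double LN2_64LO = 0x1.a39ef35793c76p-39;  /* ln2/64 - LN2_64HI */

/* Hypothetical container for the reduction results (not in the patch).  */
struct exp_reduction
{
  int k;     /* nearest integer to x * 64/ln2 */
  int j;     /* table interval, 0..63 */
  int m;     /* power-of-two exponent, so that k = 64*m + j */
  double z;  /* remainder x - k*ln2/64 */
};

static struct exp_reduction
reduce_exp_arg (double x)
{
  struct exp_reduction r;
  /* Assumed range for this sketch: exp(x) neither overflows nor
     underflows to zero.  */
  assert (x > -745.0 && x < 709.0);
  double t = INVLN2_64 * x;
  /* Round to nearest by biasing with +-0.5 before truncation.  */
  t += (x < 0) ? -0.5 : 0.5;
  r.k = (int) t;
  r.j = r.k & 0x3f;
  /* Right shift of a negative int is implementation-defined in C;
     this assumes an arithmetic shift, as the patch does.  */
  r.m = r.k >> 6;
  /* Two-step subtraction: LN2_64HI has trailing zero bits, so the
     product k*LN2_64HI should be exact for the k values reachable
     here; LN2_64LO supplies the remaining bits.  */
  r.z = (x - r.k * LN2_64HI) - r.k * LN2_64LO;
  return r;
}
```

For x = 1.0 this gives k = 92, j = 28, m = 1 and z of about 0.0036. The patch stores j shifted left by one because TBL[] holds hi/lo pairs.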

Version 5 of proposed patch.

Cleaned up formatting of comments and braces.
Returned to single patch for submission.

Version 4 of proposed patch.

New comments revised to use GNU standard comment formating.
Limited comment added in eexp.tbl for TBL[]. The original src
used for porting to Linux did not have a comment about TBL[].
The new comment is limited to the current worker's level of
understanding.

The (-xx.x > threshold2) case is changed to return force_underflow.
For FE_TONEAREST, tiny*tiny will always be zero but for
FE_UPWARD, it will be the smallest representable value.

That change caused no change in the math test results for Sparc or x86.

Version 3 changes

All hex constants in version 2 replaced with C99 double hex constants,
allowing Big Endian and Little Endian versions to be merged.
Only e_exp.c and eexp.tbl changed from version 2.
Minor changes in performance results due to system noise.
No other changes from version 2.
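
To illustrate why C99 hex floating constants remove the need for separate byte-order variants, here is a small sketch (not part of the patch). The helper double_bits is hypothetical, and the example assumes double is IEEE 754 binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not in the patch): return the IEEE 754 bit
   pattern of a double as a 64-bit integer.  */
static uint64_t
double_bits (double d)
{
  uint64_t u;
  assert (sizeof u == sizeof d);
  memcpy (&u, &d, sizeof u);
  return u;
}

/* invln2_64 as written in eexp.tbl.  The compiler decides how the
   bytes are laid out in memory, so one definition serves both byte
   orders.  */
static const double invln2_64 = 0x1.71547652b82fep+6;
```

On such a platform double_bits (invln2_64) is 0x40571547652b82fe. A table initialized through a union of two 32-bit integers would presumably have to list the halves 0x40571547 and 0x652b82fe in opposite order for big endian and little endian targets.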

Version 2 of proposed patch.
Revised copyright notice and formatting issues.
Removed slowexp.c and related references.
Replaced tables of double constants with hex constants, taking special
  care to correctly handle little endian and big endian versions.
  Using hex initialization also required changing variables to be declared
  as unions.  Tables moved from e_exp.c to sysdeps/ieee754/dbl-64/eexp.tbl.
Replaced __fegetround(), __fesetround() with get_rounding_mode and
  libc_fesetround().
Removed use of "small". "inexact mode" now ignored.
Retested and rebenchmarked on sparc and x86 with the above changes.

These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Performance gains are typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc and x86:
       sparc (nsec)     x86 (nsec)
       old     new     old     new
max   17629    395    5173     144
min     399     54      15      13
mean   5317    200    1349      23

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from a value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run. Within the
test suite, 3 input values were found to cause 1 ulp (1 bit) differences
when the "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
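
The two results above are adjacent doubles, which can be checked with a small sketch like the one below. The helper ulp_distance is hypothetical (it is not a glibc function) and assumes IEEE 754 binary64 doubles of the same sign.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): distance between two finite
   doubles of the same sign, counted in representable values.  */
static uint64_t
ulp_distance (double a, double b)
{
  uint64_t ua, ub;
  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  /* Same sign bit required; otherwise the integer ordering breaks.  */
  assert ((ua >> 63) == (ub >> 63));
  return ua > ub ? ua - ub : ub - ua;
}
```

Calling ulp_distance (0x1.db14cd799387ap-3, 0x1.db14cd7993879p-3) yields 1. Because the true value lies close to the midpoint between the two, either neighbor is within about half an ulp, and only the old code's multiprecision path picked the nearer one.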

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function. The gamma function is known to be difficult to compute
accurately.
---
 manual/probes.texi                          |   14 -
 math/Makefile                               |    2 +-
 sysdeps/generic/math_private.h              |    1 -
 sysdeps/ieee754/dbl-64/e_exp.c              |  398 +++++++++++++++------------
 sysdeps/ieee754/dbl-64/e_pow.c              |    2 +-
 sysdeps/ieee754/dbl-64/eexp.tbl             |  255 +++++++++++++++++
 sysdeps/ieee754/dbl-64/slowexp.c            |   86 ------
 sysdeps/powerpc/power4/fpu/Makefile         |    1 -
 sysdeps/x86_64/fpu/multiarch/Makefile       |    9 +-
 sysdeps/x86_64/fpu/multiarch/e_exp-avx.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c   |    1 -
 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c |    9 -
 15 files changed, 475 insertions(+), 323 deletions(-)
 create mode 100644 sysdeps/ieee754/dbl-64/eexp.tbl
 delete mode 100644 sysdeps/ieee754/dbl-64/slowexp.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c

diff --git a/manual/probes.texi b/manual/probes.texi
index 8ab6756..f8ae64b 100644
--- a/manual/probes.texi
+++ b/manual/probes.texi
@@ -258,20 +258,6 @@ Unless explicitly mentioned otherwise, a precision of 1 implies 24 bits of
 precision in the mantissa of the multiple precision number.  Hence, a precision
 level of 32 implies 768 bits of precision in the mantissa.
 
-@deftp Probe slowexp_p6 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-6.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
-@deftp Probe slowexp_p32 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-32.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
 @deftp Probe slowpow_p10 (double @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
 This probe is triggered when the @code{pow} function is called with
 inputs that result in multiple precision computation with precision
diff --git a/math/Makefile b/math/Makefile
index ae84abd..24cd0db 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -114,7 +114,7 @@ type-ldouble-yes := ldouble
 # double support
 type-double-suffix :=
 type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
-		       mpatan mpexp mplog mpsqrt mptan sincos32 slowexp	\
+		       mpatan mpexp mplog mpsqrt mptan sincos32	\
 		       slowpow sincostab k_rem_pio2
 
 # float support
diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index f29898c..689dc54 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -262,7 +262,6 @@ extern double __sin32 (double __x, double __res, double __res1);
 extern double __cos32 (double __x, double __res, double __res1);
 extern double __mpsin (double __x, double __dx, bool __range_reduce);
 extern double __mpcos (double __x, double __dx, bool __range_reduce);
-extern double __slowexp (double __x);
 extern double __slowpow (double __x, double __y, double __z);
 extern void __docos (double __x, double __dx, double __v[]);
 
diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
index 6757a14..d273213 100644
--- a/sysdeps/ieee754/dbl-64/e_exp.c
+++ b/sysdeps/ieee754/dbl-64/e_exp.c
@@ -1,3 +1,4 @@
+/* EXP function - Compute double precision exponential */
 /*
  * IBM Accurate Mathematical Library
  * written by International Business Machines Corp.
@@ -23,7 +24,7 @@
 /*           exp1                                                          */
 /*                                                                         */
 /* FILES NEEDED:dla.h endian.h mpa.h mydefs.h uexp.h                       */
-/*              mpa.c mpexp.x slowexp.c                                    */
+/*              mpa.c mpexp.x                                              */
 /*                                                                         */
 /* An ultimate exp routine. Given an IEEE double machine number x          */
 /* it computes the correctly rounded (to nearest) value of e^x             */
@@ -32,207 +33,238 @@
 /*                                                                         */
 /***************************************************************************/
 
+/*  IBM exp(x) replaced by following exp(x) in 2017. IBM exp1(x,xx) remains.  */
+/* exp(x)
+   Hybrid algorithm of Peter Tang's Table driven method (for large
+   arguments) and an accurate table (for small arguments).
+   Written by K.C. Ng, November 1988.
+   Revised by Patrick McGehearty, Nov 2017 to use j/64 instead of j/32
+   Method (large arguments):
+	1. Argument Reduction: given the input x, find r and integer k
+	   and j such that
+	             x = (k+j/64)*(ln2) + r,  |r| <= (1/128)*ln2
+
+	2. exp(x) = 2^k * (2^(j/64) + 2^(j/64)*expm1(r))
+	   a. expm1(r) is approximated by a polynomial:
+	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
+	      Here t1 = 1/2 exactly.
+	   b. 2^(j/64) is represented to twice double precision
+	      as TBL[2j]+TBL[2j+1].
+
+   Note: If divide were fast enough, we could use another approximation
+	 in 2.a:
+	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
+	      (for the same t1 and t2 as above)
+
+   Special cases:
+	exp(INF) is INF, exp(NaN) is NaN;
+	exp(-INF)=  0;
+	for finite argument, only exp(0)=1 is exact.
+
+   Accuracy:
+	According to an error analysis, the error is always less than
+	an ulp (unit in the last place).  The largest errors observed
+	are less than 0.55 ulp for normal results and less than 0.75 ulp
+	for subnormal results.
+
+   Misc. info.
+	For IEEE double
+		if x >  7.09782712893383973096e+02 then exp(x) overflow
+		if x < -7.45133219101941108420e+02 then exp(x) underflow.  */
+
 #include <math.h>
+#include <math-svid-compat.h>
+#include <math_private.h>
+#include <errno.h>
 #include "endian.h"
 #include "uexp.h"
+#include "uexp.tbl"
 #include "mydefs.h"
 #include "MathLib.h"
-#include "uexp.tbl"
-#include <math_private.h>
 #include <fenv.h>
 #include <float.h>
 
-#ifndef SECTION
-# define SECTION
-#endif
+extern double __ieee754_exp (double);
+
+#include "eexp.tbl"
+
+static const double
+  half = 0.5,
+  one = 1.0;
 
-double __slowexp (double);
 
-/* An ultimate exp routine. Given an IEEE double machine number x it computes
-   the correctly rounded (to nearest) value of e^x.  */
 double
-SECTION
-__ieee754_exp (double x)
+__ieee754_exp (double x_arg)
 {
-  double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
-  mynumber junk1, junk2, binexp = {{0, 0}};
-  int4 i, j, m, n, ex;
+  double z, t;
   double retval;
-
+  int hx, ix, k, j, m;
+  int fe_val;
+  union
   {
-    SET_RESTORE_ROUND (FE_TONEAREST);
-
-    junk1.x = x;
-    m = junk1.i[HIGH_HALF];
-    n = m & hugeint;
-
-    if (n > smallint && n < bigint)
-      {
-	y = x * log2e.x + three51.x;
-	bexp = y - three51.x;	/*  multiply the result by 2**bexp        */
-
-	junk1.x = y;
-
-	eps = bexp * ln_two2.x;	/* x = bexp*ln(2) + t - eps               */
-	t = x - bexp * ln_two1.x;
-
-	y = t + three33.x;
-	base = y - three33.x;	/* t rounded to a multiple of 2**-18      */
-	junk2.x = y;
-	del = (t - base) - eps;	/*  x = bexp*ln(2) + base + del           */
-	eps = del + del * del * (p3.x * del + p2.x);
-
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;
-
-	i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-	j = (junk2.i[LOW_HALF] & 511) << 1;
-
-	al = coar.x[i] * fine.x[j];
-	bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	       + coar.x[i + 1] * fine.x[j + 1]);
-
-	rem = (bet + bet * eps) + al * eps;
-	res = al + rem;
-	cor = (al - res) + rem;
-	if (res == (res + cor * err_0))
-	  {
-	    retval = res * binexp.x;
-	    goto ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto ret;
-	  }			/*if error is over bound */
-      }
-
-    if (n <= smallint)
-      {
-	retval = 1.0;
-	goto ret;
-      }
-
-    if (n >= badint)
-      {
-	if (n > infint)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/* x is NaN */
-	if (n < infint)
-	  {
-	    if (x > 0)
-	      goto ret_huge;
-	    else
-	      goto ret_tiny;
-	  }
-	/* x is finite,  cause either overflow or underflow  */
-	if (junk1.i[LOW_HALF] != 0)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/*  x is NaN  */
-	retval = (x > 0) ? inf.x : zero;	/* |x| = inf;  return either inf or 0 */
-	goto ret;
-      }
-
-    y = x * log2e.x + three51.x;
-    bexp = y - three51.x;
-    junk1.x = y;
-    eps = bexp * ln_two2.x;
-    t = x - bexp * ln_two1.x;
-    y = t + three33.x;
-    base = y - three33.x;
-    junk2.x = y;
-    del = (t - base) - eps;
-    eps = del + del * del * (p3.x * del + p2.x);
-    i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-    j = (junk2.i[LOW_HALF] & 511) << 1;
-    al = coar.x[i] * fine.x[j];
-    bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	   + coar.x[i + 1] * fine.x[j + 1]);
-    rem = (bet + bet * eps) + al * eps;
-    res = al + rem;
-    cor = (al - res) + rem;
-    if (m >> 31)
-      {
-	ex = junk1.i[LOW_HALF];
-	if (res < 1.0)
-	  {
-	    res += res;
-	    cor += cor;
-	    ex -= 1;
-	  }
-	if (ex >= -1022)
-	  {
-	    binexp.i[HIGH_HALF] = (1023 + ex) << 20;
-	    if (res == (res + cor * err_0))
-	      {
-		retval = res * binexp.x;
-		goto ret;
-	      }
-	    else
-	      {
-		retval = __slowexp (x);
-		goto check_uflow_ret;
-	      }			/*if error is over bound */
-	  }
-	ex = -(1022 + ex);
-	binexp.i[HIGH_HALF] = (1023 - ex) << 20;
-	res *= binexp.x;
-	cor *= binexp.x;
-	eps = 1.0000000001 + err_0 * binexp.x;
-	t = 1.0 + res;
-	y = ((1.0 - t) + res) + cor;
-	res = t + y;
-	cor = (t - res) + y;
-	if (res == (res + eps * cor))
-	  {
-	    binexp.i[HIGH_HALF] = 0x00100000;
-	    retval = (res - 1.0) * binexp.x;
-	    goto check_uflow_ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto check_uflow_ret;
-	  }			/*   if error is over bound    */
-      check_uflow_ret:
-	if (retval < DBL_MIN)
-	  {
-	    double force_underflow = tiny * tiny;
-	    math_force_eval (force_underflow);
-	  }
-	if (retval == 0)
-	  goto ret_tiny;
-	goto ret;
-      }
-    else
-      {
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
-	if (res == (res + cor * err_0))
-	  retval = res * binexp.x * t256.x;
-	else
-	  retval = __slowexp (x);
-	if (isinf (retval))
-	  goto ret_huge;
-	else
-	  goto ret;
-      }
-  }
-ret:
-  return retval;
-
- ret_huge:
-  return hhuge * hhuge;
-
- ret_tiny:
-  return tiny * tiny;
+    int i_part[2];
+    double x;
+  } xx;
+  union
+  {
+    int y_part[2];
+    double y;
+  } yy;
+  xx.x = x_arg;
+
+  ix = xx.i_part[HIGH_HALF];
+  hx = ix & ~0x80000000;
+
+  if (hx < 0x3ff0a2b2)
+    {				/* |x| < 3/2 ln 2 */
+      if (hx < 0x3f862e42)
+	{			/* |x| < 1/64 ln 2 */
+	  if (hx < 0x3ed00000)
+	    {			/* |x| < 2^-18 */
+	      if (hx < 0x3e300000)
+		{
+		  retval = one + xx.x;
+		  return retval;
+		}
+	      retval = one + xx.x * (one + half * xx.x);
+	      return retval;
+	    }
+	  /* Use FE_TONEAREST rounding mode for computing yy.y.
+	     Avoid set/reset of rounding mode if in FE_TONEAREST mode.  */
+	  fe_val = get_rounding_mode ();
+	  if (fe_val == FE_TONEAREST)
+	    {
+	      t = xx.x * xx.x;
+	      yy.y = xx.x + (t * (half + xx.x * t2)
+			     + (t * t) * (t3 + xx.x * t4 + t * t5));
+	      retval = one + yy.y;
+	    } 
+	  else
+	    {
+	      libc_fesetround (FE_TONEAREST);
+	      t = xx.x * xx.x;
+	      yy.y = xx.x + (t * (half + xx.x * t2)
+			     + (t * t) * (t3 + xx.x * t4 + t * t5));
+	      retval = one + yy.y;
+	      libc_fesetround (fe_val);
+	    }
+	  return retval;
+	}
+
+      /* Find the multiple of 2^-6 nearest x.  */
+      k = hx >> 20;
+      j = (0x00100000 | (hx & 0x000fffff)) >> (0x40c - k);
+      j = (j - 1) & ~1;
+      if (ix < 0)
+	j += 134;
+      /* Use FE_TONEAREST rounding mode for computing yy.y.
+	 Avoid set/reset of rounding mode if in FE_TONEAREST mode.  */
+      fe_val = get_rounding_mode ();
+      if (fe_val == FE_TONEAREST)
+	{
+	  z = xx.x - TBL2[j];
+	  t = z * z;
+	  yy.y = z + (t * (half + (z * t2))
+		      + (t * t) * (t3 + z * t4 + t * t5));
+	  retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+	}
+      else
+	{
+	  libc_fesetround (FE_TONEAREST);
+	  z = xx.x - TBL2[j];
+	  t = z * z;
+	  yy.y = z + (t * (half + (z * t2))
+		      + (t * t) * (t3 + z * t4 + t * t5));
+	  retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+	  libc_fesetround (fe_val);
+	}
+      return retval;
+    }
+
+  if (hx >= 0x40862e42)
+    {				/* x is large, infinite, or nan.  */
+      if (hx >= 0x7ff00000)
+	{
+	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
+	    return zero;	/* exp(-inf) = 0.  */
+	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf.  */
+	}
+      if (xx.x > threshold1)
+	{			/* Set overflow error condition.  */
+	  retval = hhuge * hhuge;
+	  return retval;
+	} 
+      if (-xx.x > threshold2)
+	{			/* Set underflow error condition.  */
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	  retval = force_underflow;
+	  return retval;
+	}
+    }
+
+  /* Use FE_TONEAREST rounding mode for computing yy.y.
+     Avoid set/reset of rounding mode if already in FE_TONEAREST mode.  */
+  fe_val = get_rounding_mode ();
+  if (fe_val == FE_TONEAREST)
+    {
+      t = invln2_64 * xx.x;
+      if (ix < 0)
+	t -= half;
+      else
+	t += half;
+      k = (int) t;
+      j = (k & 0x3f) << 1;
+      m = k >> 6;
+      z = (xx.x - k * ln2_64hi) - k * ln2_64lo;
+
+      /* z is now in primary range.  */
+      t = z * z;
+      yy.y = z + (t * (half + z * t2) + (t * t) * (t3 + z * t4 + t * t5));
+      yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+    }
+  else
+    {
+      libc_fesetround (FE_TONEAREST);
+      t = invln2_64 * xx.x;
+      if (ix < 0)
+	t -= half;
+      else
+	t += half;
+      k = (int) t;
+      j = (k & 0x3f) << 1;
+      m = k >> 6;
+      z = (xx.x - k * ln2_64hi) - k * ln2_64lo;
+
+      /* z is now in primary range.  */
+      t = z * z;
+      yy.y = z + (t * (half + z * t2) + (t * t) * (t3 + z * t4 + t * t5));
+      yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+      libc_fesetround (fe_val);
+    }
+
+  if (m < -1021)
+    {
+      yy.y_part[HIGH_HALF] += (m + 54) << 20;
+      retval = twom54 * yy.y;
+      if (retval < DBL_MIN)
+	{
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	}
+      return retval;
+    }
+  yy.y_part[HIGH_HALF] += m << 20;
+  return yy.y;
 }
 #ifndef __ieee754_exp
 strong_alias (__ieee754_exp, __exp_finite)
 #endif
 
+#ifndef SECTION
+# define SECTION
+#endif
+
 /* Compute e^(x+xx).  The routine also receives bound of error of previous
    calculation.  If after computing exp the error exceeds the allowed bounds,
    the routine returns a non-positive number.  Otherwise it returns the
diff --git a/sysdeps/ieee754/dbl-64/e_pow.c b/sysdeps/ieee754/dbl-64/e_pow.c
index 9f6439e..2eb8dbf 100644
--- a/sysdeps/ieee754/dbl-64/e_pow.c
+++ b/sysdeps/ieee754/dbl-64/e_pow.c
@@ -25,7 +25,7 @@
 /*             log1                                                        */
 /*             checkint                                                    */
 /* FILES NEEDED: dla.h endian.h mpa.h mydefs.h                             */
-/*               halfulp.c mpexp.c mplog.c slowexp.c slowpow.c mpa.c       */
+/*               halfulp.c mpexp.c mplog.c slowpow.c mpa.c                 */
 /*                          uexp.c  upow.c				   */
 /*               root.tbl uexp.tbl upow.tbl                                */
 /* An ultimate power routine. Given two IEEE double machine numbers y,x    */
diff --git a/sysdeps/ieee754/dbl-64/eexp.tbl b/sysdeps/ieee754/dbl-64/eexp.tbl
new file mode 100644
index 0000000..41efdc2
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/eexp.tbl
@@ -0,0 +1,255 @@
+/* EXP function tables - for use in computing double precision exponential
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+/*
+   TBL[2*j] is 2**(j/64), rounded to nearest.
+   TBL[2*j+1] is 2**(j/64) - TBL[2*j], rounded to nearest.
+   These values are used to approximate exp(x) using the formula
+   given in the comments for e_exp.c.  */
+
+static const double TBL[128] = {
+    0x1.0000000000000p+0,  0x0.0000000000000p+0,
+    0x1.02c9a3e778061p+0, -0x1.19083535b085dp-56,
+    0x1.059b0d3158574p+0,  0x1.d73e2a475b465p-55,
+    0x1.0874518759bc8p+0,  0x1.186be4bb284ffp-57,
+    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610bp-54,
+    0x1.0e3ec32d3d1a2p+0,  0x1.03a1727c57b52p-59,
+    0x1.11301d0125b51p+0, -0x1.6c51039449b3ap-54,
+    0x1.1429aaea92de0p+0, -0x1.32fbf9af1369ep-54,
+    0x1.172b83c7d517bp+0, -0x1.19041b9d78a76p-55,
+    0x1.1a35beb6fcb75p+0,  0x1.e5b4c7b4968e4p-55,
+    0x1.1d4873168b9aap+0,  0x1.e016e00a2643cp-54,
+    0x1.2063b88628cd6p+0,  0x1.dc775814a8495p-55,
+    0x1.2387a6e756238p+0,  0x1.9b07eb6c70573p-54,
+    0x1.26b4565e27cddp+0,  0x1.2bd339940e9d9p-55,
+    0x1.29e9df51fdee1p+0,  0x1.612e8afad1255p-55,
+    0x1.2d285a6e4030bp+0,  0x1.0024754db41d5p-54,
+    0x1.306fe0a31b715p+0,  0x1.6f46ad23182e4p-55,
+    0x1.33c08b26416ffp+0,  0x1.32721843659a6p-54,
+    0x1.371a7373aa9cbp+0, -0x1.63aeabf42eae2p-54,
+    0x1.3a7db34e59ff7p+0, -0x1.5e436d661f5e3p-56,
+    0x1.3dea64c123422p+0,  0x1.ada0911f09ebcp-55,
+    0x1.4160a21f72e2ap+0, -0x1.ef3691c309278p-58,
+    0x1.44e086061892dp+0,  0x1.89b7a04ef80d0p-59,
+    0x1.486a2b5c13cd0p+0,  0x1.3c1a3b69062f0p-56,
+    0x1.4bfdad5362a27p+0,  0x1.d4397afec42e2p-56,
+    0x1.4f9b2769d2ca7p+0, -0x1.4b309d25957e3p-54,
+    0x1.5342b569d4f82p+0, -0x1.07abe1db13cadp-55,
+    0x1.56f4736b527dap+0,  0x1.9bb2c011d93adp-54,
+    0x1.5ab07dd485429p+0,  0x1.6324c054647adp-54,
+    0x1.5e76f15ad2148p+0,  0x1.ba6f93080e65ep-54,
+    0x1.6247eb03a5585p+0, -0x1.383c17e40b497p-54,
+    0x1.6623882552225p+0, -0x1.bb60987591c34p-54,
+    0x1.6a09e667f3bcdp+0, -0x1.bdd3413b26456p-54,
+    0x1.6dfb23c651a2fp+0, -0x1.bbe3a683c88abp-57,
+    0x1.71f75e8ec5f74p+0, -0x1.16e4786887a99p-55,
+    0x1.75feb564267c9p+0, -0x1.0245957316dd3p-54,
+    0x1.7a11473eb0187p+0, -0x1.41577ee04992fp-55,
+    0x1.7e2f336cf4e62p+0,  0x1.05d02ba15797ep-56,
+    0x1.82589994cce13p+0, -0x1.d4c1dd41532d8p-54,
+    0x1.868d99b4492edp+0, -0x1.fc6f89bd4f6bap-54,
+    0x1.8ace5422aa0dbp+0,  0x1.6e9f156864b27p-54,
+    0x1.8f1ae99157736p+0,  0x1.5cc13a2e3976cp-55,
+    0x1.93737b0cdc5e5p+0, -0x1.75fc781b57ebcp-57,
+    0x1.97d829fde4e50p+0, -0x1.d185b7c1b85d1p-54,
+    0x1.9c49182a3f090p+0,  0x1.c7c46b071f2bep-56,
+    0x1.a0c667b5de565p+0, -0x1.359495d1cd533p-54,
+    0x1.a5503b23e255dp+0, -0x1.d2f6edb8d41e1p-54,
+    0x1.a9e6b5579fdbfp+0,  0x1.0fac90ef7fd31p-54,
+    0x1.ae89f995ad3adp+0,  0x1.7a1cd345dcc81p-54,
+    0x1.b33a2b84f15fbp+0, -0x1.2805e3084d708p-57,
+    0x1.b7f76f2fb5e47p+0, -0x1.5584f7e54ac3bp-56,
+    0x1.bcc1e904bc1d2p+0,  0x1.23dd07a2d9e84p-55,
+    0x1.c199bdd85529cp+0,  0x1.11065895048ddp-55,
+    0x1.c67f12e57d14bp+0,  0x1.2884dff483cadp-54,
+    0x1.cb720dcef9069p+0,  0x1.503cbd1e949dbp-56,
+    0x1.d072d4a07897cp+0, -0x1.cbc3743797a9cp-54,
+    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3707p-55,
+    0x1.da9e603db3285p+0,  0x1.c2300696db532p-54,
+    0x1.dfc97337b9b5fp+0, -0x1.1a5cd4f184b5cp-54,
+    0x1.e502ee78b3ff6p+0,  0x1.39e8980a9cc8fp-55,
+    0x1.ea4afa2a490dap+0, -0x1.e9c23179c2893p-54,
+    0x1.efa1bee615a27p+0,  0x1.dc7f486a4b6b0p-54,
+    0x1.f50765b6e4540p+0,  0x1.9d3e12dd8a18bp-54,
+    0x1.fa7c1819e90d8p+0,  0x1.74853f3a5931ep-55};
+
+/* For i = 0, ..., 66,
+     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+
+   For i = 67, ..., 133,
+     TBL2[2*i] is a double precision number near -(i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.  */
+
+static const double TBL2[268] = {
+    0x1.ffffffffffc82p-7,   0x1.04080ab55de32p+0,
+    0x1.fffffffffffdbp-6,   0x1.08205601127ecp+0,
+    0x1.80000000000a0p-5,   0x1.0c49236829e91p+0,
+    0x1.fffffffffff79p-5,   0x1.1082b577d34e9p+0,
+    0x1.3fffffffffffcp-4,   0x1.14cd4fc989cd6p+0,
+    0x1.8000000000060p-4,   0x1.192937074e0d4p+0,
+    0x1.c000000000061p-4,   0x1.1d96b0eff0e80p+0,
+    0x1.fffffffffffd6p-4,   0x1.2216045b6f5cap+0,
+    0x1.1ffffffffff58p-3,   0x1.26a7793f6014cp+0,
+    0x1.3ffffffffff75p-3,   0x1.2b4b58b372c65p+0,
+    0x1.5ffffffffff00p-3,   0x1.3001ecf601ad1p+0,
+    0x1.8000000000020p-3,   0x1.34cb8170b583ap+0,
+    0x1.9ffffffffa629p-3,   0x1.39a862bd3b344p+0,
+    0x1.c00000000000fp-3,   0x1.3e98deaa11dcep+0,
+    0x1.e00000000007fp-3,   0x1.439d443f5f16dp+0,
+    0x1.0000000000072p-2,   0x1.48b5e3c3e81abp+0,
+    0x1.0fffffffffecap-2,   0x1.4de30ec211dfbp+0,
+    0x1.1ffffffffff8fp-2,   0x1.5325180cfacd2p+0,
+    0x1.300000000003bp-2,   0x1.587c53c5a7b04p+0,
+    0x1.4000000000034p-2,   0x1.5de9176046007p+0,
+    0x1.4ffffffffff89p-2,   0x1.636bb9a98322fp+0,
+    0x1.5ffffffffffe7p-2,   0x1.690492cbf942ap+0,
+    0x1.6ffffffffff78p-2,   0x1.6eb3fc55b1e45p+0,
+    0x1.7ffffffffff65p-2,   0x1.747a513dbef32p+0,
+    0x1.8ffffffffffd5p-2,   0x1.7a57ede9ea22ep+0,
+    0x1.9ffffffffff6ep-2,   0x1.804d30347b50fp+0,
+    0x1.affffffffffc3p-2,   0x1.865a7772164aep+0,
+    0x1.c000000000053p-2,   0x1.8c802477b0030p+0,
+    0x1.d00000000004dp-2,   0x1.92be99a09bf1ep+0,
+    0x1.e000000000096p-2,   0x1.99163ad4b1e08p+0,
+    0x1.efffffffffefap-2,   0x1.9f876d8e8c4fcp+0,
+    0x1.fffffffffffd0p-2,   0x1.a61298e1e0688p+0,
+    0x1.0800000000002p-1,   0x1.acb82581eee56p+0,
+    0x1.100000000001fp-1,   0x1.b3787dc80f979p+0,
+    0x1.17ffffffffff8p-1,   0x1.ba540dba56e4fp+0,
+    0x1.1fffffffffffap-1,   0x1.c14b431256441p+0,
+    0x1.27fffffffffc4p-1,   0x1.c85e8d43f7c9bp+0,
+    0x1.2fffffffffffdp-1,   0x1.cf8e5d84758a6p+0,
+    0x1.380000000001fp-1,   0x1.d6db26d16cd84p+0,
+    0x1.3ffffffffffd8p-1,   0x1.de455df80e39bp+0,
+    0x1.4800000000052p-1,   0x1.e5cd799c6a59cp+0,
+    0x1.4ffffffffffc8p-1,   0x1.ed73f240dc10cp+0,
+    0x1.5800000000013p-1,   0x1.f539424d90f71p+0,
+    0x1.5ffffffffffbcp-1,   0x1.fd1de6182f885p+0,
+    0x1.680000000002dp-1,   0x1.02912df5ce741p+1,
+    0x1.7000000000040p-1,   0x1.06a39207f0a2ap+1,
+    0x1.780000000004fp-1,   0x1.0ac660691652ap+1,
+    0x1.7ffffffffff6fp-1,   0x1.0ef9db467dcabp+1,
+    0x1.87fffffffffe5p-1,   0x1.133e45d82e943p+1,
+    0x1.9000000000035p-1,   0x1.1793e4652cc6dp+1,
+    0x1.97fffffffffb3p-1,   0x1.1bfafc47bda48p+1,
+    0x1.a000000000000p-1,   0x1.2073d3f1bd518p+1,
+    0x1.a80000000004ap-1,   0x1.24feb2f105ce2p+1,
+    0x1.affffffffffedp-1,   0x1.299be1f3e7f11p+1,
+    0x1.b7ffffffffffbp-1,   0x1.2e4baacdb6611p+1,
+    0x1.c00000000001dp-1,   0x1.330e587b62b39p+1,
+    0x1.c800000000079p-1,   0x1.37e437282d538p+1,
+    0x1.cffffffffff51p-1,   0x1.3ccd943268248p+1,
+    0x1.d7fffffffff74p-1,   0x1.41cabe304cadcp+1,
+    0x1.e000000000011p-1,   0x1.46dc04f4e5343p+1,
+    0x1.e80000000001ep-1,   0x1.4c01b9950a124p+1,
+    0x1.effffffffff9ep-1,   0x1.513c2e6c73196p+1,
+    0x1.f7fffffffffedp-1,   0x1.568bb722dd586p+1,
+    0x1.0000000000034p+0,   0x1.5bf0a8b1457b0p+1,
+    0x1.03fffffffffe2p+0,   0x1.616b5967376dfp+1,
+    0x1.07fffffffff4bp+0,   0x1.66fc20f0337a9p+1,
+    0x1.0bffffffffffdp+0,   0x1.6ca35859290f5p+1,
+   -0x1.fffffffffffe4p-7,   0x1.f80feabfeefa5p-1,
+   -0x1.ffffffffffb0bp-6,   0x1.f03f56a88b5fep-1,
+   -0x1.7ffffffffffa7p-5,   0x1.e88dc6afecfc5p-1,
+   -0x1.ffffffffffea8p-5,   0x1.e0fabfbc702b8p-1,
+   -0x1.3ffffffffffb3p-4,   0x1.d985c89d041acp-1,
+   -0x1.7ffffffffffe3p-4,   0x1.d22e6a0197c06p-1,
+   -0x1.bffffffffff9ap-4,   0x1.caf42e73a4c89p-1,
+   -0x1.fffffffffff98p-4,   0x1.c3d6a24ed822dp-1,
+   -0x1.1ffffffffffe9p-3,   0x1.bcd553b9d7b67p-1,
+   -0x1.3ffffffffffe0p-3,   0x1.b5efd29f24c2dp-1,
+   -0x1.5fffffffff553p-3,   0x1.af25b0a61a9f4p-1,
+   -0x1.7ffffffffff8bp-3,   0x1.a876812c08794p-1,
+   -0x1.9fffffffffe51p-3,   0x1.a1e1d93d68828p-1,
+   -0x1.bffffffffff6ep-3,   0x1.9b674f8f2f3f5p-1,
+   -0x1.dffffffffff7fp-3,   0x1.95067c7837a0cp-1,
+   -0x1.fffffffffff7ap-3,   0x1.8ebef9eac8225p-1,
+   -0x1.0fffffffffffep-2,   0x1.8890636e31f55p-1,
+   -0x1.1ffffffffff41p-2,   0x1.827a56188975ep-1,
+   -0x1.2ffffffffffbap-2,   0x1.7c7c708877656p-1,
+   -0x1.3fffffffffff8p-2,   0x1.769652df22f81p-1,
+   -0x1.4ffffffffff90p-2,   0x1.70c79eba33c2fp-1,
+   -0x1.5ffffffffffdbp-2,   0x1.6b0ff72deb8aap-1,
+   -0x1.6ffffffffff9ap-2,   0x1.656f00bf5798ep-1,
+   -0x1.7ffffffffff9fp-2,   0x1.5fe4615e98eb0p-1,
+   -0x1.8ffffffffffeep-2,   0x1.5a6fc061433cep-1,
+   -0x1.9fffffffffc4ap-2,   0x1.5510c67cd26cdp-1,
+   -0x1.affffffffff30p-2,   0x1.4fc71dc13566bp-1,
+   -0x1.bfffffffffff0p-2,   0x1.4a9271936fd0ep-1,
+   -0x1.cfffffffffff3p-2,   0x1.45726ea84fb8cp-1,
+   -0x1.dfffffffffff3p-2,   0x1.4066c2ff3912bp-1,
+   -0x1.effffffffff80p-2,   0x1.3b6f1ddd05ab9p-1,
+   -0x1.fffffffffffdfp-2,   0x1.368b2fc6f9614p-1,
+   -0x1.0800000000000p-1,   0x1.31baaa7dca843p-1,
+   -0x1.0ffffffffffa4p-1,   0x1.2cfd40f8bdce4p-1,
+   -0x1.17fffffffff0ap-1,   0x1.2852a760d5ce7p-1,
+   -0x1.2000000000000p-1,   0x1.23ba930c1568bp-1,
+   -0x1.27fffffffffbbp-1,   0x1.1f34ba78d568dp-1,
+   -0x1.2fffffffffe32p-1,   0x1.1ac0d5492c1dbp-1,
+   -0x1.37ffffffff042p-1,   0x1.165e9c3e67ef2p-1,
+   -0x1.3ffffffffff77p-1,   0x1.120dc93499431p-1,
+   -0x1.47fffffffff6bp-1,   0x1.0dce171e34ecep-1,
+   -0x1.4fffffffffff1p-1,   0x1.099f41ffbe588p-1,
+   -0x1.57ffffffffe02p-1,   0x1.058106eb8a7aep-1,
+   -0x1.5ffffffffffe5p-1,   0x1.017323fd9002ep-1,
+   -0x1.67fffffffffb0p-1,   0x1.faeab0ae9386cp-2,
+   -0x1.6ffffffffffb2p-1,   0x1.f30ec837503d7p-2,
+   -0x1.77fffffffff7fp-1,   0x1.eb5210d627133p-2,
+   -0x1.7ffffffffffe8p-1,   0x1.e3b40ebefcd95p-2,
+   -0x1.87fffffffffc8p-1,   0x1.dc3448110dae2p-2,
+   -0x1.8fffffffffb30p-1,   0x1.d4d244cf4ef06p-2,
+   -0x1.97fffffffffefp-1,   0x1.cd8d8ed8ee395p-2,
+   -0x1.9ffffffffffa7p-1,   0x1.c665b1e1f1e5cp-2,
+   -0x1.a7fffffffffdcp-1,   0x1.bf5a3b6bf18d6p-2,
+   -0x1.affffffffff95p-1,   0x1.b86ababeef93bp-2,
+   -0x1.b7fffffffffcbp-1,   0x1.b196c0e24d256p-2,
+   -0x1.bffffffffff32p-1,   0x1.aadde095dadf7p-2,
+   -0x1.c7fffffffff6ap-1,   0x1.a43fae4b047c9p-2,
+   -0x1.cffffffffffb6p-1,   0x1.9dbbc01e182a4p-2,
+   -0x1.d7fffffffffcap-1,   0x1.9751adcfa81ecp-2,
+   -0x1.dffffffffffcdp-1,   0x1.910110be0699ep-2,
+   -0x1.e7ffffffffffbp-1,   0x1.8ac983dedbc69p-2,
+   -0x1.effffffffff88p-1,   0x1.84aaa3b8d51a9p-2,
+   -0x1.f7fffffffffbbp-1,   0x1.7ea40e5d6d92ep-2,
+   -0x1.fffffffffffdbp-1,   0x1.78b56362cef53p-2,
+   -0x1.03fffffffff00p+0,   0x1.72de43ddcb1f2p-2,
+   -0x1.07ffffffffe6fp+0,   0x1.6d1e525bed085p-2,
+   -0x1.0bfffffffffd6p+0,   0x1.677532dda1c57p-2};
+
+static const double
+/* invln2_64 = 64/ln2 - used to scale x to primary range. */
+  invln2_64 = 0x1.71547652b82fep+6,
+/* ln2_64hi = high 32 bits of log(2.)/64. */
+  ln2_64hi = 0x1.62e42fee00000p-7, 
+/* ln2_64lo = remainder bits for log(2.)/64 - ln2_64hi. */
+  ln2_64lo = 0x1.a39ef35793c76p-39,
+/* t2-t5 terms used for polynomial computation.  */
+  t2 = 0x1.5555555555555p-3, /* 1.6666666666666665741e-1 */
+  t3 = 0x1.5555555555555p-5, /* 4.1666666666666664354e-2 */
+  t4 = 0x1.1111111111111p-7, /* 8.3333333333333332177e-3 */
+  t5 = 0x1.6c16c16c16c17p-10, /* 1.3888888888888719040e-3 */
+/* Maximum value for x to not overflow.  */
+  threshold1 = 0x1.62e42fefa39efp+9, /* 7.09782712893383973096e+02 */
+/* Maximum value for -x to not underflow to zero in FE_TONEAREST mode.  */
+  threshold2 = 0x1.74910d52d3051p+9, /* 7.45133219101941108420e+02 */
+/* Scaling factor used when result near zero.  */
+  twom54 = 0x1.0000000000000p-54; /* 5.55111512312578270212e-17 */
diff --git a/sysdeps/ieee754/dbl-64/slowexp.c b/sysdeps/ieee754/dbl-64/slowexp.c
deleted file mode 100644
index e8fa2e2..0000000
--- a/sysdeps/ieee754/dbl-64/slowexp.c
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-/**************************************************************************/
-/*  MODULE_NAME:slowexp.c                                                 */
-/*                                                                        */
-/*  FUNCTION:slowexp                                                      */
-/*                                                                        */
-/*  FILES NEEDED:mpa.h                                                    */
-/*               mpa.c mpexp.c                                            */
-/*                                                                        */
-/*Converting from double precision to Multi-precision and calculating     */
-/* e^x                                                                    */
-/**************************************************************************/
-#include <math_private.h>
-
-#include <stap-probe.h>
-
-#ifndef USE_LONG_DOUBLE_FOR_MP
-# include "mpa.h"
-void __mpexp (mp_no *x, mp_no *y, int p);
-#endif
-
-#ifndef SECTION
-# define SECTION
-#endif
-
-/*Converting from double precision to Multi-precision and calculating  e^x */
-double
-SECTION
-__slowexp (double x)
-{
-#ifndef USE_LONG_DOUBLE_FOR_MP
-  double w, z, res, eps = 3.0e-26;
-  int p;
-  mp_no mpx, mpy, mpz, mpw, mpeps, mpcor;
-
-  /* Use the multiple precision __MPEXP function to compute the exponential
-     First at 144 bits and if it is not accurate enough, at 768 bits.  */
-  p = 6;
-  __dbl_mp (x, &mpx, p);
-  __mpexp (&mpx, &mpy, p);
-  __dbl_mp (eps, &mpeps, p);
-  __mul (&mpeps, &mpy, &mpcor, p);
-  __add (&mpy, &mpcor, &mpw, p);
-  __sub (&mpy, &mpcor, &mpz, p);
-  __mp_dbl (&mpw, &w, p);
-  __mp_dbl (&mpz, &z, p);
-  if (w == z)
-    {
-      /* Track how often we get to the slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p6, 2, &x, &w);
-      return w;
-    }
-  else
-    {
-      p = 32;
-      __dbl_mp (x, &mpx, p);
-      __mpexp (&mpx, &mpy, p);
-      __mp_dbl (&mpy, &res, p);
-
-      /* Track how often we get to the uber-slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p32, 2, &x, &res);
-      return res;
-    }
-#else
-  return (double) __ieee754_expl((long double)x);
-#endif
-}
diff --git a/sysdeps/powerpc/power4/fpu/Makefile b/sysdeps/powerpc/power4/fpu/Makefile
index e17d32f..ded9976 100644
--- a/sysdeps/powerpc/power4/fpu/Makefile
+++ b/sysdeps/powerpc/power4/fpu/Makefile
@@ -3,5 +3,4 @@
 ifeq ($(subdir),math)
 CFLAGS-mpa.c += --param max-unroll-times=4 -funroll-loops -fpeel-loops
 CPPFLAGS-slowpow.c += -DUSE_LONG_DOUBLE_FOR_MP=1
-CPPFLAGS-slowexp.c += -DUSE_LONG_DOUBLE_FOR_MP=1
 endif
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index cab84bf..9d8fa1a 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -10,7 +10,7 @@ libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
 
 libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
 			e_asin-fma e_atan2-fma s_sin-fma s_tan-fma \
-			mplog-fma mpa-fma slowexp-fma slowpow-fma \
+			mplog-fma mpa-fma slowpow-fma \
 			sincos32-fma doasin-fma dosincos-fma \
 			halfulp-fma mpexp-fma \
 			mpatan2-fma mpatan-fma mpsqrt-fma mptan-fma
@@ -32,7 +32,6 @@ CFLAGS-mpsqrt-fma.c = -mfma -mavx2
 CFLAGS-mptan-fma.c = -mfma -mavx2
 CFLAGS-s_atan-fma.c = -mfma -mavx2
 CFLAGS-sincos32-fma.c = -mfma -mavx2
-CFLAGS-slowexp-fma.c = -mfma -mavx2
 CFLAGS-slowpow-fma.c = -mfma -mavx2
 CFLAGS-s_sin-fma.c = -mfma -mavx2
 CFLAGS-s_tan-fma.c = -mfma -mavx2
@@ -51,7 +50,7 @@ CFLAGS-s_sinf-fma.c = -mfma -mavx2
 
 libm-sysdep_routines += e_exp-fma4 e_log-fma4 e_pow-fma4 s_atan-fma4 \
 			e_asin-fma4 e_atan2-fma4 s_sin-fma4 s_tan-fma4 \
-			mplog-fma4 mpa-fma4 slowexp-fma4 slowpow-fma4 \
+			mplog-fma4 mpa-fma4 slowpow-fma4 \
 			sincos32-fma4 doasin-fma4 dosincos-fma4 \
 			halfulp-fma4 mpexp-fma4 \
 			mpatan2-fma4 mpatan-fma4 mpsqrt-fma4 mptan-fma4
@@ -73,14 +72,13 @@ CFLAGS-mpsqrt-fma4.c = -mfma4
 CFLAGS-mptan-fma4.c = -mfma4
 CFLAGS-s_atan-fma4.c = -mfma4
 CFLAGS-sincos32-fma4.c = -mfma4
-CFLAGS-slowexp-fma4.c = -mfma4
 CFLAGS-slowpow-fma4.c = -mfma4
 CFLAGS-s_sin-fma4.c = -mfma4
 CFLAGS-s_tan-fma4.c = -mfma4
 
 libm-sysdep_routines += e_exp-avx e_log-avx s_atan-avx \
 			e_atan2-avx s_sin-avx s_tan-avx \
-			mplog-avx mpa-avx slowexp-avx \
+			mplog-avx mpa-avx \
 			mpexp-avx
 
 CFLAGS-e_atan2-avx.c = -msse2avx -DSSE2AVX
@@ -91,7 +89,6 @@ CFLAGS-mpexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-mplog-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_atan-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_sin-avx.c = -msse2avx -DSSE2AVX
-CFLAGS-slowexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_tan-avx.c = -msse2avx -DSSE2AVX
 endif
 
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
index ee5dd6d..afd9174 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_avx
 #define __exp1 __exp1_avx
-#define __slowexp __slowexp_avx
 #define SECTION __attribute__ ((section (".text.avx")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
index 6e0fdb7..765b1b9 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma
 #define __exp1 __exp1_fma
-#define __slowexp __slowexp_fma
 #define SECTION __attribute__ ((section (".text.fma")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
index ae6eb67..9ac7aca 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma4
 #define __exp1 __exp1_fma4
-#define __slowexp __slowexp_fma4
 #define SECTION __attribute__ ((section (".text.fma4")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c b/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
deleted file mode 100644
index d01c6d7..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_avx
-#define __add __add_avx
-#define __dbl_mp __dbl_mp_avx
-#define __mpexp __mpexp_avx
-#define __mul __mul_avx
-#define __sub __sub_avx
-#define SECTION __attribute__ ((section (".text.avx")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
deleted file mode 100644
index 6fffca1..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma
-#define __add __add_fma
-#define __dbl_mp __dbl_mp_fma
-#define __mpexp __mpexp_fma
-#define __mul __mul_fma
-#define __sub __sub_fma
-#define SECTION __attribute__ ((section (".text.fma")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
deleted file mode 100644
index 3bcde84..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma4
-#define __add __add_fma4
-#define __dbl_mp __dbl_mp_fma4
-#define __mpexp __mpexp_fma4
-#define __mul __mul_fma4
-#define __sub __sub_fma4
-#define SECTION __attribute__ ((section (".text.fma4")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
-- 
1.7.1


* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-12-08 23:08 [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86 Patrick McGehearty
@ 2017-12-11  8:14 ` Siddhesh Poyarekar
  2017-12-11 17:04   ` Patrick McGehearty
  2017-12-14  1:28 ` Joseph Myers
  1 sibling, 1 reply; 44+ messages in thread
From: Siddhesh Poyarekar @ 2017-12-11  8:14 UTC (permalink / raw)
  To: Patrick McGehearty, libc-alpha

On Saturday 09 December 2017 04:33 AM, Patrick McGehearty wrote:
> +/*  IBM exp(x) replaced by following exp(x) in 2017. IBM exp1(x,xx) remains.  */
> +/* exp(x)
> +   Hybrid algorithm of Peter Tang's Table driven method (for large
> +   arguments) and an accurate table (for small arguments).
> +   Written by K.C. Ng, November 1988.
> +   Revised by Patrick McGehearty, Nov 2017 to use j/64 instead of j/32
> +   Method (large arguments):
> +	1. Argument Reduction: given the input x, find r and integer k
> +	   and j such that
> +	             x = (k+j/64)*(ln2) + r,  |r| <= (1/128)*ln2
> +
> +	2. exp(x) = 2^k * (2^(j/64) + 2^(j/64)*expm1(r))
> +	   a. expm1(r) is approximated by a polynomial:
> +	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
> +	      Here t1 = 1/2 exactly.
> +	   b. 2^(j/64) is represented to twice double precision
> +	      as TBL[2j]+TBL[2j+1].
> +
> +   Note: If divide were fast enough, we could use another approximation
> +	 in 2.a:
> +	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
> +	      (for the same t1 and t2 as above)
> +
> +   Special cases:
> +	exp(INF) is INF, exp(NaN) is NaN;
> +	exp(-INF)=  0;
> +	for finite argument, only exp(0)=1 is exact.
> +
> +   Accuracy:
> +	According to an error analysis, the error is always less than
> +	an ulp (unit in the last place).  The largest errors observed
> +	are less than 0.55 ulp for normal results and less than 0.75 ulp
> +	for subnormal results.
> +
> +   Misc. info.
> +	For IEEE double
> +		if x >  7.09782712893383973096e+02 then exp(x) overflow
> +		if x < -7.45133219101941108420e+02 then exp(x) underflow.  */
> +

Are you planning to work on the log implementation as well?

Siddhesh
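
For illustration, the method in the quoted comment can be sketched as follows for the 64-interval variant. This is a minimal sketch, not the patch's code. The constants are the ones given in the version 8 patch, but `exp_sketch` and `series_exp` are hypothetical names, the table is filled by a plain Taylor sum rather than stored as correctly rounded hi/lo pairs, and the overflow, underflow, NaN and rounding-mode handling is omitted. Its accuracy is therefore not expected to match the figures quoted in the comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Constants as given in the version 8 patch (64 intervals).  */
static const double invln2_64 = 0x1.71547652b82fep+6; /* 64/ln2 */
static const double ln2_64hi = 0x1.62e42fee00000p-7;  /* high bits of ln2/64 */
static const double ln2_64lo = 0x1.a39ef35793c76p-39; /* ln2/64 - ln2_64hi */
static const double t1 = 0.5;                         /* 1/2! */
static const double t2 = 0x1.5555555555555p-3;        /* 1/3! */
static const double t3 = 0x1.5555555555555p-5;        /* 1/4! */
static const double t4 = 0x1.1111111111111p-7;        /* 1/5! */
static const double t5 = 0x1.6c16c16c16c17p-10;       /* 1/6! */

/* Assumption: a plain Taylor sum is accurate enough to fill the table
   for this sketch.  The patch uses a precomputed table instead.  */
static double
series_exp (double r)
{
  double term = 1.0, sum = 1.0;
  for (int i = 1; i <= 30; i++)
    {
      term *= r / i;
      sum += term;
    }
  return sum;
}

/* Hypothetical name; this is not the patch's __ieee754_exp.  */
static double
exp_sketch (double x)
{
  static double tbl[64];  /* tbl[j] ~ 2^(j/64), one double per entry */
  static int tbl_ready;
  if (!tbl_ready)
    {
      for (int j = 0; j < 64; j++)
        tbl[j] = series_exp (j * (ln2_64hi + ln2_64lo));
      tbl_ready = 1;
    }

  /* 1. Argument reduction: x = (k + j/64)*ln2 + r.  */
  double t = x * invln2_64;
  long n = (long) (t < 0 ? t - 0.5 : t + 0.5); /* nearest integer */
  long j = ((n % 64) + 64) % 64;
  long k = (n - j) / 64;
  assert (k > -1022 && k < 1023); /* sketch: normal-range results only */
  double r = (x - n * ln2_64hi) - n * ln2_64lo;

  /* 2a. expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6.  */
  double p = r + r * r * (t1 + r * (t2 + r * (t3 + r * (t4 + r * t5))));

  /* 2b. exp(x) = 2^k * (2^(j/64) + 2^(j/64)*expm1(r)).  */
  double y = tbl[j] + tbl[j] * p;
  uint64_t bits = (uint64_t) (k + 1023) << 52; /* bit pattern of 2^k */
  double scale;
  memcpy (&scale, &bits, sizeof scale);
  return y * scale;
}
```

Splitting ln2/64 into ln2_64hi and ln2_64lo keeps the product n * ln2_64hi exact for the small integers n that occur here, which is what lets the reduced argument r stay accurate.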


* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-12-11  8:14 ` Siddhesh Poyarekar
@ 2017-12-11 17:04   ` Patrick McGehearty
  2017-12-11 17:53     ` Siddhesh Poyarekar
  0 siblings, 1 reply; 44+ messages in thread
From: Patrick McGehearty @ 2017-12-11 17:04 UTC (permalink / raw)
  To: libc-alpha

On 12/11/2017 2:14 AM, Siddhesh Poyarekar wrote:
> On Saturday 09 December 2017 04:33 AM, Patrick McGehearty wrote:
>> +/*  IBM exp(x) replaced by following exp(x) in 2017. IBM exp1(x,xx) remains.  */
>> +/* exp(x)
>> +   Hybrid algorithm of Peter Tang's Table driven method (for large
>> +   arguments) and an accurate table (for small arguments).
>> +   Written by K.C. Ng, November 1988.
>> +   Revised by Patrick McGehearty, Nov 2017 to use j/64 instead of j/32
>> +   Method (large arguments):
>> +	1. Argument Reduction: given the input x, find r and integer k
>> +	   and j such that
>> +	             x = (k+j/64)*(ln2) + r,  |r| <= (1/128)*ln2
>> +
>> +	2. exp(x) = 2^k * (2^(j/64) + 2^(j/64)*expm1(r))
>> +	   a. expm1(r) is approximated by a polynomial:
>> +	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
>> +	      Here t1 = 1/2 exactly.
>> +	   b. 2^(j/64) is represented to twice double precision
>> +	      as TBL[2j]+TBL[2j+1].
>> +
>> +   Note: If divide were fast enough, we could use another approximation
>> +	 in 2.a:
>> +	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
>> +	      (for the same t1 and t2 as above)
>> +
>> +   Special cases:
>> +	exp(INF) is INF, exp(NaN) is NaN;
>> +	exp(-INF)=  0;
>> +	for finite argument, only exp(0)=1 is exact.
>> +
>> +   Accuracy:
>> +	According to an error analysis, the error is always less than
>> +	an ulp (unit in the last place).  The largest errors observed
>> +	are less than 0.55 ulp for normal results and less than 0.75 ulp
>> +	for subnormal results.
>> +
>> +   Misc. info.
>> +	For IEEE double
>> +		if x >  7.09782712893383973096e+02 then exp(x) overflow
>> +		if x < -7.45133219101941108420e+02 then exp(x) underflow.  */
>> +
> Are you planning to work on the log implementation as well?
>
> Siddhesh

log, log10, pow, cbrt are on my short list of functions to investigate
as these all show significant performance advantage (1.8x or greater)
using the Solaris Studio libm functions vs the Linux libm functions in
the preliminary testing I did months ago. I intend to take these one
at a time, first with a trial port and extensive accuracy and perf tests.

Assuming the performance advantage applies across multiple platforms
and accuracy does not suffer to an unacceptable degree, I will
propose each in turn for patching. "log" is my next target, likely
with log10 following close behind. I'll also look at the 32 bit functions
to see if they offer similar opportunities, but I haven't written/run those
tests yet. I don't have plans to work on 80 bit or 128 bit (long double)
functions at this time.

As always, the above is not a formal commitment as my management
may redirect efforts to meet other corporate goals.
My background project right now is supporting Vladimir Mezentsev's
work on improving accuracy and range of complex divide.
By range, I mean the range of input values which currently cause
overflow/underflow but don't need to with appropriate scaling.

- patrick


* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-12-11 17:04   ` Patrick McGehearty
@ 2017-12-11 17:53     ` Siddhesh Poyarekar
  0 siblings, 0 replies; 44+ messages in thread
From: Siddhesh Poyarekar @ 2017-12-11 17:53 UTC (permalink / raw)
  To: Patrick McGehearty, libc-alpha

On Monday 11 December 2017 10:29 PM, Patrick McGehearty wrote:
> log, log10, pow, cbrt are on my short list of functions to investigate
> as these all show significant performance advantage (1.8x or greater)
> using the Solaris Studio libm functions vs the Linux libm functions in
> the preliminary testing I did months ago. I intend to take these one
> at a time, first with a trial port and extensive accuracy and perf tests.
> 
> Assuming the performance advantage applies across multiple platforms
> and accuracy does not suffer to an unacceptable degree, I will
> propose each in turn for patching. "log" is my next target, likely
> with log10 following close behind. I'll also look at the 32 bit functions
> to see if they offer similar opportunities, but I haven't written/run those
> tests yet. I don't have plans to work on 80 bit or 128 bit (long double)
> functions at this time.
> 
> As always, the above is not a formal commitment as my management
> may redirect efforts to meet other corporate goals.
> My background project right now is supporting Vladimir Mezentsev's
> work on improving accuracy and range of complex divide.
> By range, I mean the range of input values which currently cause
> overflow/underflow but don't need to with appropriate scaling.

Sure, I won't hold you to it :)

Thanks,
Siddhesh


* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-12-08 23:08 [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86 Patrick McGehearty
  2017-12-11  8:14 ` Siddhesh Poyarekar
@ 2017-12-14  1:28 ` Joseph Myers
  2017-12-18 20:11   ` Patrick McGehearty
  1 sibling, 1 reply; 44+ messages in thread
From: Joseph Myers @ 2017-12-14  1:28 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Fri, 8 Dec 2017, Patrick McGehearty wrote:

> Revised constants t2, t3, t4, t5 to better match values of 1/n factorial.

To expand on the logic for such a change:

If the values were previously not 1/n! presumably they were coefficients 
in some form of minimax approximation minimising the maximum error 
(however measured) in the interval used in the original implementation.

The maximum error from just using 1/n! would be at the endpoints of the 
interval (whereas a minimax approximation using an nth degree polynomial 
would have equal maximum errors with alternating signs at n+2 points - 
increasing some errors closer to 0 to decrease those at the endpoints).

You've changed the code to use a narrower interval.  Thus, the original 
minimax approximation is no longer optimal for the new interval, and it's 
quite plausible that the maximum error from using 1/n! is smaller when you 
restrict to the new interval.

This patch version is OK.
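
To illustrate the endpoint argument above: for the polynomial r + r^2/2! + ... + r^6/6!, the first neglected Taylor term is r^7/7!, and it is largest at the ends of the reduction interval. The sketch below evaluates that term at the endpoint |r| = ln2/64 (32 intervals) and at |r| = ln2/128 (64 intervals). The names `taylor_tail` and `half_width` are hypothetical, and the sketch only looks at the leading truncation term, not at rounding error or at any minimax coefficients.

```c
#include <assert.h>

static const double ln2 = 0x1.62e42fefa39efp-1;

/* Largest |r| after reduction when [0, ln2) is split into INTERVALS
   pieces: half of one piece.  */
static double
half_width (int intervals)
{
  return ln2 / (2.0 * intervals);
}

/* First neglected term of the degree-6 Taylor polynomial: r^7/7!.  */
static double
taylor_tail (double r)
{
  double p = r;
  for (int i = 0; i < 6; i++)
    p *= r;
  return p / 5040.0;
}
```

Halving the interval divides this term by 2^7 = 128: it is roughly 3.5e-18 at ln2/64 and roughly 2.7e-20 at ln2/128, which is why plain 1/n! coefficients become more competitive on the narrower interval.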

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-12-14  1:28 ` Joseph Myers
@ 2017-12-18 20:11   ` Patrick McGehearty
  0 siblings, 0 replies; 44+ messages in thread
From: Patrick McGehearty @ 2017-12-18 20:11 UTC (permalink / raw)
  To: libc-alpha

On 12/13/2017 7:28 PM, Joseph Myers wrote:
> On Fri, 8 Dec 2017, Patrick McGehearty wrote:
>
>> Revised constants t2, t3, t4, t5 to better match values of 1/n factorial.
> To expand on the logic for such a change:
>
> If the values were previously not 1/n! presumably they were coefficients
> in some form of minimax approximation minimising the maximum error
> (however measured) in the interval used in the original implementation.
>
> The maximum error from just using 1/n! would be at the endpoints of the
> interval (whereas a minimax approximation using an nth degree polynomial
> would have equal maximum errors with alternating signs at n+2 points -
> increasing some errors closer to 0 to decrease those at the endpoints).
>
> You've changed the code to use a narrower interval.  Thus, the original
> minimax approximation is no longer optimal for the new interval, and it's
> quite plausible that the maximum error from using 1/n! is smaller when you
> restrict to the new interval.
>
> This patch version is OK.
>

You are correct that the original values for t2-t5 give better results
for the original 64 table entries.

I investigated your suggestion by comparing the prior values for t2-t5
with the 1/n! values on the prior interval size. My test of 10 million
values in 4 rounding modes showed substantially better results with
the prior values. All estimates of error rates are based on 10 million
test values run with each of the usual 4 rounding modes. Differences in
error rates between rounding modes were less than 1%.

Original Studio values        : 29.5 of 10,000 tests off by 1ulp
1/n! for t2-t5                : 39.2 of 10,000 tests off by 1ulp
org studio t2-t5/128 intervals: 16.3 of 10,000 tests off by 1ulp
1/n! for t2-t5, 128 intervals : 16.1 of 10,000 tests off by 1ulp

That suggests we might be able to further reduce the error rate
either by refining the values for t2-t5 or by increasing the
interval table to use 256 values instead of 128.
I don't have any strong basis for making further tradeoffs
of possible small accuracy gains vs perf costs. I suspect
such effort is beyond the current accuracy expectations
for most users of Linux libm, but that issue might be
revisited at some time in the future as part of a total
review of libm accuracy goals.

- patrick


* [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
@ 2017-12-29 23:42 Patrick McGehearty
  2018-01-01  1:36 ` Joseph Myers
  0 siblings, 1 reply; 44+ messages in thread
From: Patrick McGehearty @ 2017-12-29 23:42 UTC (permalink / raw)
  To: libc-alpha

Version 9 of proposed patch.

Replaced get_rounding_mode and libc_fesetround() with SET_RESTORE_ROUND
to avoid Intel rounding mode issue which showed as test failures in
tgamma_upward. Adds noticeable overhead for platforms that incur
significant cost when rounding mode is already FE_TONEAREST. Added
SET_RESTORE_ROUND to two more cases which resolved 1 ulp rounding
errors for cexp().

Expanded the scaling table from 64 entries to 128 entries, renaming
invln2_64 to invln2_256 as well as ln2_64hi and ln2_64lo to ln2_256hi
and ln2_256lo. That reduces the 1 ulp error rate per 1000 values from
1.6 to 0.6.

Even with all changes, performance gain is still quite dramatic.
Adding libc_fegetround, libc_fegetroundf and libc_fegetroundl to
the math_private and fenv_private.h macros would allow using
libc_fegetround and libc_fesetround as a future performance enhancement.

Version 8 of proposed patch.

Renamed ln2_32hi2 and ln2_32lo2 to be ln2_64hi and ln2_64lo.
Revised comments to more accurately describe these constants.
Revised constants t2, t3, t4, t5 to better match values of 1/n factorial.
Change eliminated 1 ulp error in 942 tests out of 40 million values tested.

Version 7 of proposed patch.

Fixed formatting issue in sysdeps/ieee754/dbl-64/e_exp.c

Version 6 of proposed patch.

Fixed error in patch revision.
Cleaned up formatting of return () and location of '+' for line breaks.
Fixed comments in eexp.tbl. Adjusted 3 values in eexp.tbl to be correctly
rounded in ulp as computed by quad precision.

Modified e_exp.c and eexp.tbl to use table of 64 intervals instead of
32 intervals for computing exp(x). That change reduced the differences
from the prior ieee754 exp(x) to 16 in 10,000 from 29 in 10,000. Also
reduced the make check differences for exp to 1 from 3. No observed
change in performance for using the larger table on either x86 or Sparc.

Version 5 of proposed patch.

Cleaned up formatting of comments and braces.
Returned to single patch for submission.

Version 4 of proposed patch.

New comments revised to use GNU standard comment formatting.
Limited comment added in eexp.tbl for TBL[]. The original src
used for porting to Linux did not have a comment about TBL[].
The new comment is limited to the current worker's level of
understanding.

The (-xx.x > threshold2) case is changed to return force_underflow.
For FE_TONEAREST, tiny*tiny will always be zero but for
FE_UPWARD, it will be the smallest representable value.

That change caused no change in the math test results for Sparc or x86.

Version 3 changes

All hex constants in version 2 replaced with C99 double hex constants,
allowing Big Endian and Little Endian versions to be merged.
Only e_exp.c and eexp.tbl changed from version 2.
Minor changes in performance results due to system noise.
No other changes from version 2.

Version 2 of proposed patch.
Revised copyright notice and formatting issues.
Removed slowexp.c and related references.
Replaced tables of double constants with hex constants, taking special
  attention to correctly handle little endian and big endian versions.
  Using hex initialization also required changing variables to be declared
  as unions.  Tables moved from e_exp.c to sysdeps/ieee754/dbl-64/eexp.tbl.
Replaced __fegetround(), __fesetround() with get_rounding_mode and
  libc_fesetround().
Removed use of "small". "inexact mode" now ignored.
Retested and rebenchmarked on sparc and x86 with the above changes.

These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Performance gain is typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc and x86,
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   395    5173     144
min     399    54      15      13
mean   5317   200    1349      23

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from a value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.
Glibc correctness tests for exp() and expf() were run. Within the
test suite 3 input values were found to cause 1 bit differences (ulp)
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
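
The two results in the example above are adjacent doubles, so they differ by exactly one ulp (2^-55, about 2.78e-17, for values in this binade), with the high precision value lying close to the midpoint between them. A minimal sketch for measuring such differences, assuming IEEE 754 binary64 doubles and inputs of the same sign, follows. `ulp_distance` is a hypothetical helper and is not part of glibc's test harness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Number of representable doubles stepped over when going from a to b.
   Sketch only: finite inputs of the same sign.  */
static int64_t
ulp_distance (double a, double b)
{
  int64_t ia, ib;
  memcpy (&ia, &a, sizeof ia);
  memcpy (&ib, &b, sizeof ib);
  assert ((ia < 0) == (ib < 0)); /* same sign bit required */
  return ia > ib ? ia - ib : ib - ia;
}
```

For the old and new results quoted above, ulp_distance returns 1.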

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function. The gamma function is known to be difficult to compute
accurately.
---
 manual/probes.texi                          |   14 -
 math/Makefile                               |    2 +-
 sysdeps/generic/math_private.h              |    1 -
 sysdeps/ieee754/dbl-64/e_exp.c              |  340 ++++++++++-----------
 sysdeps/ieee754/dbl-64/e_pow.c              |    2 +-
 sysdeps/ieee754/dbl-64/eexp.tbl             |  447 +++++++++++++++++++++++++++
 sysdeps/ieee754/dbl-64/slowexp.c            |   86 -----
 sysdeps/powerpc/power4/fpu/Makefile         |    1 -
 sysdeps/x86_64/fpu/multiarch/Makefile       |    9 +-
 sysdeps/x86_64/fpu/multiarch/e_exp-avx.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c   |    1 -
 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c |    9 -
 15 files changed, 617 insertions(+), 315 deletions(-)
 create mode 100644 sysdeps/ieee754/dbl-64/eexp.tbl
 delete mode 100644 sysdeps/ieee754/dbl-64/slowexp.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c

diff --git a/manual/probes.texi b/manual/probes.texi
index 8ab6756..f8ae64b 100644
--- a/manual/probes.texi
+++ b/manual/probes.texi
@@ -258,20 +258,6 @@ Unless explicitly mentioned otherwise, a precision of 1 implies 24 bits of
 precision in the mantissa of the multiple precision number.  Hence, a precision
 level of 32 implies 768 bits of precision in the mantissa.
 
-@deftp Probe slowexp_p6 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-6.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
-@deftp Probe slowexp_p32 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-32.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
 @deftp Probe slowpow_p10 (double @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
 This probe is triggered when the @code{pow} function is called with
 inputs that result in multiple precision computation with precision
diff --git a/math/Makefile b/math/Makefile
index 8978f2e..ccf7f01 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -114,7 +114,7 @@ type-ldouble-yes := ldouble
 # double support
 type-double-suffix :=
 type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
-		       mpatan mpexp mplog mpsqrt mptan sincos32 slowexp	\
+		       mpatan mpexp mplog mpsqrt mptan sincos32	\
 		       slowpow sincostab k_rem_pio2
 
 # float support
diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index f29898c..689dc54 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -262,7 +262,6 @@ extern double __sin32 (double __x, double __res, double __res1);
 extern double __cos32 (double __x, double __res, double __res1);
 extern double __mpsin (double __x, double __dx, bool __range_reduce);
 extern double __mpcos (double __x, double __dx, bool __range_reduce);
-extern double __slowexp (double __x);
 extern double __slowpow (double __x, double __y, double __z);
 extern void __docos (double __x, double __dx, double __v[]);
 
diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
index 6757a14..c4b296a 100644
--- a/sysdeps/ieee754/dbl-64/e_exp.c
+++ b/sysdeps/ieee754/dbl-64/e_exp.c
@@ -1,3 +1,4 @@
+/* EXP function - Compute double precision exponential.  */
 /*
  * IBM Accurate Mathematical Library
  * written by International Business Machines Corp.
@@ -23,7 +24,7 @@
 /*           exp1                                                          */
 /*                                                                         */
 /* FILES NEEDED:dla.h endian.h mpa.h mydefs.h uexp.h                       */
-/*              mpa.c mpexp.x slowexp.c                                    */
+/*              mpa.c mpexp.x                                              */
 /*                                                                         */
 /* An ultimate exp routine. Given an IEEE double machine number x          */
 /* it computes the correctly rounded (to nearest) value of e^x             */
@@ -32,207 +33,196 @@
 /*                                                                         */
 /***************************************************************************/
 
+/* IBM exp(x) replaced by following exp(x) in 2017.  IBM exp1(x,xx) remains.  */
+/* exp(x)
+   Hybrid algorithm of Peter Tang's Table driven method (for large
+   arguments) and an accurate table (for small arguments).
+   Written by K.C. Ng, November 1988.
+   Revised by Patrick McGehearty, Dec 2017 to use j/256 instead of j/32
+   Method (large arguments):
+	1. Argument Reduction: given the input x, find r and integer k
+	   and j such that
+	             x = (k+j/256)*(ln2) + r,  |r| <= (1/512)*ln2
+
+	2. exp(x) = 2^k * (2^(j/256) + 2^(j/256)*expm1(r))
+	   a. expm1(r) is approximated by a polynomial:
+	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
+	      Here t1 = 1/2 exactly.
+	   b. 2^(j/256) is represented to twice double precision
+	      as TBL[2j]+TBL[2j+1].
+
+   Note: If divide were fast enough, we could use another approximation
+	 in 2.a:
+	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
+	      (for the same t1 and t2 as above)
+
+   Special cases:
+	exp(INF) is INF, exp(NaN) is NaN;
+	exp(-INF)=  0;
+	for finite argument, only exp(0)=1 is exact.
+
+   Accuracy:
+	According to an error analysis, the error is always less than
+	an ulp (unit in the last place).  The largest errors observed
+	are less than 0.55 ulp for normal results and less than 0.75 ulp
+	for subnormal results.
+
+   Misc. info.
+	For IEEE double
+		if x >  7.09782712893383973096e+02 then exp(x) overflows
+		if x < -7.45133219101941108420e+02 then exp(x) underflows.  */
+
 #include <math.h>
+#include <math-svid-compat.h>
+#include <math_private.h>
+#include <errno.h>
 #include "endian.h"
 #include "uexp.h"
+#include "uexp.tbl"
 #include "mydefs.h"
 #include "MathLib.h"
-#include "uexp.tbl"
-#include <math_private.h>
 #include <fenv.h>
 #include <float.h>
 
-#ifndef SECTION
-# define SECTION
-#endif
+extern double __ieee754_exp (double);
+
+#include "eexp.tbl"
+
+static const double
+  half = 0.5,
+  one = 1.0;
 
-double __slowexp (double);
 
-/* An ultimate exp routine. Given an IEEE double machine number x it computes
-   the correctly rounded (to nearest) value of e^x.  */
 double
-SECTION
-__ieee754_exp (double x)
+__ieee754_exp (double x_arg)
 {
-  double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
-  mynumber junk1, junk2, binexp = {{0, 0}};
-  int4 i, j, m, n, ex;
+  double z, t;
   double retval;
-
+  int hx, ix, k, j, m;
+  union
   {
-    SET_RESTORE_ROUND (FE_TONEAREST);
-
-    junk1.x = x;
-    m = junk1.i[HIGH_HALF];
-    n = m & hugeint;
-
-    if (n > smallint && n < bigint)
-      {
-	y = x * log2e.x + three51.x;
-	bexp = y - three51.x;	/*  multiply the result by 2**bexp        */
-
-	junk1.x = y;
-
-	eps = bexp * ln_two2.x;	/* x = bexp*ln(2) + t - eps               */
-	t = x - bexp * ln_two1.x;
-
-	y = t + three33.x;
-	base = y - three33.x;	/* t rounded to a multiple of 2**-18      */
-	junk2.x = y;
-	del = (t - base) - eps;	/*  x = bexp*ln(2) + base + del           */
-	eps = del + del * del * (p3.x * del + p2.x);
-
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;
-
-	i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-	j = (junk2.i[LOW_HALF] & 511) << 1;
-
-	al = coar.x[i] * fine.x[j];
-	bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	       + coar.x[i + 1] * fine.x[j + 1]);
-
-	rem = (bet + bet * eps) + al * eps;
-	res = al + rem;
-	cor = (al - res) + rem;
-	if (res == (res + cor * err_0))
+    int i_part[2];
+    double x;
+  } xx;
+  union
+  {
+    int y_part[2];
+    double y;
+  } yy;
+  xx.x = x_arg;
+
+  ix = xx.i_part[HIGH_HALF];
+  hx = ix & ~0x80000000;
+
+  if (hx < 0x3ff0a2b2)
+    {				/* |x| < 3/2 ln 2 */
+      if (hx < 0x3f862e42)
+	{			/* |x| < 1/64 ln 2 */
+	  if (hx < 0x3ed00000)
+	    {			/* |x| < 2^-18 */
+	      if (hx < 0x3e300000)
+		{
+		  {
+		    SET_RESTORE_ROUND (FE_TONEAREST);
+		    retval = one + xx.x;
+		  }
+		  return retval;
+		}
+	      {
+		SET_RESTORE_ROUND (FE_TONEAREST);
+		retval = one + xx.x * (one + half * xx.x);
+	      }
+	      return retval;
+	    }
 	  {
-	    retval = res * binexp.x;
-	    goto ret;
+	    SET_RESTORE_ROUND (FE_TONEAREST);
+	    t = xx.x * xx.x;
+	    yy.y = xx.x + (t * (half + xx.x * t2)
+			     + (t * t) * (t3 + xx.x * t4 + t * t5));
+	    retval = one + yy.y;
 	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto ret;
-	  }			/*if error is over bound */
-      }
+	  return retval;
+	}
 
-    if (n <= smallint)
+      /* Find the multiple of 2^-6 nearest x.  */
+      k = hx >> 20;
+      j = (0x00100000 | (hx & 0x000fffff)) >> (0x40c - k);
+      j = (j - 1) & ~1;
+      if (ix < 0)
+	j += 134;
       {
-	retval = 1.0;
-	goto ret;
+	SET_RESTORE_ROUND (FE_TONEAREST);
+	z = xx.x - TBL2[j];
+	t = z * z;
+	yy.y = z + (t * (half + (z * t2))
+		    + (t * t) * (t3 + z * t4 + t * t5));
+	retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
       }
+      return retval;
+    }
 
-    if (n >= badint)
-      {
-	if (n > infint)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/* x is NaN */
-	if (n < infint)
-	  {
-	    if (x > 0)
-	      goto ret_huge;
-	    else
-	      goto ret_tiny;
-	  }
-	/* x is finite,  cause either overflow or underflow  */
-	if (junk1.i[LOW_HALF] != 0)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/*  x is NaN  */
-	retval = (x > 0) ? inf.x : zero;	/* |x| = inf;  return either inf or 0 */
-	goto ret;
-      }
+  if (hx >= 0x40862e42)
+    {				/* x is large, infinite, or nan.  */
+      if (hx >= 0x7ff00000)
+	{
+	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
+	    return zero;	/* exp(-inf) = 0.  */
+	  return xx.x * xx.x;	/* exp(nan/inf) is nan or inf.  */
+	}
+      if (xx.x > threshold1)
+	{			/* Set overflow error condition.  */
+	  retval = hhuge * hhuge;
+	  return retval;
+	}
+      if (-xx.x > threshold2)
+	{			/* Set underflow error condition.  */
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	  retval = force_underflow;
+	  return retval;
+	}
+    }
 
-    y = x * log2e.x + three51.x;
-    bexp = y - three51.x;
-    junk1.x = y;
-    eps = bexp * ln_two2.x;
-    t = x - bexp * ln_two1.x;
-    y = t + three33.x;
-    base = y - three33.x;
-    junk2.x = y;
-    del = (t - base) - eps;
-    eps = del + del * del * (p3.x * del + p2.x);
-    i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-    j = (junk2.i[LOW_HALF] & 511) << 1;
-    al = coar.x[i] * fine.x[j];
-    bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	   + coar.x[i + 1] * fine.x[j + 1]);
-    rem = (bet + bet * eps) + al * eps;
-    res = al + rem;
-    cor = (al - res) + rem;
-    if (m >> 31)
-      {
-	ex = junk1.i[LOW_HALF];
-	if (res < 1.0)
-	  {
-	    res += res;
-	    cor += cor;
-	    ex -= 1;
-	  }
-	if (ex >= -1022)
-	  {
-	    binexp.i[HIGH_HALF] = (1023 + ex) << 20;
-	    if (res == (res + cor * err_0))
-	      {
-		retval = res * binexp.x;
-		goto ret;
-	      }
-	    else
-	      {
-		retval = __slowexp (x);
-		goto check_uflow_ret;
-	      }			/*if error is over bound */
-	  }
-	ex = -(1022 + ex);
-	binexp.i[HIGH_HALF] = (1023 - ex) << 20;
-	res *= binexp.x;
-	cor *= binexp.x;
-	eps = 1.0000000001 + err_0 * binexp.x;
-	t = 1.0 + res;
-	y = ((1.0 - t) + res) + cor;
-	res = t + y;
-	cor = (t - res) + y;
-	if (res == (res + eps * cor))
-	  {
-	    binexp.i[HIGH_HALF] = 0x00100000;
-	    retval = (res - 1.0) * binexp.x;
-	    goto check_uflow_ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto check_uflow_ret;
-	  }			/*   if error is over bound    */
-      check_uflow_ret:
-	if (retval < DBL_MIN)
-	  {
-	    double force_underflow = tiny * tiny;
-	    math_force_eval (force_underflow);
-	  }
-	if (retval == 0)
-	  goto ret_tiny;
-	goto ret;
-      }
+  {
+    SET_RESTORE_ROUND (FE_TONEAREST);
+    t = invln2_256 * xx.x;
+    if (ix < 0)
+      t -= half;
     else
-      {
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
-	if (res == (res + cor * err_0))
-	  retval = res * binexp.x * t256.x;
-	else
-	  retval = __slowexp (x);
-	if (isinf (retval))
-	  goto ret_huge;
-	else
-	  goto ret;
-      }
+      t += half;
+    k = (int) t;
+    j = (k & 0xff) << 1;
+    m = k >> 8;
+    z = (xx.x - k * ln2_256hi) - k * ln2_256lo;
+
+    /* z is now in primary range.  */
+    t = z * z;
+    yy.y = z + (t * (half + z * t2) + (t * t) * (t3 + z * t4 + t * t5));
+    yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
   }
-ret:
-  return retval;
-
- ret_huge:
-  return hhuge * hhuge;
 
- ret_tiny:
-  return tiny * tiny;
+  if (m < -1021)
+    {
+      yy.y_part[HIGH_HALF] += (m + 54) << 20;
+      retval = twom54 * yy.y;
+      if (retval < DBL_MIN)
+	{
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	}
+      return retval;
+    }
+  yy.y_part[HIGH_HALF] += m << 20;
+  return yy.y;
 }
 #ifndef __ieee754_exp
 strong_alias (__ieee754_exp, __exp_finite)
 #endif
 
+#ifndef SECTION
+# define SECTION
+#endif
+
 /* Compute e^(x+xx).  The routine also receives bound of error of previous
    calculation.  If after computing exp the error exceeds the allowed bounds,
    the routine returns a non-positive number.  Otherwise it returns the
diff --git a/sysdeps/ieee754/dbl-64/e_pow.c b/sysdeps/ieee754/dbl-64/e_pow.c
index 8c7fb74..d3cb42d 100644
--- a/sysdeps/ieee754/dbl-64/e_pow.c
+++ b/sysdeps/ieee754/dbl-64/e_pow.c
@@ -25,7 +25,7 @@
 /*             log1                                                        */
 /*             checkint                                                    */
 /* FILES NEEDED: dla.h endian.h mpa.h mydefs.h                             */
-/*               halfulp.c mpexp.c mplog.c slowexp.c slowpow.c mpa.c       */
+/*               halfulp.c mpexp.c mplog.c slowpow.c mpa.c                 */
 /*                          uexp.c  upow.c				   */
 /*               root.tbl uexp.tbl upow.tbl                                */
 /* An ultimate power routine. Given two IEEE double machine numbers y,x    */
diff --git a/sysdeps/ieee754/dbl-64/eexp.tbl b/sysdeps/ieee754/dbl-64/eexp.tbl
new file mode 100644
index 0000000..70bc74c
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/eexp.tbl
@@ -0,0 +1,447 @@
+/* EXP function tables - for use in computing double precision exponential
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+/*
+   TBL[2*j] is 2**(j/256), rounded to nearest.
+   TBL[2*j+1] is 2**(j/256) - TBL[2*j], rounded to nearest.
+   These values are used to approximate exp(x) using the formula
+   given in the comments for e_exp.c.  */
+
+static const double TBL[512] = {
+    0x1.0000000000000p+0,   0x0.0000000000000p+0,
+    0x1.00b1afa5abcbfp+0, -0x1.4f6b2a7609f71p-55,
+    0x1.0163da9fb3335p+0,  0x1.b61299ab8cdb7p-54,
+    0x1.02168143b0281p+0, -0x1.2bf310fc54eb6p-55,
+    0x1.02c9a3e778061p+0, -0x1.19083535b085dp-56,
+    0x1.037d42e11bbccp+0,  0x1.56811eeade11ap-57,
+    0x1.04315e86e7f85p+0, -0x1.0a31c1977c96ep-54,
+    0x1.04e5f72f654b1p+0,  0x1.4c3793aa0d08cp-55,
+    0x1.059b0d3158574p+0,  0x1.d73e2a475b465p-55,
+    0x1.0650a0e3c1f89p+0, -0x1.5cb7b5799c397p-54,
+    0x1.0706b29ddf6dep+0, -0x1.c91dfe2b13c27p-55,
+    0x1.07bd42b72a836p+0,  0x1.3233454458700p-55,
+    0x1.0874518759bc8p+0,  0x1.186be4bb284ffp-57,
+    0x1.092bdf66607e0p+0, -0x1.68063800a3fd1p-54,
+    0x1.09e3ecac6f383p+0,  0x1.1487818316136p-54,
+    0x1.0a9c79b1f3919p+0,  0x1.5d16c873d1d38p-55,
+    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610bp-54,
+    0x1.0c0f145e46c85p+0,  0x1.4f98906d21cefp-54,
+    0x1.0cc922b7247f7p+0,  0x1.01edc16e24f71p-54,
+    0x1.0d83b23395decp+0, -0x1.bc14de43f316ap-54,
+    0x1.0e3ec32d3d1a2p+0,  0x1.03a1727c57b52p-59,
+    0x1.0efa55fdfa9c5p+0, -0x1.49db9bc54021bp-54,
+    0x1.0fb66affed31bp+0, -0x1.b9bedc44ebd7bp-57,
+    0x1.1073028d7233ep+0,  0x1.d46eb1692fdd5p-55,
+    0x1.11301d0125b51p+0, -0x1.6c51039449b3ap-54,
+    0x1.11edbab5e2ab6p+0, -0x1.ca454f703fb72p-54,
+    0x1.12abdc06c31ccp+0, -0x1.1b514b36ca5c7p-58,
+    0x1.136a814f204abp+0, -0x1.7108fba48dcf0p-57,
+    0x1.1429aaea92de0p+0, -0x1.32fbf9af1369ep-54,
+    0x1.14e95934f312ep+0, -0x1.b91e839bf44abp-55,
+    0x1.15a98c8a58e51p+0,  0x1.2406ab9eeab0ap-55,
+    0x1.166a45471c3c2p+0,  0x1.8f23b82ea1a32p-58,
+    0x1.172b83c7d517bp+0, -0x1.19041b9d78a76p-55,
+    0x1.17ed48695bbc0p+0,  0x1.09e3fe2ac5a64p-56,
+    0x1.18af9388c8deap+0, -0x1.11023d1970f6cp-54,
+    0x1.1972658375d2fp+0,  0x1.4aadd85f17e08p-54,
+    0x1.1a35beb6fcb75p+0,  0x1.e5b4c7b4968e4p-55,
+    0x1.1af99f8138a1cp+0,  0x1.7bf85a4b69280p-54,
+    0x1.1bbe084045cd4p+0, -0x1.95386352ef607p-54,
+    0x1.1c82f95281c6bp+0,  0x1.009778010f8c9p-54,
+    0x1.1d4873168b9aap+0,  0x1.e016e00a2643cp-54,
+    0x1.1e0e75eb44027p+0, -0x1.6fdd8088cb6dep-54,
+    0x1.1ed5022fcd91dp+0, -0x1.1df98027bb78cp-54,
+    0x1.1f9c18438ce4dp+0, -0x1.bf524a097af5cp-54,
+    0x1.2063b88628cd6p+0,  0x1.dc775814a8495p-55,
+    0x1.212be3578a819p+0,  0x1.3592d2cfcaac9p-54,
+    0x1.21f49917ddc96p+0,  0x1.2a97e9494a5eep-55,
+    0x1.22bdda27912d1p+0,  0x1.d34fb5577d69fp-55,
+    0x1.2387a6e756238p+0,  0x1.9b07eb6c70573p-54,
+    0x1.2451ffb82140ap+0,  0x1.acfcc911ca996p-55,
+    0x1.251ce4fb2a63fp+0,  0x1.ac155bef4f4a4p-55,
+    0x1.25e85711ece75p+0,  0x1.3e1a24ac31b2cp-54,
+    0x1.26b4565e27cddp+0,  0x1.2bd339940e9d9p-55,
+    0x1.2780e341ddf29p+0,  0x1.e067c05f9e76cp-54,
+    0x1.284dfe1f56381p+0, -0x1.a4c3a8c3f0d7ep-54,
+    0x1.291ba7591bb70p+0, -0x1.2cc7228401cbdp-55,
+    0x1.29e9df51fdee1p+0,  0x1.612e8afad1255p-55,
+    0x1.2ab8a66d10f13p+0, -0x1.95743191690a7p-54,
+    0x1.2b87fd0dad990p+0, -0x1.10adcd6381aa4p-59,
+    0x1.2c57e39771b2fp+0, -0x1.50145a6eb5124p-54,
+    0x1.2d285a6e4030bp+0,  0x1.0024754db41d5p-54,
+    0x1.2df961f641589p+0,  0x1.d16cffbbce198p-54,
+    0x1.2ecafa93e2f56p+0,  0x1.1ca0f45d52383p-56,
+    0x1.2f9d24abd886bp+0, -0x1.53c55532bda93p-57,
+    0x1.306fe0a31b715p+0,  0x1.6f46ad23182e4p-55,
+    0x1.31432edeeb2fdp+0,  0x1.959a3f3f3fcd1p-55,
+    0x1.32170fc4cd831p+0,  0x1.a9ce78e18047cp-55,
+    0x1.32eb83ba8ea32p+0, -0x1.c45e83cb4f318p-54,
+    0x1.33c08b26416ffp+0,  0x1.32721843659a6p-54,
+    0x1.3496266e3fa2dp+0, -0x1.35a75930881a4p-55,
+    0x1.356c55f929ff1p+0, -0x1.b5cee5c4e4628p-55,
+    0x1.36431a2de883bp+0, -0x1.c3144a06cb85ep-55,
+    0x1.371a7373aa9cbp+0, -0x1.63aeabf42eae2p-54,
+    0x1.37f26231e754ap+0, -0x1.9f5ca9eceb23cp-54,
+    0x1.38cae6d05d866p+0, -0x1.e958d3c9904bdp-54,
+    0x1.39a401b7140efp+0, -0x1.9a9a5fc8e2934p-54,
+    0x1.3a7db34e59ff7p+0, -0x1.5e436d661f5e3p-56,
+    0x1.3b57fbfec6cf4p+0,  0x1.54c66e26fff18p-54,
+    0x1.3c32dc313a8e5p+0, -0x1.efff8375d29c3p-54,
+    0x1.3d0e544ede173p+0,  0x1.fe8d08c284c71p-56,
+    0x1.3dea64c123422p+0,  0x1.ada0911f09ebcp-55,
+    0x1.3ec70df1c5175p+0, -0x1.af6637b8c9bcap-55,
+    0x1.3fa4504ac801cp+0, -0x1.7d023f956f9f3p-54,
+    0x1.40822c367a024p+0,  0x1.bddf8b6f4d048p-55,
+    0x1.4160a21f72e2ap+0, -0x1.ef3691c309278p-58,
+    0x1.423fb2709468ap+0, -0x1.8462dc0b314ddp-54,
+    0x1.431f5d950a897p+0, -0x1.1c7dde35f7999p-55,
+    0x1.43ffa3f84b9d4p+0,  0x1.880be9704c003p-55,
+    0x1.44e086061892dp+0,  0x1.89b7a04ef80d0p-59,
+    0x1.45c2042a7d232p+0, -0x1.8641982fb1f8ep-57,
+    0x1.46a41ed1d0057p+0,  0x1.c944bd1648a76p-54,
+    0x1.4786d668b3237p+0, -0x1.c20f0ed445733p-54,
+    0x1.486a2b5c13cd0p+0,  0x1.3c1a3b69062f0p-56,
+    0x1.494e1e192aed2p+0, -0x1.3b2895e499ea0p-55,
+    0x1.4a32af0d7d3dep+0,  0x1.9cb62f3d1be56p-54,
+    0x1.4b17dea6db7d7p+0, -0x1.125b87f2897f0p-55,
+    0x1.4bfdad5362a27p+0,  0x1.d4397afec42e2p-56,
+    0x1.4ce41b817c114p+0,  0x1.05e29690abd5dp-54,
+    0x1.4dcb299fddd0dp+0,  0x1.8ecdbbc6a7833p-54,
+    0x1.4eb2d81d8abffp+0, -0x1.5257d2e5d7a52p-54,
+    0x1.4f9b2769d2ca7p+0, -0x1.4b309d25957e3p-54,
+    0x1.508417f4531eep+0,  0x1.a249b49b7465fp-56,
+    0x1.516daa2cf6642p+0, -0x1.f768569bd93efp-55,
+    0x1.5257de83f4eefp+0, -0x1.c998d43efef71p-56,
+    0x1.5342b569d4f82p+0, -0x1.07abe1db13cadp-55,
+    0x1.542e2f4f6ad27p+0,  0x1.7926d192d5f7ep-55,
+    0x1.551a4ca5d920fp+0, -0x1.d689cefede59bp-55,
+    0x1.56070dde910d2p+0, -0x1.0fb6e168eebf0p-54,
+    0x1.56f4736b527dap+0,  0x1.9bb2c011d93adp-54,
+    0x1.57e27dbe2c4cfp+0, -0x1.0b98c8a57b9c4p-54,
+    0x1.58d12d497c7fdp+0,  0x1.295e15b9a1de8p-55,
+    0x1.59c0827ff07ccp+0, -0x1.7e2cee467e60fp-54,
+    0x1.5ab07dd485429p+0,  0x1.6324c054647adp-54,
+    0x1.5ba11fba87a03p+0, -0x1.b77a14c233e1ap-54,
+    0x1.5c9268a5946b7p+0,  0x1.c4b1b816986a2p-60,
+    0x1.5d84590998b93p+0, -0x1.cd6a7a8b45643p-54,
+    0x1.5e76f15ad2148p+0,  0x1.ba6f93080e65ep-54,
+    0x1.5f6a320dceb71p+0, -0x1.9eadde3cdcf92p-55,
+    0x1.605e1b976dc09p+0, -0x1.3e2429b56de47p-54,
+    0x1.6152ae6cdf6f4p+0,  0x1.e4b3e4ab84c27p-54,
+    0x1.6247eb03a5585p+0, -0x1.383c17e40b497p-54,
+    0x1.633dd1d1929fdp+0,  0x1.84710beb964e5p-54,
+    0x1.6434634ccc320p+0, -0x1.c483c759d8933p-55,
+    0x1.652b9febc8fb7p+0, -0x1.ae3d5c9a73e09p-54,
+    0x1.6623882552225p+0, -0x1.bb60987591c34p-54,
+    0x1.671c1c70833f6p+0, -0x1.e8732586c6134p-55,
+    0x1.68155d44ca973p+0,  0x1.038ae44f73e65p-57,
+    0x1.690f4b19e9538p+0,  0x1.804bd9aeb445dp-55,
+    0x1.6a09e667f3bcdp+0, -0x1.bdd3413b26456p-54,
+    0x1.6b052fa75173ep+0,  0x1.a38f52c9a9d0ep-56,
+    0x1.6c012750bdabfp+0, -0x1.2895667ff0b0dp-56,
+    0x1.6cfdcddd47645p+0,  0x1.c7aa9b6f17309p-54,
+    0x1.6dfb23c651a2fp+0, -0x1.bbe3a683c88abp-57,
+    0x1.6ef9298593ae5p+0, -0x1.0b9749e1ac8b2p-54,
+    0x1.6ff7df9519484p+0, -0x1.83c0f25860ef6p-55,
+    0x1.70f7466f42e87p+0,  0x1.9d644d45aa65fp-58,
+    0x1.71f75e8ec5f74p+0, -0x1.16e4786887a99p-55,
+    0x1.72f8286ead08ap+0, -0x1.20aa02cd62c72p-54,
+    0x1.73f9a48a58174p+0, -0x1.0a8d96c65d53cp-54,
+    0x1.74fbd35d7cbfdp+0,  0x1.047fd618a6e1cp-54,
+    0x1.75feb564267c9p+0, -0x1.0245957316dd3p-54,
+    0x1.77024b1ab6e09p+0,  0x1.b7877169147f8p-54,
+    0x1.780694fde5d3fp+0,  0x1.866b80a02162dp-54,
+    0x1.790b938ac1cf6p+0,  0x1.349a862aadd3ep-54,
+    0x1.7a11473eb0187p+0, -0x1.41577ee04992fp-55,
+    0x1.7b17b0976cfdbp+0, -0x1.bebb58468dc88p-54,
+    0x1.7c1ed0130c132p+0,  0x1.f124cd1164dd6p-54,
+    0x1.7d26a62ff86f0p+0,  0x1.1bddbfb72b8b4p-54,
+    0x1.7e2f336cf4e62p+0,  0x1.05d02ba15797ep-56,
+    0x1.7f3878491c491p+0, -0x1.07f11cf9311aep-55,
+    0x1.80427543e1a12p+0, -0x1.27c86626d972bp-54,
+    0x1.814d2add106d9p+0,  0x1.464370d151d4dp-54,
+    0x1.82589994cce13p+0, -0x1.d4c1dd41532d8p-54,
+    0x1.8364c1eb941f7p+0,  0x1.99b9a31df2bd5p-54,
+    0x1.8471a4623c7adp+0, -0x1.8d684a341cdfbp-55,
+    0x1.857f4179f5b21p+0, -0x1.ba748f8b216d0p-58,
+    0x1.868d99b4492edp+0, -0x1.fc6f89bd4f6bap-54,
+    0x1.879cad931a436p+0,  0x1.5d2d7d2db47bdp-55,
+    0x1.88ac7d98a6699p+0,  0x1.994c2f37cb53ap-54,
+    0x1.89bd0a478580fp+0,  0x1.d53954475202ap-54,
+    0x1.8ace5422aa0dbp+0,  0x1.6e9f156864b27p-54,
+    0x1.8be05bad61778p+0,  0x1.ecb5efc43446ep-54,
+    0x1.8cf3216b5448cp+0, -0x1.0d55e32e9e3aap-56,
+    0x1.8e06a5e0866d9p+0, -0x1.7114a6fc9b2e6p-54,
+    0x1.8f1ae99157736p+0,  0x1.5cc13a2e3976cp-55,
+    0x1.902fed0282c8ap+0,  0x1.592ca85fe3fd2p-54,
+    0x1.9145b0b91ffc6p+0, -0x1.dd6792e582524p-54,
+    0x1.925c353aa2fe2p+0, -0x1.3455fa639db7fp-55,
+    0x1.93737b0cdc5e5p+0, -0x1.75fc781b57ebcp-57,
+    0x1.948b82b5f98e5p+0, -0x1.dc3d6797d2d99p-55,
+    0x1.95a44cbc8520fp+0, -0x1.64b7c96a5f039p-56,
+    0x1.96bdd9a7670b3p+0, -0x1.ba5967f19c896p-58,
+    0x1.97d829fde4e50p+0, -0x1.d185b7c1b85d1p-54,
+    0x1.98f33e47a22a2p+0,  0x1.cabdaa24c78ecp-56,
+    0x1.9a0f170ca07bap+0, -0x1.173bd91cee632p-54,
+    0x1.9b2bb4d53fe0dp+0, -0x1.dd84e4df6d518p-54,
+    0x1.9c49182a3f090p+0,  0x1.c7c46b071f2bep-56,
+    0x1.9d674194bb8d5p+0, -0x1.516bea3dd8233p-54,
+    0x1.9e86319e32323p+0,  0x1.824ca78e64c6ep-56,
+    0x1.9fa5e8d07f29ep+0, -0x1.4a9ceaaf1facep-55,
+    0x1.a0c667b5de565p+0, -0x1.359495d1cd533p-54,
+    0x1.a1e7aed8eb8bbp+0,  0x1.c6618ee8be70ep-54,
+    0x1.a309bec4a2d33p+0,  0x1.6305c7ddc36abp-54,
+    0x1.a42c980460ad8p+0, -0x1.aa780589fb120p-54,
+    0x1.a5503b23e255dp+0, -0x1.d2f6edb8d41e1p-54,
+    0x1.a674a8af46052p+0,  0x1.50f5630670366p-57,
+    0x1.a799e1330b358p+0,  0x1.bcb7ecac563c7p-54,
+    0x1.a8bfe53c12e59p+0, -0x1.4f867b2ba15a9p-54,
+    0x1.a9e6b5579fdbfp+0,  0x1.0fac90ef7fd31p-54,
+    0x1.ab0e521356ebap+0,  0x1.89c31dae94545p-55,
+    0x1.ac36bbfd3f37ap+0, -0x1.f9234cae76cd0p-55,
+    0x1.ad5ff3a3c2774p+0,  0x1.7ef3bb6b1b8e5p-54,
+    0x1.ae89f995ad3adp+0,  0x1.7a1cd345dcc81p-54,
+    0x1.afb4ce622f2ffp+0, -0x1.4b2fc0f315ecdp-54,
+    0x1.b0e07298db666p+0, -0x1.bdef54c80e425p-54,
+    0x1.b20ce6c9a8952p+0,  0x1.4dd024a0756ccp-54,
+    0x1.b33a2b84f15fbp+0, -0x1.2805e3084d708p-57,
+    0x1.b468415b749b1p+0, -0x1.f763de9df7c90p-56,
+    0x1.b59728de5593ap+0, -0x1.c71dfbbba6de3p-54,
+    0x1.b6c6e29f1c52ap+0,  0x1.2a8f352883f6ep-54,
+    0x1.b7f76f2fb5e47p+0, -0x1.5584f7e54ac3bp-56,
+    0x1.b928cf22749e4p+0, -0x1.b721654cb65c6p-54,
+    0x1.ba5b030a1064ap+0, -0x1.efcd30e54292ep-54,
+    0x1.bb8e0b79a6f1fp+0, -0x1.f52d1c9696205p-60,
+    0x1.bcc1e904bc1d2p+0,  0x1.23dd07a2d9e84p-55,
+    0x1.bdf69c3f3a207p+0, -0x1.c262360ea5b52p-60,
+    0x1.bf2c25bd71e09p+0, -0x1.efdca3f6b9c73p-54,
+    0x1.c06286141b33dp+0, -0x1.d8a5aa1fbca34p-55,
+    0x1.c199bdd85529cp+0,  0x1.11065895048ddp-55,
+    0x1.c2d1cd9fa652cp+0, -0x1.6e51617c8a5d7p-54,
+    0x1.c40ab5fffd07ap+0,  0x1.b4537e083c60ap-54,
+    0x1.c544778fafb22p+0,  0x1.12f072493b5afp-54,
+    0x1.c67f12e57d14bp+0,  0x1.2884dff483cadp-54,
+    0x1.c7ba88988c933p+0, -0x1.e76bbbe255559p-55,
+    0x1.c8f6d9406e7b5p+0,  0x1.1acbc48805c44p-56,
+    0x1.ca3405751c4dbp+0, -0x1.7f2bed10d08f5p-55,
+    0x1.cb720dcef9069p+0,  0x1.503cbd1e949dbp-56,
+    0x1.ccb0f2e6d1675p+0, -0x1.d220f86009092p-56,
+    0x1.cdf0b555dc3fap+0, -0x1.dd83b53829d72p-55,
+    0x1.cf3155b5bab74p+0, -0x1.a08e9b86dff57p-54,
+    0x1.d072d4a07897cp+0, -0x1.cbc3743797a9cp-54,
+    0x1.d1b532b08c968p+0,  0x1.55636219a36eep-54,
+    0x1.d2f87080d89f2p+0, -0x1.d487b719d8578p-54,
+    0x1.d43c8eacaa1d6p+0,  0x1.3db53bf5a1614p-54,
+    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3707p-55,
+    0x1.d6c76e862e6d3p+0,  0x1.fe87a4a8165a0p-58,
+    0x1.d80e316c98398p+0, -0x1.11ec18beddfe8p-54,
+    0x1.d955d71ff6075p+0,  0x1.a052dbb9af6bep-54,
+    0x1.da9e603db3285p+0,  0x1.c2300696db532p-54,
+    0x1.dbe7cd63a8315p+0, -0x1.b76f1926b8be4p-54,
+    0x1.dd321f301b460p+0,  0x1.2da5778f018c3p-54,
+    0x1.de7d5641c0658p+0, -0x1.ca5528e79ba8fp-54,
+    0x1.dfc97337b9b5fp+0, -0x1.1a5cd4f184b5cp-54,
+    0x1.e11676b197d17p+0, -0x1.2b529bd5c7f44p-56,
+    0x1.e264614f5a129p+0, -0x1.7b627817a1496p-54,
+    0x1.e3b333b16ee12p+0, -0x1.9f4a431fdc68bp-54,
+    0x1.e502ee78b3ff6p+0,  0x1.39e8980a9cc8fp-55,
+    0x1.e653924676d76p+0, -0x1.63ff87522b735p-55,
+    0x1.e7a51fbc74c83p+0,  0x1.2d522ca0c8de2p-54,
+    0x1.e8f7977cdb740p+0, -0x1.1089480b054b1p-54,
+    0x1.ea4afa2a490dap+0, -0x1.e9c23179c2893p-54,
+    0x1.eb9f4867cca6ep+0,  0x1.4832f2293e4f2p-54,
+    0x1.ecf482d8e67f1p+0, -0x1.c93f3b411ad8cp-54,
+    0x1.ee4aaa2188510p+0,  0x1.1c68da487568dp-54,
+    0x1.efa1bee615a27p+0,  0x1.dc7f486a4b6b0p-54,
+    0x1.f0f9c1cb6412ap+0, -0x1.3220065181d45p-54,
+    0x1.f252b376bba97p+0,  0x1.3a1a5bf0d8e43p-54,
+    0x1.f3ac948dd7274p+0, -0x1.95a5a3ed837dep-56,
+    0x1.f50765b6e4540p+0,  0x1.9d3e12dd8a18bp-54,
+    0x1.f6632798844f8p+0,  0x1.fa37b3539343ep-54,
+    0x1.f7bfdad9cbe14p+0, -0x1.dbb12d006350ap-54,
+    0x1.f91d802243c89p+0, -0x1.12ea8a779f689p-57,
+    0x1.fa7c1819e90d8p+0,  0x1.74853f3a5931ep-55,
+    0x1.fbdba3692d514p+0, -0x1.9677315098eb6p-56,
+    0x1.fd3c22b8f71f1p+0,  0x1.2eb74966579e7p-57,
+    0x1.fe9d96b2a23d9p+0,  0x1.4a6037442fde3p-56};
+
+/* For i = 0, ..., 66,
+     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+
+   For i = 67, ..., 133,
+     TBL2[2*i] is a double precision number near -(i-66)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.  */
+
+static const double TBL2[268] = {
+    0x1.ffffffffffc82p-7,   0x1.04080ab55de32p+0,
+    0x1.fffffffffffdbp-6,   0x1.08205601127ecp+0,
+    0x1.80000000000a0p-5,   0x1.0c49236829e91p+0,
+    0x1.fffffffffff79p-5,   0x1.1082b577d34e9p+0,
+    0x1.3fffffffffffcp-4,   0x1.14cd4fc989cd6p+0,
+    0x1.8000000000060p-4,   0x1.192937074e0d4p+0,
+    0x1.c000000000061p-4,   0x1.1d96b0eff0e80p+0,
+    0x1.fffffffffffd6p-4,   0x1.2216045b6f5cap+0,
+    0x1.1ffffffffff58p-3,   0x1.26a7793f6014cp+0,
+    0x1.3ffffffffff75p-3,   0x1.2b4b58b372c65p+0,
+    0x1.5ffffffffff00p-3,   0x1.3001ecf601ad1p+0,
+    0x1.8000000000020p-3,   0x1.34cb8170b583ap+0,
+    0x1.9ffffffffa629p-3,   0x1.39a862bd3b344p+0,
+    0x1.c00000000000fp-3,   0x1.3e98deaa11dcep+0,
+    0x1.e00000000007fp-3,   0x1.439d443f5f16dp+0,
+    0x1.0000000000072p-2,   0x1.48b5e3c3e81abp+0,
+    0x1.0fffffffffecap-2,   0x1.4de30ec211dfbp+0,
+    0x1.1ffffffffff8fp-2,   0x1.5325180cfacd2p+0,
+    0x1.300000000003bp-2,   0x1.587c53c5a7b04p+0,
+    0x1.4000000000034p-2,   0x1.5de9176046007p+0,
+    0x1.4ffffffffff89p-2,   0x1.636bb9a98322fp+0,
+    0x1.5ffffffffffe7p-2,   0x1.690492cbf942ap+0,
+    0x1.6ffffffffff78p-2,   0x1.6eb3fc55b1e45p+0,
+    0x1.7ffffffffff65p-2,   0x1.747a513dbef32p+0,
+    0x1.8ffffffffffd5p-2,   0x1.7a57ede9ea22ep+0,
+    0x1.9ffffffffff6ep-2,   0x1.804d30347b50fp+0,
+    0x1.affffffffffc3p-2,   0x1.865a7772164aep+0,
+    0x1.c000000000053p-2,   0x1.8c802477b0030p+0,
+    0x1.d00000000004dp-2,   0x1.92be99a09bf1ep+0,
+    0x1.e000000000096p-2,   0x1.99163ad4b1e08p+0,
+    0x1.efffffffffefap-2,   0x1.9f876d8e8c4fcp+0,
+    0x1.fffffffffffd0p-2,   0x1.a61298e1e0688p+0,
+    0x1.0800000000002p-1,   0x1.acb82581eee56p+0,
+    0x1.100000000001fp-1,   0x1.b3787dc80f979p+0,
+    0x1.17ffffffffff8p-1,   0x1.ba540dba56e4fp+0,
+    0x1.1fffffffffffap-1,   0x1.c14b431256441p+0,
+    0x1.27fffffffffc4p-1,   0x1.c85e8d43f7c9bp+0,
+    0x1.2fffffffffffdp-1,   0x1.cf8e5d84758a6p+0,
+    0x1.380000000001fp-1,   0x1.d6db26d16cd84p+0,
+    0x1.3ffffffffffd8p-1,   0x1.de455df80e39bp+0,
+    0x1.4800000000052p-1,   0x1.e5cd799c6a59cp+0,
+    0x1.4ffffffffffc8p-1,   0x1.ed73f240dc10cp+0,
+    0x1.5800000000013p-1,   0x1.f539424d90f71p+0,
+    0x1.5ffffffffffbcp-1,   0x1.fd1de6182f885p+0,
+    0x1.680000000002dp-1,   0x1.02912df5ce741p+1,
+    0x1.7000000000040p-1,   0x1.06a39207f0a2ap+1,
+    0x1.780000000004fp-1,   0x1.0ac660691652ap+1,
+    0x1.7ffffffffff6fp-1,   0x1.0ef9db467dcabp+1,
+    0x1.87fffffffffe5p-1,   0x1.133e45d82e943p+1,
+    0x1.9000000000035p-1,   0x1.1793e4652cc6dp+1,
+    0x1.97fffffffffb3p-1,   0x1.1bfafc47bda48p+1,
+    0x1.a000000000000p-1,   0x1.2073d3f1bd518p+1,
+    0x1.a80000000004ap-1,   0x1.24feb2f105ce2p+1,
+    0x1.affffffffffedp-1,   0x1.299be1f3e7f11p+1,
+    0x1.b7ffffffffffbp-1,   0x1.2e4baacdb6611p+1,
+    0x1.c00000000001dp-1,   0x1.330e587b62b39p+1,
+    0x1.c800000000079p-1,   0x1.37e437282d538p+1,
+    0x1.cffffffffff51p-1,   0x1.3ccd943268248p+1,
+    0x1.d7fffffffff74p-1,   0x1.41cabe304cadcp+1,
+    0x1.e000000000011p-1,   0x1.46dc04f4e5343p+1,
+    0x1.e80000000001ep-1,   0x1.4c01b9950a124p+1,
+    0x1.effffffffff9ep-1,   0x1.513c2e6c73196p+1,
+    0x1.f7fffffffffedp-1,   0x1.568bb722dd586p+1,
+    0x1.0000000000034p+0,   0x1.5bf0a8b1457b0p+1,
+    0x1.03fffffffffe2p+0,   0x1.616b5967376dfp+1,
+    0x1.07fffffffff4bp+0,   0x1.66fc20f0337a9p+1,
+    0x1.0bffffffffffdp+0,   0x1.6ca35859290f5p+1,
+   -0x1.fffffffffffe4p-7,   0x1.f80feabfeefa5p-1,
+   -0x1.ffffffffffb0bp-6,   0x1.f03f56a88b5fep-1,
+   -0x1.7ffffffffffa7p-5,   0x1.e88dc6afecfc5p-1,
+   -0x1.ffffffffffea8p-5,   0x1.e0fabfbc702b8p-1,
+   -0x1.3ffffffffffb3p-4,   0x1.d985c89d041acp-1,
+   -0x1.7ffffffffffe3p-4,   0x1.d22e6a0197c06p-1,
+   -0x1.bffffffffff9ap-4,   0x1.caf42e73a4c89p-1,
+   -0x1.fffffffffff98p-4,   0x1.c3d6a24ed822dp-1,
+   -0x1.1ffffffffffe9p-3,   0x1.bcd553b9d7b67p-1,
+   -0x1.3ffffffffffe0p-3,   0x1.b5efd29f24c2dp-1,
+   -0x1.5fffffffff553p-3,   0x1.af25b0a61a9f4p-1,
+   -0x1.7ffffffffff8bp-3,   0x1.a876812c08794p-1,
+   -0x1.9fffffffffe51p-3,   0x1.a1e1d93d68828p-1,
+   -0x1.bffffffffff6ep-3,   0x1.9b674f8f2f3f5p-1,
+   -0x1.dffffffffff7fp-3,   0x1.95067c7837a0cp-1,
+   -0x1.fffffffffff7ap-3,   0x1.8ebef9eac8225p-1,
+   -0x1.0fffffffffffep-2,   0x1.8890636e31f55p-1,
+   -0x1.1ffffffffff41p-2,   0x1.827a56188975ep-1,
+   -0x1.2ffffffffffbap-2,   0x1.7c7c708877656p-1,
+   -0x1.3fffffffffff8p-2,   0x1.769652df22f81p-1,
+   -0x1.4ffffffffff90p-2,   0x1.70c79eba33c2fp-1,
+   -0x1.5ffffffffffdbp-2,   0x1.6b0ff72deb8aap-1,
+   -0x1.6ffffffffff9ap-2,   0x1.656f00bf5798ep-1,
+   -0x1.7ffffffffff9fp-2,   0x1.5fe4615e98eb0p-1,
+   -0x1.8ffffffffffeep-2,   0x1.5a6fc061433cep-1,
+   -0x1.9fffffffffc4ap-2,   0x1.5510c67cd26cdp-1,
+   -0x1.affffffffff30p-2,   0x1.4fc71dc13566bp-1,
+   -0x1.bfffffffffff0p-2,   0x1.4a9271936fd0ep-1,
+   -0x1.cfffffffffff3p-2,   0x1.45726ea84fb8cp-1,
+   -0x1.dfffffffffff3p-2,   0x1.4066c2ff3912bp-1,
+   -0x1.effffffffff80p-2,   0x1.3b6f1ddd05ab9p-1,
+   -0x1.fffffffffffdfp-2,   0x1.368b2fc6f9614p-1,
+   -0x1.0800000000000p-1,   0x1.31baaa7dca843p-1,
+   -0x1.0ffffffffffa4p-1,   0x1.2cfd40f8bdce4p-1,
+   -0x1.17fffffffff0ap-1,   0x1.2852a760d5ce7p-1,
+   -0x1.2000000000000p-1,   0x1.23ba930c1568bp-1,
+   -0x1.27fffffffffbbp-1,   0x1.1f34ba78d568dp-1,
+   -0x1.2fffffffffe32p-1,   0x1.1ac0d5492c1dbp-1,
+   -0x1.37ffffffff042p-1,   0x1.165e9c3e67ef2p-1,
+   -0x1.3ffffffffff77p-1,   0x1.120dc93499431p-1,
+   -0x1.47fffffffff6bp-1,   0x1.0dce171e34ecep-1,
+   -0x1.4fffffffffff1p-1,   0x1.099f41ffbe588p-1,
+   -0x1.57ffffffffe02p-1,   0x1.058106eb8a7aep-1,
+   -0x1.5ffffffffffe5p-1,   0x1.017323fd9002ep-1,
+   -0x1.67fffffffffb0p-1,   0x1.faeab0ae9386cp-2,
+   -0x1.6ffffffffffb2p-1,   0x1.f30ec837503d7p-2,
+   -0x1.77fffffffff7fp-1,   0x1.eb5210d627133p-2,
+   -0x1.7ffffffffffe8p-1,   0x1.e3b40ebefcd95p-2,
+   -0x1.87fffffffffc8p-1,   0x1.dc3448110dae2p-2,
+   -0x1.8fffffffffb30p-1,   0x1.d4d244cf4ef06p-2,
+   -0x1.97fffffffffefp-1,   0x1.cd8d8ed8ee395p-2,
+   -0x1.9ffffffffffa7p-1,   0x1.c665b1e1f1e5cp-2,
+   -0x1.a7fffffffffdcp-1,   0x1.bf5a3b6bf18d6p-2,
+   -0x1.affffffffff95p-1,   0x1.b86ababeef93bp-2,
+   -0x1.b7fffffffffcbp-1,   0x1.b196c0e24d256p-2,
+   -0x1.bffffffffff32p-1,   0x1.aadde095dadf7p-2,
+   -0x1.c7fffffffff6ap-1,   0x1.a43fae4b047c9p-2,
+   -0x1.cffffffffffb6p-1,   0x1.9dbbc01e182a4p-2,
+   -0x1.d7fffffffffcap-1,   0x1.9751adcfa81ecp-2,
+   -0x1.dffffffffffcdp-1,   0x1.910110be0699ep-2,
+   -0x1.e7ffffffffffbp-1,   0x1.8ac983dedbc69p-2,
+   -0x1.effffffffff88p-1,   0x1.84aaa3b8d51a9p-2,
+   -0x1.f7fffffffffbbp-1,   0x1.7ea40e5d6d92ep-2,
+   -0x1.fffffffffffdbp-1,   0x1.78b56362cef53p-2,
+   -0x1.03fffffffff00p+0,   0x1.72de43ddcb1f2p-2,
+   -0x1.07ffffffffe6fp+0,   0x1.6d1e525bed085p-2,
+   -0x1.0bfffffffffd6p+0,   0x1.677532dda1c57p-2};
+
+static const double
+/* invln2_256 = 256/ln2 - used to scale x to primary range. */
+  invln2_256 = 0x1.71547652b82fep+8,
+/* ln2_256hi = high 32 bits of log(2.)/256. */
+  ln2_256hi = 0x1.62e42fee00000p-9,
+/* ln2_256lo = remainder bits for log(2.)/256 - ln2_256hi. */
+  ln2_256lo = 0x1.a39ef35793c76p-41,
+/* t2-t5 terms used for polynomial computation.  */
+  t2 = 0x1.5555555555555p-3, /* 1.6666666666666665741e-1 */
+  t3 = 0x1.5555555555555p-5, /* 4.1666666666666664354e-2 */
+  t4 = 0x1.1111111111111p-7, /* 8.3333333333333332177e-3 */
+  t5 = 0x1.6c16c16c16c17p-10, /* 1.388888888888889419e-3 */
+/* Maximum value for x to not overflow.  */
+  threshold1 = 0x1.62e42fefa39efp+9, /* 7.09782712893383973096e+02 */
+/* Maximum value for -x to not underflow to zero in FE_TONEAREST mode.  */
+  threshold2 = 0x1.74910d52d3051p+9, /* 7.45133219101941108420e+02 */
+/* Scaling factor used when result near zero.  */
+  twom54 = 0x1.0000000000000p-54; /* 5.55111512312578270212e-17 */
diff --git a/sysdeps/ieee754/dbl-64/slowexp.c b/sysdeps/ieee754/dbl-64/slowexp.c
deleted file mode 100644
index e8fa2e2..0000000
--- a/sysdeps/ieee754/dbl-64/slowexp.c
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-/**************************************************************************/
-/*  MODULE_NAME:slowexp.c                                                 */
-/*                                                                        */
-/*  FUNCTION:slowexp                                                      */
-/*                                                                        */
-/*  FILES NEEDED:mpa.h                                                    */
-/*               mpa.c mpexp.c                                            */
-/*                                                                        */
-/*Converting from double precision to Multi-precision and calculating     */
-/* e^x                                                                    */
-/**************************************************************************/
-#include <math_private.h>
-
-#include <stap-probe.h>
-
-#ifndef USE_LONG_DOUBLE_FOR_MP
-# include "mpa.h"
-void __mpexp (mp_no *x, mp_no *y, int p);
-#endif
-
-#ifndef SECTION
-# define SECTION
-#endif
-
-/*Converting from double precision to Multi-precision and calculating  e^x */
-double
-SECTION
-__slowexp (double x)
-{
-#ifndef USE_LONG_DOUBLE_FOR_MP
-  double w, z, res, eps = 3.0e-26;
-  int p;
-  mp_no mpx, mpy, mpz, mpw, mpeps, mpcor;
-
-  /* Use the multiple precision __MPEXP function to compute the exponential
-     First at 144 bits and if it is not accurate enough, at 768 bits.  */
-  p = 6;
-  __dbl_mp (x, &mpx, p);
-  __mpexp (&mpx, &mpy, p);
-  __dbl_mp (eps, &mpeps, p);
-  __mul (&mpeps, &mpy, &mpcor, p);
-  __add (&mpy, &mpcor, &mpw, p);
-  __sub (&mpy, &mpcor, &mpz, p);
-  __mp_dbl (&mpw, &w, p);
-  __mp_dbl (&mpz, &z, p);
-  if (w == z)
-    {
-      /* Track how often we get to the slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p6, 2, &x, &w);
-      return w;
-    }
-  else
-    {
-      p = 32;
-      __dbl_mp (x, &mpx, p);
-      __mpexp (&mpx, &mpy, p);
-      __mp_dbl (&mpy, &res, p);
-
-      /* Track how often we get to the uber-slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p32, 2, &x, &res);
-      return res;
-    }
-#else
-  return (double) __ieee754_expl((long double)x);
-#endif
-}
diff --git a/sysdeps/powerpc/power4/fpu/Makefile b/sysdeps/powerpc/power4/fpu/Makefile
index e17d32f..ded9976 100644
--- a/sysdeps/powerpc/power4/fpu/Makefile
+++ b/sysdeps/powerpc/power4/fpu/Makefile
@@ -3,5 +3,4 @@
 ifeq ($(subdir),math)
 CFLAGS-mpa.c += --param max-unroll-times=4 -funroll-loops -fpeel-loops
 CPPFLAGS-slowpow.c += -DUSE_LONG_DOUBLE_FOR_MP=1
-CPPFLAGS-slowexp.c += -DUSE_LONG_DOUBLE_FOR_MP=1
 endif
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index 0825340..bec45e0 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -10,7 +10,7 @@ libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
 
 libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
 			e_asin-fma e_atan2-fma s_sin-fma s_tan-fma \
-			mplog-fma mpa-fma slowexp-fma slowpow-fma \
+			mplog-fma mpa-fma slowpow-fma \
 			sincos32-fma doasin-fma dosincos-fma \
 			halfulp-fma mpexp-fma \
 			mpatan2-fma mpatan-fma mpsqrt-fma mptan-fma
@@ -32,7 +32,6 @@ CFLAGS-mpsqrt-fma.c = -mfma -mavx2
 CFLAGS-mptan-fma.c = -mfma -mavx2
 CFLAGS-s_atan-fma.c = -mfma -mavx2
 CFLAGS-sincos32-fma.c = -mfma -mavx2
-CFLAGS-slowexp-fma.c = -mfma -mavx2
 CFLAGS-slowpow-fma.c = -mfma -mavx2
 CFLAGS-s_sin-fma.c = -mfma -mavx2
 CFLAGS-s_tan-fma.c = -mfma -mavx2
@@ -52,7 +51,7 @@ CFLAGS-s_cosf-fma.c = -mfma -mavx2
 
 libm-sysdep_routines += e_exp-fma4 e_log-fma4 e_pow-fma4 s_atan-fma4 \
 			e_asin-fma4 e_atan2-fma4 s_sin-fma4 s_tan-fma4 \
-			mplog-fma4 mpa-fma4 slowexp-fma4 slowpow-fma4 \
+			mplog-fma4 mpa-fma4 slowpow-fma4 \
 			sincos32-fma4 doasin-fma4 dosincos-fma4 \
 			halfulp-fma4 mpexp-fma4 \
 			mpatan2-fma4 mpatan-fma4 mpsqrt-fma4 mptan-fma4
@@ -74,14 +73,13 @@ CFLAGS-mpsqrt-fma4.c = -mfma4
 CFLAGS-mptan-fma4.c = -mfma4
 CFLAGS-s_atan-fma4.c = -mfma4
 CFLAGS-sincos32-fma4.c = -mfma4
-CFLAGS-slowexp-fma4.c = -mfma4
 CFLAGS-slowpow-fma4.c = -mfma4
 CFLAGS-s_sin-fma4.c = -mfma4
 CFLAGS-s_tan-fma4.c = -mfma4
 
 libm-sysdep_routines += e_exp-avx e_log-avx s_atan-avx \
 			e_atan2-avx s_sin-avx s_tan-avx \
-			mplog-avx mpa-avx slowexp-avx \
+			mplog-avx mpa-avx \
 			mpexp-avx
 
 CFLAGS-e_atan2-avx.c = -msse2avx -DSSE2AVX
@@ -92,7 +90,6 @@ CFLAGS-mpexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-mplog-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_atan-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_sin-avx.c = -msse2avx -DSSE2AVX
-CFLAGS-slowexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_tan-avx.c = -msse2avx -DSSE2AVX
 endif
 
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
index ee5dd6d..afd9174 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_avx
 #define __exp1 __exp1_avx
-#define __slowexp __slowexp_avx
 #define SECTION __attribute__ ((section (".text.avx")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
index 6e0fdb7..765b1b9 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma
 #define __exp1 __exp1_fma
-#define __slowexp __slowexp_fma
 #define SECTION __attribute__ ((section (".text.fma")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
index ae6eb67..9ac7aca 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma4
 #define __exp1 __exp1_fma4
-#define __slowexp __slowexp_fma4
 #define SECTION __attribute__ ((section (".text.fma4")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c b/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
deleted file mode 100644
index d01c6d7..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_avx
-#define __add __add_avx
-#define __dbl_mp __dbl_mp_avx
-#define __mpexp __mpexp_avx
-#define __mul __mul_avx
-#define __sub __sub_avx
-#define SECTION __attribute__ ((section (".text.avx")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
deleted file mode 100644
index 6fffca1..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma
-#define __add __add_fma
-#define __dbl_mp __dbl_mp_fma
-#define __mpexp __mpexp_fma
-#define __mul __mul_fma
-#define __sub __sub_fma
-#define SECTION __attribute__ ((section (".text.fma")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
deleted file mode 100644
index 3bcde84..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma4
-#define __add __add_fma4
-#define __dbl_mp __dbl_mp_fma4
-#define __mpexp __mpexp_fma4
-#define __mul __mul_fma4
-#define __sub __sub_fma4
-#define SECTION __attribute__ ((section (".text.fma4")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
-- 
1.7.1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-12-29 23:42 Patrick McGehearty
@ 2018-01-01  1:36 ` Joseph Myers
  2018-01-01 16:31   ` Patrick McGehearty
  0 siblings, 1 reply; 44+ messages in thread
From: Joseph Myers @ 2018-01-01  1:36 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Fri, 29 Dec 2017, Patrick McGehearty wrote:

> Version 9 of proposed patch.
> 
> Replaced get_rounding_mode and libc_fesetround() with SET_RESTORE_ROUND
> to avoid Intel rounding mode issue which showed as test failures in
> tgamma_upward. Adds noticeable overhead for platforms that incur
> significant cost when rounding mode is already FE_TONEAREST. Added
> SET_RESTORE_ROUND to two more cases which resolved 1 ulp rounding
> errors for cexp().

Is the "5x" in the subject still correct?  (The subject line of a patch 
should be suitable for the summary line of the commit message, so must 
always reflect the current patch version accurately.  Likewise, the text 
of the patch submission, minus anything about changes relative to a 
previous patch version, must be suitable for the longer part of the commit 
message.)

> Expanded the scaling table from 64 entries to 128 entries, renaming
> invln2_64 to invln2_256 as well as ln2_64hi and ln2_64lo to ln2_256hi
> and ln2_256lo. That reduces the 1 ulp error rate per 1000 values from
> 1.6 to 0.6.

I think this expansion - and corresponding increase in cache usage - is a 
bad idea.  Table size should be kept down, consistent with suitable 
accuracy of e.g. < 1 ulp in this case.

This patch version also appears to be missing the fixups I made as part of 
committing the previous patch.  See 
<https://sourceware.org/ml/libc-alpha/2017-12/msg00649.html>.  In addition 
to the extra slowexp.c removals I noted there, I also needed to remove 
trailing whitespace from a few lines - you should make sure your patch 
does not include any additions of lines with trailing spaces, as the 
commit hooks will reject any push that adds such lines.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2018-01-01  1:36 ` Joseph Myers
@ 2018-01-01 16:31   ` Patrick McGehearty
  2018-01-01 16:41     ` Joseph Myers
  0 siblings, 1 reply; 44+ messages in thread
From: Patrick McGehearty @ 2018-01-01 16:31 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha

On 12/31/2017 7:36 PM, Joseph Myers wrote:
> On Fri, 29 Dec 2017, Patrick McGehearty wrote:
>
>> Version 9 of proposed patch.
>>
>> Replaced get_rounding_mode and libc_fesetround() with SET_RESTORE_ROUND
>> to avoid Intel rounding mode issue which showed as test failures in
>> tgamma_upward. Adds noticeable overhead for platforms that incur
>> significant cost when rounding mode is already FE_TONEAREST. Added
>> SET_RESTORE_ROUND to two more cases which resolved 1 ulp rounding
>> errors for cexp().
> Is the "5x" in the subject still correct?  (The subject line of a patch
> should be suitable for the summary line of the commit message, so must
> always reflect the current patch version accurately.  Likewise, the text
> of the patch submission, minus anything about changes relative to a
> previous patch version, must be suitable for the longer part of the commit
> message.)
I can't be sure the "make bench" test gives representative data, since
many of the test values seem to be at the extreme ranges and may trigger
the "extremely precise and slow" paths in the old code. Those tests
still show very large improvements (mean: old 5473 vs new 418).
I'll reconfirm with my own "typical values" tests. I originally used 5x
as a value that I considered very conservative, to allow for
platform-to-platform variation and for different ideas of
"typical values".
>
>> Expanded the scaling table from 64 entries to 128 entries, renaming
>> invln2_64 to invln2_256 as well as ln2_64hi and ln2_64lo to ln2_256hi
>> and ln2_256lo. That reduces the 1 ulp error rate per 1000 values from
>> 1.6 to 0.6.
> I think this expansion - and corresponding increase in cache usage - is a
> bad idea.  Table size should be kept down, consistent with suitable
> accuracy of e.g. < 1 ulp in this case.

The table size versus error rate tradeoff:
 32 entries (512 bytes)  2.95
 64 entries (1K bytes)   1.62
128 entries (2K bytes)   0.98
256 entries (4K bytes)   0.65
That is the number of 1 ulp diffs from the old version per 1000 test values.
As far as I can determine, the old code had an error rate of 0.0 per
1000 values.
I was going for more accuracy to reduce the chances for
pushback from the field once this change starts getting used
more widely.

With typical L3 caches now measured in Mbytes/thread
and L2 caches of at least 64Kbytes/thread, if not 256Kbytes/thread,
having modestly larger tables is a reasonable tradeoff,
especially since we are gaining so much performance
while giving up some accuracy. I retained the 64 and 128 entry
versions, so I can switch out the table size easily.

>
> This patch version also appears to be missing the fixups I made as part of
> committing the previous patch.  See
> <https://sourceware.org/ml/libc-alpha/2017-12/msg00649.html>.  In addition
> to the extra slowexp.c removals I noted there, I also needed to remove
> trailing whitespace from a few lines - you should make sure your patch
> does not include any additions of lines with trailing spaces, as the
> commit hooks will reject any push that adds such lines.
>
I'll fix the i386, ia64, and m68k issues and look for trailing 
whitespace in all the changed files.

- patrick

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2018-01-01 16:31   ` Patrick McGehearty
@ 2018-01-01 16:41     ` Joseph Myers
  0 siblings, 0 replies; 44+ messages in thread
From: Joseph Myers @ 2018-01-01 16:41 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Mon, 1 Jan 2018, Patrick McGehearty wrote:

> With typical L3 caches now measured in Mbytes/thread
> and L2 caches of at least 64Kbytes/thread, if not 256Kbytes/thread,
> having modestly larger tables is a reasonable tradeoff,
> especially since we are gaining so much performance
> while giving up some accuracy. I retained the 64 and 128 entry
> versions, so I can switch out the table size easily.

I think L1 cache size is relevant as well (and in practical uses you have 
more than just the exp function and data competing for cache space; what's 
optimal for a benchmark just calling a particular function may not be 
optimal for a typical system as a whole).  I think going back to the 64 
entry version is appropriate.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
@ 2017-12-04 21:53 Patrick McGehearty
  2017-12-05 23:20 ` Joseph Myers
  0 siblings, 1 reply; 44+ messages in thread
From: Patrick McGehearty @ 2017-12-04 21:53 UTC (permalink / raw)
  To: libc-alpha

Version 7 of proposed patch.

Fixed formatting issue in sysdeps/ieee754/dbl-64/e_exp.c

Version 6 of proposed patch.

Fixed error in patch revision.
Cleaned up formatting of return () and location of '+' for line breaks.
Fixed comments in eexp.tbl. Adjusted 3 values in eexp.tbl to be correctly
rounded in ulp as computed by quad precision.

Modified e_exp.c and eexp.tbl to use table of 64 intervals instead of
32 intervals for computing exp(x). That change reduced the differences
from the prior ieee754 exp(x) to 16 in 10,000 from 29 in 10,000. Also
reduced the make check differences for exp to 1 from 3. No observed
change in performance for using the larger table on either x86 or Sparc.

Version 5 of proposed patch.

Cleaned up formatting of comments and braces.
Returned to single patch for submission.

Version 4 of proposed patch.

New comments revised to use GNU standard comment formatting.
Limited comment added in eexp.tbl for TBL[]. The original src
used for porting to Linux did not have a comment about TBL[].
The new comment is limited to the current worker's level of
understanding.

The (-xx.x > threshold2) case is changed to return force_underflow.
For FE_TONEAREST, tiny*tiny will always be zero but for
FE_UPWARD, it will be the smallest representable value.

That change caused no change in the math test results for Sparc or x86.

Version 3 changes

All hex constants in version 2 replaced with C99 double hex constants,
allowing Big Endian and Little Endian versions to be merged.
Only e_exp.c and eexp.tbl changed from version 2.
Minor changes in performance results due to system noise.
No other changes from version 2.

Version 2 of proposed patch.
Revised copyright notice and formatting issues.
Removed slowexp.c and related references.
Replaced tables of double constants with hex constants, taking special
  attention to correctly handle little endian and big endian versions.
  Using hex initialization also required changing variables to be declared
  as unions.  Tables moved from e_exp.c to sysdeps/ieee754/dbl-64/eexp.tbl.
Replaced __fegetround(), __fesetround() with get_rounding_mode and
  libc_fesetround().
Removed use of "small". "inexact mode" now ignored.
Retested and rebenchmarked on sparc and x86 with the above changes.

These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Performance gains are typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc and x86:
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   395    5173     144
min     399    54      15      13
mean   5317   200    1349      23

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from a value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run. Within the
test suite 3 input values were found to cause 1 bit differences (ulp)
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function. The gamma function is known to be difficult to compute
accurately.
---
 manual/probes.texi                          |   14 -
 math/Makefile                               |    2 +-
 sysdeps/generic/math_private.h              |    1 -
 sysdeps/ieee754/dbl-64/e_exp.c              |  398 +++++++++++++++------------
 sysdeps/ieee754/dbl-64/e_pow.c              |    2 +-
 sysdeps/ieee754/dbl-64/eexp.tbl             |  255 +++++++++++++++++
 sysdeps/ieee754/dbl-64/slowexp.c            |   86 ------
 sysdeps/powerpc/power4/fpu/Makefile         |    1 -
 sysdeps/x86_64/fpu/multiarch/Makefile       |    9 +-
 sysdeps/x86_64/fpu/multiarch/e_exp-avx.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c   |    1 -
 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c |    9 -
 15 files changed, 475 insertions(+), 323 deletions(-)
 create mode 100644 sysdeps/ieee754/dbl-64/eexp.tbl
 delete mode 100644 sysdeps/ieee754/dbl-64/slowexp.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c

diff --git a/manual/probes.texi b/manual/probes.texi
index 8ab6756..f8ae64b 100644
--- a/manual/probes.texi
+++ b/manual/probes.texi
@@ -258,20 +258,6 @@ Unless explicitly mentioned otherwise, a precision of 1 implies 24 bits of
 precision in the mantissa of the multiple precision number.  Hence, a precision
 level of 32 implies 768 bits of precision in the mantissa.
 
-@deftp Probe slowexp_p6 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-6.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
-@deftp Probe slowexp_p32 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-32.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
 @deftp Probe slowpow_p10 (double @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
 This probe is triggered when the @code{pow} function is called with
 inputs that result in multiple precision computation with precision
diff --git a/math/Makefile b/math/Makefile
index 668c283..21f315a 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -114,7 +114,7 @@ type-ldouble-yes := ldouble
 # double support
 type-double-suffix :=
 type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
-		       mpatan mpexp mplog mpsqrt mptan sincos32 slowexp	\
+		       mpatan mpexp mplog mpsqrt mptan sincos32	\
 		       slowpow sincostab k_rem_pio2
 
 # float support
diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index f29898c..689dc54 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -262,7 +262,6 @@ extern double __sin32 (double __x, double __res, double __res1);
 extern double __cos32 (double __x, double __res, double __res1);
 extern double __mpsin (double __x, double __dx, bool __range_reduce);
 extern double __mpcos (double __x, double __dx, bool __range_reduce);
-extern double __slowexp (double __x);
 extern double __slowpow (double __x, double __y, double __z);
 extern void __docos (double __x, double __dx, double __v[]);
 
diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
index 6757a14..1861c97 100644
--- a/sysdeps/ieee754/dbl-64/e_exp.c
+++ b/sysdeps/ieee754/dbl-64/e_exp.c
@@ -1,3 +1,4 @@
+/* EXP function - Compute double precision exponential */
 /*
  * IBM Accurate Mathematical Library
  * written by International Business Machines Corp.
@@ -23,7 +24,7 @@
 /*           exp1                                                          */
 /*                                                                         */
 /* FILES NEEDED:dla.h endian.h mpa.h mydefs.h uexp.h                       */
-/*              mpa.c mpexp.x slowexp.c                                    */
+/*              mpa.c mpexp.x                                              */
 /*                                                                         */
 /* An ultimate exp routine. Given an IEEE double machine number x          */
 /* it computes the correctly rounded (to nearest) value of e^x             */
@@ -32,207 +33,238 @@
 /*                                                                         */
 /***************************************************************************/
 
+/*  IBM exp(x) replaced by following exp(x) in 2017. IBM exp1(x,xx) remains.  */
+/* exp(x)
+   Hybrid algorithm of Peter Tang's Table driven method (for large
+   arguments) and an accurate table (for small arguments).
+   Written by K.C. Ng, November 1988.
+   Revised by Patrick McGehearty, Nov 2017 to use j/64 instead of j/32
+   Method (large arguments):
+	1. Argument Reduction: given the input x, find r and integer k
+	   and j such that
+	             x = (k+j/64)*(ln2) + r,  |r| <= (1/128)*ln2
+
+	2. exp(x) = 2^k * (2^(j/64) + 2^(j/64)*expm1(r))
+	   a. expm1(r) is approximated by a polynomial:
+	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
+	      Here t1 = 1/2 exactly.
+	   b. 2^(j/64) is represented to twice double precision
+	      as TBL[2j]+TBL[2j+1].
+
+   Note: If divide were fast enough, we could use another approximation
+	 in 2.a:
+	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
+	      (for the same t1 and t2 as above)
+
+   Special cases:
+	exp(INF) is INF, exp(NaN) is NaN;
+	exp(-INF)=  0;
+	for finite argument, only exp(0)=1 is exact.
+
+   Accuracy:
+	According to an error analysis, the error is always less than
+	an ulp (unit in the last place).  The largest errors observed
+	are less than 0.55 ulp for normal results and less than 0.75 ulp
+	for subnormal results.
+
+   Misc. info.
+	For IEEE double
+		if x >  7.09782712893383973096e+02 then exp(x) overflow
+		if x < -7.45133219101941108420e+02 then exp(x) underflow.  */
+
 #include <math.h>
+#include <math-svid-compat.h>
+#include <math_private.h>
+#include <errno.h>
 #include "endian.h"
 #include "uexp.h"
+#include "uexp.tbl"
 #include "mydefs.h"
 #include "MathLib.h"
-#include "uexp.tbl"
-#include <math_private.h>
 #include <fenv.h>
 #include <float.h>
 
-#ifndef SECTION
-# define SECTION
-#endif
+extern double __ieee754_exp (double);
+
+#include "eexp.tbl"
+
+static const double
+  half = 0.5,
+  one = 1.0;
 
-double __slowexp (double);
 
-/* An ultimate exp routine. Given an IEEE double machine number x it computes
-   the correctly rounded (to nearest) value of e^x.  */
 double
-SECTION
-__ieee754_exp (double x)
+__ieee754_exp (double x_arg)
 {
-  double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
-  mynumber junk1, junk2, binexp = {{0, 0}};
-  int4 i, j, m, n, ex;
+  double z, t;
   double retval;
-
+  int hx, ix, k, j, m;
+  int fe_val;
+  union
   {
-    SET_RESTORE_ROUND (FE_TONEAREST);
-
-    junk1.x = x;
-    m = junk1.i[HIGH_HALF];
-    n = m & hugeint;
-
-    if (n > smallint && n < bigint)
-      {
-	y = x * log2e.x + three51.x;
-	bexp = y - three51.x;	/*  multiply the result by 2**bexp        */
-
-	junk1.x = y;
-
-	eps = bexp * ln_two2.x;	/* x = bexp*ln(2) + t - eps               */
-	t = x - bexp * ln_two1.x;
-
-	y = t + three33.x;
-	base = y - three33.x;	/* t rounded to a multiple of 2**-18      */
-	junk2.x = y;
-	del = (t - base) - eps;	/*  x = bexp*ln(2) + base + del           */
-	eps = del + del * del * (p3.x * del + p2.x);
-
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;
-
-	i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-	j = (junk2.i[LOW_HALF] & 511) << 1;
-
-	al = coar.x[i] * fine.x[j];
-	bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	       + coar.x[i + 1] * fine.x[j + 1]);
-
-	rem = (bet + bet * eps) + al * eps;
-	res = al + rem;
-	cor = (al - res) + rem;
-	if (res == (res + cor * err_0))
-	  {
-	    retval = res * binexp.x;
-	    goto ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto ret;
-	  }			/*if error is over bound */
-      }
-
-    if (n <= smallint)
-      {
-	retval = 1.0;
-	goto ret;
-      }
-
-    if (n >= badint)
-      {
-	if (n > infint)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/* x is NaN */
-	if (n < infint)
-	  {
-	    if (x > 0)
-	      goto ret_huge;
-	    else
-	      goto ret_tiny;
-	  }
-	/* x is finite,  cause either overflow or underflow  */
-	if (junk1.i[LOW_HALF] != 0)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/*  x is NaN  */
-	retval = (x > 0) ? inf.x : zero;	/* |x| = inf;  return either inf or 0 */
-	goto ret;
-      }
-
-    y = x * log2e.x + three51.x;
-    bexp = y - three51.x;
-    junk1.x = y;
-    eps = bexp * ln_two2.x;
-    t = x - bexp * ln_two1.x;
-    y = t + three33.x;
-    base = y - three33.x;
-    junk2.x = y;
-    del = (t - base) - eps;
-    eps = del + del * del * (p3.x * del + p2.x);
-    i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-    j = (junk2.i[LOW_HALF] & 511) << 1;
-    al = coar.x[i] * fine.x[j];
-    bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	   + coar.x[i + 1] * fine.x[j + 1]);
-    rem = (bet + bet * eps) + al * eps;
-    res = al + rem;
-    cor = (al - res) + rem;
-    if (m >> 31)
-      {
-	ex = junk1.i[LOW_HALF];
-	if (res < 1.0)
-	  {
-	    res += res;
-	    cor += cor;
-	    ex -= 1;
-	  }
-	if (ex >= -1022)
-	  {
-	    binexp.i[HIGH_HALF] = (1023 + ex) << 20;
-	    if (res == (res + cor * err_0))
-	      {
-		retval = res * binexp.x;
-		goto ret;
-	      }
-	    else
-	      {
-		retval = __slowexp (x);
-		goto check_uflow_ret;
-	      }			/*if error is over bound */
-	  }
-	ex = -(1022 + ex);
-	binexp.i[HIGH_HALF] = (1023 - ex) << 20;
-	res *= binexp.x;
-	cor *= binexp.x;
-	eps = 1.0000000001 + err_0 * binexp.x;
-	t = 1.0 + res;
-	y = ((1.0 - t) + res) + cor;
-	res = t + y;
-	cor = (t - res) + y;
-	if (res == (res + eps * cor))
-	  {
-	    binexp.i[HIGH_HALF] = 0x00100000;
-	    retval = (res - 1.0) * binexp.x;
-	    goto check_uflow_ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto check_uflow_ret;
-	  }			/*   if error is over bound    */
-      check_uflow_ret:
-	if (retval < DBL_MIN)
-	  {
-	    double force_underflow = tiny * tiny;
-	    math_force_eval (force_underflow);
-	  }
-	if (retval == 0)
-	  goto ret_tiny;
-	goto ret;
-      }
-    else
-      {
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
-	if (res == (res + cor * err_0))
-	  retval = res * binexp.x * t256.x;
-	else
-	  retval = __slowexp (x);
-	if (isinf (retval))
-	  goto ret_huge;
-	else
-	  goto ret;
-      }
-  }
-ret:
-  return retval;
-
- ret_huge:
-  return hhuge * hhuge;
-
- ret_tiny:
-  return tiny * tiny;
+    int i_part[2];
+    double x;
+  } xx;
+  union
+  {
+    int y_part[2];
+    double y;
+  } yy;
+  xx.x = x_arg;
+
+  ix = xx.i_part[HIGH_HALF];
+  hx = ix & ~0x80000000;
+
+  if (hx < 0x3ff0a2b2)
+    {				/* |x| < 3/2 ln 2 */
+      if (hx < 0x3f862e42)
+	{			/* |x| < 1/64 ln 2 */
+	  if (hx < 0x3ed00000)
+	    {			/* |x| < 2^-18 */
+	      if (hx < 0x3e300000)
+		{
+		  retval = one + xx.x;
+		  return retval;
+		}
+	      retval = one + xx.x * (one + half * xx.x);
+	      return retval;
+	    }
+	  /* Use FE_TONEAREST rounding mode for computing yy.y.
+	     Avoid set/reset of rounding mode if in FE_TONEAREST mode.  */
+	  fe_val = get_rounding_mode ();
+	  if (fe_val == FE_TONEAREST)
+	    {
+	      t = xx.x * xx.x;
+	      yy.y = xx.x + (t * (half + xx.x * t2)
+			     + (t * t) * (t3 + xx.x * t4 + t * t5));
+	      retval = one + yy.y;
+	    } 
+	  else
+	    {
+	      libc_fesetround (FE_TONEAREST);
+	      t = xx.x * xx.x;
+	      yy.y = xx.x + (t * (half + xx.x * t2)
+			     + (t * t) * (t3 + xx.x * t4 + t * t5));
+	      retval = one + yy.y;
+	      libc_fesetround (fe_val);
+	    }
+	  return retval;
+	}
+
+      /* Find the multiple of 2^-6 nearest x.  */
+      k = hx >> 20;
+      j = (0x00100000 | (hx & 0x000fffff)) >> (0x40c - k);
+      j = (j - 1) & ~1;
+      if (ix < 0)
+	j += 134;
+      /* Use FE_TONEAREST rounding mode for computing yy.y.
+	 Avoid set/reset of rounding mode if in FE_TONEAREST mode.  */
+      fe_val = get_rounding_mode ();
+      if (fe_val == FE_TONEAREST)
+	{
+	  z = xx.x - TBL2[j];
+	  t = z * z;
+	  yy.y = z + (t * (half + (z * t2))
+		      + (t * t) * (t3 + z * t4 + t * t5));
+	  retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+	}
+      else
+	{
+	  libc_fesetround (FE_TONEAREST);
+	  z = xx.x - TBL2[j];
+	  t = z * z;
+	  yy.y = z + (t * (half + (z * t2))
+		      + (t * t) * (t3 + z * t4 + t * t5));
+	  retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+	  libc_fesetround (fe_val);
+	}
+      return retval;
+    }
+
+  if (hx >= 0x40862e42)
+    {				/* x is large, infinite, or nan.  */
+      if (hx >= 0x7ff00000)
+	{
+	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
+	    return zero;	/* exp(-inf) = 0.  */
+	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf.  */
+	}
+      if (xx.x > threshold1)
+	{			/* Set overflow error condition.  */
+	  retval = hhuge * hhuge;
+	  return retval;
+	} 
+      if (-xx.x > threshold2)
+	{			/* Set underflow error condition.  */
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	  retval = force_underflow;
+	  return retval;
+	}
+    }
+
+  /* Use FE_TONEAREST rounding mode for computing yy.y.
+     Avoid set/reset of rounding mode if already in FE_TONEAREST mode.  */
+  fe_val = get_rounding_mode ();
+  if (fe_val == FE_TONEAREST)
+    {
+      t = invln2_64 * xx.x;
+      if (ix < 0)
+	t -= half;
+      else
+	t += half;
+      k = (int) t;
+      j = (k & 0x3f) << 1;
+      m = k >> 6;
+      z = (xx.x - k * ln2_32hi2) - k * ln2_32lo2;
+
+      /* z is now in primary range.  */
+      t = z * z;
+      yy.y = z + (t * (half + z * t2) + (t * t) * (t3 + z * t4 + t * t5));
+      yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+    }
+  else
+    {
+      libc_fesetround (FE_TONEAREST);
+      t = invln2_64 * xx.x;
+      if (ix < 0)
+	t -= half;
+      else
+	t += half;
+      k = (int) t;
+      j = (k & 0x3f) << 1;
+      m = k >> 6;
+      z = (xx.x - k * ln2_32hi2) - k * ln2_32lo2;
+
+      /* z is now in primary range.  */
+      t = z * z;
+      yy.y = z + (t * (half + z * t2) + (t * t) * (t3 + z * t4 + t * t5));
+      yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+      libc_fesetround (fe_val);
+    }
+
+  if (m < -1021)
+    {
+      yy.y_part[HIGH_HALF] += (m + 54) << 20;
+      retval = twom54 * yy.y;
+      if (retval < DBL_MIN)
+	{
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	}
+      return retval;
+    }
+  yy.y_part[HIGH_HALF] += m << 20;
+  return yy.y;
 }
 #ifndef __ieee754_exp
 strong_alias (__ieee754_exp, __exp_finite)
 #endif
 
+#ifndef SECTION
+# define SECTION
+#endif
+
 /* Compute e^(x+xx).  The routine also receives bound of error of previous
    calculation.  If after computing exp the error exceeds the allowed bounds,
    the routine returns a non-positive number.  Otherwise it returns the
diff --git a/sysdeps/ieee754/dbl-64/e_pow.c b/sysdeps/ieee754/dbl-64/e_pow.c
index 9f6439e..2eb8dbf 100644
--- a/sysdeps/ieee754/dbl-64/e_pow.c
+++ b/sysdeps/ieee754/dbl-64/e_pow.c
@@ -25,7 +25,7 @@
 /*             log1                                                        */
 /*             checkint                                                    */
 /* FILES NEEDED: dla.h endian.h mpa.h mydefs.h                             */
-/*               halfulp.c mpexp.c mplog.c slowexp.c slowpow.c mpa.c       */
+/*               halfulp.c mpexp.c mplog.c slowpow.c mpa.c                 */
 /*                          uexp.c  upow.c				   */
 /*               root.tbl uexp.tbl upow.tbl                                */
 /* An ultimate power routine. Given two IEEE double machine numbers y,x    */
diff --git a/sysdeps/ieee754/dbl-64/eexp.tbl b/sysdeps/ieee754/dbl-64/eexp.tbl
new file mode 100644
index 0000000..d5fa3dd
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/eexp.tbl
@@ -0,0 +1,255 @@
+/* EXP function tables - for use in computing double precision exponential
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+/*
+   TBL[2*j] is 2**(j/64), rounded to nearest.
+   TBL[2*j+1] is 2**(j/64) - TBL[2*j], rounded to nearest.
+   These values are used to approximate exp(x) using the formula
+   given in the comments for e_exp.c.  */
+
+static const double TBL[128] = {
+    0x1.0000000000000p+0,  0x0.0000000000000p+0,
+    0x1.02c9a3e778061p+0, -0x1.19083535b085dp-56,
+    0x1.059b0d3158574p+0,  0x1.d73e2a475b465p-55,
+    0x1.0874518759bc8p+0,  0x1.186be4bb284ffp-57,
+    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610bp-54,
+    0x1.0e3ec32d3d1a2p+0,  0x1.03a1727c57b52p-59,
+    0x1.11301d0125b51p+0, -0x1.6c51039449b3ap-54,
+    0x1.1429aaea92de0p+0, -0x1.32fbf9af1369ep-54,
+    0x1.172b83c7d517bp+0, -0x1.19041b9d78a76p-55,
+    0x1.1a35beb6fcb75p+0,  0x1.e5b4c7b4968e4p-55,
+    0x1.1d4873168b9aap+0,  0x1.e016e00a2643cp-54,
+    0x1.2063b88628cd6p+0,  0x1.dc775814a8495p-55,
+    0x1.2387a6e756238p+0,  0x1.9b07eb6c70573p-54,
+    0x1.26b4565e27cddp+0,  0x1.2bd339940e9d9p-55,
+    0x1.29e9df51fdee1p+0,  0x1.612e8afad1255p-55,
+    0x1.2d285a6e4030bp+0,  0x1.0024754db41d5p-54,
+    0x1.306fe0a31b715p+0,  0x1.6f46ad23182e4p-55,
+    0x1.33c08b26416ffp+0,  0x1.32721843659a6p-54,
+    0x1.371a7373aa9cbp+0, -0x1.63aeabf42eae2p-54,
+    0x1.3a7db34e59ff7p+0, -0x1.5e436d661f5e3p-56,
+    0x1.3dea64c123422p+0,  0x1.ada0911f09ebcp-55,
+    0x1.4160a21f72e2ap+0, -0x1.ef3691c309278p-58,
+    0x1.44e086061892dp+0,  0x1.89b7a04ef80d0p-59,
+    0x1.486a2b5c13cd0p+0,  0x1.3c1a3b69062f0p-56,
+    0x1.4bfdad5362a27p+0,  0x1.d4397afec42e2p-56,
+    0x1.4f9b2769d2ca7p+0, -0x1.4b309d25957e3p-54,
+    0x1.5342b569d4f82p+0, -0x1.07abe1db13cadp-55,
+    0x1.56f4736b527dap+0,  0x1.9bb2c011d93adp-54,
+    0x1.5ab07dd485429p+0,  0x1.6324c054647adp-54,
+    0x1.5e76f15ad2148p+0,  0x1.ba6f93080e65ep-54,
+    0x1.6247eb03a5585p+0, -0x1.383c17e40b497p-54,
+    0x1.6623882552225p+0, -0x1.bb60987591c34p-54,
+    0x1.6a09e667f3bcdp+0, -0x1.bdd3413b26456p-54,
+    0x1.6dfb23c651a2fp+0, -0x1.bbe3a683c88abp-57,
+    0x1.71f75e8ec5f74p+0, -0x1.16e4786887a99p-55,
+    0x1.75feb564267c9p+0, -0x1.0245957316dd3p-54,
+    0x1.7a11473eb0187p+0, -0x1.41577ee04992fp-55,
+    0x1.7e2f336cf4e62p+0,  0x1.05d02ba15797ep-56,
+    0x1.82589994cce13p+0, -0x1.d4c1dd41532d8p-54,
+    0x1.868d99b4492edp+0, -0x1.fc6f89bd4f6bap-54,
+    0x1.8ace5422aa0dbp+0,  0x1.6e9f156864b27p-54,
+    0x1.8f1ae99157736p+0,  0x1.5cc13a2e3976cp-55,
+    0x1.93737b0cdc5e5p+0, -0x1.75fc781b57ebcp-57,
+    0x1.97d829fde4e50p+0, -0x1.d185b7c1b85d1p-54,
+    0x1.9c49182a3f090p+0,  0x1.c7c46b071f2bep-56,
+    0x1.a0c667b5de565p+0, -0x1.359495d1cd533p-54,
+    0x1.a5503b23e255dp+0, -0x1.d2f6edb8d41e1p-54,
+    0x1.a9e6b5579fdbfp+0,  0x1.0fac90ef7fd31p-54,
+    0x1.ae89f995ad3adp+0,  0x1.7a1cd345dcc81p-54,
+    0x1.b33a2b84f15fbp+0, -0x1.2805e3084d708p-57,
+    0x1.b7f76f2fb5e47p+0, -0x1.5584f7e54ac3bp-56,
+    0x1.bcc1e904bc1d2p+0,  0x1.23dd07a2d9e84p-55,
+    0x1.c199bdd85529cp+0,  0x1.11065895048ddp-55,
+    0x1.c67f12e57d14bp+0,  0x1.2884dff483cadp-54,
+    0x1.cb720dcef9069p+0,  0x1.503cbd1e949dbp-56,
+    0x1.d072d4a07897cp+0, -0x1.cbc3743797a9cp-54,
+    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3707p-55,
+    0x1.da9e603db3285p+0,  0x1.c2300696db532p-54,
+    0x1.dfc97337b9b5fp+0, -0x1.1a5cd4f184b5cp-54,
+    0x1.e502ee78b3ff6p+0,  0x1.39e8980a9cc8fp-55,
+    0x1.ea4afa2a490dap+0, -0x1.e9c23179c2893p-54,
+    0x1.efa1bee615a27p+0,  0x1.dc7f486a4b6b0p-54,
+    0x1.f50765b6e4540p+0,  0x1.9d3e12dd8a18bp-54,
+    0x1.fa7c1819e90d8p+0,  0x1.74853f3a5931ep-55};
+
+/* For i = 0, ..., 66,
+     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+
+   For i = 67, ..., 133,
+     TBL2[2*i] is a double precision number near -(i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.  */
+
+static const double TBL2[268] = {
+    0x1.ffffffffffc82p-7,   0x1.04080ab55de32p+0,
+    0x1.fffffffffffdbp-6,   0x1.08205601127ecp+0,
+    0x1.80000000000a0p-5,   0x1.0c49236829e91p+0,
+    0x1.fffffffffff79p-5,   0x1.1082b577d34e9p+0,
+    0x1.3fffffffffffcp-4,   0x1.14cd4fc989cd6p+0,
+    0x1.8000000000060p-4,   0x1.192937074e0d4p+0,
+    0x1.c000000000061p-4,   0x1.1d96b0eff0e80p+0,
+    0x1.fffffffffffd6p-4,   0x1.2216045b6f5cap+0,
+    0x1.1ffffffffff58p-3,   0x1.26a7793f6014cp+0,
+    0x1.3ffffffffff75p-3,   0x1.2b4b58b372c65p+0,
+    0x1.5ffffffffff00p-3,   0x1.3001ecf601ad1p+0,
+    0x1.8000000000020p-3,   0x1.34cb8170b583ap+0,
+    0x1.9ffffffffa629p-3,   0x1.39a862bd3b344p+0,
+    0x1.c00000000000fp-3,   0x1.3e98deaa11dcep+0,
+    0x1.e00000000007fp-3,   0x1.439d443f5f16dp+0,
+    0x1.0000000000072p-2,   0x1.48b5e3c3e81abp+0,
+    0x1.0fffffffffecap-2,   0x1.4de30ec211dfbp+0,
+    0x1.1ffffffffff8fp-2,   0x1.5325180cfacd2p+0,
+    0x1.300000000003bp-2,   0x1.587c53c5a7b04p+0,
+    0x1.4000000000034p-2,   0x1.5de9176046007p+0,
+    0x1.4ffffffffff89p-2,   0x1.636bb9a98322fp+0,
+    0x1.5ffffffffffe7p-2,   0x1.690492cbf942ap+0,
+    0x1.6ffffffffff78p-2,   0x1.6eb3fc55b1e45p+0,
+    0x1.7ffffffffff65p-2,   0x1.747a513dbef32p+0,
+    0x1.8ffffffffffd5p-2,   0x1.7a57ede9ea22ep+0,
+    0x1.9ffffffffff6ep-2,   0x1.804d30347b50fp+0,
+    0x1.affffffffffc3p-2,   0x1.865a7772164aep+0,
+    0x1.c000000000053p-2,   0x1.8c802477b0030p+0,
+    0x1.d00000000004dp-2,   0x1.92be99a09bf1ep+0,
+    0x1.e000000000096p-2,   0x1.99163ad4b1e08p+0,
+    0x1.efffffffffefap-2,   0x1.9f876d8e8c4fcp+0,
+    0x1.fffffffffffd0p-2,   0x1.a61298e1e0688p+0,
+    0x1.0800000000002p-1,   0x1.acb82581eee56p+0,
+    0x1.100000000001fp-1,   0x1.b3787dc80f979p+0,
+    0x1.17ffffffffff8p-1,   0x1.ba540dba56e4fp+0,
+    0x1.1fffffffffffap-1,   0x1.c14b431256441p+0,
+    0x1.27fffffffffc4p-1,   0x1.c85e8d43f7c9bp+0,
+    0x1.2fffffffffffdp-1,   0x1.cf8e5d84758a6p+0,
+    0x1.380000000001fp-1,   0x1.d6db26d16cd84p+0,
+    0x1.3ffffffffffd8p-1,   0x1.de455df80e39bp+0,
+    0x1.4800000000052p-1,   0x1.e5cd799c6a59cp+0,
+    0x1.4ffffffffffc8p-1,   0x1.ed73f240dc10cp+0,
+    0x1.5800000000013p-1,   0x1.f539424d90f71p+0,
+    0x1.5ffffffffffbcp-1,   0x1.fd1de6182f885p+0,
+    0x1.680000000002dp-1,   0x1.02912df5ce741p+1,
+    0x1.7000000000040p-1,   0x1.06a39207f0a2ap+1,
+    0x1.780000000004fp-1,   0x1.0ac660691652ap+1,
+    0x1.7ffffffffff6fp-1,   0x1.0ef9db467dcabp+1,
+    0x1.87fffffffffe5p-1,   0x1.133e45d82e943p+1,
+    0x1.9000000000035p-1,   0x1.1793e4652cc6dp+1,
+    0x1.97fffffffffb3p-1,   0x1.1bfafc47bda48p+1,
+    0x1.a000000000000p-1,   0x1.2073d3f1bd518p+1,
+    0x1.a80000000004ap-1,   0x1.24feb2f105ce2p+1,
+    0x1.affffffffffedp-1,   0x1.299be1f3e7f11p+1,
+    0x1.b7ffffffffffbp-1,   0x1.2e4baacdb6611p+1,
+    0x1.c00000000001dp-1,   0x1.330e587b62b39p+1,
+    0x1.c800000000079p-1,   0x1.37e437282d538p+1,
+    0x1.cffffffffff51p-1,   0x1.3ccd943268248p+1,
+    0x1.d7fffffffff74p-1,   0x1.41cabe304cadcp+1,
+    0x1.e000000000011p-1,   0x1.46dc04f4e5343p+1,
+    0x1.e80000000001ep-1,   0x1.4c01b9950a124p+1,
+    0x1.effffffffff9ep-1,   0x1.513c2e6c73196p+1,
+    0x1.f7fffffffffedp-1,   0x1.568bb722dd586p+1,
+    0x1.0000000000034p+0,   0x1.5bf0a8b1457b0p+1,
+    0x1.03fffffffffe2p+0,   0x1.616b5967376dfp+1,
+    0x1.07fffffffff4bp+0,   0x1.66fc20f0337a9p+1,
+    0x1.0bffffffffffdp+0,   0x1.6ca35859290f5p+1,
+   -0x1.fffffffffffe4p-7,   0x1.f80feabfeefa5p-1,
+   -0x1.ffffffffffb0bp-6,   0x1.f03f56a88b5fep-1,
+   -0x1.7ffffffffffa7p-5,   0x1.e88dc6afecfc5p-1,
+   -0x1.ffffffffffea8p-5,   0x1.e0fabfbc702b8p-1,
+   -0x1.3ffffffffffb3p-4,   0x1.d985c89d041acp-1,
+   -0x1.7ffffffffffe3p-4,   0x1.d22e6a0197c06p-1,
+   -0x1.bffffffffff9ap-4,   0x1.caf42e73a4c89p-1,
+   -0x1.fffffffffff98p-4,   0x1.c3d6a24ed822dp-1,
+   -0x1.1ffffffffffe9p-3,   0x1.bcd553b9d7b67p-1,
+   -0x1.3ffffffffffe0p-3,   0x1.b5efd29f24c2dp-1,
+   -0x1.5fffffffff553p-3,   0x1.af25b0a61a9f4p-1,
+   -0x1.7ffffffffff8bp-3,   0x1.a876812c08794p-1,
+   -0x1.9fffffffffe51p-3,   0x1.a1e1d93d68828p-1,
+   -0x1.bffffffffff6ep-3,   0x1.9b674f8f2f3f5p-1,
+   -0x1.dffffffffff7fp-3,   0x1.95067c7837a0cp-1,
+   -0x1.fffffffffff7ap-3,   0x1.8ebef9eac8225p-1,
+   -0x1.0fffffffffffep-2,   0x1.8890636e31f55p-1,
+   -0x1.1ffffffffff41p-2,   0x1.827a56188975ep-1,
+   -0x1.2ffffffffffbap-2,   0x1.7c7c708877656p-1,
+   -0x1.3fffffffffff8p-2,   0x1.769652df22f81p-1,
+   -0x1.4ffffffffff90p-2,   0x1.70c79eba33c2fp-1,
+   -0x1.5ffffffffffdbp-2,   0x1.6b0ff72deb8aap-1,
+   -0x1.6ffffffffff9ap-2,   0x1.656f00bf5798ep-1,
+   -0x1.7ffffffffff9fp-2,   0x1.5fe4615e98eb0p-1,
+   -0x1.8ffffffffffeep-2,   0x1.5a6fc061433cep-1,
+   -0x1.9fffffffffc4ap-2,   0x1.5510c67cd26cdp-1,
+   -0x1.affffffffff30p-2,   0x1.4fc71dc13566bp-1,
+   -0x1.bfffffffffff0p-2,   0x1.4a9271936fd0ep-1,
+   -0x1.cfffffffffff3p-2,   0x1.45726ea84fb8cp-1,
+   -0x1.dfffffffffff3p-2,   0x1.4066c2ff3912bp-1,
+   -0x1.effffffffff80p-2,   0x1.3b6f1ddd05ab9p-1,
+   -0x1.fffffffffffdfp-2,   0x1.368b2fc6f9614p-1,
+   -0x1.0800000000000p-1,   0x1.31baaa7dca843p-1,
+   -0x1.0ffffffffffa4p-1,   0x1.2cfd40f8bdce4p-1,
+   -0x1.17fffffffff0ap-1,   0x1.2852a760d5ce7p-1,
+   -0x1.2000000000000p-1,   0x1.23ba930c1568bp-1,
+   -0x1.27fffffffffbbp-1,   0x1.1f34ba78d568dp-1,
+   -0x1.2fffffffffe32p-1,   0x1.1ac0d5492c1dbp-1,
+   -0x1.37ffffffff042p-1,   0x1.165e9c3e67ef2p-1,
+   -0x1.3ffffffffff77p-1,   0x1.120dc93499431p-1,
+   -0x1.47fffffffff6bp-1,   0x1.0dce171e34ecep-1,
+   -0x1.4fffffffffff1p-1,   0x1.099f41ffbe588p-1,
+   -0x1.57ffffffffe02p-1,   0x1.058106eb8a7aep-1,
+   -0x1.5ffffffffffe5p-1,   0x1.017323fd9002ep-1,
+   -0x1.67fffffffffb0p-1,   0x1.faeab0ae9386cp-2,
+   -0x1.6ffffffffffb2p-1,   0x1.f30ec837503d7p-2,
+   -0x1.77fffffffff7fp-1,   0x1.eb5210d627133p-2,
+   -0x1.7ffffffffffe8p-1,   0x1.e3b40ebefcd95p-2,
+   -0x1.87fffffffffc8p-1,   0x1.dc3448110dae2p-2,
+   -0x1.8fffffffffb30p-1,   0x1.d4d244cf4ef06p-2,
+   -0x1.97fffffffffefp-1,   0x1.cd8d8ed8ee395p-2,
+   -0x1.9ffffffffffa7p-1,   0x1.c665b1e1f1e5cp-2,
+   -0x1.a7fffffffffdcp-1,   0x1.bf5a3b6bf18d6p-2,
+   -0x1.affffffffff95p-1,   0x1.b86ababeef93bp-2,
+   -0x1.b7fffffffffcbp-1,   0x1.b196c0e24d256p-2,
+   -0x1.bffffffffff32p-1,   0x1.aadde095dadf7p-2,
+   -0x1.c7fffffffff6ap-1,   0x1.a43fae4b047c9p-2,
+   -0x1.cffffffffffb6p-1,   0x1.9dbbc01e182a4p-2,
+   -0x1.d7fffffffffcap-1,   0x1.9751adcfa81ecp-2,
+   -0x1.dffffffffffcdp-1,   0x1.910110be0699ep-2,
+   -0x1.e7ffffffffffbp-1,   0x1.8ac983dedbc69p-2,
+   -0x1.effffffffff88p-1,   0x1.84aaa3b8d51a9p-2,
+   -0x1.f7fffffffffbbp-1,   0x1.7ea40e5d6d92ep-2,
+   -0x1.fffffffffffdbp-1,   0x1.78b56362cef53p-2,
+   -0x1.03fffffffff00p+0,   0x1.72de43ddcb1f2p-2,
+   -0x1.07ffffffffe6fp+0,   0x1.6d1e525bed085p-2,
+   -0x1.0bfffffffffd6p+0,   0x1.677532dda1c57p-2};
+
+static const double
+/* invln2_64 = 64/ln2 - used to scale x to primary range. */
+  invln2_64 = 0x1.71547652b82fep+6,
+/* ln2_32hi2 = high 32 bits of log(1./2.)/2. */
+  ln2_32hi2 = 0x1.62e42fee00000p-7, 
+/* ln2_32lo2 = low 32 bits of log(1./2.)/2. */
+  ln2_32lo2 = 0x1.a39ef35793c76p-39,
+/* t2-t5 terms used for polynomial computation.  */
+  t2 = 0x1.5555555548f7cp-3, /* 1.6666666666526086527e-1 */
+  t3 = 0x1.5555555545d4ep-5, /* 4.1666666666226079285e-2 */
+  t4 = 0x1.11115b7aa905ep-7, /* 8.3333679843421958056e-3 */
+  t5 = 0x1.6c1728d739765p-10, /* 1.3888949086377719040e-3 */
+/* Maximum value for x to not overflow.  */
+  threshold1 = 0x1.62e42fefa39efp+9, /* 7.09782712893383973096e+02 */
+/* Maximum value for -x to not underflow to zero in FE_TONEAREST mode.  */
+  threshold2 = 0x1.74910d52d3051p+9, /* 7.45133219101941108420e+02 */
+/* Scaling factor used when result near zero.  */
+  twom54 = 0x1.0000000000000p-54; /* 5.55111512312578270212e-17 */
diff --git a/sysdeps/ieee754/dbl-64/slowexp.c b/sysdeps/ieee754/dbl-64/slowexp.c
deleted file mode 100644
index e8fa2e2..0000000
--- a/sysdeps/ieee754/dbl-64/slowexp.c
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-/**************************************************************************/
-/*  MODULE_NAME:slowexp.c                                                 */
-/*                                                                        */
-/*  FUNCTION:slowexp                                                      */
-/*                                                                        */
-/*  FILES NEEDED:mpa.h                                                    */
-/*               mpa.c mpexp.c                                            */
-/*                                                                        */
-/*Converting from double precision to Multi-precision and calculating     */
-/* e^x                                                                    */
-/**************************************************************************/
-#include <math_private.h>
-
-#include <stap-probe.h>
-
-#ifndef USE_LONG_DOUBLE_FOR_MP
-# include "mpa.h"
-void __mpexp (mp_no *x, mp_no *y, int p);
-#endif
-
-#ifndef SECTION
-# define SECTION
-#endif
-
-/*Converting from double precision to Multi-precision and calculating  e^x */
-double
-SECTION
-__slowexp (double x)
-{
-#ifndef USE_LONG_DOUBLE_FOR_MP
-  double w, z, res, eps = 3.0e-26;
-  int p;
-  mp_no mpx, mpy, mpz, mpw, mpeps, mpcor;
-
-  /* Use the multiple precision __MPEXP function to compute the exponential
-     First at 144 bits and if it is not accurate enough, at 768 bits.  */
-  p = 6;
-  __dbl_mp (x, &mpx, p);
-  __mpexp (&mpx, &mpy, p);
-  __dbl_mp (eps, &mpeps, p);
-  __mul (&mpeps, &mpy, &mpcor, p);
-  __add (&mpy, &mpcor, &mpw, p);
-  __sub (&mpy, &mpcor, &mpz, p);
-  __mp_dbl (&mpw, &w, p);
-  __mp_dbl (&mpz, &z, p);
-  if (w == z)
-    {
-      /* Track how often we get to the slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p6, 2, &x, &w);
-      return w;
-    }
-  else
-    {
-      p = 32;
-      __dbl_mp (x, &mpx, p);
-      __mpexp (&mpx, &mpy, p);
-      __mp_dbl (&mpy, &res, p);
-
-      /* Track how often we get to the uber-slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p32, 2, &x, &res);
-      return res;
-    }
-#else
-  return (double) __ieee754_expl((long double)x);
-#endif
-}
diff --git a/sysdeps/powerpc/power4/fpu/Makefile b/sysdeps/powerpc/power4/fpu/Makefile
index e17d32f..ded9976 100644
--- a/sysdeps/powerpc/power4/fpu/Makefile
+++ b/sysdeps/powerpc/power4/fpu/Makefile
@@ -3,5 +3,4 @@
 ifeq ($(subdir),math)
 CFLAGS-mpa.c += --param max-unroll-times=4 -funroll-loops -fpeel-loops
 CPPFLAGS-slowpow.c += -DUSE_LONG_DOUBLE_FOR_MP=1
-CPPFLAGS-slowexp.c += -DUSE_LONG_DOUBLE_FOR_MP=1
 endif
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index c78624b..e06c059 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -10,7 +10,7 @@ libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
 
 libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
 			e_asin-fma e_atan2-fma s_sin-fma s_tan-fma \
-			mplog-fma mpa-fma slowexp-fma slowpow-fma \
+			mplog-fma mpa-fma slowpow-fma \
 			sincos32-fma doasin-fma dosincos-fma \
 			halfulp-fma mpexp-fma \
 			mpatan2-fma mpatan-fma mpsqrt-fma mptan-fma
@@ -32,7 +32,6 @@ CFLAGS-mpsqrt-fma.c = -mfma -mavx2
 CFLAGS-mptan-fma.c = -mfma -mavx2
 CFLAGS-s_atan-fma.c = -mfma -mavx2
 CFLAGS-sincos32-fma.c = -mfma -mavx2
-CFLAGS-slowexp-fma.c = -mfma -mavx2
 CFLAGS-slowpow-fma.c = -mfma -mavx2
 CFLAGS-s_sin-fma.c = -mfma -mavx2
 CFLAGS-s_tan-fma.c = -mfma -mavx2
@@ -48,7 +47,7 @@ CFLAGS-e_powf-fma.c = -mfma -mavx2
 
 libm-sysdep_routines += e_exp-fma4 e_log-fma4 e_pow-fma4 s_atan-fma4 \
 			e_asin-fma4 e_atan2-fma4 s_sin-fma4 s_tan-fma4 \
-			mplog-fma4 mpa-fma4 slowexp-fma4 slowpow-fma4 \
+			mplog-fma4 mpa-fma4 slowpow-fma4 \
 			sincos32-fma4 doasin-fma4 dosincos-fma4 \
 			halfulp-fma4 mpexp-fma4 \
 			mpatan2-fma4 mpatan-fma4 mpsqrt-fma4 mptan-fma4
@@ -70,14 +69,13 @@ CFLAGS-mpsqrt-fma4.c = -mfma4
 CFLAGS-mptan-fma4.c = -mfma4
 CFLAGS-s_atan-fma4.c = -mfma4
 CFLAGS-sincos32-fma4.c = -mfma4
-CFLAGS-slowexp-fma4.c = -mfma4
 CFLAGS-slowpow-fma4.c = -mfma4
 CFLAGS-s_sin-fma4.c = -mfma4
 CFLAGS-s_tan-fma4.c = -mfma4
 
 libm-sysdep_routines += e_exp-avx e_log-avx s_atan-avx \
 			e_atan2-avx s_sin-avx s_tan-avx \
-			mplog-avx mpa-avx slowexp-avx \
+			mplog-avx mpa-avx \
 			mpexp-avx
 
 CFLAGS-e_atan2-avx.c = -msse2avx -DSSE2AVX
@@ -88,7 +86,6 @@ CFLAGS-mpexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-mplog-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_atan-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_sin-avx.c = -msse2avx -DSSE2AVX
-CFLAGS-slowexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_tan-avx.c = -msse2avx -DSSE2AVX
 endif
 
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
index ee5dd6d..afd9174 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_avx
 #define __exp1 __exp1_avx
-#define __slowexp __slowexp_avx
 #define SECTION __attribute__ ((section (".text.avx")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
index 6e0fdb7..765b1b9 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma
 #define __exp1 __exp1_fma
-#define __slowexp __slowexp_fma
 #define SECTION __attribute__ ((section (".text.fma")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
index ae6eb67..9ac7aca 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma4
 #define __exp1 __exp1_fma4
-#define __slowexp __slowexp_fma4
 #define SECTION __attribute__ ((section (".text.fma4")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c b/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
deleted file mode 100644
index d01c6d7..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_avx
-#define __add __add_avx
-#define __dbl_mp __dbl_mp_avx
-#define __mpexp __mpexp_avx
-#define __mul __mul_avx
-#define __sub __sub_avx
-#define SECTION __attribute__ ((section (".text.avx")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
deleted file mode 100644
index 6fffca1..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma
-#define __add __add_fma
-#define __dbl_mp __dbl_mp_fma
-#define __mpexp __mpexp_fma
-#define __mul __mul_fma
-#define __sub __sub_fma
-#define SECTION __attribute__ ((section (".text.fma")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
deleted file mode 100644
index 3bcde84..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma4
-#define __add __add_fma4
-#define __dbl_mp __dbl_mp_fma4
-#define __mpexp __mpexp_fma4
-#define __mul __mul_fma4
-#define __sub __sub_fma4
-#define SECTION __attribute__ ((section (".text.fma4")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
-- 
1.7.1


* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-12-04 21:53 Patrick McGehearty
@ 2017-12-05 23:20 ` Joseph Myers
  0 siblings, 0 replies; 44+ messages in thread
From: Joseph Myers @ 2017-12-05 23:20 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Mon, 4 Dec 2017, Patrick McGehearty wrote:

> +/* ln2_32hi2 = high 32 bits of log(1./2.)/2. */
> +  ln2_32hi2 = 0x1.62e42fee00000p-7, 
> +/* ln2_32lo2 = low 32 bits of log(1./2.)/2. */
> +  ln2_32lo2 = 0x1.a39ef35793c76p-39,

Those comments aren't accurate descriptions.  In this patch version these 
have changed to be high and low parts of log(2)/64, so the comments should 
reflect that - and the name should be changed to say 64 not 32 and the 
users updated accordingly.  Furthermore, the low part isn't just a low 32 
bits, it has more than that, so again the comment should reflect that.

OK with those fixes.
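
The point about the two constants can be checked mechanically. The following is a minimal sketch, not part of the patch: it uses the names ln2_64hi and ln2_64lo suggested by the review, and the helper functions trailing_zero_fraction_bits and reduce are hypothetical names introduced only for illustration. It shows that the high part keeps only 32 significant bits (so k * ln2_64hi should be exact for any k below 2^21 in magnitude), while the low part is a full double rather than "32 bits".

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the patch; names follow the rename requested above.  */
static const double ln2_64hi = 0x1.62e42fee00000p-7;
static const double ln2_64lo = 0x1.a39ef35793c76p-39;

/* Hypothetical helper: count trailing zero bits in the 52-bit fraction
   field of d.  Assumes IEEE 754 binary64 and that double and uint64_t
   share the same byte order.  */
int
trailing_zero_fraction_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  uint64_t frac = bits & UINT64_C (0xfffffffffffff);
  if (frac == 0)
    return 52;
  int n = 0;
  while ((frac & 1) == 0)
    {
      frac >>= 1;
      n++;
    }
  return n;
}

/* Hypothetical helper: z = x - k * log(2)/64 using the two-part constant.
   The product k * ln2_64hi is exact as long as k needs at most 21 bits,
   because ln2_64hi has only 32 significant bits.  */
double
reduce (double x, int k)
{
  assert (k > -(1 << 21) && k < (1 << 21));
  return (x - k * ln2_64hi) - k * ln2_64lo;
}
```

With these definitions, trailing_zero_fraction_bits (ln2_64hi) reports 21 zero bits, which is where the "32 significant bits" figure comes from.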

-- 
Joseph S. Myers
joseph@codesourcery.com


* [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
@ 2017-12-01  0:51 Patrick McGehearty
  2017-12-01  0:56 ` Joseph Myers
  0 siblings, 1 reply; 44+ messages in thread
From: Patrick McGehearty @ 2017-12-01  0:51 UTC (permalink / raw)
  To: libc-alpha

Version 6 of proposed patch.

Fixed error in patch revision.
Cleaned up formatting of return () and location of '+' for line breaks.
Fixed comments in eexp.tbl. Adjusted 3 values in eexp.tbl to be correctly
rounded in ulp as computed by quad precision.

Modified e_exp.c and eexp.tbl to use table of 64 intervals instead of
32 intervals for computing exp(x). That change reduced the differences
from the prior ieee754 exp(x) to 16 in 10,000 from 29 in 10,000. Also
reduced the make check differences for exp to 1 from 3. No observed
change in performance for using the larger table on either x86 or Sparc.

Version 5 of proposed patch.

Cleaned up formatting of comments and braces.
Returned to single patch for submission.

Version 4 of proposed patch.

New comments revised to use GNU standard comment formatting.
Limited comment added in eexp.tbl for TBL[]. The original src
used for porting to Linux did not have a comment about TBL[].
The new comment is limited to the current worker's level of
understanding.

The (-xx.x > threshold2) case is changed to return force_underflow.
For FE_TONEAREST, tiny*tiny will always be zero but for
FE_UPWARD, it will be the smallest representable value.

That change caused no change in the math test results for Sparc or x86.

Version 3 changes

All hex constants in version 2 replaced with C99 double hex constants,
allowing Big Endian and Little Endian versions to be merged.
Only e_exp.c and eexp.tbl changed from version 2.
Minor changes in performance results due to system noise.
No other changes from version 2.
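
As an illustration of why C99 hexadecimal floating constants remove the endian split, the sketch below compares one constant from eexp.tbl (twom54) with the same value assembled from its IEEE 754 bit pattern. The helper double_from_bits is hypothetical, and the bit pattern 0x3c90000000000000 is derived here from the binary64 layout (biased exponent 1023 - 54 = 0x3c9, zero fraction), not taken from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Endian-neutral form used from version 3 on: the compiler produces the
   right bytes for the target.  */
const double twom54 = 0x1.0000000000000p-54;

/* Hypothetical helper: build a double from a 64-bit pattern.  Copying the
   whole 64-bit word avoids the HIGH_HALF/LOW_HALF indexing that two 32-bit
   halves would need.  Assumes double and uint64_t share a byte order.  */
double
double_from_bits (uint64_t bits)
{
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}
```

Here double_from_bits (UINT64_C (0x3c90000000000000)) compares equal to twom54, but only the hexadecimal constant states the value directly in the source.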

Version 2 of proposed patch.
Revised copyright notice and formatting issues.
Removed slowexp.c and related references.
Replaced tables of double constants with hex constants, taking special
  attention to correctly handle little endian and big endian versions.
  Using hex initialization also required changing variables to be declared
  as unions.  Tables moved from e_exp.c to sysdeps/ieee754/dbl-64/eexp.tbl.
Replaced __fegetround(), __fesetround() with get_rounding_mode and
  libc_fesetround().
Removed use of "small". "inexact mode" now ignored.
Retested and rebenchmarked on sparc and x86 with the above changes.

These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Performance gains are typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc and x86:
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   395    5173     144
min     399    54      15      13
mean   5317   200    1349      23

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from a value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run. Within the
test suite, 3 input values were found to cause 1-bit (1 ulp) differences
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
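
The two results above can be compared with a small helper; this is a
sketch only, not part of the patch or the glibc test suite, and the name
ulp_distance is hypothetical. It assumes IEEE 754 binary64 doubles and
finite inputs of the same sign, in which case adjacent doubles have
adjacent bit patterns and the difference of the patterns is the distance
in ulps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Count the representable doubles separating a and b.  Only meaningful
   for finite values of the same sign; NaN and infinity are not
   handled.  */
static int64_t
ulp_distance (double a, double b)
{
  int64_t ia, ib;
  assert (sizeof ia == sizeof a);
  memcpy (&ia, &a, sizeof ia);
  memcpy (&ib, &b, sizeof ib);
  /* Both sign bits must agree for the subtraction to make sense.  */
  assert ((ia < 0) == (ib < 0));
  return ia > ib ? ia - ib : ib - ia;
}
```

Calling ulp_distance (0x1.db14cd799387ap-3, 0x1.db14cd7993879p-3) with
the new and old results returns 1: the two are adjacent doubles that
bracket the high precision value.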

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function.  The gamma function is known to be difficult to compute
accurately.
---
 manual/probes.texi                          |   14 -
 math/Makefile                               |    2 +-
 sysdeps/generic/math_private.h              |    1 -
 sysdeps/ieee754/dbl-64/e_exp.c              |  400 +++++++++++++++------------
 sysdeps/ieee754/dbl-64/e_pow.c              |    2 +-
 sysdeps/ieee754/dbl-64/eexp.tbl             |  255 +++++++++++++++++
 sysdeps/ieee754/dbl-64/slowexp.c            |   86 ------
 sysdeps/powerpc/power4/fpu/Makefile         |    1 -
 sysdeps/x86_64/fpu/multiarch/Makefile       |    9 +-
 sysdeps/x86_64/fpu/multiarch/e_exp-avx.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c   |    1 -
 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c |    9 -
 15 files changed, 477 insertions(+), 323 deletions(-)
 create mode 100644 sysdeps/ieee754/dbl-64/eexp.tbl
 delete mode 100644 sysdeps/ieee754/dbl-64/slowexp.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c

diff --git a/manual/probes.texi b/manual/probes.texi
index 8ab6756..f8ae64b 100644
--- a/manual/probes.texi
+++ b/manual/probes.texi
@@ -258,20 +258,6 @@ Unless explicitly mentioned otherwise, a precision of 1 implies 24 bits of
 precision in the mantissa of the multiple precision number.  Hence, a precision
 level of 32 implies 768 bits of precision in the mantissa.
 
-@deftp Probe slowexp_p6 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-6.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
-@deftp Probe slowexp_p32 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-32.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
 @deftp Probe slowpow_p10 (double @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
 This probe is triggered when the @code{pow} function is called with
 inputs that result in multiple precision computation with precision
diff --git a/math/Makefile b/math/Makefile
index 668c283..21f315a 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -114,7 +114,7 @@ type-ldouble-yes := ldouble
 # double support
 type-double-suffix :=
 type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
-		       mpatan mpexp mplog mpsqrt mptan sincos32 slowexp	\
+		       mpatan mpexp mplog mpsqrt mptan sincos32	\
 		       slowpow sincostab k_rem_pio2
 
 # float support
diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index f29898c..689dc54 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -262,7 +262,6 @@ extern double __sin32 (double __x, double __res, double __res1);
 extern double __cos32 (double __x, double __res, double __res1);
 extern double __mpsin (double __x, double __dx, bool __range_reduce);
 extern double __mpcos (double __x, double __dx, bool __range_reduce);
-extern double __slowexp (double __x);
 extern double __slowpow (double __x, double __y, double __z);
 extern void __docos (double __x, double __dx, double __v[]);
 
diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
index 6757a14..b002511 100644
--- a/sysdeps/ieee754/dbl-64/e_exp.c
+++ b/sysdeps/ieee754/dbl-64/e_exp.c
@@ -1,3 +1,4 @@
+/* EXP function - Compute double precision exponential */
 /*
  * IBM Accurate Mathematical Library
  * written by International Business Machines Corp.
@@ -23,7 +24,7 @@
 /*           exp1                                                          */
 /*                                                                         */
 /* FILES NEEDED:dla.h endian.h mpa.h mydefs.h uexp.h                       */
-/*              mpa.c mpexp.x slowexp.c                                    */
+/*              mpa.c mpexp.x                                              */
 /*                                                                         */
 /* An ultimate exp routine. Given an IEEE double machine number x          */
 /* it computes the correctly rounded (to nearest) value of e^x             */
@@ -32,207 +33,240 @@
 /*                                                                         */
 /***************************************************************************/
 
+/*  IBM exp(x) replaced by following exp(x) in 2017. IBM exp1(x,xx) remains.  */
+/* exp(x)
+   Hybrid algorithm of Peter Tang's Table driven method (for large
+   arguments) and an accurate table (for small arguments).
+   Written by K.C. Ng, November 1988.
+   Revised by Patrick McGehearty, Nov 2017 to use j/64 instead of j/32
+   Method (large arguments):
+	1. Argument Reduction: given the input x, find r and integer k
+	   and j such that
+	             x = (k+j/64)*(ln2) + r,  |r| <= (1/128)*ln2
+
+	2. exp(x) = 2^k * (2^(j/64) + 2^(j/64)*expm1(r))
+	   a. expm1(r) is approximated by a polynomial:
+	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
+	      Here t1 = 1/2 exactly.
+	   b. 2^(j/64) is represented to twice double precision
+	      as TBL[2j]+TBL[2j+1].
+
+   Note: If divide were fast enough, we could use another approximation
+	 in 2.a:
+	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
+	      (for the same t1 and t2 as above)
+
+   Special cases:
+	exp(INF) is INF, exp(NaN) is NaN;
+	exp(-INF)=  0;
+	for finite argument, only exp(0)=1 is exact.
+
+   Accuracy:
+	According to an error analysis, the error is always less than
+	an ulp (unit in the last place).  The largest errors observed
+	are less than 0.55 ulp for normal results and less than 0.75 ulp
+	for subnormal results.
+
+   Misc. info.
+	For IEEE double
+		if x >  7.09782712893383973096e+02 then exp(x) overflow
+		if x < -7.45133219101941108420e+02 then exp(x) underflow.  */
+
 #include <math.h>
+#include <math-svid-compat.h>
+#include <math_private.h>
+#include <errno.h>
 #include "endian.h"
 #include "uexp.h"
+#include "uexp.tbl"
 #include "mydefs.h"
 #include "MathLib.h"
-#include "uexp.tbl"
-#include <math_private.h>
 #include <fenv.h>
 #include <float.h>
 
-#ifndef SECTION
-# define SECTION
-#endif
+extern double __ieee754_exp (double);
+
+#include "eexp.tbl"
+
+static const double
+  half = 0.5,
+  one = 1.0;
 
-double __slowexp (double);
 
-/* An ultimate exp routine. Given an IEEE double machine number x it computes
-   the correctly rounded (to nearest) value of e^x.  */
 double
-SECTION
-__ieee754_exp (double x)
+__ieee754_exp (double x_arg)
 {
-  double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
-  mynumber junk1, junk2, binexp = {{0, 0}};
-  int4 i, j, m, n, ex;
+  double z, t;
   double retval;
-
+  int hx, ix, k, j, m;
+  int fe_val;
+  union
   {
-    SET_RESTORE_ROUND (FE_TONEAREST);
-
-    junk1.x = x;
-    m = junk1.i[HIGH_HALF];
-    n = m & hugeint;
-
-    if (n > smallint && n < bigint)
-      {
-	y = x * log2e.x + three51.x;
-	bexp = y - three51.x;	/*  multiply the result by 2**bexp        */
-
-	junk1.x = y;
-
-	eps = bexp * ln_two2.x;	/* x = bexp*ln(2) + t - eps               */
-	t = x - bexp * ln_two1.x;
-
-	y = t + three33.x;
-	base = y - three33.x;	/* t rounded to a multiple of 2**-18      */
-	junk2.x = y;
-	del = (t - base) - eps;	/*  x = bexp*ln(2) + base + del           */
-	eps = del + del * del * (p3.x * del + p2.x);
-
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;
-
-	i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-	j = (junk2.i[LOW_HALF] & 511) << 1;
-
-	al = coar.x[i] * fine.x[j];
-	bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	       + coar.x[i + 1] * fine.x[j + 1]);
-
-	rem = (bet + bet * eps) + al * eps;
-	res = al + rem;
-	cor = (al - res) + rem;
-	if (res == (res + cor * err_0))
-	  {
-	    retval = res * binexp.x;
-	    goto ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto ret;
-	  }			/*if error is over bound */
-      }
-
-    if (n <= smallint)
-      {
-	retval = 1.0;
-	goto ret;
-      }
-
-    if (n >= badint)
-      {
-	if (n > infint)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/* x is NaN */
-	if (n < infint)
-	  {
-	    if (x > 0)
-	      goto ret_huge;
-	    else
-	      goto ret_tiny;
-	  }
-	/* x is finite,  cause either overflow or underflow  */
-	if (junk1.i[LOW_HALF] != 0)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/*  x is NaN  */
-	retval = (x > 0) ? inf.x : zero;	/* |x| = inf;  return either inf or 0 */
-	goto ret;
-      }
-
-    y = x * log2e.x + three51.x;
-    bexp = y - three51.x;
-    junk1.x = y;
-    eps = bexp * ln_two2.x;
-    t = x - bexp * ln_two1.x;
-    y = t + three33.x;
-    base = y - three33.x;
-    junk2.x = y;
-    del = (t - base) - eps;
-    eps = del + del * del * (p3.x * del + p2.x);
-    i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-    j = (junk2.i[LOW_HALF] & 511) << 1;
-    al = coar.x[i] * fine.x[j];
-    bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	   + coar.x[i + 1] * fine.x[j + 1]);
-    rem = (bet + bet * eps) + al * eps;
-    res = al + rem;
-    cor = (al - res) + rem;
-    if (m >> 31)
-      {
-	ex = junk1.i[LOW_HALF];
-	if (res < 1.0)
-	  {
-	    res += res;
-	    cor += cor;
-	    ex -= 1;
-	  }
-	if (ex >= -1022)
-	  {
-	    binexp.i[HIGH_HALF] = (1023 + ex) << 20;
-	    if (res == (res + cor * err_0))
-	      {
-		retval = res * binexp.x;
-		goto ret;
-	      }
-	    else
-	      {
-		retval = __slowexp (x);
-		goto check_uflow_ret;
-	      }			/*if error is over bound */
-	  }
-	ex = -(1022 + ex);
-	binexp.i[HIGH_HALF] = (1023 - ex) << 20;
-	res *= binexp.x;
-	cor *= binexp.x;
-	eps = 1.0000000001 + err_0 * binexp.x;
-	t = 1.0 + res;
-	y = ((1.0 - t) + res) + cor;
-	res = t + y;
-	cor = (t - res) + y;
-	if (res == (res + eps * cor))
-	  {
-	    binexp.i[HIGH_HALF] = 0x00100000;
-	    retval = (res - 1.0) * binexp.x;
-	    goto check_uflow_ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto check_uflow_ret;
-	  }			/*   if error is over bound    */
-      check_uflow_ret:
-	if (retval < DBL_MIN)
-	  {
-	    double force_underflow = tiny * tiny;
-	    math_force_eval (force_underflow);
-	  }
-	if (retval == 0)
-	  goto ret_tiny;
-	goto ret;
-      }
-    else
-      {
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
-	if (res == (res + cor * err_0))
-	  retval = res * binexp.x * t256.x;
-	else
-	  retval = __slowexp (x);
-	if (isinf (retval))
-	  goto ret_huge;
-	else
-	  goto ret;
-      }
-  }
-ret:
-  return retval;
-
- ret_huge:
-  return hhuge * hhuge;
-
- ret_tiny:
-  return tiny * tiny;
+    int i_part[2];
+    double x;
+  } xx;
+  union
+  {
+    int y_part[2];
+    double y;
+  } yy;
+  xx.x = x_arg;
+
+  ix = xx.i_part[HIGH_HALF];
+  hx = ix & ~0x80000000;
+
+  if (hx < 0x3ff0a2b2)
+    {				/* |x| < 3/2 ln 2 */
+      if (hx < 0x3f862e42)
+	{			/* |x| < 1/64 ln 2 */
+	  if (hx < 0x3ed00000)
+	    {			/* |x| < 2^-18 */
+	      if (hx < 0x3e300000)
+		{
+		  retval = one + xx.x;
+		  return retval;
+		}
+	      retval = one + xx.x * (one + half * xx.x);
+	      return retval;
+	    }
+	  /* Use FE_TONEAREST rounding mode for computing yy.y.
+	     Avoid set/reset of rounding mode if in FE_TONEAREST mode.  */
+	  fe_val = get_rounding_mode ();
+	  if (fe_val == FE_TONEAREST)
+	    {
+	      t = xx.x * xx.x;
+	      yy.y = xx.x + (t * (half + xx.x * t2) +
+			     (t * t) * (t3 + xx.x * t4 + t * t5));
+	      retval = one + yy.y;
+	    } 
+	  else
+	    {
+	      libc_fesetround (FE_TONEAREST);
+	      t = xx.x * xx.x;
+	      yy.y = xx.x + (t * (half + xx.x * t2) +
+			     (t * t) * (t3 + xx.x * t4 + t * t5));
+	      retval = one + yy.y;
+	      libc_fesetround (fe_val);
+	    }
+	  return retval;
+	}
+
+      /* Find the multiple of 2^-6 nearest x.  */
+      k = hx >> 20;
+      j = (0x00100000 | (hx & 0x000fffff)) >> (0x40c - k);
+      j = (j - 1) & ~1;
+      if (ix < 0)
+	j += 134;
+      /* Use FE_TONEAREST rounding mode for computing yy.y.
+	 Avoid set/reset of rounding mode if in FE_TONEAREST mode.  */
+      fe_val = get_rounding_mode ();
+      if (fe_val == FE_TONEAREST)
+	{
+	  z = xx.x - TBL2[j];
+	  t = z * z;
+	  yy.y = z + (t * (half + (z * t2))
+		      + (t * t) * (t3 + z * t4 + t * t5));
+	  retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+	}
+      else
+	{
+	  libc_fesetround (FE_TONEAREST);
+	  z = xx.x - TBL2[j];
+	  t = z * z;
+	  yy.y = z + (t * (half + (z * t2))
+		      + (t * t) * (t3 + z * t4 + t * t5));
+	  retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+	  libc_fesetround (fe_val);
+	}
+      return retval;
+    }
+
+  if (hx >= 0x40862e42)
+    {				/* x is large, infinite, or nan.  */
+      if (hx >= 0x7ff00000)
+	{
+	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
+	    return zero;	/* exp(-inf) = 0.  */
+	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf.  */
+	}
+      if (xx.x > threshold1)
+	{			/* Set overflow error condition.  */
+	  retval = hhuge * hhuge;
+	  return retval;
+	} 
+      if (-xx.x > threshold2)
+	{			/* Set underflow error condition.  */
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	  retval = force_underflow;
+	  return retval;
+	}
+    }
+
+  /* Use FE_TONEAREST rounding mode for computing yy.y.
+     Avoid set/reset of rounding mode if already in FE_TONEAREST mode.  */
+  fe_val = get_rounding_mode ();
+  if (fe_val == FE_TONEAREST)
+    {
+      t = invln2_64 * xx.x;
+      if (ix < 0)
+	t -= half;
+      else
+	t += half;
+      k = (int) t;
+      j = (k & 0x3f) << 1;
+      m = k >> 6;
+      z = (xx.x - k * ln2_32hi2) - k * ln2_32lo2;
+
+      /* z is now in primary range.  */
+      t = z * z;
+      yy.y = z + (t * (half + z * t2)
+		  + (t * t) * (t3 + z * t4 + t * t5));
+      yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+    }
+  else
+    {
+      libc_fesetround (FE_TONEAREST);
+      t = invln2_64 * xx.x;
+      if (ix < 0)
+	t -= half;
+      else
+	t += half;
+      k = (int) t;
+      j = (k & 0x3f) << 1;
+      m = k >> 6;
+      z = (xx.x - k * ln2_32hi2) - k * ln2_32lo2;
+
+      /* z is now in primary range.  */
+      t = z * z;
+      yy.y = z + (t * (half + z * t2)
+		  + (t * t) * (t3 + z * t4 + t * t5));
+      yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+      libc_fesetround (fe_val);
+    }
+
+  if (m < -1021)
+    {
+      yy.y_part[HIGH_HALF] += (m + 54) << 20;
+      retval = twom54 * yy.y;
+      if (retval < DBL_MIN)
+	{
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	}
+      return retval;
+    }
+  yy.y_part[HIGH_HALF] += m << 20;
+  return yy.y;
 }
 #ifndef __ieee754_exp
 strong_alias (__ieee754_exp, __exp_finite)
 #endif
 
+#ifndef SECTION
+# define SECTION
+#endif
+
 /* Compute e^(x+xx).  The routine also receives bound of error of previous
    calculation.  If after computing exp the error exceeds the allowed bounds,
    the routine returns a non-positive number.  Otherwise it returns the
diff --git a/sysdeps/ieee754/dbl-64/e_pow.c b/sysdeps/ieee754/dbl-64/e_pow.c
index 9f6439e..2eb8dbf 100644
--- a/sysdeps/ieee754/dbl-64/e_pow.c
+++ b/sysdeps/ieee754/dbl-64/e_pow.c
@@ -25,7 +25,7 @@
 /*             log1                                                        */
 /*             checkint                                                    */
 /* FILES NEEDED: dla.h endian.h mpa.h mydefs.h                             */
-/*               halfulp.c mpexp.c mplog.c slowexp.c slowpow.c mpa.c       */
+/*               halfulp.c mpexp.c mplog.c slowpow.c mpa.c                 */
 /*                          uexp.c  upow.c				   */
 /*               root.tbl uexp.tbl upow.tbl                                */
 /* An ultimate power routine. Given two IEEE double machine numbers y,x    */
diff --git a/sysdeps/ieee754/dbl-64/eexp.tbl b/sysdeps/ieee754/dbl-64/eexp.tbl
new file mode 100644
index 0000000..d5fa3dd
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/eexp.tbl
@@ -0,0 +1,255 @@
+/* EXP function tables - for use in computing double precision exponential
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+/*
+   TBL[2*j] is 2**(j/64), rounded to nearest.
+   TBL[2*j+1] is 2**(j/64) - TBL[2*j], rounded to nearest.
+   These values are used to approximate exp(x) using the formula
+   given in the comments for e_exp.c.  */
+
+static const double TBL[128] = {
+    0x1.0000000000000p+0,  0x0.0000000000000p+0,
+    0x1.02c9a3e778061p+0, -0x1.19083535b085dp-56,
+    0x1.059b0d3158574p+0,  0x1.d73e2a475b465p-55,
+    0x1.0874518759bc8p+0,  0x1.186be4bb284ffp-57,
+    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610bp-54,
+    0x1.0e3ec32d3d1a2p+0,  0x1.03a1727c57b52p-59,
+    0x1.11301d0125b51p+0, -0x1.6c51039449b3ap-54,
+    0x1.1429aaea92de0p+0, -0x1.32fbf9af1369ep-54,
+    0x1.172b83c7d517bp+0, -0x1.19041b9d78a76p-55,
+    0x1.1a35beb6fcb75p+0,  0x1.e5b4c7b4968e4p-55,
+    0x1.1d4873168b9aap+0,  0x1.e016e00a2643cp-54,
+    0x1.2063b88628cd6p+0,  0x1.dc775814a8495p-55,
+    0x1.2387a6e756238p+0,  0x1.9b07eb6c70573p-54,
+    0x1.26b4565e27cddp+0,  0x1.2bd339940e9d9p-55,
+    0x1.29e9df51fdee1p+0,  0x1.612e8afad1255p-55,
+    0x1.2d285a6e4030bp+0,  0x1.0024754db41d5p-54,
+    0x1.306fe0a31b715p+0,  0x1.6f46ad23182e4p-55,
+    0x1.33c08b26416ffp+0,  0x1.32721843659a6p-54,
+    0x1.371a7373aa9cbp+0, -0x1.63aeabf42eae2p-54,
+    0x1.3a7db34e59ff7p+0, -0x1.5e436d661f5e3p-56,
+    0x1.3dea64c123422p+0,  0x1.ada0911f09ebcp-55,
+    0x1.4160a21f72e2ap+0, -0x1.ef3691c309278p-58,
+    0x1.44e086061892dp+0,  0x1.89b7a04ef80d0p-59,
+    0x1.486a2b5c13cd0p+0,  0x1.3c1a3b69062f0p-56,
+    0x1.4bfdad5362a27p+0,  0x1.d4397afec42e2p-56,
+    0x1.4f9b2769d2ca7p+0, -0x1.4b309d25957e3p-54,
+    0x1.5342b569d4f82p+0, -0x1.07abe1db13cadp-55,
+    0x1.56f4736b527dap+0,  0x1.9bb2c011d93adp-54,
+    0x1.5ab07dd485429p+0,  0x1.6324c054647adp-54,
+    0x1.5e76f15ad2148p+0,  0x1.ba6f93080e65ep-54,
+    0x1.6247eb03a5585p+0, -0x1.383c17e40b497p-54,
+    0x1.6623882552225p+0, -0x1.bb60987591c34p-54,
+    0x1.6a09e667f3bcdp+0, -0x1.bdd3413b26456p-54,
+    0x1.6dfb23c651a2fp+0, -0x1.bbe3a683c88abp-57,
+    0x1.71f75e8ec5f74p+0, -0x1.16e4786887a99p-55,
+    0x1.75feb564267c9p+0, -0x1.0245957316dd3p-54,
+    0x1.7a11473eb0187p+0, -0x1.41577ee04992fp-55,
+    0x1.7e2f336cf4e62p+0,  0x1.05d02ba15797ep-56,
+    0x1.82589994cce13p+0, -0x1.d4c1dd41532d8p-54,
+    0x1.868d99b4492edp+0, -0x1.fc6f89bd4f6bap-54,
+    0x1.8ace5422aa0dbp+0,  0x1.6e9f156864b27p-54,
+    0x1.8f1ae99157736p+0,  0x1.5cc13a2e3976cp-55,
+    0x1.93737b0cdc5e5p+0, -0x1.75fc781b57ebcp-57,
+    0x1.97d829fde4e50p+0, -0x1.d185b7c1b85d1p-54,
+    0x1.9c49182a3f090p+0,  0x1.c7c46b071f2bep-56,
+    0x1.a0c667b5de565p+0, -0x1.359495d1cd533p-54,
+    0x1.a5503b23e255dp+0, -0x1.d2f6edb8d41e1p-54,
+    0x1.a9e6b5579fdbfp+0,  0x1.0fac90ef7fd31p-54,
+    0x1.ae89f995ad3adp+0,  0x1.7a1cd345dcc81p-54,
+    0x1.b33a2b84f15fbp+0, -0x1.2805e3084d708p-57,
+    0x1.b7f76f2fb5e47p+0, -0x1.5584f7e54ac3bp-56,
+    0x1.bcc1e904bc1d2p+0,  0x1.23dd07a2d9e84p-55,
+    0x1.c199bdd85529cp+0,  0x1.11065895048ddp-55,
+    0x1.c67f12e57d14bp+0,  0x1.2884dff483cadp-54,
+    0x1.cb720dcef9069p+0,  0x1.503cbd1e949dbp-56,
+    0x1.d072d4a07897cp+0, -0x1.cbc3743797a9cp-54,
+    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3707p-55,
+    0x1.da9e603db3285p+0,  0x1.c2300696db532p-54,
+    0x1.dfc97337b9b5fp+0, -0x1.1a5cd4f184b5cp-54,
+    0x1.e502ee78b3ff6p+0,  0x1.39e8980a9cc8fp-55,
+    0x1.ea4afa2a490dap+0, -0x1.e9c23179c2893p-54,
+    0x1.efa1bee615a27p+0,  0x1.dc7f486a4b6b0p-54,
+    0x1.f50765b6e4540p+0,  0x1.9d3e12dd8a18bp-54,
+    0x1.fa7c1819e90d8p+0,  0x1.74853f3a5931ep-55};
+
+/* For i = 0, ..., 66,
+     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+
+   For i = 67, ..., 133,
+     TBL2[2*i] is a double precision number near -(i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.  */
+
+static const double TBL2[268] = {
+    0x1.ffffffffffc82p-7,   0x1.04080ab55de32p+0,
+    0x1.fffffffffffdbp-6,   0x1.08205601127ecp+0,
+    0x1.80000000000a0p-5,   0x1.0c49236829e91p+0,
+    0x1.fffffffffff79p-5,   0x1.1082b577d34e9p+0,
+    0x1.3fffffffffffcp-4,   0x1.14cd4fc989cd6p+0,
+    0x1.8000000000060p-4,   0x1.192937074e0d4p+0,
+    0x1.c000000000061p-4,   0x1.1d96b0eff0e80p+0,
+    0x1.fffffffffffd6p-4,   0x1.2216045b6f5cap+0,
+    0x1.1ffffffffff58p-3,   0x1.26a7793f6014cp+0,
+    0x1.3ffffffffff75p-3,   0x1.2b4b58b372c65p+0,
+    0x1.5ffffffffff00p-3,   0x1.3001ecf601ad1p+0,
+    0x1.8000000000020p-3,   0x1.34cb8170b583ap+0,
+    0x1.9ffffffffa629p-3,   0x1.39a862bd3b344p+0,
+    0x1.c00000000000fp-3,   0x1.3e98deaa11dcep+0,
+    0x1.e00000000007fp-3,   0x1.439d443f5f16dp+0,
+    0x1.0000000000072p-2,   0x1.48b5e3c3e81abp+0,
+    0x1.0fffffffffecap-2,   0x1.4de30ec211dfbp+0,
+    0x1.1ffffffffff8fp-2,   0x1.5325180cfacd2p+0,
+    0x1.300000000003bp-2,   0x1.587c53c5a7b04p+0,
+    0x1.4000000000034p-2,   0x1.5de9176046007p+0,
+    0x1.4ffffffffff89p-2,   0x1.636bb9a98322fp+0,
+    0x1.5ffffffffffe7p-2,   0x1.690492cbf942ap+0,
+    0x1.6ffffffffff78p-2,   0x1.6eb3fc55b1e45p+0,
+    0x1.7ffffffffff65p-2,   0x1.747a513dbef32p+0,
+    0x1.8ffffffffffd5p-2,   0x1.7a57ede9ea22ep+0,
+    0x1.9ffffffffff6ep-2,   0x1.804d30347b50fp+0,
+    0x1.affffffffffc3p-2,   0x1.865a7772164aep+0,
+    0x1.c000000000053p-2,   0x1.8c802477b0030p+0,
+    0x1.d00000000004dp-2,   0x1.92be99a09bf1ep+0,
+    0x1.e000000000096p-2,   0x1.99163ad4b1e08p+0,
+    0x1.efffffffffefap-2,   0x1.9f876d8e8c4fcp+0,
+    0x1.fffffffffffd0p-2,   0x1.a61298e1e0688p+0,
+    0x1.0800000000002p-1,   0x1.acb82581eee56p+0,
+    0x1.100000000001fp-1,   0x1.b3787dc80f979p+0,
+    0x1.17ffffffffff8p-1,   0x1.ba540dba56e4fp+0,
+    0x1.1fffffffffffap-1,   0x1.c14b431256441p+0,
+    0x1.27fffffffffc4p-1,   0x1.c85e8d43f7c9bp+0,
+    0x1.2fffffffffffdp-1,   0x1.cf8e5d84758a6p+0,
+    0x1.380000000001fp-1,   0x1.d6db26d16cd84p+0,
+    0x1.3ffffffffffd8p-1,   0x1.de455df80e39bp+0,
+    0x1.4800000000052p-1,   0x1.e5cd799c6a59cp+0,
+    0x1.4ffffffffffc8p-1,   0x1.ed73f240dc10cp+0,
+    0x1.5800000000013p-1,   0x1.f539424d90f71p+0,
+    0x1.5ffffffffffbcp-1,   0x1.fd1de6182f885p+0,
+    0x1.680000000002dp-1,   0x1.02912df5ce741p+1,
+    0x1.7000000000040p-1,   0x1.06a39207f0a2ap+1,
+    0x1.780000000004fp-1,   0x1.0ac660691652ap+1,
+    0x1.7ffffffffff6fp-1,   0x1.0ef9db467dcabp+1,
+    0x1.87fffffffffe5p-1,   0x1.133e45d82e943p+1,
+    0x1.9000000000035p-1,   0x1.1793e4652cc6dp+1,
+    0x1.97fffffffffb3p-1,   0x1.1bfafc47bda48p+1,
+    0x1.a000000000000p-1,   0x1.2073d3f1bd518p+1,
+    0x1.a80000000004ap-1,   0x1.24feb2f105ce2p+1,
+    0x1.affffffffffedp-1,   0x1.299be1f3e7f11p+1,
+    0x1.b7ffffffffffbp-1,   0x1.2e4baacdb6611p+1,
+    0x1.c00000000001dp-1,   0x1.330e587b62b39p+1,
+    0x1.c800000000079p-1,   0x1.37e437282d538p+1,
+    0x1.cffffffffff51p-1,   0x1.3ccd943268248p+1,
+    0x1.d7fffffffff74p-1,   0x1.41cabe304cadcp+1,
+    0x1.e000000000011p-1,   0x1.46dc04f4e5343p+1,
+    0x1.e80000000001ep-1,   0x1.4c01b9950a124p+1,
+    0x1.effffffffff9ep-1,   0x1.513c2e6c73196p+1,
+    0x1.f7fffffffffedp-1,   0x1.568bb722dd586p+1,
+    0x1.0000000000034p+0,   0x1.5bf0a8b1457b0p+1,
+    0x1.03fffffffffe2p+0,   0x1.616b5967376dfp+1,
+    0x1.07fffffffff4bp+0,   0x1.66fc20f0337a9p+1,
+    0x1.0bffffffffffdp+0,   0x1.6ca35859290f5p+1,
+   -0x1.fffffffffffe4p-7,   0x1.f80feabfeefa5p-1,
+   -0x1.ffffffffffb0bp-6,   0x1.f03f56a88b5fep-1,
+   -0x1.7ffffffffffa7p-5,   0x1.e88dc6afecfc5p-1,
+   -0x1.ffffffffffea8p-5,   0x1.e0fabfbc702b8p-1,
+   -0x1.3ffffffffffb3p-4,   0x1.d985c89d041acp-1,
+   -0x1.7ffffffffffe3p-4,   0x1.d22e6a0197c06p-1,
+   -0x1.bffffffffff9ap-4,   0x1.caf42e73a4c89p-1,
+   -0x1.fffffffffff98p-4,   0x1.c3d6a24ed822dp-1,
+   -0x1.1ffffffffffe9p-3,   0x1.bcd553b9d7b67p-1,
+   -0x1.3ffffffffffe0p-3,   0x1.b5efd29f24c2dp-1,
+   -0x1.5fffffffff553p-3,   0x1.af25b0a61a9f4p-1,
+   -0x1.7ffffffffff8bp-3,   0x1.a876812c08794p-1,
+   -0x1.9fffffffffe51p-3,   0x1.a1e1d93d68828p-1,
+   -0x1.bffffffffff6ep-3,   0x1.9b674f8f2f3f5p-1,
+   -0x1.dffffffffff7fp-3,   0x1.95067c7837a0cp-1,
+   -0x1.fffffffffff7ap-3,   0x1.8ebef9eac8225p-1,
+   -0x1.0fffffffffffep-2,   0x1.8890636e31f55p-1,
+   -0x1.1ffffffffff41p-2,   0x1.827a56188975ep-1,
+   -0x1.2ffffffffffbap-2,   0x1.7c7c708877656p-1,
+   -0x1.3fffffffffff8p-2,   0x1.769652df22f81p-1,
+   -0x1.4ffffffffff90p-2,   0x1.70c79eba33c2fp-1,
+   -0x1.5ffffffffffdbp-2,   0x1.6b0ff72deb8aap-1,
+   -0x1.6ffffffffff9ap-2,   0x1.656f00bf5798ep-1,
+   -0x1.7ffffffffff9fp-2,   0x1.5fe4615e98eb0p-1,
+   -0x1.8ffffffffffeep-2,   0x1.5a6fc061433cep-1,
+   -0x1.9fffffffffc4ap-2,   0x1.5510c67cd26cdp-1,
+   -0x1.affffffffff30p-2,   0x1.4fc71dc13566bp-1,
+   -0x1.bfffffffffff0p-2,   0x1.4a9271936fd0ep-1,
+   -0x1.cfffffffffff3p-2,   0x1.45726ea84fb8cp-1,
+   -0x1.dfffffffffff3p-2,   0x1.4066c2ff3912bp-1,
+   -0x1.effffffffff80p-2,   0x1.3b6f1ddd05ab9p-1,
+   -0x1.fffffffffffdfp-2,   0x1.368b2fc6f9614p-1,
+   -0x1.0800000000000p-1,   0x1.31baaa7dca843p-1,
+   -0x1.0ffffffffffa4p-1,   0x1.2cfd40f8bdce4p-1,
+   -0x1.17fffffffff0ap-1,   0x1.2852a760d5ce7p-1,
+   -0x1.2000000000000p-1,   0x1.23ba930c1568bp-1,
+   -0x1.27fffffffffbbp-1,   0x1.1f34ba78d568dp-1,
+   -0x1.2fffffffffe32p-1,   0x1.1ac0d5492c1dbp-1,
+   -0x1.37ffffffff042p-1,   0x1.165e9c3e67ef2p-1,
+   -0x1.3ffffffffff77p-1,   0x1.120dc93499431p-1,
+   -0x1.47fffffffff6bp-1,   0x1.0dce171e34ecep-1,
+   -0x1.4fffffffffff1p-1,   0x1.099f41ffbe588p-1,
+   -0x1.57ffffffffe02p-1,   0x1.058106eb8a7aep-1,
+   -0x1.5ffffffffffe5p-1,   0x1.017323fd9002ep-1,
+   -0x1.67fffffffffb0p-1,   0x1.faeab0ae9386cp-2,
+   -0x1.6ffffffffffb2p-1,   0x1.f30ec837503d7p-2,
+   -0x1.77fffffffff7fp-1,   0x1.eb5210d627133p-2,
+   -0x1.7ffffffffffe8p-1,   0x1.e3b40ebefcd95p-2,
+   -0x1.87fffffffffc8p-1,   0x1.dc3448110dae2p-2,
+   -0x1.8fffffffffb30p-1,   0x1.d4d244cf4ef06p-2,
+   -0x1.97fffffffffefp-1,   0x1.cd8d8ed8ee395p-2,
+   -0x1.9ffffffffffa7p-1,   0x1.c665b1e1f1e5cp-2,
+   -0x1.a7fffffffffdcp-1,   0x1.bf5a3b6bf18d6p-2,
+   -0x1.affffffffff95p-1,   0x1.b86ababeef93bp-2,
+   -0x1.b7fffffffffcbp-1,   0x1.b196c0e24d256p-2,
+   -0x1.bffffffffff32p-1,   0x1.aadde095dadf7p-2,
+   -0x1.c7fffffffff6ap-1,   0x1.a43fae4b047c9p-2,
+   -0x1.cffffffffffb6p-1,   0x1.9dbbc01e182a4p-2,
+   -0x1.d7fffffffffcap-1,   0x1.9751adcfa81ecp-2,
+   -0x1.dffffffffffcdp-1,   0x1.910110be0699ep-2,
+   -0x1.e7ffffffffffbp-1,   0x1.8ac983dedbc69p-2,
+   -0x1.effffffffff88p-1,   0x1.84aaa3b8d51a9p-2,
+   -0x1.f7fffffffffbbp-1,   0x1.7ea40e5d6d92ep-2,
+   -0x1.fffffffffffdbp-1,   0x1.78b56362cef53p-2,
+   -0x1.03fffffffff00p+0,   0x1.72de43ddcb1f2p-2,
+   -0x1.07ffffffffe6fp+0,   0x1.6d1e525bed085p-2,
+   -0x1.0bfffffffffd6p+0,   0x1.677532dda1c57p-2};
+
+static const double
+/* invln2_64 = 64/ln2 - used to scale x to primary range. */
+  invln2_64 = 0x1.71547652b82fep+6,
+/* ln2_32hi2 = high 32 bits of log(1./2.)/2. */
+  ln2_32hi2 = 0x1.62e42fee00000p-7, 
+/* ln2_32lo2 = low 32 bits of log(1./2.)/2. */
+  ln2_32lo2 = 0x1.a39ef35793c76p-39,
+/* t2-t5 terms used for polynomial computation.  */
+  t2 = 0x1.5555555548f7cp-3, /* 1.6666666666526086527e-1 */
+  t3 = 0x1.5555555545d4ep-5, /* 4.1666666666226079285e-2 */
+  t4 = 0x1.11115b7aa905ep-7, /* 8.3333679843421958056e-3 */
+  t5 = 0x1.6c1728d739765p-10, /* 1.3888949086377719040e-3 */
+/* Maximum value for x to not overflow.  */
+  threshold1 = 0x1.62e42fefa39efp+9, /* 7.09782712893383973096e+02 */
+/* Maximum value for -x to not underflow to zero in FE_TONEAREST mode.  */
+  threshold2 = 0x1.74910d52d3051p+9, /* 7.45133219101941108420e+02 */
+/* Scaling factor used when result near zero.  */
+  twom54 = 0x1.0000000000000p-54; /* 5.55111512312578270212e-17 */
diff --git a/sysdeps/ieee754/dbl-64/slowexp.c b/sysdeps/ieee754/dbl-64/slowexp.c
deleted file mode 100644
index e8fa2e2..0000000
--- a/sysdeps/ieee754/dbl-64/slowexp.c
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-/**************************************************************************/
-/*  MODULE_NAME:slowexp.c                                                 */
-/*                                                                        */
-/*  FUNCTION:slowexp                                                      */
-/*                                                                        */
-/*  FILES NEEDED:mpa.h                                                    */
-/*               mpa.c mpexp.c                                            */
-/*                                                                        */
-/*Converting from double precision to Multi-precision and calculating     */
-/* e^x                                                                    */
-/**************************************************************************/
-#include <math_private.h>
-
-#include <stap-probe.h>
-
-#ifndef USE_LONG_DOUBLE_FOR_MP
-# include "mpa.h"
-void __mpexp (mp_no *x, mp_no *y, int p);
-#endif
-
-#ifndef SECTION
-# define SECTION
-#endif
-
-/*Converting from double precision to Multi-precision and calculating  e^x */
-double
-SECTION
-__slowexp (double x)
-{
-#ifndef USE_LONG_DOUBLE_FOR_MP
-  double w, z, res, eps = 3.0e-26;
-  int p;
-  mp_no mpx, mpy, mpz, mpw, mpeps, mpcor;
-
-  /* Use the multiple precision __MPEXP function to compute the exponential
-     First at 144 bits and if it is not accurate enough, at 768 bits.  */
-  p = 6;
-  __dbl_mp (x, &mpx, p);
-  __mpexp (&mpx, &mpy, p);
-  __dbl_mp (eps, &mpeps, p);
-  __mul (&mpeps, &mpy, &mpcor, p);
-  __add (&mpy, &mpcor, &mpw, p);
-  __sub (&mpy, &mpcor, &mpz, p);
-  __mp_dbl (&mpw, &w, p);
-  __mp_dbl (&mpz, &z, p);
-  if (w == z)
-    {
-      /* Track how often we get to the slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p6, 2, &x, &w);
-      return w;
-    }
-  else
-    {
-      p = 32;
-      __dbl_mp (x, &mpx, p);
-      __mpexp (&mpx, &mpy, p);
-      __mp_dbl (&mpy, &res, p);
-
-      /* Track how often we get to the uber-slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p32, 2, &x, &res);
-      return res;
-    }
-#else
-  return (double) __ieee754_expl((long double)x);
-#endif
-}
diff --git a/sysdeps/powerpc/power4/fpu/Makefile b/sysdeps/powerpc/power4/fpu/Makefile
index e17d32f..ded9976 100644
--- a/sysdeps/powerpc/power4/fpu/Makefile
+++ b/sysdeps/powerpc/power4/fpu/Makefile
@@ -3,5 +3,4 @@
 ifeq ($(subdir),math)
 CFLAGS-mpa.c += --param max-unroll-times=4 -funroll-loops -fpeel-loops
 CPPFLAGS-slowpow.c += -DUSE_LONG_DOUBLE_FOR_MP=1
-CPPFLAGS-slowexp.c += -DUSE_LONG_DOUBLE_FOR_MP=1
 endif
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index c78624b..e06c059 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -10,7 +10,7 @@ libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
 
 libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
 			e_asin-fma e_atan2-fma s_sin-fma s_tan-fma \
-			mplog-fma mpa-fma slowexp-fma slowpow-fma \
+			mplog-fma mpa-fma slowpow-fma \
 			sincos32-fma doasin-fma dosincos-fma \
 			halfulp-fma mpexp-fma \
 			mpatan2-fma mpatan-fma mpsqrt-fma mptan-fma
@@ -32,7 +32,6 @@ CFLAGS-mpsqrt-fma.c = -mfma -mavx2
 CFLAGS-mptan-fma.c = -mfma -mavx2
 CFLAGS-s_atan-fma.c = -mfma -mavx2
 CFLAGS-sincos32-fma.c = -mfma -mavx2
-CFLAGS-slowexp-fma.c = -mfma -mavx2
 CFLAGS-slowpow-fma.c = -mfma -mavx2
 CFLAGS-s_sin-fma.c = -mfma -mavx2
 CFLAGS-s_tan-fma.c = -mfma -mavx2
@@ -48,7 +47,7 @@ CFLAGS-e_powf-fma.c = -mfma -mavx2
 
 libm-sysdep_routines += e_exp-fma4 e_log-fma4 e_pow-fma4 s_atan-fma4 \
 			e_asin-fma4 e_atan2-fma4 s_sin-fma4 s_tan-fma4 \
-			mplog-fma4 mpa-fma4 slowexp-fma4 slowpow-fma4 \
+			mplog-fma4 mpa-fma4 slowpow-fma4 \
 			sincos32-fma4 doasin-fma4 dosincos-fma4 \
 			halfulp-fma4 mpexp-fma4 \
 			mpatan2-fma4 mpatan-fma4 mpsqrt-fma4 mptan-fma4
@@ -70,14 +69,13 @@ CFLAGS-mpsqrt-fma4.c = -mfma4
 CFLAGS-mptan-fma4.c = -mfma4
 CFLAGS-s_atan-fma4.c = -mfma4
 CFLAGS-sincos32-fma4.c = -mfma4
-CFLAGS-slowexp-fma4.c = -mfma4
 CFLAGS-slowpow-fma4.c = -mfma4
 CFLAGS-s_sin-fma4.c = -mfma4
 CFLAGS-s_tan-fma4.c = -mfma4
 
 libm-sysdep_routines += e_exp-avx e_log-avx s_atan-avx \
 			e_atan2-avx s_sin-avx s_tan-avx \
-			mplog-avx mpa-avx slowexp-avx \
+			mplog-avx mpa-avx \
 			mpexp-avx
 
 CFLAGS-e_atan2-avx.c = -msse2avx -DSSE2AVX
@@ -88,7 +86,6 @@ CFLAGS-mpexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-mplog-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_atan-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_sin-avx.c = -msse2avx -DSSE2AVX
-CFLAGS-slowexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_tan-avx.c = -msse2avx -DSSE2AVX
 endif
 
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
index ee5dd6d..afd9174 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_avx
 #define __exp1 __exp1_avx
-#define __slowexp __slowexp_avx
 #define SECTION __attribute__ ((section (".text.avx")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
index 6e0fdb7..765b1b9 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma
 #define __exp1 __exp1_fma
-#define __slowexp __slowexp_fma
 #define SECTION __attribute__ ((section (".text.fma")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
index ae6eb67..9ac7aca 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma4
 #define __exp1 __exp1_fma4
-#define __slowexp __slowexp_fma4
 #define SECTION __attribute__ ((section (".text.fma4")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c b/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
deleted file mode 100644
index d01c6d7..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_avx
-#define __add __add_avx
-#define __dbl_mp __dbl_mp_avx
-#define __mpexp __mpexp_avx
-#define __mul __mul_avx
-#define __sub __sub_avx
-#define SECTION __attribute__ ((section (".text.avx")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
deleted file mode 100644
index 6fffca1..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma
-#define __add __add_fma
-#define __dbl_mp __dbl_mp_fma
-#define __mpexp __mpexp_fma
-#define __mul __mul_fma
-#define __sub __sub_fma
-#define SECTION __attribute__ ((section (".text.fma")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
deleted file mode 100644
index 3bcde84..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma4
-#define __add __add_fma4
-#define __dbl_mp __dbl_mp_fma4
-#define __mpexp __mpexp_fma4
-#define __mul __mul_fma4
-#define __sub __sub_fma4
-#define SECTION __attribute__ ((section (".text.fma4")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
-- 
1.7.1

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-12-01  0:51 Patrick McGehearty
@ 2017-12-01  0:56 ` Joseph Myers
  0 siblings, 0 replies; 44+ messages in thread
From: Joseph Myers @ 2017-12-01  0:56 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Thu, 30 Nov 2017, Patrick McGehearty wrote:

> +	      yy.y = xx.x + (t * (half + xx.x * t2) +
> +			     (t * t) * (t3 + xx.x * t4 + t * t5));

You still have a line break after an operator here.

> +	      yy.y = xx.x + (t * (half + xx.x * t2) +
> +			     (t * t) * (t3 + xx.x * t4 + t * t5));

Likewise.

-- 
Joseph S. Myers
joseph@codesourcery.com

* [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
@ 2017-11-07  4:25 Patrick McGehearty
  2017-11-16 17:52 ` Patrick McGehearty
  2017-11-23 21:19 ` Joseph Myers
  0 siblings, 2 replies; 44+ messages in thread
From: Patrick McGehearty @ 2017-11-07  4:25 UTC (permalink / raw)
  To: libc-alpha

Version 5 of proposed patch.

Cleaned up formatting of comments and braces.
Returned to single patch for submission.

Version 4 of proposed patch.

New comments revised to use GNU standard comment formatting.
Limited comment added in eexp.tbl for TBL[]. The original src
used for porting to Linux did not have a comment about TBL[].
The new comment is limited to the current worker's level of
understanding.

The (-xx.x > threshold2) case is changed to return force_underflow.
For FE_TONEAREST, tiny*tiny will always be zero but for
FE_UPWARD, it will be the smallest representable value.

That change caused no change in the math test results for Sparc or x86.

Version 3 changes

All hex constants in version 2 replaced with C99 double hex constants,
allowing Big Endian and Little Endian versions to be merged.
Only e_exp.c and eexp.tbl changed from version 2.
Minor changes in performance results due to system noise.
No other changes from version 2.
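
As a rough illustration of why hex floating-point constants remove the
endianness concern: a C99 hex literal names the value itself, whereas a
two-word initializer depends on which slot holds the high half.  The
sketch below is not part of the patch.  double_from_words is a
hypothetical helper, and the code assumes IEEE 754 binary64 doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: build a double from its
   high and low 32-bit words.  Assumes double is IEEE 754 binary64 and
   that integers and doubles share the same byte order.  */
static double
double_from_words (uint32_t hi, uint32_t lo)
{
  uint64_t bits = ((uint64_t) hi << 32) | lo;
  double d;

  assert (sizeof d == sizeof bits);
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* The same value written as a C99 hex constant needs no word ordering.  */
static const double twom54_hex = 0x1.0000000000000p-54;
```

Calling double_from_words (0x3c900000, 0) should give the same value as
twom54_hex, since 2^-54 has biased exponent 1023 - 54 = 0x3c9 and a zero
fraction.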

Version 2 of proposed patch.
Revised copyright notice and formatting issues.
Removed slowexp.c and related references.
Replaced tables of double constants with hex constants, taking special
  attention to correctly handle little endian and big endian versions.
  Using hex initialization also required changing variables to be declared
  as unions.  Tables moved from e_exp.c to sysdeps/ieee754/dbl-64/eexp.tbl.
Replaced __fegetround(), __fesetround() with get_rounding_mode and
  libc_fesetround().
Removed use of "small". "inexact mode" now ignored.
Retested and rebenchmarked on sparc and x86 with the above changes.

These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Performance gains are typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc and x86:
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   395    5173     144
min     399    54      15      13
mean   5317   200    1349      23

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from a value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run. Within the
test suite, 3 input values were found to cause 1 ulp (last-bit)
differences when "FE_TONEAREST" rounding mode is set. No differences in
exp() were seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
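
One way to confirm that the two results above are adjacent doubles is to
compare their bit patterns.  The following is only a sketch, not code
from the patch.  ulp_distance is a hypothetical helper, and it assumes
IEEE 754 binary64 and finite, nonzero inputs of the same sign.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of representable doubles between a and b.
   Valid only for finite, nonzero values of the same sign, assuming
   IEEE 754 binary64, where bit patterns order like the magnitudes.  */
static int64_t
ulp_distance (double a, double b)
{
  int64_t ia, ib;

  assert ((a < 0) == (b < 0));
  memcpy (&ia, &a, sizeof ia);
  memcpy (&ib, &b, sizeof ib);
  return ia > ib ? ia - ib : ib - ia;
}
```

For the example above, ulp_distance (0x1.db14cd799387ap-3,
0x1.db14cd7993879p-3) is 1: the old and new results differ only in the
last bit of the significand.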

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function. The gamma function is known to be difficult to compute
accurately.
---
 manual/probes.texi                          |   14 -
 math/Makefile                               |    8 +-
 sysdeps/generic/math_private.h              |    1 -
 sysdeps/ieee754/dbl-64/e_exp.c              |  399 +++++++++++++++------------
 sysdeps/ieee754/dbl-64/e_pow.c              |    2 +-
 sysdeps/ieee754/dbl-64/eexp.tbl             |  219 +++++++++++++++
 sysdeps/ieee754/dbl-64/slowexp.c            |   86 ------
 sysdeps/powerpc/power4/fpu/Makefile         |    1 -
 sysdeps/x86_64/fpu/multiarch/Makefile       |    9 +-
 sysdeps/x86_64/fpu/multiarch/e_exp-avx.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c   |    1 -
 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c |    9 -
 15 files changed, 444 insertions(+), 325 deletions(-)
 create mode 100644 sysdeps/ieee754/dbl-64/eexp.tbl
 delete mode 100644 sysdeps/ieee754/dbl-64/slowexp.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c

diff --git a/manual/probes.texi b/manual/probes.texi
index 8ab6756..f8ae64b 100644
--- a/manual/probes.texi
+++ b/manual/probes.texi
@@ -258,20 +258,6 @@ Unless explicitly mentioned otherwise, a precision of 1 implies 24 bits of
 precision in the mantissa of the multiple precision number.  Hence, a precision
 level of 32 implies 768 bits of precision in the mantissa.
 
-@deftp Probe slowexp_p6 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-6.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
-@deftp Probe slowexp_p32 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-32.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
 @deftp Probe slowpow_p10 (double @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
 This probe is triggered when the @code{pow} function is called with
 inputs that result in multiple precision computation with precision
diff --git a/math/Makefile b/math/Makefile
index b2bd3d3..f70aebf 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -114,7 +114,7 @@ type-ldouble-yes := ldouble
 # double support
 type-double-suffix :=
 type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
-		       mpatan mpexp mplog mpsqrt mptan sincos32 slowexp	\
+		       mpatan mpexp mplog mpsqrt mptan sincos32 \
 		       slowpow sincostab k_rem_pio2
 
 # float support
@@ -561,8 +561,10 @@ math-CPPFLAGS += -D__NO_MATH_INLINES -D__LIBC_INTERNAL_MATH_INLINES
 ifneq ($(long-double-fcts),yes)
 # The `double' and `long double' types are the same on this machine.
 # We won't compile the `long double' code at all.  Tell the `double' code
-# to define aliases for the `FUNCl' names.
-math-CPPFLAGS += -DNO_LONG_DOUBLE
+# to define aliases for the `FUNCl' names.  To avoid type conflicts in
+# defining those aliases, tell <math.h> to declare the `FUNCl' names with
+# `double' instead of `long double'.
+math-CPPFLAGS += -DNO_LONG_DOUBLE -D_Mlong_double_=double
 endif
 
 # These files quiet sNaNs in a way that is optimized away without
diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index 80c7c92..30fc3c9 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -262,7 +262,6 @@ extern double __sin32 (double __x, double __res, double __res1);
 extern double __cos32 (double __x, double __res, double __res1);
 extern double __mpsin (double __x, double __dx, bool __range_reduce);
 extern double __mpcos (double __x, double __dx, bool __range_reduce);
-extern double __slowexp (double __x);
 extern double __slowpow (double __x, double __y, double __z);
 extern void __docos (double __x, double __dx, double __v[]);
 
diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
index 6757a14..87e86a6 100644
--- a/sysdeps/ieee754/dbl-64/e_exp.c
+++ b/sysdeps/ieee754/dbl-64/e_exp.c
@@ -1,3 +1,4 @@
+/* EXP function - Compute double precision exponential */
 /*
  * IBM Accurate Mathematical Library
  * written by International Business Machines Corp.
@@ -23,7 +24,7 @@
 /*           exp1                                                          */
 /*                                                                         */
 /* FILES NEEDED:dla.h endian.h mpa.h mydefs.h uexp.h                       */
-/*              mpa.c mpexp.x slowexp.c                                    */
+/*              mpa.c mpexp.x                                              */
 /*                                                                         */
 /* An ultimate exp routine. Given an IEEE double machine number x          */
 /* it computes the correctly rounded (to nearest) value of e^x             */
@@ -32,207 +33,239 @@
 /*                                                                         */
 /***************************************************************************/
 
+/*  IBM exp(x) replaced by following exp(x) in 2017. IBM exp1(x,xx) remains.  */
+/* exp(x)
+   Hybrid algorithm of Peter Tang's Table driven method (for large
+   arguments) and an accurate table (for small arguments).
+   Written by K.C. Ng, November 1988.
+   Method (large arguments):
+	1. Argument Reduction: given the input x, find r and integer k
+	   and j such that
+	             x = (k+j/32)*(ln2) + r,  |r| <= (1/64)*ln2
+
+	2. exp(x) = 2^k * (2^(j/32) + 2^(j/32)*expm1(r))
+	   a. expm1(r) is approximated by a polynomial:
+	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
+	      Here t1 = 1/2 exactly.
+	   b. 2^(j/32) is represented to twice double precision
+	      as TBL[2j]+TBL[2j+1].
+
+   Note: If divide were fast enough, we could use another approximation
+	 in 2.a:
+	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
+	      (for the same t1 and t2 as above)
+
+   Special cases:
+	exp(INF) is INF, exp(NaN) is NaN;
+	exp(-INF)=  0;
+	for finite argument, only exp(0)=1 is exact.
+
+   Accuracy:
+	According to an error analysis, the error is always less than
+	an ulp (unit in the last place).  The largest errors observed
+	are less than 0.55 ulp for normal results and less than 0.75 ulp
+	for subnormal results.
+
+   Misc. info.
+	For IEEE double
+		if x >  7.09782712893383973096e+02 then exp(x) overflow
+		if x < -7.45133219101941108420e+02 then exp(x) underflow.  */
+
 #include <math.h>
+#include <math-svid-compat.h>
+#include <math_private.h>
+#include <errno.h>
 #include "endian.h"
 #include "uexp.h"
+#include "uexp.tbl"
 #include "mydefs.h"
 #include "MathLib.h"
-#include "uexp.tbl"
-#include <math_private.h>
 #include <fenv.h>
 #include <float.h>
 
-#ifndef SECTION
-# define SECTION
-#endif
+extern double __ieee754_exp (double);
+
+#include "eexp.tbl"
+
+static const double
+  half = 0.5,
+  one = 1.0;
 
-double __slowexp (double);
 
-/* An ultimate exp routine. Given an IEEE double machine number x it computes
-   the correctly rounded (to nearest) value of e^x.  */
 double
-SECTION
-__ieee754_exp (double x)
+__ieee754_exp (double x_arg)
 {
-  double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
-  mynumber junk1, junk2, binexp = {{0, 0}};
-  int4 i, j, m, n, ex;
+  double z, t;
   double retval;
-
+  int hx, ix, k, j, m;
+  int fe_val;
+  union
   {
-    SET_RESTORE_ROUND (FE_TONEAREST);
-
-    junk1.x = x;
-    m = junk1.i[HIGH_HALF];
-    n = m & hugeint;
-
-    if (n > smallint && n < bigint)
-      {
-	y = x * log2e.x + three51.x;
-	bexp = y - three51.x;	/*  multiply the result by 2**bexp        */
-
-	junk1.x = y;
-
-	eps = bexp * ln_two2.x;	/* x = bexp*ln(2) + t - eps               */
-	t = x - bexp * ln_two1.x;
-
-	y = t + three33.x;
-	base = y - three33.x;	/* t rounded to a multiple of 2**-18      */
-	junk2.x = y;
-	del = (t - base) - eps;	/*  x = bexp*ln(2) + base + del           */
-	eps = del + del * del * (p3.x * del + p2.x);
-
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;
-
-	i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-	j = (junk2.i[LOW_HALF] & 511) << 1;
-
-	al = coar.x[i] * fine.x[j];
-	bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	       + coar.x[i + 1] * fine.x[j + 1]);
-
-	rem = (bet + bet * eps) + al * eps;
-	res = al + rem;
-	cor = (al - res) + rem;
-	if (res == (res + cor * err_0))
-	  {
-	    retval = res * binexp.x;
-	    goto ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto ret;
-	  }			/*if error is over bound */
-      }
-
-    if (n <= smallint)
-      {
-	retval = 1.0;
-	goto ret;
-      }
-
-    if (n >= badint)
-      {
-	if (n > infint)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/* x is NaN */
-	if (n < infint)
-	  {
-	    if (x > 0)
-	      goto ret_huge;
-	    else
-	      goto ret_tiny;
-	  }
-	/* x is finite,  cause either overflow or underflow  */
-	if (junk1.i[LOW_HALF] != 0)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/*  x is NaN  */
-	retval = (x > 0) ? inf.x : zero;	/* |x| = inf;  return either inf or 0 */
-	goto ret;
-      }
-
-    y = x * log2e.x + three51.x;
-    bexp = y - three51.x;
-    junk1.x = y;
-    eps = bexp * ln_two2.x;
-    t = x - bexp * ln_two1.x;
-    y = t + three33.x;
-    base = y - three33.x;
-    junk2.x = y;
-    del = (t - base) - eps;
-    eps = del + del * del * (p3.x * del + p2.x);
-    i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-    j = (junk2.i[LOW_HALF] & 511) << 1;
-    al = coar.x[i] * fine.x[j];
-    bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	   + coar.x[i + 1] * fine.x[j + 1]);
-    rem = (bet + bet * eps) + al * eps;
-    res = al + rem;
-    cor = (al - res) + rem;
-    if (m >> 31)
-      {
-	ex = junk1.i[LOW_HALF];
-	if (res < 1.0)
-	  {
-	    res += res;
-	    cor += cor;
-	    ex -= 1;
-	  }
-	if (ex >= -1022)
-	  {
-	    binexp.i[HIGH_HALF] = (1023 + ex) << 20;
-	    if (res == (res + cor * err_0))
-	      {
-		retval = res * binexp.x;
-		goto ret;
-	      }
-	    else
-	      {
-		retval = __slowexp (x);
-		goto check_uflow_ret;
-	      }			/*if error is over bound */
-	  }
-	ex = -(1022 + ex);
-	binexp.i[HIGH_HALF] = (1023 - ex) << 20;
-	res *= binexp.x;
-	cor *= binexp.x;
-	eps = 1.0000000001 + err_0 * binexp.x;
-	t = 1.0 + res;
-	y = ((1.0 - t) + res) + cor;
-	res = t + y;
-	cor = (t - res) + y;
-	if (res == (res + eps * cor))
-	  {
-	    binexp.i[HIGH_HALF] = 0x00100000;
-	    retval = (res - 1.0) * binexp.x;
-	    goto check_uflow_ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto check_uflow_ret;
-	  }			/*   if error is over bound    */
-      check_uflow_ret:
-	if (retval < DBL_MIN)
-	  {
-	    double force_underflow = tiny * tiny;
-	    math_force_eval (force_underflow);
-	  }
-	if (retval == 0)
-	  goto ret_tiny;
-	goto ret;
-      }
-    else
-      {
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
-	if (res == (res + cor * err_0))
-	  retval = res * binexp.x * t256.x;
-	else
-	  retval = __slowexp (x);
-	if (isinf (retval))
-	  goto ret_huge;
-	else
-	  goto ret;
-      }
-  }
-ret:
-  return retval;
-
- ret_huge:
-  return hhuge * hhuge;
-
- ret_tiny:
-  return tiny * tiny;
+    int i_part[2];
+    double x;
+  } xx;
+  union
+  {
+    int y_part[2];
+    double y;
+  } yy;
+  xx.x = x_arg;
+
+  ix = xx.i_part[HIGH_HALF];
+  hx = ix & ~0x80000000;
+
+  if (hx < 0x3ff0a2b2)
+    {				/* |x| < 3/2 ln 2 */
+      if (hx < 0x3f862e42)
+	{			/* |x| < 1/64 ln 2 */
+	  if (hx < 0x3ed00000)
+	    {			/* |x| < 2^-18 */
+	      if (hx < 0x3e300000)
+		{
+		  retval = one + xx.x;
+		  return (retval);
+		}
+	      retval = one + xx.x * (one + half * xx.x);
+	      return (retval);
+	    }
+	  /* Use FE_TONEAREST rounding mode for computing yy.y.
+	     Avoid set/reset of rounding mode if in FE_TONEAREST mode.  */
+	  fe_val = get_rounding_mode ();
+	  if (fe_val == FE_TONEAREST)
+	    {
+	      t = xx.x * xx.x;
+	      yy.y = xx.x + (t * (half + xx.x * t2) +
+			     (t * t) * (t3 + xx.x * t4 + t * t5));
+	      retval = one + yy.y;
+	    } 
+	  else
+	    {
+	      libc_fesetround (FE_TONEAREST);
+	      t = xx.x * xx.x;
+	      yy.y = xx.x + (t * (half + xx.x * t2) +
+			     (t * t) * (t3 + xx.x * t4 + t * t5));
+	      retval = one + yy.y;
+	      libc_fesetround (fe_val);
+	    }
+	  return (retval);
+	}
+
+      /* Find the multiple of 2^-6 nearest x.  */
+      k = hx >> 20;
+      j = (0x00100000 | (hx & 0x000fffff)) >> (0x40c - k);
+      j = (j - 1) & ~1;
+      if (ix < 0)
+	j += 134;
+      /* Use FE_TONEAREST rounding mode for computing yy.y.
+	 Avoid set/reset of rounding mode if in FE_TONEAREST mode.  */
+      fe_val = get_rounding_mode ();
+      if (fe_val == FE_TONEAREST)
+	{
+	  z = xx.x - TBL2[j];
+	  t = z * z;
+	  yy.y = z + (t * (half + (z * t2)) +
+		      (t * t) * (t3 + z * t4 + t * t5));
+	  retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+	}
+      else
+	{
+	  libc_fesetround (FE_TONEAREST);
+	  z = xx.x - TBL2[j];
+	  t = z * z;
+	  yy.y = z + (t * (half + (z * t2)) +
+		      (t * t) * (t3 + z * t4 + t * t5));
+	  retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+	  libc_fesetround (fe_val);
+	}
+      return (retval);
+    }
+
+  if (hx >= 0x40862e42)
+    {				/* x is large, infinite, or nan.  */
+      if (hx >= 0x7ff00000)
+	{
+	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
+	    return (zero);	/* exp(-inf) = 0.  */
+	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf.  */
+	}
+      if (xx.x > threshold1)
+	{			/* Set overflow error condition.  */
+	  retval = hhuge * hhuge;
+	  return retval;
+	} 
+      if (-xx.x > threshold2)
+	{			/* Set underflow error condition.  */
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	  retval = force_underflow;
+	  return retval;
+	}
+    }
+
+  /* Use FE_TONEAREST rounding mode for computing yy.y.
+     Avoid set/reset of rounding mode if already in FE_TONEAREST mode.  */
+  fe_val = get_rounding_mode ();
+  if (fe_val == FE_TONEAREST)
+    {
+      t = invln2_32 * xx.x;
+      if (ix < 0)
+	t -= half;
+      else
+	t += half;
+      k = (int) t;
+      j = (k & 0x1f) << 1;
+      m = k >> 5;
+      z = (xx.x - k * ln2_32hi) - k * ln2_32lo;
+
+      /* z is now in primary range.  */
+      t = z * z;
+      yy.y = z + (t * (half + z * t2) + 
+		  (t * t) * (t3 + z * t4 + t * t5));
+      yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+    }
+  else
+    {
+      libc_fesetround (FE_TONEAREST);
+      t = invln2_32 * xx.x;
+      if (ix < 0)
+	t -= half;
+      else
+	t += half;
+      k = (int) t;
+      j = (k & 0x1f) << 1;
+      m = k >> 5;
+      z = (xx.x - k * ln2_32hi) - k * ln2_32lo;
+
+      /* z is now in primary range.  */
+      t = z * z;
+      yy.y = z + (t * (half + z * t2) +
+		  (t * t) * (t3 + z * t4 + t * t5));
+      yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+      libc_fesetround (fe_val);
+    }
+
+  if (m < -1021)
+    {
+      yy.y_part[HIGH_HALF] += (m + 54) << 20;
+      retval = twom54 * yy.y;
+      if (retval < DBL_MIN)
+	{
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	}
+      return retval;
+    }
+  yy.y_part[HIGH_HALF] += m << 20;
+  return (yy.y);
 }
 #ifndef __ieee754_exp
 strong_alias (__ieee754_exp, __exp_finite)
 #endif
 
+#ifndef SECTION
+# define SECTION
+#endif
+
 /* Compute e^(x+xx).  The routine also receives bound of error of previous
    calculation.  If after computing exp the error exceeds the allowed bounds,
    the routine returns a non-positive number.  Otherwise it returns the
diff --git a/sysdeps/ieee754/dbl-64/e_pow.c b/sysdeps/ieee754/dbl-64/e_pow.c
index 9f6439e..2eb8dbf 100644
--- a/sysdeps/ieee754/dbl-64/e_pow.c
+++ b/sysdeps/ieee754/dbl-64/e_pow.c
@@ -25,7 +25,7 @@
 /*             log1                                                        */
 /*             checkint                                                    */
 /* FILES NEEDED: dla.h endian.h mpa.h mydefs.h                             */
-/*               halfulp.c mpexp.c mplog.c slowexp.c slowpow.c mpa.c       */
+/*               halfulp.c mpexp.c mplog.c slowpow.c mpa.c                 */
 /*                          uexp.c  upow.c				   */
 /*               root.tbl uexp.tbl upow.tbl                                */
 /* An ultimate power routine. Given two IEEE double machine numbers y,x    */
diff --git a/sysdeps/ieee754/dbl-64/eexp.tbl b/sysdeps/ieee754/dbl-64/eexp.tbl
new file mode 100644
index 0000000..776369f
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/eexp.tbl
@@ -0,0 +1,219 @@
+/* EXP function tables - for use in computing double precision exponential
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+/* TBL[2*j] and TBL[2*j+1] are double precision numbers used to
+   approximate exp(x) using the formula given in the comments
+   for e_exp.c.  */
+
+static const double TBL[64] = {
+    0x1.0000000000000p+0,  0x0.0000000000000p+0,
+    0x1.059b0d3158574p+0,  0x1.d73e2a475b465p-55,
+    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610ap-54,
+    0x1.11301d0125b51p+0, -0x1.6c51039449b3ap-54,
+    0x1.172b83c7d517bp+0, -0x1.19041b9d78a76p-55,
+    0x1.1d4873168b9aap+0,  0x1.e016e00a2643cp-54,
+    0x1.2387a6e756238p+0,  0x1.9b07eb6c70573p-54,
+    0x1.29e9df51fdee1p+0,  0x1.612e8afad1255p-55,
+    0x1.306fe0a31b715p+0,  0x1.6f46ad23182e4p-55,
+    0x1.371a7373aa9cbp+0, -0x1.63aeabf42eae2p-54,
+    0x1.3dea64c123422p+0,  0x1.ada0911f09ebcp-55,
+    0x1.44e086061892dp+0,  0x1.89b7a04ef80d0p-59,
+    0x1.4bfdad5362a27p+0,  0x1.d4397afec42e2p-56,
+    0x1.5342b569d4f82p+0, -0x1.07abe1db13cacp-55,
+    0x1.5ab07dd485429p+0,  0x1.6324c054647adp-54,
+    0x1.6247eb03a5585p+0, -0x1.383c17e40b497p-54,
+    0x1.6a09e667f3bcdp+0, -0x1.bdd3413b26456p-54,
+    0x1.71f75e8ec5f74p+0, -0x1.16e4786887a99p-55,
+    0x1.7a11473eb0187p+0, -0x1.41577ee04992fp-55,
+    0x1.82589994cce13p+0, -0x1.d4c1dd41532d8p-54,
+    0x1.8ace5422aa0dbp+0,  0x1.6e9f156864b27p-54,
+    0x1.93737b0cdc5e5p+0, -0x1.75fc781b57ebcp-57,
+    0x1.9c49182a3f090p+0,  0x1.c7c46b071f2bep-56,
+    0x1.a5503b23e255dp+0, -0x1.d2f6edb8d41e1p-54,
+    0x1.ae89f995ad3adp+0,  0x1.7a1cd345dcc81p-54,
+    0x1.b7f76f2fb5e47p+0, -0x1.5584f7e54ac3bp-56,
+    0x1.c199bdd85529cp+0,  0x1.11065895048ddp-55,
+    0x1.cb720dcef9069p+0,  0x1.503cbd1e949dbp-56,
+    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3706p-55,
+    0x1.dfc97337b9b5fp+0, -0x1.1a5cd4f184b5cp-54,
+    0x1.ea4afa2a490dap+0, -0x1.e9c23179c2893p-54,
+    0x1.f50765b6e4540p+0,  0x1.9d3e12dd8a18bp-54};
+
+/* For i = 0, ..., 66,
+     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+
+   For i = 67, ..., 133,
+     TBL2[2*i] is a double precision number near -(i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.  */
+
+static const double TBL2[268] = {
+    0x1.ffffffffffc82p-7,   0x1.04080ab55de32p+0,
+    0x1.fffffffffffdbp-6,   0x1.08205601127ecp+0,
+    0x1.80000000000a0p-5,   0x1.0c49236829e91p+0,
+    0x1.fffffffffff79p-5,   0x1.1082b577d34e9p+0,
+    0x1.3fffffffffffcp-4,   0x1.14cd4fc989cd6p+0,
+    0x1.8000000000060p-4,   0x1.192937074e0d4p+0,
+    0x1.c000000000061p-4,   0x1.1d96b0eff0e80p+0,
+    0x1.fffffffffffd6p-4,   0x1.2216045b6f5cap+0,
+    0x1.1ffffffffff58p-3,   0x1.26a7793f6014cp+0,
+    0x1.3ffffffffff75p-3,   0x1.2b4b58b372c65p+0,
+    0x1.5ffffffffff00p-3,   0x1.3001ecf601ad1p+0,
+    0x1.8000000000020p-3,   0x1.34cb8170b583ap+0,
+    0x1.9ffffffffa629p-3,   0x1.39a862bd3b344p+0,
+    0x1.c00000000000fp-3,   0x1.3e98deaa11dcep+0,
+    0x1.e00000000007fp-3,   0x1.439d443f5f16dp+0,
+    0x1.0000000000072p-2,   0x1.48b5e3c3e81abp+0,
+    0x1.0fffffffffecap-2,   0x1.4de30ec211dfbp+0,
+    0x1.1ffffffffff8fp-2,   0x1.5325180cfacd2p+0,
+    0x1.300000000003bp-2,   0x1.587c53c5a7b04p+0,
+    0x1.4000000000034p-2,   0x1.5de9176046007p+0,
+    0x1.4ffffffffff89p-2,   0x1.636bb9a98322fp+0,
+    0x1.5ffffffffffe7p-2,   0x1.690492cbf942ap+0,
+    0x1.6ffffffffff78p-2,   0x1.6eb3fc55b1e45p+0,
+    0x1.7ffffffffff65p-2,   0x1.747a513dbef32p+0,
+    0x1.8ffffffffffd5p-2,   0x1.7a57ede9ea22ep+0,
+    0x1.9ffffffffff6ep-2,   0x1.804d30347b50fp+0,
+    0x1.affffffffffc3p-2,   0x1.865a7772164aep+0,
+    0x1.c000000000053p-2,   0x1.8c802477b0030p+0,
+    0x1.d00000000004dp-2,   0x1.92be99a09bf1ep+0,
+    0x1.e000000000096p-2,   0x1.99163ad4b1e08p+0,
+    0x1.efffffffffefap-2,   0x1.9f876d8e8c4fcp+0,
+    0x1.fffffffffffd0p-2,   0x1.a61298e1e0688p+0,
+    0x1.0800000000002p-1,   0x1.acb82581eee56p+0,
+    0x1.100000000001fp-1,   0x1.b3787dc80f979p+0,
+    0x1.17ffffffffff8p-1,   0x1.ba540dba56e4fp+0,
+    0x1.1fffffffffffap-1,   0x1.c14b431256441p+0,
+    0x1.27fffffffffc4p-1,   0x1.c85e8d43f7c9bp+0,
+    0x1.2fffffffffffdp-1,   0x1.cf8e5d84758a6p+0,
+    0x1.380000000001fp-1,   0x1.d6db26d16cd84p+0,
+    0x1.3ffffffffffd8p-1,   0x1.de455df80e39bp+0,
+    0x1.4800000000052p-1,   0x1.e5cd799c6a59cp+0,
+    0x1.4ffffffffffc8p-1,   0x1.ed73f240dc10cp+0,
+    0x1.5800000000013p-1,   0x1.f539424d90f71p+0,
+    0x1.5ffffffffffbcp-1,   0x1.fd1de6182f885p+0,
+    0x1.680000000002dp-1,   0x1.02912df5ce741p+1,
+    0x1.7000000000040p-1,   0x1.06a39207f0a2ap+1,
+    0x1.780000000004fp-1,   0x1.0ac660691652ap+1,
+    0x1.7ffffffffff6fp-1,   0x1.0ef9db467dcabp+1,
+    0x1.87fffffffffe5p-1,   0x1.133e45d82e943p+1,
+    0x1.9000000000035p-1,   0x1.1793e4652cc6dp+1,
+    0x1.97fffffffffb3p-1,   0x1.1bfafc47bda48p+1,
+    0x1.a000000000000p-1,   0x1.2073d3f1bd518p+1,
+    0x1.a80000000004ap-1,   0x1.24feb2f105ce2p+1,
+    0x1.affffffffffedp-1,   0x1.299be1f3e7f11p+1,
+    0x1.b7ffffffffffbp-1,   0x1.2e4baacdb6611p+1,
+    0x1.c00000000001dp-1,   0x1.330e587b62b39p+1,
+    0x1.c800000000079p-1,   0x1.37e437282d538p+1,
+    0x1.cffffffffff51p-1,   0x1.3ccd943268248p+1,
+    0x1.d7fffffffff74p-1,   0x1.41cabe304cadcp+1,
+    0x1.e000000000011p-1,   0x1.46dc04f4e5343p+1,
+    0x1.e80000000001ep-1,   0x1.4c01b9950a124p+1,
+    0x1.effffffffff9ep-1,   0x1.513c2e6c73196p+1,
+    0x1.f7fffffffffedp-1,   0x1.568bb722dd586p+1,
+    0x1.0000000000034p+0,   0x1.5bf0a8b1457b0p+1,
+    0x1.03fffffffffe2p+0,   0x1.616b5967376dfp+1,
+    0x1.07fffffffff4bp+0,   0x1.66fc20f0337a9p+1,
+    0x1.0bffffffffffdp+0,   0x1.6ca35859290f5p+1,
+   -0x1.fffffffffffe4p-7,   0x1.f80feabfeefa5p-1,
+   -0x1.ffffffffffb0bp-6,   0x1.f03f56a88b5fep-1,
+   -0x1.7ffffffffffa7p-5,   0x1.e88dc6afecfc5p-1,
+   -0x1.ffffffffffea8p-5,   0x1.e0fabfbc702b8p-1,
+   -0x1.3ffffffffffb3p-4,   0x1.d985c89d041acp-1,
+   -0x1.7ffffffffffe3p-4,   0x1.d22e6a0197c06p-1,
+   -0x1.bffffffffff9ap-4,   0x1.caf42e73a4c89p-1,
+   -0x1.fffffffffff98p-4,   0x1.c3d6a24ed822dp-1,
+   -0x1.1ffffffffffe9p-3,   0x1.bcd553b9d7b67p-1,
+   -0x1.3ffffffffffe0p-3,   0x1.b5efd29f24c2dp-1,
+   -0x1.5fffffffff553p-3,   0x1.af25b0a61a9f4p-1,
+   -0x1.7ffffffffff8bp-3,   0x1.a876812c08794p-1,
+   -0x1.9fffffffffe51p-3,   0x1.a1e1d93d68828p-1,
+   -0x1.bffffffffff6ep-3,   0x1.9b674f8f2f3f5p-1,
+   -0x1.dffffffffff7fp-3,   0x1.95067c7837a0cp-1,
+   -0x1.fffffffffff7ap-3,   0x1.8ebef9eac8225p-1,
+   -0x1.0fffffffffffep-2,   0x1.8890636e31f55p-1,
+   -0x1.1ffffffffff41p-2,   0x1.827a56188975ep-1,
+   -0x1.2ffffffffffbap-2,   0x1.7c7c708877656p-1,
+   -0x1.3fffffffffff8p-2,   0x1.769652df22f81p-1,
+   -0x1.4ffffffffff90p-2,   0x1.70c79eba33c2fp-1,
+   -0x1.5ffffffffffdbp-2,   0x1.6b0ff72deb8aap-1,
+   -0x1.6ffffffffff9ap-2,   0x1.656f00bf5798ep-1,
+   -0x1.7ffffffffff9fp-2,   0x1.5fe4615e98eb0p-1,
+   -0x1.8ffffffffffeep-2,   0x1.5a6fc061433cep-1,
+   -0x1.9fffffffffc4ap-2,   0x1.5510c67cd26cdp-1,
+   -0x1.affffffffff30p-2,   0x1.4fc71dc13566bp-1,
+   -0x1.bfffffffffff0p-2,   0x1.4a9271936fd0ep-1,
+   -0x1.cfffffffffff3p-2,   0x1.45726ea84fb8cp-1,
+   -0x1.dfffffffffff3p-2,   0x1.4066c2ff3912bp-1,
+   -0x1.effffffffff80p-2,   0x1.3b6f1ddd05ab9p-1,
+   -0x1.fffffffffffdfp-2,   0x1.368b2fc6f9614p-1,
+   -0x1.0800000000000p-1,   0x1.31baaa7dca843p-1,
+   -0x1.0ffffffffffa4p-1,   0x1.2cfd40f8bdce4p-1,
+   -0x1.17fffffffff0ap-1,   0x1.2852a760d5ce7p-1,
+   -0x1.2000000000000p-1,   0x1.23ba930c1568bp-1,
+   -0x1.27fffffffffbbp-1,   0x1.1f34ba78d568dp-1,
+   -0x1.2fffffffffe32p-1,   0x1.1ac0d5492c1dbp-1,
+   -0x1.37ffffffff042p-1,   0x1.165e9c3e67ef2p-1,
+   -0x1.3ffffffffff77p-1,   0x1.120dc93499431p-1,
+   -0x1.47fffffffff6bp-1,   0x1.0dce171e34ecep-1,
+   -0x1.4fffffffffff1p-1,   0x1.099f41ffbe588p-1,
+   -0x1.57ffffffffe02p-1,   0x1.058106eb8a7aep-1,
+   -0x1.5ffffffffffe5p-1,   0x1.017323fd9002ep-1,
+   -0x1.67fffffffffb0p-1,   0x1.faeab0ae9386cp-2,
+   -0x1.6ffffffffffb2p-1,   0x1.f30ec837503d7p-2,
+   -0x1.77fffffffff7fp-1,   0x1.eb5210d627133p-2,
+   -0x1.7ffffffffffe8p-1,   0x1.e3b40ebefcd95p-2,
+   -0x1.87fffffffffc8p-1,   0x1.dc3448110dae2p-2,
+   -0x1.8fffffffffb30p-1,   0x1.d4d244cf4ef06p-2,
+   -0x1.97fffffffffefp-1,   0x1.cd8d8ed8ee395p-2,
+   -0x1.9ffffffffffa7p-1,   0x1.c665b1e1f1e5cp-2,
+   -0x1.a7fffffffffdcp-1,   0x1.bf5a3b6bf18d6p-2,
+   -0x1.affffffffff95p-1,   0x1.b86ababeef93bp-2,
+   -0x1.b7fffffffffcbp-1,   0x1.b196c0e24d256p-2,
+   -0x1.bffffffffff32p-1,   0x1.aadde095dadf7p-2,
+   -0x1.c7fffffffff6ap-1,   0x1.a43fae4b047c9p-2,
+   -0x1.cffffffffffb6p-1,   0x1.9dbbc01e182a4p-2,
+   -0x1.d7fffffffffcap-1,   0x1.9751adcfa81ecp-2,
+   -0x1.dffffffffffcdp-1,   0x1.910110be0699ep-2,
+   -0x1.e7ffffffffffbp-1,   0x1.8ac983dedbc69p-2,
+   -0x1.effffffffff88p-1,   0x1.84aaa3b8d51a9p-2,
+   -0x1.f7fffffffffbbp-1,   0x1.7ea40e5d6d92ep-2,
+   -0x1.fffffffffffdbp-1,   0x1.78b56362cef53p-2,
+   -0x1.03fffffffff00p+0,   0x1.72de43ddcb1f2p-2,
+   -0x1.07ffffffffe6fp+0,   0x1.6d1e525bed085p-2,
+   -0x1.0bfffffffffd6p+0,   0x1.677532dda1c57p-2};
+
+static const double
+/* Following three values used to scale x to primary range.  */
+  invln2_32 = 0x1.71547652b82fep+5, /* 4.61662413084468283841e+01 */
+  ln2_32hi = 0x1.62e42fee00000p-6, /* 2.16608493865351192653e-02 */
+  ln2_32lo = 0x1.a39ef35793c76p-38, /* 5.96317165397058656257e-12 */
+/* t2-t5 terms used for polynomial computation.  */
+  t2 = 0x1.5555555548f7cp-3, /* 1.6666666666526086527e-1 */
+  t3 = 0x1.5555555545d4ep-5, /* 4.1666666666226079285e-2 */
+  t4 = 0x1.11115b7aa905ep-7, /* 8.3333679843421958056e-3 */
+  t5 = 0x1.6c1728d739765p-10, /* 1.3888949086377719040e-3 */
+/* Maximum value for x to not overflow.  */
+  threshold1 = 0x1.62e42fefa39efp+9, /* 7.09782712893383973096e+02 */
+/* Maximum value for -x to not underflow to zero in FE_TONEAREST mode.  */
+  threshold2 = 0x1.74910d52d3051p+9, /* 7.45133219101941108420e+02 */
+/* Scaling factor used when result near zero.  */
+  twom54 = 0x1.0000000000000p-54; /* 5.55111512312578270212e-17 */
diff --git a/sysdeps/ieee754/dbl-64/slowexp.c b/sysdeps/ieee754/dbl-64/slowexp.c
deleted file mode 100644
index e8fa2e2..0000000
--- a/sysdeps/ieee754/dbl-64/slowexp.c
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-/**************************************************************************/
-/*  MODULE_NAME:slowexp.c                                                 */
-/*                                                                        */
-/*  FUNCTION:slowexp                                                      */
-/*                                                                        */
-/*  FILES NEEDED:mpa.h                                                    */
-/*               mpa.c mpexp.c                                            */
-/*                                                                        */
-/*Converting from double precision to Multi-precision and calculating     */
-/* e^x                                                                    */
-/**************************************************************************/
-#include <math_private.h>
-
-#include <stap-probe.h>
-
-#ifndef USE_LONG_DOUBLE_FOR_MP
-# include "mpa.h"
-void __mpexp (mp_no *x, mp_no *y, int p);
-#endif
-
-#ifndef SECTION
-# define SECTION
-#endif
-
-/*Converting from double precision to Multi-precision and calculating  e^x */
-double
-SECTION
-__slowexp (double x)
-{
-#ifndef USE_LONG_DOUBLE_FOR_MP
-  double w, z, res, eps = 3.0e-26;
-  int p;
-  mp_no mpx, mpy, mpz, mpw, mpeps, mpcor;
-
-  /* Use the multiple precision __MPEXP function to compute the exponential
-     First at 144 bits and if it is not accurate enough, at 768 bits.  */
-  p = 6;
-  __dbl_mp (x, &mpx, p);
-  __mpexp (&mpx, &mpy, p);
-  __dbl_mp (eps, &mpeps, p);
-  __mul (&mpeps, &mpy, &mpcor, p);
-  __add (&mpy, &mpcor, &mpw, p);
-  __sub (&mpy, &mpcor, &mpz, p);
-  __mp_dbl (&mpw, &w, p);
-  __mp_dbl (&mpz, &z, p);
-  if (w == z)
-    {
-      /* Track how often we get to the slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p6, 2, &x, &w);
-      return w;
-    }
-  else
-    {
-      p = 32;
-      __dbl_mp (x, &mpx, p);
-      __mpexp (&mpx, &mpy, p);
-      __mp_dbl (&mpy, &res, p);
-
-      /* Track how often we get to the uber-slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p32, 2, &x, &res);
-      return res;
-    }
-#else
-  return (double) __ieee754_expl((long double)x);
-#endif
-}
diff --git a/sysdeps/powerpc/power4/fpu/Makefile b/sysdeps/powerpc/power4/fpu/Makefile
index e17d32f..ded9976 100644
--- a/sysdeps/powerpc/power4/fpu/Makefile
+++ b/sysdeps/powerpc/power4/fpu/Makefile
@@ -3,5 +3,4 @@
 ifeq ($(subdir),math)
 CFLAGS-mpa.c += --param max-unroll-times=4 -funroll-loops -fpeel-loops
 CPPFLAGS-slowpow.c += -DUSE_LONG_DOUBLE_FOR_MP=1
-CPPFLAGS-slowexp.c += -DUSE_LONG_DOUBLE_FOR_MP=1
 endif
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index c78624b..e06c059 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -10,7 +10,7 @@ libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
 
 libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
 			e_asin-fma e_atan2-fma s_sin-fma s_tan-fma \
-			mplog-fma mpa-fma slowexp-fma slowpow-fma \
+			mplog-fma mpa-fma slowpow-fma \
 			sincos32-fma doasin-fma dosincos-fma \
 			halfulp-fma mpexp-fma \
 			mpatan2-fma mpatan-fma mpsqrt-fma mptan-fma
@@ -32,7 +32,6 @@ CFLAGS-mpsqrt-fma.c = -mfma -mavx2
 CFLAGS-mptan-fma.c = -mfma -mavx2
 CFLAGS-s_atan-fma.c = -mfma -mavx2
 CFLAGS-sincos32-fma.c = -mfma -mavx2
-CFLAGS-slowexp-fma.c = -mfma -mavx2
 CFLAGS-slowpow-fma.c = -mfma -mavx2
 CFLAGS-s_sin-fma.c = -mfma -mavx2
 CFLAGS-s_tan-fma.c = -mfma -mavx2
@@ -48,7 +47,7 @@ CFLAGS-e_powf-fma.c = -mfma -mavx2
 
 libm-sysdep_routines += e_exp-fma4 e_log-fma4 e_pow-fma4 s_atan-fma4 \
 			e_asin-fma4 e_atan2-fma4 s_sin-fma4 s_tan-fma4 \
-			mplog-fma4 mpa-fma4 slowexp-fma4 slowpow-fma4 \
+			mplog-fma4 mpa-fma4 slowpow-fma4 \
 			sincos32-fma4 doasin-fma4 dosincos-fma4 \
 			halfulp-fma4 mpexp-fma4 \
 			mpatan2-fma4 mpatan-fma4 mpsqrt-fma4 mptan-fma4
@@ -70,14 +69,13 @@ CFLAGS-mpsqrt-fma4.c = -mfma4
 CFLAGS-mptan-fma4.c = -mfma4
 CFLAGS-s_atan-fma4.c = -mfma4
 CFLAGS-sincos32-fma4.c = -mfma4
-CFLAGS-slowexp-fma4.c = -mfma4
 CFLAGS-slowpow-fma4.c = -mfma4
 CFLAGS-s_sin-fma4.c = -mfma4
 CFLAGS-s_tan-fma4.c = -mfma4
 
 libm-sysdep_routines += e_exp-avx e_log-avx s_atan-avx \
 			e_atan2-avx s_sin-avx s_tan-avx \
-			mplog-avx mpa-avx slowexp-avx \
+			mplog-avx mpa-avx \
 			mpexp-avx
 
 CFLAGS-e_atan2-avx.c = -msse2avx -DSSE2AVX
@@ -88,7 +86,6 @@ CFLAGS-mpexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-mplog-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_atan-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_sin-avx.c = -msse2avx -DSSE2AVX
-CFLAGS-slowexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_tan-avx.c = -msse2avx -DSSE2AVX
 endif
 
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
index ee5dd6d..afd9174 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_avx
 #define __exp1 __exp1_avx
-#define __slowexp __slowexp_avx
 #define SECTION __attribute__ ((section (".text.avx")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
index 6e0fdb7..765b1b9 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma
 #define __exp1 __exp1_fma
-#define __slowexp __slowexp_fma
 #define SECTION __attribute__ ((section (".text.fma")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
index ae6eb67..9ac7aca 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma4
 #define __exp1 __exp1_fma4
-#define __slowexp __slowexp_fma4
 #define SECTION __attribute__ ((section (".text.fma4")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c b/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
deleted file mode 100644
index d01c6d7..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_avx
-#define __add __add_avx
-#define __dbl_mp __dbl_mp_avx
-#define __mpexp __mpexp_avx
-#define __mul __mul_avx
-#define __sub __sub_avx
-#define SECTION __attribute__ ((section (".text.avx")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
deleted file mode 100644
index 6fffca1..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma
-#define __add __add_fma
-#define __dbl_mp __dbl_mp_fma
-#define __mpexp __mpexp_fma
-#define __mul __mul_fma
-#define __sub __sub_fma
-#define SECTION __attribute__ ((section (".text.fma")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
deleted file mode 100644
index 3bcde84..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma4
-#define __add __add_fma4
-#define __dbl_mp __dbl_mp_fma4
-#define __mpexp __mpexp_fma4
-#define __mul __mul_fma4
-#define __sub __sub_fma4
-#define SECTION __attribute__ ((section (".text.fma4")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
-- 
1.7.1

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-11-07  4:25 Patrick McGehearty
@ 2017-11-16 17:52 ` Patrick McGehearty
  2017-11-16 18:27   ` Carlos O'Donell
  2017-11-16 18:31   ` Joseph Myers
  2017-11-23 21:19 ` Joseph Myers
  1 sibling, 2 replies; 44+ messages in thread
From: Patrick McGehearty @ 2017-11-16 17:52 UTC (permalink / raw)
  To: libc-alpha

It has been 10 days with no further feedback/comments.
What's the next step for getting this patch included in the src tree?
It would be good to make that happen before the Thanksgiving break.

- Patrick McGehearty

On 11/6/2017 10:24 PM, Patrick McGehearty wrote:
> Version 5 of proposed patch.
>
> Cleaned up formatting of comments and braces.
> Returned to single patch for submission.
>
> Version 4 of proposed patch.
>
> New comments revised to use GNU standard comment formatting.
> Limited comment added in eexp.tbl for TBL[]. The original src
> used for porting to Linux did not have a comment about TBL[].
> The new comment is limited to the current worker's level of
> understanding.
>
> The (-xx.x > threshold2) case is changed to return force_underflow.
> For FE_TONEAREST, tiny*tiny will always be zero but for
> FE_UPWARD, it will be the smallest representable value.
>
> That change caused no change in the math test results for Sparc or x86.
>
> Version 3 changes
>
> All hex constants in version 2 replaced with C99 double hex constants,
> allowing Big Endian and Little Endian versions to be merged.
> Only e_exp.c and eexp.tbl changed from version 2.
> Minor changes in performance results due to system noise.
> No other changes from version 2.
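
As a rough illustration of why C99 hex constants let the two endian variants merge: a hex floating-point literal names the value itself rather than its byte layout. The sketch below is not from the patch. The names high_word and ln2_32hi_example are hypothetical, and it assumes double and uint64_t share the same byte order, as they do on mainstream platforms.

```c
#include <stdint.h>
#include <string.h>

/* Return the upper 32 bits (sign, exponent, top 20 mantissa bits) of a
   double.  Going through uint64_t and a shift avoids indexing a two-int
   union, so no HIGH_HALF/LOW_HALF style switch is needed here.
   high_word is a hypothetical helper, not a glibc function.  */
static uint32_t
high_word (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (uint32_t) (bits >> 32);
}

/* Copy of the ln2_32hi constant, written once for every byte order.  */
static const double ln2_32hi_example = 0x1.62e42fee00000p-6;
```

Masking off the sign bit of high_word (x) gives the quantity the new e_exp.c compares against thresholds such as 0x3ff0a2b2, which is the high word of (3/2) ln 2.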
>
> Version 2 of proposed patch.
> Revised copyright notice and formatting issues.
> Removed slowexp.c and related references.
> Replaced tables of double constants with hex constants, paying special
>    attention to correctly handle little endian and big endian versions.
>    Using hex initialization also required changing variables to be declared
>    as unions.  Tables moved from e_exp.c to sysdeps/ieee754/dbl-64/eexp.tbl.
> Replaced __fegetround(), __fesetround() with get_rounding_mode and
>    libc_fesetround().
> Removed use of "small". "inexact mode" now ignored.
> Retested and rebenchmarked on sparc and x86 with the above changes.
>
> These changes will be active for all platforms that don't provide
> their own exp() routines. They will also be active for ieee754
> versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
> erf.
>
> Performance gains are typically around 5x when measured on
> Sparc s7 for common values between exp(1) and exp(40).
>
> Using the glibc perf tests on sparc and x86,
>        sparc (nsec)    x86 (nsec)
>        old     new     old     new
> max   17629   395    5173     144
> min     399    54      15      13
> mean   5317   200    1349      23
>
> The extreme max times for the old (ieee754) exp are due to the
> multiprecision computation in the old algorithm when the true value is
> very near 0.5 ulp away from a value representable in double
> precision. The new algorithm does not take special measures for those
> cases. The current glibc exp perf tests overrepresent those values.
> Informal testing suggests approximately one in 200 cases might
> invoke the high cost computation. The performance advantage of the new
> algorithm for other values is still large but not as large as indicated
> by the chart above.
>
> Glibc correctness tests for exp() and expf() were run. Within the
> test suite 3 input values were found to cause 1 bit differences (ulp)
> when "FE_TONEAREST" rounding mode is set. No differences in exp() were
> seen for the tested values for the other rounding modes.
> Typical example:
> exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
>   new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
>   old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
>      exp    =  2.31973271630014285508337 (high precision)
> Old delta: off by 0.49 ulp
> New delta: off by 0.51 ulp
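
The two results quoted above differ only in the last hex digit of the mantissa. One way to confirm that they are adjacent doubles, exactly 1 ulp apart, is to compare their bit patterns. This is a sketch only: ulp_distance is a hypothetical helper, and the subtraction is meaningful only for finite doubles of the same sign.

```c
#include <stdint.h>
#include <string.h>

/* Number of representable doubles stepped over when going from a to b.
   For finite doubles of the same sign, IEEE 754 bit patterns are ordered
   like the values, so the integer difference counts ulps.
   ulp_distance is a hypothetical helper, not part of the patch.  */
static int64_t
ulp_distance (double a, double b)
{
  int64_t ia, ib;
  memcpy (&ia, &a, sizeof ia);
  memcpy (&ib, &b, sizeof ib);
  return ib - ia;
}
```

Calling ulp_distance (0x1.db14cd7993879p-3, 0x1.db14cd799387ap-3) with the old and new results gives 1, so the true value lies between two neighbouring doubles and the old and new code each round to a different one.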
>
> In addition, because ieee754_exp() is used by other routines, cexp()
> showed test results with very small imaginary input values where the
> imaginary portion of the result was off by 3 ulp when in upward
> rounding mode, but not in the other rounding modes.  For x86, tgamma
> showed a few values where the ulp increased to 6 (max ulp for tgamma
> is 5). Sparc tgamma did not show these failures.  I presume the tgamma
> differences are due to compiler optimization differences within the
> gamma function.  The gamma function is known to be difficult to compute
> accurately.
> ---
>   manual/probes.texi                          |   14 -
>   math/Makefile                               |    8 +-
>   sysdeps/generic/math_private.h              |    1 -
>   sysdeps/ieee754/dbl-64/e_exp.c              |  399 +++++++++++++++------------
>   sysdeps/ieee754/dbl-64/e_pow.c              |    2 +-
>   sysdeps/ieee754/dbl-64/eexp.tbl             |  219 +++++++++++++++
>   sysdeps/ieee754/dbl-64/slowexp.c            |   86 ------
>   sysdeps/powerpc/power4/fpu/Makefile         |    1 -
>   sysdeps/x86_64/fpu/multiarch/Makefile       |    9 +-
>   sysdeps/x86_64/fpu/multiarch/e_exp-avx.c    |    1 -
>   sysdeps/x86_64/fpu/multiarch/e_exp-fma.c    |    1 -
>   sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c   |    1 -
>   sysdeps/x86_64/fpu/multiarch/slowexp-avx.c  |    9 -
>   sysdeps/x86_64/fpu/multiarch/slowexp-fma.c  |    9 -
>   sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c |    9 -
>   15 files changed, 444 insertions(+), 325 deletions(-)
>   create mode 100644 sysdeps/ieee754/dbl-64/eexp.tbl
>   delete mode 100644 sysdeps/ieee754/dbl-64/slowexp.c
>   delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
>   delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
>   delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
>
> diff --git a/manual/probes.texi b/manual/probes.texi
> index 8ab6756..f8ae64b 100644
> --- a/manual/probes.texi
> +++ b/manual/probes.texi
> @@ -258,20 +258,6 @@ Unless explicitly mentioned otherwise, a precision of 1 implies 24 bits of
>   precision in the mantissa of the multiple precision number.  Hence, a precision
>   level of 32 implies 768 bits of precision in the mantissa.
>   
> -@deftp Probe slowexp_p6 (double @var{$arg1}, double @var{$arg2})
> -This probe is triggered when the @code{exp} function is called with an
> -input that results in multiple precision computation with precision
> -6.  Argument @var{$arg1} is the input value and @var{$arg2} is the
> -computed output.
> -@end deftp
> -
> -@deftp Probe slowexp_p32 (double @var{$arg1}, double @var{$arg2})
> -This probe is triggered when the @code{exp} function is called with an
> -input that results in multiple precision computation with precision
> -32.  Argument @var{$arg1} is the input value and @var{$arg2} is the
> -computed output.
> -@end deftp
> -
>   @deftp Probe slowpow_p10 (double @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
>   This probe is triggered when the @code{pow} function is called with
>   inputs that result in multiple precision computation with precision
> diff --git a/math/Makefile b/math/Makefile
> index b2bd3d3..f70aebf 100644
> --- a/math/Makefile
> +++ b/math/Makefile
> @@ -114,7 +114,7 @@ type-ldouble-yes := ldouble
>   # double support
>   type-double-suffix :=
>   type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
> -		       mpatan mpexp mplog mpsqrt mptan sincos32 slowexp	\
> +		       mpatan mpexp mplog mpsqrt mptan sincos32 \
>   		       slowpow sincostab k_rem_pio2
>   
>   # float support
> @@ -561,8 +561,10 @@ math-CPPFLAGS += -D__NO_MATH_INLINES -D__LIBC_INTERNAL_MATH_INLINES
>   ifneq ($(long-double-fcts),yes)
>   # The `double' and `long double' types are the same on this machine.
>   # We won't compile the `long double' code at all.  Tell the `double' code
> -# to define aliases for the `FUNCl' names.
> -math-CPPFLAGS += -DNO_LONG_DOUBLE
> +# to define aliases for the `FUNCl' names.  To avoid type conflicts in
> +# defining those aliases, tell <math.h> to declare the `FUNCl' names with
> +# `double' instead of `long double'.
> +math-CPPFLAGS += -DNO_LONG_DOUBLE -D_Mlong_double_=double
>   endif
>   
>   # These files quiet sNaNs in a way that is optimized away without
> diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
> index 80c7c92..30fc3c9 100644
> --- a/sysdeps/generic/math_private.h
> +++ b/sysdeps/generic/math_private.h
> @@ -262,7 +262,6 @@ extern double __sin32 (double __x, double __res, double __res1);
>   extern double __cos32 (double __x, double __res, double __res1);
>   extern double __mpsin (double __x, double __dx, bool __range_reduce);
>   extern double __mpcos (double __x, double __dx, bool __range_reduce);
> -extern double __slowexp (double __x);
>   extern double __slowpow (double __x, double __y, double __z);
>   extern void __docos (double __x, double __dx, double __v[]);
>   
> diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
> index 6757a14..87e86a6 100644
> --- a/sysdeps/ieee754/dbl-64/e_exp.c
> +++ b/sysdeps/ieee754/dbl-64/e_exp.c
> @@ -1,3 +1,4 @@
> +/* EXP function - Compute double precision exponential */
>   /*
>    * IBM Accurate Mathematical Library
>    * written by International Business Machines Corp.
> @@ -23,7 +24,7 @@
>   /*           exp1                                                          */
>   /*                                                                         */
>   /* FILES NEEDED:dla.h endian.h mpa.h mydefs.h uexp.h                       */
> -/*              mpa.c mpexp.x slowexp.c                                    */
> +/*              mpa.c mpexp.x                                              */
>   /*                                                                         */
>   /* An ultimate exp routine. Given an IEEE double machine number x          */
>   /* it computes the correctly rounded (to nearest) value of e^x             */
> @@ -32,207 +33,239 @@
>   /*                                                                         */
>   /***************************************************************************/
>   
> +/*  IBM exp(x) replaced by following exp(x) in 2017. IBM exp1(x,xx) remains.  */
> +/* exp(x)
> +   Hybrid algorithm of Peter Tang's Table driven method (for large
> +   arguments) and an accurate table (for small arguments).
> +   Written by K.C. Ng, November 1988.
> +   Method (large arguments):
> +	1. Argument Reduction: given the input x, find r and integer k
> +	   and j such that
> +	             x = (k+j/32)*(ln2) + r,  |r| <= (1/64)*ln2
> +
> +	2. exp(x) = 2^k * (2^(j/32) + 2^(j/32)*expm1(r))
> +	   a. expm1(r) is approximated by a polynomial:
> +	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
> +	      Here t1 = 1/2 exactly.
> +	   b. 2^(j/32) is represented to twice double precision
> +	      as TBL[2j]+TBL[2j+1].
> +
> +   Note: If divide were fast enough, we could use another approximation
> +	 in 2.a:
> +	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
> +	      (for the same t1 and t2 as above)
> +
> +   Special cases:
> +	exp(INF) is INF, exp(NaN) is NaN;
> +	exp(-INF)=  0;
> +	for finite argument, only exp(0)=1 is exact.
> +
> +   Accuracy:
> +	According to an error analysis, the error is always less than
> +	an ulp (unit in the last place).  The largest errors observed
> +	are less than 0.55 ulp for normal results and less than 0.75 ulp
> +	for subnormal results.
> +
> +   Misc. info.
> +	For IEEE double
> +		if x >  7.09782712893383973096e+02 then exp(x) overflow
> +		if x < -7.45133219101941108420e+02 then exp(x) underflow.  */
> +
>   #include <math.h>
> +#include <math-svid-compat.h>
> +#include <math_private.h>
> +#include <errno.h>
>   #include "endian.h"
>   #include "uexp.h"
> +#include "uexp.tbl"
>   #include "mydefs.h"
>   #include "MathLib.h"
> -#include "uexp.tbl"
> -#include <math_private.h>
>   #include <fenv.h>
>   #include <float.h>
>   
> -#ifndef SECTION
> -# define SECTION
> -#endif
> +extern double __ieee754_exp (double);
> +
> +#include "eexp.tbl"
> +
> +static const double
> +  half = 0.5,
> +  one = 1.0;
>   
> -double __slowexp (double);
>   
> -/* An ultimate exp routine. Given an IEEE double machine number x it computes
> -   the correctly rounded (to nearest) value of e^x.  */
>   double
> -SECTION
> -__ieee754_exp (double x)
> +__ieee754_exp (double x_arg)
>   {
> -  double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
> -  mynumber junk1, junk2, binexp = {{0, 0}};
> -  int4 i, j, m, n, ex;
> +  double z, t;
>     double retval;
> -
> +  int hx, ix, k, j, m;
> +  int fe_val;
> +  union
>     {
> -    SET_RESTORE_ROUND (FE_TONEAREST);
> -
> -    junk1.x = x;
> -    m = junk1.i[HIGH_HALF];
> -    n = m & hugeint;
> -
> -    if (n > smallint && n < bigint)
> -      {
> -	y = x * log2e.x + three51.x;
> -	bexp = y - three51.x;	/*  multiply the result by 2**bexp        */
> -
> -	junk1.x = y;
> -
> -	eps = bexp * ln_two2.x;	/* x = bexp*ln(2) + t - eps               */
> -	t = x - bexp * ln_two1.x;
> -
> -	y = t + three33.x;
> -	base = y - three33.x;	/* t rounded to a multiple of 2**-18      */
> -	junk2.x = y;
> -	del = (t - base) - eps;	/*  x = bexp*ln(2) + base + del           */
> -	eps = del + del * del * (p3.x * del + p2.x);
> -
> -	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;
> -
> -	i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
> -	j = (junk2.i[LOW_HALF] & 511) << 1;
> -
> -	al = coar.x[i] * fine.x[j];
> -	bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
> -	       + coar.x[i + 1] * fine.x[j + 1]);
> -
> -	rem = (bet + bet * eps) + al * eps;
> -	res = al + rem;
> -	cor = (al - res) + rem;
> -	if (res == (res + cor * err_0))
> -	  {
> -	    retval = res * binexp.x;
> -	    goto ret;
> -	  }
> -	else
> -	  {
> -	    retval = __slowexp (x);
> -	    goto ret;
> -	  }			/*if error is over bound */
> -      }
> -
> -    if (n <= smallint)
> -      {
> -	retval = 1.0;
> -	goto ret;
> -      }
> -
> -    if (n >= badint)
> -      {
> -	if (n > infint)
> -	  {
> -	    retval = x + x;
> -	    goto ret;
> -	  }			/* x is NaN */
> -	if (n < infint)
> -	  {
> -	    if (x > 0)
> -	      goto ret_huge;
> -	    else
> -	      goto ret_tiny;
> -	  }
> -	/* x is finite,  cause either overflow or underflow  */
> -	if (junk1.i[LOW_HALF] != 0)
> -	  {
> -	    retval = x + x;
> -	    goto ret;
> -	  }			/*  x is NaN  */
> -	retval = (x > 0) ? inf.x : zero;	/* |x| = inf;  return either inf or 0 */
> -	goto ret;
> -      }
> -
> -    y = x * log2e.x + three51.x;
> -    bexp = y - three51.x;
> -    junk1.x = y;
> -    eps = bexp * ln_two2.x;
> -    t = x - bexp * ln_two1.x;
> -    y = t + three33.x;
> -    base = y - three33.x;
> -    junk2.x = y;
> -    del = (t - base) - eps;
> -    eps = del + del * del * (p3.x * del + p2.x);
> -    i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
> -    j = (junk2.i[LOW_HALF] & 511) << 1;
> -    al = coar.x[i] * fine.x[j];
> -    bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
> -	   + coar.x[i + 1] * fine.x[j + 1]);
> -    rem = (bet + bet * eps) + al * eps;
> -    res = al + rem;
> -    cor = (al - res) + rem;
> -    if (m >> 31)
> -      {
> -	ex = junk1.i[LOW_HALF];
> -	if (res < 1.0)
> -	  {
> -	    res += res;
> -	    cor += cor;
> -	    ex -= 1;
> -	  }
> -	if (ex >= -1022)
> -	  {
> -	    binexp.i[HIGH_HALF] = (1023 + ex) << 20;
> -	    if (res == (res + cor * err_0))
> -	      {
> -		retval = res * binexp.x;
> -		goto ret;
> -	      }
> -	    else
> -	      {
> -		retval = __slowexp (x);
> -		goto check_uflow_ret;
> -	      }			/*if error is over bound */
> -	  }
> -	ex = -(1022 + ex);
> -	binexp.i[HIGH_HALF] = (1023 - ex) << 20;
> -	res *= binexp.x;
> -	cor *= binexp.x;
> -	eps = 1.0000000001 + err_0 * binexp.x;
> -	t = 1.0 + res;
> -	y = ((1.0 - t) + res) + cor;
> -	res = t + y;
> -	cor = (t - res) + y;
> -	if (res == (res + eps * cor))
> -	  {
> -	    binexp.i[HIGH_HALF] = 0x00100000;
> -	    retval = (res - 1.0) * binexp.x;
> -	    goto check_uflow_ret;
> -	  }
> -	else
> -	  {
> -	    retval = __slowexp (x);
> -	    goto check_uflow_ret;
> -	  }			/*   if error is over bound    */
> -      check_uflow_ret:
> -	if (retval < DBL_MIN)
> -	  {
> -	    double force_underflow = tiny * tiny;
> -	    math_force_eval (force_underflow);
> -	  }
> -	if (retval == 0)
> -	  goto ret_tiny;
> -	goto ret;
> -      }
> -    else
> -      {
> -	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
> -	if (res == (res + cor * err_0))
> -	  retval = res * binexp.x * t256.x;
> -	else
> -	  retval = __slowexp (x);
> -	if (isinf (retval))
> -	  goto ret_huge;
> -	else
> -	  goto ret;
> -      }
> -  }
> -ret:
> -  return retval;
> -
> - ret_huge:
> -  return hhuge * hhuge;
> -
> - ret_tiny:
> -  return tiny * tiny;
> +    int i_part[2];
> +    double x;
> +  } xx;
> +  union
> +  {
> +    int y_part[2];
> +    double y;
> +  } yy;
> +  xx.x = x_arg;
> +
> +  ix = xx.i_part[HIGH_HALF];
> +  hx = ix & ~0x80000000;
> +
> +  if (hx < 0x3ff0a2b2)
> +    {				/* |x| < 3/2 ln 2 */
> +      if (hx < 0x3f862e42)
> +	{			/* |x| < 1/64 ln 2 */
> +	  if (hx < 0x3ed00000)
> +	    {			/* |x| < 2^-18 */
> +	      if (hx < 0x3e300000)
> +		{
> +		  retval = one + xx.x;
> +		  return (retval);
> +		}
> +	      retval = one + xx.x * (one + half * xx.x);
> +	      return (retval);
> +	    }
> +	  /* Use FE_TONEAREST rounding mode for computing yy.y.
> +	     Avoid set/reset of rounding mode if in FE_TONEAREST mode.  */
> +	  fe_val = get_rounding_mode ();
> +	  if (fe_val == FE_TONEAREST)
> +	    {
> +	      t = xx.x * xx.x;
> +	      yy.y = xx.x + (t * (half + xx.x * t2) +
> +			     (t * t) * (t3 + xx.x * t4 + t * t5));
> +	      retval = one + yy.y;
> +	    }
> +	  else
> +	    {
> +	      libc_fesetround (FE_TONEAREST);
> +	      t = xx.x * xx.x;
> +	      yy.y = xx.x + (t * (half + xx.x * t2) +
> +			     (t * t) * (t3 + xx.x * t4 + t * t5));
> +	      retval = one + yy.y;
> +	      libc_fesetround (fe_val);
> +	    }
> +	  return (retval);
> +	}
> +
> +      /* Find the multiple of 2^-6 nearest x.  */
> +      k = hx >> 20;
> +      j = (0x00100000 | (hx & 0x000fffff)) >> (0x40c - k);
> +      j = (j - 1) & ~1;
> +      if (ix < 0)
> +	j += 134;
> +      /* Use FE_TONEAREST rounding mode for computing yy.y.
> +	 Avoid set/reset of rounding mode if in FE_TONEAREST mode.  */
> +      fe_val = get_rounding_mode ();
> +      if (fe_val == FE_TONEAREST)
> +	{
> +	  z = xx.x - TBL2[j];
> +	  t = z * z;
> +	  yy.y = z + (t * (half + (z * t2)) +
> +		      (t * t) * (t3 + z * t4 + t * t5));
> +	  retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
> +	}
> +      else
> +	{
> +	  libc_fesetround (FE_TONEAREST);
> +	  z = xx.x - TBL2[j];
> +	  t = z * z;
> +	  yy.y = z + (t * (half + (z * t2)) +
> +		      (t * t) * (t3 + z * t4 + t * t5));
> +	  retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
> +	  libc_fesetround (fe_val);
> +	}
> +      return (retval);
> +    }
> +
> +  if (hx >= 0x40862e42)
> +    {				/* x is large, infinite, or nan.  */
> +      if (hx >= 0x7ff00000)
> +	{
> +	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
> +	    return (zero);	/* exp(-inf) = 0.  */
> +	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf.  */
> +	}
> +      if (xx.x > threshold1)
> +	{			/* Set overflow error condition.  */
> +	  retval = hhuge * hhuge;
> +	  return retval;
> +	}
> +      if (-xx.x > threshold2)
> +	{			/* Set underflow error condition.  */
> +	  double force_underflow = tiny * tiny;
> +	  math_force_eval (force_underflow);
> +	  retval = force_underflow;
> +	  return retval;
> +	}
> +    }
> +
> +  /* Use FE_TONEAREST rounding mode for computing yy.y.
> +     Avoid set/reset of rounding mode if already in FE_TONEAREST mode.  */
> +  fe_val = get_rounding_mode ();
> +  if (fe_val == FE_TONEAREST)
> +    {
> +      t = invln2_32 * xx.x;
> +      if (ix < 0)
> +	t -= half;
> +      else
> +	t += half;
> +      k = (int) t;
> +      j = (k & 0x1f) << 1;
> +      m = k >> 5;
> +      z = (xx.x - k * ln2_32hi) - k * ln2_32lo;
> +
> +      /* z is now in primary range.  */
> +      t = z * z;
> +      yy.y = z + (t * (half + z * t2) +
> +		  (t * t) * (t3 + z * t4 + t * t5));
> +      yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
> +    }
> +  else
> +    {
> +      libc_fesetround (FE_TONEAREST);
> +      t = invln2_32 * xx.x;
> +      if (ix < 0)
> +	t -= half;
> +      else
> +	t += half;
> +      k = (int) t;
> +      j = (k & 0x1f) << 1;
> +      m = k >> 5;
> +      z = (xx.x - k * ln2_32hi) - k * ln2_32lo;
> +
> +      /* z is now in primary range.  */
> +      t = z * z;
> +      yy.y = z + (t * (half + z * t2) +
> +		  (t * t) * (t3 + z * t4 + t * t5));
> +      yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
> +      libc_fesetround (fe_val);
> +    }
> +
> +  if (m < -1021)
> +    {
> +      yy.y_part[HIGH_HALF] += (m + 54) << 20;
> +      retval = twom54 * yy.y;
> +      if (retval < DBL_MIN)
> +	{
> +	  double force_underflow = tiny * tiny;
> +	  math_force_eval (force_underflow);
> +	}
> +      return retval;
> +    }
> +  yy.y_part[HIGH_HALF] += m << 20;
> +  return (yy.y);
>   }
>   #ifndef __ieee754_exp
>   strong_alias (__ieee754_exp, __exp_finite)
>   #endif
>   
> +#ifndef SECTION
> +# define SECTION
> +#endif
> +
>   /* Compute e^(x+xx).  The routine also receives bound of error of previous
>      calculation.  If after computing exp the error exceeds the allowed bounds,
>      the routine returns a non-positive number.  Otherwise it returns the
> diff --git a/sysdeps/ieee754/dbl-64/e_pow.c b/sysdeps/ieee754/dbl-64/e_pow.c
> index 9f6439e..2eb8dbf 100644
> --- a/sysdeps/ieee754/dbl-64/e_pow.c
> +++ b/sysdeps/ieee754/dbl-64/e_pow.c
> @@ -25,7 +25,7 @@
>   /*             log1                                                        */
>   /*             checkint                                                    */
>   /* FILES NEEDED: dla.h endian.h mpa.h mydefs.h                             */
> -/*               halfulp.c mpexp.c mplog.c slowexp.c slowpow.c mpa.c       */
> +/*               halfulp.c mpexp.c mplog.c slowpow.c mpa.c                 */
>   /*                          uexp.c  upow.c				   */
>   /*               root.tbl uexp.tbl upow.tbl                                */
>   /* An ultimate power routine. Given two IEEE double machine numbers y,x    */
> diff --git a/sysdeps/ieee754/dbl-64/eexp.tbl b/sysdeps/ieee754/dbl-64/eexp.tbl
> new file mode 100644
> index 0000000..776369f
> --- /dev/null
> +++ b/sysdeps/ieee754/dbl-64/eexp.tbl
> @@ -0,0 +1,219 @@
> +/* EXP function tables - for use in ocmputing double precisoin exponential
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +
> +/* TBL[2*j] and TBL[2*j+1] are double precision numbers used to
> +   approximate exp(x) using the formula given in the comments
> +   for e_exp.c.  */
> +
> +static const double TBL[64] = {
> +    0x1.0000000000000p+0,  0x0.0000000000000p+0,
> +    0x1.059b0d3158574p+0,  0x1.d73e2a475b465p-55,
> +    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610ap-54,
> +    0x1.11301d0125b51p+0, -0x1.6c51039449b3ap-54,
> +    0x1.172b83c7d517bp+0, -0x1.19041b9d78a76p-55,
> +    0x1.1d4873168b9aap+0,  0x1.e016e00a2643cp-54,
> +    0x1.2387a6e756238p+0,  0x1.9b07eb6c70573p-54,
> +    0x1.29e9df51fdee1p+0,  0x1.612e8afad1255p-55,
> +    0x1.306fe0a31b715p+0,  0x1.6f46ad23182e4p-55,
> +    0x1.371a7373aa9cbp+0, -0x1.63aeabf42eae2p-54,
> +    0x1.3dea64c123422p+0,  0x1.ada0911f09ebcp-55,
> +    0x1.44e086061892dp+0,  0x1.89b7a04ef80d0p-59,
> +    0x1.4bfdad5362a27p+0,  0x1.d4397afec42e2p-56,
> +    0x1.5342b569d4f82p+0, -0x1.07abe1db13cacp-55,
> +    0x1.5ab07dd485429p+0,  0x1.6324c054647adp-54,
> +    0x1.6247eb03a5585p+0, -0x1.383c17e40b497p-54,
> +    0x1.6a09e667f3bcdp+0, -0x1.bdd3413b26456p-54,
> +    0x1.71f75e8ec5f74p+0, -0x1.16e4786887a99p-55,
> +    0x1.7a11473eb0187p+0, -0x1.41577ee04992fp-55,
> +    0x1.82589994cce13p+0, -0x1.d4c1dd41532d8p-54,
> +    0x1.8ace5422aa0dbp+0,  0x1.6e9f156864b27p-54,
> +    0x1.93737b0cdc5e5p+0, -0x1.75fc781b57ebcp-57,
> +    0x1.9c49182a3f090p+0,  0x1.c7c46b071f2bep-56,
> +    0x1.a5503b23e255dp+0, -0x1.d2f6edb8d41e1p-54,
> +    0x1.ae89f995ad3adp+0,  0x1.7a1cd345dcc81p-54,
> +    0x1.b7f76f2fb5e47p+0, -0x1.5584f7e54ac3bp-56,
> +    0x1.c199bdd85529cp+0,  0x1.11065895048ddp-55,
> +    0x1.cb720dcef9069p+0,  0x1.503cbd1e949dbp-56,
> +    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3706p-55,
> +    0x1.dfc97337b9b5fp+0, -0x1.1a5cd4f184b5cp-54,
> +    0x1.ea4afa2a490dap+0, -0x1.e9c23179c2893p-54,
> +    0x1.f50765b6e4540p+0,  0x1.9d3e12dd8a18bp-54};
> +
> +/* For i = 0, ..., 66,
> +     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
> +     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
> +     than 2^-60.
> +
> +   For i = 67, ..., 133,
> +     TBL2[2*i] is a double precision number near -(i-66)*2^-6, and
> +     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
> +     than 2^-60.  */
> +
> +static const double TBL2[268] = {
> +    0x1.ffffffffffc82p-7,   0x1.04080ab55de32p+0,
> +    0x1.fffffffffffdbp-6,   0x1.08205601127ecp+0,
> +    0x1.80000000000a0p-5,   0x1.0c49236829e91p+0,
> +    0x1.fffffffffff79p-5,   0x1.1082b577d34e9p+0,
> +    0x1.3fffffffffffcp-4,   0x1.14cd4fc989cd6p+0,
> +    0x1.8000000000060p-4,   0x1.192937074e0d4p+0,
> +    0x1.c000000000061p-4,   0x1.1d96b0eff0e80p+0,
> +    0x1.fffffffffffd6p-4,   0x1.2216045b6f5cap+0,
> +    0x1.1ffffffffff58p-3,   0x1.26a7793f6014cp+0,
> +    0x1.3ffffffffff75p-3,   0x1.2b4b58b372c65p+0,
> +    0x1.5ffffffffff00p-3,   0x1.3001ecf601ad1p+0,
> +    0x1.8000000000020p-3,   0x1.34cb8170b583ap+0,
> +    0x1.9ffffffffa629p-3,   0x1.39a862bd3b344p+0,
> +    0x1.c00000000000fp-3,   0x1.3e98deaa11dcep+0,
> +    0x1.e00000000007fp-3,   0x1.439d443f5f16dp+0,
> +    0x1.0000000000072p-2,   0x1.48b5e3c3e81abp+0,
> +    0x1.0fffffffffecap-2,   0x1.4de30ec211dfbp+0,
> +    0x1.1ffffffffff8fp-2,   0x1.5325180cfacd2p+0,
> +    0x1.300000000003bp-2,   0x1.587c53c5a7b04p+0,
> +    0x1.4000000000034p-2,   0x1.5de9176046007p+0,
> +    0x1.4ffffffffff89p-2,   0x1.636bb9a98322fp+0,
> +    0x1.5ffffffffffe7p-2,   0x1.690492cbf942ap+0,
> +    0x1.6ffffffffff78p-2,   0x1.6eb3fc55b1e45p+0,
> +    0x1.7ffffffffff65p-2,   0x1.747a513dbef32p+0,
> +    0x1.8ffffffffffd5p-2,   0x1.7a57ede9ea22ep+0,
> +    0x1.9ffffffffff6ep-2,   0x1.804d30347b50fp+0,
> +    0x1.affffffffffc3p-2,   0x1.865a7772164aep+0,
> +    0x1.c000000000053p-2,   0x1.8c802477b0030p+0,
> +    0x1.d00000000004dp-2,   0x1.92be99a09bf1ep+0,
> +    0x1.e000000000096p-2,   0x1.99163ad4b1e08p+0,
> +    0x1.efffffffffefap-2,   0x1.9f876d8e8c4fcp+0,
> +    0x1.fffffffffffd0p-2,   0x1.a61298e1e0688p+0,
> +    0x1.0800000000002p-1,   0x1.acb82581eee56p+0,
> +    0x1.100000000001fp-1,   0x1.b3787dc80f979p+0,
> +    0x1.17ffffffffff8p-1,   0x1.ba540dba56e4fp+0,
> +    0x1.1fffffffffffap-1,   0x1.c14b431256441p+0,
> +    0x1.27fffffffffc4p-1,   0x1.c85e8d43f7c9bp+0,
> +    0x1.2fffffffffffdp-1,   0x1.cf8e5d84758a6p+0,
> +    0x1.380000000001fp-1,   0x1.d6db26d16cd84p+0,
> +    0x1.3ffffffffffd8p-1,   0x1.de455df80e39bp+0,
> +    0x1.4800000000052p-1,   0x1.e5cd799c6a59cp+0,
> +    0x1.4ffffffffffc8p-1,   0x1.ed73f240dc10cp+0,
> +    0x1.5800000000013p-1,   0x1.f539424d90f71p+0,
> +    0x1.5ffffffffffbcp-1,   0x1.fd1de6182f885p+0,
> +    0x1.680000000002dp-1,   0x1.02912df5ce741p+1,
> +    0x1.7000000000040p-1,   0x1.06a39207f0a2ap+1,
> +    0x1.780000000004fp-1,   0x1.0ac660691652ap+1,
> +    0x1.7ffffffffff6fp-1,   0x1.0ef9db467dcabp+1,
> +    0x1.87fffffffffe5p-1,   0x1.133e45d82e943p+1,
> +    0x1.9000000000035p-1,   0x1.1793e4652cc6dp+1,
> +    0x1.97fffffffffb3p-1,   0x1.1bfafc47bda48p+1,
> +    0x1.a000000000000p-1,   0x1.2073d3f1bd518p+1,
> +    0x1.a80000000004ap-1,   0x1.24feb2f105ce2p+1,
> +    0x1.affffffffffedp-1,   0x1.299be1f3e7f11p+1,
> +    0x1.b7ffffffffffbp-1,   0x1.2e4baacdb6611p+1,
> +    0x1.c00000000001dp-1,   0x1.330e587b62b39p+1,
> +    0x1.c800000000079p-1,   0x1.37e437282d538p+1,
> +    0x1.cffffffffff51p-1,   0x1.3ccd943268248p+1,
> +    0x1.d7fffffffff74p-1,   0x1.41cabe304cadcp+1,
> +    0x1.e000000000011p-1,   0x1.46dc04f4e5343p+1,
> +    0x1.e80000000001ep-1,   0x1.4c01b9950a124p+1,
> +    0x1.effffffffff9ep-1,   0x1.513c2e6c73196p+1,
> +    0x1.f7fffffffffedp-1,   0x1.568bb722dd586p+1,
> +    0x1.0000000000034p+0,   0x1.5bf0a8b1457b0p+1,
> +    0x1.03fffffffffe2p+0,   0x1.616b5967376dfp+1,
> +    0x1.07fffffffff4bp+0,   0x1.66fc20f0337a9p+1,
> +    0x1.0bffffffffffdp+0,   0x1.6ca35859290f5p+1,
> +   -0x1.fffffffffffe4p-7,   0x1.f80feabfeefa5p-1,
> +   -0x1.ffffffffffb0bp-6,   0x1.f03f56a88b5fep-1,
> +   -0x1.7ffffffffffa7p-5,   0x1.e88dc6afecfc5p-1,
> +   -0x1.ffffffffffea8p-5,   0x1.e0fabfbc702b8p-1,
> +   -0x1.3ffffffffffb3p-4,   0x1.d985c89d041acp-1,
> +   -0x1.7ffffffffffe3p-4,   0x1.d22e6a0197c06p-1,
> +   -0x1.bffffffffff9ap-4,   0x1.caf42e73a4c89p-1,
> +   -0x1.fffffffffff98p-4,   0x1.c3d6a24ed822dp-1,
> +   -0x1.1ffffffffffe9p-3,   0x1.bcd553b9d7b67p-1,
> +   -0x1.3ffffffffffe0p-3,   0x1.b5efd29f24c2dp-1,
> +   -0x1.5fffffffff553p-3,   0x1.af25b0a61a9f4p-1,
> +   -0x1.7ffffffffff8bp-3,   0x1.a876812c08794p-1,
> +   -0x1.9fffffffffe51p-3,   0x1.a1e1d93d68828p-1,
> +   -0x1.bffffffffff6ep-3,   0x1.9b674f8f2f3f5p-1,
> +   -0x1.dffffffffff7fp-3,   0x1.95067c7837a0cp-1,
> +   -0x1.fffffffffff7ap-3,   0x1.8ebef9eac8225p-1,
> +   -0x1.0fffffffffffep-2,   0x1.8890636e31f55p-1,
> +   -0x1.1ffffffffff41p-2,   0x1.827a56188975ep-1,
> +   -0x1.2ffffffffffbap-2,   0x1.7c7c708877656p-1,
> +   -0x1.3fffffffffff8p-2,   0x1.769652df22f81p-1,
> +   -0x1.4ffffffffff90p-2,   0x1.70c79eba33c2fp-1,
> +   -0x1.5ffffffffffdbp-2,   0x1.6b0ff72deb8aap-1,
> +   -0x1.6ffffffffff9ap-2,   0x1.656f00bf5798ep-1,
> +   -0x1.7ffffffffff9fp-2,   0x1.5fe4615e98eb0p-1,
> +   -0x1.8ffffffffffeep-2,   0x1.5a6fc061433cep-1,
> +   -0x1.9fffffffffc4ap-2,   0x1.5510c67cd26cdp-1,
> +   -0x1.affffffffff30p-2,   0x1.4fc71dc13566bp-1,
> +   -0x1.bfffffffffff0p-2,   0x1.4a9271936fd0ep-1,
> +   -0x1.cfffffffffff3p-2,   0x1.45726ea84fb8cp-1,
> +   -0x1.dfffffffffff3p-2,   0x1.4066c2ff3912bp-1,
> +   -0x1.effffffffff80p-2,   0x1.3b6f1ddd05ab9p-1,
> +   -0x1.fffffffffffdfp-2,   0x1.368b2fc6f9614p-1,
> +   -0x1.0800000000000p-1,   0x1.31baaa7dca843p-1,
> +   -0x1.0ffffffffffa4p-1,   0x1.2cfd40f8bdce4p-1,
> +   -0x1.17fffffffff0ap-1,   0x1.2852a760d5ce7p-1,
> +   -0x1.2000000000000p-1,   0x1.23ba930c1568bp-1,
> +   -0x1.27fffffffffbbp-1,   0x1.1f34ba78d568dp-1,
> +   -0x1.2fffffffffe32p-1,   0x1.1ac0d5492c1dbp-1,
> +   -0x1.37ffffffff042p-1,   0x1.165e9c3e67ef2p-1,
> +   -0x1.3ffffffffff77p-1,   0x1.120dc93499431p-1,
> +   -0x1.47fffffffff6bp-1,   0x1.0dce171e34ecep-1,
> +   -0x1.4fffffffffff1p-1,   0x1.099f41ffbe588p-1,
> +   -0x1.57ffffffffe02p-1,   0x1.058106eb8a7aep-1,
> +   -0x1.5ffffffffffe5p-1,   0x1.017323fd9002ep-1,
> +   -0x1.67fffffffffb0p-1,   0x1.faeab0ae9386cp-2,
> +   -0x1.6ffffffffffb2p-1,   0x1.f30ec837503d7p-2,
> +   -0x1.77fffffffff7fp-1,   0x1.eb5210d627133p-2,
> +   -0x1.7ffffffffffe8p-1,   0x1.e3b40ebefcd95p-2,
> +   -0x1.87fffffffffc8p-1,   0x1.dc3448110dae2p-2,
> +   -0x1.8fffffffffb30p-1,   0x1.d4d244cf4ef06p-2,
> +   -0x1.97fffffffffefp-1,   0x1.cd8d8ed8ee395p-2,
> +   -0x1.9ffffffffffa7p-1,   0x1.c665b1e1f1e5cp-2,
> +   -0x1.a7fffffffffdcp-1,   0x1.bf5a3b6bf18d6p-2,
> +   -0x1.affffffffff95p-1,   0x1.b86ababeef93bp-2,
> +   -0x1.b7fffffffffcbp-1,   0x1.b196c0e24d256p-2,
> +   -0x1.bffffffffff32p-1,   0x1.aadde095dadf7p-2,
> +   -0x1.c7fffffffff6ap-1,   0x1.a43fae4b047c9p-2,
> +   -0x1.cffffffffffb6p-1,   0x1.9dbbc01e182a4p-2,
> +   -0x1.d7fffffffffcap-1,   0x1.9751adcfa81ecp-2,
> +   -0x1.dffffffffffcdp-1,   0x1.910110be0699ep-2,
> +   -0x1.e7ffffffffffbp-1,   0x1.8ac983dedbc69p-2,
> +   -0x1.effffffffff88p-1,   0x1.84aaa3b8d51a9p-2,
> +   -0x1.f7fffffffffbbp-1,   0x1.7ea40e5d6d92ep-2,
> +   -0x1.fffffffffffdbp-1,   0x1.78b56362cef53p-2,
> +   -0x1.03fffffffff00p+0,   0x1.72de43ddcb1f2p-2,
> +   -0x1.07ffffffffe6fp+0,   0x1.6d1e525bed085p-2,
> +   -0x1.0bfffffffffd6p+0,   0x1.677532dda1c57p-2};
> +
> +static const double
> +/* Following three values used to scale x to primary range.  */
> +  invln2_32 = 0x1.71547652b82fep+5, /* 4.61662413084468283841e+01 */
> +  ln2_32hi = 0x1.62e42fee00000p-6, /* 2.16608493865351192653e-02 */
> +  ln2_32lo = 0x1.a39ef35793c76p-38, /* 5.96317165397058656257e-12 */
> +/* t2-t5 terms used for polynomial computation.  */
> +  t2 = 0x1.5555555548f7cp-3, /* 1.6666666666526086527e-1 */
> +  t3 = 0x1.5555555545d4ep-5, /* 4.1666666666226079285e-2 */
> +  t4 = 0x1.11115b7aa905ep-7, /* 8.3333679843421958056e-3 */
> +  t5 = 0x1.6c1728d739765p-10, /* 1.3888949086377719040e-3 */
> +/* Maximum value for x to not overflow.  */
> +  threshold1 = 0x1.62e42fefa39efp+9, /* 7.09782712893383973096e+02 */
> +/* Maximum value for -x to not underflow to zero in FE_TONEAREST mode.  */
> +  threshold2 = 0x1.74910d52d3051p+9, /* 7.45133219101941108420e+02 */
> +/* Scaling factor used when result near zero.  */
> +  twom54 = 0x1.0000000000000p-54; /* 5.55111512312578270212e-17 */
> diff --git a/sysdeps/ieee754/dbl-64/slowexp.c b/sysdeps/ieee754/dbl-64/slowexp.c
> deleted file mode 100644
> index e8fa2e2..0000000
> --- a/sysdeps/ieee754/dbl-64/slowexp.c
> +++ /dev/null
> @@ -1,86 +0,0 @@
> -/*
> - * IBM Accurate Mathematical Library
> - * written by International Business Machines Corp.
> - * Copyright (C) 2001-2017 Free Software Foundation, Inc.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU Lesser General Public License as published by
> - * the Free Software Foundation; either version 2.1 of the License, or
> - * (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU Lesser General Public License for more details.
> - *
> - * You should have received a copy of the GNU Lesser General Public License
> - * along with this program; if not, see <http://www.gnu.org/licenses/>.
> - */
> -/**************************************************************************/
> -/*  MODULE_NAME:slowexp.c                                                 */
> -/*                                                                        */
> -/*  FUNCTION:slowexp                                                      */
> -/*                                                                        */
> -/*  FILES NEEDED:mpa.h                                                    */
> -/*               mpa.c mpexp.c                                            */
> -/*                                                                        */
> -/*Converting from double precision to Multi-precision and calculating     */
> -/* e^x                                                                    */
> -/**************************************************************************/
> -#include <math_private.h>
> -
> -#include <stap-probe.h>
> -
> -#ifndef USE_LONG_DOUBLE_FOR_MP
> -# include "mpa.h"
> -void __mpexp (mp_no *x, mp_no *y, int p);
> -#endif
> -
> -#ifndef SECTION
> -# define SECTION
> -#endif
> -
> -/*Converting from double precision to Multi-precision and calculating  e^x */
> -double
> -SECTION
> -__slowexp (double x)
> -{
> -#ifndef USE_LONG_DOUBLE_FOR_MP
> -  double w, z, res, eps = 3.0e-26;
> -  int p;
> -  mp_no mpx, mpy, mpz, mpw, mpeps, mpcor;
> -
> -  /* Use the multiple precision __MPEXP function to compute the exponential
> -     First at 144 bits and if it is not accurate enough, at 768 bits.  */
> -  p = 6;
> -  __dbl_mp (x, &mpx, p);
> -  __mpexp (&mpx, &mpy, p);
> -  __dbl_mp (eps, &mpeps, p);
> -  __mul (&mpeps, &mpy, &mpcor, p);
> -  __add (&mpy, &mpcor, &mpw, p);
> -  __sub (&mpy, &mpcor, &mpz, p);
> -  __mp_dbl (&mpw, &w, p);
> -  __mp_dbl (&mpz, &z, p);
> -  if (w == z)
> -    {
> -      /* Track how often we get to the slow exp code plus
> -	 its input/output values.  */
> -      LIBC_PROBE (slowexp_p6, 2, &x, &w);
> -      return w;
> -    }
> -  else
> -    {
> -      p = 32;
> -      __dbl_mp (x, &mpx, p);
> -      __mpexp (&mpx, &mpy, p);
> -      __mp_dbl (&mpy, &res, p);
> -
> -      /* Track how often we get to the uber-slow exp code plus
> -	 its input/output values.  */
> -      LIBC_PROBE (slowexp_p32, 2, &x, &res);
> -      return res;
> -    }
> -#else
> -  return (double) __ieee754_expl((long double)x);
> -#endif
> -}
> diff --git a/sysdeps/powerpc/power4/fpu/Makefile b/sysdeps/powerpc/power4/fpu/Makefile
> index e17d32f..ded9976 100644
> --- a/sysdeps/powerpc/power4/fpu/Makefile
> +++ b/sysdeps/powerpc/power4/fpu/Makefile
> @@ -3,5 +3,4 @@
>   ifeq ($(subdir),math)
>   CFLAGS-mpa.c += --param max-unroll-times=4 -funroll-loops -fpeel-loops
>   CPPFLAGS-slowpow.c += -DUSE_LONG_DOUBLE_FOR_MP=1
> -CPPFLAGS-slowexp.c += -DUSE_LONG_DOUBLE_FOR_MP=1
>   endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
> index c78624b..e06c059 100644
> --- a/sysdeps/x86_64/fpu/multiarch/Makefile
> +++ b/sysdeps/x86_64/fpu/multiarch/Makefile
> @@ -10,7 +10,7 @@ libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
>   
>   libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
>   			e_asin-fma e_atan2-fma s_sin-fma s_tan-fma \
> -			mplog-fma mpa-fma slowexp-fma slowpow-fma \
> +			mplog-fma mpa-fma slowpow-fma \
>   			sincos32-fma doasin-fma dosincos-fma \
>   			halfulp-fma mpexp-fma \
>   			mpatan2-fma mpatan-fma mpsqrt-fma mptan-fma
> @@ -32,7 +32,6 @@ CFLAGS-mpsqrt-fma.c = -mfma -mavx2
>   CFLAGS-mptan-fma.c = -mfma -mavx2
>   CFLAGS-s_atan-fma.c = -mfma -mavx2
>   CFLAGS-sincos32-fma.c = -mfma -mavx2
> -CFLAGS-slowexp-fma.c = -mfma -mavx2
>   CFLAGS-slowpow-fma.c = -mfma -mavx2
>   CFLAGS-s_sin-fma.c = -mfma -mavx2
>   CFLAGS-s_tan-fma.c = -mfma -mavx2
> @@ -48,7 +47,7 @@ CFLAGS-e_powf-fma.c = -mfma -mavx2
>   
>   libm-sysdep_routines += e_exp-fma4 e_log-fma4 e_pow-fma4 s_atan-fma4 \
>   			e_asin-fma4 e_atan2-fma4 s_sin-fma4 s_tan-fma4 \
> -			mplog-fma4 mpa-fma4 slowexp-fma4 slowpow-fma4 \
> +			mplog-fma4 mpa-fma4 slowpow-fma4 \
>   			sincos32-fma4 doasin-fma4 dosincos-fma4 \
>   			halfulp-fma4 mpexp-fma4 \
>   			mpatan2-fma4 mpatan-fma4 mpsqrt-fma4 mptan-fma4
> @@ -70,14 +69,13 @@ CFLAGS-mpsqrt-fma4.c = -mfma4
>   CFLAGS-mptan-fma4.c = -mfma4
>   CFLAGS-s_atan-fma4.c = -mfma4
>   CFLAGS-sincos32-fma4.c = -mfma4
> -CFLAGS-slowexp-fma4.c = -mfma4
>   CFLAGS-slowpow-fma4.c = -mfma4
>   CFLAGS-s_sin-fma4.c = -mfma4
>   CFLAGS-s_tan-fma4.c = -mfma4
>   
>   libm-sysdep_routines += e_exp-avx e_log-avx s_atan-avx \
>   			e_atan2-avx s_sin-avx s_tan-avx \
> -			mplog-avx mpa-avx slowexp-avx \
> +			mplog-avx mpa-avx \
>   			mpexp-avx
>   
>   CFLAGS-e_atan2-avx.c = -msse2avx -DSSE2AVX
> @@ -88,7 +86,6 @@ CFLAGS-mpexp-avx.c = -msse2avx -DSSE2AVX
>   CFLAGS-mplog-avx.c = -msse2avx -DSSE2AVX
>   CFLAGS-s_atan-avx.c = -msse2avx -DSSE2AVX
>   CFLAGS-s_sin-avx.c = -msse2avx -DSSE2AVX
> -CFLAGS-slowexp-avx.c = -msse2avx -DSSE2AVX
>   CFLAGS-s_tan-avx.c = -msse2avx -DSSE2AVX
>   endif
>   
> diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
> index ee5dd6d..afd9174 100644
> --- a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
> +++ b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
> @@ -1,6 +1,5 @@
>   #define __ieee754_exp __ieee754_exp_avx
>   #define __exp1 __exp1_avx
> -#define __slowexp __slowexp_avx
>   #define SECTION __attribute__ ((section (".text.avx")))
>   
>   #include <sysdeps/ieee754/dbl-64/e_exp.c>
> diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
> index 6e0fdb7..765b1b9 100644
> --- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
> +++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
> @@ -1,6 +1,5 @@
>   #define __ieee754_exp __ieee754_exp_fma
>   #define __exp1 __exp1_fma
> -#define __slowexp __slowexp_fma
>   #define SECTION __attribute__ ((section (".text.fma")))
>   
>   #include <sysdeps/ieee754/dbl-64/e_exp.c>
> diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
> index ae6eb67..9ac7aca 100644
> --- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
> +++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
> @@ -1,6 +1,5 @@
>   #define __ieee754_exp __ieee754_exp_fma4
>   #define __exp1 __exp1_fma4
> -#define __slowexp __slowexp_fma4
>   #define SECTION __attribute__ ((section (".text.fma4")))
>   
>   #include <sysdeps/ieee754/dbl-64/e_exp.c>
> diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c b/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
> deleted file mode 100644
> index d01c6d7..0000000
> --- a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
> +++ /dev/null
> @@ -1,9 +0,0 @@
> -#define __slowexp __slowexp_avx
> -#define __add __add_avx
> -#define __dbl_mp __dbl_mp_avx
> -#define __mpexp __mpexp_avx
> -#define __mul __mul_avx
> -#define __sub __sub_avx
> -#define SECTION __attribute__ ((section (".text.avx")))
> -
> -#include <sysdeps/ieee754/dbl-64/slowexp.c>
> diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
> deleted file mode 100644
> index 6fffca1..0000000
> --- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
> +++ /dev/null
> @@ -1,9 +0,0 @@
> -#define __slowexp __slowexp_fma
> -#define __add __add_fma
> -#define __dbl_mp __dbl_mp_fma
> -#define __mpexp __mpexp_fma
> -#define __mul __mul_fma
> -#define __sub __sub_fma
> -#define SECTION __attribute__ ((section (".text.fma")))
> -
> -#include <sysdeps/ieee754/dbl-64/slowexp.c>
> diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
> deleted file mode 100644
> index 3bcde84..0000000
> --- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
> +++ /dev/null
> @@ -1,9 +0,0 @@
> -#define __slowexp __slowexp_fma4
> -#define __add __add_fma4
> -#define __dbl_mp __dbl_mp_fma4
> -#define __mpexp __mpexp_fma4
> -#define __mul __mul_fma4
> -#define __sub __sub_fma4
> -#define SECTION __attribute__ ((section (".text.fma4")))
> -
> -#include <sysdeps/ieee754/dbl-64/slowexp.c>

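For readers following the main path of the quoted patch, the range
reduction and the final exponent adjustment can be sketched in isolation.
This is only an illustrative sketch, not code from the patch: the struct
and the names reduce_arg and scale_by_pow2 are hypothetical, the sign test
uses x < 0 where the patch tests the high word, and only the three
constants are copied from the quoted eexp.tbl.  It assumes IEEE 754
binary64 doubles and an arithmetic right shift for negative ints (as GCC
provides).

```c
#include <stdint.h>
#include <string.h>

/* Constants copied from the quoted eexp.tbl.  */
static const double invln2_32 = 0x1.71547652b82fep+5;  /* 32 / ln 2 */
static const double ln2_32hi = 0x1.62e42fee00000p-6;   /* ln 2 / 32, high part */
static const double ln2_32lo = 0x1.a39ef35793c76p-38;  /* ln 2 / 32, low part */

/* Hypothetical container for the pieces of the reduction.  */
struct exp_reduction
{
  int k;     /* integer nearest to x * 32 / ln 2 */
  int j;     /* index of the TBL pair, 2 * (k mod 32) */
  int m;     /* power of two applied at the end, floor (k / 32) */
  double z;  /* remainder, roughly within +/- ln 2 / 64 */
};

/* Split x so that exp (x) = 2^m * 2^((k mod 32) / 32) * exp (z).  */
static struct exp_reduction
reduce_arg (double x)
{
  struct exp_reduction r;
  double t = invln2_32 * x;

  /* Round to nearest by biasing away from zero, then truncating.  */
  t += x < 0 ? -0.5 : 0.5;
  r.k = (int) t;
  r.j = (r.k & 0x1f) << 1;
  r.m = r.k >> 5;  /* Assumes arithmetic shift for negative k.  */
  /* Subtract k * ln 2 / 32 in two steps to keep the low-order bits.  */
  r.z = (x - r.k * ln2_32hi) - r.k * ln2_32lo;
  return r;
}

/* Multiply a normal double by 2^m by adding m to its exponent field.
   Only valid while the result stays in the normal range.  */
static double
scale_by_pow2 (double y, int m)
{
  uint64_t bits;

  memcpy (&bits, &y, sizeof bits);
  bits += (uint64_t) m << 52;  /* Unsigned arithmetic wraps for m < 0.  */
  memcpy (&y, &bits, sizeof y);
  return y;
}
```

For x = 1.0 this gives k = 46, j = 28 and m = 1, with z close to 0.0036.
The patch performs the exponent adjustment on the high 32-bit word
(m << 20) and takes a separate path when m < -1021, which this sketch does
not attempt to reproduce.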

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-11-16 17:52 ` Patrick McGehearty
@ 2017-11-16 18:27   ` Carlos O'Donell
  2017-11-16 18:31   ` Joseph Myers
  1 sibling, 0 replies; 44+ messages in thread
From: Carlos O'Donell @ 2017-11-16 18:27 UTC (permalink / raw)
  To: Patrick McGehearty, libc-alpha

On 11/16/2017 09:52 AM, Patrick McGehearty wrote:
> It has been 10 days with no further feedback/comments.
> What's the next step for getting this patch included in the src tree?
> It would be good to make that happen before the Thanksgiving break.

Doing exactly what you are doing: pinging again and throwing yourself on
the mercy of the senior reviewers to review v5.

Joseph Myers is the expert and subsystem maintainer here. I would ask
him directly for a final review.

As always you may need to wait. The next development freeze is January
1st 2018 (1 month from now). So you have a little more time before that
point.

I would suggest proposing your changes as release blockers given the
performance benefit to users. Add them here:
https://sourceware.org/glibc/wiki/Release/2.27

See previous release blockers to see how we structure that text:
https://sourceware.org/glibc/wiki/Release/2.26#Release_blockers.3F

-- 
Cheers,
Carlos.

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-11-16 17:52 ` Patrick McGehearty
  2017-11-16 18:27   ` Carlos O'Donell
@ 2017-11-16 18:31   ` Joseph Myers
  1 sibling, 0 replies; 44+ messages in thread
From: Joseph Myers @ 2017-11-16 18:31 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Thu, 16 Nov 2017, Patrick McGehearty wrote:

> It has been 10 days with no further feedback/comments.
> What's the next step for getting this patch included in the src tree?
> It would be good to make that happen before the Thanksgiving break.

I think that once GCC mainline has stabilized after the end of development 
stage 1, people may have more time for glibc patch review.  In practice 
that will probably take a few more weeks, as features posted before the 
freeze get reviewed and so get into GCC.  At that point glibc review is no 
longer competing for time with review and development of GCC patches for 
GCC 8.  Also, once GCC stabilizes, less time will be taken in glibc 
development in investigating and fixing problems that arise building with 
GCC mainline, and in reviewing patches to fix such issues.  (Right now the 
glibc build with GCC mainline is fixed, but some issue is resulting in 
linknamespace failures for all configurations - maybe more likely a 
mainline binutils issue than a mainline GCC one - which I need to 
investigate to get that bot back to a reasonable state.)

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-11-07  4:25 Patrick McGehearty
  2017-11-16 17:52 ` Patrick McGehearty
@ 2017-11-23 21:19 ` Joseph Myers
  2017-12-01  0:47   ` Patrick McGehearty
  1 sibling, 1 reply; 44+ messages in thread
From: Joseph Myers @ 2017-11-23 21:19 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Mon, 6 Nov 2017, Patrick McGehearty wrote:

> @@ -561,8 +561,10 @@ math-CPPFLAGS += -D__NO_MATH_INLINES -D__LIBC_INTERNAL_MATH_INLINES
>  ifneq ($(long-double-fcts),yes)
>  # The `double' and `long double' types are the same on this machine.
>  # We won't compile the `long double' code at all.  Tell the `double' code
> -# to define aliases for the `FUNCl' names.
> -math-CPPFLAGS += -DNO_LONG_DOUBLE
> +# to define aliases for the `FUNCl' names.  To avoid type conflicts in
> +# defining those aliases, tell <math.h> to declare the `FUNCl' names with
> +# `double' instead of `long double'.
> +math-CPPFLAGS += -DNO_LONG_DOUBLE -D_Mlong_double_=double
>  endif
>  
>  # These files quiet sNaNs in a way that is optimized away without

This diff hunk is bogus (reverting a recent change I made) and should not 
be included in this patch.

> +	      if (hx < 0x3e300000)
> +		{
> +		  retval = one + xx.x;
> +		  return (retval);

No parentheses around return value.

> +		}
> +	      retval = one + xx.x * (one + half * xx.x);
> +	      return (retval);

Likewise.

> +	      yy.y = xx.x + (t * (half + xx.x * t2) +
> +			     (t * t) * (t3 + xx.x * t4 + t * t5));

Split lines before an operator, not after.

> +	      yy.y = xx.x + (t * (half + xx.x * t2) +
> +			     (t * t) * (t3 + xx.x * t4 + t * t5));

Likewise.

> +	  yy.y = z + (t * (half + (z * t2)) +
> +		      (t * t) * (t3 + z * t4 + t * t5));

Likewise.

> +	  yy.y = z + (t * (half + (z * t2)) +
> +		      (t * t) * (t3 + z * t4 + t * t5));

Likewise.

> +      return (retval);

Avoid parentheses around return value.

> +	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
> +	    return (zero);	/* exp(-inf) = 0.  */

Likewise.

> +	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf.  */

Likewise.

> +      yy.y = z + (t * (half + z * t2) + 
> +		  (t * t) * (t3 + z * t4 + t * t5));

Split line before operator.

> +      yy.y = z + (t * (half + z * t2) +
> +		  (t * t) * (t3 + z * t4 + t * t5));

Likewise.

> +  return (yy.y);

Remove parentheses.

> /* EXP function tables - for use in ocmputing double precisoin exponential

s/ocmputing/computing/

s/precisoin/precision/

> +/* TBL[2*j] and TBL[2*j+1] are double precision numbers used to
> +   approximate exp(x) using the formula given in the comments
> +   for e_exp.c.  */

I believe the correct semantics to describe are: TBL[2*j] is 2**(j/32), 
rounded to nearest; TBL[2*j+1] is 2**(j/32) - TBL[2*j], rounded to 
nearest.  Now if that's the case, three of the low parts should be 
adjusted by 1ulp because the current values aren't actually rounded to 
nearest (unless you have some concrete reason why the present values, that 
aren't rounded to nearest, are optimal):

> +    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610ap-54,

0x1.8a62e4adc610ap-54 should be 0x1.8a62e4adc610bp-54.

> +    0x1.5342b569d4f82p+0, -0x1.07abe1db13cacp-55,

-0x1.07abe1db13cacp-55 should be -0x1.07abe1db13cadp-55.

> +    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3706p-55,

0x1.2ed02d75b3706p-55 should be 0x1.2ed02d75b3707p-55.
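
As an illustration only: each of the three corrections above moves the low part by exactly one unit in the last place, away from zero. The sketch below checks this by stepping the bit pattern of a double. The helper name `step_away_from_zero` is hypothetical and not part of the patch or of glibc, and the code assumes IEEE 754 binary64 doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: move a finite, nonzero
   double N units in the last place away from zero.  Assumes IEEE 754
   binary64 and that the result does not overflow to infinity.  */
double
step_away_from_zero (double x, unsigned int n)
{
  uint64_t bits;

  assert (x == x && x != 0.0);	/* Reject NaN and zero.  */
  memcpy (&bits, &x, sizeof bits);
  /* The sign lives in the top bit, so adding to the raw pattern grows
     the magnitude for positive and negative inputs alike.  */
  bits += n;
  memcpy (&x, &bits, sizeof x);
  return x;
}
```

Applied with n = 1 to the three submitted low parts, this gives the three replacement values listed above.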

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-11-23 21:19 ` Joseph Myers
@ 2017-12-01  0:47   ` Patrick McGehearty
  0 siblings, 0 replies; 44+ messages in thread
From: Patrick McGehearty @ 2017-12-01  0:47 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha

Thank you for the continued detailed reviews.
Due to your comments about TBL[2*j] and TBL[2*j+1],
I computed exp(x) over 10 million algorithmically generated
values for x using both the TBL values used by the Solaris/Studio
version of exp() and the TBL values you suggested.
There was no case where exp(x) differed.
I computed the values for TBL using quad precision and
got the same values you recommend. That got me thinking
some more and I realized changing from 32 table entries
to 64 table entries was really not that difficult.
The values for TBL are generated as you recommend.

My next patch submission (coming shortly) will use
j/64 with 64 TBL entries for TBL[2*j] and TBL[2*j+1].
That approach gives the same performance with fewer
ulp errors. On that same 10 million value test,
I'm seeing roughly 16 differences per 10,000 values
instead of 29 differences per 10,000 values with
the 32 TBL entry version. In addition, we only see
one difference in test-double-exp.out instead of three.
The difference is still a single ulp.

I've tested the new version on Sparc and x86.
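
A rough sketch of the index arithmetic behind a change of table size, for illustration only. It assumes the reduction follows the shape of the posted e_exp.c, where k is the rounded multiple of ln2/32 and is split with `k & 0x1f` and `k >> 5`. The names `exp_index` and `split_index` are hypothetical and do not appear in the patch.

```c
#include <assert.h>

/* Hypothetical type: exp(x) = 2^m * 2^(j/n) * exp(r), with the table
   pair for 2^(j/n) stored at TBL[2*j] and TBL[2*j+1].  */
struct exp_index
{
  int m;	/* Power-of-two scale applied at the end.  */
  int j;	/* Table interval, 0 <= j < n.  */
};

/* Split k = n * m + j for a power-of-two table size n.  */
struct exp_index
split_index (int k, int n)
{
  struct exp_index r;

  assert (n > 0 && (n & (n - 1)) == 0);
  /* On two's complement machines the mask yields a non-negative
     remainder even when k is negative.  */
  r.j = k & (n - 1);
  /* k - j is an exact multiple of n, so the division cannot round.  */
  r.m = (k - r.j) / n;
  return r;
}
```

With n = 32 this matches the mask and shift in the posted code; with n = 64 the same k gives a finer interval j and a correspondingly smaller reduced argument.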

- patrick



On 11/23/2017 3:19 PM, Joseph Myers wrote:
> On Mon, 6 Nov 2017, Patrick McGehearty wrote:
>
>> @@ -561,8 +561,10 @@ math-CPPFLAGS += -D__NO_MATH_INLINES -D__LIBC_INTERNAL_MATH_INLINES
>>   ifneq ($(long-double-fcts),yes)
>>   # The `double' and `long double' types are the same on this machine.
>>   # We won't compile the `long double' code at all.  Tell the `double' code
>> -# to define aliases for the `FUNCl' names.
>> -math-CPPFLAGS += -DNO_LONG_DOUBLE
>> +# to define aliases for the `FUNCl' names.  To avoid type conflicts in
>> +# defining those aliases, tell <math.h> to declare the `FUNCl' names with
>> +# `double' instead of `long double'.
>> +math-CPPFLAGS += -DNO_LONG_DOUBLE -D_Mlong_double_=double
>>   endif
>>   
>>   # These files quiet sNaNs in a way that is optimized away without
> This diff hunk is bogus (reverting a recent change I made) and should not
> be included in this patch.
>
>> +	      if (hx < 0x3e300000)
>> +		{
>> +		  retval = one + xx.x;
>> +		  return (retval);
> No parentheses around return value.
>
>> +		}
>> +	      retval = one + xx.x * (one + half * xx.x);
>> +	      return (retval);
> Likewise.
>
>> +	      yy.y = xx.x + (t * (half + xx.x * t2) +
>> +			     (t * t) * (t3 + xx.x * t4 + t * t5));
> Split lines before an operator, not after.
>
>> +	      yy.y = xx.x + (t * (half + xx.x * t2) +
>> +			     (t * t) * (t3 + xx.x * t4 + t * t5));
> Likewise.
>
>> +	  yy.y = z + (t * (half + (z * t2)) +
>> +		      (t * t) * (t3 + z * t4 + t * t5));
> Likewise.
>
>> +	  yy.y = z + (t * (half + (z * t2)) +
>> +		      (t * t) * (t3 + z * t4 + t * t5));
> Likewise.
>
>> +      return (retval);
> Avoid parentheses around return value.
>
>> +	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
>> +	    return (zero);	/* exp(-inf) = 0.  */
> Likewise.
>
>> +	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf.  */
> Likewise.
>
>> +      yy.y = z + (t * (half + z * t2) +
>> +		  (t * t) * (t3 + z * t4 + t * t5));
> Split line before operator.
>
>> +      yy.y = z + (t * (half + z * t2) +
>> +		  (t * t) * (t3 + z * t4 + t * t5));
> Likewise.
>
>> +  return (yy.y);
> Remove parentheses.
>
>> /* EXP function tables - for use in ocmputing double precisoin exponential
> s/ocmputing/computing/
>
> s/precisoin/precision/
>
>> +/* TBL[2*j] and TBL[2*j+1] are double precision numbers used to
>> +   approximate exp(x) using the formula given in the comments
>> +   for e_exp.c.  */
> I believe the correct semantics to describe are: TBL[2*j] is 2**(j/32),
> rounded to nearest; TBL[2*j+1] is 2**(j/32) - TBL[2*j], rounded to
> nearest.  Now if that's the case, three of the low parts should be
> adjusted by 1ulp because the current values aren't actually rounded to
> nearest (unless you have some concrete reason why the present values, that
> aren't rounded to nearest, are optimal):
>
>> +    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610ap-54,
> 0x1.8a62e4adc610ap-54 should be 0x1.8a62e4adc610bp-54.
>
>> +    0x1.5342b569d4f82p+0, -0x1.07abe1db13cacp-55,
> -0x1.07abe1db13cacp-55 should be -0x1.07abe1db13cadp-55.
>
>> +    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3706p-55,
> 0x1.2ed02d75b3706p-55 should be 0x1.2ed02d75b3707p-55.
>

* [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
@ 2017-10-26 22:53 Patrick McGehearty
  2017-11-01  0:26 ` Joseph Myers
  0 siblings, 1 reply; 44+ messages in thread
From: Patrick McGehearty @ 2017-10-26 22:53 UTC (permalink / raw)
  To: libc-alpha

Version 3 of proposed patch.

All hex constants in version 2 replaced with C99 double hex constants,
allowing Big Endian and Little Endian versions to be merged.
Only e_exp.c and eexp.tbl changed from version 2.
Minor changes in performance results due to system noise.
No other changes from version 2.

Version 2 of proposed patch.
Revised copyright notice and formatting issues.
Removed slowexp.c and related references.
Replaced tables of double constants with hex constants, taking special
  care to correctly handle little endian and big endian versions.
  Using hex initialization also required changing variables to be declared
  as unions.  Tables moved from e_exp.c to sysdeps/ieee754/dbl-64/eexp.tbl.
Replaced __fegetround(), __fesetround() with get_rounding_mode and
  libc_fesetround().
Removed use of "small". "inexact mode" now ignored.
Retested and rebenchmarked on sparc and x86 with the above changes.

These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Performance gains are typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc and x86:
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   395    5173     144
min     399    54      15      13
mean   5317   200    1349      23

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from a value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run. Within the
test suite 3 input values were found to cause 1 bit differences (ulp)
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337e-01 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
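
As an illustration only, the two results in this example can be confirmed to be adjacent doubles by comparing their bit patterns. The helper name `ulp_distance` is hypothetical and not part of the patch, and the code assumes IEEE 754 binary64 doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: count the
   unit-in-the-last-place steps between two finite doubles of the
   same sign.  Returns 0 when the values are identical.  */
uint64_t
ulp_distance (double a, double b)
{
  uint64_t ia, ib;

  memcpy (&ia, &a, sizeof ia);
  memcpy (&ib, &b, sizeof ib);
  /* Bit patterns are ordered like magnitudes only within one sign.  */
  assert ((ia >> 63) == (ib >> 63));
  return ia > ib ? ia - ib : ib - ia;
}
```

For the hex values quoted above, 0x1.db14cd799387ap-3 and 0x1.db14cd7993879p-3, the distance is 1, so the old and new code pick the two neighbouring doubles on either side of the high precision value.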

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function.  The gamma function is known to be difficult to compute
accurately.
---
 manual/probes.texi                          |   14 -
 math/Makefile                               |    2 +-
 sysdeps/generic/math_private.h              |    1 -
 sysdeps/ieee754/dbl-64/e_exp.c              |  378 +++++++++++++++------------
 sysdeps/ieee754/dbl-64/e_pow.c              |    2 +-
 sysdeps/ieee754/dbl-64/eexp.tbl             |  215 +++++++++++++++
 sysdeps/ieee754/dbl-64/slowexp.c            |   86 ------
 sysdeps/powerpc/power4/fpu/Makefile         |    1 -
 sysdeps/x86_64/fpu/multiarch/Makefile       |    9 +-
 sysdeps/x86_64/fpu/multiarch/e_exp-avx.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c   |    1 -
 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c |    9 -
 15 files changed, 425 insertions(+), 313 deletions(-)
 create mode 100644 sysdeps/ieee754/dbl-64/eexp.tbl
 delete mode 100644 sysdeps/ieee754/dbl-64/slowexp.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c

diff --git a/manual/probes.texi b/manual/probes.texi
index 8ab6756..f8ae64b 100644
--- a/manual/probes.texi
+++ b/manual/probes.texi
@@ -258,20 +258,6 @@ Unless explicitly mentioned otherwise, a precision of 1 implies 24 bits of
 precision in the mantissa of the multiple precision number.  Hence, a precision
 level of 32 implies 768 bits of precision in the mantissa.
 
-@deftp Probe slowexp_p6 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-6.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
-@deftp Probe slowexp_p32 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-32.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
 @deftp Probe slowpow_p10 (double @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
 This probe is triggered when the @code{pow} function is called with
 inputs that result in multiple precision computation with precision
diff --git a/math/Makefile b/math/Makefile
index 1feb425..f70aebf 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -114,7 +114,7 @@ type-ldouble-yes := ldouble
 # double support
 type-double-suffix :=
 type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
-		       mpatan mpexp mplog mpsqrt mptan sincos32 slowexp	\
+		       mpatan mpexp mplog mpsqrt mptan sincos32 \
 		       slowpow sincostab k_rem_pio2
 
 # float support
diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index 80c7c92..30fc3c9 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -262,7 +262,6 @@ extern double __sin32 (double __x, double __res, double __res1);
 extern double __cos32 (double __x, double __res, double __res1);
 extern double __mpsin (double __x, double __dx, bool __range_reduce);
 extern double __mpcos (double __x, double __dx, bool __range_reduce);
-extern double __slowexp (double __x);
 extern double __slowpow (double __x, double __y, double __z);
 extern void __docos (double __x, double __dx, double __v[]);
 
diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
index 6757a14..f118f33 100644
--- a/sysdeps/ieee754/dbl-64/e_exp.c
+++ b/sysdeps/ieee754/dbl-64/e_exp.c
@@ -1,3 +1,4 @@
+/* EXP function - Compute double precision exponential */
 /*
  * IBM Accurate Mathematical Library
  * written by International Business Machines Corp.
@@ -23,7 +24,7 @@
 /*           exp1                                                          */
 /*                                                                         */
 /* FILES NEEDED:dla.h endian.h mpa.h mydefs.h uexp.h                       */
-/*              mpa.c mpexp.x slowexp.c                                    */
+/*              mpa.c mpexp.x                                              */
 /*                                                                         */
 /* An ultimate exp routine. Given an IEEE double machine number x          */
 /* it computes the correctly rounded (to nearest) value of e^x             */
@@ -32,207 +33,238 @@
 /*                                                                         */
 /***************************************************************************/
 
+/*  IBM exp(x) replaced by following exp(x) in 2017. IBM exp1(x,xx) remains. */
+/*
+   exp(x)
+   Hybrid algorithm of Peter Tang's Table driven method (for large
+   arguments) and an accurate table (for small arguments).
+   Written by K.C. Ng, November 1988.
+   Method (large arguments):
+	1. Argument Reduction: given the input x, find r and integer k
+	   and j such that
+	             x = (k+j/32)*(ln2) + r,  |r| <= (1/64)*ln2
+
+	2. exp(x) = 2^k * (2^(j/32) + 2^(j/32)*expm1(r))
+	   a. expm1(r) is approximated by a polynomial:
+	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
+	      Here t1 = 1/2 exactly.
+	   b. 2^(j/32) is represented to twice double precision
+	      as TBL[2j]+TBL[2j+1].
+
+   Note: If divide were fast enough, we could use another approximation
+	 in 2.a:
+	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
+	      (for the same t1 and t2 as above)
+
+   Special cases:
+	exp(INF) is INF, exp(NaN) is NaN;
+	exp(-INF)=  0;
+	for finite argument, only exp(0)=1 is exact.
+
+   Accuracy:
+	According to an error analysis, the error is always less than
+	an ulp (unit in the last place).  The largest errors observed
+	are less than 0.55 ulp for normal results and less than 0.75 ulp
+	for subnormal results.
+
+   Misc. info.
+	For IEEE double
+		if x >  7.09782712893383973096e+02 then exp(x) overflow
+		if x < -7.45133219101941108420e+02 then exp(x) underflow
+ */
+
 #include <math.h>
+#include <math-svid-compat.h>
+#include <math_private.h>
+#include <errno.h>
 #include "endian.h"
 #include "uexp.h"
+#include "uexp.tbl"
 #include "mydefs.h"
 #include "MathLib.h"
-#include "uexp.tbl"
-#include <math_private.h>
 #include <fenv.h>
 #include <float.h>
 
-#ifndef SECTION
-# define SECTION
-#endif
+extern double __ieee754_exp (double);
+
+#include "eexp.tbl"
+
+static const double
+  half = 0.5,
+  one = 1.0;
 
-double __slowexp (double);
 
-/* An ultimate exp routine. Given an IEEE double machine number x it computes
-   the correctly rounded (to nearest) value of e^x.  */
 double
-SECTION
-__ieee754_exp (double x)
+__ieee754_exp (double x_arg)
 {
-  double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
-  mynumber junk1, junk2, binexp = {{0, 0}};
-  int4 i, j, m, n, ex;
+  double z, t;
   double retval;
-
+  int hx, ix, k, j, m;
+  int fe_val;
+  union
   {
-    SET_RESTORE_ROUND (FE_TONEAREST);
-
-    junk1.x = x;
-    m = junk1.i[HIGH_HALF];
-    n = m & hugeint;
-
-    if (n > smallint && n < bigint)
-      {
-	y = x * log2e.x + three51.x;
-	bexp = y - three51.x;	/*  multiply the result by 2**bexp        */
-
-	junk1.x = y;
-
-	eps = bexp * ln_two2.x;	/* x = bexp*ln(2) + t - eps               */
-	t = x - bexp * ln_two1.x;
-
-	y = t + three33.x;
-	base = y - three33.x;	/* t rounded to a multiple of 2**-18      */
-	junk2.x = y;
-	del = (t - base) - eps;	/*  x = bexp*ln(2) + base + del           */
-	eps = del + del * del * (p3.x * del + p2.x);
-
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;
-
-	i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-	j = (junk2.i[LOW_HALF] & 511) << 1;
-
-	al = coar.x[i] * fine.x[j];
-	bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	       + coar.x[i + 1] * fine.x[j + 1]);
-
-	rem = (bet + bet * eps) + al * eps;
-	res = al + rem;
-	cor = (al - res) + rem;
-	if (res == (res + cor * err_0))
-	  {
-	    retval = res * binexp.x;
-	    goto ret;
+    int i_part[2];
+    double x;
+  } xx;
+  union
+  {
+    int y_part[2];
+    double y;
+  } yy;
+  xx.x = x_arg;
+
+  ix = xx.i_part[HIGH_HALF];
+  hx = ix & ~0x80000000;
+
+  if (hx < 0x3ff0a2b2)
+    {				/* |x| < 3/2 ln 2 */
+      if (hx < 0x3f862e42)
+	{			/* |x| < 1/64 ln 2 */
+	  if (hx < 0x3ed00000)
+	    {			/* |x| < 2^-18 */
+	      if (hx < 0x3e300000)
+		{
+		  retval = one + xx.x;
+		  return (retval);
+		}
+	      retval = one + xx.x * (one + half * xx.x);
+	      return (retval);
+	    }
+	  /* 
+	     Use FE_TONEAREST rounding mode for computing yy.y 
+	     Avoid set/reset of rounding mode if already in FE_TONEAREST mode
+	  */
+	  fe_val = get_rounding_mode ();
+	  if (fe_val == FE_TONEAREST) {
+	    t = xx.x * xx.x;
+	    yy.y = xx.x + (t * (half + xx.x * t2) +
+			   (t * t) * (t3 + xx.x * t4 + t * t5));
+	    retval = one + yy.y;
+	  } else {
+	    libc_fesetround (FE_TONEAREST);
+	    t = xx.x * xx.x;
+	    yy.y = xx.x + (t * (half + xx.x * t2) +
+			   (t * t) * (t3 + xx.x * t4 + t * t5));
+	    retval = one + yy.y;
+	    libc_fesetround (fe_val);
 	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto ret;
-	  }			/*if error is over bound */
-      }
+	  return (retval);
+	}
 
-    if (n <= smallint)
-      {
-	retval = 1.0;
-	goto ret;
+      /* find the multiple of 2^-6 nearest x */
+      k = hx >> 20;
+      j = (0x00100000 | (hx & 0x000fffff)) >> (0x40c - k);
+      j = (j - 1) & ~1;
+      if (ix < 0)
+	j += 134;
+      /* 
+	 Use FE_TONEAREST rounding mode for computing yy.y 
+	 Avoid set/reset of rounding mode if already in FE_TONEAREST mode
+      */
+      fe_val = get_rounding_mode ();
+      if (fe_val == FE_TONEAREST) {
+	z = xx.x - TBL2[j];
+	t = z * z;
+	yy.y = z + (t * (half + (z * t2)) +
+		    (t * t) * (t3 + z * t4 + t * t5));
+	retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+      } else {
+	libc_fesetround (FE_TONEAREST);
+	z = xx.x - TBL2[j];
+	t = z * z;
+	yy.y = z + (t * (half + (z * t2)) +
+		    (t * t) * (t3 + z * t4 + t * t5));
+	retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+	libc_fesetround (fe_val);
       }
+      return (retval);
+    }
 
-    if (n >= badint)
-      {
-	if (n > infint)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/* x is NaN */
-	if (n < infint)
-	  {
-	    if (x > 0)
-	      goto ret_huge;
-	    else
-	      goto ret_tiny;
-	  }
-	/* x is finite,  cause either overflow or underflow  */
-	if (junk1.i[LOW_HALF] != 0)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/*  x is NaN  */
-	retval = (x > 0) ? inf.x : zero;	/* |x| = inf;  return either inf or 0 */
-	goto ret;
-      }
+  if (hx >= 0x40862e42)
+    {				/* x is large, infinite, or nan */
+      if (hx >= 0x7ff00000)
+	{
+	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
+	    return (zero);	/* exp(-inf) = 0 */
+	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf */
+	}
+      if (xx.x > threshold1)
+	{			/* set overflow error condition */
+	  retval = hhuge * hhuge;
+	  return retval;
+	} 
+      if (-xx.x > threshold2)
+	{			/* set underflow error condition */
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	  retval = zero;
+	  return retval;
+	}
+    }
 
-    y = x * log2e.x + three51.x;
-    bexp = y - three51.x;
-    junk1.x = y;
-    eps = bexp * ln_two2.x;
-    t = x - bexp * ln_two1.x;
-    y = t + three33.x;
-    base = y - three33.x;
-    junk2.x = y;
-    del = (t - base) - eps;
-    eps = del + del * del * (p3.x * del + p2.x);
-    i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-    j = (junk2.i[LOW_HALF] & 511) << 1;
-    al = coar.x[i] * fine.x[j];
-    bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	   + coar.x[i + 1] * fine.x[j + 1]);
-    rem = (bet + bet * eps) + al * eps;
-    res = al + rem;
-    cor = (al - res) + rem;
-    if (m >> 31)
-      {
-	ex = junk1.i[LOW_HALF];
-	if (res < 1.0)
-	  {
-	    res += res;
-	    cor += cor;
-	    ex -= 1;
-	  }
-	if (ex >= -1022)
-	  {
-	    binexp.i[HIGH_HALF] = (1023 + ex) << 20;
-	    if (res == (res + cor * err_0))
-	      {
-		retval = res * binexp.x;
-		goto ret;
-	      }
-	    else
-	      {
-		retval = __slowexp (x);
-		goto check_uflow_ret;
-	      }			/*if error is over bound */
-	  }
-	ex = -(1022 + ex);
-	binexp.i[HIGH_HALF] = (1023 - ex) << 20;
-	res *= binexp.x;
-	cor *= binexp.x;
-	eps = 1.0000000001 + err_0 * binexp.x;
-	t = 1.0 + res;
-	y = ((1.0 - t) + res) + cor;
-	res = t + y;
-	cor = (t - res) + y;
-	if (res == (res + eps * cor))
-	  {
-	    binexp.i[HIGH_HALF] = 0x00100000;
-	    retval = (res - 1.0) * binexp.x;
-	    goto check_uflow_ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto check_uflow_ret;
-	  }			/*   if error is over bound    */
-      check_uflow_ret:
-	if (retval < DBL_MIN)
-	  {
-	    double force_underflow = tiny * tiny;
-	    math_force_eval (force_underflow);
-	  }
-	if (retval == 0)
-	  goto ret_tiny;
-	goto ret;
-      }
+  /* 
+     Use FE_TONEAREST rounding mode for computing yy.y 
+     Avoid set/reset of rounding mode if already in FE_TONEAREST mode
+  */
+  fe_val = get_rounding_mode ();
+  if (fe_val == FE_TONEAREST) {
+    t = invln2_32 * xx.x;
+    if (ix < 0)
+      t -= half;
     else
-      {
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
-	if (res == (res + cor * err_0))
-	  retval = res * binexp.x * t256.x;
-	else
-	  retval = __slowexp (x);
-	if (isinf (retval))
-	  goto ret_huge;
-	else
-	  goto ret;
-      }
+      t += half;
+    k = (int) t;
+    j = (k & 0x1f) << 1;
+    m = k >> 5;
+    z = (xx.x - k * ln2_32hi) - k * ln2_32lo;
+
+    /* z is now in primary range */
+    t = z * z;
+    yy.y = z + (t * (half + z * t2) + 
+		(t * t) * (t3 + z * t4 + t * t5));
+    yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+  } else {
+    libc_fesetround (FE_TONEAREST);
+    t = invln2_32 * xx.x;
+    if (ix < 0)
+      t -= half;
+    else
+      t += half;
+    k = (int) t;
+    j = (k & 0x1f) << 1;
+    m = k >> 5;
+    z = (xx.x - k * ln2_32hi) - k * ln2_32lo;
+
+    /* z is now in primary range */
+    t = z * z;
+    yy.y = z + (t * (half + z * t2) +
+		(t * t) * (t3 + z * t4 + t * t5));
+    yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+    libc_fesetround (fe_val);
   }
-ret:
-  return retval;
 
- ret_huge:
-  return hhuge * hhuge;
-
- ret_tiny:
-  return tiny * tiny;
+  if (m < -1021)
+    {
+      yy.y_part[HIGH_HALF] += (m + 54) << 20;
+      retval = twom54 * yy.y;
+      if (retval < DBL_MIN)
+	{
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	}
+      return retval;
+    }
+  yy.y_part[HIGH_HALF] += m << 20;
+  return (yy.y);
 }
 #ifndef __ieee754_exp
 strong_alias (__ieee754_exp, __exp_finite)
 #endif
 
+#ifndef SECTION
+# define SECTION
+#endif
+
 /* Compute e^(x+xx).  The routine also receives bound of error of previous
    calculation.  If after computing exp the error exceeds the allowed bounds,
    the routine returns a non-positive number.  Otherwise it returns the
diff --git a/sysdeps/ieee754/dbl-64/e_pow.c b/sysdeps/ieee754/dbl-64/e_pow.c
index 9f6439e..2eb8dbf 100644
--- a/sysdeps/ieee754/dbl-64/e_pow.c
+++ b/sysdeps/ieee754/dbl-64/e_pow.c
@@ -25,7 +25,7 @@
 /*             log1                                                        */
 /*             checkint                                                    */
 /* FILES NEEDED: dla.h endian.h mpa.h mydefs.h                             */
-/*               halfulp.c mpexp.c mplog.c slowexp.c slowpow.c mpa.c       */
+/*               halfulp.c mpexp.c mplog.c slowpow.c mpa.c                 */
 /*                          uexp.c  upow.c				   */
 /*               root.tbl uexp.tbl upow.tbl                                */
 /* An ultimate power routine. Given two IEEE double machine numbers y,x    */
diff --git a/sysdeps/ieee754/dbl-64/eexp.tbl b/sysdeps/ieee754/dbl-64/eexp.tbl
new file mode 100644
index 0000000..ec48489
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/eexp.tbl
@@ -0,0 +1,215 @@
+/* EXP function tables - for use in ocmputing double precisoin exponential
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+static const double TBL[64] = {
+    0x1.0000000000000p+0,  0x0.0000000000000p+0,
+    0x1.059b0d3158574p+0,  0x1.d73e2a475b465p-55,
+    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610ap-54,
+    0x1.11301d0125b51p+0, -0x1.6c51039449b3ap-54,
+    0x1.172b83c7d517bp+0, -0x1.19041b9d78a76p-55,
+    0x1.1d4873168b9aap+0,  0x1.e016e00a2643cp-54,
+    0x1.2387a6e756238p+0,  0x1.9b07eb6c70573p-54,
+    0x1.29e9df51fdee1p+0,  0x1.612e8afad1255p-55,
+    0x1.306fe0a31b715p+0,  0x1.6f46ad23182e4p-55,
+    0x1.371a7373aa9cbp+0, -0x1.63aeabf42eae2p-54,
+    0x1.3dea64c123422p+0,  0x1.ada0911f09ebcp-55,
+    0x1.44e086061892dp+0,  0x1.89b7a04ef80d0p-59,
+    0x1.4bfdad5362a27p+0,  0x1.d4397afec42e2p-56,
+    0x1.5342b569d4f82p+0, -0x1.07abe1db13cacp-55,
+    0x1.5ab07dd485429p+0,  0x1.6324c054647adp-54,
+    0x1.6247eb03a5585p+0, -0x1.383c17e40b497p-54,
+    0x1.6a09e667f3bcdp+0, -0x1.bdd3413b26456p-54,
+    0x1.71f75e8ec5f74p+0, -0x1.16e4786887a99p-55,
+    0x1.7a11473eb0187p+0, -0x1.41577ee04992fp-55,
+    0x1.82589994cce13p+0, -0x1.d4c1dd41532d8p-54,
+    0x1.8ace5422aa0dbp+0,  0x1.6e9f156864b27p-54,
+    0x1.93737b0cdc5e5p+0, -0x1.75fc781b57ebcp-57,
+    0x1.9c49182a3f090p+0,  0x1.c7c46b071f2bep-56,
+    0x1.a5503b23e255dp+0, -0x1.d2f6edb8d41e1p-54,
+    0x1.ae89f995ad3adp+0,  0x1.7a1cd345dcc81p-54,
+    0x1.b7f76f2fb5e47p+0, -0x1.5584f7e54ac3bp-56,
+    0x1.c199bdd85529cp+0,  0x1.11065895048ddp-55,
+    0x1.cb720dcef9069p+0,  0x1.503cbd1e949dbp-56,
+    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3706p-55,
+    0x1.dfc97337b9b5fp+0, -0x1.1a5cd4f184b5cp-54,
+    0x1.ea4afa2a490dap+0, -0x1.e9c23179c2893p-54,
+    0x1.f50765b6e4540p+0,  0x1.9d3e12dd8a18bp-54};
+/*
+   For i = 0, ..., 66,
+     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+
+   For i = 67, ..., 133,
+     TBL2[2*i] is a double precision number near -(i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+*/
+
+static const double TBL2[268] = {
+    0x1.ffffffffffc82p-7,   0x1.04080ab55de32p+0,
+    0x1.fffffffffffdbp-6,   0x1.08205601127ecp+0,
+    0x1.80000000000a0p-5,   0x1.0c49236829e91p+0,
+    0x1.fffffffffff79p-5,   0x1.1082b577d34e9p+0,
+    0x1.3fffffffffffcp-4,   0x1.14cd4fc989cd6p+0,
+    0x1.8000000000060p-4,   0x1.192937074e0d4p+0,
+    0x1.c000000000061p-4,   0x1.1d96b0eff0e80p+0,
+    0x1.fffffffffffd6p-4,   0x1.2216045b6f5cap+0,
+    0x1.1ffffffffff58p-3,   0x1.26a7793f6014cp+0,
+    0x1.3ffffffffff75p-3,   0x1.2b4b58b372c65p+0,
+    0x1.5ffffffffff00p-3,   0x1.3001ecf601ad1p+0,
+    0x1.8000000000020p-3,   0x1.34cb8170b583ap+0,
+    0x1.9ffffffffa629p-3,   0x1.39a862bd3b344p+0,
+    0x1.c00000000000fp-3,   0x1.3e98deaa11dcep+0,
+    0x1.e00000000007fp-3,   0x1.439d443f5f16dp+0,
+    0x1.0000000000072p-2,   0x1.48b5e3c3e81abp+0,
+    0x1.0fffffffffecap-2,   0x1.4de30ec211dfbp+0,
+    0x1.1ffffffffff8fp-2,   0x1.5325180cfacd2p+0,
+    0x1.300000000003bp-2,   0x1.587c53c5a7b04p+0,
+    0x1.4000000000034p-2,   0x1.5de9176046007p+0,
+    0x1.4ffffffffff89p-2,   0x1.636bb9a98322fp+0,
+    0x1.5ffffffffffe7p-2,   0x1.690492cbf942ap+0,
+    0x1.6ffffffffff78p-2,   0x1.6eb3fc55b1e45p+0,
+    0x1.7ffffffffff65p-2,   0x1.747a513dbef32p+0,
+    0x1.8ffffffffffd5p-2,   0x1.7a57ede9ea22ep+0,
+    0x1.9ffffffffff6ep-2,   0x1.804d30347b50fp+0,
+    0x1.affffffffffc3p-2,   0x1.865a7772164aep+0,
+    0x1.c000000000053p-2,   0x1.8c802477b0030p+0,
+    0x1.d00000000004dp-2,   0x1.92be99a09bf1ep+0,
+    0x1.e000000000096p-2,   0x1.99163ad4b1e08p+0,
+    0x1.efffffffffefap-2,   0x1.9f876d8e8c4fcp+0,
+    0x1.fffffffffffd0p-2,   0x1.a61298e1e0688p+0,
+    0x1.0800000000002p-1,   0x1.acb82581eee56p+0,
+    0x1.100000000001fp-1,   0x1.b3787dc80f979p+0,
+    0x1.17ffffffffff8p-1,   0x1.ba540dba56e4fp+0,
+    0x1.1fffffffffffap-1,   0x1.c14b431256441p+0,
+    0x1.27fffffffffc4p-1,   0x1.c85e8d43f7c9bp+0,
+    0x1.2fffffffffffdp-1,   0x1.cf8e5d84758a6p+0,
+    0x1.380000000001fp-1,   0x1.d6db26d16cd84p+0,
+    0x1.3ffffffffffd8p-1,   0x1.de455df80e39bp+0,
+    0x1.4800000000052p-1,   0x1.e5cd799c6a59cp+0,
+    0x1.4ffffffffffc8p-1,   0x1.ed73f240dc10cp+0,
+    0x1.5800000000013p-1,   0x1.f539424d90f71p+0,
+    0x1.5ffffffffffbcp-1,   0x1.fd1de6182f885p+0,
+    0x1.680000000002dp-1,   0x1.02912df5ce741p+1,
+    0x1.7000000000040p-1,   0x1.06a39207f0a2ap+1,
+    0x1.780000000004fp-1,   0x1.0ac660691652ap+1,
+    0x1.7ffffffffff6fp-1,   0x1.0ef9db467dcabp+1,
+    0x1.87fffffffffe5p-1,   0x1.133e45d82e943p+1,
+    0x1.9000000000035p-1,   0x1.1793e4652cc6dp+1,
+    0x1.97fffffffffb3p-1,   0x1.1bfafc47bda48p+1,
+    0x1.a000000000000p-1,   0x1.2073d3f1bd518p+1,
+    0x1.a80000000004ap-1,   0x1.24feb2f105ce2p+1,
+    0x1.affffffffffedp-1,   0x1.299be1f3e7f11p+1,
+    0x1.b7ffffffffffbp-1,   0x1.2e4baacdb6611p+1,
+    0x1.c00000000001dp-1,   0x1.330e587b62b39p+1,
+    0x1.c800000000079p-1,   0x1.37e437282d538p+1,
+    0x1.cffffffffff51p-1,   0x1.3ccd943268248p+1,
+    0x1.d7fffffffff74p-1,   0x1.41cabe304cadcp+1,
+    0x1.e000000000011p-1,   0x1.46dc04f4e5343p+1,
+    0x1.e80000000001ep-1,   0x1.4c01b9950a124p+1,
+    0x1.effffffffff9ep-1,   0x1.513c2e6c73196p+1,
+    0x1.f7fffffffffedp-1,   0x1.568bb722dd586p+1,
+    0x1.0000000000034p+0,   0x1.5bf0a8b1457b0p+1,
+    0x1.03fffffffffe2p+0,   0x1.616b5967376dfp+1,
+    0x1.07fffffffff4bp+0,   0x1.66fc20f0337a9p+1,
+    0x1.0bffffffffffdp+0,   0x1.6ca35859290f5p+1,
+   -0x1.fffffffffffe4p-7,   0x1.f80feabfeefa5p-1,
+   -0x1.ffffffffffb0bp-6,   0x1.f03f56a88b5fep-1,
+   -0x1.7ffffffffffa7p-5,   0x1.e88dc6afecfc5p-1,
+   -0x1.ffffffffffea8p-5,   0x1.e0fabfbc702b8p-1,
+   -0x1.3ffffffffffb3p-4,   0x1.d985c89d041acp-1,
+   -0x1.7ffffffffffe3p-4,   0x1.d22e6a0197c06p-1,
+   -0x1.bffffffffff9ap-4,   0x1.caf42e73a4c89p-1,
+   -0x1.fffffffffff98p-4,   0x1.c3d6a24ed822dp-1,
+   -0x1.1ffffffffffe9p-3,   0x1.bcd553b9d7b67p-1,
+   -0x1.3ffffffffffe0p-3,   0x1.b5efd29f24c2dp-1,
+   -0x1.5fffffffff553p-3,   0x1.af25b0a61a9f4p-1,
+   -0x1.7ffffffffff8bp-3,   0x1.a876812c08794p-1,
+   -0x1.9fffffffffe51p-3,   0x1.a1e1d93d68828p-1,
+   -0x1.bffffffffff6ep-3,   0x1.9b674f8f2f3f5p-1,
+   -0x1.dffffffffff7fp-3,   0x1.95067c7837a0cp-1,
+   -0x1.fffffffffff7ap-3,   0x1.8ebef9eac8225p-1,
+   -0x1.0fffffffffffep-2,   0x1.8890636e31f55p-1,
+   -0x1.1ffffffffff41p-2,   0x1.827a56188975ep-1,
+   -0x1.2ffffffffffbap-2,   0x1.7c7c708877656p-1,
+   -0x1.3fffffffffff8p-2,   0x1.769652df22f81p-1,
+   -0x1.4ffffffffff90p-2,   0x1.70c79eba33c2fp-1,
+   -0x1.5ffffffffffdbp-2,   0x1.6b0ff72deb8aap-1,
+   -0x1.6ffffffffff9ap-2,   0x1.656f00bf5798ep-1,
+   -0x1.7ffffffffff9fp-2,   0x1.5fe4615e98eb0p-1,
+   -0x1.8ffffffffffeep-2,   0x1.5a6fc061433cep-1,
+   -0x1.9fffffffffc4ap-2,   0x1.5510c67cd26cdp-1,
+   -0x1.affffffffff30p-2,   0x1.4fc71dc13566bp-1,
+   -0x1.bfffffffffff0p-2,   0x1.4a9271936fd0ep-1,
+   -0x1.cfffffffffff3p-2,   0x1.45726ea84fb8cp-1,
+   -0x1.dfffffffffff3p-2,   0x1.4066c2ff3912bp-1,
+   -0x1.effffffffff80p-2,   0x1.3b6f1ddd05ab9p-1,
+   -0x1.fffffffffffdfp-2,   0x1.368b2fc6f9614p-1,
+   -0x1.0800000000000p-1,   0x1.31baaa7dca843p-1,
+   -0x1.0ffffffffffa4p-1,   0x1.2cfd40f8bdce4p-1,
+   -0x1.17fffffffff0ap-1,   0x1.2852a760d5ce7p-1,
+   -0x1.2000000000000p-1,   0x1.23ba930c1568bp-1,
+   -0x1.27fffffffffbbp-1,   0x1.1f34ba78d568dp-1,
+   -0x1.2fffffffffe32p-1,   0x1.1ac0d5492c1dbp-1,
+   -0x1.37ffffffff042p-1,   0x1.165e9c3e67ef2p-1,
+   -0x1.3ffffffffff77p-1,   0x1.120dc93499431p-1,
+   -0x1.47fffffffff6bp-1,   0x1.0dce171e34ecep-1,
+   -0x1.4fffffffffff1p-1,   0x1.099f41ffbe588p-1,
+   -0x1.57ffffffffe02p-1,   0x1.058106eb8a7aep-1,
+   -0x1.5ffffffffffe5p-1,   0x1.017323fd9002ep-1,
+   -0x1.67fffffffffb0p-1,   0x1.faeab0ae9386cp-2,
+   -0x1.6ffffffffffb2p-1,   0x1.f30ec837503d7p-2,
+   -0x1.77fffffffff7fp-1,   0x1.eb5210d627133p-2,
+   -0x1.7ffffffffffe8p-1,   0x1.e3b40ebefcd95p-2,
+   -0x1.87fffffffffc8p-1,   0x1.dc3448110dae2p-2,
+   -0x1.8fffffffffb30p-1,   0x1.d4d244cf4ef06p-2,
+   -0x1.97fffffffffefp-1,   0x1.cd8d8ed8ee395p-2,
+   -0x1.9ffffffffffa7p-1,   0x1.c665b1e1f1e5cp-2,
+   -0x1.a7fffffffffdcp-1,   0x1.bf5a3b6bf18d6p-2,
+   -0x1.affffffffff95p-1,   0x1.b86ababeef93bp-2,
+   -0x1.b7fffffffffcbp-1,   0x1.b196c0e24d256p-2,
+   -0x1.bffffffffff32p-1,   0x1.aadde095dadf7p-2,
+   -0x1.c7fffffffff6ap-1,   0x1.a43fae4b047c9p-2,
+   -0x1.cffffffffffb6p-1,   0x1.9dbbc01e182a4p-2,
+   -0x1.d7fffffffffcap-1,   0x1.9751adcfa81ecp-2,
+   -0x1.dffffffffffcdp-1,   0x1.910110be0699ep-2,
+   -0x1.e7ffffffffffbp-1,   0x1.8ac983dedbc69p-2,
+   -0x1.effffffffff88p-1,   0x1.84aaa3b8d51a9p-2,
+   -0x1.f7fffffffffbbp-1,   0x1.7ea40e5d6d92ep-2,
+   -0x1.fffffffffffdbp-1,   0x1.78b56362cef53p-2,
+   -0x1.03fffffffff00p+0,   0x1.72de43ddcb1f2p-2,
+   -0x1.07ffffffffe6fp+0,   0x1.6d1e525bed085p-2,
+   -0x1.0bfffffffffd6p+0,   0x1.677532dda1c57p-2};
+
+static const double
+/* Following three values used to scale x to primary range */
+  invln2_32 = 0x1.71547652b82fep+5, /* 4.61662413084468283841e+01 */
+  ln2_32hi = 0x1.62e42fee00000p-6, /* 2.16608493865351192653e-02 */
+  ln2_32lo = 0x1.a39ef35793c76p-38, /* 5.96317165397058656257e-12 */
+/* t2-t5 terms used for polynomial computation */
+  t2 = 0x1.5555555548f7cp-3, /* 1.6666666666526086527e-1 */
+  t3 = 0x1.5555555545d4ep-5, /* 4.1666666666226079285e-2 */
+  t4 = 0x1.11115b7aa905ep-7, /* 8.3333679843421958056e-3 */
+  t5 = 0x1.6c1728d739765p-10, /* 1.3888949086377719040e-3 */
+/* maximum value for x to not overflow */
+  threshold1 = 0x1.62e42fefa39efp+9, /* 7.09782712893383973096e+02 */
+/* maximum value for -x to not underflow */
+  threshold2 = 0x1.74910d52d3051p+9, /* 7.45133219101941108420e+02 */
+/* scaling factor used when result near zero*/
+  twom54 = 0x1.0000000000000p-54; /* 5.55111512312578270212e-17 */
diff --git a/sysdeps/ieee754/dbl-64/slowexp.c b/sysdeps/ieee754/dbl-64/slowexp.c
deleted file mode 100644
index e8fa2e2..0000000
--- a/sysdeps/ieee754/dbl-64/slowexp.c
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-/**************************************************************************/
-/*  MODULE_NAME:slowexp.c                                                 */
-/*                                                                        */
-/*  FUNCTION:slowexp                                                      */
-/*                                                                        */
-/*  FILES NEEDED:mpa.h                                                    */
-/*               mpa.c mpexp.c                                            */
-/*                                                                        */
-/*Converting from double precision to Multi-precision and calculating     */
-/* e^x                                                                    */
-/**************************************************************************/
-#include <math_private.h>
-
-#include <stap-probe.h>
-
-#ifndef USE_LONG_DOUBLE_FOR_MP
-# include "mpa.h"
-void __mpexp (mp_no *x, mp_no *y, int p);
-#endif
-
-#ifndef SECTION
-# define SECTION
-#endif
-
-/*Converting from double precision to Multi-precision and calculating  e^x */
-double
-SECTION
-__slowexp (double x)
-{
-#ifndef USE_LONG_DOUBLE_FOR_MP
-  double w, z, res, eps = 3.0e-26;
-  int p;
-  mp_no mpx, mpy, mpz, mpw, mpeps, mpcor;
-
-  /* Use the multiple precision __MPEXP function to compute the exponential
-     First at 144 bits and if it is not accurate enough, at 768 bits.  */
-  p = 6;
-  __dbl_mp (x, &mpx, p);
-  __mpexp (&mpx, &mpy, p);
-  __dbl_mp (eps, &mpeps, p);
-  __mul (&mpeps, &mpy, &mpcor, p);
-  __add (&mpy, &mpcor, &mpw, p);
-  __sub (&mpy, &mpcor, &mpz, p);
-  __mp_dbl (&mpw, &w, p);
-  __mp_dbl (&mpz, &z, p);
-  if (w == z)
-    {
-      /* Track how often we get to the slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p6, 2, &x, &w);
-      return w;
-    }
-  else
-    {
-      p = 32;
-      __dbl_mp (x, &mpx, p);
-      __mpexp (&mpx, &mpy, p);
-      __mp_dbl (&mpy, &res, p);
-
-      /* Track how often we get to the uber-slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p32, 2, &x, &res);
-      return res;
-    }
-#else
-  return (double) __ieee754_expl((long double)x);
-#endif
-}
diff --git a/sysdeps/powerpc/power4/fpu/Makefile b/sysdeps/powerpc/power4/fpu/Makefile
index e17d32f..ded9976 100644
--- a/sysdeps/powerpc/power4/fpu/Makefile
+++ b/sysdeps/powerpc/power4/fpu/Makefile
@@ -3,5 +3,4 @@
 ifeq ($(subdir),math)
 CFLAGS-mpa.c += --param max-unroll-times=4 -funroll-loops -fpeel-loops
 CPPFLAGS-slowpow.c += -DUSE_LONG_DOUBLE_FOR_MP=1
-CPPFLAGS-slowexp.c += -DUSE_LONG_DOUBLE_FOR_MP=1
 endif
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index c78624b..e06c059 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -10,7 +10,7 @@ libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
 
 libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
 			e_asin-fma e_atan2-fma s_sin-fma s_tan-fma \
-			mplog-fma mpa-fma slowexp-fma slowpow-fma \
+			mplog-fma mpa-fma slowpow-fma \
 			sincos32-fma doasin-fma dosincos-fma \
 			halfulp-fma mpexp-fma \
 			mpatan2-fma mpatan-fma mpsqrt-fma mptan-fma
@@ -32,7 +32,6 @@ CFLAGS-mpsqrt-fma.c = -mfma -mavx2
 CFLAGS-mptan-fma.c = -mfma -mavx2
 CFLAGS-s_atan-fma.c = -mfma -mavx2
 CFLAGS-sincos32-fma.c = -mfma -mavx2
-CFLAGS-slowexp-fma.c = -mfma -mavx2
 CFLAGS-slowpow-fma.c = -mfma -mavx2
 CFLAGS-s_sin-fma.c = -mfma -mavx2
 CFLAGS-s_tan-fma.c = -mfma -mavx2
@@ -48,7 +47,7 @@ CFLAGS-e_powf-fma.c = -mfma -mavx2
 
 libm-sysdep_routines += e_exp-fma4 e_log-fma4 e_pow-fma4 s_atan-fma4 \
 			e_asin-fma4 e_atan2-fma4 s_sin-fma4 s_tan-fma4 \
-			mplog-fma4 mpa-fma4 slowexp-fma4 slowpow-fma4 \
+			mplog-fma4 mpa-fma4 slowpow-fma4 \
 			sincos32-fma4 doasin-fma4 dosincos-fma4 \
 			halfulp-fma4 mpexp-fma4 \
 			mpatan2-fma4 mpatan-fma4 mpsqrt-fma4 mptan-fma4
@@ -70,14 +69,13 @@ CFLAGS-mpsqrt-fma4.c = -mfma4
 CFLAGS-mptan-fma4.c = -mfma4
 CFLAGS-s_atan-fma4.c = -mfma4
 CFLAGS-sincos32-fma4.c = -mfma4
-CFLAGS-slowexp-fma4.c = -mfma4
 CFLAGS-slowpow-fma4.c = -mfma4
 CFLAGS-s_sin-fma4.c = -mfma4
 CFLAGS-s_tan-fma4.c = -mfma4
 
 libm-sysdep_routines += e_exp-avx e_log-avx s_atan-avx \
 			e_atan2-avx s_sin-avx s_tan-avx \
-			mplog-avx mpa-avx slowexp-avx \
+			mplog-avx mpa-avx \
 			mpexp-avx
 
 CFLAGS-e_atan2-avx.c = -msse2avx -DSSE2AVX
@@ -88,7 +86,6 @@ CFLAGS-mpexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-mplog-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_atan-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_sin-avx.c = -msse2avx -DSSE2AVX
-CFLAGS-slowexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_tan-avx.c = -msse2avx -DSSE2AVX
 endif
 
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
index ee5dd6d..afd9174 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_avx
 #define __exp1 __exp1_avx
-#define __slowexp __slowexp_avx
 #define SECTION __attribute__ ((section (".text.avx")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
index 6e0fdb7..765b1b9 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma
 #define __exp1 __exp1_fma
-#define __slowexp __slowexp_fma
 #define SECTION __attribute__ ((section (".text.fma")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
index ae6eb67..9ac7aca 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma4
 #define __exp1 __exp1_fma4
-#define __slowexp __slowexp_fma4
 #define SECTION __attribute__ ((section (".text.fma4")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c b/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
deleted file mode 100644
index d01c6d7..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_avx
-#define __add __add_avx
-#define __dbl_mp __dbl_mp_avx
-#define __mpexp __mpexp_avx
-#define __mul __mul_avx
-#define __sub __sub_avx
-#define SECTION __attribute__ ((section (".text.avx")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
deleted file mode 100644
index 6fffca1..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma
-#define __add __add_fma
-#define __dbl_mp __dbl_mp_fma
-#define __mpexp __mpexp_fma
-#define __mul __mul_fma
-#define __sub __sub_fma
-#define SECTION __attribute__ ((section (".text.fma")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
deleted file mode 100644
index 3bcde84..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma4
-#define __add __add_fma4
-#define __dbl_mp __dbl_mp_fma4
-#define __mpexp __mpexp_fma4
-#define __mul __mul_fma4
-#define __sub __sub_fma4
-#define SECTION __attribute__ ((section (".text.fma4")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
-- 
1.7.1


* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-26 22:53 Patrick McGehearty
@ 2017-11-01  0:26 ` Joseph Myers
  0 siblings, 0 replies; 44+ messages in thread
From: Joseph Myers @ 2017-11-01  0:26 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Thu, 26 Oct 2017, Patrick McGehearty wrote:

> +	  /* 
> +	     Use FE_TONEAREST rounding mode for computing yy.y 
> +	     Avoid set/reset of rounding mode if already in FE_TONEAREST mode
> +	  */

Generally, throughout this patch, note GNU standard comment formatting.  
Sentences should end with "." (and, for other comments, start with a 
capital letter).  No blank line at the top of this comment, or at the end.  
So

/* Use FE_TONEAREST rounding mode for computing yy.y.
   Avoid set/reset of rounding mode if already in FE_TONEAREST mode.  */

(appropriately indented) would be a properly formatted version of this 
comment.

> +      if (-xx.x > threshold2)
> +	{			/* set underflow error condition */
> +	  double force_underflow = tiny * tiny;
> +	  math_force_eval (force_underflow);
> +	  retval = zero;
> +	  return retval;

As previously noted, I'd expect force_underflow to be the value returned 
in this case, not zero, as that's proper (in accordance with glibc's 
accuracy goals for underflowing values) in FE_UPWARD mode.

> +static const double TBL[64] = {
> +    0x1.0000000000000p+0,  0x0.0000000000000p+0,
> +    0x1.059b0d3158574p+0,  0x1.d73e2a475b465p-55,

There should be a comment on the TBL array explaining what the values in 
it are, like the comment on TBL2.

> +/* maximum value for -x to not underflow */
> +  threshold2 = 0x1.74910d52d3051p+9, /* 7.45133219101941108420e+02 */

I think you mean not to underflow *to zero in round-to-nearest mode* 
(since some less-negative values would still result in underflow, but with 
the result being subnormal not zero).
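The two boundaries involved can be worked out with a short sketch. This
is illustration only, not code from the patch: the function names are
hypothetical, and the boundaries are approximate because each product is
itself rounded to double. It assumes IEEE 754 binary64, where DBL_MIN is
2^-1022 and half the smallest subnormal is 2^-1075.

```c
#include <assert.h>

/* ln(2) rounded to double.  */
#define LN2 0.6931471805599453

/* Below this, exp(x) < 2^-1022 (DBL_MIN): the result is subnormal.  */
double
subnormal_threshold (void)
{
  return -1022.0 * LN2;		/* About -708.396.  */
}

/* Below this, exp(x) < 2^-1075, half the smallest subnormal: the result
   rounds to zero in round-to-nearest mode.  */
double
zero_threshold (void)
{
  return -1075.0 * LN2;		/* About -745.133, i.e. -threshold2.  */
}

/* 0: normal result; 1: subnormal, nonzero result; 2: rounds to zero.  */
int
classify_exp_underflow (double x)
{
  if (x < zero_threshold ())
    return 2;
  if (x < subnormal_threshold ())
    return 1;
  return 0;
}
```

Inputs between roughly -745.133 and -708.396 therefore still underflow,
but to a nonzero subnormal value.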

-- 
Joseph S. Myers
joseph@codesourcery.com


* [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
@ 2017-10-26 16:44 Patrick McGehearty
  2017-10-26 17:20 ` Joseph Myers
  0 siblings, 1 reply; 44+ messages in thread
From: Patrick McGehearty @ 2017-10-26 16:44 UTC (permalink / raw)
  To: libc-alpha

Version 2 of proposed patch.
Revised copyright notice and formatting issues.
Removed slowexp.c and related references.
Replaced tables of double float constants with hex constants, paying special
  attention to handling little endian and big endian versions correctly.
  Using hex initialization also required changing variables to be declared
  as unions.  Tables moved from e_exp.c to sysdeps/ieee754/dbl-64/eexp.tbl.
Replaced __fegetround(), __fesetround() with get_rounding_mode and
  libc_fesetround().
Removed use of "small". "inexact mode" now ignored.
Retested and rebenchmarked on sparc and x86 with the above changes.

These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Performance gains are typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc and x86:
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   381    5173     766
min     399    54      15      13
mean   5317   199    1349      24

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from a value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run. Within the
test suite, 3 input values were found to cause 1-bit (1 ulp) differences
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
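
The 1 ulp gap between the two results above can be checked with a small
sketch. It is an illustration only: ulp_distance() and double_bits() are
hypothetical helpers, not glibc functions, and the approach assumes IEEE
754 binary64 doubles, where finite values of the same sign that are
adjacent in value also have adjacent bit patterns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit bit pattern.  */
int64_t
double_bits (double d)
{
  int64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* Number of representable steps between A and B.  Valid only for finite
   values of the same sign.  */
int64_t
ulp_distance (double a, double b)
{
  int64_t ia = double_bits (a);
  int64_t ib = double_bits (b);
  assert ((ia < 0) == (ib < 0));
  return ia > ib ? ia - ib : ib - ia;
}
```

Calling ulp_distance (0x1.db14cd799387ap-3, 0x1.db14cd7993879p-3) on the
new and old results shown above gives 1: the two candidates are
neighbouring doubles, and the high-precision value lies almost exactly
halfway between them.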

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function. The gamma function is known to be difficult to compute
accurately.
---
 manual/probes.texi                          |   14 -
 math/Makefile                               |    2 +-
 sysdeps/generic/math_private.h              |    1 -
 sysdeps/ieee754/dbl-64/e_exp.c              |  379 +++++++++++++-----------
 sysdeps/ieee754/dbl-64/e_pow.c              |    2 +-
 sysdeps/ieee754/dbl-64/eexp.tbl             |  440 +++++++++++++++++++++++++++
 sysdeps/ieee754/dbl-64/slowexp.c            |   86 ------
 sysdeps/powerpc/power4/fpu/Makefile         |    1 -
 sysdeps/x86_64/fpu/multiarch/Makefile       |    9 +-
 sysdeps/x86_64/fpu/multiarch/e_exp-avx.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c   |    1 -
 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c  |    9 -
 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c |    9 -
 15 files changed, 651 insertions(+), 313 deletions(-)
 create mode 100644 sysdeps/ieee754/dbl-64/eexp.tbl
 delete mode 100644 sysdeps/ieee754/dbl-64/slowexp.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c

diff --git a/manual/probes.texi b/manual/probes.texi
index 8ab6756..f8ae64b 100644
--- a/manual/probes.texi
+++ b/manual/probes.texi
@@ -258,20 +258,6 @@ Unless explicitly mentioned otherwise, a precision of 1 implies 24 bits of
 precision in the mantissa of the multiple precision number.  Hence, a precision
 level of 32 implies 768 bits of precision in the mantissa.
 
-@deftp Probe slowexp_p6 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-6.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
-@deftp Probe slowexp_p32 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-32.  Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
 @deftp Probe slowpow_p10 (double @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
 This probe is triggered when the @code{pow} function is called with
 inputs that result in multiple precision computation with precision
diff --git a/math/Makefile b/math/Makefile
index 1feb425..f70aebf 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -114,7 +114,7 @@ type-ldouble-yes := ldouble
 # double support
 type-double-suffix :=
 type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
-		       mpatan mpexp mplog mpsqrt mptan sincos32 slowexp	\
+		       mpatan mpexp mplog mpsqrt mptan sincos32 \
 		       slowpow sincostab k_rem_pio2
 
 # float support
diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index 80c7c92..30fc3c9 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -262,7 +262,6 @@ extern double __sin32 (double __x, double __res, double __res1);
 extern double __cos32 (double __x, double __res, double __res1);
 extern double __mpsin (double __x, double __dx, bool __range_reduce);
 extern double __mpcos (double __x, double __dx, bool __range_reduce);
-extern double __slowexp (double __x);
 extern double __slowpow (double __x, double __y, double __z);
 extern void __docos (double __x, double __dx, double __v[]);
 
diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
index 6757a14..0b0e0e3 100644
--- a/sysdeps/ieee754/dbl-64/e_exp.c
+++ b/sysdeps/ieee754/dbl-64/e_exp.c
@@ -1,3 +1,4 @@
+/* EXP function - Compute double precision exponential */
 /*
  * IBM Accurate Mathematical Library
  * written by International Business Machines Corp.
@@ -23,7 +24,7 @@
 /*           exp1                                                          */
 /*                                                                         */
 /* FILES NEEDED:dla.h endian.h mpa.h mydefs.h uexp.h                       */
-/*              mpa.c mpexp.x slowexp.c                                    */
+/*              mpa.c mpexp.x                                              */
 /*                                                                         */
 /* An ultimate exp routine. Given an IEEE double machine number x          */
 /* it computes the correctly rounded (to nearest) value of e^x             */
@@ -32,207 +33,239 @@
 /*                                                                         */
 /***************************************************************************/
 
+/*  IBM exp(x) replaced by following exp(x) in 2017. IBM exp1(x,xx) remains. */
+/*
+   exp(x)
+   Hybrid algorithm of Peter Tang's Table driven method (for large
+   arguments) and an accurate table (for small arguments).
+   Written by K.C. Ng, November 1988.
+   Method (large arguments):
+	1. Argument Reduction: given the input x, find r and integer k
+	   and j such that
+	             x = (k+j/32)*(ln2) + r,  |r| <= (1/64)*ln2
+
+	2. exp(x) = 2^k * (2^(j/32) + 2^(j/32)*expm1(r))
+	   a. expm1(r) is approximated by a polynomial:
+	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
+	      Here t1 = 1/2 exactly.
+	   b. 2^(j/32) is represented to twice double precision
+	      as TBL[2j]+TBL[2j+1].
+
+   Note: If divide were fast enough, we could use another approximation
+	 in 2.a:
+	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
+	      (for the same t1 and t2 as above)
+
+   Special cases:
+	exp(INF) is INF, exp(NaN) is NaN;
+	exp(-INF)=  0;
+	for finite argument, only exp(0)=1 is exact.
+
+   Accuracy:
+	According to an error analysis, the error is always less than
+	an ulp (unit in the last place).  The largest errors observed
+	are less than 0.55 ulp for normal results and less than 0.75 ulp
+	for subnormal results.
+
+   Misc. info.
+	For IEEE double
+		if x >  7.09782712893383973096e+02 then exp(x) overflow
+		if x < -7.45133219101941108420e+02 then exp(x) underflow
+ */
+
 #include <math.h>
+#include <math-svid-compat.h>
+#include <math_private.h>
+#include <errno.h>
 #include "endian.h"
 #include "uexp.h"
+#include "uexp.tbl"
 #include "mydefs.h"
 #include "MathLib.h"
-#include "uexp.tbl"
-#include <math_private.h>
 #include <fenv.h>
 #include <float.h>
 
-#ifndef SECTION
-# define SECTION
-#endif
+extern double __ieee754_exp (double);
+
+#include "eexp.tbl"
+
+static const double
+  half = 0.5,
+  one = 1.0;
 
-double __slowexp (double);
 
-/* An ultimate exp routine. Given an IEEE double machine number x it computes
-   the correctly rounded (to nearest) value of e^x.  */
 double
-SECTION
-__ieee754_exp (double x)
+__ieee754_exp (double x_arg)
 {
-  double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
-  mynumber junk1, junk2, binexp = {{0, 0}};
-  int4 i, j, m, n, ex;
+  double z, t;
   double retval;
-
+  int hx, ix, k, j, m;
+  int fe_val;
+  union
   {
-    SET_RESTORE_ROUND (FE_TONEAREST);
-
-    junk1.x = x;
-    m = junk1.i[HIGH_HALF];
-    n = m & hugeint;
-
-    if (n > smallint && n < bigint)
-      {
-	y = x * log2e.x + three51.x;
-	bexp = y - three51.x;	/*  multiply the result by 2**bexp        */
-
-	junk1.x = y;
-
-	eps = bexp * ln_two2.x;	/* x = bexp*ln(2) + t - eps               */
-	t = x - bexp * ln_two1.x;
-
-	y = t + three33.x;
-	base = y - three33.x;	/* t rounded to a multiple of 2**-18      */
-	junk2.x = y;
-	del = (t - base) - eps;	/*  x = bexp*ln(2) + base + del           */
-	eps = del + del * del * (p3.x * del + p2.x);
-
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;
-
-	i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-	j = (junk2.i[LOW_HALF] & 511) << 1;
-
-	al = coar.x[i] * fine.x[j];
-	bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	       + coar.x[i + 1] * fine.x[j + 1]);
-
-	rem = (bet + bet * eps) + al * eps;
-	res = al + rem;
-	cor = (al - res) + rem;
-	if (res == (res + cor * err_0))
-	  {
-	    retval = res * binexp.x;
-	    goto ret;
+    int i_part[2];
+    double x;
+  } xx;
+  union
+  {
+    int y_part[2];
+    double y;
+  } yy;
+  xx.x = x_arg;
+
+  ix = xx.i_part[HIGH_HALF];
+  hx = ix & ~0x80000000;
+
+  if (hx < 0x3ff0a2b2)
+    {				/* |x| < 3/2 ln 2 */
+      if (hx < 0x3f862e42)
+	{			/* |x| < 1/64 ln 2 */
+	  if (hx < 0x3ed00000)
+	    {			/* |x| < 2^-18 */
+	      /* raise inexact if x != 0 */
+	      if (hx < 0x3e300000)
+		{
+		  retval = one + xx.x;
+		  return (retval);
+		}
+	      retval = one + xx.x * (one + half * xx.x);
+	      return (retval);
+	    }
+	  /* 
+	     Use FE_TONEAREST rounding mode for computing yy.y 
+	     Avoid set/reset of rounding mode if already in FE_TONEAREST mode
+	  */
+	  fe_val = get_rounding_mode ();
+	  if (fe_val == FE_TONEAREST) {
+	    t = xx.x * xx.x;
+	    yy.y = xx.x + (t * (half + xx.x * t2.x) +
+			   (t * t) * (t3.x + xx.x * t4.x + t * t5.x));
+	    retval = one + yy.y;
+	  } else {
+	    libc_fesetround (FE_TONEAREST);
+	    t = xx.x * xx.x;
+	    yy.y = xx.x + (t * (half + xx.x * t2.x) +
+			   (t * t) * (t3.x + xx.x * t4.x + t * t5.x));
+	    retval = one + yy.y;
+	    libc_fesetround (fe_val);
 	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto ret;
-	  }			/*if error is over bound */
-      }
+	  return (retval);
+	}
 
-    if (n <= smallint)
-      {
-	retval = 1.0;
-	goto ret;
+      /* find the multiple of 2^-6 nearest x */
+      k = hx >> 20;
+      j = (0x00100000 | (hx & 0x000fffff)) >> (0x40c - k);
+      j = (j - 1) & ~1;
+      if (ix < 0)
+	j += 134;
+      /* 
+	 Use FE_TONEAREST rounding mode for computing yy.y 
+	 Avoid set/reset of rounding mode if already in FE_TONEAREST mode
+      */
+      fe_val = get_rounding_mode ();
+      if (fe_val == FE_TONEAREST) {
+	z = xx.x - TBL2.x[j];
+	t = z * z;
+	yy.y = z + (t * (half + (z * t2.x)) +
+		    (t * t) * (t3.x + z * t4.x + t * t5.x));
+	retval = TBL2.x[j + 1] + TBL2.x[j + 1] * yy.y;
+      } else {
+	libc_fesetround (FE_TONEAREST);
+	z = xx.x - TBL2.x[j];
+	t = z * z;
+	yy.y = z + (t * (half + (z * t2.x)) +
+		    (t * t) * (t3.x + z * t4.x + t * t5.x));
+	retval = TBL2.x[j + 1] + TBL2.x[j + 1] * yy.y;
+	libc_fesetround (fe_val);
       }
+      return (retval);
+    }
 
-    if (n >= badint)
-      {
-	if (n > infint)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/* x is NaN */
-	if (n < infint)
-	  {
-	    if (x > 0)
-	      goto ret_huge;
-	    else
-	      goto ret_tiny;
-	  }
-	/* x is finite,  cause either overflow or underflow  */
-	if (junk1.i[LOW_HALF] != 0)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/*  x is NaN  */
-	retval = (x > 0) ? inf.x : zero;	/* |x| = inf;  return either inf or 0 */
-	goto ret;
-      }
+  if (hx >= 0x40862e42)
+    {				/* x is large, infinite, or nan */
+      if (hx >= 0x7ff00000)
+	{
+	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
+	    return (zero);	/* exp(-inf) = 0 */
+	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf */
+	}
+      if (xx.x > threshold1.x)
+	{			/* set overflow error condition */
+	  retval = hhuge * hhuge;
+	  return retval;
+	} 
+      if (-xx.x > threshold2.x)
+	{			/* set underflow error condition */
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	  retval = zero;
+	  return retval;
+	}
+    }
 
-    y = x * log2e.x + three51.x;
-    bexp = y - three51.x;
-    junk1.x = y;
-    eps = bexp * ln_two2.x;
-    t = x - bexp * ln_two1.x;
-    y = t + three33.x;
-    base = y - three33.x;
-    junk2.x = y;
-    del = (t - base) - eps;
-    eps = del + del * del * (p3.x * del + p2.x);
-    i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-    j = (junk2.i[LOW_HALF] & 511) << 1;
-    al = coar.x[i] * fine.x[j];
-    bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	   + coar.x[i + 1] * fine.x[j + 1]);
-    rem = (bet + bet * eps) + al * eps;
-    res = al + rem;
-    cor = (al - res) + rem;
-    if (m >> 31)
-      {
-	ex = junk1.i[LOW_HALF];
-	if (res < 1.0)
-	  {
-	    res += res;
-	    cor += cor;
-	    ex -= 1;
-	  }
-	if (ex >= -1022)
-	  {
-	    binexp.i[HIGH_HALF] = (1023 + ex) << 20;
-	    if (res == (res + cor * err_0))
-	      {
-		retval = res * binexp.x;
-		goto ret;
-	      }
-	    else
-	      {
-		retval = __slowexp (x);
-		goto check_uflow_ret;
-	      }			/*if error is over bound */
-	  }
-	ex = -(1022 + ex);
-	binexp.i[HIGH_HALF] = (1023 - ex) << 20;
-	res *= binexp.x;
-	cor *= binexp.x;
-	eps = 1.0000000001 + err_0 * binexp.x;
-	t = 1.0 + res;
-	y = ((1.0 - t) + res) + cor;
-	res = t + y;
-	cor = (t - res) + y;
-	if (res == (res + eps * cor))
-	  {
-	    binexp.i[HIGH_HALF] = 0x00100000;
-	    retval = (res - 1.0) * binexp.x;
-	    goto check_uflow_ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto check_uflow_ret;
-	  }			/*   if error is over bound    */
-      check_uflow_ret:
-	if (retval < DBL_MIN)
-	  {
-	    double force_underflow = tiny * tiny;
-	    math_force_eval (force_underflow);
-	  }
-	if (retval == 0)
-	  goto ret_tiny;
-	goto ret;
-      }
+  /* 
+     Use FE_TONEAREST rounding mode for computing yy.y 
+     Avoid set/reset of rounding mode if already in FE_TONEAREST mode
+  */
+  fe_val = get_rounding_mode ();
+  if (fe_val == FE_TONEAREST) {
+    t = invln2_32.x * xx.x;
+    if (ix < 0)
+      t -= half;
     else
-      {
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
-	if (res == (res + cor * err_0))
-	  retval = res * binexp.x * t256.x;
-	else
-	  retval = __slowexp (x);
-	if (isinf (retval))
-	  goto ret_huge;
-	else
-	  goto ret;
-      }
+      t += half;
+    k = (int) t;
+    j = (k & 0x1f) << 1;
+    m = k >> 5;
+    z = (xx.x - k * ln2_32hi.x) - k * ln2_32lo.x;
+
+    /* z is now in primary range */
+    t = z * z;
+    yy.y = z + (t * (half + z * t2.x) + 
+		(t * t) * (t3.x + z * t4.x + t * t5.x));
+    yy.y = TBL.x[j] + (TBL.x[j + 1] + TBL.x[j] * yy.y);
+  } else {
+    libc_fesetround (FE_TONEAREST);
+    t = invln2_32.x * xx.x;
+    if (ix < 0)
+      t -= half;
+    else
+      t += half;
+    k = (int) t;
+    j = (k & 0x1f) << 1;
+    m = k >> 5;
+    z = (xx.x - k * ln2_32hi.x) - k * ln2_32lo.x;
+
+    /* z is now in primary range */
+    t = z * z;
+    yy.y = z + (t * (half + z * t2.x) +
+		(t * t) * (t3.x + z * t4.x + t * t5.x));
+    yy.y = TBL.x[j] + (TBL.x[j + 1] + TBL.x[j] * yy.y);
+    libc_fesetround (fe_val);
   }
-ret:
-  return retval;
 
- ret_huge:
-  return hhuge * hhuge;
-
- ret_tiny:
-  return tiny * tiny;
+  if (m < -1021)
+    {
+      yy.y_part[HIGH_HALF] += (m + 54) << 20;
+      retval = twom54.x * yy.y;
+      if (retval < DBL_MIN)
+	{
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	}
+      return retval;
+    }
+  yy.y_part[HIGH_HALF] += m << 20;
+  return (yy.y);
 }
 #ifndef __ieee754_exp
 strong_alias (__ieee754_exp, __exp_finite)
 #endif
 
+#ifndef SECTION
+# define SECTION
+#endif
+
 /* Compute e^(x+xx).  The routine also receives bound of error of previous
    calculation.  If after computing exp the error exceeds the allowed bounds,
    the routine returns a non-positive number.  Otherwise it returns the
diff --git a/sysdeps/ieee754/dbl-64/e_pow.c b/sysdeps/ieee754/dbl-64/e_pow.c
index 9f6439e..2eb8dbf 100644
--- a/sysdeps/ieee754/dbl-64/e_pow.c
+++ b/sysdeps/ieee754/dbl-64/e_pow.c
@@ -25,7 +25,7 @@
 /*             log1                                                        */
 /*             checkint                                                    */
 /* FILES NEEDED: dla.h endian.h mpa.h mydefs.h                             */
-/*               halfulp.c mpexp.c mplog.c slowexp.c slowpow.c mpa.c       */
+/*               halfulp.c mpexp.c mplog.c slowpow.c mpa.c                 */
 /*                          uexp.c  upow.c				   */
 /*               root.tbl uexp.tbl upow.tbl                                */
 /* An ultimate power routine. Given two IEEE double machine numbers y,x    */
diff --git a/sysdeps/ieee754/dbl-64/eexp.tbl b/sysdeps/ieee754/dbl-64/eexp.tbl
new file mode 100644
index 0000000..cecb8d4
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/eexp.tbl
@@ -0,0 +1,440 @@
+/* EXP function tables - for use in computing double precision exponential
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef BIG_ENDI
+static const union {
+  int i[128];
+  double x[64];
+} TBL = { .i = {
+ 0x3FF00000, 0x00000000, 0x00000000, 0x00000000,
+ 0x3FF059B0, 0xD3158574, 0x3C8D73E2, 0xA475B465,
+ 0x3FF0B558, 0x6CF9890F, 0x3C98A62E, 0x4ADC610A,
+ 0x3FF11301, 0xD0125B51, 0xBC96C510, 0x39449B3A,
+ 0x3FF172B8, 0x3C7D517B, 0xBC819041, 0xB9D78A76,
+ 0x3FF1D487, 0x3168B9AA, 0x3C9E016E, 0x00A2643C,
+ 0x3FF2387A, 0x6E756238, 0x3C99B07E, 0xB6C70573,
+ 0x3FF29E9D, 0xF51FDEE1, 0x3C8612E8, 0xAFAD1255,
+ 0x3FF306FE, 0x0A31B715, 0x3C86F46A, 0xD23182E4,
+ 0x3FF371A7, 0x373AA9CB, 0xBC963AEA, 0xBF42EAE2,
+ 0x3FF3DEA6, 0x4C123422, 0x3C8ADA09, 0x11F09EBC,
+ 0x3FF44E08, 0x6061892D, 0x3C489B7A, 0x04EF80D0,
+ 0x3FF4BFDA, 0xD5362A27, 0x3C7D4397, 0xAFEC42E2,
+ 0x3FF5342B, 0x569D4F82, 0xBC807ABE, 0x1DB13CAC,
+ 0x3FF5AB07, 0xDD485429, 0x3C96324C, 0x054647AD,
+ 0x3FF6247E, 0xB03A5585, 0xBC9383C1, 0x7E40B497,
+ 0x3FF6A09E, 0x667F3BCD, 0xBC9BDD34, 0x13B26456,
+ 0x3FF71F75, 0xE8EC5F74, 0xBC816E47, 0x86887A99,
+ 0x3FF7A114, 0x73EB0187, 0xBC841577, 0xEE04992F,
+ 0x3FF82589, 0x994CCE13, 0xBC9D4C1D, 0xD41532D8,
+ 0x3FF8ACE5, 0x422AA0DB, 0x3C96E9F1, 0x56864B27,
+ 0x3FF93737, 0xB0CDC5E5, 0xBC675FC7, 0x81B57EBC,
+ 0x3FF9C491, 0x82A3F090, 0x3C7C7C46, 0xB071F2BE,
+ 0x3FFA5503, 0xB23E255D, 0xBC9D2F6E, 0xDB8D41E1,
+ 0x3FFAE89F, 0x995AD3AD, 0x3C97A1CD, 0x345DCC81,
+ 0x3FFB7F76, 0xF2FB5E47, 0xBC75584F, 0x7E54AC3B,
+ 0x3FFC199B, 0xDD85529C, 0x3C811065, 0x895048DD,
+ 0x3FFCB720, 0xDCEF9069, 0x3C7503CB, 0xD1E949DB,
+ 0x3FFD5818, 0xDCFBA487, 0x3C82ED02, 0xD75B3706,
+ 0x3FFDFC97, 0x337B9B5F, 0xBC91A5CD, 0x4F184B5C,
+ 0x3FFEA4AF, 0xA2A490DA, 0xBC9E9C23, 0x179C2893,
+ 0x3FFF5076, 0x5B6E4540, 0x3C99D3E1, 0x2DD8A18B } };
+
+/*
+   For i = 0, ..., 66,
+     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+
+   For i = 67, ..., 133,
+     TBL2[2*i] is a double precision number near -(i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+*/
+static const union {
+  int i[536];
+  double x[268];
+} TBL2 = { .i = {
+ 0x3F8FFFFF, 0xFFFFFC82, 0x3FF04080, 0xAB55DE32,
+ 0x3F9FFFFF, 0xFFFFFFDB, 0x3FF08205, 0x601127EC,
+ 0x3FA80000, 0x000000A0, 0x3FF0C492, 0x36829E91,
+ 0x3FAFFFFF, 0xFFFFFF79, 0x3FF1082B, 0x577D34E9,
+ 0x3FB3FFFF, 0xFFFFFFFC, 0x3FF14CD4, 0xFC989CD6,
+ 0x3FB80000, 0x00000060, 0x3FF19293, 0x7074E0D4,
+ 0x3FBC0000, 0x00000061, 0x3FF1D96B, 0x0EFF0E80,
+ 0x3FBFFFFF, 0xFFFFFFD6, 0x3FF22160, 0x45B6F5CA,
+ 0x3FC1FFFF, 0xFFFFFF58, 0x3FF26A77, 0x93F6014C,
+ 0x3FC3FFFF, 0xFFFFFF75, 0x3FF2B4B5, 0x8B372C65,
+ 0x3FC5FFFF, 0xFFFFFF00, 0x3FF3001E, 0xCF601AD1,
+ 0x3FC80000, 0x00000020, 0x3FF34CB8, 0x170B583A,
+ 0x3FC9FFFF, 0xFFFFA629, 0x3FF39A86, 0x2BD3B344,
+ 0x3FCC0000, 0x0000000F, 0x3FF3E98D, 0xEAA11DCE,
+ 0x3FCE0000, 0x0000007F, 0x3FF439D4, 0x43F5F16D,
+ 0x3FD00000, 0x00000072, 0x3FF48B5E, 0x3C3E81AB,
+ 0x3FD0FFFF, 0xFFFFFECA, 0x3FF4DE30, 0xEC211DFB,
+ 0x3FD1FFFF, 0xFFFFFF8F, 0x3FF53251, 0x80CFACD2,
+ 0x3FD30000, 0x0000003B, 0x3FF587C5, 0x3C5A7B04,
+ 0x3FD40000, 0x00000034, 0x3FF5DE91, 0x76046007,
+ 0x3FD4FFFF, 0xFFFFFF89, 0x3FF636BB, 0x9A98322F,
+ 0x3FD5FFFF, 0xFFFFFFE7, 0x3FF69049, 0x2CBF942A,
+ 0x3FD6FFFF, 0xFFFFFF78, 0x3FF6EB3F, 0xC55B1E45,
+ 0x3FD7FFFF, 0xFFFFFF65, 0x3FF747A5, 0x13DBEF32,
+ 0x3FD8FFFF, 0xFFFFFFD5, 0x3FF7A57E, 0xDE9EA22E,
+ 0x3FD9FFFF, 0xFFFFFF6E, 0x3FF804D3, 0x0347B50F,
+ 0x3FDAFFFF, 0xFFFFFFC3, 0x3FF865A7, 0x772164AE,
+ 0x3FDC0000, 0x00000053, 0x3FF8C802, 0x477B0030,
+ 0x3FDD0000, 0x0000004D, 0x3FF92BE9, 0x9A09BF1E,
+ 0x3FDE0000, 0x00000096, 0x3FF99163, 0xAD4B1E08,
+ 0x3FDEFFFF, 0xFFFFFEFA, 0x3FF9F876, 0xD8E8C4FC,
+ 0x3FDFFFFF, 0xFFFFFFD0, 0x3FFA6129, 0x8E1E0688,
+ 0x3FE08000, 0x00000002, 0x3FFACB82, 0x581EEE56,
+ 0x3FE10000, 0x0000001F, 0x3FFB3787, 0xDC80F979,
+ 0x3FE17FFF, 0xFFFFFFF8, 0x3FFBA540, 0xDBA56E4F,
+ 0x3FE1FFFF, 0xFFFFFFFA, 0x3FFC14B4, 0x31256441,
+ 0x3FE27FFF, 0xFFFFFFC4, 0x3FFC85E8, 0xD43F7C9B,
+ 0x3FE2FFFF, 0xFFFFFFFD, 0x3FFCF8E5, 0xD84758A6,
+ 0x3FE38000, 0x0000001F, 0x3FFD6DB2, 0x6D16CD84,
+ 0x3FE3FFFF, 0xFFFFFFD8, 0x3FFDE455, 0xDF80E39B,
+ 0x3FE48000, 0x00000052, 0x3FFE5CD7, 0x99C6A59C,
+ 0x3FE4FFFF, 0xFFFFFFC8, 0x3FFED73F, 0x240DC10C,
+ 0x3FE58000, 0x00000013, 0x3FFF5394, 0x24D90F71,
+ 0x3FE5FFFF, 0xFFFFFFBC, 0x3FFFD1DE, 0x6182F885,
+ 0x3FE68000, 0x0000002D, 0x40002912, 0xDF5CE741,
+ 0x3FE70000, 0x00000040, 0x40006A39, 0x207F0A2A,
+ 0x3FE78000, 0x0000004F, 0x4000AC66, 0x0691652A,
+ 0x3FE7FFFF, 0xFFFFFF6F, 0x4000EF9D, 0xB467DCAB,
+ 0x3FE87FFF, 0xFFFFFFE5, 0x400133E4, 0x5D82E943,
+ 0x3FE90000, 0x00000035, 0x4001793E, 0x4652CC6D,
+ 0x3FE97FFF, 0xFFFFFFB3, 0x4001BFAF, 0xC47BDA48,
+ 0x3FEA0000, 0x00000000, 0x4002073D, 0x3F1BD518,
+ 0x3FEA8000, 0x0000004A, 0x40024FEB, 0x2F105CE2,
+ 0x3FEAFFFF, 0xFFFFFFED, 0x400299BE, 0x1F3E7F11,
+ 0x3FEB7FFF, 0xFFFFFFFB, 0x4002E4BA, 0xACDB6611,
+ 0x3FEC0000, 0x0000001D, 0x400330E5, 0x87B62B39,
+ 0x3FEC8000, 0x00000079, 0x40037E43, 0x7282D538,
+ 0x3FECFFFF, 0xFFFFFF51, 0x4003CCD9, 0x43268248,
+ 0x3FED7FFF, 0xFFFFFF74, 0x40041CAB, 0xE304CADC,
+ 0x3FEE0000, 0x00000011, 0x40046DC0, 0x4F4E5343,
+ 0x3FEE8000, 0x0000001E, 0x4004C01B, 0x9950A124,
+ 0x3FEEFFFF, 0xFFFFFF9E, 0x400513C2, 0xE6C73196,
+ 0x3FEF7FFF, 0xFFFFFFED, 0x400568BB, 0x722DD586,
+ 0x3FF00000, 0x00000034, 0x4005BF0A, 0x8B1457B0,
+ 0x3FF03FFF, 0xFFFFFFE2, 0x400616B5, 0x967376DF,
+ 0x3FF07FFF, 0xFFFFFF4B, 0x40066FC2, 0x0F0337A9,
+ 0x3FF0BFFF, 0xFFFFFFFD, 0x4006CA35, 0x859290F5,
+ 0xBF8FFFFF, 0xFFFFFFE4, 0x3FEF80FE, 0xABFEEFA5,
+ 0xBF9FFFFF, 0xFFFFFB0B, 0x3FEF03F5, 0x6A88B5FE,
+ 0xBFA7FFFF, 0xFFFFFFA7, 0x3FEE88DC, 0x6AFECFC5,
+ 0xBFAFFFFF, 0xFFFFFEA8, 0x3FEE0FAB, 0xFBC702B8,
+ 0xBFB3FFFF, 0xFFFFFFB3, 0x3FED985C, 0x89D041AC,
+ 0xBFB7FFFF, 0xFFFFFFE3, 0x3FED22E6, 0xA0197C06,
+ 0xBFBBFFFF, 0xFFFFFF9A, 0x3FECAF42, 0xE73A4C89,
+ 0xBFBFFFFF, 0xFFFFFF98, 0x3FEC3D6A, 0x24ED822D,
+ 0xBFC1FFFF, 0xFFFFFFE9, 0x3FEBCD55, 0x3B9D7B67,
+ 0xBFC3FFFF, 0xFFFFFFE0, 0x3FEB5EFD, 0x29F24C2D,
+ 0xBFC5FFFF, 0xFFFFF553, 0x3FEAF25B, 0x0A61A9F4,
+ 0xBFC7FFFF, 0xFFFFFF8B, 0x3FEA8768, 0x12C08794,
+ 0xBFC9FFFF, 0xFFFFFE51, 0x3FEA1E1D, 0x93D68828,
+ 0xBFCBFFFF, 0xFFFFFF6E, 0x3FE9B674, 0xF8F2F3F5,
+ 0xBFCDFFFF, 0xFFFFFF7F, 0x3FE95067, 0xC7837A0C,
+ 0xBFCFFFFF, 0xFFFFFF7A, 0x3FE8EBEF, 0x9EAC8225,
+ 0xBFD0FFFF, 0xFFFFFFFE, 0x3FE88906, 0x36E31F55,
+ 0xBFD1FFFF, 0xFFFFFF41, 0x3FE827A5, 0x6188975E,
+ 0xBFD2FFFF, 0xFFFFFFBA, 0x3FE7C7C7, 0x08877656,
+ 0xBFD3FFFF, 0xFFFFFFF8, 0x3FE76965, 0x2DF22F81,
+ 0xBFD4FFFF, 0xFFFFFF90, 0x3FE70C79, 0xEBA33C2F,
+ 0xBFD5FFFF, 0xFFFFFFDB, 0x3FE6B0FF, 0x72DEB8AA,
+ 0xBFD6FFFF, 0xFFFFFF9A, 0x3FE656F0, 0x0BF5798E,
+ 0xBFD7FFFF, 0xFFFFFF9F, 0x3FE5FE46, 0x15E98EB0,
+ 0xBFD8FFFF, 0xFFFFFFEE, 0x3FE5A6FC, 0x061433CE,
+ 0xBFD9FFFF, 0xFFFFFC4A, 0x3FE5510C, 0x67CD26CD,
+ 0xBFDAFFFF, 0xFFFFFF30, 0x3FE4FC71, 0xDC13566B,
+ 0xBFDBFFFF, 0xFFFFFFF0, 0x3FE4A927, 0x1936FD0E,
+ 0xBFDCFFFF, 0xFFFFFFF3, 0x3FE45726, 0xEA84FB8C,
+ 0xBFDDFFFF, 0xFFFFFFF3, 0x3FE4066C, 0x2FF3912B,
+ 0xBFDEFFFF, 0xFFFFFF80, 0x3FE3B6F1, 0xDDD05AB9,
+ 0xBFDFFFFF, 0xFFFFFFDF, 0x3FE368B2, 0xFC6F9614,
+ 0xBFE08000, 0x00000000, 0x3FE31BAA, 0xA7DCA843,
+ 0xBFE0FFFF, 0xFFFFFFA4, 0x3FE2CFD4, 0x0F8BDCE4,
+ 0xBFE17FFF, 0xFFFFFF0A, 0x3FE2852A, 0x760D5CE7,
+ 0xBFE20000, 0x00000000, 0x3FE23BA9, 0x30C1568B,
+ 0xBFE27FFF, 0xFFFFFFBB, 0x3FE1F34B, 0xA78D568D,
+ 0xBFE2FFFF, 0xFFFFFE32, 0x3FE1AC0D, 0x5492C1DB,
+ 0xBFE37FFF, 0xFFFFF042, 0x3FE165E9, 0xC3E67EF2,
+ 0xBFE3FFFF, 0xFFFFFF77, 0x3FE120DC, 0x93499431,
+ 0xBFE47FFF, 0xFFFFFF6B, 0x3FE0DCE1, 0x71E34ECE,
+ 0xBFE4FFFF, 0xFFFFFFF1, 0x3FE099F4, 0x1FFBE588,
+ 0xBFE57FFF, 0xFFFFFE02, 0x3FE05810, 0x6EB8A7AE,
+ 0xBFE5FFFF, 0xFFFFFFE5, 0x3FE01732, 0x3FD9002E,
+ 0xBFE67FFF, 0xFFFFFFB0, 0x3FDFAEAB, 0x0AE9386C,
+ 0xBFE6FFFF, 0xFFFFFFB2, 0x3FDF30EC, 0x837503D7,
+ 0xBFE77FFF, 0xFFFFFF7F, 0x3FDEB521, 0x0D627133,
+ 0xBFE7FFFF, 0xFFFFFFE8, 0x3FDE3B40, 0xEBEFCD95,
+ 0xBFE87FFF, 0xFFFFFFC8, 0x3FDDC344, 0x8110DAE2,
+ 0xBFE8FFFF, 0xFFFFFB30, 0x3FDD4D24, 0x4CF4EF06,
+ 0xBFE97FFF, 0xFFFFFFEF, 0x3FDCD8D8, 0xED8EE395,
+ 0xBFE9FFFF, 0xFFFFFFA7, 0x3FDC665B, 0x1E1F1E5C,
+ 0xBFEA7FFF, 0xFFFFFFDC, 0x3FDBF5A3, 0xB6BF18D6,
+ 0xBFEAFFFF, 0xFFFFFF95, 0x3FDB86AB, 0xABEEF93B,
+ 0xBFEB7FFF, 0xFFFFFFCB, 0x3FDB196C, 0x0E24D256,
+ 0xBFEBFFFF, 0xFFFFFF32, 0x3FDAADDE, 0x095DADF7,
+ 0xBFEC7FFF, 0xFFFFFF6A, 0x3FDA43FA, 0xE4B047C9,
+ 0xBFECFFFF, 0xFFFFFFB6, 0x3FD9DBBC, 0x01E182A4,
+ 0xBFED7FFF, 0xFFFFFFCA, 0x3FD9751A, 0xDCFA81EC,
+ 0xBFEDFFFF, 0xFFFFFFCD, 0x3FD91011, 0x0BE0699E,
+ 0xBFEE7FFF, 0xFFFFFFFB, 0x3FD8AC98, 0x3DEDBC69,
+ 0xBFEEFFFF, 0xFFFFFF88, 0x3FD84AAA, 0x3B8D51A9,
+ 0xBFEF7FFF, 0xFFFFFFBB, 0x3FD7EA40, 0xE5D6D92E,
+ 0xBFEFFFFF, 0xFFFFFFDB, 0x3FD78B56, 0x362CEF53,
+ 0xBFF03FFF, 0xFFFFFF00, 0x3FD72DE4, 0x3DDCB1F2,
+ 0xBFF07FFF, 0xFFFFFE6F, 0x3FD6D1E5, 0x25BED085,
+ 0xBFF0BFFF, 0xFFFFFFD6, 0x3FD67753, 0x2DDA1C57 } };
+
+#else
+#ifdef LITTLE_ENDI
+
+static const union {
+  int i[128];
+  double x[64];
+} TBL = { .i = {
+ 0x00000000, 0x3FF00000, 0x00000000, 0x00000000,
+ 0xD3158574, 0x3FF059B0, 0xA475B465, 0x3C8D73E2,
+ 0x6CF9890F, 0x3FF0B558, 0x4ADC610A, 0x3C98A62E,
+ 0xD0125B51, 0x3FF11301, 0x39449B3A, 0xBC96C510,
+ 0x3C7D517B, 0x3FF172B8, 0xB9D78A76, 0xBC819041,
+ 0x3168B9AA, 0x3FF1D487, 0x00A2643C, 0x3C9E016E,
+ 0x6E756238, 0x3FF2387A, 0xB6C70573, 0x3C99B07E,
+ 0xF51FDEE1, 0x3FF29E9D, 0xAFAD1255, 0x3C8612E8,
+ 0x0A31B715, 0x3FF306FE, 0xD23182E4, 0x3C86F46A,
+ 0x373AA9CB, 0x3FF371A7, 0xBF42EAE2, 0xBC963AEA,
+ 0x4C123422, 0x3FF3DEA6, 0x11F09EBC, 0x3C8ADA09,
+ 0x6061892D, 0x3FF44E08, 0x04EF80D0, 0x3C489B7A,
+ 0xD5362A27, 0x3FF4BFDA, 0xAFEC42E2, 0x3C7D4397,
+ 0x569D4F82, 0x3FF5342B, 0x1DB13CAC, 0xBC807ABE,
+ 0xDD485429, 0x3FF5AB07, 0x054647AD, 0x3C96324C,
+ 0xB03A5585, 0x3FF6247E, 0x7E40B497, 0xBC9383C1,
+ 0x667F3BCD, 0x3FF6A09E, 0x13B26456, 0xBC9BDD34,
+ 0xE8EC5F74, 0x3FF71F75, 0x86887A99, 0xBC816E47,
+ 0x73EB0187, 0x3FF7A114, 0xEE04992F, 0xBC841577,
+ 0x994CCE13, 0x3FF82589, 0xD41532D8, 0xBC9D4C1D,
+ 0x422AA0DB, 0x3FF8ACE5, 0x56864B27, 0x3C96E9F1,
+ 0xB0CDC5E5, 0x3FF93737, 0x81B57EBC, 0xBC675FC7,
+ 0x82A3F090, 0x3FF9C491, 0xB071F2BE, 0x3C7C7C46,
+ 0xB23E255D, 0x3FFA5503, 0xDB8D41E1, 0xBC9D2F6E,
+ 0x995AD3AD, 0x3FFAE89F, 0x345DCC81, 0x3C97A1CD,
+ 0xF2FB5E47, 0x3FFB7F76, 0x7E54AC3B, 0xBC75584F,
+ 0xDD85529C, 0x3FFC199B, 0x895048DD, 0x3C811065,
+ 0xDCEF9069, 0x3FFCB720, 0xD1E949DB, 0x3C7503CB,
+ 0xDCFBA487, 0x3FFD5818, 0xD75B3706, 0x3C82ED02,
+ 0x337B9B5F, 0x3FFDFC97, 0x4F184B5C, 0xBC91A5CD,
+ 0xA2A490DA, 0x3FFEA4AF, 0x179C2893, 0xBC9E9C23,
+ 0x5B6E4540, 0x3FFF5076, 0x2DD8A18B, 0x3C99D3E1 } };
+/*
+   For i = 0, ..., 66,
+     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+
+   For i = 67, ..., 133,
+     TBL2[2*i] is a double precision number near -(i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+*/
+static const union {
+  int i[536];
+  double x[268];
+} TBL2 = { .i = {
+ 0xFFFFFC82, 0x3F8FFFFF, 0xAB55DE32, 0x3FF04080,
+ 0xFFFFFFDB, 0x3F9FFFFF, 0x601127EC, 0x3FF08205,
+ 0x000000A0, 0x3FA80000, 0x36829E91, 0x3FF0C492,
+ 0xFFFFFF79, 0x3FAFFFFF, 0x577D34E9, 0x3FF1082B,
+ 0xFFFFFFFC, 0x3FB3FFFF, 0xFC989CD6, 0x3FF14CD4,
+ 0x00000060, 0x3FB80000, 0x7074E0D4, 0x3FF19293,
+ 0x00000061, 0x3FBC0000, 0x0EFF0E80, 0x3FF1D96B,
+ 0xFFFFFFD6, 0x3FBFFFFF, 0x45B6F5CA, 0x3FF22160,
+ 0xFFFFFF58, 0x3FC1FFFF, 0x93F6014C, 0x3FF26A77,
+ 0xFFFFFF75, 0x3FC3FFFF, 0x8B372C65, 0x3FF2B4B5,
+ 0xFFFFFF00, 0x3FC5FFFF, 0xCF601AD1, 0x3FF3001E,
+ 0x00000020, 0x3FC80000, 0x170B583A, 0x3FF34CB8,
+ 0xFFFFA629, 0x3FC9FFFF, 0x2BD3B344, 0x3FF39A86,
+ 0x0000000F, 0x3FCC0000, 0xEAA11DCE, 0x3FF3E98D,
+ 0x0000007F, 0x3FCE0000, 0x43F5F16D, 0x3FF439D4,
+ 0x00000072, 0x3FD00000, 0x3C3E81AB, 0x3FF48B5E,
+ 0xFFFFFECA, 0x3FD0FFFF, 0xEC211DFB, 0x3FF4DE30,
+ 0xFFFFFF8F, 0x3FD1FFFF, 0x80CFACD2, 0x3FF53251,
+ 0x0000003B, 0x3FD30000, 0x3C5A7B04, 0x3FF587C5,
+ 0x00000034, 0x3FD40000, 0x76046007, 0x3FF5DE91,
+ 0xFFFFFF89, 0x3FD4FFFF, 0x9A98322F, 0x3FF636BB,
+ 0xFFFFFFE7, 0x3FD5FFFF, 0x2CBF942A, 0x3FF69049,
+ 0xFFFFFF78, 0x3FD6FFFF, 0xC55B1E45, 0x3FF6EB3F,
+ 0xFFFFFF65, 0x3FD7FFFF, 0x13DBEF32, 0x3FF747A5,
+ 0xFFFFFFD5, 0x3FD8FFFF, 0xDE9EA22E, 0x3FF7A57E,
+ 0xFFFFFF6E, 0x3FD9FFFF, 0x0347B50F, 0x3FF804D3,
+ 0xFFFFFFC3, 0x3FDAFFFF, 0x772164AE, 0x3FF865A7,
+ 0x00000053, 0x3FDC0000, 0x477B0030, 0x3FF8C802,
+ 0x0000004D, 0x3FDD0000, 0x9A09BF1E, 0x3FF92BE9,
+ 0x00000096, 0x3FDE0000, 0xAD4B1E08, 0x3FF99163,
+ 0xFFFFFEFA, 0x3FDEFFFF, 0xD8E8C4FC, 0x3FF9F876,
+ 0xFFFFFFD0, 0x3FDFFFFF, 0x8E1E0688, 0x3FFA6129,
+ 0x00000002, 0x3FE08000, 0x581EEE56, 0x3FFACB82,
+ 0x0000001F, 0x3FE10000, 0xDC80F979, 0x3FFB3787,
+ 0xFFFFFFF8, 0x3FE17FFF, 0xDBA56E4F, 0x3FFBA540,
+ 0xFFFFFFFA, 0x3FE1FFFF, 0x31256441, 0x3FFC14B4,
+ 0xFFFFFFC4, 0x3FE27FFF, 0xD43F7C9B, 0x3FFC85E8,
+ 0xFFFFFFFD, 0x3FE2FFFF, 0xD84758A6, 0x3FFCF8E5,
+ 0x0000001F, 0x3FE38000, 0x6D16CD84, 0x3FFD6DB2,
+ 0xFFFFFFD8, 0x3FE3FFFF, 0xDF80E39B, 0x3FFDE455,
+ 0x00000052, 0x3FE48000, 0x99C6A59C, 0x3FFE5CD7,
+ 0xFFFFFFC8, 0x3FE4FFFF, 0x240DC10C, 0x3FFED73F,
+ 0x00000013, 0x3FE58000, 0x24D90F71, 0x3FFF5394,
+ 0xFFFFFFBC, 0x3FE5FFFF, 0x6182F885, 0x3FFFD1DE,
+ 0x0000002D, 0x3FE68000, 0xDF5CE741, 0x40002912,
+ 0x00000040, 0x3FE70000, 0x207F0A2A, 0x40006A39,
+ 0x0000004F, 0x3FE78000, 0x0691652A, 0x4000AC66,
+ 0xFFFFFF6F, 0x3FE7FFFF, 0xB467DCAB, 0x4000EF9D,
+ 0xFFFFFFE5, 0x3FE87FFF, 0x5D82E943, 0x400133E4,
+ 0x00000035, 0x3FE90000, 0x4652CC6D, 0x4001793E,
+ 0xFFFFFFB3, 0x3FE97FFF, 0xC47BDA48, 0x4001BFAF,
+ 0x00000000, 0x3FEA0000, 0x3F1BD518, 0x4002073D,
+ 0x0000004A, 0x3FEA8000, 0x2F105CE2, 0x40024FEB,
+ 0xFFFFFFED, 0x3FEAFFFF, 0x1F3E7F11, 0x400299BE,
+ 0xFFFFFFFB, 0x3FEB7FFF, 0xACDB6611, 0x4002E4BA,
+ 0x0000001D, 0x3FEC0000, 0x87B62B39, 0x400330E5,
+ 0x00000079, 0x3FEC8000, 0x7282D538, 0x40037E43,
+ 0xFFFFFF51, 0x3FECFFFF, 0x43268248, 0x4003CCD9,
+ 0xFFFFFF74, 0x3FED7FFF, 0xE304CADC, 0x40041CAB,
+ 0x00000011, 0x3FEE0000, 0x4F4E5343, 0x40046DC0,
+ 0x0000001E, 0x3FEE8000, 0x9950A124, 0x4004C01B,
+ 0xFFFFFF9E, 0x3FEEFFFF, 0xE6C73196, 0x400513C2,
+ 0xFFFFFFED, 0x3FEF7FFF, 0x722DD586, 0x400568BB,
+ 0x00000034, 0x3FF00000, 0x8B1457B0, 0x4005BF0A,
+ 0xFFFFFFE2, 0x3FF03FFF, 0x967376DF, 0x400616B5,
+ 0xFFFFFF4B, 0x3FF07FFF, 0x0F0337A9, 0x40066FC2,
+ 0xFFFFFFFD, 0x3FF0BFFF, 0x859290F5, 0x4006CA35,
+ 0xFFFFFFE4, 0xBF8FFFFF, 0xABFEEFA5, 0x3FEF80FE,
+ 0xFFFFFB0B, 0xBF9FFFFF, 0x6A88B5FE, 0x3FEF03F5,
+ 0xFFFFFFA7, 0xBFA7FFFF, 0x6AFECFC5, 0x3FEE88DC,
+ 0xFFFFFEA8, 0xBFAFFFFF, 0xFBC702B8, 0x3FEE0FAB,
+ 0xFFFFFFB3, 0xBFB3FFFF, 0x89D041AC, 0x3FED985C,
+ 0xFFFFFFE3, 0xBFB7FFFF, 0xA0197C06, 0x3FED22E6,
+ 0xFFFFFF9A, 0xBFBBFFFF, 0xE73A4C89, 0x3FECAF42,
+ 0xFFFFFF98, 0xBFBFFFFF, 0x24ED822D, 0x3FEC3D6A,
+ 0xFFFFFFE9, 0xBFC1FFFF, 0x3B9D7B67, 0x3FEBCD55,
+ 0xFFFFFFE0, 0xBFC3FFFF, 0x29F24C2D, 0x3FEB5EFD,
+ 0xFFFFF553, 0xBFC5FFFF, 0x0A61A9F4, 0x3FEAF25B,
+ 0xFFFFFF8B, 0xBFC7FFFF, 0x12C08794, 0x3FEA8768,
+ 0xFFFFFE51, 0xBFC9FFFF, 0x93D68828, 0x3FEA1E1D,
+ 0xFFFFFF6E, 0xBFCBFFFF, 0xF8F2F3F5, 0x3FE9B674,
+ 0xFFFFFF7F, 0xBFCDFFFF, 0xC7837A0C, 0x3FE95067,
+ 0xFFFFFF7A, 0xBFCFFFFF, 0x9EAC8225, 0x3FE8EBEF,
+ 0xFFFFFFFE, 0xBFD0FFFF, 0x36E31F55, 0x3FE88906,
+ 0xFFFFFF41, 0xBFD1FFFF, 0x6188975E, 0x3FE827A5,
+ 0xFFFFFFBA, 0xBFD2FFFF, 0x08877656, 0x3FE7C7C7,
+ 0xFFFFFFF8, 0xBFD3FFFF, 0x2DF22F81, 0x3FE76965,
+ 0xFFFFFF90, 0xBFD4FFFF, 0xEBA33C2F, 0x3FE70C79,
+ 0xFFFFFFDB, 0xBFD5FFFF, 0x72DEB8AA, 0x3FE6B0FF,
+ 0xFFFFFF9A, 0xBFD6FFFF, 0x0BF5798E, 0x3FE656F0,
+ 0xFFFFFF9F, 0xBFD7FFFF, 0x15E98EB0, 0x3FE5FE46,
+ 0xFFFFFFEE, 0xBFD8FFFF, 0x061433CE, 0x3FE5A6FC,
+ 0xFFFFFC4A, 0xBFD9FFFF, 0x67CD26CD, 0x3FE5510C,
+ 0xFFFFFF30, 0xBFDAFFFF, 0xDC13566B, 0x3FE4FC71,
+ 0xFFFFFFF0, 0xBFDBFFFF, 0x1936FD0E, 0x3FE4A927,
+ 0xFFFFFFF3, 0xBFDCFFFF, 0xEA84FB8C, 0x3FE45726,
+ 0xFFFFFFF3, 0xBFDDFFFF, 0x2FF3912B, 0x3FE4066C,
+ 0xFFFFFF80, 0xBFDEFFFF, 0xDDD05AB9, 0x3FE3B6F1,
+ 0xFFFFFFDF, 0xBFDFFFFF, 0xFC6F9614, 0x3FE368B2,
+ 0x00000000, 0xBFE08000, 0xA7DCA843, 0x3FE31BAA,
+ 0xFFFFFFA4, 0xBFE0FFFF, 0x0F8BDCE4, 0x3FE2CFD4,
+ 0xFFFFFF0A, 0xBFE17FFF, 0x760D5CE7, 0x3FE2852A,
+ 0x00000000, 0xBFE20000, 0x30C1568B, 0x3FE23BA9,
+ 0xFFFFFFBB, 0xBFE27FFF, 0xA78D568D, 0x3FE1F34B,
+ 0xFFFFFE32, 0xBFE2FFFF, 0x5492C1DB, 0x3FE1AC0D,
+ 0xFFFFF042, 0xBFE37FFF, 0xC3E67EF2, 0x3FE165E9,
+ 0xFFFFFF77, 0xBFE3FFFF, 0x93499431, 0x3FE120DC,
+ 0xFFFFFF6B, 0xBFE47FFF, 0x71E34ECE, 0x3FE0DCE1,
+ 0xFFFFFFF1, 0xBFE4FFFF, 0x1FFBE588, 0x3FE099F4,
+ 0xFFFFFE02, 0xBFE57FFF, 0x6EB8A7AE, 0x3FE05810,
+ 0xFFFFFFE5, 0xBFE5FFFF, 0x3FD9002E, 0x3FE01732,
+ 0xFFFFFFB0, 0xBFE67FFF, 0x0AE9386C, 0x3FDFAEAB,
+ 0xFFFFFFB2, 0xBFE6FFFF, 0x837503D7, 0x3FDF30EC,
+ 0xFFFFFF7F, 0xBFE77FFF, 0x0D627133, 0x3FDEB521,
+ 0xFFFFFFE8, 0xBFE7FFFF, 0xEBEFCD95, 0x3FDE3B40,
+ 0xFFFFFFC8, 0xBFE87FFF, 0x8110DAE2, 0x3FDDC344,
+ 0xFFFFFB30, 0xBFE8FFFF, 0x4CF4EF06, 0x3FDD4D24,
+ 0xFFFFFFEF, 0xBFE97FFF, 0xED8EE395, 0x3FDCD8D8,
+ 0xFFFFFFA7, 0xBFE9FFFF, 0x1E1F1E5C, 0x3FDC665B,
+ 0xFFFFFFDC, 0xBFEA7FFF, 0xB6BF18D6, 0x3FDBF5A3,
+ 0xFFFFFF95, 0xBFEAFFFF, 0xABEEF93B, 0x3FDB86AB,
+ 0xFFFFFFCB, 0xBFEB7FFF, 0x0E24D256, 0x3FDB196C,
+ 0xFFFFFF32, 0xBFEBFFFF, 0x095DADF7, 0x3FDAADDE,
+ 0xFFFFFF6A, 0xBFEC7FFF, 0xE4B047C9, 0x3FDA43FA,
+ 0xFFFFFFB6, 0xBFECFFFF, 0x01E182A4, 0x3FD9DBBC,
+ 0xFFFFFFCA, 0xBFED7FFF, 0xDCFA81EC, 0x3FD9751A,
+ 0xFFFFFFCD, 0xBFEDFFFF, 0x0BE0699E, 0x3FD91011,
+ 0xFFFFFFFB, 0xBFEE7FFF, 0x3DEDBC69, 0x3FD8AC98,
+ 0xFFFFFF88, 0xBFEEFFFF, 0x3B8D51A9, 0x3FD84AAA,
+ 0xFFFFFFBB, 0xBFEF7FFF, 0xE5D6D92E, 0x3FD7EA40,
+ 0xFFFFFFDB, 0xBFEFFFFF, 0x362CEF53, 0x3FD78B56,
+ 0xFFFFFF00, 0xBFF03FFF, 0x3DDCB1F2, 0x3FD72DE4,
+ 0xFFFFFE6F, 0xBFF07FFF, 0x25BED085, 0x3FD6D1E5,
+ 0xFFFFFFD6, 0xBFF0BFFF, 0x2DDA1C57, 0x3FD67753 } };
+
+#endif
+#endif
+
+
+#ifdef BIG_ENDI
+
+static const mynumber 
+/* Following three values used to scale x to primary range */
+  invln2_32 = {{0x40471547, 0x652b82fe}}, /* 4.61662413084468283841e+01 */
+  ln2_32hi =  {{0x3f962e42, 0xfee00000}}, /* 2.16608493865351192653e-02 */
+  ln2_32lo =  {{0x3d9a39ef, 0x35793c76}}, /* 5.96317165397058656257e-12 */
+/* t2-t5 terms used for polynomial computation */
+  t2 = {{0x3fc55555, 0x55548f7c}}, /* 1.6666666666526086527e-1 */
+  t3 = {{0x3fa55555, 0x55545d4e}}, /* 4.1666666666226079285e-2 */
+  t4 = {{0x3f811115, 0xb7aa905e}}, /* 8.3333679843421958056e-3 */
+  t5 = {{0x3f56c172, 0x8d739765}}, /* 1.3888949086377719040e-3 */
+/* maximum value for x to not overflow */
+  threshold1 = {{0x40862E42, 0xFEFA39EF}}, /* 7.09782712893383973096e+02 */
+/* maximum value for -x to not underflow */
+  threshold2 = {{0x40874910, 0xD52D3051}}, /* 7.45133219101941108420e+02 */
+/* scaling factor used when result near zero */
+  twom54 = {{0x3c900000, 0x00000000}}; /* 5.55111512312578270212e-17 */
+
+#else
+#ifdef LITTLE_ENDI
+
+static const mynumber 
+/* Following three values used to scale x to primary range */
+  invln2_32 = {{0x652b82fe, 0x40471547}}, /* 4.61662413084468283841e+01 */
+  ln2_32hi =  {{0xfee00000, 0x3f962e42}}, /* 2.16608493865351192653e-02 */
+  ln2_32lo =  {{0x35793c76, 0x3d9a39ef}}, /* 5.96317165397058656257e-12 */
+/* t2-t5 terms used for polynomial computation */
+  t2 = {{0x55548f7c, 0x3fc55555}}, /* 1.6666666666526086527e-1 */
+  t3 = {{0x55545d4e, 0x3fa55555}}, /* 4.1666666666226079285e-2 */
+  t4 = {{0xb7aa905e, 0x3f811115}}, /* 8.3333679843421958056e-3 */
+  t5 = {{0x8d739765, 0x3f56c172}}, /* 1.3888949086377719040e-3 */
+/* maximum value for x to not overflow */
+  threshold1 = {{0xFEFA39EF, 0x40862E42}}, /* 7.09782712893383973096e+02 */
+/* maximum value for -x to not underflow */
+  threshold2 = {{0xD52D3051, 0x40874910}}, /* 7.45133219101941108420e+02 */
+/* scaling factor used when result near zero */
+  twom54 = {{0x00000000, 0x3c900000}}; /* 5.55111512312578270212e-17 */
+
+#endif
+#endif
diff --git a/sysdeps/ieee754/dbl-64/slowexp.c b/sysdeps/ieee754/dbl-64/slowexp.c
deleted file mode 100644
index e8fa2e2..0000000
--- a/sysdeps/ieee754/dbl-64/slowexp.c
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-/**************************************************************************/
-/*  MODULE_NAME:slowexp.c                                                 */
-/*                                                                        */
-/*  FUNCTION:slowexp                                                      */
-/*                                                                        */
-/*  FILES NEEDED:mpa.h                                                    */
-/*               mpa.c mpexp.c                                            */
-/*                                                                        */
-/*Converting from double precision to Multi-precision and calculating     */
-/* e^x                                                                    */
-/**************************************************************************/
-#include <math_private.h>
-
-#include <stap-probe.h>
-
-#ifndef USE_LONG_DOUBLE_FOR_MP
-# include "mpa.h"
-void __mpexp (mp_no *x, mp_no *y, int p);
-#endif
-
-#ifndef SECTION
-# define SECTION
-#endif
-
-/*Converting from double precision to Multi-precision and calculating  e^x */
-double
-SECTION
-__slowexp (double x)
-{
-#ifndef USE_LONG_DOUBLE_FOR_MP
-  double w, z, res, eps = 3.0e-26;
-  int p;
-  mp_no mpx, mpy, mpz, mpw, mpeps, mpcor;
-
-  /* Use the multiple precision __MPEXP function to compute the exponential
-     First at 144 bits and if it is not accurate enough, at 768 bits.  */
-  p = 6;
-  __dbl_mp (x, &mpx, p);
-  __mpexp (&mpx, &mpy, p);
-  __dbl_mp (eps, &mpeps, p);
-  __mul (&mpeps, &mpy, &mpcor, p);
-  __add (&mpy, &mpcor, &mpw, p);
-  __sub (&mpy, &mpcor, &mpz, p);
-  __mp_dbl (&mpw, &w, p);
-  __mp_dbl (&mpz, &z, p);
-  if (w == z)
-    {
-      /* Track how often we get to the slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p6, 2, &x, &w);
-      return w;
-    }
-  else
-    {
-      p = 32;
-      __dbl_mp (x, &mpx, p);
-      __mpexp (&mpx, &mpy, p);
-      __mp_dbl (&mpy, &res, p);
-
-      /* Track how often we get to the uber-slow exp code plus
-	 its input/output values.  */
-      LIBC_PROBE (slowexp_p32, 2, &x, &res);
-      return res;
-    }
-#else
-  return (double) __ieee754_expl((long double)x);
-#endif
-}
diff --git a/sysdeps/powerpc/power4/fpu/Makefile b/sysdeps/powerpc/power4/fpu/Makefile
index e17d32f..ded9976 100644
--- a/sysdeps/powerpc/power4/fpu/Makefile
+++ b/sysdeps/powerpc/power4/fpu/Makefile
@@ -3,5 +3,4 @@
 ifeq ($(subdir),math)
 CFLAGS-mpa.c += --param max-unroll-times=4 -funroll-loops -fpeel-loops
 CPPFLAGS-slowpow.c += -DUSE_LONG_DOUBLE_FOR_MP=1
-CPPFLAGS-slowexp.c += -DUSE_LONG_DOUBLE_FOR_MP=1
 endif
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index d660552..4051232 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -10,7 +10,7 @@ libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
 
 libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
 			e_asin-fma e_atan2-fma s_sin-fma s_tan-fma \
-			mplog-fma mpa-fma slowexp-fma slowpow-fma \
+			mplog-fma mpa-fma slowpow-fma \
 			sincos32-fma doasin-fma dosincos-fma \
 			halfulp-fma mpexp-fma \
 			mpatan2-fma mpatan-fma mpsqrt-fma mptan-fma
@@ -32,7 +32,6 @@ CFLAGS-mpsqrt-fma.c = -mfma -mavx2
 CFLAGS-mptan-fma.c = -mfma -mavx2
 CFLAGS-s_atan-fma.c = -mfma -mavx2
 CFLAGS-sincos32-fma.c = -mfma -mavx2
-CFLAGS-slowexp-fma.c = -mfma -mavx2
 CFLAGS-slowpow-fma.c = -mfma -mavx2
 CFLAGS-s_sin-fma.c = -mfma -mavx2
 CFLAGS-s_tan-fma.c = -mfma -mavx2
@@ -42,7 +41,7 @@ libm-sysdep_routines += e_expf-fma
 
 libm-sysdep_routines += e_exp-fma4 e_log-fma4 e_pow-fma4 s_atan-fma4 \
 			e_asin-fma4 e_atan2-fma4 s_sin-fma4 s_tan-fma4 \
-			mplog-fma4 mpa-fma4 slowexp-fma4 slowpow-fma4 \
+			mplog-fma4 mpa-fma4 slowpow-fma4 \
 			sincos32-fma4 doasin-fma4 dosincos-fma4 \
 			halfulp-fma4 mpexp-fma4 \
 			mpatan2-fma4 mpatan-fma4 mpsqrt-fma4 mptan-fma4
@@ -64,14 +63,13 @@ CFLAGS-mpsqrt-fma4.c = -mfma4
 CFLAGS-mptan-fma4.c = -mfma4
 CFLAGS-s_atan-fma4.c = -mfma4
 CFLAGS-sincos32-fma4.c = -mfma4
-CFLAGS-slowexp-fma4.c = -mfma4
 CFLAGS-slowpow-fma4.c = -mfma4
 CFLAGS-s_sin-fma4.c = -mfma4
 CFLAGS-s_tan-fma4.c = -mfma4
 
 libm-sysdep_routines += e_exp-avx e_log-avx s_atan-avx \
 			e_atan2-avx s_sin-avx s_tan-avx \
-			mplog-avx mpa-avx slowexp-avx \
+			mplog-avx mpa-avx \
 			mpexp-avx
 
 CFLAGS-e_atan2-avx.c = -msse2avx -DSSE2AVX
@@ -82,7 +80,6 @@ CFLAGS-mpexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-mplog-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_atan-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_sin-avx.c = -msse2avx -DSSE2AVX
-CFLAGS-slowexp-avx.c = -msse2avx -DSSE2AVX
 CFLAGS-s_tan-avx.c = -msse2avx -DSSE2AVX
 endif
 
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
index ee5dd6d..afd9174 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_avx
 #define __exp1 __exp1_avx
-#define __slowexp __slowexp_avx
 #define SECTION __attribute__ ((section (".text.avx")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
index 6e0fdb7..765b1b9 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma
 #define __exp1 __exp1_fma
-#define __slowexp __slowexp_fma
 #define SECTION __attribute__ ((section (".text.fma")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
index ae6eb67..9ac7aca 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
@@ -1,6 +1,5 @@
 #define __ieee754_exp __ieee754_exp_fma4
 #define __exp1 __exp1_fma4
-#define __slowexp __slowexp_fma4
 #define SECTION __attribute__ ((section (".text.fma4")))
 
 #include <sysdeps/ieee754/dbl-64/e_exp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c b/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
deleted file mode 100644
index d01c6d7..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_avx
-#define __add __add_avx
-#define __dbl_mp __dbl_mp_avx
-#define __mpexp __mpexp_avx
-#define __mul __mul_avx
-#define __sub __sub_avx
-#define SECTION __attribute__ ((section (".text.avx")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
deleted file mode 100644
index 6fffca1..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma
-#define __add __add_fma
-#define __dbl_mp __dbl_mp_fma
-#define __mpexp __mpexp_fma
-#define __mul __mul_fma
-#define __sub __sub_fma
-#define SECTION __attribute__ ((section (".text.fma")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c b/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
deleted file mode 100644
index 3bcde84..0000000
--- a/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma4
-#define __add __add_fma4
-#define __dbl_mp __dbl_mp_fma4
-#define __mpexp __mpexp_fma4
-#define __mul __mul_fma4
-#define __sub __sub_fma4
-#define SECTION __attribute__ ((section (".text.fma4")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
-- 
1.7.1

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-26 16:44 Patrick McGehearty
@ 2017-10-26 17:20 ` Joseph Myers
  2017-10-26 17:25   ` Joseph Myers
  0 siblings, 1 reply; 44+ messages in thread
From: Joseph Myers @ 2017-10-26 17:20 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Thu, 26 Oct 2017, Patrick McGehearty wrote:

> Replaced tables of double float constants with hex constants, taking special
>   attention to correctly handle little endian and big endian versions.
>   Using hex initialization also required changing variables to be declared
>   as unions.  Tables moved from e_exp.c to sysdeps/ieee754/dbl-64/eexp.tbl.

There should be no endian-dependent constants or tables of constants.  
See my comments in the commit message for commit 
60f435bb0c097ead2d4609aa7e45a203eb24e43c where I cleaned up some such 
existing endian-dependent definitions.

Unless a particular constant, table etc. is needed in the code both as 
integers and as double, just define it as double and initialize with a C99 
hex float constant, without involving unions at all.  This certainly 
applies to some of your constants, possibly all of them.

If both the double and int parts of a union are actually referenced (and 
while the code certainly references both parts for *variables*, it seems 
less likely the integer parts are of use for *constants*), use C99 
designated initializers to initialize the double part of the union, again 
using a C99 hex float constant.
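
As an illustration only (these definitions are not taken from the patch; the names three, dbl_bits, half_u, bits_of and check_constants are hypothetical), the two forms described above might look roughly like this:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plain double constant as a C99 hex float: 0x1.8p+1 is 1.5 * 2^1 = 3.0.
   The spelling is exact and does not depend on byte order.  */
static const double three = 0x1.8p+1;

/* Hypothetical union for the rarer case where both the double and the
   integer words of a value are referenced.  */
union dbl_bits
{
  double d;
  uint32_t w[2];
};

/* A C99 designated initializer sets the double member directly, so no
   endian-dependent word order appears in the source.  */
static const union dbl_bits half_u = { .d = 0x1p-1 };

/* Return the bit pattern of X (assumes IEEE 754 binary64 doubles).  */
static uint64_t
bits_of (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Self-check: the hex float constants have the expected values.  */
static void
check_constants (void)
{
  assert (three == 3.0);
  assert (half_u.d == 0.5);
  assert (bits_of (half_u.d) == UINT64_C (0x3fe0000000000000));
}
```

In both forms the source states the value itself, so the same text is valid on little-endian and big-endian targets.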

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-26 17:20 ` Joseph Myers
@ 2017-10-26 17:25   ` Joseph Myers
  2017-10-26 18:30     ` Patrick McGehearty
  0 siblings, 1 reply; 44+ messages in thread
From: Joseph Myers @ 2017-10-26 17:25 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Thu, 26 Oct 2017, Joseph Myers wrote:

> Unless a particular constant, table etc. is needed in the code both as 
> integers and as double, just define it as double and initialize with a C99 
> hex float constant, without involving unions at all.  This certainly 
> applies to some of your constants, possibly all of them.

To be clear: all my past and present comments about hex floats always mean 
C99 0x1.2p3 and similar constants.  Never hex integer values with a union 
as in this patch.  You should never, anywhere in glibc, have any occasion 
to initialize a floating-point constant via specifying the integer values 
of its representation (except in testcases for special ldbl-96 and 
ldbl-128ibm representations).  Proper hex floating-point constants are 
always better.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-26 17:25   ` Joseph Myers
@ 2017-10-26 18:30     ` Patrick McGehearty
  2017-10-26 19:44       ` Joseph Myers
  0 siblings, 1 reply; 44+ messages in thread
From: Patrick McGehearty @ 2017-10-26 18:30 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha

On 10/26/2017 12:25 PM, Joseph Myers wrote:
> On Thu, 26 Oct 2017, Joseph Myers wrote:
>
>> Unless a particular constant, table etc. is needed in the code both as
>> integers and as double, just define it as double and initialize with a C99
>> hex float constant, without involving unions at all.  This certainly
>> applies to some of your constants, possibly all of them.
> To be clear: all my past and present comments about hex floats always mean
> C99 0x1.2p3 and similar constants.  Never hex integer values with a union
> as in this patch.  You should never, anywhere in glibc, have any occasion
> to initialize a floating-point constant via specifying the integer values
> of its representation (except in testcases for special ldbl-96 and
> ldbl-128ibm representations).  Proper hex floating-point constants are
> always better.
>
I was following the hex pattern used in sysdeps/ieee754/dbl-64/uexp.tbl

To be sure I'm understanding your comment correctly, you are
recommending I change from the union form of the hex constants
to 0x1.23p3 form of the constants and resubmit the patch.
That would reduce the chance of errors in any future updates
of constant values.
Is that correct?

- patrick

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-26 18:30     ` Patrick McGehearty
@ 2017-10-26 19:44       ` Joseph Myers
  0 siblings, 0 replies; 44+ messages in thread
From: Joseph Myers @ 2017-10-26 19:44 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Thu, 26 Oct 2017, Patrick McGehearty wrote:

> On 10/26/2017 12:25 PM, Joseph Myers wrote:
> > On Thu, 26 Oct 2017, Joseph Myers wrote:
> > 
> > > Unless a particular constant, table etc. is needed in the code both as
> > > integers and as double, just define it as double and initialize with a C99
> > > hex float constant, without involving unions at all.  This certainly
> > > applies to some of your constants, possibly all of them.
> > To be clear: all my past and present comments about hex floats always mean
> > C99 0x1.2p3 and similar constants.  Never hex integer values with a union
> > as in this patch.  You should never, anywhere in glibc, have any occasion
> > to initialize a floating-point constant via specifying the integer values
> > of its representation (except in testcases for special ldbl-96 and
> > ldbl-128ibm representations).  Proper hex floating-point constants are
> > always better.
> > 
> I was following the hex pattern used in sysdeps/ieee754/dbl-64/uexp.tbl

That's an obsolescent pattern, not to be used in new code; it just so 
happens that file hasn't yet been cleaned up to define double constants 
directly with C99 hex floats.

> To be sure I'm understanding your comment correctly, you are
> recommending I change from the union form of the hex constants
> to 0x1.23p3 form of the constants and resubmit the patch.

Yes.  For anything intending a particular, pre-computed floating-point 
value to be used, represent that value as a C99 hex float constant as the 
preferred way in new code of making the intended constant unambiguous.
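
One mechanical way to do such a conversion is sketched below, assuming IEEE 754 binary64 doubles. The helper names from_words, print_hex_const and check_known_constants are hypothetical; the word pairs are the ones given in the comments of the e_exp.c patch in this thread for twom54 and ln2_32hi.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build a double from the high and low 32-bit words of its IEEE 754
   binary64 representation.  */
static double
from_words (uint32_t hi, uint32_t lo)
{
  uint64_t u = ((uint64_t) hi << 32) | lo;
  double d;
  memcpy (&d, &u, sizeof d);
  return d;
}

/* Print NAME and its value as a C99 hex float, ready to paste into source.  */
static void
print_hex_const (const char *name, uint32_t hi, uint32_t lo)
{
  printf ("%s = %a\n", name, from_words (hi, lo));
}

/* Self-check against two word pairs quoted in the patch comments.  */
static void
check_known_constants (void)
{
  assert (from_words (0x3c900000, 0x00000000) == 0x1p-54);
  assert (from_words (0x3f962e42, 0xfee00000) == 0x1.62e42feep-6);
}
```

The exact digits printed by %a can vary between C libraries, but any conforming output reads back as the same double.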

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
@ 2017-10-20 13:38 Wilco Dijkstra
  2017-10-20 14:58 ` Patrick McGehearty
  0 siblings, 1 reply; 44+ messages in thread
From: Wilco Dijkstra @ 2017-10-20 13:38 UTC (permalink / raw)
  To: patrick.mcgehearty; +Cc: nd, Joseph Myers, libc-alpha

Patrick wrote:

        +      fe_val = __fegetround();

> Failing to use __fegetround(),__fesetround() causes over 40 math test
> accuracy failures for other rounding modes in exp, cexp, cpow, cosh,
> ccos, ccosh, csin, and csinh.  If the glibc/Linux community were to
> declare that the only rounding mode that was fully supported was
> FE_TONEAREST, we could simplify/speed up a lot of code. :-)

"Failure" is too unspecific. What is the difference in ULP? If it is from 
say 2 to 20 then yes that's an issue. If it is from 2 to 3 in a non-nearest
rounding mode then that's perfectly acceptable.

Anyway, never use __fegetround/__fesetround. Use get_rounding_mode
which is inlined on all targets and libc_fesetround which is inlined on most
targets.

Wilco

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-20 13:38 Wilco Dijkstra
@ 2017-10-20 14:58 ` Patrick McGehearty
  0 siblings, 0 replies; 44+ messages in thread
From: Patrick McGehearty @ 2017-10-20 14:58 UTC (permalink / raw)
  To: libc-alpha

On 10/20/2017 8:38 AM, Wilco Dijkstra wrote:
> Patrick wrote:
>
>          +      fe_val = __fegetround();
>
>> Failing to use __fegetround(),__fesetround() causes over 40 math test
>> accuracy failures for other rounding modes in exp, cexp, cpow, cosh,
>> ccos, ccosh, csin, and csinh.  If the glibc/Linux community were to
>> declare that the only rounding mode that was fully supported was
>> FE_TONEAREST, we could simplify/speed up a lot of code. :-)
> "Failure" is too unspecific. What is the difference in ULP? If it is from
> say 2 to 20 then yes that's an issue. If it is from 2 to 3 in a non-nearest
> rounding mode then that's perfectly acceptable.
>
> Anyway, never use __fegetround/__fesetround. Use get_rounding_mode
> which is inlined on all targets and libc_fesetround which is inlined on most
> targets.
>
> Wilco
>
>
The ULP diffs were mostly in the 2 to 3 range for the complex functions.
There were no gross errors.

I appreciate the pointer to get_rounding_mode and libc_fesetround.
I was not previously aware of those functions.
I will give those a try today.

- patrick

* [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
@ 2017-10-16 16:56 Patrick McGehearty
  2017-10-18 17:22 ` Joseph Myers
  2017-10-23 12:25 ` Siddhesh Poyarekar
  0 siblings, 2 replies; 44+ messages in thread
From: Patrick McGehearty @ 2017-10-16 16:56 UTC (permalink / raw)
  To: libc-alpha

modified file:   sysdeps/ieee754/dbl-64/e_exp.c

These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for the ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf which call __ieee754_exp() directly or indirectly.

The performance gain, as measured on a Sparc s7 for common values
between exp(1) and exp(40), is typically around 5x.

Using the glibc perf tests on sparc and x86_64,
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   400    5173     802
min     399    64      15      15
mean   5317   211    1349      29

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from a value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run on sparc and x86_64.
The results match on both platforms. Within the test suite,
3 input values were found to cause 1-bit (1 ulp) differences
when the "FE_TONEAREST" rounding mode is set. No differences were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
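
The 1 ulp gap between the two results above can be checked from the hex float values alone. The following is a minimal sketch, assuming IEEE 754 binary64 doubles; the helper names ordered_bits and ulp_distance are hypothetical and are not part of the patch or of the glibc test suite.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a double onto an integer scale that is monotonic in the value,
   so that adjacent doubles differ by exactly 1.  */
static int64_t
ordered_bits (double x)
{
  int64_t i;
  memcpy (&i, &x, sizeof i);
  /* Negative doubles are stored sign-magnitude; fold them below zero.  */
  return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable steps between A and B (0 if they are equal).
   NaN inputs are rejected; the count is meaningful for finite values.  */
static uint64_t
ulp_distance (double a, double b)
{
  assert (a == a && b == b);
  int64_t ia = ordered_bits (a);
  int64_t ib = ordered_bits (b);
  return ia > ib ? (uint64_t) ia - (uint64_t) ib
                 : (uint64_t) ib - (uint64_t) ia;
}
```

With these helpers, ulp_distance (0x1.db14cd799387ap-3, 0x1.db14cd7993879p-3) evaluates to 1, which is the difference between the new and old results quoted above.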

In addition, because __ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.
---
 sysdeps/ieee754/dbl-64/e_exp.c |  618 +++++++++++++++++++++++++++-------------
 1 files changed, 416 insertions(+), 202 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
index 6757a14..555a47f 100644
--- a/sysdeps/ieee754/dbl-64/e_exp.c
+++ b/sysdeps/ieee754/dbl-64/e_exp.c
@@ -1,238 +1,452 @@
+/* EXP function - Compute double precision exponential
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
 /*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
+   exp(x)
+   Hybrid algorithm of Peter Tang's Table driven method (for large
+   arguments) and an accurate table (for small arguments).
+   Written by K.C. Ng, November 1988.
+   Method (large arguments):
+	1. Argument Reduction: given the input x, find r and integer k
+	   and j such that
+	             x = (k+j/32)*(ln2) + r,  |r| <= (1/64)*ln2
+
+	2. exp(x) = 2^k * (2^(j/32) + 2^(j/32)*expm1(r))
+	   a. expm1(r) is approximated by a polynomial:
+	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
+	      Here t1 = 1/2 exactly.
+	   b. 2^(j/32) is represented to twice double precision
+	      as TBL[2j]+TBL[2j+1].
+
+   Note: If divide were fast enough, we could use another approximation
+	 in 2.a:
+	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
+	      (for the same t1 and t2 as above)
+
+   Special cases:
+	exp(INF) is INF, exp(NaN) is NaN;
+	exp(-INF)=  0;
+	for finite argument, only exp(0)=1 is exact.
+
+   Accuracy:
+	According to an error analysis, the error is always less than
+	an ulp (unit in the last place).  The largest errors observed
+	are less than 0.55 ulp for normal results and less than 0.75 ulp
+	for subnormal results.
+
+   Misc. info.
+	For IEEE double
+		if x >  7.09782712893383973096e+02 then exp(x) overflow
+		if x < -7.45133219101941108420e+02 then exp(x) underflow
  */
-/***************************************************************************/
-/*  MODULE_NAME:uexp.c                                                     */
-/*                                                                         */
-/*  FUNCTION:uexp                                                          */
-/*           exp1                                                          */
-/*                                                                         */
-/* FILES NEEDED:dla.h endian.h mpa.h mydefs.h uexp.h                       */
-/*              mpa.c mpexp.x slowexp.c                                    */
-/*                                                                         */
-/* An ultimate exp routine. Given an IEEE double machine number x          */
-/* it computes the correctly rounded (to nearest) value of e^x             */
-/* Assumption: Machine arithmetic operations are performed in              */
-/* round to nearest mode of IEEE 754 standard.                             */
-/*                                                                         */
-/***************************************************************************/
 
 #include <math.h>
+#include <math-svid-compat.h>
+#include <math_private.h>
+#include <errno.h>
 #include "endian.h"
 #include "uexp.h"
+#include "uexp.tbl"
 #include "mydefs.h"
 #include "MathLib.h"
-#include "uexp.tbl"
-#include <math_private.h>
 #include <fenv.h>
 #include <float.h>
 
-#ifndef SECTION
-# define SECTION
-#endif
+extern double __ieee754_exp (double);
+
+
+static const double TBL[] = {
+  1.00000000000000000000e+00, 0.00000000000000000000e+00,
+  1.02189714865411662714e+00, 5.10922502897344389359e-17,
+  1.04427378242741375480e+00, 8.55188970553796365958e-17,
+  1.06714040067682369717e+00, -7.89985396684158212226e-17,
+  1.09050773266525768967e+00, -3.04678207981247114697e-17,
+  1.11438674259589243221e+00, 1.04102784568455709549e-16,
+  1.13878863475669156458e+00, 8.91281267602540777782e-17,
+  1.16372485877757747552e+00, 3.82920483692409349872e-17,
+  1.18920711500272102690e+00, 3.98201523146564611098e-17,
+  1.21524735998046895524e+00, -7.71263069268148813091e-17,
+  1.24185781207348400201e+00, 4.65802759183693679123e-17,
+  1.26905095719173321989e+00, 2.66793213134218609523e-18,
+  1.29683955465100964055e+00, 2.53825027948883149593e-17,
+  1.32523664315974132322e+00, -2.85873121003886075697e-17,
+  1.35425554693689265129e+00, 7.70094837980298946162e-17,
+  1.38390988196383202258e+00, -6.77051165879478628716e-17,
+  1.41421356237309514547e+00, -9.66729331345291345105e-17,
+  1.44518080697704665027e+00, -3.02375813499398731940e-17,
+  1.47682614593949934623e+00, -3.48399455689279579579e-17,
+  1.50916442759342284141e+00, -1.01645532775429503911e-16,
+  1.54221082540794074411e+00, 7.94983480969762085616e-17,
+  1.57598084510788649659e+00, -1.01369164712783039808e-17,
+  1.61049033194925428347e+00, 2.47071925697978878522e-17,
+  1.64575547815396494578e+00, -1.01256799136747726038e-16,
+  1.68179283050742900407e+00, 8.19901002058149652013e-17,
+  1.71861929812247793414e+00, -1.85138041826311098821e-17,
+  1.75625216037329945351e+00, 2.96014069544887330703e-17,
+  1.79470907500310716820e+00, 1.82274584279120867698e-17,
+  1.83400808640934243066e+00, 3.28310722424562658722e-17,
+  1.87416763411029996256e+00, -6.12276341300414256164e-17,
+  1.91520656139714740007e+00, -1.06199460561959626376e-16,
+  1.95714412417540017941e+00, 8.96076779103666776760e-17,
+};
+
+/*
+   For i = 0, ..., 66,
+     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+
+   For i = 67, ..., 133,
+     TBL2[2*i] is a double precision number near -(i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+*/
+static const double TBL2[] = {
+  1.56249999999984491572e-02, 1.01574770858668417262e+00,
+  3.12499999999998716305e-02, 1.03174340749910253834e+00,
+  4.68750000000011102230e-02, 1.04799100201663386578e+00,
+  6.24999999999990632493e-02, 1.06449445891785843266e+00,
+  7.81249999999999444888e-02, 1.08125780744903954300e+00,
+  9.37500000000013322676e-02, 1.09828514030782731226e+00,
+  1.09375000000001346145e-01, 1.11558061464248226002e+00,
+  1.24999999999999417133e-01, 1.13314845306682565607e+00,
+  1.40624999999995337063e-01, 1.15099294469117108264e+00,
+  1.56249999999996141975e-01, 1.16911844616949989195e+00,
+  1.71874999999992894573e-01, 1.18752938276309216725e+00,
+  1.87500000000000888178e-01, 1.20623024942098178158e+00,
+  2.03124999999361649516e-01, 1.22522561187652545556e+00,
+  2.18750000000000416334e-01, 1.24452010776609567344e+00,
+  2.34375000000003524958e-01, 1.26411844775347081971e+00,
+  2.50000000000006328271e-01, 1.28402541668774961003e+00,
+  2.65624999999982791543e-01, 1.30424587476761533189e+00,
+  2.81249999999993727240e-01, 1.32478475872885725906e+00,
+  2.96875000000003275158e-01, 1.34564708304941493822e+00,
+  3.12500000000002886580e-01, 1.36683794117380030819e+00,
+  3.28124999999993394173e-01, 1.38836250675661765364e+00,
+  3.43749999999998612221e-01, 1.41022603492570874906e+00,
+  3.59374999999992450483e-01, 1.43243386356506730017e+00,
+  3.74999999999991395772e-01, 1.45499141461818881638e+00,
+  3.90624999999997613020e-01, 1.47790419541173490003e+00,
+  4.06249999999991895372e-01, 1.50117780000011058483e+00,
+  4.21874999999996613820e-01, 1.52481791053132154090e+00,
+  4.37500000000004607426e-01, 1.54883029863414023453e+00,
+  4.53125000000004274359e-01, 1.57322082682725961078e+00,
+  4.68750000000008326673e-01, 1.59799544995064657371e+00,
+  4.84374999999985456078e-01, 1.62316021661928200359e+00,
+  4.99999999999997335465e-01, 1.64872127070012375327e+00,
+  5.15625000000000222045e-01, 1.67468485281178436352e+00,
+  5.31250000000003441691e-01, 1.70105730184840653330e+00,
+  5.46874999999999111822e-01, 1.72784505652716169344e+00,
+  5.62499999999999333866e-01, 1.75505465696029738787e+00,
+  5.78124999999993338662e-01, 1.78269274625180318417e+00,
+  5.93749999999999666933e-01, 1.81076607211938656050e+00,
+  6.09375000000003441691e-01, 1.83928148854178719063e+00,
+  6.24999999999995559108e-01, 1.86824595743221411048e+00,
+  6.40625000000009103829e-01, 1.89766655033813602671e+00,
+  6.56249999999993782751e-01, 1.92755045016753268072e+00,
+  6.71875000000002109424e-01, 1.95790495294292221651e+00,
+  6.87499999999992450483e-01, 1.98873746958227681780e+00,
+  7.03125000000004996004e-01, 2.02005552770870666635e+00,
+  7.18750000000007105427e-01, 2.05186677348799140219e+00,
+  7.34375000000008770762e-01, 2.08417897349558689513e+00,
+  7.49999999999983901766e-01, 2.11700001661264058939e+00,
+  7.65624999999997002398e-01, 2.15033791595229351046e+00,
+  7.81250000000005884182e-01, 2.18420081081563077774e+00,
+  7.96874999999991451283e-01, 2.21859696867912603579e+00,
+  8.12500000000000000000e-01, 2.25353478721320854561e+00,
+  8.28125000000008215650e-01, 2.28902279633221983346e+00,
+  8.43749999999997890576e-01, 2.32506966027711614586e+00,
+  8.59374999999999444888e-01, 2.36168417973090827289e+00,
+  8.75000000000003219647e-01, 2.39887529396710563745e+00,
+  8.90625000000013433699e-01, 2.43665208303232461162e+00,
+  9.06249999999980571097e-01, 2.47502376996297712708e+00,
+  9.21874999999984456878e-01, 2.51399972303748420188e+00,
+  9.37500000000001887379e-01, 2.55358945806293169412e+00,
+  9.53125000000003330669e-01, 2.59380264069854327147e+00,
+  9.68749999999989119814e-01, 2.63464908881560244680e+00,
+  9.84374999999997890576e-01, 2.67613877489447116176e+00,
+  1.00000000000001154632e+00, 2.71828182845907662113e+00,
+  1.01562499999999333866e+00, 2.76108853855008318234e+00,
+  1.03124999999995980993e+00, 2.80456935623711389738e+00,
+  1.04687499999999933387e+00, 2.84873489717039740654e+00,
+  -1.56249999999999514277e-02, 9.84496437005408453480e-01,
+  -3.12499999999955972718e-02, 9.69233234476348348707e-01,
+  -4.68749999999993824384e-02, 9.54206665969188905230e-01,
+  -6.24999999999976130205e-02, 9.39413062813478028090e-01,
+  -7.81249999999989314103e-02, 9.24848813216205822840e-01,
+  -9.37499999999995975442e-02, 9.10510361380034494161e-01,
+  -1.09374999999998584466e-01, 8.96394206635151680196e-01,
+  -1.24999999999998556710e-01, 8.82496902584596676355e-01,
+  -1.40624999999999361622e-01, 8.68815056262843721235e-01,
+  -1.56249999999999111822e-01, 8.55345327307423297647e-01,
+  -1.71874999999924144012e-01, 8.42084427143446223596e-01,
+  -1.87499999999996752598e-01, 8.29029118180403035154e-01,
+  -2.03124999999988037347e-01, 8.16176213022349550386e-01,
+  -2.18749999999995947686e-01, 8.03522573689063990265e-01,
+  -2.34374999999996419531e-01, 7.91065110850298847112e-01,
+  -2.49999999999996280753e-01, 7.78800783071407765057e-01,
+  -2.65624999999999888978e-01, 7.66726596070820165529e-01,
+  -2.81249999999989397370e-01, 7.54839601989015340777e-01,
+  -2.96874999999996114219e-01, 7.43136898668761203268e-01,
+  -3.12499999999999555911e-01, 7.31615628946642115871e-01,
+  -3.28124999999993782751e-01, 7.20272979955444259126e-01,
+  -3.43749999999997946087e-01, 7.09106182437399867879e-01,
+  -3.59374999999994337863e-01, 6.98112510068129799023e-01,
+  -3.74999999999994615418e-01, 6.87289278790975899369e-01,
+  -3.90624999999999000799e-01, 6.76633846161729612945e-01,
+  -4.06249999999947264406e-01, 6.66143610703522903727e-01,
+  -4.21874999999988453681e-01, 6.55816011271509125002e-01,
+  -4.37499999999999111822e-01, 6.45648526427892610613e-01,
+  -4.53124999999999278355e-01, 6.35638673826052436056e-01,
+  -4.68749999999999278355e-01, 6.25784009604591573428e-01,
+  -4.84374999999992894573e-01, 6.16082127790682609891e-01,
+  -4.99999999999998168132e-01, 6.06530659712634534486e-01,
+  -5.15625000000000000000e-01, 5.97127273421627413619e-01,
+  -5.31249999999989785948e-01, 5.87869673122352498496e-01,
+  -5.46874999999972688514e-01, 5.78755598612500032907e-01,
+  -5.62500000000000000000e-01, 5.69782824730923009859e-01,
+  -5.78124999999992339461e-01, 5.60949160814475100700e-01,
+  -5.93749999999948707696e-01, 5.52252450163048691500e-01,
+  -6.09374999999552580121e-01, 5.43690569513243682209e-01,
+  -6.24999999999984789945e-01, 5.35261428518998383375e-01,
+  -6.40624999999983457677e-01, 5.26962969243379708573e-01,
+  -6.56249999999998334665e-01, 5.18793165653890220312e-01,
+  -6.71874999999943378626e-01, 5.10750023129039609771e-01,
+  -6.87499999999997002398e-01, 5.02831577970942467104e-01,
+  -7.03124999999991118216e-01, 4.95035896926202978463e-01,
+  -7.18749999999991340260e-01, 4.87361076713623331269e-01,
+  -7.34374999999985678123e-01, 4.79805243559684402310e-01,
+  -7.49999999999997335465e-01, 4.72366552741015965911e-01,
+  -7.65624999999993782751e-01, 4.65043188134059204408e-01,
+  -7.81249999999863220523e-01, 4.57833361771676883301e-01,
+  -7.96874999999998112621e-01, 4.50735313406363247157e-01,
+  -8.12499999999990119015e-01, 4.43747310081084256339e-01,
+  -8.28124999999996003197e-01, 4.36867645705559026759e-01,
+  -8.43749999999988120614e-01, 4.30094640640067360504e-01,
+  -8.59374999999994115818e-01, 4.23426641285265303871e-01,
+  -8.74999999999977129406e-01, 4.16862019678517936594e-01,
+  -8.90624999999983346655e-01, 4.10399173096376801428e-01,
+  -9.06249999999991784350e-01, 4.04036523663345414903e-01,
+  -9.21874999999994004796e-01, 3.97772517966614058693e-01,
+  -9.37499999999994337863e-01, 3.91605626676801210628e-01,
+  -9.53124999999999444888e-01, 3.85534344174578935682e-01,
+  -9.68749999999986677324e-01, 3.79557188183094640355e-01,
+  -9.84374999999992339461e-01, 3.73672699406045860648e-01,
+  -9.99999999999995892175e-01, 3.67879441171443832825e-01,
+  -1.01562499999994315658e+00, 3.62175999080846300338e-01,
+  -1.03124999999991096011e+00, 3.56560980663978732697e-01,
+  -1.04687499999999067413e+00, 3.51033015038813400732e-01,
+};
+
+static const double
+  half		=0.5,
+/* Following three values used to scale x to primary range */
+  invln2_32	=4.61662413084468283841e+01,	/* 0x40471547, 0x652b82fe */
+  ln2_32hi	=2.16608493865351192653e-02,	/* 0x3f962e42, 0xfee00000 */
+  ln2_32lo	=5.96317165397058656257e-12,	/* 0x3d9a39ef, 0x35793c76 */
+/* t2-t5 terms used for polynomial computation */
+  t2		=1.6666666666526086527e-1,	/* 3fc5555555548f7c */
+  t3		=4.1666666666226079285e-2,	/* 3fa5555555545d4e */
+  t4		=8.3333679843421958056e-3,	/* 3f811115b7aa905e */
+  t5		=1.3888949086377719040e-3,	/* 3f56c1728d739765 */
+  one		=1.0,
+/* maximum value for x to not overflow */
+  threshold1	=7.09782712893383973096e+02,	/* 0x40862E42, 0xFEFA39EF */
+/* maximum value for -x to not underflow */
+  threshold2	=7.45133219101941108420e+02,	/* 0x40874910, 0xD52D3051 */
+/* scaling factor used when result near zero*/
+  twom54	=5.55111512312578270212e-17,	/* 0x3c900000, 0x00000000 */
+/* value used to force inexact condition */
+  small		=1.0e-100;
 
-double __slowexp (double);
 
-/* An ultimate exp routine. Given an IEEE double machine number x it computes
-   the correctly rounded (to nearest) value of e^x.  */
 double
-SECTION
-__ieee754_exp (double x)
+__ieee754_exp (double x_arg)
 {
-  double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
-  mynumber junk1, junk2, binexp = {{0, 0}};
-  int4 i, j, m, n, ex;
+  double z, t;
   double retval;
-
+  int hx, ix, k, j, m;
+  int fe_val;
+  union
   {
-    SET_RESTORE_ROUND (FE_TONEAREST);
-
-    junk1.x = x;
-    m = junk1.i[HIGH_HALF];
-    n = m & hugeint;
-
-    if (n > smallint && n < bigint)
-      {
-	y = x * log2e.x + three51.x;
-	bexp = y - three51.x;	/*  multiply the result by 2**bexp        */
-
-	junk1.x = y;
-
-	eps = bexp * ln_two2.x;	/* x = bexp*ln(2) + t - eps               */
-	t = x - bexp * ln_two1.x;
-
-	y = t + three33.x;
-	base = y - three33.x;	/* t rounded to a multiple of 2**-18      */
-	junk2.x = y;
-	del = (t - base) - eps;	/*  x = bexp*ln(2) + base + del           */
-	eps = del + del * del * (p3.x * del + p2.x);
-
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;
-
-	i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-	j = (junk2.i[LOW_HALF] & 511) << 1;
-
-	al = coar.x[i] * fine.x[j];
-	bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	       + coar.x[i + 1] * fine.x[j + 1]);
-
-	rem = (bet + bet * eps) + al * eps;
-	res = al + rem;
-	cor = (al - res) + rem;
-	if (res == (res + cor * err_0))
-	  {
-	    retval = res * binexp.x;
-	    goto ret;
+    int i_part[2];
+    double x;
+  } xx;
+  union
+  {
+    int y_part[2];
+    double y;
+  } yy;
+  xx.x = x_arg;
+
+  ix = xx.i_part[HIGH_HALF];
+  hx = ix & ~0x80000000;
+
+  if (hx < 0x3ff0a2b2)
+    {				/* |x| < 3/2 ln 2 */
+      if (hx < 0x3f862e42)
+	{			/* |x| < 1/64 ln 2 */
+	  if (hx < 0x3ed00000)
+	    {			/* |x| < 2^-18 */
+	      /* raise inexact if x != 0 */
+	      if (hx < 0x3e300000)
+		{
+		  retval = one + xx.x;
+		  return (retval);
+		}
+	      retval = one + xx.x * (one + half * xx.x);
+	      return (retval);
+	    }
+	  /* 
+	     Use FE_TONEAREST rounding mode for computing yy.y 
+	     Avoid set/reset of rounding mode if already in FE_TONEAREST mode
+	  */
+	  fe_val = __fegetround();
+	  if (fe_val == FE_TONEAREST) {
+	    t = xx.x * xx.x;
+	    yy.y = xx.x + (t * (half + xx.x * t2) +
+			   (t * t) * (t3 + xx.x * t4 + t * t5));
+	    retval = one + yy.y;
+	  } else {
+	    __fesetround(FE_TONEAREST);
+	    t = xx.x * xx.x;
+	    yy.y = xx.x + (t * (half + xx.x * t2) +
+			   (t * t) * (t3 + xx.x * t4 + t * t5));
+	    retval = one + yy.y;
+	    __fesetround(fe_val);
 	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto ret;
-	  }			/*if error is over bound */
-      }
+	  return (retval);
+	}
 
-    if (n <= smallint)
-      {
-	retval = 1.0;
-	goto ret;
+      /* find the multiple of 2^-6 nearest x */
+      k = hx >> 20;
+      j = (0x00100000 | (hx & 0x000fffff)) >> (0x40c - k);
+      j = (j - 1) & ~1;
+      if (ix < 0)
+	j += 134;
+      /* 
+	 Use FE_TONEAREST rounding mode for computing yy.y 
+	 Avoid set/reset of rounding mode if already in FE_TONEAREST mode
+      */
+      fe_val = __fegetround();
+      if (fe_val == FE_TONEAREST) {
+	z = xx.x - TBL2[j];
+	t = z * z;
+	/* the "small" term below guarantees inexact will be raised */
+	yy.y = z + (t * (half + (z * t2 + small)) +
+		    (t * t) * (t3 + z * t4 + t * t5));
+	retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+      } else {
+	__fesetround(FE_TONEAREST);
+	z = xx.x - TBL2[j];
+	t = z * z;
+	/* the "small" term below guarantees inexact will be raised */
+	yy.y = z + (t * (half + (z * t2 + small)) +
+		    (t * t) * (t3 + z * t4 + t * t5));
+	retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
+	__fesetround(fe_val);
       }
+      return (retval);
+    }
 
-    if (n >= badint)
-      {
-	if (n > infint)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/* x is NaN */
-	if (n < infint)
-	  {
-	    if (x > 0)
-	      goto ret_huge;
-	    else
-	      goto ret_tiny;
-	  }
-	/* x is finite,  cause either overflow or underflow  */
-	if (junk1.i[LOW_HALF] != 0)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/*  x is NaN  */
-	retval = (x > 0) ? inf.x : zero;	/* |x| = inf;  return either inf or 0 */
-	goto ret;
-      }
+  if (hx >= 0x40862e42)
+    {				/* x is large, infinite, or nan */
+      if (hx >= 0x7ff00000)
+	{
+	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
+	    return (zero);	/* exp(-inf) = 0 */
+	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf */
+	}
+      if (xx.x > threshold1)
+	{			/* set overflow error condition */
+	  retval = hhuge * hhuge;
+	  return retval;
+	} 
+      if (-xx.x > threshold2)
+	{			/* set underflow error condition */
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	  retval = zero;
+	  return retval;
+	}
+    }
 
-    y = x * log2e.x + three51.x;
-    bexp = y - three51.x;
-    junk1.x = y;
-    eps = bexp * ln_two2.x;
-    t = x - bexp * ln_two1.x;
-    y = t + three33.x;
-    base = y - three33.x;
-    junk2.x = y;
-    del = (t - base) - eps;
-    eps = del + del * del * (p3.x * del + p2.x);
-    i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-    j = (junk2.i[LOW_HALF] & 511) << 1;
-    al = coar.x[i] * fine.x[j];
-    bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	   + coar.x[i + 1] * fine.x[j + 1]);
-    rem = (bet + bet * eps) + al * eps;
-    res = al + rem;
-    cor = (al - res) + rem;
-    if (m >> 31)
-      {
-	ex = junk1.i[LOW_HALF];
-	if (res < 1.0)
-	  {
-	    res += res;
-	    cor += cor;
-	    ex -= 1;
-	  }
-	if (ex >= -1022)
-	  {
-	    binexp.i[HIGH_HALF] = (1023 + ex) << 20;
-	    if (res == (res + cor * err_0))
-	      {
-		retval = res * binexp.x;
-		goto ret;
-	      }
-	    else
-	      {
-		retval = __slowexp (x);
-		goto check_uflow_ret;
-	      }			/*if error is over bound */
-	  }
-	ex = -(1022 + ex);
-	binexp.i[HIGH_HALF] = (1023 - ex) << 20;
-	res *= binexp.x;
-	cor *= binexp.x;
-	eps = 1.0000000001 + err_0 * binexp.x;
-	t = 1.0 + res;
-	y = ((1.0 - t) + res) + cor;
-	res = t + y;
-	cor = (t - res) + y;
-	if (res == (res + eps * cor))
-	  {
-	    binexp.i[HIGH_HALF] = 0x00100000;
-	    retval = (res - 1.0) * binexp.x;
-	    goto check_uflow_ret;
-	  }
-	else
-	  {
-	    retval = __slowexp (x);
-	    goto check_uflow_ret;
-	  }			/*   if error is over bound    */
-      check_uflow_ret:
-	if (retval < DBL_MIN)
-	  {
-	    double force_underflow = tiny * tiny;
-	    math_force_eval (force_underflow);
-	  }
-	if (retval == 0)
-	  goto ret_tiny;
-	goto ret;
-      }
+  /* 
+     Use FE_TONEAREST rounding mode for computing yy.y 
+     Avoid set/reset of rounding mode if already in FE_TONEAREST mode
+  */
+  fe_val = __fegetround();
+  if (fe_val == FE_TONEAREST) {
+    t = invln2_32 * xx.x;
+    if (ix < 0)
+      t -= half;
     else
-      {
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
-	if (res == (res + cor * err_0))
-	  retval = res * binexp.x * t256.x;
-	else
-	  retval = __slowexp (x);
-	if (isinf (retval))
-	  goto ret_huge;
-	else
-	  goto ret;
-      }
+      t += half;
+    k = (int) t;
+    j = (k & 0x1f) << 1;
+    m = k >> 5;
+    z = (xx.x - k * ln2_32hi) - k * ln2_32lo;
+
+    /* z is now in primary range */
+    t = z * z;
+    yy.y = z + (t * (half + z * t2) + (t * t) * (t3 + z * t4 + t * t5));
+    yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+  } else {
+    __fesetround(FE_TONEAREST);
+    t = invln2_32 * xx.x;
+    if (ix < 0)
+      t -= half;
+    else
+      t += half;
+    k = (int) t;
+    j = (k & 0x1f) << 1;
+    m = k >> 5;
+    z = (xx.x - k * ln2_32hi) - k * ln2_32lo;
+
+    /* z is now in primary range */
+    t = z * z;
+    yy.y = z + (t * (half + z * t2) + (t * t) * (t3 + z * t4 + t * t5));
+    yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+    __fesetround(fe_val);
   }
-ret:
-  return retval;
 
- ret_huge:
-  return hhuge * hhuge;
-
- ret_tiny:
-  return tiny * tiny;
+  if (m < -1021)
+    {
+      yy.y_part[HIGH_HALF] += (m + 54) << 20;
+      retval = twom54 * yy.y;
+      if (retval < DBL_MIN)
+	{
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	}
+      return retval;
+    }
+  yy.y_part[HIGH_HALF] += m << 20;
+  return (yy.y);
 }
 #ifndef __ieee754_exp
 strong_alias (__ieee754_exp, __exp_finite)
 #endif
 
+#ifndef SECTION
+# define SECTION
+#endif
+
 /* Compute e^(x+xx).  The routine also receives bound of error of previous
    calculation.  If after computing exp the error exceeds the allowed bounds,
    the routine returns a non-positive number.  Otherwise it returns the
-- 
1.7.1

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-16 16:56 Patrick McGehearty
@ 2017-10-18 17:22 ` Joseph Myers
  2017-10-18 23:22   ` Joseph Myers
  2017-10-19 22:31   ` Patrick McGehearty
  2017-10-23 12:25 ` Siddhesh Poyarekar
  1 sibling, 2 replies; 44+ messages in thread
From: Joseph Myers @ 2017-10-18 17:22 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Mon, 16 Oct 2017, Patrick McGehearty wrote:

> diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
> index 6757a14..555a47f 100644
> --- a/sysdeps/ieee754/dbl-64/e_exp.c
> +++ b/sysdeps/ieee754/dbl-64/e_exp.c
> @@ -1,238 +1,452 @@
> +/* EXP function - Compute double precision exponential
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
>  /*
> - * IBM Accurate Mathematical Library
> - * written by International Business Machines Corp.
> - * Copyright (C) 2001-2017 Free Software Foundation, Inc.

The file still contains __exp1, used by pow.  Thus, I do not think 
removing the existing copyright dates is appropriate unless as a 
preliminary patch you move __exp1 to another file (e_pow.c being the 
obvious one; watch out for the sysdeps/x86_64/fpu/multiarch/e_exp-*.c with 
their own macro definitions of __exp1 that would no longer be appropriate 
after such a move).

Does this patch remove all calls to __slowexp?  If so, I'd expect 
slowexp.c to be removed as part of the patch (including 
architecture-specific / multiarch variants, and Makefile references to 
special options etc. for building slowexp.c).

> +extern double __ieee754_exp (double);
> +
> +
> +static const double TBL[] = {
> +  1.00000000000000000000e+00, 0.00000000000000000000e+00,
> +  1.02189714865411662714e+00, 5.10922502897344389359e-17,
> +  1.04427378242741375480e+00, 8.55188970553796365958e-17,

As per previous comments, this sort of table of precomputed values should 
use hex float constants.  You can use before-and-after comparison of 
object files to make sure the hex floats do represent the same values as 
the decimal constants.

> +static const double
> +  half		=0.5,
> +/* Following three values used to scale x to primary range */
> +  invln2_32	=4.61662413084468283841e+01,	/* 0x40471547, 0x652b82fe */
> +  ln2_32hi	=2.16608493865351192653e-02,	/* 0x3f962e42, 0xfee00000 */
> +  ln2_32lo	=5.96317165397058656257e-12,	/* 0x3d9a39ef, 0x35793c76 */
> +/* t2-t5 terms used for polynomial computation */
> +  t2		=1.6666666666526086527e-1,	/* 3fc5555555548f7c */
> +  t3		=4.1666666666226079285e-2,	/* 3fa5555555545d4e */
> +  t4		=8.3333679843421958056e-3,	/* 3f811115b7aa905e */
> +  t5		=1.3888949086377719040e-3,	/* 3f56c1728d739765 */
> +  one		=1.0,
> +/* maximum value for x to not overflow */
> +  threshold1	=7.09782712893383973096e+02,	/* 0x40862E42, 0xFEFA39EF */
> +/* maximum value for -x to not underflow */
> +  threshold2	=7.45133219101941108420e+02,	/* 0x40874910, 0xD52D3051 */
> +/* scaling factor used when result near zero*/
> +  twom54	=5.55111512312578270212e-17,	/* 0x3c900000, 0x00000000 */
> +/* value used to force inexact condition */
> +  small		=1.0e-100;

Likewise hex floats for these values (other than half / one / small).  
Also, the formatting is far from GNU standard; I'd expect e.g.:

/* Scaling factor used when result near zero.  */
static const double twom54 = 0x1p-54;

(single space either side of =, comment before each variable definition 
starting with a capital letter and ending with ".  ").
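
A minimal sketch of the kind of equivalence check suggested above,
assuming IEEE 754 binary64 doubles and a compiler that accepts C99 hex
float literals.  The helper double_bits is hypothetical, and the hex
float spelling of invln2_32 is derived here from the bit pattern quoted
in the patch comment; it is not taken from the patch itself.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE 754 binary64 bit pattern of D.  (Hypothetical
   helper for illustration; not part of glibc.)  */
static uint64_t
double_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Decimal spellings as they appear in the patch.  */
static const double twom54_dec = 5.55111512312578270212e-17;
static const double invln2_32_dec = 4.61662413084468283841e+01;

/* Hex float spellings (assumed equivalents, derived from the bit
   patterns quoted in the patch comments).  */
static const double twom54_hex = 0x1p-54;
static const double invln2_32_hex = 0x1.71547652b82fep+5;

/* Return nonzero if both spellings of each constant have identical bits.  */
static int
constants_match (void)
{
  return (double_bits (twom54_dec) == double_bits (twom54_hex)
	  && double_bits (invln2_32_dec) == double_bits (invln2_32_hex));
}
```

Comparing bit patterns rather than printed values avoids any doubt
about decimal rounding in the comparison itself.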

> +	  /* 
> +	     Use FE_TONEAREST rounding mode for computing yy.y 
> +	     Avoid set/reset of rounding mode if already in FE_TONEAREST mode
> +	  */
> +	  fe_val = __fegetround();
> +	  if (fe_val == FE_TONEAREST) {
> +	    t = xx.x * xx.x;
> +	    yy.y = xx.x + (t * (half + xx.x * t2) +
> +			   (t * t) * (t3 + xx.x * t4 + t * t5));
> +	    retval = one + yy.y;
> +	  } else {
> +	    __fesetround(FE_TONEAREST);
> +	    t = xx.x * xx.x;
> +	    yy.y = xx.x + (t * (half + xx.x * t2) +
> +			   (t * t) * (t3 + xx.x * t4 + t * t5));
> +	    retval = one + yy.y;
> +	    __fesetround(fe_val);

The formatting here is off, missing space before '(' in function calls.  
And you should be using the optimized SET_RESTORE_ROUND (FE_TONEAREST), 
which in addition to avoiding setting the rounding mode to a value it 
already has, also e.g. arranges on x86_64 to set only the SSE rounding 
mode and not the x87 mode because only the SSE mode needs setting for 
types other than long double.

(I'm not clear why you actually need to set the rounding mode here.  Does 
excessive inaccuracy result in this particular case if you don't set it?)

> +      fe_val = __fegetround();

Likewise again, and subsequently.

> +      if (fe_val == FE_TONEAREST) {
> +	z = xx.x - TBL2[j];
> +	t = z * z;
> +	/* the "small" term below guarantees inexact will be raised */

Guaranteeing "inexact" is not part of the goals for most libm functions, 
so I expect you can remove that term.

> +      if (-xx.x > threshold2)
> +	{			/* set underflow error condition */
> +	  double force_underflow = tiny * tiny;
> +	  math_force_eval (force_underflow);
> +	  retval = zero;
> +	  return retval;

I'd expect the force_underflow value to be returned in this case (so the 
return value is appropriate in round-upward mode).

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-18 17:22 ` Joseph Myers
@ 2017-10-18 23:22   ` Joseph Myers
  2017-10-19 22:31   ` Patrick McGehearty
  1 sibling, 0 replies; 44+ messages in thread
From: Joseph Myers @ 2017-10-18 23:22 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Wed, 18 Oct 2017, Joseph Myers wrote:

> Does this patch remove all calls to __slowexp?  If so, I'd expect 
> slowexp.c to be removed as part of the patch (including 
> architecture-specific / multiarch variants, and Makefile references to 
> special options etc. for building slowexp.c).

Just to add: one of the less obvious places needing updating on removing 
slowexp.c is the manual/probes.texi documentation of the probe points 
therein.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-18 17:22 ` Joseph Myers
  2017-10-18 23:22   ` Joseph Myers
@ 2017-10-19 22:31   ` Patrick McGehearty
  2017-10-19 22:48     ` Joseph Myers
  2017-10-20 11:41     ` Szabolcs Nagy
  1 sibling, 2 replies; 44+ messages in thread
From: Patrick McGehearty @ 2017-10-19 22:31 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha

On 10/18/2017 12:22 PM, Joseph Myers wrote:

Thank you Joseph for your detailed review.
Comments below.
> On Mon, 16 Oct 2017, Patrick McGehearty wrote:
>
>> diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
>> index 6757a14..555a47f 100644
>> --- a/sysdeps/ieee754/dbl-64/e_exp.c
>> +++ b/sysdeps/ieee754/dbl-64/e_exp.c
>> @@ -1,238 +1,452 @@
>> +/* EXP function - Compute double precision exponential
>> +   Copyright (C) 2017 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <http://www.gnu.org/licenses/>.  */
>> +
>>   /*
>> - * IBM Accurate Mathematical Library
>> - * written by International Business Machines Corp.
>> - * Copyright (C) 2001-2017 Free Software Foundation, Inc.
> The file still contains __exp1, used by pow.  Thus, I do not think
> removing the existing copyright dates is appropriate unless as a
> preliminary patch you move __exp1 to another file (e_pow.c being the
> obvious one; watch out for the sysdeps/x86_64/fpu/multiarch/e_exp-*.c with
> their own macro definitions of __exp1 that would no longer be appropriate
> after such a move).
For the copyright issue, would it be appropriate to move the
previous copyright notice to right before __exp1?
I'm reluctant to move __exp1 as that might also require changes
to Makefiles which are not currently required.


>
> Does this patch remove all calls to __slowexp?  If so, I'd expect
> slowexp.c to be removed as part of the patch (including
> architecture-specific / multiarch variants, and Makefile references to
> special options etc. for building slowexp.c).
For slowexp, I have some reservations that removing slowexp.o
might cause older object modules to break due to the missing
reference. I know they should not be directly referencing
an internal function, but still...
Anyway, I can't find any other direct usage of __slowexp.
I will remove all references to __slowexp and see
if anything breaks to show I missed a reference.

I find the following files have references, including some more
files to be removed.

manual/probes.texi - simply remove references to slowexp_p6 and slowexp_p32
math/Makefile - remove slowexp references
sysdeps/generic/math_private.h
sysdeps/ieee754/dbl-64/e_exp.c   (removed in the new code)
sysdeps/ieee754/dbl-64/slowexp.c (file to be removed)
sysdeps/ieee754/dbl-64/e_pow.c   (comment only)

The following include ieee754/e_exp.c; can remove references to slowexp
sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
sysdeps/x86_64/fpu/multiarch/e_exp-fma.c
sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c

The following include ieee754/dbl-64/slowexp.c; no longer needed
sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
sysdeps/x86_64/fpu/multiarch/slowexp-fma.c
sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c

sysdeps/x86_64/fpu/multiarch/Makefile - remove slowexp references


benchtests/strcol1-inputs/filelist#C and filelist#en_US.UTF-8
have references to slowexp*.c
I'm supposing those should also be removed.

>
>
>> +extern double __ieee754_exp (double);
>> +
>> +
>> +static const double TBL[] = {
>> +  1.00000000000000000000e+00, 0.00000000000000000000e+00,
>> +  1.02189714865411662714e+00, 5.10922502897344389359e-17,
>> +  1.04427378242741375480e+00, 8.55188970553796365958e-17,
> As per previous comments, this sort of table of precomputed values should
> use hex float constants.  You can use before-and-after comparison of
> object files to make sure the hex floats do represent the same values as
> the decimal constants.
>
>> +static const double
>> +  half		=0.5,
>> +/* Following three values used to scale x to primary range */
>> +  invln2_32	=4.61662413084468283841e+01,	/* 0x40471547, 0x652b82fe */
>> +  ln2_32hi	=2.16608493865351192653e-02,	/* 0x3f962e42, 0xfee00000 */
>> +  ln2_32lo	=5.96317165397058656257e-12,	/* 0x3d9a39ef, 0x35793c76 */
>> +/* t2-t5 terms used for polynomial computation */
>> +  t2		=1.6666666666526086527e-1,	/* 3fc5555555548f7c */
>> +  t3		=4.1666666666226079285e-2,	/* 3fa5555555545d4e */
>> +  t4		=8.3333679843421958056e-3,	/* 3f811115b7aa905e */
>> +  t5		=1.3888949086377719040e-3,	/* 3f56c1728d739765 */
>> +  one		=1.0,
>> +/* maximum value for x to not overflow */
>> +  threshold1	=7.09782712893383973096e+02,	/* 0x40862E42, 0xFEFA39EF */
>> +/* maximum value for -x to not underflow */
>> +  threshold2	=7.45133219101941108420e+02,	/* 0x40874910, 0xD52D3051 */
>> +/* scaling factor used when result near zero*/
>> +  twom54	=5.55111512312578270212e-17,	/* 0x3c900000, 0x00000000 */
>> +/* value used to force inexact condition */
>> +  small		=1.0e-100;
> Likewise hex floats for these values (other than half / one / small).
> Also, the formatting is far from GNU standard; I'd expect e.g.:
>
> /* Scaling factor used when result near zero.  */
> static const double twom54 = 0x1p-54;
>
> (single space either side of =, comment before each variable definition
> starting with a capital letter and ending with ".  ").

Regarding the table of hex float constants: I can readily adjust the
formatting. What you see is the formatting used in the original source.
I've been uncomfortable with the hex floats approach
as it only works for ieee754 representations
that use base 2. I admit that is most current machines.
And the prior ieee754 exp table uses hex format.
My second reason for resisting the change is my philosophy
when porting code: every change without a good reason
is an opportunity to introduce errors without corresponding benefit.

If you feel strongly about using hex constants, I will make an effort
to convert these values to hex format.  This conversion seems likely
to require the most effort on my part.


>
>> +	  /*
>> +	     Use FE_TONEAREST rounding mode for computing yy.y
>> +	     Avoid set/reset of rounding mode if already in FE_TONEAREST mode
>> +	  */
>> +	  fe_val = __fegetround();
>> +	  if (fe_val == FE_TONEAREST) {
>> +	    t = xx.x * xx.x;
>> +	    yy.y = xx.x + (t * (half + xx.x * t2) +
>> +			   (t * t) * (t3 + xx.x * t4 + t * t5));
>> +	    retval = one + yy.y;
>> +	  } else {
>> +	    __fesetround(FE_TONEAREST);
>> +	    t = xx.x * xx.x;
>> +	    yy.y = xx.x + (t * (half + xx.x * t2) +
>> +			   (t * t) * (t3 + xx.x * t4 + t * t5));
>> +	    retval = one + yy.y;
>> +	    __fesetround(fe_val);
> The formatting here is off, missing space before '(' in function calls.
> And you should be using the optimized SET_RESTORE_ROUND (FE_TONEAREST),
> which in addition to avoiding setting the rounding mode to a value it
> already has, also e.g. arranges on x86_64 to set only the SSE rounding
> mode and not the x87 mode because only the SSE mode needs setting for
> types other than long double.
>
> (I'm not clear why you actually need to set the rounding mode here.  Does
> excessive inaccuracy result in this particular case if you don't set it?)
>
>> +      fe_val = __fegetround();
Failing to use __fegetround() and __fesetround() causes over 40 math test
accuracy failures for other rounding modes in exp, cexp, cpow, cosh,
ccos, ccosh, csin, and csinh.  If the glibc/Linux community were to
declare that the only rounding mode that was fully supported was
FE_TONEAREST, we could simplify/speed up a lot of code. :-)
If I were wearing a benchmarker's hat, I could argue for that
approach. As a math library developer, I felt compelled to add the
rounding mode tests to maintain accuracy for the other rounding modes
in spite of the performance cost.

While SET_RESTORE_ROUND is reasonably optimized, it is not ideal,
especially on Sparc. Empirically, using SET_RESTORE_ROUND adds more
than twice the overhead compared to using __fegetround() on Sparc.
Interestingly, for x86, the two approaches have similar performance
perhaps due to more attention paid to x86 optimization with gcc over
the years.

Underneath the definitions of SET_RESTORE_ROUND, it ultimately relies
on __fegetround() and __fesetround(). The extra cost for
SET_RESTORE_ROUND is that it requires a flag to always be set (mode
changed or did not change). A short time later the flag must be tested to
determine if the rounding mode needs to be restored. If the compiler
puts that flag on the stack or in memory, the reading of a value that
was just written to cache triggers a "Read after Write" HW hazard,
causing a typical delay of 30-40 cycles. Avoiding the test also
avoids a possible mis-predicted branch.  For M7 and earlier, Sparc
branch prediction is not ideal and mis-prediction is expensive.  The
code I provided avoids the need to set the flag or test it by
duplicating small segments of code for each case.


> Likewise again, and subsequently.
>
>> +      if (fe_val == FE_TONEAREST) {
>> +	z = xx.x - TBL2[j];
>> +	t = z * z;
>> +	/* the "small" term below guarantees inexact will be raised */
> Guaranteeing "inexact" is not part of the goals for most libm functions,
> so I expect you can remove that term.
The "inexact" test was required to pass the (make check) math lib tests.

>
>> +      if (-xx.x > threshold2)
>> +	{			/* set underflow error condition */
>> +	  double force_underflow = tiny * tiny;
>> +	  math_force_eval (force_underflow);
>> +	  retval = zero;
>> +	  return retval;
> I'd expect the force_underflow value to be returned in this case (so the
> return value is appropriate in round-upward mode).
When -xx.x > threshold2, e**x = 0.  I'll investigate when/whether
round-upward expects the ulp to be 1 instead of zero. My first
impression is that you are right, but I want to think about it a bit more.

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-19 22:31   ` Patrick McGehearty
@ 2017-10-19 22:48     ` Joseph Myers
  2017-10-20 15:04       ` Patrick McGehearty
  2017-10-21  5:23       ` Patrick McGehearty
  2017-10-20 11:41     ` Szabolcs Nagy
  1 sibling, 2 replies; 44+ messages in thread
From: Joseph Myers @ 2017-10-19 22:48 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Thu, 19 Oct 2017, Patrick McGehearty wrote:

> For the copyright issue, would it be appropriate to move the
> previous copyright notice to right before __exp1?
> I'm reluctant to move __exp1 as that might also require changes
> to Makefiles which are not currently required.

There should be exactly one copyright notice with a given copyright holder 
in a given source file, so only one set of dates (2001-2017) given.  The 
"IBM Accurate Mathematical Library" comment might be updated to describe 
what parts of the file come from where.

> For slowexp, I have some reservations that removing slowexp.o
> might cause older object modules to break due to the missing
> reference. I know they should not be directly referencing
> an internal function, but still...
> Anyway, I can't find any other direct usage of __slowexp.
> I will remove all references to __slowexp and see
> if anything breaks to show I missed a reference.
> 
> I find the following files have references, including some more
> files to be removed.

This list seems to be missing (at least) 
sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c setting).

> benchtests/strcol1-inputs/filelist#C and filelist#en_US.UTF-8
> have references to slowexp*.c
> I'm supposing those should also be removed.

I know some people do edit this benchmark input when removing files, but I 
think that's inappropriate; it's just a test input that happens to be 
based on a list of files in some version of glibc, there is no need for it 
to correspond to current glibc, and in the interests of comparability of 
benchmark results it would be best for it never to change.

> I've been uncomfortable with the hex floats approach
> as it only works for ieee754 representations
> that use base 2. I admit that is most current machines.

This file is in a directory for such a binary format, and it's tuned for 
that particular format (regarding polynomial size chosen, etc.).  Any code 
written for decimal formats would naturally use decimal constants, but in 
code written for binary formats, hex constants are appropriate for such 
precomputed constants to make clear exactly what value / representation is 
intended.  (And in the 0x1p-54 case, it makes the code a lot more readable 
to put that in hex.)

> Underneath the definitions of SET_RESTORE_ROUND, it ultimately relies
> on __fegetround() and __fesetround(). The extra cost for
> SET_RESTORE_ROUND is that it requires a flag to always be set (mode
> changed or did not change). A short time later the flag must be tested to
> determine if the rounding mode needs to be restored. If the compiler
> puts that flag on the stack or in memory, the reading of a value that
> was just written to cache triggers a "Read after Write" HW hazard,
> causing a typical delay of 30-40 cycles. Avoiding the test also
> avoids a possible mis-predicted branch.  For M7 and earlier, Sparc
> branch prediction is not ideal and mis-prediction is expensive.  The
> code I provided avoids the need to set the flag or test it by
> duplicating small segments of code for each case.

Well, there are lots of lower-level interfaces such as libc_fesetround 
that could be used if appropriate.  Maybe additional such interfaces need 
to be added to support such cases of duplicating code.  It would seem 
natural to start with the existing interfaces (SET_RESTORE_ROUND), with a 
view to a followup possibly then adding further optimizations, just as the 
expf changes started by adding something wrapped by the existing wrapper, 
then arranged to avoid a wrapper in subsequent patches in the series.

> > Guaranteeing "inexact" is not part of the goals for most libm functions,
> > so I expect you can remove that term.
> The "inexact" test was required to pass the (make check) math lib tests.

You'll need to explain more.  For functions which are not fully defined by 
a binding to IEEE operations, both spurious and missing "inexact" should 
be allowed by the testsuite.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-19 22:48     ` Joseph Myers
@ 2017-10-20 15:04       ` Patrick McGehearty
  2017-10-21  5:23       ` Patrick McGehearty
  1 sibling, 0 replies; 44+ messages in thread
From: Patrick McGehearty @ 2017-10-20 15:04 UTC (permalink / raw)
  To: libc-alpha

On 10/19/2017 5:48 PM, Joseph Myers wrote:
> On Thu, 19 Oct 2017, Patrick McGehearty wrote:
>
>> For the copyright issue, would it be appropriate to move the
>> previous copyright notice to right before __exp1?
>> I'm reluctant to move __exp1 as that might also require changes
>> to Makefiles which are not currently required.
> There should be exactly one copyright notice with a given copyright holder
> in a given source file, so only one set of dates (2001-2017) given.  The
> "IBM Accurate Mathematical Library" comment might be updated to describe
> what parts of the file come from where.
I'll see what I can do.
>> For slowexp, I have some reservations that removing slowexp.o
>> might cause older object modules to break due to the missing
>> reference. I know they should not be directly referencing
>> an internal function, but still...
>> Anyway, I can't find any other direct usage of __slowexp.
>> I will remove all references to __slowexp and see
>> if anything breaks to show I missed a reference.
>>
>> I find the following files have references, including some more
>> files to be removed.
> This list seems to be missing (at least)
> sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c setting).
Thanks. I missed that in my grep review.
>> benchtests/strcol1-inputs/filelist#C and filelist#en_US.UTF-8
>> have references to slowexp*.c
>> I'm supposing those should also be removed.
> I know some people do edit this benchmark input when removing files, but I
> think that's inappropriate; it's just a test input that happens to be
> based on a list of files in some version of glibc, there is no need for it
> to correspond to current glibc, and in the interests of comparability of
> benchmark results it would be best for it never to change.
Ok, I'll leave those alone.
>> I've been uncomfortable with the hex floats approach
>> as it only works for ieee754 representations
>> that use base 2. I admit that is most current machines.
> This file is in a directory for such a binary format, and it's tuned for
> that particular format (regarding polynomial size chosen, etc.).  Any code
> written for decimal formats would naturally use decimal constants, but in
> code written for binary formats, hex constants are appropriate for such
> precomputed constants to make clear exactly what value / representation is
> intended.  (And in the 0x1p-54 case, it makes the code a lot more readable
> to put that in hex.)
As noted in other email, I'll do the hex conversion.
>> Underneath the definitions of SET_RESTORE_ROUND, it ultimately relies
>> on __fegetround() and __fesetround(). The extra cost for
>> SET_RESTORE_ROUND is that it requires a flag to always be set (mode
>> changed or did not change). A short time later the flag must be tested to
>> determine if the rounding mode needs to be restored. If the compiler
>> puts that flag on the stack or in memory, the reading of a value that
>> was just written to cache triggers a "Read after Write" HW hazard,
>> causing a typical delay of 30-40 cycles. Avoiding the test also
>> avoids a possible mis-predicted branch.  For M7 and earlier, Sparc
>> branch prediction is not ideal and mis-prediction is expensive.  The
>> code I provided avoids the need to set the flag or test it by
>> duplicating small segments of code for each case.
> Well, there are lots of lower-level interfaces such as libc_fesetround
> that could be used if appropriate.  Maybe additional such interfaces need
> to be added to support such cases of duplicating code.  It would seem
> natural to start with the existing interfaces (SET_RESTORE_ROUND), with a
> view to a followup possibly then adding further optimizations, just as the
> expf changes started by adding something wrapped by the existing wrapper,
> then arranged to avoid a wrapper in subsequent patches in the series.
I'll test Wilco's suggestion of using get_rounding_mode and libc_fesetround.
>>> Guaranteeing "inexact" is not part of the goals for most libm functions,
>>> so I expect you can remove that term.
>> The "inexact" test was required to pass the (make check) math lib tests.
> You'll need to explain more.  For functions which are not fully defined by
> a binding to IEEE operations, both spurious and missing "inexact" should
> be allowed by the testsuite.
>
It was months ago that I made the inexact modifications. It was part
of dealing with failure to report underflow/overflow/NAN conditions.
I've forgotten the exact details.
I'll remove the inexact code and report what test failures are seen if any.

- patrick

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-19 22:48     ` Joseph Myers
  2017-10-20 15:04       ` Patrick McGehearty
@ 2017-10-21  5:23       ` Patrick McGehearty
  2017-10-23 12:47         ` Joseph Myers
  1 sibling, 1 reply; 44+ messages in thread
From: Patrick McGehearty @ 2017-10-21  5:23 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha

On 10/19/2017 5:48 PM, Joseph Myers wrote:
> On Thu, 19 Oct 2017, Patrick McGehearty wrote:
>
>
>>> Guaranteeing "inexact" is not part of the goals for most libm functions,
>>> so I expect you can remove that term.
>> The "inexact" test was required to pass the (make check) math lib tests.
> You'll need to explain more.  For functions which are not fully defined by
> a binding to IEEE operations, both spurious and missing "inexact" should
> be allowed by the testsuite.
>
When the following lines are commented out:
    double force_underflow = tiny * tiny;
    math_force_eval (force_underflow);

The following failures appear (and are repeated for the other rounding modes):
Failure: exp (-0x2.c4edp+12): Exception "Underflow" not set
Failure: exp (-0x2.c5b2319c4843ap+12): Exception "Underflow" not set
Failure: exp (-0x2.c5b2319c4843cp+12): Exception "Underflow" not set
Failure: exp (-0x2.c5b234p+12): Exception "Underflow" not set
Failure: exp (-0x2.c5b23p+12): Exception "Underflow" not set
Failure: exp (-0x2.c5bd48bdc7c0cp+12): Exception "Underflow" not set
Failure: exp (-0x2.c5bd48bdc7c0ep+12): Exception "Underflow" not set
Failure: exp (-0x2.c5bd48p+12): Exception "Underflow" not set
Failure: exp (-0x2.c5bd4cp+12): Exception "Underflow" not set
Failure: exp (-0x2.ebe224p+8): Exception "Underflow" not set
Failure: exp (-0x2.ebe227861639p+8): Exception "Underflow" not set
Failure: exp (-0x2.ebe228p+8): Exception "Underflow" not set
Failure: exp (-0x4.d2p+8): Exception "Underflow" not set
Failure: exp (-0xf.ffffffffffff8p+1020): Exception "Underflow" not set
Failure: exp (-0xf.fffffp+124): Exception "Underflow" not set

The same pair of instructions is used in the old e_exp.c
to set the underflow exception, which means the new code
will match the behavior of the old code for underflows.

- patrick

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-21  5:23       ` Patrick McGehearty
@ 2017-10-23 12:47         ` Joseph Myers
  2017-10-23 19:58           ` Patrick McGehearty
  0 siblings, 1 reply; 44+ messages in thread
From: Joseph Myers @ 2017-10-23 12:47 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Sat, 21 Oct 2017, Patrick McGehearty wrote:

> On 10/19/2017 5:48 PM, Joseph Myers wrote:
> > On Thu, 19 Oct 2017, Patrick McGehearty wrote:
> > 
> > 
> > > > Guaranteeing "inexact" is not part of the goals for most libm functions,
> > > > so I expect you can remove that term.
> > > The "inexact" test was required to pass the (make check) math lib tests.
> > You'll need to explain more.  For functions which are not fully defined by
> > a binding to IEEE operations, both spurious and missing "inexact" should
> > be allowed by the testsuite.
> > 
> When the following lines are commented out:
>     double force_underflow = tiny * tiny;
>     math_force_eval (force_underflow);

That's not what I was referring to.  I was referring to the comment I 
quoted, '/* the "small" term below guarantees inexact will be raised */', 
and the associated uses of the constant "small".  Inexact, not underflow.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-23 12:47         ` Joseph Myers
@ 2017-10-23 19:58           ` Patrick McGehearty
  2017-10-23 21:31             ` Joseph Myers
  0 siblings, 1 reply; 44+ messages in thread
From: Patrick McGehearty @ 2017-10-23 19:58 UTC (permalink / raw)
  To: libc-alpha

On 10/23/2017 7:47 AM, Joseph Myers wrote:
> On Sat, 21 Oct 2017, Patrick McGehearty wrote:
>
>> On 10/19/2017 5:48 PM, Joseph Myers wrote:
>>> On Thu, 19 Oct 2017, Patrick McGehearty wrote:
>>>
>>>
>>>>> Guaranteeing "inexact" is not part of the goals for most libm functions,
>>>>> so I expect you can remove that term.
>>>> The "inexact" test was required to pass the (make check) math lib tests.
>>> You'll need to explain more.  For functions which are not fully defined by
>>> a binding to IEEE operations, both spurious and missing "inexact" should
>>> be allowed by the testsuite.
>>>
>> When the following lines are commented out:
>>     double force_underflow = tiny * tiny;
>>     math_force_eval (force_underflow);
> That's not what I was referring to.  I was referring to the comment I
> quoted, '/* the "small" term below guarantees inexact will be raised */',
> and the associated uses of the constant "small".  Inexact, not underflow.
>
Now I understand your point. I researched the source of the comment and
the reason for the use of "small". By my reading of the ieee754 definition
of "inexact", exp(x) for any x except 0.0 should set the inexact bit. The
Studio version of exp() included "small" for that reason.
The current Linux ieee754 e_exp() makes no attempt to ensure inexact is set.
I can easily match that behavior without changing accuracy by removing all
uses of "small" (and associated comments).

I will do that for my next patch submission.

- patrick

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-23 19:58           ` Patrick McGehearty
@ 2017-10-23 21:31             ` Joseph Myers
  0 siblings, 0 replies; 44+ messages in thread
From: Joseph Myers @ 2017-10-23 21:31 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-alpha

On Mon, 23 Oct 2017, Patrick McGehearty wrote:

> Now I understand your point and researched the source of the comment and 
> reason for the use of "small". By my reading of the ieee754 definition 
> of "inexact", exp(x) for any x except 0.0 should set the inexact bit.

However, while that would apply for TS 18661-4 crexp (corresponding 
directly to the IEEE 754 exp operation), glibc's accuracy goals for 
functions not bound to IEEE operations are as documented in math.texi, and 
those do not include correctness in whether "inexact" is raised.  (There 
is some existing code in glibc to set "inexact" in cases where it's not 
necessary to do so.  Removing such code would be reasonable cleanups, but 
is independent of this exp patch.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-19 22:31   ` Patrick McGehearty
  2017-10-19 22:48     ` Joseph Myers
@ 2017-10-20 11:41     ` Szabolcs Nagy
  2017-10-20 14:56       ` Patrick McGehearty
  2017-10-20 16:10       ` Joseph Myers
  1 sibling, 2 replies; 44+ messages in thread
From: Szabolcs Nagy @ 2017-10-20 11:41 UTC (permalink / raw)
  To: Patrick McGehearty, Joseph Myers; +Cc: nd, libc-alpha

On 19/10/17 23:31, Patrick McGehearty wrote:
> Table of hex float constants. I can readily adjust the formatting. What
> you see is the formatting used in the original source.
> I've been uncomfortable with the hex floats approach
> as it only works for ieee754 representations
> that use base 2. I admit that is most current machines.

your entire algorithm depends on ieee754 binary representation,
that's not a good reason for avoiding hexfloats.

decimal floats are not even required to be correctly rounded by
the compiler in iso c, they are only faithfully rounded, so
this is a portability bug in the original source too, you can
silently get completely wrong code generation because of it.

> And the prior ieee754 exp table uses hex format.
> My second reason for resisting the change is my philosophy
> when porting code is that every change without a good reason
> is an opportunity to introduce errors without corresponding benefit.
> 
> If you feel strongly about using hex constants, I will make an effort
> to convert these values to hex format.  This conversion seems likely
> to require the most effort on my part.
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-20 11:41     ` Szabolcs Nagy
@ 2017-10-20 14:56       ` Patrick McGehearty
  2017-10-20 16:10       ` Joseph Myers
  1 sibling, 0 replies; 44+ messages in thread
From: Patrick McGehearty @ 2017-10-20 14:56 UTC (permalink / raw)
  To: libc-alpha

On 10/20/2017 6:41 AM, Szabolcs Nagy wrote:
> On 19/10/17 23:31, Patrick McGehearty wrote:
>> Table of hex float constants. I can readily adjust the formatting. What
>> you see is the formatting used in the original source.
>> I've been uncomfortable with the hex floats approach
>> as it only works for ieee754 representations
>> that use base 2. I admit that is most current machines.
> your entire algorithm depends on ieee754 binary representation,
> that's not a good reason for avoiding hexfloats.
>
> decimal floats are not even required to be correctly rounded by
> the compiler in iso c, they are only faithfully rounded, so
> this is a portability bug in the original source too, you can
> silently get completely wrong code generation because of it.
>
>> And the prior ieee754 exp table uses hex format.
>> My second reason for resisting the change is my philosophy
>> when porting code is that every change without a good reason
>> is an opportunity to introduce errors without corresponding benefit.
>>
>> If you feel strongly about using hex constants, I will make an effort
>> to convert these values to hex format.  This conversion seems likely
>> to require the most effort on my part.
>>
You make good points. In the prior context of the code being built
and used for Sparc/Solaris, the Studio compiler was known to
correctly round floating point constants since the early 90's.
In the gnu context, we have to be prepared for any C standard
compliant compiler, which is a greater challenge.
I'll make the conversion to using hex constants. I'm wondering
if it would be beneficial to retain the decimal constants in comment form?
Maybe not for the big table, but for the constants used for
specific thresholds, etc.  I know when I first looked at
some of this code, I wrote a little program to display the
decimal equivalents of some of the hex constants to better understand
what the control flow was doing.
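
A small sketch of such a helper, for illustration only (the function
names are hypothetical, and this is not the program referred to above).
It formats a double both as a hex float with %a and as a 17-digit
decimal with %.17g, and checks that the decimal form parses back to the
same value. The round-trip check assumes the C library converts decimal
strings with correct rounding.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write "hex = decimal" for x into buf; returns snprintf's result.  */
int
describe_constant (char *buf, size_t size, double x)
{
  return snprintf (buf, size, "%a = %.17g", x, x);
}

/* Returns 1 if x survives formatting with %.17g and parsing back
   with strtod, 0 otherwise.  */
int
decimal_round_trips (double x)
{
  char buf[64];
  snprintf (buf, sizeof buf, "%.17g", x);
  return strtod (buf, NULL) == x;
}
```

Calling describe_constant on each threshold constant and printing the
buffer gives a line that could be pasted into a comment next to the hex
value.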

- patrick



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-20 11:41     ` Szabolcs Nagy
  2017-10-20 14:56       ` Patrick McGehearty
@ 2017-10-20 16:10       ` Joseph Myers
  1 sibling, 0 replies; 44+ messages in thread
From: Joseph Myers @ 2017-10-20 16:10 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Patrick McGehearty, nd, libc-alpha

On Fri, 20 Oct 2017, Szabolcs Nagy wrote:

> decimal floats are not even required to be correctly rounded by
> the compiler in iso c, they are only faithfully rounded, so
> this is a portability bug in the original source too, you can
> silently get completely wrong code generation because of it.

GCC 4.9 (the minimum version for building glibc) and later correctly round 
decimal floating-point constants for binary formats (see GCC bug 21718).  
Before that we did at least once have an issue with a decimal constant 
where someone had computed a value exactly half way between two 
representable values and different compilers rounded it differently 
(glibc bug 14803).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-16 16:56 Patrick McGehearty
  2017-10-18 17:22 ` Joseph Myers
@ 2017-10-23 12:25 ` Siddhesh Poyarekar
  2017-10-23 15:58   ` Joseph Myers
  1 sibling, 1 reply; 44+ messages in thread
From: Siddhesh Poyarekar @ 2017-10-23 12:25 UTC (permalink / raw)
  To: Patrick McGehearty, libc-alpha

On Monday 16 October 2017 10:26 PM, Patrick McGehearty wrote:
> The extreme max times for the old (ieee754) exp are due to the
> multiprecision computation in the old algorithm when the true value is
> very near 0.5 ulp away from a value representable in double
> precision. The new algorithm does not take special measures for those
> cases. The current glibc exp perf tests overrepresent those values.
> Informal testing suggests approximately one in 200 cases might
> invoke the high cost computation. The performance advantage of the new
> algorithm for other values is still large but not as large as indicated
> by the chart above.

The inputs were curated such that the multiple precision ones were
weeded out into separate tests to avoid seeing such deviations.  This
was based on the premise that the result precision for these inputs
would be consistent (not necessarily the same) across platforms but that
doesn't seem to be the case and some inputs seem to have sneaked in.

On a related note, if we are comfortable dropping exp slow path, we
should probably take a serious look at the log slow path too since IIRC
I wasn't even able to trigger it after days of running the test and it's
quite possible that nobody cares.  We could drop it and see if anybody
notices.

Siddhesh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  2017-10-23 12:25 ` Siddhesh Poyarekar
@ 2017-10-23 15:58   ` Joseph Myers
  0 siblings, 0 replies; 44+ messages in thread
From: Joseph Myers @ 2017-10-23 15:58 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Patrick McGehearty, libc-alpha

On Mon, 23 Oct 2017, Siddhesh Poyarekar wrote:

> On a related note, if we are comfortable dropping exp slow path, we
> should probably take a serious look at the log slow path too since IIRC
> I wasn't even able to trigger it after days of running the test and it's
> quite possible that nobody cares.  We could drop it and see if anybody
> notices.

The basis for removing a slow path in an existing implementation should be 
an error analysis of the remaining code that justifies that the slow path 
is never needed to avoid large errors.  That might be an error analysis 
that assumes the correctness of the existing code and deduces an error 
bound on the basis that larger errors would make a check for being 
correctly rounded (probably the one that gates entering the slow path in 
question) incorrect.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2018-01-01 16:41 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
2017-12-08 23:08 [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86 Patrick McGehearty
2017-12-11  8:14 ` Siddhesh Poyarekar
2017-12-11 17:04   ` Patrick McGehearty
2017-12-11 17:53     ` Siddhesh Poyarekar
2017-12-14  1:28 ` Joseph Myers
2017-12-18 20:11   ` Patrick McGehearty
  -- strict thread matches above, loose matches on Subject: below --
2017-12-29 23:42 Patrick McGehearty
2018-01-01  1:36 ` Joseph Myers
2018-01-01 16:31   ` Patrick McGehearty
2018-01-01 16:41     ` Joseph Myers
2017-12-04 21:53 Patrick McGehearty
2017-12-05 23:20 ` Joseph Myers
2017-12-01  0:51 Patrick McGehearty
2017-12-01  0:56 ` Joseph Myers
2017-11-07  4:25 Patrick McGehearty
2017-11-16 17:52 ` Patrick McGehearty
2017-11-16 18:27   ` Carlos O'Donell
2017-11-16 18:31   ` Joseph Myers
2017-11-23 21:19 ` Joseph Myers
2017-12-01  0:47   ` Patrick McGehearty
2017-10-26 22:53 Patrick McGehearty
2017-11-01  0:26 ` Joseph Myers
2017-10-26 16:44 Patrick McGehearty
2017-10-26 17:20 ` Joseph Myers
2017-10-26 17:25   ` Joseph Myers
2017-10-26 18:30     ` Patrick McGehearty
2017-10-26 19:44       ` Joseph Myers
2017-10-20 13:38 Wilco Dijkstra
2017-10-20 14:58 ` Patrick McGehearty
2017-10-16 16:56 Patrick McGehearty
2017-10-18 17:22 ` Joseph Myers
2017-10-18 23:22   ` Joseph Myers
2017-10-19 22:31   ` Patrick McGehearty
2017-10-19 22:48     ` Joseph Myers
2017-10-20 15:04       ` Patrick McGehearty
2017-10-21  5:23       ` Patrick McGehearty
2017-10-23 12:47         ` Joseph Myers
2017-10-23 19:58           ` Patrick McGehearty
2017-10-23 21:31             ` Joseph Myers
2017-10-20 11:41     ` Szabolcs Nagy
2017-10-20 14:56       ` Patrick McGehearty
2017-10-20 16:10       ` Joseph Myers
2017-10-23 12:25 ` Siddhesh Poyarekar
2017-10-23 15:58   ` Joseph Myers
