v3:
- Add empty e_pow_log_data.c to targets that have their own pow.
- Fix GNU style issues.
- Document internal function semantics.
- Add NEWS entry.
v2:
- Use __FP_FAST_FMA and __builtin_fma.
- Update x86_64 makefiles too to allow fma contraction.

The algorithm is exp(y * log(x)), where log(x) is computed with about
1.3*2^-68 relative error (1.5*2^-68 without fma), returning the result
in two doubles, and the exp part uses the same algorithm (and lookup
tables) as exp, but takes the input as two doubles and a sign (to handle
negative bases with odd integer exponent).  The __exp1 internal symbol
is no longer necessary.

There is a separate code path when fma is not available, but the worst
case error is about 0.54 ULP in both cases.  The lookup table and consts
for log are 4168 bytes.  The .rodata+.text is decreased by 37908 bytes
on aarch64.  The non-nearest rounding error is less than 1 ULP.

Improvements on Cortex-A72 compared to current glibc master:
latency: 1.8x
throughput: 2.5x

2018-06-29  Szabolcs Nagy

	* NEWS: Mention pow improvements.
	* math/Makefile (type-double-routines): Add e_pow_log_data.
	* sysdeps/generic/math_private.h (__exp1): Remove.
	* sysdeps/i386/fpu/e_pow_log_data.c: New file.
	* sysdeps/ia64/fpu/e_pow_log_data.c: New file.
	* sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma
	contraction.
	* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove.
	(exp_inline): Remove.
	(__ieee754_exp): Only single double input is handled.
	* sysdeps/ieee754/dbl-64/e_pow.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file.
	* sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define.
	(__pow_log_data): Define.
	* sysdeps/ieee754/dbl-64/upow.h: Remove.
	* sysdeps/ieee754/dbl-64/upow.tbl: Remove.
	* sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file.
	* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow
	fma contraction.
	(CFLAGS-e_pow-fma4.c): Likewise.
---
 NEWS                                     |     4 +-
 math/Makefile                            |     2 +-
 sysdeps/generic/math_private.h           |     1 -
 sysdeps/i386/fpu/e_pow_log_data.c        |     1 +
 sysdeps/ia64/fpu/e_pow_log_data.c        |     1 +
 sysdeps/ieee754/dbl-64/Makefile          |     1 -
 sysdeps/ieee754/dbl-64/e_exp.c           |    34 +-
 sysdeps/ieee754/dbl-64/e_pow.c           |   663 +-
 sysdeps/ieee754/dbl-64/e_pow_log_data.c  |   173 +
 sysdeps/ieee754/dbl-64/math_config.h     |    21 +
 sysdeps/ieee754/dbl-64/upow.h            |    76 -
 sysdeps/ieee754/dbl-64/upow.tbl          | 10188 -----------------------------
 sysdeps/m68k/m680x0/fpu/e_pow_log_data.c |     1 +
 sysdeps/x86_64/fpu/multiarch/Makefile    |     4 +-
 14 files changed, 555 insertions(+), 10615 deletions(-)
 create mode 100644 sysdeps/i386/fpu/e_pow_log_data.c
 create mode 100644 sysdeps/ia64/fpu/e_pow_log_data.c
 create mode 100644 sysdeps/ieee754/dbl-64/e_pow_log_data.c
 delete mode 100644 sysdeps/ieee754/dbl-64/upow.h
 delete mode 100644 sysdeps/ieee754/dbl-64/upow.tbl
 create mode 100644 sysdeps/m68k/m680x0/fpu/e_pow_log_data.c