Adhemerval Zanella wrote: > On 18/08/2021 14:11, Stefan Kanthak wrote: >> Szabolcs Nagy wrote: >> >>> The 08/16/2021 18:03, Stefan Kanthak wrote: >>>> Testing and benchmarking an exponential function that consumes >>>> about 10ns/call on an AMD EPYC 7262, I noticed that nextafter() >>>> itself is DEAD SLOW: it consumes about 8.5ns/call! >>>> >>>> The following (quick&dirty) implementation consumes 0.7ns/call, >>>> i.e. is about an order of magnitude faster: >>> >>> correctly measuring latency on a modern x86_64 core: >>> >>> musl: 3.16 ns >>> glibc: 5.68 ns >>> your: 5.72 ns > > Thanks for bring this up, if you want to contribute a patch See attachment. For completeness sake and to ease an initial review, I post the full source here too: without ALTERNATE defined, it uses integer arithmetic to the extent possible, like the original. Choose whatever you like best or gives better throughput/latency. Changes from the 1st version of the patch: - leftover FP comparision substituted with integer operations; - increment/decrement logic simplified/unified, avoiding a conditional branch. I expect the compiler to detect the common clause "&& ((xx.s < 0) == (yy.s < 0)))" introduced with these changes. Stefan JFTR: .../sysdeps/ieee754/flt-32/s_nextafterf.c as well as .../sysdeps/i386/fpu/s_nextafterl.c will need the same changes. --- .../math/s_nextafter.c --- /* IEEE functions * nextafter(x,y) * return the next machine floating-point number of x in the * direction toward y. * Special cases: */ /* Ugly hack so that the aliasing works. */ #define __nexttoward __internal___nexttoward #define nexttoward __internal_nexttoward #include #include #include #include #include #include double __nextafter(double x, double y) { union { double d; int64_t s; uint64_t u; } xx = {x}, yy = {y}; uint64_t ax = xx.u + xx.u; /* |x|<<1 */ uint64_t ay = yy.u + yy.u; /* |y|<<1 */ if(ax > 0x7ffull << 53) /* x is nan */ return x; if(ay > 0x7ffull << 53) /* y is nan */ return y; #ifdef ALTERNATE if(x == y) return y; /* x=y, return y */ #else if((ax == ay) && ((xx.s < 0) == (yy.s < 0))) return y; #endif if(ax == 0) { /* x == 0 */ xx.u = yy.s < 0 ? 0x8000000000000001ull : 1ull; xx.d = math_opt_barrier (xx.d); xx.d += 0.0; math_force_eval (xx.d); /* raise underflow flag */ return xx.d; /* return +-minsubnormal */ } #ifdef ALTERNATE if((x < y) == (x < 0.0)) --xx.u; else ++xx.u; #else if((ax < ay) && ((xx.s < 0) == (yy.s < 0))) ++xx.u; else --xx.u; #endif ax = xx.u + xx.u; /* |x|<<1 */ if((ax >= 0x7ffull << 53) /* overflow */ || (ax < 0x001ull << 53)) { /* underflow */ xx.d = math_opt_barrier (xx.d); xx.d += 0.0; math_force_eval (xx.d); /* raise overflow/underflow flag */ __set_errno (ERANGE); } return xx.d; } libm_alias_double (__nextafter, nextafter) #ifdef NO_LONG_DOUBLE strong_alias (__nextafter, __nexttowardl) weak_alias (__nexttowardl, nexttowardl) #undef __nexttoward strong_alias (__nextafter, __nexttoward) #undef nexttoward weak_alias (__nextafter, nexttoward) #endif --- EOF ---