Evandro,

Yes, we also have the 1/x approximation.
However, we do not have the test cases yet, and it would also need some cleanup.
I am going to provide a patch for that soon (say, next week).
Also, for this optimization we have *not* yet found a benchmark with significant improvements.

Best Regards,
Benedikt

> On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich wrote:
>
> Evandro,
>
> We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal sqrt.
>
> Also, the “reciprocal divide” patches are floating around in various of our git trees, but
> aren’t ready for public consumption, yet… I’ll leave Benedikt to comment on potential
> timelines for getting that pushed out.
>
> Best,
> Philipp.
>
>> On 24 Jun 2015, at 18:42, Evandro Menezes wrote:
>>
>> Benedikt,
>>
>> You beat me to it! :-) Do you have the implementation for dividing using
>> the Newton series as well?
>>
>> I'm not sure that the series is always for all data types and on all
>> processors. It would be useful to allow each AArch64 processor to enable
>> this or not depending on the data type. BTW, do you have some tests showing
>> the speed up?
>>
>> Thank you,
>>
>> --
>> Evandro Menezes Austin, TX
>>
>>> -----Original Message-----
>>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Benedikt Huber
>>> Sent: Thursday, June 18, 2015 7:04
>>> To: gcc-patches@gcc.gnu.org
>>> Cc: benedikt.huber@theobroma-systems.com; philipp.tomsich@theobroma-systems.com
>>> Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
>>>
>>> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation and
>>> a Newton-Raphson step, respectively.
>>> There are ARMv8 implementations where this is faster than using fdiv and
>>> fsqrt.
>>> It runs three steps for double and two steps for float to achieve the needed
>>> precision.
>>>
>>> There is one caveat and open question.
>>> Since -ffast-math enables flush to zero, intermediate values between
>>> approximation steps will be flushed to zero if they are denormal.
>>> E.g., this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
>>> The test cases pass, but it is unclear to me whether this is expected
>>> behavior with -ffast-math.
>>>
>>> The patch applies to commit:
>>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
>>>
>>> Please consider including this patch.
>>> Thank you and best regards,
>>> Benedikt Huber
>>>
>>> Benedikt Huber (1):
>>>   2015-06-15 Benedikt Huber
>>>
>>>  gcc/ChangeLog                            |   9 +++
>>>  gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
>>>  gcc/config/aarch64/aarch64-protos.h      |   2 +
>>>  gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
>>>  gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
>>>  gcc/config/aarch64/aarch64.md            |   3 +
>>>  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
>>>  7 files changed, 277 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>>>
>>> --
>>> 1.9.1
>>
>