Evandro,

Yes, we also have the 1/x approximation.
However, we do not have test cases for it yet, and it would also need some cleanup.
I am going to provide a patch for that soon (say next week).
Also, for this optimization we have *not* yet found a benchmark with significant improvements.

Best Regards,
Benedikt


> On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich <philipp.tomsich@theobroma-systems.com> wrote:
> 
> Evandro,
> 
> We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal sqrt.
> 
> Also, the “reciprocal divide” patches are floating around in several of our git trees, but
> aren’t ready for public consumption yet… I’ll leave Benedikt to comment on potential
> timelines for getting that pushed out.
> 
> Best,
> Philipp.
> 
>> On 24 Jun 2015, at 18:42, Evandro Menezes <e.menezes@samsung.com> wrote:
>> 
>> Benedikt,
>> 
>> You beat me to it! :-)  Do you have the implementation for dividing using
>> the Newton series as well?
>> 
>> I'm not sure that the series is always faster for all data types and on all
>> processors.  It would be useful to allow each AArch64 processor to enable
>> this or not depending on the data type.  BTW, do you have some tests showing
>> the speed up?
>> 
>> Thank you,
>> 
>> --
>> Evandro Menezes                              Austin, TX
>> 
>>> -----Original Message-----
>>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On
>>> Behalf Of Benedikt Huber
>>> Sent: Thursday, June 18, 2015 7:04
>>> To: gcc-patches@gcc.gnu.org
>>> Cc: benedikt.huber@theobroma-systems.com; philipp.tomsich@theobroma-systems.com
>>> Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
>>> estimation in -ffast-math
>>> 
>>> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation and
>>> a Newton-Raphson step, respectively.
>>> There are ARMv8 implementations where this is faster than using fdiv and
>>> fsqrt.
>>> It runs three steps for double and two steps for float to achieve the needed
>>> precision.
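
A minimal sketch of the iteration described above, written in portable C rather than AArch64 assembly. It assumes the usual Newton-Raphson update e' = e * (3 - x * e^2) / 2, where the (3 - a * b) / 2 part models what frsqrts computes. The names rsqrt_step and rsqrt_newton are made up for illustration and are not taken from the patch; the starting value that frsqrte would supply is passed in by the caller.

```c
/* Models the arithmetic of one frsqrts: (3 - a * b) / 2. */
static double rsqrt_step(double a, double b)
{
    return (3.0 - a * b) / 2.0;
}

/* Refines an initial estimate e0 of 1/sqrt(x) with `steps`
   Newton-Raphson iterations.  e0 stands in for the result of
   frsqrte; it is an input here because this sketch does not
   model the hardware estimate itself. */
double rsqrt_newton(double x, double e0, int steps)
{
    double e = e0;
    for (int i = 0; i < steps; i++)
        e = e * rsqrt_step(x, e * e);   /* e * (3 - x * e^2) / 2 */
    return e;
}
```

For example, rsqrt_newton(4.0, 0.5 * (1.0 + 1.0 / 256.0), 3) starts from an estimate with a relative error of 2^-8 (a figure assumed here for illustration, not a statement about any particular core) and converges to approximately 0.5. Each step roughly squares the relative error, so under that assumption two steps reach about 2^-30, which covers float's 24-bit significand, and three steps go beyond double's 53 bits.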
>>> 
>>> There is one caveat and open question.
>>> Since -ffast-math enables flush-to-zero, intermediate values between
>>> approximation steps will be flushed to zero if they are denormal.
>>> E.g., this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
>>> The test cases pass, but it is unclear to me whether this is expected
>>> behavior with -ffast-math.
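
To make the caveat concrete, here is a small C sketch of a single refinement step for x = DBL_MAX. It assumes the step squares the estimate first (e * e) and then multiplies by x; that ordering is an assumption for illustration and may not match the patch. The helpers flush_to_zero and rsqrt_step_once are hypothetical, and flush-to-zero is emulated in software rather than enabled in the FPU.

```c
#include <float.h>

/* Emulates flush-to-zero: any subnormal magnitude becomes 0.0
   (the sign of zero is ignored for brevity). */
static double flush_to_zero(double v)
{
    if (v > -DBL_MIN && v < DBL_MIN)
        return 0.0;
    return v;
}

/* One step e * (3 - x * e^2) / 2.  When ftz is nonzero, the
   intermediate e * e is flushed before it is used. */
double rsqrt_step_once(double x, double e, int ftz)
{
    double sq = e * e;
    if (ftz)
        sq = flush_to_zero(sq);
    return e * (3.0 - x * sq) / 2.0;
}
```

With e = 0x1p-512, which is close to 1/sqrt(DBL_MAX), the square e * e is 0x1p-1024 and lies below DBL_MIN, so it is subnormal. Without flushing, rsqrt_step_once(DBL_MAX, e, 0) leaves e essentially unchanged. With flushing, the product x * sq becomes 0, the correction factor becomes 3 / 2, and the step returns 1.5 * e instead of converging.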
>>> 
>>> The patch applies to commit:
>>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
>>> 
>>> Please consider including this patch.
>>> Thank you and best regards,
>>> Benedikt Huber
>>> 
>>> Benedikt Huber (1):
>>> 2015-06-15  Benedikt Huber  <benedikt.huber@theobroma-systems.com>
>>> 
>>> gcc/ChangeLog                            |   9 +++
>>> gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
>>> gcc/config/aarch64/aarch64-protos.h      |   2 +
>>> gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
>>> gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
>>> gcc/config/aarch64/aarch64.md            |   3 +
>>> gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
>>> 7 files changed, 277 insertions(+)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>>> 
>>> --
>>> 1.9.1
>