From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-401209-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 66777 invoked by alias); 24 Jun 2015 20:39:34 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 66765 invoked by uid 89); 24 Jun 2015 20:39:33 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD autolearn=ham version=3.3.2
X-HELO: usmailout3.samsung.com
Received: from mailout3.w2.samsung.com (HELO usmailout3.samsung.com) (211.189.100.13) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Wed, 24 Jun 2015 20:39:31 +0000
Received: from uscpsbgm2.samsung.com (u115.gpu85.samsung.co.kr [203.254.195.115]) by usmailout3.samsung.com (Oracle Communications Messaging Server 7.0.5.31.0 64bit (built May  5 2014)) with ESMTP id <0NQG0045IUPTZ7A0@usmailout3.samsung.com> for gcc-patches@gcc.gnu.org; Wed, 24 Jun 2015 16:39:29 -0400 (EDT)
Received: from ussync4.samsung.com ( [203.254.195.84])	by uscpsbgm2.samsung.com (USCPMTA) with SMTP id CD.D4.29819.1851B855; Wed, 24 Jun 2015 16:39:29 -0400 (EDT)
Received: from WEMENEZES ([105.140.33.224]) by ussync4.samsung.com (Oracle Communications Messaging Server 7.0.5.31.0 64bit (built May  5 2014)) with ESMTPA id <0NQG00GC1UPSJD40@ussync4.samsung.com>; Wed, 24 Jun 2015 16:39:29 -0400 (EDT)
From: Evandro Menezes <e.menezes@samsung.com>
To: "'Dr. Philipp Tomsich'" <philipp.tomsich@theobroma-systems.com>
Cc: 'Benedikt Huber' <benedikt.huber@theobroma-systems.com>, gcc-patches@gcc.gnu.org
References: <1434629045-24650-1-git-send-email-benedikt.huber@theobroma-systems.com> <027701d0ae9c$b8f3eff0$2adbcfd0$@samsung.com> <56A9A836-05BF-409C-A8D4-91B7ABEC5EE9@theobroma-systems.com> <D4A9BB7C-4582-4D3B-892B-E03B38F0D3E8@theobroma-systems.com> <02a401d0aeaa$5a5e7ec0$0f1b7c40$@samsung.com> <3F0CA634-AF3D-4AD9-8702-9B5D19821889@theobroma-systems.com>
In-reply-to: <3F0CA634-AF3D-4AD9-8702-9B5D19821889@theobroma-systems.com>
Subject: RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Date: Wed, 24 Jun 2015 20:54:00 -0000
Message-id: <02c701d0aebd$de09f1b0$9a1dd510$@samsung.com>
MIME-version: 1.0
Content-type: text/plain; charset=UTF-8
Content-transfer-encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-06/txt/msg01744.txt.bz2

Philipp,

I think that execute_cse_reciprocals_1() applies only when the denominator
is known at compile-time, otherwise the division stays.  It doesn't seem to
know whether the target supports the approximate reciprocal or not.

Cheers,

-- 
Evandro Menezes                              Austin, TX


> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On
> Behalf Of Dr. Philipp Tomsich
> Sent: Wednesday, June 24, 2015 15:08
> To: Evandro Menezes
> Cc: Benedikt Huber; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
>
> Evandro,
>
> Shouldn't ‘execute_cse_reciprocals_1’ take care of this, once the
> reciprocal-division is implemented?
> Do you think there’s additional work needed to catch all cases/opportunities?
>
> Best,
> Philipp.
>
> > On 24 Jun 2015, at 20:19, Evandro Menezes <e.menezes@samsung.com> wrote:
> >
> > Benedikt,
> >
> > Are you developing the reciprocal approximation just for 1/x proper or for
> > any division, as in x/y = x * 1/y?
> >
> > Thank you,
> >
> > --
> > Evandro Menezes                              Austin, TX
> >
> >
> >> -----Original Message-----
> >> From: Benedikt Huber [mailto:benedikt.huber@theobroma-systems.com]
> >> Sent: Wednesday, June 24, 2015 12:11
> >> To: Dr. Philipp Tomsich
> >> Cc: Evandro Menezes; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
> >> (rsqrt) estimation in -ffast-math
> >>
> >> Evandro,
> >>
> >> Yes, we also have the 1/x approximation.
> >> However we do not have the test cases yet, and it also would need
> >> some clean up.
> >> I am going to provide a patch for that soon (say next week).
> >> Also, for this optimization we have *not* yet found a benchmark with
> >> significant improvements.
> >>
> >> Best Regards,
> >> Benedikt
> >>
> >>
> >>> On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich
> >>> <philipp.tomsich@theobroma-systems.com> wrote:
> >>>
> >>> Evandro,
> >>>
> >>> We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar)
> >>> reciprocal sqrt.
> >>>
> >>> Also, the “reciprocal divide” patches are floating around in various
> >>> of our git trees, but aren’t ready for public consumption yet… I’ll
> >>> leave Benedikt to comment on potential timelines for getting that
> >>> pushed out.
> >>>
> >>> Best,
> >>> Philipp.
> >>>
> >>>> On 24 Jun 2015, at 18:42, Evandro Menezes <e.menezes@samsung.com> wrote:
> >>>>
> >>>> Benedikt,
> >>>>
> >>>> You beat me to it! :-)  Do you have the implementation for dividing
> >>>> using the Newton series as well?
> >>>>
> >>>> I'm not sure that the series is always beneficial for all data types
> >>>> and on all processors.  It would be useful to allow each AArch64 processor
> >>>> to enable this or not depending on the data type.  BTW, do you have
> >>>> some tests showing the speed up?
> >>>>
> >>>> Thank you,
> >>>>
> >>>> --
> >>>> Evandro Menezes                              Austin, TX
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: gcc-patches-owner@gcc.gnu.org
> >>>>> [mailto:gcc-patches-owner@gcc.gnu.org] On
> >>>>> Behalf Of Benedikt Huber
> >>>>> Sent: Thursday, June 18, 2015 7:04
> >>>>> To: gcc-patches@gcc.gnu.org
> >>>>> Cc: benedikt.huber@theobroma-systems.com;
> >>>>> philipp.tomsich@theobroma-systems.com
> >>>>> Subject: [PATCH] [aarch64] Implemented reciprocal square root
> >>>>> (rsqrt) estimation in -ffast-math
> >>>>>
> >>>>> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt
> >>>>> estimation and a Newton-Raphson step, respectively.
> >>>>> There are ARMv8 implementations where this is faster than using
> >>>>> fdiv and fsqrt.
> >>>>> It runs three steps for double and two steps for float to achieve
> >>>>> the needed precision.
> >>>>>
> >>>>> There is one caveat and open question.
> >>>>> Since -ffast-math enables flush-to-zero, intermediate values
> >>>>> between approximation steps will be flushed to zero if they are
> >>>>> denormal.
> >>>>> E.g., this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
> >>>>> The test cases pass, but it is unclear to me whether this is
> >>>>> expected behavior with -ffast-math.
> >>>>>
> >>>>> The patch applies to commit:
> >>>>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
> >>>>>
> >>>>> Please consider including this patch.
> >>>>> Thank you and best regards,
> >>>>> Benedikt Huber
> >>>>>
> >>>>> Benedikt Huber (1):
> >>>>> 2015-06-15  Benedikt Huber  <benedikt.huber@theobroma-systems.com>
> >>>>>
> >>>>> gcc/ChangeLog                            |   9 +++
> >>>>> gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
> >>>>> gcc/config/aarch64/aarch64-protos.h      |   2 +
> >>>>> gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
> >>>>> gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
> >>>>> gcc/config/aarch64/aarch64.md            |   3 +
> >>>>> gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
> >>>>> 7 files changed, 277 insertions(+)
> >>>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
> >>>>>
> >>>>> --
> >>>>> 1.9.1
> >>>> <Mail Attachment.eml>
> >>>
> >
> >