Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
From: "Dr. Philipp Tomsich"
Date: Wed, 24 Jun 2015 20:11:00 -0000
To: Evandro Menezes
Cc: Benedikt Huber, gcc-patches@gcc.gnu.org
Message-Id: <3F0CA634-AF3D-4AD9-8702-9B5D19821889@theobroma-systems.com>
In-Reply-To: <02a401d0aeaa$5a5e7ec0$0f1b7c40$@samsung.com>
References: <1434629045-24650-1-git-send-email-benedikt.huber@theobroma-systems.com> <027701d0ae9c$b8f3eff0$2adbcfd0$@samsung.com> <56A9A836-05BF-409C-A8D4-91B7ABEC5EE9@theobroma-systems.com> <02a401d0aeaa$5a5e7ec0$0f1b7c40$@samsung.com>
X-SW-Source: 2015-06/txt/msg01738.txt.bz2

Evandro,

Shouldn't ‘execute_cse_reciprocals_1’ take care of this, once the
reciprocal-division is implemented?
Do you think there’s additional work needed to catch all cases/opportunities?

Best,
Philipp.

> On 24 Jun 2015, at 20:19, Evandro Menezes wrote:
>
> Benedikt,
>
> Are you developing the reciprocal approximation just for 1/x proper or
> for any division, as in x/y = x * 1/y?
>
> Thank you,
>
> --
> Evandro Menezes Austin, TX
>
>
>> -----Original Message-----
>> From: Benedikt Huber [mailto:benedikt.huber@theobroma-systems.com]
>> Sent: Wednesday, June 24, 2015 12:11
>> To: Dr. Philipp Tomsich
>> Cc: Evandro Menezes; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
>> estimation in -ffast-math
>>
>> Evandro,
>>
>> Yes, we also have the 1/x approximation.
>> However we do not have the test cases yet, and it also would need some
>> clean up.
>> I am going to provide a patch for that soon (say next week).
>> Also, for this optimization we have *not* yet found a benchmark with
>> significant improvements.
>>
>> Best Regards,
>> Benedikt
>>
>>
>>> On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich wrote:
>>>
>>> Evandro,
>>>
>>> We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar)
>>> reciprocal sqrt.
>>>
>>> Also, the “reciprocal divide” patches are floating around in various
>>> of our git-tree, but aren’t ready for public consumption, yet… I’ll
>>> leave Benedikt to comment on potential timelines for getting that
>>> pushed out.
>>>
>>> Best,
>>> Philipp.
>>>
>>>> On 24 Jun 2015, at 18:42, Evandro Menezes wrote:
>>>>
>>>> Benedikt,
>>>>
>>>> You beat me to it! :-) Do you have the implementation for dividing
>>>> using the Newton series as well?
>>>>
>>>> I'm not sure that the series is always for all data types and on all
>>>> processors. It would be useful to allow each AArch64 processor to
>>>> enable this or not depending on the data type.
>>>> BTW, do you have some
>>>> tests showing the speed up?
>>>>
>>>> Thank you,
>>>>
>>>> --
>>>> Evandro Menezes Austin, TX
>>>>
>>>>> -----Original Message-----
>>>>> From: gcc-patches-owner@gcc.gnu.org
>>>>> [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Benedikt Huber
>>>>> Sent: Thursday, June 18, 2015 7:04
>>>>> To: gcc-patches@gcc.gnu.org
>>>>> Cc: benedikt.huber@theobroma-systems.com;
>>>>> philipp.tomsich@theobroma-systems.com
>>>>> Subject: [PATCH] [aarch64] Implemented reciprocal square root
>>>>> (rsqrt) estimation in -ffast-math
>>>>>
>>>>> aarch64 offers the instructions frsqrte and frsqrts, for rsqrt
>>>>> estimation and a Newton-Raphson step, respectively.
>>>>> There are ARMv8 implementations where this is faster than using fdiv
>>>>> and rsqrt.
>>>>> It runs three steps for double and two steps for float to achieve
>>>>> the needed precision.
>>>>>
>>>>> There is one caveat and open question.
>>>>> Since -ffast-math enables flush to zero, intermediate values between
>>>>> approximation steps will be flushed to zero if they are denormal.
>>>>> E.g., this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
>>>>> The test cases pass, but it is unclear to me whether this is
>>>>> expected behavior with -ffast-math.
>>>>>
>>>>> The patch applies to commit:
>>>>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
>>>>>
>>>>> Please consider including this patch.
>>>>> Thank you and best regards,
>>>>> Benedikt Huber
>>>>>
>>>>> Benedikt Huber (1):
>>>>>   2015-06-15 Benedikt Huber
>>>>>
>>>>>  gcc/ChangeLog                            |   9 +++
>>>>>  gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
>>>>>  gcc/config/aarch64/aarch64-protos.h      |   2 +
>>>>>  gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
>>>>>  gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
>>>>>  gcc/config/aarch64/aarch64.md            |   3 +
>>>>>  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
>>>>>  7 files changed, 277 insertions(+)
>>>>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>>>>>
>>>>> --
>>>>> 1.9.1