From: Evandro Menezes
To: "'Dr. 
Philipp Tomsich'", "'Kumar, Venkataramanan'"
Cc: pinskia@gmail.com, 'Benedikt Huber', gcc-patches@gcc.gnu.org
References: <1434629045-24650-1-git-send-email-benedikt.huber@theobroma-systems.com> <8B73CF78-11D4-4963-A60A-E1C2A3B219E2@gmail.com> <7794A52CE4D579448B959EED7DD0A4723DD10430@satlexdag06.amd.com> <1E4680F0-02C8-4999-958C-8B531BC850DA@theobroma-systems.com> <7794A52CE4D579448B959EED7DD0A4723DD104AF@satlexdag06.amd.com> <08D3EBD5-B67B-4D97-9940-3CAE6D020DC6@gmail.com> <7794A52CE4D579448B959EED7DD0A4723DD109D3@satlexdag06.amd.com> <1FEA8C0A-15E0-4309-B10D-B45032A68306@theobroma-systems.com>
In-reply-to: <1FEA8C0A-15E0-4309-B10D-B45032A68306@theobroma-systems.com>
Subject: RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Date: Tue, 14 Jul 2015 22:20:00 -0000
Message-id: <07eb01d0be81$1b0b2c00$51218400$@samsung.com>

For both FRECPE and FRSQRTE, the ARMv8 ISA guide states in the pseudo-code that:

"Result is double-precision and a multiple of 1/256 in the range 1 to 511/256."

This suggests that the estimate is merely 8 bits long.

IIRC, x86 returns 12 bits for its equivalent insns, so that a single series iteration is then enough for both SP and DP to achieve a precise enough result.

-- 
Evandro Menezes              Austin, TX

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org]
> On Behalf Of Dr. Philipp Tomsich
> Sent: Monday, June 29, 2015 3:47
> To: Kumar, Venkataramanan
> Cc: pinskia@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
>
> Kumar,
>
> This does not come unexpected, as the initial estimation and each iteration
> will add an architecturally-defined number of bits of precision (ARMv8
> guarantees only a minimum number of bits provided per operation… the exact
> number is specific to each micro-arch, though).
> Depending on your architecture and on the required number of precise bits by
> any given benchmark, one may see miscompares.
>
> Do you know the exact number of bits that the initial estimate and the
> subsequent refinement steps add for your micro-arch?
>
> Thanks,
> Philipp.
>
> > On 29 Jun 2015, at 10:17, Kumar, Venkataramanan wrote:
> >
> >
> > Hmm, Reducing the iterations to "1 step for float" and "2 steps for double"
> >
> > I got VE (miscompares) on following benchmarks
> > 416.gamess
> > 453.povray
> > 454.calculix
> > 459.GemsFDTD
> >
> > Benedikt, I have ICE for 444.namd with your patch, not sure if something
> > wrong in my local tree.
> >
> > Regards,
> > Venkat.
> >
> >> -----Original Message-----
> >> From: pinskia@gmail.com [mailto:pinskia@gmail.com]
> >> Sent: Sunday, June 28, 2015 8:35 PM
> >> To: Kumar, Venkataramanan
> >> Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
> >> (rsqrt) estimation in -ffast-math
> >>
> >>> On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan wrote:
> >>>
> >>> I got around ~12% gain with -Ofast -mcpu=cortex-a57.
> >>
> >> I get around 11/12% on thunderX with the patch and the decreasing the
> >> iterations change (1/2) compared to without the patch.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>>
> >>> Regards,
> >>> Venkat.
> >>>
> >>>> -----Original Message-----
> >>>> From: gcc-patches-owner@gcc.gnu.org
> >>>> [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich
> >>>> Sent: Thursday, June 25, 2015 9:13 PM
> >>>> To: Kumar, Venkataramanan
> >>>> Cc: Benedikt Huber; pinskia@gmail.com; gcc-patches@gcc.gnu.org
> >>>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
> >>>> (rsqrt) estimation in -ffast-math
> >>>>
> >>>> Kumar,
> >>>>
> >>>> what is the relative gain that you see on Cortex-A57?
> >>>>
> >>>> Thanks,
> >>>> Philipp.
> >>>>
> >>>>>> On 25 Jun 2015, at 17:35, Kumar, Venkataramanan wrote:
> >>>>>
> >>>>> Changing to "1 step for float" and "2 steps for double" gives
> >>>>> better gains now for gromacs on cortex-a57.
> >>>>>
> >>>>> Regards,
> >>>>> Venkat.
> >>>>>> -----Original Message-----
> >>>>>> From: gcc-patches-owner@gcc.gnu.org
> >>>>>> [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Benedikt Huber
> >>>>>> Sent: Thursday, June 25, 2015 4:09 PM
> >>>>>> To: pinskia@gmail.com
> >>>>>> Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com
> >>>>>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
> >>>>>> (rsqrt) estimation in -ffast-math
> >>>>>>
> >>>>>> Andrew,
> >>>>>>
> >>>>>>> This is NOT a win on thunderX at least for single precision
> >>>>>>> because you have to do the divide and sqrt in the same time as it
> >>>>>>> takes 5 multiplies (estimate and step are multiplies in the
> >>>>>>> thunderX pipeline).
> >>>>>>> Doubles is 10 multiplies which is just the same as what the patch
> >>>>>>> does (but it is really slightly less than 10, I rounded up). So
> >>>>>>> in the end this is NOT a win at all for thunderX unless we do one
> >>>>>>> less step for both single and double.
> >>>>>>
> >>>>>> Yes, the expected benefit from rsqrt estimation is implementation
> >>>>>> specific. If one has a better initial rsqrte or an application
> >>>>>> that can trade precision for execution time, we could offer a
> >>>>>> command line option to do only 2 steps for double and 1 step for
> >>>>>> float; similar to -mrecip-precision for PowerPC.
> >>>>>> What are your thoughts on that?
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Benedikt
> >>>
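
P.S. For anyone wanting to check the step counts discussed in this thread, here is a rough model in Python, not taken from the patch. It assumes the estimate is exactly 1/sqrt(x) with the significand truncated to a multiple of 1/256 in [1, 2), which only mimics the wording quoted from the ISA guide; the real FRSQRTE lookup may well behave differently. The function names (estimate_rsqrt_8bit, newton_step, rsqrt, correct_bits) are made up for this sketch. Each Newton-Raphson step y * (3 - x*y*y) / 2 roughly squares the relative error, so under this assumption one would expect about 8, 15 and 30 correct bits after 0, 1 and 2 steps.

```python
import math


def estimate_rsqrt_8bit(x: float) -> float:
    """Hypothetical stand-in for a hardware estimate (not the real FRSQRTE):
    1/sqrt(x) with the significand truncated to a multiple of 1/256 in [1, 2)."""
    y = 1.0 / math.sqrt(x)
    m, e = math.frexp(y)                # y = m * 2**e with 0.5 <= m < 1
    m = math.floor(m * 512.0) / 512.0   # multiple of 1/512 in [0.5, 1)
    return math.ldexp(m, e)


def newton_step(x: float, y: float) -> float:
    """One Newton-Raphson refinement of y towards 1/sqrt(x)."""
    return y * (3.0 - x * y * y) * 0.5


def rsqrt(x: float, steps: int) -> float:
    """Truncated estimate followed by `steps` refinement steps."""
    y = estimate_rsqrt_8bit(x)
    for _ in range(steps):
        y = newton_step(x, y)
    return y


def correct_bits(x: float, approx: float) -> float:
    """Approximate number of correct bits, capped at double's 53."""
    exact = 1.0 / math.sqrt(x)
    err = abs(approx - exact) / exact
    return 53.0 if err == 0.0 else min(53.0, -math.log2(err))


if __name__ == "__main__":
    # Print the estimated number of correct bits after 0..3 steps.
    for x in (2.0, 3.0, 10.0, 0.3):
        print(x, [round(correct_bits(x, rsqrt(x, n)), 1) for n in range(4)])
```

Under this model, 2 steps would cover a float's 24-bit significand and 3 steps would be needed for a double's 53 bits, while the reduced "1 step for float, 2 steps for double" setting would leave roughly 15 and 30 bits. That might be consistent with the miscompares reported above, but it is only a model; actual hardware estimates can be more precise than the assumed minimum.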