From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
From: Evandro Menezes <e.menezes@samsung.com>
To: "'Dr. Philipp Tomsich'" <philipp.tomsich@theobroma-systems.com>, "'Kumar, Venkataramanan'" <Venkataramanan.Kumar@amd.com>
Cc: pinskia@gmail.com, 'Benedikt Huber' <benedikt.huber@theobroma-systems.com>, gcc-patches@gcc.gnu.org
References: <1434629045-24650-1-git-send-email-benedikt.huber@theobroma-systems.com> <8B73CF78-11D4-4963-A60A-E1C2A3B219E2@gmail.com> <F2FF9755-1DF9-4000-8602-77AB12077240@theobroma-systems.com> <7794A52CE4D579448B959EED7DD0A4723DD10430@satlexdag06.amd.com> <1E4680F0-02C8-4999-958C-8B531BC850DA@theobroma-systems.com> <7794A52CE4D579448B959EED7DD0A4723DD104AF@satlexdag06.amd.com> <08D3EBD5-B67B-4D97-9940-3CAE6D020DC6@gmail.com> <7794A52CE4D579448B959EED7DD0A4723DD109D3@satlexdag06.amd.com> <1FEA8C0A-15E0-4309-B10D-B45032A68306@theobroma-systems.com>
In-reply-to: <1FEA8C0A-15E0-4309-B10D-B45032A68306@theobroma-systems.com>
Subject: RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Date: Tue, 14 Jul 2015 22:20:00 -0000
Message-id: <07eb01d0be81$1b0b2c00$51218400$@samsung.com>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-transfer-encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-07/txt/msg01212.txt.bz2

For both FRECPE and FRSQRTE, the ARMv8 ISA guide states in their pseudo-code that:

"Result is double-precision and a multiple of 1/256 in the range 1 to 511/256."

This suggests that the estimate is merely 8 bits long.

IIRC, x86 returns 12 bits for its equivalent insns, which then requires a single series iteration for both SP and DP to achieve a precise enough result.

-- 
Evandro Menezes                              Austin, TX


> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On
> Behalf Of Dr. Philipp Tomsich
> Sent: Monday, June 29, 2015 3:47
> To: Kumar, Venkataramanan
> Cc: pinskia@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
>
> Kumar,
>
> This is not unexpected, as the initial estimate and each iteration
> add an architecturally defined number of bits of precision (ARMv8
> guarantees only a minimum number of bits per operation… the exact
> number is specific to each micro-arch, though).
> Depending on your architecture and on the number of precise bits required by
> any given benchmark, one may see miscompares.
>=20
> Do you know the exact number of bits that the initial estimate and the
> subsequent refinement steps add for your micro-arch?
>
> Thanks,
> Philipp.
>
> > On 29 Jun 2015, at 10:17, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote:
> >
> >
> > Hmm, reducing the iterations to "1 step for float" and "2 steps for double",
> >
> > I got VE (miscompares) on the following benchmarks:
> > 416.gamess
> > 453.povray
> > 454.calculix
> > 459.GemsFDTD
> >
> > Benedikt, I get an ICE for 444.namd with your patch; not sure if something is
> > wrong in my local tree.
> >
> > Regards,
> > Venkat.
> >
> >> -----Original Message-----
> >> From: pinskia@gmail.com [mailto:pinskia@gmail.com]
> >> Sent: Sunday, June 28, 2015 8:35 PM
> >> To: Kumar, Venkataramanan
> >> Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
> >> (rsqrt) estimation in -ffast-math
> >>
> >>
> >>
> >>
> >>
> >>> On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote:
> >>>
> >>> I got around 12% gain with -Ofast -mcpu=cortex-a57.
> >>
> >> I get around 11-12% on ThunderX with the patch and the change decreasing the
> >> iterations (1/2), compared to without the patch.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>
> >>>
> >>> Regards,
> >>> Venkat.
> >>>
> >>>> -----Original Message-----
> >>>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich
> >>>> Sent: Thursday, June 25, 2015 9:13 PM
> >>>> To: Kumar, Venkataramanan
> >>>> Cc: Benedikt Huber; pinskia@gmail.com; gcc-patches@gcc.gnu.org
> >>>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
> >>>> (rsqrt) estimation in -ffast-math
> >>>>
> >>>> Kumar,
> >>>>
> >>>> what is the relative gain that you see on Cortex-A57?
> >>>>
> >>>> Thanks,
> >>>> Philipp.
> >>>>
> >>>>> On 25 Jun 2015, at 17:35, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote:
> >>>>>
> >>>>> Changing to "1 step for float" and "2 steps for double" gives
> >>>>> better gains now for gromacs on cortex-a57.
> >>>>>
> >>>>> Regards,
> >>>>> Venkat.
> >>>>>> -----Original Message-----
> >>>>>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Benedikt Huber
> >>>>>> Sent: Thursday, June 25, 2015 4:09 PM
> >>>>>> To: pinskia@gmail.com
> >>>>>> Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com
> >>>>>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
> >>>>>> (rsqrt) estimation in -ffast-math
> >>>>>>
> >>>>>> Andrew,
> >>>>>>
> >>>>>>> This is NOT a win on ThunderX, at least for single precision, because
> >>>>>>> the divide and sqrt together take about as long as 5 multiplies
> >>>>>>> (the estimate and step instructions are multiplies in the ThunderX pipeline).
> >>>>>>> For doubles it is 10 multiplies, which is just the same as what the patch
> >>>>>>> does (but it is really slightly less than 10, I rounded up). So
> >>>>>>> in the end this is NOT a win at all for ThunderX unless we do one
> >>>>>>> fewer step for both single and double.
> >>>>>>
> >>>>>> Yes, the expected benefit from rsqrt estimation is implementation
> >>>>>> specific. If one has a better initial rsqrte or an application
> >>>>>> that can trade precision for execution time, we could offer a
> >>>>>> command-line option to do only 2 steps for double and 1 step for
> >>>>>> float, similar to -mrecip-precision for PowerPC.
> >>>>>> What are your thoughts on that?
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Benedikt
> >>>