Date: Sat, 27 Jun 2015 02:01:00 -0000
Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
From: Andrew Pinski
To: Ramana Radhakrishnan
Cc: Benedikt Huber, "gcc-patches@gcc.gnu.org", "philipp.tomsich@theobroma-systems.com"

On Thu, Jun 25, 2015 at 1:24 AM, Ramana Radhakrishnan wrote:
> Benedikt,
>
> On 25/06/15 08:01, pinskia@gmail.com wrote:
>>
>>> On Jun 18, 2015, at 5:04 AM, Benedikt Huber wrote:
>>>
>>> aarch64 offers the instructions frsqrte and frsqrts for rsqrt estimation
>>> and a
>>> Newton-Raphson step, respectively.
>>> There are ARMv8 implementations where this is faster than using fdiv and
>>> fsqrt.
>>> It runs three steps for double and two steps for float to achieve the
>>> needed precision.
>>
>> This is NOT a win on ThunderX, at least for single precision, because the
>> divide and sqrt together take the same time as 5 multiplies (estimate and
>> step are multiplies in the ThunderX pipeline). Double is 10 multiplies,
>> which is just the same as what the patch does (it is really slightly less
>> than 10; I rounded up). So in the end this is NOT a win at all for
>> ThunderX unless we do one less step for both single and double.
>>
>
> Have you seen this https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00164.html
> ? Really this is something that should be gated by the costs infrastructure.

Yes, I saw that. In fact, I did not look into the latencies of our core
until this patch came out. But yes, this should be gated by a cost
infrastructure, and most likely not be part of the -mcpu=generic cost
(well, maybe the rsqrt, if we change it to 1 iteration for float and
2 iterations for double).

Thanks,
Andrew

>
> regards
> Ramana
>
>> Thanks,
>> Andrew
>>
>>>
>>> There is one caveat and open question.
>>> Since -ffast-math enables flush-to-zero, intermediate values between
>>> approximation steps will be flushed to zero if they are denormal.
>>> E.g. this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
>>> The test cases pass, but it is unclear to me whether this is expected
>>> behavior with -ffast-math.
>>>
>>> The patch applies to commit:
>>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
>>>
>>> Please consider including this patch.
>>>
>>> Thank you and best regards,
>>> Benedikt Huber
>>>
>>> Benedikt Huber (1):
>>>   2015-06-15 Benedikt Huber
>>>
>>>  gcc/ChangeLog                            |   9 +++
>>>  gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
>>>  gcc/config/aarch64/aarch64-protos.h      |   2 +
>>>  gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
>>>  gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
>>>  gcc/config/aarch64/aarch64.md            |   3 +
>>>  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
>>>  7 files changed, 277 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>>>
>>> --
>>> 1.9.1
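The scheme described in the quoted patch (an frsqrte estimate followed by frsqrts-based Newton-Raphson steps, x' = x * (3 - d*x*x) / 2) and the flush-to-zero caveat can be modelled in portable C. This is a minimal sketch under stated assumptions, not the patch's code: `flush`, `rsqrts_step`, `rsqrt_estimate` and `rsqrt_nr` are hypothetical names; the estimate is the well-known bit-level approximation (magic constant 0x5fe6eb50c7b537a9) refined once, standing in for the hardware frsqrte table; and flush-to-zero is emulated in software rather than through the FPCR.FZ bit.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Software emulation of flush-to-zero (assumption: this model, not FPCR.FZ).
   When enabled, a subnormal value is replaced by zero. */
static double flush(double v, int ftz)
{
    if (ftz && v != 0.0 && v > -DBL_MIN && v < DBL_MIN)
        return 0.0;
    return v;
}

/* Model of the AArch64 FRSQRTS step: (3 - a*b) / 2.
   The hardware fuses the multiply-subtract; this model rounds a*b first. */
static double rsqrts_step(double a, double b, int ftz)
{
    return (3.0 - flush(a * b, ftz)) * 0.5;
}

/* Hypothetical stand-in for FRSQRTE (not the hardware's table lookup):
   the bit-level approximation has an error of a few percent; one
   refinement step brings it well under 1% for positive normal inputs. */
static double rsqrt_estimate(double d)
{
    uint64_t i;
    double x;
    memcpy(&i, &d, sizeof i);
    i = 0x5fe6eb50c7b537a9ULL - (i >> 1);
    memcpy(&x, &i, sizeof x);
    return x * rsqrts_step(d, x * x, 0);
}

/* Estimate followed by `steps` Newton-Raphson steps:
   x = x * frsqrts(d, x*x). The patch description uses 3 steps for
   double and 2 for float. With ftz set, subnormal intermediates are
   flushed to zero, as under -ffast-math. */
double rsqrt_nr(double d, int steps, int ftz)
{
    double x = rsqrt_estimate(d);
    for (int n = 0; n < steps; n++) {
        double x2 = flush(x * x, ftz);
        x = flush(x * rsqrts_step(d, x2, ftz), ftz);
    }
    return x;
}
```

In this model, `rsqrt_nr(DBL_MAX, 3, 0)` stays close to 2^-512 (about 7.458e-155). With `ftz` set, x*x is subnormal for that input, so it is flushed to zero, the step returns 1.5 instead of roughly 1.0, and the iteration moves away from the correct value; this illustrates why flushed intermediates matter for inputs such as DBL_MAX. Each step roughly doubles the number of correct bits, which is why the number of steps (and thus the multiply count discussed above) trades directly against precision.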