Subject: Re: [AArch64] Emit square root using the Newton series
From: Evandro Menezes
To: GCC Patches, Marcus Shawcroft, James Greenhalgh, Andrew Pinski, Benedikt Huber, philipp.tomsich@theobroma-systems.com, Kyrill Tkachov
Date: Tue, 08 Mar 2016 22:18:00 -0000
Message-id: <56DF4FB2.3060207@samsung.com>
In-reply-to: <56DF4D50.4060804@samsung.com>
References: <56674D34.80806@samsung.com> <56C38D00.9000403@samsung.com> <56D8D553.6060902@samsung.com> <56DF4D50.4060804@samsung.com>

On 03/08/16 16:08, Evandro Menezes wrote:
> On 02/16/16 14:56, Evandro Menezes wrote:
>> On 12/08/15 15:35, Evandro Menezes wrote:
>>> Emit square root using the Newton series
>>>
>>> 2015-12-03 Evandro Menezes
>>>
>>> gcc/
>>>     * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
>>>     Declare new function.
>>>     * config/aarch64/aarch64-simd.md (sqrt2): New expansion and
>>>     insn definitions.
>>>     * config/aarch64/aarch64-tuning-flags.def
>>>     (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
>>>     * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
>>>     new function.
>>>     * config/aarch64/aarch64.md (sqrt2): New expansion and insn
>>>     definitions.
>>>     * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
>>>     Expand option description.
>>>     * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
>>>
>>> This patch extends the patch that added support for implementing
>>> x^-1/2 using the Newton series by adding support for x^1/2 as well.
>>>
>>> Is it OK at this point of stage 3?
>>>
>>> Thank you,
>>>
>>
>> James,
>>
>> As I was saying, this patch results in some validation errors in
>> CPU2000 benchmarks using DF. Although proving the algorithm to be
>> pretty solid with a vast set of random values, I'm confused why some
>> benchmarks fail to validate with this implementation of the Newton
>> series for square root too, when they pass with the Newton series for
>> reciprocal square root.
>>
>> Since I had no problems with the same algorithm on x86-64, I wonder
>> if the initial estimate on AArch64, which offers just 8 bits, whereas
>> x86-64 offers 11 bits, has to do with it.
>> Then again, the algorithm iterated 1 less time on x86-64 than on
>> AArch64.
>>
>> Since it seems that the initial estimate is sufficient for CPU2000 to
>> validate when using SF, I'm leaning towards restricting the Newton
>> series for square root only for SF.
>>
>> Your thoughts on the matter are appreciated,
>
> Add choices for the reciprocal square root approximation
>
> Allow a target to prefer such operation depending on the FP
> precision.
>
> gcc/
>     * config/aarch64/aarch64-protos.h
>     (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
>     * config/aarch64/aarch64-tuning-flags.def
>     (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
>     (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
>     * config/aarch64/aarch64.c
>     (use_rsqrt_p): New argument for the mode.
>     (aarch64_builtin_reciprocal): Devise mode from builtin.
>     (aarch64_optab_supported_p): New argument for the mode.

Emit square root using the Newton series

gcc/
    * config/aarch64/aarch64-tuning-flags.def
    (AARCH64_EXTRA_TUNE_APPROX_SQRT_{DF,SF}): New tuning macros.
    * config/aarch64/aarch64-protos.h (aarch64_emit_approx_sqrt):
    Declare new function.
    * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Define
    new function.
    * config/aarch64/aarch64.md (sqrt*2): New expansion and insn
    definitions.
    * config/aarch64/aarch64-simd.md (sqrt*2): Likewise.
    * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
    Expand option description.
    * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.

This patch, which depends on
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00534.html, leverages the
reciprocal square root approximation to emit a faster square root
approximation.

I have, however, encountered precision issues with DF: some benchmarks
in the SPECfp CPU2000 suite fail to validate. Perhaps the initial
estimate, with just 8 bits, is not good enough for the series to
converge given the workloads of such benchmarks; perhaps denormals,
known to occur in some of these benchmarks, result in errors.
This was the motivation for splitting the tuning flags in the previous
related patch into one specific to DF and another specific to SF.

Again, your feedback is appreciated.

Thank you,

-- 
Evandro Menezes