Subject: Re: [AArch64] Remove AARCH64_EXTRA_TUNE_RECIP_SQRT from Cortex-A57 tuning
To: James Greenhalgh, gcc-patches@gcc.gnu.org
References: <1452513219-25168-1-git-send-email-james.greenhalgh@arm.com>
 <1452513883-25826-1-git-send-email-james.greenhalgh@arm.com>
 <20160125112045.GA8599@arm.com> <20160201140000.GB17622@arm.com>
 <20160208105710.GB39718@arm.com> <20160215105025.GD16295@arm.com>
Cc:
 nd@arm.com, marcus.shawcroft@arm.com, richard.earnshaw@arm.com,
 Venkataramanan.Kumar@amd.com, philipp.tomsich@theobroma-systems.com,
 pinskia@gmail.com, Kyrylo.Tkachov@arm.com
From: Evandro Menezes
Message-id: <56C209E5.9020904@samsung.com>
Date: Mon, 15 Feb 2016 17:25:00 -0000
In-reply-to: <20160215105025.GD16295@arm.com>

On 02/15/16 04:50, James Greenhalgh wrote:
> On Mon, Feb 08, 2016 at 10:57:10AM +0000, James Greenhalgh wrote:
>> On Mon, Feb 01, 2016 at 02:00:01PM +0000, James Greenhalgh wrote:
>>> On Mon, Jan 25, 2016 at 11:20:46AM +0000, James Greenhalgh wrote:
>>>> On Mon, Jan 11, 2016 at 12:04:43PM +0000, James Greenhalgh wrote:
>>>>> Hi,
>>>>>
>>>>> I've seen a couple of large performance issues caused by expanding
>>>>> the high-precision reciprocal square root for Cortex-A57, so I'd like
>>>>> to turn it off by default.
>>>>>
>>>>> This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from
>>>>> Spec2000, good (~5.5%) for gromacs from Spec2006, and very good (>10%) for
>>>>> some private microbenchmark kernels which stress the divide/sqrt/multiply
>>>>> units. It therefore seems to me to be the correct choice to make across
>>>>> a number of workloads.
>>>>>
>>>>> Bootstrapped and tested on aarch64-none-linux-gnu with no issues.
>>>>>
>>>>> OK?
>>>> *Ping*
>>> *pingx2*
>> *ping^3*
> *ping^4*
>
> Thanks,
> James
>
>>>>> ---
>>>>> 2015-12-11  James Greenhalgh
>>>>>
>>>>> 	* config/aarch64/aarch64.c (cortexa57_tunings): Remove
>>>>> 	AARCH64_EXTRA_TUNE_RECIP_SQRT.
>>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 1d5d898..999c9fc 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings =
>>>>>    0,	/* max_case_values.  */
>>>>>    0,	/* cache_line_size.  */
>>>>>    tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
>>>>> -  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS
>>>>> -   | AARCH64_EXTRA_TUNE_RECIP_SQRT)	/* tune_flags.  */
>>>>> +  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)	/* tune_flags.  */
>>>>>  };
>>>>>
>>>>>  static const struct tune_params cortexa72_tunings =
>

James,

There seem to be SPEC CPU2000fp validation issues on A57 when this flag
is present too.  I evaluated the algorithm with a huge random set of
values, and it always delivered accuracy around 1 ulp, which should be
enough for CPU2000fp (as with x86-64), so I expected the benchmarks to
pass.

My suspicion is that the Newton series on AArch64 is probably good only
for SP.  If so, DP might require an extra round, probably exacerbating
the performance penalty.

I'd like to try to split this tuning option into one for SP and another
for DP.

Thoughts?

Thank you,

-- 
Evandro Menezes