From: Kyrill Tkachov
Date: Mon, 20 Feb 2017 14:44:00 -0000
To: Maxim Kuvyrkov, GCC Patches
CC: Andrew Pinski, Richard Guenther
Subject: Re: [PATCH 0/6] Improve -fprefetch-loop-arrays in general and for AArch64 in particular

Hi Maxim,

On 30/01/17 11:24, Maxim Kuvyrkov wrote:
> This patch series improves the -fprefetch-loop-arrays pass through small fixes and tweaks, and then enables it for several AArch64 cores.
>
> My tunings were done on and for Qualcomm hardware, with results varying between +0.5% and +1.9% for SPEC2006 INT and between +0.25% and +1.0% for SPEC2006 FP at -O3, depending on hardware revision.
>
> This patch series enables restricted -fprefetch-loop-arrays at -O2, which also improves SPEC2006 numbers.
>
> The biggest progressions are on 419.mcf and 437.leslie3d, with no serious regressions on other benchmarks.
>
> I'm now investigating making -fprefetch-loop-arrays more aggressive for Qualcomm hardware, which improves performance on most benchmarks but also causes big regressions on 454.calculix and 462.libquantum. If I can fix these two regressions, prefetching will give another boost to AArch64.
>
> Andrew just posted similar prefetching tunings for Cavium's cores, and the two patches have trivial conflicts. I'll post mine as-is, since it addresses one of the review comments on Andrew's patch (adding a stand-alone struct for tuning parameters).
>
> Andrew, feel free to just copy-paste it into your patch, since it is just a mechanical change.
>
> All patches were bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.
>

I've tried these patches out on Cortex-A72 and Cortex-A53, with the tuning struct entries modified to enable the changes on those cores. I see the mcf and leslie3d improvements on Cortex-A72 and Cortex-A53 as well, with no noticeable regressions. I've also verified that the improvements come from the prefetch instructions themselves rather than just from the loop unrolling that the pass performs.

So I'm in favor of enabling this for the cores that benefit from it. Do you plan to get this in for GCC 8?

Thanks,
Kyrill

> --
> Maxim Kuvyrkov
> www.linaro.org