From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 119875 invoked by alias); 23 Jan 2018 13:12:28 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 119860 invoked by uid 89); 23 Jan 2018 13:12:26 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=H*r:ip*192.168.1.107, 10950 X-HELO: mail-qt0-f196.google.com Received: from mail-qt0-f196.google.com (HELO mail-qt0-f196.google.com) (209.85.216.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 23 Jan 2018 13:12:25 +0000 Received: by mail-qt0-f196.google.com with SMTP id z11so1073411qtm.3 for ; Tue, 23 Jan 2018 05:12:25 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=/s8Er2shLQZtyLOp6/6RuBHbxBYwIWHe0G29TkL5H0c=; b=ayOaOYXpBgU67e1T48dGL8TldfRNdWVgtljVax8g27o+J1hivt9u1qsvWmeqnriuqS WWmHyWTgWjIyKOeTfBoq0OnEU4Ttcg2QNJanImwsGpYF0do+nSAplHWV6hTXdJNkEqVH 5tx+/J7zgTGzpryuywzRane4vWIjPm0yT5LgRvRe4nf4SIRprfCsE9idki5ohhNJZio2 Lq6W/msPQe37gITT4L4Z1ZlkB0XM1yI+DdIWZu8G19x1jEa1M0WmzAuX5XH1HIFI+gK5 ur9MvzP1GKSA/GgGBQlRiTD/Zp/5mtZQ8pYKFnL4FO3vbV73HwgA+8cXfZxpNoXn6iN4 eRdA== X-Gm-Message-State: AKwxytemsz61qZdPF9bYBh9iF021O2Bu06IhWRHi5qfAHl8h6GAbohte 8PbpPDvwdIv6epKVytIrji4yfw== X-Google-Smtp-Source: AH8x2271qnY8/hyBKIgTwX3t3lNzY5eowVoPbFTF2qLicqZ9xls0EKB6l6MwURCNNgCl0RDh13yavQ== X-Received: by 10.237.58.67 with SMTP id n61mr3463423qte.31.1516713143462; Tue, 23 Jan 2018 05:12:23 -0800 (PST) Received: from [192.168.1.107] ([177.180.105.91]) by smtp.gmail.com with ESMTPSA id y23sm255786qtj.20.2018.01.23.05.12.19 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 23 Jan 2018 05:12:22 -0800 (PST) Subject: Re: [PATCH 1/2] Introduce prefetch-minimum stride option To: Kyrill Tkachov , "gcc-patches@gcc.gnu.org" Cc: James Greenhalgh , Richard Earnshaw References: <1516628770-25036-1-git-send-email-luis.machado@linaro.org> <1516628770-25036-2-git-send-email-luis.machado@linaro.org> <5A670133.4010303@foss.arm.com> From: Luis Machado Message-ID: <1a274e3d-d119-257e-69a4-055cc144bd5d@linaro.org> Date: Tue, 23 Jan 2018 13:23:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: <5A670133.4010303@foss.arm.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2018-01/txt/msg01948.txt.bz2 Hi Kyrill, On 01/23/2018 07:32 AM, Kyrill Tkachov wrote: > Hi Luis, > > On 22/01/18 13:46, Luis Machado wrote: >> This patch adds a new option to control the minimum stride, for a memory >> reference, after which the loop prefetch pass may issue software prefetch >> hints for. There are two motivations: >> >> * Make the pass less aggressive, only issuing prefetch hints for >> bigger strides >> that are more likely to benefit from prefetching. I've noticed a case >> in cpu2017 >> where we were issuing thousands of hints, for example. >> > > I've noticed a large amount of prefetch hints being issued as well, but > had not > analysed it further. > I've gathered some numbers for this. Some of the most extreme cases before both patches: CPU2017 xalancbmk_s: 3755 hints wrf_s: 10950 hints parest_r: 8521 hints CPU2006 gamess: 11377 hints wrf: 3238 hints After both patches: CPU2017 xalancbmk_s: 1 hint wrf_s: 20 hints parest_r: 0 hints CPU2006 gamess: 44 hints wrf: 16 hints >> * For processors that have a hardware prefetcher, like Falkor, it >> allows the >> loop prefetch pass to defer prefetching of smaller (less than the >> threshold) >> strides to the hardware prefetcher instead. This prevents conflicts >> between >> the software prefetcher and the hardware prefetcher. >> >> I've noticed considerable reduction in the number of prefetch hints and >> slightly positive performance numbers. This aligns GCC and LLVM in >> terms of >> prefetch behavior for Falkor. > > Do you, by any chance, have a link to the LLVM review that implemented > that behavior? > It's okay if you don't, but I think it would be useful context. > I've dug it up. The base change was implemented here: review: https://reviews.llvm.org/D17945 RFC: http://lists.llvm.org/pipermail/llvm-dev/2015-December/093514.html And then target-specific changes were introduced later for specific processors. One small difference in LLVM is the fact that the second parameter, prefetching of non-constant strides, is implicitly switched off if one sets the minimum stride length. My approach here makes that second parameter adjustable. I've seen big gains due to prefetching of non-constant strides, but it tends to be tricky to control and usually comes together with significant regressions as well. The fact that we potentially unroll loops along with issuing prefetch hints also makes things a bit erratic. >> >> The default settings should guarantee no changes for existing targets. >> Those >> are free to tweak the settings as necessary. >> >> No regressions in the testsuite and bootstrapped ok on aarch64-linux. >> >> Ok? >> > > Are there any benchmark numbers you can share? > I think this approach is sensible. > Comparing the previous, more aggressive, pass behavior with the new one i've seen a slight improvement for CPU2006, 0.15% for both INT and FP. For CPU2017 the previous behavior was actually a bit harmful, regressing performance by about 1.2% in intspeed. The new behavior kept intspeed stable and slightly improved fpspeed by 0.15%. The motivation for the future is to have better control of software prefetching so we can fine-tune the pass, either through generic loop prefetch code or by using the target-specific parameters. > Since your patch touches generic code as well as AArch64 > code you'll need an approval from a midend maintainer as well as an > AArch64 maintainer. > Also, GCC development is now in the regression fixing stage, so unless > this fixes a regression > it may have to wait until GCC 9 development is opened. That is my understanding. I thought i'd put this up for review anyway so people can chime in and provide their thoughts. Thanks for the review. Luis