From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 44309 invoked by alias); 28 Nov 2018 14:16:20 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 43999 invoked by uid 89); 28 Nov 2018 14:16:19 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-6.9 required=5.0 tests=BAYES_00,GIT_PATCH_1,SPF_PASS autolearn=ham version=3.3.2 spammy=biggest, 256bit, 256-bit, recognition X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 28 Nov 2018 14:16:17 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CA0921D6F; Wed, 28 Nov 2018 06:16:15 -0800 (PST) Received: from localhost (unknown [10.32.99.101]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2B3223F5A0; Wed, 28 Nov 2018 06:16:15 -0800 (PST) From: Richard Sandiford To: David Malcolm Mail-Followup-To: David Malcolm ,, richard.sandiford@arm.com Cc: Subject: Re: Add a loop versioning pass References: <874ldb1omb.fsf@arm.com> <1540479702.14521.304.camel@redhat.com> Date: Wed, 28 Nov 2018 14:16:00 -0000 In-Reply-To: <1540479702.14521.304.camel@redhat.com> (David Malcolm's message of "Thu, 25 Oct 2018 11:01:42 -0400") Message-ID: <87sgzli8ya.fsf@arm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-SW-Source: 2018-11/txt/msg02311.txt.bz2 David Malcolm writes: > On Wed, 2018-10-24 at 14:05 +0100, Richard Sandiford wrote: >> This patch adds a pass that versions loops with variable index >> strides >> for the case in which the stride is 1. E.g.: >> >> for (int i = 0; i < n; ++i) >> x[i * stride] = ...; >> >> becomes: >> >> if (stepx == 1) >> for (int i = 0; i < n; ++i) >> x[i] = ...; >> else >> for (int i = 0; i < n; ++i) >> x[i * stride] = ...; >> >> This is useful for both vector code and scalar code, and in some >> cases >> can enable further optimisations like loop interchange or pattern >> recognition. >> >> The pass gives a 7.6% improvement on Cortex-A72 for 554.roms_r at -O3 >> and a 2.4% improvement for 465.tonto. I haven't found any SPEC tests >> that regress. >> >> Sizewise, there's a 10% increase in .text for both 554.roms_r and >> 465.tonto. That's obviously a lot, but in tonto's case it's because >> the whole program is written using assumed-shape arrays and pointers, >> so a large number of functions really do benefit from versioning. >> roms likewise makes heavy use of assumed-shape arrays, and that >> improvement in performance IMO justifies the code growth. >> >> The next biggest .text increase is 4.5% for 548.exchange2_r. I did >> see >> a small (0.4%) speed improvement there, but although both 3-iteration >> runs >> produced stable results, that might still be noise. There was a >> slightly >> larger (non-noise) improvement for a 256-bit SVE model. >> >> 481.wrf and 521.wrf_r .text grew by 2.8% and 2.5% respectively, but >> without any noticeable improvement in performance. No other test >> grew >> by more than 2%. >> >> Although the main SPEC beneficiaries are all Fortran tests, the >> benchmarks we use for SVE also include some C and C++ tests that >> benefit. >> >> Using -frepack-arrays gives the same benefits in many Fortran cases. >> The problem is that using that option inappropriately can force a >> full >> array copy for arguments that the function only reads once, and so it >> isn't really something we can turn on by default. The new pass is >> supposed to give most of the benefits of -frepack-arrays without >> the risk of unnecessary repacking. >> >> The patch therefore enables the pass by default at -O3. >> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Richard >> > > [...snip...] > >> +/* Run the pass and return a set of TODO_* flags. */ >> + >> +unsigned int >> +loop_versioning::run () >> +{ >> + gcc_assert (scev_initialized_p ()); >> + >> + if (!analyze_blocks () >> + || !prune_conditions () >> + || !make_versioning_decisions () >> + || !implement_versioning_decisions ()) >> + return 0; >> + >> + return TODO_update_ssa; >> +} >> + >> +/* Loop versioningting pass. */ > > (typo) Huh, no idea how I even got there. >> + >> +namespace { > > Could the whole file be within this anonymous namespace, rather than > just the opt_pass subclass? (hiding class loop_versioning, so that the > optimizer knows that the only thing visible outside the TU is > make_pass_loop_versioning). This can be a pain to debug, but you can > always comment out the anon namespace locally when debugging. Yeah, I prefer that style too, so that the TU only exports public interfaces. I'd got the impression from earlier threads that this was frowned on for GCC and so I was deliberately avoiding it, but if it's OK then great. Thanks, Richard