From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 86852 invoked by alias); 12 Nov 2018 18:20:09 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 86822 invoked by uid 89); 12 Nov 2018 18:20:08 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=BAYES_00,KAM_LAZY_DOMAIN_SECURITY,KAM_NUMSUBJECT autolearn=no version=3.3.2 spammy=sk:canonic, H*f:sk:5BE565C, H*f:sk:Gd7W7mV X-HELO: foss.arm.com Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 12 Nov 2018 18:20:07 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 561A8A78; Mon, 12 Nov 2018 10:20:05 -0800 (PST) Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com [10.2.207.77]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 44EF23F5A0; Mon, 12 Nov 2018 10:20:04 -0800 (PST) Message-ID: <5BE9C452.3080709@foss.arm.com> Date: Mon, 12 Nov 2018 18:20:00 -0000 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Richard Biener CC: GCC Patches , Marcus Shawcroft , Richard Earnshaw , James Greenhalgh , Richard Sandiford Subject: Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64 References: <5BE565CE.5000709@foss.arm.com> <5BE5CA6D.9060408@foss.arm.com> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2018-11/txt/msg00977.txt.bz2 On 12/11/18 14:10, Richard Biener wrote: > On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov > wrote: >> On 09/11/18 12:18, Richard Biener wrote: >>> On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov >>> wrote: >>>> Hi all, >>>> >>>> In this testcase the codegen for VLA SVE is worse than it could be due to unrolling: >>>> >>>> fully_peel_me: >>>> mov x1, 5 >>>> ptrue p1.d, all >>>> whilelo p0.d, xzr, x1 >>>> ld1d z0.d, p0/z, [x0] >>>> fadd z0.d, z0.d, z0.d >>>> st1d z0.d, p0, [x0] >>>> cntd x2 >>>> addvl x3, x0, #1 >>>> whilelo p0.d, x2, x1 >>>> beq .L1 >>>> ld1d z0.d, p0/z, [x0, #1, mul vl] >>>> fadd z0.d, z0.d, z0.d >>>> st1d z0.d, p0, [x3] >>>> cntw x2 >>>> incb x0, all, mul #2 >>>> whilelo p0.d, x2, x1 >>>> beq .L1 >>>> ld1d z0.d, p0/z, [x0] >>>> fadd z0.d, z0.d, z0.d >>>> st1d z0.d, p0, [x0] >>>> .L1: >>>> ret >>>> >>>> In this case, due to the vector-length-agnostic nature of SVE the compiler doesn't know the loop iteration count. >>>> For such loops we don't want to unroll if we don't end up eliminating branches as this just bloats code size >>>> and hurts icache performance. >>>> >>>> This patch introduces a new unroll-known-loop-iterations-only param that disables cunroll when the loop iteration >>>> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA code, but it does help some >>>> Advanced SIMD cases as well where loops with an unknown iteration count are not unrolled when it doesn't eliminate >>>> the branches. >>>> >>>> So for the above testcase we generate now: >>>> fully_peel_me: >>>> mov x2, 5 >>>> mov x3, x2 >>>> mov x1, 0 >>>> whilelo p0.d, xzr, x2 >>>> ptrue p1.d, all >>>> .L2: >>>> ld1d z0.d, p0/z, [x0, x1, lsl 3] >>>> fadd z0.d, z0.d, z0.d >>>> st1d z0.d, p0, [x0, x1, lsl 3] >>>> incd x1 >>>> whilelo p0.d, x1, x3 >>>> bne .L2 >>>> ret >>>> >>>> Not perfect still, but it's preferable to the original code. >>>> The new param is enabled by default on aarch64 but disabled for other targets, leaving their behaviour unchanged >>>> (until other target people experiment with it and set it, if appropriate). >>>> >>>> Bootstrapped and tested on aarch64-none-linux-gnu. >>>> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in performance. >>>> >>>> Ok for trunk? >>> Hum. Why introduce a new --param and not simply key on >>> flag_peel_loops instead? That is >>> enabled by default at -O3 and with FDO but you of course can control >>> that in your targets >>> post-option-processing hook. >> You mean like this? >> It's certainly a simpler patch, but I was just a bit hesitant of making this change for all targets :) >> But I suppose it's a reasonable change. > No, that change is backward. What I said is that peeling is already > conditional on > flag_peel_loops and that is enabled by -O3. So you want to disable > flag_peel_loops for > SVE instead in the target. Sorry, I got confused by the similarly named functions. I'm talking about try_unroll_loop_completely when run as part of canonicalize_induction_variables i.e. the "ivcanon" pass (sorry about blaming cunroll here). This doesn't get called through the try_unroll_loops_completely path. try_unroll_loop_completely doesn't get disabled with -fno-peel-loops or -fno-unroll-loops. Maybe disabling peeling inside try_unroll_loop_completely itself when !flag_peel_loops is viable? Thanks, Kyrill >>> It might also make sense to have more fine-grained control for this >>> and allow a target >>> to say whether it wants to peel a specific loop or not when the >>> middle-end thinks that >>> would be profitable. >> Can be worth looking at as a follow-up. Do you envisage the target analysing >> the gimple statements of the loop to figure out its cost? > Kind-of. Sth like > > bool targetm.peel_loop (struct loop *); > > I have no idea whether you can easily detect a SVE vectorized loop though. > Maybe there's always a special IV or so (the mask?) > > Richard. > >> Thanks, >> Kyrill >> >> >> 2018-11-09 Kyrylo Tkachov >> >> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll >> loop when number of iterations is not known and flag_peel_loops is in >> effect. >> >> 2018-11-09 Kyrylo Tkachov >> >> * gcc.target/aarch64/sve/unroll-1.c: New test. >>