From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 80965 invoked by alias); 13 Nov 2018 09:15:46 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 80842 invoked by uid 89); 13 Nov 2018 09:15:37 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=BAYES_00,KAM_LAZY_DOMAIN_SECURITY,KAM_NUMSUBJECT autolearn=no version=3.3.2 spammy=H*f:sk:5BE565C, H*f:sk:Gd7W7mV X-HELO: foss.arm.com Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 13 Nov 2018 09:15:34 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 87F54A78; Tue, 13 Nov 2018 01:15:32 -0800 (PST) Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com [10.2.207.77]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 70F653F5CF; Tue, 13 Nov 2018 01:15:31 -0800 (PST) Message-ID: <5BEA9631.1070500@foss.arm.com> Date: Tue, 13 Nov 2018 09:15:00 -0000 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Richard Biener CC: GCC Patches , Marcus Shawcroft , Richard Earnshaw , James Greenhalgh , Richard Sandiford Subject: Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64 References: <5BE565CE.5000709@foss.arm.com> <5BE5CA6D.9060408@foss.arm.com> <5BE9C452.3080709@foss.arm.com> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2018-11/txt/msg01052.txt.bz2 Hi Richard, On 13/11/18 08:24, Richard Biener wrote: > On Mon, Nov 12, 2018 at 7:20 PM Kyrill Tkachov > wrote: >> >> On 12/11/18 14:10, Richard Biener wrote: >>> On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov >>> wrote: >>>> On 09/11/18 12:18, Richard Biener wrote: >>>>> On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov >>>>> wrote: >>>>>> Hi all, >>>>>> >>>>>> In this testcase the codegen for VLA SVE is worse than it could be due to unrolling: >>>>>> >>>>>> fully_peel_me: >>>>>> mov x1, 5 >>>>>> ptrue p1.d, all >>>>>> whilelo p0.d, xzr, x1 >>>>>> ld1d z0.d, p0/z, [x0] >>>>>> fadd z0.d, z0.d, z0.d >>>>>> st1d z0.d, p0, [x0] >>>>>> cntd x2 >>>>>> addvl x3, x0, #1 >>>>>> whilelo p0.d, x2, x1 >>>>>> beq .L1 >>>>>> ld1d z0.d, p0/z, [x0, #1, mul vl] >>>>>> fadd z0.d, z0.d, z0.d >>>>>> st1d z0.d, p0, [x3] >>>>>> cntw x2 >>>>>> incb x0, all, mul #2 >>>>>> whilelo p0.d, x2, x1 >>>>>> beq .L1 >>>>>> ld1d z0.d, p0/z, [x0] >>>>>> fadd z0.d, z0.d, z0.d >>>>>> st1d z0.d, p0, [x0] >>>>>> .L1: >>>>>> ret >>>>>> >>>>>> In this case, due to the vector-length-agnostic nature of SVE the compiler doesn't know the loop iteration count. >>>>>> For such loops we don't want to unroll if we don't end up eliminating branches as this just bloats code size >>>>>> and hurts icache performance. >>>>>> >>>>>> This patch introduces a new unroll-known-loop-iterations-only param that disables cunroll when the loop iteration >>>>>> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA code, but it does help some >>>>>> Advanced SIMD cases as well where loops with an unknown iteration count are not unrolled when it doesn't eliminate >>>>>> the branches. >>>>>> >>>>>> So for the above testcase we generate now: >>>>>> fully_peel_me: >>>>>> mov x2, 5 >>>>>> mov x3, x2 >>>>>> mov x1, 0 >>>>>> whilelo p0.d, xzr, x2 >>>>>> ptrue p1.d, all >>>>>> .L2: >>>>>> ld1d z0.d, p0/z, [x0, x1, lsl 3] >>>>>> fadd z0.d, z0.d, z0.d >>>>>> st1d z0.d, p0, [x0, x1, lsl 3] >>>>>> incd x1 >>>>>> whilelo p0.d, x1, x3 >>>>>> bne .L2 >>>>>> ret >>>>>> >>>>>> Not perfect still, but it's preferable to the original code. >>>>>> The new param is enabled by default on aarch64 but disabled for other targets, leaving their behaviour unchanged >>>>>> (until other target people experiment with it and set it, if appropriate). >>>>>> >>>>>> Bootstrapped and tested on aarch64-none-linux-gnu. >>>>>> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in performance. >>>>>> >>>>>> Ok for trunk? >>>>> Hum. Why introduce a new --param and not simply key on >>>>> flag_peel_loops instead? That is >>>>> enabled by default at -O3 and with FDO but you of course can control >>>>> that in your targets >>>>> post-option-processing hook. >>>> You mean like this? >>>> It's certainly a simpler patch, but I was just a bit hesitant of making this change for all targets :) >>>> But I suppose it's a reasonable change. >>> No, that change is backward. What I said is that peeling is already >>> conditional on >>> flag_peel_loops and that is enabled by -O3. So you want to disable >>> flag_peel_loops for >>> SVE instead in the target. >> Sorry, I got confused by the similarly named functions. >> I'm talking about try_unroll_loop_completely when run as part of canonicalize_induction_variables i.e. the "ivcanon" pass >> (sorry about blaming cunroll here). This doesn't get called through the try_unroll_loops_completely path. > Well, peeling gets disabled. From your patch I see you want to > disable "unrolling" when > the number of loop iteration is not constant. That is called peeling > where we need to > emit the loop exit test N times. > > Did you check your testcases with -fno-peel-loops? -fno-peel-loops doesn't help in the testcases. The code that does this peeling (try_unroll_loop_completely) can be called through two paths, only one of which is gated on flag_peel_loops. Thanks, Kyrill >> try_unroll_loop_completely doesn't get disabled with -fno-peel-loops or -fno-unroll-loops. >> Maybe disabling peeling inside try_unroll_loop_completely itself when !flag_peel_loops is viable? >> >> Thanks, >> Kyrill >> >>>>> It might also make sense to have more fine-grained control for this >>>>> and allow a target >>>>> to say whether it wants to peel a specific loop or not when the >>>>> middle-end thinks that >>>>> would be profitable. >>>> Can be worth looking at as a follow-up. Do you envisage the target analysing >>>> the gimple statements of the loop to figure out its cost? >>> Kind-of. Sth like >>> >>> bool targetm.peel_loop (struct loop *); >>> >>> I have no idea whether you can easily detect a SVE vectorized loop though. >>> Maybe there's always a special IV or so (the mask?) >>> >>> Richard. >>> >>>> Thanks, >>>> Kyrill >>>> >>>> >>>> 2018-11-09 Kyrylo Tkachov >>>> >>>> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll >>>> loop when number of iterations is not known and flag_peel_loops is in >>>> effect. >>>> >>>> 2018-11-09 Kyrylo Tkachov >>>> >>>> * gcc.target/aarch64/sve/unroll-1.c: New test. >>>>