From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
MIME-Version: 1.0
References: <5BE565CE.5000709@foss.arm.com>
In-Reply-To: <5BE565CE.5000709@foss.arm.com>
From: Richard Biener
Date: Fri, 09 Nov 2018 12:19:00 -0000
Subject: Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64
To: kyrylo.tkachov@foss.arm.com
Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, James Greenhalgh, Richard Sandiford
Content-Type: text/plain; charset="UTF-8"
X-SW-Source: 2018-11/txt/msg00710.txt.bz2

On Fri, Nov 9, 2018 at 11:47 AM Kyrill
Tkachov wrote:
>
> Hi all,
>
> In this testcase the codegen for VLA SVE is worse than it could be due to unrolling:
>
> fully_peel_me:
>         mov     x1, 5
>         ptrue   p1.d, all
>         whilelo p0.d, xzr, x1
>         ld1d    z0.d, p0/z, [x0]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x0]
>         cntd    x2
>         addvl   x3, x0, #1
>         whilelo p0.d, x2, x1
>         beq     .L1
>         ld1d    z0.d, p0/z, [x0, #1, mul vl]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x3]
>         cntw    x2
>         incb    x0, all, mul #2
>         whilelo p0.d, x2, x1
>         beq     .L1
>         ld1d    z0.d, p0/z, [x0]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x0]
> .L1:
>         ret
>
> In this case, due to the vector-length-agnostic nature of SVE the compiler doesn't know the loop iteration count.
> For such loops we don't want to unroll if we don't end up eliminating branches as this just bloats code size
> and hurts icache performance.
>
> This patch introduces a new unroll-known-loop-iterations-only param that disables cunroll when the loop iteration
> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA code, but it does help some
> Advanced SIMD cases as well where loops with an unknown iteration count are not unrolled when it doesn't eliminate
> the branches.
>
> So for the above testcase we generate now:
> fully_peel_me:
>         mov     x2, 5
>         mov     x3, x2
>         mov     x1, 0
>         whilelo p0.d, xzr, x2
>         ptrue   p1.d, all
> .L2:
>         ld1d    z0.d, p0/z, [x0, x1, lsl 3]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x0, x1, lsl 3]
>         incd    x1
>         whilelo p0.d, x1, x3
>         bne     .L2
>         ret
>
> Not perfect still, but it's preferable to the original code.
> The new param is enabled by default on aarch64 but disabled for other targets, leaving their behaviour unchanged
> (until other target people experiment with it and set it, if appropriate).
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in performance.
>
> Ok for trunk?

Hum.  Why introduce a new --param and not simply key on flag_peel_loops instead?
That is enabled by default at -O3 and with FDO, but you can of course control
that in your target's post-option-processing hook.

It might also make sense to have more fine-grained control for this and allow a
target to say whether it wants to peel a specific loop or not when the
middle-end thinks that would be profitable.

Richard.

> Thanks,
> Kyrill
>
>
> 2018-11-09  Kyrylo Tkachov
>
>      * params.def (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY): Define.
>      * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use above to
>      disable unrolling on unknown iteration count.
>      * config/aarch64/aarch64.c (aarch64_override_options_internal): Set
>      PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY to 1.
>      * doc/invoke.texi (--param unroll-known-loop-iterations-only):
>      Document.
>
> 2018-11-09  Kyrylo Tkachov
>
>      * gcc.target/aarch64/sve/unroll-1.c: New test.
>
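
The source of unroll-1.c is not included in this message. As a rough sketch only, a C function along the following lines would be consistent with the quoted assembly: a constant trip count of 5, a load of doubles, an fadd of each value with itself, and a store back. The loop body is an assumption inferred from that assembly, not the actual test, and check_fully_peel_me is a hypothetical helper added here just to exercise the function.

```c
#include <assert.h>

/* Assumed reconstruction of the testcase: the trip count (5) is a
   compile-time constant, but with vector-length-agnostic SVE the number
   of *vector* iterations is not known at compile time.  */
void
fully_peel_me (double *x)
{
  for (int i = 0; i < 5; i++)
    x[i] = x[i] + x[i];
}

/* Hypothetical helper, not part of the patch: checks that each of the
   five elements is doubled in place.  */
void
check_fully_peel_me (void)
{
  double a[5] = { 1.0, 2.0, 3.0, 4.0, 5.0 };
  fully_peel_me (a);
  for (int i = 0; i < 5; i++)
    assert (a[i] == 2.0 * (i + 1));
}
```

The scalar semantics are the same whether or not the loop is peeled; the discussion above is only about which code shape the compiler emits for it.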