From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-489508-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 29213 invoked by alias); 9 Nov 2018 12:19:10 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
MIME-Version: 1.0
References: <5BE565CE.5000709@foss.arm.com>
In-Reply-To: <5BE565CE.5000709@foss.arm.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Fri, 09 Nov 2018 12:19:00 -0000
Message-ID: <CAFiYyc2fZ4WTdguC9RjLWfYTiUK1t=Gd7W7mVmA2qvA1t2vjuA@mail.gmail.com>
Subject: Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64
To: kyrylo.tkachov@foss.arm.com
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, Marcus Shawcroft <marcus.shawcroft@arm.com>, 	Richard Earnshaw <richard.earnshaw@arm.com>, James Greenhalgh <james.greenhalgh@arm.com>, 	Richard Sandiford <richard.sandiford@arm.com>
Content-Type: text/plain; charset="UTF-8"
X-IsSubscribed: yes
X-SW-Source: 2018-11/txt/msg00710.txt.bz2

On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi all,
>
> In this testcase the codegen for VLA SVE is worse than it could be due to unrolling:
>
> fully_peel_me:
>          mov     x1, 5
>          ptrue   p1.d, all
>          whilelo p0.d, xzr, x1
>          ld1d    z0.d, p0/z, [x0]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0]
>          cntd    x2
>          addvl   x3, x0, #1
>          whilelo p0.d, x2, x1
>          beq     .L1
>          ld1d    z0.d, p0/z, [x0, #1, mul vl]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x3]
>          cntw    x2
>          incb    x0, all, mul #2
>          whilelo p0.d, x2, x1
>          beq     .L1
>          ld1d    z0.d, p0/z, [x0]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0]
> .L1:
>          ret
>
> In this case, due to the vector-length-agnostic nature of SVE the compiler doesn't know the loop iteration count.
> For such loops we don't want to unroll if we don't end up eliminating branches as this just bloats code size
> and hurts icache performance.
>
> This patch introduces a new unroll-known-loop-iterations-only param that disables cunroll when the loop iteration
> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA code, but the param also helps some
> Advanced SIMD cases, where loops with an unknown iteration count are no longer unrolled when doing so would not
> eliminate the branches.
>
> So for the above testcase we now generate:
> fully_peel_me:
>          mov     x2, 5
>          mov     x3, x2
>          mov     x1, 0
>          whilelo p0.d, xzr, x2
>          ptrue   p1.d, all
> .L2:
>          ld1d    z0.d, p0/z, [x0, x1, lsl 3]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0, x1, lsl 3]
>          incd    x1
>          whilelo p0.d, x1, x3
>          bne     .L2
>          ret
>
> Still not perfect, but it's preferable to the original code.
> The new param is enabled by default on aarch64 but disabled for other targets, leaving their behaviour unchanged
> (until other target people experiment with it and set it, if appropriate).
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in performance.
>
> Ok for trunk?

Hum.  Why introduce a new --param and not simply key on flag_peel_loops
instead?  That is enabled by default at -O3 and with FDO, but you can of
course control that in your target's post-option-processing hook.
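
To make the suggestion above concrete, here is a rough standalone model of it.
This is not GCC source: the global below only stands in for flag_peel_loops,
and struct loop_info, target_override_options and should_unroll_completely
are hypothetical names invented for this sketch.

```c
#include <stdbool.h>

/* Stand-in for GCC's flag_peel_loops (-fpeel-loops, on by default at -O3
   and with FDO).  In this sketch it is just a plain global.  */
static int flag_peel_loops = 1;

/* Hypothetical, simplified view of what cunroll knows about a loop.  */
struct loop_info
{
  bool niter_known;   /* false models the SCEV_NOT_KNOWN case */
  unsigned max_iter;  /* upper bound on the iteration count */
};

/* Hypothetical stand-in for a target's post-option-processing hook:
   a vector-length-agnostic target switches peeling off.  */
static void
target_override_options (bool vla_target)
{
  if (vla_target)
    flag_peel_loops = 0;
}

/* Assumed decision logic: refuse complete unrolling when only an upper
   bound is known and peeling is disabled, because the exit branches
   would all survive.  */
static bool
should_unroll_completely (const struct loop_info *loop, unsigned max_unroll)
{
  if (loop->max_iter > max_unroll)
    return false;
  if (!loop->niter_known && !flag_peel_loops)
    return false;
  return true;
}
```

With this shape no new --param is needed; a target that dislikes the
behaviour clears one existing flag, and users can still override it with
-fpeel-loops.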

It might also make sense to have more fine-grained control for this and
allow a target to say whether it wants to peel a specific loop or not when
the middle-end thinks that would be profitable.
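
A per-loop veto of that kind could look roughly like the sketch below.  It is
a self-contained model, not an existing GCC interface: struct loop_desc, the
peel_loop_hook type, the two example hooks and the size limit of 200 are all
made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical summary of a loop as seen by the unroller.  */
struct loop_desc
{
  bool niter_known;   /* exact iteration count known at compile time */
  unsigned n_unroll;  /* number of copies complete unrolling would create */
  unsigned insns;     /* estimated size of the loop body */
};

/* Hypothetical target hook type: may veto peeling of one specific loop.  */
typedef bool (*peel_loop_hook) (const struct loop_desc *);

/* Default hook: never veto, so other targets keep their behaviour.  */
static bool
default_peel_loop_p (const struct loop_desc *loop)
{
  (void) loop;
  return true;
}

/* Example hook for a vector-length-agnostic target: only accept loops
   whose iteration count is known, so the exit branches really go away.  */
static bool
vla_peel_loop_p (const struct loop_desc *loop)
{
  return loop->niter_known;
}

/* Stand-in for the middle-end's own size heuristic (assumed limit).  */
static bool
middle_end_profitable_p (const struct loop_desc *loop)
{
  return loop->n_unroll * loop->insns <= 200;
}

/* The hook is consulted only after the middle-end has decided that
   peeling would be profitable.  */
static bool
peel_loop_p (const struct loop_desc *loop, peel_loop_hook hook)
{
  return middle_end_profitable_p (loop) && hook (loop);
}
```

The point of the ordering is that the target can only say no to something
the middle-end already wanted to do; it cannot force unrolling that the
generic heuristics rejected.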

Richard.

> Thanks,
> Kyrill
>
>
> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>         * params.def (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY): Define.
>         * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use above to
>         disable unrolling on unknown iteration count.
>         * config/aarch64/aarch64.c (aarch64_override_options_internal): Set
>         PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY to 1.
>         * doc/invoke.texi (--param unroll-known-loop-iterations-only):
>         Document.
>
> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>         * gcc.target/aarch64/sve/unroll-1.c: New test.
>