Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <richard.guenther@gmail.com>
To: kyrylo.tkachov@foss.arm.com
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Marcus Shawcroft <marcus.shawcroft@arm.com>,
		Richard Earnshaw <richard.earnshaw@arm.com>,
	James Greenhalgh <james.greenhalgh@arm.com>,
		Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64
Date: Tue, 13 Nov 2018 08:25:00 -0000	[thread overview]
Message-ID: <CAFiYyc1VTDC3==EYRNK=AbO-fe_TMu-BgvjE5LOsrFhLaxb9jA@mail.gmail.com> (raw)
In-Reply-To: <5BE9C452.3080709@foss.arm.com>

On Mon, Nov 12, 2018 at 7:20 PM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
>
> On 12/11/18 14:10, Richard Biener wrote:
> > On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
> > <kyrylo.tkachov@foss.arm.com> wrote:
> >> On 09/11/18 12:18, Richard Biener wrote:
> >>> On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
> >>> <kyrylo.tkachov@foss.arm.com> wrote:
> >>>> Hi all,
> >>>>
> >>>> In this testcase the codegen for VLA SVE is worse than it could be due to unrolling:
> >>>>
> >>>> fully_peel_me:
> >>>>           mov     x1, 5
> >>>>           ptrue   p1.d, all
> >>>>           whilelo p0.d, xzr, x1
> >>>>           ld1d    z0.d, p0/z, [x0]
> >>>>           fadd    z0.d, z0.d, z0.d
> >>>>           st1d    z0.d, p0, [x0]
> >>>>           cntd    x2
> >>>>           addvl   x3, x0, #1
> >>>>           whilelo p0.d, x2, x1
> >>>>           beq     .L1
> >>>>           ld1d    z0.d, p0/z, [x0, #1, mul vl]
> >>>>           fadd    z0.d, z0.d, z0.d
> >>>>           st1d    z0.d, p0, [x3]
> >>>>           cntw    x2
> >>>>           incb    x0, all, mul #2
> >>>>           whilelo p0.d, x2, x1
> >>>>           beq     .L1
> >>>>           ld1d    z0.d, p0/z, [x0]
> >>>>           fadd    z0.d, z0.d, z0.d
> >>>>           st1d    z0.d, p0, [x0]
> >>>> .L1:
> >>>>           ret
> >>>>
> >>>> In this case, due to the vector-length-agnostic nature of SVE the compiler doesn't know the loop iteration count.
> >>>> For such loops we don't want to unroll if we don't end up eliminating branches as this just bloats code size
> >>>> and hurts icache performance.
> >>>>
> >>>> This patch introduces a new unroll-known-loop-iterations-only param that disables cunroll when the loop iteration
> >>>> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA code, but it does help some
> >>>> Advanced SIMD cases as well where loops with an unknown iteration count are not unrolled when it doesn't eliminate
> >>>> the branches.
> >>>>
> >>>> So for the above testcase we generate now:
> >>>> fully_peel_me:
> >>>>           mov     x2, 5
> >>>>           mov     x3, x2
> >>>>           mov     x1, 0
> >>>>           whilelo p0.d, xzr, x2
> >>>>           ptrue   p1.d, all
> >>>> .L2:
> >>>>           ld1d    z0.d, p0/z, [x0, x1, lsl 3]
> >>>>           fadd    z0.d, z0.d, z0.d
> >>>>           st1d    z0.d, p0, [x0, x1, lsl 3]
> >>>>           incd    x1
> >>>>           whilelo p0.d, x1, x3
> >>>>           bne     .L2
> >>>>           ret
> >>>>
> >>>> Not perfect still, but it's preferable to the original code.
> >>>> The new param is enabled by default on aarch64 but disabled for other targets, leaving their behaviour unchanged
> >>>> (until other target people experiment with it and set it, if appropriate).
> >>>>
> >>>> Bootstrapped and tested on aarch64-none-linux-gnu.
> >>>> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in performance.
> >>>>
> >>>> Ok for trunk?
> >>> Hum.  Why introduce a new --param and not simply key on
> >>> flag_peel_loops instead?  That is
> >>> enabled by default at -O3 and with FDO but you of course can control
> >>> that in your targets
> >>> post-option-processing hook.
> >> You mean like this?
> >> It's certainly a simpler patch, but I was just a bit hesitant of making this change for all targets :)
> >> But I suppose it's a reasonable change.
> > No, that change is backward.  What I said is that peeling is already
> > conditional on
> > flag_peel_loops and that is enabled by -O3.  So you want to disable
> > flag_peel_loops for
> > SVE instead in the target.
>
> Sorry, I got confused by the similarly named functions.
> I'm talking about try_unroll_loop_completely when run as part of canonicalize_induction_variables i.e. the "ivcanon" pass
> (sorry about blaming cunroll here). This doesn't get called through the try_unroll_loops_completely path.

Well, peeling gets disabled.  From your patch I see you want to
disable "unrolling" when
the number of loop iteration is not constant.  That is called peeling
where we need to
emit the loop exit test N times.

Did you check your testcases with -fno-peel-loops?

> try_unroll_loop_completely doesn't get disabled with -fno-peel-loops or -fno-unroll-loops.
> Maybe disabling peeling inside try_unroll_loop_completely itself when !flag_peel_loops is viable?
>
> Thanks,
> Kyrill
>
> >>> It might also make sense to have more fine-grained control for this
> >>> and allow a target
> >>> to say whether it wants to peel a specific loop or not when the
> >>> middle-end thinks that
> >>> would be profitable.
> >> Can be worth looking at as a follow-up. Do you envisage the target analysing
> >> the gimple statements of the loop to figure out its cost?
> > Kind-of.  Sth like
> >
> >    bool targetm.peel_loop (struct loop *);
> >
> > I have no idea whether you can easily detect a SVE vectorized loop though.
> > Maybe there's always a special IV or so (the mask?)
> >
> > Richard.
> >
> >> Thanks,
> >> Kyrill
> >>
> >>
> >> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> >>
> >>          * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll
> >>          loop when number of iterations is not known and flag_peel_loops is in
> >>          effect.
> >>
> >> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> >>
> >>          * gcc.target/aarch64/sve/unroll-1.c: New test.
> >>
>

next prev parent reply	other threads:[~2018-11-13  8:25 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-09 10:47 Kyrill Tkachov
2018-11-09 12:19 ` Richard Biener
2018-11-09 17:57   ` Kyrill Tkachov
2018-11-12 14:10     ` Richard Biener
2018-11-12 18:20       ` Kyrill Tkachov
2018-11-13  8:25         ` Richard Biener [this message]
2018-11-13  9:15           ` Kyrill Tkachov
2018-11-13  9:28             ` Richard Biener
2018-11-13  9:49               ` Kyrill Tkachov
2018-11-13 14:34                 ` Richard Biener
2018-11-14 15:01                   ` Richard Biener
2018-11-16 18:02                     ` [PATCH] Disable unrolling for loops vectorised with non-constant VF (was: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64) Kyrill Tkachov
2018-11-19 14:43                       ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFiYyc1VTDC3==EYRNK=AbO-fe_TMu-BgvjE5LOsrFhLaxb9jA@mail.gmail.com' \
    --to=richard.guenther@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=james.greenhalgh@arm.com \
    --cc=kyrylo.tkachov@foss.arm.com \
    --cc=marcus.shawcroft@arm.com \
    --cc=richard.earnshaw@arm.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).