From: Richard Biener <richard.guenther@gmail.com>
To: kyrylo.tkachov@foss.arm.com
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
Marcus Shawcroft <marcus.shawcroft@arm.com>,
Richard Earnshaw <richard.earnshaw@arm.com>,
James Greenhalgh <james.greenhalgh@arm.com>,
Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64
Date: Mon, 12 Nov 2018 14:10:00 -0000 [thread overview]
Message-ID: <CAFiYyc1ojB9VFk63u7PG6O-fo4O64ufOCfEm__LGu_MrzEYYAg@mail.gmail.com> (raw)
In-Reply-To: <5BE5CA6D.9060408@foss.arm.com>
On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> On 09/11/18 12:18, Richard Biener wrote:
> > On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
> > <kyrylo.tkachov@foss.arm.com> wrote:
> >>
> >> Hi all,
> >>
> >> In this testcase the codegen for VLA SVE is worse than it could be due to unrolling:
> >>
> >> fully_peel_me:
> >> mov x1, 5
> >> ptrue p1.d, all
> >> whilelo p0.d, xzr, x1
> >> ld1d z0.d, p0/z, [x0]
> >> fadd z0.d, z0.d, z0.d
> >> st1d z0.d, p0, [x0]
> >> cntd x2
> >> addvl x3, x0, #1
> >> whilelo p0.d, x2, x1
> >> beq .L1
> >> ld1d z0.d, p0/z, [x0, #1, mul vl]
> >> fadd z0.d, z0.d, z0.d
> >> st1d z0.d, p0, [x3]
> >> cntw x2
> >> incb x0, all, mul #2
> >> whilelo p0.d, x2, x1
> >> beq .L1
> >> ld1d z0.d, p0/z, [x0]
> >> fadd z0.d, z0.d, z0.d
> >> st1d z0.d, p0, [x0]
> >> .L1:
> >> ret
> >>
> >> In this case, due to the vector-length-agnostic nature of SVE the compiler doesn't know the loop iteration count.
> >> For such loops we don't want to unroll if we don't end up eliminating branches as this just bloats code size
> >> and hurts icache performance.
> >>
> >> This patch introduces a new unroll-known-loop-iterations-only param that disables cunroll when the loop iteration
> >> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA code, but it does help some
> >> Advanced SIMD cases as well where loops with an unknown iteration count are not unrolled when it doesn't eliminate
> >> the branches.
> >>
> >> So for the above testcase we generate now:
> >> fully_peel_me:
> >> mov x2, 5
> >> mov x3, x2
> >> mov x1, 0
> >> whilelo p0.d, xzr, x2
> >> ptrue p1.d, all
> >> .L2:
> >> ld1d z0.d, p0/z, [x0, x1, lsl 3]
> >> fadd z0.d, z0.d, z0.d
> >> st1d z0.d, p0, [x0, x1, lsl 3]
> >> incd x1
> >> whilelo p0.d, x1, x3
> >> bne .L2
> >> ret
> >>
> >> Not perfect still, but it's preferable to the original code.
> >> The new param is enabled by default on aarch64 but disabled for other targets, leaving their behaviour unchanged
> >> (until other target people experiment with it and set it, if appropriate).
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in performance.
> >>
> >> Ok for trunk?
> >
> > Hum. Why introduce a new --param and not simply key on
> > flag_peel_loops instead? That is
> > enabled by default at -O3 and with FDO but you of course can control
> > that in your targets
> > post-option-processing hook.
>
> You mean like this?
> It's certainly a simpler patch, but I was just a bit hesitant of making this change for all targets :)
> But I suppose it's a reasonable change.
No, that change is backward. What I said is that peeling is already
conditional on
flag_peel_loops and that is enabled by -O3. So you want to disable
flag_peel_loops for
SVE instead in the target.
> >
> > It might also make sense to have more fine-grained control for this
> > and allow a target
> > to say whether it wants to peel a specific loop or not when the
> > middle-end thinks that
> > would be profitable.
>
> Can be worth looking at as a follow-up. Do you envisage the target analysing
> the gimple statements of the loop to figure out its cost?
Kind-of. Sth like
bool targetm.peel_loop (struct loop *);
I have no idea whether you can easily detect a SVE vectorized loop though.
Maybe there's always a special IV or so (the mask?)
Richard.
> Thanks,
> Kyrill
>
>
> 2018-11-09 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll
> loop when number of iterations is not known and flag_peel_loops is in
> effect.
>
> 2018-11-09 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/sve/unroll-1.c: New test.
>
next prev parent reply other threads:[~2018-11-12 14:10 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-09 10:47 Kyrill Tkachov
2018-11-09 12:19 ` Richard Biener
2018-11-09 17:57 ` Kyrill Tkachov
2018-11-12 14:10 ` Richard Biener [this message]
2018-11-12 18:20 ` Kyrill Tkachov
2018-11-13 8:25 ` Richard Biener
2018-11-13 9:15 ` Kyrill Tkachov
2018-11-13 9:28 ` Richard Biener
2018-11-13 9:49 ` Kyrill Tkachov
2018-11-13 14:34 ` Richard Biener
2018-11-14 15:01 ` Richard Biener
2018-11-16 18:02 ` [PATCH] Disable unrolling for loops vectorised with non-constant VF (was: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64) Kyrill Tkachov
2018-11-19 14:43 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAFiYyc1ojB9VFk63u7PG6O-fo4O64ufOCfEm__LGu_MrzEYYAg@mail.gmail.com \
--to=richard.guenther@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=james.greenhalgh@arm.com \
--cc=kyrylo.tkachov@foss.arm.com \
--cc=marcus.shawcroft@arm.com \
--cc=richard.earnshaw@arm.com \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).