From: Richard Biener <rguenther@suse.de>
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: Tamar Christina <Tamar.Christina@arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
nd <nd@arm.com>
Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
Date: Tue, 28 Nov 2023 08:11:08 +0000 (UTC)
Message-ID: <nycvar.YFH.7.77.849.2311280808480.21409@jbgna.fhfr.qr>
In-Reply-To: <mptedgbujzf.fsf@arm.com>
On Mon, 27 Nov 2023, Richard Sandiford wrote:
> Catching up on backlog, so this might already be resolved, but:
>
> Richard Biener <rguenther@suse.de> writes:
> > On Tue, 7 Nov 2023, Tamar Christina wrote:
> >
> >> > -----Original Message-----
> >> > From: Richard Biener <rguenther@suse.de>
> >> > Sent: Tuesday, November 7, 2023 9:43 AM
> >> > To: Tamar Christina <Tamar.Christina@arm.com>
> >> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> >> > Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
> >> >
> >> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> >> >
> >> > > > -----Original Message-----
> >> > > > From: Richard Biener <rguenther@suse.de>
> >> > > > Sent: Monday, November 6, 2023 2:25 PM
> >> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> >> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> >> > > > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
> >> > > >
> >> > > > On Mon, 6 Nov 2023, Tamar Christina wrote:
> >> > > >
> >> > > > > Hi All,
> >> > > > >
> >> > > > > This patch adds initial support for early break vectorization in GCC.
> >> > > > > The support is added for any target that implements a vector
> >> > > > > cbranch optab, this includes both fully masked and non-masked targets.
> >> > > > >
> >> > > > > Depending on the operation, the vectorizer may also require
> >> > > > > support for boolean mask reductions using Inclusive OR. This is
> >> > > > > however only checked when the comparison would produce multiple
> >> > > > > statements.
> >> > > > >
> >> > > > > Note: I am currently struggling to get patch 7 correct in all
> >> > > > > cases and could use some feedback there.
> >> > > > >
> >> > > > > Concretely the kind of loops supported are of the forms:
> >> > > > >
> >> > > > > for (int i = 0; i < N; i++)
> >> > > > > {
> >> > > > >   <statements1>
> >> > > > >   if (<condition>)
> >> > > > >     {
> >> > > > >       ...
> >> > > > >       <action>;
> >> > > > >     }
> >> > > > >   <statements2>
> >> > > > > }
> >> > > > >
> >> > > > > where <action> can be:
> >> > > > > - break
> >> > > > > - return
> >> > > > > - goto
> >> > > > >
> >> > > > > Any number of statements can be used before the <action> occurs.
> >> > > > >
> >> > > > > Since this is an initial version for GCC 14 it has the following
> >> > > > > limitations and features:
> >> > > > >
> >> > > > > - Only fixed sized iterations and buffers are supported. That is to say any
> >> > > > >   vectors loaded or stored must be to statically allocated arrays with known
> >> > > > >   sizes. N must also be known. This limitation is because our primary target
> >> > > > >   for this optimization is SVE. For VLA SVE we can't easily do cross page
> >> > > > >   iteration checks. The result is likely to also not be beneficial. For that
> >> > > > >   reason we punt support for variable buffers till we have First-Faulting
> >> > > > >   support in GCC.
> >> >
> >> > Btw, for this I wonder if you thought about marking memory accesses required
> >> > for the early break condition as required to be vector-size aligned, thus peeling
> >> > or versioning them for alignment? That should ensure they do not fault.
> >> >
> >> > OTOH I somehow remember prologue peeling isn't supported for early break
> >> > vectorization? ..
> >> >
> >> > > > > - any stores in <statements1> should not be to the same objects as in
> >> > > > >   <condition>. Loads are fine as long as they don't have the possibility to
> >> > > > >   alias. More concretely, we block RAW dependencies when the intermediate
> >> > > > >   value can't be separated from the store, or the store itself can't be moved.
> >> > > > > - Prologue peeling, alignment peeling and loop versioning are supported.
> >> >
> >> > .. but here you say it is. Not sure if peeling for alignment works for VLA vectors
> >> > though. Just to say x86 doesn't support first-faulting loads.
> >>
> >> For VLA we support it through masking. i.e. if you need to peel N iterations, we
> >> generate a masked copy of the loop vectorized which masks off the first N bits.
> >>
> >> This is not typically needed, but we do support it. But the problem with this
> >> scheme and early break is obviously that the peeled loop needs to be vectorized
> >> so you kinda end up with the same issue again. So Atm it rejects it for VLA.
> >
> > Hmm, I see. I thought peeling by masking is an optimization.
>
> Yeah, it's an opt-in optimisation. No current Arm cores opt in though.
>
> > Anyhow, I think it should still work here - since all accesses are aligned
> > and we know that there's at least one original scalar iteration in the
> > first masked and the following "unmasked" vector iterations there
> > should never be faults for any of the aligned accesses.
>
> Peeling via masking works by using the main loop for the "peeled"
> iteration (so it's a bit of a misnomer). The vector pointers start
> out lower than the original scalar pointers, with some leading
> inactive elements.
>
> The awkwardness would be in skipping those leading inactive elements
> in the epilogue, if an early break occurs in the first vector iteration.
> Definitely doable, but I imagine not trivial.
>
> > I think going via alignment is a way easier method to guarantee this
> > than handwaving about "declared" arrays and niter. One can try that
> > in addition of course - it's not always possible to align all
> > vector loads we are going to speculate (for VLA one could also
> > find common runtime (mis-)alignment and restrict the vector length based
> > on that, for RISC-V it seems to be efficient, not sure whether altering
> > that for SVE is though).
>
> I think both techniques (alignment and reasoning about accessibility)
> are useful. And they each help with different cases. Like you say,
> if there are two vector loads that need to be aligned, we'd need to
> version for alignment on fixed-length architectures, with a scalar
> fallback when the alignment requirement isn't met. In contrast,
> static reasoning about accessibility allows the vector loop to be
> used for all relative misalignments.
>
> So I think the aim should be to support both techniques. But IMO it's
> reasonable to start with either one. It sounds from Tamar's results
> like starting with static reasoning does fire quite often, and it
> should have less runtime overhead than the alignment approach.
Fair enough, we need to fix the correctness issues then though
(as said, correctness is way easier to assert for alignment).
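
To spell out why the alignment argument is easy to assert, here is a rough
sketch, assuming the page size is a multiple of the vector size. A
vector-size-aligned access of vector-size bytes then never straddles a page
boundary, so if one element of it is accessed by the scalar loop, the whole
vector lies in an accessible page. PAGE_SIZE, VEC_BYTES and both function
names are hypothetical and only illustrate the arithmetic; nothing here is
vectorizer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants, for illustration only.  */
#define PAGE_SIZE 4096u
#define VEC_BYTES 16u

/* Return true if the VEC_BYTES-wide access starting at ADDR stays
   within a single page.  */
bool
same_page (uintptr_t addr)
{
  return addr / PAGE_SIZE == (addr + VEC_BYTES - 1) / PAGE_SIZE;
}

/* Return true if every VEC_BYTES-aligned access that overlaps
   [START, END) stays within one page.  */
bool
aligned_accesses_stay_in_page (uintptr_t start, uintptr_t end)
{
  /* The argument relies on the page size being a multiple of the
     vector size.  */
  assert (PAGE_SIZE % VEC_BYTES == 0);

  for (uintptr_t addr = start & ~(uintptr_t) (VEC_BYTES - 1);
       addr < end; addr += VEC_BYTES)
    if (!same_page (addr))
      return false;
  return true;
}
```

A misaligned access has no such guarantee: with these constants an access
starting 6 bytes before a page boundary spills into the next page, which may
be unmapped.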
> Plus, when the loop operates on chars, it's hard to predict whether
> peeling for alignment pays for itself, or whether the scalar prologue
> will end up handling the majority of cases. If we have the option
> of not peeling for alignment, then it's probably worth taking it
> for chars.
That's true.
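
For illustration, a scalar model of what peeling for alignment looks like on
a char search loop with an early exit. This is only a sketch under
assumptions: VEC_BYTES, find_byte and the in_prologue flag are hypothetical,
and the inner lane loop stands in for a single vector compare plus cbranch.
When the match tends to sit within the first few bytes, the scalar prologue
handles it and the vector body never runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16u   /* Hypothetical vector size in bytes.  */

/* Return the index of the first byte equal to C in BUF[0..N), or N if
   there is none.  *IN_PROLOGUE is set to 1 if the match was found by the
   scalar alignment prologue, 0 otherwise.  */
size_t
find_byte (const unsigned char *buf, size_t n, unsigned char c,
           int *in_prologue)
{
  assert (buf != NULL && in_prologue != NULL);
  size_t i = 0;

  /* Scalar prologue: peel iterations until BUF + I is VEC_BYTES-aligned.  */
  *in_prologue = 1;
  for (; i < n && ((uintptr_t) (buf + i) % VEC_BYTES) != 0; i++)
    if (buf[i] == c)
      return i;
  *in_prologue = 0;

  /* Main loop: one aligned block per iteration.  A real vector loop would
     compare all lanes at once and branch on the OR of the results.  */
  for (; i + VEC_BYTES <= n; i += VEC_BYTES)
    for (size_t lane = 0; lane < VEC_BYTES; lane++)
      if (buf[i + lane] == c)
        return i + lane;

  /* Scalar epilogue for the remaining tail.  */
  for (; i < n; i++)
    if (buf[i] == c)
      return i;
  return n;
}
```

With a buffer that starts one byte past a 16-byte boundary, the prologue
covers indices 0 to 14, so any match there is found without ever entering
the block loop.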
> Capping the VL at runtime is possible on SVE. It's on the backlog
> for handling runtime aliases, where we can vectorise with a lower VF
> rather than falling back to scalar code. But first-faulting loads
> are likely to be better than halving or quartering the VL at runtime,
> so I don't think capping the VL would be the right SVE technique for
> early exits.
For targets with no first-faulting loads we only have alignment as
an additional possibility then. I can look at this for next stage1.

Richard.
> Thanks,
> Richard