From: Tamar Christina <Tamar.Christina@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>,
Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
nd <nd@arm.com>, Richard Sandiford <Richard.Sandiford@arm.com>
Subject: RE: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.
Date: Sat, 19 Nov 2022 10:49:04 +0000 [thread overview]
Message-ID: <VI1PR08MB53257504086883F81530325BFF089@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <VI1PR08MB532586C23394B78F6609271DFF099@VI1PR08MB5325.eurprd08.prod.outlook.com>
> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Friday, November 18, 2022 6:23 PM
> To: Richard Biener <rguenther@suse.de>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: RE: [PATCH 1/2]middle-end: Support early break/return auto-
> vectorization.
>
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, November 18, 2022 3:04 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> > <Richard.Sandiford@arm.com>
> > Subject: Re: [PATCH 1/2]middle-end: Support early break/return auto-
> > vectorization.
> >
> > On Wed, 2 Nov 2022, Tamar Christina wrote:
> >
> > > Hi All,
> > >
> > > This patch adds initial support for early break vectorization in GCC.
> > > The support is added for any target that implements a vector cbranch
> > > optab.
> >
> > I'm looking at this now, first some high-level questions.
> >
> > Why do we need a new cbranch optab? It seems implementing
> > a vector comparison and mask test against zero suffices?
>
> It doesn't define a new optab, it just uses the existing cbranch optab to
> check that the target can handle a vector comparison with 0 in a branch
> statement.
>
> Note that it doesn't generate a call to this optab; GIMPLE expansion
> already will.
>
> The reason I don't check against just a comparison with 0 and equality is
> that, typically speaking, a vector comparison with 0 is not expected to
> set a flag; i.e. typically it results in just a vector of booleans.
>
> A vector compare with 0 in a branch will be lowered to cbranch today, so
> I just use the optab to see that the target can handle this branching and
> leave it up to the target to do it however it decides.
>
> The alternative would require me (I think) to reduce to a scalar for the
> equality check as you mentioned, but such codegen would be worse for
> targets like SVE, which has native support for this operation.  We'd have
> to undo the reduction during RTL.
>
> Even for targets like NEON, we'd have to replace the reduction code,
> because we can generate better code by doing the reduction using pairwise
> instructions.
>
> These kinds of differences are already handled by cbranch_optab today, so
> it seemed better to just re-use it.
>
> >
> > You have some elaborate explanation on how peeling works but I
> > somewhat miss the high-level idea how to vectorize the early
> > exit. I've applied the patches and from looking at how
> > vect-early-break_1.c gets transformed on aarch64 it seems you
> > vectorize
> >
> > for (int i = 0; i < N; i++)
> >   {
> >     vect_b[i] = x + i;
> >     if (vect_a[i] > x)
> >       break;
> >     vect_a[i] = x;
> >   }
> >
> > as
> >
> > for (int i = 0; i < N;)
> >   {
> >     if (any (vect_a[i] > x))
> >       break;
> >     i += VF;
> >     vect_b[i] = x + i;
> >     vect_a[i] = x;
> >   }
> > for (; i < N; i++)
> >   {
> >     vect_b[i] = x + i;
> >     if (vect_a[i] > x)
> >       break;
> >     vect_a[i] = x;
> >   }
> >
> > As you outline below this requires that the side-effects done as part
> > of <statements1> and <condition> before exiting can be moved after the
> > exit, basically you need to be able to compute whether any scalar
> > iteration covered by a vector iteration will exit the loop early.
> > Code generation wise you'd simply "ignore" code generating early exits
> > at the place they appear in the scalar code and instead emit them
> > vectorized in the loop header.
>
> Indeed, this is how it's handled today.  For fully masked loops we can do
> better, and that would be a future expansion, but this codegen is simpler
> to support today and is beneficial to all targets.
>
> It also has the benefit that complicated reductions we don't support
> today don't abort vectorization, because we just punt to the scalar loop.
> E.g. today we bail out on:
>
>   if (a[i] > x)
>     {
>       b = a[i];
>       c = i;
>     }
>
> But
>
>   if (a[i] > x)
>     {
>       b = a[i];
>       c = i;
>       break;
>     }
>
> works fine.  For fully masked loops, Richard's design with multiple
> rgroups would allow us to handle these things better without the scalar
> loop, should we want to in the future.  The current design doesn't
> prohibit this choice in the future.
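A scalar emulation may make the punt-to-scalar scheme concrete (N, VF, and the function names are illustrative, not the patch's code): side effects of a VF-wide block are committed only after the block-wide exit test, and on a hit the remaining iterations run in the scalar loop.

```c
#include <assert.h>
#include <string.h>

#define N  16   /* illustrative trip count */
#define VF 4    /* illustrative vectorization factor */

/* Reference: the scalar early-break loop from the examples above.  */
int scalar_early_break (unsigned *a, unsigned *b, unsigned x)
{
  int i;
  for (i = 0; i < N; i++)
    {
      b[i] = x + i;
      if (a[i] > x)
        break;
      a[i] = x;
    }
  return i;
}

/* Emulation of the vectorized form: the whole VF-wide block is tested
   up front; stores are committed only when no lane of the block exits,
   and on a hit the remaining iterations are punted to the scalar
   loop.  */
int vectorized_early_break (unsigned *a, unsigned *b, unsigned x)
{
  int i = 0;
  for (; i + VF <= N; i += VF)
    {
      int exit_p = 0;
      for (int l = 0; l < VF; l++)      /* the "any (a[i] > x)" test */
        exit_p |= a[i + l] > x;
      if (exit_p)
        break;
      for (int l = 0; l < VF; l++)      /* statements moved after the test */
        {
          b[i + l] = x + (unsigned) (i + l);
          a[i + l] = x;
        }
    }
  for (; i < N; i++)                    /* scalar loop: at most VF more
                                           iterations after an exit hit */
    {
      b[i] = x + i;
      if (a[i] > x)
        break;
      a[i] = x;
    }
  return i;
}
```

Both versions leave the arrays in the same state and return the same break index, which is exactly why the code in `<action>` can be left to the scalar loop.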
>
> >
> > > Concretely the kind of loops supported are of the forms:
> > >
> > > for (int i = 0; i < N; i++)
> > >   {
> > >     <statements1>
> > >     if (<condition>)
> > >       <action>;
> > >     <statements2>
> > >   }
> > >
> > > where <action> can be:
> > > - break
> > > - return
> > >
> > > Any number of statements can be used before the <action> occurs.
> > >
> > > Since this is an initial version for GCC 13 it has the following limitations:
> > >
> > > - Only fixed sized iterations and buffers are supported.  That is to
> > >   say any vectors loaded or stored must be to statically allocated
> > >   arrays with known sizes.  N must also be known.
> >
> > Why?
>
> Not an intrinsic limitation, just one made for practicality and to keep
> the patch simpler.  These cases were most of the cases that we wanted.
>
> Supporting this requires adding support for multiple exits to all the
> different peeling and versioning code at once, which would be a much
> bigger patch.
>
> Additionally, for SVE (the main target of the codegen change) we'd want
> to do this using first-faulting loads, but there's a dependency on other
> things we must support both in GIMPLE itself and in the vectorizer
> before we can do this.
>
Additionally, peeling for alignment only works for a single input stream
(or mutually aligned ones), which defeats some of our use-cases.  For
variable-length ISAs static alignment also doesn't work, so since SVE was
the main target of the patch the current limitation was a sensible one to
start with.
> >
> > > - any stores in <statements1> should not be to the same objects as in
> > >   <condition>.  Loads are fine as long as they don't have the
> > >   possibility to alias.
> >
> > I think that's a fundamental limitation - you have to be able to compute
> > the early exit condition at the beginning of the vectorized loop. For
> > a single alternate exit it might be possible to apply loop rotation to
> > move things but that can introduce "bad" cross-iteration dependences(?)
> >
>
I should be able to support WAR dependencies without much effort though;
that would likely be the common case here.
> That's an interesting idea, I'd have to work it out on paper.  I guess
> the main difficulty compared to, say, classical loop rotation is that the
> condition inside the early break statement can itself be dependent on
> other statements.  So you still have to move a "chain" of statements
> which themselves still need to be vectorized.
>
> Where it gets difficult, and partially why I also only support 1 early
> exit in this first version, is that a second exit has a dependency on the
> first one.  And there may be other statements between the first and
> second exit.  This is where I think loop rotation would fall apart vs the
> code motion I'm doing now.
>
Another main difference is that we don't want to rotate the entire header
as a whole; we only want to move certain instructions, and which
instructions those are depends highly on this particular use case.  So we
wouldn't be able to make an independent optimization here.
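For the WAR case mentioned above, a small emulation shows why hoisting the condition's reads is safe (the loop shape and names are hypothetical, chosen only to exhibit a cross-iteration write-after-read dependence): the hoisted reads still happen before any store of the same block, which preserves every read-then-write order of the scalar loop.

```c
#include <assert.h>
#include <string.h>

#define N  16
#define VF 4

/* The condition reads a[i+1]; a later iteration writes that element,
   i.e. a write-after-read (WAR) dependence across iterations.  */
int scalar_war (unsigned *a, unsigned x)
{
  int i;
  for (i = 0; i < N - 1; i++)
    {
      if (a[i + 1] > x)
        break;
      a[i] = x;
    }
  return i;
}

/* Hoisting the VF condition reads above the block's stores keeps every
   read before every write of the same element, just as in scalar
   order, so the WAR dependence is preserved.  */
int vectorized_war (unsigned *a, unsigned x)
{
  int i = 0;
  for (; i + VF <= N - 1; i += VF)
    {
      int exit_p = 0;
      for (int l = 0; l < VF; l++)   /* hoisted reads of a[i+1..i+VF] */
        exit_p |= a[i + l + 1] > x;
      if (exit_p)
        break;                       /* punt to the scalar loop below */
      for (int l = 0; l < VF; l++)   /* stores committed afterwards */
        a[i + l] = x;
    }
  for (; i < N - 1; i++)             /* scalar loop finishes the block */
    {
      if (a[i + 1] > x)
        break;
      a[i] = x;
    }
  return i;
}
```

A RAW dependence in the other direction (the condition reading an element a previous iteration wrote) would not survive this hoisting, which is why only the WAR case looks cheap to support.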
Cheers,
Tamar
> > > - No support for prologue peeling.  Since we only support fixed
> > >   buffers this wouldn't be an issue as we assume the arrays are
> > >   correctly aligned.
> >
> > Huh, I don't understand how prologue or epilogue peeling is an issue? Is
> > that just because you didn't handle the early exit triggering?
>
> Yeah, it's not an intrinsic limitation, and the code implemented doesn't
> have anything that would prevent this from happening in the future.
> It's just something we didn't require for the current use-cases.
>
> To support this we'd "just" need to support prologue peeling by branching
> to the exit block, but we'd have to split the exit block so we keep
> simple two-argument phi nodes for each peeled iteration.  i.e. I don't
> think they can all exit to the same block (do we support phi nodes with
> N entries?) as I don't think we'd be able to handle that reduction.  So
> I know how to potentially do it, and kept it in mind in the
> implementation, but for practicality/time just did not do it at this
> time.
>
> >
> > > - Fully masked loops or unmasked loops are supported, but not
> > >   partially masked loops.
> > > - Only one additional exit is supported at this time.  The majority
> > >   of the code will handle n exits, but not all, so at this time this
> > >   restriction is needed.
> > > - The early exit must be before the natural loop exit/latch.  The
> > >   vectorizer is designed in a way to propagate phi-nodes downwards.
> > >   As such supporting this inverted control flow is hard.
> >
> > How do you identify the "natural" exit?  It's the one
> > number_of_iterations_exit works on?  Your normal_exit picks the
> > first from the loop's recorded exit list, but I don't think that list
> > is ordered in any particular way.
>
> Ah, I thought it was, since during loop analysis it's always the first
> exit.  But I can easily update the patch to determine that in a smarter
> way.
>
> >
> > "normal_exit" would rather be single_countable_exit () or so? A loop
> > already has a list of control_ivs (not sure if we ever have more than
> > one), I wonder if that can be annotated with the corresponding exit
> > edge?
> >
> > I think that vect_analyze_loop_form should record the counting IV
> > exit edge and that recorded edge should be passed to utilities
> > like slpeel_can_duplicate_loop_p rather than re-querying 'normal_exit',
> > for example if we'd have
> >
> >   for (;; ++i, ++j)
> >     {
> >       if (i < n)
> >         break;
> >       a[i] = 0;
> >       if (j < m)
> >         break;
> >     }
> >
> > which counting IV we choose as "normal" should be up to the vectorizer,
> > not up to the loop infrastructure.
>
> Ah, that's a fair enough point and easy enough to do.
>
> >
> > The patch should likely be split, doing single_exit () replacements
> > with, say, LOOP_VINFO_IV_EXIT (..) first.
> >
>
> Ok, I'll start doing that now while waiting for the full review.
>
> >
> > > - No support for epilogue vectorization.  The only epilogue
> > >   supported is the scalar final one.
> > >
> > > With the help of IPA this still gets hit quite often.  During
> > > bootstrap it hit rather frequently as well.
> > >
> > > This implementation does not support completely handling the early
> > > break inside the vector loop itself, but instead supports adding
> > > checks such that if we know that we have to exit in the current
> > > iteration then we branch to scalar code to actually do the final VF
> > > iterations, which handles all the code in <action>.
> > >
> > > niters analysis and the majority of the vectorizer with hardcoded
> > > single_exit have been updated with the use of a new function
> > > normal_exit which returns the loop's natural exit.
> > >
> > > For niters the natural exit is still what determines the overall
> > > iterations, as that is the O(iters) for the loop.
> > >
> > > For the scalar loop we know that whatever exit you take you have to
> > > perform at most VF iterations.
> > >
> > > When the loop is peeled during the copying I have to go to great
> > > lengths to keep the dominators up to date.  All exits from the first
> > > loop are rewired to the loop header of the second loop.  But this can
> > > change the immediate dominator.
> >
> > Not sure how - it would probably help to keep the original scalar loop
> > as the epilogue and instead emit the vector loop as a copy on that
> > loop's entry edge, so wiring the alternate exits to that very same
> > place is trivial?
>
> Hmm, yes, flipping the loop wiring would simplify the dominators.  I did
> it this way because that's the direction normal epilogue peeling takes
> today.  But looking at the code this should be easy to do.
>
> I'll also start on this now.
>
> >
> > > We had spoken on IRC about removing the dominators validation call
> > > at the end of slpeel_tree_duplicate_loop_to_edge_cfg and leaving it
> > > up to cfg cleanup to remove the intermediate blocks that cause the
> > > dominators to fail.
> > >
> > > However this turned out not to work, as cfgcleanup itself requires
> > > the dominators graph.  So it's somewhat of a chicken-and-egg
> > > problem.  To work around this I added some rules for when I update
> > > which dominator, and I also reject the forms I don't support during
> > > vect_analyze_loop_form.
> > >
> > > I have tried to structure the updates to loop-manip.cc in a way that
> > > fits with the current flow.  I think I have done a decent job, but
> > > there are things I can also do differently if preferred, and I have
> > > pointed them out in comments in the source.
> > >
> > > For the loop peeling we rewrite the loop form:
> > >
> > >
> > >              Header
> > >               ---
> > >               |x|
> > >                2
> > >                |
> > >                v
> > >           -----3<------
> > > early     |    |      |
> > > exit      v    v      | latch
> > >           7    4----->6
> > >           |    |
> > >           |    v
> > >           |    8
> > >           |    |
> > >           |    v
> > >           ---->5
> > >
> > > into
> > >
> > >              Header
> > >               ---
> > >               |x|
> > >                2
> > >                |
> > >                v
> > >           -----3<------
> > > early     |    |      |
> > > exit      v    v      | latch
> > >           7    4----->6
> > >           |    |
> > >           |    v
> > >           |    8
> > >           |    |
> > >           |    v
> > >           |  New Header
> > >           |     ---
> > >           ---->|x|
> > >                 9
> > >                 |
> > >                 v
> > >           -----10<-----
> > > early     |    |      |
> > > exit      v    v      | latch
> > >           14   11---->13
> > >           |    |
> > >           |    v
> > >           |    12
> > >           |    |
> > >           |    v
> > >           ---->5
> > >
> > > When we vectorize, we move any statement not related to the early
> > > break itself to the BB after the early exit and update all
> > > references as appropriate.
> > >
> > > This means that we check at the start of an iteration whether we are
> > > going to exit or not.  During the analysis phase we check whether we
> > > are allowed to do this moving of statements.  Also note that we only
> > > move the vector statements and leave the scalars alone.
> > >
> > > Codegen:
> > >
> > > for e.g.
> > >
> > > #define N 803
> > > unsigned vect_a[N];
> > > unsigned vect_b[N];
> > >
> > > unsigned test4(unsigned x)
> > > {
> > >   unsigned ret = 0;
> > >   for (int i = 0; i < N; i++)
> > >     {
> > >       vect_b[i] = x + i;
> > >       if (vect_a[i] > x)
> > >         break;
> > >       vect_a[i] = x;
> > >     }
> > >   return ret;
> > > }
> > >
> > > We generate for NEON:
> > >
> > > test4:
> > >         adrp    x2, .LC0
> > >         adrp    x3, .LANCHOR0
> > >         dup     v2.4s, w0
> > >         add     x3, x3, :lo12:.LANCHOR0
> > >         movi    v4.4s, 0x4
> > >         add     x4, x3, 3216
> > >         ldr     q1, [x2, #:lo12:.LC0]
> > >         mov     x1, 0
> > >         mov     w2, 0
> > >         .p2align 3,,7
> > > .L3:
> > >         ldr     q0, [x3, x1]
> > >         add     v3.4s, v1.4s, v2.4s
> > >         add     v1.4s, v1.4s, v4.4s
> > >         cmhi    v0.4s, v0.4s, v2.4s
> > >         umaxp   v0.4s, v0.4s, v0.4s
> > >         fmov    x5, d0
> > >         cbnz    x5, .L6
> > >         add     w2, w2, 1
> > >         str     q3, [x1, x4]
> > >         str     q2, [x3, x1]
> > >         add     x1, x1, 16
> > >         cmp     w2, 200
> > >         bne     .L3
> > >         mov     w7, 3
> > > .L2:
> > >         lsl     w2, w2, 2
> > >         add     x5, x3, 3216
> > >         add     w6, w2, w0
> > >         sxtw    x4, w2
> > >         ldr     w1, [x3, x4, lsl 2]
> > >         str     w6, [x5, x4, lsl 2]
> > >         cmp     w0, w1
> > >         bcc     .L4
> > >         add     w1, w2, 1
> > >         str     w0, [x3, x4, lsl 2]
> > >         add     w6, w1, w0
> > >         sxtw    x1, w1
> > >         ldr     w4, [x3, x1, lsl 2]
> > >         str     w6, [x5, x1, lsl 2]
> > >         cmp     w0, w4
> > >         bcc     .L4
> > >         add     w4, w2, 2
> > >         str     w0, [x3, x1, lsl 2]
> > >         sxtw    x1, w4
> > >         add     w6, w1, w0
> > >         ldr     w4, [x3, x1, lsl 2]
> > >         str     w6, [x5, x1, lsl 2]
> > >         cmp     w0, w4
> > >         bcc     .L4
> > >         str     w0, [x3, x1, lsl 2]
> > >         add     w2, w2, 3
> > >         cmp     w7, 3
> > >         beq     .L4
> > >         sxtw    x1, w2
> > >         add     w2, w2, w0
> > >         ldr     w4, [x3, x1, lsl 2]
> > >         str     w2, [x5, x1, lsl 2]
> > >         cmp     w0, w4
> > >         bcc     .L4
> > >         str     w0, [x3, x1, lsl 2]
> > > .L4:
> > >         mov     w0, 0
> > >         ret
> > >         .p2align 2,,3
> > > .L6:
> > >         mov     w7, 4
> > >         b       .L2
> > >
> > > and for SVE:
> > >
> > > test4:
> > >         adrp    x2, .LANCHOR0
> > >         add     x2, x2, :lo12:.LANCHOR0
> > >         add     x5, x2, 3216
> > >         mov     x3, 0
> > >         mov     w1, 0
> > >         cntw    x4
> > >         mov     z1.s, w0
> > >         index   z0.s, #0, #1
> > >         ptrue   p1.b, all
> > >         ptrue   p0.s, all
> > >         .p2align 3,,7
> > > .L3:
> > >         ld1w    z2.s, p1/z, [x2, x3, lsl 2]
> > >         add     z3.s, z0.s, z1.s
> > >         cmplo   p2.s, p0/z, z1.s, z2.s
> > >         b.any   .L2
> > >         st1w    z3.s, p1, [x5, x3, lsl 2]
> > >         add     w1, w1, 1
> > >         st1w    z1.s, p1, [x2, x3, lsl 2]
> > >         add     x3, x3, x4
> > >         incw    z0.s
> > >         cmp     w3, 803
> > >         bls     .L3
> > > .L5:
> > >         mov     w0, 0
> > >         ret
> > >         .p2align 2,,3
> > > .L2:
> > >         cntw    x5
> > >         mul     w1, w1, w5
> > >         cbz     w5, .L5
> > >         sxtw    x1, w1
> > >         sub     w5, w5, #1
> > >         add     x5, x5, x1
> > >         add     x6, x2, 3216
> > >         b       .L6
> > >         .p2align 2,,3
> > > .L14:
> > >         str     w0, [x2, x1, lsl 2]
> > >         cmp     x1, x5
> > >         beq     .L5
> > >         mov     x1, x4
> > > .L6:
> > >         ldr     w3, [x2, x1, lsl 2]
> > >         add     w4, w0, w1
> > >         str     w4, [x6, x1, lsl 2]
> > >         add     x4, x1, 1
> > >         cmp     w0, w3
> > >         bcs     .L14
> > >         mov     w0, 0
> > >         ret
> > >
> > > On the workloads this work is based on, we see a 2-3x performance
> > > uplift using this patch.
> > >
> > > Outstanding issues:
> > > - The patch is fully functional but has two things I wonder about:
> > >   * In vect_transform_early_break should I just refactor
> > >     vectorizable_comparison and use it to generate the condition
> > >     body?  That would also get the costing.
> >
> > I'm looking at vectorizable_early_exit and validate_early_exit_stmts
> > and I think that this should be mostly done as part of dependence
> > analysis (because that's what it is) which should also remove the
> > requirement of only handling decl-based accesses?
>
> That is fair enough; do you have a specific spot in mind where you'd
> prefer me to slot it in?
>
> >
> > As for vect_transform_early_break, sure. I fear that since you
> > transform if (_1 > _2) to some _3 = _1 > _2; use(_3) that you need
> > to expose this to the bool pattern handling machinery somehow.
> > I can see that moving stmts around and doing it the way you do
> > code-generation wise is easiest.
> >
> > How does this work with SLP btw? You don't touch tree-vect-slp.cc at all
> > but now that we have multiple BBs there's the issue of splitting
> > children across different BBs - there's only
> >
> >   if ((phi_p || gimple_could_trap_p (stmt_info->stmt))
> >       && (gimple_bb (first_stmt_info->stmt)
> >           != gimple_bb (stmt_info->stmt)))
> >     {
> >       if (dump_enabled_p ())
> >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                          "Build SLP failed: different BB for PHI "
> >                          "or possibly trapping operation in %G",
> >                          stmt);
> >       /* Mismatch.  */
> >       continue;
> >     }
> >
> > right now and the code motion you apply also might break the assumptions
> > of the dependence analysis code. I suppose that SLPing the early exit
> > isn't supported, aka
> >
> >   for (;;)
> >     {
> >       if (a[2*i] > x) break;
> >       if (a[2*i + 1] > x) break;
> >       ...
> >     }
> >
> > or
> >   _1 = a[2*i] > x | a[2*i + 1] > x;
> >   if (_1) break;
> >
> > ?
>
> Indeed, SLP traps with the failure message you highlighted above.
> At the moment I added a restriction to a single exit; this stops it from
> getting that far.  (This limitation is because the code motion over
> multiple exits becomes interesting; it's not specifically for SLP, and
> if SLP did work I would move the check during SLP build, or after.)
>
> Aside from that, different parts of the SLP build fail with e.g.
>
>   Build SLP failed: different operation in stmt _11 = _4 * x_17(D);
>
> (This is testcase 6 in my list of tests.)
>
> Hybrid does work though, if the part with the conditional is in the
> non-SLP part.
>
> >
> > > * The testcase vect-early-break_2.c shows one form that currently
> > >   doesn't work and crashes.  The reason is that there's a mismatch
> > >   between the types required to vectorize this.  The vector loads
> > >   cause multiple statements to be generated and thus require
> > >   multiple comparisons, in this case 8 of them.  However when
> > >   determining ncopies the early exit uses a boolean mode and so
> > >   ncopies is always 1.  If I force it instead to determine ncopies
> > >   based on its operands instead of the final type then we get the
> > >   conditional vectorized, but then it has a mismatch comparing
> > >   integer vectors with booleans.  It feels like I need some kind of
> > >   boolean reductions here.  Should I just reject this form for now?
> >
> > That's probably the bool pattern handling I hinted at above.
> > Bools/conditions are awkward, maybe you should handle the
> > GIMPLE_CONDs as patterns computing the actual condition as mask
> > fed into a dummy .IFN_CONSUME_MASK stmt?
>
> Indeed, though one additional difficulty here is that in the example,
> for instance, the number of copies is needed, e.g. if you have to do
> widening before the compare.  This means that you have _hi/_lo splits.
> So unless you short-circuit, this can lead to quite a number of
> operations before you exit.
>
> I could also generate an OR reduction in this case instead of needing a
> new IFN, but I'll go with whatever you prefer/recommend.
>
> >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu and
> > > x86_64-pc-linux-gnu, with the issues mentioned above.
> > >
> > > OK enough design and implementation for GCC 13?
> >
> > Not sure, I didn't yet look thoroughly at the patch itself.
>
> I'll light some candles 😊
>
> Thanks for taking a look,
> Tamar
>
> >
> > Richard.
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* cfgloop.cc (normal_exit): New.
> > > 	* cfgloop.h (normal_exit): New.
> > > 	* doc/loop.texi (normal_exit): Document.
> > > 	* doc/sourcebuild.texi (vect_early_break): Document.
> > > 	* tree-scalar-evolution.cc (get_loop_exit_condition): Refactor.
> > > 	(get_edge_condition): New.
> > > 	* tree-scalar-evolution.h (get_edge_condition): New.
> > > 	* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Get
> > > 	main exit during peeling check.
> > > 	* tree-vect-loop-manip.cc
> > > 	(slpeel_tree_duplicate_loop_to_edge_cfg): Support copying CFGs
> > > 	with multiple exits and place at the end.
> > > 	(vect_update_ivs_after_vectorizer): Skip on early exits.
> > > 	(vect_update_ivs_after_early_break): New.
> > > 	(gimple_find_last_mem_use): New.
> > > 	(slpeel_update_phi_nodes_for_loops,
> > > 	slpeel_update_phi_nodes_for_guard2,
> > > 	slpeel_update_phi_nodes_for_lcssa,
> > > 	vect_gen_vector_loop_niters_mult_vf,
> > > 	slpeel_can_duplicate_loop_p,
> > > 	vect_set_loop_condition_partial_vectors): Update for multiple
> > > 	exits.
> > > 	(vect_set_loop_condition, vect_set_loop_condition_normal): Update
> > > 	condition for early exits.
> > > 	(vect_do_peeling): Peel for early breaks.
> > > 	* tree-vect-loop.cc (vect_get_loop_niters): Analyze and return
> > > 	all exits.
> > > 	(vect_analyze_loop_form, vect_create_loop_vinfo): Analyze all
> > > 	conds.
> > > 	(vect_determine_partial_vectors_and_peeling): Support multiple
> > > 	exits by peeling.
> > > 	(vect_analyze_loop): Add analysis for multiple exits.
> > > 	(move_early_exit_stmts, vect_transform_early_break,
> > > 	validate_early_exit_stmts, vectorizable_early_exit): New.
> > > 	(vectorizable_live_operation): Ignore early break statements.
> > > 	(scale_profile_for_vect_loop, vect_transform_loop): Support
> > > 	multiple exits.
> > > 	* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze early
> > > 	breaks.
> > > 	(prepare_vec_mask): Expose.
> > > 	(vect_analyze_stmt, vect_transform_stmt, vect_is_simple_use,
> > > 	vect_get_vector_types_for_stmt): Support loop control/early
> > > 	exits.
> > > 	* tree-vectorizer.cc (pass_vectorize::execute): Record all exits
> > > 	for RPO.
> > > 	* tree-vectorizer.h (enum vect_def_type): Add
> > > 	vect_early_exit_def.
> > > 	(slpeel_can_duplicate_loop_p): Change loop to loop_vec_info.
> > > 	(struct vect_loop_form_info): Add loop conditions.
> > > 	(LOOP_VINFO_EARLY_BREAKS, vect_transform_early_break,
> > > 	vectorizable_early_exit): New.
> > > 	(prepare_vec_mask): New.
> > > 	(vec_info): Add early_breaks.
> > > 	(loop_vec_info_for_loop): Make loop const.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * lib/target-supports.exp (vect_early_break): New.
> > > * g++.dg/vect/vect-early-break_1.cc: New test.
> > > * g++.dg/vect/vect-early-break_2.cc: New test.
> > > * g++.dg/vect/vect-early-break_3.cc: New test.
> > > * gcc.dg/vect/vect-early-break-run_1.c: New test.
> > > * gcc.dg/vect/vect-early-break-run_10.c: New test.
> > > * gcc.dg/vect/vect-early-break-run_2.c: New test.
> > > * gcc.dg/vect/vect-early-break-run_3.c: New test.
> > > * gcc.dg/vect/vect-early-break-run_4.c: New test.
> > > * gcc.dg/vect/vect-early-break-run_5.c: New test.
> > > * gcc.dg/vect/vect-early-break-run_6.c: New test.
> > > * gcc.dg/vect/vect-early-break-run_7.c: New test.
> > > * gcc.dg/vect/vect-early-break-run_8.c: New test.
> > > * gcc.dg/vect/vect-early-break-run_9.c: New test.
> > > * gcc.dg/vect/vect-early-break-template_1.c: New test.
> > > * gcc.dg/vect/vect-early-break-template_2.c: New test.
> > > * gcc.dg/vect/vect-early-break_1.c: New test.
> > > * gcc.dg/vect/vect-early-break_10.c: New test.
> > > * gcc.dg/vect/vect-early-break_11.c: New test.
> > > * gcc.dg/vect/vect-early-break_12.c: New test.
> > > * gcc.dg/vect/vect-early-break_13.c: New test.
> > > * gcc.dg/vect/vect-early-break_14.c: New test.
> > > * gcc.dg/vect/vect-early-break_15.c: New test.
> > > * gcc.dg/vect/vect-early-break_2.c: New test.
> > > * gcc.dg/vect/vect-early-break_3.c: New test.
> > > * gcc.dg/vect/vect-early-break_4.c: New test.
> > > * gcc.dg/vect/vect-early-break_5.c: New test.
> > > * gcc.dg/vect/vect-early-break_6.c: New test.
> > > * gcc.dg/vect/vect-early-break_7.c: New test.
> > > * gcc.dg/vect/vect-early-break_8.c: New test.
> > > * gcc.dg/vect/vect-early-break_9.c: New test.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> > > index 528b1219bc37ad8f114d5cf381c0cff899db31ee..9c7f019a51abfe2de8e1dd7135dea2463b0256a0 100644
> > > --- a/gcc/cfgloop.h
> > > +++ b/gcc/cfgloop.h
> > > @@ -385,6 +385,7 @@ extern basic_block *get_loop_body_in_custom_order (const class loop *, void *,
> > >
> > >  extern auto_vec<edge> get_loop_exit_edges (const class loop *, basic_block * = NULL);
> > >  extern edge single_exit (const class loop *);
> > > +extern edge normal_exit (const class loop *);
> > >  extern edge single_likely_exit (class loop *loop, const vec<edge> &);
> > >  extern unsigned num_loop_branches (const class loop *);
> > >
> > > diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
> > > index 57bf7b1855d4dd20fb3f42388124932d0ca2b48a..97a7373fb6d9514da602d5be01050f2ec66094bc 100644
> > > --- a/gcc/cfgloop.cc
> > > +++ b/gcc/cfgloop.cc
> > > @@ -1812,6 +1812,20 @@ single_exit (const class loop *loop)
> > >    return NULL;
> > >  }
> > >
> > > +/* Returns the normal exit edge of LOOP, or NULL if LOOP has no exit.
> > > +   If loops do not have the exits recorded, NULL is always returned.  */
> > > +
> > > +edge
> > > +normal_exit (const class loop *loop)
> > > +{
> > > +  if (!loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
> > > +    return NULL;
> > > +
> > > +  struct loop_exit *exit = loop->exits->next;
> > > +
> > > +  return exit->e;
> > > +}
> > > +
> > >  /* Returns true when BB has an incoming edge exiting LOOP.  */
> > >
> > >  bool
> > > diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
> > > index 6e8657a074d2447db7ae9b75cbfbb71282b84287..e1de2ac40f87f879ab691f68bd41b3bc21a83bf7 100644
> > > --- a/gcc/doc/loop.texi
> > > +++ b/gcc/doc/loop.texi
> > > @@ -211,6 +211,10 @@ relation, and breath-first search order, respectively.
> > >  @item @code{single_exit}: Returns the single exit edge of the loop, or
> > >  @code{NULL} if the loop has more than one exit.  You can only use this
> > >  function if @code{LOOPS_HAVE_RECORDED_EXITS} is used.
> > > +function if LOOPS_HAVE_MARKED_SINGLE_EXITS property is used.
> > > +@item @code{normal_exit}: Returns the natural exit edge of the loop,
> > > +even if the loop has more than one exit.  The natural exit is the exit
> > > +that would normally be taken were the loop to be fully executed.
> > >  @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
> > >  @item @code{just_once_each_iteration_p}: Returns true if the basic block
> > >  is executed exactly once during each iteration of a loop (that is, it
> > > @@ -623,4 +627,4 @@ maximum verbosity the details of a data dependence relations array,
> > >  @code{dump_dist_dir_vectors} prints only the classical distance and
> > >  direction vectors for a data dependence relations array, and
> > >  @code{dump_data_references} prints the details of the data references
> > > -contained in a data reference array.
> > > +contained in a data reference array
> > > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> > > index e21a1d381e05da1bfccb555068ea1dbeabd9fc79..16fa94ebf532d27cd9a3a45a7aad578ca6920496 100644
> > > --- a/gcc/doc/sourcebuild.texi
> > > +++ b/gcc/doc/sourcebuild.texi
> > > @@ -1640,6 +1640,10 @@ Target supports hardware vectors of @code{float} when
> > >  @option{-funsafe-math-optimizations} is not in effect.
> > >  This implies @code{vect_float}.
> > >
> > > +@item vect_early_break
> > > +Target supports hardware vectorization of loops with early breaks.
> > > +This requires an implementation of the cbranch optab for vectors.
> > > +
> > >  @item vect_int
> > >  Target supports hardware vectors of @code{int}.
> > >
> > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..6a83648ca36e2c8feeb78335fccf3f3b82a97d2e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
> > > @@ -0,0 +1,61 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-w -O2" } */
> > > +
> > > +void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
> > > +template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
> > > +template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
> > > +public:
> > > +  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
> > > +};
> > > +template <unsigned N, typename C>
> > > +template <typename Ca>
> > > +poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
> > > +  for (int i = 0; i < N; i++)
> > > +    this->coeffs[i] += a.coeffs[i];
> > > +  return *this;
> > > +}
> > > +template <unsigned N, typename Ca, typename Cb>
> > > +poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
> > > +  poly_int<N, long> r;
> > > +  return r;
> > > +}
> > > +struct vec_prefix {
> > > +  unsigned m_num;
> > > +};
> > > +struct vl_ptr;
> > > +struct va_heap {
> > > +  typedef vl_ptr default_layout;
> > > +};
> > > +template <typename, typename A, typename = typename A::default_layout>
> > > +struct vec;
> > > +template <typename T, typename A> struct vec<T, A, int> {
> > > +  T &operator[](unsigned);
> > > +  vec_prefix m_vecpfx;
> > > +  T m_vecdata[];
> > > +};
> > > +template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
> > > +  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
> > > +  return m_vecdata[ix];
> > > +}
> > > +template <typename T> struct vec<T, va_heap> {
> > > +  T &operator[](unsigned ix) { return m_vec[ix]; }
> > > +  vec<T, va_heap, int> m_vec;
> > > +};
> > > +class auto_vec : public vec<poly_int<2, long>, va_heap> {};
> > > +template <typename> class vector_builder : public auto_vec {};
> > > +class int_vector_builder : public vector_builder<int> {
> > > +public:
> > > +  int_vector_builder(poly_int<2, long>, int, int);
> > > +};
> > > +bool vect_grouped_store_supported() {
> > > +  int i;
> > > +  poly_int<2, long> nelt;
> > > +  int_vector_builder sel(nelt, 2, 3);
> > > +  for (i = 0; i < 6; i++)
> > > +    sel[i] += exact_div(nelt, 2);
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
> > b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..6a83648ca36e2c8feeb78335fc
> > cf3f3b82a97d2e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
> > > @@ -0,0 +1,61 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-w -O2" } */
> > > +
> > > +void fancy_abort(char *, int, const char *)
> > __attribute__((__noreturn__));
> > > +template <unsigned N, typename> struct poly_int_pod { int coeffs[N];
> };
> > > +template <unsigned N, typename> class poly_int : public
> poly_int_pod<N,
> > int> {
> > > +public:
> > > + template <typename Ca> poly_int &operator+=(const
> poly_int_pod<N,
> > Ca> &);
> > > +};
> > > +template <unsigned N, typename C>
> > > +template <typename Ca>
> > > +poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N,
> Ca>
> > &a) {
> > > + for (int i = 0; i < N; i++)
> > > + this->coeffs[i] += a.coeffs[i];
> > > + return *this;
> > > +}
> > > +template <unsigned N, typename Ca, typename Cb>
> > > +poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
> > > + poly_int<N, long> r;
> > > + return r;
> > > +}
> > > +struct vec_prefix {
> > > + unsigned m_num;
> > > +};
> > > +struct vl_ptr;
> > > +struct va_heap {
> > > + typedef vl_ptr default_layout;
> > > +};
> > > +template <typename, typename A, typename = typename
> > A::default_layout>
> > > +struct vec;
> > > +template <typename T, typename A> struct vec<T, A, int> {
> > > + T &operator[](unsigned);
> > > + vec_prefix m_vecpfx;
> > > + T m_vecdata[];
> > > +};
> > > +template <typename T, typename A> T &vec<T, A,
> > int>::operator[](unsigned ix) {
> > > + m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
> > > + return m_vecdata[ix];
> > > +}
> > > +template <typename T> struct vec<T, va_heap> {
> > > + T &operator[](unsigned ix) { return m_vec[ix]; }
> > > + vec<T, va_heap, int> m_vec;
> > > +};
> > > +class auto_vec : public vec<poly_int<2, long>, va_heap> {};
> > > +template <typename> class vector_builder : public auto_vec {};
> > > +class int_vector_builder : public vector_builder<int> {
> > > +public:
> > > + int_vector_builder(poly_int<2, long>, int, int);
> > > +};
> > > +bool vect_grouped_store_supported() {
> > > + int i;
> > > + poly_int<2, long> nelt;
> > > + int_vector_builder sel(nelt, 2, 3);
> > > + for (i = 0; i < 6; i++)
> > > + sel[i] += exact_div(nelt, 2);
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
> > b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..a12e5ca434b2ac37c03dbaa12
> > 273fd8e5aa2018c
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
> > > @@ -0,0 +1,16 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-w -O2" } */
> > > +
> > > +int aarch64_advsimd_valid_immediate_hs_val32;
> > > +bool aarch64_advsimd_valid_immediate_hs() {
> > > + for (int shift = 0; shift < 32; shift += 8)
> > > + if (aarch64_advsimd_valid_immediate_hs_val32 & shift)
> > > + return aarch64_advsimd_valid_immediate_hs_val32;
> > > + for (;;)
> > > + ;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..2495b36a72eae94cb7abc4a0d
> > 17a5c979fd78083
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > +
> > > +#define N 803
> > > +#define P 0
> > > +#include "vect-early-break-template_1.c"
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..9bcd7f7e57ef9a1d4649d1856
> > 9b3406050e54603
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > +
> > > +#define N 800
> > > +#define P 799
> > > +#include "vect-early-break-template_2.c"
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..63f63101a467909f328be7f3ac
> > bc5bcb721967ff
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > +
> > > +#define N 803
> > > +#define P 802
> > > +#include "vect-early-break-template_1.c"
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..626b95e9b8517081d41d794e9
> > e0264d6301c8589
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > +
> > > +#define N 803
> > > +#define P 5
> > > +#include "vect-early-break-template_1.c"
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..7e0e6426120551152a7bd800c
> > 15d9ed6ab15bada
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > +
> > > +#define N 803
> > > +#define P 278
> > > +#include "vect-early-break-template_1.c"
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..242cf486f9c40055df0aef5fd23
> > 8d1aff7a7c7da
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > +
> > > +#define N 800
> > > +#define P 799
> > > +#include "vect-early-break-template_1.c"
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..9fe7136b7213a463ca6573c604
> > 76b7c8f531ddcb
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > +
> > > +#define N 803
> > > +#define P 0
> > > +#include "vect-early-break-template_2.c"
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..02f93d77dba31b938f6fd9e8c7
> > f5e4acde4aeec9
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > +
> > > +#define N 803
> > > +#define P 802
> > > +#include "vect-early-break-template_2.c"
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..a614925465606b54c638221ffb
> > 95a5e8d3bee797
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > +
> > > +#define N 803
> > > +#define P 5
> > > +#include "vect-early-break-template_2.c"
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..94e2b9c301456eda8f9ad7eaa
> > 67604563f0afee7
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > +
> > > +#define N 803
> > > +#define P 278
> > > +#include "vect-early-break-template_2.c"
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..af70a8e2a5a9dc9756edb5580f
> > 2de02ddcc95de9
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
> > > @@ -0,0 +1,47 @@
> > > +#ifndef N
> > > +#define N 803
> > > +#endif
> > > +
> > > +#ifndef P
> > > +#define P 0
> > > +#endif
> > > +
> > > +unsigned vect_a[N] = {0};
> > > +unsigned vect_b[N] = {0};
> > > +
> > > +__attribute__((noipa, noinline))
> > > +unsigned test4(unsigned x)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + break;
> > > + vect_a[i] = x;
> > > +
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +extern void abort ();
> > > +
> > > +int main ()
> > > +{
> > > +
> > > + int x = 1;
> > > + int idx = P;
> > > + vect_a[idx] = x + 1;
> > > +
> > > + test4(x);
> > > +
> > > + if (vect_b[idx] != (x + idx))
> > > + abort ();
> > > +
> > > + if (vect_a[idx] != x + 1)
> > > + abort ();
> > > +
> > > + if (idx > 0 && vect_a[idx-1] != x)
> > > + abort ();
> > > +
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..d0f924d904437e71567d27cc1f
> > 1089e5607dca0d
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
> > > @@ -0,0 +1,50 @@
> > > +#ifndef N
> > > +#define N 803
> > > +#endif
> > > +
> > > +#ifndef P
> > > +#define P 0
> > > +#endif
> > > +
> > > +unsigned vect_a[N] = {0};
> > > +unsigned vect_b[N] = {0};
> > > +
> > > +__attribute__((noipa, noinline))
> > > +unsigned test4(unsigned x)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + return i;
> > > + vect_a[i] = x;
> > > +
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +extern void abort ();
> > > +
> > > +int main ()
> > > +{
> > > +
> > > + int x = 1;
> > > + int idx = P;
> > > + vect_a[idx] = x + 1;
> > > +
> > > + unsigned res = test4(x);
> > > +
> > > + if (res != idx)
> > > + abort ();
> > > +
> > > + if (vect_b[idx] != (x + idx))
> > > + abort ();
> > > +
> > > + if (vect_a[idx] != x + 1)
> > > + abort ();
> > > +
> > > + if (idx > 0 && vect_a[idx-1] != x)
> > > + abort ();
> > > +
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..51e7d6489b99c25b9b4b3d1c8
> > 39f98562b6d4dd7
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
> > > @@ -0,0 +1,27 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#ifndef N
> > > +#define N 803
> > > +#endif
> > > +unsigned vect_a[N];
> > > +unsigned vect_b[N];
> > > +
> > > +unsigned test4(unsigned x)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + break;
> > > + vect_a[i] = x;
> > > +
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..9e4ad1763202dfdab3ed7961e
> > ad5114fcc61a11b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
> > > @@ -0,0 +1,28 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#ifndef N
> > > +#define N 803
> > > +#endif
> > > +unsigned vect_a[N];
> > > +unsigned vect_b[N];
> > > +
> > > +unsigned test4(unsigned x,int y, int z)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + break;
> > > + vect_a[i] = x;
> > > + }
> > > +
> > > + ret = x + y * z;
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..a613dd9909fb09278dd92a81a
> > 24ef854994a9890
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
> > > @@ -0,0 +1,31 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#ifndef N
> > > +#define N 803
> > > +#endif
> > > +unsigned vect_a[N];
> > > +unsigned vect_b[N];
> > > +
> > > +unsigned test4(unsigned x, int y)
> > > +{
> > > + unsigned ret = 0;
> > > +for (int o = 0; o < y; o++)
> > > +{
> > > + ret += o;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + break;
> > > + vect_a[i] = x;
> > > +
> > > + }
> > > +}
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..cc10f3238f1cb8e1307e024a3e
> > bcb5c25a39d1b2
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
> > > @@ -0,0 +1,31 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#ifndef N
> > > +#define N 803
> > > +#endif
> > > +unsigned vect_a[N];
> > > +unsigned vect_b[N];
> > > +
> > > +unsigned test4(unsigned x, int y)
> > > +{
> > > + unsigned ret = 0;
> > > +for (int o = 0; o < y; o++)
> > > +{
> > > + ret += o;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + return vect_a[i];
> > > + vect_a[i] = x;
> > > +
> > > + }
> > > +}
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..6967b7395ed7c19e38a436d6e
> > dcfe7c1580c7113
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
> > > @@ -0,0 +1,27 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#ifndef N
> > > +#define N 803
> > > +#endif
> > > +unsigned vect_a[N];
> > > +unsigned vect_b[N];
> > > +
> > > +unsigned test4(unsigned x)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + return vect_a[i] * x;
> > > + vect_a[i] = x;
> > > +
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..03cce5cf6cadecb520b46be666
> > bf608e3bc6a511
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#define N 803
> > > +unsigned vect_a[N];
> > > +unsigned vect_b[N];
> > > +
> > > +int test4(unsigned x)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + return i;
> > > + vect_a[i] += x * vect_b[i];
> > > +
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..dec6872e1115ff66695f5a500ff
> > a7ca01c0f8d3a
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#define N 803
> > > +unsigned vect_a[N];
> > > +unsigned vect_b[N];
> > > +
> > > +int test4(unsigned x)
> > > +{
> > > + int ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + return i;
> > > + vect_a[i] += x * vect_b[i];
> > > +
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..7268f6ae2485d0274fd85ea53c
> > c1e44ef4b84d5c
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> > > @@ -0,0 +1,27 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#include <complex.h>
> > > +
> > > +#define N 1024
> > > +complex double vect_a[N];
> > > +complex double vect_b[N];
> > > +
> > > +complex double test4(complex double x)
> > > +{
> > > + complex double ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] += x + i;
> > > + if (vect_a[i] == x)
> > > + return i;
> > > + vect_a[i] += x * vect_b[i];
> > > +
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..3c6d28bd2d6e6e794146baf89
> > e43c3b70293b7d9
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +unsigned test4(char x, char *vect, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < n; i++)
> > > + {
> > > + if (vect[i] > x)
> > > + return 1;
> > > +
> > > + vect[i] = x;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..216c56faf330449bf1969b7e51
> > ff1e94270dc861
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#define N 1024
> > > +unsigned vect[N];
> > > +
> > > +unsigned test4(unsigned x)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + if (i > 16 && vect[i] > x)
> > > + break;
> > > +
> > > + vect[i] = x;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..4a36d6979db1fd1f97ba2a290f
> > 78ac3b84f6de24
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#define N 1024
> > > +unsigned vect_a[N];
> > > +unsigned vect_b[N];
> > > +
> > > +unsigned test4(unsigned x)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + return vect_a[i];
> > > + vect_a[i] = x;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..09632d9afda7e07f1a8417514e
> > f77356f00045bd
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#define N 1024
> > > +unsigned vect_a[N];
> > > +unsigned vect_b[N];
> > > +
> > > +unsigned test4(unsigned x)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < (N/2); i+=2)
> > > + {
> > > + vect_b[i] = x + i;
> > > + vect_b[i+1] = x + i+1;
> > > + if (vect_a[i] > x || vect_a[i+1] > x)
> > > + break;
> > > + vect_a[i] += x * vect_b[i];
> > > + vect_a[i+1] += x * vect_b[i+1];
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..10fd8b42952c42f3d3a014da10
> > 3931ca394423d5
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> > > @@ -0,0 +1,27 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#include <complex.h>
> > > +
> > > +#define N 1024
> > > +complex double vect_a[N];
> > > +complex double vect_b[N];
> > > +
> > > +complex double test4(complex double x)
> > > +{
> > > + complex double ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] += x + i;
> > > + if (vect_a[i] == x)
> > > + break;
> > > + vect_a[i] += x * vect_b[i];
> > > +
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..ae706b2952cfcecf20546a67a7
> > 35b8d902cbb607
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> > > @@ -0,0 +1,27 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#include <complex.h>
> > > +
> > > +#define N 1024
> > > +char vect_a[N];
> > > +char vect_b[N];
> > > +
> > > +char test4(char x, char * restrict res)
> > > +{
> > > + char ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_b[i] += x + i;
> > > + if (vect_a[i] > x)
> > > + break;
> > > + vect_a[i] += x * vect_b[i];
> > > + res[i] *= vect_b[i];
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..350f02f3c7caef457adbe1be80
> > 2bba51cd818393
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
> > > @@ -0,0 +1,27 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +#ifndef N
> > > +#define N 803
> > > +#endif
> > > +unsigned vect_a[N];
> > > +unsigned vect_b[N];
> > > +
> > > +unsigned test4(unsigned x)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + vect_a[i] = x + i;
> > > + if (vect_a[i] > x)
> > > + break;
> > > + vect_a[i] = x;
> > > +
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/lib/target-supports.exp
> b/gcc/testsuite/lib/target-
> > supports.exp
> > > index
> >
> ccbbee847f755d6f30116d5b38e4027a998b48fd..5cbf54bd2a23dfdc5dc7b148b
> > 0dc6ed4c63814ae 100644
> > > --- a/gcc/testsuite/lib/target-supports.exp
> > > +++ b/gcc/testsuite/lib/target-supports.exp
> > > @@ -3645,6 +3645,18 @@ proc check_effective_target_vect_int { } {
> > > }}]
> > > }
> > >
> > > +# Return 1 if the target supports hardware vectorization of early breaks,
> > > +# 0 otherwise.
> > > +#
> > > +# This won't change for different subtargets so cache the result.
> > > +
> > > +proc check_effective_target_vect_early_break { } {
> > > + return [check_cached_effective_target_indexed vect_early_break {
> > > + expr {
> > > + ([istarget aarch64*-*-*]
> > > + && [check_effective_target_aarch64_sve])
> > > + }}]
> > > +}
> > > +
> > > # Return 1 if the target supports hardware vectorization of complex
> > additions of
> > > # byte, 0 otherwise.
> > > #
> > > diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> > > index
> >
> 0f90207bc733db3cf85979d9b0b962aefa0831d6..5af7d2bba0d62195704a8d41e
> > f6e600327169770 100644
> > > --- a/gcc/tree-scalar-evolution.h
> > > +++ b/gcc/tree-scalar-evolution.h
> > > @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3. If not see
> > >
> > > extern tree number_of_latch_executions (class loop *);
> > > extern gcond *get_loop_exit_condition (const class loop *);
> > > +extern gcond *get_edge_condition (edge);
> > >
> > > extern void scev_initialize (void);
> > > extern bool scev_initialized_p (void);
> > > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > > index
> >
> 7e2a3e986619de87e4ae9daf16198be1f13b917c..3012871dd7f9a7d1897f96a29
> > b1b0b28d90cb63f 100644
> > > --- a/gcc/tree-scalar-evolution.cc
> > > +++ b/gcc/tree-scalar-evolution.cc
> > > @@ -884,7 +884,7 @@ scev_dfs::add_to_evolution (tree chrec_before,
> > enum tree_code code,
> > > return res;
> > > }
> > >
> > > -
> > > +
> >
> >
> > > /* Follow the ssa edge into the binary expression RHS0 CODE RHS1.
> > > Return true if the strongly connected component has been found. */
> > >
> > > @@ -1295,8 +1295,15 @@ tail_recurse:
> > > gcond *
> > > get_loop_exit_condition (const class loop *loop)
> > > {
> > > + return get_edge_condition (normal_exit (loop));
> > > +}
> > > +
> > > +/* If the last statement of EXIT_EDGE's source block is a condition,
> > > +   return it, otherwise return NULL.  */
> > > +
> > > +gcond *
> > > +get_edge_condition (edge exit_edge)
> > > +{
> > > gcond *res = NULL;
> > > - edge exit_edge = single_exit (loop);
> > >
> > > if (dump_file && (dump_flags & TDF_SCEV))
> > > fprintf (dump_file, "(get_loop_exit_condition \n ");
> > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > index
> >
> 4a23d6172aaa12ad7049dc626e5c4afbd5ca3f74..02e373171675432cd32c4a7244
> > 0eebdff988bdcf 100644
> > > --- a/gcc/tree-vect-data-refs.cc
> > > +++ b/gcc/tree-vect-data-refs.cc
> > > @@ -2072,7 +2072,7 @@ vect_enhance_data_refs_alignment
> > (loop_vec_info loop_vinfo)
> > >
> > > /* Check if we can possibly peel the loop. */
> > > if (!vect_can_advance_ivs_p (loop_vinfo)
> > > - || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> > > + || !slpeel_can_duplicate_loop_p (loop_vinfo, normal_exit (loop))
> > > || loop->inner)
> > > do_peeling = false;
> > >
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> >
> 1d96130c985e2defd141cfdf602224c73b4b41f2..0b2a4920754d83aeb3795b435
> > 693d61adcfe92b6 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -770,7 +770,7 @@ vect_set_loop_condition_partial_vectors (class
> loop
> > *loop,
> > > add_header_seq (loop, header_seq);
> > >
> > > /* Get a boolean result that tells us whether to iterate. */
> > > - edge exit_edge = single_exit (loop);
> > > + edge exit_edge = normal_exit (loop);
> > > tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR :
> > NE_EXPR;
> > > tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
> > > gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
> > > @@ -789,7 +789,7 @@ vect_set_loop_condition_partial_vectors (class
> loop
> > *loop,
> > > if (final_iv)
> > > {
> > > gassign *assign = gimple_build_assign (final_iv, orig_niters);
> > > - gsi_insert_on_edge_immediate (single_exit (loop), assign);
> > > + gsi_insert_on_edge_immediate (exit_edge, assign);
> > > }
> > >
> > > return cond_stmt;
> > > @@ -799,7 +799,8 @@ vect_set_loop_condition_partial_vectors (class
> loop
> > *loop,
> > > loop handles exactly VF scalars per iteration. */
> > >
> > > static gcond *
> > > -vect_set_loop_condition_normal (class loop *loop, tree niters, tree
> step,
> > > +vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > + class loop *loop, tree niters, tree step,
> > > tree final_iv, bool niters_maybe_zero,
> > > gimple_stmt_iterator loop_cond_gsi)
> > > {
> > > @@ -807,7 +808,7 @@ vect_set_loop_condition_normal (class loop
> *loop,
> > tree niters, tree step,
> > > gcond *cond_stmt;
> > > gcond *orig_cond;
> > > edge pe = loop_preheader_edge (loop);
> > > - edge exit_edge = single_exit (loop);
> > > + edge exit_edge = normal_exit (loop);
> > > gimple_stmt_iterator incr_gsi;
> > > bool insert_after;
> > > enum tree_code code;
> > > @@ -872,7 +873,11 @@ vect_set_loop_condition_normal (class loop
> > *loop, tree niters, tree step,
> > > In both cases the loop limit is NITERS - STEP. */
> > > gimple_seq seq = NULL;
> > > limit = force_gimple_operand (niters, &seq, true, NULL_TREE);
> > > - limit = gimple_build (&seq, MINUS_EXPR, TREE_TYPE (limit), limit,
> > step);
> > > + /* For VLA leave limit == niters. Though I wonder if maybe I should
> > > + force partial loops here and use
> > vect_set_loop_condition_partial_vectors
> > > + instead. The problem is that the VL check is useless here. */
> > > + if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo) &&
> > !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> > > + limit = gimple_build (&seq, MINUS_EXPR, TREE_TYPE (limit), limit,
> > step);
> > > if (seq)
> > > {
> > > basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe,
> > seq);
> > > @@ -907,7 +912,8 @@ vect_set_loop_condition_normal (class loop
> *loop,
> > tree niters, tree step,
> > > gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> > >
> > > /* Record the number of latch iterations. */
> > > - if (limit == niters)
> > > + if (limit == niters
> > > + || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > /* Case A: the loop iterates NITERS times. Subtract one to get the
> > > latch count. */
> > > loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > > @@ -918,10 +924,17 @@ vect_set_loop_condition_normal (class loop
> > *loop, tree niters, tree step,
> > > loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
> > > limit, step);
> > >
> > > - if (final_iv)
> > > + auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > + /* For multiple exits we've already maintained LCSSA form and handled
> > > + the scalar iteration update in the code that deals with the merge
> > > + block and its updated guard. I could move that code here instead
> > > + of in vect_update_ivs_after_early_break but I have to still deal
> > > + with the updates to the counter `i`. So for now I'll keep them
> > > + together. */
> > > + if (final_iv && exits.length () == 1)
> > > {
> > > gassign *assign;
> > > - edge exit = single_exit (loop);
> > > + edge exit = normal_exit (loop);
> > > gcc_assert (single_pred_p (exit->dest));
> > > tree phi_dest
> > > = integer_zerop (init) ? final_iv : copy_ssa_name (indx_after_incr);
> > > @@ -972,13 +985,15 @@ vect_set_loop_condition (class loop *loop,
> > loop_vec_info loop_vinfo,
> > > gcond *orig_cond = get_loop_exit_condition (loop);
> > > gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond);
> > >
> > > - if (loop_vinfo && LOOP_VINFO_USING_PARTIAL_VECTORS_P
> > (loop_vinfo))
> > > + if (loop_vinfo
> > > + && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> > > cond_stmt = vect_set_loop_condition_partial_vectors (loop,
> loop_vinfo,
> > > niters, final_iv,
> > > niters_maybe_zero,
> > > loop_cond_gsi);
> > > else
> > > - cond_stmt = vect_set_loop_condition_normal (loop, niters, step,
> > final_iv,
> > > + cond_stmt = vect_set_loop_condition_normal (loop_vinfo, loop,
> niters,
> > > + step, final_iv,
> > > niters_maybe_zero,
> > > loop_cond_gsi);
> > >
> > > @@ -1066,7 +1081,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop *loop,
> > > edge exit, new_exit;
> > > bool duplicate_outer_loop = false;
> > >
> > > - exit = single_exit (loop);
> > > + exit = normal_exit (loop);
> > > at_exit = (e == exit);
> > > if (!at_exit && e != loop_preheader_edge (loop))
> > > return NULL;
> > > @@ -1104,11 +1119,11 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> > (class loop *loop,
> > > bbs[0] = preheader;
> > > new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> > >
> > > - exit = single_exit (scalar_loop);
> > > + exit = normal_exit (scalar_loop);
> > > copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
> > > &exit, 1, &new_exit, NULL,
> > > at_exit ? loop->latch : e->src, true);
> > > - exit = single_exit (loop);
> > > + exit = normal_exit (loop);
> > > basic_block new_preheader = new_bbs[0];
> > >
> > > /* Before installing PHI arguments make sure that the edges
> > > @@ -1176,11 +1191,53 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> > (class loop *loop,
> > > add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> > > }
> > > }
> > > +
> > > + /* If we have multiple exits, we now need to point the additional exits
> > > + from the old loop to the loop pre-header of the new copied loop.
> > > + Currently we only support simple early break vectorization so all
> > > + additional exits must exit the loop.  Additionally we can only place
> > > + copies at the end, i.e. we cannot do prologue peeling.  */
> > > + auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > + bool multiple_exits_p = exits.length () > 1;
> > > +
> > > + /* Check to see if all of the exits point to the loop header. If they
> > > + don't then we have an intermediate BB that's no longer useful after
> > > + the copy and we should remove it. */
> > > + bool imm_exit = true;
> > > + for (auto exit : exits)
> > > + {
> > > + imm_exit = imm_exit && exit->dest == loop->header;
> > > + if (!imm_exit)
> > > + break;
> > > + }
> > > +
> > > + for (unsigned i = 1; i < exits.length (); i++)
> > > + {
> > > + redirect_edge_and_branch (exits[i], new_preheader);
> > > + flush_pending_stmts (exits[i]);
> > > + }
> > > +
> > > + /* Main exit must be the last to be rewritten as it's the first phi node
> > > + entry. The rest are in array order. */
> > > redirect_edge_and_branch_force (e, new_preheader);
> > > flush_pending_stmts (e);
> > > - set_immediate_dominator (CDI_DOMINATORS, new_preheader, e-
> > >src);
> > > +
> > > + /* Only update the dominators of the new_preheader to the old exit
> if
> > > + we have effectively a single exit. */
> > > + if (!multiple_exits_p
> > > + || exits[1]->src != EDGE_PRED (exits[0]->src, 0)->src)
> > > + set_immediate_dominator (CDI_DOMINATORS, new_preheader, e-
> > >src);
> > > + else
> > > + set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> > exits[1]->src);
> > > +
> > > + auto_vec<edge> new_exits = get_loop_exit_edges (new_loop);
> > > if (was_imm_dom || duplicate_outer_loop)
> > > - set_immediate_dominator (CDI_DOMINATORS, exit_dest,
> > new_exit->src);
> > > + {
> > > + if (!multiple_exits_p)
> > > + set_immediate_dominator (CDI_DOMINATORS, exit_dest,
> > new_exit->src);
> > > + else
> > > + set_immediate_dominator (CDI_DOMINATORS, exit_dest,
> > new_exits[1]->src);
> > > + }
> > >
> > > /* And remove the non-necessary forwarder again. Keep the other
> > > one so we have a proper pre-header for the loop at the exit edge.
> */
> > > @@ -1189,6 +1246,39 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> (class
> > loop *loop,
> > > delete_basic_block (preheader);
> > > set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
> > > loop_preheader_edge (scalar_loop)->src);
> > > +
> > > + /* Finally after wiring the new epilogue we need to update its main
> > exit
> > > + to the original function exit we recorded. Other exits are already
> > > + correct. */
> > > + if (!imm_exit && multiple_exits_p)
> > > + {
> > > + /* For now we expect at most a single successor here, but we might
> > be
> > > + able to extend this to multiple. */
> > > + if (single_succ_p (new_exit->dest) && single_pred_p (new_exit-
> > >dest))
> > > + {
> > > + edge exit_edge = single_succ_edge (new_exit->dest);
> > > + /* Now correct the dominators that were messed up during the
> > copying
> > > + as the CFG was tweaked a bit. */
> > > + /* The main exit is now dominated by a new fall through edge. */
> > > + set_immediate_dominator (CDI_DOMINATORS, exit_edge->src,
> > > + new_exits[0]->src);
> > > + /* If this is a fall through edge then don't update doms. */
> > > + if (!empty_block_p (exit_edge->src))
> > > + set_immediate_dominator (CDI_DOMINATORS, exit_edge-
> > >dest,
> > > + new_exits[1]->src);
> > > + }
> > > +
> > > + /* The exits from the BB with the early exit dominate the new
> > function
> > > + exit edge and also the second part of the loop. The edges were
> > > + copied correctly but the doms are wrong because during the
> > copying
> > > + some of the intermediate edges are rewritten. */
> > > + set_immediate_dominator (CDI_DOMINATORS, new_exits[0]->src,
> > > + new_exits[1]->src);
> > > + set_immediate_dominator (CDI_DOMINATORS, new_exits[0]-
> > >dest,
> > > + new_exits[0]->src);
> > > + set_immediate_dominator (CDI_DOMINATORS, new_exits[1]-
> > >dest,
> > > + new_exits[1]->src);
> > > + }
> > > }
> > > else /* Add the copy at entry. */
> > > {
> > > @@ -1310,20 +1400,24 @@ slpeel_add_loop_guard (basic_block
> guard_bb,
> > tree cond,
> > > */
> > >
> > > bool
> > > -slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
> > > +slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo,
> > const_edge e)
> > > {
> > > - edge exit_e = single_exit (loop);
> > > + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > + edge exit_e = normal_exit (loop);
> > > edge entry_e = loop_preheader_edge (loop);
> > > gcond *orig_cond = get_loop_exit_condition (loop);
> > > gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > > unsigned int num_bb = loop->inner? 5 : 2;
> > >
> > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > + num_bb += 1;
> > > +
> > > /* All loops have an outer scope; the only case loop->outer is NULL is for
> > > the function itself. */
> > > if (!loop_outer (loop)
> > > || loop->num_nodes != num_bb
> > > || !empty_block_p (loop->latch)
> > > - || !single_exit (loop)
> > > + || (!single_exit (loop) && !LOOP_VINFO_EARLY_BREAKS
> (loop_vinfo))
> > > /* Verify that new loop exit condition can be trivially modified. */
> > > || (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi))
> > > || (e != exit_e && e != entry_e))
> > > @@ -1528,6 +1622,12 @@ vect_update_ivs_after_vectorizer
> > (loop_vec_info loop_vinfo,
> > > gphi_iterator gsi, gsi1;
> > > class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > basic_block update_bb = update_e->dest;
> > > +
> > > + /* For early exits we'll update the IVs in
> > > + vect_update_ivs_after_early_break. */
> > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > + return;
> > > +
> > > basic_block exit_bb = single_exit (loop)->dest;
> > >
> > > /* Make sure there exists a single-predecessor exit bb: */
> > > @@ -1613,6 +1713,186 @@ vect_update_ivs_after_vectorizer
> > (loop_vec_info loop_vinfo,
> > > /* Fix phi expressions in the successor bb. */
> > > adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > }
> > > + return;
> > > +}
> > > +
> > > +/* Function vect_update_ivs_after_early_break.
> > > +
> > > + "Advance" the induction variables of LOOP to the value they should
> > take
> > > + after the execution of LOOP. This is currently necessary because the
> > > + vectorizer does not handle induction variables that are used after the
> > > + loop. Such a situation occurs when the last iterations of LOOP are
> > > + peeled, because of the early exit. With an early exit we always peel
> the
> > > + loop.
> > > +
> > > + Input:
> > > + - LOOP_VINFO - a loop info structure for the loop that is going to be
> > > + vectorized. The last few iterations of LOOP were peeled.
> > > + - LOOP - a loop that is going to be vectorized. The last few iterations
> > > + of LOOP were peeled.
> > > + - VF - The loop vectorization factor.
> > > + - NITERS_ORIG - the number of iterations that LOOP executes
> (before
> > it is
> > > + vectorized). i.e, the number of times the ivs should be
> > > + bumped.
> > > + - NITERS_VECTOR - The number of iterations that the vector LOOP
> > executes.
> > > + - UPDATE_E - a successor edge of LOOP->exit that is on the (only)
> path
> > > + coming out from LOOP on which there are uses of the LOOP
> > ivs
> > > + (this is the path from LOOP->exit to epilog_loop-
> > >preheader).
> > > +
> > > + The new definitions of the ivs are placed in LOOP->exit.
> > > + The phi args associated with the edge UPDATE_E in the bb
> > > + UPDATE_E->dest are updated accordingly.
> > > +
> > > + Output:
> > > + - If available, the LCSSA phi node for the loop IV temp.
> > > +
> > > + Assumption 1: Like the rest of the vectorizer, this function assumes
> > > + a single loop exit that has a single predecessor.
> > > +
> > > + Assumption 2: The phi nodes in the LOOP header and in update_bb
> are
> > > + organized in the same order.
> > > +
> > > + Assumption 3: The access function of the ivs is simple enough (see
> > > + vect_can_advance_ivs_p). This assumption will be relaxed in the
> > future.
> > > +
> > > + Assumption 4: Exactly one of the successors of LOOP exit-bb is on a
> > path
> > > + coming out of LOOP on which the ivs of LOOP are used (this is the
> path
> > > + that leads to the epilog loop; other paths skip the epilog loop). This
> > > + path starts with the edge UPDATE_E, and its destination (denoted
> > update_bb)
> > > + needs to have its phis updated.
> > > + */
> > > +
> > > +static tree
> > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class
> loop
> > *,
> > > + poly_int64 vf, tree niters_orig,
> > > + tree niters_vector, edge update_e)
> > > +{
> > > + gphi_iterator gsi, gsi1;
> > > + tree ni_name, ivtmp = NULL;
> > > + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > + basic_block update_bb = update_e->dest;
> > > + auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > +
> > > + basic_block exit_bb = exits[0]->dest;
> > > +
> > > + if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > + return NULL;
> > > +
> > > + for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis
> > (update_bb);
> > > + !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > + gsi_next (&gsi), gsi_next (&gsi1))
> > > + {
> > > + tree init_expr;
> > > + tree step_expr;
> > > + tree type;
> > > + tree var, ni;
> > > + gimple_stmt_iterator last_gsi;
> > > +
> > > + gphi *phi = gsi1.phi ();
> > > + tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi,
> > loop_preheader_edge (loop));
> > > + gphi *phi1 = as_a <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> > > + stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "vect_update_ivs_after_early_break: phi: %G",
> > > + (gimple *)phi);
> > > +
> > > + /* Skip reduction and virtual phis. */
> > > + if (!iv_phi_p (phi_info))
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "reduc or virtual phi. skip.\n");
> > > + continue;
> > > + }
> > > +
> > > + /* For multiple exits where we handle early exits, we need to carry on
> > > + with the previous IV as the loop iteration was not completed because
> > > + we exited early.  As such, just grab the original IV.  */
> > > + if (STMT_VINFO_TYPE (phi_info) != undef_vec_info_type)
> > > + {
> > > + type = TREE_TYPE (gimple_phi_result (phi));
> > > + step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART
> > (phi_info);
> > > + step_expr = unshare_expr (step_expr);
> > > +
> > > + /* We previously generated the new merged phi in the same BB as the
> > > + guard.  So use that to perform the scaling on, rather than the normal
> > > + loop phi which doesn't take the early breaks into account.  */
> > > + init_expr = gimple_phi_result (phi1); //PHI_ARG_DEF_FROM_EDGE
> > (phi1, loop_preheader_edge (loop));
> > > +
> > > + ni = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr),
> > > + fold_convert (TREE_TYPE (step_expr), init_expr),
> > > + build_int_cst (TREE_TYPE (step_expr), vf));
> > > +
> > > + var = create_tmp_var (type, "tmp");
> > > +
> > > + last_gsi = gsi_last_bb (exit_bb);
> > > + gimple_seq new_stmts = NULL;
> > > + ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > + /* Exit_bb shouldn't be empty. */
> > > + if (!gsi_end_p (last_gsi))
> > > + gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > + else
> > > + gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > +
> > > + /* Fix phi expressions in the successor bb. */
> > > + adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > + }
> > > + else if (STMT_VINFO_TYPE (phi_info) == undef_vec_info_type)
> > > + {
> > > + type = TREE_TYPE (gimple_phi_result (phi));
> > > + step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART
> > (phi_info);
> > > + step_expr = unshare_expr (step_expr);
> > > +
> > > + /* We previously generated the new merged phi in the same BB as the
> > > + guard.  So use that to perform the scaling on, rather than the normal
> > > + loop phi which doesn't take the early breaks into account.  */
> > > + init_expr = PHI_ARG_DEF_FROM_EDGE (phi1,
> > loop_preheader_edge (loop));
> > > +
> > > + if (vf.is_constant ())
> > > + {
> > > + ni = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr),
> > > + fold_convert (TREE_TYPE (step_expr),
> > > + niters_vector),
> > > + build_int_cst (TREE_TYPE (step_expr), vf));
> > > +
> > > + ni = fold_build2 (MINUS_EXPR, TREE_TYPE (step_expr),
> > > + fold_convert (TREE_TYPE (step_expr),
> > > + niters_orig),
> > > + fold_convert (TREE_TYPE (step_expr), ni));
> > > + }
> > > + else
> > > + /* If the loop's VF isn't constant then the loop must have been
> > > + masked, so at the end of the loop we know we have finished
> > > + the entire loop and found nothing. */
> > > + ni = build_zero_cst (TREE_TYPE (step_expr));
> > > +
> > > + gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > > +
> > > + var = create_tmp_var (type, "tmp");
> > > +
> > > + last_gsi = gsi_last_bb (exit_bb);
> > > + gimple_seq new_stmts = NULL;
> > > + ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > + /* Exit_bb shouldn't be empty. */
> > > + if (!gsi_end_p (last_gsi))
> > > + gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > + else
> > > + gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > +
> > > + adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > +
> > > + for (unsigned i = 1; i < exits.length (); i++)
> > > + adjust_phi_and_debug_stmts (phi1, exits[i],
> > > + build_int_cst (TREE_TYPE
> > (step_expr),
> > > + vf));
> > > + ivtmp = gimple_phi_result (phi1);
> > > + }
> > > + else
> > > + continue;
> > > + }
> > > +
> > > + return ivtmp;
> > > }
> > >
> > > /* Return a gimple value containing the misalignment (measured in
> vector
> > > @@ -2096,7 +2376,7 @@ vect_gen_vector_loop_niters_mult_vf
> > (loop_vec_info loop_vinfo,
> > > class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > tree type = TREE_TYPE (niters_vector);
> > > tree log_vf = build_int_cst (type, exact_log2 (vf));
> > > - basic_block exit_bb = single_exit (loop)->dest;
> > > + basic_block exit_bb = normal_exit (loop)->dest;
> > >
> > > gcc_assert (niters_vector_mult_vf_ptr != NULL);
> > > tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
> > > @@ -2123,19 +2403,46 @@ find_guard_arg (class loop *loop, class loop
> > *epilog ATTRIBUTE_UNUSED,
> > > gphi *lcssa_phi)
> > > {
> > > gphi_iterator gsi;
> > > - edge e = single_exit (loop);
> > > + edge e = normal_exit (loop);
> > >
> > > - gcc_assert (single_pred_p (e->dest));
> > > for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > > {
> > > gphi *phi = gsi.phi ();
> > > - if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > - PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > + /* Nested loops with multiple exits can have a different number of phi
> > > + node arguments between the main loop and the epilog, as the epilog
> > > + falls through to the second loop.  */
> > > + if (gimple_phi_num_args (phi) > e->dest_idx
> > > + && operand_equal_p (PHI_ARG_DEF (phi, e->dest_idx),
> > > + PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > return PHI_RESULT (phi);
> > > }
> > > return NULL_TREE;
> > > }
> > >
> > > +/* Walk the statements of edge E's source basic block backwards and
> > > +   return the last VDEF/VUSE found in the block, or NULL if there is
> > > +   none.  */
> > > +
> > > +static tree
> > > +gimple_find_last_mem_use (edge e)
> > > +{
> > > + basic_block bb = e->src;
> > > + tree res = NULL;
> > > + gimple_stmt_iterator iter = gsi_last_bb (bb);
> > > + do
> > > + {
> > > + gimple *stmt = gsi_stmt (iter);
> > > + if ((res = gimple_vdef (stmt)))
> > > + return res;
> > > +
> > > + if ((res = gimple_vuse (stmt)))
> > > + return res;
> > > +
> > > + gsi_prev (&iter);
> > > + } while (!gsi_end_p (iter));
> > > +
> > > + return NULL;
> > > +}
> > > +
> > > /* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates
> > FIRST/SECOND
> > > from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > edge, the two loops are arranged as below:
> > > @@ -2185,6 +2492,7 @@ find_guard_arg (class loop *loop, class loop
> > *epilog ATTRIBUTE_UNUSED,
> > > static void
> > > slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > class loop *first, class loop *second,
> > > + tree *lcssa_ivtmp,
> > > bool create_lcssa_for_iv_phis)
> > > {
> > > gphi_iterator gsi_update, gsi_orig;
> > > @@ -2192,10 +2500,18 @@ slpeel_update_phi_nodes_for_loops
> > (loop_vec_info loop_vinfo,
> > >
> > > edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > edge second_preheader_e = loop_preheader_edge (second);
> > > - basic_block between_bb = single_exit (first)->dest;
> > > + auto_vec<edge> exits = get_loop_exit_edges (first);
> > > + basic_block between_bb = exits[0]->dest;
> > > +
> > > + bool early_exit = LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > > + /* For early exits when we create the merge BB we must maintain it in
> > > + LCSSA form, otherwise the final vectorizer passes will create the
> > > + wrong PHI nodes here. */
> > > + create_lcssa_for_iv_phis = create_lcssa_for_iv_phis || early_exit;
> > >
> > > gcc_assert (between_bb == second_preheader_e->src);
> > > - gcc_assert (single_pred_p (between_bb) && single_succ_p
> > (between_bb));
> > > + gcc_assert ((single_pred_p (between_bb) && single_succ_p
> > (between_bb))
> > > + || early_exit);
> > > /* Either the first loop or the second is the loop to be vectorized. */
> > > gcc_assert (loop == first || loop == second);
> > >
> > > @@ -2215,10 +2531,40 @@ slpeel_update_phi_nodes_for_loops
> > (loop_vec_info loop_vinfo,
> > > {
> > > tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > - add_phi_arg (lcssa_phi, arg, single_exit (first),
> > UNKNOWN_LOCATION);
> > > +
> > > + /* The first exit is always the loop latch, so handle that
> > > + separately.  */
> > > + gcc_assert (arg);
> > > + add_phi_arg (lcssa_phi, arg, exits[0], UNKNOWN_LOCATION);
> > > +
> > > + /* The early exits are processed in order starting from exit 1. */
> > > + for (unsigned i = 1; i < exits.length (); i++)
> > > + {
> > > + tree phi_arg;
> > > + if (iv_phi_p (vect_phi_info))
> > > + /* For induction values just copy the previous one as the
> > > + current iteration did not finish. We'll update as needed
> > > + later on. */
> > > + phi_arg = gimple_phi_result (orig_phi);
> > > + else
> > > + phi_arg = gimple_find_last_mem_use (exits[i]);
> > > + /* If we didn't find any just copy the existing one and leave
> > > + it to the others to fix it up. */
> > > + if (!phi_arg)
> > > + phi_arg = gimple_phi_result (orig_phi);
> > > + add_phi_arg (lcssa_phi, phi_arg, exits[i], UNKNOWN_LOCATION);
> > > + }
> > > arg = new_res;
> > > }
> > >
> > > + /* Normally we'd be able to distinguish between the iteration counter
> > > + and the ivtmps by looking at the STMT_VINFO_TYPE of the phi node.
> > > + However, for some reason this isn't consistently set.  Is there a
> > > + better way?  */
> > > + if (lcssa_ivtmp
> > > + && iv_phi_p (vect_phi_info))
> > > + *lcssa_ivtmp = arg;
> > > +
> > > /* Update PHI node in the second loop by replacing arg on the loop's
> > > incoming edge. */
> > > adjust_phi_and_debug_stmts (update_phi, second_preheader_e,
> > arg);
> > > @@ -2228,7 +2574,8 @@ slpeel_update_phi_nodes_for_loops
> > (loop_vec_info loop_vinfo,
> > > for correct vectorization of live stmts. */
> > > if (loop == first)
> > > {
> > > - basic_block orig_exit = single_exit (second)->dest;
> > > + auto_vec<edge> new_exits = get_loop_exit_edges (second);
> > > + basic_block orig_exit = new_exits[0]->dest;
> > > for (gsi_orig = gsi_start_phis (orig_exit);
> > > !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > {
> > > @@ -2243,7 +2590,15 @@ slpeel_update_phi_nodes_for_loops
> > (loop_vec_info loop_vinfo,
> > >
> > > tree new_res = copy_ssa_name (orig_arg);
> > > gphi *lcphi = create_phi_node (new_res, between_bb);
> > > - add_phi_arg (lcphi, orig_arg, single_exit (first),
> > UNKNOWN_LOCATION);
> > > + /* The first exit is always the loop latch, so handle that
> > > + separately.  */
> > > + add_phi_arg (lcphi, orig_arg, new_exits[0],
> > UNKNOWN_LOCATION);
> > > + /* The early exits are processed in order starting from exit 1. */
> > > + for (unsigned i = 1; i < new_exits.length (); i++)
> > > + {
> > > + tree phi_arg = gimple_phi_result (orig_phi);
> > > + add_phi_arg (lcphi, phi_arg, exits[i], UNKNOWN_LOCATION);
> > > + }
> > > }
> > > }
> > > }
> > > @@ -2393,13 +2748,11 @@ slpeel_update_phi_nodes_for_guard2 (class
> > loop *loop, class loop *epilog,
> > > gcc_assert (single_succ_p (merge_bb));
> > > edge e = single_succ_edge (merge_bb);
> > > basic_block exit_bb = e->dest;
> > > - gcc_assert (single_pred_p (exit_bb));
> > > - gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > >
> > > for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > {
> > > gphi *update_phi = gsi.phi ();
> > > - tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > + tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > >
> > > tree merge_arg = NULL_TREE;
> > >
> > > @@ -2438,12 +2791,14 @@ static void
> > > slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > {
> > > gphi_iterator gsi;
> > > - basic_block exit_bb = single_exit (epilog)->dest;
> > > + auto_vec<edge> exits = get_loop_exit_edges (epilog);
> > >
> > > - gcc_assert (single_pred_p (exit_bb));
> > > - edge e = EDGE_PRED (exit_bb, 0);
> > > - for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > - rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > + for (unsigned i = 0; i < exits.length (); i++)
> > > + {
> > > + basic_block exit_bb = exits[i]->dest;
> > > + for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > + rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (),
> > exits[i]));
> > > + }
> > > }
> > >
> > > /* EPILOGUE_VINFO is an epilogue loop that we now know would need
> to
> > > @@ -2621,6 +2976,14 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree niters, tree nitersm1,
> > > bound_epilog += vf - 1;
> > > if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > bound_epilog += 1;
> > > + /* For early breaks the scalar loop needs to execute at most VF times
> > > + to find the element that caused the break. */
> > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > + {
> > > + bound_epilog = vf;
> > > + /* Force a scalar epilogue as we can't vectorize the index finding. */
> > > + vect_epilogues = false;
> > > + }
> > > bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > > poly_uint64 bound_scalar = bound_epilog;
> > >
> > > @@ -2780,16 +3143,24 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree niters, tree nitersm1,
> > > bound_prolog + bound_epilog)
> > > : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > > || vect_epilogues));
> > > +
> > > + /* We only support early break vectorization on known bounds at this
> > time.
> > > + This means that if the vector loop can't be entered then we won't
> > generate
> > > + it at all. So for now force skip_vector off because the additional
> control
> > > + flow messes with the BB exits and we've already analyzed them. */
> > > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS
> > (loop_vinfo);
> > > +
> > > /* Epilog loop must be executed if the number of iterations for epilog
> > > loop is known at compile time, otherwise we need to add a check at
> > > the end of vector loop and skip to the end of epilog loop. */
> > > bool skip_epilog = (prolog_peeling < 0
> > > || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > || !vf.is_constant ());
> > > - /* PEELING_FOR_GAPS is special because epilog loop must be
> executed.
> > */
> > > - if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > + /* PEELING_FOR_GAPS and peeling for early breaks are special
> because
> > epilog
> > > + loop must be executed. */
> > > + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > + || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > skip_epilog = false;
> > > -
> > > class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > auto_vec<profile_count> original_counts;
> > > basic_block *original_bbs = NULL;
> > > @@ -2828,7 +3199,7 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > niters, tree nitersm1,
> > > if (prolog_peeling)
> > > {
> > > e = loop_preheader_edge (loop);
> > > - if (!slpeel_can_duplicate_loop_p (loop, e))
> > > + if (!slpeel_can_duplicate_loop_p (loop_vinfo, e))
> > > {
> > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
> > > "loop can't be duplicated to preheader edge.\n");
> > > @@ -2843,7 +3214,7 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > niters, tree nitersm1,
> > > gcc_unreachable ();
> > > }
> > > prolog->force_vectorize = false;
> > > - slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > > + slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop,
> NULL,
> > true);
> > > first_loop = prolog;
> > > reset_original_copy_tables ();
> > >
> > > @@ -2902,11 +3273,13 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree niters, tree nitersm1,
> > >
> > > if (epilog_peeling)
> > > {
> > > - e = single_exit (loop);
> > > - if (!slpeel_can_duplicate_loop_p (loop, e))
> > > + auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > + e = exits[0];
> > > + if (!slpeel_can_duplicate_loop_p (loop_vinfo, e))
> > > {
> > > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
> > > - "loop can't be duplicated to exit edge.\n");
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
> > > + "loop can't be duplicated to exit edge.\n");
> > > gcc_unreachable ();
> > > }
> > > /* Peel epilog and put it on exit edge of loop. If we are vectorizing
> > > @@ -2920,12 +3293,16 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree niters, tree nitersm1,
> > > epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > > if (!epilog)
> > > {
> > > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
> > > - "slpeel_tree_duplicate_loop_to_edge_cfg
> > failed.\n");
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
> > > + "slpeel_tree_duplicate_loop_to_edge_cfg
> > failed.\n");
> > > gcc_unreachable ();
> > > }
> > > epilog->force_vectorize = false;
> > > - slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > > +
> > > + tree early_break_iv_name;
> > > + slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog,
> > > + &early_break_iv_name, false);
> > >
> > > /* Scalar version loop may be preferred. In this case, add guard
> > > and skip to epilog. Note this only happens when the number of
> > > @@ -2978,6 +3355,7 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > niters, tree nitersm1,
> > > vect_gen_vector_loop_niters (loop_vinfo, niters,
> > > niters_vector, step_vector,
> > > niters_no_overflow);
> > > +
> > > if (!integer_onep (*step_vector))
> > > {
> > > - /* On exit from the loop we will have an easy way of calculating
> > > @@ -2987,9 +3365,13 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree niters, tree nitersm1,
> > > SSA_NAME_DEF_STMT (niters_vector_mult_vf) =
> > gimple_build_nop ();
> > > *niters_vector_mult_vf_var = niters_vector_mult_vf;
> > > }
> > > + else if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > + vect_gen_vector_loop_niters_mult_vf (loop_vinfo,
> > early_break_iv_name,
> > > + &niters_vector_mult_vf);
> > > else
> > > vect_gen_vector_loop_niters_mult_vf (loop_vinfo, *niters_vector,
> > > &niters_vector_mult_vf);
> > > +
> > > /* Update IVs of original loop as if they were advanced by
> > > niters_vector_mult_vf steps. */
> > > gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > @@ -2997,12 +3379,97 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree niters, tree nitersm1,
> > > vect_update_ivs_after_vectorizer (loop_vinfo,
> niters_vector_mult_vf,
> > > update_e);
> > >
> > > + /* For early breaks we must create a guard to check how many
> > iterations
> > > + of the scalar loop are yet to be performed. */
> > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > + {
> > > + gcc_assert (early_break_iv_name);
> > > + tree ivtmp =
> > > + vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> > > + *niters_vector, update_e);
> > > +
> > > + tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > + fold_convert (TREE_TYPE (niters),
> > > + ivtmp),
> > > + build_zero_cst (TREE_TYPE (niters)));
> > > + basic_block guard_bb = normal_exit (loop)->dest;
> > > + auto_vec<edge> new_exits = get_loop_exit_edges (epilog);
> > > + /* If we had a fallthrough edge, the guard will be threaded through
> > > + and so we may need to find the actual final edge. */
> > > + edge final_edge = new_exits[0];
> > > + basic_block guard_to;
> > > + bool fn_exit_p = false;
> > > + if (gsi_end_p (gsi_start_nondebug_bb (final_edge->dest))
> > > + && !gsi_end_p (gsi_start_phis (final_edge->dest))
> > > + && single_succ_p (final_edge->dest))
> > > + {
> > > + auto gsi = gsi_start_phis (final_edge->dest);
> > > + while (!gsi_end_p (gsi))
> > > + gsi_remove (&gsi, true);
> > > + guard_to = final_edge->dest;
> > > + fn_exit_p = true;
> > > + }
> > > + else
> > > + guard_to = split_edge (normal_exit (epilog));
> > > +
> > > + edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,
> > guard_to,
> > > + guard_bb,
> > > + prob_epilog.invert (),
> > > + irred_flag);
> > > +
> > > + basic_block dest = single_succ (guard_to);
> > > + /* If we have a single pred then the previous block is the immediate
> > > + dominator. This may or may not be the guard bb. However if we
> > > + have multiple preds then the guard BB must be the dominator as all
> > > + previous exits got rewritten to the guard BB. */
> > > + if (single_pred_p (dest))
> > > + set_immediate_dominator (CDI_DOMINATORS, dest, guard_to);
> > > + else
> > > + set_immediate_dominator (CDI_DOMINATORS, dest, guard_bb);
> > > +
> > > + /* We must update all the edges from the new guard_bb. */
> > > + slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > + final_edge);
> > > +
> > > + /* If we have an additional function exit block, then thread the
> > > + updates through to the block.  Leaving it up to the LCSSA cleanup
> > > + pass will get the wrong values here as it can't handle the merge
> > > + block we just made correctly. */
> > > + if (fn_exit_p)
> > > + {
> > > + gphi_iterator gsi_update, gsi_orig, gsi_vect;
> > > + for (gsi_orig = gsi_start_phis (epilog->header),
> > > + gsi_update = gsi_start_phis (guard_e->dest),
> > > + gsi_vect = gsi_start_phis (loop->header);
> > > + !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update)
> > > + && !gsi_end_p (gsi_vect);
> > > + gsi_next (&gsi_orig), gsi_next (&gsi_update),
> > > + gsi_next (&gsi_vect))
> > > + {
> > > + gphi *orig_phi = gsi_orig.phi ();
> > > + gphi *update_phi = gsi_update.phi ();
> > > + gphi *vect_phi = gsi_vect.phi ();
> > > + stmt_vec_info phi_info = loop_vinfo->lookup_stmt
> > (vect_phi);
> > > +
> > > + if (iv_phi_p (phi_info))
> > > + continue;
> > > +
> > > + tree phi_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi,
> > update_e);
> > > + SET_PHI_ARG_DEF (update_phi, update_e->dest_idx,
> > phi_arg);
> > > +
> > > + phi_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, guard_e);
> > > + SET_PHI_ARG_DEF (update_phi, guard_e->dest_idx,
> > phi_arg);
> > > + }
> > > + }
> > > + flush_pending_stmts (guard_e);
> > > + }
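The control flow being built here can be sketched in plain C: a "vector" main loop that breaks as soon as any lane matches, without knowing which lane, followed by the mandatory scalar epilogue that pinpoints the element (which is why vect_epilogues is forced off earlier). The VF constant and any_match helper below are illustrative stand-ins of mine, not vectorizer output:

```c
#include <stddef.h>

#define VF 4   /* stand-in for the vectorization factor */

/* Stands in for a vector compare plus a branch on the mask.  */
static int any_match (const int *p, int key)
{
  for (int l = 0; l < VF; l++)
    if (p[l] == key)
      return 1;
  return 0;
}

int find_first (const int *a, size_t n, int key)
{
  size_t i = 0;
  /* "Vector" main loop: VF elements per iteration, breaks as soon as
     any lane matches, without knowing which lane.  */
  for (; i + VF <= n; i += VF)
    if (any_match (a + i, key))
      break;
  /* Scalar epilogue: always executed, finds the exact element -- the
     index finding the patch says it can't vectorize.  */
  for (; i < n; i++)
    if (a[i] == key)
      return (int) i;
  return -1;
}
```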
> > > +
> > > if (skip_epilog)
> > > {
> > > guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > niters, niters_vector_mult_vf);
> > > - guard_bb = single_exit (loop)->dest;
> > > - guard_to = split_edge (single_exit (epilog));
> > > + guard_bb = normal_exit (loop)->dest;
> > > + guard_to = split_edge (normal_exit (epilog));
> > > guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,
> > guard_to,
> > > skip_vector ? anchor : guard_bb,
> > > prob_epilog.invert (),
> > > @@ -3010,7 +3477,7 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > niters, tree nitersm1,
> > > if (vect_epilogues)
> > > epilogue_vinfo->skip_this_loop_edge = guard_e;
> > > slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > - single_exit (epilog));
> > > + normal_exit (epilog));
> > > /* Only need to handle basic block before epilog loop if it's not
> > > the guard_bb, which is the case when skip_vector is true. */
> > > if (guard_bb != bb_before_epilog)
> > > @@ -3023,7 +3490,6 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > niters, tree nitersm1,
> > > }
> > > else
> > > slpeel_update_phi_nodes_for_lcssa (epilog);
> > > -
> > > unsigned HOST_WIDE_INT bound;
> > > if (bound_scalar.is_constant (&bound))
> > > {
> > > @@ -3114,7 +3580,6 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > niters, tree nitersm1,
> > >
> > > adjust_vec.release ();
> > > free_original_copy_tables ();
> > > -
> > > return vect_epilogues ? epilog : NULL;
> > > }
> > >
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index
> >
> d5c2bff80be9be152707eb9d3932c863948daa73..548946a6bbf8892086a17fe30
> > 03da2c3dceadf5b 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -844,80 +844,106 @@ vect_fixup_scalar_cycles_with_patterns
> > (loop_vec_info loop_vinfo)
> > > in NUMBER_OF_ITERATIONSM1. Place the condition under which the
> > > niter information holds in ASSUMPTIONS.
> > >
> > > - Return the loop exit condition. */
> > > + Return the loop exit conditions. */
> > >
> > >
> > > -static gcond *
> > > +static vec<gcond *>
> > > vect_get_loop_niters (class loop *loop, tree *assumptions,
> > > tree *number_of_iterations, tree
> > *number_of_iterationsm1)
> > > {
> > > - edge exit = single_exit (loop);
> > > + auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > + vec<gcond *> conds;
> > > + conds.create (exits.length ());
> > > class tree_niter_desc niter_desc;
> > > tree niter_assumptions, niter, may_be_zero;
> > > - gcond *cond = get_loop_exit_condition (loop);
> > >
> > > *assumptions = boolean_true_node;
> > > *number_of_iterationsm1 = chrec_dont_know;
> > > *number_of_iterations = chrec_dont_know;
> > > +
> > > DUMP_VECT_SCOPE ("get_loop_niters");
> > >
> > > - if (!exit)
> > > - return cond;
> > > + if (exits.is_empty ())
> > > + return conds;
> > >
> > > - may_be_zero = NULL_TREE;
> > > - if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc,
> > NULL)
> > > - || chrec_contains_undetermined (niter_desc.niter))
> > > - return cond;
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
> > > + exits.length ());
> > >
> > > - niter_assumptions = niter_desc.assumptions;
> > > - may_be_zero = niter_desc.may_be_zero;
> > > - niter = niter_desc.niter;
> > > + edge exit;
> > > + unsigned int i;
> > > + FOR_EACH_VEC_ELT (exits, i, exit)
> > > + {
> > > + gcond *cond = get_edge_condition (exit);
> > > + if (cond)
> > > + conds.safe_push (cond);
> > >
> > > - if (may_be_zero && integer_zerop (may_be_zero))
> > > - may_be_zero = NULL_TREE;
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit
> > %d...\n", i);
> > >
> > > - if (may_be_zero)
> > > - {
> > > - if (COMPARISON_CLASS_P (may_be_zero))
> > > + may_be_zero = NULL_TREE;
> > > + if (!number_of_iterations_exit_assumptions (loop, exit,
> &niter_desc,
> > NULL)
> > > + || chrec_contains_undetermined (niter_desc.niter))
> > > + continue;
> > > +
> > > + niter_assumptions = niter_desc.assumptions;
> > > + may_be_zero = niter_desc.may_be_zero;
> > > + niter = niter_desc.niter;
> > > +
> > > + if (may_be_zero && integer_zerop (may_be_zero))
> > > + may_be_zero = NULL_TREE;
> > > +
> > > + if (may_be_zero)
> > > {
> > > - /* Try to combine may_be_zero with assumptions, this can simplify
> > > - computation of niter expression. */
> > > - if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> > > - niter_assumptions = fold_build2 (TRUTH_AND_EXPR,
> > boolean_type_node,
> > > - niter_assumptions,
> > > - fold_build1 (TRUTH_NOT_EXPR,
> > > -
> > boolean_type_node,
> > > - may_be_zero));
> > > + if (COMPARISON_CLASS_P (may_be_zero))
> > > + {
> > > + /* Try to combine may_be_zero with assumptions, this can
> > simplify
> > > + computation of niter expression. */
> > > + if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> > > + niter_assumptions = fold_build2 (TRUTH_AND_EXPR,
> > boolean_type_node,
> > > + niter_assumptions,
> > > + fold_build1
> > (TRUTH_NOT_EXPR,
> > > +
> > boolean_type_node,
> > > + may_be_zero));
> > > + else
> > > + niter = fold_build3 (COND_EXPR, TREE_TYPE (niter),
> > may_be_zero,
> > > + build_int_cst (TREE_TYPE (niter), 0),
> > > + rewrite_to_non_trapping_overflow
> > (niter));
> > > +
> > > + may_be_zero = NULL_TREE;
> > > + }
> > > + else if (integer_nonzerop (may_be_zero) && i == 0)
> > > + {
> > > + *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> > > + *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> > > + continue;
> > > + }
> > > else
> > > - niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> > > - build_int_cst (TREE_TYPE (niter), 0),
> > > - rewrite_to_non_trapping_overflow (niter));
> > > + continue;
> > > + }
> > >
> > > - may_be_zero = NULL_TREE;
> > > - }
> > > - else if (integer_nonzerop (may_be_zero))
> > > + /* Loop assumptions are based on the normal exit. */
> > > + if (i == 0)
> > > {
> > > - *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> > > - *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> > > - return cond;
> > > + *assumptions = niter_assumptions;
> > > + *number_of_iterationsm1 = niter;
> > > +
> > > + /* We want the number of loop header executions which is the
> > number
> > > + of latch executions plus one.
> > > + ??? For UINT_MAX latch executions this number overflows to
> > zero
> > > + for loops like do { n++; } while (n != 0); */
> > > + if (niter && !chrec_contains_undetermined (niter))
> > > + niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> > > + unshare_expr (niter),
> > > + build_int_cst (TREE_TYPE (niter), 1));
> > > + *number_of_iterations = niter;
> > > }
> > > - else
> > > - return cond;
> > > }
> > >
> > > - *assumptions = niter_assumptions;
> > > - *number_of_iterationsm1 = niter;
> > > -
> > > - /* We want the number of loop header executions which is the
> number
> > > - of latch executions plus one.
> > > - ??? For UINT_MAX latch executions this number overflows to zero
> > > - for loops like do { n++; } while (n != 0); */
> > > - if (niter && !chrec_contains_undetermined (niter))
> > > - niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr
> > (niter),
> > > - build_int_cst (TREE_TYPE (niter), 1));
> > > - *number_of_iterations = niter;
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location, "All loop exits
> successfully
> > analyzed.\n");
> > >
> > > - return cond;
> > > + return conds;
> > > }
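For reference, the niter convention the code above follows: *number_of_iterations counts loop header executions, i.e. latch executions plus one. A small self-checking sketch (illustrative names, my own):

```c
#include <assert.h>

/* Count header executions of:  for (i = 0; i < n; i++) ;  */
unsigned header_execs (unsigned n)
{
  unsigned latch = 0, header = 0, i = 0;
  for (;;)
    {
      header++;              /* header runs once per test of i < n */
      if (!(i < n))
        break;
      i++;
      latch++;               /* latch runs once per completed iteration */
    }
  assert (header == latch + 1);
  return header;
}
```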
> > >
> > > /* Function bb_in_loop_p
> > > @@ -1455,7 +1481,8 @@ vect_compute_single_scalar_iteration_cost
> > (loop_vec_info loop_vinfo)
> > >
> > > Verify that certain CFG restrictions hold, including:
> > > - the loop has a pre-header
> > > - - the loop has a single entry and exit
> > > + - the loop has a single entry
> > > + - nested loops can have only a single exit
> > > - the loop exit condition is simple enough
> > > - the number of iterations can be analyzed, i.e, a countable loop. The
> > > niter could be analyzed under some assumptions. */
> > > @@ -1484,11 +1511,6 @@ vect_analyze_loop_form (class loop *loop,
> > vect_loop_form_info *info)
> > > |
> > > (exit-bb) */
> > >
> > > - if (loop->num_nodes != 2)
> > > - return opt_result::failure_at (vect_location,
> > > - "not vectorized:"
> > > - " control flow in loop.\n");
> > > -
> > > if (empty_block_p (loop->header))
> > > return opt_result::failure_at (vect_location,
> > > "not vectorized: empty loop.\n");
> > > @@ -1559,11 +1581,13 @@ vect_analyze_loop_form (class loop *loop,
> > vect_loop_form_info *info)
> > > dump_printf_loc (MSG_NOTE, vect_location,
> > > "Considering outer-loop vectorization.\n");
> > > info->inner_loop_cond = inner.loop_cond;
> > > +
> > > + if (!single_exit (loop))
> > > + return opt_result::failure_at (vect_location,
> > > + "not vectorized: multiple exits.\n");
> > > +
> > > }
> > >
> > > - if (!single_exit (loop))
> > > - return opt_result::failure_at (vect_location,
> > > - "not vectorized: multiple exits.\n");
> > > if (EDGE_COUNT (loop->header->preds) != 2)
> > > return opt_result::failure_at (vect_location,
> > > "not vectorized:"
> > > @@ -1579,21 +1603,45 @@ vect_analyze_loop_form (class loop *loop,
> > vect_loop_form_info *info)
> > > "not vectorized: latch block not empty.\n");
> > >
> > > /* Make sure the exit is not abnormal. */
> > > - edge e = single_exit (loop);
> > > - if (e->flags & EDGE_ABNORMAL)
> > > + auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > + edge nexit = normal_exit (loop);
> > > + for (edge e : exits)
> > > + {
> > > + if (e->flags & EDGE_ABNORMAL)
> > > + return opt_result::failure_at (vect_location,
> > > + "not vectorized:"
> > > + " abnormal loop exit edge.\n");
> > > + /* Early break BB must be after the main exit BB.  In theory we
> > > + should be able to vectorize the inverse order, but the current flow
> > > + in the vectorizer always assumes you update successor PHI nodes,
> > > + not preds. */
> > > + if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src,
> e-
> > >src))
> > > + return opt_result::failure_at (vect_location,
> > > + "not vectorized:"
> > > + " abnormal loop exit edge order.\n");
> > > + }
> > > +
> > > + if (exits.length () > 2)
> > > return opt_result::failure_at (vect_location,
> > > "not vectorized:"
> > > - " abnormal loop exit edge.\n");
> > > -
> > > - info->loop_cond
> > > + " too many exits. Only 1 additional exit"
> > > + " supported.\n");
> > > + if (loop->num_nodes != 2 + exits.length () - 1)
> > > + return opt_result::failure_at (vect_location,
> > > + "not vectorized:"
> > > + " unsupported control flow in loop.\n");
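To make the exit-count restriction concrete: with the checks above, a loop with one early break (two exits in total) is accepted, while a second early break trips the "too many exits" path. Hedged example loops of mine, not from the testsuite:

```c
/* Accepted: one early break, i.e. two exits in total.  */
int ok (const int *a, int n, int k)
{
  for (int i = 0; i < n; i++)
    if (a[i] == k)        /* exit 2: the early break */
      return i;
  return -1;
}

/* Rejected: two early breaks, i.e. three exits in total.  */
int rejected (const int *a, int n, int k, int j)
{
  for (int i = 0; i < n; i++)
    {
      if (a[i] == k)      /* exit 2 */
        return i;
      if (a[i] == j)      /* exit 3: one early break too many */
        return -2;
    }
  return -1;
}
```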
> > > + info->conds
> > > = vect_get_loop_niters (loop, &info->assumptions,
> > > &info->number_of_iterations,
> > > &info->number_of_iterationsm1);
> > > - if (!info->loop_cond)
> > > +
> > > + if (info->conds.length () == 0)
> > > return opt_result::failure_at
> > > (vect_location,
> > > "not vectorized: complicated exit condition.\n");
> > >
> > > + info->loop_cond = info->conds[0];
> > > +
> > > if (integer_zerop (info->assumptions)
> > > || !info->number_of_iterations
> > > || chrec_contains_undetermined (info->number_of_iterations))
> > > @@ -1638,8 +1686,17 @@ vect_create_loop_vinfo (class loop *loop,
> > vec_info_shared *shared,
> > > if (!integer_onep (info->assumptions) && !main_loop_info)
> > > LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info-
> >assumptions;
> > >
> > > - stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info-
> > >loop_cond);
> > > - STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > > + unsigned int i;
> > > + gcond *cond;
> > > + FOR_EACH_VEC_ELT (info->conds, i, cond)
> > > + {
> > > + stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
> > > + STMT_VINFO_TYPE (loop_cond_info) =
> loop_exit_ctrl_vec_info_type;
> > > + }
> > > +
> > > + /* Check to see if we're vectorizing multiple exits. */
> > > + LOOP_VINFO_EARLY_BREAKS (loop_vinfo) = info->conds.length () > 1;
> > > +
> > > if (info->inner_loop_cond)
> > > {
> > > stmt_vec_info inner_loop_cond_info
> > > @@ -2270,10 +2327,13 @@
> vect_determine_partial_vectors_and_peeling
> > (loop_vec_info loop_vinfo,
> > > bool need_peeling_or_partial_vectors_p
> > > = vect_need_peeling_or_partial_vectors_p (loop_vinfo);
> > >
> > > - /* Decide whether to vectorize the loop with partial vectors. */
> > > + /* Decide whether to vectorize the loop with partial vectors. Currently
> > > + early break vectorization does not support partial vectors as we have
> > > + to peel a scalar loop that we can't vectorize. */
> > > LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> > > LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> > > if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > > + && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > && need_peeling_or_partial_vectors_p)
> > > {
> > > /* For partial-vector-usage=1, try to push the handling of partial
> > > @@ -2746,13 +2806,14 @@ start_over:
> > >
> > > /* If an epilogue loop is required make sure we can create one. */
> > > if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > - || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > + || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > + || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > {
> > > if (dump_enabled_p ())
> > > dump_printf_loc (MSG_NOTE, vect_location, "epilog loop
> > required\n");
> > > if (!vect_can_advance_ivs_p (loop_vinfo)
> > > - || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP
> > (loop_vinfo),
> > > - single_exit (LOOP_VINFO_LOOP
> > > + || !slpeel_can_duplicate_loop_p (loop_vinfo,
> > > + normal_exit (LOOP_VINFO_LOOP
> > > (loop_vinfo))))
> > > {
> > > ok = opt_result::failure_at (vect_location,
> > > @@ -3239,6 +3300,8 @@ vect_analyze_loop (class loop *loop,
> > vec_info_shared *shared)
> > > "***** Choosing vector mode %s\n",
> > > GET_MODE_NAME (first_loop_vinfo->vector_mode));
> > >
> > > + loop_form_info.conds.release ();
> > > +
> > > /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
> > > enabled, SIMDUID is not set, it is the innermost loop and we have
> > > either already found the loop's SIMDLEN or there was no SIMDLEN to
> > > @@ -3350,6 +3413,8 @@ vect_analyze_loop (class loop *loop,
> > vec_info_shared *shared)
> > > (first_loop_vinfo->epilogue_vinfos[0]-
> > >vector_mode));
> > > }
> > >
> > > + loop_form_info.conds.release ();
> > > +
> > > return first_loop_vinfo;
> > > }
> > >
> > > @@ -7907,6 +7972,237 @@ vect_transform_reduction (loop_vec_info
> > loop_vinfo,
> > > return true;
> > > }
> > >
> > > +/* When vectorizing early break statements, instructions that happen
> > > + before the early break in the current BB need to be moved to after
> > > + the early break.  This function deals with that and assumes that any
> > > + validity checks have already been performed.
> > > +
> > > + While moving the instructions, if it encounters a VUSE or VDEF it
> > > + corrects the VUSEs as it moves the statements along.  CHAINED
> > > + contains the list of SSA_NAMEs that belong to the dependency chain
> > > + of the early break conditional.  GDEST is the location at which to
> > > + insert the new statements.  GSTMT is the iterator to walk up to find
> > > + statements to consider moving.  REACHING_VUSE contains the
> > > + dominating VUSE found so far and CURRENT_VDEF contains the last VDEF
> > > + we've seen.  These are updated in pre-order and again in post-order
> > > + after moving the instruction. */
> > > +
> > > +static void
> > > +move_early_exit_stmts (hash_set<tree> *chained,
> gimple_stmt_iterator
> > *gdest,
> > > + gimple_stmt_iterator *gstmt, tree *reaching_vuse,
> > > + tree *current_vdef)
> > > +{
> > > + if (gsi_end_p (*gstmt))
> > > + return;
> > > +
> > > + gimple *stmt = gsi_stmt (*gstmt);
> > > + if (gimple_has_ops (stmt))
> > > + {
> > > + tree dest = NULL_TREE;
> > > + /* Try to find the SSA_NAME being defined.  For statements with an
> > > + LHS use the LHS; if not, assume that the first argument of a call
> > > + is the value being defined, e.g. MASKED_LOAD etc. */
> > > + if (gimple_has_lhs (stmt))
> > > + {
> > > + if (is_gimple_assign (stmt))
> > > + dest = gimple_assign_lhs (stmt);
> > > + else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > > + dest = gimple_call_lhs (call);
> > > + }
> > > + else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > > + dest = gimple_arg (call, 0);
> > > +
> > > + /* Don't move the scalar instructions. */
> > > + bool move
> > > + = dest && (VECTOR_TYPE_P (TREE_TYPE (dest))
> > > + || POINTER_TYPE_P (TREE_TYPE (dest)));
> > > +
> > > + /* If we found the defining statement of something that's part of
> > > + the chain then expand the chain with the new SSA_VARs being used. */
> > > + if (chained->contains (dest))
> > > + {
> > > + for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> > > + if (TREE_CODE (gimple_arg (stmt, x)) == SSA_NAME)
> > > + chained->add (gimple_arg (stmt, x));
> > > +
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "found chain %G", stmt);
> > > + update_stmt (stmt);
> > > + move = false;
> > > + }
> > > +
> > > + if (move)
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "moving stmt %G", stmt);
> > > + gsi_move_before (gstmt, gdest);
> > > + gsi_prev (gdest);
> > > + tree vdef = gimple_vdef (stmt);
> > > +
> > > + /* If we've moved a VDEF, extract the defining MEM and update
> > > + usages of it. TODO: I think this may need some constraints? */
> > > + if (vdef)
> > > + {
> > > + *current_vdef = vdef;
> > > + *reaching_vuse = gimple_vuse (stmt);
> > > + imm_use_iterator imm_iter;
> > > + gimple *use_stmt;
> > > + FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, vdef)
> > > + {
> > > + if (!is_a <gphi *> (use_stmt))
> > > + continue;
> > > + gphi *phi_stmt = as_a <gphi *> (use_stmt);
> > > +
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "updating vuse %G", use_stmt);
> > > + for (unsigned i = 0; i < gimple_phi_num_args (phi_stmt);
> > i++)
> > > + if (gimple_phi_arg_def (phi_stmt, i) == vdef)
> > > + {
> > > + SET_USE (PHI_ARG_DEF_PTR (phi_stmt, i),
> > gimple_vuse (stmt));
> > > + break;
> > > + }
> > > + }
> > > + }
> > > + update_stmt (stmt);
> > > + }
> > > + }
> > > +
> > > + gsi_prev (gstmt);
> > > + move_early_exit_stmts (chained, gdest, gstmt, reaching_vuse,
> > current_vdef);
> > > +
> > > + if (gimple_vuse (stmt)
> > > + && reaching_vuse && *reaching_vuse
> > > + && gimple_vuse (stmt) == *current_vdef)
> > > + {
> > > + unlink_stmt_vdef (stmt);
> > > + gimple_set_vuse (stmt, *reaching_vuse);
> > > + update_stmt (stmt);
> > > + }
> > > +}
> > > +
> > > +/* Transform the definition stmt STMT_INFO of an early exit
> > > + value. */
> > > +
> > > +bool
> > > +vect_transform_early_break (loop_vec_info loop_vinfo,
> > > + stmt_vec_info stmt_info, gimple_stmt_iterator
> > *gsi,
> > > + gimple **vec_stmt, slp_tree slp_node)
> > > +{
> > > + tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> > > + int i;
> > > + int ncopies;
> > > + int vec_num;
> > > +
> > > + if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > > + return false;
> > > +
> > > + gimple_match_op op;
> > > + if (!gimple_extract_op (stmt_info->stmt, &op))
> > > + gcc_unreachable ();
> > > + gcc_assert (op.code.is_tree_code ());
> > > + auto code = tree_code (op.code);
> > > +
> > > + tree vectype_in = STMT_VINFO_VECTYPE (stmt_info);
> > > + gcc_assert (vectype_in);
> > > +
> > > +
> > > + if (slp_node)
> > > + {
> > > + ncopies = 1;
> > > + vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
> > > + }
> > > + else
> > > + {
> > > + ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
> > > + vec_num = 1;
> > > + }
> > > +
> > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > + bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > +
> > > + /* Transform. */
> > > + tree new_temp = NULL_TREE;
> > > + auto_vec<tree> vec_oprnds0;
> > > + auto_vec<tree> vec_oprnds1;
> > > + tree def0;
> > > +
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location, "transform early-
> > exit.\n");
> > > +
> > > + /* FORNOW: Multiple types are not supported for condition. */
> > > + if (code == COND_EXPR)
> > > + gcc_assert (ncopies == 1);
> > > +
> > > +
> > > + gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > > + basic_block cond_bb = gimple_bb (stmt);
> > > + gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
> > > +
> > > + vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
> > > + op.ops[0], &vec_oprnds0, op.ops[1], &vec_oprnds1,
> > > + NULL, NULL);
> > > +
> > > + gimple *new_stmt = NULL;
> > > + tree cst_0 = build_zero_cst (truth_type_for (vectype_out));
> > > + tree cst_m1 = build_minus_one_cst (truth_type_for (vectype_out));
> > > +
> > > + FOR_EACH_VEC_ELT (vec_oprnds0, i, def0)
> > > + {
> > > + tree vop[3] = { def0, vec_oprnds1[i], NULL_TREE };
> > > + {
> > > + tree cond = make_temp_ssa_name (truth_type_for (vectype_out),
> > NULL, "mask");
> > > + gimple *vec_cmp = gimple_build_assign (cond, code, vop[0],
> > vop[1]);
> > > + vect_finish_stmt_generation (loop_vinfo, stmt_info, vec_cmp,
> > &cond_gsi);
> > > + if (masked_loop_p)
> > > + {
> > > + tree mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> > > + vectype_in, i);
> > > + cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
> > > + cond, &cond_gsi);
> > > + }
> > > +
> > > + new_temp = make_temp_ssa_name (truth_type_for
> > (vectype_out), NULL, "vexit");
> > > + gimple *vec_cond = gimple_build_assign (new_temp,
> > VEC_COND_EXPR,
> > > + cond, cst_m1, cst_0);
> > > + vect_finish_stmt_generation (loop_vinfo, stmt_info, vec_cond,
> > &cond_gsi);
> > > + new_stmt = vec_cond;
> > > + }
> > > +
> > > + if (slp_node)
> > > + SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
> > > + else
> > > + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > > + }
> > > +
> > > + gcc_assert (new_stmt);
> > > + tree lhs = gimple_assign_lhs (new_stmt);
> > > +
> > > + tree t = fold_build2 (NE_EXPR, boolean_type_node, lhs,
> > > + build_zero_cst (truth_type_for (vectype_out)));
> > > + t = canonicalize_cond_expr_cond (t);
> > > + gimple_cond_set_condition_from_tree ((gcond*)stmt, t);
> > > + update_stmt (stmt);
> > > +
> > > + basic_block dest_bb = EDGE_SUCC (cond_bb, 1)->dest;
> > > + gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> > > +
> > > + hash_set<tree> chained;
> > > + gimple_stmt_iterator gsi2 = gsi_for_stmt (new_stmt);
> > > + chained.add (lhs);
> > > + tree vdef;
> > > + tree vuse = gimple_vuse (new_stmt);
> > > + move_early_exit_stmts (&chained, &dest_gsi, &gsi2, &vuse, &vdef);
> > > +
> > > + if (!slp_node)
> > > + *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > +
> > > + return true;
> > > +}
> > > +
> > > +
> > > +
> > > /* Transform phase of a cycle PHI. */
> > >
> > > bool
> > > @@ -8185,6 +8481,186 @@ vect_transform_cycle_phi (loop_vec_info
> > loop_vinfo,
> > > return true;
> > > }
> > >
> > > +/* This function tries to validate whether an early break vectorization
> > > + is possible for the current instruction sequence.  Returns true if
> > > + possible, otherwise false.
> > > +
> > > + Requirements:
> > > + - Any memory access must be to a fixed size buffer.
> > > + - There must not be any loads and stores to the same object.
> > > + - Multiple loads are allowed as long as they don't alias.
> > > +
> > > +
> > > + Arguments:
> > > + - LOOP_VINFO: loop information for the current loop.
> > > + - CHAIN: Currently detected sequence of instructions that belong
> > > + to the current early break.
> > > + - LOADS: List of all loads found during traversal.
> > > + - BASES: List of all load datareferences found during traversal.
> > > + - GSTMT: Current position to inspect for validity. The sequence
> > > + will be moved upwards from this point. */
> > > +
> > > +static bool
> > > +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree>
> > *chain,
> > > + vec<tree> *loads, vec<data_reference *> *bases,
> > > + gimple_stmt_iterator *gstmt)
> > > +{
> > > + if (gsi_end_p (*gstmt))
> > > + return true;
> > > +
> > > + gimple *stmt = gsi_stmt (*gstmt);
> > > + if (gimple_has_ops (stmt))
> > > + {
> > > + tree dest = NULL_TREE;
> > > + /* Try to find the SSA_NAME being defined.  For statements with an
> > > + LHS use the LHS; if not, assume that the first argument of a call
> > > + is the value being defined, e.g. MASKED_LOAD etc. */
> > > + if (gimple_has_lhs (stmt))
> > > + {
> > > + if (is_gimple_assign (stmt))
> > > + dest = gimple_assign_lhs (stmt);
> > > + else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > > + dest = gimple_call_lhs (call);
> > > + }
> > > + else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > > + dest = gimple_arg (call, 0);
> > > +
> > > + /* Don't move the scalar instructions. */
> > > + bool move
> > > + = dest && (VECTOR_TYPE_P (TREE_TYPE (dest))
> > > + || POINTER_TYPE_P (TREE_TYPE (dest)));
> > > +
> > > + /* If we found the defining statement of something that's part of the
> > > + chain then expand the chain with the new SSA_VARs being used. */
> > > + if (chain->contains (dest))
> > > + {
> > > + for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> > > + if (TREE_CODE (gimple_arg (stmt, x)) == SSA_NAME)
> > > + chain->add (gimple_arg (stmt, x));
> > > +
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "found chain %G", stmt);
> > > +
> > > + move = false;
> > > + }
> > > +
> > > + stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > + if (!stmt_vinfo)
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "early breaks not supported: unknown"
> > > + " statement: %G", stmt);
> > > + return false;
> > > + }
> > > +
> > > + auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > + if (dr_ref)
> > > + {
> > > + /* We currently only support statically allocated objects due to
> > > + not having first-faulting loads support or peeling for alignment
> > > + support. Compute the size of the referenced object (it could be
> > > + dynamically allocated). */
> > > + tree obj = DR_BASE_ADDRESS (dr_ref);
> > > + if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "early breaks only supported on statically"
> > > + " allocated objects.\n");
> > > + return false;
> > > + }
> > > +
> > > + tree refop = TREE_OPERAND (obj, 0);
> > > + tree refbase = get_base_address (refop);
> > > + if (!refbase || !DECL_P (refbase)
> > > + || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "early breaks only supported on statically"
> > > + " allocated objects.\n");
> > > + return false;
> > > + }
> > > +
> > > + if (!move && DR_IS_READ (dr_ref))
> > > + {
> > > + loads->safe_push (dest);
> > > + bases->safe_push (dr_ref);
> > > + }
> > > + else if (DR_IS_WRITE (dr_ref))
> > > + {
> > > + for (auto dr : bases)
> > > + if (same_data_refs_base_objects (dr, dr_ref))
> > > + return false;
> > > + }
> > > + }
> > > +
> > > + if (move)
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "analyzing stmt %G", stmt);
> > > +
> > > + for (tree ref : loads)
> > > + if (stmt_may_clobber_ref_p (stmt, ref, true))
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "early breaks not supported as memory used"
> > > + " may alias.\n");
> > > + return false;
> > > + }
> > > + }
> > > + }
> > > +
> > > + gsi_prev (gstmt);
> > > + return validate_early_exit_stmts (loop_vinfo, chain, loads, bases, gstmt);
> > > +}
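To make the requirements listed above concrete, the kind of scalar loop this validation is meant to accept looks roughly like the following (an illustrative example, not taken from the patch; the function name is made up):

```c
#include <assert.h>
#include <stddef.h>

/* An early-break search loop: only loads, all from one buffer whose
   extent the caller controls statically, so the whole access range is
   provable at compile time.  A loop that also stored to the same
   object, or that read from a dynamically sized buffer, would be
   rejected by the validation above.  */
int
first_match (const int *buf, size_t n, int key)
{
  for (size_t i = 0; i < n; i++)
    if (buf[i] == key)	/* the early break the vectorizer must honor */
      return (int) i;
  return -1;
}
```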
> > > +
> > > +/* Check to see if the current early break given in STMT_INFO is valid for
> > > + vectorization. */
> > > +
> > > +bool
> > > +vectorizable_early_exit (vec_info *vinfo,
> > > + stmt_vec_info stmt_info, slp_tree /* slp_node */,
> > > + slp_instance /* slp_node_instance */,
> > > + stmt_vector_for_cost * /* cost_vec */)
> > > +{
> > > + loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > +
> > > + if (!loop_vinfo
> > > + || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > > + return false;
> > > +
> > > + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
> > > + return false;
> > > +
> > > + tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> > > + tree truth_type = truth_type_for (vectype);
> > > +
> > > + auto optab = direct_optab_handler (cbranch_optab, TYPE_MODE (truth_type));
> > > + if (optab == CODE_FOR_nothing)
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "can't vectorize early exit because the "
> > > + "target doesn't support flag setting vector "
> > > + "comparisons.\n");
> > > + return false;
> > > + }
> > > +
> > > + hash_set<tree> chain;
> > > + auto_vec<tree> loads;
> > > + auto_vec<data_reference *> bases;
> > > +
> > > + gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > > + gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> > > +
> > > + return validate_early_exit_stmts (loop_vinfo, &chain, &loads, &bases, &gsi);
> > > +}
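The cbranch probe above asks whether the target can branch directly on a vector comparison against zero. In scalar terms, the semantics being requested are roughly the following (a model for illustration only, not the patch's code):

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of "compare a vector mask against zero and branch if
   any lane is set": the early exit is taken as soon as one lane of the
   mask is nonzero.  A target implementing the cbranch optab does this
   with a single flag-setting vector comparison rather than a
   lane-by-lane loop.  */
int
any_lane_set (const unsigned char *mask, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    if (mask[i] != 0)
      return 1;
  return 0;
}
```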
> > > +
> > > /* Vectorizes LC PHIs. */
> > >
> > > bool
> > > @@ -9993,13 +10469,24 @@ vectorizable_live_operation (vec_info
> *vinfo,
> > > new_tree = lane_extract <vec_lhs', ...>;
> > > lhs' = new_tree; */
> > >
> > > + /* When vectorizing an early break, any live statements that are used
> > > + outside of the loop are dead. The loop will never get to them.
> > > + We could change the liveness value during analysis instead, but since
> > > + the code below is invalid anyway, just ignore it during codegen. */
> > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > + return true;
> > > +
> > > class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > - basic_block exit_bb = single_exit (loop)->dest;
> > > + basic_block exit_bb = normal_exit (loop)->dest;
> > > gcc_assert (single_pred_p (exit_bb));
> > >
> > > tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> > > gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> > > - SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
> > > + /* For early exits we need to compute the right exit. The current
> > > + approach punts to a scalar loop instead. If we were to vectorize,
> > > + the exit condition below would need to take into account the
> > > + difference between a `break` edge and a `return` edge. */
> > > + SET_PHI_ARG_DEF (phi, normal_exit (loop)->dest_idx, vec_lhs);
> > >
> > > gimple_seq stmts = NULL;
> > > tree new_tree;
> > > @@ -10438,7 +10925,8 @@ scale_profile_for_vect_loop (class loop
> *loop,
> > unsigned vf)
> > > scale_loop_frequencies (loop, p);
> > > }
> > >
> > > - edge exit_e = single_exit (loop);
> > > + edge exit_e = normal_exit (loop);
> > > +
> > > exit_e->probability = profile_probability::always () / (new_est_niter +
> 1);
> > >
> > > edge exit_l = single_pred_edge (loop->latch);
> > > @@ -10787,7 +11275,7 @@ vect_transform_loop (loop_vec_info
> > loop_vinfo, gimple *loop_vectorized_call)
> > > /* Make sure there exists a single-predecessor exit bb. Do this before
> > > versioning. */
> > > edge e = single_exit (loop);
> > > - if (! single_pred_p (e->dest))
> > > + if (e && ! single_pred_p (e->dest))
> > > {
> > > split_loop_exit_edge (e, true);
> > > if (dump_enabled_p ())
> > > @@ -10813,7 +11301,7 @@ vect_transform_loop (loop_vec_info
> > loop_vinfo, gimple *loop_vectorized_call)
> > > if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > > {
> > > e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > - if (! single_pred_p (e->dest))
> > > + if (e && ! single_pred_p (e->dest))
> > > {
> > > split_loop_exit_edge (e, true);
> > > if (dump_enabled_p ())
> > > @@ -11146,7 +11634,8 @@ vect_transform_loop (loop_vec_info
> > loop_vinfo, gimple *loop_vectorized_call)
> > >
> > > /* Loops vectorized with a variable factor won't benefit from
> > > unrolling/peeling. */
> > > - if (!vf.is_constant ())
> > > + if (!vf.is_constant ()
> > > + && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > {
> > > loop->unroll = 1;
> > > if (dump_enabled_p ())
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index
> >
> 4e0d75e0d7586ad57a37850d8a70f6182ecb13d0..4f9446a5c699288be093c556e
> > c527e87cf788317 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -342,9 +342,28 @@ vect_stmt_relevant_p (stmt_vec_info
> stmt_info,
> > loop_vec_info loop_vinfo,
> > > *live_p = false;
> > >
> > > /* cond stmt other than loop exit cond. */
> > > - if (is_ctrl_stmt (stmt_info->stmt)
> > > - && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > > - *relevant = vect_used_in_scope;
> > > + if (is_ctrl_stmt (stmt_info->stmt))
> > > + {
> > > + /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
> > > + it looks like loop_manip doesn't do that. So we have to do it
> > > + the hard way. */
> > > + basic_block bb = gimple_bb (stmt_info->stmt);
> > > + basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> > > + edge exit = BRANCH_EDGE (bb);
> > > + unsigned nbbs = loop->num_nodes;
> > > + bool exit_bb = true;
> > > + for (unsigned i = 0; i < nbbs; i++)
> > > + {
> > > + if (exit->dest == bbs[i])
> > > + {
> > > + exit_bb = false;
> > > + break;
> > > + }
> > > + }
> > > +
> > > + if (exit_bb)
> > > + *relevant = vect_used_in_scope;
> > > + }
> > >
> > > /* changing memory. */
> > > if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > > @@ -357,6 +376,11 @@ vect_stmt_relevant_p (stmt_vec_info
> stmt_info,
> > loop_vec_info loop_vinfo,
> > > *relevant = vect_used_in_scope;
> > > }
> > >
> > > + auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > + auto_bitmap exit_bbs;
> > > + for (edge exit : exits)
> > > + bitmap_set_bit (exit_bbs, exit->dest->index);
> > > +
> > > /* uses outside the loop. */
> > > FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> > SSA_OP_DEF)
> > > {
> > > @@ -375,7 +399,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > loop_vec_info loop_vinfo,
> > > /* We expect all such uses to be in the loop exit phis
> > > (because of loop closed form) */
> > > gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > - gcc_assert (bb == single_exit (loop)->dest);
> > > + gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
> > >
> > > *live_p = true;
> > > }
> > > @@ -1845,7 +1869,7 @@ check_load_store_for_partial_vectors
> > (loop_vec_info loop_vinfo, tree vectype,
> > > MASK_TYPE is the type of both masks. If new statements are needed,
> > > insert them before GSI. */
> > >
> > > -static tree
> > > +tree
> > > prepare_vec_mask (loop_vec_info loop_vinfo, tree mask_type, tree
> > loop_mask,
> > > tree vec_mask, gimple_stmt_iterator *gsi)
> > > {
> > > @@ -11158,11 +11182,14 @@ vect_analyze_stmt (vec_info *vinfo,
> > > node_instance, cost_vec);
> > > if (!res)
> > > return res;
> > > - }
> > > + }
> > > + else if (is_ctrl_stmt (stmt_info->stmt))
> > > + STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > >
> > > switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > {
> > > case vect_internal_def:
> > > + case vect_early_exit_def:
> > > break;
> > >
> > > case vect_reduction_def:
> > > @@ -11195,6 +11222,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > {
> > > gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > + || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > || (call && gimple_call_lhs (call) == NULL_TREE));
> > > *need_to_vectorize = true;
> > > }
> > > @@ -11237,7 +11265,9 @@ vect_analyze_stmt (vec_info *vinfo,
> > > || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> > > stmt_info, NULL, node)
> > > || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > > - stmt_info, NULL, node, cost_vec));
> > > + stmt_info, NULL, node, cost_vec)
> > > + || vectorizable_early_exit (vinfo, stmt_info,
> > > + node, node_instance, cost_vec));
> > > else
> > > {
> > > if (bb_vinfo)
> > > @@ -11260,7 +11290,10 @@ vect_analyze_stmt (vec_info *vinfo,
> > > NULL, NULL, node, cost_vec)
> > > || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> > > cost_vec)
> > > - || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > > + || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > > + || vectorizable_early_exit (vinfo, stmt_info, node,
> > > + node_instance, cost_vec));
> > > +
> > > }
> > >
> > > if (node)
> > > @@ -11418,6 +11451,12 @@ vect_transform_stmt (vec_info *vinfo,
> > > gcc_assert (done);
> > > break;
> > >
> > > + case loop_exit_ctrl_vec_info_type:
> > > + done = vect_transform_early_break (as_a <loop_vec_info> (vinfo),
> > > + stmt_info, gsi, &vec_stmt, slp_node);
> > > + gcc_assert (done);
> > > + break;
> > > +
> > > default:
> > > if (!STMT_VINFO_LIVE_P (stmt_info))
> > > {
> > > @@ -11816,6 +11855,9 @@ vect_is_simple_use (tree operand, vec_info
> > *vinfo, enum vect_def_type *dt,
> > > case vect_first_order_recurrence:
> > > dump_printf (MSG_NOTE, "first order recurrence\n");
> > > break;
> > > + case vect_early_exit_def:
> > > + dump_printf (MSG_NOTE, "early exit\n");
> > > + break;
> > > case vect_unknown_def_type:
> > > dump_printf (MSG_NOTE, "unknown\n");
> > > break;
> > > @@ -12486,6 +12528,8 @@ vect_get_vector_types_for_stmt (vec_info
> > *vinfo, stmt_vec_info stmt_info,
> > > *nunits_vectype_out = NULL_TREE;
> > >
> > > if (gimple_get_lhs (stmt) == NULL_TREE
> > > + /* Allow vector conditionals through here. */
> > > + && !is_ctrl_stmt (stmt)
> > > /* MASK_STORE has no lhs, but is ok. */
> > > && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > > {
> > > @@ -12502,7 +12546,7 @@ vect_get_vector_types_for_stmt (vec_info
> > *vinfo, stmt_vec_info stmt_info,
> > > }
> > >
> > > return opt_result::failure_at (stmt,
> > > - "not vectorized: irregular stmt.%G", stmt);
> > > + "not vectorized: irregular stmt: %G", stmt);
> > > }
> > >
> > > tree vectype;
> > > @@ -12531,6 +12575,8 @@ vect_get_vector_types_for_stmt (vec_info
> > *vinfo, stmt_vec_info stmt_info,
> > > scalar_type = TREE_TYPE (DR_REF (dr));
> > > else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > > scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > > + else if (is_ctrl_stmt (stmt))
> > > + scalar_type = TREE_TYPE (gimple_cond_rhs (stmt));
> > > else
> > > scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> > >
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index
> >
> 016961da8510ca7dd2d07e716cbe35623ed2d9a5..edbb7228d3aae29b6f51fdab
> > 284f49ac57c6612d 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > > vect_internal_def,
> > > vect_induction_def,
> > > vect_reduction_def,
> > > + vect_early_exit_def,
> > > vect_double_reduction_def,
> > > vect_nested_cycle,
> > > vect_first_order_recurrence,
> > > @@ -836,6 +837,10 @@ public:
> > > we need to peel off iterations at the end to form an epilogue loop. */
> > > bool peeling_for_niter;
> > >
> > > + /* When the loop has early breaks that we can vectorize we need to
> peel
> > > + the loop for the break finding loop. */
> > > + bool early_breaks;
> > > +
> > > /* True if there are no loop carried data dependencies in the loop.
> > > If loop->safelen <= 1, then this is always true, either the loop
> > > didn't have any loop carried data dependencies, or the loop is being
> > > @@ -921,6 +926,7 @@ public:
> > > #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains
> > > #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps
> > > #define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter
> > > +#define LOOP_VINFO_EARLY_BREAKS(L) (L)->early_breaks
> > > #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)-
> > >no_data_dependencies
> > > #define LOOP_VINFO_SCALAR_LOOP(L) (L)->scalar_loop
> > > #define LOOP_VINFO_SCALAR_LOOP_SCALING(L) (L)-
> > >scalar_loop_scaling
> > > @@ -970,7 +976,7 @@ public:
> > > typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > >
> > > static inline loop_vec_info
> > > -loop_vec_info_for_loop (class loop *loop)
> > > +loop_vec_info_for_loop (const class loop *loop)
> > > {
> > > return (loop_vec_info) loop->aux;
> > > }
> > > @@ -2107,7 +2113,7 @@ class auto_purge_vect_location
> > > in tree-vect-loop-manip.cc. */
> > > extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > tree, tree, tree, bool);
> > > -extern bool slpeel_can_duplicate_loop_p (const class loop *,
> > const_edge);
> > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info,
> > const_edge);
> > > class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > class loop *, edge);
> > > class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > > @@ -2306,6 +2312,7 @@ struct vect_loop_form_info
> > > tree number_of_iterations;
> > > tree number_of_iterationsm1;
> > > tree assumptions;
> > > + vec<gcond *> conds;
> > > gcond *loop_cond;
> > > gcond *inner_loop_cond;
> > > };
> > > @@ -2326,6 +2333,9 @@ extern bool vectorizable_induction
> > (loop_vec_info, stmt_vec_info,
> > > extern bool vect_transform_reduction (loop_vec_info, stmt_vec_info,
> > > gimple_stmt_iterator *,
> > > gimple **, slp_tree);
> > > +extern bool vect_transform_early_break (loop_vec_info,
> stmt_vec_info,
> > > + gimple_stmt_iterator *,
> > > + gimple **, slp_tree);
> > > extern bool vect_transform_cycle_phi (loop_vec_info, stmt_vec_info,
> > > gimple **,
> > > slp_tree, slp_instance);
> > > @@ -2335,6 +2345,11 @@ extern bool vectorizable_phi (vec_info *,
> > stmt_vec_info, gimple **, slp_tree,
> > > stmt_vector_for_cost *);
> > > extern bool vectorizable_recurr (loop_vec_info, stmt_vec_info,
> > > gimple **, slp_tree, stmt_vector_for_cost
> > *);
> > > +extern bool vectorizable_early_exit (vec_info *, stmt_vec_info,
> > > + slp_tree, slp_instance,
> > > + stmt_vector_for_cost *);
> > > +extern tree prepare_vec_mask (loop_vec_info, tree, tree,
> > > + tree, gimple_stmt_iterator *);
> > > extern bool vect_emulated_vector_p (tree);
> > > extern bool vect_can_vectorize_without_simd_p (tree_code);
> > > extern bool vect_can_vectorize_without_simd_p (code_helper);
> > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > index
> >
> 6ec49511d74bd2e0e5dd51823a6c41180f08716c..4aa46c7c0d8235d3b783ce930
> > e5df3480e1b3ef9 100644
> > > --- a/gcc/tree-vectorizer.cc
> > > +++ b/gcc/tree-vectorizer.cc
> > > @@ -1382,7 +1382,9 @@ pass_vectorize::execute (function *fun)
> > > predicates that need to be shared for optimal predicate usage.
> > > However reassoc will re-order them and prevent CSE from working
> > > as it should. CSE only the loop body, not the entry. */
> > > - bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > + auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > + for (edge exit : exits)
> > > + bitmap_set_bit (exit_bbs, exit->dest->index);
> > >
> > > edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > > do_rpo_vn (fun, entry, exit_bbs);
> > >
> > >
> > >
> > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > Nuernberg,
> > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien
> > Moerman;
> > HRB 36809 (AG Nuernberg)
2022-11-02 14:45 Tamar Christina
2022-11-02 14:46 ` [PATCH 2/2]AArch64 Add implementation for vector cbranch Tamar Christina
2022-11-02 21:50 ` [PATCH 1/2]middle-end: Support early break/return auto-vectorization Bernhard Reutner-Fischer
2022-11-02 22:32 ` Jeff Law
2022-11-03 8:51 ` Tamar Christina
2022-11-08 17:36 ` Tamar Christina
2022-11-15 11:11 ` Tamar Christina
2022-11-16 12:17 ` Richard Biener
2022-11-16 18:52 ` Jeff Law
2022-11-18 15:04 ` Richard Biener
2022-11-18 18:23 ` Tamar Christina
2022-11-19 10:49 ` Tamar Christina [this message]
2022-11-24 9:02 ` Richard Biener
2022-11-24 11:56 ` Tamar Christina
2022-11-25 9:33 ` Richard Biener
2022-11-25 10:32 ` Tamar Christina
2022-12-13 15:01 ` Tamar Christina
2022-12-14 9:41 ` Richard Biener