public inbox for gcc-patches@gcc.gnu.org
From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, Richard Sandiford <Richard.Sandiford@arm.com>
Subject: RE: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.
Date: Fri, 25 Nov 2022 10:32:00 +0000
Message-ID: <VI1PR08MB5325E9813211094CF6007080FF0E9@VI1PR08MB5325.eurprd08.prod.outlook.com>
In-Reply-To: <nycvar.YFH.7.77.849.2211250915240.7009@jbgna.fhfr.qr>

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, November 25, 2022 9:33 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: RE: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.
> 
> On Thu, 24 Nov 2022, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Thursday, November 24, 2022 9:03 AM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> > > <Richard.Sandiford@arm.com>
> > > Subject: RE: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.
> > >
> > > On Fri, 18 Nov 2022, Tamar Christina wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Friday, November 18, 2022 3:04 PM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> > > > > <Richard.Sandiford@arm.com>
> > > > > Subject: Re: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.
> > > > >
> > > > > On Wed, 2 Nov 2022, Tamar Christina wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > This patch adds initial support for early break vectorization in GCC.
> > > > > > The support is added for any target that implements a vector
> > > > > > cbranch optab.
> > > > >
> > > > > I'm looking at this now, first some high-level questions.
> > > > >
> > > > > Why do we need a new cbranch optab?  It seems implementing
> > > > > a vector comparison and mask test against zero suffices?
> > > >
> > > > It doesn't define a new optab; it's just using the existing cbranch optab to
> > > > check that the target can handle a vector comparison with 0 in a branch
> > > > statement.
> > > >
> > > > Note that it doesn't generate a call to this optab; GIMPLE expansion already
> > > > will.
> > >
> > > Ah, OK.  I see expansion of if (vector != vector) goes the cbranch way
> > > only.
> > >
> > > > The reason I don't check against just comparison with 0 and equality is that,
> > > > typically speaking, a vector comparison with 0 is not expected to set a flag,
> > > > i.e. it typically results in just a vector of Booleans.
> > >
> > > On x86 SSE 4.1 ptest just sets flags though.
> >
> > Yeah, so does SVE. What I meant is that because the vector compare already
> > sets flags, when we generate the ptest from the cbranch we eliminate it later
> > during RTL if it's redundant, which is why the SVE code in my example doesn't
> > have a ptest anymore.  But it's only redundant if the check is against 0, or if
> > the size of the lanes of the mask vector matches that of the comparison.
> >
> > But this is quite SVE-specific, so instead of encoding it in GIMPLE, we do so
> > in RTL and optimize.
> >
> > > > A vector compare with 0 in a branch will be lowered to cbranch today, so I
> > > > just use the optab to see that the target can handle this branching and leave
> > > > it up to the target to implement it however it decides.
> > > >
> > > > The alternative would require me (I think) to reduce to a scalar for the
> > > > equality check as you mentioned, but such codegen would be worse for targets
> > > > like SVE, which have native support for this operation.  We'd have to undo the
> > > > reduction during RTL.
> > > >
> > > > Even for targets like NEON, we'd have to replace the reduction code, because we
> > > > can generate better code by doing the reduction using pairwise instructions.
> > > >
> > > > These kinds of differences are already handled by cbranch_optab today, so it
> > > > seemed better to just reuse it.
> > >
> > > Yes, agreed.
> > >
> > > Btw, the C vector extension doesn't allow if (vector != vector) and
> > > vector lowering doesn't support lowering that if the target doesn't
> > > support it (instead we ICE).
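For comparison, this is roughly what the "reduce to a scalar" alternative looks like when written by hand with GCC's generic vector extension, given that `if (vector != vector)` isn't accepted. A minimal sketch, assuming 4 x 32-bit lanes; `any_gt` and the typedefs are illustrative names, not GCC internals. The vector compare produces a -1/0 lane mask, and a pairwise OR tree reduces it to a scalar, similar in spirit to the pairwise `umaxp` reduction in the NEON codegen from the original patch mail.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed: GCC generic vector types with 4 x 32-bit lanes (128 bits, as on NEON).  */
typedef unsigned v4u __attribute__ ((vector_size (16)));
typedef int v4i __attribute__ ((vector_size (16)));

/* Hypothetical helper: any (a[0:4] > x).  The vector compare yields a lane
   mask (-1 or 0 per lane); since the extension won't branch on a vector,
   the mask is reduced to a scalar with a pairwise OR tree.  */
static bool any_gt (const unsigned *a, unsigned x)
{
  v4u va;
  memcpy (&va, a, sizeof va);   /* load 4 lanes without assuming alignment */
  v4u vx = { x, x, x, x };      /* broadcast x */
  v4i mask = va > vx;           /* unsigned lane-wise compare */
  int lo = mask[0] | mask[1];   /* first pairwise step */
  int hi = mask[2] | mask[3];
  return (lo | hi) != 0;        /* second step: scalar "any" */
}
```

On a target with a vector cbranch pattern, the point made above is that the compiler can pick the best reduction itself (or, as on SVE, none at all) instead of being handed this fixed sequence.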
> > >
> > > > >
> > > > > You have some elaborate explanation on how peeling works but I
> > > > > somewhat miss the high-level idea how to vectorize the early
> > > > > exit.  I've applied the patches and from looking at how
> > > > > vect-early-break_1.c gets transformed on aarch64 it seems you
> > > > > vectorize
> > > > >
> > > > >  for (int i = 0; i < N; i++)
> > > > >  {
> > > > >    vect_b[i] = x + i;
> > > > >    if (vect_a[i] > x)
> > > > >      break;
> > > > >    vect_a[i] = x;
> > > > >  }
> > > > >
> > > > > as
> > > > >
> > > > >  int i = 0;
> > > > >  for (; i < N;)
> > > > >  {
> > > > >    if (any (vect_a[i] > x))
> > > > >      break;
> > > > >    vect_b[i] = x + i;
> > > > >    vect_a[i] = x;
> > > > >    i += VF;
> > > > >  }
> > > > >  for (; i < N; i++)
> > > > >  {
> > > > >    vect_b[i] = x + i;
> > > > >    if (vect_a[i] > x)
> > > > >      break;
> > > > >    vect_a[i] = x;
> > > > >  }
> > > > >
> > > > > As you outline below, this requires that the side-effects done as part
> > > > > of <statements1> and <condition> before exiting can be moved after the
> > > > > exit; basically you need to be able to compute whether any scalar
> > > > > iteration covered by a vector iteration will exit the loop early.
> > > > > Code-generation-wise you'd simply "ignore" code generating early exits
> > > > > at the place they appear in the scalar code and instead emit them
> > > > > vectorized in the loop header.
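To make that transformation concrete, here is a plain-C emulation of it with an assumed VF of 4; inner loops over `k` stand in for vector lanes, and none of this is vectorizer code. The exit test for the whole block runs first, the side effects are sunk below it, and a block in which any lane would exit (plus the N % VF tail) is redone by the scalar epilogue. This is only equivalent because `vect_a` and `vect_b` don't alias, which matches the patch's restriction on stores in <statements1>.

```c
#include <stdbool.h>

#define VF 4   /* assumed vectorization factor */

/* Scalar reference: the loop from vect-early-break_1.c as quoted above.  */
static void scalar_loop (unsigned *vect_a, unsigned *vect_b, int n, unsigned x)
{
  for (int i = 0; i < n; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        break;
      vect_a[i] = x;
    }
}

/* Emulated vectorized form: exit test first, sunk side effects, scalar
   epilogue for the exiting block and for the n % VF tail.  */
static void vectorized_loop (unsigned *vect_a, unsigned *vect_b, int n, unsigned x)
{
  int i = 0;
  for (; i + VF <= n; i += VF)
    {
      bool any = false;                /* any (vect_a[i:i+VF] > x) */
      for (int k = 0; k < VF; k++)
        any |= vect_a[i + k] > x;
      if (any)
        break;                         /* leave this block to the scalar loop */
      for (int k = 0; k < VF; k++)     /* side effects moved after the exit */
        {
          vect_b[i + k] = x + (i + k);
          vect_a[i + k] = x;
        }
    }
  for (; i < n; i++)                   /* scalar epilogue: original body */
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        break;
      vect_a[i] = x;
    }
}
```

When lane 6 of a 10-element array triggers the exit, the vector loop completes block 0..3, bails out on block 4..7, and the epilogue redoes iterations 4, 5 and 6, leaving both arrays exactly as the scalar loop does.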
> > > >
> > > > Indeed, this is how it's handled today.  For fully masked loops we can do
> > > > better, and that would be a future expansion, but this codegen is simpler to
> > > > support today and is beneficial to all targets.
> > > >
> > > > It also has the benefit that complicated reductions we don't support today
> > > > don't abort vectorization, because we just punt to scalar. E.g. today we bail
> > > > out on:
> > > >
> > > > if (a[i] > x)
> > > > {
> > > >  b = a[i];
> > > >  c = i;
> > > > }
> > > >
> > > > But
> > > >
> > > > if (a[i] > x)
> > > > {
> > > >  b = a[i];
> > > >  c = i;
> > > >  break;
> > > > }
> > > >
> > > > Works fine.  For fully masked loops, Richard's design with multiple rgroups
> > > > would allow us to handle these things better without the scalar loop, should
> > > > we want to in the future. The current design doesn't prohibit that choice.
> > > >
> > > > >
> > > > > > Concretely the kind of loops supported are of the forms:
> > > > > >
> > > > > >  for (int i = 0; i < N; i++)
> > > > > >  {
> > > > > >    <statements1>
> > > > > >    if (<condition>)
> > > > > >      <action>;
> > > > > >    <statements2>
> > > > > >  }
> > > > > >
> > > > > > where <action> can be:
> > > > > >  - break
> > > > > >  - return
> > > > > >
> > > > > > Any number of statements can be used before the <action> occurs.
> > > > > >
> > > > > > Since this is an initial version for GCC 13, it has the following limitations:
> > > > > >
> > > > > > - Only fixed-size iterations and buffers are supported.  That is to say, any
> > > > > >   vectors loaded or stored must be to statically allocated arrays with known
> > > > > >   sizes. N must also be known.
> > > > >
> > > > > Why?
> > > >
> > > > It's not an intrinsic limitation, just one made for practicality and to keep
> > > > the patch simpler.  These cases were most of the cases that we wanted.
> > > >
> > > > Supporting this requires adding support for multiple exits to all the different
> > > > peeling and versioning code at once, which would be a much bigger patch.
> > > >
> > > > Additionally, for SVE (the main target of the codegen change) we'd want to do
> > > > this using first-faulting loads, but that depends on other things we must
> > > > support, both in GIMPLE itself and in the vectorizer, before we can do this.
> > >
> > > But you do support epilogue peeling if the statically known N isn't
> > > divisible by the VF.  So I fail to see how a non-constant N fails to work?
> >
> > So today the vectorizer itself doesn't support alignment peeling for VLA code.
> > Take for instance:
> >
> > void
> > f (unsigned short *x, int n)
> > {
> >   for (int i = 0; i < n; ++i)
> >     x[i] += x[i - 16];
> > }
> >
> > And if you compile for a target that requires strict alignment, e.g.:
> >
> > -O3 -march=armv9-a -mtune=thunderx
> >
> > You get NEON code because it can't do the peeling.  We do, however, support it
> > for VLS code, i.e. -O3 -march=armv9-a -msve-vector-bits=512 -mtune=thunderx.
> > The alignment will happen, though not through peeling but by masking the first
> > iteration of the loop with a predicate that does the alignment.
> >
> > The reason this isn't a big problem for VLS code is that SVE cores typically
> > don't have a misalignment penalty for loads.
> >
> > Which brings us to..
> >
> > > Or maybe I misunderstood and the requirement is that there is at least
> > > one counting IV we can compute number of iterations for?
> > >
> >
> > Correct, so for SVE today only counting loops are safe, since you can statically
> > compute the mask for the final iteration so you don't fault by overreading on
> > the final read.
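A small sketch of the two predicates discussed here, modelled as bitmasks (lane k = bit k, assuming at most 32 lanes); the function names are illustrative only. For a counting loop, the last iteration's predicate depends only on `i` and `n`, roughly what SVE's WHILELO computes, so no lane reads at or past `n`. The VLS alignment trick is sketched as one plausible form: start the first access at the aligned address just below the data and switch off the first `skip` lanes.

```c
#include <stdint.h>

/* Final-iteration predicate of a counting loop: lane k is active iff
   i + k < n.  Assumes vl <= 32.  */
static uint32_t final_iter_mask (long i, long n, int vl)
{
  uint32_t m = 0;
  for (int k = 0; k < vl; k++)
    if (i + k < n)
      m |= 1u << k;
  return m;
}

/* Alignment by masking instead of peeling: the first vector access starts
   SKIP elements below the data (SKIP = misalignment in elements), so those
   lanes are disabled.  Assumes vl <= 32.  */
static uint32_t first_iter_mask (int skip, int vl)
{
  uint32_t m = 0;
  for (int k = skip; k < vl; k++)
    m |= 1u << k;
  return m;
}
```

With n = 803 and 4 lanes, the iteration starting at 800 gets mask 0b0111, so lane 3 (element 803) is never loaded; a non-counting loop has no such `n` to compare against, which is where first-faulting loads would come in.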
> >
> > For non-counting loops we have a separate mechanism to handle this (First
> > Faulting Loads) but we don't support this today in the vectorizer either.
> >
> > Now as for this patch, the only changes that would be needed are in how we
> > update the IVs in the guard between the two loops.  So this would be easy to
> > support when we do add SVE support for it, and we'd add support for VLA and
> > non-VLA at the same time.  Today the main target for this work is SVE, though
> > we wanted to do it in a way that would also benefit non-fully-masked targets.
> >
> > > > >
> > > > > > - any stores in <statements1> should not be to the same objects as in
> > > > > >   <condition>.  Loads are fine as long as they cannot alias.
> > > > >
> > > > > I think that's a fundamental limitation: you have to be able to compute
> > > > > the early exit condition at the beginning of the vectorized loop.  For
> > > > > a single alternate exit it might be possible to apply loop rotation to
> > > > > move things, but that can introduce "bad" cross-iteration dependences(?)
> > > > >
> > > >
> > > > That's an interesting idea; I'd have to work it out on paper.  I guess the main
> > > > difficulty compared to, say, classical loop rotation is that the condition
> > > > inside the early break statement can itself depend on other statements, so
> > > > you still have to move a "chain" of statements which themselves still need
> > > > to be vectorized.
> > > >
> > > > Where it gets difficult, and partially why I also only support 1 early exit in
> > > > this first version, is that a second exit has a dependency on the 1st one, and
> > > > there may be other statements between the first and second exit.  This is
> > > > where I think loop rotation would fall apart vs the code motion I'm doing now.
> > >
> > > Possibly. I didn't fully work out how loop rotation could help, but note
> > > that we already apply loop rotation via loop header copying, which for
> > > multiple-exit loops might select the "wrong" exit; you might want to
> > > check whether it makes sense, at least for pass_ch_vect, to arrange for
> > > the counting IV to be the loop-controlling one.  That might already do
> > > the trick of the loop rotation (I just checked: the pass_ch_vect pass
> > > only processes single-exit loops).
> >
> > That's a fair point.  I'll have a look and see if I need any modifications there
> > or if the new logic during loop analysis in the vectorizer suffices (particularly
> > if pass_ch_vect is bailing out). I guess the concern here is that if you have
> > multiple IVs and one is non-counting, having the non-counting one be the main
> > one could be problematic.
> >
> > > > > > - No support for prologue peeling.  Since we only support fixed buffers this
> > > > > >   wouldn't be an issue, as we assume the arrays are correctly aligned.
> > > > >
> > > > > Huh, I don't understand how prologue or epilogue peeling is an issue?  Is
> > > > > that just because you didn't handle the early exit triggering?
> > > >
> > > > Yeah, it's not an intrinsic limitation, and the code implemented doesn't have
> > > > anything that would prevent this from happening in the future.  It's just
> > > > something we didn't require for the current use-cases.
> > > >
> > > > To support this we'd "just" need to support prologue peeling by branching to
> > > > the exit block, but we'd have to split the exit block so we keep simple
> > > > two-argument phi nodes for each peeled iteration, i.e. I don't think they can
> > > > all exit to the same block (do we support phi nodes with N entries?), as I
> > > > don't think we'd be able to handle that reduction.
> > >
> > > Yes, PHI nodes can have an arbitrary number of incoming edges.
> >
> > Aha, that's good to know!
> >
> > >
> > > > So I know how to potentially do it,
> > > > and kept it in mind in the implementation, but just for practicality/time
> > > > did not do it at this time.
> > > >
> > > > >
> > > > > > - Fully masked loops or unmasked loops are supported, but not partially
> > > > > >   masked loops.
> > > > > > - Only one additional exit is supported at this time.  The majority of the
> > > > > >   code will handle n exits, but not all, so at this time this restriction
> > > > > >   is needed.
> > > > > > - The early exit must be before the natural loop exit/latch.  The vectorizer
> > > > > >   is designed in a way to propagate phi-nodes downwards.  As such,
> > > > > >   supporting this inverted control flow is hard.
> > > > >
> > > > > How do you identify the "natural" exit?  It's the one
> > > > > number_of_iterations_exit works on?  Your normal_exit picks the
> > > > > first from the loops recorded exit list but I don't think that list
> > > > > is ordered in any particular way.
> > > >
> > > > Ah, I thought it was, since during the loop analysis it's always the first exit.
> > > > But I can easily update the patch to determine that in a smarter way.
> > >
> > > That would be appreciated.
> > >
> > > > >
> > > > > "normal_exit" would rather be single_countable_exit () or so?  A loop
> > > > > already has a list of control_ivs (not sure if we ever have more than
> > > > > one), I wonder if that can be annotated with the corresponding exit
> > > > > edge?
> > > > >
> > > > > I think that vect_analyze_loop_form should record the counting IV
> > > > > exit edge and that recorded edge should be passed to utilities
> > > > > like slpeel_can_duplicate_loop_p rather than re-querying 'normal_exit',
> > > > > for example if we'd have
> > > > >
> > > > > for (;; ++i, ++j)
> > > > >   {
> > > > >     if (i >= n)
> > > > >       break;
> > > > >     a[i] = 0;
> > > > >     if (j >= m)
> > > > >       break;
> > > > >   }
> > > > >
> > > > > which counting IV we choose as "normal" should be up to the vectorizer,
> > > > > not up to the loop infrastructure.
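A small illustration of why both exits in that loop matter, reading the exit tests as `i >= n` and `j >= m`: each exit is counting, the trip count is decided by whichever fires first, and either IV could serve as the controlling one. This sketch (hypothetical function names) compares a direct simulation with a closed form for the number of `a[i] = 0` stores.

```c
/* Exits read as:  if (i >= n) break;  a[i] = 0;  if (j >= m) break;  */

/* Count the a[i] = 0 stores by running the loop directly.  */
static long stores_simulated (long i, long j, long n, long m)
{
  long stores = 0;
  for (;; ++i, ++j)
    {
      if (i >= n)
        break;
      stores++;                 /* stands in for a[i] = 0 */
      if (j >= m)
        break;
    }
  return stores;
}

/* Closed form: k1/k2 is the first iteration at which each exit test is
   true.  Exit 1 leaves before that iteration's store, exit 2 after it,
   and exit 1 is tested first, so it wins a tie.  */
static long stores_closed_form (long i0, long j0, long n, long m)
{
  long k1 = n > i0 ? n - i0 : 0;
  long k2 = m > j0 ? m - j0 : 0;
  return k1 <= k2 ? k1 : k2 + 1;
}
```

The closed form needs both IVs, so whichever one the vectorizer records as "normal", the other still has to be treated as an additional exit.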
> > > >
> > > > Ah, that's a fair enough point and easy enough to do.
> > > >
> > > > >
> > > > > The patch should likely be split, doing single_exit () replacements
> > > > > with, say, LOOP_VINFO_IV_EXIT (..) first.
> > > > >
> > > >
> > > > Ok, I'll start doing that now while waiting for the full review.
> > > >
> > > > >
> > > > > > - No support for epilogue vectorization.  The only epilogue supported is the
> > > > > >   final scalar one.
> > > > > >
> > > > > > With the help of IPA this still gets hit quite often.  During bootstrap it
> > > > > > hit rather frequently as well.
> > > > > >
> > > > > > This implementation does not support completely handling the early break
> > > > > > inside the vector loop itself, but instead adds checks such that if we know
> > > > > > that we have to exit in the current iteration, we branch to scalar code to
> > > > > > actually do the final VF iterations, which handles all the code in <action>.
> > > > > >
> > > > > > niters analysis and the majority of the vectorizer with hardcoded single_exit
> > > > > > have been updated to use a new function, normal_exit, which returns the
> > > > > > loop's natural exit.
> > > > > >
> > > > > > For niters, the natural exit is still what determines the overall iterations,
> > > > > > as that is the O(iters) for the loop.
> > > > > >
> > > > > > For the scalar loop we know that whatever exit you take, you have to perform
> > > > > > at most VF iterations.
> > > > > >
> > > > > > When the loop is peeled during the copying, I have to go to great lengths to
> > > > > > keep the dominators up to date.  All exits from the first loop are rewired to
> > > > > > the loop header of the second loop, but this can change the immediate
> > > > > > dominator.
> > > > > Not sure how; it would probably help to keep the original scalar loop
> > > > > as the epilogue and instead emit the vector loop as a copy on that loop's
> > > > > entry edge, so wiring the alternate exits to that very same place is
> > > > > trivial?
> > > >
> > > > Hmm, yes, flipping the loop wiring would simplify the dominators. I did it this
> > > > way because that's the direction normal epilogue peeling takes today, but
> > > > looking at the code this should be easy to do.
> > > >
> > > > I'll also start on this now.
> > >
> > > Note there's also iterate_fix_dominators used in some CFG infrastructure
> > > to fixup the dominator tree after complex transforms.
> > >
> >
> > Oh, that's a handy one. I'll take a look at how it works, but we may not need it
> > anymore after flipping the loops.  I did at points think it might be easier to
> > just recalculate this, since the number of BBs in the loop that affect
> > dominators isn't typically that large?
> 
> I suppose so in practice.  I guess I'd prefer a solution using
> iterate_fix_dominators, flipping the loops might have issues with other
> code (I'm not sure) - guess you'll figure out.

Ack, will use this then 😊

> 
> > > > >
> > > > > > We had spoken on IRC about removing the dominators validation call at the
> > > > > > end of slpeel_tree_duplicate_loop_to_edge_cfg and leaving it up to cfg
> > > > > > cleanup to remove the intermediate blocks that cause the dominators to fail.
> > > > > >
> > > > > > However, this turned out not to work, as cfgcleanup itself requires the
> > > > > > dominators graph, so it's somewhat chicken and egg.  To work around this I
> > > > > > added some rules for when I update which dominator, and I also reject the
> > > > > > forms I don't support during vect_analyze_loop_form.
> > > > > >
> > > > > > I have tried to structure the updates to loop-manip.cc in a way that fits
> > > > > > with the current flow.  I think I have done a decent job, but there are things
> > > > > > I can also do differently if preferred, and I have pointed them out in
> > > > > > comments in the source.
> > > > > >
> > > > > > For the loop peeling we rewrite the loop form:
> > > > > >
> > > > > >
> > > > > >                      Header
> > > > > >                       ---
> > > > > >                       |x|
> > > > > >                        2
> > > > > >                        |
> > > > > >                        v
> > > > > >                 -------3<------
> > > > > >      early exit |      |      |
> > > > > >                 v      v      | latch
> > > > > >                 7      4----->6
> > > > > >                 |      |
> > > > > >                 |      v
> > > > > >                 |      8
> > > > > >                 |      |
> > > > > >                 |      v
> > > > > >                 ------>5
> > > > > >
> > > > > > into
> > > > > >
> > > > > >                      Header
> > > > > >                       ---
> > > > > >                       |x|
> > > > > >                        2
> > > > > >                        |
> > > > > >                        v
> > > > > >                 -------3<------
> > > > > >      early exit |      |      |
> > > > > >                 v      v      | latch
> > > > > >                 7      4----->6
> > > > > >                 |      |
> > > > > >                 |      v
> > > > > >                 |      8
> > > > > >                 |      |
> > > > > >                 |      v
> > > > > >                 |  New Header
> > > > > >                 |     ---
> > > > > >                 ----->|x|
> > > > > >                        9
> > > > > >                        |
> > > > > >                        v
> > > > > >                 ------10<-----
> > > > > >      early exit |      |      |
> > > > > >                 v      v      | latch
> > > > > >                 14     11---->13
> > > > > >                 |      |
> > > > > >                 |      v
> > > > > >                 |      12
> > > > > >                 |      |
> > > > > >                 |      v
> > > > > >                 ------> 5
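Since recomputing dominators for the handful of blocks involved came up in this thread, here is a tiny sketch that does exactly that for the two CFGs in the diagrams, using the classic iterative dataflow formulation (dom(b) = {b} plus the intersection of dom(p) over predecessors p). Block 2 is treated as the entry; the edge lists are read off the diagrams, and GCC's own `iterate_fix_dominators` works differently and is not modelled here. It shows that the rewiring makes block 3 the immediate dominator of block 9 ("New Header") and moves idom(5) from 3 to 10.

```c
#include <stdint.h>

#define MAX_BB 16          /* block numbers 0..15; only those in the diagram are used */
#define ALL_BBS 0xFFFFu    /* bitmask with all 16 blocks set */

struct edge { int src, dst; };

/* Original loop: 3 is the header, 6 the latch, 7 the early exit, 5 the final exit.  */
static const struct edge cfg_before[] = {
  {2, 3}, {3, 7}, {3, 4}, {4, 6}, {6, 3}, {4, 8}, {7, 5}, {8, 5}
};

/* After peeling: both exits of the first loop go to block 9 ("New Header").  */
static const struct edge cfg_after[] = {
  {2, 3}, {3, 7}, {3, 4}, {4, 6}, {6, 3}, {4, 8}, {7, 9}, {8, 9},
  {9, 10}, {10, 14}, {10, 11}, {11, 13}, {13, 10}, {11, 12}, {12, 5}, {14, 5}
};

/* Iterate dom(b) = {b} U (intersection of dom(p) over preds p) to a fixpoint.  */
static void compute_doms (const struct edge *e, int n_edges, int entry,
                          uint32_t dom[MAX_BB])
{
  for (int b = 0; b < MAX_BB; b++)
    dom[b] = b == entry ? 1u << entry : ALL_BBS;
  for (int changed = 1; changed;)
    {
      changed = 0;
      for (int b = 0; b < MAX_BB; b++)
        {
          if (b == entry)
            continue;
          uint32_t meet = ALL_BBS;
          int has_pred = 0;
          for (int k = 0; k < n_edges; k++)
            if (e[k].dst == b)
              {
                meet &= dom[e[k].src];
                has_pred = 1;
              }
          if (!has_pred)
            continue;                  /* block not present in this CFG */
          uint32_t nd = meet | 1u << b;
          if (nd != dom[b])
            {
              dom[b] = nd;
              changed = 1;
            }
        }
    }
}

/* The immediate dominator is the strict dominator whose own dominator set
   equals dom(b) minus b.  */
static int idom (const uint32_t dom[MAX_BB], int b)
{
  uint32_t strict = dom[b] & ~(1u << b);
  for (int d = 0; d < MAX_BB; d++)
    if ((strict >> d & 1u) && dom[d] == strict)
      return d;
  return -1;
}
```

For a loop this small a full recomputation is cheap, which is the intuition behind "just recalculate"; the incremental approach matters when the surrounding function is large.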
> > > > > >
> > > > > > When we vectorize, we move any statement not related to the early break
> > > > > > itself to the BB after the early exit and update all references as
> > > > > > appropriate.
> > > > > >
> > > > > > This means that we check at the start of each iteration whether we are going
> > > > > > to exit or not.  During the analysis phase we check whether we are allowed to
> > > > > > do this moving of statements.  Also note that we only move the vector
> > > > > > statements and leave the scalars alone.
> > > > > >
> > > > > > Codegen:
> > > > > >
> > > > > > for e.g.
> > > > > >
> > > > > > #define N 803
> > > > > > unsigned vect_a[N];
> > > > > > unsigned vect_b[N];
> > > > > >
> > > > > > unsigned test4(unsigned x)
> > > > > > {
> > > > > >  unsigned ret = 0;
> > > > > >  for (int i = 0; i < N; i++)
> > > > > >  {
> > > > > >    vect_b[i] = x + i;
> > > > > >    if (vect_a[i] > x)
> > > > > >      break;
> > > > > >    vect_a[i] = x;
> > > > > >
> > > > > >  }
> > > > > >  return ret;
> > > > > > }
> > > > > >
> > > > > > We generate for NEON:
> > > > > >
> > > > > > test4:
> > > > > >         adrp    x2, .LC0
> > > > > >         adrp    x3, .LANCHOR0
> > > > > >         dup     v2.4s, w0
> > > > > >         add     x3, x3, :lo12:.LANCHOR0
> > > > > >         movi    v4.4s, 0x4
> > > > > >         add     x4, x3, 3216
> > > > > >         ldr     q1, [x2, #:lo12:.LC0]
> > > > > >         mov     x1, 0
> > > > > >         mov     w2, 0
> > > > > >         .p2align 3,,7
> > > > > > .L3:
> > > > > >         ldr     q0, [x3, x1]
> > > > > >         add     v3.4s, v1.4s, v2.4s
> > > > > >         add     v1.4s, v1.4s, v4.4s
> > > > > >         cmhi    v0.4s, v0.4s, v2.4s
> > > > > >         umaxp   v0.4s, v0.4s, v0.4s
> > > > > >         fmov    x5, d0
> > > > > >         cbnz    x5, .L6
> > > > > >         add     w2, w2, 1
> > > > > >         str     q3, [x1, x4]
> > > > > >         str     q2, [x3, x1]
> > > > > >         add     x1, x1, 16
> > > > > >         cmp     w2, 200
> > > > > >         bne     .L3
> > > > > >         mov     w7, 3
> > > > > > .L2:
> > > > > >         lsl     w2, w2, 2
> > > > > >         add     x5, x3, 3216
> > > > > >         add     w6, w2, w0
> > > > > >         sxtw    x4, w2
> > > > > >         ldr     w1, [x3, x4, lsl 2]
> > > > > >         str     w6, [x5, x4, lsl 2]
> > > > > >         cmp     w0, w1
> > > > > >         bcc     .L4
> > > > > >         add     w1, w2, 1
> > > > > >         str     w0, [x3, x4, lsl 2]
> > > > > >         add     w6, w1, w0
> > > > > >         sxtw    x1, w1
> > > > > >         ldr     w4, [x3, x1, lsl 2]
> > > > > >         str     w6, [x5, x1, lsl 2]
> > > > > >         cmp     w0, w4
> > > > > >         bcc     .L4
> > > > > >         add     w4, w2, 2
> > > > > >         str     w0, [x3, x1, lsl 2]
> > > > > >         sxtw    x1, w4
> > > > > >         add     w6, w1, w0
> > > > > >         ldr     w4, [x3, x1, lsl 2]
> > > > > >         str     w6, [x5, x1, lsl 2]
> > > > > >         cmp     w0, w4
> > > > > >         bcc     .L4
> > > > > >         str     w0, [x3, x1, lsl 2]
> > > > > >         add     w2, w2, 3
> > > > > >         cmp     w7, 3
> > > > > >         beq     .L4
> > > > > >         sxtw    x1, w2
> > > > > >         add     w2, w2, w0
> > > > > >         ldr     w4, [x3, x1, lsl 2]
> > > > > >         str     w2, [x5, x1, lsl 2]
> > > > > >         cmp     w0, w4
> > > > > >         bcc     .L4
> > > > > >         str     w0, [x3, x1, lsl 2]
> > > > > > .L4:
> > > > > >         mov     w0, 0
> > > > > >         ret
> > > > > >         .p2align 2,,3
> > > > > > .L6:
> > > > > >         mov     w7, 4
> > > > > >         b       .L2
> > > > > >
> > > > > > and for SVE:
> > > > > >
> > > > > > test4:
> > > > > >         adrp    x2, .LANCHOR0
> > > > > >         add     x2, x2, :lo12:.LANCHOR0
> > > > > >         add     x5, x2, 3216
> > > > > >         mov     x3, 0
> > > > > >         mov     w1, 0
> > > > > >         cntw    x4
> > > > > >         mov     z1.s, w0
> > > > > >         index   z0.s, #0, #1
> > > > > >         ptrue   p1.b, all
> > > > > >         ptrue   p0.s, all
> > > > > >         .p2align 3,,7
> > > > > > .L3:
> > > > > >         ld1w    z2.s, p1/z, [x2, x3, lsl 2]
> > > > > >         add     z3.s, z0.s, z1.s
> > > > > >         cmplo   p2.s, p0/z, z1.s, z2.s
> > > > > >         b.any   .L2
> > > > > >         st1w    z3.s, p1, [x5, x3, lsl 2]
> > > > > >         add     w1, w1, 1
> > > > > >         st1w    z1.s, p1, [x2, x3, lsl 2]
> > > > > >         add     x3, x3, x4
> > > > > >         incw    z0.s
> > > > > >         cmp     w3, 803
> > > > > >         bls     .L3
> > > > > > .L5:
> > > > > >         mov     w0, 0
> > > > > >         ret
> > > > > >         .p2align 2,,3
> > > > > > .L2:
> > > > > >         cntw    x5
> > > > > >         mul     w1, w1, w5
> > > > > >         cbz     w5, .L5
> > > > > >         sxtw    x1, w1
> > > > > >         sub     w5, w5, #1
> > > > > >         add     x5, x5, x1
> > > > > >         add     x6, x2, 3216
> > > > > >         b       .L6
> > > > > >         .p2align 2,,3
> > > > > > .L14:
> > > > > >         str     w0, [x2, x1, lsl 2]
> > > > > >         cmp     x1, x5
> > > > > >         beq     .L5
> > > > > >         mov     x1, x4
> > > > > > .L6:
> > > > > >         ldr     w3, [x2, x1, lsl 2]
> > > > > >         add     w4, w0, w1
> > > > > >         str     w4, [x6, x1, lsl 2]
> > > > > >         add     x4, x1, 1
> > > > > >         cmp     w0, w3
> > > > > >         bcs     .L14
> > > > > >         mov     w0, 0
> > > > > >         ret
> > > > > >
> > > > > > On the workloads this work is based on, we see a 2-3x performance uplift
> > > > > > using this patch.
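A back-of-the-envelope check of the NEON listing for test4, assuming VF = 4 (four 32-bit lanes per q register): 803 = 200 * 4 + 3. This appears to match `cmp w2, 200` in the vector loop, and `w7` being set to 3 on the normal exit and to 4 on the `.L6` early-exit path before the unrolled scalar tail at `.L2`. The helper names are hypothetical.

```c
/* Full vector iterations when no lane ever takes the early exit.  */
static int vector_iters (int n, int vf)
{
  return n / vf;
}

/* Most scalar epilogue iterations that can run: the n % vf leftovers after a
   normal exit, or a whole vf block when the early exit fires, because the
   scalar loop redoes the block that contained the exiting lane.  */
static int epilogue_bound (int n, int vf, int took_early_exit)
{
  return took_early_exit ? vf : n % vf;
}
```

This is also why the scalar epilogue never needs more than VF iterations, whichever exit is taken.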
> > > > > >
> > > > > > Outstanding issues:
> > > > > >  - The patch is fully functional, but there are two things I wonder about:
> > > > > >    * In vect_transform_early_break, should I just refactor
> > > > > >      vectorizable_comparison and use it to generate the condition body?
> > > > > >      That would also get the costing.
> > > > >
> > > > > I'm looking at vectorizable_early_exit and validate_early_exit_stmts
> > > > > and I think that this should be mostly done as part of dependence
> > > > > analysis (because that's what it is) which should also remove the
> > > > > requirement of only handling decl-based accesses?
> > > >
> > > > That is fair enough, do you have a specific spot in mind where you'd
> > > > prefer me to slot it into?
> > >
> > > If you record the set of alternate (non-IV) exits then I'd wire it into
> > > vect_analyze_data_ref_dependences.  There we compute all dependences,
> > > and at the end of this function you could put in the worklist code
> > > determining the data refs to move and those to move across, re-using
> > > the computed LOOP_VINFO_DDRS (there's no easy way to look them up,
> > > but walking the LOOP_VINFO_DDRS once and going the other way should
> > > work).
> >
> > Hmm, so I think we may not have enough information just from the DDRs to know
> > what to move or not, e.g.
> >
> > a = x[i];
> > c += a;
> > b -= a;
> > if (c > d)
> >   break;
> >
> > Here both a and c need to stay but b can move. In one of my earlier versions of
> > the patch I did compute a list of statements to move during vect and kept them
> > in a cache.
> 
> Yes, I'm aware that you need to compute all stmts the early exit condition
> depends on and then move "the others" (which might have side-effects).
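A minimal worklist-style sketch of that split for the a/b/c example above, assuming SSA-like straight-line code (one def per variable) and deliberately ignoring memory (VUSE) and cross-iteration dependences, which the real analysis must handle. It walks backwards from the exit condition, keeps every statement whose def is live, and marks the rest as movable past the exit. Variables are modelled as bitmask bits; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a straight-line block: each statement defines one
   variable and uses a set of variables, both as bitmasks.  */
struct stmt { uint32_t def, uses; };

/* keep[k] is true if statement k feeds the exit condition (it must stay
   above the exit); false means it may be sunk below the exit.  */
static void mark_exit_slice (const struct stmt *s, int n, uint32_t cond_uses,
                             bool *keep)
{
  uint32_t live = cond_uses;
  for (int k = n - 1; k >= 0; k--)
    {
      keep[k] = (s[k].def & live) != 0;
      if (keep[k])
        live |= s[k].uses;   /* its operands become live too */
    }
}
```

With `if (c > d)` as the condition, this keeps `a = x[i]` and `c += a` and marks `b -= a` as movable, matching the example; the harder part discussed in this thread is mapping that scalar decision onto the generated vector statements and their VUSEs.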
> 
> > This had pros and cons:
> >
> > Pro: It saved me from having to re-compute this list again.
> > Con: The list is the set of scalar statements, but I need to move the generated
> >      vector statements, which means for every statement in the list I'd have to
> >      look up its generated vec statements and move those instead.  But I also
> >      have to correct the VUSEs.  Since I had to do the latter anyway, I combined
> >      the two traversals.
> >
> >      I could cache a worklist of which VUSEs to update during analysis as well,
> >      though, if they can't change between analysis and codegen.  But in principle
> >      there's nothing stopping a scalar or SLP pattern from being able to, no?
> 
> But in your patch you are moving the scalar stmts, no?  Because we're
> going to insert the vector stmts before (or after) the scalar stmts.
> 

Ah no, I move the vector statements during materialization.  Since vectorization of the
scalar statements happens in a top-down fashion, I know that when I vectorize an early
break, any statement it can possibly depend on has already been vectorized, so I walk
the vector use-def chain of the break arguments and leave the scalars alone.

I do this because I thought that by doing it late, all analysis would already be done.  But also,
in case vectorization fails, I did not want to have modified the scalar stmts for the next
round of iteration or the switch between SLP and non-SLP loop vect.

So effectively I just change the materialization point of the instructions.  There is some
precedent here, as e.g. vectorizing a live reduction can vectorize the reduction statements
in a different BB than the scalar (for instance at the exit point).

I could move the scalar code though, but that would have more problems.

> > >
> > > Note that at least SLP in some cases uses scalar stmt positions
> > > during analysis to validate things, so if you move stmts only
> > > at transform stage there might be unforeseen problems ...
> > >
> >
> > Hmm, I guess that'll be the same if for SLP the nodes themselves are
> > re-arranged?  I'll dig into when we use the scalar stmts for SLP.
> >
> > I could in principle add an entry to the SLP node to tell it what its
> > scalar pred was.  But we currently don't support PURE so it's probably
> > not blocking, but I will make sure.
> 
> So we're using the stmt UIDs to perform dominator checks
> in vect_stmt_dominates_stmt_p (UIDs are also used to map stmts to
> their stmt_vec_info), if you are re-arranging things then such
> dominator queries might go wrong.  In particular SLP live lane analysis
> and SLP code generation uses that to find insertion places.
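
A simplified model of why stale UIDs matter, for illustration only:
`toy_dominates_in_bb` is a hypothetical stand-in for the UID comparison
described above and covers just the same-basic-block case.  If the early-exit
compare (UID 5) is moved above the store (UID 3) without renumbering, the model
still reports that the store comes first, which is the kind of wrong answer
that would mislead insertion-point queries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a statement: the basic block it lives in and a
   UID assigned in statement order before analysis.  */
struct toy_stmt
{
  int bb;
  int uid;
};

/* Same-block dominance answered purely from UIDs: A dominates B when A
   was numbered first.  Statements in different blocks would need the CFG
   dominator tree, which this sketch does not model.  */
static bool
toy_dominates_in_bb (const struct toy_stmt *a, const struct toy_stmt *b)
{
  assert (a->bb == b->bb);
  return a->uid < b->uid;
}
```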
> 
> An alternate way to perform vectorization of the early exit would be
> to code-generate them (and their dependences) at the start of the
> loop by duplicating them.  The duplication could be done by means
> of a pattern sequence - we'd just need a stmt to key that on that
> we place at the start of the loop (I don't have a nice idea right now,
> a "scalar" GIMPLE_NOP might do(?)).
> 
> For all vectorizations with similar issues we've resorted to
> if-conversion doing the "enabling" transform and using if-conversion
> versioning to ensure that only prevails on the vector loop.  The
> "enabling" transform would be to perform the code motion so that
> there's no side-effects before the early exit(s).  I know that's
> kind of a hack, but it has proven to be easiest that way.
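
The "no side-effects before the early exit" property can be illustrated with a
plain C emulation of a vector loop.  This is a sketch under assumptions, not
the patch's code generation: VF is fixed at 4, the function names are
hypothetical, and the exiting block is simply re-run by a scalar epilogue.
Each block first OR-reduces the exit condition over all lanes and only performs
its stores if no lane exits, so the epilogue can redo that block in the
original statement order.

```c
#include <stddef.h>

#define VF 4  /* assumed vectorization factor */

/* Scalar reference: the store happens before the exit check, so the
   element that triggers the exit is stored too.  Returns the exit index
   (N if the loop runs to completion).  */
static size_t
copy_until_scalar (int *a, const int *b, size_t n, int x)
{
  size_t i;
  for (i = 0; i < n; i++)
    {
      a[i] = b[i] * 2;
      if (b[i] > x)
        break;
    }
  return i;
}

static size_t
copy_until_blocked (int *a, const int *b, size_t n, int x)
{
  size_t i = 0;

  /* "Vector" loop: evaluate the exit condition for all VF lanes first
     and perform the side effects only if no lane wants to exit.  */
  for (; i + VF <= n; i += VF)
    {
      int any = 0;
      for (size_t k = 0; k < VF; k++)
        any |= b[i + k] > x;      /* OR reduction of the lane mask */
      if (any)
        break;                    /* leave before any store of this block */
      for (size_t k = 0; k < VF; k++)
        a[i + k] = b[i + k] * 2;
    }

  /* Scalar epilogue: redo the exiting block (or the tail) one element
     at a time, in the original statement order.  */
  for (; i < n; i++)
    {
      a[i] = b[i] * 2;
      if (b[i] > x)
        break;
    }
  return i;
}
```

Both functions leave the same contents in `a` and return the same index; the
blocked version never stores past the exiting element.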
> 
> > > > >
> > > > > As for vect_transform_early_break, sure.  I fear that since you
> > > > > transform if (_1 > _2) to some _3 = _1 > _2; use(_3) that you need
> > > > > to expose this to the bool pattern handling machinery somehow.
> > > > > I can see that moving stmts around and doing it the way you do
> > > > > code-generation wise is easiest.
> > > > >
> > > > > How does this work with SLP btw?  You don't touch tree-vect-slp.cc at all
> > > > > but now that we have multiple BBs there's the issue of splitting
> > > > > children across different BBs - there's only
> > > > >
> > > > >           if ((phi_p || gimple_could_trap_p (stmt_info->stmt))
> > > > >               && (gimple_bb (first_stmt_info->stmt)
> > > > >                   != gimple_bb (stmt_info->stmt)))
> > > > >             {
> > > > >               if (dump_enabled_p ())
> > > > >                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > >                                  "Build SLP failed: different BB for PHI "
> > > > >                                  "or possibly trapping operation in %G",
> > > > >                                  stmt);
> > > > >               /* Mismatch.  */
> > > > >               continue;
> > > > >             }
> > > > >
> > > > > right now and the code motion you apply also might break the assumptions
> > > > > of the dependence analysis code.  I suppose that SLPing the early exit
> > > > > isn't supported, aka
> > > > >
> > > > >  for (;;)
> > > > >    {
> > > > >       if (a[2*i] > x) break;
> > > > >       if (a[2*i + 1] > x) break;
> > > > > ...
> > > > >    }
> > > > >
> > > > > or
> > > > >    _1 = a[2*i] > x | a[2*i + 1] > x;
> > > > >    if (_1) break;
> > > > >
> > > > > ?
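
To make the two forms concrete, here is a hedged C sketch (function names are
hypothetical) of the two-break loop and the merged-condition loop.  The merged
form uses a non-short-circuit `|`, so it always reads both a[2*i] and
a[2*i + 1]; that is only equivalent when both loads are known to be safe (here
`a` is assumed to hold at least 2*n elements).

```c
#include <stddef.h>

/* Two separate early exits per iteration, as in the SLP example.
   A is assumed to hold at least 2 * N elements.  */
static size_t
first_pair_two_breaks (const int *a, size_t n, int x)
{
  size_t i;
  for (i = 0; i < n; i++)
    {
      if (a[2 * i] > x)
        break;
      if (a[2 * i + 1] > x)
        break;
    }
  return i;
}

/* The same exits merged into one condition.  The non-short-circuit '|'
   evaluates both lanes of the group before a single branch.  */
static size_t
first_pair_one_break (const int *a, size_t n, int x)
{
  size_t i;
  for (i = 0; i < n; i++)
    {
      int cond = (a[2 * i] > x) | (a[2 * i + 1] > x);
      if (cond)
        break;
    }
  return i;
}
```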
> > > >
> > > > Indeed, SLP traps with the failure message you highlighted above.
> > > > At the moment I have restricted it to a single exit, which stops it from
> > > > getting that far.  (This limitation exists because code motion over
> > > > multiple exits becomes interesting; it's not specific to SLP, and if SLP
> > > > did work I would move the check into SLP build, or after.)
> > > >
> > > > Aside from that, different parts of SLP build fail with e.g.
> > > >
> > > > Build SLP failed: different operation in stmt _11 = _4 * x_17(D);
> > > >
> > > > (This is testcase 6 in my list of tests.)
> > > >
> > > > Hybrid SLP does work though, if the part with the conditional is in the
> > > > non-SLP part.
> > >
> > > OK, I see.
> > >
> > > > >
> > > > > >    * The testcase vect-early-break_2.c shows one form that currently
> > > > > >      doesn't work and crashes.  The reason is that there's a mismatch
> > > > > >      between the types required to vectorize this.  The vector loads
> > > > > >      cause multiple statements to be generated and thus require
> > > > > >      multiple comparisons, in this case 8 of them.  However, when
> > > > > >      determining ncopies the early exit uses a boolean mode, so ncopies
> > > > > >      is always 1.  If I instead force it to determine ncopies based on
> > > > > >      its operands rather than the final type, then the conditional gets
> > > > > >      vectorized, but then there's a mismatch comparing integer vectors
> > > > > >      with booleans.  It feels like I need some kind of boolean
> > > > > >      reduction here.  Should I just reject this form for now?
> > > > >
> > > > > That's probably the bool pattern handling I hinted at above.
> > > > > Bools/conditions are awkward, maybe you should handle the
> > > > > GIMPLE_CONDs as patterns computing the actual condition as mask
> > > > > fed into a dummy .IFN_CONSUME_MASK stmt?
> > > >
> > > > Indeed, though one additional difficulty is that in the example, for
> > > > instance, the number of copies is needed, e.g. if you have to do widening
> > > > before the compare.  This means that you have _hi, _lo splits, so unless
> > > > you short-circuit, this can lead to quite a number of operations before
> > > > you exit.
> > > >
> > > > I could also generate an OR reduction in this case instead of needing a
> > > > new IFN, but I'll go with whatever you prefer/recommend.
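
A rough scalar emulation of the widening case, assuming 16 signed char lanes
widened into two 8 x short halves (the _lo/_hi split) and a comparison done in
16-bit lanes; `cmp_half` and `any_gt_widened` are hypothetical helpers.  The
two half-masks are ORed into one value before the single branch, which is the
OR reduction mentioned above.

```c
#include <stdint.h>

#define NCHAR 16   /* assumed number of char lanes */
#define NSHORT 8   /* lanes per widened half */

/* Compare one widened half; bit K of the result is set when lane K
   satisfies v > x.  */
static uint8_t
cmp_half (const int16_t v[NSHORT], int16_t x)
{
  uint8_t mask = 0;
  for (int k = 0; k < NSHORT; k++)
    if (v[k] > x)
      mask |= (uint8_t) (1u << k);
  return mask;
}

/* Widen, compare both halves, then OR-reduce the two masks so that one
   branch can test "did any of the 16 original lanes exit?".  */
static int
any_gt_widened (const int8_t c[NCHAR], int16_t x)
{
  int16_t lo[NSHORT], hi[NSHORT];
  for (int k = 0; k < NSHORT; k++)
    {
      lo[k] = c[k];           /* widen the low half */
      hi[k] = c[k + NSHORT];  /* widen the high half */
    }
  uint8_t m = cmp_half (lo, x) | cmp_half (hi, x);
  return m != 0;
}
```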
> > >
> > > Hmm, I guess doing an OR reduction is simplest (but then you have to
> > > check for the NE/EQ compare operation support, not just cbranch).
> >
> > I should probably do this anyway.  The GIMPLE expansion only supports
> > NE/EQ for cbranch (and explicitly checks), so I'll amend my check here.
> >
> > > But
> > > does that solve the actual problem?  I think the problem is that the
> > > (scalar) branch isn't seen as consuming a "mask" value of the comparison
> > > by the pattern code, so don't we need to handle the alternate exit
> > > GIMPLE_CONDs in some way there?
> > >
> >
> > Right, so what I meant was that during codegen I can determine that this
> > case exists by comparing the NUNITS of the statement in the compare with
> > that of the boolean result.  On a mismatch, the ratio between the two
> > tells me how many vectors I'd have to reduce, and I should be able to get
> > to these statements by looking at the statement_vinfo of the non-0
> > operand of the compare.
> >
> > The transform_early_break itself is the thing that needs to handle both
> > the vectorization of the GIMPLE_COND and the updating of the value in the
> > `if`.  So all I'd have to do is point it to the reduction rather than the
> > last vector statement produced, as we do now.
> 
> I mean for ISAs like AVX512 a boolean "value" has V16QImode but a
> boolean "mask" has HImode.  The patterns exist to make sure the
> appropriate variant is generated depending on context.  Don't you
> need to make sure you get the "correct" variant for code generating
> the early exit?  That is, it can happen you'll arrive with
> HImode < V16QImode for a char < char comparison.
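
To illustrate the two encodings mentioned above, a minimal sketch assuming 16
char lanes: the "value" form stores one byte per lane (0x00 or 0xff, as a
V16QImode compare result would), while the "mask" form packs one bit per lane
into a 16-bit integer (as an AVX512-style HImode mask would).  The struct and
function names are hypothetical.  For an early exit both forms answer the same
question, since a lane is set in one exactly when it is set in the other, so
"any lane set" is a test against zero either way.

```c
#include <stdint.h>

/* "Value" form: one byte per lane, 0x00 (false) or 0xff (true).  */
typedef struct
{
  uint8_t lane[16];
} bool_value16;

/* Pack the value form into a 16-bit "mask" form, one bit per lane.  */
static uint16_t
value_to_mask (bool_value16 v)
{
  uint16_t m = 0;
  for (int k = 0; k < 16; k++)
    if (v.lane[k])
      m |= (uint16_t) (1u << k);
  return m;
}

/* Expand the mask form back into the value form.  */
static bool_value16
mask_to_value (uint16_t m)
{
  bool_value16 v;
  for (int k = 0; k < 16; k++)
    v.lane[k] = ((m >> k) & 1) ? 0xff : 0x00;
  return v;
}
```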

Ah yes, now I see what you mean.  That's not the problem here, since I don't
generate a scalar mask.  The vectorizer checks for cbranch with V16QImode in
this case if V16QImode is the truth-type result of the vector compare.

So we do that for instance for NEON, and then the cbranch expansion
generates the appropriate code to compare against 0.  It's worth mentioning
that the comparison generated is not against scalar 0, but against a vector
of 0s.

i.e.

  mask__42.16_44 = vect_cst__34 < vect__3.15_41;
  if (mask__42.16_44 != { 0, 0, 0, 0 })
    goto <bb 8>; [5.50%]
  else
    goto <bb 4>; [94.50%]

so for instance in the case of AVX I'd expect the cbranch implementation
to simply be e.g. vptest xmmN, xmmN and a branch on the ZF flag.

I think you only need AVX, not AVX512, so you don't need to
reduce with the VPTESTM* variants.  We only care about the flags.
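
A portable emulation of what such a cbranch has to decide, assuming a 128-bit
mask held as 16 bytes; `mask128_any` is a hypothetical helper, and the comment
about PTEST only describes the instruction's zero-flag behaviour, not GCC's
actual expansion.  The branch is taken when any bit of the mask is set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Emulates "if (mask != { 0, ... })" for a 128-bit mask.  (V)PTEST with
   the same register twice sets ZF exactly when the register is all zeros;
   here the mask is viewed as two 64-bit words and tested for a set bit.  */
static bool
mask128_any (const uint8_t mask[16])
{
  uint64_t lo, hi;
  memcpy (&lo, mask, sizeof lo);
  memcpy (&hi, mask + 8, sizeof hi);
  return (lo | hi) != 0;  /* ZF would be set exactly when this is false */
}
```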

> 
> But maybe I misunderstood the ICE (haven't dug into it in detail yet).

So the ICE happens when the number of copies required for the compare doesn't
match that for the branch.

So for instance

complex double vect_a[N];
if (vect_a[i] == x)

Because complex is a compound type, we need to unpack it when we vectorize.
So we end up generating for the GIMPLE_COND:

  mask__27.77_223 = vect_cst__222 == vect__25.54_199;
  mask__27.77_224 = vect_cst__222 == vect__25.57_202;
  mask__27.77_225 = vect_cst__222 == vect__25.60_205;
  mask__27.77_226 = vect_cst__222 == vect__25.63_208;
  mask__27.77_227 = vect_cst__222 == vect__25.66_211;
  mask__27.77_228 = vect_cst__222 == vect__25.69_214;
  mask__27.77_229 = vect_cst__222 == vect__25.72_217;
  mask__27.77_230 = vect_cst__222 == vect__25.75_220;
  _27 = x$real_12 == _25;
  mask__28.78_232 = vect_cst__231 == vect__25.55_200;
  mask__28.78_233 = vect_cst__231 == vect__25.58_203;
  mask__28.78_234 = vect_cst__231 == vect__25.61_206;
  mask__28.78_235 = vect_cst__231 == vect__25.64_209;
  mask__28.78_236 = vect_cst__231 == vect__25.67_212;
  mask__28.78_237 = vect_cst__231 == vect__25.70_215;
  mask__28.78_238 = vect_cst__231 == vect__25.73_218;
  mask__28.78_239 = vect_cst__231 == vect__25.76_221;
  _28 = x$imag_10 == _26;
  mask__29.79_240 = mask__27.77_223 & mask__28.78_232;
  mask__29.79_241 = mask__27.77_224 & mask__28.78_233;
  mask__29.79_242 = mask__27.77_225 & mask__28.78_234;
  mask__29.79_243 = mask__27.77_226 & mask__28.78_235;
  mask__29.79_244 = mask__27.77_227 & mask__28.78_236;
  mask__29.79_245 = mask__27.77_228 & mask__28.78_237;
  mask__29.79_246 = mask__27.77_229 & mask__28.78_238;
  mask__29.79_247 = mask__27.77_230 & mask__28.78_239;
  _29 = _27 & _28;

Which now raises the question of what to do for the branch comparison,
i.e. how to vectorize:

  if (_29 != 0)

I suppose I have to OR all the mask__29.79_24* masks together so that I
can generate mask__29.80_240 != { 0, 0, 0, 0 } for the final comparison.

I just wanted to check if that was the right thing to do.
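
For reference, the sequence above can be emulated in scalar C, assuming two
doubles per vector and a plain struct standing in for complex double (the names
`cplx` and `any_complex_eq` are hypothetical, and the lane layout is simplified
compared to the real interleaved loads).  Real and imaginary parts are compared
separately, ANDed per lane, and the per-vector masks are ORed together so that
a single test against zero decides the branch.

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 2  /* assumed: two doubles per vector, e.g. 128-bit vectors */

/* Plain stand-in for complex double.  */
typedef struct
{
  double re, im;
} cplx;

/* Emulates "if (vect_a[i] == x)" over NVEC vectors of LANES elements:
   the real and imaginary comparisons (mask__27, mask__28) are ANDed per
   lane (mask__29), the NVEC mask vectors are ORed together, and the
   result is finally tested against zero.  */
static bool
any_complex_eq (const cplx *a, size_t nvec, cplx x)
{
  bool acc[LANES] = { false };

  for (size_t v = 0; v < nvec; v++)
    for (size_t l = 0; l < LANES; l++)
      {
        const cplx *e = &a[v * LANES + l];
        bool re_eq = e->re == x.re;  /* a mask__27 lane */
        bool im_eq = e->im == x.im;  /* a mask__28 lane */
        acc[l] |= re_eq & im_eq;     /* mask__29 lane, ORed across vectors */
      }

  /* "!= { 0, 0, ... }": the branch is taken if any lane is set.  */
  bool any = false;
  for (size_t l = 0; l < LANES; l++)
    any |= acc[l];
  return any;
}
```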

Thanks,
Tamar

> Richard.
> 
> > Cheers,
> > Tamar
> >
> > > >
> > > > >
> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > > > > and issues mentioned above.
> > > > > >
> > > > > > OK enough design and implementation for GCC 13?
> > > > >
> > > > > Not sure, I didn't yet look thoroughly at the patch itself.
> > > >
> > > > I'll light some candles.
> > >
> > > ;)
> > >
> > > Richard.
> > >
> > > > Thanks for taking a look,
> > > > Tamar
> > > >
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > 	* cfgloop.cc (normal_exit): New.
> > > > > > 	* cfgloop.h (normal_exit): New.
> > > > > > 	* doc/loop.texi (normal_exit): Document.
> > > > > > 	* doc/sourcebuild.texi (vect_early_break): Document.
> > > > > > 	* tree-scalar-evolution.cc (get_loop_exit_condition): Refactor.
> > > > > > 	(get_edge_condition): New.
> > > > > > 	* tree-scalar-evolution.h (get_edge_condition): New.
> > > > > > 	* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Get main
> > > > > > 	exit during peeling check.
> > > > > > 	* tree-vect-loop-manip.cc
> > > > > > 	(slpeel_tree_duplicate_loop_to_edge_cfg): Support copying CFGs with
> > > > > > 	multiple exits and place at the end.
> > > > > > 	(vect_update_ivs_after_vectorizer): Skip on early exits.
> > > > > > 	(vect_update_ivs_after_early_break): New.
> > > > > > 	(gimple_find_last_mem_use): New.
> > > > > > 	(slpeel_update_phi_nodes_for_loops, slpeel_update_phi_nodes_for_guard2,
> > > > > > 	slpeel_update_phi_nodes_for_lcssa, vect_gen_vector_loop_niters_mult_vf,
> > > > > > 	slpeel_can_duplicate_loop_p, vect_set_loop_condition_partial_vectors):
> > > > > > 	Update for multiple exits.
> > > > > > 	(vect_set_loop_condition, vect_set_loop_condition_normal): Update
> > > > > > 	condition for early exits.
> > > > > > 	(vect_do_peeling): Peel for early breaks.
> > > > > > 	* tree-vect-loop.cc (vect_get_loop_niters): Analyze and return all
> > > > > > 	exits.
> > > > > > 	(vect_analyze_loop_form, vect_create_loop_vinfo): Analyze all conds.
> > > > > > 	(vect_determine_partial_vectors_and_peeling): Support multiple exits by
> > > > > > 	peeling.
> > > > > > 	(vect_analyze_loop): Add analysis for multiple exits.
> > > > > > 	(move_early_exit_stmts, vect_transform_early_break,
> > > > > > 	validate_early_exit_stmts, vectorizable_early_exit): New.
> > > > > > 	(vectorizable_live_operation): Ignore early break statements.
> > > > > > 	(scale_profile_for_vect_loop, vect_transform_loop): Support multiple
> > > > > > 	exits.
> > > > > > 	* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze early breaks.
> > > > > > 	(prepare_vec_mask): Expose.
> > > > > > 	(vect_analyze_stmt, vect_transform_stmt, vect_is_simple_use,
> > > > > > 	vect_get_vector_types_for_stmt): Support loop control/early exits.
> > > > > > 	* tree-vectorizer.cc (pass_vectorize::execute): Record all exits for
> > > > > > 	RPO.
> > > > > > 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > > > > > 	(slpeel_can_duplicate_loop_p): Change loop to loop_vec_info.
> > > > > > 	(struct vect_loop_form_info): Add loop conditions.
> > > > > > 	(LOOP_VINFO_EARLY_BREAKS, vect_transform_early_break,
> > > > > > 	vectorizable_early_exit): New.
> > > > > > 	(prepare_vec_mask): New.
> > > > > > 	(vec_info): Add early_breaks.
> > > > > > 	(loop_vec_info_for_loop): Make loop const.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > 	* lib/target-supports.exp (vect_early_break): New.
> > > > > > 	* g++.dg/vect/vect-early-break_1.cc: New test.
> > > > > > 	* g++.dg/vect/vect-early-break_2.cc: New test.
> > > > > > 	* g++.dg/vect/vect-early-break_3.cc: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-run_1.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-run_10.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-run_2.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-run_3.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-run_4.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-run_5.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-run_6.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-run_7.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-run_8.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-run_9.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-template_1.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break-template_2.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_1.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_10.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_11.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_12.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_13.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_14.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_15.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_2.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_3.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_4.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_5.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_6.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_7.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_8.c: New test.
> > > > > > 	* gcc.dg/vect/vect-early-break_9.c: New test.
> > > > > >
> > > > > > --- inline copy of patch --
> > > > > > diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> > > > > > index 528b1219bc37ad8f114d5cf381c0cff899db31ee..9c7f019a51abfe2de8e1dd7135dea2463b0256a0 100644
> > > > > > --- a/gcc/cfgloop.h
> > > > > > +++ b/gcc/cfgloop.h
> > > > > > @@ -385,6 +385,7 @@ extern basic_block *get_loop_body_in_custom_order (const class loop *, void *,
> > > > > >
> > > > > >  extern auto_vec<edge> get_loop_exit_edges (const class loop *, basic_block * = NULL);
> > > > > >  extern edge single_exit (const class loop *);
> > > > > > +extern edge normal_exit (const class loop *);
> > > > > >  extern edge single_likely_exit (class loop *loop, const vec<edge> &);
> > > > > >  extern unsigned num_loop_branches (const class loop *);
> > > > > >
> > > > > > diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
> > > > > > index 57bf7b1855d4dd20fb3f42388124932d0ca2b48a..97a7373fb6d9514da602d5be01050f2ec66094bc 100644
> > > > > > --- a/gcc/cfgloop.cc
> > > > > > +++ b/gcc/cfgloop.cc
> > > > > > @@ -1812,6 +1812,20 @@ single_exit (const class loop *loop)
> > > > > >      return NULL;
> > > > > >  }
> > > > > >
> > > > > > +/* Returns the normal exit edge of LOOP, or NULL if LOOP has no exit.
> > > > > > +   If loops do not have the exits recorded, NULL is returned always.  */
> > > > > > +
> > > > > > +edge
> > > > > > +normal_exit (const class loop *loop)
> > > > > > +{
> > > > > > +  struct loop_exit *exit = loop->exits->next;
> > > > > > +
> > > > > > +  if (!loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
> > > > > > +    return NULL;
> > > > > > +
> > > > > > +  return exit->e;
> > > > > > +}
> > > > > > +
> > > > > >  /* Returns true when BB has an incoming edge exiting LOOP.  */
> > > > > >
> > > > > >  bool
> > > > > > diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
> > > > > > index 6e8657a074d2447db7ae9b75cbfbb71282b84287..e1de2ac40f87f879ab691f68bd41b3bc21a83bf7 100644
> > > > > > --- a/gcc/doc/loop.texi
> > > > > > +++ b/gcc/doc/loop.texi
> > > > > > @@ -211,6 +211,10 @@ relation, and breath-first search order, respectively.
> > > > > >  @item @code{single_exit}: Returns the single exit edge of the loop, or
> > > > > >  @code{NULL} if the loop has more than one exit.  You can only use this
> > > > > >  function if @code{LOOPS_HAVE_RECORDED_EXITS} is used.
> > > > > > +function if LOOPS_HAVE_MARKED_SINGLE_EXITS property is used.
> > > > > > +@item @code{normal_exit}: Returns the natural exit edge of the loop,
> > > > > > +even if the loop has more than one exit.  The natural exit is the exit
> > > > > > +that would normally be taken were the loop to be fully executed.
> > > > > >  @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
> > > > > >  @item @code{just_once_each_iteration_p}: Returns true if the basic block
> > > > > >  is executed exactly once during each iteration of a loop (that is, it
> > > > > > @@ -623,4 +627,4 @@ maximum verbosity the details of a data dependence relations array,
> > > > > >  @code{dump_dist_dir_vectors} prints only the classical distance and
> > > > > >  direction vectors for a data dependence relations array, and
> > > > > >  @code{dump_data_references} prints the details of the data references
> > > > > > -contained in a data reference array.
> > > > > > +contained in a data reference array
> > > > > > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> > > > > > index e21a1d381e05da1bfccb555068ea1dbeabd9fc79..16fa94ebf532d27cd9a3a45a7aad578ca6920496 100644
> > > > > > --- a/gcc/doc/sourcebuild.texi
> > > > > > +++ b/gcc/doc/sourcebuild.texi
> > > > > > @@ -1640,6 +1640,10 @@ Target supports hardware vectors of @code{float} when
> > > > > >  @option{-funsafe-math-optimizations} is not in effect.
> > > > > >  This implies @code{vect_float}.
> > > > > >
> > > > > > +@item vect_early_break
> > > > > > +Target supports hardware vectorization of loops with early breaks.
> > > > > > +This requires an implementation of the cbranch optab for vectors.
> > > > > > +
> > > > > >  @item vect_int
> > > > > >  Target supports hardware vectors of @code{int}.
> > > > > >
> > > > > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..6a83648ca36e2c8feeb78335fccf3f3b82a97d2e
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
> > > > > > @@ -0,0 +1,61 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-w -O2" } */
> > > > > > +
> > > > > > +void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
> > > > > > +template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
> > > > > > +template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
> > > > > > +public:
> > > > > > +  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
> > > > > > +};
> > > > > > +template <unsigned N, typename C>
> > > > > > +template <typename Ca>
> > > > > > +poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
> > > > > > +  for (int i = 0; i < N; i++)
> > > > > > +    this->coeffs[i] += a.coeffs[i];
> > > > > > +  return *this;
> > > > > > +}
> > > > > > +template <unsigned N, typename Ca, typename Cb>
> > > > > > +poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
> > > > > > +  poly_int<N, long> r;
> > > > > > +  return r;
> > > > > > +}
> > > > > > +struct vec_prefix {
> > > > > > +  unsigned m_num;
> > > > > > +};
> > > > > > +struct vl_ptr;
> > > > > > +struct va_heap {
> > > > > > +  typedef vl_ptr default_layout;
> > > > > > +};
> > > > > > +template <typename, typename A, typename = typename A::default_layout>
> > > > > > +struct vec;
> > > > > > +template <typename T, typename A> struct vec<T, A, int> {
> > > > > > +  T &operator[](unsigned);
> > > > > > +  vec_prefix m_vecpfx;
> > > > > > +  T m_vecdata[];
> > > > > > +};
> > > > > > +template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
> > > > > > +  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
> > > > > > +  return m_vecdata[ix];
> > > > > > +}
> > > > > > +template <typename T> struct vec<T, va_heap> {
> > > > > > +  T &operator[](unsigned ix) { return m_vec[ix]; }
> > > > > > +  vec<T, va_heap, int> m_vec;
> > > > > > +};
> > > > > > +class auto_vec : public vec<poly_int<2, long>, va_heap> {};
> > > > > > +template <typename> class vector_builder : public auto_vec {};
> > > > > > +class int_vector_builder : public vector_builder<int> {
> > > > > > +public:
> > > > > > +  int_vector_builder(poly_int<2, long>, int, int);
> > > > > > +};
> > > > > > +bool vect_grouped_store_supported() {
> > > > > > +  int i;
> > > > > > +  poly_int<2, long> nelt;
> > > > > > +  int_vector_builder sel(nelt, 2, 3);
> > > > > > +  for (i = 0; i < 6; i++)
> > > > > > +    sel[i] += exact_div(nelt, 2);
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..6a83648ca36e2c8feeb78335fccf3f3b82a97d2e
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
> > > > > > @@ -0,0 +1,61 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-w -O2" } */
> > > > > > +
> > > > > > +void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
> > > > > > +template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
> > > > > > +template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
> > > > > > +public:
> > > > > > +  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
> > > > > > +};
> > > > > > +template <unsigned N, typename C>
> > > > > > +template <typename Ca>
> > > > > > +poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
> > > > > > +  for (int i = 0; i < N; i++)
> > > > > > +    this->coeffs[i] += a.coeffs[i];
> > > > > > +  return *this;
> > > > > > +}
> > > > > > +template <unsigned N, typename Ca, typename Cb>
> > > > > > +poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
> > > > > > +  poly_int<N, long> r;
> > > > > > +  return r;
> > > > > > +}
> > > > > > +struct vec_prefix {
> > > > > > +  unsigned m_num;
> > > > > > +};
> > > > > > +struct vl_ptr;
> > > > > > +struct va_heap {
> > > > > > +  typedef vl_ptr default_layout;
> > > > > > +};
> > > > > > +template <typename, typename A, typename = typename A::default_layout>
> > > > > > +struct vec;
> > > > > > +template <typename T, typename A> struct vec<T, A, int> {
> > > > > > +  T &operator[](unsigned);
> > > > > > +  vec_prefix m_vecpfx;
> > > > > > +  T m_vecdata[];
> > > > > > +};
> > > > > > +template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
> > > > > > +  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
> > > > > > +  return m_vecdata[ix];
> > > > > > +}
> > > > > > +template <typename T> struct vec<T, va_heap> {
> > > > > > +  T &operator[](unsigned ix) { return m_vec[ix]; }
> > > > > > +  vec<T, va_heap, int> m_vec;
> > > > > > +};
> > > > > > +class auto_vec : public vec<poly_int<2, long>, va_heap> {};
> > > > > > +template <typename> class vector_builder : public auto_vec {};
> > > > > > +class int_vector_builder : public vector_builder<int> {
> > > > > > +public:
> > > > > > +  int_vector_builder(poly_int<2, long>, int, int);
> > > > > > +};
> > > > > > +bool vect_grouped_store_supported() {
> > > > > > +  int i;
> > > > > > +  poly_int<2, long> nelt;
> > > > > > +  int_vector_builder sel(nelt, 2, 3);
> > > > > > +  for (i = 0; i < 6; i++)
> > > > > > +    sel[i] += exact_div(nelt, 2);
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..a12e5ca434b2ac37c03dbaa12273fd8e5aa2018c
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
> > > > > > @@ -0,0 +1,16 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-w -O2" } */
> > > > > > +
> > > > > > +int aarch64_advsimd_valid_immediate_hs_val32;
> > > > > > +bool aarch64_advsimd_valid_immediate_hs() {
> > > > > > +  for (int shift = 0; shift < 32; shift += 8)
> > > > > > +    if (aarch64_advsimd_valid_immediate_hs_val32 & shift)
> > > > > > +      return aarch64_advsimd_valid_immediate_hs_val32;
> > > > > > +  for (;;)
> > > > > > +    ;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..2495b36a72eae94cb7abc4a0d17a5c979fd78083
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
> > > > > > @@ -0,0 +1,11 @@
> > > > > > +/* { dg-do run } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > > > > +
> > > > > > +#define N 803
> > > > > > +#define P 0
> > > > > > +#include "vect-early-break-template_1.c"
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..9bcd7f7e57ef9a1d4649d18569b3406050e54603
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
> > > > > > @@ -0,0 +1,11 @@
> > > > > > +/* { dg-do run } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > > > > +
> > > > > > +#define N 800
> > > > > > +#define P 799
> > > > > > +#include "vect-early-break-template_2.c"
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..63f63101a467909f328be7f3acbc5bcb721967ff
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
> > > > > > @@ -0,0 +1,11 @@
> > > > > > +/* { dg-do run } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > > > > +
> > > > > > +#define N 803
> > > > > > +#define P 802
> > > > > > +#include "vect-early-break-template_1.c"
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..626b95e9b8517081d41d794e9e0264d6301c8589
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
> > > > > > @@ -0,0 +1,11 @@
> > > > > > +/* { dg-do run } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > > > > +
> > > > > > +#define N 803
> > > > > > +#define P 5
> > > > > > +#include "vect-early-break-template_1.c"
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..7e0e6426120551152a7bd800c15d9ed6ab15bada
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
> > > > > > @@ -0,0 +1,11 @@
> > > > > > +/* { dg-do run } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > > > > +
> > > > > > +#define N 803
> > > > > > +#define P 278
> > > > > > +#include "vect-early-break-template_1.c"
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..242cf486f9c40055df0aef5fd238d1aff7a7c7da
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
> > > > > > @@ -0,0 +1,11 @@
> > > > > > +/* { dg-do run } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > > > > +
> > > > > > +#define N 800
> > > > > > +#define P 799
> > > > > > +#include "vect-early-break-template_1.c"
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..9fe7136b7213a463ca6573c60476b7c8f531ddcb
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
> > > > > > @@ -0,0 +1,11 @@
> > > > > > +/* { dg-do run } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > > > > +
> > > > > > +#define N 803
> > > > > > +#define P 0
> > > > > > +#include "vect-early-break-template_2.c"
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..02f93d77dba31b938f6fd9e8c7f5e4acde4aeec9
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
> > > > > > @@ -0,0 +1,11 @@
> > > > > > +/* { dg-do run } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > > > > +
> > > > > > +#define N 803
> > > > > > +#define P 802
> > > > > > +#include "vect-early-break-template_2.c"
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..a614925465606b54c638221ffb95a5e8d3bee797
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
> > > > > > @@ -0,0 +1,11 @@
> > > > > > +/* { dg-do run } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > > > > +
> > > > > > +#define N 803
> > > > > > +#define P 5
> > > > > > +#include "vect-early-break-template_2.c"
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..94e2b9c301456eda8f9ad7eaa67604563f0afee7
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
> > > > > > @@ -0,0 +1,11 @@
> > > > > > +/* { dg-do run } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast -save-temps" } */
> > > > > > +
> > > > > > +#define N 803
> > > > > > +#define P 278
> > > > > > +#include "vect-early-break-template_2.c"
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..af70a8e2a5a9dc9756edb5580f2de02ddcc95de9
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
> > > > > > @@ -0,0 +1,47 @@
> > > > > > +#ifndef N
> > > > > > +#define N 803
> > > > > > +#endif
> > > > > > +
> > > > > > +#ifndef P
> > > > > > +#define P 0
> > > > > > +#endif
> > > > > > +
> > > > > > +unsigned vect_a[N] = {0};
> > > > > > +unsigned vect_b[N] = {0};
> > > > > > +
> > > > > > +__attribute__((noipa, noinline))
> > > > > > +unsigned test4(unsigned x)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     break;
> > > > > > +   vect_a[i] = x;
> > > > > > +
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > +
> > > > > > +extern void abort ();
> > > > > > +
> > > > > > +int main ()
> > > > > > +{
> > > > > > +
> > > > > > +  int x = 1;
> > > > > > +  int idx = P;
> > > > > > +  vect_a[idx] = x + 1;
> > > > > > +
> > > > > > +  test4(x);
> > > > > > +
> > > > > > +  if (vect_b[idx] != (x + idx))
> > > > > > +    abort ();
> > > > > > +
> > > > > > +  if (vect_a[idx] != x + 1)
> > > > > > +    abort ();
> > > > > > +
> > > > > > +  if (idx > 0 && vect_a[idx-1] != x)
> > > > > > +    abort ();
> > > > > > +
> > > > > > +}
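For reference, the checks in main () of vect-early-break-template_1.c encode the scalar semantics of the early break. With vect_a[P] = x + 1, the loop still performs the vect_b store of iteration P before it leaves, but it does not perform the vect_a store. The following is a minimal scalar model of that behaviour. It is only a sketch: the helper names model_test4 and model_checks_pass and the fixed 1024-element buffers are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of test4 in vect-early-break-template_1.c,
   parameterised on the array length n instead of the macro N.
   Returns the index at which the loop broke out, or n if it ran to
   completion.  */
static size_t model_test4 (unsigned *a, unsigned *b, size_t n, unsigned x)
{
  for (size_t i = 0; i < n; i++)
    {
      b[i] = x + i;      /* This store happens before the exit check.  */
      if (a[i] > x)
        return i;        /* Early exit: a[i] is left untouched.  */
      a[i] = x;          /* Reached only by non-exiting iterations.  */
    }
  return n;
}

/* Mirror of the checks in the template's main (): set a[p] = x + 1, run
   the model and verify the post-conditions the run tests rely on.  */
static bool model_checks_pass (size_t n, size_t p)
{
  unsigned a[1024] = {0}, b[1024] = {0};
  unsigned x = 1;
  if (n > 1024 || p >= n)
    return false;
  a[p] = x + 1;
  size_t exit_at = model_test4 (a, b, n, x);
  return exit_at == p
         && b[p] == x + p
         && a[p] == x + 1
         && (p == 0 || a[p - 1] == x);
}
```

The (N, P) pairs used by the run_* tests (for example 803/0, 803/5, 803/278 and 800/799) all satisfy these checks in the model.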
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..d0f924d904437e71567d27cc1f1089e5607dca0d
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
> > > > > > @@ -0,0 +1,50 @@
> > > > > > +#ifndef N
> > > > > > +#define N 803
> > > > > > +#endif
> > > > > > +
> > > > > > +#ifndef P
> > > > > > +#define P 0
> > > > > > +#endif
> > > > > > +
> > > > > > +unsigned vect_a[N] = {0};
> > > > > > +unsigned vect_b[N] = {0};
> > > > > > +
> > > > > > +__attribute__((noipa, noinline))
> > > > > > +unsigned test4(unsigned x)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     return i;
> > > > > > +   vect_a[i] = x;
> > > > > > +
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > +
> > > > > > +extern void abort ();
> > > > > > +
> > > > > > +int main ()
> > > > > > +{
> > > > > > +
> > > > > > +  int x = 1;
> > > > > > +  int idx = P;
> > > > > > +  vect_a[idx] = x + 1;
> > > > > > +
> > > > > > +  unsigned res = test4(x);
> > > > > > +
> > > > > > +  if (res != idx)
> > > > > > +    abort ();
> > > > > > +
> > > > > > +  if (vect_b[idx] != (x + idx))
> > > > > > +    abort ();
> > > > > > +
> > > > > > +  if (vect_a[idx] != x + 1)
> > > > > > +    abort ();
> > > > > > +
> > > > > > +  if (idx > 0 && vect_a[idx-1] != x)
> > > > > > +    abort ();
> > > > > > +
> > > > > > +}
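vect-early-break-template_2.c also checks the value returned from the early exit. One way to see why such a loop can be vectorized is to emulate a vector loop in plain C. The emulation evaluates the exit condition for a whole chunk of lanes first and commits the chunk's stores only when no lane exits. Otherwise it hands the rest of the iteration space to a scalar tail. This is a sketch of the general idea only, not the code GCC generates. It assumes that a and b do not overlap, which holds for the distinct globals vect_a and vect_b in the template. VF, scalar_test4 and chunked_test4 are hypothetical names.

```c
#include <stddef.h>

#define VF 4  /* Hypothetical vectorization factor for this sketch.  */

/* Scalar reference: same loop body as test4 in the template.  */
static unsigned scalar_test4 (unsigned *a, unsigned *b, size_t n, unsigned x)
{
  for (size_t i = 0; i < n; i++)
    {
      b[i] = x + i;
      if (a[i] > x)
        return (unsigned) i;
      a[i] = x;
    }
  return 0;  /* Matches ret == 0 in the template when nothing exits.  */
}

/* Chunked emulation.  Assumes a and b do not overlap, so the exit
   condition of all VF lanes can be evaluated before any store of the
   chunk is performed.  */
static unsigned chunked_test4 (unsigned *a, unsigned *b, size_t n, unsigned x)
{
  size_t i = 0;
  for (; i + VF <= n; i += VF)
    {
      /* "Vector compare" followed by an "any lane set" reduction.  */
      int any_exit = 0;
      for (size_t lane = 0; lane < VF; lane++)
        any_exit |= a[i + lane] > x;

      if (any_exit)
        break;  /* Leave this whole chunk to the scalar tail.  */

      /* No lane exits: commit every store of the chunk.  */
      for (size_t lane = 0; lane < VF; lane++)
        {
          b[i + lane] = x + (i + lane);
          a[i + lane] = x;
        }
    }

  /* Scalar tail: handles the chunk that would exit plus any remainder.  */
  for (; i < n; i++)
    {
      b[i] = x + i;
      if (a[i] > x)
        return (unsigned) i;
      a[i] = x;
    }
  return 0;
}
```

The sketch defers all of a chunk's stores until the "any lane" check has passed. As a result, no store is executed twice, and the scalar tail performs the exiting iteration's vect_b store exactly once. The final array contents and the return value therefore match the scalar loop.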
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..51e7d6489b99c25b9b4b3d1c839f98562b6d4dd7
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
> > > > > > @@ -0,0 +1,27 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#ifndef N
> > > > > > +#define N 803
> > > > > > +#endif
> > > > > > +unsigned vect_a[N];
> > > > > > +unsigned vect_b[N];
> > > > > > +
> > > > > > +unsigned test4(unsigned x)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     break;
> > > > > > +   vect_a[i] = x;
> > > > > > +
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..9e4ad1763202dfdab3ed7961ead5114fcc61a11b
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
> > > > > > @@ -0,0 +1,28 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#ifndef N
> > > > > > +#define N 803
> > > > > > +#endif
> > > > > > +unsigned vect_a[N];
> > > > > > +unsigned vect_b[N];
> > > > > > +
> > > > > > +unsigned test4(unsigned x,int y, int z)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     break;
> > > > > > +   vect_a[i] = x;
> > > > > > + }
> > > > > > +
> > > > > > + ret = x + y * z;
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..a613dd9909fb09278dd92a81a24ef854994a9890
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
> > > > > > @@ -0,0 +1,31 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#ifndef N
> > > > > > +#define N 803
> > > > > > +#endif
> > > > > > +unsigned vect_a[N];
> > > > > > +unsigned vect_b[N];
> > > > > > +
> > > > > > +unsigned test4(unsigned x, int y)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > +for (int o = 0; o < y; o++)
> > > > > > +{
> > > > > > + ret += o;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     break;
> > > > > > +   vect_a[i] = x;
> > > > > > +
> > > > > > + }
> > > > > > +}
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..cc10f3238f1cb8e1307e024a3ebcb5c25a39d1b2
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
> > > > > > @@ -0,0 +1,31 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#ifndef N
> > > > > > +#define N 803
> > > > > > +#endif
> > > > > > +unsigned vect_a[N];
> > > > > > +unsigned vect_b[N];
> > > > > > +
> > > > > > +unsigned test4(unsigned x, int y)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > +for (int o = 0; o < y; o++)
> > > > > > +{
> > > > > > + ret += o;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     return vect_a[i];
> > > > > > +   vect_a[i] = x;
> > > > > > +
> > > > > > + }
> > > > > > +}
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..6967b7395ed7c19e38a436d6edcfe7c1580c7113
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
> > > > > > @@ -0,0 +1,27 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#ifndef N
> > > > > > +#define N 803
> > > > > > +#endif
> > > > > > +unsigned vect_a[N];
> > > > > > +unsigned vect_b[N];
> > > > > > +
> > > > > > +unsigned test4(unsigned x)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     return vect_a[i] * x;
> > > > > > +   vect_a[i] = x;
> > > > > > +
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..03cce5cf6cadecb520b46be666bf608e3bc6a511
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
> > > > > > @@ -0,0 +1,25 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#define N 803
> > > > > > +unsigned vect_a[N];
> > > > > > +unsigned vect_b[N];
> > > > > > +
> > > > > > +int test4(unsigned x)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     return i;
> > > > > > +   vect_a[i] += x * vect_b[i];
> > > > > > +
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..dec6872e1115ff66695f5a500ffa7ca01c0f8d3a
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
> > > > > > @@ -0,0 +1,25 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#define N 803
> > > > > > +unsigned vect_a[N];
> > > > > > +unsigned vect_b[N];
> > > > > > +
> > > > > > +int test4(unsigned x)
> > > > > > +{
> > > > > > + int ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     return i;
> > > > > > +   vect_a[i] += x * vect_b[i];
> > > > > > +
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..7268f6ae2485d0274fd85ea53cc1e44ef4b84d5c
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> > > > > > @@ -0,0 +1,27 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#include <complex.h>
> > > > > > +
> > > > > > +#define N 1024
> > > > > > +complex double vect_a[N];
> > > > > > +complex double vect_b[N];
> > > > > > +
> > > > > > +complex double test4(complex double x)
> > > > > > +{
> > > > > > + complex double ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] += x + i;
> > > > > > +   if (vect_a[i] == x)
> > > > > > +     return i;
> > > > > > +   vect_a[i] += x * vect_b[i];
> > > > > > +
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..3c6d28bd2d6e6e794146baf89e43c3b70293b7d9
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> > > > > > @@ -0,0 +1,20 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +unsigned test4(char x, char *vect, int n)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < n; i++)
> > > > > > + {
> > > > > > +   if (vect[i] > x)
> > > > > > +     return 1;
> > > > > > +
> > > > > > +   vect[i] = x;
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..216c56faf330449bf1969b7e51ff1e94270dc861
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
> > > > > > @@ -0,0 +1,23 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#define N 1024
> > > > > > +unsigned vect[N];
> > > > > > +
> > > > > > +unsigned test4(unsigned x)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   if (i > 16 && vect[i] > x)
> > > > > > +     break;
> > > > > > +
> > > > > > +   vect[i] = x;
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..4a36d6979db1fd1f97ba2a290f78ac3b84f6de24
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
> > > > > > @@ -0,0 +1,24 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#define N 1024
> > > > > > +unsigned vect_a[N];
> > > > > > +unsigned vect_b[N];
> > > > > > +
> > > > > > +unsigned test4(unsigned x)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     return vect_a[i];
> > > > > > +   vect_a[i] = x;
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..09632d9afda7e07f1a8417514ef77356f00045bd
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> > > > > > @@ -0,0 +1,26 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#define N 1024
> > > > > > +unsigned vect_a[N];
> > > > > > +unsigned vect_b[N];
> > > > > > +
> > > > > > +unsigned test4(unsigned x)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < (N/2); i+=2)
> > > > > > + {
> > > > > > +   vect_b[i] = x + i;
> > > > > > +   vect_b[i+1] = x + i+1;
> > > > > > +   if (vect_a[i] > x || vect_a[i+1] > x)
> > > > > > +     break;
> > > > > > +   vect_a[i] += x * vect_b[i];
> > > > > > +   vect_a[i+1] += x * vect_b[i+1];
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..10fd8b42952c42f3d3a014da103931ca394423d5
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> > > > > > @@ -0,0 +1,27 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#include <complex.h>
> > > > > > +
> > > > > > +#define N 1024
> > > > > > +complex double vect_a[N];
> > > > > > +complex double vect_b[N];
> > > > > > +
> > > > > > +complex double test4(complex double x)
> > > > > > +{
> > > > > > + complex double ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] += x + i;
> > > > > > +   if (vect_a[i] == x)
> > > > > > +     break;
> > > > > > +   vect_a[i] += x * vect_b[i];
> > > > > > +
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..ae706b2952cfcecf20546a67a735b8d902cbb607
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> > > > > > @@ -0,0 +1,27 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#include <complex.h>
> > > > > > +
> > > > > > +#define N 1024
> > > > > > +char vect_a[N];
> > > > > > +char vect_b[N];
> > > > > > +
> > > > > > +char test4(char x, char * restrict res)
> > > > > > +{
> > > > > > + char ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_b[i] += x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     break;
> > > > > > +   vect_a[i] += x * vect_b[i];
> > > > > > +   res[i] *= vect_b[i];
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
> > > > > > new file mode 100644
> > > > > > index 0000000000000000000000000000000000000000..350f02f3c7caef457adbe1be802bba51cd818393
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
> > > > > > @@ -0,0 +1,27 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +#ifndef N
> > > > > > +#define N 803
> > > > > > +#endif
> > > > > > +unsigned vect_a[N];
> > > > > > +unsigned vect_b[N];
> > > > > > +
> > > > > > +unsigned test4(unsigned x)
> > > > > > +{
> > > > > > + unsigned ret = 0;
> > > > > > + for (int i = 0; i < N; i++)
> > > > > > + {
> > > > > > +   vect_a[i] = x + i;
> > > > > > +   if (vect_a[i] > x)
> > > > > > +     break;
> > > > > > +   vect_a[i] = x;
> > > > > > +
> > > > > > + }
> > > > > > + return ret;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> > > > > > index ccbbee847f755d6f30116d5b38e4027a998b48fd..5cbf54bd2a23dfdc5dc7b148b0dc6ed4c63814ae 100644
> > > > > > --- a/gcc/testsuite/lib/target-supports.exp
> > > > > > +++ b/gcc/testsuite/lib/target-supports.exp
> > > > > > @@ -3645,6 +3645,18 @@ proc check_effective_target_vect_int { } {
> > > > > >  	}}]
> > > > > >  }
> > > > > >
> > > > > > +# Return 1 if the target supports hardware vectorization of early breaks,
> > > > > > +# 0 otherwise.
> > > > > > +#
> > > > > > +# This won't change for different subtargets so cache the result.
> > > > > > +
> > > > > > +proc check_effective_target_vect_early_break { } {
> > > > > > +    return [check_cached_effective_target_indexed vect_early_break {
> > > > > > +      expr {
> > > > > > +	([istarget aarch64*-*-*]
> > > > > > +	 && [check_effective_target_aarch64_sve])
> > > > > > +	}}]
> > > > > > +}
> > > > > >  # Return 1 if the target supports hardware vectorization of complex additions of
> > > > > >  # byte, 0 otherwise.
> > > > > >  #
> > > > > > diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> > > > > > index 0f90207bc733db3cf85979d9b0b962aefa0831d6..5af7d2bba0d62195704a8d41ef6e600327169770 100644
> > > > > > --- a/gcc/tree-scalar-evolution.h
> > > > > > +++ b/gcc/tree-scalar-evolution.h
> > > > > > @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
> > > > > >
> > > > > >  extern tree number_of_latch_executions (class loop *);
> > > > > >  extern gcond *get_loop_exit_condition (const class loop *);
> > > > > > +extern gcond *get_edge_condition (edge);
> > > > > >
> > > > > >  extern void scev_initialize (void);
> > > > > >  extern bool scev_initialized_p (void);
> > > > > > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > > > > > index 7e2a3e986619de87e4ae9daf16198be1f13b917c..3012871dd7f9a7d1897f96a29b1b0b28d90cb63f 100644
> > > > > > --- a/gcc/tree-scalar-evolution.cc
> > > > > > +++ b/gcc/tree-scalar-evolution.cc
> > > > > > @@ -884,7 +884,7 @@ scev_dfs::add_to_evolution (tree chrec_before, enum tree_code code,
> > > > > >    return res;
> > > > > >  }
> > > > > >
> > > > > > -
> > > > > > +
> > > > >
> > > > >
> > > > > >  /* Follow the ssa edge into the binary expression RHS0 CODE RHS1.
> > > > > >     Return true if the strongly connected component has been found.  */
> > > > > >
> > > > > > @@ -1295,8 +1295,15 @@ tail_recurse:
> > > > > >  gcond *
> > > > > >  get_loop_exit_condition (const class loop *loop)
> > > > > >  {
> > > > > > +  return get_edge_condition (normal_exit (loop));
> > > > > > +}
> > > > > > +
> > > > > > +/* If the statement just before the EXIT_EDGE contains a condition then
> > > > > > +   return the condition, otherwise NULL. */
> > > > > > +
> > > > > > +gcond *
> > > > > > +get_edge_condition (edge exit_edge){
> > > > > >    gcond *res = NULL;
> > > > > > -  edge exit_edge = single_exit (loop);
> > > > > >
> > > > > >    if (dump_file && (dump_flags & TDF_SCEV))
> > > > > >      fprintf (dump_file, "(get_loop_exit_condition \n  ");
> > > > > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > > > > index 4a23d6172aaa12ad7049dc626e5c4afbd5ca3f74..02e373171675432cd32c4a72440eebdff988bdcf 100644
> > > > > > --- a/gcc/tree-vect-data-refs.cc
> > > > > > +++ b/gcc/tree-vect-data-refs.cc
> > > > > > @@ -2072,7 +2072,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> > > > > >
> > > > > >    /* Check if we can possibly peel the loop.  */
> > > > > >    if (!vect_can_advance_ivs_p (loop_vinfo)
> > > > > > -      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> > > > > > +      || !slpeel_can_duplicate_loop_p (loop_vinfo, normal_exit (loop))
> > > > > >        || loop->inner)
> > > > > >      do_peeling = false;
> > > > > >
> > > > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > > > index 1d96130c985e2defd141cfdf602224c73b4b41f2..0b2a4920754d83aeb3795b435693d61adcfe92b6 100644
> > > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > > @@ -770,7 +770,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
> > > > > >    add_header_seq (loop, header_seq);
> > > > > >
> > > > > >    /* Get a boolean result that tells us whether to iterate.  */
> > > > > > -  edge exit_edge = single_exit (loop);
> > > > > > +  edge exit_edge = normal_exit (loop);
> > > > > >    tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
> > > > > >    tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
> > > > > >    gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
> > > > > > @@ -789,7 +789,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
> > > > > >    if (final_iv)
> > > > > >      {
> > > > > >        gassign *assign = gimple_build_assign (final_iv, orig_niters);
> > > > > > -      gsi_insert_on_edge_immediate (single_exit (loop), assign);
> > > > > > +      gsi_insert_on_edge_immediate (exit_edge, assign);
> > > > > >      }
> > > > > >
> > > > > >    return cond_stmt;
> > > > > > @@ -799,7 +799,8 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
> > > > > >     loop handles exactly VF scalars per iteration.  */
> > > > > >
> > > > > >  static gcond *
> > > > > > -vect_set_loop_condition_normal (class loop *loop, tree niters,
> tree
> > > step,
> > > > > > +vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > > > > +				class loop *loop, tree niters, tree step,
> > > > > >  				tree final_iv, bool niters_maybe_zero,
> > > > > >  				gimple_stmt_iterator loop_cond_gsi)
> > > > > >  {
> > > > > > @@ -807,7 +808,7 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
> > > > > >    gcond *cond_stmt;
> > > > > >    gcond *orig_cond;
> > > > > >    edge pe = loop_preheader_edge (loop);
> > > > > > -  edge exit_edge = single_exit (loop);
> > > > > > +  edge exit_edge = normal_exit (loop);
> > > > > >    gimple_stmt_iterator incr_gsi;
> > > > > >    bool insert_after;
> > > > > >    enum tree_code code;
> > > > > > @@ -872,7 +873,11 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
> > > > > >  	 In both cases the loop limit is NITERS - STEP.  */
> > > > > >        gimple_seq seq = NULL;
> > > > > >        limit = force_gimple_operand (niters, &seq, true, NULL_TREE);
> > > > > > -      limit = gimple_build (&seq, MINUS_EXPR, TREE_TYPE (limit), limit, step);
> > > > > > +      /* For VLA leave limit == niters.  Though I wonder if maybe I should
> > > > > > +	 force partial loops here and use vect_set_loop_condition_partial_vectors
> > > > > > +	 instead.  The problem is that the VL check is useless here.  */
> > > > > > +      if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo) && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> > > > > > +	limit = gimple_build (&seq, MINUS_EXPR, TREE_TYPE (limit), limit, step);
> > > > > >        if (seq)
> > > > > >  	{
> > > > > >  	  basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> > > > > > @@ -907,7 +912,8 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
> > > > > >    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> > > > > >
> > > > > >    /* Record the number of latch iterations.  */
> > > > > > -  if (limit == niters)
> > > > > > +  if (limit == niters
> > > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > >      /* Case A: the loop iterates NITERS times.  Subtract one to get the
> > > > > >         latch count.  */
> > > > > >      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > > > > > @@ -918,10 +924,17 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
> > > > > >      loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
> > > > > >  				       limit, step);
> > > > > >
> > > > > > -  if (final_iv)
> > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > +  /* For multiple exits we've already maintained LCSSA form and handled
> > > > > > +     the scalar iteration update in the code that deals with the merge
> > > > > > +     block and its updated guard.  I could move that code here instead
> > > > > > +     of in vect_update_ivs_after_early_break but I have to still deal
> > > > > > +     with the updates to the counter `i`.  So for now I'll keep them
> > > > > > +     together.  */
> > > > > > +  if (final_iv && exits.length () == 1)
> > > > > >      {
> > > > > >        gassign *assign;
> > > > > > -      edge exit = single_exit (loop);
> > > > > > +      edge exit = normal_exit (loop);
> > > > > >        gcc_assert (single_pred_p (exit->dest));
> > > > > >        tree phi_dest
> > > > > >  	= integer_zerop (init) ? final_iv : copy_ssa_name (indx_after_incr);
> > > > > > @@ -972,13 +985,15 @@ vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
> > > > > >    gcond *orig_cond = get_loop_exit_condition (loop);
> > > > > >    gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond);
> > > > > >
> > > > > > -  if (loop_vinfo && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> > > > > > +  if (loop_vinfo
> > > > > > +      && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> > > > > >      cond_stmt = vect_set_loop_condition_partial_vectors (loop, loop_vinfo,
> > > > > >  							 niters, final_iv,
> > > > > >  							 niters_maybe_zero,
> > > > > >  							 loop_cond_gsi);
> > > > > >    else
> > > > > > -    cond_stmt = vect_set_loop_condition_normal (loop, niters, step, final_iv,
> > > > > > +    cond_stmt = vect_set_loop_condition_normal (loop_vinfo, loop, niters,
> > > > > > +						step, final_iv,
> > > > > >  						niters_maybe_zero,
> > > > > >  						loop_cond_gsi);
> > > > > >
> > > > > > @@ -1066,7 +1081,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > > >    edge exit, new_exit;
> > > > > >    bool duplicate_outer_loop = false;
> > > > > >
> > > > > > -  exit = single_exit (loop);
> > > > > > +  exit = normal_exit (loop);
> > > > > >    at_exit = (e == exit);
> > > > > >    if (!at_exit && e != loop_preheader_edge (loop))
> > > > > >      return NULL;
> > > > > > @@ -1104,11 +1119,11 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > > >    bbs[0] = preheader;
> > > > > >    new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> > > > > >
> > > > > > -  exit = single_exit (scalar_loop);
> > > > > > +  exit = normal_exit (scalar_loop);
> > > > > >    copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
> > > > > >  	    &exit, 1, &new_exit, NULL,
> > > > > >  	    at_exit ? loop->latch : e->src, true);
> > > > > > -  exit = single_exit (loop);
> > > > > > +  exit = normal_exit (loop);
> > > > > >    basic_block new_preheader = new_bbs[0];
> > > > > >
> > > > > >    /* Before installing PHI arguments make sure that the edges
> > > > > > @@ -1176,11 +1191,53 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > > >  	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> > > > > >  	    }
> > > > > >  	}
> > > > > > +
> > > > > > +      /* If we have multiple exits, we now need to point the additional exits
> > > > > > +	 from the old loop to the loop pre-header of the new copied loop.
> > > > > > +	 Currently we only support simple early break vectorization so all
> > > > > > +	 additional exits must exit the loop.  Additionally we can only place
> > > > > > +	 copies at the end, i.e. we cannot do prologue peeling.  */
> > > > > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > +      bool multiple_exits_p = exits.length () > 1;
> > > > > > +
> > > > > > +      /* Check to see if all of the exits point to the loop header.  If they
> > > > > > +	 don't then we have an intermediate BB that's no longer useful after
> > > > > > +	 the copy and we should remove it. */
> > > > > > +      bool imm_exit = true;
> > > > > > +      for (auto exit : exits)
> > > > > > +	{
> > > > > > +	   imm_exit = imm_exit && exit->dest == loop->header;
> > > > > > +	   if (!imm_exit)
> > > > > > +	     break;
> > > > > > +	}
> > > > > > +
> > > > > > +      for (unsigned i = 1; i < exits.length (); i++)
> > > > > > +	{
> > > > > > +	  redirect_edge_and_branch (exits[i], new_preheader);
> > > > > > +	  flush_pending_stmts (exits[i]);
> > > > > > +	}
> > > > > > +
> > > > > > +      /* Main exit must be the last to be rewritten as it's the first phi node
> > > > > > +	 entry.  The rest are in array order.  */
> > > > > >        redirect_edge_and_branch_force (e, new_preheader);
> > > > > >        flush_pending_stmts (e);
> > > > > > -      set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> > > > > > +
> > > > > > +      /* Only update the dominators of the new_preheader to the old exit if
> > > > > > +	 we have effectively a single exit.  */
> > > > > > +      if (!multiple_exits_p
> > > > > > +	  || exits[1]->src != EDGE_PRED (exits[0]->src, 0)->src)
> > > > > > +        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> > > > > > +      else
> > > > > > +	set_immediate_dominator (CDI_DOMINATORS, new_preheader, exits[1]->src);
> > > > > > +
> > > > > > +      auto_vec<edge> new_exits = get_loop_exit_edges (new_loop);
> > > > > >        if (was_imm_dom || duplicate_outer_loop)
> > > > > > -	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
> > > > > > +	{
> > > > > > +	  if (!multiple_exits_p)
> > > > > > +	    set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
> > > > > > +	  else
> > > > > > +	    set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exits[1]->src);
> > > > > > +	}
> > > > > >
> > > > > >        /* And remove the non-necessary forwarder again.  Keep the other
> > > > > >           one so we have a proper pre-header for the loop at the exit edge.  */
> > > > > > @@ -1189,6 +1246,39 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > > >        delete_basic_block (preheader);
> > > > > >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
> > > > > >  			       loop_preheader_edge (scalar_loop)->src);
> > > > > > +
> > > > > > +      /* Finally after wiring the new epilogue we need to update its main exit
> > > > > > +	 to the original function exit we recorded.  Other exits are already
> > > > > > +	 correct.  */
> > > > > > +      if (!imm_exit && multiple_exits_p)
> > > > > > +	{
> > > > > > +	  /* For now we expect at most a single successor here, but we might be
> > > > > > +	     able to extend this to multiple.  */
> > > > > > +	  if (single_succ_p (new_exit->dest) && single_pred_p (new_exit->dest))
> > > > > > +	    {
> > > > > > +	      edge exit_edge = single_succ_edge (new_exit->dest);
> > > > > > +	      /* Now correct the dominators that were messed up during the copying
> > > > > > +		 as the CFG was tweaked a bit.  */
> > > > > > +	      /* The main exit is now dominated by a new fall through edge.  */
> > > > > > +	      set_immediate_dominator (CDI_DOMINATORS, exit_edge->src,
> > > > > > +				       new_exits[0]->src);
> > > > > > +	      /* If this is a fall through edge then don't update doms.  */
> > > > > > +	      if (!empty_block_p (exit_edge->src))
> > > > > > +		set_immediate_dominator (CDI_DOMINATORS, exit_edge->dest,
> > > > > > +					 new_exits[1]->src);
> > > > > > +	    }
> > > > > > +
> > > > > > +	  /* The exits from the BB with the early exit dominate the new function
> > > > > > +	     exit edge and also the second part of the loop.  The edges were
> > > > > > +	     copied correctly but the doms are wrong because during the copying
> > > > > > +	     some of the intermediate edges are rewritten.  */
> > > > > > +	  set_immediate_dominator (CDI_DOMINATORS, new_exits[0]->src,
> > > > > > +				   new_exits[1]->src);
> > > > > > +	  set_immediate_dominator (CDI_DOMINATORS, new_exits[0]->dest,
> > > > > > +				   new_exits[0]->src);
> > > > > > +	  set_immediate_dominator (CDI_DOMINATORS, new_exits[1]->dest,
> > > > > > +				   new_exits[1]->src);
> > > > > > +	}
> > > > > >      }
> > > > > >    else /* Add the copy at entry.  */
> > > > > >      {
> > > > > > @@ -1310,20 +1400,24 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
> > > > > >   */
> > > > > >
> > > > > >  bool
> > > > > > -slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
> > > > > > +slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e)
> > > > > >  {
> > > > > > -  edge exit_e = single_exit (loop);
> > > > > > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > +  edge exit_e = normal_exit (loop);
> > > > > >    edge entry_e = loop_preheader_edge (loop);
> > > > > >    gcond *orig_cond = get_loop_exit_condition (loop);
> > > > > >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > > > > >    unsigned int num_bb = loop->inner? 5 : 2;
> > > > > >
> > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +    num_bb += 1;
> > > > > > +
> > > > > >    /* All loops have an outer scope; the only case loop->outer is NULL is for
> > > > > >       the function itself.  */
> > > > > >    if (!loop_outer (loop)
> > > > > >        || loop->num_nodes != num_bb
> > > > > >        || !empty_block_p (loop->latch)
> > > > > > -      || !single_exit (loop)
> > > > > > +      || (!single_exit (loop) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > >        /* Verify that new loop exit condition can be trivially modified.  */
> > > > > >        || (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi))
> > > > > >        || (e != exit_e && e != entry_e))
> > > > > > @@ -1528,6 +1622,12 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > >    gphi_iterator gsi, gsi1;
> > > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > >    basic_block update_bb = update_e->dest;
> > > > > > +
> > > > > > +  /* For early exits we'll update the IVs in
> > > > > > +     vect_update_ivs_after_early_break.  */
> > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +    return;
> > > > > > +
> > > > > >    basic_block exit_bb = single_exit (loop)->dest;
> > > > > >
> > > > > >    /* Make sure there exists a single-predecessor exit bb:  */
> > > > > > @@ -1613,6 +1713,186 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > >        /* Fix phi expressions in the successor bb.  */
> > > > > >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > > > >      }
> > > > > > +  return;
> > > > > > +}
> > > > > > +
> > > > > > +/*   Function vect_update_ivs_after_early_break.
> > > > > > +
> > > > > > +     "Advance" the induction variables of LOOP to the value they should take
> > > > > > +     after the execution of LOOP.  This is currently necessary because the
> > > > > > +     vectorizer does not handle induction variables that are used after the
> > > > > > +     loop.  Such a situation occurs when the last iterations of LOOP are
> > > > > > +     peeled, because of the early exit.  With an early exit we always peel the
> > > > > > +     loop.
> > > > > > +
> > > > > > +     Input:
> > > > > > +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> > > > > > +		    vectorized. The last few iterations of LOOP were peeled.
> > > > > > +     - LOOP - a loop that is going to be vectorized. The last few iterations
> > > > > > +	      of LOOP were peeled.
> > > > > > +     - VF - The loop vectorization factor.
> > > > > > +     - NITERS_ORIG - the number of iterations that LOOP executes (before it is
> > > > > > +		     vectorized), i.e., the number of times the ivs should be
> > > > > > +		     bumped.
> > > > > > +     - NITERS_VECTOR - The number of iterations that the vector LOOP executes.
> > > > > > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
> > > > > > +		  coming out from LOOP on which there are uses of the LOOP ivs
> > > > > > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > > > > > +
> > > > > > +		  The new definitions of the ivs are placed in LOOP->exit.
> > > > > > +		  The phi args associated with the edge UPDATE_E in the bb
> > > > > > +		  UPDATE_E->dest are updated accordingly.
> > > > > > +
> > > > > > +     Output:
> > > > > > +       - If available, the LCSSA phi node for the loop IV temp.
> > > > > > +
> > > > > > +     Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > > > +     a single loop exit that has a single predecessor.
> > > > > > +
> > > > > > +     Assumption 2: The phi nodes in the LOOP header and in update_bb are
> > > > > > +     organized in the same order.
> > > > > > +
> > > > > > +     Assumption 3: The access function of the ivs is simple enough (see
> > > > > > +     vect_can_advance_ivs_p).  This assumption will be relaxed in the future.
> > > > > > +
> > > > > > +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path
> > > > > > +     coming out of LOOP on which the ivs of LOOP are used (this is the path
> > > > > > +     that leads to the epilog loop; other paths skip the epilog loop).  This
> > > > > > +     path starts with the edge UPDATE_E, and its destination (denoted update_bb)
> > > > > > +     needs to have its phis updated.
> > > > > > + */
> > > > > > +
> > > > > > +static tree
> > > > > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class loop *,
> > > > > > +				   poly_int64 vf, tree niters_orig,
> > > > > > +				   tree niters_vector, edge update_e)
> > > > > > +{
> > > > > > +  gphi_iterator gsi, gsi1;
> > > > > > +  tree ni_name, ivtmp = NULL;
> > > > > > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > +  basic_block update_bb = update_e->dest;
> > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > +
> > > > > > +  basic_block exit_bb = exits[0]->dest;
> > > > > > +
> > > > > > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +    return NULL;
> > > > > > +
> > > > > > +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis
> > > > > (update_bb);
> > > > > > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > > > +       gsi_next (&gsi), gsi_next (&gsi1))
> > > > > > +    {
> > > > > > +      tree init_expr;
> > > > > > +      tree step_expr;
> > > > > > +      tree type;
> > > > > > +      tree var, ni;
> > > > > > +      gimple_stmt_iterator last_gsi;
> > > > > > +
> > > > > > +      gphi *phi = gsi1.phi ();
> > > > > > +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
> > > > > > +      gphi *phi1 = as_a <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> > > > > > +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > > > > > +      if (dump_enabled_p ())
> > > > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > +			 "vect_update_ivs_after_early_break: phi: %G",
> > > > > > +			 (gimple *)phi);
> > > > > > +
> > > > > > +      /* Skip reduction and virtual phis.  */
> > > > > > +      if (!iv_phi_p (phi_info))
> > > > > > +	{
> > > > > > +	  if (dump_enabled_p ())
> > > > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > +			     "reduc or virtual phi. skip.\n");
> > > > > > +	  continue;
> > > > > > +	}
> > > > > > +
> > > > > > +      /* For multiple exits where we handle early exits we need to carry on
> > > > > > +	 with the previous IV as loop iteration was not done because we exited
> > > > > > +	 early.  As such just grab the original IV.  */
> > > > > > +      if (STMT_VINFO_TYPE (phi_info) != undef_vec_info_type)
> > > > > > +	{
> > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > > > +	  step_expr = unshare_expr (step_expr);
> > > > > > +
> > > > > > +	  /* We previously generated the new merged phi in the same BB as the
> > > > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > > > +	     normal loop phi which doesn't take the early breaks into account.  */
> > > > > > +	  init_expr = gimple_phi_result (phi1); //PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge (loop));
> > > > > > +
> > > > > > +	  ni = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr),
> > > > > > +			    fold_convert (TREE_TYPE (step_expr), init_expr),
> > > > > > +			    build_int_cst (TREE_TYPE (step_expr), vf));
> > > > > > +
> > > > > > +	  var = create_tmp_var (type, "tmp");
> > > > > > +
> > > > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > > > +	  gimple_seq new_stmts = NULL;
> > > > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > > > +	  if (!gsi_end_p (last_gsi))
> > > > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > > +	  else
> > > > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > > +
> > > > > > +	  /* Fix phi expressions in the successor bb.  */
> > > > > > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > > > > +	}
> > > > > > +      else if (STMT_VINFO_TYPE (phi_info) == undef_vec_info_type)
> > > > > > +	{
> > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > > > +	  step_expr = unshare_expr (step_expr);
> > > > > > +
> > > > > > +	  /* We previously generated the new merged phi in the same BB as the
> > > > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > > > +	     normal loop phi which doesn't take the early breaks into account.  */
> > > > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge (loop));
> > > > > > +
> > > > > > +	  if (vf.is_constant ())
> > > > > > +	    {
> > > > > > +	      ni = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr),
> > > > > > +				fold_convert (TREE_TYPE (step_expr),
> > > > > > +					      niters_vector),
> > > > > > +				build_int_cst (TREE_TYPE (step_expr), vf));
> > > > > > +
> > > > > > +	      ni = fold_build2 (MINUS_EXPR, TREE_TYPE (step_expr),
> > > > > > +				fold_convert (TREE_TYPE (step_expr),
> > > > > > +					      niters_orig),
> > > > > > +				fold_convert (TREE_TYPE (step_expr), ni));
> > > > > > +	    }
> > > > > > +	  else
> > > > > > +	    /* If the loop's VF isn't constant then the loop must have been
> > > > > > +	       masked, so at the end of the loop we know we have finished
> > > > > > +	       the entire loop and found nothing.  */
> > > > > > +	    ni = build_zero_cst (TREE_TYPE (step_expr));
> > > > > > +
> > > > > > +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > > > > > +
> > > > > > +	  var = create_tmp_var (type, "tmp");
> > > > > > +
> > > > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > > > +	  gimple_seq new_stmts = NULL;
> > > > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > > > +	  if (!gsi_end_p (last_gsi))
> > > > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > > +	  else
> > > > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > > +
> > > > > > +	  adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > > > > +
> > > > > > +	  for (unsigned i = 1; i < exits.length (); i++)
> > > > > > +	    adjust_phi_and_debug_stmts (phi1, exits[i],
> > > > > > +					build_int_cst (TREE_TYPE (step_expr),
> > > > > > +						       vf));
> > > > > > +	  ivtmp = gimple_phi_result (phi1);
> > > > > > +	}
> > > > > > +      else
> > > > > > +	continue;
> > > > > > +    }
> > > > > > +
> > > > > > +  return ivtmp;
> > > > > >  }
> > > > > >
> > > > > >  /* Return a gimple value containing the misalignment (measured in vector
> > > > > > @@ -2096,7 +2376,7 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
> > > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > >    tree type = TREE_TYPE (niters_vector);
> > > > > >    tree log_vf = build_int_cst (type, exact_log2 (vf));
> > > > > > -  basic_block exit_bb = single_exit (loop)->dest;
> > > > > > +  basic_block exit_bb = normal_exit (loop)->dest;
> > > > > >
> > > > > >    gcc_assert (niters_vector_mult_vf_ptr != NULL);
> > > > > >    tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
> > > > > > @@ -2123,19 +2403,46 @@ find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
> > > > > >  		gphi *lcssa_phi)
> > > > > >  {
> > > > > >    gphi_iterator gsi;
> > > > > > -  edge e = single_exit (loop);
> > > > > > +  edge e = normal_exit (loop);
> > > > > >
> > > > > > -  gcc_assert (single_pred_p (e->dest));
> > > > > >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > >      {
> > > > > >        gphi *phi = gsi.phi ();
> > > > > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > > > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > > > > +      /* Nested loops with multiple exits can have a different number of phi
> > > > > > +	 node arguments between the main loop and epilog as the epilog falls
> > > > > > +	 to the second loop.  */
> > > > > > +      if (gimple_phi_num_args (phi) > e->dest_idx
> > > > > > +	  && operand_equal_p (PHI_ARG_DEF (phi, e->dest_idx),
> > > > > > +			      PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > > > >  	return PHI_RESULT (phi);
> > > > > >      }
> > > > > >    return NULL_TREE;
> > > > > >  }
> > > > > >
> > > > > > +/* Starting from the current edge walk all instructions and find the last
> > > > > > +   VUSE/VDEF in the basic block.  */
> > > > > > +
> > > > > > +static tree
> > > > > > +gimple_find_last_mem_use (edge e)
> > > > > > +{
> > > > > > +  basic_block bb = e->src;
> > > > > > +  tree res = NULL;
> > > > > > +  gimple_stmt_iterator iter = gsi_last_bb (bb);
> > > > > > +  do
> > > > > > +  {
> > > > > > +    gimple *stmt = gsi_stmt (iter);
> > > > > > +    if ((res = gimple_vdef (stmt)))
> > > > > > +      return res;
> > > > > > +
> > > > > > +    if ((res = gimple_vuse (stmt)))
> > > > > > +      return res;
> > > > > > +
> > > > > > +    gsi_prev (&iter);
> > > > > > +  } while (!gsi_end_p (iter));
> > > > > > +
> > > > > > +  return NULL;
> > > > > > +}
> > > > > > +
> > > > > >  /* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
> > > > > >     from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > > > >     edge, the two loops are arranged as below:
> > > > > > @@ -2185,6 +2492,7 @@ find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
> > > > > >  static void
> > > > > >  slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > > >  				   class loop *first, class loop *second,
> > > > > > +				   tree *lcssa_ivtmp,
> > > > > >  				   bool create_lcssa_for_iv_phis)
> > > > > >  {
> > > > > >    gphi_iterator gsi_update, gsi_orig;
> > > > > > @@ -2192,10 +2500,18 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > > >
> > > > > >    edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > > > >    edge second_preheader_e = loop_preheader_edge (second);
> > > > > > -  basic_block between_bb = single_exit (first)->dest;
> > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (first);
> > > > > > +  basic_block between_bb = exits[0]->dest;
> > > > > > +
> > > > > > +  bool early_exit = LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > > > > > +  /* For early exits when we create the merge BB we must maintain it in
> > > > > > +     LCSSA form, otherwise the final vectorizer passes will create the
> > > > > > +     wrong PHI nodes here.  */
> > > > > > +  create_lcssa_for_iv_phis = create_lcssa_for_iv_phis || early_exit;
> > > > > >
> > > > > >    gcc_assert (between_bb == second_preheader_e->src);
> > > > > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
> > > > > > +  gcc_assert ((single_pred_p (between_bb) && single_succ_p (between_bb))
> > > > > > +	      || early_exit);
> > > > > >    /* Either the first loop or the second is the loop to be vectorized.  */
> > > > > >    gcc_assert (loop == first || loop == second);
> > > > > >
> > > > > > @@ -2215,10 +2531,40 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > > >  	{
> > > > > >  	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > > > >  	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > > > > -	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
> > > > > > +
> > > > > > +	  /* The first exit is always the loop latch, so handle that
> > > > > > +	     separately.  */
> > > > > > +	  gcc_assert (arg);
> > > > > > +	  add_phi_arg (lcssa_phi, arg, exits[0], UNKNOWN_LOCATION);
> > > > > > +
> > > > > > +	  /* The early exits are processed in order starting from exit 1.  */
> > > > > > +	  for (unsigned i = 1; i < exits.length (); i++)
> > > > > > +	    {
> > > > > > +	      tree phi_arg;
> > > > > > +	      if (iv_phi_p (vect_phi_info))
> > > > > > +		/* For induction values just copy the previous one as the
> > > > > > +		   current iteration did not finish.  We'll update as needed
> > > > > > +		   later on.  */
> > > > > > +		phi_arg = gimple_phi_result (orig_phi);
> > > > > > +	      else
> > > > > > +		phi_arg = gimple_find_last_mem_use (exits[i]);
> > > > > > +	      /* If we didn't find any just copy the existing one and leave
> > > > > > +		 it to the others to fix it up.  */
> > > > > > +	      if (!phi_arg)
> > > > > > +		phi_arg = gimple_phi_result (orig_phi);
> > > > > > +	      add_phi_arg (lcssa_phi, phi_arg, exits[i], UNKNOWN_LOCATION);
> > > > > > +	    }
> > > > > >  	  arg = new_res;
> > > > > >  	}
> > > > > >
> > > > > > +      /* Normally able to distinguish between the iterator counter and the
> > > > > > +	 ivtemps by looking at the STMT_VINFO_TYPE of the phi node.
> > > > > > +	 However, for some reason this isn't consistently set.  Is there a
> > > > > > +	 better way?  */
> > > > > > +      if (lcssa_ivtmp
> > > > > > +	  && iv_phi_p (vect_phi_info))
> > > > > > +	*lcssa_ivtmp = arg;
> > > > > > +
> > > > > >        /* Update PHI node in the second loop by replacing arg on the loop's
> > > > > >  	 incoming edge.  */
> > > > > >        adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> > > > > > @@ -2228,7 +2574,8 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > > >       for correct vectorization of live stmts.  */
> > > > > >    if (loop == first)
> > > > > >      {
> > > > > > -      basic_block orig_exit = single_exit (second)->dest;
> > > > > > +      auto_vec<edge> new_exits = get_loop_exit_edges (second);
> > > > > > +      basic_block orig_exit = new_exits[0]->dest;
> > > > > >        for (gsi_orig = gsi_start_phis (orig_exit);
> > > > > >  	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > > > >  	{
> > > > > > @@ -2243,7 +2590,15 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > > >
> > > > > >  	  tree new_res = copy_ssa_name (orig_arg);
> > > > > >  	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > > > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
> > > > > > +	  /* The first exit is always the loop latch, so handle that
> > > > > > +	     separately.  */
> > > > > > +	  add_phi_arg (lcphi, orig_arg, new_exits[0], UNKNOWN_LOCATION);
> > > > > > +	  /* The early exits are processed in order starting from exit 1.  */
> > > > > > +	  for (unsigned i = 1; i < new_exits.length (); i++)
> > > > > > +	    {
> > > > > > +	      tree phi_arg = gimple_phi_result (orig_phi);
> > > > > > +	      add_phi_arg (lcphi, phi_arg, exits[i], UNKNOWN_LOCATION);
> > > > > > +	    }
> > > > > >  	}
> > > > > >      }
> > > > > >  }
> > > > > > @@ -2393,13 +2748,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > > > >    gcc_assert (single_succ_p (merge_bb));
> > > > > >    edge e = single_succ_edge (merge_bb);
> > > > > >    basic_block exit_bb = e->dest;
> > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > > > > >
> > > > > >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > >      {
> > > > > >        gphi *update_phi = gsi.phi ();
> > > > > > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > > > > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > > > > >
> > > > > >        tree merge_arg = NULL_TREE;
> > > > > >
> > > > > > @@ -2438,12 +2791,14 @@ static void
> > > > > >  slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > > > >  {
> > > > > >    gphi_iterator gsi;
> > > > > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (epilog);
> > > > > >
> > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > > > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next
> (&gsi))
> > > > > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (),
> e));
> > > > > > +  for (unsigned i = 0; i < exits.length (); i++)
> > > > > > +    {
> > > > > > +      basic_block exit_bb = exits[i]->dest;
> > > > > > +      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next
> (&gsi))
> > > > > > +	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi
> (),
> > > > > exits[i]));
> > > > > > +    }
> > > > > >  }
> > > > > >
> > > > > >  /* EPILOGUE_VINFO is an epilogue loop that we now know would
> > > need to
> > > > > > @@ -2621,6 +2976,14 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > > > tree niters, tree nitersm1,
> > > > > >      bound_epilog += vf - 1;
> > > > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > > >      bound_epilog += 1;
> > > > > > +  /* For early breaks the scalar loop needs to execute at most VF
> times
> > > > > > +     to find the element that caused the break.  */
> > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +    {
> > > > > > +      bound_epilog = vf;
> > > > > > +      /* Force a scalar epilogue as we can't vectorize the index
> finding.
> > > */
> > > > > > +      vect_epilogues = false;
> > > > > > +    }
> > > > > >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > > > > >    poly_uint64 bound_scalar = bound_epilog;
> > > > > >
> > > > > > @@ -2780,16 +3143,24 @@ vect_do_peeling (loop_vec_info
> > > loop_vinfo,
> > > > > tree niters, tree nitersm1,
> > > > > >  				  bound_prolog + bound_epilog)
> > > > > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > > > > >  			 || vect_epilogues));
> > > > > > +
> > > > > > +  /* We only support early break vectorization on known bounds at this
> > > > > > +     time.  This means that if the vector loop can't be entered then we
> > > > > > +     won't generate it at all.  So for now force skip_vector off, because
> > > > > > +     the additional control flow would alter the BB exits we have already
> > > > > > +     analyzed.  */
> > > > > > +  skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > > > > > +
> > > > > >    /* Epilog loop must be executed if the number of iterations for
> epilog
> > > > > >       loop is known at compile time, otherwise we need to add a check
> at
> > > > > >       the end of vector loop and skip to the end of epilog loop.  */
> > > > > >    bool skip_epilog = (prolog_peeling < 0
> > > > > >  		      || !LOOP_VINFO_NITERS_KNOWN_P
> (loop_vinfo)
> > > > > >  		      || !vf.is_constant ());
> > > > > > -  /* PEELING_FOR_GAPS is special because epilog loop must be
> > > executed.
> > > > > */
> > > > > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because the
> > > > > > +     epilog loop must be executed.  */
> > > > > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > >      skip_epilog = false;
> > > > > > -
> > > > > >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP
> (loop_vinfo);
> > > > > >    auto_vec<profile_count> original_counts;
> > > > > >    basic_block *original_bbs = NULL;
> > > > > > @@ -2828,7 +3199,7 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree
> > > > > niters, tree nitersm1,
> > > > > >    if (prolog_peeling)
> > > > > >      {
> > > > > >        e = loop_preheader_edge (loop);
> > > > > > -      if (!slpeel_can_duplicate_loop_p (loop, e))
> > > > > > +      if (!slpeel_can_duplicate_loop_p (loop_vinfo, e))
> > > > > >  	{
> > > > > >  	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
> > > > > >  			   "loop can't be duplicated to preheader
> edge.\n");
> > > > > > @@ -2843,7 +3214,7 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree
> > > > > niters, tree nitersm1,
> > > > > >  	  gcc_unreachable ();
> > > > > >  	}
> > > > > >        prolog->force_vectorize = false;
> > > > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop,
> > > true);
> > > > > > +      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop,
> > > NULL,
> > > > > true);
> > > > > >        first_loop = prolog;
> > > > > >        reset_original_copy_tables ();
> > > > > >
> > > > > > @@ -2902,11 +3273,13 @@ vect_do_peeling (loop_vec_info
> > > loop_vinfo,
> > > > > tree niters, tree nitersm1,
> > > > > >
> > > > > >    if (epilog_peeling)
> > > > > >      {
> > > > > > -      e = single_exit (loop);
> > > > > > -      if (!slpeel_can_duplicate_loop_p (loop, e))
> > > > > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > +      e = exits[0];
> > > > > > +      if (!slpeel_can_duplicate_loop_p (loop_vinfo, e))
> > > > > >  	{
> > > > > > -	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
> > > > > > -			   "loop can't be duplicated to exit edge.\n");
> > > > > > +	  if (dump_enabled_p ())
> > > > > > +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
> > > > > > +			     "loop can't be duplicated to exit edge.\n");
> > > > > >  	  gcc_unreachable ();
> > > > > >  	}
> > > > > >        /* Peel epilog and put it on exit edge of loop.  If we are
> vectorizing
> > > > > > @@ -2920,12 +3293,16 @@ vect_do_peeling (loop_vec_info
> > > loop_vinfo,
> > > > > tree niters, tree nitersm1,
> > > > > >        epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog,
> e);
> > > > > >        if (!epilog)
> > > > > >  	{
> > > > > > -	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
> > > > > > -			   "slpeel_tree_duplicate_loop_to_edge_cfg
> > > > > failed.\n");
> > > > > > +	  if (dump_enabled_p ())
> > > > > > +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
> > > > > > +			     "slpeel_tree_duplicate_loop_to_edge_cfg
> > > > > failed.\n");
> > > > > >  	  gcc_unreachable ();
> > > > > >  	}
> > > > > >        epilog->force_vectorize = false;
> > > > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog,
> > > false);
> > > > > > +
> > > > > > +      tree early_break_iv_name;
> > > > > > +      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog,
> > > > > > +					 &early_break_iv_name,
> false);
> > > > > >
> > > > > >        /* Scalar version loop may be preferred.  In this case, add guard
> > > > > >  	 and skip to epilog.  Note this only happens when the number
> of
> > > > > > @@ -2978,6 +3355,7 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree
> > > > > niters, tree nitersm1,
> > > > > >        vect_gen_vector_loop_niters (loop_vinfo, niters,
> > > > > >  				   niters_vector, step_vector,
> > > > > >  				   niters_no_overflow);
> > > > > > +
> > > > > >        if (!integer_onep (*step_vector))
> > > > > >  	{
> > > > > >  	  /* On exit from the loop we will have an easy way of
> calcalating
> > > > > > @@ -2987,9 +3365,13 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > > > tree niters, tree nitersm1,
> > > > > >  	  SSA_NAME_DEF_STMT (niters_vector_mult_vf) =
> > > > > gimple_build_nop ();
> > > > > >  	  *niters_vector_mult_vf_var = niters_vector_mult_vf;
> > > > > >  	}
> > > > > > +      else if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +	vect_gen_vector_loop_niters_mult_vf (loop_vinfo,
> > > > > early_break_iv_name,
> > > > > > +					     &niters_vector_mult_vf);
> > > > > >        else
> > > > > >  	vect_gen_vector_loop_niters_mult_vf (loop_vinfo,
> *niters_vector,
> > > > > >  					     &niters_vector_mult_vf);
> > > > > > +
> > > > > >        /* Update IVs of original loop as if they were advanced by
> > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > @@ -2997,12 +3379,97 @@ vect_do_peeling (loop_vec_info
> > > loop_vinfo,
> > > > > tree niters, tree nitersm1,
> > > > > >        vect_update_ivs_after_vectorizer (loop_vinfo,
> > > niters_vector_mult_vf,
> > > > > >  					update_e);
> > > > > >
> > > > > > +      /* For early breaks we must create a guard to check how many
> > > > > iterations
> > > > > > +	 of the scalar loop are yet to be performed.  */
> > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +	{
> > > > > > +	  gcc_assert (early_break_iv_name);
> > > > > > +	  tree ivtmp =
> > > > > > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf,
> niters,
> > > > > > +					       *niters_vector,
> update_e);
> > > > > > +
> > > > > > +	  tree guard_cond = fold_build2 (EQ_EXPR,
> boolean_type_node,
> > > > > > +					 fold_convert (TREE_TYPE
> (niters),
> > > > > > +						       ivtmp),
> > > > > > +					 build_zero_cst (TREE_TYPE
> (niters)));
> > > > > > +	  basic_block guard_bb = normal_exit (loop)->dest;
> > > > > > +	  auto_vec<edge> new_exits = get_loop_exit_edges
> (epilog);
> > > > > > +	  /* If we had a fallthrough edge, the guard will be threaded through
> > > > > > +	     and so we may need to find the actual final edge.  */
> > > > > > +	  edge final_edge = new_exits[0];
> > > > > > +	  basic_block guard_to;
> > > > > > +	  bool fn_exit_p = false;
> > > > > > +	  if (gsi_end_p (gsi_start_nondebug_bb (final_edge->dest))
> > > > > > +	      && !gsi_end_p (gsi_start_phis (final_edge->dest))
> > > > > > +	      && single_succ_p (final_edge->dest))
> > > > > > +	    {
> > > > > > +	      auto gsi = gsi_start_phis (final_edge->dest);
> > > > > > +	      while (!gsi_end_p (gsi))
> > > > > > +		gsi_remove (&gsi, true);
> > > > > > +	      guard_to = final_edge->dest;
> > > > > > +	      fn_exit_p = true;
> > > > > > +	    }
> > > > > > +	  else
> > > > > > +	    guard_to = split_edge (normal_exit (epilog));
> > > > > > +
> > > > > > +	  edge guard_e = slpeel_add_loop_guard (guard_bb,
> guard_cond,
> > > > > guard_to,
> > > > > > +					   guard_bb,
> > > > > > +					   prob_epilog.invert (),
> > > > > > +					   irred_flag);
> > > > > > +
> > > > > > +	  basic_block dest = single_succ (guard_to);
> > > > > > +	  /* If we have a single pred then the previous block is the immediate
> > > > > > +	     dominator.  This may or may not be the guard bb.  However, if we
> > > > > > +	     have multiple preds then the guard BB must be the dominator, as all
> > > > > > +	     previous exits were rewritten to the guard BB.  */
> > > > > > +	  if (single_pred_p (dest))
> > > > > > +	    set_immediate_dominator (CDI_DOMINATORS, dest,
> guard_to);
> > > > > > +	  else
> > > > > > +	    set_immediate_dominator (CDI_DOMINATORS, dest,
> guard_bb);
> > > > > > +
> > > > > > +	  /* We must update all the edges from the new guard_bb.
> */
> > > > > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog,
> guard_e,
> > > > > > +					      final_edge);
> > > > > > +
> > > > > > +	  /* If we have an additional function exit block, then thread the
> > > > > > +	     updates through to the block.  Leaving it to the LCSSA cleanup
> > > > > > +	     pass would produce the wrong values here, as it cannot correctly
> > > > > > +	     handle the merge block we just made.  */
> > > > > > +	  if (fn_exit_p)
> > > > > > +	    {
> > > > > > +		gphi_iterator gsi_update, gsi_orig, gsi_vect;
> > > > > > +		for (gsi_orig = gsi_start_phis (epilog->header),
> > > > > > +		     gsi_update = gsi_start_phis (guard_e->dest),
> > > > > > +		     gsi_vect = gsi_start_phis (loop->header);
> > > > > > +		     !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update)
> > > > > > +		     && !gsi_end_p (gsi_vect);
> > > > > > +		     gsi_next (&gsi_orig), gsi_next (&gsi_update),
> > > > > > +		     gsi_next (&gsi_vect))
> > > > > > +		  {
> > > > > > +		    gphi *orig_phi = gsi_orig.phi ();
> > > > > > +		    gphi *update_phi = gsi_update.phi ();
> > > > > > +		    gphi *vect_phi = gsi_vect.phi ();
> > > > > > +		    stmt_vec_info phi_info = loop_vinfo->lookup_stmt
> > > > > (vect_phi);
> > > > > > +
> > > > > > +		    if (iv_phi_p (phi_info))
> > > > > > +		      continue;
> > > > > > +
> > > > > > +		    tree phi_arg = PHI_ARG_DEF_FROM_EDGE
> (orig_phi,
> > > > > update_e);
> > > > > > +		    SET_PHI_ARG_DEF (update_phi, update_e-
> >dest_idx,
> > > > > phi_arg);
> > > > > > +
> > > > > > +		    phi_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi,
> guard_e);
> > > > > > +		    SET_PHI_ARG_DEF (update_phi, guard_e-
> >dest_idx,
> > > > > phi_arg);
> > > > > > +		  }
> > > > > > +	    }
> > > > > > +	  flush_pending_stmts (guard_e);
> > > > > > +	}
> > > > > > +
> > > > > >        if (skip_epilog)
> > > > > >  	{
> > > > > >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > > >  				    niters, niters_vector_mult_vf);
> > > > > > -	  guard_bb = single_exit (loop)->dest;
> > > > > > -	  guard_to = split_edge (single_exit (epilog));
> > > > > > +	  guard_bb = normal_exit (loop)->dest;
> > > > > > +	  guard_to = split_edge (normal_exit (epilog));
> > > > > >  	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,
> > > > > guard_to,
> > > > > >  					   skip_vector ? anchor :
> guard_bb,
> > > > > >  					   prob_epilog.invert (),
> > > > > > @@ -3010,7 +3477,7 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree
> > > > > niters, tree nitersm1,
> > > > > >  	  if (vect_epilogues)
> > > > > >  	    epilogue_vinfo->skip_this_loop_edge = guard_e;
> > > > > >  	  slpeel_update_phi_nodes_for_guard2 (loop, epilog,
> guard_e,
> > > > > > -					      single_exit (epilog));
> > > > > > +					      normal_exit (epilog));
> > > > > >  	  /* Only need to handle basic block before epilog loop if it's
> not
> > > > > >  	     the guard_bb, which is the case when skip_vector is true.
> */
> > > > > >  	  if (guard_bb != bb_before_epilog)
> > > > > > @@ -3023,7 +3490,6 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree
> > > > > niters, tree nitersm1,
> > > > > >  	}
> > > > > >        else
> > > > > >  	slpeel_update_phi_nodes_for_lcssa (epilog);
> > > > > > -
> > > > > >        unsigned HOST_WIDE_INT bound;
> > > > > >        if (bound_scalar.is_constant (&bound))
> > > > > >  	{
> > > > > > @@ -3114,7 +3580,6 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree
> > > > > niters, tree nitersm1,
> > > > > >
> > > > > >    adjust_vec.release ();
> > > > > >    free_original_copy_tables ();
> > > > > > -
> > > > > >    return vect_epilogues ? epilog : NULL;
> > > > > >  }
> > > > > >
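A minimal sketch, in plain C++ rather than GCC internals, of why the scalar epilogue for an early-break loop needs at most VF iterations (the `bound_epilog = vf` setting quoted above). The names `VF`, `find_first_scalar` and `find_first_early_break` are hypothetical. The inner lane loop only stands in for one vector compare plus an any-lane reduction, so this models the control flow the peeling produces, not the code GCC emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vectorization factor for the sketch; not taken from GCC.
constexpr std::size_t VF = 4;

// Scalar reference loop with an early break: index of the first element
// greater than LIMIT, or a.size () if there is none.
std::size_t find_first_scalar(const std::vector<int> &a, int limit) {
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] > limit)
      return i;
  return a.size();
}

// Emulation of the peeled form.  The "vector" loop only learns whether some
// lane in a VF-wide chunk takes the early exit; the scalar epilogue then
// re-runs at most VF iterations to find the exact element.  EPILOGUE_ITERS
// receives the number of scalar epilogue iterations that were executed.
std::size_t find_first_early_break(const std::vector<int> &a, int limit,
                                   std::size_t *epilogue_iters) {
  const std::size_t n = a.size();
  std::size_t i = 0;

  // Vector body over full chunks; the lane loop stands in for a single
  // vector compare followed by an "any lane set" reduction.
  for (; i + VF <= n; i += VF) {
    bool any = false;
    for (std::size_t lane = 0; lane < VF; ++lane)
      any = any || a[i + lane] > limit;
    if (any)
      break;  // i still points at the start of the chunk that broke.
  }

  // Scalar epilogue: handles either the chunk that broke or the remainder
  // (fewer than VF elements) left after the last full chunk.
  std::size_t count = 0;
  for (; i < n; ++i) {
    ++count;
    if (a[i] > limit)
      break;
  }
  *epilogue_iters = count;
  return i;
}
```

In this shape the epilogue runs either a prefix of the single chunk that triggered the break or the sub-VF remainder, so its trip count never exceeds VF.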
> > > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > > index
> > > > >
> > >
> d5c2bff80be9be152707eb9d3932c863948daa73..548946a6bbf8892086a17fe30
> > > > > 03da2c3dceadf5b 100644
> > > > > > --- a/gcc/tree-vect-loop.cc
> > > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > > @@ -844,80 +844,106 @@ vect_fixup_scalar_cycles_with_patterns
> > > > > (loop_vec_info loop_vinfo)
> > > > > >     in NUMBER_OF_ITERATIONSM1.  Place the condition under which
> the
> > > > > >     niter information holds in ASSUMPTIONS.
> > > > > >
> > > > > > -   Return the loop exit condition.  */
> > > > > > +   Return the loop exit conditions.  */
> > > > > >
> > > > > >
> > > > > > -static gcond *
> > > > > > +static vec<gcond *>
> > > > > >  vect_get_loop_niters (class loop *loop, tree *assumptions,
> > > > > >  		      tree *number_of_iterations, tree
> > > > > *number_of_iterationsm1)
> > > > > >  {
> > > > > > -  edge exit = single_exit (loop);
> > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > +  vec<gcond *> conds;
> > > > > > +  conds.create (exits.length ());
> > > > > >    class tree_niter_desc niter_desc;
> > > > > >    tree niter_assumptions, niter, may_be_zero;
> > > > > > -  gcond *cond = get_loop_exit_condition (loop);
> > > > > >
> > > > > >    *assumptions = boolean_true_node;
> > > > > >    *number_of_iterationsm1 = chrec_dont_know;
> > > > > >    *number_of_iterations = chrec_dont_know;
> > > > > > +
> > > > > >    DUMP_VECT_SCOPE ("get_loop_niters");
> > > > > >
> > > > > > -  if (!exit)
> > > > > > -    return cond;
> > > > > > +  if (exits.is_empty ())
> > > > > > +    return conds;
> > > > > >
> > > > > > -  may_be_zero = NULL_TREE;
> > > > > > -  if (!number_of_iterations_exit_assumptions (loop, exit,
> &niter_desc,
> > > > > NULL)
> > > > > > -      || chrec_contains_undetermined (niter_desc.niter))
> > > > > > -    return cond;
> > > > > > +  if (dump_enabled_p ())
> > > > > > +    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d
> > > exits.\n",
> > > > > > +		     exits.length ());
> > > > > >
> > > > > > -  niter_assumptions = niter_desc.assumptions;
> > > > > > -  may_be_zero = niter_desc.may_be_zero;
> > > > > > -  niter = niter_desc.niter;
> > > > > > +  edge exit;
> > > > > > +  unsigned int i;
> > > > > > +  FOR_EACH_VEC_ELT (exits, i, exit)
> > > > > > +    {
> > > > > > +      gcond *cond = get_edge_condition (exit);
> > > > > > +      if (cond)
> > > > > > +	conds.safe_push (cond);
> > > > > >
> > > > > > -  if (may_be_zero && integer_zerop (may_be_zero))
> > > > > > -    may_be_zero = NULL_TREE;
> > > > > > +      if (dump_enabled_p ())
> > > > > > +	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit
> > > > > %d...\n", i);
> > > > > >
> > > > > > -  if (may_be_zero)
> > > > > > -    {
> > > > > > -      if (COMPARISON_CLASS_P (may_be_zero))
> > > > > > +      may_be_zero = NULL_TREE;
> > > > > > +      if (!number_of_iterations_exit_assumptions (loop, exit,
> > > &niter_desc,
> > > > > NULL)
> > > > > > +          || chrec_contains_undetermined (niter_desc.niter))
> > > > > > +	continue;
> > > > > > +
> > > > > > +      niter_assumptions = niter_desc.assumptions;
> > > > > > +      may_be_zero = niter_desc.may_be_zero;
> > > > > > +      niter = niter_desc.niter;
> > > > > > +
> > > > > > +      if (may_be_zero && integer_zerop (may_be_zero))
> > > > > > +	may_be_zero = NULL_TREE;
> > > > > > +
> > > > > > +      if (may_be_zero)
> > > > > >  	{
> > > > > > -	  /* Try to combine may_be_zero with assumptions, this can
> simplify
> > > > > > -	     computation of niter expression.  */
> > > > > > -	  if (niter_assumptions && !integer_nonzerop
> (niter_assumptions))
> > > > > > -	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR,
> > > > > boolean_type_node,
> > > > > > -					     niter_assumptions,
> > > > > > -					     fold_build1
> (TRUTH_NOT_EXPR,
> > > > > > -
> > > > > boolean_type_node,
> > > > > > -
> may_be_zero));
> > > > > > +	  if (COMPARISON_CLASS_P (may_be_zero))
> > > > > > +	    {
> > > > > > +	      /* Try to combine may_be_zero with assumptions, this
> can
> > > > > simplify
> > > > > > +		 computation of niter expression.  */
> > > > > > +	      if (niter_assumptions && !integer_nonzerop
> (niter_assumptions))
> > > > > > +		niter_assumptions = fold_build2 (TRUTH_AND_EXPR,
> > > > > boolean_type_node,
> > > > > > +						 niter_assumptions,
> > > > > > +						 fold_build1
> > > > > (TRUTH_NOT_EXPR,
> > > > > > +
> > > > > boolean_type_node,
> > > > > > +
> may_be_zero));
> > > > > > +	      else
> > > > > > +		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter),
> > > > > may_be_zero,
> > > > > > +				     build_int_cst (TREE_TYPE (niter),
> 0),
> > > > > > +
> rewrite_to_non_trapping_overflow
> > > > > (niter));
> > > > > > +
> > > > > > +	      may_be_zero = NULL_TREE;
> > > > > > +	    }
> > > > > > +	  else if (integer_nonzerop (may_be_zero) && i == 0)
> > > > > > +	    {
> > > > > > +	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE
> (niter), 0);
> > > > > > +	      *number_of_iterations = build_int_cst (TREE_TYPE
> (niter), 1);
> > > > > > +	      continue;
> > > > > > +	    }
> > > > > >  	  else
> > > > > > -	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter),
> may_be_zero,
> > > > > > -				 build_int_cst (TREE_TYPE (niter), 0),
> > > > > > -				 rewrite_to_non_trapping_overflow
> (niter));
> > > > > > +	    continue;
> > > > > > +       }
> > > > > >
> > > > > > -	  may_be_zero = NULL_TREE;
> > > > > > -	}
> > > > > > -      else if (integer_nonzerop (may_be_zero))
> > > > > > +      /* Loop assumptions are based on the normal exit.  */
> > > > > > +      if (i == 0)
> > > > > >  	{
> > > > > > -	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE
> (niter), 0);
> > > > > > -	  *number_of_iterations = build_int_cst (TREE_TYPE (niter),
> 1);
> > > > > > -	  return cond;
> > > > > > +	  *assumptions = niter_assumptions;
> > > > > > +	  *number_of_iterationsm1 = niter;
> > > > > > +
> > > > > > +	  /* We want the number of loop header executions which is
> the
> > > > > number
> > > > > > +	     of latch executions plus one.
> > > > > > +	     ???  For UINT_MAX latch executions this number
> overflows to
> > > > > zero
> > > > > > +	     for loops like do { n++; } while (n != 0);  */
> > > > > > +	  if (niter && !chrec_contains_undetermined (niter))
> > > > > > +	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> > > > > > +				 unshare_expr (niter),
> > > > > > +				 build_int_cst (TREE_TYPE (niter), 1));
> > > > > > +	  *number_of_iterations = niter;
> > > > > >  	}
> > > > > > -      else
> > > > > > -	return cond;
> > > > > >      }
> > > > > >
> > > > > > -  *assumptions = niter_assumptions;
> > > > > > -  *number_of_iterationsm1 = niter;
> > > > > > -
> > > > > > -  /* We want the number of loop header executions which is the
> > > number
> > > > > > -     of latch executions plus one.
> > > > > > -     ???  For UINT_MAX latch executions this number overflows to
> zero
> > > > > > -     for loops like do { n++; } while (n != 0);  */
> > > > > > -  if (niter && !chrec_contains_undetermined (niter))
> > > > > > -    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr
> > > > > (niter),
> > > > > > -			  build_int_cst (TREE_TYPE (niter), 1));
> > > > > > -  *number_of_iterations = niter;
> > > > > > +  if (dump_enabled_p ())
> > > > > > +    dump_printf_loc (MSG_NOTE, vect_location, "All loop exits
> > > successfully
> > > > > analyzed.\n");
> > > > > >
> > > > > > -  return cond;
> > > > > > +  return conds;
> > > > > >  }
> > > > > >
> > > > > >  /* Function bb_in_loop_p
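The `???` note kept in `vect_get_loop_niters` (latch executions plus one overflowing to zero for `do { n++; } while (n != 0);`) can be reproduced outside the compiler. This sketch substitutes an 8-bit counter for `unsigned int` so the wrap is cheap to reach; `niters_from_latch`, `Counts` and `run_do_while` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of header executions computed as
// latch executions + 1 in the IV's own unsigned type (here 8 bits).
std::uint8_t niters_from_latch(std::uint8_t niters_m1) {
  // The addition happens in int and is then truncated back to 8 bits,
  // so 255 + 1 becomes 0, mirroring the wrap described in the comment.
  return static_cast<std::uint8_t>(niters_m1 + 1);
}

struct Counts {
  unsigned header;  // times the loop body (header) executed
  unsigned latch;   // times the back edge (latch) was taken
};

// Executes do { n++; } while (n != 0); with an 8-bit N starting at START
// and records header and latch executions.
Counts run_do_while(std::uint8_t start) {
  Counts c{0, 0};
  std::uint8_t n = start;
  for (;;) {
    ++c.header;
    n = static_cast<std::uint8_t>(n + 1);
    if (n == 0)
      break;
    ++c.latch;
  }
  return c;
}
```

Starting from 0, the body runs 256 times and the latch 255 times, yet 255 + 1 in the 8-bit type is 0: the same wrap the comment warns about for UINT_MAX latch executions.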
> > > > > > @@ -1455,7 +1481,8 @@
> vect_compute_single_scalar_iteration_cost
> > > > > (loop_vec_info loop_vinfo)
> > > > > >
> > > > > >     Verify that certain CFG restrictions hold, including:
> > > > > >     - the loop has a pre-header
> > > > > > -   - the loop has a single entry and exit
> > > > > > +   - the loop has a single entry
> > > > > > +   - nested loops can have only a single exit.
> > > > > >     - the loop exit condition is simple enough
> > > > > >     - the number of iterations can be analyzed, i.e, a countable loop.
> The
> > > > > >       niter could be analyzed under some assumptions.  */
> > > > > > @@ -1484,11 +1511,6 @@ vect_analyze_loop_form (class loop
> *loop,
> > > > > vect_loop_form_info *info)
> > > > > >                             |
> > > > > >                          (exit-bb)  */
> > > > > >
> > > > > > -      if (loop->num_nodes != 2)
> > > > > > -	return opt_result::failure_at (vect_location,
> > > > > > -				       "not vectorized:"
> > > > > > -				       " control flow in loop.\n");
> > > > > > -
> > > > > >        if (empty_block_p (loop->header))
> > > > > >  	return opt_result::failure_at (vect_location,
> > > > > >  				       "not vectorized: empty loop.\n");
> > > > > > @@ -1559,11 +1581,13 @@ vect_analyze_loop_form (class loop
> *loop,
> > > > > vect_loop_form_info *info)
> > > > > >          dump_printf_loc (MSG_NOTE, vect_location,
> > > > > >  			 "Considering outer-loop vectorization.\n");
> > > > > >        info->inner_loop_cond = inner.loop_cond;
> > > > > > +
> > > > > > +      if (!single_exit (loop))
> > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > +				       "not vectorized: multiple
> exits.\n");
> > > > > > +
> > > > > >      }
> > > > > >
> > > > > > -  if (!single_exit (loop))
> > > > > > -    return opt_result::failure_at (vect_location,
> > > > > > -				   "not vectorized: multiple exits.\n");
> > > > > >    if (EDGE_COUNT (loop->header->preds) != 2)
> > > > > >      return opt_result::failure_at (vect_location,
> > > > > >  				   "not vectorized:"
> > > > > > @@ -1579,21 +1603,45 @@ vect_analyze_loop_form (class loop
> *loop,
> > > > > vect_loop_form_info *info)
> > > > > >  				   "not vectorized: latch block not
> empty.\n");
> > > > > >
> > > > > >    /* Make sure the exit is not abnormal.  */
> > > > > > -  edge e = single_exit (loop);
> > > > > > -  if (e->flags & EDGE_ABNORMAL)
> > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > +  edge nexit = normal_exit (loop);
> > > > > > +  for (edge e : exits)
> > > > > > +    {
> > > > > > +      if (e->flags & EDGE_ABNORMAL)
> > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > +				       "not vectorized:"
> > > > > > +				       " abnormal loop exit edge.\n");
> > > > > > +      /* Early break BB must be after the main exit BB.  In theory we should
> > > > > > +	 be able to vectorize the inverse order, but the current flow in
> > > > > > +	 the vectorizer always assumes you update successor PHI nodes, not
> > > > > > +	 preds.  */
> > > > > > +      if (e != nexit
> > > > > > +	  && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src))
> > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > +				       "not vectorized:"
> > > > > > +				       " abnormal loop exit edge
> order.\n");
> > > > > > +    }
> > > > > > +
> > > > > > +  if (exits.length () > 2)
> > > > > >      return opt_result::failure_at (vect_location,
> > > > > >  				   "not vectorized:"
> > > > > > -				   " abnormal loop exit edge.\n");
> > > > > > -
> > > > > > -  info->loop_cond
> > > > > > +				   " too many exits. Only 1 additional
> exit"
> > > > > > +				   " supported.\n");
> > > > > > +  if (loop->num_nodes != 2 + exits.length () - 1)
> > > > > > +    return opt_result::failure_at (vect_location,
> > > > > > +				   "not vectorized:"
> > > > > > +				   " unsupported control flow in
> loop.\n");
> > > > > > +  info->conds
> > > > > >      = vect_get_loop_niters (loop, &info->assumptions,
> > > > > >  			    &info->number_of_iterations,
> > > > > >  			    &info->number_of_iterationsm1);
> > > > > > -  if (!info->loop_cond)
> > > > > > +
> > > > > > +  if (info->conds.length () == 0)
> > > > > >      return opt_result::failure_at
> > > > > >        (vect_location,
> > > > > >         "not vectorized: complicated exit condition.\n");
> > > > > >
> > > > > > +  info->loop_cond = info->conds[0];
> > > > > > +
> > > > > >    if (integer_zerop (info->assumptions)
> > > > > >        || !info->number_of_iterations
> > > > > >        || chrec_contains_undetermined (info-
> >number_of_iterations))
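Read literally, the form checks added to `vect_analyze_loop_form` reduce to a few conditions on the loop shape. The sketch below restates them over a hypothetical `LoopShape` summary (not a GCC type). One simplification: it checks abnormal edges before dominance order, whereas the patch checks both per edge, so the reported reason can differ when several conditions fail at once.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of a loop's CFG as seen by the form checks; the
// field names are illustrative, not GCC data structures.
struct LoopShape {
  unsigned num_nodes;         // basic blocks in the loop
  unsigned num_exits;         // exit edges, including the normal exit
  bool early_exit_dominates;  // early-exit source dominates normal-exit source
  bool abnormal_exit;         // some exit edge is EDGE_ABNORMAL
};

// Returns an empty string when the shape is accepted, otherwise the reason.
std::string check_early_break_form(const LoopShape &s) {
  if (s.abnormal_exit)
    return "abnormal loop exit edge";
  if (s.num_exits > 1 && !s.early_exit_dominates)
    return "abnormal loop exit edge order";
  if (s.num_exits > 2)
    return "too many exits";
  // Header and latch, plus one extra block per additional (early) exit.
  if (s.num_nodes != 2 + s.num_exits - 1)
    return "unsupported control flow in loop";
  return "";
}
```

Under these checks a single-exit loop keeps the old two-block shape, and an early-break loop is accepted only as header, latch and exactly one extra block.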
> > > > > > @@ -1638,8 +1686,17 @@ vect_create_loop_vinfo (class loop
> *loop,
> > > > > vec_info_shared *shared,
> > > > > >    if (!integer_onep (info->assumptions) && !main_loop_info)
> > > > > >      LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info-
> > > >assumptions;
> > > > > >
> > > > > > -  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info-
> > > > > >loop_cond);
> > > > > > -  STMT_VINFO_TYPE (loop_cond_info) =
> > > loop_exit_ctrl_vec_info_type;
> > > > > > +  unsigned int i;
> > > > > > +  gcond *cond;
> > > > > > +  FOR_EACH_VEC_ELT (info->conds, i, cond)
> > > > > > +    {
> > > > > > +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt
> (cond);
> > > > > > +      STMT_VINFO_TYPE (loop_cond_info) =
> > > loop_exit_ctrl_vec_info_type;
> > > > > > +    }
> > > > > > +
> > > > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo) = info->conds.length
> () >
> > > 1;
> > > > > > +
> > > > > >    if (info->inner_loop_cond)
> > > > > >      {
> > > > > >        stmt_vec_info inner_loop_cond_info
> > > > > > @@ -2270,10 +2327,13 @@
> > > vect_determine_partial_vectors_and_peeling
> > > > > (loop_vec_info loop_vinfo,
> > > > > >    bool need_peeling_or_partial_vectors_p
> > > > > >      = vect_need_peeling_or_partial_vectors_p (loop_vinfo);
> > > > > >
> > > > > > -  /* Decide whether to vectorize the loop with partial vectors.  */
> > > > > > +  /* Decide whether to vectorize the loop with partial vectors.
> > > Currently
> > > > > > +     early break vectorization does not support partial vectors as we
> > > have
> > > > > > +     to peel a scalar loop that we can't vectorize.  */
> > > > > >    LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> > > > > >    LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) =
> > > false;
> > > > > >    if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > > > > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > >        && need_peeling_or_partial_vectors_p)
> > > > > >      {
> > > > > >        /* For partial-vector-usage=1, try to push the handling of partial
> > > > > > @@ -2746,13 +2806,14 @@ start_over:
> > > > > >
> > > > > >    /* If an epilogue loop is required make sure we can create one.  */
> > > > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > > > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > >      {
> > > > > >        if (dump_enabled_p ())
> > > > > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop
> > > > > required\n");
> > > > > >        if (!vect_can_advance_ivs_p (loop_vinfo)
> > > > > > -	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP
> > > > > (loop_vinfo),
> > > > > > -					   single_exit
> (LOOP_VINFO_LOOP
> > > > > > +	  || !slpeel_can_duplicate_loop_p (loop_vinfo,
> > > > > > +					   normal_exit
> (LOOP_VINFO_LOOP
> > > > > >
> (loop_vinfo))))
> > > > > >          {
> > > > > >  	  ok = opt_result::failure_at (vect_location,
> > > > > > @@ -3239,6 +3300,8 @@ vect_analyze_loop (class loop *loop,
> > > > > vec_info_shared *shared)
> > > > > >  		     "***** Choosing vector mode %s\n",
> > > > > >  		     GET_MODE_NAME (first_loop_vinfo-
> >vector_mode));
> > > > > >
> > > > > > +  loop_form_info.conds.release ();
> > > > > > +
> > > > > >    /* Only vectorize epilogues if
> PARAM_VECT_EPILOGUES_NOMASK is
> > > > > >       enabled, SIMDUID is not set, it is the innermost loop and we
> have
> > > > > >       either already found the loop's SIMDLEN or there was no
> SIMDLEN
> > > to
> > > > > > @@ -3350,6 +3413,8 @@ vect_analyze_loop (class loop *loop,
> > > > > vec_info_shared *shared)
> > > > > >  			   (first_loop_vinfo->epilogue_vinfos[0]-
> > > > > >vector_mode));
> > > > > >      }
> > > > > >
> > > > > > +  loop_form_info.conds.release ();
> > > > > > +
> > > > > >    return first_loop_vinfo;
> > > > > >  }
> > > > > >
> > > > > > @@ -7907,6 +7972,237 @@ vect_transform_reduction
> (loop_vec_info
> > > > > loop_vinfo,
> > > > > >    return true;
> > > > > >  }
> > > > > >
> > > > > > +/* When vectorizing early break statements, instructions that happen
> > > > > > +   before the early break in the current BB need to be moved to after
> > > > > > +   the early break.  This function deals with that and assumes that any
> > > > > > +   validity checks have already been performed.
> > > > > > +
> > > > > > +   While moving the instructions, if it encounters a VUSE or VDEF it
> > > > > > +   corrects the VUSEs as it moves the statements along.  CHAINED contains
> > > > > > +   the list of SSA_NAMES that belong to the dependency chain of the early
> > > > > > +   break conditional.  GDEST is the location in which to insert the new
> > > > > > +   statements.  GSTMT is the iterator to walk up to find statements to
> > > > > > +   consider moving.  REACHING_VUSE contains the dominating VUSE found so
> > > > > > +   far and CURRENT_VDEF contains the last VDEF we've seen.  These are
> > > > > > +   updated in pre-order and again in post-order after moving the
> > > > > > +   instruction.  */
> > > > > > +
> > > > > > +static void
> > > > > > +move_early_exit_stmts (hash_set<tree> *chained, gimple_stmt_iterator *gdest,
> > > > > > +		       gimple_stmt_iterator *gstmt, tree *reaching_vuse,
> > > > > > +		       tree *current_vdef)
> > > > > > +{
> > > > > > +  if (gsi_end_p (*gstmt))
> > > > > > +    return;
> > > > > > +
> > > > > > +  gimple *stmt = gsi_stmt (*gstmt);
> > > > > > +  if (gimple_has_ops (stmt))
> > > > > > +    {
> > > > > > +      tree dest = NULL_TREE;
> > > > > > +      /* Try to find the SSA_NAME being defined.  For statements with an LHS
> > > > > > +	 use the LHS, if not, assume that the first argument of a call is the
> > > > > > +	 value being defined.  e.g. MASKED_LOAD etc.  */
> > > > > > +      if (gimple_has_lhs (stmt))
> > > > > > +	{
> > > > > > +	  if (is_gimple_assign (stmt))
> > > > > > +	    dest = gimple_assign_lhs (stmt);
> > > > > > +	  else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > > > > > +	    dest = gimple_call_lhs (call);
> > > > > > +	}
> > > > > > +      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > > > > > +	dest = gimple_arg (call, 0);
> > > > > > +
> > > > > > +      /* Don't move the scalar instructions.  */
> > > > > > +      bool move
> > > > > > +	= dest && (VECTOR_TYPE_P (TREE_TYPE (dest))
> > > > > > +		   || POINTER_TYPE_P (TREE_TYPE (dest)));
> > > > > > +
> > > > > > +      /* If we found the defining statement of something that's part of the
> > > > > > +	 chain then expand the chain with the new SSA_VARs being used.  */
> > > > > > +      if (chained->contains (dest))
> > > > > > +	{
> > > > > > +	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> > > > > > +	    if (TREE_CODE (gimple_arg (stmt, x)) == SSA_NAME)
> > > > > > +	      chained->add (gimple_arg (stmt, x));
> > > > > > +
> > > > > > +	  if (dump_enabled_p ())
> > > > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > +			     "found chain %G", stmt);
> > > > > > +	  update_stmt (stmt);
> > > > > > +	  move = false;
> > > > > > +	}
> > > > > > +
> > > > > > +      if (move)
> > > > > > +	{
> > > > > > +	  if (dump_enabled_p ())
> > > > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > +		             "moving stmt %G", stmt);
> > > > > > +	  gsi_move_before (gstmt, gdest);
> > > > > > +	  gsi_prev (gdest);
> > > > > > +	  tree vdef = gimple_vdef (stmt);
> > > > > > +
> > > > > > +	  /* If we've moved a VDEF, extract the defining MEM and update
> > > > > > +	     uses of it.  TODO: I think this may need some constraints? */
> > > > > > +	  if (vdef)
> > > > > > +	    {
> > > > > > +	      *current_vdef = vdef;
> > > > > > +	      *reaching_vuse = gimple_vuse (stmt);
> > > > > > +	      imm_use_iterator imm_iter;
> > > > > > +	      gimple *use_stmt;
> > > > > > +	      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, vdef)
> > > > > > +		{
> > > > > > +		   if (!is_a <gphi *> (use_stmt))
> > > > > > +		     continue;
> > > > > > +		   gphi *phi_stmt = as_a <gphi *> (use_stmt);
> > > > > > +
> > > > > > +		   if (dump_enabled_p ())
> > > > > > +		     dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > +				"updating vuse %G", use_stmt);
> > > > > > +		   for (unsigned i = 0; i < gimple_phi_num_args (phi_stmt); i++)
> > > > > > +		    if (gimple_phi_arg_def (phi_stmt, i) == vdef)
> > > > > > +		      {
> > > > > > +			SET_USE (PHI_ARG_DEF_PTR (phi_stmt, i), gimple_vuse (stmt));
> > > > > > +			break;
> > > > > > +		      }
> > > > > > +		}
> > > > > > +	    }
> > > > > > +	  update_stmt (stmt);
> > > > > > +	}
> > > > > > +    }
> > > > > > +
> > > > > > +  gsi_prev (gstmt);
> > > > > > +  move_early_exit_stmts (chained, gdest, gstmt, reaching_vuse, current_vdef);
> > > > > > +
> > > > > > +  if (gimple_vuse (stmt)
> > > > > > +      && reaching_vuse && *reaching_vuse
> > > > > > +      && gimple_vuse (stmt) == *current_vdef)
> > > > > > +    {
> > > > > > +      unlink_stmt_vdef (stmt);
> > > > > > +      gimple_set_vuse (stmt, *reaching_vuse);
> > > > > > +      update_stmt (stmt);
> > > > > > +    }
> > > > > > +}
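A rough way to picture the backward walk in move_early_exit_stmts: starting from the new exit compare, statements that define something in CHAINED stay where they are and pull their operands into the chain, while any other vector or pointer statement is moved to the start of dest_bb (EDGE_SUCC (cond_bb, 1)->dest), keeping its original relative order. The standalone C++ sketch below models only that partitioning. Stmt, split_early_exit_block and the string SSA names are hypothetical, every operand is assumed to be an SSA name, and the VUSE/VDEF rewiring done in post-order is deliberately left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a GIMPLE statement: the SSA name it defines
// (empty if none), the SSA names it uses, and whether the result has a
// vector or pointer type (the only statements the patch considers moving).
struct Stmt
{
  std::string dest;
  std::vector<std::string> args;
  bool vector_or_pointer;
};

// Walk BB from its last statement backwards.  A statement whose result is
// in CHAIN stays put and adds its operands to CHAIN; any other vector or
// pointer statement goes to MOVED (original order kept); everything else
// stays in KEPT.  VUSE/VDEF fix-ups are not modelled.
void
split_early_exit_block (const std::vector<Stmt> &bb,
                        std::unordered_set<std::string> &chain,
                        std::vector<Stmt> &kept, std::vector<Stmt> &moved)
{
  std::vector<Stmt> kept_rev, moved_rev;
  for (auto it = bb.rbegin (); it != bb.rend (); ++it)
    {
      if (!it->dest.empty () && chain.count (it->dest) != 0)
        {
          chain.insert (it->args.begin (), it->args.end ());
          kept_rev.push_back (*it);
        }
      else if (!it->dest.empty () && it->vector_or_pointer)
        moved_rev.push_back (*it);
      else
        kept_rev.push_back (*it);  // scalar statements are not moved
    }
  // Both lists were built back to front; restore program order.
  kept.assign (kept_rev.rbegin (), kept_rev.rend ());
  moved.assign (moved_rev.rbegin (), moved_rev.rend ());
  assert (kept.size () + moved.size () == bb.size ());
}
```

Seeding the chain with the name of the final exit value keeps the compare and its inputs in place and leaves only the unrelated vector statement in MOVED.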
> > > > > > +
> > > > > > +/* Transform the definition stmt STMT_INFO of an early exit
> > > > > > +   value.  */
> > > > > > +
> > > > > > +bool
> > > > > > +vect_transform_early_break (loop_vec_info loop_vinfo,
> > > > > > +			    stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > > > > > +			    gimple **vec_stmt, slp_tree slp_node)
> > > > > > +{
> > > > > > +  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> > > > > > +  int i;
> > > > > > +  int ncopies;
> > > > > > +  int vec_num;
> > > > > > +
> > > > > > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > > > > > +    return false;
> > > > > > +
> > > > > > +  gimple_match_op op;
> > > > > > +  if (!gimple_extract_op (stmt_info->stmt, &op))
> > > > > > +    gcc_unreachable ();
> > > > > > +  gcc_assert (op.code.is_tree_code ());
> > > > > > +  auto code = tree_code (op.code);
> > > > > > +
> > > > > > +  tree vectype_in = STMT_VINFO_VECTYPE (stmt_info);
> > > > > > +  gcc_assert (vectype_in);
> > > > > > +
> > > > > > +
> > > > > > +  if (slp_node)
> > > > > > +    {
> > > > > > +      ncopies = 1;
> > > > > > +      vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
> > > > > > +    }
> > > > > > +  else
> > > > > > +    {
> > > > > > +      ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
> > > > > > +      vec_num = 1;
> > > > > > +    }
> > > > > > +
> > > > > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > > > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > > > > +
> > > > > > +  /* Transform.  */
> > > > > > +  tree new_temp = NULL_TREE;
> > > > > > +  auto_vec<tree> vec_oprnds0;
> > > > > > +  auto_vec<tree> vec_oprnds1;
> > > > > > +  tree def0;
> > > > > > +
> > > > > > +  if (dump_enabled_p ())
> > > > > > +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> > > > > > +
> > > > > > +  /* FORNOW: Multiple types are not supported for condition.  */
> > > > > > +  if (code == COND_EXPR)
> > > > > > +    gcc_assert (ncopies == 1);
> > > > > > +
> > > > > > +
> > > > > > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > > > > > +  basic_block cond_bb = gimple_bb (stmt);
> > > > > > +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> > > > > > +
> > > > > > +  vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
> > > > > > +		     op.ops[0], &vec_oprnds0, op.ops[1], &vec_oprnds1,
> > > > > > +		     NULL, NULL);
> > > > > > +
> > > > > > +  gimple *new_stmt = NULL;
> > > > > > +  tree cst_0 = build_zero_cst (truth_type_for (vectype_out));
> > > > > > +  tree cst_m1 = build_minus_one_cst (truth_type_for (vectype_out));
> > > > > > +
> > > > > > +  FOR_EACH_VEC_ELT (vec_oprnds0, i, def0)
> > > > > > +    {
> > > > > > +      tree vop[3] = { def0, vec_oprnds1[i], NULL_TREE };
> > > > > > +	{
> > > > > > +	  tree cond = make_temp_ssa_name (truth_type_for (vectype_out), NULL, "mask");
> > > > > > +	  gimple *vec_cmp = gimple_build_assign (cond, code, vop[0], vop[1]);
> > > > > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, vec_cmp, &cond_gsi);
> > > > > > +          if (masked_loop_p)
> > > > > > +	    {
> > > > > > +	      tree mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> > > > > > +					      vectype_in, i);
> > > > > > +	      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
> > > > > > +				       cond, &cond_gsi);
> > > > > > +	    }
> > > > > > +
> > > > > > +	  new_temp = make_temp_ssa_name (truth_type_for (vectype_out), NULL, "vexit");
> > > > > > +          gimple *vec_cond = gimple_build_assign (new_temp, VEC_COND_EXPR,
> > > > > > +						  cond, cst_m1, cst_0);
> > > > > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, vec_cond, &cond_gsi);
> > > > > > +	  new_stmt = vec_cond;
> > > > > > +	}
> > > > > > +
> > > > > > +      if (slp_node)
> > > > > > +	SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
> > > > > > +      else
> > > > > > +	STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > > > > > +    }
> > > > > > +
> > > > > > +  gcc_assert (new_stmt);
> > > > > > +  tree lhs = gimple_assign_lhs (new_stmt);
> > > > > > +
> > > > > > +  tree t = fold_build2 (NE_EXPR, boolean_type_node, lhs,
> > > > > > +			build_zero_cst (truth_type_for (vectype_out)));
> > > > > > +  t = canonicalize_cond_expr_cond (t);
> > > > > > +  gimple_cond_set_condition_from_tree ((gcond*)stmt, t);
> > > > > > +  update_stmt (stmt);
> > > > > > +
> > > > > > +  basic_block dest_bb = EDGE_SUCC (cond_bb, 1)->dest;
> > > > > > +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> > > > > > +
> > > > > > +  hash_set<tree> chained;
> > > > > > +  gimple_stmt_iterator gsi2 = gsi_for_stmt (new_stmt);
> > > > > > +  chained.add (lhs);
> > > > > > +  tree vdef;
> > > > > > +  tree vuse = gimple_vuse (new_stmt);
> > > > > > +  move_early_exit_stmts (&chained, &dest_gsi, &gsi2, &vuse, &vdef);
> > > > > > +
> > > > > > +  if (!slp_node)
> > > > > > +    *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > > > > +
> > > > > > +  return true;
> > > > > > +}
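Per lane, the statements built by vect_transform_early_break amount to: compare, optionally AND the result with the loop mask (prepare_vec_mask), select -1 or 0 with a VEC_COND_EXPR, and finally rewrite the gcond to branch when that vector is not all zeros. A hedged C++ model using std::array as the vector type follows. Vec and early_exit_taken are hypothetical, and the comparison is fixed to ">" purely for illustration, whereas the patch uses whatever tree code the scalar condition carries.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <std::size_t N>
using Vec = std::array<std::int32_t, N>;

// Hypothetical lane-by-lane model of one early-exit check, assuming the
// scalar condition was "a > b".  LOOP_MASK stands in for the loop mask
// that is applied when the loop is fully masked.
template <std::size_t N>
bool
early_exit_taken (const Vec<N> &a, const Vec<N> &b,
                  const std::array<bool, N> &loop_mask, bool masked_loop_p)
{
  Vec<N> vexit{};
  for (std::size_t i = 0; i < N; ++i)
    {
      bool mask = a[i] > b[i];          // vec_cmp: mask = a > b
      if (masked_loop_p)
        mask = mask && loop_mask[i];    // prepare_vec_mask: mask & loop mask
      vexit[i] = mask ? -1 : 0;         // VEC_COND_EXPR <mask, -1, 0>
    }
  // Rewritten gcond: if (vexit != { 0, ... }), i.e. any lane is set.
  for (std::int32_t lane : vexit)
    if (lane != 0)
      return true;
  return false;
}
```

In a fully masked loop an inactive lane can never trigger the exit, which is why the loop mask is folded in before the any-lane test.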
> > > > > > +
> > > > > > +
> > > > > > +
> > > > > >  /* Transform phase of a cycle PHI.  */
> > > > > >
> > > > > >  bool
> > > > > > @@ -8185,6 +8481,186 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
> > > > > >    return true;
> > > > > >  }
> > > > > >
> > > > > > +/* This function tries to validate whether early break vectorization
> > > > > > +   is possible for the current instruction sequence.  Returns true if
> > > > > > +   possible, otherwise false.
> > > > > > +
> > > > > > +   Requirements:
> > > > > > +     - Any memory access must be to a fixed size buffer.
> > > > > > +     - There must not be any loads and stores to the same object.
> > > > > > +     - Multiple loads are allowed as long as they don't alias.
> > > > > > +
> > > > > > +
> > > > > > +   Arguments:
> > > > > > +     - LOOP_VINFO: loop information for the current loop.
> > > > > > +     - CHAIN: Currently detected sequence of instructions that belong
> > > > > > +	      to the current early break.
> > > > > > +     - LOADS: List of all loads found during traversal.
> > > > > > +     - BASES: List of all load data references found during traversal.
> > > > > > +     - GSTMT: Current position to inspect for validity.  The sequence
> > > > > > +	      will be moved upwards from this point.  */
> > > > > > +
> > > > > > +static bool
> > > > > > +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
> > > > > > +			   vec<tree> *loads, vec<data_reference *> *bases,
> > > > > > +			   gimple_stmt_iterator *gstmt)
> > > > > > +{
> > > > > > +  if (gsi_end_p (*gstmt))
> > > > > > +    return true;
> > > > > > +
> > > > > > +  gimple *stmt = gsi_stmt (*gstmt);
> > > > > > +  if (gimple_has_ops (stmt))
> > > > > > +    {
> > > > > > +      tree dest = NULL_TREE;
> > > > > > +      /* Try to find the SSA_NAME being defined.  For statements with an LHS
> > > > > > +	 use the LHS, if not, assume that the first argument of a call is the
> > > > > > +	 value being defined.  e.g. MASKED_LOAD etc.  */
> > > > > > +      if (gimple_has_lhs (stmt))
> > > > > > +	{
> > > > > > +	  if (is_gimple_assign (stmt))
> > > > > > +	    dest = gimple_assign_lhs (stmt);
> > > > > > +	  else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > > > > > +	    dest = gimple_call_lhs (call);
> > > > > > +	}
> > > > > > +      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > > > > > +	dest = gimple_arg (call, 0);
> > > > > > +
> > > > > > +      /* Don't move the scalar instructions.  */
> > > > > > +      bool move
> > > > > > +	= dest && (VECTOR_TYPE_P (TREE_TYPE (dest))
> > > > > > +		   || POINTER_TYPE_P (TREE_TYPE (dest)));
> > > > > > +
> > > > > > +      /* If we found the defining statement of something that's part of the
> > > > > > +	 chain then expand the chain with the new SSA_VARs being used.  */
> > > > > > +      if (chain->contains (dest))
> > > > > > +	{
> > > > > > +	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> > > > > > +	    if (TREE_CODE (gimple_arg (stmt, x)) == SSA_NAME)
> > > > > > +	      chain->add (gimple_arg (stmt, x));
> > > > > > +
> > > > > > +	  if (dump_enabled_p ())
> > > > > > +	      dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > +				"found chain %G", stmt);
> > > > > > +
> > > > > > +	  move = false;
> > > > > > +	}
> > > > > > +
> > > > > > +      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > > > > +      if (!stmt_vinfo)
> > > > > > +	{
> > > > > > +	   if (dump_enabled_p ())
> > > > > > +	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > +			      "early breaks only supported. Unknown"
> > > > > > +			      " statement: %G", stmt);
> > > > > > +	   return false;
> > > > > > +	}
> > > > > > +
> > > > > > +      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > > > > +      if (dr_ref)
> > > > > > +	{
> > > > > > +	   /* We currently only support statically allocated objects due to
> > > > > > +	      not having first-faulting loads support or peeling for alignment
> > > > > > +	      support.  Compute the size of the referenced object (it could be
> > > > > > +	      dynamically allocated).  */
> > > > > > +	   tree obj = DR_BASE_ADDRESS (dr_ref);
> > > > > > +	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> > > > > > +	     {
> > > > > > +	       if (dump_enabled_p ())
> > > > > > +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > +				  "early breaks only supported on statically"
> > > > > > +				  " allocated objects.\n");
> > > > > > +	       return false;
> > > > > > +	     }
> > > > > > +
> > > > > > +	   tree refop = TREE_OPERAND (obj, 0);
> > > > > > +	   tree refbase = get_base_address (refop);
> > > > > > +	   if (!refbase || !DECL_P (refbase)
> > > > > > +	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> > > > > > +	     {
> > > > > > +	       if (dump_enabled_p ())
> > > > > > +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > +				  "early breaks only supported on statically"
> > > > > > +				  " allocated objects.\n");
> > > > > > +	       return false;
> > > > > > +	     }
> > > > > > +
> > > > > > +	   if (!move && DR_IS_READ (dr_ref))
> > > > > > +	     {
> > > > > > +		loads->safe_push (dest);
> > > > > > +		bases->safe_push (dr_ref);
> > > > > > +	     }
> > > > > > +	   else if (DR_IS_WRITE (dr_ref))
> > > > > > +	     {
> > > > > > +		for (auto dr : bases)
> > > > > > +		  if (same_data_refs_base_objects (dr, dr_ref))
> > > > > > +		    return false;
> > > > > > +	     }
> > > > > > +	}
> > > > > > +
> > > > > > +      if (move)
> > > > > > +	{
> > > > > > +	  if (dump_enabled_p ())
> > > > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > +		             "analyzing stmt %G", stmt);
> > > > > > +
> > > > > > +	  for (tree ref : loads)
> > > > > > +	    if (stmt_may_clobber_ref_p (stmt, ref, true))
> > > > > > +	      {
> > > > > > +	        if (dump_enabled_p ())
> > > > > > +		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > +				   "early breaks not supported as memory used"
> > > > > > +				   " may alias.\n");
> > > > > > +	        return false;
> > > > > > +	      }
> > > > > > +	}
> > > > > > +    }
> > > > > > +
> > > > > > +  gsi_prev (gstmt);
> > > > > > +  return validate_early_exit_stmts (loop_vinfo, chain, loads, bases, gstmt);
> > > > > > +}
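The memory rules that validate_early_exit_stmts enforces (a declared base object of constant size, and no write to an object whose load feeds the exit condition) can be modelled roughly as below. DataRef and early_exit_memory_ok are hypothetical; base objects are compared by name as a crude stand-in for same_data_refs_base_objects, and the stmt_may_clobber_ref_p test on moved statements is not modelled.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of one data reference met during the backward walk.
struct DataRef
{
  std::string base;               // underlying object, e.g. "vect_a"
  std::optional<long> decl_size;  // constant size of that object, if known
  bool is_read;
  bool feeds_exit;                // load that is part of the exit chain
};

// Sketch of the memory checks.  REFS is in walk order, i.e. the reference
// closest to the exit condition comes first.
bool
early_exit_memory_ok (const std::vector<DataRef> &refs)
{
  std::vector<std::string> load_bases;
  for (const DataRef &dr : refs)
    {
      if (!dr.decl_size)
        return false;  // not a declared object of constant size
      if (dr.is_read && dr.feeds_exit)
        load_bases.push_back (dr.base);
      else if (!dr.is_read)
        for (const std::string &b : load_bases)
          if (b == dr.base)
            return false;  // write to an object the exit condition loads
    }
  return true;
}
```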
> > > > > > +
> > > > > > +/* Check to see if the current early break given in STMT_INFO is valid for
> > > > > > +   vectorization.  */
> > > > > > +
> > > > > > +bool
> > > > > > +vectorizable_early_exit (vec_info *vinfo,
> > > > > > +			stmt_vec_info stmt_info, slp_tree /* slp_node */,
> > > > > > +			slp_instance /* slp_node_instance */,
> > > > > > +			stmt_vector_for_cost * /* cost_vec */)
> > > > > > +{
> > > > > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > > > > +
> > > > > > +  if (!loop_vinfo
> > > > > > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > > > > > +    return false;
> > > > > > +
> > > > > > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
> > > > > > +    return false;
> > > > > > +
> > > > > > +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> > > > > > +  tree truth_type = truth_type_for (vectype);
> > > > > > +
> > > > > > +  auto optab = direct_optab_handler (cbranch_optab, TYPE_MODE (truth_type));
> > > > > > +  if (optab == CODE_FOR_nothing)
> > > > > > +    {
> > > > > > +      if (dump_enabled_p ())
> > > > > > +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > +				 "can't vectorize early exit because the "
> > > > > > +				 "target doesn't support flag setting vector "
> > > > > > +				 "comparisons.\n");
> > > > > > +      return false;
> > > > > > +    }
> > > > > > +
> > > > > > +  hash_set<tree> chain;
> > > > > > +  auto_vec<tree> loads;
> > > > > > +  auto_vec<data_reference *> bases;
> > > > > > +
> > > > > > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > > > > > +  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> > > > > > +
> > > > > > +  return validate_early_exit_stmts (loop_vinfo, &chain, &loads, &bases, &gsi);
> > > > > > +}
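For a concrete picture of the inputs these checks target, here is a hypothetical C++ loop (not one of the patch's testcases) that only touches statically allocated arrays of known size, with the store placed before the break and aimed at a different object than the one the exit condition reads. Whether it actually vectorizes also depends on the target providing cbranch_optab for the mask mode, as checked in vectorizable_early_exit. The pointer variant is there as a contrast: judging by the DR_BASE_ADDRESS/DECL_SIZE checks in validate_early_exit_stmts, it is expected to be rejected because the accessed object's size is not known.

```cpp
#include <cassert>

// Hypothetical example input.  Both arrays are statically allocated with
// a compile-time size.
constexpr int N = 1024;
int vect_a[N];
int vect_b[N];

// The exit condition loads vect_a; the only store, which happens before
// the break, writes a different object (vect_b).
int
first_greater (int x)
{
  int i;
  for (i = 0; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        break;
    }
  return i;
}

// Same search through a pointer: the size of the object behind P is not
// known, so this is the kind of loop the checks are expected to reject.
int
first_greater_ptr (const int *p, int n, int x)
{
  assert (p != nullptr || n == 0);
  int i;
  for (i = 0; i < n; i++)
    if (p[i] > x)
      break;
  return i;
}
```

Note that in the exiting iteration the store to vect_b still happens, which is why the vectorized form has to keep those stores after the exit test rather than drop them.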
> > > > > > +
> > > > > >  /* Vectorizes LC PHIs.  */
> > > > > >
> > > > > >  bool
> > > > > > @@ -9993,13 +10469,24 @@ vectorizable_live_operation (vec_info *vinfo,
> > > > > >  	   new_tree = lane_extract <vec_lhs', ...>;
> > > > > >  	   lhs' = new_tree;  */
> > > > > >
> > > > > > +      /* When vectorizing an early break, any live statements that are used
> > > > > > +	 outside of the loop are dead.  The loop will never get to them.
> > > > > > +	 We could change the liveness value during analysis instead, but since
> > > > > > +	 the code below is invalid anyway, just ignore it during codegen.  */
> > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +	return true;
> > > > > > +
> > > > > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > -      basic_block exit_bb = single_exit (loop)->dest;
> > > > > > +      basic_block exit_bb = normal_exit (loop)->dest;
> > > > > >        gcc_assert (single_pred_p (exit_bb));
> > > > > >
> > > > > >        tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> > > > > >        gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> > > > > > -      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
> > > > > > +      /* For early exits we need to compute the right exit.  The current
> > > > > > +	 approach punts to a scalar loop instead.  If we were to vectorize,
> > > > > > +	 the exit condition below would need to take into account the
> > > > > > +	 difference between a `break` edge and a `return` edge.  */
> > > > > > +      SET_PHI_ARG_DEF (phi, normal_exit (loop)->dest_idx, vec_lhs);
> > > > > >
> > > > > >        gimple_seq stmts = NULL;
> > > > > >        tree new_tree;
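The comment above says the current approach punts to a scalar loop instead of computing live values in the vector loop. The hedged C++ model below assumes the vector loop processes whole blocks of VF elements, leaves as soon as any lane of a block would exit, and that the scalar loop then restarts at the first element of that block. first_greater_blocked and VF are hypothetical names; the point is only that the value live after the loop (here the exit index) ends up being computed by the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of running "for (i = 0; i < n; i++) if (a[i] > x) break;"
// as a VF-lane early-break vector loop followed by a scalar loop.  Returns
// the exit index, i.e. the value that is live after the loop.
template <std::size_t VF>
std::size_t
first_greater_blocked (const std::vector<int> &a, int x)
{
  static_assert (VF > 0, "need at least one lane");
  std::size_t i = 0;
  // Vector loop: whole VF-wide blocks only.  Each block first asks whether
  // any lane would take the exit and, if so, leaves without deciding which.
  for (; i + VF <= a.size (); i += VF)
    {
      bool any_lane_exits = false;
      for (std::size_t lane = 0; lane < VF; ++lane)
        if (a[i + lane] > x)
          any_lane_exits = true;
      if (any_lane_exits)
        break;
    }
  assert (i <= a.size ());
  // Scalar loop: restarts at the block holding the exit (or at the tail)
  // and computes the exact exit index.
  for (; i < a.size (); ++i)
    if (a[i] > x)
      return i;
  return a.size ();
}
```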
> > > > > > @@ -10438,7 +10925,8 @@ scale_profile_for_vect_loop (class loop *loop, unsigned vf)
> > > > > >        scale_loop_frequencies (loop, p);
> > > > > >      }
> > > > > >
> > > > > > -  edge exit_e = single_exit (loop);
> > > > > > +  edge exit_e = normal_exit (loop);
> > > > > > +
> > > > > >    exit_e->probability = profile_probability::always () / (new_est_niter + 1);
> > > > > >
> > > > > >    edge exit_l = single_pred_edge (loop->latch);
> > > > > > @@ -10787,7 +11275,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > > > > >       versioning.   */
> > > > > >    edge e = single_exit (loop);
> > > > > > -  if (! single_pred_p (e->dest))
> > > > > > +  if (e && ! single_pred_p (e->dest))
> > > > > >      {
> > > > > >        split_loop_exit_edge (e, true);
> > > > > >        if (dump_enabled_p ())
> > > > > > @@ -10813,7 +11301,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > > > > >      {
> > > > > >        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > > > > -      if (! single_pred_p (e->dest))
> > > > > > +      if (e && ! single_pred_p (e->dest))
> > > > > >  	{
> > > > > >  	  split_loop_exit_edge (e, true);
> > > > > >  	  if (dump_enabled_p ())
> > > > > > @@ -11146,7 +11634,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > > >
> > > > > >    /* Loops vectorized with a variable factor won't benefit from
> > > > > >       unrolling/peeling.  */
> > > > > > -  if (!vf.is_constant ())
> > > > > > +  if (!vf.is_constant ()
> > > > > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > >      {
> > > > > >        loop->unroll = 1;
> > > > > >        if (dump_enabled_p ())
> > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > index 4e0d75e0d7586ad57a37850d8a70f6182ecb13d0..4f9446a5c699288be093c556ec527e87cf788317 100644
> > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > @@ -342,9 +342,28 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > > >    *live_p = false;
> > > > > >
> > > > > >    /* cond stmt other than loop exit cond.  */
> > > > > > -  if (is_ctrl_stmt (stmt_info->stmt)
> > > > > > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > > > > > -    *relevant = vect_used_in_scope;
> > > > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > > > +    {
> > > > > > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
> > > > > > +	 it looks like loop_manip doesn't do that.  So we have to do it
> > > > > > +	 the hard way.  */
> > > > > > +      basic_block bb = gimple_bb (stmt_info->stmt);
> > > > > > +      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> > > > > > +      edge exit = BRANCH_EDGE (bb);
> > > > > > +      unsigned nbbs = loop->num_nodes;
> > > > > > +      bool exit_bb = true;
> > > > > > +      for (unsigned i = 0; i < nbbs; i++)
> > > > > > +	{
> > > > > > +	  if (exit->dest == bbs[i])
> > > > > > +	    {
> > > > > > +	      exit_bb = false;
> > > > > > +	      break;
> > > > > > +	    }
> > > > > > +	}
> > > > > > +
> > > > > > +      if (exit_bb)
> > > > > > +	*relevant = vect_used_in_scope;
> > > > > > +    }
> > > > > >
> > > > > >    /* changing memory.  */
> > > > > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
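The check added to vect_stmt_relevant_p treats a control statement as a loop exit when the destination of its BRANCH_EDGE is not one of LOOP_VINFO_BBS. A minimal C++ model with basic blocks represented as integer indices follows; is_loop_exit_branch is hypothetical and, like the hunk, looks at a single successor edge only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: LOOP_BBS holds the indices of the loop's basic blocks
// and BRANCH_DEST the index of the block the branch edge jumps to.  The
// control statement counts as an exit (and is marked vect_used_in_scope in
// the hunk above) when that block lies outside the loop.
bool
is_loop_exit_branch (const std::vector<int> &loop_bbs, int branch_dest)
{
  assert (!loop_bbs.empty ());  // a loop has at least its header block
  return std::find (loop_bbs.begin (), loop_bbs.end (), branch_dest)
         == loop_bbs.end ();
}
```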
> > > > > > @@ -357,6 +376,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > > >  	*relevant = vect_used_in_scope;
> > > > > >        }
> > > > > >
> > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > +  auto_bitmap exit_bbs;
> > > > > > +  for (edge exit : exits)
> > > > > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > > > +
> > > > > >    /* uses outside the loop.  */
> > > > > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
> > > > > >      {
> > > > > > @@ -375,7 +399,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > > >  	      /* We expect all such uses to be in the loop exit phis
> > > > > >  		 (because of loop closed form)   */
> > > > > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > > > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > > > > > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
> > > > > >
> > > > > >                *live_p = true;
> > > > > >  	    }
> > > > > > @@ -1845,7 +1869,7 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
> > > > > >     MASK_TYPE is the type of both masks.  If new statements are needed,
> > > > > >     insert them before GSI.  */
> > > > > >
> > > > > > -static tree
> > > > > > +tree
> > > > > >  prepare_vec_mask (loop_vec_info loop_vinfo, tree mask_type, tree loop_mask,
> > > > > >  		  tree vec_mask, gimple_stmt_iterator *gsi)
> > > > > >  {
> > > > > > @@ -11158,11 +11182,14 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > > >  			     node_instance, cost_vec);
> > > > > >        if (!res)
> > > > > >  	return res;
> > > > > > -   }
> > > > > > +    }
> > > > > > +  else if (is_ctrl_stmt (stmt_info->stmt))
> > > > > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > > > > >
> > > > > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > > > >      {
> > > > > >        case vect_internal_def:
> > > > > > +      case vect_early_exit_def:
> > > > > >          break;
> > > > > >
> > > > > >        case vect_reduction_def:
> > > > > > @@ -11195,6 +11222,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > > >      {
> > > > > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > > > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > > > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > > > > >        *need_to_vectorize = true;
> > > > > >      }
> > > > > > @@ -11237,7 +11265,9 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > > >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> > > > > >  				  stmt_info, NULL, node)
> > > > > >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > > > > > -				   stmt_info, NULL, node, cost_vec));
> > > > > > +				   stmt_info, NULL, node, cost_vec)
> > > > > > +	  || vectorizable_early_exit (vinfo, stmt_info,
> > > > > > +				      node, node_instance, cost_vec));
> > > > > >    else
> > > > > >      {
> > > > > >        if (bb_vinfo)
> > > > > > @@ -11260,7 +11290,10 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > > >  					 NULL, NULL, node, cost_vec)
> > > > > >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> > > > > >  					  cost_vec)
> > > > > > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > > > > > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > > > > > +	      || vectorizable_early_exit (vinfo, stmt_info, node,
> > > > > > +					  node_instance, cost_vec));
> > > > > > +
> > > > > >      }
> > > > > >
> > > > > >    if (node)
> > > > > > @@ -11418,6 +11451,12 @@ vect_transform_stmt (vec_info *vinfo,
> > > > > >        gcc_assert (done);
> > > > > >        break;
> > > > > >
> > > > > > +    case loop_exit_ctrl_vec_info_type:
> > > > > > +      done = vect_transform_early_break (as_a <loop_vec_info> (vinfo), stmt_info,
> > > > > > +				         gsi, &vec_stmt, slp_node);
> > > > > > +      gcc_assert (done);
> > > > > > +      break;
> > > > > > +
> > > > > >      default:
> > > > > >        if (!STMT_VINFO_LIVE_P (stmt_info))
> > > > > >  	{
> > > > > > @@ -11816,6 +11855,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
> > > > > >  	case vect_first_order_recurrence:
> > > > > >  	  dump_printf (MSG_NOTE, "first order recurrence\n");
> > > > > >  	  break;
> > > > > > +	case vect_early_exit_def:
> > > > > > +	  dump_printf (MSG_NOTE, "early exit\n");
> > > > > > +	  break;
> > > > > >  	case vect_unknown_def_type:
> > > > > >  	  dump_printf (MSG_NOTE, "unknown\n");
> > > > > >  	  break;
> > > > > > @@ -12486,6 +12528,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> > > > > >    *nunits_vectype_out = NULL_TREE;
> > > > > >
> > > > > >    if (gimple_get_lhs (stmt) == NULL_TREE
> > > > > > +      /* Allow vector conditionals through here.  */
> > > > > > +      && !is_ctrl_stmt (stmt)
> > > > > >        /* MASK_STORE has no lhs, but is ok.  */
> > > > > >        && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > > > > >      {
> > > > > > @@ -12502,7 +12546,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> > > > > >  	}
> > > > > >
> > > > > >        return opt_result::failure_at (stmt,
> > > > > > -				     "not vectorized: irregular stmt.%G", stmt);
> > > > > > +				     "not vectorized: irregular stmt: %G", stmt);
> > > > > >      }
> > > > > >
> > > > > >    tree vectype;
> > > > > > @@ -12531,6 +12575,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> > > > > >  	scalar_type = TREE_TYPE (DR_REF (dr));
> > > > > >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > > > > >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > > > > > +      else if (is_ctrl_stmt (stmt))
> > > > > > +	scalar_type = TREE_TYPE (gimple_cond_rhs (stmt));
> > > > > >        else
> > > > > >  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> > > > > >
> > > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > > index 016961da8510ca7dd2d07e716cbe35623ed2d9a5..edbb7228d3aae29b6f51fdab284f49ac57c6612d 100644
> > > > > > --- a/gcc/tree-vectorizer.h
> > > > > > +++ b/gcc/tree-vectorizer.h
> > > > > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > > > > >    vect_internal_def,
> > > > > >    vect_induction_def,
> > > > > >    vect_reduction_def,
> > > > > > +  vect_early_exit_def,
> > > > > >    vect_double_reduction_def,
> > > > > >    vect_nested_cycle,
> > > > > >    vect_first_order_recurrence,
> > > > > > @@ -836,6 +837,10 @@ public:
> > > > > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > > > > >    bool peeling_for_niter;
> > > > > >
> > > > > > +  /* When the loop has early breaks that we can vectorize, we need to peel
> > > > > > +     the loop for the break finding loop.  */
> > > > > > +  bool early_breaks;
> > > > > > +
> > > > > >    /* True if there are no loop carried data dependencies in the loop.
> > > > > >       If loop->safelen <= 1, then this is always true, either the loop
> > > > > >       didn't have any loop carried data dependencies, or the loop is being
> > > > > > @@ -921,6 +926,7 @@ public:
> > > > > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > > > > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > > > > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > > > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > > > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> > > > > >  #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
> > > > > >  #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
> > > > > > @@ -970,7 +976,7 @@ public:
> > > > > >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > > > > >
> > > > > >  static inline loop_vec_info
> > > > > > -loop_vec_info_for_loop (class loop *loop)
> > > > > > +loop_vec_info_for_loop (const class loop *loop)
> > > > > >  {
> > > > > >    return (loop_vec_info) loop->aux;
> > > > > >  }
> > > > > > @@ -2107,7 +2113,7 @@ class auto_purge_vect_location
> > > > > >     in tree-vect-loop-manip.cc.  */
> > > > > >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > > > >  				     tree, tree, tree, bool);
> > > > > > -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> > > > > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
> > > > > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > > > >  						     class loop *, edge);
> > > > > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > > > > > @@ -2306,6 +2312,7 @@ struct vect_loop_form_info
> > > > > >    tree number_of_iterations;
> > > > > >    tree number_of_iterationsm1;
> > > > > >    tree assumptions;
> > > > > > +  vec<gcond *> conds;
> > > > > >    gcond *loop_cond;
> > > > > >    gcond *inner_loop_cond;
> > > > > >  };
> > > > > > @@ -2326,6 +2333,9 @@ extern bool vectorizable_induction (loop_vec_info, stmt_vec_info,
> > > > > >  extern bool vect_transform_reduction (loop_vec_info, stmt_vec_info,
> > > > > >  				      gimple_stmt_iterator *,
> > > > > >  				      gimple **, slp_tree);
> > > > > > +extern bool vect_transform_early_break (loop_vec_info, stmt_vec_info,
> > > > > > +					gimple_stmt_iterator *,
> > > > > > +					gimple **, slp_tree);
> > > > > >  extern bool vect_transform_cycle_phi (loop_vec_info, stmt_vec_info,
> > > > > >  				      gimple **,
> > > > > >  				      slp_tree, slp_instance);
> > > > > > @@ -2335,6 +2345,11 @@ extern bool vectorizable_phi (vec_info *, stmt_vec_info, gimple **, slp_tree,
> > > > > >  			      stmt_vector_for_cost *);
> > > > > >  extern bool vectorizable_recurr (loop_vec_info, stmt_vec_info,
> > > > > >  				  gimple **, slp_tree, stmt_vector_for_cost *);
> > > > > > +extern bool vectorizable_early_exit (vec_info *, stmt_vec_info,
> > > > > > +				     slp_tree, slp_instance,
> > > > > > +				     stmt_vector_for_cost *);
> > > > > > +extern tree prepare_vec_mask (loop_vec_info, tree, tree,
> > > > > > +			      tree, gimple_stmt_iterator *);
> > > > > >  extern bool vect_emulated_vector_p (tree);
> > > > > >  extern bool vect_can_vectorize_without_simd_p (tree_code);
> > > > > >  extern bool vect_can_vectorize_without_simd_p (code_helper);
> > > > > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > > > > index 6ec49511d74bd2e0e5dd51823a6c41180f08716c..4aa46c7c0d8235d3b783ce930e5df3480e1b3ef9 100644
> > > > > > --- a/gcc/tree-vectorizer.cc
> > > > > > +++ b/gcc/tree-vectorizer.cc
> > > > > > @@ -1382,7 +1382,9 @@ pass_vectorize::execute (function *fun)
> > > > > >  	 predicates that need to be shared for optimal predicate usage.
> > > > > >  	 However reassoc will re-order them and prevent CSE from working
> > > > > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > > > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > > > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > +      for (edge exit : exits)
> > > > > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > > >
> > > > > >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > > > > >        do_rpo_vn (fun, entry, exit_bbs);
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> > > > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> > > > > HRB 36809 (AG Nuernberg)
> > > >
> > >
> >
> 
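The `early_breaks` flag and `vect_early_exit_def` in the quoted diff exist so the vectorizer can handle loops that may leave before the iteration count is reached, peeling to a scalar loop for the part that finds the break. The sketch below is only a conceptual emulation in plain C++, not GCC's implementation. It assumes a fixed vectorization factor `VF` of 4, assumes `b` has the same length as `a`, and uses the hypothetical function names `scalar_ref` and `block_vectorized`. Each emulated vector step evaluates the break condition for all lanes first and performs the stores only when no lane exits; otherwise the scalar loop re-executes that block from its first lane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vectorization factor chosen for this sketch.
constexpr std::size_t VF = 4;

// Scalar reference loop with an early break. Assumes b.size() == a.size().
std::size_t scalar_ref(std::vector<int>& a, std::vector<int>& b, int x) {
  std::size_t i = 0;
  for (; i < a.size(); ++i) {
    b[i] = x + static_cast<int>(i);
    if (a[i] > x) break;  // early exit
    a[i] = x;
  }
  return i;  // index at which the loop stopped
}

// Emulated "vector" loop: VF lanes per step, then a scalar epilogue.
std::size_t block_vectorized(std::vector<int>& a, std::vector<int>& b, int x) {
  const std::size_t n = a.size();
  std::size_t i = 0;
  // Main loop over whole blocks of VF lanes only.
  for (; i + VF <= n; i += VF) {
    // Evaluate the break condition in every lane (the lane mask).
    std::array<bool, VF> mask{};
    bool any = false;
    for (std::size_t l = 0; l < VF; ++l) {
      mask[l] = a[i + l] > x;
      any = any || mask[l];
    }
    // Some lane would exit: leave the block untouched and let the
    // scalar loop below redo it from its first lane.
    if (any) break;
    // No lane exits: perform every lane's side effects.
    for (std::size_t l = 0; l < VF; ++l) {
      b[i + l] = x + static_cast<int>(i + l);
      a[i + l] = x;
    }
  }
  // Scalar epilogue: the block containing the exit, plus any leftover lanes.
  for (; i < n; ++i) {
    b[i] = x + static_cast<int>(i);
    if (a[i] > x) break;
    a[i] = x;
  }
  return i;
}
```

Checking every lane before any store is only safe here because iteration `i` touches nothing but `a[i]` and `b[i]`, and `a` and `b` do not overlap; a real vectorizer has to establish that kind of independence before moving a break past side effects.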


Thread overview: 18+ messages
2022-11-02 14:45 Tamar Christina
2022-11-02 14:46 ` [PATCH 2/2]AArch64 Add implementation for vector cbranch Tamar Christina
2022-11-02 21:50 ` [PATCH 1/2]middle-end: Support early break/return auto-vectorization Bernhard Reutner-Fischer
2022-11-02 22:32   ` Jeff Law
2022-11-03  8:51     ` Tamar Christina
2022-11-08 17:36 ` Tamar Christina
2022-11-15 11:11   ` Tamar Christina
2022-11-16 12:17     ` Richard Biener
2022-11-16 18:52       ` Jeff Law
2022-11-18 15:04 ` Richard Biener
2022-11-18 18:23   ` Tamar Christina
2022-11-19 10:49     ` Tamar Christina
2022-11-24  9:02     ` Richard Biener
2022-11-24 11:56       ` Tamar Christina
2022-11-25  9:33         ` Richard Biener
2022-11-25 10:32           ` Tamar Christina [this message]
2022-12-13 15:01             ` Tamar Christina
2022-12-14  9:41               ` Richard Biener
